# DAY 6 — **Introduction to Large Language Models (LLMs)**

# 1. What Are Large Language Models?
Large Language Models (LLMs) are deep learning models trained on massive amounts of text data to understand, generate, and reason using human language.

**They are called:**
*   **Large →** billions/trillions of parameters
*   **Language →** trained on text (books, code, articles, web data)
*   **Models →** mathematical systems that learn patterns

LLMs do not store facts like a database.
Instead, they learn probability patterns of language.

At their core, LLMs answer one question repeatedly:
“What is the most likely next token?” (A token is a word or a piece of a word.)
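To make "probability patterns" concrete, here is a minimal sketch that counts which word follows which in a tiny made-up corpus and turns the counts into probabilities. The corpus and the function names are assumptions for illustration only. A real LLM replaces the counting with a neural network and conditions on far more than one previous word, but the question it answers is the same.

```python
from collections import Counter, defaultdict

# Tiny hypothetical corpus, for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."

def train_bigram_counts(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the raw follow-up counts for `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}

counts = train_bigram_counts(corpus)
probs = next_word_probs(counts, "the")
most_likely = max(probs, key=probs.get)

print(probs)
print("Most likely word after 'the':", most_likely)
```

Note that the model never stores the sentence "the cat sat on the mat" as a fact. It only keeps statistics about what tends to come next.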

# 2. Why LLMs Became Possible Only Recently
The ideas behind LLMs are not new, but they became practical only after three breakthroughs came together:

**a) Big Data**
*   Internet-scale text datasets
*   Code repositories, books, research papers

**b) Powerful Compute**
*   GPUs / TPUs
*   Parallel matrix operations
*   Cloud infrastructure

**c) Deep Learning Advances**
*   Transformers
*   Better optimizers
*   Efficient training techniques

Without all three, training models at this scale would not be practical.

# 3. How LLMs Work
An LLM works in three main phases:

**Pretraining**

*   Model reads massive text
*   Learns grammar, facts, reasoning patterns
*   Objective: predict next token

**Fine-tuning**

*   Adjusted for specific tasks
*   Chat, QA, summarization, coding

**Inference**

*   Model generates responses
*   Uses learned probabilities
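The sketch below ties the pretraining objective to inference using plain Python. The scores (logits) and the prompt are hypothetical values chosen for illustration; a real model produces one score for every token in its vocabulary. Softmax turns scores into probabilities, the pretraining loss is the negative log-probability of the token that actually came next, and inference picks a token from the same distribution.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for candidate next tokens after the prompt
# "The capital of France is" (assumed values, not from a real model).
logits = {" Paris": 5.0, " London": 2.0, " pizza": 0.5}
probs = softmax(logits)

# Pretraining: penalize the model when the true next token gets low probability.
loss = -math.log(probs[" Paris"])

# Inference: pick the next token from the learned probabilities.
greedy_token = max(probs, key=probs.get)  # always the top-scoring token
rng = random.Random(0)                    # seeded so the run is repeatable
sampled_token = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

print(probs)
print("loss:", round(loss, 4))
print("greedy:", greedy_token, "| sampled:", sampled_token)
```

Greedy selection always returns the same token, while sampling occasionally picks a less likely one, which is why generated text can vary between runs.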

# 4. Tokens vs Words (Very Important Concept)
LLMs do not read words — they read tokens.

**Example** (illustrative; the exact split depends on the tokenizer):

**"Artificial Intelligence" → ["Artificial", "Intelli", "gence"]**

**Why tokens?**

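*   Efficient vocabulary: a fixed set of word pieces can cover rare and unseen words
*   Handles multiple languages with one shared vocabulary
*   Keeps the vocabulary, and the memory needed for it, manageable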

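This is why token limits matter: a model's context window is measured in tokens, not words, so the token count determines how much text fits in one request.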
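As a rough illustration of subword splitting, the sketch below uses a greedy longest-match rule over a small hand-written vocabulary. Both the vocabulary and the matching rule are assumptions chosen to reproduce the example above. Real tokenizers, such as those based on byte-pair encoding, learn their vocabulary from data and will split text differently.

```python
# Hypothetical subword vocabulary, chosen only to reproduce the example above.
VOCAB = {"Artificial", "Intelli", "gence", "Art", "ific", "ial"}

def tokenize_word(word, vocab):
    """Greedy longest-match split of one word into known pieces."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[i])
            i += 1
    return pieces

def tokenize(text, vocab):
    """Split on whitespace, then break each word into subword tokens."""
    tokens = []
    for word in text.split():
        tokens.extend(tokenize_word(word, vocab))
    return tokens

text = "Artificial Intelligence"
tokens = tokenize(text, VOCAB)
print(tokens)
print(len(text.split()), "words ->", len(tokens), "tokens")
```

The word count and the token count differ, which is the practical reason to budget in tokens rather than words.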

# 5. Parameters: The “Brain Size” of LLMs
Parameters are learnable weights inside the network.

**Examples:**

*   GPT-2 → 1.5B parameters
*   GPT-3 → 175B parameters
*   LLaMA-2 → 7B–70B parameters

**More parameters (all else being equal) →**

*   Better reasoning
*   Better generalization
*   Higher compute cost
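One way to see the cost side is to estimate how much memory the weights alone occupy. The sketch below uses the parameter counts listed above and assumes 4 bytes per parameter for 32-bit floats and 2 bytes for 16-bit floats. It ignores activations, optimizer state, and other overhead, so treat the numbers as a lower bound.

```python
# Parameter counts taken from the examples above.
PARAM_COUNTS = {
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
    "LLaMA-2 (7B)": 7e9,
    "LLaMA-2 (70B)": 70e9,
}

# Storage size of one parameter in common floating-point formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

def weight_memory_gb(num_params, dtype):
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for name, n in PARAM_COUNTS.items():
    print(
        f"{name}: {weight_memory_gb(n, 'fp32'):.0f} GB in fp32, "
        f"{weight_memory_gb(n, 'fp16'):.0f} GB in fp16"
    )
```

Under these assumptions, GPT-3's weights need about 350 GB in 16-bit precision, far more than a single consumer GPU holds.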

# 6. What Can LLMs Do?

**LLMs can perform multiple tasks without retraining:**

*   Text generation
*   Question answering
*   Summarization
*   Translation
*   Code generation
*   Reasoning
*   Conversational AI

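Handling new tasks through prompts alone is known as zero-shot or few-shot (in-context) learning. Abilities that appear only once models reach a large enough scale are often called emergent behavior.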
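Switching tasks without retraining comes down to changing the prompt text. The helper below is a hypothetical sketch (the `build_prompt` name and the `Input:`/`Output:` layout are assumptions, not a standard format) that assembles a prompt with or without worked examples. The resulting string is what would be sent to a model.

```python
def build_prompt(instruction, examples, query):
    """Assemble a prompt: an instruction, optional worked examples, then the query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

# Few-shot: two worked examples show the model the expected pattern.
translation_prompt = build_prompt(
    "Translate English to French.",
    [("cat", "chat"), ("dog", "chien")],
    "bird",
)

# Zero-shot: same model, different task, no examples at all.
summary_prompt = build_prompt(
    "Summarize the text in one sentence.",
    [],
    "LLMs are deep learning models trained on massive amounts of text.",
)

print(translation_prompt)
print("-" * 40)
print(summary_prompt)
```

The same pretrained weights serve both prompts; only the text in front of the model changes.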

# 7. Popular LLM Families


\begin{array}{|l|l|l|}
\hline
\textbf{Model Name} & \textbf{Organization} & \textbf{Primary Focus} \\
\hline
GPT (Generative Pre-trained Transformer) & OpenAI & General-purpose language generation \\
\hline
BERT (Bidirectional Encoder Representations from Transformers) & Google & Language understanding tasks \\
\hline
LLaMA & Meta (Facebook) & Open-weight research models \\
\hline
Claude & Anthropic & Safety-aligned conversational AI \\
\hline
Gemini & Google & Multimodal language understanding \\
\hline
\end{array}




# 8. LLMs vs Traditional ML Models


\begin{array}{|l|l|l|}
\hline
\textbf{Aspect} & \textbf{Traditional Machine Learning} & \textbf{Large Language Models} \\
\hline
Scope & Task-specific models & Multi-task, general-purpose models \\
\hline
Training Data Size & Small to medium datasets & Internet-scale text data \\
\hline
Feature Engineering & Manual feature extraction required & Automatic feature learning \\
\hline
Context Understanding & Limited context window & Long-context understanding \\
\hline
Adaptability & Requires retraining for new tasks & Performs new tasks via prompting \\
\hline
Model Complexity & Low to moderate complexity & Extremely high (billions of parameters) \\
\hline
Inference Style & Fixed outputs (labels or numeric values) & Generative and probabilistic outputs \\
\hline
\end{array}


# 9. Simple Hands-On: Running an LLM with Hugging Face Transformers


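```python
# Example using Hugging Face transformers (lightweight demo)

from transformers import pipeline

# Load a text-generation pipeline (downloads the GPT-2 weights on first run)
generator = pipeline("text-generation", model="gpt2")

# Generate a continuation of the prompt
output = generator(
    "Large Language Models are powerful because",
    max_new_tokens=256,  # cap on newly generated tokens (the pipeline default, made explicit)
)

# The pipeline returns a list of dicts, one per generated sequence
print(output[0]["generated_text"])
```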




**Sample output** (generation is sampled, so the text differs on every run):

Large Language Models are powerful because they allow you to write and type in plain text as if it were a language.

In this article, I'll describe a set of languages in C++. In the next section, I'll show you how to write a language that is compatible with C++11.

Creating a language

The most common language creation task is to create a language model. If you are an experienced programmer, you know how to create a language model.

The syntax for a language model is quite simple. It's written in C++ as a base class, and you can easily extend it. If you want to implement a language that is compatible with C++11, you will need to add new features and interfaces.

A language model is just a collection of objects that can be used to represent a language or a type. A language model is an object that is used as a key to represent a language.

The model is a collection of functions, which represent the semantics of a language. The function names are stored in the model. The function argument

**What this shows:**

*   Model predicts next tokens
*   No training required
*   Uses pretrained weights

# 10. Limitations of LLMs (Reality Check)

**LLMs can:**

*   Hallucinate incorrect facts
*   Be biased
*   Struggle with math
*   Lack true understanding
*   Give outdated answers, because their knowledge stops at the training cutoff

**This is why** evaluation, retrieval-augmented generation (RAG), and safety measures are critical.
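RAG addresses missing or outdated knowledge by fetching relevant text and placing it in the prompt, so the model answers from supplied context instead of memory. The sketch below is a deliberately simple stand-in: the document list, the word-overlap scoring, and the prompt wording are all assumptions for illustration. Production systems typically retrieve with embedding-based search over a document store.

```python
import re

# Hypothetical mini knowledge base, reusing figures from the parameters section.
DOCUMENTS = [
    "GPT-2 has 1.5B parameters.",
    "GPT-3 has 175B parameters.",
    "LLaMA-2 models range from 7B to 70B parameters.",
]

def words(text):
    """Lowercase the text and return its set of word-like pieces."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def retrieve(query, documents, k=1):
    """Rank documents by how many words they share with the query."""
    query_words = words(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & words(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_rag_prompt(query, documents):
    """Put the retrieved text in front of the question."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"

question = "How many parameters does GPT-3 have?"
prompt = build_rag_prompt(question, DOCUMENTS)
print(prompt)
```

The prompt, not the model's weights, now carries the fact being asked about, which makes the answer easier to check and to keep current.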

# 11. Where LLMs Are Used in Real Life

*   Chatbots & assistants
*   Code completion
*   Legal document analysis
*   Healthcare summaries
*   Education tools
*   Research assistants
*   Business automation