
# 1. Understanding Key Models in AI  

## 1. Autoregressive Algorithms (Used in LLMs)  
**Key Concept**: Predict the next token in a sequence based on prior tokens.  

### How It Works:  
- Models the joint probability of a sequence as a product of conditional probabilities (the chain rule):  
  \[
  P(x_1, x_2, \ldots, x_n) = P(x_1) \cdot P(x_2 \mid x_1) \cdot P(x_3 \mid x_1, x_2) \cdots P(x_n \mid x_1, \ldots, x_{n-1})
  \]  
- Generates one token at a time, conditioning on previously generated tokens.  
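
To make the factorization concrete, here is a minimal sketch that scores a short sequence by multiplying conditional probabilities. The table `COND_PROBS` and the function `sequence_probability` are hypothetical, and the numbers are invented for illustration. A real LLM computes each conditional distribution with a neural network instead of a lookup table.

```python
# Hypothetical toy model: maps a prefix (tuple of tokens) to a distribution
# over the next token. The numbers are invented for illustration only.
COND_PROBS = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.5, "dog": 0.5},
    ("the", "cat"): {"sat": 0.7, "ran": 0.3},
}


def sequence_probability(tokens):
    """Chain rule: P(x_1..x_n) is the product over t of P(x_t | x_1..x_{t-1})."""
    prob = 1.0
    for t, token in enumerate(tokens):
        prefix = tuple(tokens[:t])  # everything generated before position t
        prob *= COND_PROBS[prefix][token]
    return prob


# P(the) * P(cat | the) * P(sat | the, cat)
print(sequence_probability(["the", "cat", "sat"]))
```

Each factor depends on the full prefix, which is why generation has to proceed one token at a time.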

### Examples:  
- GPT, GPT-2, GPT-3, GPT-4.  

### Applications:  
- Language generation, machine translation, text summarization.  

### Strengths:  
- Excellent at maintaining coherence in sequential data.  

### Limitations:  
- Errors accumulate over long sequences (error compounding).  

---

## 2. Variational Autoencoders (VAEs, Used for Image Generation)  
**Key Concept**: Encode data into a latent space and reconstruct it while adding variability.  

### How It Works:  
- **Components**:  
  1. **Encoder**: Compresses input data into a latent representation.  
  2. **Decoder**: Reconstructs data from the latent representation.  
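  3. **Loss Function**: Combines a reconstruction loss with a KL-divergence term that keeps the latent distribution close to a prior, which smooths the latent space.  
- The encoder outputs a distribution rather than a single point, and the latent code is sampled from it during training. This injected noise is what lets the decoder generate new data from random latent samples.  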
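
The loss and the noise injection can be sketched in a few lines of plain Python. This is a sketch under assumptions that the text above does not state: a diagonal Gaussian encoder output (`mu`, `log_var`), a standard normal prior, and squared error as the reconstruction loss. The function names are made up for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def vae_loss(x, x_reconstructed, mu, log_var):
    """Reconstruction error (squared error here) plus the KL regularizer."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + kl_to_standard_normal(mu, log_var)


# Toy usage with a 2-dimensional latent code and a fixed seed.
z = reparameterize([1.0, 0.0], [0.0, 0.0], random.Random(0))
print(len(z), vae_loss([1.0, 2.0], [1.0, 2.5], [1.0, 0.0], [0.0, 0.0]))
```

The KL term is zero only when the encoder output matches the prior exactly, so it pulls every encoded input toward the same region of the latent space.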

### Applications:  
- Image generation, anomaly detection, data compression.  

### Strengths:  
- Smooth latent space allows controlled generation.  

### Limitations:  
- Tends to produce blurrier images than diffusion models.  

---

## 3. Diffusion Models (Used for High-Quality Image Generation)  
**Key Concept**: Learn to reverse a noise process to generate data step by step.  

### How It Works:  
1. **Forward Process**: Gradually add noise to an image until it becomes unrecognizable.  
2. **Reverse Process**: Train a model to remove the noise step by step, recovering an image from pure noise.  
- Typically optimized using denoising score matching.  
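
The forward process is simple enough to sketch without a neural network. The snippet below assumes a DDPM-style formulation with a linear noise schedule. The schedule endpoints (`1e-4` and `0.02`) and all function names are assumptions chosen for illustration, not something the text above specifies.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (needs num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for all steps s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng=random):
    """Sample x_t from x_0: sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
# The signal weight starts near 1 and decays toward 0 as t grows.
print(math.sqrt(alpha_bars[0]), math.sqrt(alpha_bars[-1]))
print(add_noise([1.0, -1.0], 999, alpha_bars, random.Random(0)))
```

The reverse process trains a model to undo these steps. That part needs a trained network and is not shown here.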

### Examples:  
- DALL·E 2, Stable Diffusion.  

### Applications:  
- Image generation, inpainting, super-resolution.  

### Strengths:  
- Generates highly detailed and realistic images.  

### Limitations:  
- Computationally intensive due to iterative denoising.  


# 2. Generating one token at a time

## 2.1 Load a tokenizer and a model

First we load a tokenizer and a model from Hugging Face's `transformers` library.

A tokenizer splits a string into tokens and maps each token to an integer ID that the model can understand.

In [5]:
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Udacity is the best place to learn about generative"
inputs = tokenizer(text, return_tensors="pt")

inputs["input_ids"]

tensor([[  52,   67, 4355,  318,  262, 1266, 1295,  284, 2193,  546, 1152,  876]])

## 2.2 Examine the tokenization

In [7]:
import pandas as pd

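def show_tokenization(inputs):
  # Pair each token ID with the text it decodes to, one row per token.
  # `token_id` avoids shadowing the built-in `id`.
  return pd.DataFrame([(token_id, tokenizer.decode(token_id)) for token_id in inputs['input_ids'][0]],
                      columns=["id", "token"])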

show_tokenization(inputs)

|   | id | token |
|---|---|---|
| 0 | tensor(52) | U |
| 1 | tensor(67) | d |
| 2 | tensor(4355) | acity |
| 3 | tensor(318) | is |
| 4 | tensor(262) | the |
| 5 | tensor(1266) | best |
| 6 | tensor(1295) | place |
| 7 | tensor(284) | to |
| 8 | tensor(2193) | learn |
| 9 | tensor(546) | about |


### Subword tokenization

The interesting thing is that tokens in this case are neither just letters nor just words. Sometimes
a short word is represented by a single token, but other times a single token represents part of
a word, or even a single letter. This is called subword tokenization.
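
To get a feel for how a word breaks into pieces, here is a toy sketch that splits a word by greedy longest match against a small vocabulary. This is a deliberately simpler technique than the one GPT-2 uses (byte-level BPE with learned merge rules). `TOY_VOCAB` and `greedy_subword_tokenize` are hypothetical names invented for illustration.

```python
def greedy_subword_tokenize(word, vocab):
    """Split `word` into the longest vocabulary pieces, scanning left to right."""
    pieces, start = [], 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


# Hypothetical vocabulary, chosen so that "Udacity" splits as in the table above.
TOY_VOCAB = {"U", "d", "acity", "learn", "best"}

print(greedy_subword_tokenize("Udacity", TOY_VOCAB))  # ['U', 'd', 'acity']
print(greedy_subword_tokenize("learn", TOY_VOCAB))    # ['learn']
```

A common word that is in the vocabulary stays whole, while a rarer word is assembled from smaller pieces.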

## 2.3 Calculate the probability of the next token

In [13]:
import torch

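with torch.inference_mode():
  # Keep only the logits at the last position: the scores for the token that follows the prompt.
  logits = model(**inputs).logits[:,-1,:]
  # Softmax turns the scores into a probability distribution over the vocabulary.
  probabilities = torch.nn.functional.softmax(logits[0],dim=-1)


def show_next_token_choices(probabilities, top_n=5):
  # One row per vocabulary entry with nonzero probability, sorted so the most likely come first.
  rows = [(token_id, tokenizer.decode(token_id), p.item())
          for token_id, p in enumerate(probabilities) if p.item()]
  return pd.DataFrame(rows, columns=['id','token','probability']).sort_values("probability",ascending=False)[:top_n]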


show_next_token_choices(probabilities)

|   | id | token | probability |
|---|---|---|---|
| 8300 | 8300 | programming | 0.157599 |
| 4673 | 4673 | learning | 0.148413 |
| 4981 | 4981 | models | 0.048504 |
| 17219 | 17219 | biology | 0.046482 |
| 16113 | 16113 | algorithms | 0.027794 |
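
The cell above relies on `softmax` to turn the model's raw scores (logits) into probabilities. As a minimal sketch of what that step computes, here is a plain-Python version applied to an invented three-token vocabulary. The helper names `softmax` and `top_n` and the toy logits are assumptions for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_n(probabilities, n=2):
    """Return (index, probability) pairs for the n most likely entries."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


toy_logits = [2.0, 1.0, 0.1]  # invented scores for a 3-token vocabulary
probs = softmax(toy_logits)
print(top_n(probs))
```

Larger logits get larger probabilities, and the ranking of tokens is unchanged by the transformation.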


In [20]:
next_token_id = torch.argmax(probabilities).item()

print(f"Next token id: {next_token_id}")
print(f"Next token: {tokenizer.decode(next_token_id)}")

Next token id: 8300
Next token:  programming


In [21]:
text = text + tokenizer.decode(next_token_id)
text

'Udacity is the best place to learn about generative programming'
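
Repeating these steps (get the probabilities, pick the most likely token, append it) is all it takes to generate text one token at a time. The sketch below shows that loop in isolation. `greedy_generate` is a hypothetical helper, and `toy_model` is an invented stand-in for a language model so that the example runs without downloading anything.

```python
def greedy_generate(next_token_probs, prompt_ids, max_new_tokens, eos_id=None):
    """Repeatedly append the most likely next token to the sequence."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = next_token_probs(ids)  # distribution over the vocabulary
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        ids.append(next_id)
        if next_id == eos_id:  # stop early at the end-of-sequence token
            break
    return ids


def toy_model(ids):
    """Hypothetical 4-token "model": favors the token after the last one.

    Token 3 plays the role of the end-of-sequence token.
    """
    probs = [0.1, 0.1, 0.1, 0.1]
    probs[min(ids[-1] + 1, 3)] = 0.7
    return probs


print(greedy_generate(toy_model, [0], max_new_tokens=10, eos_id=3))  # [0, 1, 2, 3]
```

With the notebook's GPT-2 model, `next_token_probs` would wrap the `model(...)` and `softmax` calls from this section, and the resulting IDs would be decoded with `tokenizer.decode`.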

## 2.4 Or we can simply use `.generate`

In [22]:
output = model.generate(**inputs, max_length=100, pad_token_id=tokenizer.eos_token_id)

tokenizer.decode(output[0])

'Udacity is the best place to learn about generative programming.\n\nThe first thing you need to know is that generative programming is a very powerful programming language. It is a very powerful programming language that can be used to create complex programs. It is also very powerful for debugging.\n\nThe second thing you need to know is that generative programming is very powerful for debugging. It is very powerful for debugging because it is very powerful for debugging.\n\nThe third thing you need'