# Introduction to Generative AI

## What is Generative AI?

Generative AI refers to artificial intelligence systems that can create new content, data, or outputs that resemble human-created work. Unlike traditional AI systems, which primarily analyze and classify existing data, generative models learn the patterns in their training data and use them to produce new content.

## Discriminative vs Generative Models

| Aspect | Discriminative Models | Generative Models |
|--------|-----------------------|-------------------|
| **Purpose** | Classify or predict labels for given inputs | Generate new data samples similar to training data |
| **Question** | "What is this?" | "What could this be?" |
| **Examples** | Image classifiers, spam detectors, sentiment analysis | GPT models, DALL-E, Stable Diffusion |
| **Output** | Categories, labels, or predictions | New text, images, code, audio, or other content |
| **Learning** | Learns decision boundaries between classes | Learns underlying data distribution and patterns |
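
The contrast in the table can be made concrete with a toy example. The sketch below is only an illustration: the height data is made up, the generative side assumes each class follows a Gaussian, and the discriminative side is reduced to a single threshold at the midpoint of the class means.

```python
import random
import statistics

# Toy 1-D data: hypothetical heights (cm) for two classes.
data = {
    "child": [110.0, 120.0, 115.0, 125.0, 118.0],
    "adult": [165.0, 175.0, 170.0, 180.0, 172.0],
}

# Generative view: model how each class's data is distributed (here, a Gaussian).
params = {label: (statistics.mean(xs), statistics.stdev(xs)) for label, xs in data.items()}

def sample(label, rng):
    """Generate a new, synthetic data point for a class."""
    mean, std = params[label]
    return rng.gauss(mean, std)

# Discriminative view: only a boundary between the classes is kept.
# Here the boundary is simply the midpoint between the two class means.
threshold = (params["child"][0] + params["adult"][0]) / 2

def classify(x):
    """Predict a label for an existing data point."""
    return "adult" if x > threshold else "child"

rng = random.Random(0)
print("new sample:", round(sample("adult", rng), 1))
print("label for 130 cm:", classify(130.0))
```

The generative model can produce data that was never in the training set, while the classifier can only assign a label to data it is given.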

## Real-World Applications

### 🤖 Chatbots and Conversational AI
- **Customer service automation**
- **Virtual assistants** (ChatGPT, Claude, Gemini, formerly Bard)
- **Educational tutoring systems**
- **Mental health support bots**

### 🎨 Image and Visual Content Generation
- **Art creation** (DALL-E, Midjourney, Stable Diffusion)
- **Photo editing and enhancement**
- **Product design and prototyping**
- **Marketing content creation**

### 💻 Code Assistants and Development Tools
- **Code completion** (GitHub Copilot, CodeT5)
- **Bug detection and fixing**
- **Documentation generation**
- **Code translation between programming languages**
- **Architecture and design suggestions**

### 📝 Additional Applications
- **Content writing** (blogs, articles, creative writing)
- **Music and audio generation**
- **Video creation and editing**
- **3D model generation**
- **Scientific research and drug discovery**

## Generative Models Overview

### Generative Adversarial Networks (GANs)

**GANs** consist of two neural networks competing against each other:

- **Generator**: Creates fake data samples
- **Discriminator**: Tries to distinguish real from fake data

The generator learns to create increasingly realistic data by trying to fool the discriminator, while the discriminator becomes better at detecting fakes. This adversarial training process results in high-quality generated content.
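
The two competing objectives can be sketched as loss functions. This is a minimal illustration, assuming the common binary cross-entropy formulation with the non-saturating generator loss; the discriminator outputs below are made-up probabilities, not results from a trained model.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(real) toward 1 and D(fake) toward 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: push D(fake) toward 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real).
real = [0.90, 0.95]
early_fake = [0.05, 0.10]   # early training: fakes are easy to spot
late_fake = [0.45, 0.55]    # later: the discriminator is close to guessing

print("generator loss, early:", generator_loss(early_fake))
print("generator loss, late: ", generator_loss(late_fake))
print("discriminator loss, early:", discriminator_loss(real, early_fake))
print("discriminator loss, late: ", discriminator_loss(real, late_fake))
```

As the fakes become more convincing, the generator's loss falls and the discriminator's loss rises, which is the tug-of-war that drives training.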

**Key Characteristics:**
- Excellent for generating realistic images
- Fast inference once trained
- Can suffer from mode collapse and training instability
- Examples: StyleGAN, CycleGAN, Pix2Pix

### Variational Autoencoders (VAEs)

**VAEs** learn to encode data into a compressed latent space and then decode it back to the original format.

**Architecture:**
- **Encoder**: Maps input data to a probability distribution in latent space
- **Latent Space**: Compressed representation with probabilistic properties
- **Decoder**: Reconstructs data from latent representations
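
Two pieces of the architecture above are easy to show in isolation: sampling from the encoder's distribution and the penalty that keeps that distribution close to a standard normal. The sketch assumes a diagonal Gaussian encoder that outputs a mean and a log-variance per latent dimension; the example values are hypothetical.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv) for m, lv in zip(mu, log_var))

# Hypothetical encoder output for one input: a 2-D Gaussian in latent space.
mu, log_var = [0.5, -1.0], [0.0, -0.5]
z = reparameterize(mu, log_var, random.Random(0))
print("latent sample:", z)
print("KL penalty:", kl_to_standard_normal(mu, log_var))
```

In a full VAE, the training loss adds this KL term to a reconstruction error computed by the decoder.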

**Key Characteristics:**
- Provides smooth interpolation between data points
- More stable training than GANs
- Outputs are often blurrier and less detailed than those of GANs
- Useful for data compression and anomaly detection

### Diffusion Models (Stable Diffusion)

**Diffusion models** learn to reverse a gradual noising process by progressively denoising data.

**Process:**
1. **Forward Process**: Gradually add noise to training data
2. **Reverse Process**: Learn to remove noise step by step
3. **Generation**: Start with pure noise and iteratively denoise
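
The forward process in step 1 has a convenient closed form, sketched below. The number of steps and the linear noise schedule are assumptions borrowed from common practice, not values given in this document, and the data is a toy 3-element vector rather than an image.

```python
import math
import random

T = 1000  # number of noising steps (assumed)
# Linear schedule for the per-step noise variance beta (assumed).
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar[t] is the running product of (1 - beta): how much signal survives at step t.
alpha_bar = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)

def add_noise(x0, t, rng):
    """Forward process in closed form: jump from clean x0 straight to step t."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
print("step 10:  ", add_noise(x0, 10, rng))      # still close to x0
print("last step:", add_noise(x0, T - 1, rng))   # almost pure noise
```

Training teaches a network to predict the added noise at each step, and generation runs that prediction in reverse starting from random noise.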

**Key Characteristics:**
- State-of-the-art quality for image generation
- Highly controllable with text prompts
- Slower generation compared to GANs
- Examples: DALL-E 2, Stable Diffusion, Midjourney

### Comparison Summary

| Model Type | Quality | Speed | Training Stability | Control |
|------------|---------|-------|-------------------|---------|
| **GANs** | High | Fast | Moderate | Limited |
| **VAEs** | Moderate | Fast | High | Good |
| **Diffusion** | Very High | Slow | High | Excellent |

## Large Language Models (LLMs)

### Architecture Basics: Transformers

**Transformers** are the foundational architecture behind modern LLMs, introduced in the paper "Attention Is All You Need" (2017).

**Key Components:**
- **Self-Attention Mechanism**: Lets each token weigh the importance of every other token in the sequence
- **Multi-Head Attention**: Several attention mechanisms running in parallel, each free to focus on different relationships
- **Positional Encoding**: Provides information about token order, since self-attention on its own is order-agnostic
- **Feed-Forward Networks**: Process the attention outputs
- **Layer Normalization**: Stabilizes training
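
Self-attention itself is a short computation. The sketch below implements scaled dot-product attention for a single head in plain Python. As a simplification, the queries, keys, and values are all set to the raw token embeddings; a real transformer first passes the embeddings through learned projection matrices.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention for one head and one sequence."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token against every token, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three tokens with 2-D embeddings; Q, K, and V are set equal for simplicity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence. Multi-head attention repeats this computation with different projections and concatenates the results.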

**Architecture Types:**
- **Encoder-Only**: Processes input sequences (BERT)
- **Decoder-Only**: Generates sequences autoregressively (GPT)
- **Encoder-Decoder**: Combines both for sequence-to-sequence tasks such as translation and summarization (T5)

### Pre-trained Models

#### GPT (Generative Pre-trained Transformer)
- **Architecture**: Decoder-only transformer
- **Training**: Autoregressive language modeling (predict the next token)
- **Versions**: GPT-1 (117M params) → GPT-4 (size not disclosed by OpenAI; unofficial estimates are around 1.76T params)
- **Strengths**: Text generation, conversation, code completion

#### LLaMA (Large Language Model Meta AI)
- **Architecture**: Decoder-only transformer with optimizations
- **Training**: Efficient training on diverse datasets
- **Versions**: LLaMA 1 (7B-65B params), LLaMA 2 (7B-70B params)
- **Strengths**: Openly available weights, efficient inference

#### BERT (Bidirectional Encoder Representations from Transformers)
- **Architecture**: Encoder-only transformer
- **Training**: Masked language modeling + next sentence prediction
- **Versions**: BERT-Base (110M), BERT-Large (340M)
- **Strengths**: Understanding context, classification tasks

### Use Cases in Natural Language Processing

#### 📝 Text Generation and Completion
- **Creative writing** and storytelling
- **Email drafting** and professional communication
- **Code generation** and programming assistance
- **Content creation** for marketing and social media

#### 🔍 Text Analysis and Understanding
- **Sentiment analysis** for customer feedback
- **Named entity recognition** (people, places, organizations)
- **Text classification** and categorization
- **Question answering** systems

#### 🌐 Language Translation and Processing
- **Machine translation** between languages
- **Text summarization** for long documents
- **Paraphrasing** and style transfer
- **Grammar correction** and language learning

#### 🤖 Conversational AI Applications
- **Chatbots** for customer service
- **Virtual assistants** for task automation
- **Educational tutoring** and personalized learning
- **Research assistance** and information retrieval

#### 💼 Business and Enterprise Applications
- **Document processing** and information extraction
- **Contract analysis** and legal document review
- **Meeting transcription** and summary generation
- **Knowledge management** and search enhancement

## Prompt Engineering

### What is Prompt Engineering?

**Prompt engineering** is the practice of designing and refining input prompts so that a large language model produces the response you need. Small changes in wording, context, or structure can noticeably change the quality of the output.

**Key Principles:**
- **Clarity**: Be specific and unambiguous in your instructions
- **Context**: Provide relevant background information
- **Structure**: Use consistent formatting and organization
- **Iteration**: Refine prompts based on model responses

### Writing Effective Prompts

#### 🎯 Components of a Good Prompt

1. **Task Description**: Clearly state what you want the model to do
2. **Context**: Provide necessary background information
3. **Instructions**: Specify format, tone, and constraints
4. **Examples**: Show desired input-output patterns (when applicable)
5. **Output Format**: Define how you want the response structured
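
The five components can be assembled programmatically. The helper below is a hypothetical convenience function, not part of any library; the section labels and the sample review text are made up for illustration.

```python
def build_prompt(task, context="", instructions="", examples=(), output_format=""):
    """Assemble a prompt from the five components, skipping any left empty."""
    sections = [("Task", task), ("Context", context), ("Instructions", instructions)]
    if examples:
        shots = "\n".join(f"Input: {inp}\nOutput: {out}" for inp, out in examples)
        sections.append(("Examples", shots))
    sections.append(("Output format", output_format))
    return "\n\n".join(f"{name}:\n{body}" for name, body in sections if body)

prompt = build_prompt(
    task="Summarize the customer review below in one sentence.",
    context="The review is for a pair of wireless headphones.",
    instructions="Use a neutral tone and do not exceed 25 words.",
    examples=[("Great battery, weak bass.", "Long battery life but underwhelming bass.")],
    output_format="A single plain-text sentence.",
)
print(prompt)
```

Keeping the components as separate arguments makes it easier to change one of them at a time while iterating on a prompt.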

#### ✅ Best Practices

- **Be specific**: Instead of "Write about dogs," use "Write a 200-word informative article about dog training techniques for puppies"
- **Use clear language**: Avoid ambiguous terms and jargon
- **Set constraints**: Specify length, style, audience, and format requirements
- **Provide context**: Give the model relevant background information
- **Use positive framing**: Say what you want, not what you don't want

### Prompting Techniques

#### 🚀 Zero-Shot Prompting

**Definition**: Asking the model to perform a task without providing any examples.

**Example:**
```
Classify the following email as spam or not spam:
"Congratulations! You've won $1,000,000! Click here to claim your prize now!"
```

**When to use:**
- Simple, well-defined tasks
- When you don't have examples available
- For general knowledge questions

#### 📚 Few-Shot Prompting

**Definition**: Providing a few examples of input-output pairs before asking the model to perform the task.

**Example:**
```
Classify these emails as spam or not spam:

Email: "Meeting scheduled for tomorrow at 2 PM in conference room A"
Classification: Not spam

Email: "URGENT: Verify your account immediately or it will be deleted!"
Classification: Spam

Email: "Your order #12345 has been shipped and will arrive in 2-3 business days"
Classification: Not spam

Email: "Make money fast! Work from home and earn $5000/week!"
Classification: ?
```

**When to use:**
- Complex or nuanced tasks
- When you have good examples available
- For tasks requiring specific formatting
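
Few-shot prompts are usually built from a list of labeled examples rather than typed by hand. The sketch below rebuilds the spam prompt shown above; `few_shot_prompt` is a hypothetical helper name, and the only intentional difference is that the final label is left blank instead of using a `?`.

```python
def few_shot_prompt(instruction, examples, query):
    """Format labeled examples followed by the unlabeled query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f'Email: "{text}"', f"Classification: {label}", ""]
    # Leave the final label blank so the model completes it.
    lines += [f'Email: "{query}"', "Classification:"]
    return "\n".join(lines)

examples = [
    ("Meeting scheduled for tomorrow at 2 PM in conference room A", "Not spam"),
    ("URGENT: Verify your account immediately or it will be deleted!", "Spam"),
    ("Your order #12345 has been shipped and will arrive in 2-3 business days", "Not spam"),
]
prompt = few_shot_prompt(
    "Classify these emails as spam or not spam:",
    examples,
    "Make money fast! Work from home and earn $5000/week!",
)
print(prompt)
```

Passing an empty `examples` list yields the zero-shot version of the same prompt, which makes it easy to compare the two techniques on the same input.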

#### 🧠 Chain-of-Thought (CoT) Prompting

**Definition**: Encouraging the model to break down complex problems into step-by-step reasoning.

**Example:**
```
Solve this math word problem step by step:

Sarah has 15 apples. She gives 1/3 of them to her brother and 2/5 of the remaining apples to her sister. How many apples does Sarah have left?

Let me think through this step by step:
1. Sarah starts with 15 apples
2. She gives 1/3 to her brother: 15 × 1/3 = 5 apples
3. Remaining after giving to brother: 15 - 5 = 10 apples
4. She gives 2/5 of remaining to sister: 10 × 2/5 = 4 apples
5. Final amount: 10 - 4 = 6 apples

Therefore, Sarah has 6 apples left.
```

**When to use:**
- Mathematical problems
- Complex reasoning tasks
- Multi-step processes
- When you need to understand the model's reasoning
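
A practical benefit of step-by-step output is that each intermediate value can be checked independently. As a small illustration, the arithmetic in the apple example above can be verified with exact fractions:

```python
from fractions import Fraction

apples = Fraction(15)
to_brother = apples * Fraction(1, 3)      # step 2: 1/3 of 15
remaining = apples - to_brother           # step 3: what is left after the brother
to_sister = remaining * Fraction(2, 5)    # step 4: 2/5 of the remainder
left = remaining - to_sister              # step 5: final amount

print(to_brother, remaining, to_sister, left)
```

The printed values match steps 2 through 5 of the worked reasoning, confirming the final answer of 6.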

### Hands-on with Open-Source LLM APIs

#### 🤗 Hugging Face Transformers

**Popular Models Available:**
- **Text Generation**: GPT-2, GPT-Neo, BLOOM, LLaMA
- **Text Classification**: BERT, RoBERTa, DistilBERT
- **Question Answering**: BERT and RoBERTa variants fine-tuned on question-answering datasets such as SQuAD
- **Translation**: MarianMT, T5

**Basic Usage Pattern:**
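```python
from transformers import pipeline

# Initialize a pipeline for a specific task (downloads the model on first use)
generator = pipeline('text-generation', model='gpt2')

# Run the pipeline on a prompt; max_length caps prompt plus generated tokens
result = generator("Your prompt here", max_length=100)

# The pipeline returns a list of dicts, one per generated sequence
print(result[0]['generated_text'])
```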

#### 🔧 API Integration Best Practices

- **Rate Limiting**: Respect API rate limits and implement backoff strategies
- **Error Handling**: Account for timeouts, server errors, and quota limits
- **Cost Management**: Monitor usage and optimize prompt efficiency
- **Security**: Keep API keys secure and never commit them to version control
- **Version Control**: Track which model versions you're using for reproducibility
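
The rate-limiting and error-handling points are often combined into a retry wrapper with exponential backoff. The sketch below is one possible shape, not a library API: `call_with_backoff` and `flaky_request` are hypothetical names, and catching every `Exception` is a simplification. Production code would normally retry only on the specific rate-limit and timeout errors raised by the client library in use.

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

# Demo with a hypothetical flaky function that fails twice, then succeeds.
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

waits = []
# Record the waits instead of actually sleeping so the demo runs instantly.
result = call_with_backoff(flaky_request, sleep=waits.append)
print(result, [round(w, 1) for w in waits])
```

The delay doubles after each failure, and the small random jitter keeps many clients from retrying at exactly the same moment.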

#### 🎛️ Parameter Tuning

**Key Parameters to Experiment With:**
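- **Temperature**: Controls randomness. Values near 0 make the output close to deterministic (the most likely token is almost always chosen), 1.0 samples from the model's unmodified distribution, and values above 1.0 increase randomness
- **Max Tokens**: Caps the number of tokens generated in the response
- **Top-p**: Nucleus sampling; samples only from the smallest set of tokens whose cumulative probability reaches p
- **Top-k**: Restricts sampling to the k most likely tokens
- **Frequency Penalty**: Penalizes tokens in proportion to how often they have already appeared, which reduces repetition
- **Presence Penalty**: Penalizes any token that has already appeared at all, which encourages new topics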
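
The effect of the first few parameters can be seen on a toy next-token distribution. The sketch below is a plain-Python illustration of temperature scaling, top-k, and top-p filtering. The logits are made up, the function names are hypothetical, and the temperature must be greater than 0 here because the code divides by it.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits / temperature. Lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero the rest, and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
for t in (0.5, 1.0, 2.0):
    print("temperature", t, [round(x, 2) for x in apply_temperature(logits, t)])
probs = apply_temperature(logits, 1.0)
print("top-k=2:  ", [round(x, 2) for x in top_k_filter(probs, 2)])
print("top-p=0.9:", [round(x, 2) for x in top_p_filter(probs, 0.9)])
```

Lower temperatures concentrate probability on the top token, while top-k and top-p remove the unlikely tail before sampling.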

```python
from openai import OpenAI

client = OpenAI(
)
```

```python
response = client.responses.create(
    model="openai/gpt-oss-20b",
    input="Explain the theory of relativity in simple terms."
)

# print(response)
message = response.output_text
print(message)
```

**Output (truncated):**
```
**The Theory of Relativity – in plain words**

Einstein’s relativity isn’t one single idea; it’s two big ideas that changed how we think about space, time and gravity. Think of them as two parts of the same story.

---

## 1. Special Relativity (1905)

**Core Idea:**  
The laws of physics are the same for every observer who is moving at a constant speed (no acceleration), and the speed of light is always the same, no matter how fast you’re moving.

### Why that matters

| Old thinking | New reality (special relativity) |
|--------------|-----------------------------------|
| Time is absolute: everyone experiences it the same. | Time is relative: it can tick faster or slower depending on how fast you’re moving. |
| Length is the same for everyone. | Length “shrinks” in the direction you’re moving, but only at speeds close to light. |
| Adding speeds like in everyday life: 50 km/h + 50 km/h = 100 km/h. | Adding speeds isn’t simple because the speed of light (≈ 300 000 km/s) is the ultima
```

```python
import openai

client = openai.OpenAI(
    base_url="https://api.groq.com/openai/v1",
)
```

```python
response = client.responses.create(
    model="openai/gpt-oss-20b",
    input="Explain the theory of relativity in simple terms."
)

# print(response)
message = response.output_text
print(message)
```

**Output (truncated):**
```
**The theory of relativity** is a way of understanding how space, time, and gravity work. It’s actually two closely related ideas that Einstein developed in the early 1900s:

---

## 1. Special Relativity (1905)

| Idea | What it means | Everyday analogy |
|------|---------------|------------------|
| **The speed of light is the same everywhere** | No matter how fast you’re moving, light always travels at the same speed (≈ 300 000 km/s). | Imagine you’re on a bicycle and a light flash appears behind you. No matter how fast you go, the flash still moves at the same speed relative to *you*. |
| **Time can slow down or speed up** | When two clocks are moving relative to each other, the moving clock ticks slower. | Think of a video that’s slowed down: the moving clock “plays” slower. |
| **Lengths can shrink** | Moving objects get shorter in the direction of motion. | A moving train looks a bit “squashed” if you’re moving beside it very quickly. |
| **Mass and energy are interchangeable** 
```