# 🐍 Notebook 02: Python Essentials for Gen AI

**Week 1-2: Python & ML Foundations**  
**Gen AI Masters Program**

---

## 📋 Objectives

By the end of this notebook, you will master:
1. ✅ Python data structures (lists, dicts, sets, tuples)
2. ✅ Control flow (if/else, loops, comprehensions)
3. ✅ Functions and lambda expressions
4. ✅ Object-oriented programming basics
5. ✅ File I/O and exception handling
6. ✅ Python best practices for ML

**Estimated Time:** 2-3 hours

---

## 📚 Why Python for Gen AI?

Python is the de facto language for AI/ML because:
- 🚀 **Rich Ecosystem**: PyTorch, TensorFlow, Hugging Face
- 📊 **Data Science**: NumPy, Pandas, Matplotlib
- 🤝 **Easy Integration**: REST APIs, databases, cloud services
- 👥 **Community**: Massive support and resources

Let's master the essentials! 🎯

## 1️⃣ Python Data Structures

### Lists - Ordered, Mutable Collections

In [None]:
# Creating lists
models = ["GPT-4", "Claude", "Gemini", "Llama"]
scores = [0.95, 0.92, 0.90, 0.88]

print("Models:", models)
print("First model:", models[0])
print("Last model:", models[-1])

# Adding elements
models.append("Mistral")
print("\nAfter append:", models)

# Slicing
print("First 3 models:", models[:3])
print("Last 2 models:", models[-2:])

# List operations
print(f"\nTotal models: {len(models)}")
print(f"Is GPT-4 in list? {'GPT-4' in models}")
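
Because lists are mutable, assigning a list to a second name does not copy it. The sketch below illustrates the difference between an alias and a copy; `model_list`, `alias`, and `snapshot` are names chosen only for this example.

```python
model_list = ["GPT-4", "Claude", "Gemini"]

alias = model_list             # a second name for the SAME list object
snapshot = model_list.copy()   # an independent (shallow) copy

model_list.append("Llama")     # mutate the list through its original name

print("alias:   ", alias)      # also contains "Llama"
print("snapshot:", snapshot)   # still the original three models
print("Same object?", alias is model_list, snapshot is model_list)
```

Reach for `.copy()` (or slicing with `[:]`) whenever a function should not modify the caller's list.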

### Dictionaries - Key-Value Pairs

In [None]:
# Model configuration (common in ML)
model_config = {
    "name": "GPT-4",
    "parameters": "undisclosed",  # OpenAI has not published GPT-4's parameter count
    "context_length": 8192,
    "temperature": 0.7,
    "top_p": 0.9
}

print("Model Configuration:")
for key, value in model_config.items():
    print(f"  {key}: {value}")

# Accessing values
print(f"\nModel name: {model_config['name']}")
print(f"Context: {model_config.get('context_length', 'N/A')}")

# Adding new key
model_config["provider"] = "OpenAI"
print(f"\nUpdated config: {model_config}")
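
Tuples and sets round out the built-in collections listed in the objectives. A minimal sketch of both follows; the token lists and vocabularies are made-up data for illustration.

```python
# Tuples: ordered and immutable, handy for fixed-size records
model_entry = ("GPT-4", 0.95)
name, score = model_entry              # tuple unpacking

try:
    model_entry[1] = 0.99              # tuples cannot be modified in place
except TypeError as err:
    print(f"Tuples are immutable: {err}")

# Sets: unordered collections of unique items
tokens = ["the", "model", "reads", "the", "prompt"]
vocabulary = set(tokens)               # duplicates collapse to one entry
print(f"{len(tokens)} tokens, {len(vocabulary)} unique")

# Set algebra is useful for comparing vocabularies
train_vocab = {"the", "model", "reads"}
test_vocab = {"the", "prompt", "reads"}
shared = train_vocab & test_vocab      # intersection
unseen = test_vocab - train_vocab      # difference
print("Shared:", sorted(shared))
print("Unseen in training:", sorted(unseen))
```

Sets have no defined order, which is why the example sorts them before printing.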

## 2️⃣ Control Flow

### If-Else Statements

In [None]:
# Model selection based on score
def select_model(score):
    if score >= 0.95:
        return "Excellent model - Use for production"
    elif score >= 0.85:
        return "Good model - Fine-tune further"
    elif score >= 0.75:
        return "Average model - Consider alternatives"
    else:
        return "Poor model - Re-train needed"

# Test different scores
test_scores = [0.98, 0.87, 0.76, 0.65]

for score in test_scores:
    result = select_model(score)
    print(f"Score {score:.2f}: {result}")

### Loops and Comprehensions

In [None]:
# Traditional for loop
prompts = ["Summarize this text", "Translate to Spanish", "Extract entities"]

print("For loop:")
for i, prompt in enumerate(prompts, 1):
    print(f"  {i}. {prompt}")

# List comprehension (Pythonic way!)
prompt_lengths = [len(p) for p in prompts]
print(f"\nPrompt lengths: {prompt_lengths}")

# Filtering with comprehension (keeps only prompts longer than 18 characters)
long_prompts = [p for p in prompts if len(p) > 18]
print(f"Long prompts: {long_prompts}")

# Dictionary comprehension
prompt_dict = {i: p for i, p in enumerate(prompts)}
print(f"\nPrompt dictionary: {prompt_dict}")
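
The loops above are all `for` loops. A `while` loop fits when the number of iterations is not known in advance, for example when retrying until a call succeeds. This is a minimal sketch with simulated outcomes; the `outcomes` list is made-up data standing in for a flaky API.

```python
# Simulated results of successive API calls (hypothetical data)
outcomes = [False, False, True]

max_attempts = 3
attempt = 0
succeeded = False

# Loop until the call succeeds or the attempt budget is used up
while attempt < max_attempts and not succeeded:
    succeeded = outcomes[attempt]
    attempt += 1
    status = "ok" if succeeded else "failed, retrying"
    print(f"Attempt {attempt}: {status}")
```

The combined condition guarantees the loop terminates even if every attempt fails.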

## 3️⃣ Functions and Lambda Expressions

In [None]:
import string

# Type hints make code clearer (important for ML pipelines)
def preprocess_text(text: str, lowercase: bool = True, remove_punctuation: bool = False) -> str:
    """
    Preprocess text for NLP tasks.
    
    Args:
        text: Input text to preprocess
        lowercase: Whether to convert to lowercase
        remove_punctuation: Whether to remove punctuation
    
    Returns:
        Preprocessed text
    """
    if lowercase:
        text = text.lower()
    
    if remove_punctuation:
        # str.maketrans('', '', chars) builds a table that deletes every character in chars
        text = text.translate(str.maketrans('', '', string.punctuation))
    
    return text.strip()

# Test the function
sample_text = "Hello, World! This is Gen AI."
print("Original:", sample_text)
print("Lowercased:", preprocess_text(sample_text))
print("Clean:", preprocess_text(sample_text, remove_punctuation=True))

### Lambda Functions (Anonymous Functions)

In [None]:
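# Lambda for quick operations (assumes max_val != min_val, otherwise this divides by zero)
# Note: PEP 8 recommends `def` once a function gets a name; lambdas fit best as inline arguments
normalize = lambda x, min_val, max_val: (x - min_val) / (max_val - min_val)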

scores = [65, 75, 85, 95, 100]
min_score, max_score = min(scores), max(scores)

normalized_scores = [normalize(s, min_score, max_score) for s in scores]
print("Original scores:", scores)
print("Normalized (0-1):", [f"{s:.2f}" for s in normalized_scores])

# Using lambda with map and filter
texts = ["hello world", "gen ai is amazing", "python rocks"]

# Map: Apply function to all elements
word_counts = list(map(lambda x: len(x.split()), texts))
print(f"\nWord counts: {word_counts}")

# Filter: Keep elements that meet condition
long_texts = list(filter(lambda x: len(x) > 15, texts))
print(f"Long texts: {long_texts}")
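
Lambdas are probably most often seen as the `key=` argument of `sorted`, `min`, and `max`. The sketch below ranks some evaluation results; the `eval_results` records are invented for illustration and are not real benchmark numbers.

```python
# Hypothetical evaluation results, invented for illustration
eval_results = [
    {"model": "model-a", "score": 0.88, "latency_ms": 120},
    {"model": "model-b", "score": 0.95, "latency_ms": 340},
    {"model": "model-c", "score": 0.91, "latency_ms": 95},
]

# key= receives each element and returns the value to compare on
by_score = sorted(eval_results, key=lambda r: r["score"], reverse=True)
fastest = min(eval_results, key=lambda r: r["latency_ms"])

print("Ranked by score:", [r["model"] for r in by_score])
print("Fastest model:", fastest["model"])
```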

## 4️⃣ Object-Oriented Programming (OOP)

Classes bundle configuration, state, and behavior into one object, which is how most ML libraries expose models, datasets, and tokenizers.

In [None]:
from dataclasses import asdict, dataclass
from typing import List

@dataclass
class LLMConfig:
    """Configuration for an LLM."""
    name: str
    model_id: str
    temperature: float = 0.7
    max_tokens: int = 512
    top_p: float = 0.9
    
    def to_dict(self) -> dict:
        """Convert config to dictionary."""
        # asdict() walks the dataclass fields, so new fields are picked up automatically
        return asdict(self)

class TextGenerator:
    """Simple text generator class."""
    
    def __init__(self, config: LLMConfig):
        self.config = config
        self.history: List[str] = []
    
    def generate(self, prompt: str) -> str:
        """Generate text from prompt."""
        # Simulate generation
        response = f"[{self.config.name}] Response to: '{prompt}'"
        self.history.append(prompt)
        return response
    
    def clear_history(self):
        """Clear generation history."""
        self.history = []
        print("History cleared")
    
    def __repr__(self):
        return f"TextGenerator(model={self.config.name}, history_len={len(self.history)})"

# Create and use the generator
config = LLMConfig(name="GPT-4", model_id="gpt-4-turbo")
generator = TextGenerator(config)

print(generator)
print("\nGeneration 1:", generator.generate("What is Gen AI?"))
print("Generation 2:", generator.generate("Explain transformers"))
print(f"\nHistory: {generator.history}")
print(f"Config: {config.to_dict()}")
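
Properties and inheritance build on the same ideas. The sketch below is self-contained and does not reuse the classes above: `BaseGenerator` and `UppercaseGenerator` are hypothetical names, and the 0.0 to 2.0 temperature range is an assumption made for the example.

```python
class BaseGenerator:
    """Minimal stand-in for a text generator (illustrative only)."""

    def __init__(self, name: str, temperature: float = 0.7):
        self.name = name
        self.temperature = temperature   # goes through the property setter
        self.history = []

    @property
    def temperature(self) -> float:
        """Read access looks like a plain attribute."""
        return self._temperature

    @temperature.setter
    def temperature(self, value: float) -> None:
        # Validate on every assignment (the range is an assumed example)
        if not 0.0 <= value <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        self._temperature = value

    def generate(self, prompt: str) -> str:
        self.history.append(prompt)
        return f"[{self.name}] Response to: '{prompt}'"


class UppercaseGenerator(BaseGenerator):
    """Subclass that reuses generate() and post-processes the result."""

    def generate(self, prompt: str) -> str:
        response = super().generate(prompt)   # call the parent implementation
        return response.upper()


shouty = UppercaseGenerator("demo-model")
print(shouty.generate("hello"))
print("Is a BaseGenerator?", isinstance(shouty, BaseGenerator))
```

The subclass inherits `__init__`, `history`, and the validated `temperature` property without repeating any of that code.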

## 5️⃣ File I/O and Exception Handling

In [None]:
import json

# Create a sample dataset
dataset = {
    "prompts": [
        {"id": 1, "text": "Explain machine learning", "category": "education"},
        {"id": 2, "text": "Summarize this article", "category": "summarization"},
        {"id": 3, "text": "Translate to French", "category": "translation"}
    ],
    "metadata": {
        "version": "1.0",
        "created": "2025-10-13",
        "total_samples": 3
    }
}

# Write to JSON file
output_file = "sample_dataset.json"

try:
    with open(output_file, 'w') as f:
        json.dump(dataset, f, indent=2)
    print(f"✅ Dataset saved to {output_file}")
    
    # Read back
    with open(output_file, 'r') as f:
        loaded_dataset = json.load(f)
    
    print(f"\n📊 Loaded {len(loaded_dataset['prompts'])} prompts")
    print(f"Metadata: {loaded_dataset['metadata']}")
    
except FileNotFoundError:
    print("❌ File not found")
except json.JSONDecodeError:
    print("❌ Invalid JSON format")
except Exception as e:
    print(f"❌ Error: {e}")
finally:
    # finally runs whether or not an exception was raised
    print("\n✨ File operation finished")
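
Many ML datasets are stored as JSON Lines (one JSON object per line) rather than as a single JSON document. The sketch below writes and reads such a file with `pathlib`; the file name `prompts.jsonl` and the records are assumptions for illustration, and a temporary directory is used so nothing is left on disk.

```python
import json
import tempfile
from pathlib import Path

records = [
    {"id": 1, "text": "Explain machine learning"},
    {"id": 2, "text": "Summarize this article"},
]

# Work in a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "prompts.jsonl"

    # Write one JSON object per line
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

    # Read back line by line, skipping blank lines
    with path.open("r", encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(loaded)} records")
```

Reading line by line means a large file never has to fit in memory as one parsed object.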

## 6️⃣ Python Best Practices for ML

### Type Hints and Documentation

In [None]:
from typing import Dict, List

def calculate_metrics(
    predictions: List[float],
    targets: List[float]
) -> Dict[str, float]:
    """
    Calculate regression metrics.
    
    Args:
        predictions: Model predictions
        targets: Ground truth values
    
    Returns:
        Dictionary with 'mse' and 'rmse' (rounded to 4 decimals) and 'samples' (number of pairs)
    
    Example:
        >>> preds = [1.0, 2.0, 3.0]
        >>> targets = [1.1, 2.1, 2.9]
        >>> metrics = calculate_metrics(preds, targets)
        >>> print(metrics['mse'])
        0.01
    """
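    if len(predictions) != len(targets):
        raise ValueError("Predictions and targets must have same length")
    if not predictions:
        # Guard against dividing by zero below
        raise ValueError("Predictions and targets must not be empty")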
    
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
    rmse = mse ** 0.5
    
    return {
        "mse": round(mse, 4),
        "rmse": round(rmse, 4),
        "samples": len(predictions)
    }

# Test
preds = [1.0, 2.0, 3.0, 4.0]
targets = [1.1, 2.2, 2.9, 4.1]

metrics = calculate_metrics(preds, targets)
print("Metrics:", metrics)

### Context Managers and Decorators

In [None]:
import time
from functools import wraps

# Decorator for timing functions
def timer(func):
    """Decorator to time function execution."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic, high-resolution clock meant for measuring intervals
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"⏱️  {func.__name__} took {end - start:.4f} seconds")
        return result
    return wrapper

# Apply decorator
@timer
def process_large_dataset(size: int):
    """Simulate processing a large dataset."""
    data = [i ** 2 for i in range(size)]
    return sum(data)

# Test
result = process_large_dataset(1000000)
print(f"Result: {result:,}")

# Context manager example
from contextlib import contextmanager

@contextmanager
def model_inference_mode():
    """Context manager for model inference."""
    print("🔄 Entering inference mode...")
    try:
        yield
    finally:
        print("✅ Exiting inference mode")

# Use context manager
with model_inference_mode():
    print("   Running inference...")
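
A context manager can also be written as a class with `__enter__` and `__exit__`, which is convenient when the block needs to hand back state. This is a minimal sketch; the `Timer` class is a hypothetical helper, not part of any library.

```python
import time


class Timer:
    """Context manager that records how long its block took."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self   # the object bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # do not suppress exceptions raised in the block


with Timer() as t:
    total = sum(i * i for i in range(100_000))

print(f"Block took {t.elapsed:.4f} seconds")
```

Returning `False` from `__exit__` lets any exception from the block propagate, mirroring the `try`/`finally` in the generator-based version.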

## 🎯 Practice Exercises

Try these exercises to reinforce your learning!

In [None]:
# Exercise 1: Create a function that tokenizes text
def tokenize(text: str) -> List[str]:
    """
    Split text into tokens (words).
    
    - Convert to lowercase
    - Split on whitespace (str.split() with no arguments never yields empty strings)
    """
    return text.lower().split()

# Test
tokens = tokenize("Hello World! This is Gen AI.")
print(f"Tokenize Exercise: {tokens}")

# Exercise 2: Build a simple caching decorator
def cache_results(func):
    """
    Decorator that caches function results.
    """
    _cache = {}
    @wraps(func)
    def wrapper(*args):
        if args in _cache:
            print("(from cache)")
            return _cache[args]
        result = func(*args)
        _cache[args] = result
        return result
    return wrapper

@cache_results
@timer
def expensive_calculation(a, b):
    time.sleep(1) # Simulate a long computation
    return a + b

print("\nCaching Decorator Exercise:")
expensive_calculation(1, 2)
expensive_calculation(1, 2)


# Exercise 3: Create a DataLoader class
class DataLoader:
    """
    Simple data loader for batching data.
    """
    def __init__(self, data: List, batch_size: int):
        self.data = data
        self.batch_size = batch_size
    
    def __len__(self):
        """Returns the number of batches."""
        return (len(self.data) + self.batch_size - 1) // self.batch_size
    
    def __iter__(self):
        """Yields batches of data."""
        for i in range(0, len(self.data), self.batch_size):
            yield self.data[i:i + self.batch_size]

print("\nDataLoader Exercise:")
my_data = list(range(25))
loader = DataLoader(my_data, batch_size=10)
print(f"Number of batches: {len(loader)}")

print("Iterating through batches:")
for i, batch in enumerate(loader):
    print(f"  Batch {i+1}: {batch}")

# You can still get a specific batch if needed, but iteration is more common
all_batches = list(loader)
print(f"\nLast batch: {all_batches[-1]}")

print("\n\n✅ All exercises complete!")
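
The standard library already ships a memoizing decorator, `functools.lru_cache`, which covers the same ground as the hand-written `cache_results` in Exercise 2 and also accepts keyword arguments. A minimal sketch follows; `slow_add` and the `calls` list are hypothetical names used only to show that the function body runs once.

```python
from functools import lru_cache

calls = []   # records every time the function body actually runs


@lru_cache(maxsize=128)
def slow_add(a: int, b: int) -> int:
    calls.append((a, b))
    return a + b


first = slow_add(1, 2)    # computed
second = slow_add(1, 2)   # served from the cache

print(f"Results: {first}, {second}")
print(f"Function body ran {len(calls)} time(s)")
print(slow_add.cache_info())
```

As with the exercise version, arguments must be hashable, so lists and dicts cannot be passed directly.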

## 🎉 Summary

You've mastered Python essentials for Gen AI! Key takeaways:

### Data Structures
- ✅ Lists, dictionaries, sets, tuples
- ✅ List/dict comprehensions for concise code

### Control Flow
- ✅ If/else statements
- ✅ For/while loops
- ✅ Comprehensions

### Functions
- ✅ Type hints for clarity
- ✅ Lambda expressions for quick operations
- ✅ Decorators for reusable logic

### OOP
- ✅ Classes and dataclasses
- ✅ Methods and properties
- ✅ Inheritance

### Best Practices
- ✅ Exception handling
- ✅ File I/O with context managers
- ✅ Documentation and type hints

---

### 📚 Next Steps

Continue to **Notebook 03: NumPy & Pandas** to learn data manipulation!

<div align="center">
<b>Great job! Ready for data science libraries! 📊</b>
</div>