# Week 9: Modern AI Applications and Large Language Models

## Learning Objectives:
- Understand the transformer architecture at a high level
- Learn how large language models work at a high level

## Topics Covered:
- Transformer architecture
- Attention mechanisms
- Large Language Models (GPT, BERT)
- Natural Language Processing applications
- Computer vision applications

## Homework:
Build-Your-Own Spotify Daylist

## Case Study:
Attention Is All You Need

In [None]:
# Import essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
import re
import warnings
warnings.filterwarnings('ignore')

# NLP libraries
try:
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    from nltk.stem import WordNetLemmatizer
    print("NLTK available")
except ImportError:
    print("NLTK not available - install with: pip install nltk")
    nltk = None

# Advanced NLP libraries
try:
    from transformers import pipeline, AutoTokenizer, AutoModel
    print("Transformers library available")
except ImportError:
    print("Transformers not available - install with: pip install transformers")
    pipeline = None

# Deep Learning libraries
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers
    print(f"TensorFlow version: {tf.__version__}")
except ImportError:
    print("TensorFlow not available - install with: pip install tensorflow")
    tf = None

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)

# Set random seeds for reproducibility
np.random.seed(42)
if tf is not None:
    tf.random.set_seed(42)

print("Libraries imported successfully!")

## 1. Introduction to Modern AI and Natural Language Processing

Modern AI has been revolutionized by advances in Natural Language Processing (NLP), particularly through the development of transformer architectures and large language models.

### Evolution of NLP:
1. **Traditional NLP** (1950s-1980s): Rule-based systems, hand-written grammars
2. **Statistical NLP** (1990s-2000s): Probabilistic models, machine learning, feature engineering, bag-of-words
3. **Deep Learning NLP** (2010s): RNNs, LSTMs, word embeddings
4. **Transformer Era** (2017-present): Attention mechanisms, BERT, GPT

### Key Breakthroughs:
- **Word Embeddings**: Word2Vec, GloVe (2013-2014)
- **Attention Mechanisms**: Neural Machine Translation (2015)
- **Transformer Architecture**: "Attention Is All You Need" (2017)
- **BERT**: Bidirectional Encoder Representations (2018)
- **GPT Series**: Generative Pre-trained Transformers (2018-present)

### Modern Applications:
- Machine translation
- Text summarization
- Question answering
- Chatbots and virtual assistants
- Content generation
- Sentiment analysis
- Code generation

In [None]:
# Create sample text data for demonstration
sample_texts = [
    "I love this product! It's amazing and works perfectly.",
    "This is terrible. Worst purchase ever. Complete waste of money.",
    "The movie was okay. Not great, but not bad either.",
    "Fantastic service! The staff was very helpful and friendly.",
    "I hate waiting in long lines. This is so frustrating.",
    "The weather is beautiful today. Perfect for a walk in the park.",
    "This book is boring. I couldn't get past the first chapter.",
    "Outstanding performance! The team did an excellent job.",
    "The food was delicious. I'll definitely come back here.",
    "Poor customer service. They were rude and unhelpful."
]

# Labels: 0 = negative, 1 = neutral, 2 = positive
labels = [2, 0, 1, 2, 0, 2, 0, 2, 2, 0]
label_names = ['Negative', 'Neutral', 'Positive']

# Create DataFrame
df = pd.DataFrame({
    'text': sample_texts,
    'label': labels,
    'sentiment': [label_names[l] for l in labels]
})

print("Sample Text Data:")
print(df)

# Visualize sentiment distribution
plt.figure(figsize=(10, 6))
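sentiment_counts = df['sentiment'].value_counts()
# Map colors by label: value_counts() orders by frequency, not by label id
sentiment_colors = {'Negative': 'red', 'Neutral': 'gray', 'Positive': 'green'}
bar_colors = [sentiment_colors[s] for s in sentiment_counts.index]
plt.bar(sentiment_counts.index, sentiment_counts.values, color=bar_colors)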
plt.title('Sentiment Distribution in Sample Data')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.grid(True, alpha=0.3)
plt.show()

## 2. Attention Mechanisms

Attention mechanisms allow models to focus on relevant parts of the input when making predictions, similar to how humans selectively focus on important information.

### Key Concepts:
- **Query (Q)**: What we're looking for
- **Key (K)**: What we're looking at
- **Value (V)**: The actual information
- **Attention Weights**: How much to focus on each part
- **Context Vector**: Weighted combination of values

### Types of Attention:
1. **Additive Attention**: Uses a neural network to compute attention scores
2. **Multiplicative Attention**: Uses dot product for efficiency
3. **Self-Attention**: Attention within the same sequence
4. **Multi-Head Attention**: Multiple attention mechanisms in parallel

### Mathematical Foundation:
- **Attention Score**: score(Q, K) = Q · K^T (the Transformer additionally divides by √d_k, where d_k is the key dimension)
- **Attention Weights**: α = softmax(score(Q, K))
- **Context Vector**: C = Σ(α_i × V_i)
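
The three formulas above fit in a few lines of NumPy. The following is a minimal sketch rather than a reference implementation: the projection matrices `W_q`, `W_k`, and `W_v` are random stand-ins for learned weights (an assumption for illustration), and the score is divided by √d_k as in the Transformer paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return (context, weights).

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    context = weights @ V                # (n_queries, d_v)
    return context, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))              # 6 tokens, 4-dimensional embeddings
# Hypothetical projection matrices; in a trained model these are learned
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))

context, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(weights.shape, context.shape)
```

Each row of `weights` is one token's attention distribution over all six tokens, and the matching row of `context` is that token's context vector.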

In [None]:
# Simple attention mechanism demonstration
def simple_attention_demo():
    # Example sentence: "The cat sat on the mat"
    words = ['The', 'cat', 'sat', 'on', 'the', 'mat']
    
    # Simple word embeddings (random for demonstration)
    np.random.seed(42)
    embeddings = np.random.randn(len(words), 4)  # 4-dimensional embeddings
    
    # Query: let's focus on the word "cat"
    query_idx = 1  # "cat"
    query = embeddings[query_idx]
    
    # Calculate attention scores (dot product)
    scores = np.dot(embeddings, query)
    
    # Apply softmax to get attention weights (subtract the max for numerical stability)
    exp_scores = np.exp(scores - np.max(scores))
    attention_weights = exp_scores / np.sum(exp_scores)
    
    # Calculate context vector
    context_vector = np.sum(attention_weights[:, np.newaxis] * embeddings, axis=0)
    
    print("=== ATTENTION MECHANISM DEMO ===")
    print(f"Query word: {words[query_idx]}")
    print("\nAttention weights:")
    for word, weight in zip(words, attention_weights):
        print(f"{word}: {weight:.4f}")
    
    # Visualize attention weights
    plt.figure(figsize=(10, 6))
    bars = plt.bar(words, attention_weights, color='skyblue')
    bars[query_idx].set_color('orange')  # Highlight query word
    plt.title(f'Attention Weights for Query: "{words[query_idx]}"')
    plt.xlabel('Words')
    plt.ylabel('Attention Weight')
    plt.xticks(rotation=45)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    return context_vector, attention_weights

context, weights = simple_attention_demo()

## 3. Transformer Architecture

The Transformer architecture, introduced in "Attention Is All You Need" (2017), revolutionized NLP by relying entirely on attention mechanisms without recurrent or convolutional layers.

### Key Components:
1. **Multi-Head Attention**: Multiple attention mechanisms in parallel
2. **Positional Encoding**: Adds positional information to embeddings
3. **Layer Normalization**: Stabilizes training
4. **Feed-Forward Networks**: Position-wise transformations (the same small network applied to every token)
5. **Residual Connections**: Skip connections for gradient flow
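
Components 3 to 5 combine into the "Add & Norm" step that wraps each sublayer. The sketch below shows this for the feed-forward sublayer only. It assumes random weights in place of learned ones, omits the learnable gain and bias that layer normalization normally carries, and uses the normalize-after-residual ordering of the original paper.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row (token) to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def feed_forward(x, W1, b1, W2, b2):
    """Position-wise feed-forward network: the same two-layer MLP for every token."""
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU activation
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 5, 8, 32
x = rng.normal(size=(seq_len, d_model))

# Hypothetical random weights standing in for learned parameters
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model)

# "Add & Norm": residual connection, then layer normalization
out = layer_norm(x + feed_forward(x, W1, b1, W2, b2))
print(out.shape)
```

The residual term `x + ...` gives gradients a direct path around the sublayer, which is why it helps when many layers are stacked.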

### Architecture:
- **Encoder**: Processes input sequence
- **Decoder**: Generates output sequence
- **Self-Attention**: Relates positions within same sequence
- **Cross-Attention**: Relates encoder and decoder
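
The difference between self-attention and cross-attention is only where the queries, keys, and values come from. The sketch below illustrates this with random arrays as hypothetical encoder and decoder states. It leaves out the learned projections and multiple heads to keep the shapes easy to follow.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention; returns (context, weights)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
d_model = 8
encoder_states = rng.normal(size=(5, d_model))   # 5 source tokens
decoder_states = rng.normal(size=(3, d_model))   # 3 target tokens generated so far

# Self-attention: queries, keys, and values all come from the same sequence
_, self_weights = attention(encoder_states, encoder_states, encoder_states)

# Cross-attention: decoder queries look up encoder keys and values
cross_context, cross_weights = attention(decoder_states, encoder_states, encoder_states)

print(self_weights.shape)    # (5, 5)
print(cross_weights.shape)   # (3, 5)
print(cross_context.shape)   # (3, 8)
```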

### Advantages:
- Parallelizable (unlike RNNs)
- Captures long-range dependencies
- More efficient training
- Better performance on many tasks

### Limitations:
- Quadratic complexity with sequence length
- Requires large amounts of data
- Computationally expensive
- Hard to interpret: attention weights alone do not fully explain a prediction
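
The quadratic-complexity limitation can be made concrete with a little arithmetic. The helper below is a rough estimate under stated assumptions: 32-bit floats, 12 attention heads, and a naive implementation that stores the full attention matrix for one layer.

```python
def attention_matrix_bytes(seq_len, n_heads=1, bytes_per_value=4):
    """Memory for one layer's attention weights: n_heads * seq_len^2 values."""
    return n_heads * seq_len * seq_len * bytes_per_value

for n in (512, 2048, 8192):
    mib = attention_matrix_bytes(n, n_heads=12) / 2**20
    print(f"seq_len={n:>5}: {mib:,.0f} MiB")
```

Under these assumptions the three lengths need 12 MiB, 192 MiB, and 3,072 MiB: each 4x increase in sequence length costs 16x the memory.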

In [None]:
# Visualize transformer architecture components
def visualize_transformer_concepts():
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    # 1. Multi-Head Attention concept
    ax1 = axes[0, 0]
    heads = ['Head 1', 'Head 2', 'Head 3', 'Head 4']
    attention_patterns = np.random.rand(4, 6)  # 4 heads, 6 words
    im1 = ax1.imshow(attention_patterns, cmap='Blues', aspect='auto')
    ax1.set_title('Multi-Head Attention Pattern')
    ax1.set_xlabel('Word Position')
    ax1.set_ylabel('Attention Head')
    ax1.set_yticks(range(4))
    ax1.set_yticklabels(heads)
    plt.colorbar(im1, ax=ax1, shrink=0.8)
    
    # 2. Position Encoding
    ax2 = axes[0, 1]
    seq_len = 20
    d_model = 8
    pos_encoding = np.zeros((seq_len, d_model))
    
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            pos_encoding[pos, i] = np.sin(pos / (10000 ** (i / d_model)))
            if i + 1 < d_model:
                pos_encoding[pos, i + 1] = np.cos(pos / (10000 ** (i / d_model)))
    
    im2 = ax2.imshow(pos_encoding.T, cmap='RdYlBu', aspect='auto')
    ax2.set_title('Positional Encoding')
    ax2.set_xlabel('Position')
    ax2.set_ylabel('Embedding Dimension')
    plt.colorbar(im2, ax=ax2, shrink=0.8)
    
    # 3. Self-Attention Matrix
    ax3 = axes[1, 0]
    sentence = "The quick brown fox jumps".split()
    n_words = len(sentence)
    
    # Simulate attention matrix
    attention_matrix = np.random.rand(n_words, n_words)
    # Make it more realistic (higher attention to nearby words)
    for i in range(n_words):
        for j in range(n_words):
            distance = abs(i - j)
            attention_matrix[i, j] *= np.exp(-distance * 0.3)
    
    # Normalize
    attention_matrix = attention_matrix / attention_matrix.sum(axis=1, keepdims=True)
    
    im3 = ax3.imshow(attention_matrix, cmap='Greens', aspect='auto')
    ax3.set_title('Self-Attention Matrix')
    ax3.set_xlabel('Key Words')
    ax3.set_ylabel('Query Words')
    ax3.set_xticks(range(n_words))
    ax3.set_yticks(range(n_words))
    ax3.set_xticklabels(sentence, rotation=45)
    ax3.set_yticklabels(sentence)
    plt.colorbar(im3, ax=ax3, shrink=0.8)
    
    # 4. Layer Structure
    ax4 = axes[1, 1]
    layers = ['Input\nEmbedding', 'Multi-Head\nAttention', 'Add & Norm', 'Feed\nForward', 'Add & Norm', 'Output']
    y_pos = np.arange(len(layers))
    colors = ['lightblue', 'orange', 'lightgreen', 'orange', 'lightgreen', 'lightcoral']
    
    bars = ax4.barh(y_pos, [1]*len(layers), color=colors)
    ax4.set_yticks(y_pos)
    ax4.set_yticklabels(layers)
    ax4.set_xlabel('Layer Processing')
    ax4.set_title('Transformer Layer Structure')
    ax4.set_xlim(0, 1.2)
    
    # Add arrows
    for i in range(len(layers)-1):
        ax4.annotate('', xy=(0.5, i+1), xytext=(0.5, i),
                    arrowprops=dict(arrowstyle='->', lw=2, color='black'))
    
    plt.tight_layout()
    plt.show()

visualize_transformer_concepts()

print("\nTransformer Key Innovations:")
print("1. Multi-Head Attention: Multiple attention mechanisms capture different relationships")
print("2. Positional Encoding: Sine/cosine functions encode position information")
print("3. Self-Attention: Each word attends to all other words in the sequence")
print("4. Layer Normalization: Stabilizes training and enables deeper networks")
print("5. Residual Connections: Skip connections help with gradient flow")

## 4. Large Language Models (LLMs)

Large Language Models are transformer-based models trained on vast amounts of text data to understand and generate human-like text.

### Key Models:
1. **BERT** (2018): Bidirectional Encoder Representations from Transformers
2. **GPT Series** (2018-2023): Generative Pre-trained Transformers
3. **T5** (2019): Text-to-Text Transfer Transformer
4. **PaLM** (2022): Pathways Language Model
5. **ChatGPT/GPT-4** (2022-2023): Conversational AI systems

### Training Process:
1. **Pre-training**: Large-scale self-supervised learning on text corpora (for example, predicting masked or next tokens)
2. **Fine-tuning**: Task-specific supervised learning
3. **Instruction Tuning**: Training to follow instructions
4. **Reinforcement Learning from Human Feedback (RLHF)**: Alignment with human preferences
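
For GPT-style models, step 1 usually means next-token prediction: at every position the model scores all vocabulary items, and the loss is the cross-entropy against the token that actually follows. The sketch below computes that loss with NumPy. The five-word vocabulary and the logits are made up for illustration and do not come from a real model.

```python
import numpy as np

def next_token_loss(logits, token_ids):
    """Average cross-entropy of predicting token t+1 from position t.

    logits: (seq_len, vocab_size) scores produced at each position
    token_ids: (seq_len,) integer ids of the actual sequence
    """
    preds = logits[:-1]            # predictions made at positions 0..n-2
    targets = token_ids[1:]        # the tokens that actually came next
    preds = preds - preds.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = preds - np.log(np.exp(preds).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

vocab = ["the", "cat", "sat", "on", "mat"]   # toy vocabulary (assumption)
token_ids = np.array([0, 1, 2, 3, 0, 4])     # "the cat sat on the mat"

rng = np.random.default_rng(0)
random_logits = rng.normal(size=(len(token_ids), len(vocab)))
uniform_logits = np.zeros((len(token_ids), len(vocab)))

print(f"random model : {next_token_loss(random_logits, token_ids):.3f}")
print(f"uniform model: {next_token_loss(uniform_logits, token_ids):.3f}")
```

A model that assigns equal probability to all five words scores ln(5) ≈ 1.609. Pre-training drives the loss below that baseline by learning which tokens tend to follow which.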

### Capabilities:
- Text generation and completion
- Question answering
- Summarization
- Translation
- Code generation
- Reasoning and problem-solving
- Creative writing

### Challenges:
- Computational requirements
- Hallucination (generating false information)
- Bias and fairness
- Interpretability
- Safety and alignment

In [None]:
# Demonstrate basic NLP tasks that LLMs excel at
def demonstrate_nlp_tasks():
    print("=== LARGE LANGUAGE MODEL CAPABILITIES ===")
    
    # Sample texts for different tasks
    tasks = {
        'Text Classification': {
            'description': 'Classify text into categories (sentiment, topic, etc.)',
            'example': 'The new iPhone is amazing! I love the camera quality.',
            'output': 'Sentiment: Positive, Topic: Technology'
        },
        'Text Summarization': {
            'description': 'Create concise summaries of longer texts',
            'example': 'Climate change is a long-term shift in global temperatures and weather patterns. While climate change is natural, human activities have been the main driver since the 1800s...',
            'output': 'Summary: Climate change refers to long-term temperature and weather shifts, primarily caused by human activities since the 1800s.'
        },
        'Question Answering': {
            'description': 'Answer questions based on context or knowledge',
            'example': 'Context: Paris is the capital of France. Question: What is the capital of France?',
            'output': 'Answer: Paris'
        },
        'Text Generation': {
            'description': 'Generate coherent text based on prompts',
            'example': 'Write a story about a robot who learns to paint...',
            'output': 'Generated: In a small workshop, R2-D7 discovered brushes and colors. Day by day, it learned to express emotions through art...'
        },
        'Translation': {
            'description': 'Translate text between languages',
            'example': 'English: Hello, how are you? → Spanish: Hola, ¿cómo estás?',
            'output': 'Translation successful with high accuracy'
        },
        'Code Generation': {
            'description': 'Generate code from natural language descriptions',
            'example': 'Create a Python function that calculates the factorial of a number',
            'output': 'def factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n-1)'
        }
    }
    
    for task_name, task_info in tasks.items():
        print(f"\n{task_name}:")
        print(f"  Description: {task_info['description']}")
        # Truncate long examples to 100 characters for display
        example = task_info['example']
        if len(example) > 100:
            example = example[:100] + "..."
        print(f"  Example Input: {example}")
        print(f"  Expected Output: {task_info['output']}")
    
    # Visualize model scale evolution
    plt.figure(figsize=(12, 8))
    
    # GPT-4 is omitted: its parameter count has not been publicly disclosed
    models = ['BERT-Base', 'GPT-1', 'GPT-2', 'GPT-3', 'PaLM']
    parameters = [110, 117, 1500, 175000, 540000]  # in millions
    years = [2018, 2018, 2019, 2020, 2022]
    
    colors = ['blue', 'green', 'orange', 'red', 'purple']
    
    plt.scatter(years, parameters, c=colors, s=100, alpha=0.7)
    
    for model, param, year in zip(models, parameters, years):
        plt.annotate(f'{model}\n{param:,}M params', 
                    (year, param), 
                    xytext=(5, 5), 
                    textcoords='offset points',
                    fontsize=9,
                    ha='left')
    
    plt.yscale('log')
    plt.xlabel('Year')
    plt.ylabel('Number of Parameters (millions)')
    plt.title('Evolution of Large Language Models')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

demonstrate_nlp_tasks()

## 5. BERT: Bidirectional Encoder Representations

BERT revolutionized NLP by introducing bidirectional training, allowing the model to understand context from both directions.

### Key Features:
- **Bidirectional**: Each token's representation draws on context from both its left and its right
- **Encoder-only**: Uses only the encoder part of transformer
- **Pre-training Tasks**: Masked Language Modeling, Next Sentence Prediction
- **Fine-tuning**: Adapts to specific tasks
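
Masked Language Modeling is easy to sketch: hide a random subset of tokens and ask the model to recover them. The function below builds such training pairs. It is a simplified illustration in which every selected token is replaced by `[MASK]` (the BERT paper also sometimes keeps the token or swaps in a random one), and `mask_tokens` is a hypothetical helper name, not a library function.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) for masked language modeling.

    labels[i] holds the original token where a prediction is required
    and None everywhere else.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

tokens = "the cat sat on the mat because it was tired".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

Because the hidden token can sit anywhere in the sentence, the model has to use words on both sides of it, which is what makes the training bidirectional.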

### Architecture:
- **BERT-Base**: 12 layers, 768 hidden units, 12 attention heads
- **BERT-Large**: 24 layers, 1024 hidden units, 16 attention heads
- **WordPiece Tokenization**: Subword units
- **Special Tokens**: [CLS], [SEP], [MASK]
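
WordPiece splits an unknown word by greedily taking the longest vocabulary entry that matches at each position and marking continuation pieces with `##`. The sketch below applies that rule to single words. The tiny `toy_vocab` is invented for illustration; a real BERT vocabulary has roughly 30,000 entries and would split these words differently.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword units."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate   # continuation of the same word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]                       # no piece matches: whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

# Toy vocabulary (assumption for this example)
toy_vocab = {"play", "##ing", "##ed", "un", "##believ", "##able", "token", "##ization"}
for w in ["playing", "unbelievable", "tokenization", "xyz"]:
    print(w, "->", wordpiece_tokenize(w, toy_vocab))
```

Subword units let the model represent rare or novel words from familiar parts instead of mapping them all to a single unknown token.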

### Applications:
- Text classification
- Named entity recognition
- Question answering
- Sentiment analysis
- Text similarity

In [None]:
# Demonstrate BERT concepts (without actual BERT model)
def demonstrate_bert_concepts():
    print("=== BERT CONCEPTS DEMONSTRATION ===")
    
    # Demonstrate masked language modeling concept
    original_sentence = "The cat sat on the mat"
    masked_sentence = "The cat [MASK] on the mat"
    
    print("\n1. Masked Language Modeling:")
    print(f"Original: {original_sentence}")
    print(f"Masked: {masked_sentence}")
    print("A trained BERT would likely predict: 'sat' (using context on both sides of the mask)")
    
    # Demonstrate next sentence prediction
    sentence_a = "I went to the store."
    sentence_b_correct = "I bought some milk."
    sentence_b_incorrect = "The weather is nice today."
    
    print("\n2. Next Sentence Prediction:")
    print(f"Sentence A: {sentence_a}")
    print(f"Sentence B (correct): {sentence_b_correct}")
    print(f"Sentence B (incorrect): {sentence_b_incorrect}")
    print("For the correct pair, BERT should predict IsNext=True (a valid continuation)")
    print("For the incorrect pair, BERT should predict IsNext=False (NOT a valid continuation)")
    
    # Demonstrate tokenization
    sample_text = "Hello, world! This is BERT."
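    print("\n3. Tokenization (WordPiece):")
    print(f"Original: {sample_text}")
    # Illustrative split only: the real tokens depend on the model's vocabulary
    print("Tokens (illustrative): ['[CLS]', 'Hello', ',', 'world', '!', 'This', 'is', 'BE', '##RT', '.', '[SEP]']")
    print("Note: the '##' prefix marks a subword that continues the previous token")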
    
    # BERT vs GPT comparison (model size and depth)
    models = ['BERT-Base', 'BERT-Large', 'GPT-1', 'GPT-2', 'GPT-3']
    params = [110, 340, 117, 1500, 175000]
    layers = [12, 24, 12, 48, 96]
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Parameters comparison
    colors = ['blue', 'darkblue', 'green', 'orange', 'red']
    ax1.bar(models, params, color=colors, alpha=0.7)
    ax1.set_yscale('log')
    ax1.set_ylabel('Parameters (millions)')
    ax1.set_title('Model Size Comparison')
    ax1.tick_params(axis='x', rotation=45)
    
    # Layers comparison
    ax2.bar(models, layers, color=colors, alpha=0.7)
    ax2.set_ylabel('Number of Layers')
    ax2.set_title('Model Depth Comparison')
    ax2.tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()

demonstrate_bert_concepts()

## 6. GPT: Generative Pre-trained Transformers

The GPT series takes a generative approach to language modeling: the model is trained to predict the next token, which makes it naturally suited to producing text.

### Key Features:
- **Autoregressive**: Predicts next token based on previous tokens
- **Decoder-only**: Uses only the decoder part of transformer
- **Causal Attention**: Can only attend to previous tokens
- **Zero-shot/Few-shot Learning**: Performs new tasks from an instruction or a few in-prompt examples, without task-specific fine-tuning
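
Causal attention is implemented by masking: before the softmax, every score that points at a later position is set to negative infinity, so its weight becomes zero. The sketch below applies such a mask to random scores, which stand in for real query-key products.

```python
import numpy as np

def causal_attention_weights(scores):
    """Softmax over scores with future positions masked out.

    scores: (seq_len, seq_len); rows are queries, columns are keys.
    """
    seq_len = scores.shape[0]
    # True above the diagonal, i.e. wherever key position > query position
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(future, -np.inf, scores)
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
weights = causal_attention_weights(rng.normal(size=(4, 4)))
print(np.round(weights, 2))
```

The result is lower-triangular: the first token can only attend to itself, while the last token sees the whole prefix. This is what lets the model be trained on all positions in parallel while still generating one token at a time.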

### Evolution:
- **GPT-1** (2018): 117M parameters, proof of concept
- **GPT-2** (2019): 1.5B parameters, impressive text generation
- **GPT-3** (2020): 175B parameters, few-shot learning breakthrough
- **GPT-4** (2023): Multimodal capabilities, improved reasoning

### Applications:
- Text generation and completion
- Creative writing
- Code generation
- Conversational AI
- Content creation

In [None]:
# Demonstrate GPT concepts and capabilities
def demonstrate_gpt_concepts():
    print("=== GPT CONCEPTS DEMONSTRATION ===")
    
    # Demonstrate autoregressive generation
    prompt = "The future of artificial intelligence is"
    
    print("\n1. Autoregressive Text Generation:")
    print(f"Prompt: {prompt}")
    print("Generation process:")
    
    # Simulate token-by-token generation
    generated_tokens = ["bright", "because", "it", "will", "help", "solve", "complex", "problems"]
    current_text = prompt
    
    for i, token in enumerate(generated_tokens):
        print(f"  Step {i+1}: '{current_text}' → '{token}'")
        current_text += " " + token
    
    print(f"\nFinal output: {current_text}")
    
    # Demonstrate few-shot learning
    print("\n2. Few-shot Learning Example:")
    few_shot_prompt = '''Examples:
Input: "I love this movie!"
Output: Positive

Input: "This is terrible."
Output: Negative

Input: "The weather is okay."
Output: Neutral

Input: "This restaurant is amazing!"
Output: '''
    
    print(few_shot_prompt)
    print("Expected: Positive")
    
    # Compare BERT vs GPT
    print("\n3. BERT vs GPT Comparison:")
    comparison = {
        'Architecture': ['Encoder-only', 'Decoder-only'],
        'Training': ['Bidirectional', 'Autoregressive'],
        'Best for': ['Understanding', 'Generation'],
        'Attention': ['Bidirectional', 'Causal (masked)'],
        'Applications': ['Classification, QA', 'Generation, Chat']
    }
    
    print(f"{'Aspect':<15} {'BERT':<20} {'GPT':<20}")
    print("-" * 55)
    for aspect, (bert_val, gpt_val) in comparison.items():
        print(f"{aspect:<15} {bert_val:<20} {gpt_val:<20}")

demonstrate_gpt_concepts()

## 7. Build-Your-Own Spotify Daylist (Homework Project)

For this week's homework, you'll create a simple recommendation system inspired by Spotify's Daylist feature, which creates personalized playlists based on time of day and user preferences.

### Project Overview:
Create a system that:
1. Analyzes user listening patterns by time of day
2. Categorizes music by mood/energy level
3. Generates time-appropriate playlists
4. Uses basic NLP for mood detection from song lyrics/titles

### Components:
- **Data**: Sample music dataset with features
- **Time Analysis**: Listening patterns by hour
- **Mood Classification**: Energy, valence, danceability
- **Recommendation Engine**: Content-based filtering
- **Playlist Generation**: Time-aware recommendations
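
For the NLP part of the project, one simple starting point is keyword matching on song titles. The sketch below assumes a small hand-written lexicon. `MOOD_KEYWORDS` and `detect_mood` are hypothetical names invented for this example, and a real system would more likely learn moods from lyrics or audio features.

```python
import re

# Hypothetical mini-lexicon mapping title keywords to mood labels
MOOD_KEYWORDS = {
    "energetic": {"workout", "power", "party", "dance", "rock", "upbeat", "energetic"},
    "calm": {"calm", "chill", "relaxing", "mellow", "quiet", "lullaby", "smooth"},
    "melancholy": {"blues", "ballad"},
}

def detect_mood(title, default="neutral"):
    """Return the mood whose keywords overlap most with the title's words."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    best_mood, best_hits = default, 0
    for mood, keywords in MOOD_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_mood, best_hits = mood, hits
    return best_mood

for title in ["Power Workout", "Relaxing Piano", "Coffee Blues", "Afternoon Delight"]:
    print(f"{title:<18} -> {detect_mood(title)}")
```

Titles with no matching keyword fall back to `"neutral"`. The detected mood can then be combined with the numeric features when ranking songs for a time slot.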

In [None]:
# Sample implementation of Spotify Daylist concept
def create_sample_daylist_system():
    print("=== BUILD-YOUR-OWN SPOTIFY DAYLIST ===")
    
    # Sample music data
    music_data = {
        'song_id': range(1, 21),
        'title': [
            'Morning Sunshine', 'Coffee Blues', 'Energetic Workout', 'Smooth Jazz',
            'Upbeat Pop', 'Chill Vibes', 'Rock Anthem', 'Acoustic Calm',
            'Electronic Dance', 'Mellow Evening', 'Late Night Ballad', 'Sunrise Melody',
            'Power Workout', 'Relaxing Piano', 'Party Time', 'Quiet Reflection',
            'Morning Motivation', 'Afternoon Delight', 'Evening Jazz', 'Nighttime Lullaby'
        ],
        'energy': [0.8, 0.3, 0.9, 0.2, 0.7, 0.4, 0.8, 0.3, 0.9, 0.2, 0.1, 0.6, 0.9, 0.2, 0.8, 0.3, 0.7, 0.5, 0.3, 0.1],
        'valence': [0.9, 0.4, 0.8, 0.6, 0.8, 0.7, 0.7, 0.8, 0.9, 0.6, 0.3, 0.8, 0.9, 0.9, 0.9, 0.5, 0.8, 0.7, 0.6, 0.4],
        'danceability': [0.6, 0.2, 0.8, 0.3, 0.7, 0.5, 0.6, 0.2, 0.9, 0.3, 0.2, 0.5, 0.8, 0.1, 0.9, 0.2, 0.6, 0.6, 0.4, 0.1],
        'optimal_hour': [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 6, 19, 23]
    }
    
    df_music = pd.DataFrame(music_data)
    print("Sample Music Dataset:")
    print(df_music.head(10))
    
    # Define time periods and their characteristics
    time_periods = {
        'Early Morning (6-8)': {'energy': 0.6, 'valence': 0.7, 'danceability': 0.4},
        'Morning (9-11)': {'energy': 0.7, 'valence': 0.8, 'danceability': 0.6},
        'Afternoon (12-17)': {'energy': 0.6, 'valence': 0.7, 'danceability': 0.5},
        'Evening (18-21)': {'energy': 0.5, 'valence': 0.6, 'danceability': 0.4},
        'Night (22-23)': {'energy': 0.3, 'valence': 0.5, 'danceability': 0.2}
    }
    
    # Generate daylist for different times
    def generate_daylist(hour, num_songs=5):
        # Determine time period
        if 6 <= hour <= 8:
            period = 'Early Morning (6-8)'
        elif 9 <= hour <= 11:
            period = 'Morning (9-11)'
        elif 12 <= hour <= 17:
            period = 'Afternoon (12-17)'
        elif 18 <= hour <= 21:
            period = 'Evening (18-21)'
        else:
            # Hours 22-23, and also 0-5, fall back to the night profile
            period = 'Night (22-23)'
        
        target_features = time_periods[period]
        
        # Weighted L1 distance to the target profile (lower = closer match)
        distance = (
            (df_music['energy'] - target_features['energy']).abs() * 0.4 +
            (df_music['valence'] - target_features['valence']).abs() * 0.3 +
            (df_music['danceability'] - target_features['danceability']).abs() * 0.3
        )
        
        # Keep the songs with the smallest distance
        recommendations = df_music.loc[distance.nsmallest(num_songs).index]
        
        return period, recommendations[['title', 'energy', 'valence', 'danceability']]
    
    # Generate playlists for different times
    test_hours = [7, 10, 15, 20, 23]
    
    for hour in test_hours:
        period, playlist = generate_daylist(hour)
        print(f"\n{hour}:00 - {period} Daylist:")
        print(playlist.to_string(index=False))
    
    # Visualize music features by time
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Energy by optimal hour
    axes[0, 0].scatter(df_music['optimal_hour'], df_music['energy'], alpha=0.6, color='red')
    axes[0, 0].set_xlabel('Hour of Day')
    axes[0, 0].set_ylabel('Energy Level')
    axes[0, 0].set_title('Energy vs Time of Day')
    axes[0, 0].grid(True, alpha=0.3)
    
    # Valence by optimal hour
    axes[0, 1].scatter(df_music['optimal_hour'], df_music['valence'], alpha=0.6, color='blue')
    axes[0, 1].set_xlabel('Hour of Day')
    axes[0, 1].set_ylabel('Valence (Positivity)')
    axes[0, 1].set_title('Valence vs Time of Day')
    axes[0, 1].grid(True, alpha=0.3)
    
    # Danceability by optimal hour
    axes[1, 0].scatter(df_music['optimal_hour'], df_music['danceability'], alpha=0.6, color='green')
    axes[1, 0].set_xlabel('Hour of Day')
    axes[1, 0].set_ylabel('Danceability')
    axes[1, 0].set_title('Danceability vs Time of Day')
    axes[1, 0].grid(True, alpha=0.3)
    
    # Feature correlation heatmap
    correlation_matrix = df_music[['energy', 'valence', 'danceability']].corr()
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, ax=axes[1, 1])
    axes[1, 1].set_title('Feature Correlation Matrix')
    
    plt.tight_layout()
    plt.show()
    
    return df_music

sample_data = create_sample_daylist_system()

## 8. Summary

Congratulations! You've completed your introduction to modern AI applications and large language models. Here's what you learned:

### Key Concepts Mastered:
1. **Attention Mechanisms**: Query, Key, Value paradigm and self-attention
2. **Transformer Architecture**: Multi-head attention, positional encoding, layer normalization
3. **Large Language Models**: BERT, GPT series, and their capabilities
4. **Pre-training and Fine-tuning**: Modern AI training paradigms
5. **NLP Applications**: Text classification, generation, translation, summarization
6. **Recommendation Systems**: Content-based filtering and personalization

### Key Skills Acquired:
- Understanding modern AI architectures
- Recognizing when to use different model types
- Building simple recommendation systems
- Analyzing text data and user preferences
- Implementing basic NLP pipelines
- Creating time-aware applications

### Revolution in AI:
- **"Attention Is All You Need"** (2017) introduced the Transformer and reshaped NLP research
- Transformers enabled large-scale language models
- BERT showed the power of bidirectional understanding
- GPT demonstrated impressive text generation capabilities
- Modern LLMs can perform many tasks with little or no task-specific training

### Real-world Applications:
- **Search Engines**: Better understanding of user queries
- **Virtual Assistants**: More natural conversations
- **Content Creation**: Automated writing and editing
- **Customer Service**: Intelligent chatbots
- **Education**: Personalized tutoring systems
- **Healthcare**: Medical text analysis
- **Finance**: Document processing and analysis

### Current Challenges:
- **Computational Cost**: Training and inference are expensive
- **Data Requirements**: Need massive amounts of training data
- **Bias and Fairness**: Models can perpetuate societal biases
- **Interpretability**: Difficult to understand model decisions
- **Hallucination**: Models can generate false information
- **Safety**: Ensuring AI systems are aligned with human values

### Future Directions:
- **Multimodal Models**: Combining text, images, and audio
- **Efficient Architectures**: Reducing computational requirements
- **Better Alignment**: Making AI systems more helpful and harmless
- **Specialized Applications**: Domain-specific AI assistants
- **Human-AI Collaboration**: Augmenting human capabilities

### Homework Project:
Your "Build-Your-Own Spotify Daylist" project demonstrates how modern AI concepts can be applied to create personalized, time-aware recommendations. This combines:
- Content analysis (music features)
- User behavior modeling (time preferences)
- Recommendation algorithms (similarity matching)
- Personalization (individual preferences)

This project showcases how AI can enhance user experiences by understanding context and preferences, similar to how large language models adapt their responses based on context and user needs.