# 🧠 Day 1: Hands-on with Language Models using Phi-3
Welcome to the first session of the Generative AI workshop!

Today we'll explore the basics of large language models (LLMs) and use Microsoft's **Phi-3 Mini** model.

### 🎯 Objectives
- Understand what a language model is
- Load and run a small LLM
- Generate text from prompts
- Modify prompts and analyze outputs
- Reflect on tokenization and model behavior

## 🔧 Setup the Environment

In [None]:
 %%capture
 !pip install transformers accelerate

## 📦 Load the Phi-3 Model
Fill in the missing arguments to complete the model and tokenizer loading.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# TODO: Load the model here
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",  # Hint: it's a Phi-3 variant; look for the 4k instruct model ("4k" is the 4,096-token context window, not 4-bit quantization)
    device_map="cuda",   # Hint: 'cuda' or 'auto'
    torch_dtype="auto",  # Hint: dtype hint
    trust_remote_code=False
)

# TODO: Load the tokenizer here
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")



## 📦 Create a Text Generation Pipeline

Wrap the model and tokenizer into a convenient `pipeline` object for easy inference.


In [None]:
from transformers import pipeline

# TODO: Create a pipeline for text-generation using the model and tokenizer
generator = pipeline(
    "text-generation",               # Hint: task type
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,  # Hint: token cap
    do_sample=False
)


## 💬 Create and Send a Prompt

Finally, we create our prompt as a user and give it to the model:


In [None]:
# TODO: Define a user message that asks for a joke
messages = [
    {"role": "user", "content": "Give me a joke for kids"}  # Hint: Something humorous
]

# Generate output
output = generator(messages)

# TODO: Extract and print the model’s response
print(output[0]['generated_text'])


## ✉️ Generate a Custom Output from a Prompt

Now let’s directly tokenize a prompt and run it through the model to generate a complete response.
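
The `<|assistant|>` marker comes from Phi-3's chat format, where each turn is wrapped in a role tag and closed with `<|end|>`. Here is a minimal sketch of that layout, assuming the format shown on the model card. `format_phi3_prompt` is a hypothetical helper written only for illustration; in practice `tokenizer.apply_chat_template` is the safer way to build the string.

```python
def format_phi3_prompt(messages):
    """Hand-rolled approximation of Phi-3's chat layout (hypothetical helper)."""
    parts = []
    for message in messages:
        # Each turn: role tag, newline, content, end-of-turn marker
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    # The trailing assistant tag tells the model it is its turn to speak
    parts.append("<|assistant|>\n")
    return "".join(parts)

example = format_phi3_prompt(
    [{"role": "user", "content": "Explain tokenization with examples."}]
)
print(example)
```

Ending on the assistant tag is what nudges the model to answer rather than keep extending the user's text.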


In [None]:
# TODO: Write a detailed prompt that includes "<|assistant|>" at the end
prompt = "Help me understand what tokenization means, with examples.<|assistant|>"

# Tokenize the prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# TODO: Use the model to generate output from input_ids
generation_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=1000  # Hint: max tokens for generation, somewhere along the range of 100 to 1000
)

# TODO: Decode and print the output
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))


## ✍️ Reflection Prompt

Try changing the prompt to a sarcastic tone, or give the model specific instructions.
What do you observe in the outputs? Discuss with your team.


## 1️⃣ View the Token IDs

After tokenizing the prompt, we can print out the list of token IDs generated by the tokenizer.


In [None]:
# TODO: Print the token IDs from our tokenized input
print(input_ids)  # Hint: What variable stores our tokenized input?

# Expected output format:
# tensor([[14359, 385, 4376, 27746, 5281, 394, 19235, 363, 278, 25305, ...]])

## 2️⃣ Decode Each Token

We can decode each token ID individually to better understand how the model splits the input into subwords.


In [None]:
# TODO: Create a loop to decode each token individually
for token_id in input_ids[0]:  # Hint: What should we iterate over?
    print(tokenizer.decode(token_id.item()))  # Hint: What method converts token IDs back to text?

## 🧬 Inspect the Raw Model Output

The model returns a tensor of token IDs as its output. These represent the full generated sequence (input + new tokens).
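
Because the prompt comes back at the start of the sequence, a common trick is to slice it off before decoding. A minimal sketch with made-up token IDs, where plain Python lists stand in for the tensors:

```python
# Made-up IDs standing in for the tokenized prompt
prompt_ids = [101, 7592, 2088]

# A decoder-only model returns the prompt followed by the continuation
full_output = prompt_ids + [2023, 2003, 102]

# Keep only what the model added
new_tokens = full_output[len(prompt_ids):]
print(new_tokens)
```

With real tensors the same idea reads `generation_output[0][input_ids.shape[1]:]`.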


In [None]:
generation_output

## 🔡 Combine Tokens into Words

Subword tokenizers may split a word into multiple pieces. You can decode them individually or as a group to see how they combine into meaningful text.
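
To get a feel for how a word ends up in pieces, here is a toy sketch of greedy longest-match splitting (the WordPiece-style idea). The vocabulary and the helper name `greedy_subword_split` are invented for illustration; real tokenizers such as Phi-3's learn their vocabulary from data and use different algorithms.

```python
def greedy_subword_split(word, vocab):
    """Split `word` into the longest vocabulary pieces, scanning left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is a known piece
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matched: fall back to a single character
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces

toy_vocab = {"Sub", "ject", "token", "ization", "s"}
subject_pieces = greedy_subword_split("Subject", toy_vocab)
plural_pieces = greedy_subword_split("tokenizations", toy_vocab)
print(subject_pieces)  # ['Sub', 'ject']
print(plural_pieces)   # ['token', 'ization', 's']
```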


In [None]:
# TODO: Decode individual tokens to see subword splitting
print(tokenizer.decode([3323]))   # Hint: Try token ID 3323
print(tokenizer.decode([622]))   # Hint: Try token ID 622
print(tokenizer.decode([3323, 622]))  # Hint: Combine tokens
print(tokenizer.decode([29901]))   # Hint: Try token ID 29901

# Expected output pattern:
# Sub
# ject
# Subject
# :

## 🌈 Token Coloring Function

This function uses ANSI escape codes to highlight each token in a different background color. It helps visualize how text is broken down into subword units.


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

colors_list = [
    '102;194;165', '252;141;98', '141;160;203',
    '231;138;195', '166;216;84', '255;217;47'
]

def show_tokens(sentence, tokenizer_name):
   # TODO: Load the tokenizer from the pretrained model
   tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)  # Hint: What variable contains the model name?

   # TODO: Tokenize the sentence to get token IDs
   token_ids = tokenizer(sentence).input_ids  # Hint: What text should we tokenize?

   for idx, t in enumerate(token_ids):
       print(
           f'\033[0;30;48;2;{colors_list[idx % len(colors_list)]}m' +  # black text on a 24-bit RGB background
           tokenizer.decode(t) +  # Hint: What method converts token ID back to text?
           '\033[0m',
           end='' # Hint: What should this end with?
       )
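
To see the escape sequence in isolation, here is a small sketch that colors a hand-written list of pieces without loading any tokenizer. In the sequence, `30` selects black text, `48;2;R;G;B` selects a 24-bit background color, and `\033[0m` resets the styling. The helper `colorize` and the sample pieces are made up for illustration.

```python
RESET = "\033[0m"

def colorize(pieces, palette):
    """Wrap each piece in an ANSI background color, cycling through the palette."""
    out = []
    for i, piece in enumerate(pieces):
        rgb = palette[i % len(palette)]
        out.append(f"\033[0;30;48;2;{rgb}m{piece}{RESET}")
    return "".join(out)

palette = ["102;194;165", "252;141;98"]
colored = colorize(["Sub", "ject", ":"], palette)
print(colored)
```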

## 🧪 Try It on a Complex Sentence

Now test the tokenizer on a sentence with mixed content: capital letters, emojis, symbols, numbers, and spacing.


In [None]:
# TODO: Create a test sentence with mixed content
text = """
Hello WORLD! 😄 test: 123 #  words ن
"""

## 🆚 `bert-base-uncased` Tokenizer

This tokenizer lowercases all input and splits words into WordPiece subwords. Notice how it handles casing and unknown characters.
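
The "uncased" part happens before any splitting: text is lowercased and accents are stripped. A rough sketch of that normalization step using only the standard library; `uncased_normalize` is a hypothetical helper, not the library's own code.

```python
import unicodedata

def uncased_normalize(text):
    """Lowercase the text, then drop combining accent marks."""
    text = text.lower()
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" covers the combining marks left over after decomposition
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

normalized = uncased_normalize("Hello WORLD! Café")
print(normalized)  # hello world! cafe
```

This is why `WORLD` and `world` end up as the same token in the uncased model.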


In [None]:
show_tokens(text, "bert-base-uncased")  # Hint: What text and tokenizer name?

## 🆚 `bert-base-cased` Tokenizer

Unlike the uncased version, this tokenizer preserves capitalization. Compare the tokenization output to see how casing influences token splitting.


In [None]:
show_tokens(text, "bert-base-cased")  # Hint: What text and tokenizer name?

## 🆚 `gpt2` Tokenizer

GPT-2 uses Byte-Pair Encoding (BPE), which often results in different token splits, especially with punctuation, emojis, or spacing.
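
GPT-2's BPE works on UTF-8 bytes rather than characters, so it never needs an "unknown" token, but a single emoji or non-Latin letter can cost several tokens. A quick sketch of how many bytes a few characters from our test sentence occupy:

```python
byte_lengths = {}
for ch in ["A", "ن", "😄"]:
    encoded = ch.encode("utf-8")
    byte_lengths[ch] = len(encoded)
    print(repr(ch), len(encoded), list(encoded))
```

The more bytes a character needs, the more pieces a byte-level tokenizer may have to spend on it.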


In [None]:
show_tokens(text, "gpt2")  # Hint: What text and tokenizer name?

## 🆚 `google/flan-t5-small` Tokenizer

This model uses SentencePiece, which breaks down text in a more language-agnostic way. Observe how it segments common phrases and subwords.


In [None]:
show_tokens(text, "google/flan-t5-small")  # Hint: What text and tokenizer name?

## 🆚 `Xenova/gpt-4` Tokenizer

This Hugging Face-hosted tokenizer mirrors the `cl100k_base` encoding that OpenAI's `tiktoken` library uses for GPT-4. It uses Byte-Pair Encoding and handles punctuation, numbers, and special symbols distinctly.


In [None]:
# The official implementation is `tiktoken`; this is the same tokenizer hosted on the Hugging Face Hub
show_tokens(text, "Xenova/gpt-4")  # Hint: What text and tokenizer name?

## 🆚 `bigcode/starcoder2-15b` Tokenizer

You need access to use the actual model, but the tokenizer is available. It's optimized for code and performs differently on natural language and structured inputs.


In [None]:
# You need to request access before being able to use this tokenizer
show_tokens(text, "bigcode/starcoder2-15b")  # Hint: What text and tokenizer name?

## 🆚 `microsoft/Phi-3-mini-4k-instruct` Tokenizer

Phi-3 Mini reuses the Llama 2 tokenizer, a SentencePiece-based BPE vocabulary of roughly 32,000 tokens. Notice how it splits and groups tokens differently than the BERT and GPT tokenizers above.


In [None]:
show_tokens(text, "microsoft/Phi-3-mini-4k-instruct")  # Hint: What text and tokenizer name?

## 🧠 Load a Model to Extract Embeddings

We can use a pretrained transformer (like DeBERTa) to convert text into token-level embeddings.


In [None]:
from transformers import AutoModel, AutoTokenizer

# TODO: Load a tokenizer for embeddings extraction
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")  # Hint: What model name for DeBERTa-base?

# TODO: Load a language model for embeddings
model = AutoModel.from_pretrained("microsoft/deberta-base")  # Hint: Same model name as tokenizer

# TODO: Tokenize the sentence with proper tensor format
prompt = 'Hello my name is EMAN, What about you?'
tokens = tokenizer(prompt, return_tensors='pt')  # Hint: What text and tensor format?

# TODO: Process the tokens through the model to get embeddings
output = model(**tokens)["last_hidden_state"]  # Hint: What variable contains our tokenized input?

## 🔢 View Embedding Dimensions

Each token is mapped to a high-dimensional vector. Let’s inspect the shape of the output to understand the model’s internal representation.


In [None]:
# TODO: Check the shape of the embedding output
output.shape  # Hint: What variable contains our model output?

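# Expected output format: torch.Size([1, num_tokens, 768])
# (batch size, number of tokens in the prompt including [CLS] and [SEP], hidden size of deberta-base)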

## 🔍 Decode Input Tokens

We can decode the tokens back to their original text to understand which words each embedding vector corresponds to.


In [None]:
# TODO: Decode the input tokens back to text
for token in tokens['input_ids'][0]:  # Hint: What key contains the input IDs?
    print(tokenizer.decode(token.item()))  # Hint: What method decodes tokens?

# Expected output format (one token per line):
# [CLS]
# ...the subword tokens of the prompt...
# [SEP]

## 🧬 View Token-Level Embeddings

Each row in the tensor represents a single token’s embedding — a high-dimensional numerical representation that captures meaning and context.


In [None]:
# TODO: View the token-level embeddings tensor
print(output)  # Hint: What variable contains our model output?

# Each row represents a single token's embedding - a high-dimensional numerical representation

## 🧠 Encode a Sentence into a Vector

We can convert an entire sentence into a fixed-size embedding vector using a pretrained model. This helps machines understand and compare sentences by their meaning.
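
Many sentence-embedding models get from per-token vectors to a single sentence vector by pooling, often a plain average. A minimal sketch with toy numbers; whether a particular model uses mean pooling depends on its configuration, so treat this as an illustration of the idea rather than a description of `all-mpnet-base-v2` internals.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

# Two toy "token embeddings" with three dimensions each
token_vectors = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
sentence_vector = mean_pool(token_vectors)
print(sentence_vector)  # [2.0, 2.0, 1.0]
```

However many tokens go in, the result always has the same length, which is what makes sentences comparable.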


In [None]:
from sentence_transformers import SentenceTransformer

# TODO: Load a sentence transformer model
model = SentenceTransformer('all-mpnet-base-v2')  # Hint: What's the model name for all-mpnet-base-v2?

# TODO: Convert text to sentence embeddings
vector = model.encode("Best movie ever!")  # Hint: What text should we encode, like "Best movie ever!"?

## 📏 Check the Sentence Embedding Size

Let’s inspect the dimensions of the sentence vector. This tells us how many numerical features are used to represent the meaning of the sentence.


In [None]:
# TODO: Check the dimensions of the sentence embedding vector
vector.shape  # Hint: What variable contains our sentence vector?

# Expected output: (768,)

## 🌐 Load Pretrained GloVe Embeddings

We can use GloVe (trained on Wikipedia and Gigaword) to explore classic word embeddings. These models capture word meaning based on co-occurrence patterns.


In [None]:
!pip install gensim

In [None]:
import gensim.downloader as api

# TODO: Download GloVe embeddings (about 66 MB, trained on Wikipedia and Gigaword, vector size: 50)
model = api.load('glove-wiki-gigaword-50')  # Hint: What's the model name for glove-wiki-gigaword-50?

## 🔍 Find Similar Words in Embedding Space

We can now explore semantic similarity in the vector space. `most_similar` ranks words by cosine similarity, so it returns the words whose vectors point in nearly the same direction as `"king"`.
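
The score behind this kind of lookup is cosine similarity: the dot product of two vectors divided by the product of their lengths. Here is a small sketch with three invented 3-dimensional vectors (real GloVe vectors have 50 dimensions here, and these numbers are not taken from the model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy vectors, for illustration only
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

def most_similar(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`."""
    scores = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

ranked = most_similar("king", toy_vectors)
print(ranked)
```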


In [None]:
# TODO: Find words most similar to "king" in embedding space
model.most_similar(['king'], topn=10)  # Hint: What word to search for and how many results?

# Expected output: List of (word, similarity_score) tuples

## 📂 Load and Parse Playlist Data

We begin by downloading a playlist dataset and parsing it into a usable format. Each playlist is represented as a sequence of song IDs.


In [None]:
import pandas as pd
from urllib import request

# TODO: Get the playlist dataset file
data = request.urlopen('https://storage.googleapis.com/maps-premium/dataset/yes_complete/train.txt')

# TODO: Parse the playlist dataset file, skipping metadata lines
lines = data.read().decode('utf-8').split('\n')[2:]  # Hint: What encoding and split character?

# TODO: Remove playlists with only one song
playlists = [s.strip().split() for s in lines if len(s.strip().split()) > 1]  # Hint: Minimum songs per playlist?

# TODO: Load song metadata
songs_file = request.urlopen('https://storage.googleapis.com/maps-premium/dataset/yes_complete/song_hash.txt')
songs_file = songs_file.read().decode('utf-8').split('\n')  # Hint: Encoding and split character?
songs = [s.strip().split('\t') for s in songs_file if s.strip()]  # clean up empty lines

# TODO: Create a DataFrame with song information
songs_df = pd.DataFrame(data=songs, columns=['song_id', 'track_name', 'artist_name'])  # Hint: What data and column names? (each line is: id, title, artist)
songs_df = songs_df.set_index('song_id')  # Hint: What column to use as index?
songs_df

## 🎧 Display Sample Playlists

Let’s preview a couple of playlists to understand the structure. Each ID corresponds to a specific song.


In [None]:
# TODO: Display sample playlists to understand the structure
print('Playlist #1:\n ', playlists[0], '\n')  # Hint: Which playlist list and index?
print('Playlist #2:\n ', playlists[1])        # Hint: Which playlist list and index?


## 🧠 Train a Word2Vec Model on Playlists

We treat playlists like sentences and songs like words. Training Word2Vec on this lets us learn embeddings that capture song co-occurrence patterns.
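
What "co-occurrence" means here is simply that two songs fall within the same sliding window. The sketch below lists the (song, neighbor) pairs that a window produces for one toy playlist. The helper name and the playlist are made up. Note that gensim's `Word2Vec` trains CBOW by default (`sg=0`), which predicts a song from its averaged neighbors rather than from explicit pairs, but both variants draw on the same windows.

```python
def window_pairs(sequence, window=2):
    """List (center, neighbor) pairs for every position in the sequence."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

toy_playlist = ["12", "7", "2172", "64"]
pairs = window_pairs(toy_playlist, window=1)
print(pairs)
```

Songs that keep showing up in each other's windows across many playlists end up with similar vectors.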


In [None]:
from gensim.models import Word2Vec

# TODO: Train our Word2Vec model on playlist data
model = Word2Vec(
   playlists, vector_size=64, window=5, negative=10, min_count=1, workers=4
)

# Hint: What data to train on and what are reasonable parameter values?

## 🔍 Find Similar Songs by ID

Using our trained model, we can now retrieve songs that are most similar to a given song based on their playlist co-occurrence.


In [None]:
song_id = 2172

# TODO: Ask the model for songs similar to song #2172
model.wv.most_similar(positive=[str(song_id)])  # Hint: What song ID should we convert to string?

## 🎵 Look Up the Original Song

Let’s look up the details (title and artist) of the query song to better understand the recommendations.


In [None]:
# TODO: Look up the song details in our songs DataFrame
print(songs_df.iloc[song_id])  # Hint: What song ID should we look up?

## 📊 Define a Function to Print Song Recommendations

This helper function returns the top 3 recommended songs for any song ID, mapping the result back to human-readable titles and artists.


In [None]:
import numpy as np

def print_recommendations(song_id):
   # TODO: Get similar songs from the Word2Vec model
   similar_songs = np.array(
       model.wv.most_similar(positive=[str(song_id)], topn=3)  # Hint: What song ID and how many recommendations?
   )[:,0]

   # TODO: Return the song details from our DataFrame
   return songs_df.iloc[similar_songs]  # Hint: What variable contains the similar song IDs?

# TODO: Extract recommendations for song 2172
print_recommendations(song_id)  # Hint: What song ID should we test?

## 🎧 View Song Recommendations

Use the function to explore which songs are most similar to any track based on the playlist embedding model. Try it with different song IDs to see how recommendations vary across genres.


In [None]:
# TODO: Use the function to explore recommendations for different songs
print_recommendations(1500)  # Hint: Try a different song ID to see how recommendations vary

In [None]:
# 🤖 Load AI Models for Prompt Engineering
# Run this cell ONCE at the beginning - models will stay loaded

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

print("🔧 Loading Language Models...")
print("=" * 50)

# Load Phi-3 model and tokenizer
print("Loading Phi-3 model and tokenizer...")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

print("✅ Models loaded successfully!")
print("🎯 Models are now available for all prompt engineering experiments!")
print("=" * 50)

In [None]:
# 🛠️ Setup Generation Function
# Run this cell after loading the models above

def generate_response(messages, max_tokens=200):
    # Simple prompt formatting
    if isinstance(messages, list) and len(messages) > 0:
        prompt = messages[0]['content']
    else:
        prompt = str(messages)

    # Tokenize input
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate response with minimal settings to avoid cache issues
    with torch.no_grad():
        outputs = model.generate(
            inputs['input_ids'],
            max_new_tokens=max_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=False  # Disable cache to avoid version issues
        )

    # Decode only the new tokens
    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    return response

print("✅ Generation function ready!")
print("🚀 Ready for prompt engineering experiments!")
print("=" * 50)

# Test the generation function
test_messages = [{"role": "user", "content": "Hello! Can you introduce yourself briefly?"}]
test_output = generate_response(test_messages)
print("🧪 Test Output:")
print(test_output)
print("\n" + "="*50)
print("🎯 Now let's explore how different prompts affect AI responses!")

In [None]:
# 🎯 EXPERIMENT 1: Basic vs Advanced Prompting
# Compare how prompt quality affects AI responses

print("🔍 BASIC vs ADVANCED PROMPTING EXPERIMENT")
print("=" * 60)

# Basic prompt (often gives generic responses)
basic_prompt = "Write about artificial intelligence"

print("❌ BASIC PROMPT:")
print(f"Prompt: '{basic_prompt}'")
print("-" * 40)
basic_messages = [{"role": "user", "content": basic_prompt}]
basic_output = generate_response(basic_messages)
print("Result:")
print(basic_output)
print("\n" + "="*60)

# Advanced prompt (specific, structured, with context)
advanced_prompt = """You are an AI researcher writing for a tech magazine. Write a 200-word article about how artificial intelligence is transforming healthcare. Include:
- One specific real-world example
- One challenge that still needs solving
- A prediction for the next 5 years
Use an engaging, accessible tone for general readers."""

print("✅ ADVANCED PROMPT:")
print(f"Prompt: '{advanced_prompt}'")
print("-" * 40)
advanced_messages = [{"role": "user", "content": advanced_prompt}]
advanced_output = generate_response(advanced_messages)
print("Result:")
print(advanced_output)
print("\n" + "="*60)

print("🤔 REFLECTION QUESTIONS:")
print("1. Which response was more useful and specific?")
print("2. What elements made the advanced prompt more effective?")
print("3. How did structure and context change the output quality?")

In [None]:
# 🎯 YOUR TURN: Improve This Code Prompt
# Try to get better, more complete code from the AI

print("🚀 STUDENT EXPERIMENT: Code Generation")
print("=" * 50)

# Weak starting prompt
weak_prompt = "Write a function to sort numbers"

print("😐 STARTING PROMPT (Needs Improvement):")
print(f"'{weak_prompt}'")
weak_messages = [{"role": "user", "content": weak_prompt}]
weak_output = generate_response(weak_messages)
print("Result:")
print(weak_output)
print("\n" + "-"*50)

# YOUR IMPROVED PROMPT - Edit this!
your_improved_prompt = "Write a Python function that sorts a list of floats without using built-in sorting functions such as sorted() or list.sort(). Add clear comments that explain each step."

print("🔧 YOUR IMPROVED PROMPT:")
print(f"'{your_improved_prompt}'")
# Test your improved prompt:
improved_messages = [{"role": "user", "content": your_improved_prompt}]
improved_output = generate_response(improved_messages)
print("Your Result:")
print(improved_output)

print("\n📝 CHALLENGE: Rewrite 'your_improved_prompt' to get the best possible code!")
print("💡 TIPS: Be specific about documentation, error handling, examples, etc.")

In [None]:
# 🎯 YOUR TURN: Creative Writing Challenge
# Transform a boring prompt into something that generates amazing stories

print("✨ STUDENT EXPERIMENT: Creative Writing")
print("=" * 50)

# Generic starting prompt
boring_prompt = "Tell me a story"

print("😴 BORING PROMPT:")
print(f"'{boring_prompt}'")
boring_messages = [{"role": "user", "content": boring_prompt}]
boring_output = generate_response(boring_messages)
print("Result:")
print(boring_output)
print("\n" + "-"*50)

# YOUR CREATIVE PROMPT - Make it amazing!
your_creative_prompt = "Tell me a creative story, in under 100 words and with sophisticated vocabulary, about people acting on good intentions."

print("🎨 YOUR CREATIVE PROMPT:")
print(f"'{your_creative_prompt}'")
# Test your creative prompt:
creative_messages = [{"role": "user", "content": your_creative_prompt}]
creative_output = generate_response(creative_messages)
print("Your Result:")
print(creative_output)

print("\n📝 CHALLENGE: Create a prompt that generates a compelling, specific story!")
print("💡 TIPS: Add constraints, vivid details, specific genres, character traits, etc.")

In [None]:
# 🎯 YOUR TURN: Problem Solving Challenge
# Get structured, actionable solutions instead of vague advice

print("💡 STUDENT EXPERIMENT: Problem Solving")
print("=" * 50)

# Vague starting prompt
vague_prompt = "How can schools reduce student stress?"

print("🤔 VAGUE PROMPT:")
print(f"'{vague_prompt}'")
vague_messages = [{"role": "user", "content": vague_prompt}]
vague_output = generate_response(vague_messages)
print("Result:")
print(vague_output)
print("\n" + "-"*50)

# YOUR STRUCTURED PROMPT - Make it actionable!
your_structured_prompt = """
You're an education consultant advising schools on reducing student stress.
How can schools reduce student stress in terms of study hours, sleep hours, and rest hours?
Answer in about 100 words.
"""

print("📊 YOUR STRUCTURED PROMPT:")
print(f"'{your_structured_prompt}'")
# Test your structured prompt:
structured_messages = [{"role": "user", "content": your_structured_prompt}]
structured_output = generate_response(structured_messages)
print("Your Result:")
print(structured_output)

print("\n📝 CHALLENGE: Get specific, actionable solutions with clear steps!")
print("💡 TIPS: Ask for formats, timelines, metrics, specific constraints, etc.")

In [None]:
# ⏱️ EXPERIMENT: Speed vs Quality Trade-offs
# Time how long different prompts take and compare efficiency

import time

print("⏱️ SPEED & EFFICIENCY EXPERIMENT")
print("=" * 50)

def timed_generation(messages, label, max_tokens=150):
    print(f"\n🔄 Generating: {label}")
    start_time = time.time()

    result = generate_response(messages, max_tokens=max_tokens)

    end_time = time.time()
    duration = end_time - start_time

    print(f"⏱️  Time taken: {duration:.2f} seconds")
    print(f"📝 Word count: ~{len(result.split())} words")
    print(f"⚡ Words per second: {len(result.split())/duration:.1f}")
    print(f"📄 Result:\n{result}")
    print("-" * 40)

    return result, duration

# Test different prompt lengths and complexity
prompts_to_test = [
    ("Short & Simple", "Explain AI in one sentence"),
    ("Medium & Specific", "Explain AI to a 12-year-old using simple examples"),
    ("Long & Detailed", """You are a teacher explaining artificial intelligence to middle school students.
    Create a 100-word explanation that includes:
    - What AI means in simple terms
    - One everyday example they know
    - Why it's useful
    - One limitation or concern
    Use conversational tone and avoid technical jargon."""),
    ("Your Custom Prompt", "ADD YOUR OWN PROMPT HERE TO TEST!")
]

results = {}
for label, prompt in prompts_to_test:
    if prompt != "ADD YOUR OWN PROMPT HERE TO TEST!":  # Skip the unedited placeholder
        messages = [{"role": "user", "content": prompt}]
        result, duration = timed_generation(messages, label)
        results[label] = {"duration": duration, "result": result}

print("\n📊 SPEED COMPARISON SUMMARY:")
print("=" * 50)
for label, data in results.items():
    print(f"{label}: {data['duration']:.2f}s")

print("\n🤔 REFLECTION QUESTIONS:")
print("- Did more detailed prompts take longer?")
print("- Which prompt gave the best quality/speed ratio?")
print("- When might you prefer speed vs. detailed prompts?")

In [None]:
# 🤖 EXPERIMENT: Multiple Models Comparison
# Load different models and compare their responses to the same prompt

print("🤖 MULTIPLE MODELS EXPERIMENT")
print("=" * 50)

# Load a second, smaller model for comparison
print("Loading GPT-2 for comparison...")
from transformers import GPT2LMHeadModel, GPT2Tokenizer

try:
    gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2")
    gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    gpt2_tokenizer.pad_token = gpt2_tokenizer.eos_token

    def generate_gpt2_response(prompt, max_tokens=100):
        inputs = gpt2_tokenizer.encode(prompt, return_tensors="pt")
        with torch.no_grad():
            outputs = gpt2_model.generate(
                inputs,
                max_new_tokens=max_tokens,
                temperature=0.7,
                do_sample=True,
                pad_token_id=gpt2_tokenizer.eos_token_id
            )
        response = gpt2_tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
        return response

    print("✅ GPT-2 loaded successfully!")
    gpt2_available = True

except Exception as e:
    print(f"❌ Could not load GPT-2: {e}")
    print("Continuing with Phi-3 only...")
    gpt2_available = False

print("=" * 50)

In [None]:
# Test prompt for comparison
test_prompt = "Write a creative opening sentence for a mystery novel."

print(f"🎯 TEST PROMPT: '{test_prompt}'")
print("=" * 50)

# Phi-3 Response
print("🔷 PHI-3 RESPONSE:")
phi3_start = time.time()
phi3_messages = [{"role": "user", "content": test_prompt}]
phi3_response = generate_response(phi3_messages, max_tokens=100)
phi3_time = time.time() - phi3_start
print(f"⏱️  Time: {phi3_time:.2f}s")
print(f"📝 Response: {phi3_response}")
print("-" * 40)

# GPT-2 Response (if available)
if gpt2_available:
    print("🔶 GPT-2 RESPONSE:")
    gpt2_start = time.time()
    gpt2_response = generate_gpt2_response(test_prompt, max_tokens=100)
    gpt2_time = time.time() - gpt2_start
    print(f"⏱️  Time: {gpt2_time:.2f}s")
    print(f"📝 Response: {gpt2_response}")
    print("-" * 40)

    print("📊 MODEL COMPARISON:")
    print(f"Phi-3 speed: {phi3_time:.2f}s")
    print(f"GPT-2 speed: {gpt2_time:.2f}s")
    print(f"Speed winner: {'GPT-2' if gpt2_time < phi3_time else 'Phi-3'}")

print("\n🤔 REFLECTION QUESTIONS:")
print("- Which model gave more creative responses?")
print("- Which was faster?")
print("- How did response quality differ?")
print("- Which would you choose for different tasks?")

In [None]:
# 🌡️ EXPERIMENT: Temperature Settings
# See how creativity settings affect output consistency and variety

print("🌡️ TEMPERATURE & CREATIVITY EXPERIMENT")
print("=" * 50)

def generate_with_temperature(messages, temp, max_tokens=80):
    prompt = messages[0]['content']
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            inputs['input_ids'],
            max_new_tokens=max_tokens,
            temperature=temp,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=False
        )

    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    return response

# Test prompt
creativity_prompt = "Come up with a unique name for a coffee shop and explain the concept."
messages = [{"role": "user", "content": creativity_prompt}]

temperatures = [0.1, 0.5, 0.9, 1.2]

print(f"🎯 TEST PROMPT: '{creativity_prompt}'")
print("=" * 50)

for temp in temperatures:
    print(f"🌡️  TEMPERATURE: {temp}")
    print("🔄 Generating 3 responses to show variety...")

    for i in range(3):
        start_time = time.time()
        response = generate_with_temperature(messages, temp)
        duration = time.time() - start_time

        print(f"  Response {i+1} ({duration:.2f}s): {response[:100]}...")

    print("-" * 40)

print("\n🤔 ANALYSIS QUESTIONS:")
print("- Which temperature gave the most creative responses?")
print("- Which was most consistent?")
print("- Which would you use for:")
print("  • Creative writing?")
print("  • Technical documentation?")
print("  • Factual answers?")
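
Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. Low values sharpen the distribution toward the top choice, while high values flatten it. A self-contained sketch with three invented logits, trying the same temperatures as the experiment:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after scaling by 1/temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.0]  # invented scores for three candidate tokens
for temp in (0.1, 0.5, 0.9, 1.2):
    probs = softmax_with_temperature(toy_logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

At low temperature nearly all the probability sits on the first token, which is why those responses look so similar from run to run.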

In [None]:
# 🎓 ADVANCED PROMPT ENGINEERING TECHNIQUES
# Experiment with different prompting strategies

print("🎓 ADVANCED PROMPTING TECHNIQUES")
print("=" * 50)

base_question = "How can a small business increase customer loyalty?"

techniques = {
    "Zero-Shot": base_question,

    "Few-Shot": """Here are examples of business loyalty strategies:
Example 1: Restaurant - Loyalty card with free meal after 10 visits
Example 2: Bookstore - Monthly book club with member discounts
Example 3: Gym - Referral bonus for bringing friends

Now answer: How can a small business increase customer loyalty?""",

    "Chain-of-Thought": """Think step by step about how a small business can increase customer loyalty:

Step 1: First, identify what makes customers loyal
Step 2: Then, consider what small businesses can realistically implement
Step 3: Finally, suggest specific actionable strategies

How can a small business increase customer loyalty?""",

    "Role-Playing": """You are a successful small business consultant with 15 years of experience helping local shops and services grow their customer base. You've seen what works and what doesn't.

A new small business owner asks: How can I increase customer loyalty?""",

    "Structured Output": """Provide strategies for small business customer loyalty in this format:

IMMEDIATE ACTIONS (0-30 days):
- [Strategy 1]: [Expected impact]
- [Strategy 2]: [Expected impact]

MEDIUM TERM (1-6 months):
- [Strategy 1]: [Expected impact]
- [Strategy 2]: [Expected impact]

LONG TERM (6+ months):
- [Strategy 1]: [Expected impact]

How can a small business increase customer loyalty?"""
}

results = {}
for technique, prompt in techniques.items():
    print(f"🔍 TECHNIQUE: {technique}")
    print(f"Prompt: {prompt[:100]}...")

    start_time = time.time()
    messages = [{"role": "user", "content": prompt}]
    response = generate_response(messages, max_tokens=150)
    duration = time.time() - start_time

    print(f"⏱️  Time: {duration:.2f}s")
    print(f"📝 Response: {response}")
    print("-" * 50)

    results[technique] = {"response": response, "time": duration}

print("📊 TECHNIQUE COMPARISON:")
print("=" * 30)
for technique, data in results.items():
    print(f"{technique}: {data['time']:.2f}s")

print("\n🎯 YOUR EXPERIMENT:")
print("Try creating your own advanced prompt using multiple techniques!")
your_advanced_prompt = "Explain for me briefly about LLms and what"

print(f"Your prompt: {your_advanced_prompt}")
# Uncomment to test:
# messages = [{"role": "user", "content": your_advanced_prompt}]
# your_result = generate_response(messages)
# print(f"Your result: {your_result}")
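
Few-shot prompts like the one above are easy to assemble in code, which helps when you want to swap examples in and out. A minimal sketch; `build_few_shot_prompt` is a hypothetical helper, and the layout simply copies the few-shot prompt used in this experiment.

```python
def build_few_shot_prompt(examples, question):
    """Assemble a few-shot prompt from (business, strategy) example pairs."""
    lines = ["Here are examples of business loyalty strategies:"]
    for number, (business, strategy) in enumerate(examples, start=1):
        lines.append(f"Example {number}: {business} - {strategy}")
    lines.append("")  # blank line between the examples and the question
    lines.append(f"Now answer: {question}")
    return "\n".join(lines)

examples = [
    ("Restaurant", "Loyalty card with free meal after 10 visits"),
    ("Gym", "Referral bonus for bringing friends"),
]
few_shot_prompt = build_few_shot_prompt(
    examples, "How can a small business increase customer loyalty?"
)
print(few_shot_prompt)
```

The resulting string can be passed to the model as the `content` of a user message, the same way as the other prompts.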