<a href="https://colab.research.google.com/github/amalinc/Introduction_to_LLMs/blob/main/Amalin_Cohen_Assignment_01_Introduction__to__LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [36]:
# # Replace 'your_key_here' with your actual API key
# env_content = """
# OPENAI_API_KEY=key
# GEMINI_API_KEY=key
# """

# with open(".env", "w") as f:
#     f.write(env_content.strip())

# print(".env file created!")


.env file created!


# Introduction to Large Language Models and Prompt Engineering

**Course:** GenAI development and LLM applications


**Instructors:** Ori Shapira, Yuval Belfer

**Semester:** Summer
    
## Overview

This assignment provides a **hands‑on tour of the GenAI workflow**: from calling different kinds of language models to critically comparing their outputs and evaluating real‑world products built on top of them. You will practice with both *open‑source checkpoints* (via Hugging Face) and *hosted APIs* from OpenAI and Google, and reflect on where each approach shines.

Along the way you will explore how model size, instruction‑tuning, and deployment UX influence quality, cost, and user trust. The notebook culminates in a creative exercise where you invent a GenAI product to fill an unmet need—connecting technical capability to business value.

## Learning Objectives

- **Load and run** open‑source, OpenAI, and Google Gemini models using their latest SDKs/APIs.
- **Compare** small (≤ 3B) and large (≥ 7B) models on simple vs. complex prompts.
- **Assess** output quality across dimensions such as correctness, coherence, and creativity.
- **Evaluate** end‑user GenAI products, identifying UX trade‑offs and guardrails.
- **Design** a new GenAI product concept aligned with a specific business pain‑point.

## Prerequisites
- Basic Python knowledge
- Familiarity with Jupyter notebooks
- Internet connection for API calls

## **IMPORTANT NOTE**

* **DO NOT HESITATE TO USE Gen AI tools such as ChatGPT to HELP you solve this**

* **DO NOT USE Gen AI tools such as ChatGPT to solve this for you**

# 1. Mainstream APIs


## Setup

### Setup Guide: OpenAI & Google Gemini APIs

Follow these steps to get your notebook ready to call the mainstream LLM APIs used in this course.

#### 1. Install Python packages

In [1]:
# Install necessary Python packages
%pip install --upgrade openai google-generativeai python-dotenv




If you prefer installing from a terminal instead of within Colab/Jupyter, the equivalent command is:

```bash
pip install --upgrade openai google-generativeai python-dotenv
```

#### 2. Obtain your API keys

| Provider | Where to generate the key |
|----------|--------------------------|
| **OpenAI** | 1. Sign in at <https://platform.openai.com/>  <br>2. Go to **API Keys → Create new secret key**  <br>3. Copy the key (`sk-…`) |
| **Google Gemini** | 1. Visit <https://aistudio.google.com/app/apikey>  <br>2. Click **Create API key** and follow the prompts  <br>3. Copy the key (`AI-…`) |

#### 3. Store keys as environment variables

In [None]:
# import os

# # Replace the placeholders below with your real keys.
# os.environ['OPENAI_API_KEY'] = "sk-..."
# os.environ['GEMINI_API_KEY'] = "AI-..."

# print("Environment variables set for this notebook session ✅")


Alternatively, you can create a `.env` file in your project directory with the following contents:

```text
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AI-...
```

Then load the variables automatically in Python using **python-dotenv**:

In [2]:
from dotenv import load_dotenv
load_dotenv()  # Loads variables from .env into the current environment


True
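
Before making any API calls, it can help to confirm that both keys actually reached the environment. The following is a minimal sketch rather than part of the assignment: `missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced here, and the check reports only which variable names are unset, never the key values.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names used throughout this notebook
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")

def missing_keys(required: Iterable[str] = REQUIRED_KEYS,
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in *required* that are unset or blank in *env*."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]

missing = missing_keys()
if missing:
    print("Missing keys:", ", ".join(missing))
else:
    print("All required keys are set.")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.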

#### 4. Quick verification

In [3]:
import os
import openai
import google.generativeai as genai
# from google.colab import userdata

openai.api_key = os.getenv("OPENAI_API_KEY")
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))

# openai.api_key = userdata.get('OPENAI_API_KEY')
# genai.configure(api_key= userdata.get('GEMINI_API_KEY') )

# Try listing available OpenAI models (show first 3)
print("OpenAI models:", [m.id for m in openai.models.list().data[:3]])

# List available Gemini models that support generateContent
print("\nAvailable Gemini models that support generateContent:")
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)

OpenAI models: ['gpt-4-0613', 'gpt-4', 'gpt-3.5-turbo']

Available Gemini models that support generateContent:
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-flash-latest
models/gemini-1.5-flash
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-2.5-pro-preview-03-25
models/gemini-2.5-flash-preview-04-17
models/gemini-2.5-flash-preview-05-20
models/gemini-2.5-flash
models/gemini-2.5-flash-preview-04-17-thinking
models/gemini-2.5-flash-lite-preview-06-17
models/gemini-2.5-pro-preview-05-06
models/gemini-2.5-pro-preview-06-05
models/gemini-2.5-pro
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-preview-image-generation
models/gemini-2.0-fl

## Using the OpenAI API

In [9]:
# Exercise: Implement `call_openai`
# Your task is to complete the function below so that it sends a prompt to the OpenAI
# Chat Completion API and returns the assistant's reply as a **string**.
#
# Requirements ✍️
# ------------
# 1. Read your key from the environment variable `OPENAI_API_KEY` (do **not** hard‑code keys).
# 2. Use `client.chat.completions.create` (the `openai>=1.0` client API) with the given *model* (default: `gpt-4o`).
# 3. The function should take a single user *prompt* and return the model's reply **text only**.
# 4. Raise a clear `EnvironmentError` if the key is missing.
# 5. Let any `openai.OpenAIError` exceptions propagate; don't swallow them.
#
# Example
# -------
# >>> call_openai("Explain why the sky is blue.")
# 'Rayleigh scattering causes shorter (blue) wavelengths ...'
#
# This exercise tests your ability to work with third‑party GenAI REST APIs in Python.
#
# 🚨 Delete the `raise NotImplementedError` line once you add your code.

import os
from openai import OpenAI

def call_openai(prompt: str, model: str = "gpt-4o") -> str:
    """Send *prompt* to the OpenAI Chat Completion API and return the reply text."""

    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise EnvironmentError("Missing OPENAI_API_KEY environment variable.")

    client = OpenAI(api_key=api_key)

    # Let OpenAIError propagate naturally if it occurs
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )

    return completion.choices[0].message.content


prompt = "Explain why the sky is blue."
print(call_openai(prompt))



The sky appears pink under certain conditions due to the scattering of sunlight in the atmosphere. This phenomenon is largely influenced by the scattering process known as Rayleigh scattering. Here's a simplified explanation:

1. **Sunlight Composition**: Sunlight is composed of a spectrum of colors, each with different wavelengths. Blue light has shorter wavelengths, while red light has longer wavelengths.

2. **Rayleigh Scattering**: During the day, shorter wavelengths (blue and violet light) are scattered in all directions by the gases and particles in the Earth's atmosphere, which is why the sky appears blue.

3. **Sunrise and Sunset**: During sunrise and sunset, the sun is positioned lower on the horizon, meaning its light has to pass through a thicker part of the atmosphere. This extended path causes more scattering of the shorter wavelengths (like blue and green), leaving the longer wavelengths (reds, oranges, and pinks) to dominate the sky's appearance.

4. **Particulate Matter
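
The `messages` argument passed to the Chat Completions API is a list of role/content dictionaries; `call_openai` sends a single `user` message. The sketch below shows how that payload could be built with an optional `system` instruction. `build_messages` is a hypothetical helper introduced for illustration and is not required by the exercise.

```python
from typing import Dict, List, Optional

def build_messages(prompt: str, system: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a chat `messages` list: optional system instruction, then the user prompt."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    messages: List[Dict[str, str]] = []
    if system:
        # System messages steer tone and format; they come before the user turn
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages

print(build_messages("Explain why the sky is blue.", system="Answer in one sentence."))
```

The returned list can be passed as `messages=` in place of the inline list used in `call_openai`.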

## Using Google Gemini API

In [None]:
# Exercise: Implement `call_gemini`
# Your task is to complete the function below so that it sends a prompt to Google’s **Gemini** model
# (via the `google-generativeai` Python SDK) and returns the model's reply as a **string**.
#
# Requirements ✍️
# ------------
# 1. Read your key from the environment variable `GEMINI_API_KEY`.
# 2. Use `google.generativeai.GenerativeModel(model).generate_content` to get a response.
# 3. Accept the same arguments as `call_openai` (prompt, model) and return *text only*.
# 4. Raise `EnvironmentError` if the key is missing.
#
# Example
# -------
# >>> call_gemini("Suggest three sci‑fi book titles set on Mars.")
# '1. Red Horizon ...'
#
# 🚨 Delete the `raise NotImplementedError` once you add your code.

import os
import google.generativeai as genai

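def call_gemini(prompt: str, model: str = "gemini-1.5-pro") -> str:
    """Send *prompt* to the Google Gemini API and return the reply text."""

    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        raise EnvironmentError("Missing GEMINI_API_KEY environment variable.")

    genai.configure(api_key=api_key)

    # Return only the text of the reply
    response = genai.GenerativeModel(model).generate_content(prompt)
    return response.text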


# Open source (Hugging Face)



1. Use the **Gemma 3 4B** model.
2. Ensure your Hugging Face access token (`HF_TOKEN`) is set correctly in your environment (e.g., `export HF_TOKEN=YOUR_TOKEN`), or use `huggingface-cli login`.
3. Confirm that you have **accepted the model terms** for Gemma on Hugging Face before attempting to download or load the model.


## Setup

In this subsection we'll install and configure the dependencies needed to load an **open‑source large language model (LLM)** from **Hugging Face**.

1. We install the latest versions of the `transformers`, `accelerate`, and `bitsandbytes` libraries.
2. If the model you want to use is behind a gated license (e.g. *Llama 2*), create a free Hugging Face account, accept the model license, and export your **HF token** as an environment variable so that `transformers` can download it.


In [None]:
# Install the required libraries (run once)
!pip -q install --upgrade "transformers[torch]" accelerate bitsandbytes --progress-bar off

# Set your Hugging Face access token
import os
os.environ["HF_TOKEN"] = "hf_YOUR_TOKEN"

## Running Gemma‑3‑4B via `transformers` pipeline

Below we load the **Gemma‑3‑4B** instruction‑tuned checkpoint (`google/gemma-3-4b-it`) and generate a short response.  
Make sure you have accepted the model license on Hugging Face **and** that your `HF_TOKEN` is set (or run `huggingface-cli login`) before running the cell.


**IMPORTANT:** If you are using Google Colab, change your notebook's runtime type to include a GPU: go to "Runtime" -> "Change runtime type" and select "T4 GPU" under "Hardware accelerator".

**NOTE:** You can use ***FlashAttention2*** for faster inference. To do so, run `!pip install flash_attn --no-build-isolation`, set `attn_implementation` to `"flash_attention_2"`, and hope that's it :)
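
On the GPU requirement: a rough way to see why a 16 GB T4 is sufficient is to estimate the memory taken by the weights alone. The sketch below assumes a nominal 4 billion parameters (the exact count for the checkpoint may differ) and ignores activations and the KV cache, so treat the numbers as a lower bound. `weight_memory_gb` and `BYTES_PER_PARAM` are hypothetical names used only for this illustration.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(n_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory for model weights only, in gigabytes (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

# Assumption: nominal 4e9 parameters for a "4B" model
for dtype in ("float32", "bfloat16", "int4"):
    print(f"{dtype:>8}: {weight_memory_gb(4e9, dtype):.1f} GB")
```

Under these assumptions, half precision fits on a T4 with room left for generation, while full `float32` weights would already fill the card.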

In [None]:
from transformers import (
    AutoTokenizer,
    Gemma3ForConditionalGeneration,
    pipeline,
)
import torch

# Instruction‑tuned Gemma 3 checkpoint
model_id = "google/gemma-3-4b-it"

# Check for CUDA availability and set device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Determine torch_dtype based on device and availability
if torch.cuda.is_available():
    # Prefer bf16 where the GPU supports it, otherwise fall back to fp16
    torch_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    torch_dtype = "auto"  # on CPU, use the dtype stored in the checkpoint
print(f"Using torch_dtype: {torch_dtype}")


# Load tokenizer & model (first run ~8‑10 GB download)
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation="eager",             # Fallback to eager attention / You can try "flash_attention_2" if you have a GPU with flash attention support
    device_map=device,                       # Explicitly set device
    torch_dtype=torch_dtype,                 # use bf16/fp16 if available
    token=True                               # pick up HF_TOKEN env var
)

# Build a text‑generation pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)

prompt = "Explain the difference between supervised and unsupervised learning in two sentences."
response = generator(prompt)[0]["generated_text"]
print("\n--- Generated Response ---\n")
print(response)
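
By default, a `text-generation` pipeline returns the prompt followed by the continuation in `generated_text`, so the printed response begins with the prompt itself. The following is a minimal sketch for separating the two, assuming a plain-string prompt. `strip_prompt` is a hypothetical helper; passing `return_full_text=False` when calling the pipeline is the built-in alternative.

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    """Drop a leading echo of *prompt* from *generated_text*, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # No echo found: return the text unchanged apart from surrounding whitespace
    return generated_text.strip()

# Stand-in strings so the sketch runs without loading a model
example_prompt = "Name a primary color."
example_output = "Name a primary color.\n\nRed is a primary color."
print(strip_prompt(example_prompt, example_output))
```

This keeps the comparison with the hosted APIs fair, since those return only the reply.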

## Comparing Output Quality: Gemma 3 4B vs. Gemini / GPT‑4o

In this section you will **run the *same prompt*** through three different models and compare their outputs:

1. **Gemma 3 4B** – use the `generator` pipeline you built earlier (`google/gemma-3-4b-it` checkpoint).  
2. **Gemini 1.5 Pro (call_gemini)** – use your `call_gemini(prompt)` helper.  
3. **GPT‑4o (call_openai)** – use your `call_openai(prompt)` helper.

### What to do
* Pick a single prompt of your choice (keep it identical for all three calls).  
* Capture each model’s response.  
* In a short paragraph, compare the outputs along these axes: **factual accuracy / hallucinations, completeness, style / tone, and overall usefulness**.  
* Optionally repeat with a *second* prompt that stresses reasoning or creativity to see whether conclusions change.

**Fill in the code cell below and then write your reflection in the provided markdown cell.**


In [None]:
# ▶️ Compare outputs from Gemma 3‑4B, Gemini, and GPT‑4o
prompt = "Explain the most important differences between classical mechanics and quantum mechanics in two paragraphs."

# Gemma (pipeline defined earlier)
gemma_out = generator(prompt)[0]["generated_text"]

# Gemini (helper function you implemented)
gemini_out = call_gemini(prompt)

# GPT‑4o (helper function you implemented)
gpt4o_out = call_openai(prompt)

print("\n--- Gemma 3‑4B ---\n", gemma_out)
print("\n--- Gemini ---\n", gemini_out)
print("\n--- GPT‑4o ---\n", gpt4o_out)
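
When capturing responses from several models, it can be convenient to run them through one loop that also records latency and keeps going if one call fails. The sketch below is one possible harness, not the required solution: `compare_models` is a hypothetical helper, and the demo uses stand-in callables so it runs offline.

```python
import time
from typing import Callable, Dict

def compare_models(prompt: str, callers: Dict[str, Callable[[str], str]]) -> Dict[str, dict]:
    """Run the same *prompt* through each callable; collect output, error, and seconds."""
    results: Dict[str, dict] = {}
    for name, call in callers.items():
        start = time.perf_counter()
        try:
            output, error = call(prompt), None
        except Exception as exc:  # record the failure and continue with the other models
            output, error = None, f"{type(exc).__name__}: {exc}"
        results[name] = {
            "output": output,
            "error": error,
            "seconds": round(time.perf_counter() - start, 2),
        }
    return results

# Stand-in callables; in the notebook these could be, for example,
# {"Gemma": lambda p: generator(p)[0]["generated_text"],
#  "Gemini": call_gemini, "GPT-4o": call_openai}
demo = compare_models("Say hi.", {"echo": lambda p: p.upper(), "broken": lambda p: 1 / 0})
for name, r in demo.items():
    print(f"--- {name} ({r['seconds']}s) ---")
    print(r["output"] if r["error"] is None else f"[failed] {r['error']}")
```

Latency here is wall-clock time for a single call, so it is only a rough signal; local generation and network round trips are not directly comparable.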


# Try out a GenAI‑Based Product



Choose **one** generative‑AI product, for example:
* **Cursor** – an AI‑powered code editor,  
* **GitHub Copilot** – an AI pair‑programmer,  
* **Perplexity AI** – an answer‑engine / research assistant,  
* **Claude**, **Notion AI**, **Llama Index Lobe**, or any other tool that interests you.  

Spend **10–15 minutes** exploring its features. Then:

1. Briefly describe the *task or workflow* you tried.  
2. List **two things you liked** and **two things you wish were better**.  
3. Reflect on how underlying model capabilities (e.g., context window, fine‑tuning, retrieval, chain‑of‑thought) manifest in the product experience.  
4. (Optional) Include a screenshot or snippet illustrating something interesting you observed.

Use the markdown cell below to record your reflections.


# What GenAI products do not exist yet

## Thinking Exercise – Design a New GenAI Product

Invent **one GenAI‑powered product that does not yet exist** but could solve a real problem. Write **100‑150 words** addressing the prompts below.

**Prompts**
1. **Product name & one‑line pitch**  
2. **Target users / market segment**  
3. **Pain point / business need** – why current solutions are insufficient  
4. **User flow & where GenAI fits** – describe the key interactions  
5. **Technical or ethical challenges** – data, safety, IP, bias, etc.  
6. **MVP success metric** – what would you measure to confirm traction  


### ✍️ Your concept

**1. Product name & pitch**:  

**2. Target users**:  

**3. Pain point**:  

**4. User flow**:  

**5. Challenges**:  

**6. Success metric**:  
