
In [None]:
# # Replace 'your_key_here' with your actual API key
# env_content = """
# OPENAI_API_KEY=key
# GEMINI_API_KEY=key
# """

# with open(".env", "w") as f:
#     f.write(env_content.strip())

# print(".env file created!")


In [None]:
# new_key = ""
# g_new_key = ""

# with open(".env", "w") as f:
#     f.write(f"OPENAI_API_KEY={new_key}\n")
#     f.write(f"GEMINI_API_KEY={g_new_key}\n")

# print(".env file overwritten with new key.")


In [None]:
!cat .env

cat: .env: No such file or directory


# Introduction to Large Language Models and Prompt Engineering

**Course:** GenAI development and LLM applications


**Instructors:** Ori Shapira, Yuval Belfer

**Semester:** Summer
    
## Overview

This assignment provides a **hands‑on tour of the GenAI workflow**: from calling different kinds of language models to critically comparing their outputs and evaluating real‑world products built on top of them. You will practice with both *open‑source checkpoints* (via Hugging Face) and *hosted APIs* from OpenAI and Google, and reflect on where each approach shines.

Along the way you will explore how model size, instruction‑tuning, and deployment UX influence quality, cost, and user trust. The notebook culminates in a creative exercise where you invent a GenAI product to fill an unmet need—connecting technical capability to business value.

## Learning Objectives

- **Load and run** open‑source, OpenAI, and Google Gemini models using their latest SDKs/APIs.
- **Compare** small (≤ 3B) and large (≥ 7B) models on simple vs. complex prompts.
- **Assess** output quality across dimensions such as correctness, coherence, and creativity.
- **Evaluate** end‑user GenAI products, identifying UX trade‑offs and guardrails.
- **Design** a new GenAI product concept aligned with a specific business pain‑point.

## Prerequisites
- Basic Python knowledge
- Familiarity with Jupyter notebooks
- Internet connection for API calls

## **IMPORTANT NOTE**

* **DO NOT HESITATE TO USE Gen AI tools such as ChatGPT to HELP you solve this**

* **DO NOT USE Gen AI tools such as ChatGPT to solve this for you**

# 1. Mainstream APIs


## Setup

### Setup Guide: OpenAI & Google Gemini APIs

Follow these steps to get your notebook ready to call the mainstream LLM APIs used in this course.

#### 1. Install Python packages

In [None]:
# Install necessary Python packages
%pip install --upgrade openai google-generativeai python-dotenv


Collecting openai
  Downloading openai-1.96.1-py3-none-any.whl.metadata (29 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Downloading openai-1.96.1-py3-none-any.whl (757 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 757.5/757.5 kB 15.4 MB/s eta 0:00:00
Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv, openai
  Attempting uninstall: openai
    Found existing installation: openai 1.94.0
    Uninstalling openai-1.94.0:
      Successfully uninstalled openai-1.94.0
Successfully installed openai-1.96.1 python-dotenv-1.1.1


If you prefer installing from a terminal instead of within Colab/Jupyter, the equivalent command is:

```bash
pip install --upgrade openai google-generativeai python-dotenv
```

#### 2. Obtain your API keys

| Provider | Where to generate the key |
|----------|--------------------------|
| **OpenAI** | 1. Sign in at <https://platform.openai.com/>  <br>2. Go to **API Keys → Create new secret key**  <br>3. Copy the key (`sk-…`) |
| **Google Gemini** | 1. Visit <https://aistudio.google.com/app/apikey>  <br>2. Click **Create API key** and follow the prompts  <br>3. Copy the key (`AIza…`) |

#### 3. Store keys as environment variables

In [None]:
# import os

# # Replace the placeholders below with your real keys.
# os.environ['OPENAI_API_KEY'] = "sk-..."
# os.environ['GEMINI_API_KEY'] = "AIza..."

# print("Environment variables set for this notebook session ✅")


Alternatively, you can create a `.env` file in your project directory with the following contents:

```text
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIza...
```

Then load the variables automatically in Python using **python-dotenv**:

In [None]:
# from dotenv import load_dotenv
# load_dotenv()  # Loads variables from .env into the current environment


In [None]:
from google.colab import userdata
import os

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
os.environ['GEMINI_API_KEY'] = userdata.get('GEMINI_API_KEY')
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')

print("API keys loaded from Secrets Manager.")

API keys loaded from Secrets Manager.
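
The cell above only works inside Colab. As a minimal sketch, assuming you also want the notebook to run locally, the helper below tries Colab Secrets first and falls back to ordinary environment variables (for example, those loaded from `.env`). The name `get_secret` is hypothetical and not part of any SDK.

```python
import os


def get_secret(name: str) -> str:
    """Return the secret *name*, preferring Colab Secrets and falling back to os.environ."""
    value = None
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        value = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is not defined/accessible there.
        value = None
    value = value or os.getenv(name)
    if not value:
        raise EnvironmentError(f"Missing secret or environment variable: {name}")
    return value


# Export each key only if it can be found; report the ones that are missing.
for key in ("OPENAI_API_KEY", "GEMINI_API_KEY", "HF_TOKEN"):
    try:
        os.environ[key] = get_secret(key)
    except EnvironmentError as err:
        print(f"Skipping {key}: {err}")
```

Failing early with a named key is usually easier to debug than a later authentication error from one of the SDKs.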


#### 4. Quick verification

In [None]:
import os
import openai
import google.generativeai as genai

# Configure both SDKs from the environment variables set above.
# Avoid printing the keys themselves: notebook outputs are saved with the file.
openai.api_key = os.getenv("OPENAI_API_KEY")
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))

# Try listing available OpenAI models (show first 3)
print("OpenAI models:", [m.id for m in openai.models.list().data[:3]])

# List available Gemini models that support generateContent
print("\nAvailable Gemini models that support generateContent:")
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)

OpenAI models: ['gpt-4-0613', 'gpt-4', 'gpt-3.5-turbo']

Available Gemini models that support generateContent:
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-flash-latest
models/gemini-1.5-flash
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-2.5-pro-preview-03-25
models/gemini-2.5-flash-preview-05-20
models/gemini-2.5-flash
models/gemini-2.5-flash-lite-preview-06-17
models/gemini-2.5-pro-preview-05-06
models/gemini-2.5-pro-preview-06-05
models/gemini-2.5-pro
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-preview-image-generation
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-flash-lite-preview
models/gemini-2.0-pro-exp

## Using the OpenAI API

In [None]:
# Exercise: Implement `call_openai`
# Your task is to complete the function below so that it sends a prompt to the OpenAI
# Chat Completion API and returns the assistant's reply as a **string**.
#
# Requirements ✍️
# ------------
# 1. Read your key from the environment variable `OPENAI_API_KEY` (do **not** hard‑code keys).
# 2. Use `client.chat.completions.create` (openai>=1.0; the legacy `openai.ChatCompletion.create`
#    was removed) with the given *model* (default in this solution: `gpt-4o`).
# 3. The function should take a single user *prompt* and return the model's reply **text only**.
# 4. Raise a clear `EnvironmentError` if the key is missing.
# 5. Let any `openai.OpenAIError` exceptions propagate; don't swallow them.
#
# Example
# -------
# >>> call_openai("Explain why the sky is blue.")
# 'Rayleigh scattering causes shorter (blue) wavelengths ...'
#
# This exercise tests your ability to work with third‑party GenAI REST APIs in Python.
#
# 🚨 Delete the `raise NotImplementedError` line once you add your code.

import os
from openai import OpenAI

def call_openai(prompt: str, model: str = "gpt-4o") -> str:
    """Send *prompt* to the OpenAI Chat Completion API and return the reply text."""

    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise EnvironmentError("Missing OPENAI_API_KEY environment variable.")

    client = OpenAI(api_key=api_key)

    # Let OpenAIError propagate naturally if it occurs
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )

    return completion.choices[0].message.content


prompt = "Explain why the sky is blue."
print(call_openai(prompt))



The sky appears blue primarily due to a phenomenon called Rayleigh scattering. This involves the scattering of sunlight by the gases and small particles in the Earth's atmosphere. Here's how it works:

1. **Sunlight Composition**: Sunlight, or white light, is made up of a spectrum of colors, each with different wavelengths. These colors range from violet and blue, which have shorter wavelengths, to red and orange, which have longer wavelengths.

2. **Atmosphere Interaction**: As sunlight enters the Earth's atmosphere, it collides with gas molecules and small particles. Shorter wavelengths of light (blue and violet) are scattered in all directions by these particles much more than the longer wavelengths (red and orange).

3. **Human Perception**: Although violet light is scattered even more than blue light, our eyes are more sensitive to blue light and much less sensitive to violet light. Additionally, some of the violet light is absorbed by the upper atmosphere, which contains ozone, f
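
`call_openai` sends a single user message. If you want to steer length or tone, the Chat Completions message list can also carry a `system` message ahead of the user message. The sketch below only builds that list, so it runs without an API key; `build_messages` is a hypothetical helper, not part of the `openai` package.

```python
from typing import Dict, List, Optional


def build_messages(prompt: str, system: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a Chat Completions message list with an optional system instruction."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    messages: List[Dict[str, str]] = []
    if system:
        # The system message goes first so it frames the user request.
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


print(build_messages("Explain why the sky is blue.", system="Answer in two sentences."))
```

The result could be passed as the `messages=` argument in `call_openai` in place of the hard-coded single-item list.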

## Using Google Gemini API

In [None]:
# Exercise: Implement `call_gemini`
# Your task is to complete the function below so that it sends a prompt to Google’s **Gemini** model
# (via the `google-generativeai` Python SDK) and returns the model's reply as a **string**.
#
# Requirements ✍️
# ------------
# 1. Read your key from the environment variable `GEMINI_API_KEY`.
# 2. Use `google.generativeai.GenerativeModel(model).generate_content` to get a response.
# 3. Accept the same arguments as `call_openai` (prompt, model) and return *text only*.
# 4. Raise `EnvironmentError` if the key is missing.
#
# Example
# -------
# >>> call_gemini("Suggest three sci‑fi book titles set on Mars.")
# '1. Red Horizon ...'
#
# 🚨 Delete the `raise NotImplementedError` once you add your code.

import os
import google.generativeai as genai

def call_gemini(prompt: str, model_name: str = "gemini-2.0-flash-lite-preview") -> str:
    """Send *prompt* to the Google Gemini API and return the reply text."""

    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        raise EnvironmentError("Missing GEMINI_API_KEY environment variable.")

    # Reuse the key validated above instead of reading the environment a second time
    genai.configure(api_key=api_key)

    model = genai.GenerativeModel(model_name)
    response = model.generate_content(prompt)

    # Return text only; accessing `.text` can raise if the response contains no text part
    return response.text


prompt = "Suggest three sci‑fi book titles set on Mars."
print(call_gemini(prompt))

1.  **Red Dust Requiem:** This title evokes a sense of loss and mystery, hinting at a potential tragedy or the lingering echoes of a forgotten civilization on Mars.

2.  **The Martian Echo:** This title suggests themes of communication, secrets, and perhaps the discovery of past events or signals left behind by a previous Martian inhabitant.

3.  **Beneath the Crimson Veil:** This title has a more romantic and evocative tone, alluding to hidden wonders and the secrets that lie hidden beneath the Martian surface.



# Open source (Hugging Face)



1. Use the **Gemma 3 4B** model.
2. Ensure your Hugging Face access token (`HF_TOKEN`) is set correctly in your environment (e.g., `export HF_TOKEN=YOUR_TOKEN`), or use `huggingface-cli login`.
3. Confirm that you have **accepted the model terms** for Gemma on Hugging Face before attempting to download or load the model.


## Setup

In this subsection we'll install and configure the dependencies needed to load an **open‑source large language model (LLM)** from **Hugging Face**.

1. We install the latest versions of the `transformers`, `accelerate`, and `bitsandbytes` libraries.
2. If the model you want to use is behind a gated license (e.g. *Llama 2*), create a free Hugging Face account, accept the model license, and export your **HF token** as an environment variable so that `transformers` can download it.


In [None]:
# Install the required libraries (run once)
!pip -q install --upgrade "transformers[torch]" accelerate bitsandbytes --progress-bar off

# Set your Hugging Face access token
import os
# os.environ["HF_TOKEN"] = "hf_YOUR_TOKEN"

## Running Gemma‑3‑4B via `transformers` pipeline

Below we load the **Gemma‑3‑4B** instruction‑tuned checkpoint (`google/gemma-3-4b-it`) and generate a short response.  
Make sure you have accepted the model license on Hugging Face **and** that your `HF_TOKEN` is set (or run `huggingface-cli login`) before running the cell.


**IMPORTANT:** If you are using Google Colab, change your notebook's runtime type to include a GPU. Go to "Runtime" -> "Change runtime type" and select "T4 GPU" under "Hardware accelerator".
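
A rough back-of-the-envelope estimate shows why a GPU runtime matters here. The sketch below multiplies a parameter count by the bytes each parameter occupies in a given precision. It assumes a round figure of 4 billion parameters and counts weights only (no activations or KV cache); `weight_memory_gb` and `BYTES_PER_PARAM` are illustrative names.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory for the model weights alone, in gigabytes (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumed parameter count: 4e9 (a round number for a "4B" model).
for dtype in ("float32", "bfloat16", "int4"):
    print(f"4B params in {dtype}: ~{weight_memory_gb(4e9, dtype):.1f} GB")
```

At 16-bit precision the estimate lands near 8 GB, which leaves some headroom on a 16 GB T4, while 32-bit weights alone would roughly fill it.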

In [None]:
# # Install and test access to unsloth/gemma-3-4b-it
# !pip install -q huggingface_hub

# from huggingface_hub import login, hf_hub_download

# # Login (you'll be prompted for your token)
# login()

# # Try downloading a file from the model repo
# try:
#     hf_hub_download(repo_id="unsloth/gemma-3-4b-it", filename="README.md")
#     print("✅ Access confirmed: You have accepted the terms.")
# except Exception as e:
#     print("❌ Access denied: You may not have accepted the terms yet.")
#     print(e)


**NOTE:** You can use ***FlashAttention2*** for faster inference. Install it with `!pip install flash_attn --no-build-isolation`, set `attn_implementation="flash_attention_2"`, and hope that's it :)

In [None]:
# from transformers import (
#     AutoTokenizer,
#     Gemma3ForConditionalGeneration,
#     pipeline,
# )
# import torch

# # Instruction‑tuned Gemma 3 checkpoint
# model_id = "google/gemma-3-4b-it"

# # Check for CUDA availability and set device
# device = "cuda" if torch.cuda.is_available() else "cpu"
# print(f"Using device: {device}")

# # Determine torch_dtype based on device and availability
# torch_dtype = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16 if torch.cuda.is_available() else "auto"
# print(f"Using torch_dtype: {torch_dtype}")


# # Load tokenizer & model (first run ~8‑10 GB download)
# # Make sure you have accepted the model license on Hugging Face
# # and your HF_TOKEN has permissions to access public gated repositories.
# tokenizer = AutoTokenizer.from_pretrained(model_id, token=True)
# model = Gemma3ForConditionalGeneration.from_pretrained(
#     model_id,
#     attn_implementation="eager",             # Fallback to eager attention / You can try "flash_attention_2" if you have a GPU with flash attention support
#     device_map=device,                       # Explicitly set device
#     torch_dtype=torch_dtype,                 # use bf16/fp16 if available
#     token=True                               # pick up HF_TOKEN env var
# )

# # Build a text‑generation pipeline
# generator = pipeline(
#     "text-generation",
#     model=model,
#     tokenizer=tokenizer,
#     max_new_tokens=128,
#     do_sample=True,
#     temperature=0.7,
# )

# prompt = "Explain the difference between supervised and unsupervised learning in two sentences."
# response = generator(prompt)[0]["generated_text"]
# print("\n--- Generated Response ---\n")
# print(response)

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "unsloth/gemma-3-4b-it"

device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = (
    torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    else torch.float16 if torch.cuda.is_available()
    else "auto"
)

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="eager",
    device_map=device,
    torch_dtype=torch_dtype,
    token=True
)

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)

prompt = "Explain the difference between supervised and unsupervised learning in two sentences."
response = generator(prompt)[0]["generated_text"]
print("\n--- Generated Response ---\n")
print(response)


Device set to use cuda



--- Generated Response ---

Explain the difference between supervised and unsupervised learning in two sentences.
**Supervised learning** uses labeled data to train a model to predict outcomes, while **unsupervised learning** uses unlabeled data to discover hidden patterns and structures within the data.  Essentially, in supervised learning you're teaching the model with answers, whereas in unsupervised learning you're letting the model explore on its own.
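
Note that the generated text starts with the prompt itself, because the pipeline returns the prompt plus its continuation. A minimal sketch for keeping only the continuation is shown below; `strip_prompt` is a hypothetical helper, and the demo strings are shortened stand-ins for the real output.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the model's continuation when the output begins with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # Output did not echo the prompt; just trim surrounding whitespace.
    return generated_text.strip()


demo_prompt = "Explain the difference between supervised and unsupervised learning in two sentences."
demo_raw = demo_prompt + "\n**Supervised learning** uses labeled data."
print(strip_prompt(demo_raw, demo_prompt))
```

The `text-generation` pipeline should also accept `return_full_text=False`, which has the same effect without post-processing; check it against your installed `transformers` version.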


## Comparing Output Quality: Gemma 3 4B vs. Gemini / GPT‑4o

In this section you will **run the *same prompt*** through three different models and compare their outputs:

1. **Gemma 3 4B** – use the `generator` pipeline you built earlier (this notebook loads the `unsloth/gemma-3-4b-it` copy of the instruction-tuned checkpoint).  
2. **Gemini (call_gemini)** – use your `call_gemini(prompt)` helper; it runs the helper's default model unless you pass `model_name`.  
3. **GPT‑4o (call_openai)** – use your `call_openai(prompt)` helper.

### What to do
* Pick a single prompt of your choice (keep it identical for all three calls).  
* Capture each model’s response.  
* In a short paragraph, compare the outputs along these axes: **factual accuracy / hallucinations, completeness, style / tone, and overall usefulness**.  
* Optionally repeat with a *second* prompt that stresses reasoning or creativity to see whether conclusions change.

**Fill in the code cell below and then write your reflection in the provided markdown cell.**


In [None]:
# ▶️ Compare outputs from Gemma 3‑4B, Gemini, and GPT‑4o
prompt = "Explain the most important differences between classical mechanics and quantum mechanics in two paragraphs."

# Gemma (pipeline defined earlier)
gemma_out = generator(prompt)[0]["generated_text"]

# Gemini (helper function you implemented)
gemini_out = call_gemini(prompt)

# GPT‑4o (helper function you implemented)
gpt4o_out = call_openai(prompt)

print("\n--- Gemma 3‑4B ---\n", gemma_out)
print("\n--- Gemini ---\n", gemini_out)
print("\n--- GPT‑4o ---\n", gpt4o_out)



--- Gemma 3‑4B ---
 Explain the most important differences between classical mechanics and quantum mechanics in two paragraphs.

Classical mechanics, which governs the motion of everyday objects, assumes that physical quantities like position and momentum are continuous and precisely measurable. It's based on deterministic laws, meaning that if you know the initial conditions of a system, you can predict its future state with certainty. Furthermore, objects have definite properties at all times, regardless of whether they are being observed. In contrast, quantum mechanics deals with the behavior of matter and energy at the atomic and subatomic levels. It reveals that physical quantities are often quantized, meaning they can only take on discrete values, and that the act of observation fundamentally affects the system being observed – a concept known

--- Gemini ---
 Classical mechanics describes the motion of macroscopic objects like planets and baseballs. It assumes that objects have
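
The comparison cell calls the three models one after another, so a failure in one call loses the rest of the run. As a sketch under that assumption, the hypothetical helper `compare_models` below runs each callable separately and records either its text or its error. The stand-in lambdas let it run without API keys; in the notebook you would pass `call_gemini`, `call_openai`, and a small wrapper around `generator` instead.

```python
from typing import Callable, Dict


def compare_models(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, dict]:
    """Run *prompt* through every callable and collect text, word count, and any error."""
    results: Dict[str, dict] = {}
    for name, fn in models.items():
        try:
            text = fn(prompt)
            results[name] = {"text": text, "words": len(text.split()), "error": None}
        except Exception as err:
            # Record the failure explicitly so the other models still get compared.
            results[name] = {"text": "", "words": 0, "error": repr(err)}
    return results


def broken_model(prompt: str) -> str:
    """Stand-in for a call that fails, e.g. because of a missing key."""
    raise RuntimeError("simulated API failure")


demo = compare_models(
    "hello",
    {"echo": lambda p: p + " world", "upper": lambda p: p.upper(), "broken": broken_model},
)
for name, row in demo.items():
    status = row["error"] or f"{row['words']} words"
    print(f"{name}: {status}")
```

Word count is only a crude proxy for completeness; accuracy, tone, and usefulness still need to be judged by reading the outputs.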

# Try out a GenAI‑Based Product



Choose **one** generative‑AI product (for example:  
* **Cursor** – an AI‑powered code editor,  
* **GitHub Copilot** – an AI pair‑programmer,  
* **Perplexity AI** – an answer‑engine / research assistant,  
* **Claude**, **Notion AI**, **Llama Index Lobe**,
or any other tool that interests you).  

Spend **10–15 minutes** exploring its features. Then:

1. Briefly describe the *task or workflow* you tried.  
2. List **two things you liked** and **two things you wish were better**.  
3. Reflect on how underlying model capabilities (e.g., context window, fine‑tuning, retrieval, chain‑of‑thought) manifest in the product experience.  
4. (Optional) Include a screenshot or snippet illustrating something interesting you observed.

Use the markdown cell below to record your reflections.


I used Cursor and asked the agent to generate a simple trivia task. It explained what I needed to do and created the code. The deployment process was not straightforward and took a long time; in the end I managed to run it on the web but not on mobile (I got tired).

I liked: 1. The conversation felt smooth, and the instructions were clear and easy to follow. 2. The agent made some code changes automatically.

I did not like: 1. The chat history was deleted at some point, I think when I opened the file explorer. 2. The instructions didn't always fit, and I ended up not running it on mobile.

Model capabilities: chain-of-thought showed up only once or twice; retrieval: not sure, it might have been used; fine-tuning: I believe the model was fine-tuned for code; context window: it included the chat history and the code.

# What GenAI products do not exist yet

## Thinking Exercise – Design a New GenAI Product

Invent **one GenAI‑powered product that does not yet exist** but could solve a real problem. Write **100‑150 words** addressing the prompts below.

**Prompts**
1. **Product name & one‑line pitch**  
2. **Target users / market segment**  
3. **Pain point / business need** – why current solutions are insufficient  
4. **User flow & where GenAI fits** – describe the key interactions  
5. **Technical or ethical challenges** – data, safety, IP, bias, etc.  
6. **MVP success metric** – what would you measure to confirm traction  


This is an app for actors that helps them memorize their lines and record an audition.
The user uploads a text file containing a scene, and the app generates audio files in which the other actors in the scene speak in voices that match their characters.
When the user presses "Action", the app plays the other actors' voices. When it reaches the user's line, it stays silent, follows the user's speech, detects when they have finished their line, and then continues the scene.

It also offers a teleprompter option and a video-recording option.

---

**1. Product Name & One-Line Pitch**
**RehearseAI**: a GenAI app for actors, for rehearsals and auditions, with smart character voices and professional video recording.

**2. Target Users / Market Segment**
Professional and amateur actors, acting students, casting studios, and performing-arts schools.

**3. Pain Point / Business Need**
Rehearsal partners are hard to find, and preparation time for auditions is limited. Existing solutions (reading the text alone, teleprompter apps) do not create a realistic simulation of a scene.

**4. User Flow & Where GenAI Fits**
The user uploads a script → chooses characters and voice styles → GenAI generates matching audio → the user rehearses while recording, with an optional teleprompter → GenAI detects the end of the user's line and continues → the video can be saved and sent.

**5. Technical or Ethical Challenges**
Protecting the copyright of the texts; preventing misuse of synthetic voices; accurate real-time speech recognition.

**6. MVP Success Metric**
The number of users who complete a rehearsal and record a full scene in the app.


### ✍️ Your concept

**1. Product name & pitch**:  

**2. Target users**:  

**3. Pain point**:  

**4. User flow**:  

**5. Challenges**:  

**6. Success metric**:  
