In [1]:
# DS776 Auto-Update (runs in ~2 seconds, only updates when needed)
# If this cell fails, see Lessons/Course_Tools/AUTO_UPDATE_SYSTEM.md for help
%run ../Course_Tools/auto_update_introdl.py

✅ introdl v1.6.21 already up to date


### DO THIS FIRST

Change `force_update=True` in the last line and run the next cell to install an updated course package.  Once it's done restart your kernel and change back to `force_update=False`.  You only need to do this once per server (not once per notebook).

#### L07_1_Getting_Started_with_NLP Video

<iframe 
    src="https://media.uwex.edu/content/ds/ds776/ds776_l07_1_getting_started_with_nlp/" 
    width="800" 
    height="450" 
    style="border: 5px solid cyan;"  
    allowfullscreen>
</iframe>
<br>
<a href="https://media.uwex.edu/content/ds/ds776/ds776_l07_1_getting_started_with_nlp/" target="_blank">Open UWEX version of video in new tab</a>
<br>
<a href="https://share.descript.com/view/oDi5d1FbYBx" target="_blank">Open Descript version of video in new tab</a>

## A Tiny History of Natural Language Processing

Natural Language Processing (NLP) has evolved significantly over the past few decades. Initially, NLP relied heavily on rule-based systems and statistical methods to understand and generate human language. These early approaches, prominent in the 1980s and 1990s, focused on the syntactic structure of text, using techniques such as n-grams and Hidden Markov Models (HMMs) to model language. However, these methods struggled with capturing the semantic meaning and context of words.

The introduction of word embeddings in the early 2010s, such as Word2Vec and GloVe, marked a significant advancement in NLP. These embeddings allowed for the representation of words in continuous vector space, capturing semantic relationships between words. This shift enabled more sophisticated models, such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, to process sequences of text and maintain context over longer passages. RNNs, in particular, played a crucial role in tasks like language translation and sentiment analysis.

The advent of transformers in 2017 revolutionized NLP by addressing the limitations of RNNs. Transformers, introduced in the paper "Attention Is All You Need", use self-attention mechanisms to process entire sequences of text simultaneously, allowing for better handling of long-range dependencies and parallelization. This led to the development of powerful models like BERT, GPT, and T5, which have set new benchmarks in various NLP tasks by providing a deeper semantic understanding of text.

Transformers have almost entirely supplanted previous approaches to NLP because:

1. **Superior Performance:** Models like BERT, GPT, T5, and their successors dominate leaderboards on tasks such as text classification, translation, summarization, and question answering.
2. **Pretraining and Transfer Learning:** Unlike traditional methods that required training separate models from scratch for different tasks, transformers leverage large-scale pretraining on vast text corpora and fine-tune efficiently on specific tasks.
3. **Self-Attention and Contextual Representations:** Transformers provide rich, context-dependent word representations, whereas earlier models like Word2Vec and GloVe generated static embeddings.
4. **Scalability and Adaptability:** With advancements in scaling laws, models can achieve better performance just by increasing their size and training data, an advantage that RNNs and classical machine learning approaches lacked.

There are a few areas where older approaches still exist:

1. **Small Datasets & Low Compute Environments:** Logistic regression, SVMs, and Lasso-penalized models often remain competitive when data is limited or when computational efficiency is a concern.
2. **Domain-Specific Applications:** Some applications, like biomedical text mining, may still rely on domain-specific feature engineering approaches alongside transformers.
3. **Traditional ML for Interpretability:** Some NLP applications in finance, healthcare, and legal fields still favor older methods due to the need for interpretability and robustness.

However, since transformer models for NLP are now so dominant we will focus exclusively on them in this class.

## NLP Tasks Instead of Transformer Details

Transformers are more complicated than the CNNs we saw for computer vision, so we won't dive as deeply into the details. In Lesson 9 - Transformer Details, we will learn about some of the nuts and bolts, especially the self-attention mechanism that lets transformers figure out relationships between words and understand context. Mostly, though, we will focus on the applications of transformers. To this end we'll use the open source Hugging Face ecosystem, which hosts thousands of NLP models and datasets and makes it quite simple to get started with NLP applications without having to master too much code. All of the newest, biggest open source transformer models are hosted there, including those from Meta, Mistral, and DeepSeek. The only thing keeping us from running the biggest state-of-the-art models will be lack of compute, but we can run their smaller cousins on the GPU in CoCalc's compute server, a decent gaming GPU, or even a CPU.

## API-based LLMs versus Fine-tuning Specialized Models

As large language models (LLMs) continue to improve, their use as general NLP task solvers via prompting is increasingly popular, especially when we don't have access to large amounts of training data. In this course, we'll focus on two main approaches to solving NLP tasks:

1. **Using LLMs via APIs** (like GPT-4o, Claude, or Gemini through OpenRouter)
2. **Fine-tuning specialized transformer models** for specific tasks

*(We'll also explore running LLMs locally in Lesson 11, but for Lessons 7-10 and 12 we'll use API-based models and task-specific fine-tuned models.)*

### Example: Text Classification

For a text-classification task, you could choose:

**LLM via API (GPT-4o, Claude, Gemini, etc.)**
- When you need **a quick, general-purpose classifier** without training a model
- When **zero-shot or few-shot classification** (via prompting) is sufficient
- When categories may evolve frequently, making retraining impractical
- When you don't have a large labeled dataset
- Example: Categorizing support tickets by topic

**Fine-tune BERT / RoBERTa / DistilBERT**
- When you have a **moderate to large labeled dataset** and need **high accuracy**
- When you need **fast inference at scale**, as fine-tuned models are more efficient than large LLMs
- When your classification task requires **domain-specific adaptation**
- When you need **very low latency** or **predictable costs**
- Example: Sentiment analysis on customer feedback in a specific industry

**Note on terminology:** Zero-shot classification means classifying text without seeing any examples - the LLM just gets a prompt with the possible categories. Few-shot classification means providing a small number of examples in the LLM prompt to guide the model's behavior.
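
To make the distinction concrete, here is a minimal sketch of how the two kinds of prompts might be assembled as plain strings. The helper names `build_zero_shot_prompt` and `build_few_shot_prompt`, the categories, and the example tickets are hypothetical illustrations and are not part of the course package. The resulting string is what you would send to an LLM as the prompt.

```python
# Hypothetical helpers for illustration only; not part of the introdl package.

def build_zero_shot_prompt(text, categories):
    """Zero-shot: the prompt lists the categories but contains no labeled examples."""
    return (
        f"Classify the text into one of: {', '.join(categories)}.\n\n"
        f"Text: {text}\nCategory:"
    )

def build_few_shot_prompt(text, categories, examples):
    """Few-shot: the same instruction, followed by a few labeled examples."""
    shots = "\n\n".join(f"Text: {ex}\nCategory: {label}" for ex, label in examples)
    return (
        f"Classify the text into one of: {', '.join(categories)}.\n\n"
        f"{shots}\n\n"
        f"Text: {text}\nCategory:"
    )

# Made-up categories and tickets, chosen to mirror the support-ticket example above
categories = ["Billing", "Technical", "Account"]
examples = [
    ("I was charged twice this month.", "Billing"),
    ("The app crashes when I open settings.", "Technical"),
]
ticket = "I can't reset my password."

zero_shot = build_zero_shot_prompt(ticket, categories)
few_shot = build_few_shot_prompt(ticket, categories, examples)

print(zero_shot)
print("-" * 40)
print(few_shot)
```

The only difference between the two prompts is the block of labeled examples, which is why few-shot prompting needs a handful of labeled texts but no training.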

### Choosing the Right Approach

**Use API-based LLMs when:**
- You need **quick, adaptable solutions** without training infrastructure
- You **don't have much labeled data** for fine-tuning
- You want to **experiment rapidly** with different task formulations
- Task requirements may change frequently
- You're prototyping or building proof-of-concepts

**Fine-tune a specialized model when:**
- You have **domain-specific labeled data** and need **high accuracy**
- You need **very fast inference** or processing at large scale
- You need **predictable costs** (no per-token API charges)
- You require **consistent, structured outputs**
- Latency is critical (milliseconds matter)

### Understanding Data Privacy with API-based LLMs

A common concern with API-based LLMs is: **"Will my data be used to train the model?"** or **"Is my sensitive data secure?"** The answer depends on the provider and the agreements in place.

**Privacy Protections Available:**

Most major LLM providers now offer enterprise-grade privacy protections:
- **Zero Data Retention (ZDR):** Your API requests are not stored or logged after processing
- **Data Processing Agreements (DPAs):** Legal contracts preventing use of your data for model training
- **HIPAA and SOC 2 Compliance:** Meeting healthcare and security standards for regulated industries
- **Private Deployments:** Dedicated instances in your own cloud environment (e.g., Azure OpenAI, AWS Bedrock)
- **Regional Data Residency:** Keep data within specific geographic boundaries (e.g., EU-only processing)

**Examples:**
- **OpenAI API:** Has a default policy not to use API data for training. Enterprise customers can enable additional protections.
- **Azure OpenAI Service:** Fully isolated deployments in your Azure subscription with complete data control
- **Google Vertex AI:** Private endpoints with data residency controls and enterprise security
- **Anthropic Claude:** API data not used for training; enterprise options for additional controls

**When API Privacy May Not Be Enough:**

Even with these protections, there are situations where API-based solutions may not be acceptable:
- **Air-gapped environments:** Systems physically isolated from external networks (e.g., classified government systems)
- **Extreme regulatory restrictions:** Some industries may prohibit any external data transmission regardless of agreements
- **Zero-trust requirements:** Organizations that cannot accept any third-party processing, even contractually protected
- **Competitive intelligence:** Proprietary algorithms or trade secrets that cannot be exposed, even with DPAs

In these cases, running models locally (Lesson 11) or fine-tuning your own specialized models on internal infrastructure becomes necessary.

**Bottom Line:** For most educational, research, and business applications, modern API providers offer sufficient privacy protections through contractual agreements and technical controls. Understanding your specific regulatory requirements and risk tolerance will guide your choice.

### Course Approach (with changes for Fall 2025)

For each NLP task in Lessons 7-10 and 12, we'll explore both API-based LLM approaches and fine-tuned specialized models. In Lesson 11, we'll dive deeper into text generation and demonstrate running LLMs locally for complete control and privacy.

*You may notice some slight differences between the lesson notebooks and some of the recorded videos in the NLP portion of the class. When the course was developed, we ran very small LLMs locally on the compute servers in CoCalc, which was slow and limited us to very small models. This semester we're providing some credits on OpenRouter so you can use various models (even the biggest, newest ones, though you may need to pay for more credits). We'll explain more about OpenRouter further below. First we want to remind you to get a Hugging Face token.*

## Hugging Face Token Setup

Before we get into using LLMs via APIs, you'll need to set up access to **Hugging Face**, which hosts thousands of transformer models and datasets. Many models and datasets require authentication to download, so you'll need a free Hugging Face token.

### Why You Need a Hugging Face Token

Hugging Face requires authentication for:
- **Gated models** - Popular models like Meta's Llama require accepting terms before downloading
- **Datasets** - Some datasets have usage agreements or privacy controls
- **Upload/sharing** - If you want to share your fine-tuned models
- **Rate limiting** - Authenticated users get higher rate limits for downloads

The token is **completely free** - you just need to create an account.

### Step 1: Create a Hugging Face Account

1. Go to [https://huggingface.co](https://huggingface.co)
2. Click "Sign Up" in the top right corner
3. Create your account (you can use your university email or personal email)

### Step 2: Generate an Access Token

1. Once logged in, click your profile picture in the top right
2. Select **"Access Tokens"** from the dropdown menu
3. Click the **"Create new token"** button
4. Choose token type **"Read"**
5. Give your token a name (e.g., "DS776 Course Token")
6. Click **"Create token"**
7. **Copy the token** - you won't be able to see it again (but you can always create another token)

### Step 3: Add Token to Your `api_keys.env` File

Your `api_keys.env` file is located at `~/home_workspace/api_keys.env`. Open this file and add a new line:

```
HF_TOKEN=hf_YourTokenHere
```

Replace `hf_YourTokenHere` with the token you copied from Hugging Face.

**Important Notes:**
- The token format starts with `hf_` followed by random characters
- Don't include quotes or spaces around the token
- Save the file after adding the token
- Never commit this file to git or share it publicly
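
If you want to sanity-check the line you added, the format rules above can be sketched in a few lines of plain Python. This is only an illustration: `parse_env_text` and `looks_like_hf_token` are hypothetical helpers, not functions from the course package, and the check only looks at the format. It does not verify the token with Hugging Face.

```python
# Hypothetical helpers for illustration; the course package loads the file for you.

def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

def looks_like_hf_token(value):
    """Rough format check: hf_ prefix, something after it, no quotes or spaces."""
    return (
        value.startswith("hf_")
        and len(value) > len("hf_")
        and not any(ch in value for ch in " \"'")
    )

# Placeholder contents, not a real token
sample = """
# example contents of api_keys.env
HF_TOKEN=hf_YourTokenHere
"""

env = parse_env_text(sample)
print(looks_like_hf_token(env.get("HF_TOKEN", "")))
print(looks_like_hf_token('"hf_YourTokenHere"'))  # a quoted value fails the check
```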

### Step 4: Verify It Works

When you run `config_paths_keys()` (in the import cell below), you should see:

```
✅ HuggingFace Hub: Logged in
```

This confirms your token was loaded successfully and you can now access gated models and datasets.

### Troubleshooting

If you see `❌ HuggingFace Hub: Not logged in`, check:
- Did you save the `api_keys.env` file after adding the token?
- Is the token on a new line with format `HF_TOKEN=hf_...`?
- Did you copy the complete token (starts with `hf_`)?
- Try restarting your kernel and running the import cell again

## OpenRouter API and Your Course API Keys

For this course, we've set up access to LLMs through **OpenRouter**, a unified API that provides access to all major commercial models (GPT-4o, Claude, Gemini) and most open-weight models (Llama, Mistral, DeepSeek, Qwen, and many more). This means you can experiment with different models using a single API interface.

### Your API Credit

**Each student has been provided with $15 in OpenRouter API credit.** This should be more than sufficient to complete all coursework if you use small and medium-sized models appropriately. For reference:

- **Small models** (like `gemini-flash-lite`, `llama-3.2-3b`, `gpt-4o-mini`): Very inexpensive, typically $0.075-0.15 per million input tokens
- **Medium models** (like `gemini-flash`, `claude-haiku`): Moderate cost, good quality
- **Premium models** (like `gpt-4o`, `claude-sonnet`, `o3-mini`): Higher cost, best quality

We recommend using **`gemini-flash-lite`** as your default model for coursework - it's fast, inexpensive, and produces good results for learning tasks.

If you want to experiment beyond the course assignments or try premium models, you can always purchase your own OpenRouter API key and load it with whatever credit you choose.

### Checking Your Remaining Credit

You can check your remaining OpenRouter credit using the `llm_get_credits()` function from the course package. This will show you how much of your $15 credit remains:

```python
from introdl.nlp import llm_get_credits

credits = llm_get_credits()
print(f"Used so far: ${credits['usage']:.2f} of ${credits['limit']:.2f}")
print(f"Credit remaining: ${credits['limit'] - credits['usage']:.2f}")
```

### Your API Keys Are Already Configured

Your OpenRouter API key has already been distributed to your CoCalc project and is stored in:
```
~/home_workspace/api_keys.env
```

When you run `config_paths_keys()` in your import cell (as shown below), this API key will be automatically loaded and available for use with `llm_generate()`. You don't need to do anything else!

**Security Note:** Never commit your `api_keys.env` file to git or share it publicly. The file is stored in `home_workspace`.

### Exploring Available Models

The course package includes 16 carefully curated models covering a range of capabilities and price points. You can see them all with `llm_list_models()`, which we'll demonstrate shortly.

**Want to try models beyond our curated list?** OpenRouter provides access to hundreds of models! You can:

1. **Browse all available models** at: https://openrouter.ai/models
2. **Use any model** by providing its full OpenRouter model ID

For example, to use OpenAI's new GPT-5-nano model (not in our curated list), you would use:

```python
response = llm_generate('openai/gpt-5-nano', "Your prompt here")
```

The full model ID format is typically `provider/model-name` (e.g., `openai/gpt-5-nano`, `anthropic/claude-opus-4.1`, `google/gemini-2.5-pro`).

**Note:** Models outside our curated list won't show pricing or metadata with `llm_list_models()`, but they'll work fine if you provide the correct model ID from the OpenRouter website.
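
As a small illustration of the `provider/model-name` format, the sketch below splits an ID at the first slash. The helper `split_model_id` is hypothetical and not part of the course package; it assumes that curated short names such as `gemini-flash-lite` contain no slash.

```python
# Hypothetical helper for illustration; not part of the introdl package.

def split_model_id(model_id):
    """Split a 'provider/model-name' ID into (provider, name).

    Returns (None, model_id) when there is no slash, which is how the
    short names from the curated list look (e.g., 'gemini-flash-lite').
    """
    provider, sep, name = model_id.partition("/")
    if not sep:
        return None, model_id
    return provider, name

for model_id in ["openai/gpt-5-nano", "google/gemini-2.5-pro", "gemini-flash-lite"]:
    provider, name = split_model_id(model_id)
    kind = "full ID" if provider else "short name"
    print(f"{model_id:28} -> provider={provider}, name={name} ({kind})")
```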

## Using `llm_generate` with OpenRouter

The course package provides a simple, unified interface for working with LLMs through the `llm_generate()` function. This function handles all the complexity of API calls, cost tracking, and response formatting.  (Don't worry, we'll dive into some of those details in Lesson 11.)

### Setting Up and Checking Your Credit

First, let's import the necessary functions, configure our environment, and check your OpenRouter credit balance:

In [2]:
from introdl import (
    config_paths_keys, wrap_print_text,
    llm_generate, llm_list_models, llm_get_credits,
    display_markdown, show_session_spending
)

# Configure paths and load API keys (cost tracking initialized automatically)
paths = config_paths_keys()

# Wrap print to format text nicely at 120 characters
print = wrap_print_text(print, width=120)

# Check your OpenRouter credit balance
credits = llm_get_credits()
print(f"OpenRouter Credit Status:")
print(f"  Total limit: ${credits['limit']:.2f}")
print(f"  Used so far: ${credits['usage']:.2f}")
print(f"  Remaining:   ${credits['limit'] - credits['usage']:.2f}")

✅ Environment: Unknown Environment | Course root: /mnt/e/GDrive_baggett.jeff/Teaching/Classes_current/2025-2026_Fall_DS776/DS776
   Using workspace: <DS776_ROOT_DIR>/home_workspace

📂 Storage Configuration:
   DATA_PATH: <DS776_ROOT_DIR>/home_workspace/data
   MODELS_PATH: <DS776_ROOT_DIR>/Lessons/Lesson_07_Transformers_Intro/Lesson_07_Models (local to this notebook)
   CACHE_PATH: <DS776_ROOT_DIR>/home_workspace/downloads
🔑 API keys: 9 loaded from home_workspace/api_keys.env
🔐 Available: ANTHROPIC_API_KEY, GEMINI_API_KEY, GOOGLE_API_KEY... (9 total)
✅ HuggingFace Hub: Logged in
✅ Loaded pricing for 330 OpenRouter models
✅ Cost tracking initialized ($9.92 credit remaining)
📦 introdl v1.6.21 ready

OpenRouter Credit Status:
  Total limit: $9.92
  Used so far: $0.02
  Remaining:   $9.91


### Simple Example

Now let's try a simple text generation example. The new `llm_generate()` API is very straightforward:

In [3]:
# Simple text generation
response = llm_generate('gemini-flash-lite', "What is the capital of France?")
print(response)

The capital of France is **Paris**.


### Using System Prompts

System prompts help guide the model's behavior and tone:

In [4]:
system_prompt = "You are a helpful AI assistant who is also sarcastic and talks like a pirate."

response = llm_generate(
    'gemini-flash-lite',
    "Tell me three interesting facts about space.",
    system_prompt=system_prompt
)

print(response)

Ahoy there, matey! Ye want to know some scurvy facts about the vast, dark sea of space, do ye? Well, buckle yer
bootstraps, 'cause ol' Captain AI has a few treasures to share:

1.  **There be more stars in the universe than grains of sand on all the beaches of Earth.** Aye, ye heard that right!
Imagine every single sandy shore ye've ever trod upon, then multiply that by… well, a number so big it'd make a kraken
weep. That's how many stars be out there, glintin' like lost doubloons in the cosmic abyss. Makes ye feel like a tiny
barnacle on a colossal galleon, don't it?

2.  **A day on Venus is longer than its year.** Now, this be a real head-scratcher, even for a seasoned navigator. Venus
spins slower than a drunken sailor trying to swab the deck. It takes about 243 Earth days to do one full rotation
(that's its "day"), but it only takes about 225 Earth days to zip around the Sun (that's its "year"). So, it's
technically older *each day* than it is *each year*. Makes ye wonder if they e

### Displaying Markdown Output

Many LLM responses use markdown formatting. You can display them nicely using `display_markdown()`:

In [5]:
response = llm_generate(
    'gemini-flash-lite',
    "Write a short bullet-point list of tips for learning machine learning."
)

display_markdown(response)

Here's a short bullet-point list of tips for learning machine learning:

*   **Master the Fundamentals:** Solidify your understanding of linear algebra, calculus, probability, and statistics. These are the bedrock of ML.
*   **Learn a Programming Language:** Python is the de facto standard due to its extensive libraries (NumPy, Pandas, Scikit-learn, TensorFlow, PyTorch).
*   **Start with Core Concepts:** Understand supervised vs. unsupervised learning, regression vs. classification, and common algorithms like linear regression, logistic regression, decision trees, and k-means.
*   **Get Hands-On with Libraries:** Practice implementing algorithms and working with data using libraries like Scikit-learn.
*   **Work with Real Datasets:** Apply your knowledge to publicly available datasets (Kaggle, UCI Machine Learning Repository) to gain practical experience.
*   **Understand Evaluation Metrics:** Learn how to properly assess the performance of your models (accuracy, precision, recall, F1-score, RMSE, etc.).
*   **Explore Deep Learning:** Once comfortable with traditional ML, dive into neural networks, CNNs, RNNs, and frameworks like TensorFlow and PyTorch.
*   **Read and Understand Research Papers:** Stay updated with new developments and gain deeper insights by reading influential ML papers.
*   **Join a Community:** Engage with other learners and practitioners online (forums, Discord, Stack Overflow) or in person.
*   **Build Projects:** The best way to learn is by doing. Work on personal projects that interest you to solidify your understanding.
*   **Be Patient and Persistent:** Machine learning has a steep learning curve. Don't get discouraged by challenges; keep practicing and learning.

### Tracking Costs

You can see estimated costs for your API calls:

In [6]:
response = llm_generate(
    'gemini-flash-lite',
    "Tell me five dad jokes.",
    print_cost=True
)

print(response)

💰 Cost: $0.000049 | Tokens: 13 in / 120 out | Model: google/gemini-2.5-flash-lite
Here are five dad jokes for you:

1.  Why don't scientists trust atoms?
    Because they make up everything!

2.  What do you call a fish with no eyes?
    Fsh!

3.  I'm reading a book about anti-gravity.
    It's impossible to put down!

4.  Did you hear about the restaurant on the moon?
    I heard the food was good, but it had no atmosphere.

5.  What's orange and sounds like a parrot?
    A carrot!
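
The reported cost can be checked by hand: multiply each token count by its price per million tokens and add the two parts. Below is a minimal sketch, assuming the `gemini-flash-lite` prices of $0.10 per million input tokens and $0.40 per million output tokens shown by `llm_list_models()` later in this notebook. `estimate_cost` is a hypothetical helper, not the course package's own cost tracker.

```python
# Hypothetical helper for illustration; the course package tracks costs for you.

def estimate_cost(tokens_in, tokens_out, price_in_per_m, price_out_per_m):
    """Cost in dollars, given token counts and prices per million tokens."""
    return (tokens_in * price_in_per_m + tokens_out * price_out_per_m) / 1_000_000

# Token counts from the output above; prices assumed to be $0.10 / $0.40 per million
cost = estimate_cost(13, 120, price_in_per_m=0.10, price_out_per_m=0.40)
print(f"${cost:.6f}")
```

This prints `$0.000049`, matching the cost line above. Output tokens account for nearly all of it, which is typical when the prompt is short.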


### Controlling Output Length

By default, the output of `llm_generate` is limited to 200 tokens.  You can control how much text the model generates:

In [7]:
# Longer response
response = llm_generate(
    'gemini-flash-lite',
    "Write a short story about a cat who learns to play the piano.",
    max_tokens=500
)

display_markdown(response)

Whiskers twitched, a low rumble vibrated in Bartholomew’s chest. The object of his intense focus was a monstrous, ebony beast that occupied a significant portion of the living room. It was the piano, a source of both fascination and mild irritation for Bartholomew. His human, Eleanor, would spend hours coaxing strange, sometimes beautiful, sometimes jarring sounds from its depths.

Bartholomew, a sleek, black cat with eyes the color of polished emeralds, had always been an observer. He’d watched Eleanor’s fingers dance across the keys, the way her brow furrowed in concentration, the triumphant smile that bloomed when a particularly tricky passage finally flowed. He was particularly drawn to the high notes, the ones that shimmered and seemed to hang in the air like tantalizing dust motes.

One afternoon, Eleanor left the piano lid ajar. Bartholomew, emboldened by her absence, leaped onto the bench. The keys, cool and smooth beneath his paws, beckoned. With a tentative step, he pressed down on a single key. A clear, resonant C note echoed through the room. Bartholomew’s ears perked. His tail gave an inquisitive flick. He tried again, deliberately stepping on another key. A dissonant E followed.

This was… interesting.

Over the next few weeks, Bartholomew’s secret piano lessons began. When Eleanor was out, he’d hop onto the bench. At first, it was pure accident – a clumsy paw landing on a cluster of keys, producing a cacophony that would have sent any self-respecting cat scurrying. But Bartholomew was a cat of discerning taste, and he quickly learned to associate certain paw placements with particular sounds.

He started with the lower register, his paws too large to hit individual notes cleanly, but he discovered the satisfying thrum of chords. Then, he moved to the higher keys, his delicate toes capable of more precision. He’d sit, utterly absorbed, his emerald eyes scanning the keys, his head tilted as if deciphering an ancient code.

Eleanor, meanwhile, began to notice peculiar things. Sometimes, she’d come home to find the piano lid slightly ajar when she was certain she’d closed it. More than once, she’d heard faint, almost melodic tinkling sounds from the living room, only to find Bartholomew asleep on the rug, looking angelic and utterly innocent. She’d chalked

### Viewing Available Models

We provide a curated list of models which are suitable for use in this class.  They're typically small to medium sized models that are good at following instructions.  You can see this curated list with details about each model: 

In [8]:
# Get model information dictionary
models = llm_list_models()

# The function displays the table and returns a dictionary with model details


Available OpenRouter Models:



| Short Name | Size | Released | In/M | Out/M | JSON Schema | Open Weights |
|---|---|---|---|---|---|---|
| claude-haiku | ~30-50B | 2024-10 | $0.80 | $4.00 | ❌ | ❌ |
| deepseek-v3.1 | 37B×18E | 2025-08 | $0.20 | $0.80 | ✅ | ✅ |
| gemini-flash | ~20B | 2025-04 | $0.30 | $2.50 | ✅ | ❌ |
| gemini-flash-lite | ~5B | 2025-09 | $0.10 | $0.40 | ✅ | ❌ |
| gemma-3-12b | 12B | 2025-03 | $0.03 | $0.10 | ✅ | ✅ |
| gemma-3-27b | 27B | 2025-03 | $0.09 | $0.16 | ✅ | ✅ |
| gpt-4o-mini | ~8B | 2024-07 | $0.15 | $0.60 | ✅ | ❌ |
| gpt-oss-120b | 5.1B×23E | 2025-08 | $0.04 | $0.40 | ❌ | ✅ |
| gpt-oss-20b | 3.6B×6E | 2025-08 | $0.03 | $0.14 | ❌ | ✅ |
| llama-3.2-1b | 1B | 2024-09 | $0.01 | $0.01 | ❌ | ✅ |



Default model: gemini-flash-lite
Size format: Dense models show total params (e.g., '70B'), MoE models show active×experts (e.g., '17B×128E')
JSON Schema = User-defined JSON schemas supported

You can also use any OpenRouter model by its full ID (e.g., 'openai/gpt-4o')


In [9]:
# Examples of using the models dictionary

# Example 1: Look up information about a specific model
gemini_info = models['gemini-flash-lite']
print("Example 1: Looking up a specific model")
print(f"Model: gemini-flash-lite")
print(f"  Full ID: {gemini_info['model_id']}")
print(f"  Provider: {gemini_info['provider']}")
print(f"  Input cost: ${gemini_info['cost_in_per_m']:.2f} per million tokens")
print(f"  Output cost: ${gemini_info['cost_out_per_m']:.2f} per million tokens")
print(f"  JSON schema support: {gemini_info['json_schema']}")
print()

# Example 2: Find the cheapest models (input tokens < $0.15/M)
cheap_models = {name: info for name, info in models.items() 
                if info['cost_in_per_m'] < 0.15}
print(f"Example 2: Cheapest models (< $0.15/M input)")
print(f"  Found {len(cheap_models)} models:")
for name in list(cheap_models.keys())[:5]:  # Show first 5
    cost = cheap_models[name]['cost_in_per_m']
    print(f"    - {name} (${cost:.2f}/M in)")
print()

# Example 3: Find models with JSON schema support
json_capable = [name for name, info in models.items() if info['json_schema']]
print(f"Example 3: Models with JSON schema support")
print(f"  Found {len(json_capable)} models with JSON schema support:")
print(f"  {', '.join(json_capable[:6])}...")  # Show first 6
print()

# Example 4: Compare costs between different models
print("Example 4: Cost comparison for generating 1M input tokens + 1M output tokens")
for model_name in ['gemini-flash-lite', 'llama-3.2-3b', 'claude-haiku', 'gpt-4o-mini']:
    info = models[model_name]
    total_cost = info['cost_in_per_m'] + info['cost_out_per_m']
    print(f"  {model_name:20} ${total_cost:6.2f}")

Example 1: Looking up a specific model
Model: gemini-flash-lite
  Full ID: google/gemini-2.5-flash-lite
  Provider: google
  Input cost: $0.10 per million tokens
  Output cost: $0.40 per million tokens
  JSON schema support: True

Example 2: Cheapest models (< $0.15/M input)
  Found 10 models:
    - gemini-flash-lite ($0.10/M in)
    - gemma-3-12b ($0.03/M in)
    - gemma-3-27b ($0.09/M in)
    - gpt-oss-120b ($0.04/M in)
    - gpt-oss-20b ($0.03/M in)

Example 3: Models with JSON schema support
  Found 11 models with JSON schema support:
  deepseek-v3.1, gemini-flash, gemini-flash-lite, gemma-3-12b, gemma-3-27b, gpt-4o-mini...

Example 4: Cost comparison for generating 1M input tokens + 1M output tokens
  gemini-flash-lite    $  0.50
  llama-3.2-3b         $  0.04
  claude-haiku         $  4.80
  gpt-4o-mini          $  0.75


### Trying Different Models

It's easy to compare different models:

In [10]:
prompt = "Explain quantum computing in one sentence."

# Try a small model
print("A small model (llama-3.2-3b):")
response1 = llm_generate('llama-3.2-3b', prompt)
print(response1)
print("\n" + "="*60 + "\n")

# Try the recommended model
print("Recommended model (gemini-flash-lite):")
response2 = llm_generate('gemini-flash-lite', prompt, estimate_cost=True)
print(response2)
print("\n" + "="*60 + "\n")

# A larger model
print("A larger model (mistral-medium):")
response3 = llm_generate('mistral-medium', prompt, estimate_cost=True)
print(response3)

A small model (llama-3.2-3b):
Quantum computing is a new type of computing that uses the principles of quantum mechanics to perform calculations that
are exponentially faster and more powerful than those of classical computers, exploiting the unique properties of
quantum-mechanical phenomena such as superposition, entanglement, and interference.


Recommended model (gemini-flash-lite):
💰 Cost: $0.000010 | Tokens: 14 in / 21 out | Model: google/gemini-2.5-flash-lite
Quantum computing leverages quantum mechanical phenomena like superposition and entanglement to perform calculations
that are intractable for classical computers.


A larger model (mistral-medium):
💰 Cost: $0.000122 | Tokens: 20 in / 57 out | Model: mistralai/mistral-medium-3
Quantum computing is a type of computation that uses quantum bits, or qubits, which can exist in multiple states at once
due to the principles of quantum superposition and entanglement, allowing for complex calculations to be performed much
faster than 

### Using Models Outside the Curated List

You can use any model from [OpenRouter's model list](https://openrouter.ai/models) by providing the full model ID. For example, let's try `z-ai/glm-4.5-air` (which isn't in our curated list):

In [11]:
# Use the full OpenRouter model ID
response = llm_generate(
    'z-ai/glm-4.5-air',  # Full model ID from openrouter.ai/models
    "What are three benefits of learning Python?",
    estimate_cost=True
)

print(response)

💰 Cost: $0.000142 | Tokens: 26 in / 161 out | Model: z-ai/glm-4.5-air
Here are three key benefits of learning Python:

1. **Versatility**: Python is a multi-purpose programming language used in various fields including web development,
data science, artificial intelligence, automation, and more. This versatility means you can apply Python skills across
different domains and projects.

2. **Beginner-Friendly**: Python has a simple, readable syntax that resembles plain English, making it one of the
easiest programming languages for beginners to learn. Its straightforward nature allows newcomers to focus on
programming concepts rather than complex syntax.

3. **Strong Job Market and Community Support**: Python consistently ranks among the most in-demand programming skills in
the job market, particularly in data science and machine learning. Additionally, Python has a vast global community that
provides extensive documentation, tutorials, and support through forums like Stack Overflow and 

## Processing Multiple Prompts

You can process multiple prompts at once by passing a list of strings. This is useful for batch processing tasks like sentiment analysis or classification.

### Simple Batch Example

In [12]:
prompts = [
    'What is the capital of France?',
    'What is the capital of Germany?',
    'What is the capital of Italy?'
]

responses = llm_generate('gemini-flash-lite', prompts)

for prompt, response in zip(prompts, responses):
    print(f"Q: {prompt}")
    print(f"A: {response}")
    print("-" * 60)

Q: What is the capital of France?
A: The capital of France is **Paris**.
------------------------------------------------------------
Q: What is the capital of Germany?
A: The capital of Germany is **Berlin**.
------------------------------------------------------------
Q: What is the capital of Italy?
A: The capital of Italy is **Rome**.
------------------------------------------------------------
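
Batch responses often need light post-processing before you can use them as data. The sketch below pulls the bolded answer out of each response shown above. The helper name `extract_bold` is hypothetical (it is not part of the course package), and it assumes the model wraps its answer in `**...**`, which is not guaranteed; when no bold text is found it falls back to the whole response.

```python
import re

# Responses copied from the output above; in practice these come from llm_generate().
responses = [
    "The capital of France is **Paris**.",
    "The capital of Germany is **Berlin**.",
    "The capital of Italy is **Rome**.",
]

def extract_bold(text):
    """Return the first **bolded** phrase in text, or the stripped text if there is none."""
    match = re.search(r"\*\*(.+?)\*\*", text)
    return match.group(1) if match else text.strip()

capitals = [extract_bold(r) for r in responses]
print(capitals)  # ['Paris', 'Berlin', 'Rome']
```

If you need a consistent format, asking for it in the prompt (for example, "Answer with the city name only") is usually more reliable than parsing free-form text afterwards.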


### Programmatic Prompt Construction

Often we want to construct prompts programmatically from data. Here's an example of sentiment analysis:

In [13]:
# Define the system prompt for sentiment analysis
system_prompt = "You are a sentiment analysis AI. Classify text as Positive, Negative, or Neutral."

# List of texts to analyze
texts = [
    "I love the new design of your website!",
    "The service was terrible and I will not come back.",
    "The product is okay, but it could be better.",
    "Absolutely fantastic experience, highly recommend!",
    "I'm not sure how I feel about this."
]

# Construct prompts programmatically
instruction = "Analyze the sentiment of this text. Give only the sentiment classification (Positive, Negative, or Neutral).\n\nText: "
prompts = [instruction + text for text in texts]

# Generate responses
responses = llm_generate('gemini-flash-lite', prompts, system_prompt=system_prompt)

# Display results
print("Sentiment Analysis Results:\n")
for text, sentiment in zip(texts, responses):
    print(f"Text: {text}")
    print(f"Sentiment: {sentiment}")
    print("-" * 60)

Sentiment Analysis Results:

Text: I love the new design of your website!
Sentiment: Positive
------------------------------------------------------------
Text: The service was terrible and I will not come back.
Sentiment: Negative
------------------------------------------------------------
Text: The product is okay, but it could be better.
Sentiment: Neutral
------------------------------------------------------------
Text: Absolutely fantastic experience, highly recommend!
Sentiment: Positive
------------------------------------------------------------
Text: I'm not sure how I feel about this.
Sentiment: Neutral
------------------------------------------------------------
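
Even with a strict instruction, a model may return labels with stray whitespace, punctuation, or markdown. As a minimal sketch, the code below normalizes each response and tallies the labels. The names `normalize_label` and `ALLOWED` are hypothetical helpers, not part of the course package, and the sample list mirrors the output above with a few messy variants added for illustration.

```python
from collections import Counter

ALLOWED = {"positive", "negative", "neutral"}

def normalize_label(raw):
    """Map a raw model response to Positive/Negative/Neutral, or 'Unknown' if it doesn't match."""
    # Trim whitespace, then trailing/leading periods and markdown asterisks
    cleaned = raw.strip().strip(".*").lower()
    return cleaned.capitalize() if cleaned in ALLOWED else "Unknown"

# Labels as in the output above, with illustrative formatting noise added
raw_responses = ["Positive", "Negative.", " neutral\n", "**Positive**", "Neutral"]

labels = [normalize_label(r) for r in raw_responses]
counts = Counter(labels)

print(labels)
print(counts["Positive"], counts["Negative"], counts["Neutral"])
```

Mapping anything unexpected to `Unknown` keeps a single off-format response from silently skewing the counts.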


### Monitoring Spending

You can run `show_session_spending()` at the end of a notebook to see your total OpenRouter usage and cost. If you use `llm_generate` in multiple notebooks during a session, the tracking may not be exact, but it will give you a reasonable estimate.

In [14]:
# Show spending for this notebook session

show_session_spending()


💰 Current Session Spending Summary
Total Cost:          $0.000909
Total API Calls:     11
Total Tokens:        480 in / 1,779 out

----------------------------------------------------------------------
By Model:
  google/gemini-2.5-flash-lite
    Cost: $0.000643 | Calls: 8 | Tokens: 403 in / 1,506 out
  z-ai/glm-4.5-air
    Cost: $0.000142 | Calls: 1 | Tokens: 26 in / 161 out
  mistralai/mistral-medium-3
    Cost: $0.000122 | Calls: 1 | Tokens: 20 in / 57 out
  meta-llama/llama-3.2-3b-instruct
    Cost: $0.000002 | Calls: 1 | Tokens: 31 in / 55 out

----------------------------------------------------------------------
Total Spent this session: $0.000909
Approximate Credit remaining: $9.92
(Note: This balance may not reflect the most recent spending)
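
To put these numbers in perspective, here is a rough back-of-the-envelope calculation using the figures from the summary above. It is only a sketch: the values are copied by hand rather than read from the course package, the average mixes several models with different prices, and the remaining credit is itself approximate.

```python
from decimal import Decimal

# Per-model costs copied from the summary above (USD)
per_model_cost = {
    "google/gemini-2.5-flash-lite": Decimal("0.000643"),
    "z-ai/glm-4.5-air": Decimal("0.000142"),
    "mistralai/mistral-medium-3": Decimal("0.000122"),
    "meta-llama/llama-3.2-3b-instruct": Decimal("0.000002"),
}
total_calls = 11
credit_remaining = Decimal("9.92")

# The per-model costs should add up to the reported session total
total_cost = sum(per_model_cost.values())
print(f"Total cost: ${total_cost}")

# Average cost per call across this session's mix of models
avg_cost_per_call = total_cost / total_calls
print(f"Average cost per call: ${avg_cost_per_call:.6f}")

# Rough number of similar calls the remaining credit could cover
approx_calls_left = int((credit_remaining * total_calls) // total_cost)
print(f"Approximate calls remaining at this rate: {approx_calls_left:,}")
```

Longer prompts, longer responses, or larger models will lower that figure considerably, so treat it as an order-of-magnitude estimate.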



### Using Other Providers

Other LLM providers also work in this class, but using them is optional. The last notebook shows how to set them up if you're interested.

## What's Next

In the next notebook, we'll explore common NLP tasks including:
- Text classification and sentiment analysis
- Named Entity Recognition (NER)
- Question answering
- Translation
- Summarization

In Lesson 11, we'll dive deeper into how text generation works, explore the underlying APIs in detail, and learn about running LLMs locally for privacy-sensitive applications.

For now, practice using `llm_generate()` with different models and prompts to get comfortable with the interface!