## A Tiny History of Natural Language Processing

Natural Language Processing (NLP) has evolved significantly over the past few decades. Initially, NLP relied heavily on rule-based systems and statistical methods to understand and generate human language. These early approaches, prominent in the 1980s and 1990s, focused on the syntactic structure of text, using techniques such as n-grams and Hidden Markov Models (HMMs) to model language. However, these methods struggled with capturing the semantic meaning and context of words.

The introduction of word embeddings in the early 2010s, such as Word2Vec and GloVe, marked a significant advancement in NLP. These embeddings allowed for the representation of words in continuous vector space, capturing semantic relationships between words. This shift enabled more sophisticated models, such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, to process sequences of text and maintain context over longer passages. RNNs, in particular, played a crucial role in tasks like language translation and sentiment analysis.

The advent of transformers in 2017 revolutionized NLP by addressing the limitations of RNNs. Transformers, introduced in the paper "Attention Is All You Need," use self-attention mechanisms to process entire sequences of text simultaneously, allowing for better handling of long-range dependencies and for parallelization. This led to the development of powerful models like BERT, GPT, and T5, which have set new benchmarks in various NLP tasks by providing a deeper semantic understanding of text.

Transformers have almost entirely supplanted previous approaches to NLP because:

1. **Superior Performance:** Models like BERT, GPT, T5, and their successors dominate leaderboards on tasks such as text classification, translation, summarization, and question answering.
2. **Pretraining and Transfer Learning:** Unlike traditional methods that required training separate models from scratch for different tasks, transformers leverage large-scale pretraining on vast text corpora and fine-tune efficiently on specific tasks.
3. **Self-Attention and Contextual Representations:** Transformers provide rich, context-dependent word representations, whereas earlier models like Word2Vec and GloVe generated static embeddings.
4. **Scalability and Adaptability:** With advancements in scaling laws, models can achieve better performance just by increasing their size and training data, an advantage that RNNs and classical machine learning approaches lacked.

There are a few areas where older approaches still exist:

1. **Small Datasets & Low Compute Environments:** Logistic regression, SVMs, and Lasso-penalized models often remain competitive when data is limited or when computational efficiency is a concern.
2. **Domain-Specific Applications:** Some applications, like biomedical text mining, may still rely on domain-specific feature engineering approaches alongside transformers.
3. **Traditional ML for Interpretability:** Some NLP applications in finance, healthcare, and legal fields still favor older methods due to the need for interpretability and robustness.

However, since transformer models for NLP are now so dominant, we will focus exclusively on them in this class.

## NLP Tasks Instead of Transformer Details

Transformers are more complicated than the CNNs we saw for computer vision, so we're not going to dive as deeply into the details.  We will, in Lesson 8 - Transformer Details, learn about some of the nuts and bolts, especially the self-attention mechanism that allows transformers to figure out relationships between words and to understand context.  Mostly, though, we will focus on the applications of transformers.  To this end we'll explore the open source HuggingFace ecosystem, which hosts thousands of NLP models and datasets and makes it quite simple to get started with NLP applications without having to master too much code.  Many of the newest, biggest open source transformer models are hosted there, including those from Meta, Mistral, and DeepSeek.  The only thing keeping us from running the biggest state-of-the-art models will be lack of compute, but we can run their smaller cousins on the GPU in CoCalc's compute server or on a decent gaming GPU.

## Fine-tuning a Specialized Model versus Using a Large Language Model

As large language models (LLMs) continue to improve, their use as general NLP task solvers via prompting is increasing, particularly in situations where we don't have access to a lot of training data.  Our choices for solving an NLP task come down to:
1.  Using an LLM via an API (in the cloud) like GPT-4o or Gemini.
2.  Using an LLM running on local hardware.
3.  Fine-tuning and using a specialized transformer model designed for the task.

For example, for a text-classification task we could choose:

- **LLM via API (GPT-4o, Claude, Gemini, etc.)**
    - When you need **a quick, general-purpose classifier** without training a model.
    - When **zero-shot or few-shot classification** (via prompting) is sufficient.
    - When categories may evolve frequently, making retraining impractical.
    - Example: Categorizing support tickets by topic.

- **Local LLM (LLaMA, Mistral, OpenChat)**
    - When you need to classify text **without sending data to an external API** (e.g., **privacy-sensitive data**).
    - When you need **occasional classification** and want to avoid API costs.
    - Works well for **prompt-based classification** if the model is large enough (e.g., LLaMA-2 13B or Mistral 7B).
    - Example: **Classifying internal legal documents**.

- **Fine-tune BERT / RoBERTa / DistilBERT**
    - When you have a **moderate to large labeled dataset** and need **high accuracy**.
    - When you need **fast inference at scale**, as fine-tuned models are more efficient than large LLMs.
    - When your classification task requires **domain-specific adaptation**.
    - Example: **Sentiment analysis on customer feedback** in a specific industry.

Don't worry if you don't know all those terms yet, especially the various models mentioned such as BERT.  Zero-shot classification means classifying text without seeing any examples - the LLM just gets a prompt with the possible categories.  Few-shot classification means seeing a small number of examples provided in the LLM prompt.  
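
To make the difference between zero-shot and few-shot prompting concrete, here is a minimal sketch that builds both kinds of prompt as plain strings.  The function name `build_classification_prompt`, the categories, and the example texts are all made up for illustration; they are not part of the course package or any API.

```python
def build_classification_prompt(text, categories, examples=None):
    """Build a zero-shot prompt, or a few-shot prompt when examples are given.

    examples is an optional list of (example_text, label) pairs.
    """
    lines = [
        "Classify the text into exactly one of these categories: "
        + ", ".join(categories) + ".",
        "Answer with the category name only.",
        "",
    ]
    # Few-shot: show labeled examples before the text we want classified
    for example_text, label in examples or []:
        lines += [f"Text: {example_text}", f"Category: {label}", ""]
    # The text to classify always comes last, with the label left blank
    lines += [f"Text: {text}", "Category:"]
    return "\n".join(lines)


categories = ["billing", "technical", "account"]  # hypothetical categories

# Zero-shot: the model only sees the category names
zero_shot = build_classification_prompt("I was charged twice this month.", categories)

# Few-shot: the model also sees a couple of labeled examples
few_shot = build_classification_prompt(
    "I was charged twice this month.",
    categories,
    examples=[
        ("The app crashes when I open it.", "technical"),
        ("How do I change my email address?", "account"),
    ],
)

print(zero_shot)
print("-----")
print(few_shot)
```

Either string could then be sent to an LLM as the user prompt.  The only difference is whether labeled examples appear before the text to classify.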

Here are some thoughts on choosing the right approach for a given NLP task:

- **Use API-based LLMs (GPT-4o, Claude, Gemini, etc.) when**:
  - You need **quick, adaptable solutions** without training.
  - You **don’t have much data** for fine-tuning.
  - Privacy and latency are not major concerns.

- **Use Local LLMs (LLaMA, Mistral, Falcon) when**:
  - You need **private, offline inference**.
  - You want **control over deployment** without external dependencies.
  - **Few-shot learning is sufficient**, and you don’t want to fine-tune.

- **Fine-Tune a Model (BERT, BART, T5, RoBERTa) when**:
  - You have **domain-specific data** and need **high accuracy**.
  - Privacy, cost, or latency concerns prevent LLM use.
  - You require **structured, predictable outputs**.

For each NLP task we study over the next several lessons we'll consider all three approaches.  We won't demonstrate using APIs at scale because API use isn't free, but it's very cheap for experimentation and Google's Gemini API is free for testing with rate limits.  To use APIs you'll need to sign up for accounts and get API keys.

## Getting API Keys and a HuggingFace Token

An API key is a private code that allows you to interact with applications running in the cloud or on private servers.  In this section we'll describe how to get API keys and how to get a HuggingFace token.  At the end of the section we'll describe how you can store your API keys.  Generally, you don't want to put your keys directly in notebooks or other places that might be publicly visible.

#### Costs

We'll show more details about pricing later, but here are the basics:

* Google's Gemini API is **free** to use for testing but there are rate limits and daily maximums.  It's cheap to use if you want to do more.
* OpenAI's API is not free to use, but is still cheap to use.  
* HuggingFace is free to use unless you get into some of their (or their affiliates') hosting solutions.  You may not even need the token to get access to everything, but it doesn't hurt.

**You should at least get a Google Gemini API Key and a HuggingFace Token:**

### Obtaining a Google Gemini API Key

To get started with the Gemini API and obtain an API key, follow these steps:

1.  **Go to the Google AI Studio website:** Visit [ai.google.dev](https://ai.google.dev/).
2.  **Sign in with your Google account.**
3.  **Create a new project (if needed):** If you don't have a project, you'll be prompted to create one.
4.  **Get an API key:** Once you have a project, you can generate an API key. This key will be used to authenticate your requests to the Gemini API.
5.  **Store the API key securely:** After obtaining the API key, store it securely. You can set it as an environment variable or store it in a configuration file.

I've found the [Google Gemini API docs](https://ai.google.dev/gemini-api/docs/quickstart?lang=python) to be quite helpful.  

As long as you have a Google account, limited use of the Gemini models is free so you should definitely set up a key for yourself.

### Obtaining an OpenAI API Key

OpenAI doesn't offer a free tier, but their non-reasoning models such as GPT-4o and GPT-4o-mini are quite cheap to use. I've been playing with their API sporadically for months and have yet to spend $15.  We'll show some sample prompts later along with their estimated costs.  You're not required to use the OpenAI API but you can if you're interested.

To get started with the OpenAI API and obtain an API key, follow these steps:

1. **Go to the OpenAI website:** Visit [openai.com](https://www.openai.com/).
2. **Sign up for an account:** If you don't have an account, sign up using your email address.
3. **Log in to your account:** Once you have an account, log in with your credentials.
4. **Buy credit:** Navigate to the billing section and purchase the desired amount of credit. OpenAI offers various pricing plans based on your usage needs.
5. **Generate an API key:** After purchasing credit, go to the API section and generate a new API key. This key will be used to authenticate your requests to the OpenAI API.
6. **Store the API key securely:** After obtaining the API key, store it securely. You can set it as an environment variable or store it in a configuration file.


### Getting a HuggingFace Token

We'll be using many models from the HuggingFace ecosystem in the NLP part of the course.  Some models, like the Llama models from Meta, require you to agree to terms before you download them.  Your access to those models is associated with your HuggingFace token, which is essentially an API key tied to your HuggingFace account.  Don't worry, it's free.

1. **Go to the HuggingFace website:** Visit [huggingface.co](https://huggingface.co/).
2. **Sign up for an account:** If you don't have an account, sign up using your email address or GitHub account.
3. **Log in to your account:** Once you have an account, log in with your credentials.
4. **Navigate to your profile settings:** Click on your profile picture in the top right corner and select "Settings" from the left navigation bar.
5. **Access the API tokens section:** In the settings menu, find and click on "Access Tokens" under the "API tokens" section.  You may have to authenticate here.
6. **Generate a new token:** Click on the "Create new token" button, give your token a name, and select the appropriate scope (e.g., "read" for downloading models). Then, click "Generate".
7. **Store the token securely:** After generating the token, store it securely. You can set it as an environment variable or store it in a configuration file.

### Storing and using your API keys

On my personal computers I store my api keys as environment variables.  Ask an AI how to do this for your machine if you want.  Another way to store them locally is to put them in a file, often called a ".env" file.  For example here are the contents of a sample api_keys.env file:

```
HF_TOKEN=abcdefg
OPENAI_API_KEY=abcdefg
GEMINI_API_KEY=abcdefg
```

Use the `dotenv` library to read the environment variables from the `.env` file. Here's how you can do it:

1. **Install the `python-dotenv` library:** If you haven't already installed it, you can do so using pip:
    ```bash
    pip install python-dotenv
    ```

2. **Create a `.env` file:** Save your API keys in a file named `api_keys.env` (or any name you prefer) in your project directory.

3. **Load the environment variables in your Python script:** Use the following code to load the environment variables from the `.env` file:
    ```python
    from dotenv import load_dotenv
    import os

    # Load environment variables from the .env file
    load_dotenv('path/to/api_keys.env')

    # Access the environment variables
    hf_token = os.getenv('HF_TOKEN')
    openai_api_key = os.getenv('OPENAI_API_KEY')
    gemini_api_key = os.getenv('GEMINI_API_KEY')

    print(f"HuggingFace Token: {hf_token}") #remove these print statements after you've tested this
    print(f"OpenAI API Key: {openai_api_key}")
    print(f"Gemini API Key: {gemini_api_key}")
    ```
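
If you'd rather not print full keys even while testing, one option is to print a masked version instead.  This is only a sketch: `mask_secret` is a hypothetical helper written for this example, not something provided by `python-dotenv` or the course package.

```python
import os


def mask_secret(value, visible=4):
    """Return a masked version of a secret showing only the last few characters."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        # Too short to reveal anything safely, so hide all of it
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Report which keys are available without revealing them
for name in ("HF_TOKEN", "OPENAI_API_KEY", "GEMINI_API_KEY"):
    print(f"{name}: {mask_secret(os.getenv(name))}")
```

A key that is missing shows up as `<not set>`, which is usually all you need to know when debugging a failed request.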

If you're working in CoCalc and have the course package installed, you can edit the file api_keys.env in Lessons/Course_Tools to include your keys. Then when you run the code below in your imports cell, the keys will be read and set:

```python
from introdl.utils import config_paths_keys
paths = config_paths_keys()
```

## Using the APIs

We'll just give you a couple of brief examples and point you toward the documentation in case you want to explore more.  We incorporated Google and OpenAI API use into our course tools which we'll introduce in a bit.  You'll still need api keys to use them though.

### Google Gemini API

If you want to try this now, get your GEMINI_API_KEY, add it to the api_keys.env file in Lessons/Course_Tools, and run the cells below to try a simple Gemini API request.  If necessary, install the `google-genai` package by running `!pip install google-genai` in a code cell.

We included the Jupyter magic command "%%capture" in the next cell to capture the output to keep things clean.  Jupyter magic commands extend the functionality of notebooks beyond standard Python.  You can learn about a few particularly useful [magic commands here](https://www.kdnuggets.com/jupyter-notebook-magic-methods-cheat-sheet).


In [1]:
%%capture
import os
from google import genai
from introdl.utils import config_paths_keys, wrap_print_text
from introdl.nlp import display_markdown

# set keys and paths
paths = config_paths_keys()

# overload print with a version of print that wraps text at 80 characters
print = wrap_print_text(print)

In [2]:
# Calling Gemini API to generate content

client = genai.Client(api_key = os.getenv("GEMINI_API_KEY"))
response = client.models.generate_content(
    model="gemini-2.0-flash", contents="Tell me three interesting facts about space."
)
print(response.text)

Okay, here are three interesting facts about space:

1.  **There's a planet made of diamond:**  55 Cancri e, located 40 light-years
away in the constellation Cancer, is a rocky planet that's roughly twice the
size of Earth and eight times more massive. Scientists believe it's primarily
composed of pure carbon in the form of diamond.  Its estimated value is 26.9
nonillion dollars (that's 26 followed by 30 zeros!).

2.  **Space is not completely silent:** While space is a vacuum and doesn't
transmit sound in the way we experience it on Earth, it does contain plasma,
which can produce electromagnetic waves. NASA has instruments that can translate
these waves into audio, creating eerie and otherworldly sounds. These sounds
aren't "heard" in the traditional sense, but rather detected and converted.

3.  **There are rogue planets with no star:** These are planets that have been
ejected from their original planetary systems, drifting through the galaxy on
their own. Scientists estimate there 

Or we can use introdl.nlp.display_markdown to display the response as formatted markdown in our notebook, like this:

In [9]:
display_markdown(response.text)

Okay, here are three interesting facts about space:

1.  **There's a planet made of diamond:**  55 Cancri e, located 40 light-years away in the constellation Cancer, is a rocky planet that's roughly twice the size of Earth and eight times more massive. Scientists believe it's primarily composed of pure carbon in the form of diamond.  Its estimated value is 26.9 nonillion dollars (that's 26 followed by 30 zeros!).

2.  **Space is not completely silent:** While space is a vacuum and doesn't transmit sound in the way we experience it on Earth, it does contain plasma, which can produce electromagnetic waves. NASA has instruments that can translate these waves into audio, creating eerie and otherworldly sounds. These sounds aren't "heard" in the traditional sense, but rather detected and converted.

3.  **There are rogue planets with no star:** These are planets that have been ejected from their original planetary systems, drifting through the galaxy on their own. Scientists estimate there could be billions of these rogue planets in the Milky Way, outnumbering stars! They are very difficult to detect because they don't reflect light from a star.


There are many things we can do with the API, including sending additional instructions (a system prompt) and configuring how the underlying language model generates the output.  To see more about working directly with the API, refer to Google's [documentation about text generation](https://ai.google.dev/gemini-api/docs/text-generation?lang=python).  We will learn more about how text-generation models work and how they can be configured to alter the results in later lessons.
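
As a rough sketch of what a system prompt and generation settings can look like with the `google-genai` package, the code below passes them through `types.GenerateContentConfig`.  This is based on my reading of the documentation linked above, so treat the parameter names as something to verify there.  The helper names `gemini_settings` and `ask_gemini`, the system prompt, and the specific values are assumptions made up for this example.  The request is only sent if GEMINI_API_KEY is set.

```python
import os


def gemini_settings(system_prompt, temperature=0.2, max_output_tokens=200):
    """Collect generation settings in a plain dict (hypothetical helper)."""
    return {
        "system_instruction": system_prompt,
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
    }


def ask_gemini(prompt, settings, model="gemini-2.0-flash"):
    """Send one prompt to Gemini using the given settings and return the text."""
    # Imported here so the settings helper works even without the package installed
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
    response = client.models.generate_content(
        model=model,
        contents=prompt,
        config=types.GenerateContentConfig(**settings),
    )
    return response.text


settings = gemini_settings("You are a concise science tutor.")

if os.getenv("GEMINI_API_KEY"):
    print(ask_gemini("Tell me one interesting fact about space.", settings))
else:
    print("GEMINI_API_KEY is not set; skipping the request.")
```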



### Using the OpenAI API

After getting your OPENAI_API_KEY and setting it up as an environment variable or in api_keys.env as we did for Gemini, you should be able to run the following cell.  The pattern is the same as in the Gemini example: create a client with your API key, send a request, and read the text of the response.  Here we use the `openai` package with the OPENAI_API_KEY, and we set the model to "gpt-4o-mini", which is currently their cheapest model and quite good for general use.

The next cell shows an example of using the OpenAI API.  We include a system prompt and some configuration parameters as an example.  Temperature is a parameter that controls the randomness of the output.  A temperature of 0 gives nearly deterministic results, and larger values give more random output (the OpenAI API accepts values from 0 to 2).  We'll see more about temperature in Lesson 11.

The cell won't run if you don't have an OPENAI_API_KEY stored in the appropriate environment variable.

In [10]:
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY")
)

sys_instruct="You are a helpful AI assistant who is also sarcastic and talks like a pirate."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    n=1,
    messages=[
        {"role": "system", "content": sys_instruct},
        {
            "role": "user",
            "content": "Tell me three interesting facts about space."
        }
    ],
    temperature=0.1,  # low temperature for more repeatable output
    max_tokens=100    # cap the length of the response (the output below is cut off at this limit)
)

display_markdown(response.choices[0].message.content)

Arrr, matey! Here be three fascinating tidbits 'bout the vastness of space that’ll make ye say “shiver me timbers!” 

1. **The Universe is Expanding**: Aye, just like me belly after a feast o' grog and grub! The universe be stretchin' out faster than a ship in full sail. Scientists reckon it’s expandin’ at an accelerated rate, thanks to a mysterious force they call dark energy. Sounds like a pirate
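
To build some intuition for the temperature setting used above, here is a small, self-contained sketch of the standard idea behind it: the model's scores for candidate next tokens are divided by the temperature before being turned into probabilities with a softmax.  The scores below are made-up numbers, and hosted APIs may handle details differently (for example, a temperature of exactly 0 is typically treated as "always pick the top token" rather than dividing by zero).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At a low temperature almost all of the probability lands on the top-scoring token, so sampling gives nearly the same output every time.  At higher temperatures the probabilities are more even, so the output varies more from run to run.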

### Using Local LLM Models

We'll see more about text generation models in Lesson 11, but they're really easy to use in the HuggingFace ecosystem.  The benefits to running an LLM locally include data security, ease of use, and no subscription fees.  A company wanting to protect its proprietary data may invest in considerable computing infrastructure to deploy larger, private LLM models.  We can mimic this experience by running smaller versions of LLMs like the Llama-3.2-3B model from Meta which, as of early 2025, is a state-of-the-art small text generation model.  We'll use a quantized model where the model weights are stored in 4-bit precision to enable faster inference and lower memory use at the cost of a little precision.

The downside to local models is that you're limited by the hardware you have available which means smaller models and slower results. These small models will demonstrate the ideas, but their performance can't compete with the hosted larger models.  Competitive models are freely available on HuggingFace but they require servers with multiple top-of-the-line GPUs.  

We'll explain more code like this later, but here is a simple way to load and use the model locally using a *pipeline*.  This should automatically detect and use a GPU if one is available.

In [11]:
from transformers import pipeline

chatbot = pipeline(
    "text-generation", 
    model="unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit"
)

# System instruction
sys_instruct = "You are a helpful AI assistant who is also sarcastic and talks like a pirate."

# Construct the chat prompt (Llama models often use specific formatting)
prompt = f"<|system|>\n{sys_instruct}\n<|user|>\nTell me three interesting facts about space.\n<|assistant|>"

# Generate response
response = chatbot(
    prompt, 
    max_length=200, 
    temperature=0.1
)

# Print the model's output
display_markdown(response[0]['generated_text'])


<|system|>
You are a helpful AI assistant who is also sarcastic and talks like a pirate.
<|user|>
Tell me three interesting facts about space.
<|assistant|> 
Arrrr, ye landlubber! Ye be wantin' to know some swashbucklin' space facts, eh? Alright then, settle yerself down with a pint o' grog and listen close:

1. **The universe be full o' mysteries, matey!** Did ye know that there be a giant storm on Jupiter that's been ragin' fer centuries? The Great Red Spot, it's called. It's a storm so big that three Earths could fit inside it, and it's been churnin' away fer so long that it's lost count o' the years!
2. **Space be full o' weird and wonderful things, me hearty!** Did ye know that there be a type o' star called a "red

Notice that the response also includes the system and input prompts.  That's typical of local LLM models from HuggingFace.  We'll learn more about system prompts later.
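
If you only want the newly generated text, one simple approach is to remove the echoed prompt from the front of the output.  The sketch below uses a hypothetical helper, `strip_prompt`, and a stand-in string in place of a real model response.  It is not how the course package does its cleaning.  As far as I know, the text-generation pipeline also accepts `return_full_text=False` to leave the prompt out in the first place, but check the HuggingFace documentation for your version.

```python
def strip_prompt(generated_text, prompt):
    """Remove the echoed prompt from the start of the generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text.strip()


prompt = "<|user|>\nTell me three interesting facts about space.\n<|assistant|>"

# Stand-in for response[0]['generated_text'] from a text-generation pipeline
generated = prompt + " \nArrrr, ye landlubber!"

print(strip_prompt(generated, prompt))
```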

In Lesson 11 we'll see how to use lower-level HuggingFace tools to get more control over our local LLM models or to be able to fine-tune the models.  For now, I encourage you to use `llm_configure` and `llm_generate` from our course package as we demonstrate in the next section.  As a bonus, `llm_generate`, by default, cleans the response text to remove the input and system prompts.



## Using the LLM tools in the Course Package

We included some functions in the course package to help you use LLMs from Python.  These are the kinds of helper functions you'd write for yourself to expedite sending prompts to an LLM and getting responses.  `llm_configure` is used to choose a model and set some configuration options, while `llm_generate` is used for prompting.  These tools can be used to access local models as well as the Gemini and OpenAI APIs.

### Running Local Models

Here's an example where we load a local model called Mistral-7B-Instruct which is a small LLM from Mistral that has been fine-tuned to follow instructions.  Even with a GPU you'll likely notice that using a local LLM is slower than using one of the APIs like OpenAI or Gemini.

In [13]:
from introdl.nlp import llm_configure, llm_generate
from introdl.utils import wrap_print_text

print = wrap_print_text(print)

mistral_config = llm_configure("mistral-7B")
response = llm_generate(mistral_config, "What is the capital of France?")
print(response)

🚀 Loading model: unsloth/mistral-7b-instruct-v0.3-bnb-4bit (this may take a while)...
🟢 Model unsloth/mistral-7b-instruct-v0.3-bnb-4bit loaded successfully.

The capital of France is Paris.


Here, we'll repeat one of our prompts from the previous section.  This also shows how to pass a system prompt.  Note that `llm_generate` defaults to producing at most 200 new tokens.  We'll also switch to the smaller Llama-3.2-3B model because it's faster.  The actual model that gets loaded is a quantized version of the model that has been fine-tuned to follow instructions.

In [14]:

llama32_config = llm_configure("llama-3p2-3B")
sys_instruct="You are a helpful AI assistant who is also sarcastic and talks like a pirate."
response = llm_generate(llama32_config, "Tell me three interesting facts about space.", system_prompt=sys_instruct)
display_markdown(response)



🛑 Unloading model: unsloth/mistral-7b-instruct-v0.3-bnb-4bit from GPU...
✅ Model unsloth/mistral-7b-instruct-v0.3-bnb-4bit has been fully unloaded.
🚀 Loading model: unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit (this may take a while)...
🟢 Model unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit loaded successfully.



Yer lookin' fer some swashbucklin' space facts, eh? Alright then, settle yerself down with a pint o' grog and listen close:
1. **Space be full o' mysteries**: Did ye know that there's still so much we don't know about the universe? Scientists estimate that only about 10% o' it has been explored or mapped! That means there be plenty o' hidden treasures waitin' to be discovered.
2. **Black holes be sneaky devils**: These cosmic monsters can warp light around 'em, makin' stars and galaxies disappear from view! But here be the thing, matey - black holes might not be as empty as they seem. Some theories suggest they could even contain other worlds... (But don't go thinkin' ye'll find yer treasure in one, savvy?)
3. **The cosmos have their own rhythm**: The universe is filled with rhythms and cycles, just like

To allow the model to generate more output tokens, pass `max_new_tokens=500` or some other suitable value to `llm_generate`.

In [15]:
prompt = """Write a short story about a cat who learns to play the piano."""

response = llm_generate(llama32_config, prompt, max_new_tokens=500)
display_markdown(response)

Whiskers, a sleek black feline with bright green eyes, had always been fascinated by the sounds emanating from the grand piano in her owner's living room. Every day, she'd sit by its side and watch as her human mom tickled the keys, creating beautiful melodies.
One evening, while exploring the instrument, Whiskers' curious paws accidentally pressed down on two adjacent keys. To everyone's surprise, a gentle tune filled the air – it was "Twinkle, Twinkle Little Star." Enthralled, Whiskers tried again, this time managing to produce another note or two of the same melody.
Her owner, delighted by Whiskers' natural talent, began teaching the cat how to play more complex pieces. At first, Whiskers struggled, her paw movements clumsy but determined. But every session ended with laughter and encouragement for both mother-daughter duo.
As days turned into weeks, Whiskers improved dramatically. Her agility allowed her to dance across the keyboard with ease, nimbly navigating between notes with uncanny precision. She developed finger independence, capable of plucking multiple strings simultaneously like a skilled violinist.
Soon enough, music became an integral part of their home life. When company came over, guests marveled at the enchanting concerts performed solely by a talented feline pianist. Word spread throughout town, drawing visitors eager to witness the extraordinary collaboration between Whiskers and her devoted mentor.
Years passed, and Whiskers grew wise beyond her years. Yet whenever sorrow struck, she would climb onto the piano bench beside her human mom, wrap herself around one leg, and tenderly stroke her fingers against the keys. As soothing harmonies flowed through their shared space, all pain melted away.
With each passing season, love and devotion remained constant companions for these remarkable pair, bonded together forever through the universal language they called music. And so, amidst whispers of praise echoing everywhere, Whiskers stood proudly as a testament to what could be achieved when curiosity met passion.

OK, it's probably not a great story, but we're just getting the idea of how locally run models can be used to respond to prompts.

### Accessing the APIs

The nice thing about using our course package helper functions is that it's simple to try different models and APIs using the same syntax so we can focus on the programmatic use of LLMs.  `llm_generate` also cleans the returned responses so that they don't include the input prompt and other extras.

For example, here is how to use the most recent Gemini model (as of February 19, 2025).  Note: you'll need to have already set the GEMINI_API_KEY environment variable as we did previously.

In [16]:
gemini_config = llm_configure("gemini-2.0-flash")
response = llm_generate(gemini_config, "Tell me three interesting facts about space.", 
                        system_prompt=sys_instruct,
                        max_new_tokens=500)
display_markdown(response)

Aye, I'll spin ye a yarn about the cosmos, I will.
1.  **Space be silent as a ghost ship:** Sound needs a medium to travel, like air or water. But space be a vast emptiness, so no one can hear ye scream... or sing a sea shanty, for that matter.
2.  **A day on Venus be longer than a year:** That's right, ye scurvy dog! Venus takes longer to rotate on its axis than it does to orbit the sun. A Venusian day be about 243 Earth days, while a Venusian year be only 225 Earth days.
3.  **There be a planet made of diamond:** They call it 55 Cancri e, and it be twice the size of Earth and eight times the mass. Scientists reckon it be made mostly of pure carbon that's been compressed into a giant diamond. Now that be a treasure even ol' Captain Jack Sparrow would envy!
Arrr, there ye have it! Three bits o' space trivia to shiver yer timbers!

Or to use OpenAI's gpt-4o-mini model (must have OPENAI_API_KEY):

In [18]:
openai_config = llm_configure("gpt-4o-mini")
response = llm_generate(openai_config, "Tell me three interesting facts about space.", 
                        system_prompt=sys_instruct,
                        max_new_tokens=500)
display_markdown(response)

Arrr, matey! Here be three treasure troves o’ knowledge ‘bout that vast, dark abyss we call space:
1. **The Great Void**: Did ye know that most o’ space is empty? Aye, it be a colossal vacuum, with less than one atom per cubic meter in some parts. So if ye ever feel lonely, just remember, ye be in good company with the emptiness!
2. **Time Dilation**: In the vast reaches o’ the cosmos, time be nothin’ but a construct! If ye were to sail close to a black hole, time would slow down for ye compared to yer mates far away. So if ye ever wanted to be a time traveler, just find yerself a black hole—though I reckon ye won’t be comin’ back to tell the tale!
3. **Galactic Cannibalism**: Believe it or not, galaxies can be downright greedy! They often gobble up smaller galaxies, like a pirate plunderin’ treasure. Our Milky Way be on a collision course with the Andromeda galaxy, and they be settin’ sail for a grand feast in about 4.5 billion years. So, no rush, right?
There ye have it, savvy? Space be a wild place!

Here's how you can see all of the models that are currently available through our course package.

In [20]:
from introdl.nlp import llm_list_models
llm_list_models();

Available models:
Short Name Models:
  llama-3p1-8B => unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
  mistral-7B => unsloth/mistral-7b-instruct-v0.3-bnb-4bit
  llama-3p2-3B => unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit
  gemini-flash-lite => gemini-2.0-flash-lite-preview-02-05
  gemini-flash => gemini-2.0-flash
OpenAI Models:
  gpt-4o
  gpt-4o-mini
  o1-mini
  o3-mini
Gemini Models:
  gemini-2.0-flash-lite-preview-02-05
  gemini-2.0-flash

To use an OPENAI or GEMINI model, set the appropriate environment variable:
OPENAI_API_KEY or GEMINI_API_KEY
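
Before calling an API model, it can help to confirm that the key it needs is actually set. The following is a minimal sketch using only the standard library; the helper name `missing_api_keys` is hypothetical and is not part of the course package.

```python
import os

def missing_api_keys(required=("OPENAI_API_KEY", "GEMINI_API_KEY"), env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the process environment (os.environ). This is a
    hypothetical helper for illustration, not part of introdl.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

missing = missing_api_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys are set.")
```

If a key is reported missing, set the environment variable before starting the notebook, then rerun the check.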


### Pricing

As of February 11, 2025, Google's API pricing for its Gemini models is as follows:

| Model                     | Input Tokens (per 1M) | Output Tokens (per 1M) | Context Length | Modalities Supported        |
|---------------------------|-----------------------|------------------------|----------------|-----------------------------|
| **Gemini 2.0 Flash**      | $0.10                 | $0.40                  | 1M             | Text, Images, Video, Audio* |
| **Gemini 2.0 Flash Lite** | $0.075                | $0.30                  | 1M             | Text, Images, Video, Audio  |

*Audio costs more.

A nice thing about the Gemini models is that they support free, limited API use for testing. For Flash / Flash Lite, the free tier is limited to 30 / 15 requests per minute or 1500 requests per day. You can learn more about Gemini [pricing here](https://ai.google.dev/pricing#2_0flash).
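
If you send many requests in a loop, one simple way to stay under a requests-per-minute limit is to space the calls out evenly. Here is a minimal sketch under that assumption; the class `MinIntervalThrottle` is hypothetical (it is not part of the course package or any Gemini client), and the rate of 15 is only an example value.

```python
import time

class MinIntervalThrottle:
    """Space out calls so that at most `requests_per_minute` start per minute.

    Hypothetical helper for illustration. `clock` and `sleep` can be
    swapped out, which makes the class easy to test without waiting.
    """

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute  # seconds between calls
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time at which the previous call was released

    def wait(self):
        """Block until the minimum interval has passed; return seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
            if delay > 0:
                self._sleep(delay)
        self._last_call = now + delay
        return delay

# Example: allow at most 15 requests per minute (one every 4 seconds).
throttle = MinIntervalThrottle(requests_per_minute=15)
```

To use it, call `throttle.wait()` immediately before each API request. Even spacing is conservative: it never allows bursts, and it does nothing about the separate per-day limit.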


As of February 7, 2025, OpenAI's API pricing for various models is as follows:

| Model           | Input Tokens (per 1M) | Output Tokens (per 1M) | Context Length | Modalities Supported |
|-----------------|-----------------------|------------------------|----------------|----------------------|
| **OpenAI o1**   | $15                   | $60                    | 200k           | Text and Vision      |
| **OpenAI o3-mini** | $1.10               | $4.40                  | 200k           | Text                 |
| **GPT-4o**      | $2.50                 | $10                    | 128k           | Text and Vision      |
| **GPT-4o mini** | $0.15                 | $0.60                  | 128k           | Text and Vision      |

These models span a wide price range, from $0.15 per 1M input tokens for GPT-4o mini to $15 for o1. For more detail, see OpenAI's official API [pricing page](https://openai.com/api/pricing/).


If you set the cost per million tokens when calling llm_configure, you can see the estimated cost of an API call by passing estimate_cost=True to llm_generate, like this:



In [20]:
openai_config = llm_configure("gpt-4o-mini", cost_per_M_input=0.15, cost_per_M_output=0.60)
response = llm_generate(openai_config, "Tell me five dad jokes.", 
                        system_prompt=sys_instruct,
                        max_new_tokens=500,
                        estimate_cost=True)
display_markdown(response)

💰 Estimated Cost: $0.000093 (Input: 33.0 tokens, Output: 147.0 tokens)


Arrr, matey! Here be five dad jokes fer ye, fit fer a scallywag’s laughter:
1. Why did the pirate go to school?  
   To improve his "arrrticulation!"
2. What do ye call a fish with no eyes?  
   Fshhh! (Get it? No eyes, no "I"!)
3. Why don’t skeletons fight each other?  
   They don’t have the guts, savvy?
4. How do ye organize a space party?  
   Ye planet, of course!
5. What do ye call a fake noodle?  
   An impasta! Arrr, I hope ye be laughing, or I’ll walk the plank!
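
The estimate printed above follows from simple arithmetic on the token counts and the per-million-token prices. The sketch below reproduces it; the function `estimate_api_cost` is a hypothetical stand-in written for illustration, and the course package may compute its estimate differently.

```python
def estimate_api_cost(input_tokens, output_tokens, cost_per_M_input, cost_per_M_output):
    """Return the estimated cost in dollars for one request.

    Prices are given in dollars per one million tokens.
    """
    input_cost = input_tokens / 1_000_000 * cost_per_M_input
    output_cost = output_tokens / 1_000_000 * cost_per_M_output
    return input_cost + output_cost

# Token counts and gpt-4o-mini prices from the example above.
cost = estimate_api_cost(33, 147, cost_per_M_input=0.15, cost_per_M_output=0.60)
print(f"Estimated Cost: ${cost:.6f}")  # prints "Estimated Cost: $0.000093"
```

Most of the cost comes from the output: 147 output tokens at $0.60 per million is about $0.000088, while 33 input tokens at $0.15 per million is about $0.000005.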

In [23]:
!pip install ../Course_Tools/introdl/


In [1]:
from introdl.nlp import JupyterChat

chat = JupyterChat("gemini-2.0-flash")

