# Hands-On Lab: A Beginner's Guide to Using LLMs

Welcome to your first hands-on lab! In this notebook, we'll move from theory to practice. You'll interact with several influential Large Language Models in four ways: a closed-source API model (OpenAI), open-weight models served through a hosted API (Groq), an open-source model from a hub (Hugging Face), and a model running on your own machine (Llama).

### **PART 1 - Interacting with the OpenAI API**

Our first exercise is to connect to a state-of-the-art model through an API. OpenAI's models are a good starting point because they are powerful and easy to call from a small Python library. Calling a hosted model through an API like this is one of the most common ways businesses integrate AI into their applications.

The cell below uses the %pip command to install the official openai Python library into our notebook environment. This library provides all the necessary tools to easily communicate with the OpenAI API.

In [None]:
%pip install openai

In [None]:
import os
from openai import OpenAI

# IMPORTANT: Replace "YOUR_API_KEY" with the key you got from the OpenAI Platform.
# For better security, it's best to set this as an environment variable,
# but we'll put it here for this simple first test.
api_key = "YOUR_API_KEY"

# Stop early if the placeholder was never replaced with a real key.
if api_key == "YOUR_API_KEY":
    raise ValueError("API key not found. Please add your API key to the `api_key` variable.")

client = OpenAI(api_key=api_key)

# Our prompt to the model
prompt_text = "Write a catchy and short product description for a new brand of smart coffee mug that keeps your drink at the perfect temperature."

# Sending the request to the model
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful marketing assistant."},
        {"role": "user", "content": prompt_text}
    ]
)

# Printing the model's generated content
print(response.choices[0].message.content)
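
As the comment in the cell above notes, an environment variable is a safer home for the key than the notebook itself. A minimal sketch of that approach follows. The helper name `load_api_key` is hypothetical; `OPENAI_API_KEY` is the variable name the `openai` library looks for by default.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail with a clear message instead of sending an empty key to the API.
        raise ValueError(f"Environment variable {env_var} is not set.")
    return key
```

With the key loaded this way, the client can be created as `OpenAI(api_key=load_api_key())`, and the key never appears in the notebook.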

### **Part 2 - Interacting with Groq**

**Text Generation Example**

In [2]:
!pip install groq

In [5]:
from groq import Groq
from google.colab import userdata

client = Groq(
    api_key=userdata.get('GROQ_API_Key')
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Which province is the biggest in Rwanda ?",
        }
    ],
    model="llama-3.3-70b-versatile",
)

print(chat_completion.choices[0].message.content)

**1.1 Performing a Basic Chat Completion** <br>

The simplest way to use the Chat Completions API is to send a list of messages and receive a single response. Messages are provided in chronological order, with each message containing a role ("system", "user", or "assistant") and content.

In [None]:
from groq import Groq
from google.colab import userdata

import os

client = Groq(
    # api_key
    api_key=userdata.get('GROQ_API_Key')
)

user_prompt = "Where is Rwanda located ?"

completion = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
      {
        "role": "user",
        "content": user_prompt
      }
    ],
    temperature=1,
    max_completion_tokens=8192,
    top_p=1,
    reasoning_effort="medium",
    stream=True,
    stop=None
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
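
With `stream=True`, the reply arrives as a sequence of chunks, and the loop above prints each piece as it comes in. If you also want the full reply as one string, a sketch like the following may help. The helper `collect_stream` and the stand-in chunks are hypothetical; the stand-ins only mimic the `choices[0].delta.content` shape used in the cell above so that the example runs without an API call.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text pieces of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        piece = chunk.choices[0].delta.content
        if piece:  # some chunks carry no text (content is None)
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [fake_chunk("Hello, "), fake_chunk("world!"), fake_chunk(None)]
print(collect_stream(fake_stream))  # prints: Hello, world!
```

A stream can only be consumed once, so use either the printing loop or a collector like this on a given `completion`, not both.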

In [None]:
from groq import Groq
from google.colab import userdata

client = Groq(
    api_key=userdata.get('GROQ_API_Key')
)

user_prompt = "Where is Rwanda located ?"

completion = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful geography assistant."
      },
      {
        "role": "user",
        "content": "Can you tell me about the capital of France?"
      },
      {
          "role": "assistant",
          "content": "The capital of France is Paris."
      },
      {
        "role": "user",
        "content": user_prompt
      }
    ],
    temperature=1,
    max_completion_tokens=8192,
    top_p=1,
    reasoning_effort="medium",
    stream=True,
    stop=None
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
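
The cell above passes earlier turns back to the model so that it has the conversation so far as context. A small sketch of how such a list could be assembled is shown below. The helper `build_messages` is hypothetical and not part of the Groq SDK; it only produces the list of role/content dictionaries that `messages=` expects.

```python
def build_messages(system_prompt, history, user_prompt):
    """Assemble a chronological message list for a chat completion request.

    history is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user message always goes last.
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_messages(
    "You are a helpful geography assistant.",
    [("Can you tell me about the capital of France?", "The capital of France is Paris.")],
    "Where is Rwanda located ?",
)
print([m["role"] for m in messages])  # prints: ['system', 'user', 'assistant', 'user']
```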

**Performing a Chat Completion with a Stop Sequence**  <br>
Stop sequences allow you to control where the model should stop generating. When the model encounters any of the specified stop sequences, it will halt generation at that point. This is useful when you need responses to end at specific points.

In [6]:
from groq import Groq
from google.colab import userdata

client = Groq(
    api_key=userdata.get('GROQ_API_Key')
)

chat_completion = client.chat.completions.create(
    #
    # Required parameters
    #
    messages=[
        # Set an optional system message. This sets the behavior of the
        # assistant and can be used to provide specific instructions for
        # how it should behave throughout the conversation.
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        # Set a user message for the assistant to respond to.
        {
            "role": "user",
            "content": "Count to 10.  Your response must begin with \"1, \".  example: 1, 2, 3, ...",
        }
    ],

    # The language model which will generate the completion.
    model="llama-3.3-70b-versatile",

    #
    # Optional parameters
    #

    # Controls randomness: lowering results in less random completions.
    # As the temperature approaches zero, the model will become deterministic
    # and repetitive.
    temperature=0.5,

    # The maximum number of tokens to generate. Requests can use up to
    # 2048 tokens shared between prompt and completion.
    max_completion_tokens=1024,

    # Controls diversity via nucleus sampling: 0.5 means half of all
    # likelihood-weighted options are considered.
    top_p=1,

    # A stop sequence is a predefined or user-specified text string that
    # signals an AI to stop generating content, ensuring its responses
    # remain focused and concise. Examples include punctuation marks and
    # markers like "[end]".
    # For this example, we will use ", 6" so that the llm stops counting at 5.
    # If multiple stop values are needed, an array of string may be passed,
    # stop=[", 6", ", six", ", Six"]
    stop=", 6",

    # If set, partial message deltas will be sent.
    stream=False,
)

# Print the completion returned by the LLM.
print(chat_completion.choices[0].message.content)
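
To build intuition for what a stop sequence does, the effect can be imitated locally on an ordinary string. This is only an illustration of the observable behavior (the output ends just before the first stop sequence), not a description of how Groq implements it; the function `apply_stop` is hypothetical.

```python
def apply_stop(text, stop):
    """Cut text just before the earliest occurrence of any stop sequence."""
    stops = [stop] if isinstance(stop, str) else list(stop)
    cut = len(text)
    for sequence in stops:
        index = text.find(sequence)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


full_count = "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
print(apply_stop(full_count, ", 6"))  # prints: 1, 2, 3, 4, 5
```

Note that the stop sequence itself is not part of the result, which is why the count ends at 5 rather than at 6.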

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [12]:
import os
from groq import Groq
from google.colab import userdata
from pathlib import Path

client = Groq(
    api_key=userdata.get('GROQ_API_Key')
)

# Specify the directory and filename to save the speech file
speech_file_path = Path("/content") / "speech.wav" # Save to the /content directory
response = client.audio.speech.create(
  model="playai-tts",
  voice="Aaliyah-PlayAI",
  response_format="wav",
  input="My name is Kevin and I am from RRA ",
)

# Write the audio content to the file in chunks
with open(speech_file_path, "wb") as f:
    for chunk in response.iter_bytes():
        f.write(chunk)
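
After the file is written, a quick sanity check is to read it back and compute its duration. The sketch below uses Python's standard `wave` module and assumes the saved file is a standard PCM WAV, which `wave` can read; if the service returns a different WAV encoding, `wave.open` raises `wave.Error`. The helper name `wav_duration_seconds` is hypothetical.

```python
import wave


def wav_duration_seconds(path) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        # Duration is the number of audio frames divided by frames per second.
        return wav_file.getnframes() / wav_file.getframerate()
```

For example, `wav_duration_seconds("/content/speech.wav")` should return a small positive number if the speech file was generated correctly.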

**1.2 Speech to Text** <br>
The Groq API also offers fast speech-to-text. The cell below uses the translations endpoint, which transcribes an audio file and returns the text in English.

In [None]:
import os
from groq import Groq
from google.colab import userdata

# from google.colab import drive
# drive.mount('/content/drive')

# Initialize the Groq client
client = Groq(
    api_key=userdata.get('GROQ_API_Key')
)

# Specify the path to the audio file
filename = "/content/sample_data/Sample_Audio_File.mp4" # Replace with your audio file!

# Open the audio file
with open(filename, "rb") as file:
    # Create a translation of the audio file
    translation = client.audio.translations.create(
      file=(filename, file.read()), # Required audio file
      model="whisper-large-v3", # Required model to use for translation
      prompt="Specify context or spelling",  # Optional
      response_format="json",  # Optional
      temperature=0.0  # Optional
    )
    # Print the translation text
    print(translation.text)

**1.3 Vision Example**

In [30]:
from groq import Groq
from google.colab import userdata
import os
import base64
import mimetypes


client = Groq(
    api_key=userdata.get('GROQ_API_Key')
)

# Specify the path to the local image file
image_path = "/content/inspiration_2.jpeg"

# Detect MIME type
mime_type, _ = mimetypes.guess_type(image_path)
if mime_type is None:
    mime_type = "image/jpeg"  # Default fallback

# Read and encode the image
with open(image_path, "rb") as image_file:
    image_data = image_file.read()

base64_image = base64.b64encode(image_data).decode('utf-8')

completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "List what you observe in this photo in JSON format."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{mime_type};base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    temperature=1,
    max_completion_tokens=1024,
    top_p=1,
    stream=False,
    response_format={"type": "json_object"},
    stop=None,
)

print(completion.choices[0].message.content)
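
Because the request sets `response_format={"type": "json_object"}`, the reply is a JSON string rather than free text, so it can be turned into a Python dictionary. The sketch below shows one way to do that defensively. The helper `parse_json_reply` and the sample reply are hypothetical; the keys a real reply contains depend on the model and the image.

```python
import json


def parse_json_reply(reply_text: str) -> dict:
    """Parse a model reply that is expected to be a JSON object."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model reply was not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level.")
    return data


# Hypothetical reply, used only so the example runs without an API call.
sample_reply = '{"objects": ["mug", "table"]}'
observations = parse_json_reply(sample_reply)
print(observations["objects"])  # prints: ['mug', 'table']
```

In the notebook, the same helper could be applied to `completion.choices[0].message.content`.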

### Part 3: Using an Open-Source Model from Hugging Face

Next, we'll explore the world of open-source AI with Hugging Face.

Hugging Face is a massive community hub where developers share thousands of pre-trained models. We'll use their transformers library, which makes it incredibly simple to download and use these models for specific tasks.

First, we need to install the transformers library. We also install torch, which is the underlying machine learning framework that transformers uses to run the models.

In [None]:
%pip install transformers torch

In this cell, we'll perform a text summarization task. The pipeline function from the transformers library is a high-level helper that abstracts away a lot of complex code. We simply tell it we want a "summarization" pipeline and which model to use (facebook/bart-large-cnn). The library handles the model download and setup for us.

In [3]:
from transformers import pipeline

# Load the summarization pipeline. The model will be downloaded on the first run.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

long_text = """
Generative AI is a type of artificial intelligence (AI) that can create new and original content, such as text, images, music, and code.
Unlike traditional AI systems that are designed to recognize patterns and make predictions, generative AI systems can generate novel outputs after being trained on massive amounts of data.
This capability has led to a wide range of applications, revolutionizing industries from entertainment and art to software development and scientific research.
However, the rapid advancement of generative AI also raises important ethical considerations, including the potential for misuse, the spread of misinformation, and biases present in the training data.
"""

# Generate the summary
summary = summarizer(long_text, max_length=50, min_length=25, do_sample=False)

# Print the result
print(summary[0]['summary_text'])
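
The `pipeline` helper wraps three steps: tokenizing the input, generating output tokens, and decoding them back into text. A rough sketch of those steps written out by hand is shown below, assuming the same `facebook/bart-large-cnn` checkpoint. The function name `summarize_manually` is hypothetical, and the real pipeline applies some extra defaults, so its output may differ slightly.

```python
def summarize_manually(text, model_name="facebook/bart-large-cnn"):
    """Summarize text with explicit tokenize, generate, and decode steps."""
    # Imported inside the function so that defining it does not load the libraries.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Step 1: turn the text into token IDs (PyTorch tensors), truncating long inputs.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

    # Step 2: generate summary token IDs with the same settings as the pipeline call.
    output_ids = model.generate(**inputs, max_length=50, min_length=25, do_sample=False)

    # Step 3: convert the generated IDs back into readable text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize_manually(long_text)` downloads the model on first use, just as the pipeline does.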

### Running a Llama Model Locally with Ollama
This next step is a game-changer: running a powerful LLM directly on your own computer. This provides maximum privacy and control. We'll use a tool called Ollama that makes this process surprisingly easy.

⚠️ Note: The following commands are not for this notebook. You need to run them in your computer's own terminal (or Command Prompt/PowerShell on Windows).

Step 1: Download and install Ollama from the official website.

Step 2: After installing, open your terminal and run the commands below. They download the Llama 3 model and the nomic-embed-text embedding model (this may take a while), then start a chat session with Llama 3 right in your terminal.

### THIS COMMAND IS FOR YOUR TERMINAL, NOT THE NOTEBOOK
ollama pull llama3

ollama pull nomic-embed-text

ollama run llama3
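
Besides the terminal chat, Ollama serves a local HTTP API while it is running, by default at `http://localhost:11434`. The sketch below shows how a Python script on your own machine (not in Colab) could send a prompt to it using only the standard library. The helper names `build_generate_request` and `ask_local_llama` are hypothetical, and the sketch assumes the default port and that `llama3` has already been pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes Ollama's default port


def build_generate_request(prompt, model="llama3"):
    """Build a POST request for Ollama's generate endpoint."""
    # stream=False asks for one complete JSON reply instead of a chunked stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_llama(prompt, model="llama3"):
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]
```

Calling `ask_local_llama("Why is the sky blue?")` only works while the Ollama server is running on the same machine.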

### Part 4: Comparative Exercise
To wrap up, let's see how these different models handle the exact same task. This will highlight the unique "personalities" and capabilities of each. Our task is to explain a complex topic to a child.

First, let's ask our OpenAI model. The following cell sends the prompt to gpt-3.5-turbo.

In [None]:
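# `client` was reassigned to the Groq client in Part 2, so create a separate OpenAI client here.
# This assumes `api_key` is still defined from the Part 1 setup cell.
from openai import OpenAI

openai_client = OpenAI(api_key=api_key)
response = openai_client.chat.completions.create(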
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Explain the concept of 'machine learning' to a 10-year-old in three sentences."}
    ]
)

print("--- OpenAI's Response ---")
print(response.choices[0].message.content)

Next, we give the same prompt to a small open-source model from Hugging Face. Note that distilgpt2 is a small base model that was not instruction-tuned, so expect it to continue the text rather than follow the instruction.

In [None]:
from transformers import pipeline

# We will use a text-generation model for this task
generator = pipeline('text-generation', model='distilgpt2')

# Generate a response
response = generator(
    "Explain the concept of 'machine learning' to a 10-year-old in three sentences.",
    max_length=60,
    num_return_sequences=1
)

print("--- Hugging Face's Response ---")
print(response[0]['generated_text'])

Finally, it's time to test your local Llama 3 model.

Go to the terminal window where you have ollama run llama3 active and type in the same prompt.

Prompt for your terminal: Explain the concept of 'machine learning' to a 10-year-old in three sentences.

Compare the output you get from Llama 3 with the outputs from OpenAI and Hugging Face in this notebook. Discuss with your peers which one you think did the best job and why.
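
To support the discussion, a few simple numbers can be computed for each response, such as its word count and whether it kept to the requested three sentences. The sketch below is a crude heuristic with hypothetical helper names; it splits on `.`, `!`, and `?`, so abbreviations such as "e.g." are counted as extra sentences.

```python
import re


def count_sentences(text: str) -> int:
    """Roughly count sentences by splitting on ., ! and ? (heuristic only)."""
    return len([part for part in re.split(r"[.!?]+", text) if part.strip()])


def summarize_responses(responses: dict) -> list:
    """Return (model name, word count, sentence count) for each response."""
    return [
        (name, len(text.split()), count_sentences(text))
        for name, text in responses.items()
    ]


# Paste each model's actual output in place of these placeholder strings.
placeholder_responses = {"Model A": "First sentence. Second one! A third?"}
for name, words, sentences in summarize_responses(placeholder_responses):
    print(f"{name}: {words} words, {sentences} sentences")
```

These counts say nothing about quality, so treat them as a starting point for the comparison rather than a verdict.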