# Using Open Source LLMs with Groq Cloud

## Overview
In this notebook, we'll learn how to use **Groq Cloud** to access and run open-source Large Language Models (LLMs) like Meta's Llama models.

### What is Groq?
Groq is a cloud platform that provides ultra-fast inference for open-source LLMs. It uses custom hardware called **Language Processing Units (LPUs)** designed specifically for AI inference, making it one of the fastest ways to run LLMs.

### Key Benefits of Groq:
- **Blazing Fast Inference**: LPU technology provides exceptionally low latency
- **Free Tier Available**: Generous free tier for experimentation and learning
- **Access to Latest Models**: Run state-of-the-art open-source models like Llama, Mixtral, and more
- **Simple API**: OpenAI-compatible API for easy integration

### What You'll Learn:
1. How to set up Groq API authentication
2. How to create a Groq client
3. How to send prompts and receive completions
4. How to use different Llama model variants


## Get a Groq API Key

You need an API key to access models through Groq's platform:

- Groq API Key: Go [here](https://console.groq.com/keys) and create an API key. You need to set up an account first, which is free. Groq has a generous free tier, and paid plans are available if you need more.


1. After creating your account, go to [Groq Cloud -> Create API Key](https://console.groq.com/keys) and create a new API key, as shown below.

![](https://i.imgur.com/tgHXlcV.png)

2. Remember to __save__ your key somewhere safe, because it is shown only once (see below). Copy it into a secure local file so you can use it later. If you lose it, you can create a new key at any time.

![](https://i.imgur.com/Q27AgA1.png)

In [16]:
# ============================================================================
# STEP 1: Load Environment Variables
# ============================================================================
# We use python-dotenv to securely load our API key from a .env file
# This is a best practice to keep sensitive credentials out of your code

from dotenv import load_dotenv
import os

# Load all environment variables from .env file into the environment
load_dotenv()

# Retrieve the Groq API key from environment variables
# Your .env file should contain: GROQ_API_KEY=your_api_key_here
groq_key = os.getenv("GROQ_API_KEY")

# Verify the key was loaded (optional - for debugging)
# print("API Key loaded:", "Yes" if groq_key else "No")

## Step 2: Initialize the Groq Client

The Groq Python SDK provides a simple interface to interact with Groq's API. The client handles:
- Authentication with your API key
- Request/response serialization
- Error handling and retries
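
A missing or empty key is easier to diagnose if you check for it explicitly before creating the client. The following is a minimal sketch, assuming the key lives in the `GROQ_API_KEY` environment variable as in Step 1. `require_api_key` is a hypothetical helper name, not part of the Groq SDK.

```python
import os


def require_api_key(var_name="GROQ_API_KEY"):
    """Return the API key stored in `var_name`, or raise a clear error.

    Hypothetical helper for illustration; it is not part of the Groq SDK.
    """
    # Treat a whitespace-only value the same as a missing one
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Add it to your .env file or export it "
            "in your shell before creating the client."
        )
    return key


# Example usage (uncomment once your key is configured):
# groq_key = require_api_key()
```

Calling this right after `load_dotenv()` turns a silent `None` into an error message that names the missing variable.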


In [None]:
# ============================================================================
# STEP 2: Create the Groq Client
# ============================================================================
# The Groq SDK follows a similar pattern to OpenAI's SDK, making it easy
# to switch between providers

from groq import Groq

# Initialize the Groq client with your API key
# This client will be used to make all API calls to Groq's servers
groq_client = Groq(api_key=groq_key)

# Note: You can also set GROQ_API_KEY as an environment variable
# and initialize without passing the key: groq_client = Groq()

## Step 3: Create a Helper Function for Chat Completions

Let's create a reusable function that wraps the Groq API call. This function will:
- Accept a user prompt and optional model parameter
- Format the message in the chat completion format
- Return just the text content from the response

### Understanding the Chat Completion Format
The chat completion API uses a **messages** array where each message has:
- `role`: Can be "system", "user", or "assistant"
- `content`: The actual text content of the message
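
To make the format concrete, here is a small sketch that builds a messages list with an optional system message followed by the user prompt. `build_messages` is a hypothetical helper name used only for illustration, and the system prompt text is an arbitrary example.

```python
def build_messages(prompt, system_prompt=None):
    """Build a chat-completion messages list.

    Hypothetical helper: puts an optional system message first,
    then the user's prompt.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return messages


messages = build_messages(
    "Explain Generative AI in 2 bullet points",
    system_prompt="You are a concise technical assistant.",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

An "assistant" message would only appear once you append a model reply to the list, which is how multi-turn conversations are represented.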


In [None]:
# ============================================================================
# STEP 3: Define Helper Function for Chat Completions
# ============================================================================

def get_completion_chatgroq(prompt, model="meta-llama/llama-4-scout-17b-16e-instruct"):
    """
    Send a prompt to Groq's chat completion API and get a response.
    
    Parameters:
    -----------
    prompt : str
        The user's input/question to send to the model
    model : str
        The model identifier to use (default: Llama 4 Scout)
        
    Returns:
    --------
    str
        The model's response text
        
    Example model IDs on Groq (the catalog changes over time, so verify before use):
    - meta-llama/llama-4-scout-17b-16e-instruct  : Llama 4 Scout (efficient)
    - meta-llama/llama-4-maverick-17b-128e-instruct : Llama 4 Maverick (more capable)
    - llama-3.3-70b-versatile : Llama 3.3 70B
    - mixtral-8x7b-32768 : Mixtral 8x7B
    - gemma2-9b-it : Google's Gemma 2 9B
    
    Check https://console.groq.com/docs/models for the latest available models
    """
    
    # Format the prompt as a chat message
    # The messages array can contain multiple messages for multi-turn conversations
    messages = [{"role": "user", "content": prompt}]
    
    # Make the API call to Groq
    response = groq_client.chat.completions.create(
        model=model,        # Specify which LLM to use
        messages=messages,  # The conversation history/prompt
        temperature=0,      # Controls randomness (0 = near-deterministic, higher = more varied)
        # Other optional parameters:
        # max_tokens=1024,  # Maximum length of the response
        # top_p=1,          # Nucleus sampling parameter
        # stream=False,     # Whether to stream the response
    )
    
    # Extract and return just the text content from the response
    # The response structure: response.choices[0].message.content
    return response.choices[0].message.content

## Step 4: Test with Different Models

Now let's test our function with two Llama 4 model variants. Groq provides access to various open-source models, each with different capabilities.

### Llama 4 Scout vs Maverick
- **Scout (16 experts)**: Faster, more efficient, good for most tasks
- **Maverick (128 experts)**: More capable, better for complex reasoning tasks

Let's compare their outputs for the same prompt!
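
One way to run such a comparison is to loop over the model IDs and collect each response together with its wall-clock time. This is only a sketch: `compare_models` is a hypothetical helper, and it takes the completion function as an argument so that you can pass in `get_completion_chatgroq` from Step 3.

```python
import time


def compare_models(prompt, models, complete_fn):
    """Run `prompt` against each model and collect the text and elapsed time.

    `complete_fn` is any callable accepting `prompt=` and `model=` keyword
    arguments and returning the response text.
    """
    results = {}
    for model in models:
        start = time.perf_counter()
        text = complete_fn(prompt=prompt, model=model)
        # Elapsed time includes network latency, not just model inference
        elapsed = time.perf_counter() - start
        results[model] = {"response": text, "seconds": elapsed}
    return results


# Example usage with the helper from Step 3 (requires a valid API key):
# results = compare_models(
#     "Explain Generative AI in 2 bullet points",
#     [
#         "meta-llama/llama-4-scout-17b-16e-instruct",
#         "meta-llama/llama-4-maverick-17b-128e-instruct",
#     ],
#     get_completion_chatgroq,
# )
# for model, result in results.items():
#     print(f"{model} ({result['seconds']:.2f}s)\n{result['response']}\n")
```

A single timed request is a rough indicator at best, so treat the numbers as anecdotal rather than as a benchmark.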


In [None]:
# ============================================================================
# EXAMPLE 1: Using Llama 4 Scout Model
# ============================================================================
# Scout is the lighter, faster variant with 16 experts
# Great for quick responses and standard tasks

prompt = 'Explain Generative AI in 2 bullet points'

# Call our helper function with the Scout model
response = get_completion_chatgroq(
    prompt=prompt, 
    model="meta-llama/llama-4-scout-17b-16e-instruct"
)

print("=" * 60)
print("LLAMA 4 SCOUT RESPONSE:")
print("=" * 60)
print(response)

Here are 2 bullet points explaining Generative AI:

* **Creates new content**: Generative AI is a type of artificial intelligence that can generate new, original content, such as images, videos, music, text, and more. It uses complex algorithms and machine learning techniques to create this content, often based on patterns and structures learned from large datasets.
* **Learns from data, not human input**: Unlike traditional AI systems that rely on human input and rules to generate output, Generative AI models learn from vast amounts of data and can produce novel, diverse, and often surprising results that may not be immediately recognizable as related to the training data.


In [None]:
# ============================================================================
# EXAMPLE 2: Using Llama 4 Maverick Model
# ============================================================================
# Maverick is the more powerful variant with 128 experts
# Better for complex reasoning and nuanced responses

prompt = 'Explain Generative AI in 2 bullet points'

# Call our helper function with the Maverick model
response = get_completion_chatgroq(
    prompt=prompt, 
    model="meta-llama/llama-4-maverick-17b-128e-instruct"
)

print("=" * 60)
print("LLAMA 4 MAVERICK RESPONSE:")
print("=" * 60)
print(response)

# Notice how both models give similar but slightly different responses
# The Maverick model may provide more nuanced explanations for complex topics

Here are 2 bullet points explaining Generative AI:

* **Generative AI creates new content**: Generative AI uses complex algorithms to generate new, original content, such as images, videos, music, text, or code, that is similar in style and structure to the data it was trained on.
* **Trained on existing data to learn patterns**: Generative AI models are trained on large datasets to learn patterns, relationships, and structures within the data, allowing them to generate new content that is often realistic and coherent, and sometimes even creative or surprising.


## Summary

In this notebook, we learned how to:

1. **Set up Groq API authentication** using environment variables
2. **Initialize the Groq client** for making API calls
3. **Create a reusable helper function** for chat completions
4. **Use different Llama 4 model variants** (Scout vs Maverick)

### Key Takeaways

| Aspect | Details |
|--------|---------|
| **Speed** | Groq's LPU technology provides ultra-fast inference |
| **Cost** | Free tier available for learning and experimentation |
| **Models** | Access to latest open-source models (Llama, Mixtral, Gemma) |
| **API** | OpenAI-compatible format makes migration easy |
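
As an illustration of the OpenAI-compatible format, the sketch below points the `openai` Python package at Groq. It assumes the `openai` package is installed and that Groq's OpenAI-compatible endpoint is `https://api.groq.com/openai/v1`, so check the Groq documentation for the current base URL. `make_openai_style_client` is a hypothetical helper name.

```python
import os

# Assumed endpoint; verify against the Groq documentation before relying on it
GROQ_OPENAI_BASE_URL = "https://api.groq.com/openai/v1"


def make_openai_style_client(api_key=None):
    """Create an OpenAI SDK client that sends its requests to Groq."""
    # Imported lazily so this cell still runs when `openai` is not installed
    from openai import OpenAI

    return OpenAI(
        api_key=api_key or os.getenv("GROQ_API_KEY"),
        base_url=GROQ_OPENAI_BASE_URL,
    )


# Example usage (requires `pip install openai` and a valid key):
# client = make_openai_style_client()
# response = client.chat.completions.create(
#     model="meta-llama/llama-4-scout-17b-16e-instruct",
#     messages=[{"role": "user", "content": "Hello!"}],
# )
# print(response.choices[0].message.content)
```

The request and response shapes match the helper from Step 3, so existing OpenAI-based code mostly needs a different key, base URL, and model ID.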

### Next Steps
- Try different models available on Groq
- Experiment with the `temperature` parameter for more creative responses
- Build multi-turn conversations using the messages array
- Explore streaming responses for real-time output
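
The last two items can be sketched as follows. Both functions take the client as a parameter, so they work with the `groq_client` created in Step 2. `chat_turn` and `stream_completion` are hypothetical helper names. The streaming part assumes the SDK yields OpenAI-style chunks with the text in `chunk.choices[0].delta.content`, which is worth verifying against the current documentation.

```python
DEFAULT_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"


def chat_turn(client, history, user_text, model=DEFAULT_MODEL):
    """Send one user turn and record both sides of the exchange in `history`."""
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(model=model, messages=history)
    reply = response.choices[0].message.content
    # Store the reply so the next turn includes the full conversation
    history.append({"role": "assistant", "content": reply})
    return reply


def stream_completion(client, prompt, model=DEFAULT_MODEL):
    """Print the response as it arrives and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # some chunks may carry no text
            print(piece, end="", flush=True)
            parts.append(piece)
    print()
    return "".join(parts)


# Example usage (requires the client from Step 2):
# history = []
# chat_turn(groq_client, history, "What is an LPU?")
# chat_turn(groq_client, history, "Summarize that in one sentence.")
# stream_completion(groq_client, "Explain Generative AI in 2 bullet points")
```

Because `chat_turn` resends the whole `history` on every call, long conversations consume more tokens with each turn.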

### Resources
- [Groq Documentation](https://console.groq.com/docs)
- [Available Models](https://console.groq.com/docs/models)
- [Rate Limits](https://console.groq.com/docs/rate-limits)
