<img src="https://toppng.com/uploads/preview/linkedin-logo-png-photo-116602552293wtc4qogql.png" width="20" height="20" /> [Bharath Hemachandran](https://www.linkedin.com/in/bharath-hemachandran/)

# 🤖 Phase 1: One Groq API Call (No MCP)

Learn how **parameters** change the model's output and how **token usage** is reported. The same encode → vectors → decode pipeline from Phase 0 runs on the server; here we focus on **temperature**, **top_p**, **max_output_tokens**, and **truncation**.

<div style="background: #e3f2fd; padding: 14px; border-radius: 8px; border-left: 4px solid #1976d2;">
<strong>🎯 What you'll do:</strong> Call the Groq Responses API once, tweak parameters, and see how the output and <code>input_tokens</code> / <code>output_tokens</code> change.
</div>

### 📋 Notebook objective (table of contents)

This notebook covers:
- **Setup**: Install the OpenAI client (Groq-compatible)
- **API key**: Set GROQ_API_KEY for Colab/local
- **Parameters**: temperature, top_p, max_output_tokens, truncation, instructions
- **API call**: Single Groq Responses API request
- **Token usage**: input_tokens, output_tokens, total_tokens (with a simple chart)
- **Output text**: Model reply and raw output
- **Try it yourself**: Suggestions to tweak parameters
- **Additional reading**: Videos and blogs


## 🔧 Setup (run once)

Install the **openai** Python package. Groq exposes an OpenAI-compatible endpoint, so the same client works. On Colab, run this cell first.

In [None]:
!pip install -q openai

### 🔑 Set your Groq API key

Get a free key at [console.groq.com](https://console.groq.com/keys). In Colab you can use **Secrets** or run the cell below and paste when prompted.
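
The Secrets route mentioned above can be sketched as follows. This is a minimal sketch, not part of the original notebook: the helper name `load_groq_key` is hypothetical, and it assumes a Colab secret named `GROQ_API_KEY` has been added in the Secrets panel with notebook access granted. Outside Colab it returns whatever is in the environment, or `None`.

```python
import os

def load_groq_key():
    """Return the Groq API key from the environment or Colab Secrets, else None."""
    key = os.environ.get("GROQ_API_KEY")
    if key:
        return key
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        return None
    try:
        # Assumption: the secret is named GROQ_API_KEY and access was granted.
        return userdata.get("GROQ_API_KEY")
    except Exception:
        # Secret missing or access denied: fall back to manual entry.
        return None

key = load_groq_key()
if key:
    os.environ["GROQ_API_KEY"] = key
```

If this sets the environment variable, the `getpass` prompt in the next cell is skipped, because that cell only asks when `GROQ_API_KEY` is unset.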

In [None]:
import os
from getpass import getpass

if not os.environ.get("GROQ_API_KEY"):
    os.environ["GROQ_API_KEY"] = getpass("Paste your GROQ_API_KEY: ")

from openai import OpenAI

def get_groq_client():
    return OpenAI(
        api_key=os.environ["GROQ_API_KEY"],
        base_url="https://api.groq.com/openai/v1",
    )

print("✅ Groq client ready.")

## 🎛️ Parameters you can change

<div style="background: #fff8e1; padding: 12px; border-radius: 8px;">
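<strong>Sampling:</strong> <code>temperature</code> (0 = near-deterministic, 2 = very random), <code>top_p</code> (nucleus sampling: only the most likely tokens whose cumulative probability reaches <code>top_p</code> stay eligible).<br>
<strong>Length:</strong> <code>max_output_tokens</code> caps the reply length in tokens; a reply that reaches the cap is cut off.<br>
<strong>Context:</strong> <code>truncation</code> = "auto" lets the server trim long inputs to fit the context window; "disabled" = no trimming.<br>
<strong>Behavior:</strong> <code>instructions</code> sets a system-level instruction that fixes the tone or task.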
</div>
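
To build intuition for the two sampling knobs, here is a small offline sketch of what temperature scaling and nucleus (top-p) filtering do to a probability distribution. It is an illustration only: the four candidate tokens and their scores are made up, the function names are hypothetical, and the server's actual sampler may differ in detail (for example, in how it treats a temperature of exactly 0).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        # Treated here as greedy decoding: all mass on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches top_p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

# Hypothetical scores for four candidate next tokens (illustrative only).
tokens = ["protocol", "standard", "framework", "banana"]
logits = [3.0, 2.0, 1.0, -1.0]

for t in (0, 0.7, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])

nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), top_p=0.9)
print("top_p=0.9:", dict(zip(tokens, [round(p, 3) for p in nucleus])))
```

Running it shows the distribution flattening as temperature rises, and the least likely token dropping to zero probability once `top_p` is below 1.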

In [None]:
client = get_groq_client()

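TEMPERATURE = 0.7          # 0 = near-deterministic, higher = more random
TOP_P = 1.0                # nucleus sampling; 1.0 keeps the full distribution
MAX_OUTPUT_TOKENS = 150    # cap on reply length, in tokens
TRUNCATION = "disabled"    # "auto" lets the server trim an over-long input
INSTRUCTIONS = None        # optional system-level instruction string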

kwargs = {
    "model": "llama-3.3-70b-versatile",
    "input": "In one sentence, what is the Model Context Protocol?",
    "temperature": TEMPERATURE,
    "top_p": TOP_P,
    "max_output_tokens": MAX_OUTPUT_TOKENS,
    "truncation": TRUNCATION,
}
if INSTRUCTIONS is not None:
    kwargs["instructions"] = INSTRUCTIONS

response = client.responses.create(**kwargs)

## 📊 Token usage (tokenizing)

Input and output are counted in **tokens**, not characters, the same idea as Phase 0.
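
To make the token-versus-character distinction concrete, the sketch below condenses a usage object into a few derived numbers. It is an illustration under stated assumptions: `summarize_usage` is a hypothetical helper, the usage values are made up so the example runs offline, and `hit_output_cap` is only a heuristic (an output count that reaches the cap suggests the reply was cut off).

```python
from types import SimpleNamespace

def summarize_usage(usage, output_text, max_output_tokens):
    """Collect token counts plus two derived values from a usage object."""
    inp = getattr(usage, "input_tokens", 0)
    out = getattr(usage, "output_tokens", 0)
    return {
        "input_tokens": inp,
        "output_tokens": out,
        "total_tokens": getattr(usage, "total_tokens", inp + out),
        # Average characters per output token; None if nothing was generated.
        "chars_per_output_token": round(len(output_text) / out, 2) if out else None,
        # Heuristic: reaching the cap suggests the reply was truncated.
        "hit_output_cap": out >= max_output_tokens,
    }

# Stand-in for response.usage and response.output_text (made-up values).
fake_usage = SimpleNamespace(input_tokens=48, output_tokens=30, total_tokens=78)
fake_text = "x" * 150

summary = summarize_usage(fake_usage, fake_text, max_output_tokens=150)
print(summary)
```

With a real response, the call would be `summarize_usage(response.usage, response.output_text, MAX_OUTPUT_TOKENS)`. Note that `input_tokens` usually exceeds what the prompt text alone would suggest, since the server wraps the prompt in a chat template before tokenizing.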

In [None]:
usage = getattr(response, "usage", None)
if usage:
    inp = getattr(usage, "input_tokens", 0)
    out = getattr(usage, "output_tokens", 0)
    tot = getattr(usage, "total_tokens", inp + out)
    print("📈 Token usage:")
    print(f"   input_tokens:  {inp}")
    print(f"   output_tokens: {out}")
    print(f"   total_tokens:  {tot}")
else:
    print("Usage not available.")

In [None]:
import matplotlib.pyplot as plt

if usage:
    inp = getattr(usage, "input_tokens", 0)
    out = getattr(usage, "output_tokens", 0)
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.bar(["Input", "Output"], [inp, out], color=["#1976d2", "#388e3c"], alpha=0.8)
    ax.set_ylabel("Tokens")
    ax.set_title("Input vs output tokens")  # emoji omitted: default Matplotlib fonts lack the glyph
    plt.tight_layout()
    plt.show()

## 📤 Output text

In [None]:
print(response.output_text)
print("\n--- raw output (first 400 chars) ---")
print(str(response.output)[:400])

## ✏️ Try it yourself

<div style="background: #e8f5e9; padding: 12px; border-radius: 8px; border-left: 4px solid #4caf50;">
Change <code>TEMPERATURE</code> to 0 and run again for a near-deterministic reply. Set <code>MAX_OUTPUT_TOKENS</code> to 50 to cap the answer at 50 tokens (it may be cut off mid-sentence). Add <code>INSTRUCTIONS = "Answer in one short sentence only."</code> to constrain the model.
</div>
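
One way to run these experiments side by side is to build each request from a small helper instead of editing the constants by hand. The sketch below is a suggestion, not part of the original notebook: `build_request_kwargs` is a hypothetical helper that mirrors the `kwargs` dictionary used earlier, and it only prints the variants so it runs without an API key.

```python
PROMPT = "In one sentence, what is the Model Context Protocol?"

def build_request_kwargs(prompt, temperature=0.7, top_p=1.0,
                         max_output_tokens=150, truncation="disabled",
                         instructions=None, model="llama-3.3-70b-versatile"):
    """Build keyword arguments for client.responses.create()."""
    kwargs = {
        "model": model,
        "input": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_output_tokens": max_output_tokens,
        "truncation": truncation,
    }
    # Only send instructions when one was actually provided.
    if instructions is not None:
        kwargs["instructions"] = instructions
    return kwargs

variants = {
    "baseline": build_request_kwargs(PROMPT),
    "temperature 0": build_request_kwargs(PROMPT, temperature=0),
    "short cap": build_request_kwargs(PROMPT, max_output_tokens=50),
    "constrained": build_request_kwargs(
        PROMPT, instructions="Answer in one short sentence only."
    ),
}

for name, kw in variants.items():
    shown = {k: v for k, v in kw.items() if k not in ("model", "input")}
    print(f"{name}: {shown}")
    # With a client from get_groq_client(), each variant could be sent as:
    # response = client.responses.create(**kw)
```

Comparing `response.output_text` and `response.usage` across the variants shows which parameter changed the reply and which changed only the token counts.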

In [None]:
print("✅ Phase 1 complete. Next: Phase 2 (Groq + one MCP).")

## 📚 Additional reading

**YouTube (verified)**  
- [Getting Started with Groq API](https://www.youtube.com/watch?v=S53BanCP14c): Near real-time LLM chat with Groq.  
- [Groq API in Python](https://www.youtube.com/watch?v=jScpBCBoGdU): Running generative AI with Groq (popular tutorial).

**Blogs (popular)**  
- [Groq API Reference](https://console.groq.com/docs): Official docs covering models, parameters, and token usage.  
- [Sampling: temperature, top-k, top-p](https://huyenchip.com/2024/01/16/sampling.html): Chip Huyen on generation configs.