<!-- 
Generated by ALAIN (Applied Learning AI Notebooks) on 2025-09-14
Teacher Model: GPT-OSS-20B
Provider: poe
Target Model: gpt-oss-20b
Learn more: https://github.com/daniel-p-green/alain-ai-learning-platform/
-->


# TinyLlama GGUF: Your First Small Language Model

Learn how to use TinyLlama, a tiny but powerful language model in GGUF format. Perfect for beginners who want to run AI on their own computer!

> Provider: `poe`  •  Model: `gpt-oss-20b`  •  Runtime: `poe`

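This notebook was generated by ALAIN. Each step cell sends its snippet as a prompt to a hosted model through an OpenAI-compatible API and prints the model's reply; the snippet itself is not executed on your machine.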

---

### Reproducibility Tips
- Avoid network access in core cells.
- Seed randomness where applicable (e.g., numpy, random).
- Pin package versions in your own environment if needed.
- Set `OPENAI_BASE_URL` and `OPENAI_API_KEY` via env (or Colab userdata).
- Widgets optional: text-based MCQs are provided if widgets are unavailable.


In [None]:
%%writefile requirements.txt
# Minimum package versions for a reproducible environment
openai>=1.34.0
ipywidgets>=8.0.0
requests>=2.31.0
python-dotenv>=1.0.0
numpy>=1.24.0
pandas>=2.0.0


In [None]:
%%writefile .env.example
# Template for environment variables
# Copy this file to .env and fill in your actual values
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.poe.com/v1
POE_API_KEY=your_poe_key_here
# For local models (LM Studio/Ollama), any non-empty string works for API_KEY


In [None]:
# Install dependencies from requirements.txt
%pip install -q -r requirements.txt
print('✅ Install step finished (check the output above for any errors)')


In [None]:
# Configure OpenAI-compatible client
import os
from dotenv import load_dotenv
from getpass import getpass
# Load environment variables from .env file if it exists
load_dotenv()
# Try to read secrets from Colab userdata if available
try:
  from google.colab import userdata  # type: ignore
  _poe = userdata.get('POE_API_KEY')
  _openai = userdata.get('OPENAI_API_KEY')
except Exception:
  _poe = None
  _openai = None
PROVIDER = "poe"  # "poe" or "openai-compatible"
os.environ.setdefault("OPENAI_BASE_URL", "https://api.poe.com/v1")
# Set your API key. For Poe, set POE_API_KEY; for local (LM Studio/Ollama) any non-empty string works.
os.environ.setdefault("OPENAI_API_KEY", _poe or _openai or os.getenv("POE_API_KEY") or os.getenv("OPENAI_API_KEY") or "")
# Local-friendly defaults to avoid prompting beginners
if (PROVIDER == 'openai-compatible') and not os.environ.get('OPENAI_API_KEY'):
  base = os.environ.get('OPENAI_BASE_URL','')
  if 'localhost:1234' in base or '127.0.0.1:1234' in base:
    os.environ['OPENAI_API_KEY'] = 'lm-studio'
  elif 'localhost:11434' in base or '127.0.0.1:11434' in base:
    os.environ['OPENAI_API_KEY'] = 'ollama'
# Fallback interactive prompt if still missing
if not os.environ.get('OPENAI_API_KEY'):
  os.environ['OPENAI_API_KEY'] = getpass('Enter API key (input hidden): ')
# OPENAI_BASE_URL and OPENAI_API_KEY environment variables are set above


In [None]:
# Pre-flight check: verify API connectivity
from openai import OpenAI
import os, sys
base = os.environ.get('OPENAI_BASE_URL')
key = os.environ.get('OPENAI_API_KEY')
if not base or not key:
    print('❌ Missing OPENAI_BASE_URL or OPENAI_API_KEY. Set them above.')
else:
    try:
        client = OpenAI(base_url=base, api_key=key)
        # lightweight call: list models or small completion
        ok = False
        try:
            _ = client.models.list()
            ok = True
        except Exception:
            # Fallback to a 1-token chat call
            _ = client.chat.completions.create(model="gpt-oss-20b", messages=[{"role":"user","content":"ping"}], max_tokens=1)
            ok = True
        if ok:
            print('✅ API key is working and connected to provider.')
    except Exception as e:
        print('❌ Connection failed. Please check your API key and base URL.\n', e)


In [None]:
# Quick smoke test
from openai import OpenAI
import os
base = os.environ.get('OPENAI_BASE_URL')
key = os.environ.get('OPENAI_API_KEY')
assert base and key, 'Please set OPENAI_BASE_URL and OPENAI_API_KEY env vars'
client = OpenAI(base_url=base, api_key=key)
resp = client.chat.completions.create(model="gpt-oss-20b", messages=[{"role":"user","content":"Hello from ALAIN"}], max_tokens=32)
print(resp.choices[0].message.content)


## Step 1: What is TinyLlama GGUF?

TinyLlama is like having a mini AI assistant that can run on your laptop! GGUF is the single-file model format used by llama.cpp. GGUF files usually store quantized weights, meaning each number is saved with fewer bits, which makes the model smaller and faster to load - think of it like compressing a huge movie file so it fits on your phone. This model has 1.1 billion parameters (that's like 1.1 billion tiny brain connections), yet a 4-bit quantized copy takes less than 1GB of space.
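
To see where a size figure like this comes from, here is a rough back-of-the-envelope sketch. The bits-per-weight values are illustrative assumptions (real GGUF files mix precisions and carry some metadata), and `estimate_size_gb` is a hypothetical helper, not part of any library.

```python
def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate file size in gigabytes: parameters x bits per weight / 8 bits per byte."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 1.1e9  # TinyLlama's parameter count, from the text above

# Assumed, illustrative precisions; actual quantized formats vary
for label, bits in [("16-bit floats", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{estimate_size_gb(N_PARAMS, bits):.2f} GB")
```

Fewer bits per weight means a smaller file, which is why a quantized copy can fit comfortably on a laptop.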


In [None]:
# Run the step prompt using the configured provider
PROMPT = """
# Let's check if we have the requirements
import sys
print('Python version:', sys.version)
print('Ready to download TinyLlama!')
"""
from openai import OpenAI
import os
client = OpenAI(base_url=os.environ['OPENAI_BASE_URL'], api_key=os.environ['OPENAI_API_KEY'])
resp = client.chat.completions.create(model="gpt-oss-20b", messages=[{"role":"user","content":PROMPT}], temperature=0.7, max_tokens=400)
print(resp.choices[0].message.content)


In [None]:
# Assessment for Step 1
question = "What does GGUF format do for AI models?"
options = ["Makes models smaller and faster to load","Makes models bigger and more accurate","Changes the model's personality","Converts text to images"]
correct_index = 0
print('Q:', question)
for i, o in enumerate(options):
    print(f"{i}. {o}")
choice = 0  # <- change this to your answer index
print('Correct!' if choice == correct_index else 'Incorrect')
print('Explanation:', "GGUF is the single-file format that llama.cpp loads. GGUF files usually store quantized weights (fewer bits per parameter), which makes them smaller and faster to load, so you can run a model on an ordinary computer instead of an expensive server.")


In [None]:
# Interactive quiz for Step 1
import ipywidgets as widgets
from IPython.display import display, Markdown
q = "What does GGUF format do for AI models?"
opts = ["Makes models smaller and faster to load","Makes models bigger and more accurate","Changes the model's personality","Converts text to images"]
correct = 0
rb = widgets.RadioButtons(options=[(o, i) for i, o in enumerate(opts)], description='', disabled=False)
btn = widgets.Button(description='Submit Answer')
out = widgets.Output()
def on_click(b):
  with out:
    out.clear_output()
    sel = rb.value if hasattr(rb, 'value') else 0
    if sel == correct:
      display(Markdown('**Correct!** ' + "GGUF is the single-file format that llama.cpp loads. GGUF files usually store quantized weights (fewer bits per parameter), which makes them smaller and faster to load, so you can run a model on an ordinary computer instead of an expensive server."))
    else:
      display(Markdown('Incorrect, please try again.'))
btn.on_click(on_click)
display(Markdown(f"### {q}"))
display(rb, btn, out)


In [None]:
# Assessment for Step 1
question = "How many parameters does TinyLlama have?"
options = ["1.1 million parameters","1.1 billion parameters","1.1 trillion parameters","1.1 thousand parameters"]
correct_index = 1
print('Q:', question)
for i, o in enumerate(options):
    print(f"{i}. {o}")
choice = 0  # <- change this to your answer index
print('Correct!' if choice == correct_index else 'Incorrect')
print('Explanation:', "TinyLlama has 1.1 billion parameters. Think of parameters like brain connections - more connections usually mean smarter responses, but TinyLlama proves that even 'small' models with 1.1 billion connections can be very useful!")


In [None]:
# Interactive quiz for Step 1
import ipywidgets as widgets
from IPython.display import display, Markdown
q = "How many parameters does TinyLlama have?"
opts = ["1.1 million parameters","1.1 billion parameters","1.1 trillion parameters","1.1 thousand parameters"]
correct = 1
rb = widgets.RadioButtons(options=[(o, i) for i, o in enumerate(opts)], description='', disabled=False)
btn = widgets.Button(description='Submit Answer')
out = widgets.Output()
def on_click(b):
  with out:
    out.clear_output()
    sel = rb.value if hasattr(rb, 'value') else 0
    if sel == correct:
      display(Markdown('**Correct!**' + ' — ' + "TinyLlama has 1.1 billion parameters. Think of parameters like brain connections - more connections usually mean smarter responses, but TinyLlama proves that even 'small' models with 1.1 billion connections can be very useful!"))
    else:
      display(Markdown('Incorrect, please try again.'))
btn.on_click(on_click)
display(Markdown(f"### {q}"))
display(rb, btn, out)


## Step 2: Installing Requirements

Before we can use TinyLlama, we need to install some helper tools. Think of these like apps you need on your phone before you can use other apps. We'll use llama-cpp-python, the Python bindings for llama.cpp, which act like a translator that helps Python talk to our AI model.
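
A quick way to confirm an install worked is to ask Python whether it can find the module, without actually importing it. This is a small sketch: `is_installed` is a hypothetical helper name, and `llama_cpp` is the import name used by llama-cpp-python.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if Python can locate the top-level module without importing it."""
    return importlib.util.find_spec(module_name) is not None

for name in ("llama_cpp", "requests"):
    status = "found" if is_installed(name) else "missing (run pip install first)"
    print(f"{name}: {status}")
```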


In [None]:
# Run the step prompt using the configured provider
PROMPT = """
# Install the required packages
!pip install llama-cpp-python requests
print('✅ Installation complete!')
"""
from openai import OpenAI
import os
client = OpenAI(base_url=os.environ['OPENAI_BASE_URL'], api_key=os.environ['OPENAI_API_KEY'])
resp = client.chat.completions.create(model="gpt-oss-20b", messages=[{"role":"user","content":PROMPT}], temperature=0.7, max_tokens=400)
print(resp.choices[0].message.content)


In [None]:
# Assessment for Step 2
question = "What is llama-cpp-python used for?"
options = ["Creating new AI models from scratch","Converting text to speech","Helping Python communicate with GGUF models","Training models on your data"]
correct_index = 2
print('Q:', question)
for i, o in enumerate(options):
    print(f"{i}. {o}")
choice = 0  # <- change this to your answer index
print('Correct!' if choice == correct_index else 'Incorrect')
print('Explanation:', "llama-cpp-python is like a translator that helps Python programs talk to AI models in GGUF format. Without it, Python wouldn't know how to load and use these special model files.")


In [None]:
# Interactive quiz for Step 2
import ipywidgets as widgets
from IPython.display import display, Markdown
q = "What is llama-cpp-python used for?"
opts = ["Creating new AI models from scratch","Converting text to speech","Helping Python communicate with GGUF models","Training models on your data"]
correct = 2
rb = widgets.RadioButtons(options=[(o, i) for i, o in enumerate(opts)], description='', disabled=False)
btn = widgets.Button(description='Submit Answer')
out = widgets.Output()
def on_click(b):
  with out:
    out.clear_output()
    sel = rb.value if hasattr(rb, 'value') else 0
    if sel == correct:
      display(Markdown('**Correct!**' + ' — ' + "llama-cpp-python is like a translator that helps Python programs talk to AI models in GGUF format. Without it, Python wouldn't know how to load and use these special model files."))
    else:
      display(Markdown('Incorrect, please try again.'))
btn.on_click(on_click)
display(Markdown(f"### {q}"))
display(rb, btn, out)


In [None]:
# Assessment for Step 2
question = "What happens when you set temperature=0.7 in the model?"
options = ["The model gets physically warmer","The model responds more creatively and randomly","The model responds more predictably and safely","The model runs 70% faster"]
correct_index = 1
print('Q:', question)
for i, o in enumerate(options):
    print(f"{i}. {o}")
choice = 0  # <- change this to your answer index
print('Correct!' if choice == correct_index else 'Incorrect')
print('Explanation:', "Temperature controls creativity! Higher temperature (like 0.7) makes the AI more creative and unpredictable, like a creative writer. Lower temperature (like 0.1) makes it more focused and predictable, like a careful scientist.")


In [None]:
# Interactive quiz for Step 2
import ipywidgets as widgets
from IPython.display import display, Markdown
q = "What happens when you set temperature=0.7 in the model?"
opts = ["The model gets physically warmer","The model responds more creatively and randomly","The model responds more predictably and safely","The model runs 70% faster"]
correct = 1
rb = widgets.RadioButtons(options=[(o, i) for i, o in enumerate(opts)], description='', disabled=False)
btn = widgets.Button(description='Submit Answer')
out = widgets.Output()
def on_click(b):
  with out:
    out.clear_output()
    sel = rb.value if hasattr(rb, 'value') else 0
    if sel == correct:
      display(Markdown('**Correct!**' + ' — ' + "Temperature controls creativity! Higher temperature (like 0.7) makes the AI more creative and unpredictable, like a creative writer. Lower temperature (like 0.1) makes it more focused and predictable, like a careful scientist."))
    else:
      display(Markdown('Incorrect, please try again.'))
btn.on_click(on_click)
display(Markdown(f"### {q}"))
display(rb, btn, out)
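

The temperature question above can be made concrete with a toy example. The sketch below applies temperature to a made-up set of token scores; the scores and the helper name `softmax_with_temperature` are illustrative assumptions, not TinyLlama's real values or any library's API.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.1, 0.7, 1.5):
    probs = softmax_with_temperature(scores, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At a low temperature almost all of the probability lands on the top-scoring token, so the output is predictable; at higher temperatures the other tokens get a real chance of being picked.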


## Step 3: Downloading TinyLlama

Now we'll download the actual AI model. It's like downloading a smart app that can chat with you. The 4-bit (Q4_K_M) file used here is under 1GB, so it might take a few minutes depending on your internet speed.
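
For a file this large, it is usually safer to write the download to disk in chunks rather than hold it all in memory. The sketch below shows only the chunk-writing part, using in-memory stand-in data; `save_chunks` is a hypothetical helper, and the commented usage assumes the `requests` library's streaming mode.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable

def save_chunks(chunks: Iterable[bytes], dest: Path) -> int:
    """Write chunks to a '.part' file, then rename it so a partial download never looks complete."""
    partial = dest.with_name(dest.name + ".part")
    written = 0
    with open(partial, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    os.replace(partial, dest)  # rename only after every chunk was written
    return written

# Demo with stand-in data instead of a real download
demo_path = Path(tempfile.mkdtemp()) / "demo.bin"
size = save_chunks([b"abc", b"", b"defg"], demo_path)
print(f"Wrote {size} bytes to {demo_path.name}")

# With requests (assumed usage, not run here):
#   with requests.get(url, stream=True, timeout=30) as r:
#       r.raise_for_status()
#       save_chunks(r.iter_content(chunk_size=1 << 20), Path("tinyllama.gguf"))
```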


In [None]:
# Run the step prompt using the configured provider
PROMPT = """
import requests
from pathlib import Path

# Download TinyLlama GGUF model
url = 'https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf'
filename = 'tinyllama.gguf'

if not Path(filename).exists():
    print('Downloading TinyLlama...')
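    # Stream the file to disk in 1 MB chunks instead of loading it all into memory
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # stop on HTTP errors instead of saving an error page
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=1 << 20):
                f.write(chunk)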
    print('✅ Download complete!')
else:
    print('✅ TinyLlama already downloaded!')
"""
from openai import OpenAI
import os
client = OpenAI(base_url=os.environ['OPENAI_BASE_URL'], api_key=os.environ['OPENAI_API_KEY'])
resp = client.chat.completions.create(model="gpt-oss-20b", messages=[{"role":"user","content":PROMPT}], temperature=0.7, max_tokens=400)
print(resp.choices[0].message.content)


## Step 4: Your First Chat with TinyLlama

Time for the fun part! We'll load our AI model and have a conversation. It's like waking up your AI assistant and asking it questions. The model will generate responses based on what it learned during training.
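
TinyLlama-1.1B-Chat is a chat-tuned model, so it generally responds better when the prompt follows the template it was trained with. The sketch below assumes the Zephyr-style template described on the model card (check the card for the exact format); `build_chat_prompt` is a hypothetical helper, not part of llama-cpp-python.

```python
def build_chat_prompt(user_message: str, system_message: str = "You are a helpful assistant.") -> str:
    """Format one user turn in the assumed Zephyr-style chat template."""
    return (
        f"<|system|>\n{system_message}</s>\n"
        f"<|user|>\n{user_message}</s>\n"
        "<|assistant|>\n"
    )

prompt = build_chat_prompt("Hello! Can you tell me a joke?")
print(prompt)
```

The resulting string could be passed to the model in place of a bare question. Adding `stop=['</s>']` to the call is a common way to end generation at the end-of-turn marker.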


In [None]:
# Run the step prompt using the configured provider
PROMPT = """
from llama_cpp import Llama

# Load the model
llm = Llama(model_path='tinyllama.gguf', n_ctx=2048)

# Have a conversation
prompt = 'Hello! Can you tell me a joke?'
response = llm(prompt, max_tokens=100, temperature=0.7)

print('You:', prompt)
print('TinyLlama:', response['choices'][0]['text'])
"""
from openai import OpenAI
import os
client = OpenAI(base_url=os.environ['OPENAI_BASE_URL'], api_key=os.environ['OPENAI_API_KEY'])
resp = client.chat.completions.create(model="gpt-oss-20b", messages=[{"role":"user","content":PROMPT}], temperature=0.7, max_tokens=400)
print(resp.choices[0].message.content)
