# Prompting GPT-OSS & Getting Started

Welcome! This beginner-friendly notebook will help you:
- Understand what GPT-OSS is and why you might use it
- Learn basic prompt structure and formatting
- Try your first prompts interactively

Tip: You don't need any prior experience. Just run cells from top to bottom.

## What is GPT-OSS?
In this notebook, “GPT-OSS” refers to GPT-style language models that are open and self-hostable. Examples include Llama, Mistral, and Phi, whose weights are published under open or community licenses. They understand and generate text much like proprietary GPT models, but you can run them locally or reach them through hosted APIs.

Why use GPT-OSS?
- Openness: Inspect, fine-tune, and customize models.
- Control: Run locally for privacy, or choose your own hosting.
- Cost flexibility: Avoid vendor lock-in; use commodity hardware or competitive providers.
- Community: Rapid improvement, tools, and examples from open-source contributors.

Common ways to access GPT-OSS:
- Local runtimes (e.g., llama.cpp, ollama)
- Open model hubs and inference APIs (e.g., Hugging Face Inference Endpoints, vLLM servers)
- Managed OSS providers that host open models for you

## Basic Prompt Structure
Good prompts are clear, specific, and give the model the right context. A simple structure:
1) Role/Context: Who is the model and what is the scenario?
2) Task: What do you want exactly?
3) Constraints: Length, style, language, format.
4) Examples (optional): Show a small input → output example.
5) Inputs: The actual data to process.

Example prompt:
- Role: “You are a helpful study coach for beginners.”
- Task: “Explain gradient descent in simple terms.”
- Constraints: “Use a short paragraph and one bullet list. Avoid equations.”
- Example (optional): “Input: binary search → Output: 1) Look at the middle item. 2) Drop the half that cannot contain the target. 3) Repeat.”
- Input: “Topic: gradient descent”
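
The five-part structure above can be assembled in code. The following is a minimal sketch, not a standard API: `build_prompt` and its parameter names are hypothetical helpers introduced here for illustration, and the `Task:` / `Constraints:` / `Input:` labels are just one reasonable layout.

```python
from typing import Optional, Sequence, Tuple


def build_prompt(
    role: str,
    task: str,
    constraints: str,
    user_input: str,
    examples: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Join role, task, constraints, optional examples, and input into one prompt."""
    parts = [role, f"Task: {task}", f"Constraints: {constraints}"]
    # Examples are optional; each one is an (input, output) pair.
    for example_in, example_out in examples or []:
        parts.append(f"Example input: {example_in}\nExample output: {example_out}")
    # The actual data to process goes last.
    parts.append(f"Input: {user_input}")
    return "\n\n".join(parts)


study_prompt = build_prompt(
    role="You are a helpful study coach for beginners.",
    task="Explain gradient descent in simple terms.",
    constraints="Use a short paragraph and one bullet list. Avoid equations.",
    user_input="Topic: gradient descent",
)
print(study_prompt)
```

Keeping each part in its own argument makes it easy to change one part later without retyping the rest.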

Formatting tips:
- Use headings or bullet points for clarity.
- Ask for a specific output format (e.g., JSON, list, steps).
- State what to avoid (e.g., no code, or no jargon).
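
When you ask for a specific format such as JSON, it helps to check that the reply really follows it, because models sometimes wrap the JSON in extra text. The sketch below is one possible check under stated assumptions: `parse_json_reply` is a hypothetical helper, the required keys are chosen only for illustration, and `sample_reply` is a made-up string standing in for real model output.

```python
import json

# Assumed keys for illustration; use whatever keys your prompt asks for.
REQUIRED_KEYS = {"summary", "steps", "pitfalls"}


def parse_json_reply(reply: str) -> dict:
    """Extract the JSON object from a model reply and check its keys."""
    # Take everything from the first '{' to the last '}' to skip chatty text.
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in reply.")
    data = json.loads(reply[start : end + 1])
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data


# Made-up reply used only to demonstrate the parser.
sample_reply = (
    "Sure! Here is the JSON:\n"
    '{"summary": "Fit the data, not the noise.", "steps": ["Split data", "Validate"], "pitfalls": []}'
)
parsed = parse_json_reply(sample_reply)
print(parsed["steps"])
```

If parsing fails, a common next step is to re-send the prompt with a stricter instruction such as “Return only JSON, with no extra text.”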

## Prompt Patterns You Can Reuse
- Instruction: “Do X with Y.”
- Chain-of-thought hints (lightweight): “List the steps you would take, then provide the final answer.”
- Role-play: “Act as a mentor; ask me 3 clarifying questions first.”
- Constrained output: “Return JSON with keys: summary, steps, pitfalls.”
- Few-shot examples: Show 1–3 small examples to guide style and format.

Keep it concise. The clearer your prompt, the better the results.
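
The few-shot pattern is mostly string formatting, so it is easy to sketch. In the example below, `few_shot_prompt` is a hypothetical helper and the `Input:` / `Output:` labels are an assumed convention, not something a particular model requires. The prompt ends with an open `Output:` line so the model continues the pattern.

```python
from typing import Sequence, Tuple


def few_shot_prompt(
    instruction: str, examples: Sequence[Tuple[str, str]], new_input: str
) -> str:
    """Build a prompt that shows a few input/output pairs before the real input."""
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    # Leave the final output empty so the model fills it in.
    lines += [f"Input: {new_input}", "Output:"]
    return "\n".join(lines)


plain_language_prompt = few_shot_prompt(
    instruction="Rewrite each sentence in plain language.",
    examples=[
        ("The model failed to converge.", "The model never settled on a good answer."),
        ("The dataset is imbalanced.", "One kind of example appears far more often than the others."),
    ],
    new_input="The model overfits the training set.",
)
print(plain_language_prompt)
```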

## Getting Started: Choose an Access Method
Below are two simple ways to try GPT-OSS:
1) Local with Ollama (easy setup):
   - Install from https://ollama.com
   - Pull a model, e.g., `ollama pull llama3`
   - Run from terminal: `ollama run llama3`
   - Or call the model from Python.
2) Hosted via Hugging Face Inference API:
   - Create an account, get an access token
   - Pick a text-generation model
   - Call the endpoint from Python.

We’ll show minimal Python examples for both.

In [None]:
# Option A: Using Ollama locally
# Prerequisites:
# - Install Ollama: https://ollama.com
# - Pull a model, e.g.: `ollama pull llama3`

import json, sys, subprocess, shutil

def ollama_available():
    return shutil.which("ollama") is not None

def ollama_chat(model: str, prompt: str) -> str:
    """Call a local Ollama model with a simple prompt."""
    if not ollama_available():
        raise RuntimeError("Ollama not found. Install from https://ollama.com and pull a model.")
    # Use 'ollama run' for a quick single-turn prompt
    proc = subprocess.run(
        ["ollama", "run", model], input=prompt.encode(), stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.decode(errors="ignore"))
    return proc.stdout.decode(errors="ignore").strip()

print("Ollama available:", ollama_available())
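
As an alternative to shelling out to the CLI, a running Ollama instance can also be reached over its local HTTP API. The sketch below uses only the standard library and assumes the server listens at its default address, `http://localhost:11434`, and that the model has already been pulled. `build_ollama_request` and `ollama_generate` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request asking for one complete (non-streamed) reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the 'response' field."""
    request = build_ollama_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply.get("response", "").strip()


# Run only when the Ollama server is up:
# print(ollama_generate("llama3", "Say hello in five words."))
```

This avoids starting a new process for every prompt, which tends to matter once you send many prompts in a row.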

In [None]:
# Option B: Using Hugging Face Inference API (hosted)
# Prerequisites:
# - pip install requests
# - Set your token: export HF_TOKEN=your_token_here

import json, os, requests  # json is needed by the fallbacks in hf_generate

HF_TOKEN = os.getenv("HF_TOKEN", "")

def hf_generate(model_id: str, prompt: str, max_new_tokens: int = 200):
    if not HF_TOKEN:
        raise RuntimeError("Set HF_TOKEN environment variable with your access token.")
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    headers = {"Authorization": f"Bearer {HF_TOKEN}"}
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens}
    }
    r = requests.post(url, headers=headers, json=payload, timeout=60)
    r.raise_for_status()
    data = r.json()
    # Many models return a list of dicts with 'generated_text'
    if isinstance(data, list) and data and "generated_text" in data[0]:
        return data[0]["generated_text"]
    # Some endpoints return a dict; use 'generated_text' if present, else dump the raw JSON
    if isinstance(data, dict):
        return data.get("generated_text") or json.dumps(data, ensure_ascii=False)
    return json.dumps(data, ensure_ascii=False)

print("HF token set:", bool(HF_TOKEN))

## Your First Prompt (Interactive)
We’ll craft a simple, well-structured prompt and send it to a model. Choose one method below.

Prompt goal: Explain a concept clearly, with constraints and a simple format.

Try editing the prompt variables to see how style and clarity change the output.

In [None]:
# Define a clean, structured prompt
role = "You are a helpful study coach for beginners."
task = "Explain the concept in simple terms that a high-school student can understand."
constraints = (
    "Use: 1 short paragraph, then 3 bullet points. "
    "Avoid equations and heavy jargon. Keep it under 120 words."
)
concept = "What is overfitting in machine learning?"

prompt = f"""
{role}

Task: {task}
Constraints: {constraints}
Input: {concept}
Format:
- Paragraph
- 3 bullet points
""".strip()

print(prompt)

In [None]:
# Run with Ollama (local) OR Hugging Face (hosted)

USE_OLLAMA = False       # Set True if using Ollama locally
OLLAMA_MODEL = "llama3"  # Or another pulled model, e.g., "mistral"
HF_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # Change to any instruct-capable model

try:
    if USE_OLLAMA:
        response = ollama_chat(OLLAMA_MODEL, prompt)
    else:
        response = hf_generate(HF_MODEL_ID, prompt)
    print(response)
except Exception as e:
    print("Error:", e)
    print("Tip: Check your configuration (Ollama installed? HF token set? Correct model ID?).")

## Improving Your Prompts
Use this quick checklist:
- Goal: Did I state exactly what I want?
- Audience: Did I set the role and reading level?
- Constraints: Did I specify length, style, and what to avoid?
- Format: Did I ask for headings, bullets, or JSON?
- Examples: Did I add 1–2 small examples if the model struggles?
- Iteration: Tweak one element at a time and compare results.
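
The “tweak one element at a time” advice can be made systematic. The sketch below generates prompt variants that each differ from a baseline in exactly one field, so any change in the output can be traced to that field. The names `BASE_PARTS`, `render`, and `one_change_variants` are hypothetical, and the prompt layout is an assumption for illustration.

```python
from typing import Dict, Iterator, Sequence, Tuple

BASE_PARTS = {
    "role": "You are a helpful study coach for beginners.",
    "task": "Explain the concept in simple terms.",
    "constraints": "Use 1 short paragraph, then 3 bullet points.",
    "concept": "What is overfitting in machine learning?",
}


def render(parts: Dict[str, str]) -> str:
    """Turn the prompt parts into a single prompt string."""
    return (
        f"{parts['role']}\n\n"
        f"Task: {parts['task']}\n"
        f"Constraints: {parts['constraints']}\n"
        f"Input: {parts['concept']}"
    )


def one_change_variants(
    base: Dict[str, str], changes: Sequence[Tuple[str, str]]
) -> Iterator[Tuple[str, str]]:
    """Yield (label, prompt) pairs: the baseline, then one variant per change."""
    yield "baseline", render(base)
    for field, value in changes:
        if field not in base:
            raise KeyError(f"Unknown prompt field: {field}")
        # Copy the baseline and override a single field.
        yield f"{field} changed", render({**base, field: value})


variants = list(
    one_change_variants(
        BASE_PARTS,
        [
            ("role", "You are a strict exam grader."),
            ("constraints", "Use exactly 2 sentences."),
        ],
    )
)
for label, text in variants:
    print(f"--- {label} ---\n{text}\n")
```

Each variant can then be sent to the same model, and the replies compared side by side.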

Try modifying the prompt below to practice.

In [None]:
# Practice: Tweak constraints and concept
constraints = (
    "Explain in exactly 2 short paragraphs, then a numbered list of 3 tips. "
    "Use plain language, avoid buzzwords."
)
concept = "How do decision trees work?"

prompt2 = f"""
You are a clear, friendly tutor.
Task: Teach the concept to a curious beginner.
Constraints: {constraints}
Input: {concept}
Output format:
1) Paragraph
2) Paragraph
3) 3 numbered tips
""".strip()

print(prompt2)

# Run if you want:
# response2 = hf_generate(HF_MODEL_ID, prompt2)
# print(response2)

## Common Pitfalls (and Fixes)
- Vague requests → Be specific about output and constraints.
- Overly long prompts → Keep only necessary context.
- Missing format instructions → Specify bullets, JSON, or sections.
- Stopping after the first attempt → Iterate, changing one thing at a time.
- Wrong model type → Use an instruct-tuned model for following directions.

You’re ready to explore! Keep prompts short, structured, and test different phrasings.