# Prompting GPT-OSS & Getting Started

A quick guide for beginners on how to talk to the open‑source GPT models (GPT‑OSS).

## What is GPT‑OSS and why use it?

- **GPT‑OSS** stands for *Open‑Source Generative Pre‑trained Transformers* – community‑maintained models that behave like OpenAI’s ChatGPT but can be run locally or on any cloud.
- **Benefits**:
  - No vendor lock‑in; you own the weights.
  - Full control over privacy and data.
  - Often cheaper at scale because you pay only for compute.
  - Ability to fine‑tune or extend the model for domain‑specific tasks.

These models are ideal for developers, researchers, and hobbyists who want a powerful LLM without the restrictions of commercial APIs.

## Basic prompt structure and formatting

1. **System message** – sets the overall behavior (e.g., "You are a helpful assistant.")
2. **User message** – the actual question or instruction.
3. **Assistant response** – what the model returns.

### Example format (JSON for many APIs)
```json
{
  "messages": [
    {"role": "system", "content": "You are a concise tech writer."},
    {"role": "user",   "content": "Explain GPT‑OSS in two sentences."}
  ]
}
```

**Tips**:
- Keep instructions clear and short.
- Use bullet points or numbered lists for multi‑step tasks.
- Add delimiters (e.g., triple backticks) when you want the model to treat text as code or data.

## Getting started with your first prompts

Below is a minimal Python snippet that loads a GPT‑OSS model with the `transformers` library and runs a prompt.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a lightweight open‑source model (e.g., Llama‑2‑7B‑Chat)
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

def ask_gpt_oss(prompt, system="You are a helpful assistant."):
    messages = [
        {"role": "system", "content": system},
        {"role": "user",   "content": prompt}
    ]
    # Convert messages to a single string the model expects
    full_prompt = "\n\n".join([m["content"] for m in messages])
    inputs = tokenizer(full_prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Remove the original prompt from the output
    return response[len(full_prompt):].strip()

# Example usage
print(ask_gpt_oss("Explain the benefits of using GPT‑OSS.") )
```

Run the cell, and you should see a short answer generated by the model. Adjust `max_new_tokens`, `temperature`, or the system message to experiment with different behaviours.