# Hugging Face I

This guide shows how to call gpt-oss via Hugging Face Inference Providers (no local GPU). We’ll use the `huggingface_hub` client for chat completion requests.  

A model like gpt-oss is too big for my MacBook Pro, so we aren't installing it. With Hugging Face Inference Providers, requests run on hosted GPUs; expect shared rate limits and occasional cold starts.

If you later want to run models locally, you’ll need `transformers` plus a backend (e.g., PyTorch/MPS) and a much smaller model.

## Hugging Face Login

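First, create an account at [https://huggingface.co/](https://huggingface.co/), then create an API access token with read permissions. I pasted my token into a `.env` file as the line `HF_TOKEN='key_here'`, so the program can read it with `os.getenv("HF_TOKEN")` instead of hard-coding the key. `os.getenv` only sees the process environment, so the `.env` file must be loaded first (for example with `load_dotenv()` from `python-dotenv`) or the variable exported in your shell.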
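Loading a `.env` file simply copies its `KEY=value` pairs into `os.environ`, which is where `os.getenv` looks. `python-dotenv` is the usual tool for this; the dependency-free sketch below only illustrates the idea. The helper `load_env_file` is hypothetical, and it assumes simple one-line `KEY = 'value'` entries without the escaping or variable expansion that `python-dotenv` supports.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict:
    """Hypothetical minimal .env loader: copies KEY=value lines into os.environ.

    Assumes simple lines such as HF_TOKEN = 'key_here'. Blank lines and
    lines starting with '#' are skipped; python-dotenv handles far more cases.
    """
    loaded = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes
        loaded[key] = value
        # Keep a variable already exported in the shell unless override=True
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded
```

Calling `load_env_file()` before `TOKEN = os.getenv("HF_TOKEN")` makes the token visible to the client code below. A variable already exported in the shell wins unless `override=True`, mirroring the default of `load_dotenv()`.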

## Install Python Libraries

Install these packages. The widget libraries are only needed if you are working in Jupyter.

```bash
pip install -qU huggingface_hub python-dotenv "ipywidgets>=8" "jupyterlab_widgets>=3"
```

## Notes

If you see a 429 error, you are being rate limited; a 503 usually means the model is unavailable or still "cold starting." Rate limiting is likely if you modify this script to send many requests, and that use case calls for retry logic in the code, such as exponential backoff.
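A retry wrapper with exponential backoff is one common way to handle 429/503 responses. This is a minimal sketch: `call_with_retry` is a hypothetical helper, not part of `huggingface_hub`. Rather than depending on a specific exception class, it checks for a `.response.status_code` attribute, which `HfHubHTTPError` carries in recent `huggingface_hub` releases (verify against your installed version).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# 429 = rate limited, 503 = model unavailable / cold start
RETRYABLE_STATUSES = frozenset({429, 503})


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff plus jitter on 429/503.

    Any error without a retryable `.response.status_code` is re-raised
    immediately; after max_attempts the last error is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            status = getattr(getattr(err, "response", None), "status_code", None)
            if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
                raise
            # 2s, 4s, 8s, ... capped at max_delay, plus up to 1s of random jitter
            delay = min(max_delay, base_delay * 2 ** attempt) + random.random()
            sleep(delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example: `r = call_with_retry(lambda: client.chat_completion(messages=msgs, max_tokens=64, temperature=0.0))`. The `sleep` parameter exists only so the backoff can be tested without actually waiting.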

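```python
import os

from huggingface_hub import InferenceClient
from huggingface_hub.errors import HfHubHTTPError

# Load HF_TOKEN from a local .env file if python-dotenv is installed
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass
```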

```python
MODEL = "openai/gpt-oss-20b"
TOKEN = os.getenv("HF_TOKEN")  # or rely on `huggingface-cli login`
client = InferenceClient(MODEL, token=TOKEN, timeout=90)
```

## Sentiment Classification

```python
system_prompt = "You are a sentiment classifier. Return ONLY 'POSITIVE' or 'NEGATIVE'."
user_prompt = "This sandwich is awesome."

msgs = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
```

```python
r = client.chat_completion(messages=msgs, max_tokens=64, temperature=0.0)
r.choices[0].message.content
```

```
'POSITIVE'
```
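Even with a "Return ONLY ..." instruction, a model can add whitespace, quotes, trailing punctuation, or a different case, so it helps to normalize and validate the label before using it. A minimal sketch; `parse_label` is a hypothetical helper, not part of `huggingface_hub`:

```python
from typing import Optional, Sequence

# Characters trimmed from both ends of a reply: whitespace, quotes, backticks, punctuation
_STRIP_CHARS = " \t\r\n'\"`.!"


def parse_label(text: str, allowed: Sequence[str]) -> Optional[str]:
    """Return the canonical allowed label matching the reply, or None.

    Matching is case-insensitive after trimming _STRIP_CHARS; anything
    else (extra words, explanations) is treated as no match.
    """
    cleaned = text.strip(_STRIP_CHARS).lower()
    for label in allowed:
        if cleaned == label.lower():
            return label
    return None
```

For example, `parse_label(r.choices[0].message.content, ["POSITIVE", "NEGATIVE"])`. Returning `None` instead of guessing makes unexpected replies easy to count or retry.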

## Stance Classification

```python
system_prompt = (
    "You will be given a message from a constituent to a legislator. "
    "Determine if the message implicitly endorses more or less government spending. "
    "Return ONLY one value from: "
    "['increase spending', 'decrease spending']."
)

user_prompt = "We need more astronauts."

msgs = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
```

```python
r = client.chat_completion(messages=msgs, max_tokens=128, temperature=0.0)
r.choices[0].message.content
```

```
'increase spending'
```
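The sentiment and stance cells follow the same pattern: a fixed system prompt, one user message, and a short label back. To label many messages, that pattern can be wrapped in a loop. This sketch assumes `client` is an `InferenceClient` like the one created above; `classify_many` is a hypothetical helper, and replies that don't exactly match an allowed label are recorded as `None` rather than guessed.

```python
from typing import List, Optional, Sequence

# Trim whitespace, quotes, backticks and periods from both ends of a reply
_TRIM = " \t\r\n'\"`."


def classify_many(
    client,
    system_prompt: str,
    texts: Sequence[str],
    allowed: Sequence[str],
    max_tokens: int = 128,
) -> List[Optional[str]]:
    """Classify each text with the same system prompt, one request per text."""
    lookup = {label.lower(): label for label in allowed}
    results: List[Optional[str]] = []
    for text in texts:
        msgs = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ]
        r = client.chat_completion(messages=msgs, max_tokens=max_tokens, temperature=0.0)
        reply = (r.choices[0].message.content or "").strip(_TRIM).lower()
        # Keep only replies that match an allowed label (case-insensitive)
        results.append(lookup.get(reply))
    return results
```

For example, `classify_many(client, system_prompt, ["We need more astronauts."], ["increase spending", "decrease spending"])`. Each text is a separate request, so the rate limits noted above apply to large batches.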

## Topic Discovery

```python
system_prompt = (
    "You analyze concerns voiced at a town hall meeting (short texts). "
    "Identify 1-2 topics (no predefined labels) that best capture the content. "
    "Return ONLY the topic names as a JSON array of strings."
)

responses = [
    "The bus line was cut and now I can't get to work on time.",
    "Potholes on Main Street have blown out two of my tires.",
    "The river is polluted.",
    "I don't like Facebook and I don't like the Internet.",
]

user_prompt = (
    "Task: Discover topics in town-hall concerns and assign each response a mixture over the discovered topics (top-3 only).\n"
    "Responses (id: text):\n" +
    "\n".join(f"{i+1}: {txt}" for i, txt in enumerate(responses))
)

msgs = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

r = client.chat_completion(messages=msgs, max_tokens=2056, temperature=0.0)
```
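```python
print(r.choices[0].message.content)
```

The system prompt asks for 1-2 topic names as a JSON array, but the user prompt asks for a top-3 topic mixture for each response. The two instructions conflict, and gpt-oss followed the user prompt: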

```json
{
  "1": [
    {"topic":"Public transportation","prob":0.9},
    {"topic":"Road infrastructure","prob":0.05},
    {"topic":"Digital platforms","prob":0.05}
  ],
  "2": [
    {"topic":"Road infrastructure","prob":0.9},
    {"topic":"Public transportation","prob":0.05},
    {"topic":"Environmental pollution","prob":0.05}
  ],
  "3": [
    {"topic":"Environmental pollution","prob":0.9},
    {"topic":"Road infrastructure","prob":0.05},
    {"topic":"Public transportation","prob":0.05}
  ],
  "4": [
    {"topic":"Digital platforms","prob":0.9},
    {"topic":"Public transportation","prob":0.05},
    {"topic":"Road infrastructure","prob":0.05}
  ]
}
```
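The reply above arrived wrapped in a Markdown `json` code fence, so calling `json.loads` on the raw text would fail. A minimal sketch that strips an optional fence, parses the JSON, and picks the highest-probability topic per response. It assumes the `{id: [{"topic": ..., "prob": ...}, ...]}` shape shown above, which the prompt does not guarantee; both helper names are hypothetical.

```python
import json
import re

# Optional ```json ... ``` (or bare ```) fence around the payload
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*(.*?)\s*```\s*$", re.DOTALL)


def parse_json_reply(text: str):
    """Parse a model reply as JSON, removing a surrounding Markdown fence if present."""
    match = _FENCE_RE.match(text)
    return json.loads(match.group(1) if match else text)


def dominant_topics(mixture: dict) -> dict:
    """Map each response id to its highest-probability topic."""
    return {
        rid: max(topics, key=lambda t: t["prob"])["topic"]
        for rid, topics in mixture.items()
    }
```

For example, `dominant_topics(parse_json_reply(r.choices[0].message.content))`. If the model ignores the expected shape, `json.loads` or the key lookups will raise, which is usually preferable to silently wrong topics.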


## Compare to Google Gemma

Google's Gemma is a family of lightweight, open-weight large language models. Here we send the same topic-discovery prompt to the small instruction-tuned Gemma 2 (2B) model.

```python
MODEL = "google/gemma-2-2b-it"
# https://huggingface.co/google/gemma-2-2b-it

client_gemma = InferenceClient(MODEL, token=TOKEN, timeout=90)
```

```python
# Re-use the topic-discovery messages (msgs) from above
r = client_gemma.chat_completion(messages=msgs, max_tokens=2056, temperature=0.0)
```

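```python
print(r.choices[0].message.content)
```

```
["Transportation", "Infrastructure", "Environmental"]
```

Gemma followed the system prompt's format instead, returning a single JSON array of topic names (three of them, rather than one or two).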
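The two models returned different shapes for the same prompt: gpt-oss a per-response mixture object, Gemma a flat JSON array of names. To compare them, it helps to reduce both to a set of topic names. A minimal sketch, assuming only those two shapes and an already-parsed reply (any Markdown fence removed); `topic_names` is a hypothetical helper.

```python
from typing import Set, Union


def topic_names(parsed: Union[list, dict]) -> Set[str]:
    """Collect topic names from either reply shape seen in this notebook.

    - list of strings, e.g. ["Transportation", "Infrastructure"]
    - dict of response id -> list of {"topic": ..., "prob": ...}
    """
    if isinstance(parsed, list):
        return {str(name) for name in parsed}
    if isinstance(parsed, dict):
        return {item["topic"] for items in parsed.values() for item in items}
    raise TypeError(f"unexpected reply shape: {type(parsed).__name__}")
```

Exact set overlap between the two models is likely to be small even when they agree in substance ("Transportation" vs. "Public transportation"), so a real comparison would need some normalization or a manual mapping of labels.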



```python
# Raise the temperature well above 1 and the output degrades into nonsense
# (apologies for anything obscene)
r = client_gemma.chat_completion(messages=msgs, max_tokens=1024, temperature=1.77)
print(r.choices[0].message.content)
```

```
["infrastructure", "transportation", "environmental issues"] 思いさせて頂اگر
```
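Sampling temperature divides the model's logits before the softmax, so values above 1 flatten the next-token distribution and give unlikely tokens (such as the stray Japanese and Persian fragments above) a real chance of being sampled. A toy illustration with made-up logits, not real model values:

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        # The math is undefined at T=0; servers typically treat it as greedy decoding
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for four candidate next tokens
logits = [5.0, 3.0, 1.0, 0.0]
for t in (0.5, 1.0, 1.77):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

As the temperature rises, probability mass moves from the top candidate toward the tail, which is why the `temperature=0.0` calls earlier stay on-format and the 1.77 call drifts.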

