<a href="https://colab.research.google.com/github/Topeemma/Hugging_Face-API/blob/main/HuggingFace_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Getting Started with the Hugging Face Hub (Inference API)**

* Heard of AI chat-based interfaces like **ChatGPT**, **Gemini**, **HuggingChat**?

**What exactly is a Transformer-based language model?**

These are large language models (LLMs) trained on massive datasets to understand and generate natural language. They are:

* **Generative** — able to produce text and other content.
* **Pre-trained** — trained in advance on large corpora.
* **Transformer-based** — built on the transformer architecture that converts input to context-aware output.

These models power many common NLP tasks: answering questions, summarizing content, translating languages, and generating human-like dialogue.

### **Adding Your API Key to Colab**

1. In Colab, click **🔑 Secrets** in the left panel.
2. Add a new secret with:

   * **Name**: `HF_TOKEN`
   * **Value**: *your API key*
3. Grant notebook access to that secret.

### **Loading the API Key in Your Code**

First, retrieve the key securely:




In [25]:
# Colab helper for reading values stored in the Secrets panel
from google.colab import userdata

In [26]:
HF_API_KEY=userdata.get('HF_TOKEN')

**Then pass it to the SDK:**

```python
import os
os.environ["HF_API_KEY"] = HF_API_KEY

from huggingface_hub import InferenceClient
client = InferenceClient(token=os.environ["HF_API_KEY"])
```

- <font color="red">Warning</font>: Make sure your API key has no leading, trailing, or embedded whitespace.
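A stray space or newline often sneaks in when a token is pasted. The following is a minimal sketch of a guard against that. The helper name `clean_token` is hypothetical (it is not part of `huggingface_hub`), and the value in the example is a made-up placeholder, not a real token.

```python
def clean_token(raw_token):
    """Return the token without surrounding whitespace, or raise if it looks wrong.

    Hypothetical helper for illustration only.
    """
    if raw_token is None:
        raise ValueError("No token was provided.")
    token = raw_token.strip()  # drop leading/trailing spaces and newlines
    if not token:
        raise ValueError("The token is empty.")
    if any(ch.isspace() for ch in token):
        raise ValueError("The token contains whitespace in the middle.")
    return token


# Example with a made-up placeholder value
print(clean_token("  hf_example_placeholder\n"))
```

In this notebook it could wrap the secret lookup, for example `HF_API_KEY = clean_token(userdata.get('HF_TOKEN'))`.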

## **Install the Hugging Face Hub SDK**

Now that your account and access token are ready, the next step is to set up your environment. We’ll access models, datasets, and repos through the Hugging Face Hub Python library.

Install it with pip:


In [27]:
!pip install huggingface_hub



## **Import packages**
Import the necessary packages.

In [28]:
import textwrap

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

In [29]:
import os
os.environ["HF_API_KEY"] = HF_API_KEY

### **Instantiate a client**

We will now create a client that can call many of the models hosted on the Hub. The API key is needed only once, when you initialise the client; individual requests do not need it again.

In [30]:
from huggingface_hub import InferenceClient
import os

# Create client
client = InferenceClient(token=os.environ["HF_API_KEY"])

prompt = "Write a short story about Vegetables."
messages = [{"role": "user", "content": prompt}]

response = client.chat_completion(
    messages=messages,
    temperature=0.7,
    model="mistralai/Mistral-7B-Instruct-v0.3",
    max_tokens=500
)

print(response.choices[0].message.content)

 Title: The Vibrant Symphony of Verdant Harmony

In a quaint, sun-kissed village nestled between rolling hills and a sparkling brook, there existed a peculiar community. This wasn't your ordinary village; it was inhabited by sentient vegetables, each with their own unique personalities, stories, and melodies.

At the heart of the village was a vibrant patch of soil, known as Harmony Field. It was here that the vegetables grew, nourished by the love and songs of the villagers who tended to them. Each vegetable had a distinctive voice, and when they sang together, it created a symphony that resonated throughout the valley.

One day, a small, timid lettuce named Leaflet arrived in the village. She was a stranger, having drifted downstream from a far-off land. The villagers, sensing her distress, welcomed her warmly, but Leaflet struggled to find her place. She had never heard the harmonious songs that filled the village, and her own voice was quiet and unassuming.

One day, after a partic

In [31]:
response = client.chat_completion(
    messages=messages,
    model="Qwen/Qwen2.5-7B-Instruct",
    max_tokens=500
)

print(response.choices[0].message.content)

In a small, bustling town nestled between rolling hills and a sparkling river, there lived a curious little carrot named Carrot. Carrot was not like the other vegetables in the garden. He was always eager to explore and learn about the world beyond the fence. His friends, the other vegetables, often teased him for his adventurous spirit, but Carrot didn't mind. He believed that every vegetable had a story to tell, and he was determined to find them all.

One sunny morning, as the dew still clung to the leaves, Carrot decided to venture out of the garden. He hopped over the fence and into the nearby meadow, where he met a friendly sunflower named Sunny. Sunny was tall and proud, with a warm smile that made Carrot feel at ease.

"Hello, Carrot! Where are you off to today?" Sunny asked, her petals rustling gently in the breeze.

"I'm on a quest to find the stories of all the vegetables in the world," Carrot replied excitedly. "I want to know what it's like to be a pepper, a potato, or eve

In [32]:
response = client.chat_completion(
    messages=messages,
    temperature=0.7,
    model="meta-llama/Llama-3.1-8B-Instruct",
    max_tokens=500
)

print(response.choices[0].message.content)

**The Great Harvest Festival**

In a lush green valley, surrounded by rolling hills and a sparkling river, the vegetables of the land gathered to celebrate the annual Harvest Festival. It was a time of great excitement and anticipation, as the vegetables had spent all year working tirelessly to grow and ripen in the warm sunshine.

At the center of the festival was a magnificent display of brightly colored vegetables, each one showcasing its unique beauty and charm. There was Ruby, the radiant red tomato, whose vibrant skin glowed like a sunset. Next to her stood Emerald, the emerald-green broccoli, whose tightly packed florets seemed to shimmer in the sunlight. Nearby, a cluster of golden-yellow corn stalks swayed gently in the breeze, their delicate kernels glistening with dew.

As the festival began, the vegetables took turns showcasing their skills and talents. Sammy the sweet potato danced a lively jig, his vibrant orange skin flashing with each step. Meanwhile, Lola the lettuce p

## **What models can be used with the Python SDK?**

Now you're ready to call models via the Hugging Face Inference API. Before generating responses, let’s explore the models available for use with the SDK.

For a more holistic view of available models, see the [Hugging Face Model Hub](https://huggingface.co/models).


In [33]:
from huggingface_hub import HfApi
import pandas as pd

# Initialize API
api = HfApi()

models = api.list_models(
    task="text-generation",
    library="transformers",
    sort="downloads",
    direction=-1,
    limit=50  # Get top 50
)

# Display as a list
model_list = []
for model in models:
    model_list.append({
        "Model ID": model.id,
        "Downloads": model.downloads if hasattr(model, 'downloads') else 0,
        "Likes": model.likes if hasattr(model, 'likes') else 0,
        "Tags": ", ".join(model.tags[:3]) if model.tags else ""
    })

# Show as DataFrame
df = pd.DataFrame(model_list)
df


Use `filter` instead.


| | Model ID | Downloads | Likes | Tags |
|---|---|---|---|---|
| 0 | openai-community/gpt2 | 11286198 | 3003 | transformers, pytorch, tf |
| 1 | Qwen/Qwen2.5-7B-Instruct | 8037867 | 844 | transformers, safetensors, qwen2 |
| 2 | Qwen/Qwen3-0.6B | 7285383 | 745 | transformers, safetensors, qwen3 |
| 3 | Gensyn/Qwen2.5-0.5B-Instruct | 6438470 | 26 | transformers, safetensors, qwen2 |
| 4 | meta-llama/Llama-3.1-8B-Instruct | 5260341 | 4844 | transformers, safetensors, llama |
| 5 | openai/gpt-oss-20b | 4751782 | 3824 | transformers, safetensors, gpt_oss |
| 6 | dphn/dolphin-2.9.1-yi-1.5-34b | 4724964 | 44 | transformers, safetensors, llama |
| 7 | google/gemma-3-1b-it | 4532452 | 677 | transformers, safetensors, gemma3_text |
| 8 | Qwen/Qwen3-Embedding-0.6B | 4342260 | 694 | sentence-transformers, safetensors, qwen3 |
| 9 | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 4329979 | 1437 | transformers, safetensors, llama |


### **Making Your First API Call**

In this section, you’ll learn how to send requests with the Hugging Face Inference API. For text tasks, you’ll use endpoints designed for text generation or analysis. For image tasks, you’ll use models built for image generation or classification (which we will cover later in this course).

When interacting with chat-based models through Hugging Face’s `InferenceClient`, a common endpoint is the **chat completions** interface for instruction-following models.

#### **Chat Completions API Overview**

The chat-style API allows both single-turn and multi-turn interactions by processing a sequence of messages and generating coherent responses. It works well for both conversations and one-off queries.

#### **Input Structure & Parameters**

**A. Messages**
You send a list of messages structured like:

```python
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Tell me about Yoruba culture."}
]
```

Each message contains:

* **role**: Either `"system"`, `"user"`, or `"assistant"`
* **content**: The actual message string
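The message structure above can be sketched as a small builder that checks the role before creating the dictionary. This assumes text-only messages; `make_message` and `VALID_ROLES` are hypothetical names used for illustration, not part of the SDK.

```python
# Roles accepted by the chat format described above
VALID_ROLES = {"system", "user", "assistant"}


def make_message(role, content):
    """Build one chat message dict after checking the role and content."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    if not isinstance(content, str):
        raise TypeError("content must be a string")
    return {"role": role, "content": content}


messages = [
    make_message("system", "You are a helpful assistant."),
    make_message("user", "Tell me about Yoruba culture."),
]
print(messages)
```

Catching a misspelled role locally is usually easier to debug than an error returned by the API.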

**B. Max Tokens**
Sets an upper limit on the number of tokens the model may generate in its reply.

```python
response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=messages,
    max_tokens=150
)
```

Tokens are subword units, not full words. For example, “unexpectedly” might tokenize to:
`["un", "expect", "ed", "ly"]`

Each model has its own token limit. For example, Llama 3 (8B) supports up to 8,192 tokens, counting input and output together.
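Because the prompt and the reply share one limit, a long prompt leaves less room for output. Here is a rough sketch of that arithmetic using the 8,192-token figure above. The helper name `output_budget` and the prompt sizes are made up for illustration; real token counts come from the model's tokenizer, and a server may reject an over-budget request rather than shorten the reply.

```python
def output_budget(context_window, prompt_tokens, max_tokens):
    """Return how many tokens the reply can use within the shared limit."""
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("The prompt alone fills the context window.")
    # The reply cannot exceed either the requested cap or the space left over
    return min(max_tokens, remaining)


print(output_budget(8192, 8000, 500))  # 192: only 192 tokens of space remain
print(output_budget(8192, 1200, 150))  # 150: the requested cap is the limit
```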

**C. Temperature**
Controls the randomness of the output. A lower temperature leads to more predictable, deterministic text; a higher temperature increases creativity and variety but may also increase the risk of irrelevant or nonsensical responses. The value ranges between 0 and 2:

* `temperature = 0.0`: More deterministic
* `temperature = 2.0`: More diverse and creative

Example:

```python
response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=messages,
    temperature=0.7
)
```

Higher values produce more varied results; lower values are better when you need consistent, repeatable answers.
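Temperature is commonly described as dividing the model's raw scores (logits) before they are turned into probabilities. The toy sketch below illustrates that idea with made-up scores for three candidate tokens. It is not the internals of any particular model or of the Inference API, and it rejects a temperature of 0, which is typically handled as a special case (always pick the top token).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after scaling by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all probability goes to the top-scoring token; at a high temperature the distribution flattens, so less likely tokens are sampled more often.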

In [34]:
# Define user Prompt
prompt="Recommend to me a very interesting movie."
# Creating a message as required by the API
messages = [{"role": "user", "content": prompt}]
completion = client.chat_completion(
    messages = messages,
    model="meta-llama/Llama-3.1-8B-Instruct",
    max_tokens=300,
    temperature=0.7,
)
print(completion)
Markdown(completion.choices[0].message.content)

ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='length', index=0, message=ChatCompletionOutputMessage(role='assistant', content='I\'d like to recommend a thought-provoking and visually stunning movie that has received critical acclaim: "Inception" (2010) directed by Christopher Nolan.\n\n**Movie Synopsis:**\n\n"Inception" is a sci-fi action film that delves into the concept of shared dreaming. The movie follows Cobb (played by Leonardo DiCaprio), a skilled thief who specializes in entering people\'s dreams and stealing their secrets. Cobb is tasked with performing a seemingly impossible task: planting an idea in someone\'s mind instead of stealing one. This requires him to lead a team of experts into a dream within a dream within a dream, where the boundaries between reality and fantasy are blurred.\n\n**Why it\'s interesting:**\n\n1. **Mind-bending plot:** The movie\'s complex storyline will keep you guessing and questioning what\'s real and what\'s a dream.\

I'd like to recommend a thought-provoking and visually stunning movie that has received critical acclaim: "Inception" (2010) directed by Christopher Nolan.

**Movie Synopsis:**

"Inception" is a sci-fi action film that delves into the concept of shared dreaming. The movie follows Cobb (played by Leonardo DiCaprio), a skilled thief who specializes in entering people's dreams and stealing their secrets. Cobb is tasked with performing a seemingly impossible task: planting an idea in someone's mind instead of stealing one. This requires him to lead a team of experts into a dream within a dream within a dream, where the boundaries between reality and fantasy are blurred.

**Why it's interesting:**

1. **Mind-bending plot:** The movie's complex storyline will keep you guessing and questioning what's real and what's a dream.
2. **Innovative action sequences:** The film's action scenes are expertly choreographed and visually stunning, with a blend of reality and fantasy.
3. **Philosophical themes:** "Inception" explores the nature of reality, identity, and the human mind, raising questions about the limits of human perception.
4. **Strong performances:** The all-star cast, including Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen Page, and Tom Hardy, deliver memorable performances.

**Awards and accolades:**

"Inception" was nominated for eight Academy Awards and won four, including Best Cinematography and Best Sound Editing.

Next, we will learn how to have a multi-turn conversation with this LLM. To do this, we append the assistant's response to the existing messages, add the new prompt in the same message format, and pass the resulting list of dictionaries to the chat completion function.
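The bookkeeping described above can be sketched without calling the API. The helper name `add_turn` is hypothetical, and the assistant reply in the example is a shortened stand-in for the model's real answer.

```python
def add_turn(messages, assistant_reply, next_prompt):
    """Return a new message list extended with the reply and the next user prompt."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_prompt},
    ]


history = [{"role": "user", "content": "Recommend to me a very interesting movie."}]
# Stand-in reply; in practice this is response.choices[0].message.content
history = add_turn(history, "I'd recommend Inception (2010).", "Who directed it?")

for message in history:
    print(message["role"], "->", message["content"])
```

The resulting list is what would be passed as `messages` on the next call, so the model sees the earlier exchange along with the new question.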

#### **Conversation Dynamics**
Conversations can vary in length from a single message to a series of exchanges. Typically, these interactions might start with a system message to guide the assistant's behavior, followed by a sequence of alternating messages between the user and the assistant.

#### **Roles Explained**
- **System**: The system message sets the initial tone or guidelines for the assistant’s behavior during the interaction. It can give the assistant a specific personality or precise instructions on how to conduct itself. The system message is optional; without it, most chat models behave like a generally helpful assistant, much as if the conversation had started with "You are a helpful assistant."
  
- **User**: Messages from the user generally consist of queries or comments that prompt responses from the assistant. These are the driving force of the conversation, guiding the topics and flow of the dialogue.

- **Assistant**: This role involves messages generated in response to the user or system inputs. The assistant's messages can include responses based on previous interactions within the conversation. Alternatively, you can manually craft messages in this role to demonstrate preferred responses or to simulate typical interactions.

By understanding and effectively utilizing these roles, you can create nuanced and dynamic dialogues tailored to specific interaction scenarios or conversational needs.

In [35]:
# Multi-turn conversation with system message
response = client.chat_completion(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

Markdown(response.choices[0].message.content)

 The 2020 World Series was played at Globe Life Field in Arlington, Texas. The series was played without fans in attendance due to the COVID-19 pandemic.

## Introduction to Gradio

[Gradio](https://www.gradio.app/docs) is an open‑source Python library that makes it easy to build a web‑based interface around any Python function, model, or API. With just a few lines of code you can wrap your machine‑learning model (or any processing function) into a usable UI and launch it locally or share it publicly. Gradio abstracts away the need for front‑end web development skills, enabling non‑technical users to interact with your model via browser input fields, sliders, image uploads, etc. Once your interface is running, you can also host it on platforms like Hugging Face Spaces so others can try it from anywhere.


In [36]:
!pip install gradio --quiet

In [37]:
# Import libraries
import gradio as gr
import os
"meta-llama/Llama-3.1-8B-Instruct"

'meta-llama/Llama-3.1-8B-Instruct'

In [51]:
# Memory storage (keeps system + all past user/assistant messages)
chat_history = [
    {"role": "system", "content": "You are a helpful assistant."}
]

# Optional: keep only this many past messages (after the system message).
# Set to None to keep everything (not recommended long-term).
MAX_HISTORY_MESSAGES = 20


In [52]:
def chat_with_model(prompt):
    global chat_history

    # Add the new user message to history
    chat_history.append({"role": "user", "content": prompt})

    # Optionally trim history, always keeping the system message first
    if MAX_HISTORY_MESSAGES is not None:
        system_msg = chat_history[0]
        tail = chat_history[1:][-MAX_HISTORY_MESSAGES:]  # most recent user/assistant messages
        chat_history = [system_msg] + tail

    try:
        # Send the full history so the model can "remember" earlier turns
        response = client.chat_completion(
            model="meta-llama/Llama-3.1-8B-Instruct",
            messages=chat_history.copy(),
            max_tokens=500
        )
    except Exception as e:
        # Drop the unanswered user message so the history stays consistent
        chat_history.pop()
        return f"Error: {e}"

    # Extract the assistant text and store it so future calls include it
    assistant_text = response.choices[0].message.content
    chat_history.append({"role": "assistant", "content": assistant_text})

    # Debug output: the reply and, when available, the prompt token count
    print(assistant_text)
    usage = getattr(response, "usage", None)
    if usage is not None:
        print(usage.prompt_tokens)

    return assistant_text


In [53]:
# Create a Gradio interface around the chat function
iface = gr.Interface(
    fn=chat_with_model,
    inputs=gr.Textbox(lines=2, placeholder="Ask me anything..."),
    outputs="text",
    title="Chat with AI Models",
    description="Ask the model any questions."
)

In [54]:
# Launch the interface
iface.launch()

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://f9be0c6fe9a620490a.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




### **Assignment: Add Memory to Your Chatbot**

### Your Task
Modify the chatbot function above so that it remembers previous messages in the conversation.

### Current Problem
Right now, your chatbot forgets everything after each response. If you say "My name is Sarah" and then ask "What's my name?", it won't remember.

### What You Need to Do
Make your chatbot remember all previous messages and responses, so it can refer back to earlier parts of the conversation.

### Test Your Memory
Your chatbot should be able to handle this conversation:
```
User: "Hi, my name is Sarah and I love pizza."
Bot: [responds]
User: "What's my name?"
Bot: [should say "Sarah"]
User: "What do I love?"
Bot: [should say "pizza"]
```

### **Hint**

To give the chatbot a memory, you’ll need to keep *all* of the past messages (user + bot) in a list of dictionaries behind the scenes, and then include that history every time you make a new API call. Here’s step‑by‑step how you can do it:

1. At the top of your script (outside the function) create a variable, e.g.

   ```python
   chat_history = []
   ```

   This will hold the sequence of all messages.

2. Each time the user sends a prompt, add a dictionary representing the user message to `chat_history`, e.g.

   ```python
   chat_history.append({"role": "user", "content": prompt})
   ```

3. Then, when you call the model, pass the full history as the `messages` list. Because of step 2, it already ends with the current user message. For example:

   ```python
   messages = chat_history.copy()
   response = client.chat_completion(model="meta-llama/Llama-3.1-8B-Instruct", messages=messages)
   ```

4. After the model returns a response, take the assistant’s reply content and append another dictionary to `chat_history`:

   ```python
   response_text = response.choices[0].message.content
   chat_history.append({"role": "assistant", "content": response_text})
   ```

5. That way, when you next ask “What’s my name?” or “What do I love?”, the history contains the earlier statement (“My name is Sarah and I love pizza.”) and then the question follows. The model sees the entire chain and can answer accordingly.

6. If you want, you can limit how many past messages you keep (for token/efficiency reasons) by slicing the list (e.g., `chat_history = chat_history[-10:]`).

By following those steps, your chatbot will “remember” previous messages in the conversation.
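The trimming idea in step 6 can be sketched as below. One assumption goes beyond the hint: if the history starts with a system message, a plain slice would eventually drop it, so this version sets it aside first. The helper name `trim_history` is hypothetical.

```python
def trim_history(history, max_messages):
    """Keep the system message (if any) plus the most recent messages."""
    if history and history[0]["role"] == "system":
        head, tail = history[:1], history[1:]
    else:
        head, tail = [], history
    # A slice of [-0:] would return everything, so handle zero explicitly
    recent = tail[-max_messages:] if max_messages > 0 else []
    return head + recent


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, my name is Sarah and I love pizza."},
    {"role": "assistant", "content": "Nice to meet you, Sarah!"},
    {"role": "user", "content": "What's my name?"},
]

trimmed = trim_history(history, 2)
for message in trimmed:
    print(message["role"], "->", message["content"])
```

Note the trade-off: once the message containing "Sarah" is trimmed away, the model can no longer answer "What's my name?", so the limit should be generous enough for the conversations you expect.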