# **HuggingFace Inference Providers**

## **What's Covered?**
1. Inference Providers
    - What are Inference Providers?
    - Example Usage - Inferencing Meta Llama Model with Groq
2. HuggingFace's Inference Providers
    - Why choose HF Inference Providers?
    - HF Inference Provider API
    - HF Inference Playground
3. Initial Set Up
    - Authentication
    - Installation
    - Syntax
    - Provider Selection Policy
4. Inferencing OpenAI's GPT OSS Model
5. Inferencing DeepSeek Model
6. Inferencing Text-to-Image Model

## **Inference Providers**

### **What are Inference Providers?**
Inference Providers are AI-focused technology companies that offer **inference-as-a-service**. They let developers run, host, or deploy AI models without managing their own heavy infrastructure. A few popular inference providers are:
- Groq
- Replicate
- Novita AI
- Sambanova
- etc...

### **Example Usage - Inferencing Meta Llama Model with Groq**

In [None]:
# !pip install groq

In [None]:
from groq import Groq

client = Groq(api_key="ENTER_YOUR_API_KEY")

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # a Meta Llama model ID on Groq; check Groq's model list for current IDs
    messages=[
      {
        "role": "user",
        "content": "provide 3 generative ai mcq questions with answers?"
      }
    ],
    temperature=1,
    max_completion_tokens=500,
    top_p=1,
    stream=True,
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
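
With `stream=True`, the reply arrives as a sequence of chunks, and a chunk's `delta.content` can be `None`, which is why the loop above falls back to an empty string. The sketch below shows one way to gather the streamed pieces into a single string. `collect_stream` is a hypothetical helper (not part of the Groq SDK), and the stand-in chunks only mimic the attribute layout used in the loop above so the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices at all; skip them.
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        # The content of a delta can be None (or empty), so keep only real text.
        if piece:
            parts.append(piece)
    return "".join(parts)


def _fake_chunk(text):
    """Hypothetical stand-in that mimics the shape of one streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [_fake_chunk("Hello"), _fake_chunk(None), _fake_chunk(", world")]
print(collect_stream(demo))  # prints: Hello, world
```

With a real client, passing the `completion` object from the cell above to `collect_stream` should return the full reply as one string instead of printing it piece by piece.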

## **HuggingFace's Inference Providers** 
Hugging Face’s Inference Providers give developers access to hundreds of machine learning models, powered by world-class inference providers. Hugging Face exposes these models through a single, dedicated API.

### **Why choose HF Inference Providers?**
When you build AI applications, it’s tough to manage multiple provider APIs, compare model performance, and deal with varying reliability. Inference Providers solves these challenges by offering:
- **Instant Access to Cutting-Edge Models:** Go beyond mainstream providers to access thousands of specialized models across multiple AI tasks. Whether you need the latest language models, state-of-the-art image generators, or domain-specific embeddings, you’ll find them here.
- **Zero Vendor Lock-in:** Unlike being tied to a single provider’s model catalog, you get access to models from Cerebras, Groq, Together AI, Replicate, and more — all through one consistent interface.
- **Production-Ready Performance:** Built for enterprise workloads with the reliability your applications demand.
- **Get Started for Free:** Inference Providers includes a generous free tier, with additional credits for PRO users and Team & Enterprise organizations. Every Hugging Face user receives monthly credits to experiment with Inference Providers. **Free users get $0.10 in monthly credits** (subject to change) **without needing to create a provider account**.
- **Cost-Effective:** No extra markup on provider rates.

### **HF Inference Provider API**
The Inference Providers API acts as a unified proxy layer that sits between your application and multiple AI providers. When using Inference Providers, your requests go through Hugging Face’s proxy infrastructure, which provides several key benefits:
- **Unified Authentication & Billing:** Use a single Hugging Face token for all providers
- **Automatic Failover:** When using automatic provider selection (`provider="auto"`), requests are automatically routed to alternative providers if the primary provider is flagged as unavailable by Hugging Face’s validation system
- **Consistent Interface through client libraries:** When using the Hugging Face client libraries, the same request format works across different providers
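
To illustrate the consistent interface, here is a minimal sketch of a helper that sends one chat message through any client exposing `chat.completions.create`. The helper name `ask` and the echo client are hypothetical, used only so the example runs offline. The assumption is that a real `InferenceClient` from `huggingface_hub` can be passed in unchanged, whichever provider it was created with.

```python
from types import SimpleNamespace


def ask(client, model: str, prompt: str) -> str:
    """Send a single user message and return the reply text."""
    result = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return result.choices[0].message.content


class _EchoCompletions:
    """Hypothetical offline stand-in that mirrors the response shape."""

    def create(self, model, messages):
        text = f"[{model}] {messages[-1]['content']}"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=_EchoCompletions()))
print(ask(fake_client, "openai/gpt-oss-20b", "Hello!"))  # prints: [openai/gpt-oss-20b] Hello!
```

Because the request format stays the same, switching providers should only require constructing the client with a different `provider` value; the call inside `ask` does not change.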

### **HF Inference Playground**
Try models interactively in the [HF Inference Playground](https://huggingface.co/playground).

## **Initial Set Up**

### **Authentication**
You’ll need a Hugging Face token to authenticate your requests. Create one by visiting your token settings and generating a fine-grained token with the “Make calls to Inference Providers” permission.
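
Rather than pasting the token into a notebook cell, one common approach is to read it from an environment variable. The sketch below assumes the token is stored in `HF_TOKEN`; `load_hf_token` is a hypothetical helper name, not part of `huggingface_hub`.

```python
import os


def load_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token stored in an environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token."
        )
    return token
```

The returned value can then be passed as `api_key=load_hf_token()` when creating the client, which keeps the token out of the saved notebook.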

### **Installation**
For convenience, the `huggingface_hub` library provides an `InferenceClient` that automatically handles provider selection and request routing.
```
!pip install huggingface_hub
```

### **Syntax**
```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="groq",
    api_key="ENTER_YOUR_API_KEY_HERE",
)
```

### **Provider Selection Policy**
- `"auto"` (default): Selects the first available provider for the model, sorted by your preference order in Inference Provider settings.
- A specific provider name: Forces use of that provider (e.g., `"together"`, `"replicate"`, `"fal-ai"`, …).
- "fastest" or "cheapest"
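
The `"auto"` rule and the forced-provider rule can be pictured with a small local sketch. This is only an illustration of the policy as described above, not Hugging Face's implementation: the real selection happens on Hugging Face's side, `select_provider` is a hypothetical function, and the "fastest" and "cheapest" options are not modeled here.

```python
from typing import Iterable, Sequence


def select_provider(
    policy: str, preference_order: Sequence[str], available: Iterable[str]
) -> str:
    """Pick a provider name according to the selection policy."""
    usable = set(available)
    if policy == "auto":
        # Walk the user's preference order and take the first usable provider.
        for name in preference_order:
            if name in usable:
                return name
        raise LookupError("no preferred provider is currently available")
    # Any other value is treated as a forced provider name.
    if policy in usable:
        return policy
    raise LookupError(f"provider {policy!r} is not available for this model")


preferences = ["groq", "together", "novita"]
print(select_provider("auto", preferences, {"together", "novita"}))    # prints: together
print(select_provider("novita", preferences, {"together", "novita"}))  # prints: novita
```

In the first call `"groq"` is skipped because it is not in the available set, so the next preferred provider is chosen; in the second call the preference order is ignored because a provider was named explicitly.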

In [None]:
# !pip install huggingface_hub

## **Inferencing OpenAI's GPT OSS Model**

In [None]:
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="groq",
    api_key="ENTER_YOUR_API_KEY_HERE",
)

result = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {
            "role": "user",
            "content": "provide 3 generative ai mcq questions with answers?"
        }
    ],
)

print(result.choices[0].message.content)

## **Inferencing DeepSeek Model**

In [None]:
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="novita",
    api_key="ENTER_YOUR_API_KEY_HERE"
)

result = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[
        {
            "role": "user", 
            "content": "Hello!"
        }
    ],
)

print(result.choices[0].message.content)

## **Inferencing Text-to-Image Model**

In [None]:
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="auto",
    api_key="ENTER_YOUR_API_KEY_HERE"
)

image = client.text_to_image(
    prompt="A serene lake surrounded by mountains at sunset, photorealistic style",
    model="black-forest-labs/FLUX.1-dev"
)

# Save the generated image
image.save("generated_image.png")

image