<div align="center">
<img src="https://poorit.in/image.png" alt="Poorit" width="40" style="vertical-align: middle;"> <b>AI SYSTEMS ENGINEERING 1</b>

## Unit 2: Beyond Text - Images, Audio, and Multi-Model Access

**CV Raman Global University, Bhubaneswar**  
*AI Center of Excellence*

</div>

---

### What You'll Learn

In this notebook, you will:

1. **Use LiteLLM** to access multiple LLM providers with one interface
2. **Generate images** from text prompts using gpt-image-1-mini
3. **Create audio with text-to-speech** using OpenAI's TTS
4. **Practice** with hands-on exercises

**Duration:** ~45 minutes

---

## 1. Environment Setup

In [None]:
# Install required packages
!pip install -q litellm pillow

In [None]:
import os
import base64
from io import BytesIO
from getpass import getpass
from litellm import completion, image_generation, speech
from PIL import Image
from IPython.display import Audio, display

In [None]:
# Configure API key
os.environ['OPENAI_API_KEY'] = getpass("Enter your OpenAI API Key: ")

---

## 2. One Interface for Many Models -- LiteLLM

So far we've used the OpenAI Python library directly. But what if you want to switch to Google Gemini, Anthropic Claude, or a local model?

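**LiteLLM** gives you a single `completion()` function that works with 100+ providers. It accepts the same `messages` format as the OpenAI library and returns every response in the OpenAI format, so `response.choices[0].message.content` works no matter which provider answered. You change only the `model` string, written as `provider/model-name`; the rest of your code stays the same.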

| Provider | Model Name in LiteLLM | API Key Env Variable |
|----------|----------------------|---------------------|
| OpenAI | `openai/gpt-4o-mini` | `OPENAI_API_KEY` |
| Google | `gemini/gemini-2.0-flash` | `GEMINI_API_KEY` |
| Anthropic | `anthropic/claude-sonnet-4-5-20250929` | `ANTHROPIC_API_KEY` |
| Local (Ollama) | `ollama/llama3.2` | None (needs a local Ollama server running) |
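
LiteLLM uses the part of the model string before the first `/` to decide which provider to call. The sketch below makes that convention concrete and checks that the matching API key is set before you spend a request. `PROVIDER_KEYS`, `split_model()` and `missing_key()` are hypothetical helpers written for this notebook, not part of LiteLLM, and the mapping is an assumption based on the table above. The sketch requires an explicit prefix, even though LiteLLM also accepts some unprefixed names such as `gpt-4o-mini`.

```python
from __future__ import annotations

import os

# Hypothetical lookup table (not part of LiteLLM): provider prefix -> the
# environment variable(s) that can hold its API key. Empty tuple = no key needed.
PROVIDER_KEYS = {
    "openai": ("OPENAI_API_KEY",),
    # LiteLLM's Gemini docs use GEMINI_API_KEY; accepting GOOGLE_API_KEY too
    # is an assumption made here so either naming passes the check.
    "gemini": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
    "anthropic": ("ANTHROPIC_API_KEY",),
    "ollama": (),  # local server, no key
}


def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model-name' on the first '/' only."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model-name', got {model!r}")
    return provider, name


def missing_key(model: str, env=os.environ) -> str | None:
    """Return a human-readable problem for this model, or None if it looks ready."""
    provider, _ = split_model(model)
    if provider not in PROVIDER_KEYS:
        return f"unknown provider {provider!r} (not in this sketch's table)"
    names = PROVIDER_KEYS[provider]
    if names and not any(env.get(n) for n in names):
        return f"{model}: set {' or '.join(names)}"
    return None


for m in ["openai/gpt-4o-mini", "gemini/gemini-2.0-flash", "ollama/llama3.2"]:
    print(f"{m:28} {missing_key(m) or 'ready'}")
```

Splitting on the first `/` only (via `str.partition`) keeps any further slashes inside the model name intact.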

Let's try it with OpenAI first, then with Gemini:

In [None]:
# LiteLLM uses the same messages format you already know
response = completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a fun fact about India in one sentence."}]
)

print(response.choices[0].message.content)
print(f"\nTokens used: {response.usage.total_tokens}")

In [None]:
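# Gemini key: LiteLLM's Gemini docs use GEMINI_API_KEY; we set GOOGLE_API_KEY too so either name works
gemini_key = getpass("Enter your Google (Gemini) API Key: ")
os.environ['GEMINI_API_KEY'] = os.environ['GOOGLE_API_KEY'] = gemini_key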

In [None]:
# Same code pattern, different provider — just change the model string
response = completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Tell me a fun fact about India in one sentence."}]
)

print(response.choices[0].message.content)
print(f"\nTokens used: {response.usage.total_tokens}")

> **Key idea:** You just called two different providers (OpenAI and Google Gemini) with the exact same code. Only the `model` string changed, and because LiteLLM returns every response in the OpenAI format, the same `response.choices[0].message.content` and `response.usage.total_tokens` lines worked for both. The idea extends beyond chat: LiteLLM also provides `image_generation()` for image models and `speech()` for text-to-speech, covered in the next two sections.

---

## 3. Image Generation with LiteLLM

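LiteLLM's `image_generation()` function wraps image models such as `gpt-image-1-mini` behind the same unified interface. We send a text prompt; the GPT image models return the picture as base64-encoded data in `response.data[0].b64_json`, which we decode and open with Pillow.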

**Cost:** ~$0.005 per image (1024x1024, low quality)

In [None]:
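# Generate an image (size and quality set explicitly so the cost matches the estimate above)
response = image_generation(
    model="openai/gpt-image-1-mini",
    prompt="The Taj Mahal at sunset with birds flying, vibrant colors",
    size="1024x1024",
    quality="low"
)

# GPT image models return base64 data: decode it to bytes, then open it with Pillow
image = Image.open(BytesIO(base64.b64decode(response.data[0].b64_json)))
display(image)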
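
If you want to keep the image rather than only display it, the decoding step can be pulled into a small helper. This sketch uses only the standard library: it decodes the base64 payload and guesses the file extension from the file's leading "magic bytes". The helper names `decode_image()` and `save_image()` are hypothetical. The assumption that the payload is PNG by default (OpenAI documents an `output_format` option for GPT image models that can switch it to JPEG or WebP) is why the sketch sniffs the format instead of hard-coding `.png`.

```python
from __future__ import annotations

import base64
from pathlib import Path

# Leading "magic bytes" that identify common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def decode_image(b64_json: str) -> tuple[bytes, str]:
    """Decode a base64 image payload and guess its format from its first bytes."""
    raw = base64.b64decode(b64_json)
    for magic, fmt in SIGNATURES.items():
        if raw.startswith(magic):
            return raw, fmt
    # WebP files start with b"RIFF", a 4-byte size, then b"WEBP".
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return raw, "webp"
    return raw, "bin"  # unknown format: keep the bytes anyway


def save_image(b64_json: str, stem: str = "generated") -> Path:
    """Write the decoded image to '<stem>.<format>' and return the path."""
    raw, fmt = decode_image(b64_json)
    path = Path(f"{stem}.{fmt}")
    path.write_bytes(raw)
    return path
```

After running the cell above, `save_image(response.data[0].b64_json, "taj_mahal")` would write the picture next to your notebook.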

---

## 4. Text-to-Speech with LiteLLM

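LiteLLM's `speech()` function wraps text-to-speech models such as `gpt-4o-mini-tts-2025-03-20`. You choose a voice and send the text; the response holds the generated audio (MP3 by default), which we save to a file and play in the notebook.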

**Available voices:** alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse  
**Cost:** ~$0.016 per 1,000 characters

In [None]:
# Generate speech with the "alloy" voice
audio = speech(
    model="openai/gpt-4o-mini-tts-2025-03-20",
    voice="alloy",
    input="Welcome to the AI Systems Engineering course at CV Raman University!"
)

# Save the returned MP3 bytes to disk, then show an audio player in the notebook
audio.write_to_file("welcome.mp3")
display(Audio("welcome.mp3"))
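
Because TTS is billed by input length, it helps to check a request before sending it. The sketch below validates the voice against the list above, rejects empty text, enforces the 4,096-character `input` limit listed in OpenAI's speech API reference, and returns an estimated cost at the approximate rate quoted above. `check_tts_request()` is a hypothetical helper, and both the voice set and the rate are assumptions that may go stale as OpenAI adds voices or changes prices.

```python
from __future__ import annotations

# Voices listed in this notebook (assumption: extend this set if OpenAI adds more).
VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable",
          "onyx", "nova", "sage", "shimmer", "verse"}
MAX_CHARS = 4096            # OpenAI speech endpoint input limit
PRICE_PER_1K_CHARS = 0.016  # approximate rate quoted above, in USD


def check_tts_request(text: str, voice: str) -> float:
    """Validate a TTS request and return its estimated cost in USD."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose from {sorted(VOICES)}")
    if not text.strip():
        raise ValueError("input text is empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"input is {len(text)} chars; the limit is {MAX_CHARS}")
    return len(text) / 1000 * PRICE_PER_1K_CHARS


text = "Welcome to the AI Systems Engineering course at CV Raman University!"
print(f"{len(text)} chars, estimated cost ${check_tts_request(text, 'alloy'):.5f}")
```

Raising an error before the call keeps a typo in the voice name or an oversized paragraph from turning into a failed (or surprisingly long) API request.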

---

## 5. Cost Awareness

AI APIs charge per use. Here's a quick reference so you can estimate costs before running code:

| Feature | Approximate Cost | Example |
|---------|------------------|--------|
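| GPT-4o-mini (text) | ~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens (well under $0.001 per short request) | Chat response |
| Gemini 2.0 Flash (text) | Free tier available (rate-limited) | Chat response |
| gpt-image-1-mini (low, 1024x1024) | ~$0.005 per image | One generated image |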
| gpt-4o-mini-tts-2025-03-20 | ~$0.016 per 1K chars | ~1 paragraph of audio |
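
To estimate a whole practice session before running it, the table can be turned into a few lines of arithmetic. In this sketch the gpt-4o-mini figures are OpenAI's published per-token prices, while the image and TTS figures are the approximate rates from the table; all of them can change, so treat the result as a rough budget. `estimate_session_cost()` is a hypothetical helper. LiteLLM also ships a built-in `completion_cost(completion_response=response)` that prices a real chat response from its own price map; the hand-rolled version here is for planning before you call anything.

```python
# Approximate unit prices in USD (check current pricing pages before relying on them).
GPT4O_MINI_INPUT_PER_TOKEN = 0.15 / 1_000_000   # OpenAI list price per input token
GPT4O_MINI_OUTPUT_PER_TOKEN = 0.60 / 1_000_000  # OpenAI list price per output token
IMAGE_LOW_1024 = 0.005                          # gpt-image-1-mini, low quality, 1024x1024
TTS_PER_1K_CHARS = 0.016                        # gpt-4o-mini-tts, approximate


def estimate_session_cost(
    prompt_tokens: int = 0,
    completion_tokens: int = 0,
    images: int = 0,
    tts_chars: int = 0,
) -> float:
    """Rough USD estimate for a batch of chat, image, and TTS calls."""
    text = (prompt_tokens * GPT4O_MINI_INPUT_PER_TOKEN
            + completion_tokens * GPT4O_MINI_OUTPUT_PER_TOKEN)
    return text + images * IMAGE_LOW_1024 + tts_chars / 1000 * TTS_PER_1K_CHARS


# Example: 20 chats of ~300 prompt + ~200 reply tokens, 5 images, 2,000 chars of TTS.
cost = estimate_session_cost(prompt_tokens=20 * 300, completion_tokens=20 * 200,
                             images=5, tts_chars=2000)
print(f"Estimated cost: ${cost:.4f}")
```

After a real chat call, you can pass `response.usage.prompt_tokens` and `response.usage.completion_tokens` into the function instead of guessing token counts.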

> **Tip:** During development and learning, use the cheapest models (GPT-4o-mini, Gemini Flash, gpt-image-1-mini with low quality). Save expensive calls (GPT-4o, high-quality images) for when you really need them.
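
One way to follow this tip is to keep model choices in a single place, so switching from cheap to expensive settings is one change rather than an edit in every cell. In this sketch the profile names, the `setting()` helper and the `COURSE_MODE` environment variable are all hypothetical; only the model strings and quality values come from this notebook and OpenAI's options.

```python
from __future__ import annotations

import os

# Hypothetical config: cheap settings while learning, stronger ones only when needed.
MODEL_PROFILES = {
    "dev": {"chat": "openai/gpt-4o-mini", "image_quality": "low"},
    "prod": {"chat": "openai/gpt-4o", "image_quality": "high"},
}


def setting(name: str, profile: str | None = None) -> str:
    """Look up a setting; COURSE_MODE is a made-up env variable (defaults to 'dev')."""
    profile = profile or os.environ.get("COURSE_MODE", "dev")
    return MODEL_PROFILES[profile][name]


print(setting("chat"), setting("image_quality"))
```

Cells would then call `completion(model=setting("chat"), ...)` or `image_generation(..., quality=setting("image_quality"))`, and every call follows the active profile.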

---

## 6. Exercises

### Exercise 1: Compare Models on the Same Prompt

Send the same prompt to both OpenAI and Gemini using LiteLLM, and compare their responses.

**Steps:**
1. Pick a prompt (e.g., "Explain gravity to a 5-year-old in 2 sentences")
2. Call `completion()` with `openai/gpt-4o-mini` and `gemini/gemini-2.0-flash`
3. Print both responses and compare them — which do you prefer?

In [None]:
# Exercise 1: Compare responses from two models
# Try your own prompt — replace the example below

# prompt = "Explain gravity to a 5-year-old in 2 sentences."
# models = ["openai/gpt-4o-mini", "gemini/gemini-2.0-flash"]

# for model in models:
#     response = completion(model=model, messages=[{"role": "user", "content": prompt}])
#     print(f"--- {model} ---")
#     print(response.choices[0].message.content)
#     print()

### Exercise 2: Generate an Image

Use `image_generation()` from Section 3 to create an image of your choice.

**Steps:**
1. Write a descriptive prompt (be specific -- colors, style, scene)
2. Call `image_generation()` with your prompt
3. Decode and display the result

In [None]:
# Exercise 2: Generate an image with your own prompt
# Try describing a scene, place, or object in detail

# response = image_generation(
#     model="openai/gpt-image-1-mini",
#     prompt="your prompt here",
#     size="1024x1024",
#     quality="low"
# )
# image = Image.open(BytesIO(base64.b64decode(response.data[0].b64_json)))
# display(image)

---

## Key Takeaways

1. **LiteLLM provides a unified interface** -- `completion()` for chat, `image_generation()` for images, and `speech()` for TTS, all from one library that supports 100+ providers and returns OpenAI-format responses

2. **Generate images from text** -- call `image_generation()` with a model and descriptive prompt

3. **Create natural audio** -- call `speech()` to convert any text to speech

4. **Always be cost-aware** -- know the price of each API call before running it

### What's Next?

You've completed Unit 2! In Unit 3, we'll explore:
- Open-source models on Hugging Face
- Model fine-tuning basics
- Evaluation and benchmarking

---

## Additional Resources

- [LiteLLM Documentation](https://docs.litellm.ai/)
- [Image Generation Guide](https://platform.openai.com/docs/guides/image-generation)
- [Text-to-Speech Guide](https://platform.openai.com/docs/guides/text-to-speech)

---

**Course Information:**
- **Institution:** CV Raman Global University, Bhubaneswar
- **Program:** AI Center of Excellence
- **Course:** AI Systems Engineering 1
- **Developed by:** [Poorit Technologies](https://poorit.in) - *Transform Graduates into Industry-Ready Professionals*

---