# Getting Started with LiteLLM on MaaS Platform

This notebook demonstrates how to call the LLMs available on your MaaS platform through the LiteLLM proxy, using the OpenAI Python client.

## Prerequisites
- Your notebook is connected to the MaaS platform
- You have an API key from the MaaS portal
- The LiteLLM proxy is reachable at the configured endpoint

## 1. Setup and Configuration

In [None]:
import os
from openai import OpenAI

# OPENAI_API_BASE is pre-configured to point to the LiteLLM proxy,
# so you only need to supply your API key from the MaaS portal
print(f"LiteLLM Endpoint: {os.environ.get('OPENAI_API_BASE', 'Not configured')}")

# Set your API key here (get it from the MaaS portal -> API Keys)
# os.environ['OPENAI_API_KEY'] = 'sk-your-api-key-here'

# Initialize the OpenAI client with LiteLLM endpoint
client = OpenAI(
    base_url=os.environ.get('OPENAI_API_BASE', 'http://litellm.default.svc.cluster.local:4000'),
    api_key=os.environ.get('OPENAI_API_KEY', 'your-api-key')
)
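
The client above falls back to a placeholder key when `OPENAI_API_KEY` is unset, so a missing key only shows up later as an authentication error. Below is a minimal sketch of a stricter lookup, assuming the same two environment variables. The helper name `resolve_settings` is hypothetical and is not part of LiteLLM or the OpenAI SDK.

```python
import os

# Same fallback endpoint as the setup cell
DEFAULT_BASE_URL = "http://litellm.default.svc.cluster.local:4000"

def resolve_settings(env=None):
    """Return (base_url, api_key), failing early if the key is missing."""
    env = os.environ if env is None else env
    base_url = env.get("OPENAI_API_BASE", DEFAULT_BASE_URL)
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; copy a key from the MaaS portal (API Keys)."
        )
    return base_url, api_key
```

With a key set, `base_url, api_key = resolve_settings()` yields values that can be passed straight to `OpenAI(base_url=base_url, api_key=api_key)`.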

## 2. List Available Models

In [None]:
# List all available models from LiteLLM
try:
    models = client.models.list()
    print("Available Models:")
    print("-" * 50)
    for model in models.data:
        print(f"  - {model.id}")
except Exception as e:
    print(f"Error listing models: {e}")
    print("Make sure your API key is set correctly!")
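
The cells that follow default to `Qwen/Qwen2.5-7B-Instruct`, which may not be served on every deployment. The sketch below picks a model from the listed IDs instead of hard-coding one. `pick_model` is a hypothetical helper, and falling back to the first listed model is an assumption about what you want, not platform behavior.

```python
def pick_model(available_ids, preferred="Qwen/Qwen2.5-7B-Instruct"):
    """Return the preferred model if it is served, else the first available one."""
    ids = list(available_ids)
    if not ids:
        raise ValueError("The proxy returned no models; check your key and subscriptions.")
    return preferred if preferred in ids else ids[0]
```

For example, `pick_model(m.id for m in client.models.list().data)` returns a model ID you can pass as the `model` argument in the later cells.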

## 3. Basic Chat Completion

In [None]:
# Simple chat completion example
def chat(message, model="Qwen/Qwen2.5-7B-Instruct"):
    """Send a message to the LLM and get a response."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": message}
        ],
        max_tokens=500,
        temperature=0.7
    )
    return response.choices[0].message.content

# Test with a simple question
response = chat("What is platform engineering? Explain in 2-3 sentences.")
print(response)
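
Because `max_tokens=500` caps the reply, a long answer can be cut off mid-sentence. In the OpenAI-compatible response format, a choice that hit the token limit should report `finish_reason == "length"`. Here is a minimal sketch that surfaces this. `extract_text` is a hypothetical helper, and the `SimpleNamespace` object only imitates the shape of a real response so the cell runs without a network call.

```python
from types import SimpleNamespace

def extract_text(response):
    """Return (text, truncated) from a chat completion response."""
    choice = response.choices[0]
    text = choice.message.content or ""
    truncated = choice.finish_reason == "length"  # token limit reached
    return text, truncated

# Stand-in object shaped like a chat completion (illustration only)
fake_response = SimpleNamespace(choices=[SimpleNamespace(
    message=SimpleNamespace(content="Platform engineering is"),
    finish_reason="length",
)])

text, truncated = extract_text(fake_response)
if truncated:
    print("Reply was cut off; consider raising max_tokens.")
```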

## 4. Streaming Responses

In [None]:
# Streaming chat completion for real-time output
def chat_stream(message, model="Qwen/Qwen2.5-7B-Instruct"):
    """Send a message and stream the response."""
    stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": message}
        ],
        max_tokens=500,
        temperature=0.7,
        stream=True
    )
    
    for chunk in stream:
        # Some chunks carry no choices (e.g. a usage-only chunk); skip them
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()  # New line at the end

# Test streaming
print("Streaming response:")
chat_stream("Write a haiku about Kubernetes.")
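
`chat_stream` prints the pieces as they arrive but does not keep them. The sketch below assumes you also want the full text afterwards, for logging or to append to a conversation history. `collect_stream` and `_fake_chunk` are hypothetical names, and the fake chunks stand in for what `client.chat.completions.create(..., stream=True)` yields.

```python
from types import SimpleNamespace

def collect_stream(stream, echo=True):
    """Print streamed pieces as they arrive and return the full text."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # content can be None or empty, e.g. a role-only chunk
            parts.append(piece)
            if echo:
                print(piece, end="", flush=True)
    if echo:
        print()
    return "".join(parts)

def _fake_chunk(text):
    """Build a stand-in object shaped like a streaming chunk (illustration only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

demo_stream = [
    _fake_chunk("Pods "),
    _fake_chunk(None),
    SimpleNamespace(choices=[]),
    _fake_chunk("drift by"),
]
full_text = collect_stream(demo_stream, echo=False)
```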

## 5. Multi-turn Conversation

In [None]:
# Multi-turn conversation with context
class Conversation:
    """Keep a running message history so every request includes the prior turns."""
    def __init__(self, model="Qwen/Qwen2.5-7B-Instruct", system_prompt=None):
        self.model = model
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})
    
    def chat(self, message):
        """Append the user message, call the model, and record its reply."""
        self.messages.append({"role": "user", "content": message})
        
        response = client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            max_tokens=500,
            temperature=0.7
        )
        
        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        
        return assistant_message

# Create a conversation with a system prompt
conv = Conversation(system_prompt="You are a helpful DevOps assistant specializing in Kubernetes and cloud infrastructure.")

# First message
print("User: What is a Pod in Kubernetes?")
print(f"Assistant: {conv.chat('What is a Pod in Kubernetes?')}")
print()

# Follow-up question (context is maintained)
print("User: How is it different from a Deployment?")
print(f"Assistant: {conv.chat('How is it different from a Deployment?')}")
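
`Conversation` resends the whole history on every call, so the prompt grows with each turn and can eventually exceed the model's context window. Below is a rough sketch of one way to bound it. `trim_history` is a hypothetical helper, and it counts messages rather than tokens, which is a deliberate simplification.

```python
def trim_history(messages, max_messages=20):
    """Keep a leading system message plus the most recent `max_messages` others."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    # Preserve the system prompt if it is the first message
    system = [m for m in messages[:1] if m["role"] == "system"]
    recent = messages[len(system):][-max_messages:]
    # Drop leading assistant replies so the kept history starts with a user turn
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent
```

One way to use it is to pass `trim_history(self.messages)` as the `messages` argument inside `Conversation.chat`, leaving the stored history itself untouched.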

## 6. Code Generation

In [None]:
# Use the LLM to generate code
code_prompt = """
Write a Python function that:
1. Takes a list of numbers as input
2. Returns a dictionary with statistics: min, max, mean, and standard deviation
3. Include type hints and a docstring

Only output the code, no explanations.
"""

generated_code = chat(code_prompt)
print(generated_code)
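
Even when asked to output only code, models often wrap the answer in a Markdown fence. The sketch below strips that wrapper before you paste or run the result. `extract_code` is a hypothetical helper, and it only looks at the first fenced block in the reply.

```python
import re

# Built programmatically so the pattern contains no literal triple backtick
FENCE = "`" * 3
_FENCED_BLOCK = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)

def extract_code(reply):
    """Return the body of the first fenced block, or the whole reply if unfenced."""
    match = _FENCED_BLOCK.search(reply)
    return (match.group(1) if match else reply).strip()

sample_reply = "Here you go:\n" + FENCE + "python\nprint(1)\n" + FENCE + "\nDone."
code_only = extract_code(sample_reply)
```

Applied to the cell above, `print(extract_code(generated_code))` would show just the code. Review generated code before running it.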

In [None]:
# Paste the generated code here to test it.
# The function below is a hand-written example of what the model might return:
from typing import List, Dict
import statistics

def calculate_statistics(numbers: List[float]) -> Dict[str, float]:
    """Calculate basic statistics for a list of numbers."""
    return {
        "min": min(numbers),
        "max": max(numbers),
        "mean": statistics.mean(numbers),
        # statistics.stdev is the sample standard deviation and needs at least 2 values
        "std_dev": statistics.stdev(numbers) if len(numbers) > 1 else 0.0
    }

# Test it
test_numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(calculate_statistics(test_numbers))

## Next Steps

Now that you've learned the basics, you can:

1. **Use Jupyter AI Chat**: Click the chat icon in the sidebar to have a conversation with the AI
2. **Explore LangChain**: See the `02-langchain-workflows.ipynb` notebook for advanced workflows
3. **Build Applications**: Use these patterns to build your own AI-powered applications

### Tips
- Get your API key from the MaaS portal (API Keys tab)
- Check available models on the Models tab
- Monitor your usage on the Subscriptions tab