# Local LLM SDK - Hello World

## Overview
This notebook demonstrates the simplicity and power of the `local_llm_sdk` package for interacting with LM Studio and other OpenAI-compatible local LLM servers.

## Prerequisites
- LM Studio (or compatible server) running locally with API enabled
- The `local_llm_sdk` package installed (run `pip install -e ..` from the `notebooks` directory)

## What You'll Learn
- How to create a client with one line of code
- Simple chat interactions
- Advanced features with type safety
- Conversation history management
- Embeddings generation

In [None]:
# If the package is not installed, uncomment the line below and run this cell first
# !pip install -e ..

## 1. Import the SDK

In [9]:
# Import the LocalLLMClient from our SDK
from local_llm_sdk import LocalLLMClient, create_client

# That's it! We're ready to go

## 2. Create the Client

One line to create a fully-featured, type-safe client!

In [10]:
# Create a client with your LM Studio URL (replace the address below with your own server's)
client = LocalLLMClient(
    base_url="http://169.254.83.107:1234/v1",
    model="mistralai/magistral-small-2509"  # Default model to use
)

print(f"✅ Client ready: {client}")

✅ Client ready: LocalLLMClient(base_url='http://169.254.83.107:1234/v1', model='mistralai/magistral-small-2509', tools=0)


## 3. List Available Models

The client handles all the HTTP details for you.

In [12]:
# List models with one method call - returns typed ModelList object
models = client.list_models()

print("📦 Available models in local server:")
print("-" * 40)
for model in models.data:
    print(f"• {model.id} (by {model.owned_by})")
    
print(f"\n✨ Total models: {len(models.data)}")

📦 Available models in local server:
----------------------------------------
• mistralai/magistral-small-2509 (by organization_owner)
• qwen/qwen3-coder-30b (by organization_owner)
• text-embedding-nomic-embed-text-v1.5 (by organization_owner)
• smolvlm2-2.2b-instruct (by organization_owner)
• google/gemma-3-27b (by organization_owner)
• text-embedding-mxbai-embed-large-v1 (by organization_owner)

✨ Total models: 6
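
For context, OpenAI-compatible servers generally return the model list as a JSON object with a `data` array. The sketch below shows how such a payload might be mapped onto typed objects. The `ModelInfo` and `ModelListSketch` dataclasses and the `parse_model_list` helper are illustrative stand-ins, not the SDK's actual classes, and the sample payload simply mirrors two entries from the output above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelInfo:
    """Illustrative stand-in for one entry in the model list."""
    id: str
    owned_by: str


@dataclass
class ModelListSketch:
    """Illustrative stand-in for a typed model list (not the SDK's ModelList)."""
    data: List[ModelInfo]


def parse_model_list(payload: dict) -> ModelListSketch:
    """Convert a /v1/models-style JSON payload into typed objects."""
    entries = payload.get("data", [])
    return ModelListSketch(
        data=[
            ModelInfo(id=entry["id"], owned_by=entry.get("owned_by", "unknown"))
            for entry in entries
        ]
    )


# Sample payload shaped like an OpenAI-compatible /v1/models response
sample_payload = {
    "object": "list",
    "data": [
        {"id": "mistralai/magistral-small-2509", "object": "model", "owned_by": "organization_owner"},
        {"id": "qwen/qwen3-coder-30b", "object": "model", "owned_by": "organization_owner"},
    ],
}

parsed = parse_model_list(sample_payload)
for entry in parsed.data:
    print(f"{entry.id} (by {entry.owned_by})")
```

Attribute access such as `entry.id` fails loudly on a typo, whereas a misspelled dictionary key only surfaces at lookup time deep in the code. That is the practical benefit of a typed response object.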


## 4. Simple Chat - The Easy Way

Just pass a string and get a response. The client handles everything!

In [14]:
# Simple chat - just pass a string!
response = client.chat("Reply to my hello world in a funny way!")

print("🤖 Model Response:")
print("-" * 40)
print(response)

🤖 Model Response:
----------------------------------------
Hello, World! It seems like we're in the middle of a code that's about to output some serious laughter! 😄✨


## 5. Advanced Chat with Full Response Object

When you need more control and information, use messages and get the full ChatCompletion object.

In [15]:
# Import helper for creating messages
from local_llm_sdk import create_chat_message

# Create a conversation with proper messages
messages = [
    create_chat_message("system", "You are a helpful and creative assistant."),
    create_chat_message("user", "Write a haiku about Python programming.")
]

# Get full ChatCompletion object with all metadata
full_response = client.chat(messages, temperature=0.7)

# Access typed response data
print("📝 Haiku:")
print("-" * 40)
print(full_response.choices[0].message.content)
print("\n📊 Metadata:")
print(f"• Model: {full_response.model}")
print(f"• Tokens used: {full_response.usage.total_tokens}")
print(f"• Finish reason: {full_response.choices[0].finish_reason}")

📝 Haiku:
----------------------------------------


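AttributeError: 'str' object has no attribute 'choices'

**Note:** In this run, `client.chat()` returned a plain string even though a list of messages was passed, so accessing `full_response.choices` raised the error above. The metadata fields (`model`, `usage`, `finish_reason`) are only available when the call returns a full ChatCompletion object; check the SDK documentation for how to request one.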
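
One way to guard against this kind of mismatch is to normalize the return value before reading it. The following is a minimal sketch, assuming the call may return either a plain string or an object shaped like an OpenAI ChatCompletion. The `extract_text` helper and the stand-in objects are hypothetical and are not part of `local_llm_sdk`.

```python
from types import SimpleNamespace


def extract_text(response):
    """Return the reply text from a plain string or a ChatCompletion-like object."""
    if isinstance(response, str):
        return response
    return response.choices[0].message.content


# Stand-ins for the two possible return shapes (made-up data, for illustration only)
as_string = "Indentation speaks"
as_object = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Indentation speaks"))]
)

print(extract_text(as_string))
print(extract_text(as_object))
```

In the notebook, `extract_text(full_response)` would print the haiku in either case, while the metadata lines would still need a full response object.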

## 6. Conversation with History

The client can manage conversation context for you!

In [None]:
# Initialize conversation history
history = []

# First message
response1, history = client.chat_with_history("What's the capital of France?", history)
print("Q1: What's the capital of France?")
print(f"A1: {response1}\n")

# Follow-up that uses context
response2, history = client.chat_with_history("What's its population?", history)
print("Q2: What's its population?")
print(f"A2: {response2}\n")

# Another follow-up
response3, history = client.chat_with_history("Name 3 famous landmarks there", history)
print("Q3: Name 3 famous landmarks there")
print(f"A3: {response3}\n")

print(f"📚 Conversation history now has {len(history)} messages")
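
To make the idea of context management concrete, here is a minimal sketch of how a history-carrying chat helper could work, assuming history is a list of OpenAI-style `{"role": ..., "content": ...}` messages. This is not the SDK's implementation. The `chat_with_history_sketch` function and the `fake_send` stand-in for a model call are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def chat_with_history_sketch(
    send: Callable[[List[Message]], str],
    prompt: str,
    history: List[Message],
) -> Tuple[str, List[Message]]:
    """Send the prompt together with prior messages; return (reply, updated history)."""
    messages = history + [{"role": "user", "content": prompt}]
    reply = send(messages)
    # Return a new list so the caller's original history is not mutated
    return reply, messages + [{"role": "assistant", "content": reply}]


def fake_send(messages: List[Message]) -> str:
    """Hypothetical stand-in for a model call: reports how many messages it received."""
    return f"(reply after seeing {len(messages)} messages)"


demo_history: List[Message] = []
reply1, demo_history = chat_with_history_sketch(fake_send, "What's the capital of France?", demo_history)
reply2, demo_history = chat_with_history_sketch(fake_send, "What's its population?", demo_history)

print(reply2)
print(f"History length: {len(demo_history)}")
```

Because every earlier user and assistant message is resent with each call, a follow-up like "What's its population?" can be resolved against the earlier question. The trade-off is that the prompt grows with each turn.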

## 7. Generate Embeddings

Create vector embeddings for semantic search and similarity.

In [None]:
# Generate embeddings for text (requires an embedding model to be loaded)
try:
    # Single text
    embeddings = client.embeddings("Hello, world!")
    
    print("📐 Embedding generated:")
    print(f"• Dimension: {len(embeddings.data[0].embedding)}")
    print(f"• Model used: {embeddings.model}")
    print(f"• First 5 values: {embeddings.data[0].embedding[:5]}")
    
    # Multiple texts at once
    texts = ["Python is great", "I love programming", "Hello world"]
    multi_embeddings = client.embeddings(texts)
    print(f"\n📐 Generated {len(multi_embeddings.data)} embeddings")
    
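except Exception as e:
    # Show the underlying error so problems other than a missing model are not hidden
    print("⚠️ Embeddings request failed; an embedding model may need to be loaded")
    print("   Load a model like 'text-embedding-nomic-embed-text-v1.5' in LM Studio")
    print(f"   Error: {e}")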
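
Embedding vectors are usually compared with cosine similarity, which is the basis for the semantic search mentioned above. The sketch below uses small made-up vectors in place of real embeddings. With actual results, the vectors would come from `embeddings.data[i].embedding`. The `cosine_similarity` helper is written here for illustration and is not part of the SDK.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as having no similarity to anything
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings
query = [1.0, 0.0, 1.0]
candidates = {
    "similar": [2.0, 0.0, 2.0],     # Same direction as the query
    "orthogonal": [0.0, 1.0, 0.0],  # No overlap with the query
}

# Rank candidates from most to least similar to the query
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)
```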

## 8. Quick Chat Without Client

For one-off queries without creating a client.

In [None]:
# Use the quick_chat function for one-off queries
from local_llm_sdk import quick_chat

response = quick_chat(
    "What's 2 + 2?", 
    base_url="http://169.254.83.107:1234/v1"
)

print(f"Quick answer: {response}")

## Summary

### What We Demonstrated

Using the `local_llm_sdk` package, we showed how easy it is to:

1. **Create a client** - One line of code
2. **List models** - Type-safe response objects
3. **Simple chat** - Just pass a string
4. **Advanced chat** - Full control with ChatCompletion objects
5. **Conversation history** - Automatic context management
6. **Embeddings** - Vector generation for semantic search
7. **Quick chat** - One-off queries without a client

### Key Benefits

- ✅ **Type Safety** - Structured responses (such as the model list and embeddings) are validated Pydantic models; simple `chat()` calls return plain strings
- ✅ **Simple Interface** - Complex operations in one method call
- ✅ **Flexible** - Simple mode for quick tasks, advanced mode for control
- ✅ **Production Ready** - Error handling, retries, and timeouts built-in

### Compare: Before vs After

**Before (Raw Requests):**
```python
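import requests

# Call the OpenAI-compatible chat completions endpoint directly
response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "model-name",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,  # Avoid hanging indefinitely if the server is unreachable
)
response.raise_for_status()  # Surface HTTP errors before parsing the body
data = response.json()
print(data["choices"][0]["message"]["content"])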
```

**After (LocalLLMClient):**
```python
from local_llm_sdk import LocalLLMClient
client = LocalLLMClient()
print(client.chat("Hello"))
```

### Next Steps

- Explore the `tool-use-simplified.ipynb` notebook to see tool/function calling
- Check the package documentation for more advanced features
- Try different models and compare their capabilities
- Build your own applications with the SDK!

### 🚀 Ready for Production

The `local_llm_sdk` is designed to be a production-ready SDK for local LLM interactions, providing the same developer experience as cloud APIs but with the privacy and control of local deployment.
