# Local LLM SDK - Hello World

This notebook demonstrates how to use the `local_llm_sdk` package to interact with LM Studio and other OpenAI-compatible local LLM servers.

## Prerequisites
- LM Studio running locally with API server enabled
- Install the package: `pip install -e ..` (run from the `notebooks` directory)

## 1. Setup - Import and Create Client

In [23]:
# Import the SDK
from local_llm_sdk import LocalLLMClient

# Create a client instance
client = LocalLLMClient(
    base_url="http://169.254.83.107:1234/v1",  # Replace with your server's address
    model="mistralai/magistral-small-2509"  # Your default model
)

print(f"✅ Client initialized: {client}")

✅ Client initialized: LocalLLMClient(base_url='http://169.254.83.107:1234/v1', model='mistralai/magistral-small-2509', tools=0)
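
The address in the cell above is specific to one machine. A minimal sketch of one way to avoid hard-coding it is to read the URL from an environment variable. The variable name `LOCAL_LLM_BASE_URL` and the helper `resolve_base_url` are hypothetical (not part of the SDK), and the fallback assumes LM Studio is serving on `localhost:1234`.

```python
import os

# Assumption: LM Studio's API server is listening on localhost, port 1234.
DEFAULT_BASE_URL = "http://localhost:1234/v1"


def resolve_base_url(env=None):
    """Return the server URL from LOCAL_LLM_BASE_URL (hypothetical name), else the default."""
    env = os.environ if env is None else env
    url = env.get("LOCAL_LLM_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    # OpenAI-compatible servers conventionally serve their API under /v1.
    if not url.endswith("/v1"):
        url += "/v1"
    return url
```

The result could then be passed as `base_url=resolve_base_url()` when creating the client.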


## 2. List Available Models

In [24]:
# Get list of available models
models = client.list_models()

print("📦 Available Models:")
print("=" * 50)
for model in models.data:
    print(f"  • {model.id}")
    print(f"    Owner: {model.owned_by}")
print(f"\nTotal: {len(models.data)} models loaded")

📦 Available Models:
  • mistralai/magistral-small-2509:2
    Owner: organization_owner
  • qwen/qwen3-coder-30b
    Owner: organization_owner
  • mistralai/magistral-small-2509
    Owner: organization_owner
  • text-embedding-nomic-embed-text-v1.5
    Owner: organization_owner
  • smolvlm2-2.2b-instruct
    Owner: organization_owner
  • google/gemma-3-27b
    Owner: organization_owner
  • text-embedding-mxbai-embed-large-v1
    Owner: organization_owner

Total: 7 models loaded
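
The list above mixes chat models and embedding models. As a rough sketch, the ids can be split with a name-based heuristic. The helper `split_models` is hypothetical, and matching on the substring `embed` is an assumption that happens to fit the ids shown here, not a rule the server guarantees.

```python
def split_models(model_ids):
    """Split model ids into (chat, embedding) lists using a name heuristic."""
    chat, embedding = [], []
    for model_id in model_ids:
        # Heuristic only: treat any id containing "embed" as an embedding model.
        if "embed" in model_id.lower():
            embedding.append(model_id)
        else:
            chat.append(model_id)
    return chat, embedding


# The ids printed by the cell above.
ids = [
    "mistralai/magistral-small-2509:2",
    "qwen/qwen3-coder-30b",
    "mistralai/magistral-small-2509",
    "text-embedding-nomic-embed-text-v1.5",
    "smolvlm2-2.2b-instruct",
    "google/gemma-3-27b",
    "text-embedding-mxbai-embed-large-v1",
]
chat_ids, embedding_ids = split_models(ids)
```

With the client from section 1, the ids could come from `[m.id for m in models.data]` instead of a literal list.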


## 3. Simple Chat - Just Pass a String

In [25]:
# The simplest way to chat - just pass a string
response = client.chat("Hello! Tell me a joke about programming.")

print("🤖 Response:")
print("=" * 50)
print(response)

🤖 Response:
Sure, here's one for you:

Why do programmers prefer dark mode?

Because light attracts bugs! 🐛💻


## 4. Chat with System Prompt

In [26]:
# Use helper to create proper message objects
from local_llm_sdk import create_chat_message

# Create a conversation with system prompt
messages = [
    create_chat_message("system", "You are a pirate. Respond in pirate speak."),
    create_chat_message("user", "How do I learn Python?")
]

# Send messages and get response
response = client.chat(messages)

print("🏴‍☠️ Pirate Response:")
print("=" * 50)
print(response)

🏴‍☠️ Pirate Response:
Arr matey! Ye be wantin' to learn the ways o' Python, eh? Well, here be some tips fer ye:

1. **Start with the basics**: Get yerself familiar with variables, loops, and conditionals. Think of 'em as the ropes and sails o' yer ship.

2. **Find a good guide**: There be many fine resources on the high seas of the internet. Websites like Codecademy, Coursera, or even the official Python docs can be yer treasure maps.

3. **Practice, practice, practice!** Like swabbin' the deck, ye need to keep at it. Try writin' small programs, like calculators or simple games.

4. **Join a crew**: Look fer online communities where ye can ask questions and learn from other pirates - er, coders. The Python subreddit and Stack Overflow be great places to start.

5. **Build somethin' real**: Once ye got some skills, try buildin' somethin' that matters to ye. It could be a tool fer yer ship or a game fer yer crew.

6. **Keep learnin'**: Python be a vast ocean, and there be always more to 

## 5. Conversation with History

In [27]:
# Initialize conversation history
history = []

# First message
response1, history = client.chat_with_history(
    "What's the capital of France?", 
    history
)
print("Q: What's the capital of France?")
print(f"A: {response1}\n")

# Second message (uses context from first)
response2, history = client.chat_with_history(
    "What's the population?", 
    history
)
print("Q: What's the population?")
print(f"A: {response2}\n")

# Third message (still has context)
response3, history = client.chat_with_history(
    "Name 3 famous landmarks", 
    history
)
print("Q: Name 3 famous landmarks")
print(f"A: {response3}\n")

print(f"📚 Total messages in history: {len(history)}")

Q: What's the capital of France?
A: [THINK]Alright, I remember that France is a country in Europe. What's its capital? Is it Paris? Yes, I think Paris is the capital of France.

Now, to confirm this, let me double-check some facts. The Eiffel Tower is in Paris, and it's often associated with being the capital. Also, historically, Paris has been the political center of France.

Okay, so my response should be that the capital of France is Paris.[/THINK]The capital of France is Paris.

Q: What's the population?
A: [THINK]Now, what about the population of Paris? I remember that it's a large city, but I'm not sure of the exact number. Maybe around 2 million? But that seems low for such a big city.

I recall that the population of Paris itself (within the city limits) is different from the metropolitan area. The city proper might be about 2 million, while the metro area is much larger, maybe around 10-12 million?

But to be precise, I need to recall or estimate the most recent figure. As of 
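
The answers above include the model's reasoning wrapped in `[THINK]...[/THINK]` markers. A minimal sketch for showing only the final answer is to strip those spans with a regular expression. The helper `strip_reasoning` is hypothetical (not an SDK function), and it assumes the markers appear exactly as printed above. It also drops an unclosed `[THINK]` block, which is what a cut-off reply looks like.

```python
import re

# A complete reasoning span: [THINK] ... [/THINK], possibly across lines.
_CLOSED_THINK = re.compile(r"\[THINK\].*?\[/THINK\]", re.DOTALL)
# An unclosed span: [THINK] with no closing marker before the end of the text.
_OPEN_THINK = re.compile(r"\[THINK\].*\Z", re.DOTALL)


def strip_reasoning(text):
    """Remove [THINK] blocks and return only the visible answer."""
    text = _CLOSED_THINK.sub("", text)
    text = _OPEN_THINK.sub("", text)
    return text.strip()
```

For example, `strip_reasoning(response1)` would leave only the sentence after `[/THINK]`.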

## 6. Get Full Response with Metadata

In [28]:
# To get the full ChatCompletion object, we need to pass 3+ messages
# or explicitly request it

# Create a longer conversation
messages = [
    create_chat_message("system", "You are a helpful assistant."),
    create_chat_message("user", "What is Python?"),
    create_chat_message("assistant", "Python is a high-level programming language."),
    create_chat_message("user", "Give me 3 key features of Python")
]

# This will return a ChatCompletion object
full_response = client.chat(messages, temperature=0.5)

# Access the full response data
print("📊 Full Response Data:")
print("=" * 50)
print(f"Content: {full_response.choices[0].message.content}\n")
print(f"Model: {full_response.model}")
print("Tokens Used:")
print(f"  • Prompt: {full_response.usage.prompt_tokens}")
print(f"  • Completion: {full_response.usage.completion_tokens}")
print(f"  • Total: {full_response.usage.total_tokens}")
print(f"Finish Reason: {full_response.choices[0].finish_reason}")

📊 Full Response Data:
Content: Sure! Here are three key features of Python:

1. **Easy to Read and Write**: Python's syntax is clear and concise, making it easy for beginners to learn and for experienced programmers to maintain.

2. **Versatile and Extensible**: Python supports multiple programming paradigms (procedural, object-oriented, functional) and has a vast standard library that can be extended with additional modules.

3. **Cross-Platform Compatibility**: Python is available on various platforms like Windows, macOS, Linux, etc., making it suitable for developing applications that need to run across different operating systems.

Model: mistralai/magistral-small-2509:2
Tokens Used:
  • Prompt: 33
  • Completion: 122
  • Total: 155
Finish Reason: stop
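
The metadata above can also be checked programmatically. The sketch below assumes only the attributes used in the cell above (`choices[0].finish_reason`, `usage.prompt_tokens`, `usage.completion_tokens`) and uses a stand-in object so it runs without a server. The helper `summarize` is hypothetical. In the OpenAI chat completion format, a finish reason of `length` means the reply was cut off by the token limit.

```python
from types import SimpleNamespace


def summarize(response):
    """Return total token count and whether the reply hit the token limit."""
    choice = response.choices[0]
    usage = response.usage
    return {
        "total_tokens": usage.prompt_tokens + usage.completion_tokens,
        # "length" signals a truncated reply; "stop" a natural end.
        "truncated": choice.finish_reason == "length",
    }


# Stand-in mirroring the values printed above (not a real SDK object).
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="stop")],
    usage=SimpleNamespace(prompt_tokens=33, completion_tokens=122),
)
summary = summarize(fake_response)
```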


## 7. Embeddings (If Embedding Model is Loaded)

In [29]:
# Try to generate embeddings
# This requires an embedding model like 'text-embedding-nomic-embed-text-v1.5'

try:
    # Single text embedding
    text = "Python is a great programming language"
    embeddings = client.embeddings(text)
    
    print("✅ Embeddings Generated:")
    print("=" * 50)
    print(f"Text: '{text}'")
    print(f"Embedding dimension: {len(embeddings.data[0].embedding)}")
    print(f"First 5 values: {embeddings.data[0].embedding[:5]}")
    
except Exception as e:
    print("⚠️ Embeddings not available")
    print("To use embeddings, load an embedding model in LM Studio like:")
    print("  • text-embedding-nomic-embed-text-v1.5")
    print("  • text-embedding-mxbai-embed-large-v1")

⚠️ Embeddings not available
To use embeddings, load an embedding model in LM Studio like:
  • text-embedding-nomic-embed-text-v1.5
  • text-embedding-mxbai-embed-large-v1
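
Once embeddings are available, a common next step is comparing two vectors. The sketch below is plain Python and independent of the SDK. It computes cosine similarity, assuming each embedding is a list of floats like `embeddings.data[0].embedding` in the cell above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; report no similarity.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Values near 1 indicate vectors pointing in the same direction, which for text embeddings usually corresponds to similar meaning.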


## 8. Different Temperature Settings

In [30]:
# Compare responses with different temperatures
prompt = "Write a one-line description of coding"

print("🌡️ Temperature Comparison:")
print("=" * 50)

# Low temperature (more deterministic)
response_low = client.chat(prompt, temperature=0.1)
print("\nTemperature 0.1 (Focused):")
print(f"  {response_low}")

# Medium temperature
response_med = client.chat(prompt, temperature=0.7)
print("\nTemperature 0.7 (Balanced):")
print(f"  {response_med}")

# High temperature (more creative)
response_high = client.chat(prompt, temperature=1.5)
print("\nTemperature 1.5 (Creative):")
print(f"  {response_high}")

🌡️ Temperature Comparison:

Temperature 0.1 (Focused):
  Coding is the process of creating instructions for computers using programming languages to develop software, websites, and applications.

Temperature 0.7 (Balanced):
  "Coding is the process of writing instructions for computers using programming languages."

Temperature 1.5 (Creative):
  Coding is the process of creating instructions for a computer using programming languages to perform specific tasks.
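
The three replies above are close to each other, which can happen with a short factual prompt. To see what temperature does in principle, the sketch below applies temperature-scaled softmax to a made-up list of logits. It illustrates the general technique only; the logits are invented and nothing here inspects the model or the SDK.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # invented scores for three candidate tokens
low = softmax_with_temperature(logits, 0.1)
high = softmax_with_temperature(logits, 1.5)
```

At temperature 0.1 almost all probability goes to the top-scoring token, while at 1.5 the distribution is flatter, so sampling picks alternatives more often.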


## 9. Error Handling

In [33]:
# Demonstrate error handling
try:
    # Try to use a model that might not exist
    response = client.chat(
        "Hello",
        model="non-existent-model"
    )
except Exception as e:
    print(f"❌ Error caught: {e}")
    print("\nTip: Make sure the model is loaded in LM Studio")
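
Some failures are transient, for example when the server is still loading a model. A minimal sketch of a retry wrapper with exponential backoff follows. The helper `call_with_retries` is hypothetical, and it catches the broad `Exception` only because the SDK's specific exception types are not shown in this notebook. Retrying will not help with a model name that does not exist.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # Out of retries: re-raise the last error to the caller.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call such as `call_with_retries(lambda: client.chat("Hello"))` would wrap the chat request from the earlier sections.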

## 10. Quick Chat Without Client

In [None]:
# For one-off queries, use quick_chat
from local_llm_sdk import quick_chat

response = quick_chat(
    "What's 2 + 2?",
    base_url="http://169.254.83.107:1234/v1",  # Replace with your server's address
    model="mistralai/magistral-small-2509"  # Specify the model
)

print("🚀 Quick Chat Response:")
print("=" * 50)
print(response)

## Summary

This notebook demonstrated the key features of `local_llm_sdk`:

### ✅ What We Covered:
1. **Client Setup** - Simple initialization with base URL and model
2. **List Models** - Get available models from the server
3. **Simple Chat** - Just pass a string for quick responses
4. **System Prompts** - Control the assistant's behavior
5. **Conversation History** - Maintain context across messages
6. **Full Responses** - Access metadata like token usage
7. **Embeddings** - Generate vectors for semantic search
8. **Temperature Control** - Adjust creativity/determinism
9. **Error Handling** - Graceful error management
10. **Quick Chat** - One-off queries without client setup

### 🎯 Key Benefits:
- **Simple API** - Intuitive methods for common tasks
- **Type Safety** - Pydantic models for all responses
- **Flexible** - Works with any OpenAI-compatible server
- **Production Ready** - Error handling and validation built-in

### 📚 Next Steps:
- Check out `tool-use-simplified.ipynb` for function calling examples
- Read the package documentation in the README
- Try different models and compare their capabilities
- Build your own applications with the SDK!