# LM Studio API Testing Notebook

## Overview
This notebook demonstrates how to interact with LM Studio's OpenAI-compatible API using Python and Pydantic models for type safety.

## Prerequisites
- LM Studio running locally with API server enabled
- Python with `requests` and `pydantic` installed
- The `api_models.py` file in the same directory

## Configuration
- **LM Studio URL**: `http://169.254.83.107:1234/v1`
- **Test Model**: `mistralai/magistral-small-2509`

## 1. Import Required Libraries

Import the necessary libraries and Pydantic models from our custom `api_models.py` file.

In [108]:
# Third-party HTTP client (not part of the standard library)
import requests

# Import Pydantic models from our custom module
from api_models import (
    ModelList,           # Response model for /v1/models endpoint
    ModelInfo,           # Individual model information
    ChatCompletion,      # Response model for chat completions
    ChatCompletionRequest,  # Request model for chat completions
    ChatMessage,         # Individual message in a conversation
    create_chat_message,  # Helper to create messages
    create_chat_completion_request  # Helper to create requests
)

## 2. Configure API Connection

Set up the base URL for your LM Studio instance. This should match the URL shown in LM Studio's server settings.

In [109]:
# Base URL for LM Studio API (OpenAI-compatible format)
# Change this to match your LM Studio server address
base_url = "http://169.254.83.107:1234/v1"
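Hard-coding the address works on one machine, but the link-local IP used here will differ between setups. Below is a minimal sketch of reading it from an environment variable instead. The variable name `LMSTUDIO_BASE_URL` and the helper `resolve_base_url` are hypothetical; neither comes from LM Studio or `api_models.py`.

```python
import os

DEFAULT_BASE_URL = "http://169.254.83.107:1234/v1"  # value used in this notebook


def resolve_base_url(env=None):
    """Return the API base URL, preferring the (hypothetical) LMSTUDIO_BASE_URL variable."""
    env = os.environ if env is None else env
    url = env.get("LMSTUDIO_BASE_URL", DEFAULT_BASE_URL)
    # Strip trailing slashes so f"{base_url}/models" never yields a double slash
    return url.rstrip("/")


base_url = resolve_base_url()
```

Accepting the mapping as a parameter keeps the helper easy to exercise without touching the real environment.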

## 3. Define API Endpoints

LM Studio implements OpenAI-compatible endpoints. Here we define the four main ones; this notebook only calls `/models` and `/chat/completions`.

In [110]:
# Define OpenAI-compatible endpoints
endpoint_models = f"{base_url}/models"              # List available models
endpoint_chat = f"{base_url}/chat/completions"      # Chat completions (main endpoint)
endpoint_completions = f"{base_url}/completions"    # Legacy completions
endpoint_embeddings = f"{base_url}/embeddings"      # Generate embeddings

## 4. List Available Models

Query the API to see which models are currently loaded in LM Studio. This is useful to verify connectivity and see available model IDs.

In [111]:
# Make GET request to list models endpoint
resp = requests.get(endpoint_models, timeout=5)

# Raise exception if request failed
resp.raise_for_status()

# Parse response using Pydantic model for type safety
models = ModelList.model_validate(resp.json())

# Display available models
print("Available models in LM Studio:")
print("-" * 40)
for model in models.data:
    print(f"• {model.id}")
    
print(f"\nTotal models loaded: {len(models.data)}")

Available models in LM Studio:
----------------------------------------
• mistralai/magistral-small-2509
• qwen/qwen3-coder-30b
• text-embedding-nomic-embed-text-v1.5
• smolvlm2-2.2b-instruct
• google/gemma-3-27b
• text-embedding-mxbai-embed-large-v1

Total models loaded: 6
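
The list above mixes chat models with embedding models, and the embedding models are meant for `/embeddings` rather than `/chat/completions`. As a rough sketch, the IDs can be split by the `text-embedding-` prefix seen in this output. That prefix is a naming convention observed here, not a documented guarantee, and `split_model_ids` is a hypothetical helper.

```python
EMBEDDING_PREFIX = "text-embedding-"  # convention seen in the output above (assumption)


def split_model_ids(model_ids):
    """Partition model IDs into (chat_candidates, embedding_models) by ID prefix."""
    chat, embedding = [], []
    for model_id in model_ids:
        target = embedding if model_id.startswith(EMBEDDING_PREFIX) else chat
        target.append(model_id)
    return chat, embedding


ids = [
    "mistralai/magistral-small-2509",
    "qwen/qwen3-coder-30b",
    "text-embedding-nomic-embed-text-v1.5",
    "smolvlm2-2.2b-instruct",
    "google/gemma-3-27b",
    "text-embedding-mxbai-embed-large-v1",
]
chat_ids, embedding_ids = split_model_ids(ids)
print(len(chat_ids), len(embedding_ids))  # prints: 4 2
```

In the notebook, `[m.id for m in models.data]` would supply the IDs.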


## 5. Send a Chat Completion Request

Now let's send a chat completion request to the model. We'll use helper functions to create properly formatted messages and requests.

In [112]:
# Step 1: Create the conversation messages
messages = [
    create_chat_message("system", "You are a helpful assistant."),
    create_chat_message("user", "Reply to my hello world in a funny way.")
]

# Step 2: Create a properly formatted request using Pydantic model
chat_request = create_chat_completion_request(
    model="mistralai/magistral-small-2509",  # Use model ID from the list above
    messages=messages,
    temperature=0.7,  # Controls randomness (0=deterministic, 2=very random)
    stream=False      # Set True for streaming responses
)

# Step 3: Send POST request to chat completions endpoint
resp = requests.post(
    endpoint_chat,
    json=chat_request.model_dump(exclude_none=True),  # Pydantic model -> dict without None fields; requests encodes it as JSON
    timeout=15  # Allow 15 seconds for response (local models can be slower)
)

# Step 4: Check if request was successful
resp.raise_for_status()

# Step 5: Parse and validate response using Pydantic model
chat_completion = ChatCompletion.model_validate(resp.json())

print("✅ Chat completion request successful!")

✅ Chat completion request successful!
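
For orientation, the request body that goes over the wire is an ordinary OpenAI-style JSON object. The sketch below builds the equivalent dictionary by hand, without Pydantic. `build_chat_payload` is a hypothetical stand-in. It assumes only the standard `model`, `messages`, `temperature`, `stream`, and `max_tokens` fields of the chat completions format, and nothing about the internals of `api_models.py`.

```python
import json


def build_chat_payload(model, messages, temperature=None, stream=None, max_tokens=None):
    """Build an OpenAI-style chat completions body, omitting unset (None) fields."""
    payload = {
        "model": model,
        "messages": [{"role": role, "content": content} for role, content in messages],
        "temperature": temperature,
        "stream": stream,
        "max_tokens": max_tokens,
    }
    # Same effect as model_dump(exclude_none=True): drop keys left as None
    return {key: value for key, value in payload.items() if value is not None}


payload = build_chat_payload(
    "mistralai/magistral-small-2509",
    [
        ("system", "You are a helpful assistant."),
        ("user", "Reply to my hello world in a funny way."),
    ],
    temperature=0.7,
    stream=False,
)
print(json.dumps(payload, indent=2))
```

Passing this dictionary as `json=payload` to `requests.post` should behave like the Pydantic route, minus the validation.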

## 6. Display the Response

Extract and display the model's response from the structured completion object.

In [113]:
# Extract the assistant's response from the completion object
# The response structure: completion -> choices[0] -> message -> content
response_text = chat_completion.choices[0].message.content

print("🤖 Model Response:")
print("-" * 40)
print(response_text)

🤖 Model Response:
----------------------------------------
Hello, world! It's me, your friendly neighborhood chatbot. Ready to assist you like a super-caffeinated squirrel ready to bury an acorn the size of a football. What can I do for you today? 😄


## 7. Response Metadata

Beyond the reply text, the completion object carries token usage, the model ID, the request ID, and the finish reason. Let's print them.

In [114]:
# Display token usage information
print("📊 Token Usage:")
print(f"  • Prompt tokens: {chat_completion.usage.prompt_tokens}")
print(f"  • Completion tokens: {chat_completion.usage.completion_tokens}")
print(f"  • Total tokens: {chat_completion.usage.total_tokens}")

# Display other metadata
print("\n📝 Metadata:")
print(f"  • Model used: {chat_completion.model}")
print(f"  • Request ID: {chat_completion.id}")
print(f"  • Finish reason: {chat_completion.choices[0].finish_reason}")

# LM Studio specific stats (if available)
if chat_completion.stats:
    print("\n⚡ Performance Stats (LM Studio):")
    for key, value in chat_completion.stats.items():
        print(f"  • {key}: {value}")

📊 Token Usage:
  • Prompt tokens: 21
  • Completion tokens: 49
  • Total tokens: 70

📝 Metadata:
  • Model used: mistralai/magistral-small-2509
  • Request ID: chatcmpl-4ohvj1v13jkc18y0o025m
  • Finish reason: stop


## Summary

This notebook demonstrated the basic workflow for interacting with LM Studio's API:

1. **Import models** - Use Pydantic for type-safe API interactions
2. **Configure connection** - Set the base URL for your LM Studio instance
3. **List models** - Query available models to find the correct ID
4. **Create messages** - Build a conversation with system and user messages
5. **Send request** - POST to the chat completions endpoint
6. **Parse response** - Extract the model's reply and metadata

### Key Takeaways

- LM Studio implements OpenAI-compatible endpoints
- Use Pydantic models for request/response validation
- Helper functions simplify message and request creation
- Wrap requests in try/except in production code: `raise_for_status()` raises `requests.HTTPError` on 4xx/5xx responses, and a timeout raises `requests.Timeout`
- Local models may be slower than cloud APIs, so adjust timeouts accordingly

### Next Steps

- Try different models and compare responses
- Experiment with temperature and other parameters
- Implement streaming for real-time responses
- Add error handling and retry logic
- Test embeddings generation for semantic search
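
The advice about error handling and retries can be sketched with a small wrapper. This is an illustration under stated assumptions, not part of `api_models.py`. `call_with_retries` is a hypothetical helper that retries on whichever exception types are passed in, and the backoff values are arbitrary.

```python
import time


def call_with_retries(func, retry_on=(Exception,), attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on a listed exception, wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))  # exponential backoff: 0.5s, 1s, 2s, ...


# Demo with a fake call that fails twice, then succeeds (no network needed)
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []  # record the requested waits instead of actually sleeping
result = call_with_retries(flaky, retry_on=(TimeoutError,), sleep=delays.append)
print(result, delays)  # prints: ok [0.5, 1.0]
```

With the notebook's names, the chat call could be wrapped as `call_with_retries(lambda: requests.post(endpoint_chat, json=chat_request.model_dump(exclude_none=True), timeout=15), retry_on=(requests.ConnectionError, requests.Timeout))`. HTTP error statuses are deliberately left out of that tuple, since retrying a 4xx response rarely helps.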