# Mechanex Serving Demo

This notebook demonstrates how to host an OpenAI-compatible server with Mechanex. You can then use standard LLM tooling, such as the **OpenAI Python SDK**, to interact with Mechanex whether it is running a local model or using the remote API.

We also show how to use **mechanistic features** (steering vectors and sparse autoencoders, or SAEs) through the API.

In [1]:
# Install dependencies if needed
%pip install openai

import mechanex as mx
import threading
import time
from openai import OpenAI


Note: you may need to restart the kernel to use updated packages.


In [2]:
# Set your API key
mx.set_key("demo-key-123") # Required for both local and remote modes

## 1. Local Model Serving

Load a small model locally. Mechanex will use this model for all incoming requests to the server.

In [3]:
# Load GPT-2 locally via TransformerLens
mx.load("gpt2")

Loading gpt2 locally...


`torch_dtype` is deprecated! Use `dtype` instead!


Loaded pretrained model gpt2 into HookedTransformer
SAE release automatically set to: gpt2-res-jb


<mechanex.client.Mechanex at 0x11bc0d940>

### Start the Server in the Background

We use `mx.serve()` to launch a FastAPI server that mirrors the OpenAI API format.

In [5]:
def start_server():
    # Run the OpenAI-compatible server on port 8001
    mx.serve(port=8001)

# Run server in a separate thread so it doesn't block the notebook
thread = threading.Thread(target=start_server, daemon=True)
thread.start()

# Give it a moment to initialize
time.sleep(5)

INFO:     Started server process [54297]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)


Starting Mechanex OpenAI-compatible server on 0.0.0.0:8001


INFO:     127.0.0.1:53012 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:53012 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:53026 - "POST /v1/chat/completions HTTP/1.1" 200 OK
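The fixed `time.sleep(5)` above works for a demo, but a readiness check is usually more robust. The following is a minimal sketch, not part of Mechanex: `wait_for_port` is a hypothetical helper that polls the TCP port until something is listening, and it assumes the server accepts connections as soon as startup completes.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; wait briefly and retry.
            time.sleep(interval)
    return False
```

In this notebook it could replace the sleep with `wait_for_port("127.0.0.1", 8001)` right after `thread.start()`.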


## 2. Interact using the OpenAI SDK

Now we can initialize a standard `OpenAI` client pointing to our local Mechanex server.

In [8]:
client = OpenAI(
    api_key="mechanex-is-cool", # Any non-empty string works for the local server
    base_url="http://localhost:8001/v1"
)

print("Sending request to local Mechanex server...")

completion = client.chat.completions.create(
    model="mechanex-local",
    messages=[
        {"role": "user", "content": "The capital of France is"}
    ],
    max_tokens=10
)

print("\nResponse from Mechanex:")
print(completion.choices[0].message.content)

Sending request to local Mechanex server...

Response from Mechanex:
The capital of France is expected to sign a deal to end the country's


## 3. Mechanistic Serving (Steering & SAE)

Mechanex allows you to apply steering vectors and SAE behaviors directly through the OpenAI-compatible API by passing **custom extra parameters**.
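In the OpenAI Python SDK, keys passed through `extra_body` are added as additional top-level properties of the JSON request body rather than nested under an `extra_body` key. The sketch below approximates the payload the server would receive. `build_payload` is a hypothetical helper written for illustration, and `"vec-123"` is a placeholder ID; the `steering_vector` and `steering_strength` field names are taken from the cell that follows.

```python
import json


def build_payload(model, messages, max_tokens, extra_body=None):
    """Approximate the JSON body sent to POST /v1/chat/completions."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    # extra_body keys sit alongside the standard fields; they are not nested.
    payload.update(extra_body or {})
    return payload


payload = build_payload(
    model="mechanex-local",
    messages=[{"role": "user", "content": "The weather today is"}],
    max_tokens=15,
    extra_body={"steering_vector": "vec-123", "steering_strength": 2.0},  # placeholder ID
)
print(json.dumps(payload, indent=2))
```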

In [9]:
# 1. Create a local steering vector from a contrastive pair (cold vs. hot weather)
vector_id = mx.steering.generate_vectors(
    prompts=["The weather is"],
    positive_answers=[" extremely cold and snowy"],
    negative_answers=[" incredibly hot and sunny"],
    method="caa"
)

# 2. Use it via the OpenAI SDK's 'extra_body'
completion = client.chat.completions.create(
    model="mechanex-local",
    messages=[
        {"role": "user", "content": "The weather today is"}
    ],
    max_tokens=15,
    extra_body={
        "steering_vector": vector_id,
        "steering_strength": 2.0
    }
)

print("Steered Response:")
print(completion.choices[0].message.content)

Processing prompts to generate steering vectors...


100%|██████████| 1/1 [00:00<00:00, 12.27it/s]

Steering vector computation complete.





Steered Response:
The weather today is moderate- to-"very low- and maximum- in moderate to-
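For intuition, here is a toy sketch of what a contrastive steering vector computes, assuming `method="caa"` refers to contrastive activation addition. It is not Mechanex's implementation: the function names are hypothetical, and short Python lists stand in for real residual-stream activations. The vector is the mean activation on positive examples minus the mean on negative examples, and steering adds a scaled copy of it to an activation.

```python
from statistics import fmean


def caa_vector(pos_acts, neg_acts):
    """Mean activation difference (positive minus negative), per dimension."""
    dims = range(len(pos_acts[0]))
    return [
        fmean(a[d] for a in pos_acts) - fmean(a[d] for a in neg_acts)
        for d in dims
    ]


def steer(activation, vector, strength):
    """Add the scaled steering vector to a single activation."""
    return [x + strength * v for x, v in zip(activation, vector)]


# Toy 3-dimensional activations standing in for residual-stream states.
pos = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
neg = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
vec = caa_vector(pos, neg)                           # [2.0, -1.0, 0.0]
steered = steer([0.5, 0.5, 0.5], vec, strength=2.0)  # [4.5, -1.5, 0.5]
```

Under this reading, `steering_strength` plays the role of `strength`: larger values push generations further toward the positive examples, and, as the output above suggests, can also degrade fluency.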


### SAE Behavior Correction
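You can also enable behavior monitoring (such as toxicity detection) by passing `behavior_names` in `extra_body`. Setting `auto_correct` to `True` additionally requests automatic correction of the monitored behavior.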

In [10]:
completion = client.chat.completions.create(
    model="mechanex-local",
    messages=[
        {"role": "user", "content": "Tell me a interesting fact about science."}
    ],
    max_tokens=20,
    extra_body={
        "behavior_names": ["toxicity"], 
        "auto_correct": True
    }
)

print("SAE-Monitored Response:")
print(completion.choices[0].message.content)

  0%|          | 0/20 [00:00<?, ?it/s]

SAE-Monitored Response:
Tell me a interesting fact about science. What's it like to work in an industry where you do most of the work yourself at your favorite
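The notebook does not show how `behavior_names` and `auto_correct` work internally. As a rough illustration only, one common pattern in SAE-based interventions is to read a feature's activation from the encoder and, when it exceeds a threshold, subtract that feature's decoder direction from the activation. Everything below (function names, weights, threshold) is made up for the sketch and should not be read as Mechanex's method.

```python
def sae_feature_activation(x, w_enc_row, b_enc, b_dec):
    """ReLU(w_enc_row . (x - b_dec) + b_enc) for a single SAE feature."""
    pre = sum(w * (xi - bd) for w, xi, bd in zip(w_enc_row, x, b_dec)) + b_enc
    return max(0.0, pre)


def correct(x, act, w_dec_row, threshold):
    """If the feature fires above threshold, remove its decoder contribution."""
    if act <= threshold:
        return x
    return [xi - act * wd for xi, wd in zip(x, w_dec_row)]


# Toy 2-dimensional example; the weights are invented for illustration.
x = [2.0, 1.0]
w_enc_row, w_dec_row = [1.0, 0.0], [1.0, 0.0]
act = sae_feature_activation(x, w_enc_row, b_enc=0.0, b_dec=[0.0, 0.0])  # 2.0
fixed = correct(x, act, w_dec_row, threshold=0.5)                         # [0.0, 1.0]
```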


## 4. Switching to Remote API

When you unload the local model, the Mechanex server automatically switches to the remote Axionic API.

In [18]:
mx.unload()
print("Local model unloaded. Requests will now go to the remote API.")

Unloading gpt2...
Moving model to device:  cpu
Local model unloaded. Requests will now go to the remote API.
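Since the server endpoint stays the same, the existing `client` can presumably be reused unchanged after `mx.unload()`. The helper below is a small sketch for sending the same prompt before and after the switch. `ask` is a hypothetical convenience function, and whether the remote API accepts the `mechanex-local` model name is an assumption, as the notebook does not show a remote request.

```python
def ask(client, prompt, model="mechanex-local", max_tokens=10):
    """Send one chat request and return the text of the first choice."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return completion.choices[0].message.content
```

Calling `ask(client, "The capital of France is")` once with the local model loaded and once after unloading would make the difference between the two backends easy to compare.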
