# Getting Started with Mechanex

In [3]:
import mechanex as mx

## 1. Loading a Model

Mechanex loads models locally for experimentation and automatically configures the hook points and the SAE release that match the model you choose.

In [4]:
# Set the API key (used by the remote features), then load gpt2-small locally
mx.set_key("your-api-key")
mx.load("gpt2-small")

Loading gpt2-small locally...


`torch_dtype` is deprecated! Use `dtype` instead!


Loaded pretrained model gpt2-small into HookedTransformer
SAE release automatically set to: gpt2-small-res-jb


<mechanex.client.Mechanex at 0x129e85940>

## 2. Advanced Sampling

You can generate text with standard sampling methods such as top-k and top-p (nucleus) sampling. Select the method with the `sampling_method` argument and tune it with `top_k` or `top_p`.

In [5]:
prompt = "The capital of France is"

print("Top-K Sampling:")
out_k = mx.generation.generate(prompt, max_tokens=10, sampling_method="top-k", top_k=50)
print(f"- {out_k}")

print("\nTop-P Sampling:")
out_p = mx.generation.generate(prompt, max_tokens=10, sampling_method="top-p", top_p=0.9)
print(f"- {out_p}")

Top-K Sampling:
- The capital of France is a sprawling, modern industrial city and its population of

Top-P Sampling:
- The capital of France is making plans to build 250 additional state-owned factories
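
To make the difference between the two methods concrete, here is a minimal pure-Python sketch of top-k and top-p filtering over a toy next-token distribution. It is an illustration only: the helper names (`top_k_filter`, `top_p_filter`, `sample`) and the toy probabilities are assumptions made for this example, and nothing here is taken from Mechanex internals.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}

def sample(filtered, rng):
    """Draw one token id from a filtered {token_id: probability} mapping."""
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]

# Toy distribution over five hypothetical token ids.
toy_probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]

top_k = top_k_filter(toy_probs, k=2)    # keeps token ids 0 and 1
top_p = top_p_filter(toy_probs, p=0.8)  # keeps token ids 0, 1 and 2

rng = random.Random(0)
token = sample(top_p, rng)
```

Top-k always keeps a fixed number of candidates, while top-p keeps more candidates when the distribution is flat and fewer when it is peaked.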


## 3. Steering Vectors

Steering vectors allow you to nudge the model's internal representations toward specific behaviors. 

### Generate Vectors
We'll create a steering vector that encourages the model to be more "honest" using a few examples.

In [6]:
honesty_vector_id = mx.steering.generate_vectors(
    prompts=["I tell the", "My statement is", "The truth is"],
    positive_answers=[" truth", " factual", " correct"],
    negative_answers=[" lie", " false", " wrong"],
    method="caa" # Contrastive Activation Addition
)
print(f"Generated steering vector with ID: {honesty_vector_id}")

Processing prompts to generate steering vectors...


100%|██████████| 3/3 [00:00<00:00, 13.34it/s]

Steering vector computation complete.
Generated steering vector with ID: 62e561f7-597b-47bc-9cf0-5568ef124cc5
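
For intuition, Contrastive Activation Addition is usually described as taking the difference between the mean activation on positive examples and the mean activation on negative examples at a given layer. The sketch below shows that arithmetic on toy lists. The function names and the activation values are made up for illustration, and how Mechanex collects activations internally is not shown in this notebook.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def caa_vector(pos_acts, neg_acts):
    """Mean positive activation minus mean negative activation."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]

# Hypothetical 3-dimensional activations for two contrastive pairs.
pos_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
neg_acts = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]

steering = caa_vector(pos_acts, neg_acts)
print(steering)  # [2.0, 0.0, -2.0]
```

Dimensions where the positive and negative examples agree cancel out, so the resulting vector points along the direction that separates the two sets.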





### Save and Load Vectors

Mechanex allows you to export steering vectors to files so you can reuse them across sessions without recomputing.

In [7]:
# Save the vector to disk
mx.steering.save_vectors(honesty_vector_id, "example_vector.json")

# Load it back as a dictionary of tensors
vectors = mx.steering.load_vectors("example_vector.json")

print(f"Loaded layers: {list(vectors.keys())}")

Steering vectors saved to example_vector.json
Steering vectors loaded from example_vector.json
Loaded layers: [8, 9, 10, 11]
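
The output above suggests the loaded object maps layer indices to vectors. As a rough sketch of how such a mapping can round-trip through JSON, the example below stores one vector per layer and restores integer layer keys on load. The file layout and the helper names are assumptions for illustration; the actual format written by `save_vectors` is not documented here.

```python
import json
import os
import tempfile

def save_layer_vectors(vectors, path):
    """Write {layer: vector} to JSON. JSON object keys must be strings."""
    with open(path, "w") as f:
        json.dump({str(layer): vec for layer, vec in vectors.items()}, f)

def load_layer_vectors(path):
    """Read the file back and convert the layer keys to integers again."""
    with open(path) as f:
        raw = json.load(f)
    return {int(layer): vec for layer, vec in raw.items()}

# Hypothetical two-layer example with tiny vectors.
toy_vectors = {8: [0.1, -0.2], 9: [0.0, 0.5]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_vector.json")
    save_layer_vectors(toy_vectors, path)
    restored = load_layer_vectors(path)

print(list(restored.keys()))  # [8, 9]
```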


### Steer Generation

Now we apply a steering vector during generation. The `steering_vector` argument accepts either the loaded vector dictionary or a vector ID, and `steering_strength` scales how strongly the vector is applied.

In [8]:
steered_output = mx.generation.generate(
    "Do I tell lies? Answer: ",
    max_tokens=20,
    steering_vector="3c676005-5610-4482-afa4-8b0405a228b0", # A vector ID; the loaded dict (`vectors`) can be passed instead
    steering_strength=2.0
)
print(f"Steered Output: {steered_output}")

Steered Output: Do I tell lies? Answer:  Yes . I don't. I tell truthful lies, and I only tell lies to people who
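
Conceptually, activation steering adds a scaled copy of the steering vector to a layer's hidden state during the forward pass. The sketch below shows that single step on toy numbers. The function name and values are illustrative assumptions, and the exact hook point and scaling that Mechanex uses are not specified in this notebook.

```python
def apply_steering(hidden, vector, strength):
    """Return hidden + strength * vector, element-wise."""
    return [h + strength * v for h, v in zip(hidden, vector)]

# Hypothetical 3-dimensional hidden state and steering vector.
hidden = [0.5, -1.0, 2.0]
vector = [1.0, 0.0, -0.5]

steered = apply_steering(hidden, vector, strength=2.0)
print(steered)  # [2.5, -1.0, 1.0]
```

A larger strength pushes the hidden state further along the steering direction, which tends to strengthen the effect but can also degrade fluency.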


## 4. Sparse Autoencoders (SAEs)

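Mechanex integrates with `sae-lens` to automatically download and apply Sparse Autoencoders for concept steering. A behavior is defined from contrastive examples: prompts paired with positive and negative answers. If remote behavior creation is unavailable (for example, because the API key is invalid, as in the output below), Mechanex falls back to computing the behavior locally with `sae-lens`.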

In [9]:
behavior = mx.sae.create_behavior(
    "paris",
    prompts=["The city of light is", "I love visiting"],
    positive_answers=[" Paris", " Paris"],
    # Negative answers help isolate the concept
    negative_answers=[" London", " Rome"] 
)

Remote behavior creation failed ([401] Authentication failed: Invalid API key). Computing locally with sae-lens...
Loading SAE for blocks.8.hook_resid_pre (gpt2-small-res-jb)...


This SAE has non-empty model_from_pretrained_kwargs. 
For optimal performance, load the model like so:
model = HookedSAETransformer.from_pretrained_no_processing(..., **cfg.model_from_pretrained_kwargs)


### Steering with SAEs

You can also use these behaviors to steer the model towards specific SAE features.

In [10]:
sae_steered = mx.sae.generate(
    "Do I want to go to rome or paris? Choose Paris or Rome. Answer:  ",
    behavior_names=["paris"],
    max_new_tokens=64,
    force_steering=["paris"]
)
print(f"SAE Steered: {sae_steered}")

  0%|          | 0/64 [00:00<?, ?it/s]

SAE Steered: Do I want to go to rome or paris? Choose Paris or Rome. Answer:   ??? Suspect civility varies. Really. See the whole guide. If you have to do a lot of work though, you should be prepared to become anonymous to be sumptuous from accusations and questions. Look at the list of questions. In most cases, the one you are most (or it's sometimes even
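
To give a feel for what steering toward an SAE feature means, here is a simplified sketch: an activation is encoded into sparse features, one feature is boosted, and the result is decoded back into activation space. The weights, the helper names, and the plain ReLU encoder are assumptions for illustration. Real SAEs from `sae-lens` are much larger and may use different encoder details, and this is not Mechanex's implementation.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def matvec(x, W, b):
    """Compute x @ W + b for a row vector x and a matrix W given as a list of rows."""
    return [sum(xi * row[j] for xi, row in zip(x, W)) + b[j] for j in range(len(b))]

def sae_encode(x, W_enc, b_enc):
    """Map an activation to non-negative feature activations."""
    return relu(matvec(x, W_enc, b_enc))

def sae_decode(features, W_dec, b_dec):
    """Map feature activations back to activation space."""
    return matvec(features, W_dec, b_dec)

# Toy SAE: 2-dimensional activations, 3 features (all values hypothetical).
W_enc = [[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_dec = [0.0, 0.0]

activation = [2.0, 1.0]
features = sae_encode(activation, W_enc, b_enc)  # [2.0, 1.0, 0.0]

# Steer by boosting feature 1, then decode back to activation space.
boosted = list(features)
boosted[1] += 3.0
steered_activation = sae_decode(boosted, W_dec, b_dec)
print(steered_activation)  # [2.0, 4.0]
```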


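## 5. Remote API Features

Some features run through the remote API and need an API key with available balance. The cell below requests the `ads` sampling method, and the call fails with a `MechanexError` because the key used here has no balance.
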
In [11]:
ads_output = mx.generation.generate(
    "Here is the story about the history of artificial intelligence:",
    sampling_method="ads",
    max_tokens=64
)
print(f"ADS Output: {ads_output}")

MechanexError: Add balance to your API key in order to use this feature.

In [None]:
# Behaviors saved to your Axionic account persist across sessions; list them here
mx.sae.list_behaviors()