miStudio: End-to-End Interpretability Workflow (Live Run)
This Jupyter notebook demonstrates the full, end-to-end workflow of the miStudio platform. We will use the four core microservices (miStudioTrain, miStudioFind, miStudioExplain, and miStudioScore) to take a base language model and produce a scored, explained set of its internal features.

This is a live, non-simulated run. The steps are resource-intensive and will take a significant amount of time to complete.

Workflow Overview:

Train: Use miStudioTrain to train a Sparse Autoencoder (SAE) on the activations of microsoft/phi-4.

Find: Use miStudioFind to analyze the trained SAE and identify interesting features from the stas/openwebtext-10k dataset.

Explain: Use miStudioExplain to generate natural-language explanations for the found features.

Score: Use miStudioScore to apply quantitative scores to the features based on relevance and utility.
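
The four steps form a chain: each service consumes the path produced by the one before it, and a failure anywhere stops everything downstream. A minimal sketch of that gating logic follows. The names `run_chain` and the `fake_*` functions are hypothetical stand-ins for illustration and are not part of miStudio.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A step receives the previous step's output path (None for the first step)
# and returns its own output path, or None if it failed.
Step = Callable[[Optional[str]], Optional[str]]


def run_chain(steps: List[Tuple[str, Step]]) -> Dict[str, Optional[str]]:
    """Run steps in order; once one fails, record None for all later steps."""
    results: Dict[str, Optional[str]] = {}
    previous: Optional[str] = None
    failed = False
    for name, step in steps:
        if failed:
            results[name] = None  # skipped because an earlier step failed
            continue
        previous = step(previous)
        results[name] = previous
        if previous is None:
            failed = True
    return results


# Hypothetical stand-ins for the four service calls.
def fake_train(_: Optional[str]) -> Optional[str]:
    return "output/sae.pt"


def fake_find(sae_path: Optional[str]) -> Optional[str]:
    return None  # simulate a failed Find step


def fake_explain(features_path: Optional[str]) -> Optional[str]:
    return "output/explanations.json"


def fake_score(explanations_path: Optional[str]) -> Optional[str]:
    return "output/scores.json"


results = run_chain([
    ("train", fake_train),
    ("find", fake_find),
    ("explain", fake_explain),
    ("score", fake_score),
])
print(results)
```

The notebook cells below implement the same idea with plain `if` checks on the path variables set by each step.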

Setup: Imports and Configuration
First, let's import the necessary Python libraries and define the URLs for our four running services.

In [1]:
import requests
import json
import os
import time
import pandas as pd
import yaml
from getpass import getpass

# --- Service Endpoints ---
# These should match the ports you started the services on.
BASE_URL = "http://localhost"
TRAIN_URL = f"{BASE_URL}:8001"
FIND_URL = f"{BASE_URL}:8002"
EXPLAIN_URL = f"{BASE_URL}:8003"
SCORE_URL = f"{BASE_URL}:8004"

# --- Shared Data Paths ---
# We will assume a shared volume or directory accessible by all services.
# For this demo, we'll place inputs in a 'notebook_run' directory.
DATA_ROOT = "./notebook_run"
INPUT_DIR = os.path.join(DATA_ROOT, "input")
OUTPUT_DIR = os.path.join(DATA_ROOT, "output")
CONFIG_DIR = os.path.join(DATA_ROOT, "config")

os.makedirs(INPUT_DIR, exist_ok=True)
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(CONFIG_DIR, exist_ok=True)

# --- Hugging Face Token ---
# Needed only for gated or private model repositories.
# Best practice is to load from an environment variable.
hf_token = os.getenv("HUGGING_FACE_TOKEN")
if not hf_token:
    print("Hugging Face token not found in environment variables.")
    hf_token = getpass("Please enter your Hugging Face token: ")

if not hf_token:
    print("⚠️ Warning: No Hugging Face token provided. The 'train' step may fail if the model is gated.")
else:
    print("✅ Hugging Face token loaded.")


# --- Global Variables ---
# To store the output paths from each step
sae_model_path = None
features_path = None
explanations_path = None
scores_path = None

print("\nService URLs configured:")
print(f"  - Train:   {TRAIN_URL}")
print(f"  - Find:    {FIND_URL}")
print(f"  - Explain: {EXPLAIN_URL}")
print(f"  - Score:   {SCORE_URL}")
print(f"\nData directories created at: {os.path.abspath(DATA_ROOT)}")

Hugging Face token not found in environment variables.
✅ Hugging Face token loaded.

Service URLs configured:
  - Train:   http://localhost:8001
  - Find:    http://localhost:8002
  - Explain: http://localhost:8003
  - Score:   http://localhost:8004

Data directories created at: /home/sean/app/miStudio/tests/notebook_run
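
Before launching a multi-hour job, it can help to confirm that something is listening on each configured port. The sketch below uses only the standard library and checks TCP reachability; it does not verify that a given route exists. The helper name `is_port_open` is an illustrative choice, and the URLs repeat the values configured above.

```python
import socket
from urllib.parse import urlparse


def is_port_open(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    if parsed.hostname is None or parsed.port is None:
        return False  # the URL must name an explicit host and port
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        return False


# Same endpoints as configured in the setup cell.
service_urls = {
    "Train": "http://localhost:8001",
    "Find": "http://localhost:8002",
    "Explain": "http://localhost:8003",
    "Score": "http://localhost:8004",
}
for name, url in service_urls.items():
    status = "reachable" if is_port_open(url) else "NOT reachable"
    print(f"  - {name}: {url} is {status}")
```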


Step 1: Train an SAE with miStudioTrain
This is the first and most time-consuming step. We will send a request to the miStudioTrain service to train an SAE on the activations of a mid-level layer from the Phi-4 model.

WARNING: This cell will run for a very long time (potentially hours).

In [4]:
print("▶️ Step 1: Train")
print("This step will take a long time. Please be patient.")

# Define the parameters for our training job
MODEL_NAME = "microsoft/phi-4" # Updated model
DATASET_NAME = "stas/openwebtext-10k"
# NOTE: This layer is a guess for a larger model. You may need to update it.
TARGET_LAYER = "model.layers.24.mlp" 

train_payload = {
    "model_name": MODEL_NAME,
    "dataset_name": DATASET_NAME,
    "layer": TARGET_LAYER,
    "output_dir": OUTPUT_DIR,
    "hf_token": hf_token # Pass the token to the service
}

print(f"\nSending training request to {TRAIN_URL}/train_model/ with payload:")
# Create a copy of the payload to print, but hide the token
printable_payload = train_payload.copy()
printable_payload["hf_token"] = "hf_...[REDACTED]"
print(json.dumps(printable_payload, indent=2))

try:
    # We use a very long timeout because this is a blocking, long-running task.
    # NOTE: The path must match a route that the Train service actually exposes;
    # /train_model/ is the path used in this run.
    response = requests.post(f"{TRAIN_URL}/train_model/", json=train_payload, timeout=10800) # 3 hour timeout
    response.raise_for_status()
    
    train_result = response.json()
    # We assume the service returns a path to the trained model.
    # The key is assumed to be 'output_path' based on other services.
    sae_model_path = train_result.get("output_path")
    
    if sae_model_path:
        print(f"\n✅ Success! Training complete. SAE model saved to: {sae_model_path}")
    else:
        print("❌ Error: Training job finished but did not return an output path.")
        print("Response from server:", train_result)

except requests.exceptions.RequestException as e:
    print(f"❌ Error during Train step: {e}")

▶️ Step 1: Train
This step will take a long time. Please be patient.

Sending training request to http://localhost:8001/train_model/ with payload:
{
  "model_name": "microsoft/phi-4",
  "dataset_name": "stas/openwebtext-10k",
  "layer": "model.layers.24.mlp",
  "output_dir": "./notebook_run/output",
  "hf_token": "hf_...[REDACTED]"
}
❌ Error during Train step: 404 Client Error: Not Found for url: http://localhost:8001/train_model/
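
A 404 on a POST means the service answered but does not expose that path. If the services publish an OpenAPI schema (FastAPI serves one at `/openapi.json` by default; whether miStudio does is an assumption), the available routes can be read from it. The helper below only parses a schema dictionary. The sample schema and its route names are invented for illustration and say nothing about the real miStudioTrain API.

```python
from typing import Any, Dict, List


def list_routes(openapi_schema: Dict[str, Any]) -> List[str]:
    """Return 'METHOD /path' strings for every operation in an OpenAPI schema."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in sorted(openapi_schema.get("paths", {}).items()):
        for method in sorted(operations):
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in http_methods:
                routes.append(f"{method.upper()} {path}")
    return routes


# Invented schema for illustration. A real one could be fetched with something like:
#   requests.get(f"{TRAIN_URL}/openapi.json", timeout=10).json()
sample_schema = {
    "openapi": "3.1.0",
    "paths": {
        "/health": {"get": {}},
        "/jobs": {"get": {}, "post": {}, "parameters": []},
    },
}
for route in list_routes(sample_schema):
    print(route)
```

Comparing the printed routes with the path used in the request shows quickly whether the URL or the service version is at fault.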


Step 2: Find Interesting Features with miStudioFind
Now that we have a trained SAE, we'll use the miStudioFind service to analyze it and extract a list of features with their statistics and top activating examples.

In [None]:
if sae_model_path:
    print("\n▶️ Step 2: Find")

    find_payload = {
        "sae_model_path": sae_model_path,
        "dataset_name": DATASET_NAME,
        "output_dir": OUTPUT_DIR
    }

    print(f"Sending request to {FIND_URL}/find...")
    try:
        response = requests.post(f"{FIND_URL}/find", json=find_payload, timeout=10800) # 3 hour timeout
        response.raise_for_status()
        
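        find_result = response.json()
        features_path = find_result.get("output_path")

        if features_path:
            print(f"✅ Success! Features saved to: {features_path}")

            features_df = pd.read_json(features_path)
            print("\nSample of found features:")
            display(features_df.head())
        else:
            print("❌ Error: Find job finished but did not return an output path.")
            print("Response from server:", find_result)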

    except requests.exceptions.RequestException as e:
        print(f"❌ Error during Find step: {e}")
        features_path = None
else:
    print("\nSkipping Step 2: Find, as the previous step failed.")
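
The cell above reads the returned path directly with pandas, which works only if the notebook and the services share a filesystem. A small guard can confirm that the path exists before loading it. This is a sketch: the helper name `resolve_output_path` is hypothetical, and the `output_path` key is the one this notebook assumes the services return.

```python
import os
import tempfile
from typing import Any, Dict, Optional


def resolve_output_path(result: Dict[str, Any], key: str = "output_path") -> Optional[str]:
    """Return the path from a service response only if it exists locally."""
    path = result.get(key)
    if not isinstance(path, str) or not path:
        return None  # key missing, empty, or not a string
    return path if os.path.exists(path) else None


# Demonstration with a temporary file standing in for a service output.
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as tmp:
    tmp.write(b"[]")

existing = resolve_output_path({"output_path": tmp.name})
missing = resolve_output_path({"output_path": "/no/such/file.json"})
absent = resolve_output_path({"status": "ok"})
print(existing == tmp.name, missing, absent)

os.remove(tmp.name)  # clean up the temporary file
```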

Step 3: Explain Features with miStudioExplain
With our list of features, we can now use the miStudioExplain service to generate human-readable explanations for what each feature represents.

In [None]:
if features_path:
    print("\n▶️ Step 3: Explain")
    
    explain_payload = {
        "feature_data_path": features_path,
        "output_dir": OUTPUT_DIR
    }

    print(f"Sending request to {EXPLAIN_URL}/explain...")
    try:
        response = requests.post(f"{EXPLAIN_URL}/explain", json=explain_payload, timeout=3600) # 1 hour timeout
        response.raise_for_status()
        
        explain_result = response.json()
        explanations_path = explain_result.get("output_path")

        if explanations_path:
            print(f"✅ Success! Explanations saved to: {explanations_path}")

            explanations_df = pd.read_json(explanations_path)
            print("\nSample of explained features:")
            display(explanations_df[["feature_index", "explanation"]].head())
        else:
            print("❌ Error: Explain job finished but did not return an output path.")
            print("Response from server:", explain_result)

    except requests.exceptions.RequestException as e:
        print(f"❌ Error during Explain step: {e}")
        explanations_path = None
else:
    print("\nSkipping Step 3: Explain, as the previous step failed.")

Step 4: Score Features with miStudioScore
This is the final step. We will use the miStudioScore service to apply both a relevance score and a more advanced ablation-based utility score.

First, we need to create the scoring configuration file (notebook_scoring_config.yaml) and a dummy benchmark script (notebook_benchmark.py).

In [None]:
# Create the benchmark dataset file for ablation scoring
benchmark_content = """
import torch

def run_benchmark(model, tokenizer, device):
    # A simple dummy benchmark that returns the cross-entropy loss
    # (the log of perplexity) on a fixed sentence.
    # A real benchmark would use a proper evaluation dataset.
    text = "The quick brown fox jumps over the lazy dog."
    encodings = tokenizer(text, return_tensors="pt").to(device)
    
    with torch.no_grad():
        outputs = model(**encodings, labels=encodings.input_ids)
        loss = outputs.loss
    
    return loss.item()
"""
benchmark_path = os.path.join(CONFIG_DIR, "notebook_benchmark.py")
with open(benchmark_path, "w") as f:
    f.write(benchmark_content)
print(f"Benchmark script created at: {benchmark_path}")


# Create the scoring configuration file dynamically for this run
scoring_config = {
    "scoring_jobs": [
        {
            "scorer": "relevance_scorer",
            "name": "code_relevance",
            "params": {
                "positive_keywords": ["python", "def", "import", "class", "return", "for", "while"],
                "negative_keywords": ["marketing", "recipe", "sports", "weather"],
            },
        },
        {
            "scorer": "ablation_scorer",
            "name": "perplexity_utility",
            "params": {
                "benchmark_dataset_path": benchmark_path,
                "target_model_name": MODEL_NAME,
                "target_model_layer": TARGET_LAYER,
                "device": "cuda", # Change to "cpu" if not using a GPU
            },
        }
    ]
}

scoring_config_path = os.path.join(CONFIG_DIR, "notebook_scoring_config.yaml")
with open(scoring_config_path, "w") as f:
    yaml.dump(scoring_config, f, indent=2)

print(f"Scoring configuration created at: {scoring_config_path}")
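
Each job's `name` is later used as a column name when sorting the results, so names should be present and unique. The sketch below checks a config dictionary before it is written to disk. The required fields are inferred from the config built in this notebook, not from a published miStudioScore schema, and `validate_scoring_config` is a hypothetical helper.

```python
from typing import Any, Dict, List


def validate_scoring_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a scoring config (empty if none)."""
    jobs = config.get("scoring_jobs")
    if not isinstance(jobs, list) or not jobs:
        return ["'scoring_jobs' must be a non-empty list"]

    problems: List[str] = []
    seen_names = set()
    for i, job in enumerate(jobs):
        if not isinstance(job, dict):
            problems.append(f"job {i}: must be a mapping")
            continue
        # Fields assumed from the config used in this notebook.
        for field in ("scorer", "name", "params"):
            if field not in job:
                problems.append(f"job {i}: missing '{field}'")
        name = job.get("name")
        if name is not None:
            if name in seen_names:
                problems.append(f"job {i}: duplicate name '{name}'")
            seen_names.add(name)
    return problems


good = {"scoring_jobs": [
    {"scorer": "relevance_scorer", "name": "code_relevance", "params": {}},
    {"scorer": "ablation_scorer", "name": "perplexity_utility", "params": {}},
]}
bad = {"scoring_jobs": [
    {"scorer": "relevance_scorer", "name": "code_relevance"},
    {"scorer": "ablation_scorer", "name": "code_relevance", "params": {}},
]}
print(validate_scoring_config(good))
print(validate_scoring_config(bad))
```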

Now we can call the scoring service.

In [None]:
if explanations_path:
    print("\n▶️ Step 4: Score")
    
    score_payload = {
        "features_path": explanations_path, # Use the output from the previous step
        "config_path": scoring_config_path,
        "output_dir": OUTPUT_DIR
    }

    print(f"Sending request to {SCORE_URL}/score...")
    try:
        response = requests.post(f"{SCORE_URL}/score", json=score_payload, timeout=10800) # 3 hour timeout for ablation
        response.raise_for_status()
        
        score_result = response.json()
        scores_path = score_result.get("output_path")

        if scores_path:
            print(f"✅ Success! Final scores saved to: {scores_path}")

            scores_df = pd.read_json(scores_path)
            print("\nFinal scored and explained features (sorted by code relevance):")
            display(scores_df.sort_values(by="code_relevance", ascending=False).head(10))

            print("\nFinal scored and explained features (sorted by perplexity utility):")
            # A higher utility score (more negative impact when removed) means the feature is more important
            display(scores_df.sort_values(by="perplexity_utility", ascending=False).head(10))
        else:
            print("❌ Error: Score job finished but did not return an output path.")
            print("Response from server:", score_result)

    except requests.exceptions.RequestException as e:
        print(f"❌ Error during Score step: {e}")
else:
    print("\nSkipping Step 4: Score, as the previous step failed.")
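
The comment in the cell above describes the utility score as the negative impact of removing a feature. One plausible reading is the increase in benchmark loss when the feature is ablated. This is an assumption: the formula miStudioScore actually uses is not shown in this notebook. Since the dummy benchmark returns a mean cross-entropy loss, perplexity is its exponential. The function names and numbers below are illustrative only.

```python
import math


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


def ablation_utility(baseline_loss: float, ablated_loss: float) -> float:
    """Hypothetical utility: how much the loss rises when a feature is removed."""
    return ablated_loss - baseline_loss


# Illustrative numbers only, not results from a real run.
baseline, ablated = 2.0, 2.5
print(f"baseline perplexity: {perplexity(baseline):.2f}")
print(f"ablated perplexity:  {perplexity(ablated):.2f}")
print(f"utility: {ablation_utility(baseline, ablated):.2f}")
```

Under this reading, a positive utility means the model performs worse without the feature, which matches sorting in descending order to surface the most important features first.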

Conclusion
This notebook walks through the full miStudio pipeline, showing how a series of independent microservices can be chained into a flexible interpretability workflow that takes a large language model and turns its opaque internal workings into scored, explained features. In the run recorded here, the Train request returned a 404, so Steps 2 through 4 were skipped; once the Train endpoint is corrected, the remaining cells run unchanged.