# Preliminary MoE Routing Demo (Switch Transformer)

This notebook is a small exploratory test to understand **Mixture-of-Experts (MoE) routing** in a real model.

Using **Switch-Base-8**, we inspect **token-level expert routing decisions** at a **single encoder MoE layer**.  
We capture router logits, selected experts, and routing entropy for short inputs, and align them with the tokenizer’s actual subword tokens.

This notebook is **purely observational**:
- no training
- no fine-tuning
- no architectural modification

The goal is simply to verify that expert routing can be intercepted and interpreted before scaling to larger MoE models.


In [35]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/switch-base-8"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

model.eval()
print("Switch MoE loaded correctly.")

Switch MoE loaded correctly.


Check the available GPU, then list all modules with "router" in the name to find the routing components.

In [36]:
!nvidia-smi

Tue Feb 10 19:00:06 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   50C    P0             28W /   70W |    4360MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [37]:
for name, module in model.named_modules():
    if "router" in name.lower():
        print(name, "->", type(module))
print("Switch MoE router modules identified.")

encoder.block.1.layer.1.mlp.router -> <class 'transformers.models.switch_transformers.modeling_switch_transformers.SwitchTransformersTop1Router'>
encoder.block.1.layer.1.mlp.router.classifier -> <class 'torch.nn.modules.linear.Linear'>
encoder.block.3.layer.1.mlp.router -> <class 'transformers.models.switch_transformers.modeling_switch_transformers.SwitchTransformersTop1Router'>
encoder.block.3.layer.1.mlp.router.classifier -> <class 'torch.nn.modules.linear.Linear'>
encoder.block.5.layer.1.mlp.router -> <class 'transformers.models.switch_transformers.modeling_switch_transformers.SwitchTransformersTop1Router'>
encoder.block.5.layer.1.mlp.router.classifier -> <class 'torch.nn.modules.linear.Linear'>
encoder.block.7.layer.1.mlp.router -> <class 'transformers.models.switch_transformers.modeling_switch_transformers.SwitchTransformersTop1Router'>
encoder.block.7.layer.1.mlp.router.classifier -> <class 'torch.nn.modules.linear.Linear'>
encoder.block.9.layer.1.mlp.router -> <class 'transforme

Tokenize a single example and print ids, tokens, and attention mask to see the subword split.

In [None]:
# Minimal step: tokenize a single input and inspect tokens
text = "question: Where is Paris?, context: Paris is the capital of France."

encoding = tokenizer(text, return_tensors="pt")
input_ids = encoding["input_ids"][0].tolist()
attention_mask = encoding["attention_mask"][0].tolist()

print("Input text:")
print(text)
print("\nToken IDs:")
print(input_ids)
print("\nTokens:")
print(tokenizer.convert_ids_to_tokens(input_ids))
print("\nAttention Mask:")
print(attention_mask)

Input text:
question: Where is Paris?, context: Paris is the capital of France.

Token IDs:
[822, 10, 2840, 19, 1919, 58, 6, 2625, 10, 1919, 19, 8, 1784, 13, 1410, 5, 1]

Tokens:
['▁question', ':', '▁Where', '▁is', '▁Paris', '?', ',', '▁context', ':', '▁Paris', '▁is', '▁the', '▁capital', '▁of', '▁France', '.', '</s>']

Attention Mask:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


Error during conversion: AttributeError("'str' object has no attribute 'decode'")


Move tokenized tensors onto the same device as the model. The model might be on GPU while the tokenizer outputs are on CPU, and PyTorch requires both to be on the same device for a forward pass.

In [39]:
# Move tokenized tensors to the model device
encoding = tokenizer(text, return_tensors="pt")
input_ids = encoding["input_ids"].to(model.device)
attention_mask = encoding["attention_mask"].to(model.device)

print("input_ids device:", input_ids.device)
print("attention_mask device:", attention_mask.device)

input_ids device: cuda:0
attention_mask device: cuda:0


Exception in thread Thread-auto_conversion:
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/transformers/safetensors_conversion.py", line 116, in auto_conversion
    raise e
  File "/usr/local/lib/python3.12/dist-packages/transformers/safetensors_conversion.py", line 95, in auto_conversion
    sha = get_conversion_pr_reference(api, pretrained_model_name_or_path, **cached_file_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/safetensors_conversion.py", line 76, in get_conversion_pr_reference
    raise OSError(
OSError: Could not create safetensors conversion PR. The repo does not appear to have a file named pytorch_model.bin or model.safetensors.

In [40]:
import torch

# Encoder forward only (no hooks)
with torch.no_grad(), torch.autocast(model.device.type, dtype=model.dtype):
    encoder_outputs = model.encoder(
        input_ids=input_ids,
        attention_mask=attention_mask
    )

print("Encoder output last_hidden_state:", encoder_outputs.last_hidden_state.shape)
print("This consists of the batch dimension (1), sequence length, and hidden state dimension.")


Encoder output last_hidden_state: torch.Size([1, 17, 768])
This consists of the batch dimension (1), sequence length, and hidden state dimension.



### Forward hook method


- **`router_classifier`**  
  The **linear layer inside the router** that maps each token’s hidden state to logits over the experts  
  (`hidden_dim → num_experts`; as an `nn.Linear`, its weight is stored as `(num_experts, hidden_dim)`).

- **`classifier_outputs`**  
  A list used to **store the outputs captured by the hook**.  
  Hooks may fire multiple times, so outputs are accumulated safely here.

- **`classifier_hook(module, module_inputs, module_outputs)`**  
  A function that PyTorch **automatically calls when the classifier runs**.
  - `module`: the classifier layer itself (`nn.Linear`)
  - `module_inputs`: hidden-state tensors **entering the classifier**  
    (one vector per token)
  - `module_outputs`: **raw logits over experts**  
    (shape: `[num_tokens, num_experts]`, here `(17, 8)`)

- **Tuple handling inside the hook**  
  Some modules return `(output, extra_info)`.  
  This unwraps the tensor so we always store just the logits.

- **`register_forward_hook(...)`**  
  Attaches the hook to the classifier.  
  From this point on, **whenever the classifier executes**, the hook is triggered.

- **`with torch.no_grad()`**  
  Disables gradient tracking because we are **observing, not training**.  
  Saves memory and computation.

- **`torch.autocast(...)`**  
  Runs the forward pass in the model’s native precision (e.g. FP16 on GPU).  
  Prevents dtype mismatches and is standard practice for inference.

- **`model.encoder(...)`**  
  Executes a single encoder forward pass.  
  This is what **actually triggers the router and the hook**.

- **`handle.remove()`**  
  Detaches the hook immediately after the pass to avoid duplicate captures.

- **`classifier_outputs[0]`**  
  The captured tensor of **per-token expert logits**.  
  Each row corresponds to one token, each column to one expert.

- **Final prints**  
  Confirm tensor shape and inspect logits for a specific token and all experts.


In [41]:
# Hook the router classifier to capture per-expert logits
router_classifier = model.encoder.block[1].layer[1].mlp.router.classifier
classifier_outputs = []

# Define a hook function to capture the outputs of the router classifier
def classifier_hook(module, module_inputs, module_outputs):
    if isinstance(module_outputs, tuple):
        # Ensure we only capture the logits
        module_outputs = module_outputs[0]
    classifier_outputs.append(module_outputs)


handle = router_classifier.register_forward_hook(classifier_hook)

try:
    with torch.no_grad(), torch.autocast(model.device.type, dtype=model.dtype):
        _ = model.encoder(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
finally:
    # Always detach the hook, even if the forward pass raises
    handle.remove()

logits = classifier_outputs[0].cpu()
print("Classifier logits shape:", tuple(logits.shape))
print("Classifier logits sample:", logits[13, :8].tolist())

Classifier logits shape: (17, 8)
Classifier logits sample: [0.05078125, -2.0, -0.17578125, 2.8125, -0.154296875, -0.173828125, 0.470703125, 0.279296875]


Convert logits into a concrete routing summary. We take the argmax to pick the top expert per token, compute the entropy of the softmax distribution to measure routing uncertainty (lower entropy means a more confident choice), then align those values with the tokenizer's subword tokens in a table.

In [42]:
import pandas as pd

import torch.nn.functional as F

# Compute expert assignment + entropy per token
logits = logits.float()
experts = logits.argmax(dim=-1)
entropy = -(F.softmax(logits, dim=-1) * F.log_softmax(logits, dim=-1)).sum(dim=-1)

tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
df = pd.DataFrame({
    "TOKEN": tokens,
    "EXPERT": experts.cpu().numpy(),
    "LOGIT": logits.max(dim=-1).values.cpu().numpy(),
    "ENTROPY": entropy.cpu().numpy()
})
df

| | TOKEN | EXPERT | LOGIT | ENTROPY |
|---|---|---|---|---|
| 0 | ▁question | 4 | 2.0 | 1.733095 |
| 1 | : | 2 | 2.46875 | 1.472535 |
| 2 | ▁Where | 7 | 2.375 | 1.368644 |
| 3 | ▁is | 4 | 2.71875 | 1.297853 |
| 4 | ▁Paris | 7 | 2.265625 | 1.361608 |
| 5 | ? | 2 | 2.359375 | 1.484937 |
| 6 | , | 2 | 2.0625 | 1.598489 |
| 7 | ▁context | 5 | 2.65625 | 1.720571 |
| 8 | : | 2 | 2.796875 | 1.35567 |
| 9 | ▁Paris | 7 | 2.921875 | 1.281017 |
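
For scale, the entropy of a softmax over 8 experts lies between 0 (all probability on one expert) and ln 8 ≈ 2.079 nats (uniform routing), so the values shown above (roughly 1.3 to 1.7 nats) sit closer to the uniform end. The following is a minimal pure-Python sketch of the same argmax and entropy computation, applied to the logits row printed earlier for token index 13. The helper name `softmax_entropy` is illustrative and not part of the notebook.

```python
import math

def softmax_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Logits row printed earlier for token index 13 (8 experts)
sample_logits = [0.05078125, -2.0, -0.17578125, 2.8125,
                 -0.154296875, -0.173828125, 0.470703125, 0.279296875]

# Top-1 routing: index of the largest logit
top_expert = max(range(len(sample_logits)), key=sample_logits.__getitem__)

# Upper bound: entropy of a uniform distribution over the experts
max_entropy = math.log(len(sample_logits))

print("top-1 expert:", top_expert)
print("entropy (nats):", round(softmax_entropy(sample_logits), 4))
print("upper bound ln(8):", round(max_entropy, 4))
```

Comparing each token's entropy against `max_entropy` gives a rough sense of how decisive the router is for that token.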


Next, we load the prompts from `Prompt_base.jsonl` and `Prompt_context.jsonl` and prepare them as model input.

In [43]:
# Mount Google Drive

from google.colab import drive

drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [44]:
from pathlib import Path
data_dir = Path("/content/drive/MyDrive/Individual-Project-25-26/stage1/data")
base_path = data_dir / "Prompt_base.jsonl"
context_path = data_dir / "Prompt_context.jsonl"

In [45]:
import json

def load_prompt_records(path):
    records = []
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            records.append(json.loads(line))
    return records

def build_context_prompt(item):
    cue = item.get("Cue", "")
    context = item.get("Context", "")
    question = item.get("Question", "")
    parts = []
    if cue:
        parts.append(cue)
    parts.append("Context:\n" + context)
    parts.append("Question:\n" + question)
    return "\n\n".join(parts)

base_records = load_prompt_records(base_path)
context_records = load_prompt_records(context_path)

base_prompts = [record.get("Question", "") for record in base_records]
context_prompts = [build_context_prompt(record) for record in context_records]
context_questions = [record.get("Question", "") for record in context_records]

prompt_sets = {
    "base": base_prompts,
    "context": context_prompts,
}

print("Loaded base prompts:", len(base_prompts))
print("Loaded context prompts:", len(context_prompts))

Loaded base prompts: 100
Loaded context prompts: 100


In [52]:
def tokenize_prompts(prompts):
    tokenized = []
    for prompt in prompts:
        encoding = tokenizer(prompt, return_tensors="pt")
        tokenized.append({
            "input_ids": encoding["input_ids"][0].tolist(),
            "attention_mask": encoding["attention_mask"][0].tolist(),
        })
    return tokenized


base_tokenized = tokenize_prompts(base_prompts)
context_tokenized = tokenize_prompts(context_prompts)  # Fixed: was context_questions, now context_prompts

print("Base tokenized:", len(base_tokenized))
print("Context tokenized:", len(context_tokenized))

Base tokenized: 100
Context tokenized: 100


In [53]:
def inspect_tokenized(tokenized, prompts, label, idx=99):
    if not tokenized:
        print(f"{label}: no items")
        return
    idx = max(0, min(idx, len(tokenized) - 1))
    item = tokenized[idx]
    ids = item["input_ids"]
    mask = item["attention_mask"]
    print(f"{label} example {idx}:")
    print("prompt:", prompts[idx])
    print("num tokens:", len(ids))
    print("input_ids:", ids[:30])
    print("attention_mask:", mask[:30])
    print("tokens:", tokenizer.convert_ids_to_tokens(ids[:30]))

inspect_tokenized(base_tokenized, base_prompts, "base", idx=99)
inspect_tokenized(context_tokenized, context_prompts, "context", idx=99)  # Fixed: was context_questions, now context_prompts

base example 99:
prompt: Where was the city originally located?
num tokens: 8
input_ids: [2840, 47, 8, 690, 5330, 1069, 58, 1]
attention_mask: [1, 1, 1, 1, 1, 1, 1, 1]
tokens: ['▁Where', '▁was', '▁the', '▁city', '▁originally', '▁located', '?', '</s>']
context example 99:
prompt: You must answer the question using ONLY the information provided in the context below.
If the answer cannot be determined from the context, respond with "Not answerable from the given context."
Do not use any external knowledge.

Context:
Founded in 1670 as Charles Town in honor of King Charles II of England, Charleston adopted its present name in 1783. It moved to its present location on Oyster Point in 1680 from a location on the west bank of the Ashley River known as Albemarle Point. By 1690, Charles Town was the fifth-largest city in North America, and it remained among the 10 largest cities in the United States through the 1840 census. With a 2010 census population of 120,083  (and a 2014 estimate of 130,1

In [48]:
import torch
import torch.nn.functional as F
import pandas as pd

def get_router_logits_for_prompt(prompt):
    encoding = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    input_ids = encoding["input_ids"].to(model.device)
    attention_mask = encoding["attention_mask"].to(model.device)

    classifier_outputs = []

    def hook(module, module_inputs, module_outputs):
        if isinstance(module_outputs, tuple):
            module_outputs = module_outputs[0]
        classifier_outputs.append(module_outputs)

    handle = router_classifier.register_forward_hook(hook)
    try:
        with torch.no_grad(), torch.autocast(model.device.type, dtype=model.dtype):
            _ = model.encoder(input_ids=input_ids, attention_mask=attention_mask)
    finally:
        # Always detach the hook, even if the forward pass raises
        handle.remove()

    return classifier_outputs[0].float().cpu(), input_ids[0].cpu().tolist()


def find_subsequence(haystack, needle):
    if not needle or len(needle) > len(haystack):
        return None
    for start in range(0, len(haystack) - len(needle) + 1):
        if haystack[start:start + len(needle)] == needle:
            return start
    return None


def question_logits_from_context(context_prompt, question_text):
    logits, full_ids = get_router_logits_for_prompt(context_prompt)
    question_ids = tokenizer(question_text, return_tensors="pt", add_special_tokens=False)["input_ids"][0].tolist()
    start = find_subsequence(full_ids, question_ids)
    if start is None:
        return None
    end = start + len(question_ids)
    return logits[start:end], question_ids


def mean_routing_distribution_for_questions(base_questions, context_prompts, context_questions):
    total_base_probs = None
    total_context_probs = None
    total_base_tokens = 0
    total_context_tokens = 0

    for base_q, context_prompt, context_q in zip(base_questions, context_prompts, context_questions):
        base_logits, _ = get_router_logits_for_prompt(base_q)
        base_probs = F.softmax(base_logits, dim=-1)
        total_base_probs = base_probs.sum(dim=0) if total_base_probs is None else total_base_probs + base_probs.sum(dim=0)
        total_base_tokens += base_probs.shape[0]

        context_result = question_logits_from_context(context_prompt, context_q)
        if context_result is None:
            continue
        context_logits, _ = context_result
        context_probs = F.softmax(context_logits, dim=-1)
        total_context_probs = context_probs.sum(dim=0) if total_context_probs is None else total_context_probs + context_probs.sum(dim=0)
        total_context_tokens += context_probs.shape[0]

    base_mean = total_base_probs / total_base_tokens
    context_mean = total_context_probs / total_context_tokens
    return base_mean, context_mean


base_mean, context_mean = mean_routing_distribution_for_questions(
    base_prompts, context_prompts, context_questions
)

df_mean = pd.DataFrame({
    "EXPERT": list(range(base_mean.numel())),
    "BASE_MEAN": base_mean.numpy(),
    "CONTEXT_MEAN": context_mean.numpy(),
})
df_mean["DELTA"] = df_mean["CONTEXT_MEAN"] - df_mean["BASE_MEAN"]
df_mean

Token indices sequence length is longer than the specified maximum sequence length for this model (567 > 512). Running this sequence through the model will result in indexing errors


| | EXPERT | BASE_MEAN | CONTEXT_MEAN | DELTA |
|---|---|---|---|---|
| 0 | 0 | 0.163211 | 0.16169 | -0.00152 |
| 1 | 1 | 0.057861 | 0.064267 | 0.006406 |
| 2 | 2 | 0.18153 | 0.178404 | -0.003126 |
| 3 | 3 | 0.133045 | 0.123711 | -0.009335 |
| 4 | 4 | 0.077007 | 0.080151 | 0.003144 |
| 5 | 5 | 0.110977 | 0.099015 | -0.011962 |
| 6 | 6 | 0.124609 | 0.131001 | 0.006393 |
| 7 | 7 | 0.15176 | 0.161761 | 0.01 |


In [49]:
# Sanity check: expert logits for identical question tokens under base vs context
sample_idx = 0  # compare the first prompt pair; the guard below handles empty lists
if base_prompts and context_prompts and context_questions:
    base_text = base_prompts[sample_idx]
    context_full = context_prompts[sample_idx]
    context_question = context_questions[sample_idx]

    context_result = question_logits_from_context(context_full, context_question)
    if context_result is None:
        print("No question slice match found; cannot compare experts.")
    else:
        base_logits, base_ids = get_router_logits_for_prompt(base_text)
        context_logits, question_ids = context_result

        token_text = tokenizer.convert_ids_to_tokens(question_ids)
        print("Tokens:", token_text)
        print("\nExpert logits for BASE prompt (all question tokens):")
        print(base_logits)
        print("\nExpert logits for CONTEXT prompt (same question tokens):")
        print(context_logits)

        base_top = base_logits.argmax(dim=-1)
        context_top = context_logits.argmax(dim=-1)
        top_changes = (base_top != context_top).sum().item()
        total_tokens = base_top.numel()
        print("\nTop expert changes across question tokens:", f"{top_changes}/{total_tokens}")
        print("Top expert change rate:", top_changes / max(1, total_tokens))

Tokens: ['▁How', '▁many', '▁years', '▁do', '▁mom', 'o', 'tre', 'mes', '▁and', '▁the', 'rian', '▁mammals', '▁go', '▁back', '?']

Expert logits for BASE prompt (all question tokens):
tensor([[-0.8008, -2.2500,  0.4746, -2.3438, -4.5000, -3.9844,  1.3828,  2.4531],
        [ 0.0684, -2.1094, -0.0962, -3.0781, -2.3594, -0.9336,  2.7344,  0.5352],
        [ 0.6797,  0.7070,  0.1914, -2.4375,  0.6797, -0.0466, -0.4258,  0.0864],
        [ 0.4258, -1.9375,  1.4141, -0.9141,  2.8750, -0.2334,  2.0312,  2.0156],
        [-1.5000, -0.7109,  1.1250, -0.9570, -0.3145,  2.7500,  0.3184,  1.2891],
        [ 4.8438, -0.8945,  1.9141, -1.7266, -1.5312,  1.0703, -0.4258, -1.1484],
        [ 4.1875, -1.8594,  2.0000, -0.9375, -1.7422,  2.5469,  0.4062, -0.4258],
        [ 3.9062, -3.7656,  0.3906, -0.9219, -2.6406, -0.9414, -1.6172, -1.9609],
        [ 0.0952, -1.0547,  3.4375, -1.8828, -2.6094,  0.3730,  0.8516,  1.2109],
        [ 1.0547, -0.1719,  1.1797, -1.9922, -2.5625,  2.4531,  1.0469,  0.49

## Observation

The single-example check shows that **adding context shifts the routing logits for the same question tokens**. The token IDs are identical, but the router in encoder block 1 reads hidden states that have already passed through self-attention, so each question token's representation (and therefore its expert preference) depends on the surrounding context.

To determine whether this is a **systematic effect** rather than noise, we now measure how often the top expert assignment changes across the question tokens of **all 100 prompts** when context is added.

In [50]:
# Measure top expert changes across all 100 questions
total_changes = 0
total_tokens = 0
mismatches = 0

for base_q, context_prompt, context_q in zip(base_prompts, context_prompts, context_questions):
    base_logits, _ = get_router_logits_for_prompt(base_q)
    context_result = question_logits_from_context(context_prompt, context_q)
    if context_result is None:
        mismatches += 1
        continue
    context_logits, _ = context_result
    base_top = base_logits.argmax(dim=-1)
    context_top = context_logits.argmax(dim=-1)
    total_changes += (base_top != context_top).sum().item()
    total_tokens += base_top.numel()

print(f"Total question tokens: {total_tokens}")
print(f"Top expert changes: {total_changes}")
print(f"Top expert change rate: {total_changes / max(1, total_tokens):.4f}")


Total question tokens: 1397
Top expert changes: 104
Top expert change rate: 0.0744


## Expert Activation Rates and Risk Difference

Compute the activation rate for each expert, defined as the proportion of tokens for which that expert is the top-1 (argmax) choice.

We compare:
- **question_base** (x^(1)): question-only prompts
- **question_context** (x^(2)): question tokens within full context prompts

The **Risk Difference** Δi = p_i^(1) - p_i^(2) quantifies how much more (Δi > 0) or less (Δi < 0) frequently expert i is activated in question_base than in question_context.

In [51]:
# Compute expert activation rates for question_base and question_context

num_experts = 8

# 1. question_base (x^(1)): question-only prompts
question_base_activations = torch.zeros(num_experts, dtype=torch.long)
question_base_total_tokens = 0

for base_prompt in base_prompts:
    logits, _ = get_router_logits_for_prompt(base_prompt)
    top_experts = torch.argmax(logits, dim=1)  # [num_tokens]
    
    for expert_idx in range(num_experts):
        question_base_activations[expert_idx] += (top_experts == expert_idx).sum().item()
    
    question_base_total_tokens += len(top_experts)

# 2. question_context (x^(2)): question tokens within full context prompts
question_context_activations = torch.zeros(num_experts, dtype=torch.long)
question_context_total_tokens = 0

for context_prompt, context_question in zip(context_prompts, context_questions):
    # Get question logits within the full context
    result = question_logits_from_context(context_prompt, context_question)
    if result is None:
        continue
    
    question_logits, _ = result
    
    # Top experts for question tokens only
    top_experts = torch.argmax(question_logits, dim=1)  # [num_question_tokens]
    
    for expert_idx in range(num_experts):
        question_context_activations[expert_idx] += (top_experts == expert_idx).sum().item()
    
    question_context_total_tokens += len(top_experts)

# Compute activation rates
question_base_rates = question_base_activations.float() / question_base_total_tokens
question_context_rates = question_context_activations.float() / question_context_total_tokens

# Compute Risk Difference
rd = question_base_rates - question_context_rates

# Create DataFrame for visualization
df_activation_rates = pd.DataFrame({
    'Expert': [f'Expert {i}' for i in range(num_experts)],
    'question_base (p^(1))': question_base_rates.numpy(),
    'question_context (p^(2))': question_context_rates.numpy(),
    'Risk Difference (Δi)': rd.numpy(),
    'question_base Count': question_base_activations.numpy(),
    'question_context Count': question_context_activations.numpy(),
})

print(f"question_base: N^(1) = {question_base_total_tokens} tokens")
print(f"question_context: N^(2) = {question_context_total_tokens} tokens")
print(f"\nActivation rates sum to 1.0: question_base={question_base_rates.sum():.4f}, question_context={question_context_rates.sum():.4f}\n")

df_activation_rates

question_base: N^(1) = 1397 tokens
question_context: N^(2) = 1397 tokens

Activation rates sum to 1.0: question_base=1.0000, question_context=1.0000



| | Expert | question_base (p^(1)) | question_context (p^(2)) | Risk Difference (Δi) | question_base Count | question_context Count |
|---|---|---|---|---|---|---|
| 0 | Expert 0 | 0.142448 | 0.141732 | 0.000716 | 199 | 198 |
| 1 | Expert 1 | 0.050107 | 0.052255 | -0.002147 | 70 | 73 |
| 2 | Expert 2 | 0.136722 | 0.132427 | 0.004295 | 191 | 185 |
| 3 | Expert 3 | 0.132427 | 0.127416 | 0.005011 | 185 | 178 |
| 4 | Expert 4 | 0.133858 | 0.143164 | -0.009306 | 187 | 200 |
| 5 | Expert 5 | 0.129563 | 0.120974 | 0.00859 | 181 | 169 |
| 6 | Expert 6 | 0.110952 | 0.114531 | -0.003579 | 155 | 160 |
| 7 | Expert 7 | 0.163923 | 0.167502 | -0.003579 | 229 | 234 |
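
As an arithmetic check on the table above, the activation rates and risk differences can be recomputed directly from the two count columns. This is a small standalone sketch in plain Python. The variable names are illustrative, and the counts are copied from the table.

```python
# Top-1 activation counts per expert, copied from the table above
base_counts = [199, 70, 191, 185, 187, 181, 155, 229]
context_counts = [198, 73, 185, 178, 200, 169, 160, 234]

n_base = sum(base_counts)        # total question tokens, question-only prompts
n_context = sum(context_counts)  # total question tokens inside context prompts

# Activation rate = share of tokens whose top-1 expert is expert i
base_rates = [c / n_base for c in base_counts]
context_rates = [c / n_context for c in context_counts]

# Risk difference: question_base rate minus question_context rate
risk_diff = [b - c for b, c in zip(base_rates, context_rates)]

for i, rd in enumerate(risk_diff):
    print(f"Expert {i}: RD = {rd:+.6f}")

# Expert with the largest absolute shift between the two conditions
largest = max(range(len(risk_diff)), key=lambda i: abs(risk_diff[i]))
print("Largest |RD| at expert", largest)
```

Because both conditions contain the same number of tokens (1397), each risk difference is just the count difference divided by 1397, and the differences sum to zero.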
