# Automatic Validation of Hold-Out Test Sets with ROC-AUC Metric
This notebook builds on probe.ipynb and adds an automatic validation metric for example.txt and unrelated.txt.
For the purpose of the interview, I have added markdown sections with additional description and reasoning.   
For the automatic validation, Sentence Transformers (SBERT) provides an external 'benchmark' value for semantic textual similarity.   
Link to SBERT: https://sbert.net/index.html   
Link to explanation of semantic textual similarity: https://sbert.net/docs/sentence_transformer/usage/semantic_textual_similarity.html   

### Why was SBERT chosen?
SBERT was chosen over other semantic similarity calculators (such as Wu-Palmer similarity) because it can compare not just words, but phrases and sentences. This is vital for testing the probe: a string of tokens can be passed in as a single input, which allows validation of conceptual similarity rather than only word similarity. 


In [1]:
# Install required libraries if not already installed (this currently isn't in requirements.txt)
# For installation: !pip install sentence_transformers

# Import libraries needed
import json
import numpy as np
import torch
from IPython.display import HTML, display
import html
import os
import nltk
from transformers import AutoTokenizer
from transformer_lens import HookedTransformer
from sentence_transformers import SentenceTransformer, util
from sklearn.metrics import roc_auc_score

# Download necessary NLTK data (for tokenization, if needed)
nltk.download('punkt')

# Load model once globally
def load_model(model_name="gemma-2-2b"):
    """Load the model and return it for reuse"""
    print(f"Loading model: {model_name}")
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = HookedTransformer.from_pretrained(model_name, device=device)
    return model, model.tokenizer, device

global_model, global_tokenizer, global_device = load_model()

# Define separators for grouping of tokens later
SEPARATORS = {'.', ',', '),', ').'}

def is_group_separator(token_strs, i):
    """
    Returns True if token_strs[i] is considered a separator for grouping.
    Tokens to be used as separators are: '.', ',', ').', '),'
    Note: '.' is not a separator if it is between digits - to avoid separating numbers.
    """
    token = token_strs[i].strip()
    if token not in SEPARATORS:
        return False
    # Check if '.' is between numeric tokens (e.g., "26.8")
    if token == '.':
        if i > 0 and i < len(token_strs) - 1:
            prev_tok = token_strs[i - 1].strip()
            next_tok = token_strs[i + 1].strip()
            if prev_tok.isdigit() and next_tok.isdigit():
                return False
    return True

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Loading model: gemma-2-2b


Loaded pretrained model gemma-2-2b into HookedTransformer


## Separation of Tokens into Groups
### Why did I choose to separate into groups?
Originally I considered scoring each token separately; however, single tokens cannot convey concepts as effectively as phrases or sentences. A comparison between a single token and the concept therefore seemed unfair and ineffective for validation. I then considered using whole sentences, but they seemed too long and not precise enough.    
Finally, I had the idea of separating by full stops or commas, as this creates phrases that often express a single concept. Furthermore, I chose . and , because the activation most often occurs on them, due to the probe being trained on the final token.    
I wanted each group to contain only one . or , token, so I chose to use them as separators.  

### How does the grouping work? 
Tokens are grouped with the . or , that occurs after them (not before), because the activation represents the concept in the tokens before it (the phase shift we talked about).    
For example: "He has high blood sugar, as he eats a lot of fast food." would be split into "He has high blood sugar," and "as he eats a lot of fast food."   
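
The split described above can be sketched in a few lines. This is a simplified illustration only: `group_tokens` is a hypothetical helper, the token list is hand-written rather than produced by the Gemma tokenizer, and only `.` and `,` are treated as separators here.

```python
# Hypothetical helper for illustration; the notebook's own grouping lives in
# compute_activation_validation_score and match_tokens_to_groups.
SIMPLE_SEPARATORS = {".", ","}

def group_tokens(tokens):
    """Attach each run of tokens to the separator that follows it."""
    groups, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok.strip() in SIMPLE_SEPARATORS:
            groups.append("".join(current).strip())
            current = []
    if current:  # trailing tokens with no closing separator
        groups.append("".join(current).strip())
    return groups

# Hand-written tokens (assumed, not real tokenizer output)
tokens = ["He", " has", " high", " blood", " sugar", ",",
          " as", " he", " eats", " a", " lot", " of", " fast", " food", "."]
groups = group_tokens(tokens)
print(groups)
```

Each group ends with exactly one separator, which is the token where the probe activation is expected to appear.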

### Troubleshooting   
After I trialled this method, I saw that many of the phrases were not being split the way I imagined. After analysing the groups created and the tokens from the example text, I realised that this was because tokens such as ). and ), weren't being counted as separators. So, I added them to the separator list.

### Exceptions
I realised that numbers with decimals were being split into two groups due to the . token, so I added an exception for this case.
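
As a quick check of the two fixes above, the sketch below applies the same rule as `is_group_separator` to a hand-written token list. The tokenization shown (digits split one by one, `).` as a single token) is an assumption for illustration and may differ from what the real tokenizer produces.

```python
SEPARATORS = {".", ",", "),", ")."}

def is_separator(tokens, i):
    """Same rule as the notebook: a '.' between two digit tokens is not a separator."""
    tok = tokens[i].strip()
    if tok not in SEPARATORS:
        return False
    if tok == "." and 0 < i < len(tokens) - 1:
        if tokens[i - 1].strip().isdigit() and tokens[i + 1].strip().isdigit():
            return False
    return True

# Assumed tokenization of "BMI of 28.7 (overweight)."
tokens = [" BMI", " of", " ", "2", "8", ".", "7", " (", "overweight", ")."]
separator_positions = [i for i in range(len(tokens)) if is_separator(tokens, i)]
print(separator_positions)  # only the final ")." closes the group
```

The decimal point at position 5 is skipped, so "28.7" stays inside one group, while the closing `).` ends it.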

In [3]:
# For processing examples
def process_example(model, tokenizer, hook_name, probe, text, device):
    """Process a single example and return tokens and their activation scores."""
    # Tokenize the text
    tokens = tokenizer.encode(text, return_tensors="pt").to(device)

    # Get token strings
    token_strs = [tokenizer.decode(t).replace('▁', ' ') for t in tokens[0]]
    
    # Run model with cache to extract activations at the specified hook
    with torch.no_grad():
        _, cache = model.run_with_cache(tokens, names_filter=[hook_name])
        activations = cache[hook_name]
    
    # Apply the probe to each token position
    scores = []
    for pos in range(activations.shape[1]):
        # Get activations for this position
        pos_activations = activations[0, pos].cpu().numpy().reshape(1, -1)

        # Apply the probe to get probability of positive class
        score = probe.predict_proba(pos_activations)[0, 1] 
        scores.append(float(score))
    
    return token_strs, scores

# Load model for semantic similarity
semantic_model = SentenceTransformer('all-MiniLM-L6-v2')

def compute_semantic_similarity(phrase, concept_text):
    """
    Compute cosine similarity between 'phrase' and 'concept_text' using Sentence-BERT.
    Returns a float in the range [-1, 1].
    """
    emb_phrase = semantic_model.encode(phrase, convert_to_tensor=True)
    emb_concept = semantic_model.encode(concept_text, convert_to_tensor=True)
    # Calculate cosine similarity and return as a float
    return util.cos_sim(emb_phrase, emb_concept).item()

# Calculate activation validation score by summing each token's contribution.
def compute_activation_validation_score(token_strs, scores, concept_text):
    """
    Sums the activation-validation scores for the whole text.
    Each token's contribution is: activation * abs(similarity of its group).
    The function groups tokens by separators and then sums the contributions.
    """
    total = 0.0
    current_group_tokens = []
    current_group_scores = []
    # Loop through tokens and accumulate group data
    for i, (token, score) in enumerate(zip(token_strs, scores)):
        current_group_tokens.append(token)
        current_group_scores.append(score)
        # when a separator is reached, finalise the group
        if is_group_separator(token_strs, i):
            group_string = "".join(current_group_tokens).strip()
            similarity = compute_semantic_similarity(group_string, concept_text)
            total += sum(sc * abs(similarity) for sc in current_group_scores)
            # Reset group lists for the next group
            current_group_tokens = []
            current_group_scores = []
    # Process any remaining tokens not ending with a separator
    if current_group_tokens:
        group_string = "".join(current_group_tokens).strip()
        similarity = compute_semantic_similarity(group_string, concept_text)
        total += sum(sc * abs(similarity) for sc in current_group_scores)
    return total

# Group tokens and return token-level details for printing.
def match_tokens_to_groups(token_strs, scores, concept_text):
    """
    Groups tokens based on defined separators and computes activation validation scores.
    Explicitly tracks individual tokens and returns a list of dictionaries.
    Each dictionary includes token index, token text, group text, activation,
    semantic similarity (original value), and activation_validation (activation * abs(similarity)).
    """
    rows = []
    current_group_tokens = []
    current_group_indices = []
    current_group_scores = []
    for i, (token, score) in enumerate(zip(token_strs, scores)):
        current_group_tokens.append(token)
        current_group_scores.append(score)
        current_group_indices.append(i)
        # Check if current token is a group separator
        if is_group_separator(token_strs, i):
            group_string = "".join(current_group_tokens).strip()
            similarity = compute_semantic_similarity(group_string, concept_text)
            # Loop over tokens in the group and compute their contributions
            for idx, tok, sc in zip(current_group_indices, current_group_tokens, current_group_scores):
                act_val = sc * abs(similarity)
                rows.append({
                    "index": idx,
                    "token": tok,
                    "group": group_string,
                    "activation": sc,
                    "concept": concept_text,
                    "similarity": similarity,
                    "activation_validation": act_val
                })
            # Reset group lists
            current_group_tokens = []
            current_group_scores = []
            current_group_indices = []
    # If tokens remain in the last group, process them
    if current_group_tokens:
        group_string = "".join(current_group_tokens).strip()
        similarity = compute_semantic_similarity(group_string, concept_text)
        for idx, tok, sc in zip(current_group_indices, current_group_tokens, current_group_scores):
            act_val = sc * abs(similarity)
            rows.append({
                "index": idx,
                "token": tok,
                "group": group_string,
                "activation": sc,
                "concept": concept_text,
                "similarity": similarity,
                "activation_validation": act_val
            })
    return rows

# Print a text block table of token-level contributions.
def print_token_contributions_table(rows, label="Example Text"):
    """
    Print a fixed-width table of tokens with non-zero activation.
    Columns: Index, Token, Group, Activation, Similarity, ActVal.
    Note: Similarity is printed as original (may be negative) while activation is multiplied by its absolute value.
    """
    threshold = 0.001  # Only include tokens with activation above this threshold
    filtered = [r for r in rows if abs(r["activation"]) > threshold]
    
    if not filtered:
        print(f"\nNo non-zero activations found for {label}.")
        return
    
    # Define fixed column widths for consistent formatting
    header_format = "{:<5} | {:<15} | {:<75} | {:>12} | {:>12} | {:>12}"
    line_width = 5 + 3 + 15 + 3 + 75 + 3 + 12 + 3 + 12 + 3 + 12
    header = header_format.format("Idx", "Token", "Group", "Activation", "Similarity", "ActVal")
    separator = "-" * line_width
    table_lines = [f"\n--- Token-Level Contributions ({label}) ---", header, separator]
    
    for r in filtered:
        idx_str = str(r["index"])
        token_str = r["token"].replace("\n", " ")  # replace newlines with spaces so each row stays on one line
        group_str = r["group"].replace("\n", " ")
        activation_str = f"{r['activation']:.4f}"
        similarity_str = f"{r['similarity']:.4f}"
        av_str = f"{r['activation_validation']:.4f}"
        line = header_format.format(
            idx_str,
            token_str[:15],
            group_str[:75], # truncates if longer than this
            activation_str,
            similarity_str,
            av_str
        )
        table_lines.append(line)
    
    print("\n".join(table_lines))

## Activation Validation
### What is the semantic similarity calculated for?   
The semantic similarity is calculated between the concept and the group of tokens.    
### How is the activation validation calculated?   
An activation validation score is calculated for each token by:    
activation validation = activation of token × absolute semantic similarity of the group the token is in   
I chose to use the group because, in the vast majority of examples, the activation occurs on the . or , at the end of the phrase.     
However, this activation represents the concept of the group of tokens just before the . or ,.   
Note: I know that for a token in the middle of a group, only the tokens before it should be counted; this is an area for improvement that I did not address due to time constraints.   
The activation validation scores of all tokens are then summed across the text to produce the overall score for the text.    
As an indication of overall performance, the difference between the activation validation of example.txt and unrelated.txt is also given as a result.
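
A small worked example of the calculation above, using made-up numbers. The (activation, similarity) pairs are hypothetical and are not taken from the notebook's outputs.

```python
def activation_validation(pairs):
    """Sum activation * |group similarity| over all tokens of a text."""
    return sum(act * abs(sim) for act, sim in pairs)

# Hypothetical (activation, similarity of the token's group) pairs
example_pairs = [(0.9, 0.2), (0.5, -0.4), (0.0, 0.8)]
unrelated_pairs = [(0.1, 0.1)]

example_score = activation_validation(example_pairs)      # 0.18 + 0.20 + 0.00
unrelated_score = activation_validation(unrelated_pairs)  # 0.01
difference = example_score - unrelated_score

print(f"{example_score:.2f} {unrelated_score:.2f} {difference:.2f}")
```

The third token has a high similarity but zero activation, so it contributes nothing to the score.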
### Why use absolute semantic similarity? 
Absolute semantic similarity is used because semantic similarity ranges from -1 to 1. 
A score of 0 means the phrases are completely unrelated and 1 means they are perfectly aligned in meaning, whereas -1 means they are opposite in meaning.   
I believe that in this context, the negative scores still show conceptual links. This is because in a medical context, an opposite meaning may give just as much information regarding a concept as a positive one.    
For example, the phrase: "John never drinks alcohol." and the concept "high_total_cholesterol" may have negative semantic similarity, but it would make sense for the probe to activate.   
Originally I didn't realise this, but after checking through each of the phrases, I realised that the negative semantic similarity scores often did indicate meaningful relations, and so I decided to use an absolute measure instead.   
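
To make the sign convention concrete, here is a minimal sketch of cosine similarity on toy 2-D vectors. The vectors stand in for SBERT embeddings purely for illustration; real embeddings have many more dimensions and rarely reach exactly -1 or 1.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

concept = [1.0, 0.0]      # toy stand-in for the concept embedding
aligned = [2.0, 0.0]      # same direction
opposed = [-3.0, 0.0]     # opposite direction
orthogonal = [0.0, 5.0]   # unrelated direction

for name, vec in [("aligned", aligned), ("opposed", opposed), ("orthogonal", orthogonal)]:
    sim = cosine_similarity(concept, vec)
    print(name, sim, abs(sim))
```

Taking the absolute value treats the aligned and opposed vectors as equally related to the concept, while the orthogonal one stays at 0.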
### Why use multiplication for the metric? 
I debated between using the multiplication of semantic similarity with activation vs using their absolute difference. In the end, I chose multiplication due to several reasons:   
- Tokens with higher activations are "rewarded", with the amount depending on how high the semantic similarity score is
- Tokens with an activation of 0 get a validation score of 0, which keeps the focus on activated tokens and reduces calculation complexity

Note: a future avenue to explore would be trying other ways to calculate this metric; there is probably a method that is more meaningful for validation. 
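
The two options can be compared on a few hypothetical (activation, similarity) pairs. The function names below are illustrative, and the absolute-difference form shown is only one possible reading of that alternative.

```python
def product_score(activation, similarity):
    """Metric used in this notebook: activation * |similarity|."""
    return activation * abs(similarity)

def abs_difference(activation, similarity):
    """Alternative that was considered: |activation - |similarity||."""
    return abs(activation - abs(similarity))

# Hypothetical (activation, similarity) pairs
cases = {
    "inactive token, related group": (0.0, 0.3),
    "active token, related group": (1.0, 0.9),
    "active token, unrelated group": (1.0, 0.0),
}
for name, (act, sim) in cases.items():
    print(f"{name}: product={product_score(act, sim):.2f}, "
          f"abs diff={abs_difference(act, sim):.2f}")
```

With the product, an inactive token always contributes 0, whereas the absolute difference gives it a non-zero value that would need separate handling before summing.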
### Additional future considerations
Right now, certain groups of tokens are often counted more than once because several of their tokens are activated (e.g. \n, space, .). These tokens do have different activations (which is why I chose to keep them), but a future direction would be to check whether it is better to count each group only once (as it represents the same concept).
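
One possible way to count each group once is sketched below. It is an untested idea rather than part of the current metric: `score_once_per_group` is a hypothetical function, it keeps only the largest contribution per group, and it assumes group strings are unique within a text. The rows mimic the dictionaries returned by `match_tokens_to_groups`, with made-up values.

```python
def score_once_per_group(rows):
    """Keep only the largest activation * |similarity| for each group, then sum."""
    best = {}
    for r in rows:
        contribution = r["activation"] * abs(r["similarity"])
        best[r["group"]] = max(best.get(r["group"], 0.0), contribution)
    return sum(best.values())

# Hypothetical rows: two activated tokens in the first group, one in the second
rows = [
    {"group": "first phrase,", "activation": 0.9, "similarity": 0.2},
    {"group": "first phrase,", "activation": 0.5, "similarity": 0.2},
    {"group": "second phrase.", "activation": 1.0, "similarity": -0.3},
]

per_token_total = sum(r["activation"] * abs(r["similarity"]) for r in rows)
per_group_total = score_once_per_group(rows)
print(f"{per_token_total:.2f} {per_group_total:.2f}")
```

Taking the maximum is only one choice; the mean or the separator token's activation alone would be equally plausible candidates to compare.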

In [7]:
# Compute ROC AUC using token-level activation scores (not activation validation)
def compute_token_level_roc_auc_new(example_rows, unrelated_rows, label_threshold=0.5):
    """
    Compute ROC AUC for token-level activation scores (raw probe outputs).
    
    Labeling Method:
    - Find the first token in example text where activation exceeds label_threshold.
    - Tokens before this token are labeled as 0 (negative).
    - Tokens at or after this token are labeled as 1 (positive).
    - All tokens from the unrelated text are labeled as 0.
    
    Uses raw activations to measure the probe’s effectiveness at detecting the concept.
    """
    # Sort example rows by token index
    example_sorted = sorted(example_rows, key=lambda r: r["index"])
    
    # Find the first token (mark token) where activation exceeds the threshold
    mark_index = None
    for r in example_sorted:
        if r["activation"] > label_threshold:
            mark_index = r["index"]
            break
    if mark_index is None:
        print("No token in example text reaches the threshold; cannot compute ROC AUC.")
        return None
    
    # Label example tokens: tokens with index >= mark_index get label 1; those before get label 0.
    y_example = [1 if r["index"] >= mark_index else 0 for r in example_sorted]
    scores_example = [r["activation"] for r in example_sorted] 
    
    # For unrelated text, label all tokens as 0.
    unrelated_sorted = sorted(unrelated_rows, key=lambda r: r["index"])
    y_unrelated = [0] * len(unrelated_sorted)
    scores_unrelated = [r["activation"] for r in unrelated_sorted] 
    
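    # Combine labels and scores from both texts
    y_true = y_example + y_unrelated
    y_scores = scores_example + scores_unrelated

    # roc_auc_score raises ValueError when y_true contains a single class
    if len(set(y_true)) < 2:
        print("Only one class present in labels; cannot compute ROC AUC.")
        return None

    return roc_auc_score(y_true, y_scores)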

# Validate example and unrelated text, then print overall and token-level details.
def validate_example_unrelated(example_path, unrelated_path, concept_key, concept_string=None, layer=22):
    """
    1) Load the probe for the concept.
    2) Compute token-level activations for example and unrelated texts.
    3) Sum activation validation (using abs semantic similarity) and print overall difference.
    4) Print a detailed table of non-zero tokens for both texts.
    5) Compute and print token-level ROC AUC
    """
    if concept_string is None:
        concept_string = concept_key.replace("_", " ")
    
    # Set up probe file paths
    probe_dir = os.path.join("probes", concept_key)
    joblib_path = os.path.join(probe_dir, "probe.joblib")
    pkl_path = os.path.join(probe_dir, "probe.pkl")
    config_path = os.path.join(probe_dir, "config.json")
    
    # Load the probe model (check joblib first, then pickle)
    if os.path.exists(joblib_path):
        import joblib
        probe = joblib.load(joblib_path)
        print(f"Loaded probe from {joblib_path}")
    elif os.path.exists(pkl_path):
        import pickle
        with open(pkl_path, 'rb') as f:
            probe = pickle.load(f)
        print(f"Loaded probe from {pkl_path}")
    else:
        print(f"Probe not found at {joblib_path} or {pkl_path}")
        return
    
    # Load configuration if available and update concept string if needed
    if os.path.exists(config_path):
        with open(config_path, "r") as f:
            config = json.load(f)
        concept_string = config.get("concept", concept_string)
    
    # Define the hook name for extraction
    hook_name = f"blocks.{layer}.hook_resid_post"
    
    # Process Example Text
    with open(example_path, "r", encoding="utf-8") as f:
        example_text = f.read().strip()
    example_tokens, example_scores = process_example(global_model, global_tokenizer, hook_name, probe, example_text, global_device)
    example_activation_validation = compute_activation_validation_score(example_tokens, example_scores, concept_string)
    
    # Process Unrelated Text
    with open(unrelated_path, "r", encoding="utf-8") as f:
        unrelated_text = f.read().strip()
    unrelated_tokens, unrelated_scores = process_example(global_model, global_tokenizer, hook_name, probe, unrelated_text, global_device)
    unrelated_activation_validation = compute_activation_validation_score(unrelated_tokens, unrelated_scores, concept_string)
    
    overall_difference = example_activation_validation - unrelated_activation_validation
    
    # Print summary results
    print("\n=== Activation Validation Results ===")
    print(f"Concept Key: {concept_key}")
    print(f"Concept String: '{concept_string}'")
    print(f"Example Text Score:   {example_activation_validation:.4f}")
    print(f"Unrelated Text Score: {unrelated_activation_validation:.4f}")
    print(f"Overall Difference:   {overall_difference:.4f}")
    
    # Get token-level details for printing
    example_rows = match_tokens_to_groups(example_tokens, example_scores, concept_string)
    print_token_contributions_table(example_rows, label="Example Text")
    
    unrelated_rows = match_tokens_to_groups(unrelated_tokens, unrelated_scores, concept_string)
    print_token_contributions_table(unrelated_rows, label="Unrelated Text")
    
    # Compute and print ROC AUC
    roc_auc = compute_token_level_roc_auc_new(example_rows, unrelated_rows, label_threshold=0.5)
    # Can change the label threshold
    if roc_auc is not None:
        print(f"\nToken-level ROC AUC: {roc_auc:.4f}")

## ROC-AUC Metric Explanation
In this notebook, each token is treated as a sample and its raw activation is the score. All tokens in the unrelated text are labelled 0. For the tokens in the example text, we use a different method: we first find the token where the activation first exceeds a configurable threshold (default 0.5). Tokens before that token get label 0, and tokens at or after it get label 1. 

The token's activation serves as the classifier's score. The idea is that positive tokens (those at or after the first activated token in the example text) should score high, while negative tokens (the earlier example tokens and all unrelated tokens) should score low. 

By comparing the distribution of these scores for tokens labelled 1 versus those labelled 0 across various thresholds, the ROC curve is generated. The AUC quantifies how well these two groups are separated. A higher AUC means that the raw activations are effective at separating the positive tokens from the negative ones.   

Note: I considered using the activation validation as the classifier's score. However, because it includes similarity, the evaluation would then depend on semantic similarity rather than on the probe's effectiveness alone.
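
The labelling and the AUC itself can be reproduced on a toy example without scikit-learn. The sketch below uses the pairwise definition of AUC (the probability that a randomly chosen positive token outscores a randomly chosen negative one, with ties counted as 0.5), which is equivalent to the area under the ROC curve. The activations are made up, and both helper names are hypothetical.

```python
def label_example_tokens(activations, threshold=0.5):
    """Label 0 before the first activation above threshold, 1 from there on."""
    for i, act in enumerate(activations):
        if act > threshold:
            return [0] * i + [1] * (len(activations) - i)
    return None  # no token crosses the threshold

def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up activations for illustration
example_acts = [0.01, 0.02, 0.90, 0.30, 0.95]
unrelated_acts = [0.00, 0.05, 0.40]

y_example = label_example_tokens(example_acts)
labels = y_example + [0] * len(unrelated_acts)
scores = example_acts + unrelated_acts
auc = pairwise_auc(labels, scores)
print(y_example, round(auc, 4))
```

Here the fourth example token is labelled positive but has a low activation (0.30), so it ranks below one unrelated token (0.40) and pulls the AUC below 1. This shows how the "everything after the first activation is positive" rule penalises a probe that only fires on separators.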

In [8]:
# List available concepts from JSON file
def list_available_concepts(json_file_path):
    """
    List all available concepts in the JSON file.
    
    Parameters:
    -----------
    json_file_path : str
        Path to the JSON file containing the concepts
    
    Returns:
    --------
    List of available concepts
    """
    with open(json_file_path, 'r') as file:
        data = json.load(file)
    concepts = data['concepts']
    return [concept.replace(" ", "_") for concept in concepts]

# Visualisation Function
def visualize_concept_on_text(text, concept_key, model=global_model, tokenizer=global_tokenizer, layer=22):
    """
    Create an HTML visualization of token-level activations for a concept on user input text.
    
    Args:
        text: The text to analyze
        concept_key: The concept key (e.g., 'elevated_LDL_cholesterol')
        model: Pre-loaded model (uses global model by default)
        tokenizer: Pre-loaded tokenizer
        layer: Layer to extract representations from
    """
    # Load the probe
    probe_dir = os.path.join("probes", concept_key)
    joblib_path = os.path.join(probe_dir, "probe.joblib")
    pkl_path = os.path.join(probe_dir, "probe.pkl")
    config_path = os.path.join(probe_dir, "config.json")
    
    # Check for both joblib and pkl files
    if os.path.exists(joblib_path):
        import joblib
        probe = joblib.load(joblib_path)
        print(f"Loaded probe from {joblib_path}")
    elif os.path.exists(pkl_path):
        import pickle
        with open(pkl_path, 'rb') as f:
            probe = pickle.load(f)
        print(f"Loaded probe from {pkl_path}")
    else:
        print(f"Probe not found at {joblib_path} or {pkl_path}")
        return None
    
    # Load the config to get the concept name
    if not os.path.exists(config_path):
        print(f"Config not found at {config_path}")
        return None
    
    with open(config_path, "r") as f:
        config = json.load(f)
    
    concept = config.get("concept", concept_key.replace("_", " "))
    
    # Hook name for the residual stream at the specified layer
    hook_name = f"blocks.{layer}.hook_resid_post"
    
    # Process user input text
    tokens, scores = process_example(model, tokenizer, hook_name, probe, text, global_device)
    
    # Create HTML output
    html_output = f"<h2>Activation visualization for concept: '{concept}'</h2>"
    html_output += "<div style='line-height: 2.5; font-family: monospace; font-size: 14px;'>"
    
    for i, (token, score) in enumerate(zip(tokens, scores)):
        # Escape HTML special characters
        escaped_token = html.escape(token)

        # Calculate color intensity based on activation
        green_intensity = 255
        other_intensity = int(255 * (1 - score))
        color = f"rgb({other_intensity}, {green_intensity}, {other_intensity})"

        # Create token span
        html_output += f"""<span title='Token: "{escaped_token}"
Position: #{i}
Activation: {score:.4f}' style='background-color: {color}; padding: 3px; border-radius: 3px; margin: 1px;'>{escaped_token}</span>"""
    
    html_output += "</div>"
    # Add color scale
    html_output += """
    <div style='margin-top: 20px;'>
        <h3>Color Scale</h3>
        <div style='display: flex; width: 500px;'>
            <span style='background-color: rgb(255, 255, 255); width: 100px; padding: 10px; text-align: center;'>0.0</span>
            <span style='background-color: rgb(192, 255, 192); width: 100px; padding: 10px; text-align: center;'>0.25</span>
            <span style='background-color: rgb(128, 255, 128); width: 100px; padding: 10px; text-align: center;'>0.5</span>
            <span style='background-color: rgb(64, 255, 64); width: 100px; padding: 10px; text-align: center;'>0.75</span>
            <span style='background-color: rgb(0, 255, 0); width: 100px; padding: 10px; text-align: center;'>1.0</span>
        </div>
    </div>
    """
    return HTML(html_output)

# Main usage example
# Load example and unrelated text
with open("inputs/example.txt", "r", encoding="utf-8") as f:
    example_text = f.read()

with open("inputs/unrelated.txt", "r", encoding="utf-8") as f:
    unrelated_text = f.read()

# Load concepts and process each
concepts = list_available_concepts("inputs/concepts_copy.json")
for concept in concepts:
    print("\n============================================")
    print(f"Concept: {concept}")
    
    print("\n--- Visualizing Example Text ---")
    display(visualize_concept_on_text(example_text, concept))
    
    print("\n--- Visualizing Unrelated Text ---")
    display(visualize_concept_on_text(unrelated_text, concept))
    
    print("\n--- Validation Results ---")
    validate_example_unrelated("inputs/example.txt", "inputs/unrelated.txt", concept)



Concept: heavy_alcohol_use

--- Visualizing Example Text ---
Loaded probe from probes/heavy_alcohol_use/probe.joblib



--- Visualizing Unrelated Text ---
Loaded probe from probes/heavy_alcohol_use/probe.joblib



--- Validation Results ---
Loaded probe from probes/heavy_alcohol_use/probe.joblib

=== Activation Validation Results ===
Concept Key: heavy_alcohol_use
Concept String: 'heavy alcohol use'
Example Text Score:   1.2375
Unrelated Text Score: 0.0000
Overall Difference:   1.2375

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
126   | .               | or depressive symptoms.                                                     |       0.0219 |       0.1948 |       0.0043
164   | ).              | though BP remains mildly elevated (148/92 mmHg).                            |       0.9126 |       0.2193 |       0.2002
247   | .               | with a BMI of 28.7 (overweight).                           


--- Visualizing Unrelated Text ---
Loaded probe from probes/elevated_LDL_cholesterol/probe.joblib



--- Validation Results ---
Loaded probe from probes/elevated_LDL_cholesterol/probe.joblib

=== Activation Validation Results ===
Concept Key: elevated_LDL_cholesterol
Concept String: 'elevated LDL cholesterol'
Example Text Score:   4.3889
Unrelated Text Score: 0.0261
Overall Difference:   4.3628

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
150   | ,               | currently managed with Amlodipine 5 mg daily,                               |       1.0000 |       0.2025 |       0.2025
164   | ).              | though BP remains mildly elevated (148/92 mmHg).                            |       0.9615 |       0.3680 |       0.3538
252   | ).              | with a BMI of 28.7 (overweight).      


--- Visualizing Unrelated Text ---
Loaded probe from probes/low_HDL_cholesterol/probe.joblib



--- Validation Results ---
Loaded probe from probes/low_HDL_cholesterol/probe.joblib

=== Activation Validation Results ===
Concept Key: low_HDL_cholesterol
Concept String: 'low HDL cholesterol'
Example Text Score:   3.4613
Unrelated Text Score: 0.0000
Overall Difference:   3.4613

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
138   | ),              | His medical history includes hypertension (diagnosed three years ago),      |       0.9994 |       0.1850 |       0.1849
150   | ,               | currently managed with Amlodipine 5 mg daily,                               |       1.0000 |       0.2207 |       0.2207
164   | ).              | though BP remains mildly elevated (148/92 mmHg).     


--- Visualizing Unrelated Text ---
Loaded probe from probes/high_total_cholesterol/probe.joblib



--- Validation Results ---
Loaded probe from probes/high_total_cholesterol/probe.joblib

=== Activation Validation Results ===
Concept Key: high_total_cholesterol
Concept String: 'high total cholesterol'
Example Text Score:   2.7351
Unrelated Text Score: 0.0000
Overall Difference:   2.7351

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
146   |                 | currently managed with Amlodipine 5 mg daily,                               |       1.0000 |       0.2449 |       0.2449
252   | ).              | with a BMI of 28.7 (overweight).                                            |       0.1907 |       0.2253 |       0.0430
278   |                 | Laboratory investigations reveal dyslipidemi


--- Visualizing Unrelated Text ---
Loaded probe from probes/not_previously_on_statin/probe.joblib



--- Validation Results ---
Loaded probe from probes/not_previously_on_statin/probe.joblib

=== Activation Validation Results ===
Concept Key: not_previously_on_statin
Concept String: 'not previously on statin'
Example Text Score:   2.3935
Unrelated Text Score: 0.3465
Overall Difference:   2.0470

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
126   | .               | or depressive symptoms.                                                     |       0.0092 |       0.0823 |       0.0008
209   | .               | increasing his cardiovascular risk.                                         |       1.0000 |       0.1932 |       0.1932
252   | ).              | with a BMI of 28.7 (overweight).      


--- Visualizing Unrelated Text ---
Loaded probe from probes/dyslipidemia/probe.joblib



--- Validation Results ---
Loaded probe from probes/dyslipidemia/probe.joblib

=== Activation Validation Results ===
Concept Key: dyslipidemia
Concept String: 'dyslipidemia'
Example Text Score:   22.7931
Unrelated Text Score: 0.0234
Overall Difference:   22.7697

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
138   | ),              | His medical history includes hypertension (diagnosed three years ago),      |       0.0099 |       0.2271 |       0.0023
150   | ,               | currently managed with Amlodipine 5 mg daily,                               |       0.9853 |       0.1383 |       0.1363
164   | ).              | though BP remains mildly elevated (148/92 mmHg).                        


--- Visualizing Unrelated Text ---
Loaded probe from probes/atorvastatin/probe.joblib



--- Validation Results ---
Loaded probe from probes/atorvastatin/probe.joblib

=== Activation Validation Results ===
Concept Key: atorvastatin
Concept String: 'atorvastatin'
Example Text Score:   14.0566
Unrelated Text Score: 0.0000
Overall Difference:   14.0566

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
132   |  (              | His medical history includes hypertension (diagnosed three years ago),      |       0.9996 |       0.2831 |       0.2830
141   |  with           | currently managed with Amlodipine 5 mg daily,                               |       1.0000 |       0.3080 |       0.3080
146   |                 | currently managed with Amlodipine 5 mg daily,                           


--- Visualizing Unrelated Text ---
Loaded probe from probes/acute_liver_disease/probe.joblib



--- Validation Results ---
Loaded probe from probes/acute_liver_disease/probe.joblib

=== Activation Validation Results ===
Concept Key: acute_liver_disease
Concept String: 'acute liver disease'
Example Text Score:   1.6488
Unrelated Text Score: 0.5657
Overall Difference:   1.0831

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
330   |  Liver          | Liver and renal function tests are normal,                                  |       0.0212 |       0.3864 |       0.0082
410   |  persistent     | persistent liver enzyme elevation,                                          |       0.0018 |       0.5169 |       0.0009
411   |  liver          | persistent liver enzyme elevation,                   


--- Visualizing Unrelated Text ---
Loaded probe from probes/elevated_liver_enzymes/probe.joblib



--- Validation Results ---
Loaded probe from probes/elevated_liver_enzymes/probe.joblib

=== Activation Validation Results ===
Concept Key: elevated_liver_enzymes
Concept String: 'elevated liver enzymes'
Example Text Score:   1.8290
Unrelated Text Score: 0.2242
Overall Difference:   1.6048

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
21    |  with           | presents for a routine check-up with occasional headaches and fatigue persi |       0.3549 |       0.0939 |       0.0333
31    | .               | presents for a routine check-up with occasional headaches and fatigue persi |       0.9992 |       0.0939 |       0.0939
74    | .               | worsening by end of the workday but relieved


--- Visualizing Unrelated Text ---
Loaded probe from probes/pregnancy/probe.joblib



--- Validation Results ---
Loaded probe from probes/pregnancy/probe.joblib

=== Activation Validation Results ===
Concept Key: pregnancy
Concept String: 'pregnancy'
Example Text Score:   0.1080
Unrelated Text Score: 0.0000
Overall Difference:   0.1080

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
150   | ,               | currently managed with Amlodipine 5 mg daily,                               |       0.0021 |       0.0033 |       0.0000
164   | ).              | though BP remains mildly elevated (148/92 mmHg).                            |       0.7130 |       0.0201 |       0.0143
209   | .               | increasing his cardiovascular risk.                                         |      


--- Visualizing Unrelated Text ---
Loaded probe from probes/renal_impairment/probe.joblib



--- Validation Results ---
Loaded probe from probes/renal_impairment/probe.joblib

=== Activation Validation Results ===
Concept Key: renal_impairment
Concept String: 'renal impairment'
Example Text Score:   0.5390
Unrelated Text Score: 0.3408
Overall Difference:   0.1982

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
126   | .               | or depressive symptoms.                                                     |       0.0418 |       0.2018 |       0.0084
146   |                 | currently managed with Amlodipine 5 mg daily,                               |       1.0000 |       0.1342 |       0.1342
164   | ).              | though BP remains mildly elevated (148/92 mmHg).              


--- Visualizing Unrelated Text ---
Loaded probe from probes/hypothyroidism/probe.joblib



--- Validation Results ---
Loaded probe from probes/hypothyroidism/probe.joblib

=== Activation Validation Results ===
Concept Key: hypothyroidism
Concept String: 'hypothyroidism'
Example Text Score:   0.8957
Unrelated Text Score: 0.0000
Overall Difference:   0.8957

--- Token-Level Contributions (Example Text) ---
Idx   | Token           | Group                                                                       |   Activation |   Similarity |       ActVal
--------------------------------------------------------------------------------------------------------------------------------------------------
164   | ).              | though BP remains mildly elevated (148/92 mmHg).                            |       1.0000 |       0.1693 |       0.1693
252   | ).              | with a BMI of 28.7 (overweight).                                            |       0.2201 |       0.0590 |       0.0130
271   |  reveal         | Laboratory investigations reveal dyslipidemia: total cholesterol 230

## How to use the activation validation score
First, although this is an automated validation method, some human inspection is still required. As a rule of thumb, I would say the example text score should exceed the unrelated text score by at least 0.5 to show that the probe responds differently to the two texts. However, the appropriate threshold varies with the size of the example text score itself and with how closely the example text relates to the concept.    
It is also beneficial to look at the ratio of the example and unrelated text scores, and at the individual token activations.
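
The rule of thumb above can be sketched as a small helper. This is only an illustration: `compare_scores` and its `min_difference` parameter are hypothetical names that are not part of the notebook, and the input values are copied from the validation output printed above.

```python
import math

def compare_scores(example_score, unrelated_score, min_difference=0.5):
    """Summarise how well a probe separates the example text from the unrelated text.

    Hypothetical helper: the 0.5 default mirrors the rule of thumb in the text,
    it is not a value taken from the notebook's own functions.
    """
    difference = example_score - unrelated_score
    if unrelated_score > 0:
        ratio = example_score / unrelated_score
    else:
        # An unrelated score of 0 gives an unbounded ratio (or an undefined one for 0/0).
        ratio = math.inf if example_score > 0 else math.nan
    return {
        "difference": difference,
        "ratio": ratio,
        "passes": difference >= min_difference,
    }

# (example text score, unrelated text score) pairs from the output above
reported = {
    "dyslipidemia": (22.7931, 0.0234),
    "acute_liver_disease": (1.6488, 0.5657),
    "renal_impairment": (0.5390, 0.3408),
    "pregnancy": (0.1080, 0.0000),
}

results = {name: compare_scores(*scores) for name, scores in reported.items()}
for name, summary in results.items():
    print(f"{name}: diff={summary['difference']:.4f}, "
          f"ratio={summary['ratio']:.2f}, passes={summary['passes']}")
```

The `pregnancy` case shows why the difference and the ratio should be read together: the ratio is unbounded because the unrelated score is zero, yet the absolute difference is far below the 0.5 threshold.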

## ROC-AUC Validation
For the ROC-AUC validation, a score is computed for each concept below, and the concepts are then ranked from highest to lowest.
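
For intuition, here is a minimal sketch of one way a token-level ROC-AUC could be derived from activation and similarity values like those in the tables above. It is an assumption, not the notebook's `compute_token_level_roc_auc_new`: it assumes a token is labelled positive when its SBERT similarity reaches `label_threshold`, and that the probe activation is the score being ranked. The function name and the sample rows are hypothetical. The AUC is computed by pairwise comparison so that the sketch has no dependencies; `sklearn.metrics.roc_auc_score` should return the same value for the same labels and scores.

```python
def token_level_roc_auc_sketch(rows, label_threshold=0.5):
    """Return the ROC-AUC of activations against similarity-derived labels.

    rows: iterable of (activation, similarity) pairs (hypothetical layout).
    Returns None when only one class is present, since AUC is undefined then.
    """
    positives = [act for act, sim in rows if sim >= label_threshold]
    negatives = [act for act, sim in rows if sim < label_threshold]
    if not positives or not negatives:
        return None

    # AUC equals the probability that a random positive token receives a higher
    # activation than a random negative token, with ties counted as half.
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up (activation, similarity) pairs, for illustration only
sample_rows = [(0.9, 0.8), (0.7, 0.6), (0.8, 0.2), (0.1, 0.1), (0.05, 0.3)]
sample_auc = token_level_roc_auc_sketch(sample_rows)
print(f"ROC AUC: {sample_auc:.4f}")
```

Under this labelling assumption, a concept whose tokens all fall below the similarity threshold has no positive class and therefore no score, which would match the `if roc_auc is not None` check in the cell below.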


In [1]:
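# NOTE: this cell depends on helpers and variables defined in earlier cells
# (list_available_concepts, match_tokens_to_groups, compute_token_level_roc_auc_new,
# example_tokens, example_scores, unrelated_tokens, unrelated_scores).
# Run those cells first; in a fresh kernel this cell raises a NameError.

# Dictionary mapping each concept to its ROC AUC score
concept_auc_scores = {}

# Load concepts and compute ROC AUC for each
concepts = list_available_concepts("inputs/concepts_copy.json")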

for concept in concepts:
    print("\n============================================")
    print(f"Concept: {concept}")
    
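    # Group tokens for both texts, then compute the token-level ROC AUC.
    # example_scores and unrelated_scores are expected to hold the activations
    # for the current concept's probe.
    example_rows = match_tokens_to_groups(example_tokens, example_scores, concept)
    unrelated_rows = match_tokens_to_groups(unrelated_tokens, unrelated_scores, concept)
    
    roc_auc = compute_token_level_roc_auc_new(example_rows, unrelated_rows, label_threshold=0.5)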
    
    if roc_auc is not None:
        concept_auc_scores[concept] = roc_auc
        print(f"ROC AUC Score: {roc_auc:.4f}")

# Sort and print the concepts by ROC AUC in descending order
sorted_concepts = sorted(concept_auc_scores.items(), key=lambda x: x[1], reverse=True)

print("\n=== Concepts Ranked by ROC AUC (Highest to Lowest) ===")
for concept, auc in sorted_concepts:
    print(f"{concept}: {auc:.4f}")


NameError: name 'list_available_concepts' is not defined