

**Reversed Analysis Prompt**

I want to build a table that presents an analysis of roles across models, with models as rows and roles as columns. Implement it in Python.

**Directory Structure Overview:**

- There is a main directory with subdirectories for each model.
- Within each model directory, there are subdirectories for each role.
- Inside every role directory, you will find eight (split) subdirectories. (At this level, ignore both the `generate_directions` and the `select_direction` directories if they appear outside the designated paths.)
- Use the following mapping to associate splits with roles:

```python
ROLE_DATASET_MAPPING = {
    "econ": ["economic researcher", "economist", "financial analyst"],
    "eecs": ["electronics technician", "data scientist", "electrical engineer", "software engineer", "web developer"],
    "law": ["bailiff", "lawyer"],
    "math": ["data analyst", "mathematician", "statistician"],
    "medicine": ["nurse", "doctor", "physician", "dentist", "surgeon"],
    "natural_science": ["geneticist", "biologist", "physicist", "teacher", "chemist", "ecologist"],
    "politics": ["politician", "sheriff", "governor", "enthusiast", "partisan"],
    "psychology": ["psychologist"]
}
```
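
Since the analysis starts from a role and needs its split, it can help to invert the mapping once into a role-to-split lookup. The following is only a minimal sketch: the name `ROLE_TO_SPLIT` is hypothetical, and just a subset of the mapping is repeated so the block runs on its own.

```python
# Subset of the mapping above, repeated so the example is self-contained.
ROLE_DATASET_MAPPING = {
    "law": ["bailiff", "lawyer"],
    "math": ["data analyst", "mathematician", "statistician"],
    "psychology": ["psychologist"],
}

# Hypothetical helper structure: invert split -> roles into role -> split.
ROLE_TO_SPLIT = {
    role: split
    for split, roles in ROLE_DATASET_MAPPING.items()
    for role in roles
}

print(ROLE_TO_SPLIT["lawyer"])         # law
print(ROLE_TO_SPLIT.get("astronaut"))  # None (role not in the mapping)
```

Using `.get` keeps unmapped roles from raising, so the caller can decide how to report them.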

**Analysis Steps (Reversed Order):**

1. **Identify Passed Directions from Test Results:**

   - For each model and role, navigate to the test directions at version 3.0:
     
     ```
     {models}\{model}\{role}\test_direction\3.0
     ```
     
   - In this directory, you will find multiple JSON files, each corresponding to a direction (named as `{layer}_{position}.json`).
   - Open each file and examine the JSON array. For every direction object, check if the `"passed"` field is `"Yes"`.
   - Collect all unique `(layer, position)` pairs from these files that have `"passed": "Yes"`. This constitutes your *passed directions set*.

2. **Retrieve Directions from Role’s Split Data:**

   - Now, for the same model/role, explore the split directories associated with that role.
   - For each applicable split (ignoring any unrelated ones), go into the following paths to retrieve the directions:
     
     - **First Source:**  
       ```
       {models}\{model}\{role}\{split}\1.0\select_direction\direction_evaluations_filtered.json
       ```
     
     - **Second Source:**  
       ```
       {models}\{model}\{role}\{split}\3.0\select_direction\direction_evaluations_filtered.json
       ```
     
   - From each `direction_evaluations_filtered.json` file, extract the set of unique `(layer, position)` pairs.

3. **Compare and Calculate the Percentage:**

   - For each model/role, determine how many directions from your *passed directions set* (from Step 1) are present in the union of the `(layer, position)` pairs from the select_direction files (from Step 2).
   - Calculate the percentage using the formula:
     
     \[
     \text{Percentage} = \frac{\text{Number of Passed Directions Found in Select\_Direction Files}}{\text{Total Number of Passed Directions}} \times 100
     \]
     
4. **Populate the Table:**

   - Create a table where the rows represent models and the columns represent roles.
   - In each cell (model/role), report the computed percentage from Step 3.

This reversed approach starts with collecting the directions that successfully passed the test and then checks for their presence in the role’s split (select_direction) data. The final table will show, for each model and role, what percentage of the passed directions are accounted for in the split data.
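
Steps 1 to 3 can be illustrated on in-memory toy data, without touching the file system. This is a sketch under stated assumptions: all values are made up, a direction is assumed to count as passed if any entry in its file has `"passed": "Yes"`, and the select files are assumed to hold objects with `layer` and `position` keys.

```python
import json

# Toy stand-ins for test_direction/3.0/{layer}_{position}.json (made-up values).
test_files = {
    "10_1.json": '[{"passed": "Yes"}, {"passed": "No"}]',
    "12_3.json": '[{"passed": "No"}]',
    "14_1.json": '[{"passed": "Yes"}]',
}
# Toy stand-ins for the 1.0 and 3.0 direction_evaluations_filtered.json files.
select_files = [
    '[{"layer": 10, "position": 1}, {"layer": 11, "position": 0}]',
    '[{"layer": 11, "position": 0}]',
]

# Step 1: collect (layer, position) pairs whose file contains a passing entry.
passed = set()
for name, text in test_files.items():
    layer, position = name[: -len(".json")].split("_")
    if any(entry.get("passed") == "Yes" for entry in json.loads(text)):
        passed.add((int(layer), int(position)))

# Step 2: union of (layer, position) pairs over both select_direction files.
selected = set()
for text in select_files:
    for entry in json.loads(text):
        selected.add((int(entry["layer"]), int(entry["position"])))

# Step 3: share of passed directions that also appear in the select data.
percentage = 100 * len(passed & selected) / len(passed)
print(sorted(passed), sorted(selected), percentage)
```

Here two directions pass, one of them appears in the select data, and the resulting percentage is 50.0.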

In [None]:
import os
import json
import glob
import pandas as pd

# Mapping from splits to associated roles (note: "governor" has been removed)
ROLE_DATASET_MAPPING = {
    "econ": ["economic researcher", "economist", "financial analyst"],
    "eecs": ["electronics technician", "data scientist", "electrical engineer", "software engineer", "web developer"],
    "law": ["bailiff", "lawyer"],
    "math": ["data analyst", "mathematician", "statistician"],
    "medicine": ["nurse", "doctor", "physician", "dentist", "surgeon"],
    "natural_science": ["geneticist", "biologist", "physicist", "teacher", "chemist", "ecologist"],
    "politics": ["politician", "sheriff", "enthusiast", "partisan"],
    "psychology": ["psychologist"]
}

def debug_print(message):
    """Simple debug print function."""
    print(f"DEBUG: {message}")

def evaluate_direction(role_dir, direction):
    """
    Given a role directory and a (layer, position) tuple, check the test result.
    The test file is expected at:
       {role_dir}/test_direction/3.0/{layer}_{position}.json
    Returns True if any entry in that file has "passed" == "Yes" (case insensitive),
    otherwise returns False.
    """
    layer, position = direction
    filename = f"{layer}_{position}.json"
    test_file = os.path.join(role_dir, "test_direction", "3.0", filename)
    debug_print(f"Evaluating direction {direction} using test file: {test_file}")
    
    if not os.path.isfile(test_file):
        debug_print(f"Test file not found: {test_file}")
        return False

    try:
        with open(test_file, "r") as f:
            results = json.load(f)
            for res in results:
                passed_val = str(res.get("passed", "")).strip().lower()
                debug_print(f"In file {test_file}, found passed value: '{passed_val}'")
                if passed_val == "yes":
                    return True
    except Exception as e:
        debug_print(f"Error reading {test_file}: {e}")
    return False

def get_split_for_role(role_name):
    """
    Given a role name, return the corresponding split key from ROLE_DATASET_MAPPING.
    """
    for split, roles in ROLE_DATASET_MAPPING.items():
        if role_name in roles:
            return split
    debug_print(f"No split mapping found for role '{role_name}'.")
    return None

def get_passed_directions(model_path, role):
    """
    For a given model and role, use evaluate_direction to check each test file in:
       {model_path}/{role}/test_direction/3.0
    Each file is named {layer}_{position}.json. Convert these values to integers and,
    if evaluate_direction returns True, add the (layer, position) tuple to the set.
    Returns a set of unique (layer, position) tuples that passed.
    """
    passed_directions = set()
    role_dir = os.path.join(model_path, role)
    test_dir = os.path.join(role_dir, "test_direction", "3.0")
    if not os.path.isdir(test_dir):
        debug_print(f"Test direction directory does not exist: {test_dir}")
        return passed_directions  # Return empty set if the directory does not exist

    json_files = glob.glob(os.path.join(test_dir, "*.json"))
    if not json_files:
        debug_print(f"No JSON files found in test direction directory: {test_dir}")
    
    for json_file in json_files:
        base = os.path.basename(json_file)
        if not base.endswith(".json"):
            continue
        # Remove the '.json' extension and split into layer and position.
        name = base[:-5]
        parts = name.split("_")
        if len(parts) != 2:
            debug_print(f"Filename {base} does not match expected format 'layer_position.json'")
            continue
        try:
            # Convert layer and position to integers
            layer = int(parts[0])
            position = int(parts[1])
        except Exception as e:
            debug_print(f"Error converting {parts} to int: {e}")
            continue
        direction = (layer, position)
        if evaluate_direction(role_dir, direction):
            passed_directions.add(direction)
    debug_print(f"For model '{os.path.basename(model_path)}', role '{role}', found {len(passed_directions)} passed directions.")
    return passed_directions

def get_select_directions(model_path, role, split):
    """
    For a given model, role, and the applicable split, retrieve the union of directions
    from both versions of the select_direction file:
       {model_path}/{role}/{split}/1.0/select_direction/direction_evaluations_filtered.json
       {model_path}/{role}/{split}/3.0/select_direction/direction_evaluations_filtered.json
    Returns a set of unique (layer, position) tuples.
    """
    select_directions = set()
    for version in ["1.0", "3.0"]:
        file_path = os.path.join(model_path, role, split, version, "select_direction", "direction_evaluations_filtered.json")
        if not os.path.isfile(file_path):
            debug_print(f"File does not exist: {file_path}")
            continue
        try:
            with open(file_path, "r") as f:
                directions = json.load(f)
            for direction in directions:
                layer = direction.get("layer")
                position = direction.get("position")
                if layer is not None and position is not None:
                    try:
                        layer = int(layer)
                        position = int(position)
                    except Exception as e:
                        debug_print(f"Error converting select_direction values {layer}, {position} to int: {e}")
                        continue
                    select_directions.add((layer, position))
        except Exception as e:
            debug_print(f"Error processing {file_path}: {e}")
    debug_print(f"For model '{os.path.basename(model_path)}', role '{role}', split '{split}', found {len(select_directions)} select directions.")
    return select_directions

def compute_percentage(passed_set, select_set):
    """
    Given the passed directions set and the select_direction set,
    calculate the percentage of passed directions that are found in the select data.
    If the passed_set is empty, returns 0.
    """
    if not passed_set:
        debug_print("No passed directions to compute percentage. Setting percentage to 0.")
        return 0  # Set percentage to 0 when no passed directions exist.
    debug_print(f"Passed directions: {passed_set}")
    debug_print(f"Select directions: {select_set}")
    common = passed_set.intersection(select_set)
    percentage = (len(common) / len(passed_set)) * 100
    debug_print(f"Found {len(common)} common directions out of {len(passed_set)} passed directions.")
    return percentage

def analyze_models(base_dir):
    """
    Iterates through each model and role, computes the percentage of passed directions
    (from test results) that are found in the select_direction files.
    Builds and returns a pandas DataFrame with roles as rows and models as columns.
    """
    # List all model directories
    models = [d for d in os.listdir(base_dir) if os.path.isdir(os.path.join(base_dir, d))]
    if not models:
        debug_print(f"No models found in base directory: {base_dir}")
    
    # Build a sorted list of all roles from ROLE_DATASET_MAPPING
    roles = set()
    for role_list in ROLE_DATASET_MAPPING.values():
        roles.update(role_list)
    roles = sorted(roles)

    # Dictionary to hold the results: {model: {role: percentage}}
    results = {}
    for model in models:
        model_path = os.path.join(base_dir, model)
        debug_print(f"Processing model: {model}")
        results[model] = {}
        for role in roles:
            role_dir = os.path.join(model_path, role)
            if not os.path.isdir(role_dir):
                debug_print(f"Role directory does not exist for model '{model}': {role_dir}")
                results[model][role] = 0
                continue

            # Get passed directions using evaluate_direction
            passed_directions = get_passed_directions(model_path, role)
            if not passed_directions:
                debug_print(f"No passed directions found for model '{model}', role '{role}'.")

            # Determine the applicable split for this role using the mapping
            split = get_split_for_role(role)
            if not split:
                results[model][role] = 0
                continue

            # Retrieve directions from the role's split select_direction files
            select_directions = get_select_directions(model_path, role, split)
            if not select_directions:
                debug_print(f"No select directions found for model '{model}', role '{role}', split '{split}'.")

            # Compute the percentage match
            pct = compute_percentage(passed_directions, select_directions)
            results[model][role] = pct

    # Build DataFrame with models as rows, then transpose it so roles are rows and models are columns.
    df = pd.DataFrame.from_dict(results, orient="index")
    df = df.transpose()  # Now rows are roles and columns are models.
    return df

if __name__ == "__main__":
    # Set the base directory containing your model subdirectories
    base_dir = r"C:\Users\user\Desktop\temp\rolevectors_results"  
    df = analyze_models(base_dir)

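    # Format the percentages for display. Series.map is applied column by column
    # because DataFrame.applymap is deprecated in recent pandas releases.
    df_formatted = df.apply(
        lambda col: col.map(lambda x: f"{x:.2f}%" if isinstance(x, (int, float)) else "N/A")
    )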
    
    print("\nFinal Percentage Table:")
    print(df_formatted)
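
One design choice in the code above is that a cell is 0 both when none of the passed directions appear in the select data and when there are no passed directions at all. If those two cases should be distinguishable, one possible alternative is to return NaN for the undefined case and render it as "N/A". The sketch below is not part of the analysis above; the function name `coverage_or_nan` and the toy values are hypothetical.

```python
import math
import pandas as pd

def coverage_or_nan(passed_set, select_set):
    """Percentage of passed directions found in select_set; NaN if undefined."""
    if not passed_set:
        return math.nan  # no passed directions, so the ratio has no denominator
    return 100 * len(passed_set & select_set) / len(passed_set)

# Toy table: roles as rows, one model column (made-up values).
toy = pd.DataFrame({
    "model_a": {
        "lawyer": coverage_or_nan({(1, 0), (2, 0)}, {(1, 0)}),
        "nurse": coverage_or_nan(set(), {(3, 1)}),
    },
})

# Render NaN as "N/A" and everything else as a percentage string.
formatted = toy.apply(
    lambda col: col.map(lambda x: "N/A" if pd.isna(x) else f"{x:.2f}%")
)
print(formatted)
```

Note that pandas skips NaN by default in `sum`, so row or column totals would then ignore undefined cells rather than count them as zero.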


In [None]:
df.to_html()

In [None]:
# Sum across columns (axis=1) to get one total per role (row), then sort
role_totals = df.sum(axis=1).sort_values()

print(role_totals)
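
As a reminder of the axis convention for a table with roles as rows and models as columns, here is a toy sketch (the values and the names `model_a` and `model_b` are made up):

```python
import pandas as pd

# Toy table with roles as rows and models as columns.
toy = pd.DataFrame(
    {"model_a": [100.0, 50.0], "model_b": [0.0, 25.0]},
    index=["lawyer", "nurse"],
)

per_role = toy.sum(axis=1)   # one total per row (role)
per_model = toy.sum(axis=0)  # one total per column (model)

print(per_role.to_dict())    # {'lawyer': 100.0, 'nurse': 75.0}
print(per_model.to_dict())   # {'model_a': 150.0, 'model_b': 25.0}
```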


In [None]:
# Sum down the columns (axis=0) to get the total presence per model
model_totals = df.sum(axis=0)

# Sort these sums in ascending order
sorted_model_totals = model_totals.sort_values()

# Reorder the columns in the DataFrame to match this sorted order
df_sorted_models = df[sorted_model_totals.index]

df_sorted_models


In [None]:
df = df_sorted_models

In [None]:
import pandas as pd
import numpy as np

def latex_blue_gradient(val, vmin=0, vmax=100):
    """
    Convert a numerical value into a LaTeX string that applies a blue background.
    The intensity of the blue (using xcolor's syntax) is determined by the value's
    position between vmin and vmax.
    """
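    try:
        numeric_val = float(val)
    except (TypeError, ValueError):
        numeric_val = 0.0
    # Treat NaN like a missing value so the int() conversion below cannot fail
    if np.isnan(numeric_val):
        numeric_val = 0.0

    # Normalize the value between vmin and vmax (guarding against vmax == vmin)
    span = vmax - vmin
    norm_val = (numeric_val - vmin) / span if span else 0.0
    norm_val = np.clip(norm_val, 0, 1)

    # Map normalized value to an intensity percentage (0 to 50)
    intensity = int(norm_val * 50)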
    
    # Create the cell color command. If intensity is 0, no color command is added.
    color_cmd = f"\\cellcolor{{blue!{intensity}}}" if intensity > 0 else ""
    
    # Return the LaTeX string with the background color and the value (formatted to 2 decimals)
    return f"{color_cmd} $ {numeric_val:.2f} $"

def style_df_to_latex(df, vmin=0, vmax=100):
    """
    Apply the blue gradient styling to every cell in the DataFrame,
    converting each cell to a LaTeX string that includes a background color.
    """
    # Create a copy to avoid modifying the original DataFrame
    df_styled = df.copy()
    
    # Apply our custom LaTeX styling function cell-by-cell
    for col in df.columns:
        df_styled[col] = df[col].apply(lambda x: latex_blue_gradient(x, vmin, vmax))
    
    return df_styled

# --- Usage Example ---

# Suppose your DataFrame is called df.
# (You might have created it or loaded it from somewhere.)

# Instead of using .style.background_gradient for HTML output,
# we create a new DataFrame with LaTeX-formatted strings.
df_latex_styled = style_df_to_latex(df, vmin=0, vmax=100)

# Convert the styled DataFrame to LaTeX code.
# Note: escape=False is needed so that our LaTeX commands are not escaped.
latex_code = df_latex_styled.to_latex(escape=False)
print(latex_code)
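
To compile the printed table, the LaTeX preamble needs a package that provides `\cellcolor` (loading `xcolor` with the `table` option pulls in `colortbl`), and `booktabs` for the rules that `to_latex` emits. The sketch below wraps a table string in a minimal document. The helper name `wrap_in_document` and the toy `table_body` are hypothetical stand-ins, not output of the code above.

```python
# Hypothetical stand-in for the `latex_code` string produced above.
table_body = (
    "\\begin{tabular}{lr}\n"
    "role & model\\_a \\\\\n"
    "lawyer & \\cellcolor{blue!25} $ 50.00 $ \\\\\n"
    "\\end{tabular}"
)

def wrap_in_document(body):
    """Wrap a tabular body in a minimal compilable LaTeX document."""
    return "\n".join([
        "\\documentclass{article}",
        "\\usepackage[table]{xcolor}  % loads colortbl, which provides \\cellcolor",
        "\\usepackage{booktabs}       % \\toprule, \\midrule, \\bottomrule",
        "\\begin{document}",
        body,
        "\\end{document}",
    ])

print(wrap_in_document(table_body))
```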