# Notebook: Prediction Model Evaluation and Performance Metrics

## Introduction
This Jupyter Notebook is designed to evaluate the performance of the violence dynamics prediction model. Its primary purpose is to compare the predictions generated by the 03_Predictions_victims.ipynb notebook (Intensity and Escalation) with the actual observed values (derived from the 02_Intensity_Escalation_victims.ipynb notebook). The objective is to assess the accuracy and reliability of these predictions at various geographical levels (Country, Department, and Region) and, crucially, broken down by armed actor type.

## Evaluation Methodology
For each geographical unit (e.g., Colombia, Antioquia, Pacífico), each violence type (VS, VI, VC), and each armed actor type (State Actors, Non-State Actors, Unknown Actor Type), the notebook performs the following steps:

1. Data Loading
Predicted values for 'Intensity' and 'Escalation' are loaded from the _victims_predictions.tsv files, and the actual (true) values from the _victims_metrics.tsv files. The 'Previous State' is an input to the prediction logic; it is not evaluated as a predicted output.

2. Data Alignment and Merging
The predicted and actual datasets are merged on their common identifiers: 'Año' (year), 'Mes' (month), and 'violence type'. The merge is an inner join, so comparisons cover only the months and violence types present in both datasets.
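
A minimal sketch of this alignment step, assuming pandas and toy data. The key column names follow those used in the notebook's code cells; the values are invented for illustration and are not results from the model.

```python
import pandas as pd

# Toy actual and predicted tables (illustrative values only).
actual = pd.DataFrame({
    "Año": [2020, 2020, 2020],
    "Mes": [1, 2, 3],
    "violence type": ["VS", "VS", "VS"],
    "Intensity": [1, 0, -1],
})
predicted = pd.DataFrame({
    "Año": [2020, 2020],
    "Mes": [2, 3],
    "violence type": ["VS", "VS"],
    "Predicted_Intensity": [0, 1],
})

keys = ["Año", "Mes", "violence type"]
# An inner join keeps only the months present in both tables,
# so month 1 (no prediction available) is dropped.
merged = pd.merge(actual, predicted, on=keys, how="inner")
print(merged)
```

Each remaining row pairs one true value with one predicted value, which is the shape the metric functions expect.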

3. Calculation of Evaluation Metrics
For each prediction target ('Intensity' and 'Escalation'), the following metrics are calculated:

- Confusion Matrix: A table summarizing the performance of a classification algorithm. It shows the number of correct and incorrect predictions, broken down by each class (-1, 0, 1).
- Classification Report: A detailed report that includes precision, recall, F1-score, and support for each class, as well as macro and weighted averages. This allows for an in-depth assessment of the model's performance for each category of change (decrease, no change, increase).
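
To make the two metrics concrete, here is a small pure-Python sketch of how a three-class confusion matrix and per-class precision and recall can be computed. It is an illustration only: the notebook itself relies on scikit-learn, and the helper names `confusion` and `precision_recall` are hypothetical. The layout (rows are true classes, columns are predicted classes) matches scikit-learn's convention, and a zero denominator is reported as 0.0.

```python
LABELS = [-1, 0, 1]  # decrease, no change, increase

def confusion(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes, in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def precision_recall(matrix, i):
    """Precision and recall for class position i; 0.0 when the denominator is zero."""
    tp = matrix[i][i]
    predicted_i = sum(row[i] for row in matrix)  # column total
    actual_i = sum(matrix[i])                    # row total (support)
    precision = tp / predicted_i if predicted_i else 0.0
    recall = tp / actual_i if actual_i else 0.0
    return precision, recall

# Illustrative labels, not real model output.
y_true = [1, 0, 0, -1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]

cm = confusion(y_true, y_pred)
print(cm)
print(precision_recall(cm, LABELS.index(1)))
```

In this toy case the class -1 is never predicted, so its precision has a zero denominator and is reported as 0.0 rather than raising an error.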

4. Report Generation and Saving
Evaluation results are generated and saved into plain text files (.txt) following an organized directory structure:

- Individual Reports: Separate reports are created for each individual geographical unit (Country, each Department, each Region) within specific subfolders for each armed actor type. This allows for a detailed analysis of the model's performance in each case.
- Aggregated/Global Reports: For the Department and Region levels, data from all individual units within the level are consolidated to generate a single aggregated evaluation report per actor type. This provides an overview of the model's performance on a larger scale. The Country level contains a single unit, so its individual report also serves as its global view.
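
One way to read an aggregated report, assuming it pools the aligned (actual, predicted) pairs of every unit in a level: the pooled counts equal the sum of the per-unit counts. The sketch below uses hypothetical unit names and invented pairs.

```python
from collections import Counter

# Hypothetical aligned (actual, predicted) pairs for two units.
pairs_by_unit = {
    "unit_a": [(1, 1), (0, 0), (0, 1)],
    "unit_b": [(-1, 0), (1, 1)],
}

# Per-unit counts of each (actual, predicted) combination.
per_unit = {unit: Counter(pairs) for unit, pairs in pairs_by_unit.items()}

# Pooling all pairs and counting is equivalent to summing the per-unit counts.
pooled = Counter(pair for pairs in pairs_by_unit.values() for pair in pairs)
summed = sum(per_unit.values(), Counter())

print(pooled == summed)
print(pooled[(1, 1)])
```

A consequence of pooling is that units with more observations weigh more heavily in the aggregated metrics.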

## Output Structure
The evaluation results will be saved in the following folder structure:

Results/
└── evaluation/
    └── victims/
        ├── country/
        │   ├── state_actor/
        │   │   └── colombia_evaluation.txt
        │   ├── non_state_actor/
        │   │   └── colombia_evaluation.txt
        │   └── unknown_actor_type/
        │       └── colombia_evaluation.txt
        ├── department/
        │   ├── state_actor/
        │   │   ├── antioquia_evaluation.txt
        │   │   ├── bogota_evaluation.txt
        │   │   └── department_global_evaluation.txt (aggregated)
        │   ├── non_state_actor/
        │   │   ├── antioquia_evaluation.txt
        │   │   ├── bogota_evaluation.txt
        │   │   └── department_global_evaluation.txt (aggregated)
        │   └── unknown_actor_type/
        │       ├── antioquia_evaluation.txt
        │       ├── bogota_evaluation.txt
        │       └── department_global_evaluation.txt (aggregated)
        └── region/
            ├── state_actor/
            │   ├── pacifica_evaluation.txt
            │   ├── caribe_evaluation.txt
            │   └── region_global_evaluation.txt (aggregated)
            ├── non_state_actor/
            │   ├── pacifica_evaluation.txt
            │   ├── caribe_evaluation.txt
            │   └── region_global_evaluation.txt (aggregated)
            └── unknown_actor_type/
                ├── pacifica_evaluation.txt
                ├── caribe_evaluation.txt
                └── region_global_evaluation.txt (aggregated)

This comprehensive evaluation approach allows for a detailed understanding of the model's robustness and applicability across diverse geographical conditions and among different types of actors responsible for violence.
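
As a sketch of how report paths in this layout can be assembled, the hypothetical helper `report_path` below follows the file-naming used in the notebook's code cells (lowercased unit name with spaces removed for individual reports, `{level}_global_evaluation.txt` for aggregated ones). The helper itself is not part of the notebook.

```python
import os

def report_path(base_dir, level, actor_type, unit_name=None):
    """Path of an individual report, or of the aggregated one when unit_name is None."""
    if unit_name is None:
        filename = f"{level}_global_evaluation.txt"
    else:
        filename = f"{unit_name.lower().replace(' ', '')}_evaluation.txt"
    return os.path.join(base_dir, level, actor_type, filename)

base = os.path.join("Results", "evaluation", "victims")
print(report_path(base, "department", "state_actor", "antioquia"))
print(report_path(base, "region", "non_state_actor"))
```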

### 1. Initial Setup, Library Imports, and Path Configuration
This block performs the initial setup for the evaluation notebook. It imports the Python libraries needed for data handling, file system operations, and classification metrics. It defines the relative paths of the input directories holding the predicted data (from the prediction notebook) and the actual data (from the metrics calculation notebook), each containing nested actor-type subfolders, and it sets up the output directory where the evaluation results are saved with the same nested structure.

In [5]:
# 1. Initial Setup, Library Imports, and Path Configuration

import pandas as pd
import os
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
import sys # Added for stdout redirection in Cell 4 and Cell 3
import io # Added for output buffering in Cell 3 and Cell 4

# Define the base input directory for predicted data (output from the prediction notebook)
# This directory now contains subfolders like 'country/state_actors', 'department/non_state_actors', etc.
predicted_data_base_dir = os.path.join(os.getcwd(), '..', 'Results', 'predictions', 'victims')

# Define the base input directory for actual/true data (output from the metrics calculation notebook)
# This directory also contains subfolders like 'country/state_actors', 'department/non_state_actors', etc.
actual_data_base_dir = os.path.join(os.getcwd(), '..', 'Results', 'intensity & escalation', 'victims')

# Define the base output directory for evaluation results
# Evaluation results will be saved following a similar nested structure.
evaluation_results_base_dir = os.path.join(os.getcwd(), '..', 'Results', 'evaluation', 'victims')

# List of actor types subfolders to iterate through.
ACTOR_TYPES_SUBFOLDERS = ['state_actor', 'non_state_actor', 'unknown_actor_type']

# Define possible labels for classification reports to ensure all classes are shown consistently.
# These should cover all possible values for Intensity and Escalation.
intensity_escalation_labels = [-1, 0, 1]
# Note: 'Previous State' is no longer a prediction target, so its specific labels are not strictly needed
# for the output reports, but can be kept for consistency if evaluating the true Previous State distribution.
# state_labels = sorted(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'Start']) # No longer needed for prediction eval.

print("Initial setup, library imports, and path configuration complete for evaluation.")
print(f"Predicted data will be read from: {predicted_data_base_dir}")
print(f"Actual data will be read from: {actual_data_base_dir}")
print(f"Evaluation results will be saved in: {evaluation_results_base_dir}")
print(f"Processing will be segmented by Actor Types: {ACTOR_TYPES_SUBFOLDERS}")

# Create the top-level evaluation results directory if it doesn't exist.
# Subdirectories for levels and actor types will be created later.
os.makedirs(evaluation_results_base_dir, exist_ok=True)


Initial setup, library imports, and path configuration complete for evaluation.
Predicted data will be read from: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/predictions/victims
Actual data will be read from: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/intensity & escalation/victims
Evaluation results will be saved in: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/victims
Processing will be segmented by Actor Types: ['state_actor', 'non_state_actor', 'unknown_actor_type']


### 2. Define Evaluation Logic Functions
This block defines the core functions required for evaluating the model's predictions. These functions facilitate the comparison between actual and predicted values and the generation of performance metrics.

align_and_merge_data(df_actual, df_predicted): A helper function to merge the actual and predicted DataFrames based on common temporal and type identifiers (Año, Mes, violence type). This ensures that each prediction is correctly matched with its corresponding true value.

evaluate_predictions(y_true, y_pred, target_name, labels=None): This function takes the true and predicted labels for a specific target variable (e.g., 'Intensity' or 'Escalation'). It calculates and prints the confusion matrix and a detailed classification report, showing the model's performance for each class.

In [6]:
# 2. Define Evaluation Logic Functions

print("\n--- Defining Evaluation Logic Functions ---")

def align_and_merge_data(df_actual, df_predicted):
    """
    Merges actual and predicted DataFrames on common identifiers to prepare for evaluation.

    Args:
        df_actual (pd.DataFrame): DataFrame containing actual values (from _victims_metrics.tsv).
                                  Expected columns: 'Año', 'Mes', 'violence type',
                                  'Previous State', 'Intensity', 'Escalation'.
        df_predicted (pd.DataFrame): DataFrame containing predicted values (from _victims_predictions.tsv).
                                   Expected columns: 'Año', 'Mes', 'violence type',
                                   'Predicted_State', 'Predicted_Intensity', 'Predicted_Escalation'.

    Returns:
        pd.DataFrame: A merged DataFrame with actual and predicted values aligned,
                      or an empty DataFrame if merging fails or results in no common data.
    """
    # Define common columns for merging
    merge_cols = ['Año', 'Mes', 'violence type']

    # Perform an inner merge to keep only rows present in both DataFrames
    # This ensures we only compare predictions where we have corresponding actual values.
    merged_df = pd.merge(df_actual, df_predicted, on=merge_cols, how='inner', suffixes=('_actual', '_predicted'))

    if merged_df.empty:
        print("Warning: Merging actual and predicted data resulted in an empty DataFrame. Check data alignment.")
        return pd.DataFrame()

    # Sort the merged DataFrame to ensure consistent order for evaluation
    merged_df = merged_df.sort_values(by=merge_cols).reset_index(drop=True)

    return merged_df

def evaluate_predictions(y_true, y_pred, target_name, labels=None):
    """
    Calculates and prints the confusion matrix and classification report for predictions.

    Args:
        y_true (pd.Series or list): Actual (true) labels.
        y_pred (pd.Series or list): Predicted labels.
        target_name (str): Name of the target variable being evaluated (e.g., 'State', 'Intensity').
        labels (list, optional): List of unique labels to consider. If None, labels are inferred from y_true/y_pred.
                                 Useful for ensuring all classes are represented in the report, even if not predicted.
    """
    print(f"\n--- Evaluation for {target_name} ---")

    if len(y_true) == 0 or len(y_pred) == 0:
        print(f"No data available for {target_name} evaluation.")
        return

    # Generate Confusion Matrix
    # If labels are provided, use them to ensure consistent matrix shape and class order
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    print(f"\nConfusion Matrix for {target_name}:")
    print(cm)

    # Generate Classification Report
    # zero_division controls the value reported for precision/recall/F1 when the
    # denominator is zero (a class with no predicted samples or no true samples).
    # zero_division=0 reports 0.0 in those cases and suppresses the warning.
    report = classification_report(y_true, y_pred, labels=labels, zero_division=0)
    print(f"\nClassification Report for {target_name}:")
    print(report)

print("Evaluation logic functions defined.")



--- Defining Evaluation Logic Functions ---
Evaluation logic functions defined.


### 3. Perform Evaluation and Save Results (Individual Units)
This block orchestrates the evaluation process for the predictions generated in the previous notebook. It iterates through each geographical level (Country, Department, Region) and then through each defined ActorType (State Actor, Non-State Actor, Unknown).

For each specific combination of geographical unit, violence type (VS, VI, VC), and actor type, the process is as follows:

1. Data Loading: It loads the relevant actual metrics data (_victims_metrics.tsv) and predicted values (_victims_predictions.tsv) for the corresponding unit and actor type.
2. Data Alignment and Merging: The actual and predicted datasets are merged based on common identifiers (Año, Mes, violence type) to ensure a correct comparison of month-by-month predictions.
3. Evaluation Metrics Calculation: For both 'Intensity' and 'Escalation' prediction targets, it calculates:
   - Confusion Matrix: To visualize the correct and incorrect classifications for each category (-1, 0, 1).
   - Classification Report: Providing precision, recall, f1-score, and support for each class, as well as overall averages.
4. Results Saving (Individual Reports): The confusion matrices and classification reports for each individual unit (Country, Department, Region) and actor type are saved to separate text files within a structured output directory (../Results/evaluation/victims/{level}/{actor_type}/).

This comprehensive evaluation at the individual unit level allows for a granular assessment of model performance across different geographical contexts and responsible actors.
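
The cell below writes each report by capturing what `evaluate_predictions` prints. A minimal sketch of that capture pattern, using the standard library's `contextlib.redirect_stdout` as an alternative to swapping `sys.stdout` by hand; `print_report` is a hypothetical stand-in for any function that prints its results.

```python
import contextlib
import io

def print_report(name):
    # Stand-in for a function that prints its results, like evaluate_predictions.
    print(f"--- Evaluation for {name} ---")

buffer = io.StringIO()
# redirect_stdout restores sys.stdout on exit, even if the body raises.
with contextlib.redirect_stdout(buffer):
    print_report("Intensity (VS)")

captured = buffer.getvalue()
print(captured, end="")
```

The captured string can then be appended to a list of report sections or written directly to a text file.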

In [7]:
# 3. Perform Evaluation and Save Results (Individual Units)

print("\n--- Performing Evaluation and Saving Individual Unit Results ---")

# Ensure required global variables are defined
if 'predicted_data_base_dir' not in globals() or \
   'actual_data_base_dir' not in globals() or \
   'evaluation_results_base_dir' not in globals() or \
   'ACTOR_TYPES_SUBFOLDERS' not in globals() or \
   'intensity_escalation_labels' not in globals() or \
   'io' not in globals() or \
   'sys' not in globals():
    print("Error: Required global variables or functions not found. Please run Cell 1 and Cell 2.")
else:
    # Define the geographical levels for iteration
    levels_info = [
        {'name': 'Country', 'input_subdir_actual': 'country', 'input_subdir_predicted': 'country'},
        {'name': 'Department', 'input_subdir_actual': 'department', 'input_subdir_predicted': 'department'},
        {'name': 'Region', 'input_subdir_actual': 'region', 'input_subdir_predicted': 'region'}
    ]

    # Targets for evaluation (Intensity and Escalation)
    prediction_targets = ['Intensity', 'Escalation']

    for level in levels_info:
        level_name = level['name']
        actual_base_level_dir = os.path.join(actual_data_base_dir, level['input_subdir_actual'])
        predicted_base_level_dir = os.path.join(predicted_data_base_dir, level['input_subdir_predicted'])

        print(f"\n=======================================================")
        print(f"=== Starting Evaluation for {level_name} Level ===")
        print(f"=======================================================")

        # Loop through each actor type subfolder
        for actor_type_subfolder in ACTOR_TYPES_SUBFOLDERS:
            current_actual_dir = os.path.join(actual_base_level_dir, actor_type_subfolder)
            current_predicted_dir = os.path.join(predicted_base_level_dir, actor_type_subfolder)
            current_output_dir = os.path.join(evaluation_results_base_dir, level_name.lower(), actor_type_subfolder)

            print(f"\n--- Processing {level_name} Level for Actor Type: {actor_type_subfolder} ---")

            # Create the output directory for this level and actor type if it doesn't exist
            os.makedirs(current_output_dir, exist_ok=True)
            print(f"Ensured output directory exists: {current_output_dir}")

            # Check if input directories exist
            if not os.path.exists(current_actual_dir):
                print(f"Error: Actual data directory not found for {level_name} ({actor_type_subfolder}): {current_actual_dir}. Skipping.")
                continue
            if not os.path.exists(current_predicted_dir):
                print(f"Error: Predicted data directory not found for {level_name} ({actor_type_subfolder}): {current_predicted_dir}. Skipping.")
                continue

            # List all actual and predicted TSV files in their respective directories
            # Assuming filenames are consistent: 'unitname_victims_metrics.tsv' for actual, 'unitname_victims_predictions.tsv' for predicted
            actual_files = {f.replace("_victims_metrics.tsv", ""): os.path.join(current_actual_dir, f)
                            for f in os.listdir(current_actual_dir) if f.endswith('_victims_metrics.tsv')}
            predicted_files = {f.replace("_victims_predictions.tsv", ""): os.path.join(current_predicted_dir, f)
                               for f in os.listdir(current_predicted_dir) if f.endswith('_victims_predictions.tsv')}

            # Find common units between actual and predicted files
            common_units = sorted(list(set(actual_files.keys()) & set(predicted_files.keys())))

            if not common_units:
                print(f"Warning: No matching actual and predicted files found for {level_name} ({actor_type_subfolder}). Skipping evaluation.")
                continue

            print(f"Found {len(common_units)} common units for {level_name} ({actor_type_subfolder}). Evaluating each...")

            for unit_name in common_units:
                actual_file_path = actual_files[unit_name]
                predicted_file_path = predicted_files[unit_name]

                print(f"\nEvaluating {level_name}: {unit_name} ({actor_type_subfolder})")
                print(f"  Actual data from: {os.path.basename(actual_file_path)}")
                print(f"  Predicted data from: {os.path.basename(predicted_file_path)}")

                try:
                    df_actual = pd.read_csv(actual_file_path, sep='\t')
                    df_predicted = pd.read_csv(predicted_file_path, sep='\t')

                    # Ensure essential columns for merging and evaluation are present
                    common_cols = ['Año', 'Mes', 'violence type']
                    if not all(col in df_actual.columns for col in common_cols + prediction_targets):
                        print(f"Error: Missing required columns in actual data for {unit_name}. Skipping.")
                        continue
                    if not all(col in df_predicted.columns for col in common_cols + [f'Predicted_{p}' for p in prediction_targets]):
                        print(f"Error: Missing required columns in predicted data for {unit_name}. Skipping.")
                        continue

                    # Merge actual and predicted data
                    # Use 'inner' join to only compare months where both actual and predicted data exist
                    merged_df = pd.merge(df_actual, df_predicted, on=common_cols, how='inner', suffixes=('_actual', '_predicted'))

                    if merged_df.empty:
                        print(f"Warning: Merged data for {unit_name} is empty after alignment. Skipping evaluation.")
                        continue

                    # Evaluate each prediction target (Intensity, Escalation) for each violence type
                    output_buffer = [] # To collect all evaluation reports for the unit

                    output_buffer.append(f"Evaluation Report for {level_name}: {unit_name} (Actor Type: {actor_type_subfolder})\n")
                    output_buffer.append("---------------------------------------------------------------------------\n")

                    # NEW: Loop through each violence type (VS, VI, VC)
                    # Ensure consistent order for output
                    violence_types_to_evaluate = sorted(merged_df['violence type'].unique())

                    for v_type in violence_types_to_evaluate:
                        output_buffer.append(f"\n----- Violence Type: {v_type} -----\n")
                        df_subset_vtype = merged_df[merged_df['violence type'] == v_type].copy()

                        if df_subset_vtype.empty:
                            output_buffer.append(f"  No merged data for {v_type} in {unit_name}. Skipping evaluation for this violence type.\n")
                            continue

                        for target in prediction_targets: # Iterates Intensity, Escalation
                            actual_col = f'{target}'
                            predicted_col = f'Predicted_{target}'

                            if actual_col not in df_subset_vtype.columns or predicted_col not in df_subset_vtype.columns:
                                output_buffer.append(f"  Skipping evaluation for '{target}' (Violence Type: {v_type}): Required columns not found.\n")
                                continue

                            # Ensure data types are consistent and valid for classification.
                            # Note: non-numeric or missing values are coerced to NaN and then filled
                            # with 0, so they are counted as class 0 ("no change") in the metrics.
                            df_subset_vtype[actual_col] = pd.to_numeric(df_subset_vtype[actual_col], errors='coerce').fillna(0).astype(int)
                            df_subset_vtype[predicted_col] = pd.to_numeric(df_subset_vtype[predicted_col], errors='coerce').fillna(0).astype(int)

                            # Safeguard only: after fillna(0) above no NaN remains, so this drops nothing
                            eval_data = df_subset_vtype.dropna(subset=[actual_col, predicted_col]).copy()

                            if eval_data.empty:
                                output_buffer.append(f"  No valid data for {target} after cleaning for {v_type} in {unit_name}. Skipping evaluation for this target.\n")
                                continue

                            y_true = eval_data[actual_col]
                            y_pred = eval_data[predicted_col]

                            # Use io.StringIO to capture evaluate_predictions output to buffer
                            temp_output_buffer = io.StringIO()
                            original_stdout = sys.stdout # Save original stdout
                            sys.stdout = temp_output_buffer # Redirect stdout to the buffer
                            try:
                                # Call evaluate_predictions to print to the redirected buffer
                                evaluate_predictions(y_true, y_pred,
                                                     f"{target} ({v_type})", # Target name includes the violence type
                                                     labels=intensity_escalation_labels)
                            finally:
                                # Always restore stdout, even if the evaluation raises
                                sys.stdout = original_stdout

                            output_buffer.append(temp_output_buffer.getvalue()) # Add captured output to main buffer

                    # Save the evaluation report for the current unit
                    output_filename = f"{unit_name.lower().replace(' ', '')}_evaluation.txt"
                    output_path = os.path.join(current_output_dir, output_filename)

                    try:
                        with open(output_path, 'w') as f:
                            f.writelines(output_buffer)
                        print(f"Saved evaluation report for {level_name}: {unit_name} to {output_filename}")
                    except Exception as e:
                        print(f"Error saving evaluation report for {unit_name}: {e}")

                except Exception as e:
                    print(f"An unexpected error occurred during evaluation for {unit_name}: {e}")

            print(f"\n{level_name}-level individual evaluation for Actor Type: {actor_type_subfolder} complete.")

    print("\n--- Overall Individual Unit Evaluation Process Finished ---")




--- Performing Evaluation and Saving Individual Unit Results ---

=== Starting Evaluation for Country Level ===

--- Processing Country Level for Actor Type: state_actor ---
Ensured output directory exists: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/victims/country/state_actor
Found 1 common units for Country (state_actor). Evaluating each...

Evaluating Country: colombia (state_actor)
  Actual data from: colombia_victims_metrics.tsv
  Predicted data from: colombia_victims_predictions.tsv
Saved evaluation report for Country: colombia to colombia_evaluation.txt

Country-level individual evaluation for Actor Type: state_actor complete.

--- Processing Country Level for Actor Type: non_state_actor ---
Ensured output directory exists: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/victims/country/non_state_actor
Found 1 common units for Country (non_state_actor). Evaluating each...

Evaluating Country: col

### 4. Aggregated Global Evaluation and Saving Results
This final evaluation block provides a consolidated view of the model's performance across broader geographical scopes: aggregated Department-level and aggregated Region-level. Instead of evaluating each individual department or region separately, this section combines all data within each category to generate a single set of confusion matrices and classification reports for each. The Country-level evaluation was already handled individually in Cell 3 (and effectively acts as a global evaluation for that level).

The results for these aggregated evaluations will be captured and saved into separate plain text files (.txt) for easy review and record-keeping, providing a global perspective on the model's predictive capabilities for 'Intensity' and 'Escalation' at these consolidated levels, broken down by Actor Type.
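
When several units are combined, the order of concatenation and merging matters. Assuming the per-unit files carry no unit identifier column (the documented columns are Año, Mes, violence type, and the targets), the toy sketch below contrasts merging after concatenation with merging per unit and then concatenating. The data and the helper `unit_frames` are hypothetical.

```python
import pandas as pd

keys = ["Año", "Mes", "violence type"]

def unit_frames(intensity, predicted_intensity):
    """One month of toy actual and predicted data for a single unit."""
    base = {"Año": [2020], "Mes": [1], "violence type": ["VS"]}
    actual = pd.DataFrame({**base, "Intensity": [intensity]})
    predicted = pd.DataFrame({**base, "Predicted_Intensity": [predicted_intensity]})
    return actual, predicted

units = [unit_frames(1, 1), unit_frames(-1, 0)]

# Merging after concatenation: every actual row matches every predicted row
# that shares the same keys, regardless of which unit it came from.
all_actual = pd.concat([a for a, _ in units], ignore_index=True)
all_predicted = pd.concat([p for _, p in units], ignore_index=True)
merged_after_concat = pd.merge(all_actual, all_predicted, on=keys, how="inner")

# Merging per unit, then concatenating: one row per unit and month.
merged_per_unit = pd.concat(
    [pd.merge(a, p, on=keys, how="inner") for a, p in units],
    ignore_index=True,
)

print(len(merged_after_concat), len(merged_per_unit))
```

With two units and one shared month, the first approach yields four rows (including cross-unit pairings) and the second yields two, so only the second compares each unit's predictions with its own actual values.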

In [8]:
# 4. Aggregated Global Evaluation and Saving Results

print("\n--- Starting Aggregated Global Evaluation and Saving Results ---")

# Ensure base directories and evaluation functions are defined from previous cells
if 'predicted_data_base_dir' not in globals() or \
   'actual_data_base_dir' not in globals() or \
   'evaluation_results_base_dir' not in globals() or \
   'ACTOR_TYPES_SUBFOLDERS' not in globals() or \
   'intensity_escalation_labels' not in globals() or \
   'align_and_merge_data' not in globals() or \
   'evaluate_predictions' not in globals() or \
   'sys' not in globals() or \
   'io' not in globals(): # Ensure sys and io are imported for stdout capture
    print("Error: Required global variables or functions not found. Please ensure all previous cells (1, 2, 3) have been run correctly and 'import sys' is in Cell 1.")
else:
    # Targets for evaluation (Intensity and Escalation)
    prediction_targets = ['Intensity', 'Escalation']

    # Levels for aggregated evaluation (Department and Region)
    # Country is evaluated individually in Cell 3, which serves as its global view.
    aggregated_levels_info = [
        {'name': 'Department', 'input_subdir_actual': 'department', 'input_subdir_predicted': 'department'},
        {'name': 'Region', 'input_subdir_actual': 'region', 'input_subdir_predicted': 'region'}
    ]

    for level in aggregated_levels_info:
        level_name = level['name']
        actual_base_level_dir = os.path.join(actual_data_base_dir, level['input_subdir_actual'])
        predicted_base_level_dir = os.path.join(predicted_data_base_dir, level['input_subdir_predicted'])

        print(f"\n==========================================================")
        print(f"=== Aggregated Evaluation for All {level_name}s Combined ===")
        print(f"==========================================================")

        # Loop through each actor type subfolder for aggregated evaluation
        for actor_type_subfolder in ACTOR_TYPES_SUBFOLDERS:
            current_actual_dir = os.path.join(actual_base_level_dir, actor_type_subfolder)
            current_predicted_dir = os.path.join(predicted_base_level_dir, actor_type_subfolder)
            current_output_dir = os.path.join(evaluation_results_base_dir, level_name.lower(), actor_type_subfolder)

            print(f"\n--- Aggregated Evaluation for All {level_name}s (Actor Type: {actor_type_subfolder}) ---")

            # Create the output directory for this aggregated level and actor type if it doesn't exist
            os.makedirs(current_output_dir, exist_ok=True)
            print(f"Ensured output directory exists: {current_output_dir}")

            # Collect the merged (actual + predicted) DataFrame of each unit for this level and actor type.
            # Units are merged one by one BEFORE concatenation: the merge keys (Año, Mes, violence type)
            # do not identify the unit, so merging the concatenated frames would pair each unit's
            # actual rows with every other unit's predictions for the same month.
            all_level_merged_dfs = []

            if not os.path.exists(current_actual_dir) or not os.path.exists(current_predicted_dir):
                print(f"Error: Data directories not found for {level_name} ({actor_type_subfolder}). Skipping aggregation.")
                continue

            actual_suffix = '_victims_metrics.tsv'
            predicted_suffix = '_victims_predictions.tsv'

            # Load each actual file together with the predicted file of the same unit
            for f_name in sorted(os.listdir(current_actual_dir)):
                if not f_name.endswith(actual_suffix):
                    continue
                unit_name = f_name[:-len(actual_suffix)]
                predicted_path = os.path.join(current_predicted_dir, unit_name + predicted_suffix)
                if not os.path.exists(predicted_path):
                    print(f"Warning: No predicted file found for {unit_name} in {level_name} ({actor_type_subfolder}). Skipping unit.")
                    continue
                try:
                    df_actual = pd.read_csv(os.path.join(current_actual_dir, f_name), sep='\t')
                    df_predicted = pd.read_csv(predicted_path, sep='\t')
                except Exception as e:
                    print(f"Error loading files for {unit_name} in {level_name} ({actor_type_subfolder}): {e}")
                    continue

                # Align this unit's actual and predicted rows
                merged_unit = align_and_merge_data(df_actual, df_predicted)
                if not merged_unit.empty:
                    all_level_merged_dfs.append(merged_unit)

            if not all_level_merged_dfs:
                print(f"No merged data for Aggregated {level_name} ({actor_type_subfolder}) evaluation. Skipping.")
                continue

            # Concatenate the per-unit merged data across all units in this level/actor type
            merged_aggregated_data = pd.concat(all_level_merged_dfs, ignore_index=True)

            # Prepare the output buffer for saving to a .txt file
            output_buffer = io.StringIO()
            original_stdout = sys.stdout # Save original stdout

            sys.stdout = output_buffer # Redirect stdout to the buffer

            print(f"\n--- Consolidated Evaluation for All {level_name}s (Actor Type: {actor_type_subfolder}) ---")
            print("-------------------------------------------------------------------\n")

            # Evaluate for each violence type (VS, VI, VC)
            for v_type in merged_aggregated_data['violence type'].unique():
                print(f"\n----- Violence Type: {v_type} (Aggregated {level_name}) -----")
                df_subset = merged_aggregated_data[merged_aggregated_data['violence type'] == v_type].copy()

                if df_subset.empty:
                    print(f"No merged data for {v_type} in Aggregated {level_name} evaluation. Skipping.")
                    continue

                for target in prediction_targets:
                    actual_col = f'{target}'
                    predicted_col = f'Predicted_{target}'

                    if actual_col in df_subset.columns and predicted_col in df_subset.columns:
                        # Ensure data types are consistent and valid for classification.
                        # Note: non-numeric or missing values are coerced to NaN and then filled
                        # with 0, so they are counted as class 0 ("no change") in the metrics.
                        df_subset[actual_col] = pd.to_numeric(df_subset[actual_col], errors='coerce').fillna(0).astype(int)
                        df_subset[predicted_col] = pd.to_numeric(df_subset[predicted_col], errors='coerce').fillna(0).astype(int)

                        # Safeguard only: after fillna(0) above no NaN remains, so this drops nothing
                        eval_data = df_subset.dropna(subset=[actual_col, predicted_col]).copy()

                        if eval_data.empty:
                            print(f"  No valid data for {target} after cleaning for {v_type} in aggregated {level_name}. Skipping evaluation for this target.\n")
                            continue

                        y_true = eval_data[actual_col]
                        y_pred = eval_data[predicted_col]

                        # Call evaluate_predictions to print to the redirected buffer
                        evaluate_predictions(y_true, y_pred,
                                             f"{target} ({v_type}) - Aggregated {level_name}",
                                             labels=intensity_escalation_labels)
                    else:
                        print(f"  Missing '{actual_col}' or '{predicted_col}' columns for {v_type} in Aggregated {level_name} evaluation.")


            sys.stdout = original_stdout # Restore original stdout

            # Save the captured output to a .txt file
            output_filename = os.path.join(current_output_dir, f"{level_name.lower()}_global_evaluation.txt")
            with open(output_filename, 'w') as f:
                f.write(output_buffer.getvalue())
            print(f"Saved Aggregated {level_name}-level evaluation for Actor Type: {actor_type_subfolder} to: {output_filename}")


    print("\n--- Overall Aggregated Global Evaluation Finished ---")




--- Starting Aggregated Global Evaluation and Saving Results ---

=== Aggregated Evaluation for All Departments Combined ===

--- Aggregated Evaluation for All Departments (Actor Type: state_actor) ---
Ensured output directory exists: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/victims/department/state_actor
Saved Aggregated Department-level evaluation for Actor Type: state_actor to: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/victims/department/state_actor/department_global_evaluation.txt

--- Aggregated Evaluation for All Departments (Actor Type: non_state_actor) ---
Ensured output directory exists: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/victims/department/non_state_actor
Saved Aggregated Department-level evaluation for Actor Type: non_state_actor to: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/victims/depart