# Notebook: Model Prediction Evaluation and Performance Metrics
# 1. Introduction
This notebook is designed to evaluate the performance of the violence dynamics prediction model. It takes the generated predictions for "Previous State", "Intensity", and "Escalation" and compares them against the actual observed values. The primary goal is to assess the accuracy and reliability of the predictions at various geographical levels (Country, Department, and Region).

The evaluation will be conducted using standard classification metrics, derived from confusion matrices, for each of the predicted variables.

# 2. Evaluation Methodology
For each geographical unit (e.g., Colombia, Antioquia, Pacífico) and each violence type (VS, VI, VC), the notebook will perform the following steps:

- Data Loading:
  - Load the predicted values for 'Previous State', 'Intensity', and 'Escalation' from the output files of the prediction model notebook (`_cases_predictions.tsv`).
  - Load the actual (true) values for 'Previous State', 'Intensity', and 'Escalation' from the output files of the metric calculation notebook (`_cases_metrics.tsv`).
- Data Alignment and Merging:
  - The predicted and actual datasets are merged on their common identifiers: Año (Year), Mes (Month), and violence type. This ensures that each prediction is compared with the observed value for the same month and violence type.
- Metrics Calculation for Each Variable ('Previous State', 'Intensity', 'Escalation'):
  - Each of the three target variables is evaluated separately.
  - Confusion Matrix: a matrix of counts with one row per actual class and one column per predicted class. All three targets are multiclass, so true positives, false positives, and false negatives are read per class: the diagonal entry is the class's true positives, the rest of its column its false positives, and the rest of its row its false negatives.
  - Classification Report: key classification metrics, reported per class (except accuracy):
    - Accuracy: the proportion of all predictions that are correct.
    - Precision: the proportion of predictions of a class that were actually that class.
    - Recall (Sensitivity): the proportion of actual members of a class that were correctly identified.
    - F1-Score: the harmonic mean of Precision and Recall, providing a balance between the two.
    - Support: the number of occurrences of each class in y_true.
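As a small worked illustration of the metrics listed above, the sketch below derives per-class precision, recall, and F1 by hand from a three-class confusion matrix and cross-checks them against scikit-learn. The labels are invented for illustration and are not taken from the study's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Toy labels, invented for illustration only (classes -1, 0, 1).
y_true = [-1, -1, 0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [-1, 0, 0, 0, 1, 1, 1, 0, 1, 0]
labels = [-1, 0, 1]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

tp = np.diag(cm)                   # correct predictions per class
predicted_totals = cm.sum(axis=0)  # column sums: how often each class was predicted
support = cm.sum(axis=1)           # row sums: how often each class actually occurred

precision = tp / predicted_totals
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

# Cross-check the hand-computed values against scikit-learn.
sk_precision, sk_recall, sk_f1, sk_support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)

print(cm)
print("precision:", precision, "recall:", recall, "accuracy:", accuracy)
```

In this toy case every class is both present and predicted at least once, so no division by zero occurs; with real data a class may have zero support or zero predictions, which is what the `zero_division` argument of scikit-learn's report functions is for.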

# 3. Geographical Scope
The evaluation process will be applied to the prediction results and actual data at the following geographical levels:

- Country: Colombia
- Departments: Each analyzed department.
- Regions: Each region defined in the study.

# 4. Data Input and Output
Input:

- Prediction files: `.tsv` files from `../Results/predictions/cases/` (e.g., `colombia_cases_predictions.tsv`).
- Actual data files: `.tsv` files from `../Results/intensity & escalation/cases/` (e.g., `colombia_cases_metrics.tsv`).

Output: The notebook displays the per-unit confusion matrices and classification reports directly in the cell output. The aggregated evaluations (Country, all Departments, all Regions) are additionally saved as plain text files under `../Results/evaluation/cases/`. The results could also be saved as DataFrames for further analysis or reporting.
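For the option of keeping results as DataFrames, one possible approach is scikit-learn's `output_dict=True` flag, which returns the classification report as a nested dictionary instead of a formatted string. The sketch below uses invented labels and a hypothetical file name (`example_intensity_report.tsv`), written to a temporary folder rather than to the project's results directory.

```python
import os
import tempfile

import pandas as pd
from sklearn.metrics import classification_report

# Toy labels, invented for illustration only.
y_true = [-1, 0, 0, 1, 1, 1]
y_pred = [-1, 0, 1, 1, 1, 0]

# output_dict=True returns a dict keyed by class label (as strings) plus summary rows.
report_dict = classification_report(
    y_true, y_pred, labels=[-1, 0, 1], output_dict=True, zero_division=0
)

# One row per class or summary entry; columns are precision, recall, f1-score, support.
report_df = pd.DataFrame(report_dict).transpose()

# Hypothetical output location: a temporary folder, not the project's results directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    report_path = os.path.join(tmp_dir, "example_intensity_report.tsv")
    report_df.to_csv(report_path, sep="\t")
    reloaded = pd.read_csv(report_path, sep="\t", index_col=0)

print(report_df)
```

A tabular report like this can be concatenated across units or violence types, which is harder to do with the printed text form.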

# 5. Key Libraries Used
- `pandas` for data manipulation.
- `numpy` for numerical operations.
- `sklearn.metrics` for confusion matrix and classification report generation.

### 1. Initial Setup, Library Imports, and Path Configuration
This block performs the initial setup for the model evaluation notebook. It includes importing all necessary Python libraries for data handling, file system operations, and, crucially, for calculating classification metrics. It defines the relative paths for both the input directories containing the predicted data (from the prediction notebook) and the actual/true data (from the metrics calculation notebook). It also sets up an output directory for saving evaluation results, if needed.

In [1]:
# 1. Initial Setup, Library Imports, and Path Configuration

import pandas as pd
import os
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Define the base input directory for predicted data (output from the prediction notebook)
# Assumes the notebook is in a 'notebooks' subfolder and predictions are in 'Results/predictions/cases'.
# Example structure:
# Project_Root/
# ├── Results/
# │   └── predictions/
# │       └── cases/    # Predicted files are here (country, department, region subfolders)
# └── notebooks/        # This notebook is here
predicted_data_base_dir = os.path.join(os.getcwd(), '..', 'Results', 'predictions', 'cases')

# Define the base input directory for actual/true data (output from the metrics calculation notebook)
# Assumes the notebook is in a 'notebooks' subfolder and metrics are in 'Results/intensity & escalation/cases'.
# Example structure:
# Project_Root/
# ├── Results/
# │   └── intensity & escalation/
# │       └── cases/    # Actual/True files are here (country, department, region subfolders)
# └── notebooks/        # This notebook is here
actual_data_base_dir = os.path.join(os.getcwd(), '..', 'Results', 'intensity & escalation', 'cases')

# Define the base output directory for evaluation results
# This is where confusion matrices or classification reports could be saved if desired.
evaluation_results_base_dir = os.path.join(os.getcwd(), '..', 'Results', 'evaluation', 'cases')

print("Initial setup, library imports, and path configuration complete for evaluation.")
print(f"Predicted data will be read from: {predicted_data_base_dir}")
print(f"Actual data will be read from: {actual_data_base_dir}")
print(f"Evaluation results could be saved in: {evaluation_results_base_dir}")

# Create the evaluation results directory if it doesn't exist
os.makedirs(evaluation_results_base_dir, exist_ok=True)



Initial setup, library imports, and path configuration complete for evaluation.
Predicted data will be read from: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/predictions/cases
Actual data will be read from: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/intensity & escalation/cases
Evaluation results could be saved in: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/cases


### 2. Define Evaluation Logic Functions
This block defines the core functions required for evaluating the model's predictions. These functions facilitate the comparison between actual and predicted values and the generation of performance metrics.

align_and_merge_data(df_actual, df_predicted): A helper function to merge the actual and predicted DataFrames based on common temporal and type identifiers (Año, Mes, violence type). This ensures that each prediction is correctly matched with its corresponding true value.

evaluate_predictions(y_true, y_pred, target_name, labels=None): This function takes the true and predicted labels for a specific target variable (e.g., 'State', 'Intensity', 'Escalation'). It then calculates and prints the confusion matrix and a detailed classification report, providing insights into the model's performance for each class.
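Because an inner merge silently drops rows that lack a counterpart, it can be useful to audit the alignment before evaluating. The sketch below is an illustration on invented data, not part of the notebook's functions: it uses pandas' `indicator=True` option to list unmatched rows and `validate="one_to_one"` to detect duplicated keys.

```python
import pandas as pd

keys = ["Año", "Mes", "violence type"]

# Toy frames, invented for illustration; only one target column is shown.
actual = pd.DataFrame({
    "Año": [2020, 2020, 2020],
    "Mes": [1, 2, 3],
    "violence type": ["VC", "VC", "VC"],
    "Intensity": [0, 1, -1],
})
predicted = pd.DataFrame({
    "Año": [2020, 2020, 2020],
    "Mes": [2, 3, 4],
    "violence type": ["VC", "VC", "VC"],
    "Predicted_Intensity": [1, 0, 0],
})

# An outer merge with indicator=True shows which rows an inner merge would drop.
audit = pd.merge(actual, predicted, on=keys, how="outer", indicator=True)
dropped = audit[audit["_merge"] != "both"]

# validate="one_to_one" raises MergeError if any key combination is duplicated.
merged = pd.merge(actual, predicted, on=keys, how="inner", validate="one_to_one")

duplicated_predictions = pd.concat([predicted, predicted.iloc[[0]]], ignore_index=True)
duplicate_detected = False
try:
    pd.merge(actual, duplicated_predictions, on=keys, how="inner", validate="one_to_one")
except pd.errors.MergeError as err:
    duplicate_detected = True
    print(f"Duplicate keys detected: {err}")

print(dropped[keys + ["_merge"]])
```

Here month 1 has no prediction and month 4 has no actual value, so both appear in `dropped` while `merged` keeps only months 2 and 3.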

In [2]:
# 2. Define Evaluation Logic Functions

print("\n--- Defining Evaluation Logic Functions ---")

def align_and_merge_data(df_actual, df_predicted):
    """
    Merges actual and predicted DataFrames on common identifiers to prepare for evaluation.

    Args:
        df_actual (pd.DataFrame): DataFrame containing actual values (from _cases_metrics.tsv).
                                  Expected columns: 'Año', 'Mes', 'violence type',
                                  'Previous State', 'Intensity', 'Escalation'.
        df_predicted (pd.DataFrame): DataFrame containing predicted values (from _cases_predictions.tsv).
                                   Expected columns: 'Año', 'Mes', 'violence type',
                                   'Predicted_State', 'Predicted_Intensity', 'Predicted_Escalation'.

    Returns:
        pd.DataFrame: A merged DataFrame with actual and predicted values aligned,
                      or an empty DataFrame if merging fails or results in no common data.
    """
    # Define common columns for merging
    merge_cols = ['Año', 'Mes', 'violence type']

    # Perform an inner merge to keep only rows present in both DataFrames
    # This ensures we only compare predictions where we have corresponding actual values.
    merged_df = pd.merge(df_actual, df_predicted, on=merge_cols, how='inner', suffixes=('_actual', '_predicted'))

    if merged_df.empty:
        print("Warning: Merging actual and predicted data resulted in an empty DataFrame. Check data alignment.")
        return pd.DataFrame()

    # Sort the merged DataFrame to ensure consistent order for evaluation
    merged_df = merged_df.sort_values(by=merge_cols).reset_index(drop=True)

    return merged_df

def evaluate_predictions(y_true, y_pred, target_name, labels=None):
    """
    Calculates and prints the confusion matrix and classification report for predictions.

    Args:
        y_true (pd.Series or list): Actual (true) labels.
        y_pred (pd.Series or list): Predicted labels.
        target_name (str): Name of the target variable being evaluated (e.g., 'State', 'Intensity').
        labels (list, optional): List of unique labels to consider. If None, labels are inferred from y_true/y_pred.
                                 Useful for ensuring all classes are represented in the report, even if not predicted.
    """
    print(f"\n--- Evaluation for {target_name} ---")

    if len(y_true) == 0 or len(y_pred) == 0:
        print(f"No data available for {target_name} evaluation.")
        return

    # Generate Confusion Matrix
    # If labels are provided, use them to ensure consistent matrix shape and class order
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    print(f"\nConfusion Matrix for {target_name}:")
    print(cm)

    # Generate Classification Report
    # zero_division sets the value reported when a metric's denominator is zero
    # (a class with no true samples or no predicted samples).
    # 'warn' reports 0.0 and emits a warning; 0 or 1 report that value silently; np.nan reports NaN.
    # We pass 0 so such classes are reported as 0.0 without warnings.
    report = classification_report(y_true, y_pred, labels=labels, zero_division=0)
    print(f"\nClassification Report for {target_name}:")
    print(report)

print("Evaluation logic functions defined.")



--- Defining Evaluation Logic Functions ---
Evaluation logic functions defined.


### 3. Execute Evaluation and Display Results
This block orchestrates the entire model evaluation process. It iterates through each defined geographical level (Country, Department, Region). For each level, it identifies and loads the corresponding predicted data files (from the prediction notebook's output) and actual data files (from the metrics calculation notebook's output).

It then merges these datasets, ensuring proper alignment of actual and predicted values. Finally, it calls the evaluate_predictions function (defined in Cell 2) for each of the target variables: 'Previous State', 'Intensity', and 'Escalation', displaying their respective confusion matrices and classification reports directly in the notebook output.
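The file-pairing step described above can be sketched as a small helper that indexes each folder by unit name and keeps only units present in both. The helper name `pair_unit_files` and the example unit names are hypothetical; the two suffixes are the ones used in this notebook.

```python
import os
import tempfile

def pair_unit_files(actual_dir, predicted_dir,
                    actual_suffix="_cases_metrics.tsv",
                    predicted_suffix="_cases_predictions.tsv"):
    """Return {unit: (actual_path, predicted_path)} for units found in both folders."""
    def index_by_unit(folder, suffix):
        # Strip the suffix from the end of the name to obtain the unit name.
        return {name[:-len(suffix)]: os.path.join(folder, name)
                for name in os.listdir(folder) if name.endswith(suffix)}

    actual = index_by_unit(actual_dir, actual_suffix)
    predicted = index_by_unit(predicted_dir, predicted_suffix)
    return {unit: (actual[unit], predicted[unit])
            for unit in sorted(actual.keys() & predicted.keys())}

# Demonstration on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as root:
    actual_dir = os.path.join(root, "actual")
    predicted_dir = os.path.join(root, "predicted")
    os.makedirs(actual_dir)
    os.makedirs(predicted_dir)
    for name in ["antioquia_cases_metrics.tsv", "choco_cases_metrics.tsv"]:
        open(os.path.join(actual_dir, name), "w").close()
    for name in ["antioquia_cases_predictions.tsv", "narino_cases_predictions.tsv"]:
        open(os.path.join(predicted_dir, name), "w").close()

    pairs = pair_unit_files(actual_dir, predicted_dir)
    paired_units = list(pairs)

print(paired_units)
```

Units that appear in only one folder are left out of the result, so it may be worth logging them separately if complete coverage matters.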

In [5]:
# 3. Execute Evaluation and Display Results

print("\n--- Executing Evaluation and Displaying Results ---")

# Ensure base directories and evaluation functions are defined
if 'predicted_data_base_dir' not in globals() or 'actual_data_base_dir' not in globals():
    print("Error: Base data directories not defined. Please run Cell 1.")
elif 'align_and_merge_data' not in globals() or 'evaluate_predictions' not in globals():
    print("Error: Evaluation logic functions not found. Please run Cell 2.")
else:
    # Define the levels and their corresponding input directories
    levels_info = [
        {'name': 'Country', 'actual_dir': os.path.join(actual_data_base_dir, 'country'), 'predicted_dir': os.path.join(predicted_data_base_dir, 'country')},
        {'name': 'Department', 'actual_dir': os.path.join(actual_data_base_dir, 'department'), 'predicted_dir': os.path.join(predicted_data_base_dir, 'department')},
        {'name': 'Region', 'actual_dir': os.path.join(actual_data_base_dir, 'region'), 'predicted_dir': os.path.join(predicted_data_base_dir, 'region')}
    ]

    # Define possible labels for classification reports to ensure all classes are shown
    # These should cover all possible values for Escalation, Intensity, and Previous State
    intensity_escalation_labels = [-1, 0, 1]
    state_labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'Start'] # 'Start' for the first month's Previous State

    for level in levels_info:
        level_name = level['name']
        actual_dir = level['actual_dir']
        predicted_dir = level['predicted_dir']

        print(f"\n==================================================")
        print(f"=== Starting Evaluation for {level_name} Level ===")
        print(f"==================================================")

        # Check if input directories exist
        if not os.path.exists(actual_dir):
            print(f"Error: Actual data directory not found for {level_name}: {actual_dir}. Skipping this level.")
            continue
        if not os.path.exists(predicted_dir):
            print(f"Error: Predicted data directory not found for {level_name}: {predicted_dir}. Skipping this level.")
            continue

        # List all TSV files in the actual and predicted directories for this level
        # Assuming filenames end with '_cases_metrics.tsv' for actual and '_cases_predictions.tsv' for predicted
        actual_files = {f.replace('_cases_metrics.tsv', ''): os.path.join(actual_dir, f)
                        for f in os.listdir(actual_dir) if f.endswith('_cases_metrics.tsv')}
        predicted_files = {f.replace('_cases_predictions.tsv', ''): os.path.join(predicted_dir, f)
                           for f in os.listdir(predicted_dir) if f.endswith('_cases_predictions.tsv')}

        # Find common units (e.g., 'colombia', 'antioquia') that have both actual and predicted files
        common_units = sorted(list(set(actual_files.keys()) & set(predicted_files.keys())))

        if not common_units:
            print(f"Warning: No common actual and predicted data files found for {level_name}. Skipping evaluation for this level.")
        else:
            print(f"Found {len(common_units)} common units for {level_name}. Evaluating each unit...")
            for unit_name in common_units:
                actual_file_path = actual_files[unit_name]
                predicted_file_path = predicted_files[unit_name]

                print(f"\n--- Evaluating {level_name}: {unit_name} ---")

                try:
                    df_actual = pd.read_csv(actual_file_path, sep='\t')
                    df_predicted = pd.read_csv(predicted_file_path, sep='\t')

                    # Merge actual and predicted data
                    merged_data = align_and_merge_data(df_actual, df_predicted)

                    if merged_data.empty:
                        print(f"Skipping evaluation for {unit_name} due to empty merged data.")
                        continue

                    # Ensure 'violence type' column is consistent for grouping
                    # This is important if some files only contain 'VC' for example
                    unique_violence_types = merged_data['violence type'].unique()

                    for v_type in unique_violence_types:
                        print(f"\n----- Violence Type: {v_type} -----")
                        df_subset = merged_data[merged_data['violence type'] == v_type].copy()

                        if df_subset.empty:
                            print(f"No merged data for {v_type} in {unit_name}. Skipping.")
                            continue

                        # Evaluate 'Previous State'
                        if 'Previous State' in df_subset.columns and 'Predicted_State' in df_subset.columns:
                            evaluate_predictions(df_subset['Previous State'], df_subset['Predicted_State'],
                                                 f"Previous State ({v_type})", labels=state_labels)
                        else:
                            print(f"Missing 'Previous State' or 'Predicted_State' columns for {v_type}.")

                        # Evaluate 'Intensity'
                        if 'Intensity' in df_subset.columns and 'Predicted_Intensity' in df_subset.columns:
                            evaluate_predictions(df_subset['Intensity'], df_subset['Predicted_Intensity'],
                                                 f"Intensity ({v_type})", labels=intensity_escalation_labels)
                        else:
                            print(f"Missing 'Intensity' or 'Predicted_Intensity' columns for {v_type}.")

                        # Evaluate 'Escalation'
                        if 'Escalation' in df_subset.columns and 'Predicted_Escalation' in df_subset.columns:
                            evaluate_predictions(df_subset['Escalation'], df_subset['Predicted_Escalation'],
                                                 f"Escalation ({v_type})", labels=intensity_escalation_labels)
                        else:
                            print(f"Missing 'Escalation' or 'Predicted_Escalation' columns for {v_type}.")

                except Exception as e:
                    print(f"An unexpected error occurred during evaluation for {unit_name}: {e}")

        print(f"\n=== Evaluation for {level_name} Level Complete ===")

    print("\n--- Overall Model Prediction Evaluation Finished ---")



--- Executing Evaluation and Displaying Results ---

=== Starting Evaluation for Country Level ===
Found 1 common units for Country. Evaluating each unit...

--- Evaluating Country: 1958_2022_cases_country.tsv ---

----- Violence Type: VC -----

--- Evaluation for Previous State (VC) ---

Confusion Matrix for Previous State (VC):
[[157   0  22   3   0   1   1   0  48   0]
 [  7   0   0   0   0   0   0   0   3   0]
 [128   0   6   1   0   0   1   0  14   0]
 [  0   0   0   3   0   6   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   4   0   5   0   0   0   0]
 [ 10   0   7   2   0   0   3   0  92   0]
 [  2   0   0   0   0   0   1   0  13   0]
 [ 53   0  12   4   0   0  13   0 140   0]
 [  0   0   0   0   0   0   0   0   0   0]]

Classification Report for Previous State (VC):
              precision    recall  f1-score   support

           A       0.44      0.68      0.53       232
           B       0.00      0.00      0.00        10
           C       0.13  

### 4. Aggregated Global Evaluation and Saving Results
This final evaluation block provides a consolidated view of the model's performance across broader geographical scopes: Country-level, aggregated Department-level, and aggregated Region-level. Instead of evaluating each individual department or region separately, this section combines all data within each category to generate a single set of confusion matrices and classification reports for each.
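One point worth checking when pooling units: the merge keys (Año, Mes, violence type) carry no unit identifier, so the order of pooling and merging matters. The sketch below, on invented data with hypothetical unit names, compares concatenating before merging with merging each unit before concatenating.

```python
import pandas as pd

keys = ["Año", "Mes", "violence type"]

def make_unit(values, column):
    """Build a two-month toy frame for one unit (invented data)."""
    return pd.DataFrame({
        "Año": [2020, 2020],
        "Mes": [1, 2],
        "violence type": ["VC", "VC"],
        column: values,
    })

actual_by_unit = {
    "unit_a": make_unit([0, 1], "Intensity"),
    "unit_b": make_unit([1, -1], "Intensity"),
}
predicted_by_unit = {
    "unit_a": make_unit([0, 1], "Predicted_Intensity"),
    "unit_b": make_unit([1, 0], "Predicted_Intensity"),
}

# Pool first, then merge: each actual row matches the predictions of every unit
# that shares the same (Año, Mes, violence type) key.
pooled_then_merged = pd.merge(
    pd.concat(list(actual_by_unit.values()), ignore_index=True),
    pd.concat(list(predicted_by_unit.values()), ignore_index=True),
    on=keys, how="inner",
)

# Merge each unit first, then pool: each actual row matches only its own unit's prediction.
merged_then_pooled = pd.concat(
    [pd.merge(actual_by_unit[unit], predicted_by_unit[unit], on=keys, how="inner").assign(unit=unit)
     for unit in actual_by_unit],
    ignore_index=True,
)

print(len(pooled_then_merged), len(merged_then_pooled))
```

With two units and two months, pooling first yields eight rows, half of which compare one unit's actual value with another unit's prediction, whereas merging per unit yields the expected four.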

The results for these aggregated evaluations (Country, all Departments combined, all Regions combined) will be saved into separate plain text files (.txt) for easy review and record-keeping, providing a global perspective on the model's predictive capabilities for 'Previous State', 'Intensity', and 'Escalation'.
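Capturing printed reports into a text file can be sketched with the standard library's `contextlib.redirect_stdout`, which restores the original stream automatically when the block exits, even on error. The function `print_report` and the file name `example_evaluation.txt` below are hypothetical stand-ins, and the file is written to a temporary folder.

```python
import contextlib
import io
import os
import tempfile

def print_report(target_name):
    """Hypothetical stand-in for a function that prints an evaluation report."""
    print(f"--- Evaluation for {target_name} ---")
    print("Confusion Matrix and Classification Report would be printed here.")

# Everything printed inside the with-block goes to the buffer instead of the notebook.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print_report("Intensity (VC)")
captured = buffer.getvalue()

# Write the captured text to a file; UTF-8 keeps accented unit names intact.
with tempfile.TemporaryDirectory() as out_dir:
    out_path = os.path.join(out_dir, "example_evaluation.txt")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(captured)
    with open(out_path, encoding="utf-8") as f:
        saved = f.read()

print("Captured", len(captured.splitlines()), "lines")
```

The context manager avoids the situation where an exception leaves output redirected and later messages never reach the notebook.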

In [3]:
# 4. Aggregated Global Evaluation and Saving Results

import io   # For capturing print output to save to file
import sys  # For redirecting stdout while the reports are generated

print("\n--- Starting Aggregated Global Evaluation and Saving Results ---")

# Ensure base directories and evaluation functions are defined from previous cells
if 'predicted_data_base_dir' not in globals() or 'actual_data_base_dir' not in globals():
    print("Error: Base data directories not defined. Please run Cell 1.")
elif 'align_and_merge_data' not in globals() or 'evaluate_predictions' not in globals():
    print("Error: Evaluation logic functions not found. Please run Cell 2.")
else:
    # Define labels for consistency in reports
    intensity_escalation_labels = [-1, 0, 1]
    state_labels = sorted(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'Start']) # Ensure consistent order

    # --- 1. Evaluate for Country Level (Colombia) ---
    print("\n==================================================")
    print("=== Aggregated Evaluation for Country: Colombia ===")
    print("==================================================")

    country_actual_file_path = os.path.join(actual_data_base_dir, 'country', '1958_2022_cases_country.tsv')
    country_predicted_file_path = os.path.join(predicted_data_base_dir, 'country', '1958_2022_cases_country.tsv')

    if os.path.exists(country_actual_file_path) and os.path.exists(country_predicted_file_path):
        try:
            df_country_actual = pd.read_csv(country_actual_file_path, sep='\t')
            df_country_predicted = pd.read_csv(country_predicted_file_path, sep='\t')

            merged_country_data = align_and_merge_data(df_country_actual, df_country_predicted)

            if not merged_country_data.empty:
                # Prepare the output string to be saved to a file
                output_buffer = io.StringIO()
                original_stdout = sys.stdout # Save original stdout

                sys.stdout = output_buffer # Redirect stdout to the buffer
                try:
                    # Evaluate for each violence type (VS, VI, VC)
                    for v_type in merged_country_data['violence type'].unique():
                        print(f"\n----- Violence Type: {v_type} (Country - Colombia) -----")
                        df_subset = merged_country_data[merged_country_data['violence type'] == v_type].copy()

                        if not df_subset.empty:
                            evaluate_predictions(df_subset['Previous State'], df_subset['Predicted_State'],
                                                 f"Previous State ({v_type})", labels=state_labels)
                            evaluate_predictions(df_subset['Intensity'], df_subset['Predicted_Intensity'],
                                                 f"Intensity ({v_type})", labels=intensity_escalation_labels)
                            evaluate_predictions(df_subset['Escalation'], df_subset['Predicted_Escalation'],
                                                 f"Escalation ({v_type})", labels=intensity_escalation_labels)
                        else:
                            print(f"No merged data for {v_type} in Country-level evaluation.")
                finally:
                    # Restore stdout even if an evaluation step raises, so that later
                    # messages (including error messages) reach the notebook.
                    sys.stdout = original_stdout

                # Save the captured output to a .txt file
                output_filename = os.path.join(evaluation_results_base_dir, 'country_global_evaluation.txt')
                with open(output_filename, 'w') as f:
                    f.write(output_buffer.getvalue())
                print(f"Saved Country-level global evaluation to: {output_filename}")

            else:
                print("No merged data for Country-level evaluation.")
        except Exception as e:
            print(f"Error during Country-level evaluation: {e}")
    else:
        print(f"Country data files not found: {country_actual_file_path} or {country_predicted_file_path}")


    # --- 2. Evaluate for Aggregated Department Level ---
    print("\n==========================================================")
    print("=== Aggregated Evaluation for All Departments Combined ===")
    print("==========================================================")

    all_departments_actual = []
    all_departments_predicted = []

    department_actual_dir = os.path.join(actual_data_base_dir, 'department')
    department_predicted_dir = os.path.join(predicted_data_base_dir, 'department')

    if os.path.exists(department_actual_dir) and os.path.exists(department_predicted_dir):
        # Index the files by unit name so each department's actual file is paired with its own predictions.
        department_actual_files = {f.replace('_cases_metrics.tsv', ''): f
                                   for f in os.listdir(department_actual_dir) if f.endswith('_cases_metrics.tsv')}
        department_predicted_files = {f.replace('_cases_predictions.tsv', ''): f
                                      for f in os.listdir(department_predicted_dir) if f.endswith('_cases_predictions.tsv')}

        # Merge each department separately, then pool the merged rows.
        # Concatenating first and merging afterwards on (Año, Mes, violence type) would pair every
        # department's actual rows with every other department's predictions for the same month.
        merged_departments = []
        for unit_name in sorted(set(department_actual_files) & set(department_predicted_files)):
            try:
                df_actual = pd.read_csv(os.path.join(department_actual_dir, department_actual_files[unit_name]), sep='\t')
                df_predicted = pd.read_csv(os.path.join(department_predicted_dir, department_predicted_files[unit_name]), sep='\t')
                all_departments_actual.append(df_actual)
                all_departments_predicted.append(df_predicted)
                merged_unit = align_and_merge_data(df_actual, df_predicted)
                if not merged_unit.empty:
                    merged_departments.append(merged_unit)
            except Exception as e:
                print(f"Error loading or merging department files for {unit_name}: {e}")

        if all_departments_actual and all_departments_predicted:
            merged_all_deps_data = (pd.concat(merged_departments, ignore_index=True)
                                    if merged_departments else pd.DataFrame())

            if not merged_all_deps_data.empty:
                output_buffer = io.StringIO()
                original_stdout = sys.stdout

                sys.stdout = output_buffer
                try:
                    for v_type in merged_all_deps_data['violence type'].unique():
                        print(f"\n----- Violence Type: {v_type} (Aggregated Departments) -----")
                        df_subset = merged_all_deps_data[merged_all_deps_data['violence type'] == v_type].copy()

                        if not df_subset.empty:
                            evaluate_predictions(df_subset['Previous State'], df_subset['Predicted_State'],
                                                 f"Previous State ({v_type})", labels=state_labels)
                            evaluate_predictions(df_subset['Intensity'], df_subset['Predicted_Intensity'],
                                                 f"Intensity ({v_type})", labels=intensity_escalation_labels)
                            evaluate_predictions(df_subset['Escalation'], df_subset['Predicted_Escalation'],
                                                 f"Escalation ({v_type})", labels=intensity_escalation_labels)
                        else:
                            print(f"No merged data for {v_type} in Aggregated Departments evaluation.")
                finally:
                    # Restore stdout even if an evaluation step raises.
                    sys.stdout = original_stdout

                output_filename = os.path.join(evaluation_results_base_dir, 'departments_global_evaluation.txt')
                with open(output_filename, 'w') as f:
                    f.write(output_buffer.getvalue())
                print(f"Saved Aggregated Department-level evaluation to: {output_filename}")
            else:
                print("No merged data for Aggregated Department-level evaluation.")
        else:
            print("No actual or predicted files found for Departments. Skipping aggregation.")
    else:
        print(f"Department data directories not found: {department_actual_dir} or {department_predicted_dir}")


    # --- 3. Evaluate for Aggregated Region Level ---
    print("\n=====================================================")
    print("=== Aggregated Evaluation for All Regions Combined ===")
    print("=====================================================")

    all_regions_actual = []
    all_regions_predicted = []

    region_actual_dir = os.path.join(actual_data_base_dir, 'region')
    region_predicted_dir = os.path.join(predicted_data_base_dir, 'region')

    if os.path.exists(region_actual_dir) and os.path.exists(region_predicted_dir):
        # Index the files by unit name so each region's actual file is paired with its own predictions.
        region_actual_files = {f.replace('_cases_metrics.tsv', ''): f
                               for f in os.listdir(region_actual_dir) if f.endswith('_cases_metrics.tsv')}
        region_predicted_files = {f.replace('_cases_predictions.tsv', ''): f
                                  for f in os.listdir(region_predicted_dir) if f.endswith('_cases_predictions.tsv')}

        # Merge each region separately, then pool the merged rows.
        # Concatenating first and merging afterwards on (Año, Mes, violence type) would pair every
        # region's actual rows with every other region's predictions for the same month.
        merged_regions = []
        for unit_name in sorted(set(region_actual_files) & set(region_predicted_files)):
            try:
                df_actual = pd.read_csv(os.path.join(region_actual_dir, region_actual_files[unit_name]), sep='\t')
                df_predicted = pd.read_csv(os.path.join(region_predicted_dir, region_predicted_files[unit_name]), sep='\t')
                all_regions_actual.append(df_actual)
                all_regions_predicted.append(df_predicted)
                merged_unit = align_and_merge_data(df_actual, df_predicted)
                if not merged_unit.empty:
                    merged_regions.append(merged_unit)
            except Exception as e:
                print(f"Error loading or merging region files for {unit_name}: {e}")

        if all_regions_actual and all_regions_predicted:
            merged_all_regions_data = (pd.concat(merged_regions, ignore_index=True)
                                       if merged_regions else pd.DataFrame())

            if not merged_all_regions_data.empty:
                output_buffer = io.StringIO()
                original_stdout = sys.stdout

                sys.stdout = output_buffer
                try:
                    for v_type in merged_all_regions_data['violence type'].unique():
                        print(f"\n----- Violence Type: {v_type} (Aggregated Regions) -----")
                        df_subset = merged_all_regions_data[merged_all_regions_data['violence type'] == v_type].copy()

                        if not df_subset.empty:
                            evaluate_predictions(df_subset['Previous State'], df_subset['Predicted_State'],
                                                 f"Previous State ({v_type})", labels=state_labels)
                            evaluate_predictions(df_subset['Intensity'], df_subset['Predicted_Intensity'],
                                                 f"Intensity ({v_type})", labels=intensity_escalation_labels)
                            evaluate_predictions(df_subset['Escalation'], df_subset['Predicted_Escalation'],
                                                 f"Escalation ({v_type})", labels=intensity_escalation_labels)
                        else:
                            print(f"No merged data for {v_type} in Aggregated Regions evaluation.")
                finally:
                    # Restore stdout even if an evaluation step raises.
                    sys.stdout = original_stdout

                output_filename = os.path.join(evaluation_results_base_dir, 'regions_global_evaluation.txt')
                with open(output_filename, 'w') as f:
                    f.write(output_buffer.getvalue())
                print(f"Saved Aggregated Region-level evaluation to: {output_filename}")
            else:
                print("No merged data for Aggregated Region-level evaluation.")
        else:
            print("No actual or predicted files found for Regions. Skipping aggregation.")
    else:
        print(f"Region data directories not found: {region_actual_dir} or {region_predicted_dir}")

    print("\n--- Overall Aggregated Global Evaluation Finished ---")




--- Starting Aggregated Global Evaluation and Saving Results ---

=== Aggregated Evaluation for Country: Colombia ===
Saved Country-level global evaluation to: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/cases/country_global_evaluation.txt

=== Aggregated Evaluation for All Departments Combined ===
Saved Aggregated Department-level evaluation to: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/cases/departments_global_evaluation.txt

=== Aggregated Evaluation for All Regions Combined ===
Saved Aggregated Region-level evaluation to: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/evaluation/cases/regions_global_evaluation.txt

--- Overall Aggregated Global Evaluation Finished ---
