# Notebook: Violence Dynamics Prediction (Mode-Based and Combined States Approach)

## Introduction
This Jupyter Notebook implements a predictive model for the dynamics of Selective Violence (VS), Indiscriminate Violence (VI), and Collective Violence (VC) in Colombia. Using previously calculated historical Escalation, Intensity, and "Previous State" metrics, it predicts Intensity and Escalation for the next month from the modal behavior observed within a sliding time window.

The model is designed to be simple and interpretable, focusing on identifying dominant patterns in the recent past to infer the future behavior of violence, now with an additional level of granularity by armed actor type.

## Prediction Methodology
For each month to be predicted, the model follows these steps, adapted to geographical granularity and actor type:

1. Data Loading and Context
Historical metrics data (Escalation, Intensity, and "Previous State") generated by the 02_Intensity_Escalation_Cases.ipynb notebook are loaded. These data are now organized by geographical level (Country, Department, Region) and, crucially, by armed actor type (State Actors, Non-State Actors, Unknown Actor Type).

2. Creation of "Combined States" (for Department and Region)
Department Level: For each department, its "Previous State" is merged with the corresponding country's "Previous State". This creates a new "Previous State, Country+Department" that reflects the combined context.

Region Level: For each region, its "Previous State" is merged with the "Previous State" of the department to which it belongs (using the region-to-department mapping). This creates a new "Previous State, Department+Region".

3. Prediction Logic with Historical Fallback
The prediction of Intensity and Escalation is based on the "Previous State" (simple or combined) of the immediately preceding month (t-1). For month t:

Identification of State t-1: The "State" value from month t-1 is taken. Depending on the geographical level, this is "Previous State", "Previous State, Country+Department", or "Previous State, Department+Region".

Sliding Window Analysis (18 months): The model counts how many times the "State" from month t-1 appears within the last 18 months of historical data.

Data Subset Decision:

If the State appears more than once in the window: The prediction of Intensity and Escalation for month t is based on the mode of those values within the 18-month window, filtered by that specific State.

If the State appears at most once in the window: The prediction of Intensity and Escalation for month t is based on the mode of those values in the entire historical data available up to month t-1, filtered by that specific State.

Prediction: The mode of the Intensity and Escalation values is calculated within the chosen data subset. It is important to note that the "State" itself is not an output prediction, but rather a basis for the Intensity and Escalation predictions.
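The decision rule above can be illustrated with a minimal sketch on toy data. The helper name `predict_from_state`, the three-month window, and the values in `toy` are assumptions made for this example; they are not the notebook's own function or data.

```python
import pandas as pd

def predict_from_state(df, i, window_size, state_col="Previous State"):
    """Return (intensity, escalation) predicted for position i of a sorted frame."""
    # State observed in the month immediately before the target month.
    state = df.loc[i - 1, state_col]
    # Rows of the sliding window that share that state.
    window = df.iloc[i - window_size:i]
    matches = window[window[state_col] == state]
    if len(matches) <= 1:
        # Fallback: use every earlier month that shares the state.
        history = df.iloc[:i]
        matches = history[history[state_col] == state]
    # Series.mode() returns sorted values, so ties resolve to the smallest one.
    intensity = int(matches["Intensity"].mode().iloc[0])
    escalation = int(matches["Escalation"].mode().iloc[0])
    return intensity, escalation

# Invented five-month series, already in chronological order.
toy = pd.DataFrame({
    "Previous State": ["A", "A", "E", "E", "A"],
    "Intensity": [1, 1, 0, 0, -1],
    "Escalation": [1, 1, 0, 0, 0],
})

# State "E" appears twice in the window, so the window itself is used.
window_based = predict_from_state(toy, 4, window_size=3)
# State "A" appears once in the window, so the full history is used.
history_based = predict_from_state(toy, 5, window_size=3)
print(window_based, history_based)
```

In the second call the window holds a single "A" row, so the sketch widens the search to all earlier months, which is the fallback behavior described in the steps above.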

## Geographical Scope and Actor Type
This prediction process is applied, and results are generated, for the following granularities:

Country: Colombia.

Departments: Each analyzed department.

Regions: Each defined region in the study.

Additionally, these predictions are generated for each of the armed actor types:

State Actors

Non-State Actors

Unknown Actor Type
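As a rough sketch of how these combinations could be enumerated, the snippet below pairs each geographical level with each actor type and builds one directory path per pair. The folder names follow the ones used later in this notebook; the `input_dirs` helper and the `"Results"` base directory are assumptions for illustration only.

```python
import os
from itertools import product

# Folder names as they appear in the notebook's configuration cell.
LEVELS = ["country", "department", "region"]
ACTOR_TYPES = ["state_actor", "non_state_actor", "unknown_actor_type"]

def input_dirs(base_dir):
    """Build one directory path per (level, actor type) pair."""
    return [
        os.path.join(base_dir, level, actor)
        for level, actor in product(LEVELS, ACTOR_TYPES)
    ]

dirs = input_dirs("Results")  # hypothetical base directory
for path in dirs:
    print(path)
```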

## Data Input and Output
Input: .tsv files containing violence counts by type (VS, VI, VC) with Escalation, Intensity, and "Previous State" columns, generated by the 02_Intensity_Escalation_Cases.ipynb notebook. These files are structured by geographical level and actor type.

Output: The notebook generates new .tsv files with predictions for "Intensity" and "Escalation" for each future month, by each geographical level and actor type. An explicit "state prediction" is not saved as an output column, as its purpose is solely to guide Intensity and Escalation predictions.
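A minimal sketch of the output step, assuming a small invented predictions table: the state column is dropped and the remaining columns are written as tab-separated text. An in-memory buffer stands in for a real file so the example stays self-contained.

```python
import io
import pandas as pd

# Invented prediction rows, for illustration only.
predictions = pd.DataFrame({
    "Año": [2021, 2021],
    "Mes": [7, 8],
    "violence type": ["VS", "VS"],
    "Predicted_State": ["E", "B"],
    "Predicted_Intensity": [0, 1],
    "Predicted_Escalation": [0, -1],
})

# The state only guides the prediction, so it is dropped before saving.
output = predictions.drop(columns=["Predicted_State"])

# Write tab-separated text to an in-memory buffer instead of a real file.
buffer = io.StringIO()
output.to_csv(buffer, sep="\t", index=False)

# Read it back with the same separator used for the input files.
buffer.seek(0)
restored = pd.read_csv(buffer, sep="\t")
print(list(restored.columns))
```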

This notebook aims to provide informed forecasting of violence dynamics, considering both its recent behavior and broader historical context, and disaggregating this information by responsible actors.

### 1. Initial Setup, Library Imports, and Path Configuration
This block performs the initial setup for the prediction notebook. It includes importing all necessary Python libraries for data handling, file system operations, and numerical computations. It defines the base input directories where the pre-processed data (with Escalation, Intensity, and Previous State, segmented by ActorType) is located, and the base output directory where the prediction results will be saved.

Crucially, this section now also includes a mapping of regions to their corresponding departments. This mapping is essential for implementing the combined prediction logic at the regional level, ensuring that regional predictions correctly incorporate the context of their parent department's state.
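One detail worth keeping in mind is that several regions belong to more than one department. The sketch below uses a two-department excerpt of the mapping to show what a simple inversion does in that case: the last department processed wins. The `normalize` helper and the `all_parents` alternative are assumptions added for illustration and are not part of the notebook.

```python
# Two-department excerpt of the department -> regions dictionary.
raw = {
    "CALDAS": ["EJE CAFETERO", "MAGDALENA MEDIO"],
    "QUINDIO": ["EJE CAFETERO"],
}

def normalize(name):
    """Lowercase the name and remove spaces, as the lookup keys do."""
    return name.lower().replace(" ", "")

single_parent = {}  # region -> one department (later entries overwrite earlier ones)
all_parents = {}    # region -> every department that lists it (hypothetical alternative)
for dept, regions in raw.items():
    for region in regions:
        key = normalize(region)
        single_parent[key] = normalize(dept)
        all_parents.setdefault(key, []).append(normalize(dept))

print(single_parent["ejecafetero"])
print(all_parents["ejecafetero"])
```

With a one-to-one dictionary, "EJE CAFETERO" resolves to QUINDIO only because QUINDIO is processed after CALDAS. Keeping a list of parents is one possible way to make that ambiguity visible.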

In [6]:
# 1. Initial Setup, Library Imports, and Path Configuration

import pandas as pd
import os
import numpy as np

# Define the base input directory where processed data (with metrics) is stored.
# This directory now contains subfolders for 'country', 'department', 'region',
# and within those, one subfolder per actor type (see ACTOR_TYPES_SUBFOLDERS below).
base_input_data_dir = os.path.join(os.getcwd(), '..', 'Results', 'intensity & escalation', 'cases')

# Define the base output directory for prediction results.
# Prediction results will be saved following a similar structure.
prediction_results_base_dir = os.path.join(os.getcwd(), '..', 'Results', 'predictions', 'cases')

# List of actor types subfolders to iterate through.
ACTOR_TYPES_SUBFOLDERS = ['state_actor', 'non_state_actor', 'unknown_actor_type']

# Define the window size for the modal prediction.
PREDICTION_WINDOW_SIZE = 18 # Months

# Define the mapping for [Intensity, Escalation] pairs to states (for reference).
state_mapping = {
    (1, 1): 'A',
    (1, 0): 'B',
    (1, -1): 'C',
    (0, 1): 'D',
    (0, 0): 'E',
    (0, -1): 'F',
    (-1, 1): 'G',
    (-1, 0): 'H',
    (-1, -1): 'I'
}

# Mapping of Regions to their corresponding Departments.
# Despite its name, this dictionary maps each department to the list of its regions.
# It is inverted below for easier lookup of a region's parent department.
region_to_department_raw_mapping = {
    "AMAZONAS": ["AMAZONIA SUR-ORIENTAL"],
    "ANTIOQUIA": ["NORTE DE ANTIOQUIA", "URABA", "ORIENTE ANTIOQUEÑO", "NORDESTE ANTIOQUEÑO", "BAJO CAUCA ANTIOQUEÑO", "VALLE DE ABURRA", "SUROESTE ANTIOQUEÑO", "MAGDALENA MEDIO", "LA MOJANA", "OCCIDENTE ANTIOQUEÑO", "MAGDALENA MEDIO ANTIOQUEÑO"],
    "ARAUCA": ["LLANOS ORIENTALES", "SARARE"],
    "ARCHIPIELAGO DE SAN ANDRES, PROVIDENCIA Y SANTA CATALINA": ["ARCHIPIELAGO DE SAN ANDRES, PROVIDENCIA Y SANTA CATALINA"],
    "ATLANTICO": ["CANAL DEL DIQUE", "NORTE DE ATLANTICO", "CENTRO ORIENTE DE ATLANTICO", "OCCIDENTE DE ATLANTICO"],
    "BOGOTA, D. C.": ["AREA METROPOLITANA DE BOGOTA"],
    "BOLIVAR": ["MONTES DE MARIA", "MAGDALENA MEDIO", "CANAL DEL DIQUE", "LA MOJANA", "SUR DE BOLIVAR"],
    "BOYACA": ["MAGDALENA MEDIO", "ALTIPLANO CUNDIBOYACENSE", "ORIENTE DE BOYACA"],
    "CALDAS": ["EJE CAFETERO", "MAGDALENA MEDIO"],
    "CAQUETA": ["FLORENCIA Y AREA DE INFLUENCIA", "CAGUAN", "AMAZONIA SUR-ORIENTAL"],
    "CASANARE": ["PIEDEMONTE LLANERO", "LLANOS ORIENTALES"],
    "CAUCA": ["PATIA", "MACIZO COLOMBIANO", "NORTE DEL CAUCA", "ANDEN PACIFICO SUR"],
    "CESAR": ["SIERRA NEVADA DE SANTA MARTA", "SUR DE CESAR", "SERRANIA DEL PERIJA", "MAGDALENA MEDIO"],
    "CHOCO": ["ATRATO", "URABA", "VALLE DE SAN JUAN", "LITORAL PACIFICO"],
    "CORDOBA": ["ALTO SINU Y SAN JORGE", "LA MOJANA", "URABA", "NORTE DE CORDOBA"],
    "CUNDINAMARCA": ["SUROCCIDENTE DE CUNDINAMARCA", "MAGDALENA MEDIO", "AREA METROPOLITANA DE BOGOTA", "NOROCCIDENTE DE CUNDINAMARCA", "PIEDEMONTE LLANERO", "ALTIPLANO CUNDIBOYACENSE", "SUMAPAZ"],
    "GUAINIA": ["AMAZONIA SUR-ORIENTAL"],
    "GUAVIARE": ["ARIARI GUAYABERO"],
    "HUILA": ["SUR DEL HUILA", "NORTE DEL HUILA", "MACIZO COLOMBIANO", "CENTRO DEL HUILA"],
    "LA GUAJIRA": ["SIERRA NEVADA DE SANTA MARTA", "ALTA GUAJIRA", "SERRANIA DEL PERIJA"],
    "MAGDALENA": ["SIERRA NEVADA DE SANTA MARTA", "CIENAGA GRANDE DE SANTA MARTA", "SUR DE MAGDALENA"],
    "META": ["ARIARI GUAYABERO", "ALTILLANURA", "PIEDEMONTE LLANERO"],
    "NARIÑO": ["ANDEN PACIFICO SUR", "PATIA", "CENTRO DE NARIÑO", "OCCIDENTE DE NARIÑO", "SUR DE NARIÑO", "CENTRO OCCIDENTE DE NARIÑO", "NORTE DE NARIÑO"],
    "NORTE DE SANTANDER": ["PROVINCIA DE RICAURTE", "CATATUMBO", "AREA METROPOLITANA DE CUCUTA", "CENTRO DE NORTE DE SANTANDER", "SURORIENTE DE NORTE DE SANTANDER", "SUROCCIDENTE DE NORTE DE SANTANDER"],
    "PUTUMAYO": ["BAJO PUTUMAYO", "MEDIO PUTUMAYO", "ALTO PUTUMAYO"],
    "QUINDIO": ["EJE CAFETERO"],
    "RISARALDA": ["EJE CAFETERO"],
    "SANTANDER": ["MAGDALENA MEDIO", "PROVINCIA DE SOTO", "PROVINCIA COMUNERA", "PROVINCIA DE GARCIA ROVIRA", "PROVINCIA DE GUANENTA", "PROVINCIA DE VELEZ", "SERRANIA DE LOS YARIGUIES"],
    "SIN INFORMACION": ["SIN INFORMACION"],
    "SUCRE": ["MONTES DE MARIA", "MORROSQUILLO Y SABANAS DE SUCRE", "LA MOJANA"],
    "TOLIMA": ["SUR DEL TOLIMA", "NORTE DEL TOLIMA", "SUMAPAZ"],
    "VALLE DEL CAUCA": ["ANDEN PACIFICO SUR", "CENTRO DEL VALLE", "SUR DEL VALLE", "NORTE DEL VALLE"],
    "VAUPES": ["AMAZONIA SUR-ORIENTAL"],
    "VICHADA": ["ALTILLANURA"]
}

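# Invert the mapping to easily get the department from a region name.
# Note: a region listed under several departments (e.g. "MAGDALENA MEDIO")
# keeps only the last department iterated, because later entries overwrite earlier ones.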
region_to_department_map = {}
for dept, regions in region_to_department_raw_mapping.items():
    for region in regions:
        if pd.notna(region): # Only add valid region names
            # Standardize region name to lowercase without spaces for lookup consistency
            region_key = str(region).lower().replace(" ", "")
            # Standardize department name to lowercase without spaces for lookup consistency
            department_value = str(dept).lower().replace(" ", "")
            region_to_department_map[region_key] = department_value

print("Initial setup, library imports, and path configuration complete.")
print(f"Input data will be read from: {base_input_data_dir}")
print(f"Prediction results will be saved in: {prediction_results_base_dir}")
print(f"Prediction window size set to: {PREDICTION_WINDOW_SIZE} months.")
print("Region-to-Department mapping loaded and prepared.")


Initial setup, library imports, and path configuration complete.
Input data will be read from: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/intensity & escalation/cases
Prediction results will be saved in: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/predictions/cases
Prediction window size set to: 18 months.
Region-to-Department mapping loaded and prepared.


### 2. Define Prediction Logic Functions with Historical Fallback
This block defines the core functions for the prediction model, incorporating the new logic for handling "state" predictions.

get_mode_or_default(series, default_value=None): A helper function to find the mode of a pandas Series. If there are multiple modes, it picks the first one. If the series is empty, it returns a default value.

predict_next_state_and_dynamics(df_time_series, window_size, state_column_name='Previous State'): This is the main prediction function. It iterates through the time series, applying a sliding window logic. It now implements a conditional filtering approach:

If the "state" of month t-1 (the month immediately before the one being predicted) appears more than once in the last window_size months, the modes of 'Intensity' and 'Escalation' are calculated from the rows of that window where the state matches.

If that state appears at most once in the window, the modes of 'Intensity' and 'Escalation' are calculated from the entire historical data available up to month t-1, filtered by that specific state.

The state_column_name argument lets the function work with different state columns (e.g., 'Previous State' or 'Previous State, Country+Department').
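Because the helper picks the first mode, it helps to know how ties are ordered. A small sketch, assuming invented values and a hypothetical `first_mode` helper that mirrors the behavior described above: pandas returns modes in sorted order, so a tie resolves to the smallest value, and a series with no usable values falls back to the default.

```python
import pandas as pd

def first_mode(series, default=None):
    """Return the first mode of a Series, or `default` if there is none."""
    modes = series.mode()  # sorted values; NaN is ignored by default
    return modes.iloc[0] if not modes.empty else default

# -1 and 1 both appear twice, so the tie resolves to the smaller value.
tied = pd.Series([1, -1, 1, -1, 0])
tie_result = first_mode(tied, default=0)

# An empty series and an all-NaN series both yield the default.
empty_result = first_mode(pd.Series([], dtype="float64"), default=0)
nan_result = first_mode(pd.Series([float("nan")]), default=0)

print(tie_result, empty_result, nan_result)
```

For Intensity and Escalation values in {-1, 0, 1}, this means a tie is always settled in favor of the lower value, which is a modeling consequence of the tie-breaking rule rather than an explicit choice.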

In [7]:
# 2. Define Prediction Logic Functions

print("\n--- Defining Prediction Logic Functions ---")

def get_mode_or_default(series, default_value=None):
    """
    Calculates the mode of a pandas Series. If there are multiple modes,
    it returns the first one. If the series is empty, it returns a default value.

    Args:
        series (pd.Series): The input series.
        default_value: The value to return if the series is empty or has no mode.

    Returns:
        The mode of the series, or the default_value.
    """
    if series.empty:
        return default_value
    modes = series.mode()
    if not modes.empty:
        return modes.iloc[0] # Return the first mode if multiple exist
    return default_value # Reached when every value is NaN, since mode() ignores NaN by default

def predict_next_state_and_dynamics(df_time_series, window_size, state_column_name='Previous State'):
    """
    Predicts the 'State', 'Intensity', and 'Escalation' for the next month
    based on the modal behavior within a sliding window, with a historical fallback.

    Args:
        df_time_series (pd.DataFrame): A DataFrame containing 'Año', 'Mes',
                                       'Intensity', 'Escalation', and the specified
                                       'state_column_name' (e.g., 'Previous State',
                                       'Previous State, Country+Department').
                                       Assumes it's sorted chronologically.
        window_size (int): The number of months in the sliding window to look back.
        state_column_name (str): The name of the column that holds the 'state' to predict
                                 (e.g., 'Previous State', 'Previous State, Country+Department').

    Returns:
        pd.DataFrame: A DataFrame with 'Año', 'Mes', 'violence type',
                      'Predicted_State', 'Predicted_Intensity', 'Predicted_Escalation'.
                      'Predicted_State' echoes the value of 'state_column_name'
                      observed at month t-1 (the state used for filtering).
    """
    predictions = []
    # Ensure the DataFrame is sorted by Year and Mes for correct window slicing
    df_time_series = df_time_series.sort_values(by=['Año', 'Mes']).reset_index(drop=True)

    # We need to make sure the state_column_name exists in the DataFrame
    if state_column_name not in df_time_series.columns:
        print(f"Error: State column '{state_column_name}' not found in the input DataFrame.")
        return pd.DataFrame()

    # Iterate through the time series to make predictions for future months
    # The prediction for month 't' (current_index) uses data from 't-window_size' to 't-1'.
    # The `df_time_series.loc[i, ...]` refers to the month 'i' which is the target month for prediction.
    for i in range(window_size, len(df_time_series)):
        # Data for the current month 'i' (which we are trying to predict the state FOR)
        current_month_data = df_time_series.loc[i]
        current_month_year = current_month_data['Año']
        current_month_mes = current_month_data['Mes']
        violence_type = current_month_data['violence type']

        # Get the 'state' value from the month *just before* the prediction target (i.e., at index i-1)
        # This is the actual previous state from which we want to base our prediction.
        state_at_prev_month = df_time_series.loc[i-1, state_column_name]

        # Define the historical window (last 'window_size' months *before* the current month 'i')
        # This window contains data from index (i - window_size) up to (i - 1)
        historical_window = df_time_series.iloc[i - window_size : i].copy()

        if historical_window.empty or state_column_name not in historical_window.columns:
            print(f"Warning: Empty or invalid window for prediction at index {i}. Skipping prediction for this month.")
            continue

        # Check how many times the 'state_at_prev_month' appears in the current historical window
        state_occurrences_in_window = historical_window[state_column_name].value_counts().get(state_at_prev_month, 0)

        # Decide which data subset to use for finding modes based on the occurrence count
        if state_occurrences_in_window > 1:
            # If the state appears more than once, filter the window by that state
            data_for_mode_calculation = historical_window[historical_window[state_column_name] == state_at_prev_month].copy()
            # print(f"Using window filtered by state '{state_at_prev_month}' for prediction at {current_month_year}-{current_month_mes}")
        else:
            # If the state appears only once in the window (meaning only the (i-1) month has it, or less),
            # use the entire historical data up to month (i-1), filtered by that state.
            full_historical_data_up_to_prev = df_time_series.iloc[:i].copy()
            data_for_mode_calculation = full_historical_data_up_to_prev[full_historical_data_up_to_prev[state_column_name] == state_at_prev_month].copy()
            # print(f"Using full historical data filtered by state '{state_at_prev_month}' for prediction at {current_month_year}-{current_month_mes}")

        # If after filtering, there's still no data, use a default
        if data_for_mode_calculation.empty:
            predicted_state = state_at_prev_month # Predict the same state if no historical context
            predicted_intensity = 0 # Default if no data
            predicted_escalation = 0 # Default if no data
            # print(f"Warning: No historical context for state '{state_at_prev_month}'. Defaulting prediction.")
        else:
            # 1. Predict the 'State' for the next month
            # The 'Predicted_State' refers to the prediction of the state_column_name value.
            # We predict the state of the *current* month `i` based on the state `state_at_prev_month`
            # and the patterns of (I,E) values observed with that state.
            # Here, we can simply predict the 'state_at_prev_month' as the predicted state,
            # as the filtering is based on that specific state.
            predicted_state = state_at_prev_month

            # 2. Predict 'Intensity' and 'Escalation' for the next month
            predicted_intensity = get_mode_or_default(data_for_mode_calculation['Intensity'], default_value=0)
            predicted_escalation = get_mode_or_default(data_for_mode_calculation['Escalation'], default_value=0)

        predictions.append({
            'Año': current_month_year,
            'Mes': current_month_mes,
            'violence type': violence_type,
            'Predicted_State': predicted_state, # This is the prediction of the state_column_name
            'Predicted_Intensity': predicted_intensity,
            'Predicted_Escalation': predicted_escalation
        })

    return pd.DataFrame(predictions)

print("Prediction logic functions defined.")



--- Defining Prediction Logic Functions ---
Prediction logic functions defined.


### 3. Execute Prediction and Save Results (Including Actor Types and Combined States)
This block orchestrates the entire prediction process, now significantly enhanced to handle actor types and combined geographical states. It iterates through each geographical level (Country, Department, Region) and then through each defined ActorType (State Actor, Non-State Actor, Unknown).

For each specific combination of geographical unit, violence type (VS, VI, VC), and actor type, the process is as follows:

Data Loading: It loads the relevant historical metrics data (_cases_metrics.tsv) from the 02_Intensity_Escalation_Cases.ipynb output, now residing in actor-specific subfolders.

State Combination (for Department and Region levels):

Department Level: For each department, it also loads the corresponding Country-level metrics data. These two datasets are merged, and a new Previous State, Country+Department column is created by concatenating their individual Previous State values.

Region Level: Similarly, for each region, it loads the corresponding Department-level metrics data (for the region's specific parent department). These two datasets are merged, and a new Previous State, Department+Region column is created.

Prediction Execution: The predict_next_state_and_dynamics function (defined in Cell 2) is called. The state_column_name argument is dynamically adjusted based on the current geographical level: 'Previous State' for Country, 'Previous State, Country+Department' for Departments, and 'Previous State, Department+Region' for Regions.

Results Saving: The generated predictions are then saved into new TSV files within a corresponding output directory structure that maintains the geographical level and actor type segmentation.

This comprehensive execution ensures that predictions are made with the appropriate contextual state information at each level, accounting for the influence of broader geographical units where specified.
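The state combination step can be sketched on toy data as follows. The two small tables and their state letters are invented for illustration; the merge keys, suffixes, and underscore-joined state follow the approach described above.

```python
import pandas as pd

# Toy monthly series; values and state letters are invented.
country = pd.DataFrame({
    "Año": [2020, 2020, 2020],
    "Mes": [1, 2, 3],
    "violence type": ["VS", "VS", "VS"],
    "Previous State": ["A", "E", "E"],
})
department = pd.DataFrame({
    "Año": [2020, 2020, 2020],
    "Mes": [1, 2, 3],
    "violence type": ["VS", "VS", "VS"],
    "Intensity": [1, 0, -1],
    "Escalation": [0, 0, 1],
    "Previous State": ["B", "E", "G"],
})

# Align both levels on month and violence type; the overlapping
# "Previous State" column receives the suffixes given below.
merged = pd.merge(
    department,
    country[["Año", "Mes", "violence type", "Previous State"]],
    on=["Año", "Mes", "violence type"],
    how="inner",
    suffixes=("_dept", "_country"),
)

# Concatenate the broader unit's state with the local one.
merged["Previous State, Country+Department"] = (
    merged["Previous State_country"] + "_" + merged["Previous State_dept"]
)
combined_states = merged["Previous State, Country+Department"].tolist()
print(combined_states)
```

Because the merge is an inner join, months missing from either level drop out of the combined series, which can shorten the history available to the sliding window.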

In [8]:
# 3. Execute Prediction and Save Results

print("\n--- Executing Prediction and Saving Results (Including Actor Types and Combined States) ---")

# Ensure prediction window size, base directories, and prediction functions are defined
if 'PREDICTION_WINDOW_SIZE' not in globals() or \
   'predict_next_state_and_dynamics' not in globals() or \
   'ACTOR_TYPES_SUBFOLDERS' not in globals() or \
   'region_to_department_map' not in globals():
    print("Error: Required global variables or functions not found. Please run Cells 1 and 2.")
else:
    # Define the geographical levels for iteration
    # The 'input_subdir' refers to the base folder within 'base_input_data_dir' (e.g., 'country', 'department')
    # The 'output_subdir_base' refers to the base folder within 'prediction_results_base_dir'
    levels_info = [
        {'name': 'Country', 'input_subdir': 'country', 'output_subdir_base': 'country'},
        {'name': 'Department', 'input_subdir': 'department', 'output_subdir_base': 'department'},
        {'name': 'Region', 'input_subdir': 'region', 'output_subdir_base': 'region'}
    ]

    for level in levels_info:
        level_name = level['name']
        input_base_level_dir = os.path.join(base_input_data_dir, level['input_subdir'])
        output_base_level_dir = os.path.join(prediction_results_base_dir, level['output_subdir_base'])

        print("\n=======================================================")
        print(f"=== Starting Prediction for {level_name} Level ===")
        print("=======================================================")

        # Loop through each actor type subfolder
        for actor_type_subfolder in ACTOR_TYPES_SUBFOLDERS:
            current_input_dir = os.path.join(input_base_level_dir, actor_type_subfolder)
            current_output_dir_for_actor = os.path.join(output_base_level_dir, actor_type_subfolder)

            print(f"\n--- Processing {level_name} Level for Actor Type: {actor_type_subfolder} ---")

            # Create the output directory for this level and actor type if it doesn't exist
            os.makedirs(current_output_dir_for_actor, exist_ok=True)
            print(f"Ensured output directory exists: {current_output_dir_for_actor}")

            # Check if the input directory for the current actor type exists
            if not os.path.exists(current_input_dir):
                print(f"Error: Input data directory not found for {level_name} ({actor_type_subfolder}): {current_input_dir}. Skipping this actor type.")
                continue # Skip to the next actor type

            # List all TSV files in the input directory for this level and actor type
            processed_files = [f for f in os.listdir(current_input_dir) if f.endswith('_cases_metrics.tsv')]

            if not processed_files:
                print(f"Warning: No processed metrics TSV files found in {current_input_dir}. Skipping {level_name} ({actor_type_subfolder}) prediction.")
                continue # Skip to the next actor type or next level

            print(f"Found {len(processed_files)} processed {level_name} ({actor_type_subfolder}) files. Generating predictions for each...")

            for filename in processed_files:
                file_path = os.path.join(current_input_dir, filename)
                # Extract unit name from filename (e.g., 'colombia', 'antioquia', 'pacifica')
                unit_name = filename.replace("_cases_metrics.tsv", "")

                print(f"\nGenerating predictions for {level_name}: {unit_name} ({actor_type_subfolder}) from file: {os.path.basename(file_path)}")

                try:
                    df_unit_metrics = pd.read_csv(file_path, sep='\t')

                    # Ensure required columns are present for prediction
                    required_cols = ['Año', 'Mes', 'violence type', 'Intensity', 'Escalation', 'Previous State']
                    if not all(col in df_unit_metrics.columns for col in required_cols):
                        print(f"Error: Missing required columns in {os.path.basename(file_path)}: {required_cols}. Skipping prediction for this unit.")
                        continue

                    # Ensure data types are correct: coerce first, drop incomplete rows, then cast
                    df_unit_metrics['Año'] = pd.to_numeric(df_unit_metrics['Año'], errors='coerce')
                    df_unit_metrics['Mes'] = pd.to_numeric(df_unit_metrics['Mes'], errors='coerce')
                    df_unit_metrics['Intensity'] = pd.to_numeric(df_unit_metrics['Intensity'], errors='coerce').fillna(0).astype(int)
                    df_unit_metrics['Escalation'] = pd.to_numeric(df_unit_metrics['Escalation'], errors='coerce').fillna(0).astype(int)
                    df_unit_metrics = df_unit_metrics.dropna(subset=required_cols).copy()
                    # Cast after dropna: astype(int) raises on the NaN values that coercion can produce
                    df_unit_metrics = df_unit_metrics.astype({'Año': int, 'Mes': int})
                    df_unit_metrics = df_unit_metrics.sort_values(by=['Año', 'Mes']).reset_index(drop=True) # Always sort for safety

                    # Get unique violence types (VS, VI, VC) from the loaded data
                    unique_violence_types_in_file = df_unit_metrics['violence type'].unique()

                    all_predictions_for_unit = []

                    # --- Logic for Country, Department, and Region Level Predictions ---
                    if level_name == 'Country':
                        state_col_to_predict = 'Previous State'
                        for v_type in unique_violence_types_in_file:
                            df_type_series = df_unit_metrics[df_unit_metrics['violence type'] == v_type].copy()
                            if len(df_type_series) < PREDICTION_WINDOW_SIZE:
                                print(f"Warning: Not enough historical data for {v_type} in {unit_name} ({len(df_type_series)} months). Skipping prediction.")
                                continue
                            predictions_df_type = predict_next_state_and_dynamics(df_type_series, PREDICTION_WINDOW_SIZE, state_column_name=state_col_to_predict)
                            if not predictions_df_type.empty:
                                # Remove the 'Predicted_State' column as it's not a desired output
                                if 'Predicted_State' in predictions_df_type.columns:
                                    predictions_df_type.drop(columns=['Predicted_State'], inplace=True)
                                all_predictions_for_unit.append(predictions_df_type)

                    elif level_name == 'Department':
                        state_col_to_predict = 'Previous State, Country+Department'
                        # Load Country-level data for the same actor type and violence type
                        country_input_dir = os.path.join(base_input_data_dir, 'country', actor_type_subfolder)
                        country_filename = "colombia_cases_metrics.tsv" # Country filename is always 'colombia'
                        country_file_path = os.path.join(country_input_dir, country_filename)

                        if not os.path.exists(country_file_path):
                            print(f"Error: Country data not found for {actor_type_subfolder} at {country_file_path}. Skipping Department prediction for {unit_name}.")
                            continue

                        df_country_metrics = pd.read_csv(country_file_path, sep='\t')
                        df_country_metrics['Año'] = pd.to_numeric(df_country_metrics['Año'], errors='coerce')
                        df_country_metrics['Mes'] = pd.to_numeric(df_country_metrics['Mes'], errors='coerce')
                        df_country_metrics = df_country_metrics.dropna(subset=['Año', 'Mes', 'violence type', 'Previous State']).copy()
                        # Cast after dropna: astype(int) raises on the NaN values that coercion can produce
                        df_country_metrics = df_country_metrics.astype({'Año': int, 'Mes': int})
                        df_country_metrics = df_country_metrics.sort_values(by=['Año', 'Mes']).reset_index(drop=True)


                        for v_type in unique_violence_types_in_file:
                            df_dept_vtype = df_unit_metrics[df_unit_metrics['violence type'] == v_type].copy()
                            df_country_vtype = df_country_metrics[df_country_metrics['violence type'] == v_type].copy()

                            if df_dept_vtype.empty or df_country_vtype.empty:
                                print(f"Warning: Missing data for {v_type} in {unit_name} (Department) or Country. Skipping combined state prediction.")
                                continue

                            # Merge Department and Country data
                            merged_combined_df = pd.merge(
                                df_dept_vtype,
                                df_country_vtype[['Año', 'Mes', 'violence type', 'Previous State']],
                                on=['Año', 'Mes', 'violence type'],
                                how='inner',
                                suffixes=('_dept', '_country')
                            )

                            if merged_combined_df.empty:
                                print(f"Warning: Merged data for {v_type} in {unit_name} (Department) and Country is empty. Skipping.")
                                continue

                            # Create the combined state column
                            merged_combined_df[state_col_to_predict] = merged_combined_df['Previous State_country'] + '_' + merged_combined_df['Previous State_dept']

                            if len(merged_combined_df) < PREDICTION_WINDOW_SIZE:
                                print(f"Warning: Not enough historical data for {v_type} in {unit_name} (Department) after merging for combined state ({len(merged_combined_df)} months). Skipping prediction.")
                                continue

                            predictions_df_type = predict_next_state_and_dynamics(merged_combined_df, PREDICTION_WINDOW_SIZE, state_column_name=state_col_to_predict)
                            if not predictions_df_type.empty:
                                # Remove the 'Predicted_State' column as it's not a desired output
                                if 'Predicted_State' in predictions_df_type.columns: # It will be 'Predicted_State' initially from the function
                                    predictions_df_type.drop(columns=['Predicted_State'], inplace=True)
                                # No need to rename 'Predicted_State' to 'Predicted_Combined_State' if we're dropping it.
                                all_predictions_for_unit.append(predictions_df_type)


                    elif level_name == 'Region':
                        state_col_to_predict = 'Previous State, Department+Region'

                        # Determine the parent department for this region
                        region_key_clean = unit_name.lower().replace(" ", "")
                        parent_department_name = region_to_department_map.get(region_key_clean)

                        if parent_department_name is None:
                            print(f"Error: Parent department not found for region '{unit_name}'. Skipping Region prediction.")
                            continue

                        # Load Department-level data for the parent department, same actor type and violence type
                        department_input_dir = os.path.join(base_input_data_dir, 'department', actor_type_subfolder)
                        # The department filename also needs to be clean: lowercase, no spaces
                        department_filename = f"{parent_department_name.replace(' ', '').lower()}_cases_metrics.tsv"
                        department_file_path = os.path.join(department_input_dir, department_filename)

                        if not os.path.exists(department_file_path):
                            print(f"Error: Department data not found for parent '{parent_department_name}' ({actor_type_subfolder}) at {department_file_path}. Skipping Region prediction for {unit_name}.")
                            continue

                        df_dept_metrics = pd.read_csv(department_file_path, sep='\t')
                        # Coerce to numeric and drop incomplete rows before the integer cast (casting NaN to int raises)
                        df_dept_metrics['Año'] = pd.to_numeric(df_dept_metrics['Año'], errors='coerce')
                        df_dept_metrics['Mes'] = pd.to_numeric(df_dept_metrics['Mes'], errors='coerce')
                        df_dept_metrics = df_dept_metrics.dropna(subset=['Año', 'Mes', 'violence type', 'Previous State'])
                        df_dept_metrics = df_dept_metrics.astype({'Año': int, 'Mes': int})
                        df_dept_metrics = df_dept_metrics.sort_values(by=['Año', 'Mes']).reset_index(drop=True)


                        for v_type in unique_violence_types_in_file:
                            df_region_vtype = df_unit_metrics[df_unit_metrics['violence type'] == v_type].copy()
                            df_dept_vtype = df_dept_metrics[df_dept_metrics['violence type'] == v_type].copy()

                            if df_region_vtype.empty or df_dept_vtype.empty:
                                print(f"Warning: Missing data for {v_type} in {unit_name} (Region) or its parent department {parent_department_name}. Skipping combined state prediction.")
                                continue

                            # Merge Region and Department data
                            merged_combined_df = pd.merge(
                                df_region_vtype,
                                df_dept_vtype[['Año', 'Mes', 'violence type', 'Previous State']],
                                on=['Año', 'Mes', 'violence type'],
                                how='inner',
                                suffixes=('_region', '_dept')
                            )

                            if merged_combined_df.empty:
                                print(f"Warning: Merged data for {v_type} in {unit_name} (Region) and Department {parent_department_name} is empty. Skipping.")
                                continue

                            # Create the combined state column
                            merged_combined_df[state_col_to_predict] = merged_combined_df['Previous State_dept'] + '_' + merged_combined_df['Previous State_region']

                            if len(merged_combined_df) < PREDICTION_WINDOW_SIZE:
                                print(f"Warning: Not enough historical data for {v_type} in {unit_name} (Region) after merging for combined state ({len(merged_combined_df)} months). Skipping prediction.")
                                continue

                            predictions_df_type = predict_next_state_and_dynamics(merged_combined_df, PREDICTION_WINDOW_SIZE, state_column_name=state_col_to_predict)
                            if not predictions_df_type.empty:
                                # 'Predicted_State' is an intermediate column returned by the function, not a desired output
                                predictions_df_type = predictions_df_type.drop(columns=['Predicted_State'], errors='ignore')
                                all_predictions_for_unit.append(predictions_df_type)

                    # Concatenate all predictions for the current unit (across VS, VI, VC)
                    if all_predictions_for_unit:
                        final_predictions_df = pd.concat(all_predictions_for_unit, ignore_index=True)
                        # 'Predicted_State' has already been dropped, so the only prediction columns
                        # in the output are 'Predicted_Intensity' and 'Predicted_Escalation'.
                        final_predictions_df = final_predictions_df.sort_values(by=['Año', 'Mes', 'violence type']).reset_index(drop=True)

                        # --- Save the predictions to TSV ---
                        # Generate the output filename: unit name + "_cases_predictions.tsv"
                        output_filename = unit_name.lower().replace(" ", "") + "_cases_predictions.tsv"
                        # The output directory includes the level and actor type
                        output_path = os.path.join(current_output_dir_for_actor, output_filename)

                        try:
                            final_predictions_df.to_csv(output_path, sep='\t', index=False)
                            print(f"Saved predictions for {level_name}: {unit_name} ({actor_type_subfolder}) to {output_filename}")
                        except Exception as e:
                            print(f"Error saving predictions for {level_name}: {unit_name} ({actor_type_subfolder}) to {output_filename}: {e}")
                    else:
                        print(f"No predictions generated for {level_name}: {unit_name} ({actor_type_subfolder}). Skipping save.")

                except Exception as e:
                    print(f"An unexpected error occurred during processing file {os.path.basename(file_path)}: {e}")

            print(f"\n{level_name}-level prediction calculation and saving complete for Actor Type: {actor_type_subfolder}.")

    print("\n--- Overall Prediction Process Finished ---")



--- Executing Prediction and Saving Results (Including Actor Types and Combined States) ---

=== Starting Prediction for Country Level ===

--- Processing Country Level for Actor Type: state_actor ---
Ensured output directory exists: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/predictions/cases/country/state_actor
Found 1 processed Country (state_actor) files. Generating predictions for each...

Generating predictions for Country: colombia (state_actor) from file: colombia_cases_metrics.tsv
Saved predictions for Country: colombia (state_actor) to colombia_cases_predictions.tsv

Country-level prediction calculation and saving complete for Actor Type: state_actor.

--- Processing Country Level for Actor Type: non_state_actor ---
Ensured output directory exists: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/predictions/cases/country/non_state_actor
Found 1 processed Country (non_state_actor) files. Generating predictions for e