# Calculation of Intensity and Escalation Metrics for Cases (VS/VI/VC)

## Project: VS_VI_Source_Code
This Jupyter Notebook is part of the VS_VI_Source_Code project, which aims to analyze the dynamics of Selective Violence (VS) and Indiscriminate Violence (VI) in Colombia.

## Purpose
The primary purpose of this notebook is to calculate the key temporal metrics (Escalation and Intensity) for pre-processed monthly case counts of Selective Violence (VS), Indiscriminate Violence (VI), and their sum (Violencia Colectiva, VC). These calculations are performed for data at the Country, Department, and Region levels, using data files generated in previous processing steps. The calculated metrics are then saved for subsequent analysis and visualization.

## Workflow
This notebook focuses on the Metric Calculation and Results Saving stages of the project's analytical workflow. It assumes that the monthly case counts per violence type (VS/VI) for each geographical unit have already been processed and saved in the ../Data/processed/cases/ directory structure.

## About
This notebook implements the logic to read structured time series data (monthly case counts for VS and VI) from files, calculate the total collective violence (VC), apply the defined Escalation and Intensity functions to the time series of VS, VI, and VC, and store the final results including these metrics.
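
As a rough illustration of the VC step described above, the following sketch sums VS and VI counts per month on a small, made-up table. The column names ('Año', 'Mes', 'violence type', 'CaseCount') are the ones the notebook reads; the data values and the variable names `toy_counts`, `wide`, and `vc_long` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data in the long format the notebook reads:
# one row per (Año, Mes, violence type) with a monthly CaseCount.
toy_counts = pd.DataFrame({
    "Año": [2000, 2000, 2000, 2000],
    "Mes": [1, 1, 2, 2],
    "violence type": ["VS", "VI", "VS", "VI"],
    "CaseCount": [3, 1, 0, 4],
})

# Pivot to one column per violence type (missing combinations become 0),
# then add the two columns to obtain VC = VS + VI for each month.
wide = toy_counts.pivot_table(
    index=["Año", "Mes"],
    columns="violence type",
    values="CaseCount",
    aggfunc="sum",
    fill_value=0,
)
wide["VC"] = wide["VS"] + wide["VI"]

# Return to long format so VC can be handled like VS and VI.
vc_long = wide["VC"].rename("CaseCount").reset_index()
vc_long["violence type"] = "VC"
print(vc_long)
```

Keeping VC in the same long format as VS and VI means the same metric functions can be applied to all three series without special cases.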

### 1. Initial Setup, Library Imports, and Path Configuration
This block performs the initial setup for the notebook environment. It imports the Python libraries required for data handling, file system operations, and numerical computations (itertools.product is also imported, although the VC calculation in this notebook does not use it). It then defines the relative paths for the input directories containing the pre-processed monthly case counts (VS/VI) for each geographical level and the base output directory where the final results with calculated metrics will be stored.

In [4]:
# 1. Initial Setup, Library Imports, and Path Configuration

import pandas as pd
import os
import numpy as np
from itertools import product # Needed for potential future use or clarity, though VC calculation is simpler here

# Define the base output directory for results.
# Results for different levels (country, department, region) will go into subfolders here.
# Assumes the notebook is in a subfolder (e.g., 'notebooks') and results go one level up in 'Results/intensity & escalation/cases'.
# Adjust '../' if your actual folder structure is different.
# Example structure:
# Project_Root/
# ├── Data/
# │   └── processed/
# │       └── cases/
# │           ├── country/    # Processed country data (Input)
# │           ├── departments/ # Processed department data (Input)
# │           └── regions/    # Processed region data (Input)
# ├── Results/
# │   └── intensity & escalation/
# │       └── cases/    # Results will be saved here (Output)
# └── notebooks/        # This notebook is here
base_results_dir = os.path.join(os.getcwd(), '..', 'Results', 'intensity & escalation', 'cases')

# Define the input directories for processed data (monthly counts per type).
# These directories should contain the TSV files generated in previous steps (Cells 4, 5, and 6 of the other notebook).
processed_data_base_dir = os.path.join(os.getcwd(), '..', 'Data', 'processed', 'cases')
processed_data_country_dir = os.path.join(processed_data_base_dir, 'country')
processed_data_dept_dir = os.path.join(processed_data_base_dir, 'departments')
processed_data_region_dir = os.path.join(processed_data_base_dir, 'regions')

# Define the year range (for context, should match the processed data files).
min_year = 1958
max_year = 2022

print("Initial setup, library imports, and path configuration complete.")
print(f"Processed data will be read from: {processed_data_base_dir}")
print(f"Results will be saved in: {base_results_dir}")


Initial setup, library imports, and path configuration complete.
Processed data will be read from: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Data/processed/cases
Results will be saved in: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/intensity & escalation/cases


### 2. Define Metric Calculation Functions
This block defines the core Python functions used to calculate the temporal dynamics metrics: Escalation and Intensity. Each function operates on a single time series (a pandas Series or DataFrame column) of monthly counts.

- `calculate_escalation(series)`: computes the Escalation metric by comparing each month's count to the previous month's count.
- `calculate_intensity(series, window=36)`: computes the Intensity metric by comparing each month's count to the mean count over a rolling window (36 months by default). As implemented with `series.rolling(window=window).mean()`, the window ends at, and includes, the current month.

Both functions return a series of the same length as the input, with values -1, 0, or 1 representing a decrease, no change, or an increase relative to the comparison point. Initial months where the comparison is not possible (the first month for Escalation, the first `window - 1` months for Intensity) are filled with 0.
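
To make the two definitions concrete, here is a minimal sketch on a made-up five-month series. The helper names `escalation` and `intensity` are shortened stand-ins for the notebook's functions, and the 3-month window is chosen only so that the toy series is long enough to produce non-zero Intensity values; the notebook's default is 36.

```python
import numpy as np
import pandas as pd

def escalation(series):
    # Sign of the month-over-month change; the first month has no predecessor -> 0.
    return np.sign(series - series.shift(1)).fillna(0).astype(int)

def intensity(series, window=36):
    # Sign of (current count - rolling mean). pandas' rolling window includes
    # the current observation; months without a full window are set to 0.
    rolling_mean = series.rolling(window=window).mean()
    return np.sign(series - rolling_mean).fillna(0).astype(int)

counts = pd.Series([2, 5, 5, 1, 4])  # hypothetical monthly counts

esc = escalation(counts)
inten = intensity(counts, window=3)

print(esc.tolist())    # [0, 1, 0, -1, 1]
print(inten.tolist())  # [0, 0, 1, -1, 1]
```

In the third month, for example, the 3-month mean is (2 + 5 + 5) / 3 = 4, so a count of 5 yields Intensity 1, while Escalation is 0 because the count did not change from the previous month.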

In [5]:
# 2. Define Metric Calculation Functions

print("\n--- Defining Metric Calculation Functions ---")

def calculate_escalation(series):
    """
    Calculates the Escalation metric for a time series.
    Escalation = 1 if current > previous, 0 if current == previous, -1 if current < previous.
    Handles the first month (no previous month) by setting Escalation to 0.

    Args:
        series (pd.Series): A pandas Series representing a time series of counts.

    Returns:
        pd.Series: A pandas Series of the same length with Escalation values (-1, 0, 1).
    """
    # Shift the series to get the previous month's value
    previous_month = series.shift(1)

    # Calculate the difference
    difference = series - previous_month

    # Apply the conditions using numpy.sign to get -1, 0, or 1
    # Fill NaN for the first month's escalation (no previous data) with 0
    escalation = np.sign(difference).fillna(0).astype(int)

    return escalation

def calculate_intensity(series, window=36):
    """
    Calculates the Intensity metric for a time series.
    Intensity = 1 if current > rolling mean, 0 if current == rolling mean, -1 if current < rolling mean.
    The rolling mean covers 'window' months (36 by default) ending at, and including, the current month.
    Months without a full window of data are set to 0.

    Args:
        series (pd.Series): A pandas Series representing a time series of counts.
        window (int): The number of months (current month included) used for the rolling mean calculation.

    Returns:
        pd.Series: A pandas Series of the same length with Intensity values (-1, 0, 1).
    """
    # Calculate the rolling mean over the specified window
    rolling_mean = series.rolling(window=window).mean()

    # Calculate the difference between current and rolling mean
    difference = series - rolling_mean

    # Apply the conditions using numpy.sign to get -1, 0, or 1
    # Fill NaN for the first 'window - 1' months (not enough data for mean) with 0
    intensity = np.sign(difference).fillna(0).astype(int)

    return intensity

print("Metric calculation functions defined.")



--- Defining Metric Calculation Functions ---
Metric calculation functions defined.


### 3. Calculate Intensity and Escalation Metrics from Processed Data with Previous State
This block calculates the monthly Escalation and Intensity metrics for the Selective Violence (VS), Indiscriminate Violence (VI), and combined (Violencia Colectiva, VC) case counts from 1958 to 2022. It loads the pre-processed monthly case counts (with the VS/VI breakdown) for the Country, Department, and Region levels from their respective directories under ../Data/processed/cases/. For each file, the script calculates VC by summing the existing VS and VI counts, applies the metric functions (defined in Cell 2) to the VS, VI, and VC time series, and adds a "Previous State" column. This column holds the combined (Intensity, Escalation) state of the immediately preceding month, mapped to a letter code (A-I); the first month of each series, which has no predecessor, is labeled 'Start'.

Finally, the resulting DataFrames (counts, metrics, and Previous State) are saved as separate TSV files under ../Results/intensity & escalation/cases/. This process relies on the .tsv files generated in the previous data processing notebook (specifically, the outputs of its Cells 4, 5, and 6).
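
The "Previous State" logic can be sketched on a few made-up rows as follows. The letter mapping and the 'Start' label mirror the ones used in this notebook's code; the metric values and the names `toy` and `current_state` are hypothetical.

```python
import pandas as pd

# (Intensity, Escalation) -> state letter, as used in this notebook.
state_mapping = {
    (1, 1): "A", (1, 0): "B", (1, -1): "C",
    (0, 1): "D", (0, 0): "E", (0, -1): "F",
    (-1, 1): "G", (-1, 0): "H", (-1, -1): "I",
}

# Hypothetical metric values for four consecutive months of one series.
toy = pd.DataFrame({
    "Intensity":  [0, 1, 1, -1],
    "Escalation": [0, 1, -1, 0],
})

# State of each month, looked up from its (Intensity, Escalation) pair.
pairs = zip(toy["Intensity"].tolist(), toy["Escalation"].tolist())
current_state = pd.Series(
    [state_mapping.get(pair, "Unknown") for pair in pairs],
    index=toy.index,
)

# Shift by one month: each row receives the state of the month before it.
toy["Previous State"] = current_state.shift(1).fillna("Start")
print(toy)
```

Here the monthly states are E, A, C, H, so the "Previous State" column reads Start, E, A, C. The shift has to be applied per violence type on a chronologically sorted series, otherwise states would leak between VS, VI, and VC rows.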

In [6]:
# 3. Calculate and Save Metrics from Processed Data

print("\n--- Calculating and Saving Metrics from Processed Data ---")

# Define the base output directory for results
base_results_dir = os.path.join(os.getcwd(), '..', 'Results', 'intensity & escalation', 'cases')

# Define the input directories for processed data (from cells 4, 5 and 6)
processed_data_base_dir = os.path.join(os.getcwd(), '..', 'Data', 'processed', 'cases')
processed_data_country_dir = os.path.join(processed_data_base_dir, 'country')
processed_data_dept_dir = os.path.join(processed_data_base_dir, 'departments')
processed_data_region_dir = os.path.join(processed_data_base_dir, 'regions')

# Ensure metric calculation functions are defined (from Cell 2)
if 'calculate_escalation' not in globals() or 'calculate_intensity' not in globals():
    print("Error: Metric calculation functions (calculate_escalation, calculate_intensity) not found. Please run Cell 2.")
else:

    # Define the mapping for [Intensity, Escalation] pairs to states
    state_mapping = {
        (1, 1): 'A',
        (1, 0): 'B',
        (1, -1): 'C',
        (0, 1): 'D',
        (0, 0): 'E',
        (0, -1): 'F',
        (-1, 1): 'G',
        (-1, 0): 'H',
        (-1, -1): 'I'
    }

    # --- Helper Function to Calculate Metrics and State for VS, VI, and VC ---

    def calculate_metrics_and_state_for_types(df_unit_counts):
        """
        Calculates Escalation, Intensity, and Previous State for VS, VI, and VC
        within a single DataFrame containing 'Año', 'Mes', 'violence type', and 'CaseCount'.
        Assumes the input DataFrame is already complete for Year-Month-Violence Type
        combinations and sorted.
        """
        metrics_list = []
        for v_type in ['VS', 'VI', 'VC']:
            # Filter data for the current violence type
            df_type = df_unit_counts[df_unit_counts['violence type'] == v_type].copy()

            if not df_type.empty:
                # Ensure the time series is sorted for correct shift/rolling calculations
                df_type = df_type.sort_values(by=['Año', 'Mes']).reset_index(drop=True)

                # Calculate metrics using the functions defined in Cell 2
                df_type['Escalation'] = calculate_escalation(df_type['CaseCount'])
                df_type['Intensity'] = calculate_intensity(df_type['CaseCount'])

                # --- Calculate Current State and Shift for Previous State ---
                # Create a temporary column with the (Intensity, Escalation) tuple for the current month
                df_type['CurrentState_Tuple'] = list(zip(df_type['Intensity'], df_type['Escalation']))

                # Map the tuple to the state letter using the defined mapping
                # Use .get() with a default (e.g., 'Unknown' or NaN) for safety if a pair isn't in mapping
                df_type['CurrentState'] = df_type['CurrentState_Tuple'].apply(lambda x: state_mapping.get(x, 'Unknown'))

                # Shift the 'CurrentState' column to get the 'Previous State'
                df_type['Previous State'] = df_type['CurrentState'].shift(1).fillna('Start') # Fill the first month's previous state with 'Start' or similar

                # Drop the temporary columns
                df_type = df_type.drop(columns=['CurrentState_Tuple', 'CurrentState'])

                # Select relevant columns and append, including the new 'Previous State'
                metrics_list.append(df_type[['Año', 'Mes', 'violence type', 'CaseCount', 'Escalation', 'Intensity', 'Previous State']])

        # Concatenate metrics for all types (VS, VI, VC) for this unit
        if metrics_list:
            unit_final_df = pd.concat(metrics_list, ignore_index=True)
            # Sort the final DataFrame
            unit_final_df = unit_final_df.sort_values(by=['Año', 'Mes', 'violence type']).reset_index(drop=True)
            return unit_final_df
        else:
            return pd.DataFrame() # Return empty DataFrame if no metrics were calculated


    # --- Helper Function to Process a Single Processed Data File ---

    def process_file_and_calculate_metrics(file_path, output_dir, level_name, unit_name):
        """
        Reads a processed TSV file (containing VS/VI counts), calculates VC,
        Escalation, Intensity, and Previous State for VS, VI, and VC, and saves results.
        """
        print(f"\nProcessing {level_name}: {unit_name} from file: {os.path.basename(file_path)}")

        try:
            # Read the processed TSV file containing 'Año', 'Mes', 'violence type' (VS/VI), 'CaseCount'
            df_counts_vsvi = pd.read_csv(file_path, sep='\t')

            # Ensure required columns are present
            required_cols = ['Año', 'Mes', 'violence type', 'CaseCount']
            if not all(col in df_counts_vsvi.columns for col in required_cols):
                print(f"Error: Missing required columns in {os.path.basename(file_path)}: {required_cols}. Skipping.")
                return # Exit function

            # Ensure data types are correct and drop NaNs in essential columns
            df_counts_vsvi['Año'] = pd.to_numeric(df_counts_vsvi['Año'], errors='coerce')
            df_counts_vsvi['Mes'] = pd.to_numeric(df_counts_vsvi['Mes'], errors='coerce')
            df_counts_vsvi['CaseCount'] = pd.to_numeric(df_counts_vsvi['CaseCount'], errors='coerce').fillna(0).astype(int)
            df_counts_vsvi = df_counts_vsvi.dropna(subset=required_cols).copy()

            # Ensure the DataFrame contains only VS and VI types before calculating VC
            # Filter out any unexpected violence types if present, but keep VS and VI
            valid_vsvi_df = df_counts_vsvi[df_counts_vsvi['violence type'].isin(['VS', 'VI'])].copy()

            if valid_vsvi_df.empty:
                 print(f"Warning: No valid VS or VI data found in {os.path.basename(file_path)} to calculate metrics. Skipping.")
                 return


            # Calculate VC (VS + VI) for the unit from the existing VS and VI counts
            # Need to pivot the valid VS/VI data to sum correctly
            # Use dropna=False to keep all Year-Month combinations even if one type is missing for a month
            pivot_for_vc = valid_vsvi_df.pivot_table(
                index=['Año', 'Mes'],
                columns='violence type',
                values='CaseCount',
                fill_value=0,
                dropna=False # Keep all index values
            )
            # Calculate VC by summing VS and VI columns from the pivot table
            pivot_for_vc['VC'] = pivot_for_vc.get('VS', 0) + pivot_for_vc.get('VI', 0)

            # Unpivot VC back to long format
            vc_df = pivot_for_vc[['VC']].stack().reset_index(name='CaseCount')
            vc_df['violence type'] = 'VC'

            # Combine VS, VI, and VC dataframes for this unit for metric calculation
            # Start with the original valid VS/VI data and concatenate the calculated VC data
            combined_unit_counts_df = pd.concat([valid_vsvi_df, vc_df], ignore_index=True)
            # Sort chronologically and by violence type
            combined_unit_counts_df = combined_unit_counts_df.sort_values(by=['Año', 'Mes', 'violence type']).reset_index(drop=True)


            # Calculate metrics and state for the unit (VS, VI, VC) using the helper function
            unit_final_df = calculate_metrics_and_state_for_types(combined_unit_counts_df) # Use the updated function

            # Save the results
            if not unit_final_df.empty:
                # Generate the output filename: unit name (lowercase, no spaces) + "_cases_metrics.tsv"
                output_filename = unit_name.lower().replace(" ", "") + "_cases_metrics.tsv"
                output_path = os.path.join(output_dir, output_filename)

                try:
                    unit_final_df.to_csv(output_path, sep='\t', index=False)
                    print(f"Saved metrics for {level_name}: {unit_name} to {output_filename}")
                except Exception as e:
                    print(f"Error saving metrics for {level_name}: {unit_name} to {output_filename}: {e}")
            else:
                 print(f"No metrics calculated for {level_name}: {unit_name}. Skipping save.")

        except Exception as e:
            print(f"An unexpected error occurred during processing file {os.path.basename(file_path)}: {e}")
            # Return empty DataFrame on exception
            return pd.DataFrame()


    # --- Process Each Level ---

    # Define the levels and their corresponding input/output directories
    levels_info = [
        {'name': 'Country', 'input_dir': processed_data_country_dir, 'output_subdir': 'country'},
        {'name': 'Department', 'input_dir': processed_data_dept_dir, 'output_subdir': 'department'},
        {'name': 'Region', 'input_dir': processed_data_region_dir, 'output_subdir': 'region'}
    ]

    for level in levels_info:
        level_name = level['name']
        input_dir = level['input_dir']
        output_subdir = level['output_subdir']
        output_dir = os.path.join(base_results_dir, output_subdir)

        print(f"\n--- Processing {level_name} Level ---")

        # Create the output directory for this level if it doesn't exist
        os.makedirs(output_dir, exist_ok=True)
        print(f"Ensured output directory exists: {output_dir}")

        # Check if the input directory exists
        if not os.path.exists(input_dir):
            print(f"Error: Processed data directory not found for {level_name}: {input_dir}. Skipping this level.")
            continue # Skip to the next level

        # List all TSV files in the input directory for this level
        processed_files = [f for f in os.listdir(input_dir) if f.endswith('.tsv')] # Assuming filenames end with _cases.tsv

        if not processed_files:
            print(f"Warning: No processed TSV files found in {input_dir}. Skipping {level_name} processing.")
        else:
            print(f"Found {len(processed_files)} processed {level_name} files. Processing each...")
            for filename in processed_files:
                file_path = os.path.join(input_dir, filename)
                # Extract unit name from filename (e.g., 'antioquia_cases.tsv' -> 'antioquia')
                # Assumes filename format is like 'unitname_cases.tsv'; any other name
                # (e.g., '1958_2022_cases_country.tsv') is left unchanged, extension included.
                unit_name = filename.replace("_cases.tsv", "")

                # Process the file: metrics and state are calculated and saved inside the helper,
                # so its return value is not used here.
                process_file_and_calculate_metrics(file_path, output_dir, level_name, unit_name)


        print(f"\n{level_name}-level metrics calculation and saving complete.")

print("\n--- Overall Intensity and Escalation Calculation Process Finished ---")



--- Calculating and Saving Metrics from Processed Data ---

--- Processing Country Level ---
Ensured output directory exists: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/intensity & escalation/cases/country
Found 1 processed Country files. Processing each...

Processing Country: 1958_2022_cases_country.tsv from file: 1958_2022_cases_country.tsv
Saved metrics for Country: 1958_2022_cases_country.tsv to 1958_2022_cases_country.tsv_cases_metrics.tsv

Country-level metrics calculation and saving complete.

--- Processing Department Level ---
Ensured output directory exists: /Users/diegohernandez/Documents/GitHub/VS_VI_Source_Code/Scripts/../Results/intensity & escalation/cases/department
Found 35 processed Department files. Processing each...

Processing Department: meta from file: meta_cases.tsv
Saved metrics for Department: meta to meta_cases_metrics.tsv

Processing Department: atlantico from file: atlantico_cases.tsv
Saved metrics for Department: atlanti