# Purpose

### Visualize the Geographical Distribution of Incidents
Through interactive maps, we intend to plot the locations of violent incidents, providing a visual understanding of the hotspots and geographical spread of violence within Colombia. This visualization will help in identifying regions with heightened levels of violence, offering insights into areas that may require increased attention or intervention.

### Analyze the Data to Uncover Patterns and Trends 
Beyond visualization, the notebook analyzes patterns, trends, and correlations in the violent incidents. This includes examining the frequency of incidents over time, the types and sub-types of violence, the actors involved, and the impact in terms of fatalities. By doing so, we aim to understand the dynamics of violence in Colombia, identifying potential drivers and the characteristics of the most affected regions.

### Inform Policy and Intervention Strategies
The ultimate goal of this analysis is not just to document and visualize the incidents of violence but to extract actionable insights that can inform policy-making, conflict resolution, and intervention strategies. By identifying the most affected areas and understanding the nature of violence, stakeholders can better allocate resources, design targeted interventions, and work towards reducing violence and its impact on communities in Colombia.

## Load Dataset

In [1]:
#!pip install pandas numpy matplotlib seaborn plotly geopandas dash shapely

In [2]:
import pandas as pd  
import numpy as np  

import matplotlib.pyplot as plt  
import seaborn as sns 
import plotly.express as px  

import geopandas as gpd  



In [3]:
df = pd.read_csv('acled-IISSTestdata.csv')
df.head()

Unnamed: 0,lat,lon,data_id,event_date,year,event_type,sub_event_type,actor1,actor2,region,country,admin1,lat.1,lon.1,fatalities
0,1.4792,-75.4364,8643511,12-Nov-21,2021,Violence against civilians,Attack,Unidentified Armed Group (Colombia),Civilians (Colombia),South America,Colombia,Caqueta,1.4792,-75.4364,1
1,7.8939,-72.5078,8643845,12-Nov-21,2021,Violence against civilians,Attack,Police Forces of Colombia (2018-),Civilians (Colombia),South America,Colombia,Norte de Santander,7.8939,-72.5078,0
2,6.2246,-77.4034,8643619,11-Nov-21,2021,Battles,Armed clash,Military Forces of Colombia (2018-),Gulf Clan,South America,Colombia,Choco,6.2246,-77.4034,1
3,3.0161,-76.4854,8643723,11-Nov-21,2021,Explosions/Remote violence,Grenade,Unidentified Armed Group (Colombia),Civilians (Colombia),South America,Colombia,Cauca,3.0161,-76.4854,0
4,6.2246,-77.4034,8643501,10-Nov-21,2021,Battles,Armed clash,Gulf Clan,Police Forces of Colombia (2018-),South America,Colombia,Choco,6.2246,-77.4034,2


In [4]:
gdf  = gpd.read_file('acled-IISSTestdata.geojson')
gdf.head()

Unnamed: 0,data_id,event_date,year,event_type,sub_event_type,actor1,actor2,region,country,admin1,lat,lon,fatalities,geometry
0,8643511,12-Nov-21,2021,Violence against civilians,Attack,Unidentified Armed Group (Colombia),Civilians (Colombia),South America,Colombia,Caqueta,1.4792,-75.4364,1,POINT (-75.43640 1.47920)
1,8643845,12-Nov-21,2021,Violence against civilians,Attack,Police Forces of Colombia (2018-),Civilians (Colombia),South America,Colombia,Norte de Santander,7.8939,-72.5078,0,POINT (-72.50780 7.89390)
2,8643619,11-Nov-21,2021,Battles,Armed clash,Military Forces of Colombia (2018-),Gulf Clan,South America,Colombia,Choco,6.2246,-77.4034,1,POINT (-77.40340 6.22460)
3,8643723,11-Nov-21,2021,Explosions/Remote violence,Grenade,Unidentified Armed Group (Colombia),Civilians (Colombia),South America,Colombia,Cauca,3.0161,-76.4854,0,POINT (-76.48540 3.01610)
4,8643501,10-Nov-21,2021,Battles,Armed clash,Gulf Clan,Police Forces of Colombia (2018-),South America,Colombia,Choco,6.2246,-77.4034,2,POINT (-77.40340 6.22460)


## Dataset Overview

First, we need a comprehensive overview of the data structure.

In [5]:
df.info()
gdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 799 entries, 0 to 798
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   lat             799 non-null    float64
 1   lon             799 non-null    object 
 2   data_id         799 non-null    int64  
 3   event_date      799 non-null    object 
 4   year            799 non-null    int64  
 5   event_type      799 non-null    object 
 6   sub_event_type  799 non-null    object 
 7   actor1          799 non-null    object 
 8   actor2          792 non-null    object 
 9   region          799 non-null    object 
 10  country         799 non-null    object 
 11  admin1          799 non-null    object 
 12  lat.1           799 non-null    float64
 13  lon.1           799 non-null    object 
 14  fatalities      799 non-null    int64  
dtypes: float64(2), int64(3), object(10)
memory usage: 93.8+ KB
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 799 entr
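The `info()` output lists `lon` (and `lon.1`) as `object` rather than `float64`, which suggests that at least one longitude was read as text. The cause is not visible in this notebook, so the following is only a sketch on made-up values showing one way to coerce such a column and locate the offending rows. The sample strings are assumptions, not rows from the dataset.

```python
import pandas as pd

# Hypothetical sample: one longitude has stray whitespace, one is not a number.
sample = pd.DataFrame({
    "lat": [1.4792, 7.8939, 6.2246],
    "lon": ["-75.4364", " -72.5078 ", "n/a"],
})

# errors="coerce" turns unparseable entries into NaN instead of raising.
lon_numeric = pd.to_numeric(sample["lon"].str.strip(), errors="coerce")

# Keep the rows whose longitude could not be parsed, for manual inspection.
bad_rows = sample[lon_numeric.isna()]

sample["lon"] = lon_numeric
print(sample.dtypes)
print(bad_rows)
```

Inspecting `bad_rows` before dropping anything makes it easier to decide whether the values can be repaired rather than discarded.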

Next, we list the unique values of the key categorical columns to understand the data we are working with.

Identify the Range of Events (event_type): By examining the unique values in the event_type column, we gain insight into the broad categories of violence documented in the dataset. This step is crucial for understanding the nature of the conflicts and planning subsequent analyses focusing on specific types of violence.

In [6]:
df['event_type'].unique()

array(['Violence against civilians', 'Battles',
       'Explosions/Remote violence'], dtype=object)

Understand Specific Violence Mechanisms (sub_event_type): Delving into the sub_event_type column, we identify the specific mechanisms or tactics of violence. This differentiation allows us to explore patterns, such as the prevalence of certain tactics over others, and their implications on the civilian population and conflict dynamics.

In [7]:
df['sub_event_type'].unique()

array(['Attack', 'Armed clash', 'Grenade',
       'Abduction/forced disappearance', 'Remote explosive/landmine/IED',
       'Air/drone strike', 'Sexual violence'], dtype=object)
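Listing unique values shows which categories exist but not how common each one is. As a minimal sketch, the five rows from the `df.head()` output are re-typed below to show how `value_counts` and `pd.crosstab` could quantify prevalence. On the real data the same calls would be applied to `df` directly.

```python
import pandas as pd

# Five rows re-typed from the df.head() output; the full dataset has 799 rows.
events = pd.DataFrame({
    "event_type": ["Violence against civilians", "Violence against civilians",
                   "Battles", "Explosions/Remote violence", "Battles"],
    "sub_event_type": ["Attack", "Attack", "Armed clash", "Grenade", "Armed clash"],
})

# Absolute and relative frequency of each tactic.
sub_counts = events["sub_event_type"].value_counts()
share = events["sub_event_type"].value_counts(normalize=True)

# How sub-event types nest inside the broader event types.
nesting = pd.crosstab(events["event_type"], events["sub_event_type"])

print(sub_counts)
print(nesting)
```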

Catalog the Actors Involved (actor1 and actor2): Understanding who is involved in the violence is critical for any conflict analysis. By listing the unique values in the actor1 and actor2 columns, we aim to identify the key perpetrators and victims. This exploration can reveal the main actors driving the conflict and their targets, offering insights into the conflict's social and political dimensions.

In [8]:
df['actor1'].unique()

array(['Unidentified Armed Group (Colombia)',
       'Police Forces of Colombia (2018-)',
       'Military Forces of Colombia (2018-)', 'Gulf Clan',
       'FARC Dissident', 'FARC Dissident - 33rd Front', 'La Local Gang',
       'ELN', 'CDF', 'FARC Dissident - 28th Front',
       'FARC Dissident - Dagoberto Ramos',
       'FARC Dissident - Carlos Patino Front',
       'Private Security Forces (Colombia)',
       'FARC Dissident - Gentil Duarte Front',
       'FARC Dissident - Carolina Ramirez Front',
       'FARC Dissident - Jaime Martinez', 'FARC Dissident - 10th Front',
       'FARC Dissident - 36th Front', 'FARC Dissident - 30th Front',
       'Kogui Indigenous Militia (Colombia)',
       'FARC Dissident - Urias Rondon',
       'Police Forces of Colombia (2018-) DIRAN',
       'FARC Dissident - Mobile Column', 'FARC Dissident - 37th Front',
       'FARC Dissident - Oliver Sinisterra Front',
       'FARC Dissident - 18th Front',
       'FARC Dissident - Edison Cinco Mil Front',
     

In [9]:
df['actor2'].unique()

array(['Civilians (Colombia)', 'Gulf Clan',
       'Police Forces of Colombia (2018-)',
       'FARC Dissident - Mobile Column',
       'Military Forces of Colombia (2018-)', 'FARC Dissident',
       'FARC Dissident - 36th Front',
       'FARC Dissident - Franco Benavides Mobile Column',
       'FARC Dissident - Carlos Patino Front', 'ELN', nan,
       'Civilians (Venezuela)', 'FARC Dissident - Second Marquetalia',
       'CDF', 'Unidentified Armed Group (Colombia)',
       'FARC Dissident - Jaime Martinez',
       'FARC Dissident - Carolina Ramirez Front',
       'FARC Dissident - 1st Front', 'Private Security Forces (Colombia)',
       'El Tren de Aragua Gang', 'GUP', 'Civilians (Haiti)',
       'FARC Dissident - Ivan Rios',
       'FARC Dissident - Gentil Duarte Front', 'Los Caparros Gang',
       'FARC Dissident - 33rd Front', 'FARC Dissident - Dagoberto Ramos',
       'Los Pelusos Gang', 'Police Forces of Colombia (2018-) UNP',
       'Police Forces of Colombia (2018-) ESMAD', 'Lo
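Because the same group can appear as either actor1 or actor2, counting each column separately understates its involvement. The sketch below, which uses the five actor pairs from the `df.head()` output, stacks both columns to count involvement regardless of role. The variable names are illustrative only.

```python
import pandas as pd

# Actor pairs re-typed from the df.head() output.
pairs = pd.DataFrame({
    "actor1": ["Unidentified Armed Group (Colombia)",
               "Police Forces of Colombia (2018-)",
               "Military Forces of Colombia (2018-)",
               "Unidentified Armed Group (Colombia)",
               "Gulf Clan"],
    "actor2": ["Civilians (Colombia)",
               "Civilians (Colombia)",
               "Gulf Clan",
               "Civilians (Colombia)",
               "Police Forces of Colombia (2018-)"],
})

# Stack both columns so an actor is counted in either role; skip missing values.
involvement = pd.concat([pairs["actor1"], pairs["actor2"]]).dropna().value_counts()

# Sorted, de-duplicated list of every actor name.
all_actors = sorted(involvement.index)

print(involvement)
```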

Highlight the Geographical Focus (admin1): The admin1 column gives us a geographical breakdown of where the incidents have occurred. By identifying the unique regions within this field, we can understand the geographical distribution of violence, which is instrumental in mapping conflict hotspots and understanding regional dynamics.

In [10]:
df['admin1'].unique()

array(['Caqueta', 'Norte de Santander', 'Choco', 'Cauca', 'Guaviare',
       'Valle del Cauca', 'Cordoba', 'Antioquia', 'Cundinamarca',
       'Putumayo', 'Casanare', 'Magdalena', 'La Guajira', 'Caldas',
       'Arauca', 'Cesar', 'Guainia', 'Bogota, D.C.', 'Santander', 'Meta',
       'Atlantico', 'Narino', 'Bolivar', 'Quindio', 'Huila', 'Risaralda',
       'Boyaca', 'Tolima', 'Sucre'], dtype=object)
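To move from a list of departments towards hotspots, incidents and fatalities can be aggregated per admin1 value. The following is a small sketch on the five rows from the `df.head()` output; the column names `incidents` and `fatalities` in the result are chosen here for illustration.

```python
import pandas as pd

# Department and fatalities re-typed from the df.head() output.
incidents = pd.DataFrame({
    "admin1": ["Caqueta", "Norte de Santander", "Choco", "Cauca", "Choco"],
    "fatalities": [1, 0, 1, 0, 2],
})

# Count incidents and sum fatalities per department, most affected first.
by_region = (
    incidents.groupby("admin1")
    .agg(incidents=("fatalities", "size"), fatalities=("fatalities", "sum"))
    .sort_values("incidents", ascending=False)
)

print(by_region)
```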

## Data Cleaning 

In [11]:
df = df.dropna(subset=['lat', 'lon'])
gdf = gdf.dropna(subset=['lat', 'lon'])

Drop columns that are not relevant to the task.

In [12]:
df = df.drop(columns=['year', 'region', 'country'])
df.head()

Unnamed: 0,lat,lon,data_id,event_date,event_type,sub_event_type,actor1,actor2,admin1,lat.1,lon.1,fatalities
0,1.4792,-75.4364,8643511,12-Nov-21,Violence against civilians,Attack,Unidentified Armed Group (Colombia),Civilians (Colombia),Caqueta,1.4792,-75.4364,1
1,7.8939,-72.5078,8643845,12-Nov-21,Violence against civilians,Attack,Police Forces of Colombia (2018-),Civilians (Colombia),Norte de Santander,7.8939,-72.5078,0
2,6.2246,-77.4034,8643619,11-Nov-21,Battles,Armed clash,Military Forces of Colombia (2018-),Gulf Clan,Choco,6.2246,-77.4034,1
3,3.0161,-76.4854,8643723,11-Nov-21,Explosions/Remote violence,Grenade,Unidentified Armed Group (Colombia),Civilians (Colombia),Cauca,3.0161,-76.4854,0
4,6.2246,-77.4034,8643501,10-Nov-21,Battles,Armed clash,Gulf Clan,Police Forces of Colombia (2018-),Choco,6.2246,-77.4034,2


In [13]:
gdf = gdf.drop(columns=['year', 'region', 'country'])

The CSV also contains duplicate coordinate columns (lat.1 and lon.1). We confirm that they match lat and lon before dropping them.

In [14]:
(df['lat'] == df['lat.1']).all()

True

In [15]:
(df['lon'] == df['lon.1']).all()

True

In [16]:
df.drop('lat.1', axis=1, inplace=True)
df.drop('lon.1', axis=1, inplace=True)
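The element-wise `==` check works here because the coordinate columns contain no missing values. As a hedged aside, the sketch below uses made-up data with a NaN to show that `==` treats NaN as unequal to itself, whereas `Series.equals` treats NaNs in the same position as matching. The NaN row is hypothetical and does not occur in this dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates with one missing value in both columns.
coords = pd.DataFrame({
    "lat": [1.4792, 7.8939, np.nan],
    "lat.1": [1.4792, 7.8939, np.nan],
})

elementwise = (coords["lat"] == coords["lat.1"]).all()  # False: NaN != NaN
same = coords["lat"].equals(coords["lat.1"])            # True: NaNs in the same position match

# Drop the duplicate only once it is confirmed to be identical.
if same:
    coords = coords.drop(columns="lat.1")

print(elementwise, same, list(coords.columns))
```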

Identify missing values.

In [17]:
missing_values = df.isnull().sum()
print(missing_values)

lat               0
lon               0
data_id           0
event_date        0
event_type        0
sub_event_type    0
actor1            0
actor2            7
admin1            0
fatalities        0
dtype: int64


In [18]:
missing_actor2_rows = df[df['actor2'].isnull()]

In [19]:
df['actor2'] = df['actor2'].fillna('Unknown')


Checking for duplicate rows

In [20]:
df.duplicated().sum()  # not displayed: a notebook cell shows only its last expression
gdf.duplicated().sum()

0

In [21]:
#df.to_json('acled-IISSTestdata-cleaned.geojson')
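The `info()` output shows that event_date is stored as text (`object`), in the form `12-Nov-21`. Any time-based analysis, including month filtering with `.dt.month`, needs it parsed first. A minimal sketch on the five dates from the `df.head()` output follows; the explicit format string is inferred from those samples and assumes English month abbreviations.

```python
import pandas as pd

# Dates re-typed from the df.head() output.
dates = pd.Series(["12-Nov-21", "12-Nov-21", "11-Nov-21", "11-Nov-21", "10-Nov-21"])

# An explicit format avoids ambiguous day/month guessing.
parsed = pd.to_datetime(dates, format="%d-%b-%y")

# Month number for filtering, and incidents per calendar month for trends.
months = parsed.dt.month
monthly_counts = parsed.dt.to_period("M").value_counts().sort_index()

print(parsed.dtype)
print(monthly_counts)
```

On the real data this would correspond to assigning the parsed result back to `df['event_date']` before it is used for filtering.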

In [33]:
# Import Dash components, Plotly, and Pandas
import dash
from dash import dcc, html
from dash.dependencies import Input, Output, State
import plotly.express as px
import pandas as pd
import json
import geopandas as gpd
from shapely.geometry import Point

colombia_gdf = gpd.read_file("acled-IISSTestdata.geojson")
colombia_json = json.loads(colombia_gdf.to_json())

app = dash.Dash(__name__)

# Combine both actor columns, dropping missing values and duplicate names
actors = ['All'] + sorted(set(df['actor1'].dropna()) | set(df['actor2'].dropna()))
event_types = ['All'] + sorted(df['event_type'].unique().tolist())
sub_event_types = ['All'] + sorted(df['sub_event_type'].unique().tolist())
color_options = ['actor1', 'actor2', 'event_type', 'sub_event_type', 'admin1', 'fatalities']

# Updated app layout to include event and sub-event type dropdowns
app.layout = html.Div([
    html.Div([
        html.Div([  # Wrap each label-dropdown pair in its own div
            html.Label('Actor 1', style={'marginBottom': '5px'}),
            dcc.Dropdown(
                id='actor1-dropdown',
                options=[{'label': actor, 'value': actor} for actor in actors],
                value='All',
                style={"width": "200px"}
            )
        ], style={'marginRight': '5px'}),

        html.Div([  # Repeat the pattern for each dropdown
            html.Label('Actor 2', style={'marginBottom': '5px'}),
            dcc.Dropdown(
                id='actor2-dropdown',
                options=[{'label': actor, 'value': actor} for actor in actors],
                value='All',
                style={"width": "200px"}
            )
        ], style={'marginRight': '5px'}),

        html.Div([
            html.Label('Event Type', style={'marginBottom': '5px'}),
            dcc.Dropdown(
                id='event-dropdown',
                options=[{'label': event, 'value': event} for event in event_types],
                value='All',
                style={"width": "200px"}
            )
        ], style={'marginRight': '5px'}),

        html.Div([
            html.Label('Sub-Event Type', style={'marginBottom': '5px'}),
            dcc.Dropdown(
                id='sub-event-dropdown',
                options=[{'label': sub_event, 'value': sub_event} for sub_event in sub_event_types],
                value='All',
                style={"width": "200px"}
            )
        ], style={'marginRight': '5px'}),

        html.Div([
            html.Label('Color Scheme', style={'marginBottom': '5px'}),
            dcc.Dropdown(
                id='color-scheme-dropdown',
                options=[{'label': option, 'value': option} for option in color_options],
                value='event_type',
                style={"width": "200px"}
            )
        ], style={'marginRight': '10px'})
    ], style={'display': 'flex', 'flexWrap': 'wrap', 'marginTop': '10px', 'marginBottom': '10px'}),

    dcc.Graph(id='incident-map'),
    html.Div([
        dcc.Checklist(
            id='time-filter-checklist',
            options=[
                {'label': 'Show entire year', 'value': 'ALL'}
            ],
            value=['ALL'],  # Default to showing the entire year
            style={'margin': '10px'}
        ),
        html.Div([  # Wrap the slider within a div for styling
            dcc.Slider(
                id='month-slider',
                min=1,
                max=12,
                step=1,
                marks={i: f'{i}' for i in range(1, 13)},  # Label months from 1 to 12
                value=1,
                disabled=True,  # Initially disabled
            )
        ], style={'width': '80%', 'margin': '0 auto'}),  # Apply the styling here
    ], style={'textAlign': 'center'}),
    dcc.Graph(id='detail-graph')
])


# Disable the month slider while 'Show entire year' is checked
@app.callback(
    Output('month-slider', 'disabled'),
    [Input('time-filter-checklist', 'value')]
)
def toggle_slider(time_filter):
    return 'ALL' in time_filter


# Updated callback to include event and sub-event type filters.
# The decorator must sit directly above the function it registers.
@app.callback(
    [Output('incident-map', 'figure'),
     Output('detail-graph', 'figure')],
    [Input('actor1-dropdown', 'value'),
     Input('actor2-dropdown', 'value'),
     Input('event-dropdown', 'value'),
     Input('sub-event-dropdown', 'value'),
     Input('color-scheme-dropdown', 'value'),
     Input('time-filter-checklist', 'value'),
     Input('month-slider', 'value')]
)
def update_content(selected_actor1, selected_actor2, selected_event, selected_sub_event, selected_color_scheme, time_filter, selected_month):
    filtered_df = df.copy()

    # Apply filters
    if selected_actor1 != 'All':
        filtered_df = filtered_df[(filtered_df['actor1'] == selected_actor1) | (filtered_df['actor2'] == selected_actor1)]
    if selected_actor2 != 'All':
        filtered_df = filtered_df[(filtered_df['actor1'] == selected_actor2) | (filtered_df['actor2'] == selected_actor2)]
    if selected_event != 'All':
        filtered_df = filtered_df[filtered_df['event_type'] == selected_event]
    if selected_sub_event != 'All':
        filtered_df = filtered_df[filtered_df['sub_event_type'] == selected_sub_event]

    if 'ALL' not in time_filter:
        # event_date is stored as text such as '12-Nov-21', so parse it before using .dt
        event_months = pd.to_datetime(filtered_df['event_date'], format='%d-%b-%y').dt.month
        filtered_df = filtered_df[event_months == selected_month]

    # Scale marker size with the square root of (fatalities + 1) to compress the range;
    # the + 1 keeps zero-fatality incidents visible on the map
    filtered_df = filtered_df.assign(marker_size=np.sqrt(filtered_df['fatalities'] + 1) * 5)

    # Create the map
    if filtered_df.empty:
        fig = px.scatter_mapbox(lat=[], lon=[])  # Creates an empty map
    else:
        fig = px.scatter_mapbox(filtered_df,
                                lat="lat",
                                lon="lon",
                                color=selected_color_scheme,
                                size="marker_size",  # Use the adjusted marker size
                                hover_name="actor1",
                                hover_data=["actor2", "event_date", "sub_event_type", "admin1", "fatalities"],
                                zoom=5,
                                center={"lat": 4.5709, "lon": -74.2973})

    fig.update_layout(mapbox_style="open-street-map")
    fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0}, title="Violent Incidents in Colombia")

    if selected_color_scheme != 'fatalities':
        # For categorical data
        count_series = filtered_df[selected_color_scheme].value_counts()
        bar_fig = px.bar(x=count_series.index, y=count_series.values, labels={'x': selected_color_scheme, 'y': 'Count'})
    else:
        # For numerical data like 'fatalities', use a histogram
        bar_fig = px.histogram(filtered_df, x='fatalities', nbins=20, labels={'fatalities': 'Fatalities Count'})

    bar_fig.update_xaxes(dtick=1)
    bar_fig.update_layout(title=f"Distribution of {selected_color_scheme}")

    # Return both the map and the bar graph
    return fig, bar_fig

if __name__ == '__main__':
    app.run_server(debug=True)
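A Python decorator applies to the function defined immediately below it, so two stacked `@app.callback` decorators both register the same function, and that function is later called with the wrong number of arguments. The sketch below reproduces this with a toy registry in plain Python. The `callback` helper and `registry` dict are hypothetical stand-ins for illustration and are not Dash's implementation.

```python
# Toy stand-in for a callback registry; this is NOT how Dash is implemented.
registry = {}

def callback(output_id, n_inputs):
    """Register the decorated function under output_id, expecting n_inputs arguments."""
    def decorator(func):
        registry[output_id] = (func, n_inputs)
        return func
    return decorator

# Stacked by mistake: both decorators bind to toggle_slider.
@callback("incident-map", n_inputs=7)
@callback("month-slider", n_inputs=1)
def toggle_slider(time_filter):
    return "ALL" in time_filter

# The "map" entry actually points at toggle_slider, so calling it with
# seven inputs fails in the same way as the stacked decorators do.
func, n_inputs = registry["incident-map"]
try:
    func(*range(n_inputs))
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

Keeping each decorator directly above its own function, with no other decorator in between, avoids this kind of mismatch.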

