<h2><center><b><i>Cluster bomb</i></b>: Uncovering Patterns in Terrorist Group Beliefs and Attacks</center></h2>

#### **COM-480: Data Visualization**

**Team**: Alexander Sternfeld, Silvia Romanato & Antoine Bonnet

**Dataset**: [Global Terrorism Database (GTD)](https://www.start.umd.edu/gtd/) 

**Additional dataset**: [Profiles of Perpetrators of Terrorism in the United States (PPTUS)](https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl%3A1902.1/17702)

## **Geographical analysis**
 

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

from load_data import *

pd.set_option('display.max_columns', None)

GTD = load_GTD()
PPTUS_data, PPTUS_sources = load_PPTUS()

## Prepare the datasets

In [None]:
df_orig = GTD.copy()
df_orig.head()

df_ppt_DATA = PPTUS_data.copy()
df_ppt_SOURCES = PPTUS_sources.copy()

In [None]:
# Rename some of the columns
df_ppt_DATA.rename(columns={'DOM_I': 'dominant_ideology', 'I_ETHNO': 'ethno_nationalist',  'I_REL': 'religious', 'I_RACE':  'racist',
                            'I_LEFT': 'extreme_left' , 'I_RIGHT':  'extreme_right', 'G_POL_1':  'politic_reasons', 'G_SOC_1':  'social_reasons',
                            'G_ECO_1': 'economic_reasons', 'G_REL_1':  'religious_reasons'}, inplace=True)


In [None]:
# Merge PPTUS and GTD 
df = df_ppt_DATA[['ORGNAME', 'dominant_ideology']].merge(df_orig, left_on='ORGNAME', right_on='gname', how= 'right')
print(df.shape)

df = df[df['dominant_ideology'].notnull()]
print(df.shape)

## Map of the distribution of attacks per group over the years

- Choose a group or a year and display the distribution of its attacks over the globe.
- Can be done for the top 30 groups, with a slider to select the group.
- Can be done for all groups over the years.
- Problem: how should we display and identify the groups? Should we show only a selection? <br>
<br>
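
One possible answer to the selection problem is to keep the most active groups and relabel the rest as a single category. The sketch below uses a made-up toy table; the helper `keep_top_groups` and the `group_label` column are hypothetical names, not part of our pipeline.

```python
import pandas as pd

# Toy stand-in for the merged attack table (made-up values); only 'gname' matters here.
attacks = pd.DataFrame({
    "gname": ["A", "A", "A", "B", "B", "C", "D"],
    "iyear": [1990, 1991, 1992, 1990, 1995, 2001, 2003],
})

def keep_top_groups(frame, n, other_label="Other"):
    """Return a copy where groups outside the n most active are relabelled."""
    top = frame["gname"].value_counts().head(n).index
    out = frame.copy()
    # Keep the name for top groups, replace every other name with other_label
    out["group_label"] = out["gname"].where(out["gname"].isin(top), other_label)
    return out

labelled = keep_top_groups(attacks, n=2)
print(labelled["group_label"].value_counts())
```

Using `group_label` for color or for the animation frame would keep the legend short while still showing every attack.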

In the example below, you choose the year and the color represents the dominant ideology.

The dominant ideology categories: <br>
-99= Uncertainty/conflicting information exists in available data<br> 
1= Extreme Right Wing (including all racist ideologies)<br>
2= Extreme Left Wing<br>
3= Religious<br>
4= Ethno-nationalist/Separatist<br>
5= Single Issue<br>
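
Since the codes are categories rather than quantities, it may help to map them to labels before plotting, so that the legend shows names instead of a continuous color scale. This is a minimal sketch on a toy column; `IDEOLOGY_LABELS` is a hypothetical name, and the labels are shortened from the list above.

```python
import numpy as np
import pandas as pd

# Labels shortened from the PPTUS category list above.
IDEOLOGY_LABELS = {
    1: "Extreme Right Wing",
    2: "Extreme Left Wing",
    3: "Religious",
    4: "Ethno-nationalist/Separatist",
    5: "Single Issue",
}

# Toy column standing in for df['dominant_ideology'] (floats, as after a merge).
codes = pd.Series([1.0, 3.0, -99.0, 2.0, 5.0])

# Treat -99 (uncertain or conflicting information) as missing, then map codes to labels.
labels = codes.replace(-99, np.nan).map(IDEOLOGY_LABELS)
print(labels.tolist())
```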

In [None]:
# Sort by year so that the animation frames play in chronological order
fig = px.scatter_geo(df.sort_values('iyear'), lat='latitude', lon='longitude', color='dominant_ideology', animation_frame='iyear', projection='natural earth')
fig.show()

The same map for the 30 most active groups, with one frame per group and the color showing the year:

In [None]:
top_groups = df['gname'].value_counts().head(30).index
df_top = df[df['gname'].isin(top_groups)]

In [None]:
fig = px.scatter_geo(df_top, lat = 'latitude', lon = 'longitude', color="iyear", animation_frame = 'gname', projection="natural earth")

fig.show()

## Map of the distribution per ideology
- Not animated over the years, because that is not very informative; instead we take the dominant ideology category from the PPTUS dataset and plot its distribution on the map.
- Problem: the merge reduces the data from 214666 data points to 7131.
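
To see where the data points are lost, one option is to pass `indicator=True` to the merge and count which attacks found a match. The sketch below runs on made-up toy tables (`gtd` and `pptus` are stand-ins, not the real datasets).

```python
import pandas as pd

# Toy stand-ins: 'gtd' plays the role of GTD, 'pptus' the role of PPTUS.
gtd = pd.DataFrame({"gname": ["A", "A", "B", "Unknown", "C"]})
pptus = pd.DataFrame({"ORGNAME": ["A", "C"], "dominant_ideology": [3, 1]})

# indicator=True adds a '_merge' column: 'both' for matched attacks,
# 'right_only' for attacks whose group has no PPTUS entry.
merged = pptus.merge(gtd, left_on="ORGNAME", right_on="gname",
                     how="right", indicator=True)

matched_share = (merged["_merge"] == "both").mean()
unmatched_groups = merged.loc[merged["_merge"] == "right_only", "gname"].value_counts()

print(f"{matched_share:.0%} of attacks have an ideology label")
print(unmatched_groups)
```

Listing the unmatched group names first would show whether the loss comes from groups that are truly absent from PPTUS or from spelling differences between the two datasets.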

In [None]:
df_orig.gname

## Map of the flow of group attacks
- The PPTUS dataset records the location of each group's headquarters; we draw a line from the headquarters to each attack location.
- Problem: few data points.


In [None]:
loc_cols = [col for col in df_ppt_DATA.columns if col.startswith('LOC_HQ')]
df_ppt_DATA.LOC_HQ_COUNTRY_1.unique()

In [None]:
# Replace -99 with NaN and map the country codes to country names
country_codes = {-99: np.nan, 4: 'Afghanistan', 228: 'Yemen', 217: 'United States', 110: 'Lebanon', 102: 'Jordan',
                 603: 'United Kingdom', 95: 'Iraq', 69: 'France', 153: 'Pakistan', 87: 'Haiti'}
df_ppt_DATA[loc_cols] = df_ppt_DATA[loc_cols].replace(country_codes)
df_ppt_DATA.LOC_HQ_COUNTRY_1.unique()

In [None]:
# Load countries.csv, which holds the centroid coordinates of each country
df_countries = pd.read_csv('data/countries.csv')
df_countries.rename(columns={'longitude': 'hqlon', 'latitude': 'hqlat'}, inplace=True)

In [None]:
# Attach the headquarters centroid to each PPTUS group with a single left merge.
# Merging a per-group country table back onto df_ppt_DATA would duplicate rows
# for every country shared by several groups. Assumes COUNTRY is unique in countries.csv.
df_ppt_DATA = df_ppt_DATA.merge(df_countries[['COUNTRY', 'hqlon', 'hqlat']], left_on='LOC_HQ_COUNTRY_1', right_on='COUNTRY', how='left')
# Attach the headquarters coordinates to each attack
df = df_ppt_DATA[['ORGNAME', 'LOC_HQ_COUNTRY_1', 'hqlon', 'hqlat']].merge(df, left_on='ORGNAME', right_on='gname', how= 'inner')
df.head()

In [None]:
# get dataframe where hqlon is not null
df = df[df['hqlon'].notnull()]
df = df[df['longitude'].notnull()]

df.reset_index(inplace=True, drop=True)

In [None]:
"""fig.add_trace(go.Scattergeo(
    locationmode = 'USA-states',
    lon = df['longitude'],
    lat = df['latitude'],
    #hoverinfo = 'text',
    #text = df_airports['airport'],
    mode = 'markers',
    marker = dict(
        size = 2,
        color = 'rgb(255, 0, 0)',
        line = dict(
            width = 3,
            color = 'rgba(68, 68, 68, 0)'
        )
    )))"""

In [None]:
# Keep only attacks by groups whose dominant ideology is Extreme Left Wing (code 2)
df_dom1 = df[df['dominant_ideology'] == 2]
df_dom1.reset_index(inplace=True, drop=True)

In [None]:
import plotly.graph_objects as go

fig = go.Figure()

# Draw one line per attack, from the group's headquarters to the attack location
for i in range(len(df_dom1)):
    fig.add_trace(
        go.Scattergeo(
            lon = [df_dom1['hqlon'][i], df_dom1['longitude'][i]],
            lat = [df_dom1['hqlat'][i], df_dom1['latitude'][i]],
            mode = 'lines',
            line = dict(width = 1,color = 'red'),
        )
    )

fig.update_layout(
    title_text = 'Lines from group headquarters to attack locations (dominant ideology: Extreme Left Wing)',
    showlegend = False,
    geo = dict(
        scope = 'world',
        projection_type = 'equirectangular',
        showland = True,
        landcolor = 'rgb(243, 243, 243)',
        countrycolor = 'rgb(204, 204, 204)',
        showcountries=True,
    ),
)

fig.show() 