<h2><center><b><i>Cluster bomb</i></b>: Uncovering Patterns in Terrorist Group Beliefs and Attacks</center></h2>

#### **COM-480: Data Visualization**

**Team**: Alexander Sternfeld, Silvia Romanato & Antoine Bonnet

**Dataset**: [Global Terrorism Database (GTD)](https://www.start.umd.edu/gtd/) 

**Additional dataset**: [Profiles of Perpetrators of Terrorism in the United States (PPTUS)](https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl%3A1902.1/17702)

## **Terrorist ideologies**

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
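import seaborn as sns
# Plotly names used by the plotting cells below
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots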

from load_data import *

pd.set_option('display.max_columns', None)

GTD = load_GTD()
PPTUS_data, PPTUS_sources = load_PPTUS()

GTD pickle file found, loading...
PPTUS pickle files found, loading...


In [2]:
PPTUS_data.rename(columns={'DOM_I': 'dominant_ideology', 'I_ETHNO': 'ethno_nationalist',  'I_REL': 'religious', 'I_RACE':  'racist',
                            'I_LEFT': 'extreme_left' , 'I_RIGHT':  'extreme_right', 'G_POL_1':  'politic_reasons', 'G_SOC_1':  'social_reasons',
                            'G_ECO_1': 'economic_reasons', 'G_REL_1':  'religious_reasons'}, inplace=True)

In [3]:
print('The shape of PPTUS_data is: ', PPTUS_data.shape, '\nThe shape of PPTUS_sources is: ' ,PPTUS_sources.shape,'\nThe shape of GTD is:', GTD.shape)

The shape of PPTUS_data is:  (145, 428) 
The shape of PPTUS_sources is:  (928, 3) 
The shape of GTD is: (214666, 135)


Merge the two datasets on the group name (`ORGNAME` in PPTUS, `gname` in GTD).

In [4]:
# Merge PPTUS and GTD on the group name
df = PPTUS_data[['ORGNAME', 'dominant_ideology']].merge(GTD, left_on='ORGNAME', right_on='gname', how='inner')
print('Number of attacks whose organization ideology is known from PPTUS: ', df.shape[0], ', out of a total of: ', GTD.shape[0])
df.dominant_ideology

Number of attacks whose organization ideology is known from PPTUS:  7131 , out of a total of:  214666


0       2
1       3
2       3
3       3
4       3
       ..
7126    2
7127    2
7128    2
7129    2
7130    5
Name: dominant_ideology, Length: 7131, dtype: int64
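
The inner merge above keeps only the GTD attacks whose `gname` exactly matches a PPTUS `ORGNAME`. A minimal sketch of how one might check that the join does not duplicate attacks, and list the PPTUS groups that find no match, is shown below. The frames `pptus_toy` and `gtd_toy` are made-up stand-ins, not the real data.

```python
import pandas as pd

# Hypothetical stand-ins for PPTUS_data and GTD
pptus_toy = pd.DataFrame({'ORGNAME': ['Group A', 'Group B', 'Group C'],
                          'dominant_ideology': [2, 3, 5]})
gtd_toy = pd.DataFrame({'eventid': [1, 2, 3, 4],
                        'gname': ['Group A', 'Group A', 'Group B', 'Unknown']})

# validate='one_to_many' raises a MergeError if ORGNAME is duplicated on the left,
# which would otherwise silently duplicate attacks
merged_toy = pptus_toy.merge(gtd_toy, left_on='ORGNAME', right_on='gname',
                             how='left', validate='one_to_many', indicator=True)

# PPTUS groups with no attack in the toy GTD frame
unmatched_groups = merged_toy.loc[merged_toy['_merge'] == 'left_only', 'ORGNAME'].tolist()
# Attacks that received an ideology label
n_labelled_attacks = int((merged_toy['_merge'] == 'both').sum())

print(unmatched_groups, n_labelled_attacks)
```

A left merge is used here only so that unmatched groups remain visible; the notebook's inner merge would simply drop them.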

The dominant ideology categories:<br>
-99 = Uncertainty/conflicting information exists in available data<br>
1 = Extreme Right Wing (including all racist ideologies)<br>
2 = Extreme Left Wing<br>
3 = Religious<br>
4 = Ethno-nationalist/Separatist<br>
5 = Single Issue<br>

In [5]:
# create six new dataframes, one for each ideology category (including the uncertain one)
df_ethno_nationalist = df[df['dominant_ideology'] == 4]
df_religious = df[df['dominant_ideology'] == 3]
df_extreme_right = df[df['dominant_ideology'] == 1]
df_extreme_left = df[df['dominant_ideology'] == 2]
df_single_issue = df[df['dominant_ideology'] == 5]
df_uncertain = df[df['dominant_ideology'] == -99]

print('The sum of the number of attacks of each ideology is equal to the total number of attacks: ', df_ethno_nationalist.shape[0] + df_religious.shape[0] + df_extreme_right.shape[0] + df_extreme_left.shape[0] + df_single_issue.shape[0] + df_uncertain.shape[0] == df.shape[0])

The sum of the number of attacks of each ideology is equal to the total number of attacks:  True


In [6]:
# replace the numeric codes in the dominant_ideology column of df with the ideology names
df['dominant_ideology'] = df['dominant_ideology'].replace({1: 'extreme_right', 2: 'extreme_left', 3: 'religious', 4: 'ethno_nationalist', 5: 'single_issue', -99: 'uncertain'})
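
Once the codes are mapped to names, per-ideology summaries do not strictly need six separate dataframes. The small sketch below maps the codes and counts attacks per ideology in one pass; the `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Codes as listed in the category table above
IDEOLOGY_NAMES = {1: 'extreme_right', 2: 'extreme_left', 3: 'religious',
                  4: 'ethno_nationalist', 5: 'single_issue', -99: 'uncertain'}

# Hypothetical merged frame with one row per attack
toy = pd.DataFrame({'eventid': [10, 11, 12, 13, 14],
                    'dominant_ideology': [4, 4, 3, -99, 2]})

# map() yields NaN for any code missing from the dictionary,
# which makes unexpected codes easy to spot
toy['ideology_name'] = toy['dominant_ideology'].map(IDEOLOGY_NAMES)
assert toy['ideology_name'].notna().all()

# One groupby replaces the six filtered dataframes for simple counts
attacks_per_ideology = toy.groupby('ideology_name')['eventid'].count().to_dict()
print(attacks_per_ideology)
```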

## Analysis of the ethno-nationalists
1. Geographic analysis
2. Time series of attacks
3. Number of groups and the most important ones

In [None]:
# Map of attack locations, with one animation frame per dominant ideology
fig1 = px.scatter_geo(df, lat='latitude', lon='longitude', animation_frame='dominant_ideology', projection='natural earth')
fig1.update_layout(title='Attacks by dominant ideology')
fig1.show()

In [10]:
df_ethno_nationalist.columns

Index(['ORGNAME', 'dominant_ideology', 'eventid', 'iyear', 'imonth', 'iday',
       'approxdate', 'extended', 'resolution', 'country',
       ...
       'addnotes', 'scite1', 'scite2', 'scite3', 'dbsource', 'INT_LOG',
       'INT_IDEO', 'INT_MISC', 'INT_ANY', 'related'],
      dtype='object', length=137)

In [21]:
# Time series of the number of attacks per year, one subplot per ideology
ideology_frames = {
    'ethno_nationalist': df_ethno_nationalist,
    'religious': df_religious,
    'extreme left': df_extreme_left,
    'extreme right': df_extreme_right,
    'single issue': df_single_issue,
    'uncertain': df_uncertain,
}

fig4 = make_subplots(rows=2, cols=3, subplot_titles=[f'{name} attacks' for name in ideology_frames])

for i, (name, frame) in enumerate(ideology_frames.items()):
    # Count the events per year without modifying the filtered dataframe
    attacks_per_year = frame.groupby(frame['iyear'].astype(int))['eventid'].count()
    # Subplot titles are assigned row by row, so traces are placed in the same order
    # to keep each title above its own data
    fig4.add_trace(
        go.Scatter(x=attacks_per_year.index, y=attacks_per_year.values, name=name),
        row=i // 3 + 1, col=i % 3 + 1
    )

fig4.update_layout(height=600, width=1000, title_text='Attacks per year by ideology')
fig4.show()
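
Years with no recorded attack do not appear in a `groupby` count, so a line plot joins the neighbouring years directly. One possible way to make such gaps visible is to reindex the yearly counts over the full range of years. The sketch below does this on made-up data (`toy` is hypothetical).

```python
import pandas as pd

# Hypothetical attacks: none recorded in 1971, 1973 or 1974
toy = pd.DataFrame({'eventid': [1, 2, 3, 4],
                    'iyear': [1970, 1970, 1972, 1975]})

attacks_per_year = toy.groupby('iyear')['eventid'].count()

# Reindex over every year between the first and the last so that missing years show up as 0
full_years = range(attacks_per_year.index.min(), attacks_per_year.index.max() + 1)
attacks_per_year_filled = attacks_per_year.reindex(full_years, fill_value=0)

print(attacks_per_year_filled.to_dict())
```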

In [22]:
# get the top 3 weapons used by each ideology
print(df_ethno_nationalist['weaptype1_txt'].value_counts().head(3))
print(df_religious['weaptype1_txt'].value_counts().head(3))
print(df_extreme_left['weaptype1_txt'].value_counts().head(3))
print(df_extreme_right['weaptype1_txt'].value_counts().head(3))
print(df_single_issue['weaptype1_txt'].value_counts().head(3))
print(df_uncertain['weaptype1_txt'].value_counts().head(3))

Explosives    1834
Firearms      1138
Incendiary     406
Name: weaptype1_txt, dtype: int64
Explosives    1420
Firearms       971
Unknown        231
Name: weaptype1_txt, dtype: int64
Explosives    274
Firearms       88
Incendiary     31
Name: weaptype1_txt, dtype: int64
Firearms      26
Explosives    21
Incendiary     9
Name: weaptype1_txt, dtype: int64
Incendiary    183
Explosives    178
Firearms       17
Name: weaptype1_txt, dtype: int64
Explosives    4
Incendiary    2
Firearms      1
Name: weaptype1_txt, dtype: int64
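
The six printed blocks above carry no label, so they can only be read in the order of the `print` calls (ethno-nationalist, religious, extreme left, extreme right, single issue, uncertain). A possible sketch that keeps the ideology attached to each count is shown below on a made-up frame. Only the column names `dominant_ideology` and `weaptype1_txt` come from the notebook; `toy` and its values are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    'dominant_ideology': ['religious'] * 6 + ['single_issue'] * 3,
    'weaptype1_txt': ['Explosives', 'Explosives', 'Explosives', 'Firearms', 'Firearms', 'Unknown',
                      'Incendiary', 'Incendiary', 'Explosives'],
})

# Count weapon types within each ideology, then keep the top_n most frequent per ideology
top_n = 2
weapon_counts = (toy.groupby(['dominant_ideology', 'weaptype1_txt'])
                    .size()
                    .rename('n_attacks')
                    .reset_index()
                    .sort_values(['dominant_ideology', 'n_attacks'], ascending=[True, False]))
top_weapons = weapon_counts.groupby('dominant_ideology').head(top_n)

print(top_weapons.to_string(index=False))
```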


In [28]:
# display the average number of people killed and wounded per attack for each ideology
print(df_ethno_nationalist[['nkill', 'nwound']].sum() / df_ethno_nationalist.shape[0])
print(df_religious[['nkill', 'nwound']].sum() / df_religious.shape[0])
print(df_extreme_left[['nkill', 'nwound']].sum() / df_extreme_left.shape[0])
print(df_extreme_right[['nkill', 'nwound']].sum() / df_extreme_right.shape[0])
print(df_single_issue[['nkill', 'nwound']].sum() / df_single_issue.shape[0])
print(df_uncertain[['nkill', 'nwound']].sum() / df_uncertain.shape[0])

nkill     0.577110
nwound    1.297854
dtype: float64
nkill      5.150257
nwound    16.180382
dtype: float64
nkill     1.283721
nwound    1.320930
dtype: float64
nkill     0.311475
nwound    1.950820
dtype: float64
nkill     0.216867
nwound    0.351807
dtype: float64
nkill     0.500
nwound    0.375
dtype: float64
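
Dividing the column sums by `shape[0]` counts every attack in the denominator, including attacks whose `nkill` or `nwound` is missing, so missing values are effectively treated as zero. Whether the merged GTD subset contains such gaps is not shown in this notebook. The sketch below only illustrates the difference on made-up numbers (`toy` is hypothetical).

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'nkill': [0.0, 4.0, np.nan, 2.0],
                    'nwound': [1.0, np.nan, np.nan, 5.0]})

# Same computation as in the cell above: sum() skips NaN, but shape[0] still counts those rows
per_attack_all = toy[['nkill', 'nwound']].sum() / toy.shape[0]

# mean() skips NaN in both the numerator and the denominator
per_attack_known = toy[['nkill', 'nwound']].mean()

print(per_attack_all)
print(per_attack_known)
```

Which of the two is preferable depends on whether a missing casualty count should be read as "none" or as "unknown".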


In [33]:
# Time series of the number of people killed per year, with one line per ideology on the same graph
fig5 = go.Figure()

for name, frame in [('ethno_nationalist', df_ethno_nationalist), ('religious', df_religious),
                    ('extreme left', df_extreme_left), ('extreme right', df_extreme_right),
                    ('single issue', df_single_issue)]:
    # Sum only nkill so that the text columns are not aggregated
    killed_per_year = frame.groupby('iyear')['nkill'].sum()
    fig5.add_trace(go.Scatter(x=killed_per_year.index, y=killed_per_year.values, name=name))

fig5.update_layout(title_text='People killed per year by ideology')
fig5.show()


In [39]:
# count the ethno-nationalist attacks per year (the ORGNAME column holds the counts)
to_save = df_ethno_nationalist[['ORGNAME', 'iyear']].groupby('iyear').count().reset_index()
# save the data to a csv file
to_save.to_csv('ethno_nationalist.csv', index=False)