<h2><center><b><i>Cluster bomb</i></b>: Uncovering Patterns in Terrorist Group Beliefs and Attacks</center></h2>

#### **COM-480: Data Visualization**

**Team**: Alexander Sternfeld, Silvia Romanato & Antoine Bonnet

**Dataset**: [Global Terrorism Database (GTD)](https://www.start.umd.edu/gtd/) 

**Additional dataset**: [Profiles of Perpetrators of Terrorism in the United States (PPTUS)](https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl%3A1902.1/17702)

## **Maps**
 
The goal of this notebook is to create the data underlying a world map of terrorist attacks. This data will then be used to generate an interactive map using [Kepler](https://kepler.gl), an open-source geospatial analysis tool for large-scale data sets. 

<center><p float="left">
  <img src="https://d1a3f4spazzrp4.cloudfront.net/kepler.gl/website/showcase/points-s.png" width="400" />
  <img src="https://d1a3f4spazzrp4.cloudfront.net/kepler.gl/website/showcase/lines-s.png" width="400" /> 
  <img src="https://d1a3f4spazzrp4.cloudfront.net/kepler.gl/website/showcase/hexagons-s.png" width="400" />
  <img src="https://d1a3f4spazzrp4.cloudfront.net/kepler.gl/website/showcase/heatmap-s.png" width="400" />
</p></center>


**Map 1**: A first map will display all locations of terrorist attacks from 1970 to today shown as dots on the world map. A time slider will be used to show the progression of terrorist attacks through time. 

**Map 2**: A second map will display arrows going from the base of operations of terrorist organizations to the location of terrorist attacks perpetrated by members of those organizations. Arrows will be colored depending on the ideology of each organization. 

## **Data pre-processing**

### **Map 1**: Terrorist attacks 

From the GTD, we first create an `attacks.csv` file that records each attack's location, date, number of fatalities, attack type, target type, weapon type and terrorist group. This file will be used as the basis for our first map. 
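As a minimal sketch of the date handling this file relies on, the toy frame below (the values are invented for illustration, not taken from the GTD) shows how `pd.to_datetime` assembles a timestamp from `year`, `month` and `day` columns once the placeholder zeros are replaced, and how `errors='coerce'` turns impossible dates into `NaT` instead of raising an error.

```python
import pandas as pd

# Toy records (hypothetical values): 0 stands for an unknown month or day
toy = pd.DataFrame({
    'year':  [1970, 1985, 2001, 1999],
    'month': [7, 0, 9, 2],
    'day':   [2, 0, 11, 30],   # February 30th does not exist
})

# Replace unknown months/days with 1 so that a valid date can be built
toy[['month', 'day']] = toy[['month', 'day']].replace(0, 1)

# Assemble the timestamp; impossible dates become NaT instead of raising
toy['datetime'] = pd.to_datetime(toy[['year', 'month', 'day']], errors='coerce')

# Shift every timestamp to noon, as is done for the real data
toy['datetime'] = toy['datetime'] + pd.Timedelta(hours=12)
print(toy)
```

Rows that end up with `NaT` are kept in the frame, so any downstream filter on time has to decide how to treat them.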

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from load_data import *

pd.set_option('display.max_columns', None)

GTD = load_GTD()
PPTUS, _ = load_PPTUS()

MAP_DIR = os.path.join(DATA_DIR, 'maps')

In [None]:
# Create a datetime column by combining the iyear, imonth and iday columns (renamed first)
GTD = GTD.rename(columns={'iyear': 'year', 'imonth': 'month', 'iday': 'day'})
# Some attacks have an unknown month or day (coded as 0), so we default those to 1
GTD['day'] = GTD['day'].replace(0, 1)
GTD['month'] = GTD['month'].replace(0, 1)
# Dates that are still invalid become NaT; every timestamp is set to noon
GTD['datetime'] = pd.to_datetime(GTD[['year', 'month', 'day']], errors='coerce') + pd.Timedelta(hours=12)

# Keep only the columns needed for the map
GTD = GTD[['datetime', 'city', 'country_txt', 'region_txt', 'latitude', 'longitude', 'nkill', 'attacktype1_txt', 'gname', 'targtype1_txt', 'weaptype1_txt']]

# Drop rows with missing latitude, longitude
GTD = GTD.dropna(subset=['latitude', 'longitude'])

# Replace all Unknown values with nan
GTD = GTD.replace('Unknown', np.nan)

# Rename columns for display
GTD = GTD.rename(columns={
    'country_txt': 'Country', 
    'region_txt': 'Region', 
    'attacktype1_txt': 'Attack type',
    'nkill': 'Fatalities',
    'gname': 'Terrorist group', 
    'latitude': 'Latitude', 
    'longitude': 'Longitude', 
    'targtype1_txt': 'Target type', 
    'city': 'City', 
    'weaptype1_txt': 'Weapon used'})

# Store file in csv format 
GTD.to_csv(os.path.join(MAP_DIR, 'attacks.csv'), index=False)
GTD

### **Map 2**: Group-to-target arcs

This second map will focus on terrorist groups by drawing arcs from the group's country of origin to the locations of their terrorist attacks. We will then be able to see the geographical distribution of terrorist attacks for each group. 

#### Manual labelling 

To do this, we manually labelled the country of origin of the top 232 groups by number of fatalities and stored the result in the file `top_groups.csv`. During the annotation, the following decisions were made concerning specific groups: 

- **Allied Democratic Forces (ADF)** originated in Uganda but expanded to DRC (labelled Uganda)

- **Sikh Extremists** (i.e. the Khalistan movement) originate in the Punjab region, which spans Pakistan and India (labelled Pakistan)

- **Tamils**: the Tamil population lives mainly in India, but also in Sri Lanka (labelled India)

- **Kashmir**: the region spans Pakistan, India and China (labelled India)

- **Basque Fatherland and Freedom (ETA)** is Basque; the Basque Country lies mainly in Spain but also partly in France (labelled Spain)

- **Movement for Oneness and Jihad in West Africa** was active in both Algeria and Mali (labelled Algeria)

- **Naga** people originate from a region split between India and Myanmar (labelled India)

- The following groups are not affiliated with a particular location and are therefore removed from the maps: Muslim extremists, Islamist extremists, Separatists, Death Squad, Tribesmen, Muslim Militants, Gunmen, Narco-Terrorists, Rebels, Muslim Rebels, Muslims, Jihadi-inspired extremists, Protestant extremists, Right-Wing Death Squad, Communists, Muslim Separatists, Shia Muslim extremists, Sunni Muslim extremists, Mouhadine, Rebel Military Unit, Dissident Military Mmbrs of Northern Tribal Group, Moslem Activists, Anti-Government Guerrillas, Anti-Communist Vigilante Group, Muslim Fundamentalists, White supremacists/nationalists
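The labelling workflow can be sketched on toy data as follows. The group names, fatality counts and labels below are invented for illustration; the point is only that the hand-written list must follow the ranking order exactly, and that a missing label removes a group with no identifiable origin.

```python
import pandas as pd

# Toy attack records (hypothetical groups and counts)
attacks = pd.DataFrame({
    'gname': ['Group A', 'Group B', 'Group A', 'Unknown', 'Rebels', 'Group B'],
    'nkill': [10, 3, 5, 50, 8, None],
})

# Rank named groups by total fatalities (missing counts are skipped by sum)
ranked = (
    attacks[attacks['gname'] != 'Unknown']
    .groupby('gname')['nkill']
    .sum()
    .sort_values(ascending=False)
    .reset_index()
)

# One hand-written label per ranked group, in the same order as the ranking
labels = ['Country X', None, 'Country Y']
assert len(labels) == len(ranked), 'one label per ranked group is required'
ranked['Country'] = labels

# Groups without an identifiable origin are dropped
labelled = ranked.dropna(subset=['Country'])
print(labelled)
```

Because the labels are matched by position rather than by name, any change to the ranking (for example a new release of the dataset) would require the list to be re-checked.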

In [None]:
# Rank all named groups in the GTD by total fatalities (missing nkill values are ignored by sum)
GTD_bis = load_GTD()
top_groups = (
    GTD_bis[GTD_bis['gname'] != 'Unknown']
    .groupby('gname')['nkill']
    .sum(numeric_only=True)
    .dropna()
    .sort_values(ascending=False)
    .reset_index()[:232]
)

# Manual annotation of each top group's country of origin, in ranking order (np.nan = no identifiable location)
manual_locations = ['Afghanistan', 'Iraq', 'Nigeria', 'Somalia', 'Peru',
                    'Sri Lanka', 'El Salvador', 'Yemen', 'Guatemala', 
                    'Pakistan', 'Nigeria', 'Colombia', 'Turkey', 'Philippines',
                    'Iraq', 'Rwanda', 'Afghanistan', 'Afghanistan', 'Yemen', 
                    'Uganda', 'Syria', 'Uganda', 'Pakistan', 'Angola', 'India', 
                    'Mozambique', np.nan, 'Egypt', 'China', 'Colombia', 'South Sudan', 
                    'Nicaragua', 'Ireland', 'Iraq', 'Ukraine', 'Russian Federation',
                    'India', 'Algeria', 'United States', 'Afghanistan', 'Lebanon', 
                    np.nan, 'Mali', np.nan, 'Pakistan', 'Mali', 'Rwanda', 'Algeria', 
                    'Palestine', 'Palestine', 'Philippines', 'Sri Lanka', 'Syria', 
                    'Spain', 'Central African Republic', np.nan, 'Congo DRC', 
                    'Ethiopia', 'Philippines', np.nan, np.nan, 'Nepal', 'Ethiopia', 
                    'South Africa', 'India', 'Philippines', 'Afghanistan', 'Algeria', 
                    'Nicaragua', 'Sudan', 'Guatemala', 'Peru', 'Libya', 'Algeria', 
                    'Sierra Leone', 'Sudan', 'Colombia', 'Palestine', 'Rwanda', 'Iran', 
                    'India', np.nan, 'Congo DRC', 'Yemen', np.nan, 
                    'Egypt', 'India', 'Bangladesh', 'Afghanistan', 'Nicaragua', 
                    'Syria', 'Central African Republic', 'Libya', 'Russian Federation', 'India', 
                    np.nan, 'India', np.nan, 'Colombia', 'Philippines', 'China', np.nan,
                    'South Sudan', 'Pakistan', 'India', 'Northern Ireland', 'Ethiopia', 
                    'Cambodia', np.nan, 'South Africa', np.nan, 'Congo DRC', 
                    'Colombia', 'Indonesia', 'Iran', 'Palestine', 'Colombia', 'Palestine', 
                    'Central African Republic', 'Egypt', 'Syria', 'Yemen', np.nan, 'Syria', 
                    'Palestine', 'Nigeria', 'Lebanon', 'Syria', 'Guatemala', 'Palestine', 
                    np.nan, 'Mozambique', 'India', 'Tanzania', 'Sudan', 'India', np.nan,
                    'Pakistan', 'Russian Federation', np.nan, 'India', 'Ethiopia', 'Palestine', 'Central African Republic',
                    'Ethiopia', 'Pakistan', 'Nicaragua', 'Myanmar', 'Djibouti', 'Pakistan', 'Pakistan', 
                    'Senegal', 'Egypt', 'Russian Federation', 'Uganda', 'India', np.nan, 'Congo DRC',
                    'Algeria', 'South Sudan', 'South Sudan', np.nan, 'India', 'Argentina', 'Algeria', 
                    'Syria', 'Pakistan', 'Colombia', 'Northern Ireland', 'Liberia', 'Russian Federation', 
                    'India', 'Serbia', 'Congo DRC', 'Colombia', 
                    'Lebanon', np.nan, np.nan, 'Palestine', np.nan, 'Syria', 'Iraq', 'Iraq', 'Nigeria', 
                    'Pakistan', 'Iran', 'Turkey', 'Mali', np.nan, 'Ethiopia', np.nan, 'Rwanda', 
                    'India', 'Myanmar', 'Palestine', 'Algeria', np.nan, 'Egypt', np.nan, 
                    'Myanmar', 'Libya', np.nan, 'India', 'Mali', np.nan, 'Zimbabwe', np.nan, 
                    'Pakistan', 'Pakistan', 'Indonesia', 'Uganda', 'Syria', 'India', 'Mali', 
                    'Yemen', np.nan, 'Iraq', np.nan, 'El Salvador', 'Indonesia', 'Turkey',
                    'Myanmar', 'Libya', 'Iraq', 'Iraq', np.nan, 'Egypt', 'India',
                    'Algeria', 'Guatemala', 'Mali', 'Yemen']

# Add country of origin column
top_groups['Country'] = manual_locations

# Drop the nan countries
top_groups = top_groups.dropna(subset=['Country'])
top_groups = top_groups.rename(columns={'gname':'Group', 'nkill': 'Fatalities'})


#### Country location

To locate each country on the map, we download the [World Countries Centroids](https://github.com/gavinr/world-countries-centroids) dataset, which gives the longitude and latitude of each country's centroid. 
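One detail worth checking when reading such a file is how pandas treats the string `NA`, which is both a default missing-value marker and Namibia's two-letter ISO code. The sketch below uses a small inline CSV with a subset of the column names used in this notebook (the coordinates are rounded, illustrative values) to show the effect of `keep_default_na=False`.

```python
import io
import pandas as pd

# Tiny stand-in for the centroid file (illustrative, rounded coordinates)
csv_text = """longitude,latitude,COUNTRY,ISO
18.5,-22.0,Namibia,NA
8.1,9.6,Nigeria,NG
"""

# Default parsing: the string "NA" is read as a missing value
default = pd.read_csv(io.StringIO(csv_text))

# Disabling the default markers keeps "NA" as an ordinary string;
# only the explicitly listed marker '_' is treated as missing
strict = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=['_'])

print(default['ISO'].isna().tolist())  # [True, False]
print(strict['ISO'].tolist())          # ['NA', 'NG']
```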


In [None]:
# Get country locations csv from url 
locations_url = 'https://cdn.jsdelivr.net/gh/gavinr/world-countries-centroids@v1/dist/countries.csv'
locations = pd.read_csv(locations_url, keep_default_na=False, na_values=['_'])
locations.head()

In [None]:
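# Print labelled countries that have no entry in the centroid dataset (np.nan labels are skipped)
known_countries = set(locations['COUNTRY'])
missing_countries = {country for country in manual_locations if pd.notna(country) and country not in known_countries}
print(missing_countries)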

Two special cases, Palestine and Northern Ireland, have no entry under those names in the dataset. We manually add them to the `locations` DataFrame.

In [None]:
# Add rows for the missing countries (pd.concat is used because DataFrame.append was removed in pandas 2.0)
# Northern Ireland lies near 54.6°N, 6.7°W and Palestine near 31.9°N, 35.2°E
missing_rows = pd.DataFrame([
    {'longitude': -6.693145, 'latitude': 54.607577, 'COUNTRY': 'Northern Ireland', 'ISO': 'GB', 'COUNTRYAFF': 'United Kingdom', 'AFF_ISO': 'GBR'},
    {'longitude': 35.227163, 'latitude': 31.947351, 'COUNTRY': 'Palestine', 'ISO': 'PSE', 'COUNTRYAFF': 'Palestine', 'AFF_ISO': 'PSE'}])
locations = pd.concat([locations, missing_rows], ignore_index=True)

#### Merging terrorist groups and their locations

Now that we have country locations, we join each terrorist group's country of origin with that country's centroid. This gives a longitude and latitude for each terrorist group. 
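A left join silently leaves `NaN` coordinates for any country name that has no match, so it can be worth checking the result. The following sketch, on invented toy tables, uses the `validate` and `indicator` options of `DataFrame.merge` to confirm that each country appears only once in the lookup table and to list the groups that found no match.

```python
import pandas as pd

# Toy tables (hypothetical names and coordinates)
groups = pd.DataFrame({
    'Group': ['Group A', 'Group B', 'Group C'],
    'Country': ['Country X', 'Country Y', 'Country Z'],
})
centroids = pd.DataFrame({
    'Country': ['Country X', 'Country Y'],
    'latitude': [10.0, -5.0],
    'longitude': [20.0, 30.0],
})

merged = groups.merge(
    centroids,
    on='Country',
    how='left',
    validate='many_to_one',  # raises if a country is duplicated in centroids
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Groups whose country has no centroid end up with NaN coordinates
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Group'].tolist()
print(unmatched)  # ['Group C']
```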

In [None]:
# Add the longitude and latitude of each country by joining on Country name
locations = locations.rename(columns={'COUNTRY': 'Country'})
top_groups = top_groups.merge(locations[['Country', 'latitude', 'longitude']], on='Country', how='left')

# Rename columns for display, then save top_groups to a csv file
top_groups = top_groups.rename(columns={'Group' : 'Terrorist group', 'latitude' : 'Group latitude', 'longitude' : 'Group longitude', 'Country' : 'Group origin'})
top_groups.to_csv(os.path.join(MAP_DIR, 'top_groups.csv'), index=False)
top_groups

#### Merging group locations with target locations

We now merge the group locations above with the attack data so that each attack has both source and target coordinates. This lets us draw an arc from a terrorist group's country of origin to the location of each of its attacks. 
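As an illustration of the resulting table layout, the sketch below joins a toy attack table to a toy group table (all names and coordinates are made up). Each remaining row carries both a target point and a source point, which is the shape an arc layer typically expects.

```python
import pandas as pd

# Toy attacks: the target coordinates of each attack (hypothetical values)
attacks = pd.DataFrame({
    'Terrorist group': ['Group A', 'Group B', None],
    'Latitude': [1.0, 2.0, 3.0],
    'Longitude': [4.0, 5.0, 6.0],
})

# Toy group origins: only Group A has a labelled location
origins = pd.DataFrame({
    'Terrorist group': ['Group A'],
    'Group latitude': [10.0],
    'Group longitude': [20.0],
})

# Attach the source coordinates, then keep only attacks that have one
arcs = attacks.merge(origins, on='Terrorist group', how='left')
arcs = arcs.dropna(subset=['Group latitude', 'Group longitude'])
print(arcs)
```

Attacks by unlabelled or unknown groups drop out at this step, so the arc data is a subset of the attack data used for the first map.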



In [None]:
# Add group location to GTD by joining on group name
GTD = GTD.merge(top_groups[['Terrorist group', 'Group origin', 'Group latitude', 'Group longitude']], on='Terrorist group', how='left')

# Remove all attacks without group location
GTD = GTD.dropna(subset=['Group latitude', 'Group longitude'])
print('Number of attacks with identified group location:', len(GTD))
 
# Save to csv in group_attacks.csv
GTD.to_csv(os.path.join(MAP_DIR, 'group_attacks.csv'), index=False)
GTD



And we are now done preparing the data for the maps. Head over to our website to visualize them. 