# **GLOBAL TERRORISM ANALYSIS**
## PART 4 : REPORT
#### Author : Samarpan Das

---
---

## **Introduction**
1. The **Global Terrorism Database** ([GTD](https://gtd.terrorismdata.com/files/gtd-1970-2019-4/)) is an open-source database of terrorist attacks around the world from 1970 through 2019. It includes systematic data on domestic as well as international terrorist incidents that occurred during this period and now covers more than 200,000 attacks. The database is maintained by researchers at the **National Consortium for the Study of Terrorism and Responses to Terrorism** ([START](https://www.start.umd.edu/gtd/)), headquartered at the University of Maryland.



The GTD defines terrorism as:
> "The threatened or actual use of illegal force and violence by a non-state actor to attain a political, economic, religious, or social goal through fear, coercion, or intimidation."





2. **Characteristics of the Database**

*   Contains information on over 201,000 terrorist attacks
*   Currently the most comprehensive unclassified database on terrorist attacks in the world
*   Includes information on more than 88,000 bombings, 19,000 assassinations, and 11,000 kidnappings since 1970
*   Includes information on at least 45 variables for each case, with more recent incidents including information on more than 120 variables
*   More than 4,000,000 news articles and 25,000 news sources were reviewed to collect incident data from 1998 to 2017 alone




3. **Project Goals**
* Read the source and do some quick research to understand more about the dataset and its topic
* Clean the data
* Preprocess the data to isolate the fields that deserve the most attention
* Perform exploratory data analysis on the dataset
* Analyze the data more deeply and extract insights
* Visualize the analysis in Tableau. Please find our report here.

---
----

## 1. Prepare and Inspection Stage
In this step we inspected the basic features of the dataset and prepared it to best fit our purpose.

Detailed code and explanation for this step can be found [here](https://github.com/SamarpanDas/Global-Terrorism-Analysis/blob/main/project/1.%20Prepare/Prepare.sql).

The initial prepare phase was conducted in Google BigQuery, since the data was too large to handle locally and could be processed efficiently on Google's infrastructure.

A glimpse of the code used for the preparation phase:


```
-- Query the whole table in ascending order of date of occurrence
SELECT *
FROM `qwiklabs-gcp-01-28c376c2a71a.terrorism_dataset.terrorism_table`
ORDER BY iyear, imonth, iday;


-- Check whether duplicates are present: the distinct count and the plain count
-- of eventid should match. Both return 201183.
SELECT COUNT(DISTINCT eventid)
FROM `qwiklabs-gcp-01-28c376c2a71a.terrorism_dataset.terrorism_table`;
SELECT COUNT(eventid)
FROM `qwiklabs-gcp-01-28c376c2a71a.terrorism_dataset.terrorism_table`;

-- Check the total number of rows. Returns 201183
SELECT COUNT(*)
FROM `qwiklabs-gcp-01-28c376c2a71a.terrorism_dataset.terrorism_table`;


-- Drawing some basic insights

-- Count the number of attacks recorded for each country, in descending order
SELECT COUNT(eventid) AS AttackCount, country_txt AS Country
FROM `qwiklabs-gcp-01-28c376c2a71a.terrorism_dataset.terrorism_table`
GROUP BY country_txt
ORDER BY COUNT(eventid) DESC;


-- Count the number of attacks for each distinct attack type, in descending order
SELECT COUNT(eventid) AS AttackCount, attacktype1_txt AS AttackType
FROM `qwiklabs-gcp-01-28c376c2a71a.terrorism_dataset.terrorism_table`
GROUP BY attacktype1_txt
ORDER BY COUNT(eventid) DESC;



```
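For readers without BigQuery access, the same checks can be sketched in pandas. This is only an illustration: the small `events` frame below is made-up sample data, not GTD records, and only the column names `eventid`, `country_txt`, and `attacktype1_txt` are taken from the queries above.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real table
events = pd.DataFrame({
    "eventid": [1, 2, 3, 4],
    "country_txt": ["Greece", "Japan", "Greece", "Greece"],
    "attacktype1_txt": [
        "Assassination",
        "Bombing/Explosion",
        "Bombing/Explosion",
        "Bombing/Explosion",
    ],
})

# Duplicate check: compare the number of distinct ids with the number of non-null ids
distinct_ids = events["eventid"].nunique()
all_ids = events["eventid"].count()
has_duplicates = distinct_ids != all_ids

# Attacks per country, largest count first
attacks_per_country = (
    events.groupby("country_txt")["eventid"].count().sort_values(ascending=False)
)

# Attacks per attack type, largest count first
attacks_per_type = (
    events.groupby("attacktype1_txt")["eventid"].count().sort_values(ascending=False)
)

print(has_duplicates)
print(attacks_per_country)
print(attacks_per_type)
```

`nunique()` and `count()` both skip missing values, which mirrors how `COUNT(DISTINCT eventid)` and `COUNT(eventid)` treat NULLs in SQL.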

A few targeted pivot tables were also built during the prepare phase, and some initial graphs and charts were designed.

Those charts can be found [here](https://github.com/SamarpanDas/Global-Terrorism-Analysis/tree/main/project/1.%20Prepare/Prep%20Stage%20Results)



---
---

## 2. Data Processing
In this step we looked at the features of the dataset in detail and improved, added, removed, and altered some of them into more convenient forms.

Detailed code and explanation for the same can be found [here](https://github.com/SamarpanDas/Global-Terrorism-Analysis/blob/main/project/2.%20Process/data_preprocessing.ipynb)

The data was still too large to inspect with spreadsheet software, so from here on Python is used to manipulate and work with the data.

Importing necessary libraries

```
import time
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from matplotlib import animation
import numpy as np
import pandas as pd
import seaborn as sns
```

Connecting Colab to Google Drive

```
from google.colab import drive
drive.mount('/content/drive')
```

```
Mounted at /content/drive
```


Importing data from Google Drive

```
# BaseForAnalysis.csv was uploaded to Google Drive beforehand
primary_df = pd.read_csv('/content/drive/My Drive/BaseForAnalysis.csv', sep=',', encoding="ISO-8859-1")
```


Initial layout of the data

```
primary_df.head(10)
```

```
Unnamed: 0,eventid,iyear,imonth,iday,extended,country_txt,region_txt,city,latitude,longitude,vicinity,crit1,multiple,success,suicide,attacktype1_txt,targtype1_txt,natlty1_txt,gname,nperps,claimed,weaptype1_txt,nkill,nkillter,nwound,propextent_txt,ishostkid,ransom,nreleased
0,197000000002,1970,0,0,0,Mexico,North America,Mexico city,19.371887,-99.086624,0,1,0,1,0,Hostage Taking (Kidnapping),Government (Diplomatic),Belgium,23rd of September Communist League,7.0,,Unknown,0.0,,0.0,,1.0,1.0,
1,197001000001,1970,1,0,0,Philippines,Southeast Asia,Unknown,15.478598,120.599741,0,1,0,1,0,Assassination,Journalists & Media,United States,Unknown,,,Unknown,1.0,,0.0,,0.0,0.0,
2,197001000002,1970,1,0,0,Greece,Western Europe,Athens,37.99749,23.762728,0,1,0,1,0,Bombing/Explosion,Government (Diplomatic),United States,Unknown,,,Explosives,,,,,0.0,0.0,
3,197001000003,1970,1,0,0,Japan,East Asia,Fukouka,33.580412,130.396361,0,1,0,1,0,Facility/Infrastructure Attack,Government (Diplomatic),United States,Unknown,,,Incendiary,,,,,0.0,0.0,
4,197001010002,1970,1,1,0,United States,North America,Cairo,37.005105,-89.176269,0,1,0,1,0,Armed Assault,Police,United States,Black Nationalists,-99.0,0.0,Firearms,0.0,0.0,0.0,Minor (likely < $1 million),0.0,0.0,
5,197001050001,1970,1,1,0,United States,North America,Baraboo,43.4685,-89.744299,0,1,0,0,0,Bombing/Explosion,Military,United States,"Weather Underground, Weathermen",,,Explosives,0.0,,0.0,Minor (likely < $1 million),0.0,0.0,
6,197001020001,1970,1,2,0,Uruguay,South America,Montevideo,-34.891151,-56.187214,0,1,0,0,0,Assassination,Police,Uruguay,Tupamaros (Uruguay),3.0,,Firearms,0.0,,0.0,,0.0,0.0,
7,197001020002,1970,1,2,0,United States,North America,Oakland,37.791927,-122.225906,0,1,0,1,0,Bombing/Explosion,Utilities,United States,Unknown,-99.0,0.0,Explosives,0.0,0.0,0.0,Minor (likely < $1 million),0.0,0.0,
8,197001020003,1970,1,2,0,United States,North America,Madison,43.076592,-89.412488,0,1,0,1,0,Facility/Infrastructure Attack,Military,United States,New Year's Gang,1.0,1.0,Incendiary,0.0,0.0,0.0,Minor (likely < $1 million),0.0,0.0,
9,197001030001,1970,1,3,0,United States,North America,Madison,43.07295,-89.386694,0,1,0,1,0,Facility/Infrastructure Attack,Government (General),United States,New Year's Gang,1.0,0.0,Incendiary,0.0,0.0,0.0,Minor (likely < $1 million),0.0,0.0,
```


```
print('dataframe shape: ', primary_df.shape)
```

```
dataframe shape:  (201183, 29)
```
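The preview shows empty cells alongside values such as `0` for the month and day and `-99.0` for `nperps`. These look like placeholders for unknown values, but that reading is an assumption drawn from the preview and should be checked against the GTD codebook. Under that assumption, a minimal sketch of how such values could be surfaced looks like this (the `sample` frame is made-up illustration data):

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the placeholder patterns seen in the preview
sample = pd.DataFrame({
    "iyear": [1970, 1970, 1970],
    "imonth": [0, 1, 1],
    "iday": [0, 1, 2],
    "nperps": [7.0, -99.0, np.nan],
})

# Count genuinely missing (NaN) values per column
missing_per_column = sample.isna().sum()

# Assumption: -99 encodes "unknown", so convert it to NaN
nperps_clean = sample["nperps"].replace(-99, np.nan)

# Assemble dates from the year/month/day columns; a month or day of 0
# cannot form a valid date, so errors="coerce" turns those rows into NaT
date_parts = sample[["iyear", "imonth", "iday"]].rename(
    columns={"iyear": "year", "imonth": "month", "iday": "day"}
)
dates = pd.to_datetime(date_parts, errors="coerce")

print(missing_per_column)
print(nperps_clean)
print(dates)
```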


#### Changing the content and features of the data

Renaming certain columns to better identifiable names

```
primary_df.rename(columns={
    'iyear': 'year',
    'imonth': 'month',
    'iday': 'day',
    'country_txt': 'country',
    'region_txt': 'region',
    'crit1': 'crit',
    'attacktype1_txt': 'attacktype',
    'targtype1_txt': 'targettype',
    'natlty1_txt': 'nationalityofvic',
    'gname': 'organisation',
    'claimed': 'claimedresp',
    'weaptype1_txt': 'weapontype',
    'nkill': 'nkilled',
    'nkillter': 'nkillonlyter',
    'nwound': 'nwounded',
    'propextent_txt': 'propdamageextent',
    'ishostkid': 'victimkidnapped',
    'ransom': 'ransomdemanded',
}, inplace=True)
```

```
# Add column ncasualties (number of dead/injured people) by adding nkilled and nwounded
primary_df['ncasualties'] = primary_df['nkilled'] + primary_df['nwounded']
```
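Note that plain `+` propagates missing values: if either `nkilled` or `nwounded` is NaN, `ncasualties` is NaN as well. Whether that is the desired behavior depends on the analysis. The sketch below, on made-up toy data, contrasts it with `Series.add(..., fill_value=0)`, which treats a single missing side as 0 and only yields NaN when both sides are missing. This is offered as an alternative to consider, not as the method used in this project.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering all four combinations of present/missing values
toy = pd.DataFrame({
    "nkilled": [1.0, np.nan, np.nan, 2.0],
    "nwounded": [0.0, 3.0, np.nan, np.nan],
})

# Plain addition: any NaN operand makes the result NaN
strict = toy["nkilled"] + toy["nwounded"]

# fill_value=0: a single missing operand counts as 0;
# the result is NaN only when both operands are missing
lenient = toy["nkilled"].add(toy["nwounded"], fill_value=0)

print(pd.DataFrame({"strict": strict, "lenient": lenient}))
```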

```
# Limit long strings
primary_df['weapontype'] = primary_df['weapontype'].replace('Vehicle (not to include vehicle-borne explosives, i.e., car or truck bombs)', 'Vehicle')

primary_df['propdamageextent'] = primary_df['propdamageextent'].replace('Minor (likely < $1 million)', 'Minor')
primary_df['propdamageextent'] = primary_df['propdamageextent'].replace('Major (likely > $1 million but < $1 billion)', 'Major')
primary_df['propdamageextent'] = primary_df['propdamageextent'].replace('Catastrophic (likely > $1 billion)', 'Catastrophic')
```
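The three `propdamageextent` replacements can also be expressed as a single dictionary lookup. The following is a minimal sketch on made-up toy data; the name `short_labels` is introduced here for illustration only.

```python
import pandas as pd

# Hypothetical column containing the three long labels and one missing value
toy = pd.DataFrame({
    "propdamageextent": [
        "Minor (likely < $1 million)",
        "Major (likely > $1 million but < $1 billion)",
        "Catastrophic (likely > $1 billion)",
        None,
    ]
})

# Map each long label to its short form; values not in the mapping are left unchanged
short_labels = {
    "Minor (likely < $1 million)": "Minor",
    "Major (likely > $1 million but < $1 billion)": "Major",
    "Catastrophic (likely > $1 billion)": "Catastrophic",
}
toy["propdamageextent"] = toy["propdamageextent"].replace(short_labels)

print(toy)
```

Keeping the mapping in one dictionary makes it easier to see at a glance which labels are shortened and to extend the list later.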

Glimpse of the final preprocessed data

```
primary_df.head(10)
```

```
Unnamed: 0,eventid,year,month,day,extended,country,region,city,latitude,longitude,vicinity,crit,multiple,success,suicide,attacktype,targettype,nationalityofvic,organisation,nperps,claimedresp,weapontype,nkilled,nkillonlyter,nwounded,propdamageextent,victimkidnapped,ransomdemanded,nreleased,ncasualties
0,197000000002,1970,0,0,0,Mexico,North America,Mexico city,19.371887,-99.086624,0,1,0,1,0,Hostage Taking (Kidnapping),Government (Diplomatic),Belgium,23rd of September Communist League,7.0,,Unknown,0.0,,0.0,,1.0,1.0,,0.0
1,197001000001,1970,1,0,0,Philippines,Southeast Asia,Unknown,15.478598,120.599741,0,1,0,1,0,Assassination,Journalists & Media,United States,Unknown,,,Unknown,1.0,,0.0,,0.0,0.0,,1.0
2,197001000002,1970,1,0,0,Greece,Western Europe,Athens,37.99749,23.762728,0,1,0,1,0,Bombing/Explosion,Government (Diplomatic),United States,Unknown,,,Explosives,,,,,0.0,0.0,,
3,197001000003,1970,1,0,0,Japan,East Asia,Fukouka,33.580412,130.396361,0,1,0,1,0,Facility/Infrastructure Attack,Government (Diplomatic),United States,Unknown,,,Incendiary,,,,,0.0,0.0,,
4,197001010002,1970,1,1,0,United States,North America,Cairo,37.005105,-89.176269,0,1,0,1,0,Armed Assault,Police,United States,Black Nationalists,-99.0,0.0,Firearms,0.0,0.0,0.0,Minor,0.0,0.0,,0.0
5,197001050001,1970,1,1,0,United States,North America,Baraboo,43.4685,-89.744299,0,1,0,0,0,Bombing/Explosion,Military,United States,"Weather Underground, Weathermen",,,Explosives,0.0,,0.0,Minor,0.0,0.0,,0.0
6,197001020001,1970,1,2,0,Uruguay,South America,Montevideo,-34.891151,-56.187214,0,1,0,0,0,Assassination,Police,Uruguay,Tupamaros (Uruguay),3.0,,Firearms,0.0,,0.0,,0.0,0.0,,0.0
7,197001020002,1970,1,2,0,United States,North America,Oakland,37.791927,-122.225906,0,1,0,1,0,Bombing/Explosion,Utilities,United States,Unknown,-99.0,0.0,Explosives,0.0,0.0,0.0,Minor,0.0,0.0,,0.0
8,197001020003,1970,1,2,0,United States,North America,Madison,43.076592,-89.412488,0,1,0,1,0,Facility/Infrastructure Attack,Military,United States,New Year's Gang,1.0,1.0,Incendiary,0.0,0.0,0.0,Minor,0.0,0.0,,0.0
9,197001030001,1970,1,3,0,United States,North America,Madison,43.07295,-89.386694,0,1,0,1,0,Facility/Infrastructure Attack,Government (General),United States,New Year's Gang,1.0,0.0,Incendiary,0.0,0.0,0.0,Minor,0.0,0.0,,0.0
```


```
# Converting the dataframe to a csv file and uploading it to Google Drive with the name BaseForAnalysis_Version2.csv
# (this export is kept disabled)
# primary_df.to_csv("/content/drive/My Drive/BaseForAnalysis_Version2.csv", sep=",")
```
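One detail worth knowing when enabling this export: by default `to_csv` also writes the dataframe index, and reading the file back with `pd.read_csv` then produces an extra `Unnamed: 0` column. The sketch below illustrates this on made-up toy data, using in-memory buffers instead of Google Drive paths. Passing `index=False` is a suggestion here, not something the original notebook does.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for primary_df
toy = pd.DataFrame({
    "eventid": [197001000001, 197001000002],
    "year": [1970, 1970],
})

# Default export: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# Export without the index: the reloaded frame has the original columns only
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded.columns))
```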

### End of data processing
---
---
