<a href="https://colab.research.google.com/github/coder-harshil/global-terrorism-analysis/blob/main/Capstone_Project_EDA_Global_Terrorism_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## <b> The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.</b>

# <b> Explore and analyze the data to discover key findings pertaining to terrorist activities. </b>

## **Terrorism has been a global problem for decades, and with the recent misfortune in Afghanistan, it is high time that we shed light on how attacks have unfolded over the years. For this assignment, we have data on more than 180,000 attacks that took place between 1970 and 2017.**

## **Here is the list of operations I will perform in this Colab notebook to analyse the dataset:**

## **1. Cleaning the data and keeping only required columns (there are 135 columns in the dataset, and I will select only the necessary ones).**

## **2. Replacing null values when and where necessary for smooth data analysis.**

## **3. Filtering the dataset and creating additional dataframes to examine how particular factors influence the data.**

## **4. Creating basic visualizations to understand and convey the effect of these factors.**
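
As a rough illustration of steps 3 and 4, the sketch below filters a tiny made-up frame down to one region and aggregates it into the kind of per-year table a bar chart could be drawn from. The rows and the names `sample`, `south_asia` and `per_year` are hypothetical; only the column names (`iyear`, `region_txt`, `nkill`) come from the dataset.

```python
import pandas as pd

# Hypothetical toy rows; only the column names mirror the GTD columns used later.
sample = pd.DataFrame({
    "iyear": [1970, 1970, 1971, 1971, 1971],
    "region_txt": ["South Asia", "Western Europe", "South Asia", "South Asia", "East Asia"],
    "nkill": [1.0, 0.0, 3.0, 2.0, 0.0],
})

# Step 3: filter to one region and keep the result as its own dataframe.
south_asia = sample[sample["region_txt"] == "South Asia"].copy()

# Step 4 (input for a chart): number of attacks and total deaths per year.
per_year = south_asia.groupby("iyear").agg(
    attacks=("nkill", "size"),
    killed=("nkill", "sum"),
)
print(per_year)
```

A table shaped like `per_year` can be handed to `per_year.plot(kind="bar")` or to seaborn, which is one possible way to approach the visualization step.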

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
#Importing and reading the dataset

df = pd.read_csv('/content/drive/MyDrive/AlmaBetter/Capstone Project - EDA/Global Terrorism Data.csv', encoding = 'latin1' )



In [4]:
df.head()

Unnamed: 0,eventid,iyear,imonth,iday,approxdate,extended,resolution,country,country_txt,region,region_txt,provstate,city,latitude,longitude,specificity,vicinity,location,summary,crit1,crit2,crit3,doubtterr,alternative,alternative_txt,multiple,success,suicide,attacktype1,attacktype1_txt,attacktype2,attacktype2_txt,attacktype3,attacktype3_txt,targtype1,targtype1_txt,targsubtype1,targsubtype1_txt,corp1,target1,...,weapsubtype4,weapsubtype4_txt,weapdetail,nkill,nkillus,nkillter,nwound,nwoundus,nwoundte,property,propextent,propextent_txt,propvalue,propcomment,ishostkid,nhostkid,nhostkidus,nhours,ndays,divert,kidhijcountry,ransom,ransomamt,ransomamtus,ransompaid,ransompaidus,ransomnote,hostkidoutcome,hostkidoutcome_txt,nreleased,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related
0,197000000001,1970,7,2,,0,,58,Dominican Republic,2,Central America & Caribbean,,Santo Domingo,18.456792,-69.951164,1.0,0,,,1,1,1,0.0,,,0.0,1,0,1,Assassination,,,,,14,Private Citizens & Property,68.0,Named Civilian,,Julio Guzman,...,,,,1.0,,,0.0,,,0,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,0,0,0,0,
1,197000000002,1970,0,0,,0,,130,Mexico,1,North America,Federal,Mexico city,19.371887,-99.086624,1.0,0,,,1,1,1,0.0,,,0.0,1,0,6,Hostage Taking (Kidnapping),,,,,7,Government (Diplomatic),45.0,"Diplomatic Personnel (outside of embassy, cons...",Belgian Ambassador Daughter,"Nadine Chaval, daughter",...,,,,0.0,,,0.0,,,0,,,,,1.0,1.0,0.0,,,,Mexico,1.0,800000.0,,,,,,,,,,,,PGIS,0,1,1,1,
2,197001000001,1970,1,0,,0,,160,Philippines,5,Southeast Asia,Tarlac,Unknown,15.478598,120.599741,4.0,0,,,1,1,1,0.0,,,0.0,1,0,1,Assassination,,,,,10,Journalists & Media,54.0,Radio Journalist/Staff/Facility,Voice of America,Employee,...,,,,1.0,,,0.0,,,0,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,
3,197001000002,1970,1,0,,0,,78,Greece,8,Western Europe,Attica,Athens,37.99749,23.762728,1.0,0,,,1,1,1,0.0,,,0.0,1,0,3,Bombing/Explosion,,,,,7,Government (Diplomatic),46.0,Embassy/Consulate,,U.S. Embassy,...,,,Explosive,,,,,,,1,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,
4,197001000003,1970,1,0,,0,,101,Japan,4,East Asia,Fukouka,Fukouka,33.580412,130.396361,1.0,0,,,1,1,1,-9.0,,,0.0,1,0,7,Facility/Infrastructure Attack,,,,,7,Government (Diplomatic),46.0,Embassy/Consulate,,U.S. Consulate,...,,,Incendiary,,,,,,,1,,,,,0.0,,,,,,,0.0,,,,,,,,,,,,,PGIS,-9,-9,1,1,


### **As the data contains many unnecessary columns, we will first filter and select only the essential ones for our analysis**

In [5]:
df.columns

Index(['eventid', 'iyear', 'imonth', 'iday', 'approxdate', 'extended',
       'resolution', 'country', 'country_txt', 'region',
       ...
       'addnotes', 'scite1', 'scite2', 'scite3', 'dbsource', 'INT_LOG',
       'INT_IDEO', 'INT_MISC', 'INT_ANY', 'related'],
      dtype='object', length=135)

In [6]:
#Will use only the essential columns for the analysis of the data

columns_required = ['iyear', 'imonth', 'iday', 'country_txt', 'region_txt', 'provstate', 'city', 'attacktype1_txt', 'targtype1_txt', 'gname', 'weaptype1_txt', 'nkill']
terror_df = df[columns_required].copy()
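
An alternative worth considering is to skip the unneeded columns while the file is being parsed, by passing the same list to the `usecols` parameter of `pd.read_csv`. The sketch below assumes a small in-memory stand-in for the CSV (two rows copied from the `df.head()` output above, with `eventid` as the extra column); the names `csv_text` and `small_df` are hypothetical. With the real file, the path and `encoding='latin1'` from the import cell would stay as they are.

```python
import io
import pandas as pd

columns_required = ['iyear', 'imonth', 'iday', 'country_txt', 'region_txt', 'provstate',
                    'city', 'attacktype1_txt', 'targtype1_txt', 'gname', 'weaptype1_txt', 'nkill']

# Stand-in for the CSV file: the 12 required columns plus one extra ('eventid').
csv_text = (
    "eventid,iyear,imonth,iday,country_txt,region_txt,provstate,city,"
    "attacktype1_txt,targtype1_txt,gname,weaptype1_txt,nkill\n"
    "197000000001,1970,7,2,Dominican Republic,Central America & Caribbean,,Santo Domingo,"
    "Assassination,Private Citizens & Property,MANO-D,Unknown,1.0\n"
    "197001000002,1970,1,0,Greece,Western Europe,Attica,Athens,"
    "Bombing/Explosion,Government (Diplomatic),Unknown,Explosives,\n"
)

# usecols keeps only the listed columns while parsing, so 'eventid' is never loaded.
small_df = pd.read_csv(io.StringIO(csv_text), usecols=columns_required)
print(small_df.shape)
```

Loading fewer columns would mainly reduce memory use; selecting from the full `df` afterwards, as done in this notebook, keeps the other 123 columns available for later exploration.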

In [7]:
terror_df.head()

Unnamed: 0,iyear,imonth,iday,country_txt,region_txt,provstate,city,attacktype1_txt,targtype1_txt,gname,weaptype1_txt,nkill
0,1970,7,2,Dominican Republic,Central America & Caribbean,,Santo Domingo,Assassination,Private Citizens & Property,MANO-D,Unknown,1.0
1,1970,0,0,Mexico,North America,Federal,Mexico city,Hostage Taking (Kidnapping),Government (Diplomatic),23rd of September Communist League,Unknown,0.0
2,1970,1,0,Philippines,Southeast Asia,Tarlac,Unknown,Assassination,Journalists & Media,Unknown,Unknown,1.0
3,1970,1,0,Greece,Western Europe,Attica,Athens,Bombing/Explosion,Government (Diplomatic),Unknown,Explosives,
4,1970,1,0,Japan,East Asia,Fukouka,Fukouka,Facility/Infrastructure Attack,Government (Diplomatic),Unknown,Incendiary,


In [8]:
terror_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181691 entries, 0 to 181690
Data columns (total 12 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   iyear            181691 non-null  int64  
 1   imonth           181691 non-null  int64  
 2   iday             181691 non-null  int64  
 3   country_txt      181691 non-null  object 
 4   region_txt       181691 non-null  object 
 5   provstate        181270 non-null  object 
 6   city             181257 non-null  object 
 7   attacktype1_txt  181691 non-null  object 
 8   targtype1_txt    181691 non-null  object 
 9   gname            181691 non-null  object 
 10  weaptype1_txt    181691 non-null  object 
 11  nkill            171378 non-null  float64
dtypes: float64(1), int64(3), object(8)
memory usage: 16.6+ MB


In [9]:
#Replacing null values in the 'nkill' column with the column median (NaN values are skipped when the median is computed)

terror_df['nkill'] = terror_df['nkill'].fillna(terror_df['nkill'].median())
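
A small sketch of why the median is a reasonable fill value here: casualty counts are heavily skewed (the summary statistics in this notebook show a maximum of 1570 against a 75th percentile of 2), so a handful of extreme events would drag the mean well above a typical attack. The values in `nkill` below are hypothetical and chosen only to show the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical values: mostly small counts, one extreme event, two missing entries.
nkill = pd.Series([0.0, 0.0, 1.0, 2.0, 1570.0, np.nan, np.nan])

median_kills = nkill.median()  # NaN entries are skipped by default
mean_kills = nkill.mean()      # pulled upward by the single extreme value

# Fill the missing entries with the median of the observed values.
filled = nkill.fillna(median_kills)
print(median_kills, mean_kills, filled.isna().sum())
```

In this toy series the median is 1.0 while the mean is above 300, which is why mean imputation would overstate the deaths for attacks with no recorded count.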

In [10]:
#Checking dataframe information again to ensure null values are filled

terror_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181691 entries, 0 to 181690
Data columns (total 12 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   iyear            181691 non-null  int64  
 1   imonth           181691 non-null  int64  
 2   iday             181691 non-null  int64  
 3   country_txt      181691 non-null  object 
 4   region_txt       181691 non-null  object 
 5   provstate        181270 non-null  object 
 6   city             181257 non-null  object 
 7   attacktype1_txt  181691 non-null  object 
 8   targtype1_txt    181691 non-null  object 
 9   gname            181691 non-null  object 
 10  weaptype1_txt    181691 non-null  object 
 11  nkill            181691 non-null  float64
dtypes: float64(1), int64(3), object(8)
memory usage: 16.6+ MB


In [11]:
terror_df.describe()

Unnamed: 0,iyear,imonth,iday,nkill
count,181691.0,181691.0,181691.0,181691.0
mean,2002.638997,6.467277,15.505644,2.26686
std,13.25943,3.388303,8.814045,11.227057
min,1970.0,0.0,0.0,0.0
25%,1991.0,4.0,8.0,0.0
50%,2009.0,6.0,15.0,0.0
75%,2014.0,9.0,23.0,2.0
max,2017.0,12.0,31.0,1570.0
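
The minimum of 0 for `imonth` and `iday` suggests that some records have no known month or day. The sketch below is one possible way to flag such rows and build a proper date column, under the assumption that 0 means "unknown" in this dataset. The three rows reuse year, month and day values from the `df.head()` output; the names `parts`, `unknown_date` and `dates` are hypothetical.

```python
import pandas as pd

# pd.to_datetime can assemble dates from columns named year, month and day.
parts = pd.DataFrame({
    "year": [1970, 1970, 1970],
    "month": [7, 0, 1],
    "day": [2, 0, 0],
})

# Rows where the month or the day is 0 (assumed to mean "unknown").
unknown_date = (parts["month"] == 0) | (parts["day"] == 0)

# errors="coerce" turns combinations that are not valid calendar dates into NaT.
dates = pd.to_datetime(parts, errors="coerce")
print(unknown_date.sum(), dates.isna().sum())
```

With the real data, the same call would first need the columns renamed, for example `terror_df.rename(columns={'iyear': 'year', 'imonth': 'month', 'iday': 'day'})`.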


In [12]:
#Saving the cleaned data as an Excel file for creating better visuals with Tableau

terror_df.to_excel("/content/drive/MyDrive/AlmaBetter/Capstone Project - EDA/terror.xlsx")
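
If a CSV export is preferred over Excel (Tableau reads both), a sketch along these lines could be used instead. The two-row `cleaned` frame stands in for `terror_df`, and `out_path` is a hypothetical temporary location chosen so the example runs anywhere; in the notebook it would point to the Drive folder.

```python
import os
import tempfile
import pandas as pd

# Hypothetical two-row frame standing in for terror_df.
cleaned = pd.DataFrame({
    "iyear": [1970, 1970],
    "country_txt": ["Mexico", "Greece"],
    "nkill": [0.0, 0.0],
})

# Hypothetical output location inside a temporary folder.
out_path = os.path.join(tempfile.mkdtemp(), "terror.csv")

# index=False leaves out the row index, so no 'Unnamed: 0' column appears on reload.
cleaned.to_csv(out_path, index=False)
reloaded = pd.read_csv(out_path)
print(list(reloaded.columns))
```

The same `index=False` argument is accepted by `to_excel`, should the Excel export be kept.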