# Global Terrorism Database (GTD) 
The Global Terrorism Database (GTD) records terrorist attacks worldwide from 1970 through 2018.

Some general findings derived from the GTD involve the nature and distribution of terrorist attacks. For example, about half of all terrorist attacks in the GTD are non-lethal, and although approximately one percent of attacks involve 25 or more fatalities, these highly lethal attacks killed more than 140,000 people in total between 1970 and 2018. The attacks in the GTD are attributed to more than 2,000 named perpetrator organizations and more than 700 additional generic groupings such as "Tamil separatists." However, two-thirds of these groups are active for less than a year and carry out fewer than four total attacks. Likewise, only 20 perpetrator groups are responsible for half of all attacks from 1970 to 2018 for which a perpetrator was identified. In general, patterns of terrorist attacks are very diverse across time and place and the GTD supports in-depth analysis of these patterns.
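Summary figures of this kind can be recomputed from the event table itself. The following is a minimal sketch on a made-up frame (the rows are illustrative, not GTD records) that borrows the GTD column names `nkill` and `gname`; it assumes that unidentified perpetrators are labelled `'Unknown'`.

```python
import pandas as pd

# Toy data for illustration only; these are not real GTD records
toy = pd.DataFrame({
    'eventid': [1, 2, 3, 4, 5, 6],
    'nkill': [0, 0, 3, 30, 0, 1],
    'gname': ['A', 'A', 'B', 'A', 'Unknown', 'C'],
})

# Share of attacks without fatalities
non_lethal_share = (toy['nkill'] == 0).mean()

# Attacks with 25 or more fatalities: their share and the deaths they account for
highly_lethal = toy[toy['nkill'] >= 25]
highly_lethal_share = len(highly_lethal) / len(toy)
highly_lethal_deaths = highly_lethal['nkill'].sum()

# Number of attacks per identified perpetrator group (assumes 'Unknown' marks no attribution)
attacks_per_group = toy.loc[toy['gname'] != 'Unknown', 'gname'].value_counts()

print(non_lethal_share, highly_lethal_share, highly_lethal_deaths)
print(attacks_per_group)
```

The same expressions should apply to the full dataframe once it is loaded, provided missing `nkill` values are handled first.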

## Importing data

In [155]:
# Import libraries
import pandas as pd

In [156]:
# Read in the Global Terrorism Database 
terrorism = pd.read_csv('globalterrorismdb_0919dist.csv', sep=";")

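When a large CSV has columns with mixed types, `pd.read_csv` can emit a `DtypeWarning` because it infers types chunk by chunk. A hedged sketch of one way to avoid that, using an in-memory sample with made-up rows instead of the real file; the choice to read `nwound` as text is an assumption for illustration.

```python
import io
import pandas as pd

# Small in-memory stand-in for the semicolon-separated export (hypothetical values)
sample = io.StringIO(
    "eventid;iyear;nwound\n"
    "197000000001;1970;0\n"
    "201207070021;2012;8,5\n"
)

# low_memory=False lets pandas infer each column's dtype from the whole file at once;
# an explicit dtype keeps the ambiguous column as text until it has been cleaned
sample_df = pd.read_csv(sample, sep=";", low_memory=False, dtype={'nwound': str})

print(sample_df.dtypes)
```

For the real file, the same keyword arguments could be passed to the `pd.read_csv` call above, at the cost of higher memory use while parsing.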


In [157]:
# Print an overview of the terrorism dataframe 
terrorism.head()

Unnamed: 0,eventid,iyear,imonth,iday,approxdate,extended,resolution,country,country_txt,region,...,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related
0,197000000001,1970,7,2,,0,,58,Dominican Republic,2,...,,,,,PGIS,0,0,0,0,
1,197000000002,1970,0,0,,0,,130,Mexico,1,...,,,,,PGIS,0,1,1,1,
2,197001000001,1970,1,0,,0,,160,Philippines,5,...,,,,,PGIS,-9,-9,1,1,
3,197001000002,1970,1,0,,0,,78,Greece,8,...,,,,,PGIS,-9,-9,1,1,
4,197001000003,1970,1,0,,0,,101,Japan,4,...,,,,,PGIS,-9,-9,1,1,


In [121]:
# Open question: how to handle 'zero' dates

# Count the events with a 'zero' month or day (meaning that the start cannot be pinpointed to one date), per country
terrorism[(terrorism['imonth'] == 0) | (terrorism['iday'] == 0)].groupby(['country_txt']).agg({'eventid':'count',
                                                                                               'nkill':'sum',
                                                                                               'nwound':'sum'})

# Creating a new dataframe for conversion of dates to timeseries data
terrorism_dates = pd.DataFrame({'year': terrorism['iyear'],
                                'month': terrorism['imonth'],
                                'day': terrorism['iday']})

# Creating 'Date' column (DateTime) for the terrorism dataframe
# (left disabled: rows with month or day 0 cannot be converted to a date)
#terrorism['Date'] = pd.to_datetime(terrorism_dates[["year","month", "day"]])

country_txt,eventid,nkill,nwound
Afghanistan,13,10.0,3.0
Algeria,31,220.0,41.0
Angola,18,40.0,93.0
Argentina,6,2.0,0.0
Australia,3,0.0,9.0
...,...,...,...
West Bank and Gaza Strip,6,4.0,0.0
West Germany (FRG),4,0.0,0.0
Yemen,2,1.0,2.0
Zaire,1,8.0,50.0
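One option for the 'zero' dates, instead of dropping the rows, is to let pandas mark them as missing. This is only a sketch on toy rows, not a decision taken in this notebook: `errors='coerce'` makes `pd.to_datetime` return `NaT` for dates it cannot assemble.

```python
import pandas as pd

# Toy rows using the GTD date columns; 0 marks an unknown month or day
toy = pd.DataFrame({
    'iyear': [1970, 1970, 1970],
    'imonth': [7, 0, 1],
    'iday': [2, 0, 0],
})

# Rename to the column names pd.to_datetime expects when assembling dates
parts = toy.rename(columns={'iyear': 'year', 'imonth': 'month', 'iday': 'day'})

# errors='coerce' turns impossible dates (month or day 0) into NaT instead of raising
toy['Date'] = pd.to_datetime(parts[['year', 'month', 'day']], errors='coerce')

# Flag rows whose date could not be determined so they can be filtered later
toy['date_known'] = toy['Date'].notna()
print(toy)
```

Keeping the rows preserves their casualty counts for aggregate statistics, while time-series steps can filter on the hypothetical `date_known` flag.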


## Exploratory Data Analysis

In [158]:
# Create an overview of database specific statistics
terrorism.describe()

Unnamed: 0,eventid,iyear,imonth,iday,extended,country,region,specificity,vicinity,crit1,...,nhostkidus,ndays,ransom,ransompaidus,hostkidoutcome,nreleased,INT_LOG,INT_IDEO,INT_MISC,INT_ANY
count,191464.0,191464.0,191464.0,191464.0,191464.0,191464.0,191464.0,191463.0,191464.0,191464.0,...,14571.0,8861.0,78428.0,611.0,12045.0,11454.0,191464.0,191464.0,191464.0,191464.0
mean,200348600000.0,2003.420136,6.46251,15.507688,0.047476,131.290446,7.205167,1.458172,0.069151,0.98841,...,-0.334912,-33.179664,-0.154384,217.00491,4.643005,-29.816309,-4.521727,-4.439247,0.088951,-3.930582
std,1334949000.0,13.349405,3.388515,8.807727,0.212656,112.058063,2.923811,0.991536,0.284292,0.10703,...,6.635037,125.390956,1.24084,2796.042504,2.034429,65.183303,4.543713,4.639931,0.556741,4.689726
min,197000000000.0,1970.0,0.0,0.0,0.0,4.0,1.0,1.0,-9.0,0.0,...,-99.0,-99.0,-9.0,-99.0,1.0,-99.0,-9.0,-9.0,-9.0,-9.0
25%,199108300000.0,1991.0,4.0,8.0,0.0,78.0,6.0,1.0,0.0,1.0,...,0.0,-99.0,0.0,0.0,2.0,-99.0,-9.0,-9.0,0.0,-9.0
50%,201003100000.0,2010.0,6.0,15.0,0.0,98.0,7.0,1.0,0.0,1.0,...,0.0,-99.0,0.0,0.0,4.0,0.0,-9.0,-9.0,0.0,0.0
75%,201501300000.0,2015.0,9.0,23.0,0.0,160.0,10.0,1.0,0.0,1.0,...,0.0,4.0,0.0,0.0,7.0,1.0,0.0,0.0,0.0,0.0
max,201812300000.0,2018.0,12.0,31.0,1.0,1004.0,12.0,5.0,1.0,1.0,...,86.0,2676.0,1.0,48000.0,7.0,2912.0,1.0,1.0,1.0,1.0
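The minimum values of -9 and -99 in the table above suggest placeholder codes for unknown values, and such codes distort the mean and the quartiles. A small sketch, assuming these two codes never represent real measurements, of how they could be replaced by `NaN` before summarising:

```python
import numpy as np
import pandas as pd

# Toy frame with GTD-style columns; negative codes stand in for "unknown"
toy = pd.DataFrame({
    'ndays': [-99, 4, 10, -99],
    'INT_LOG': [-9, 0, 1, -9],
})

# Assumption: -9 and -99 are placeholder codes, not real measurements
cleaned = toy.replace([-9, -99], np.nan)

# describe() skips NaN, so the statistics now ignore the placeholder codes
print(cleaned.describe())
```

On the toy data the mean of `ndays` moves from a negative number to the average of the two known values, which is the effect one would hope for on the real columns.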


In [159]:
# All the columns in the dataset
columns = list(terrorism.columns)
print("The list of columns in the entire dataset are the following 134: {}".format(columns))

The list of columns in the entire dataset are the following 134: ['eventid', 'iyear', 'imonth', 'iday', 'approxdate', 'extended', 'resolution', 'country', 'country_txt', 'region', 'region_txt', 'provstate', 'city', 'latitude', 'longitude', 'specificity', 'vicinity', 'location', 'summary', 'crit1', 'crit2', 'crit3', 'doubtterr', 'alternative', 'alternative_txt', 'multiple', 'success', 'suicide', 'attacktype1', 'attacktype1_txt', 'attacktype2', 'attacktype2_txt', 'attacktype3', 'attacktype3_txt', 'targtype1', 'targtype1_txt', 'targsubtype1', 'targsubtype1_txt', 'corp1', 'target1', 'natlty1', 'natlty1_txt', 'targtype2', 'targtype2_txt', 'targsubtype2', 'targsubtype2_txt', 'corp2', 'target2', 'natlty2', 'natlty2_txt', 'targtype3', 'targtype3_txt', 'targsubtype3', 'targsubtype3_txt', 'corp3', 'target3', 'natlty3', 'natlty3_txt', 'gname', 'gsubname', 'gname2', 'gsubname2', 'gname3', 'gsubname3', 'motive', 'guncertain1', 'guncertain2', 'guncertain3', 'individual', 'nperps', 'nperpcap', 'claim

In [160]:
terrorism.shape

(191464, 135)

In [161]:
# Two related attacks have a fractional wounded count recorded as 8,5; set them to 9 and 7 so the column can be converted to a number
# Assigning through a single .loc call (row mask, column) avoids chained indexing
terrorism.loc[terrorism['eventid'] == 201207070021, 'nwound'] = 9
terrorism.loc[terrorism['eventid'] == 201207070022, 'nwound'] = 7

# Changing column from object to float, because float accepts NaN
terrorism['nwound'] = terrorism['nwound'].astype(float)
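If more values with a decimal comma turn up, editing them one event at a time does not scale. A hedged alternative, shown on a toy series and not on the real column, converts the text in bulk and keeps the fractional values; whether fractions should be kept or rounded remains a judgement call.

```python
import pandas as pd

# Toy column mimicking nwound read as text, with a decimal comma in one entry
nwound_raw = pd.Series(['0', '8,5', None, '12'])

# Swap the decimal comma for a dot, then convert; unparseable entries become NaN
nwound = pd.to_numeric(nwound_raw.str.replace(',', '.', regex=False), errors='coerce')

print(nwound)
```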


### Global Analysis
Analysis of global terrorist events that count as 'major' attacks, defined here as attacks with more than 20 deaths or more than 25 wounded.

In [162]:
global_events = terrorism.loc[(terrorism['nkill'] > 20) | (terrorism['nwound'] > 25)]

# Creating a dataframe with the count of 'major' terrorist events and the associated sums of deaths and wounded
global_summary = global_events[['country_txt', 'eventid', 'nkill', 'nwound']].groupby('country_txt').agg({
                                                                                            'eventid':'count',
                                                                                            'nkill':'sum',
                                                                                            'nwound':'sum'
                                                                                        })
global_summary.sort_values('eventid', ascending=False).head(10)

country_txt,eventid,nkill,nwound
Iraq,1186,35432.0,54802.0
Afghanistan,532,14883.0,20174.0
Pakistan,398,8899.0,20897.0
Syria,321,10909.0,9142.0
Nigeria,306,12464.0,5935.0
India,281,4779.0,11894.0
Sri Lanka,252,8220.0,10199.0
Algeria,153,4330.0,4343.0
El Salvador,152,6154.0,1267.0
Nicaragua,147,6904.0,579.0
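The summary above keeps the source column names, so `eventid` ends up holding a count. As a sketch on toy rows, named aggregation (available in pandas 0.25 and later) can give the result columns descriptive names; the names `events`, `killed` and `wounded` are chosen here for illustration.

```python
import pandas as pd

# Toy rows for illustration only; these are not real GTD records
toy = pd.DataFrame({
    'country_txt': ['Iraq', 'Iraq', 'Nigeria', 'Peru'],
    'eventid': [1, 2, 3, 4],
    'nkill': [25.0, 3.0, 40.0, 1.0],
    'nwound': [10.0, 30.0, None, 2.0],
})

# Same 'major' rule as above: more than 20 killed or more than 25 wounded
major = toy.loc[(toy['nkill'] > 20) | (toy['nwound'] > 25)]

# Named aggregation: result_column=(source_column, function)
summary = major.groupby('country_txt').agg(
    events=('eventid', 'count'),
    killed=('nkill', 'sum'),
    wounded=('nwound', 'sum'),
).sort_values('events', ascending=False)

print(summary)
```

Note that a missing `nwound` compares as false, so an event with unknown wounded only qualifies through its death toll.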


In [170]:
# Some rows contain zero in the day column because the exact start date of the attack is unknown
# For simplicity we will drop these rows (total sample size allows dropping, only 16 cases)
DropDayZeroGlobal = list(global_events.loc[global_events['iday'] == 0].index)

# Drop cases and update dataframe
global_events = global_events.drop(DropDayZeroGlobal)

# Creating a new dataframe for conversion of dates to timeseries data
# (built after the drop, because a day of 0 cannot be converted to a date)
global_events_dates = pd.DataFrame({'year': global_events['iyear'],
                                    'month': global_events['imonth'],
                                    'day': global_events['iday']})

# Creating 'Date' column (DateTime) for the global_events dataframe
global_events['Date'] = pd.to_datetime(global_events_dates[["year","month", "day"]])

In [171]:
global_events

Unnamed: 0,eventid,iyear,imonth,iday,approxdate,extended,resolution,country,country_txt,region,...,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related,Date
80,197002210002,1970,2,21,,0,,199,Switzerland,8,...,,,,PGIS,1,1,0,1,,1970-02-21
210,197004210001,1970,4,21,,0,,160,Philippines,5,...,,,,PGIS,-9,-9,0,-9,,1970-04-21
1079,197111170003,1971,11,17,,0,18-11-1971,217,United States,1,...,"""Arsonists Are Hunted At Okla. U.,"" Washington...","""Probe Arson in OU Fires,"" The Fort Scott Trib...","""OU Damages $200,000, 27 Hurt; Vandals Sought,...",Hewitt Project,-9,-9,0,-9,,1971-11-17
1085,197111200002,1971,11,20,,0,,201,Taiwan,4,...,,,,PGIS,-9,-9,1,1,,1971-11-20
1152,197201260002,1972,1,26,,0,,236,Czechoslovakia,9,...,,,,PGIS,1,1,1,1,,1972-01-26
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
191140,201812160030,2018,12,16,,1,,229,Democratic Republic of the Congo,11,...,"""Revealed: DR Congo’s ‘invisible’ massacre,"" D...",,,START Primary Collection,-9,-9,0,-9,"201812160027, 201812160028, 201812160029, 2018...",2018-12-16
191147,201812170007,2018,12,17,,0,,228,Yemen,10,...,"""Yemen army claims gains in Al-Hudaydah despit...","""Yemen army claims gains in Al-Hudaydah despit...","""Yemen: Roundup of Political, Security, CT Dev...",START Primary Collection,0,0,0,0,"201812170007, 201812170008",2018-12-17
191148,201812170008,2018,12,17,,0,,228,Yemen,10,...,"""Yemen army claims gains in Al-Hudaydah despit...","""Yemen army claims gains in Al-Hudaydah despit...","""Yemen: Roundup of Political, Security, CT Dev...",START Primary Collection,0,0,0,0,"201812170007, 201812170008",2018-12-17
191294,201812250001,2018,12,24,,0,,4,Afghanistan,6,...,"""Gunmen storm government building in Kabul, ta...","""Legislative want comprehensive investigation ...","""Attack on Afghan government compound kills 43...",START Primary Collection,0,0,0,0,,2018-12-24


### US Casualties Analysis

#### US Major Terrorist Attacks

In [122]:
# Selection of variables to show
variables = ['eventid','iyear','imonth','iday','country_txt', 'nkill', 'nkillus', 'nwound', 'nwoundus', 'addnotes', 'scite1','scite2','scite3','related']

# Extract from the global events analysis all terrorist attacks on US soil with over 20 deaths or over 25 wounded
# .copy() makes the selection an independent dataframe, so a 'Date' column can be added to it later
global_events_us = global_events.loc[global_events['country_txt'] == 'United States', variables].copy()

print(global_events_us.shape)
global_events_us.head()

(21, 14)


Unnamed: 0,eventid,iyear,imonth,iday,country_txt,nkill,nkillus,nwound,nwoundus,addnotes,scite1,scite2,scite3,related
1079,197111170003,1971,11,17,United States,0.0,0.0,27.0,27.0,"At the time, African American students at Okla...","""Arsonists Are Hunted At Okla. U.,"" Washington...","""Probe Arson in OU Fires,"" The Fort Scott Trib...","""OU Damages $200,000, 27 Hurt; Vandals Sought,...",
2505,197408060004,1974,8,6,United States,3.0,3.0,36.0,36.0,The explosion occurred at 8:10am. This may be...,"""T Is for Terror: A mad bomber who stalked Los...","""Suspect Arrested in Coast Bombings,"" New York...","""Bomb Explosion Kills 2 at Airport in Los Ange...",
2767,197501240001,1975,1,24,United States,4.0,,53.0,,,,,,
3473,197512290003,1975,12,29,United States,11.0,,74.0,,,,,,
23313,198409200009,1984,9,20,United States,0.0,0.0,751.0,751.0,,,,,


#### US Terrorist Attack analysis based on US Casualties both in the US and abroad

In [107]:
# Extract from terrorism dataframe all terrorist attacks with one or more US casualties
US_casulties = terrorism.loc[(terrorism['nkillus'] > 0) | (terrorism['nwoundus'] > 0)]

# Present overview with the number of events per country with one or more US casualties
US_casulties_summary = US_casulties.groupby('country_txt').agg({
                                                                'eventid':'count',
                                                                'nkill' : 'sum',
                                                                'nkillus': 'sum',
                                                                'nwound': 'sum',
                                                                'nwoundus':'sum'
                                                                })

US_casulties_summary.sort_values(by='eventid', ascending=False).head(20)

country_txt,eventid,nkill,nkillus,nwound,nwoundus
United States,407,3674.0,3551.0,24902.0,2752.0
Afghanistan,350,1199.0,504.0,1709.0,488.0
Iraq,150,918.0,263.0,2166.0,455.0
Philippines,31,94.0,34.0,250.0,16.0
West Bank and Gaza Strip,29,62.0,26.0,115.0,16.0
Pakistan,25,97.0,37.0,307.0,61.0
Guatemala,19,11.0,11.0,15.0,11.0
Colombia,18,19.0,19.0,9.0,6.0
Israel,17,126.0,22.0,638.0,23.0
El Salvador,16,112.0,24.0,169.0,7.0


#### Creating 'Date' column with timeseries data

In [168]:
# Some rows contain zero in the day column because the exact start date of the attack is unknown
# For simplicity we will drop these rows (specific rows do not have high casualties and total sample size allows dropping)
DropDayZeroUS = list(US_casulties.loc[US_casulties['iday'] == 0].index)
US_casulties = US_casulties.drop(DropDayZeroUS)

# Creating a new dataframe for conversion of dates to timeseries data
US_casulties_dates = pd.DataFrame({'year': US_casulties['iyear'],
                                   'month': US_casulties['imonth'],
                                   'day': US_casulties['iday']})

global_events_us_dates = pd.DataFrame({'year': global_events_us['iyear'],
                                       'month': global_events_us['imonth'],
                                       'day': global_events_us['iday']})

# Creating 'Date' column (DateTime) for both US dataframes
US_casulties['Date'] = pd.to_datetime(US_casulties_dates[["year","month", "day"]])
global_events_us['Date'] = pd.to_datetime(global_events_us_dates[["year","month", "day"]])
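Adding a column to a dataframe that was itself selected from another dataframe can trigger pandas' `SettingWithCopyWarning`. A minimal sketch on toy rows, not the notebook's own code, of taking an explicit `.copy()` before adding the date column:

```python
import pandas as pd

# Toy rows for illustration only; these are not real GTD records
toy = pd.DataFrame({
    'iyear': [1970, 1971, 1972],
    'imonth': [1, 2, 3],
    'iday': [14, 0, 5],
    'nkillus': [1.0, 0.0, 2.0],
})

# .copy() makes the subset an independent dataframe, so adding a column is unambiguous
subset = toy.loc[(toy['nkillus'] > 0) & (toy['iday'] != 0)].copy()

# pd.to_datetime assembles dates from columns named year, month and day
parts = subset[['iyear', 'imonth', 'iday']].rename(
    columns={'iyear': 'year', 'imonth': 'month', 'iday': 'day'})
subset['Date'] = pd.to_datetime(parts)

print(subset)
```

The original frame stays untouched, which also makes it safe to re-run the cell.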

In [169]:
print(US_casulties.info())
US_casulties.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1371 entries, 20 to 190714
Columns: 136 entries, eventid to Date
dtypes: datetime64[ns](1), float64(45), int64(24), object(66)
memory usage: 1.4+ MB
None


Unnamed: 0,eventid,iyear,imonth,iday,approxdate,extended,resolution,country,country_txt,region,...,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related,Date
20,197001140001,1970,1,14,,0,,217,United States,1,...,Committee on Government Operations United Stat...,"Christopher Hewitt, ""Political Violence and Te...","Peter F. Nardulli and Jeffrey M. Stonecash, ""P...",Hewitt Project,-9,-9,0,-9,,1970-01-14
30,197001250002,1970,1,25,,0,,217,United States,1,...,Committee on Government Operations United Stat...,"Martin Arnold, ""Harlem Area Sealed Off As Poli...","""2 Policemen Wounded by Sniper Fire,"" New York...",Hewitt Project,-9,-9,0,-9,,1970-01-25
63,197002130003,1970,2,13,,0,,217,United States,1,...,Committee on Government Operations United Stat...,"""Bombs Injure 2 Policemen in Berkeley,"" Washin...","""San Francisco Bomb Injured 7 Policemen,"" Wash...",Hewitt Project,-9,-9,0,-9,,1970-02-13
65,197002150002,1970,2,15,,0,,217,United States,1,...,"Mike Stahlberg, ""Clues Sought for Cause of UO ...","""Students Impede Firemen; R.O.T.C. Offices in ...","Emily Mosen, ""The Reserve Officers' Training C...",Hewitt Project,-9,-9,0,-9,,1970-02-15
111,197003050003,1970,3,5,,0,,217,United States,1,...,Committee on the Judiciary United States Sena...,"William Sater, ""Puerto Rican Terrorists: A Pos...","""Toward People's War for Independence and Soci...",Hewitt Project,0,1,0,1,,1970-03-05


In [127]:
print(global_events_us.info())
global_events_us.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21 entries, 1079 to 179491
Data columns (total 15 columns):
eventid        21 non-null int64
iyear          21 non-null int64
imonth         21 non-null int64
iday           21 non-null int64
country_txt    21 non-null object
nkill          21 non-null float64
nkillus        18 non-null float64
nwound         21 non-null float64
nwoundus       12 non-null float64
addnotes       13 non-null object
scite1         15 non-null object
scite2         15 non-null object
scite3         14 non-null object
related        7 non-null object
Date           21 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(4), int64(4), object(6)
memory usage: 2.6+ KB
None


Unnamed: 0,eventid,iyear,imonth,iday,country_txt,nkill,nkillus,nwound,nwoundus,addnotes,scite1,scite2,scite3,related,Date
1079,197111170003,1971,11,17,United States,0.0,0.0,27.0,27.0,"At the time, African American students at Okla...","""Arsonists Are Hunted At Okla. U.,"" Washington...","""Probe Arson in OU Fires,"" The Fort Scott Trib...","""OU Damages $200,000, 27 Hurt; Vandals Sought,...",,1971-11-17
2505,197408060004,1974,8,6,United States,3.0,3.0,36.0,36.0,The explosion occurred at 8:10am. This may be...,"""T Is for Terror: A mad bomber who stalked Los...","""Suspect Arrested in Coast Bombings,"" New York...","""Bomb Explosion Kills 2 at Airport in Los Ange...",,1974-08-06
2767,197501240001,1975,1,24,United States,4.0,,53.0,,,,,,,1975-01-24
3473,197512290003,1975,12,29,United States,11.0,,74.0,,,,,,,1975-12-29
23313,198409200009,1984,9,20,United States,0.0,0.0,751.0,751.0,,,,,,1984-09-20


## Saving data for further analysis

In [172]:
# Transform Date column (Date Time Series) back to string to ensure proper format in .csv
US_casulties['Date'] = US_casulties['Date'].astype(str)
global_events_us['Date'] = global_events_us['Date'].astype(str)
global_events['Date'] = global_events['Date'].astype(str)

# Save file to .csv to be used in further analysis
US_casulties.to_csv('us_casulties_clean.csv')
global_events_us.to_csv('global_events_us.csv')
global_events.to_csv('global_events.csv')
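When the saved files are loaded again, the `Date` column comes back as plain text unless pandas is told to parse it. A short sketch of the round trip, using an in-memory buffer and toy rows in place of the files written above:

```python
import io
import pandas as pd

# Toy frame with a datetime column and a non-default index, as in the saved dataframes
toy = pd.DataFrame({'eventid': [197001140001, 197001250002],
                    'Date': pd.to_datetime(['1970-01-14', '1970-01-25'])},
                   index=[20, 30])

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the saved index; parse_dates turns 'Date' back into datetimes
restored = pd.read_csv(buffer, index_col=0, parse_dates=['Date'])

print(restored.dtypes)
```

Without `index_col=0`, the saved index would reappear as an extra unnamed column.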