***I am currently working with the Global Terrorism Database (GTD), maintained by the University of Maryland, to gain insight into the current state of terrorism in the US and, possibly, to predict future unrewarded risks. I expect this analysis to be valuable to the US Department of Homeland Security, other government establishments, businesses, and individuals who care about the safety of lives and property.***

***Dataset can be found at: http://www.start.umd.edu/gtd/***

***Overseeing Mentor: Dr. Stylianos Kampakis***

In [1]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid", color_codes=True)
np.random.seed(sum(map(ord, "categorical")))
matplotlib.style.use('ggplot')


In [2]:
%matplotlib inline

## Exploratory Data Analysis (EDA)

The dataset was downloaded as an Excel spreadsheet. It is loaded into pandas first, and an Exploratory Data Analysis (EDA) follows.

In [3]:
# Load the GTD spreadsheet from a local path (adjust the path for your machine).
file = r'C:\Users\dejavu\Desktop\git_jupyter\springboard_mini_project/capstone_projects/globalterrorismdb_0617dist.xlsx'
df = pd.read_excel(file)

In [4]:
# Restrict the dataset to incidents that occurred in the US.
# df1 is a boolean mask: True where country_txt contains 'United States'.
df1 = df['country_txt'].str.contains('United States')
df2= df[df1]
df2.head(3)

Unnamed: 0,eventid,iyear,imonth,iday,approxdate,extended,resolution,country,country_txt,region,...,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related
5,197001010002,1970,1,1,,0,NaT,217,United States,1,...,"The Cairo Chief of Police, William Petersen, r...","""Police Chief Quits,"" Washington Post, January...","""Cairo Police Chief Quits; Decries Local 'Mili...","Christopher Hewitt, ""Political Violence and Te...",Hewitt Project,-9,-9,0,-9,
7,197001020002,1970,1,2,,0,NaT,217,United States,1,...,"Damages were estimated to be between $20,000-$...",Committee on Government Operations United Stat...,"Christopher Hewitt, ""Political Violence and Te...",,Hewitt Project,-9,-9,0,-9,
8,197001020003,1970,1,2,,0,NaT,217,United States,1,...,The New Years Gang issue a communiqué to a loc...,"Tom Bates, ""Rads: The 1970 Bombing of the Army...","David Newman, Sandra Sutherland, and Jon Stewa...","The Wisconsin Cartographers' Guild, ""Wisconsin...",Hewitt Project,0,0,0,0,


In [5]:
# Shape (rows, columns) of the US-only DataFrame
df2.shape

(2758, 135)

In [6]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2758 entries, 5 to 169902
Columns: 135 entries, eventid to related
dtypes: datetime64[ns](1), float64(53), int64(24), object(57)
memory usage: 2.9+ MB


In [7]:
# Number of missing values for each feature in the DataFrame
df2.isnull().sum()

eventid                  0
iyear                    0
imonth                   0
iday                     0
approxdate            2721
extended                 0
resolution            2746
country                  0
country_txt              0
region                   0
region_txt               0
provstate                0
city                     0
latitude                 1
longitude                1
specificity              0
vicinity                 0
location              1852
summary               1054
crit1                    0
crit2                    0
crit3                    0
doubtterr                0
alternative           2364
alternative_txt       2364
multiple                 0
success                  0
suicide                  0
attacktype1              0
attacktype1_txt          0
                      ... 
propextent            1191
propextent_txt        1191
propvalue             1840
propcomment           1748
ishostkid              176
nhostkid              2696
n
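Raw missing-value counts are easier to compare when shown next to percentages and sorted. The following is a minimal sketch of that idea, not part of the original notebook: `toy` is a made-up frame standing in for `df2`, and `missing_summary` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df2; the values are invented.
toy = pd.DataFrame({
    "iyear": [1970, 1970, 1971, 1972],
    "approxdate": [np.nan, np.nan, np.nan, "1972-05"],
    "latitude": [37.0, np.nan, 43.1, 39.7],
})

def missing_summary(frame):
    """Return per-column missing counts and percentages, worst first."""
    counts = frame.isnull().sum()
    # The mean of a boolean mask is the fraction of True (missing) values.
    percent = frame.isnull().mean() * 100
    summary = pd.DataFrame({"n_missing": counts, "pct_missing": percent})
    return summary.sort_values("pct_missing", ascending=False)

summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(df2)` on the real data would put columns such as `approxdate` and `resolution` at the top, which makes it easier to choose a sensible drop threshold.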

In [8]:
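class EDA():
    '''Helper methods for Exploratory Data Analysis.'''

    def drop_col_nan(self, df2, threshold):
        '''Drop columns whose percentage of missing values exceeds threshold.'''
        for col in df2.columns:
            # Percentage of missing values in this column
            amt = df2[col].isnull().sum() / float(len(df2)) * 100
            if amt > threshold:
                # Keyword form; the positional axis argument is deprecated in pandas.
                df2 = df2.drop(columns=col)
        return df2

    def drop_noisy_col(self, df3, x=None):
        '''Drop the columns listed in x (no-op when x is None or empty).'''
        if not x:
            return df3
        return df3.drop(columns=x)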
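The threshold logic in `drop_col_nan` can also be written without an explicit loop. This is a sketch of an equivalent vectorized form, not the notebook's own method: `drop_sparse_columns` and the `toy` frame are hypothetical names and data used only for illustration.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(frame, threshold):
    """Keep columns whose percentage of missing values is <= threshold."""
    pct_missing = frame.isnull().mean() * 100
    # Boolean column selection: True means the column is kept.
    return frame.loc[:, pct_missing <= threshold]

# Hypothetical toy frame; the values are invented.
toy = pd.DataFrame({
    "iyear": [1970, 1970, 1971, 1972],
    "approxdate": [np.nan, np.nan, np.nan, "1972-05"],  # 75% missing
    "latitude": [37.0, np.nan, 43.1, 39.7],              # 25% missing
})

reduced = drop_sparse_columns(toy, 70)
print(list(reduced.columns))
```

With a threshold of 70, `approxdate` (75% missing) is dropped and the other two columns are kept, matching the `amt > threshold` rule in the class above.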

In [9]:
my_EDA = EDA()
df3=my_EDA.drop_col_nan(df2, 70)

In [10]:
df3.shape

(2758, 68)

In [11]:
#df3.isnull().sum()

In [12]:
df4=my_EDA.drop_noisy_col(df3, ['country','country_txt','addnotes', 'summary', 'scite1' , 'scite2' , 'scite3' , 'dbsource', 'INT_LOG' , 'INT_MISC', 'INT_ANY', 'INT_IDEO', 'longitude','specificity', 'eventid'])

In [13]:
df4.shape

(2758, 53)

In [14]:
df5=my_EDA.drop_noisy_col(df4, ['location','region', 'region_txt', 'propcomment'])
df5.shape

(2758, 49)

In [15]:
df5.isnull().sum()

iyear                  0
imonth                 0
iday                   0
extended               0
provstate              0
city                   0
latitude               1
vicinity               0
crit1                  0
crit2                  0
crit3                  0
doubtterr              0
multiple               0
success                0
suicide                0
attacktype1            0
attacktype1_txt        0
targtype1              0
targtype1_txt          0
targsubtype1         120
targsubtype1_txt     120
corp1                939
target1               45
natlty1                9
natlty1_txt            9
gname                  0
motive              1328
guncertain1            0
individual             0
nperps               982
nperpcap            1055
claimed             1051
weaptype1              0
weaptype1_txt          0
weapsubtype1         259
weapsubtype1_txt     259
weapdetail           452
nkill                 73
nkillus              953
nkillter            1004


In [16]:
pd.set_option('display.max_columns', None)

In [17]:
# Renumber the rows 0..n-1 after filtering.
df5 = df5.reset_index(drop=True)

In [18]:
df5.head(1000)

Unnamed: 0,iyear,imonth,iday,extended,provstate,city,latitude,vicinity,crit1,crit2,crit3,doubtterr,multiple,success,suicide,attacktype1,attacktype1_txt,targtype1,targtype1_txt,targsubtype1,targsubtype1_txt,corp1,target1,natlty1,natlty1_txt,gname,motive,guncertain1,individual,nperps,nperpcap,claimed,weaptype1,weaptype1_txt,weapsubtype1,weapsubtype1_txt,weapdetail,nkill,nkillus,nkillter,nwound,nwoundus,nwoundte,property,propextent,propextent_txt,propvalue,ishostkid,ransom
0,1970,1,1,0,Illinois,Cairo,37.005105,0,1,1,1,0,0,1,0,2,Armed Assault,3,Police,22.0,"Police Building (headquarters, station, school)",Cairo Police Department,Cairo Police Headquarters,217.0,United States,Black Nationalists,To protest the Cairo Illinois Police Deparment,0.0,0,-99.0,-99.0,0.0,5,Firearms,5.0,Unknown Gun Type,Several gunshots were fired.,0.0,0.0,0.0,0.0,0.0,0.0,1,3.0,Minor (likely < $1 million),,0.0,0.0
1,1970,1,2,0,California,Oakland,37.805065,0,1,1,1,1,0,1,0,3,Bombing/Explosion,21,Utilities,107.0,Electricity,Pacific Gas & Electric Company,Edes Substation,217.0,United States,Unknown,,0.0,0,-99.0,-99.0,0.0,6,Explosives/Bombs/Dynamite,16.0,Unknown Explosive Type,,0.0,0.0,0.0,0.0,0.0,0.0,1,3.0,Minor (likely < $1 million),22500.0,0.0,0.0
2,1970,1,2,0,Wisconsin,Madison,43.076592,0,1,1,1,0,0,1,0,7,Facility/Infrastructure Attack,4,Military,28.0,Military Recruiting Station/Academy,R.O.T.C.,"R.O.T.C. offices at University of Wisconsin, M...",217.0,United States,New Year's Gang,To protest the War in Vietnam and the draft,0.0,0,1.0,1.0,1.0,8,Incendiary,19.0,Molotov Cocktail/Petrol Bomb,Firebomb consisting of gasoline,0.0,0.0,0.0,0.0,0.0,0.0,1,3.0,Minor (likely < $1 million),60000.0,0.0,0.0
3,1970,1,3,0,Wisconsin,Madison,43.072950,0,1,1,1,0,0,1,0,7,Facility/Infrastructure Attack,2,Government (General),21.0,Government Building/Facility/Office,Selective Service,Selective Service Headquarters in Madison Wisc...,217.0,United States,New Year's Gang,To protest the War in Vietnam and the draft,0.0,0,1.0,1.0,0.0,8,Incendiary,20.0,Gasoline or Alcohol,Poured gasoline on the floor and lit it with a...,0.0,0.0,0.0,0.0,0.0,0.0,1,3.0,Minor (likely < $1 million),,0.0,0.0
4,1970,1,1,0,Wisconsin,Baraboo,43.468500,0,1,1,0,1,0,0,0,3,Bombing/Explosion,4,Military,27.0,Military Barracks/Base/Headquarters/Checkpost,,Badger Army ammo depot.,217.0,United States,"Weather Underground, Weathermen",,0.0,0,,,,6,Explosives/Bombs/Dynamite,16.0,Unknown Explosive Type,Explosive,0.0,,,0.0,,,0,3.0,Minor (likely < $1 million),0.0,0.0,0.0
5,1970,1,6,0,Colorado,Denver,39.740010,0,1,1,1,1,0,1,0,7,Facility/Infrastructure Attack,4,Military,28.0,Military Recruiting Station/Academy,Army Recruiting Station,"Army Recruiting Station, Denver Colorado",217.0,United States,Left-Wing Militants,Protest the draft and Vietnam War,0.0,0,-99.0,-99.0,0.0,8,Incendiary,19.0,Molotov Cocktail/Petrol Bomb,Molotov cocktail,0.0,0.0,0.0,0.0,0.0,0.0,1,3.0,Minor (likely < $1 million),305.0,0.0,0.0
6,1970,1,9,0,Michigan,Detroit,42.331685,0,1,1,1,0,0,1,0,7,Facility/Infrastructure Attack,2,Government (General),21.0,Government Building/Facility/Office,U.S. Government housing,Packard Properties building of Detroit Michigan,217.0,United States,Left-Wing Militants,,0.0,0,-99.0,-99.0,0.0,8,Incendiary,19.0,Molotov Cocktail/Petrol Bomb,Firebomb,0.0,0.0,0.0,0.0,0.0,0.0,1,3.0,Minor (likely < $1 million),,0.0,0.0
7,1970,1,9,0,Puerto Rico,Rio Piedras,18.399712,0,1,1,1,1,0,1,0,7,Facility/Infrastructure Attack,1,Business,7.0,Retail/Grocery/Bakery,American owned business in Puerto Rico,Baker's Store,217.0,United States,Armed Commandos of Liberation,To protest United States owned businesses in P...,1.0,0,-99.0,-99.0,1.0,8,Incendiary,18.0,Arson/Fire,Fire set in back of store,0.0,0.0,0.0,0.0,0.0,0.0,1,2.0,Major (likely > $1 million but < $1 billion),2000000.0,0.0,0.0
8,1970,1,12,0,New York,New York City,40.610069,0,1,1,1,0,0,1,0,3,Bombing/Explosion,8,Educational Institution,49.0,School/University/Educational Building,High School,James Madison High School,217.0,United States,Black Nationalists,Suspected motives were to protest the Vietnam ...,0.0,0,-99.0,-99.0,0.0,6,Explosives/Bombs/Dynamite,11.0,"Projectile (rockets, mortars, RPGs, etc.)",Crudely made pipe bomb. Five inches long and ...,0.0,0.0,0.0,0.0,0.0,0.0,1,3.0,Minor (likely < $1 million),,0.0,0.0
9,1970,1,12,0,Puerto Rico,Rio Grande,18.379998,0,1,1,1,0,0,1,0,3,Bombing/Explosion,1,Business,4.0,Multinational Corporation,General Electric,General Electric factory in Rio Grande Puerto ...,217.0,United States,Strikers,,0.0,0,-99.0,-99.0,0.0,6,Explosives/Bombs/Dynamite,16.0,Unknown Explosive Type,Bomb,0.0,0.0,0.0,0.0,0.0,0.0,-9,4.0,Unknown,,0.0,0.0
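Several numeric columns in the rows above (for example `nperps`, `nperpcap`, and `property`) contain `-99` or `-9`. These look like "unknown" codes rather than real measurements, in which case `isnull().sum()` undercounts the missing data. The sketch below assumes that reading, which should be checked against the GTD codebook before use. `UNKNOWN_CODES`, `mask_unknown`, and the `toy` frame are hypothetical names and data.

```python
import numpy as np
import pandas as pd

# Assumption: -9 and -99 mean "unknown" in these columns (verify in the GTD codebook).
UNKNOWN_CODES = (-9, -99)

# Hypothetical toy frame mimicking a few df5 columns; the values are invented.
toy = pd.DataFrame({
    "nperps": [-99.0, 1.0, np.nan, -99.0],
    "property": [1, 1, 0, -9],
    "nkill": [0.0, 0.0, 0.0, 0.0],
})

def mask_unknown(frame, columns, codes=UNKNOWN_CODES):
    """Return a copy of frame with the sentinel codes in `columns` set to NaN."""
    cleaned = frame.copy()
    cleaned[columns] = cleaned[columns].replace(list(codes), np.nan)
    return cleaned

cleaned = mask_unknown(toy, ["nperps", "property"])
print(cleaned.isnull().sum())
```

Restricting the replacement to named columns matters here: a blanket replace over the whole frame could wipe out legitimate negative values in other fields.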
