## Prepare Data

1. **[Hypotheses](#hyp)**
2. **[Metrics](#met)**
3. **[Importing libraries](#Importinglibraries)**
4. **[Pre-Processing](#Preproc)**
    1. [Intersection of the countries](#Intersectionofthecountries)
    2. [Renamed countries](#renco)
    3. [Data Visualization](#DataVisualization)
5. **[Global Terrorism Database](#GlobalTerrorismDatabase)**
    1. [Ransom](#Ransom)
    2. [Nkill & Nwound](#NKILL&NWOUND)
    3. [Property](#prop)
    4. [Individual](#individual)
6. **[Indicators](#Indicators)**
    1. [GDP](#GDP)
    2. [Urban Agglomeration](#UrbanAgglomeration)
7. **[Join](#Join)**
    1. [Final Dataset](#FinalDataset)

<a id="Importinglibraries"></a>
### Importing libraries

In [1]:
import pandas as pd 
import matplotlib.pyplot as plt
import numpy as np
import gzip
from functools import reduce
import seaborn as sns
import os
import conda

from mpl_toolkits.basemap import Basemap

<a id="Preproc"></a>
### Pre-Processing

Let's load the two datasets:

- globalterrorismdb_0718dist.csv
- Indicators.csv

In [2]:
gtds = pd.read_csv("../PROJECT/Datasets/globalterrorismdb_0718dist.csv", encoding="ISO-8859-1")
indicators = pd.read_csv("../PROJECT/Datasets/Indicators.csv") 



<a id="Intersectionofthecountries"></a>
**Intersection of the countries**       

We check whether the two datasets use different country names or are missing countries.

The first output lists the countries that appear in both datasets; the second lists the GTD countries that have no match in the Indicators dataset.

In [3]:
states = reduce(np.intersect1d, ((indicators.CountryName.values), (gtds.country_txt.values)))
print(states.shape[0])
states

161


array(['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia',
       'Austria', 'Azerbaijan', 'Bahrain', 'Bangladesh', 'Barbados',
       'Belarus', 'Belgium', 'Belize', 'Benin', 'Bhutan', 'Bolivia',
       'Botswana', 'Brazil', 'Bulgaria', 'Burkina Faso', 'Burundi',
       'Cambodia', 'Cameroon', 'Canada', 'Central African Republic',
       'Chad', 'Chile', 'China', 'Colombia', 'Comoros', 'Costa Rica',
       'Croatia', 'Cuba', 'Cyprus', 'Czech Republic', 'Denmark',
       'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia',
       'Ethiopia', 'Fiji', 'Finland', 'France', 'French Polynesia',
       'Gabon', 'Georgia', 'Germany', 'Ghana', 'Greece', 'Grenada',
       'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti',
       'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iraq',
       'Ireland', 'Israel', 'Italy', 'Jamaica', 'Japan',

The two datasets have 161 country names in common.

In [4]:
def excluded(lst1, lst2): 
    lst3 = [value for value in lst1 if value not in lst2] 
    return lst3 


lst1 = gtds.country_txt.unique()
lst2 = states
excl = excluded(lst1, lst2)
excl

['East Germany (GDR)',
 'Venezuela',
 'West Germany (FRG)',
 'Egypt',
 'Iran',
 'South Yemen',
 'Taiwan',
 'West Bank and Gaza Strip',
 'Czechoslovakia',
 'South Vietnam',
 'Brunei',
 'Zaire',
 "People's Republic of the Congo",
 'Yugoslavia',
 'North Yemen',
 'Syria',
 'South Korea',
 'Bahamas',
 'Rhodesia',
 'Soviet Union',
 'Western Sahara',
 'Hong Kong',
 'New Hebrides',
 'Guadeloupe',
 'Martinique',
 'Vatican City',
 'French Guiana',
 'Falkland Islands',
 'Laos',
 'Republic of the Congo',
 'Yemen',
 'Russia',
 'Ivory Coast',
 'Bosnia-Herzegovina',
 'Macedonia',
 'Wallis and Futuna',
 'Gambia',
 'North Korea',
 'Macau',
 'Kyrgyzstan',
 'Democratic Republic of the Congo',
 'East Timor',
 'International',
 'Serbia-Montenegro']

44 GTD country names have no match in the Indicators dataset.

In [5]:
len(excl)

44
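
The `excluded` helper above is a set difference. As a minimal sketch with made-up toy arrays (not the real data), the same result can be obtained with `np.setdiff1d`. Note that it returns sorted unique values, whereas `excluded` keeps the order of first appearance.

```python
import numpy as np

# Toy stand-ins for gtds.country_txt.unique() and indicators.CountryName.unique()
gtd_names = np.array(["Italy", "Egypt", "Iran", "France"])
ind_names = np.array(["Italy", "France", "Egypt, Arab Rep.", "Iran, Islamic Rep."])

# Names present in the first array but absent from the second (sorted, unique)
unmatched = np.setdiff1d(gtd_names, ind_names)
print(unmatched)
```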

<a id="renco"></a>
**Renamed Countries**

The list above contains the GTD country names that do not match any name in the Indicators dataset. We now rename the countries that are spelled differently in the two datasets so that they match, and discard the rest.

These are the renamed ones:


In [6]:
gtds = gtds.replace(to_replace ='East Germany (GDR)', value ='Germany') 
gtds = gtds.replace(to_replace ='West Germany (FRG)', value ='Germany') 
indicators = indicators.replace(to_replace ='Venezuela, RB', value ='Venezuela') 
indicators = indicators.replace(to_replace ='Egypt, Arab Rep.', value ='Egypt')
indicators = indicators.replace(to_replace ='Iran, Islamic Rep.', value ='Iran')
indicators = indicators.replace(to_replace ='Yemen, Rep.', value ='Yemen')
gtds = gtds.replace(to_replace ='South Yemen', value ='Yemen') 
gtds = gtds.replace(to_replace ='North Yemen', value ='Yemen') 
gtds = gtds.replace(to_replace ='West Bank and Gaza Strip', value ='West Bank and Gaza')
gtds = gtds.replace(to_replace ='South Vietnam', value ='Vietnam')
indicators = indicators.replace(to_replace ='Brunei Darussalam', value ='Brunei')
indicators = indicators.replace(to_replace ='Syrian Arab Republic', value ='Syria')
indicators = indicators.replace(to_replace ='Korea, Rep.', value ='South Korea')
indicators = indicators.replace(to_replace ='Korea, Dem. Rep.', value ='North Korea')
indicators = indicators.replace(to_replace ='Bahamas, The', value ='Bahamas')
#indicators = indicators.replace(to_replace ='Sub-Saharan Africa (developing only)', value ='Sahara')
indicators = indicators.replace(to_replace ='Sub-Saharan Africa (all income levels)', value ='Sahara')
gtds = gtds.replace(to_replace ='Western Sahara', value ='Sahara') 
indicators = indicators.replace(to_replace ='Hong Kong SAR, China', value ='Hong Kong')
indicators = indicators.replace(to_replace ='Lao PDR', value ='Laos')
indicators = indicators.replace(to_replace ='Congo, Rep.', value ='Republic of the Congo')
indicators = indicators.replace(to_replace ='Congo, Dem. Rep.', value ='Democratic Republic of the Congo')
indicators = indicators.replace(to_replace ='Russian Federation', value ='Russia')
indicators = indicators.replace(to_replace ='Bosnia and Herzegovina', value ='Bosnia-Herzegovina')
indicators = indicators.replace(to_replace ='Macedonia, FYR', value ='Macedonia')
indicators = indicators.replace(to_replace ='Gambia, The', value ='Gambia')
indicators = indicators.replace(to_replace ='Timor-Leste', value ='East Timor')
indicators = indicators.replace(to_replace ='Macao SAR, China', value ='Macau')
indicators = indicators.replace(to_replace ='Kyrgyz Republic', value ='Kyrgyzstan')
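
The chain of `replace` calls above could also be written with one mapping dictionary per dataset. The sketch below uses toy frames and hypothetical dictionary names (`gtd_renames`, `ind_renames`) holding only a few of the pairs. Unlike the cell above, it restricts the replacement to the country column instead of scanning every column of the frame.

```python
import pandas as pd

# Toy frames; in the notebook these would be gtds and indicators
gtd = pd.DataFrame({"country_txt": ["East Germany (GDR)", "South Yemen", "Italy"]})
ind = pd.DataFrame({"CountryName": ["Egypt, Arab Rep.", "Russian Federation", "Italy"]})

# Hypothetical mapping dictionaries: old name -> harmonized name
gtd_renames = {"East Germany (GDR)": "Germany", "South Yemen": "Yemen"}
ind_renames = {"Egypt, Arab Rep.": "Egypt", "Russian Federation": "Russia"}

# Replace only inside the country column
gtd["country_txt"] = gtd["country_txt"].replace(gtd_renames)
ind["CountryName"] = ind["CountryName"].replace(ind_renames)

print(gtd)
print(ind)
```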

After renaming, 17 GTD country names still have no match. These are the discarded ones:

In [7]:
states1 = reduce(np.intersect1d, ((indicators.CountryName.values), (gtds.country_txt.values)))
lst1 = gtds.country_txt.unique()
lst2 = states1
ex = excluded(lst1, lst2)
ex

['Taiwan',
 'Czechoslovakia',
 'Zaire',
 "People's Republic of the Congo",
 'Yugoslavia',
 'Rhodesia',
 'Soviet Union',
 'New Hebrides',
 'Guadeloupe',
 'Martinique',
 'Vatican City',
 'French Guiana',
 'Falkland Islands',
 'Ivory Coast',
 'Wallis and Futuna',
 'International',
 'Serbia-Montenegro']

In [8]:
len(ex)

17

In [9]:
gtds = gtds[gtds.country_txt != 'Taiwan']
gtds = gtds[gtds.country_txt != 'Czechoslovakia']
gtds = gtds[gtds.country_txt != "People's Republic of the Congo"]
gtds = gtds[gtds.country_txt != 'Yugoslavia']
gtds = gtds[gtds.country_txt != 'Rhodesia']
gtds = gtds[gtds.country_txt != 'Soviet Union']
gtds = gtds[gtds.country_txt != 'New Hebrides']
gtds = gtds[gtds.country_txt != 'Guadeloupe']
gtds = gtds[gtds.country_txt != 'Martinique']
gtds = gtds[gtds.country_txt != 'Vatican City']
gtds = gtds[gtds.country_txt != 'French Guiana']
gtds = gtds[gtds.country_txt != 'Falkland Islands']
gtds = gtds[gtds.country_txt != 'Wallis and Futuna']
gtds = gtds[gtds.country_txt != 'International']
gtds = gtds[gtds.country_txt != 'Serbia-Montenegro']
gtds = gtds[gtds.country_txt != 'Zaire']
gtds = gtds[gtds.country_txt != 'Ivory Coast']
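
The seventeen filters above can be collapsed into a single `isin` test. This is a minimal sketch on a toy frame; in the notebook, the list of names to drop would presumably be `ex`, computed two cells earlier.

```python
import pandas as pd

# Toy frame standing in for gtds
gtd = pd.DataFrame({
    "country_txt": ["Taiwan", "Italy", "Zaire", "Spain"],
    "nkill": [0, 1, 2, 3],
})

# Names to discard (assumed here; the notebook's list is `ex`)
to_drop = ["Taiwan", "Zaire"]

# Keep every row whose country is NOT in the discard list
kept = gtd[~gtd["country_txt"].isin(to_drop)]
print(kept)
```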

We print the original shape and the filtered one. We apply this filter because we are interested only in attacks against 'Private Citizens and Property' (targtype1 = 14).

In [10]:
print(gtds.shape)
gtds_fil = gtds.loc[gtds["targtype1"]==14]
gtds_fil.shape

(181048, 135)


(43402, 135)

Keep only the required columns and visualize some records

In [11]:
ds1 = gtds_fil[["region","region_txt", "country_txt", "country","iyear","individual","nkill","nwound","ransom","property","weaptype1"]]
ds1.head()

Unnamed: 0,region,region_txt,country_txt,country,iyear,individual,nkill,nwound,ransom,property,weaptype1
0,2,Central America & Caribbean,Dominican Republic,58,1970,0,1.0,0.0,0.0,0,13
28,1,North America,United States,217,1970,0,0.0,0.0,0.0,1,6
29,1,North America,United States,217,1970,0,0.0,0.0,0.0,1,8
46,1,North America,United States,217,1970,0,0.0,0.0,0.0,1,6
58,1,North America,United States,217,1970,0,0.0,0.0,0.0,1,6


From the Indicators dataset we keep only GDP per capita (current US$).

In [12]:
filters = ((indicators["IndicatorName"] ==  "GDP per capita (current US$)"))
indicators1 = indicators[filters]
indicators1.head()

Unnamed: 0,CountryName,CountryCode,IndicatorName,IndicatorCode,Year,Value
95,Caribbean small states,CSS,GDP per capita (current US$),NY.GDP.PCAP.CD,1960,457.464712
264,East Asia & Pacific (all income levels),EAS,GDP per capita (current US$),NY.GDP.PCAP.CD,1960,146.814138
377,East Asia & Pacific (developing only),EAP,GDP per capita (current US$),NY.GDP.PCAP.CD,1960,89.319639
518,Euro area,EMU,GDP per capita (current US$),NY.GDP.PCAP.CD,1960,924.571393
624,Europe & Central Asia (all income levels),ECS,GDP per capita (current US$),NY.GDP.PCAP.CD,1960,648.223441


<a id="GlobalTerrorismDatabase"></a>
### Global Terrorism Database

Now we prepare the Global Terrorism Database (GTD). We build one sub-table per variable and pivot each of them separately.
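
To illustrate the pivoting pattern used for each variable, here is a minimal sketch on a toy events table (values are invented). It counts the occurrences of each category per year and country, then spreads the categories into columns. It uses `unstack` rather than the `pivot_table` call used in the cells below, which should give the same layout.

```python
import pandas as pd

# Toy events: one row per incident
events = pd.DataFrame({
    "iyear": [1970, 1970, 1970, 1971],
    "country_txt": ["Italy", "Italy", "Spain", "Italy"],
    "ransom": [0.0, 1.0, 0.0, 0.0],
})

wide = (
    events.groupby(["iyear", "country_txt"])["ransom"]
    .value_counts()              # count of each ransom value per group
    .unstack(fill_value=0)       # ransom values become columns, missing -> 0
    .rename(columns={0.0: "ransom_No", 1.0: "ransom_Ok"})
)
print(wide)
```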


**COUNT**

First of all, we compute the count of unique countries.

In [13]:
total_country = ds1.groupby(['country_txt'])['country_txt'].nunique()
total_country = pd.DataFrame(total_country)
total_country

Unnamed: 0_level_0,country_txt
country_txt,Unnamed: 1_level_1
Afghanistan,1
Albania,1
Algeria,1
Angola,1
Argentina,1
...,...
Venezuela,1
West Bank and Gaza,1
Yemen,1
Zambia,1


Now we count the events in each year for each country.

In [14]:
total_count = ds1.groupby(['iyear','country_txt'])['country'].value_counts()
total_count = pd.DataFrame(total_count)
total_count.tail(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,country
iyear,country_txt,country,Unnamed: 3_level_1
2017,Somalia,182,101
2017,South Africa,183,9
2017,South Sudan,1004,11
2017,Spain,185,3
2017,Sri Lanka,186,18
2017,Sudan,195,82
2017,Sweden,198,13
2017,Syria,200,86
2017,Tanzania,203,1
2017,Thailand,205,17


<a id="Ransom"></a>
**RANSOM** 

- 1 -> YES (The incident involved a demand of monetary ransom)
- 0 -> NO (The incident did not involve a demand of monetary ransom)
- -9 -> UNKNOWN (It is unknown if the incident involved a demand of monetary ransom)

**NaN replacement**

In [15]:
ds1.ransom.isna().sum()

27126

In [16]:
ds1.ransom.value_counts()

 0.0    15292
-9.0      646
 1.0      338
Name: ransom, dtype: int64

In [17]:
ds1.ransom.count()

16276

In [18]:
ds1=ds1.sample(frac = 1,random_state=1234)

In [19]:
ds1['ransom']=ds1.ransom.fillna(value=0, limit=25526)
ds1['ransom'].isna().sum()

1600

In [20]:
ds1['ransom']=ds1.ransom.fillna(value=1, limit=815)
ds1['ransom'].isna().sum()

785

In [21]:
ds1['ransom']=ds1.ransom.fillna(value=-9, limit=785)
ds1['ransom'].isna().sum()

0
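
The cells above shuffle the rows and then fill the missing `ransom` values in three passes, each capped by `limit`, so that a fixed number of NaNs receives each value. The sketch below reproduces this technique on a toy series. The `quotas` dictionary is a hypothetical name, and its numbers are chosen for the toy data only; in the notebook the limits are hard-coded.

```python
import numpy as np
import pandas as pd

# Toy column: three 0s, one 1 and four missing values
s = pd.Series([0, 0, 0, 1, np.nan, np.nan, np.nan, np.nan], dtype=float)

# Shuffle so the quota-based fill is not tied to the original row order
s = s.sample(frac=1, random_state=1234)

# How many missing entries should receive each value (assumed for this toy case)
quotas = {0.0: 3, 1.0: 1}

# Each pass fills at most `n` of the remaining NaNs with `value`
for value, n in quotas.items():
    s = s.fillna(value, limit=n)

print(s.value_counts())
```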

In [22]:
ds1.isna().sum()

region            0
region_txt        0
country_txt       0
country           0
iyear             0
individual        0
nkill          2426
nwound         4756
ransom            0
property          0
weaptype1         0
dtype: int64

In [23]:
ds2 = ds1[['country_txt','country','iyear','ransom']]
ransom_count = ds2.groupby(['iyear','country_txt','country'])['ransom'].agg('count')
ransom_count = pd.DataFrame(ransom_count)
ransom_count = ransom_count.rename(columns={'ransom':'ransom_count'})
print(ransom_count.shape)
ransom_count.tail(10)

(2070, 1)


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,ransom_count
iyear,country_txt,country,Unnamed: 3_level_1
2017,Tunisia,208,1
2017,Turkey,209,22
2017,Uganda,213,2
2017,Ukraine,214,23
2017,United Kingdom,603,69
2017,United States,217,25
2017,Venezuela,222,4
2017,West Bank and Gaza,155,20
2017,Yemen,228,49
2017,Zambia,230,1


In [24]:
ds2 = ds2.groupby(['iyear','country_txt','country'])['ransom'].value_counts()
ds2 = pd.DataFrame(ds2)
ds2 = ds2.rename(columns={'ransom':'counts'})
print(ds2.shape)
ds2.tail()

(2701, 1)


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,counts
iyear,country_txt,country,ransom,Unnamed: 4_level_1
2017,West Bank and Gaza,155,-9.0,1
2017,West Bank and Gaza,155,1.0,1
2017,Yemen,228,0.0,48
2017,Yemen,228,-9.0,1
2017,Zambia,230,0.0,1


In [25]:
ds2 = ds2.pivot_table(index=['iyear', 'country_txt','country'],columns='ransom',values='counts',fill_value=0)
ds2 = ds2.rename(columns={-9.0:'ransom_Unkn',0.0:'ransom_No',1.0:'ransom_Ok'})
print(ds2.shape)
ds2.head(10)

(2070, 3)


Unnamed: 0_level_0,Unnamed: 1_level_0,ransom,ransom_Unkn,ransom_No,ransom_Ok
iyear,country_txt,country,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1970,Argentina,11,0,4,0
1970,Dominican Republic,58,0,1,0
1970,Germany,362,0,1,0
1970,Germany,499,0,1,0
1970,United Kingdom,603,0,6,0
1970,United States,217,0,44,0
1971,Costa Rica,49,0,1,0
1971,Ireland,96,0,1,0
1971,Spain,185,0,1,0
1971,Turkey,209,0,3,0


<a id=NKILL&NWOUND></a>
**Nkill & Nwound**

- NKILL: This field stores the number of total confirmed fatalities for the incident. The number includes all victims and attackers who died as a direct result of the incident
- NWOUND: This field records the number of confirmed non-fatal injuries to both perpetrators and victims.

In [28]:
ds3 = pd.DataFrame(ds1.groupby(['iyear','country_txt','country'])['nkill'].agg('sum'))
ds4 = pd.DataFrame(ds1.groupby(['iyear','country_txt','country'])['nwound'].agg('sum'))
print(ds3.head())
print(ds4.head())

                                  nkill
iyear country_txt        country       
1970  Argentina          11         1.0
      Dominican Republic 58         1.0
      Germany            362        7.0
                         499        0.0
      United Kingdom     603        9.0
                                  nwound
iyear country_txt        country        
1970  Argentina          11          0.0
      Dominican Republic 58          0.0
      Germany            362         9.0
                         499         0.0
      United Kingdom     603         0.0
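
According to the earlier `isna()` output, `nkill` and `nwound` still contain missing values when they are summed. By default, a grouped `sum` skips NaN and returns 0 for a group that holds only NaN. A minimal sketch on toy data shows the difference from a stricter variant using `min_count`:

```python
import numpy as np
import pandas as pd

# Toy data: Spain has only a missing value
df = pd.DataFrame({
    "country_txt": ["Italy", "Italy", "Spain"],
    "nkill": [2.0, np.nan, np.nan],
})

# Default behaviour: NaN is skipped, an all-NaN group sums to 0
default_sum = df.groupby("country_txt")["nkill"].sum()

# With min_count=1 an all-NaN group stays NaN instead of becoming 0
strict_sum = df.groupby("country_txt")["nkill"].sum(min_count=1)

print(default_sum)
print(strict_sum)
```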


<a id=prop></a>
**Property**

“Yes” appears if there is evidence of property damage from the incident.

- 1 -> YES (The incident resulted in property damage)
- 0 -> NO (The incident did not result in property damage)
- -9 -> UNKNOWN (It is unknown if the incident resulted in property damage)

In [29]:
property_count = ds1.groupby(['iyear','country_txt','country'])['property'].agg('count')
property_count = pd.DataFrame(property_count)
print(property_count.shape)
property_count.tail(10)

(2070, 1)


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,property
iyear,country_txt,country,Unnamed: 3_level_1
2017,Tunisia,208,1
2017,Turkey,209,22
2017,Uganda,213,2
2017,Ukraine,214,23
2017,United Kingdom,603,69
2017,United States,217,25
2017,Venezuela,222,4
2017,West Bank and Gaza,155,20
2017,Yemen,228,49
2017,Zambia,230,1


In [30]:
ds5 = ds1.groupby(['iyear','country_txt','country'])['property'].value_counts()
ds5 = pd.DataFrame(ds5)
ds5 = ds5.rename(columns={'property':'counts'})
print(ds5.shape)
ds5.tail(10)

(3586, 1)


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,counts
iyear,country_txt,country,property,Unnamed: 4_level_1
2017,United States,217,0,12
2017,Venezuela,222,0,3
2017,Venezuela,222,1,1
2017,West Bank and Gaza,155,0,8
2017,West Bank and Gaza,155,1,8
2017,West Bank and Gaza,155,-9,4
2017,Yemen,228,0,22
2017,Yemen,228,1,16
2017,Yemen,228,-9,11
2017,Zambia,230,1,1


In [31]:
ds5 = ds5.pivot_table(index=['iyear', 'country_txt','country'],columns='property',values='counts',fill_value=0)
ds5 = ds5.rename(columns={-9.0:'property_Unkn',0.0:'property_No',1.0:'property_Ok'})
print(ds5.shape)
ds5.head(10)

(2070, 3)


Unnamed: 0_level_0,Unnamed: 1_level_0,property,property_Unkn,property_No,property_Ok
iyear,country_txt,country,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1970,Argentina,11,0,1,3
1970,Dominican Republic,58,0,1,0
1970,Germany,362,0,0,1
1970,Germany,499,0,0,1
1970,United Kingdom,603,0,6,0
1970,United States,217,4,10,30
1971,Costa Rica,49,0,1,0
1971,Ireland,96,0,1,0
1971,Spain,185,0,0,1
1971,Turkey,209,0,0,3


<a id=individual></a>
**Individual**

- 1 -> YES ( The perpetrator(s) were identified by name (or specific unnamed minors) and not known to be affiliated with a group or organization.)
- 0 -> NO (The perpetrator(s) were not identified as unaffiliated individuals (i.e. the perpetrators were either not identified by name, or were known to be affiliated with a group or organization).)

In [32]:
ds6 = ds1[['country_txt','country','iyear','individual']]
individual_count = ds6.groupby(['iyear','country_txt','country'])['individual'].agg('count')
individual_count = pd.DataFrame(individual_count)
individual_count = individual_count.rename(columns={'individual':'counts'})
print(individual_count.shape)
individual_count.tail(10)

(2070, 1)


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,counts
iyear,country_txt,country,Unnamed: 3_level_1
2017,Tunisia,208,1
2017,Turkey,209,22
2017,Uganda,213,2
2017,Ukraine,214,23
2017,United Kingdom,603,69
2017,United States,217,25
2017,Venezuela,222,4
2017,West Bank and Gaza,155,20
2017,Yemen,228,49
2017,Zambia,230,1


In [33]:
ds6 = ds6.groupby(['iyear','country_txt','country'])['individual'].value_counts()
ds6 = pd.DataFrame(ds6)
ds6 = ds6.rename(columns={'individual':'counts'})
print(ds6.shape)
ds6.tail()

(2113, 1)


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,counts
iyear,country_txt,country,individual,Unnamed: 4_level_1
2017,United States,217,0,7
2017,Venezuela,222,0,4
2017,West Bank and Gaza,155,0,20
2017,Yemen,228,0,49
2017,Zambia,230,0,1


In [34]:
ds6 = ds6.pivot_table(index=['iyear', 'country_txt','country'],columns='individual',values='counts',fill_value=0)
ds6 = ds6.rename(columns={0.0:'individual_No',1.0:'individual_Ok'})
print(ds6.shape)
ds6.head(10)

(2070, 2)


Unnamed: 0_level_0,Unnamed: 1_level_0,individual,individual_No,individual_Ok
iyear,country_txt,country,Unnamed: 3_level_1,Unnamed: 4_level_1
1970,Argentina,11,4,0
1970,Dominican Republic,58,1,0
1970,Germany,362,1,0
1970,Germany,499,1,0
1970,United Kingdom,603,6,0
1970,United States,217,44,0
1971,Costa Rica,49,1,0
1971,Ireland,96,1,0
1971,Spain,185,1,0
1971,Turkey,209,3,0


<a id=weaptype></a>
**Weapon Type**

Information on up to four types and sub-types of the weapons used in an attack are recorded for each case.

We consider only:

- Firearms (weaptype1 = 5: a weapon which is capable of firing a projectile using an explosive charge as a propellant.)
- Explosives (weaptype1 = 6)
- Melee (weaptype1 = 9: a weapon that does not involve a projectile, in which the user and target are in contact with it simultaneously.)

In [35]:
ds7 = ds1[['country_txt','country','iyear',"weaptype1"]]

In [36]:
count = ds7.groupby(['iyear','country_txt','country'])['weaptype1'].agg("count")
count = pd.DataFrame(count)
count

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,weaptype1
iyear,country_txt,country,Unnamed: 3_level_1
1970,Argentina,11,4
1970,Dominican Republic,58,1
1970,Germany,362,1
1970,Germany,499,1
1970,United Kingdom,603,6
...,...,...,...
2017,United States,217,25
2017,Venezuela,222,4
2017,West Bank and Gaza,155,20
2017,Yemen,228,49


In [37]:
ds7 = ds7.groupby(['iyear','country_txt','country'])['weaptype1'].value_counts()
ds7 = pd.DataFrame(ds7)
ds7 = ds7.rename(columns={'weaptype1':'counts'})
ds7 = ds7.pivot_table(index=['iyear', 'country_txt','country'],columns='weaptype1',values='counts',fill_value=0)
ds7

Unnamed: 0_level_0,Unnamed: 1_level_0,weaptype1,1,2,5,6,7,8,9,10,11,12,13
iyear,country_txt,country,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
1970,Argentina,11,0,0,3,0,0,0,0,0,0,0,1
1970,Dominican Republic,58,0,0,0,0,0,0,0,0,0,0,1
1970,Germany,362,0,0,0,0,0,1,0,0,0,0,0
1970,Germany,499,0,0,0,1,0,0,0,0,0,0,0
1970,United Kingdom,603,0,0,6,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2017,United States,217,0,0,9,1,0,8,6,1,0,0,0
2017,Venezuela,222,0,0,2,1,0,0,0,0,0,0,1
2017,West Bank and Gaza,155,0,0,4,1,0,8,6,1,0,0,0
2017,Yemen,228,0,0,10,32,0,0,0,0,0,0,7


In [38]:
ds7 = ds7.drop(columns=[1,2,7,8,10,11,12,13])
ds7 = ds7.rename(columns={9:'Melee', 5:'Firearms', 6:'Explosives'})
ds7

Unnamed: 0_level_0,Unnamed: 1_level_0,weaptype1,Firearms,Explosives,Melee
iyear,country_txt,country,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1970,Argentina,11,3,0,0
1970,Dominican Republic,58,0,0,0
1970,Germany,362,0,0,0
1970,Germany,499,0,1,0
1970,United Kingdom,603,6,0,0
...,...,...,...,...,...
2017,United States,217,9,1,6
2017,Venezuela,222,2,1,0
2017,West Bank and Gaza,155,4,1,6
2017,Yemen,228,10,32,0


<a id='Indicators'></a>
### Indicators

<a id="GDP"></a>
**GDP**

Per capita gross domestic product (GDP) is a metric that breaks down a country's economic output per person and is calculated by dividing the GDP of a country by its population.

In [39]:
indicators1 = indicators1[["CountryName", "IndicatorName", "Year", "Value"]]
indicators1 = indicators1.rename(columns={"CountryName":"country_txt"})
indicators1 = indicators1.rename(columns={"Year":"iyear"})
print(indicators1.shape)
indicators1.head(50)

(10343, 4)


Unnamed: 0,country_txt,IndicatorName,iyear,Value
95,Caribbean small states,GDP per capita (current US$),1960,457.464712
264,East Asia & Pacific (all income levels),GDP per capita (current US$),1960,146.814138
377,East Asia & Pacific (developing only),GDP per capita (current US$),1960,89.319639
518,Euro area,GDP per capita (current US$),1960,924.571393
624,Europe & Central Asia (all income levels),GDP per capita (current US$),1960,648.223441
836,European Union,GDP per capita (current US$),1960,876.477907
1004,Heavily indebted poor countries (HIPC),GDP per capita (current US$),1960,107.416655
1127,High income,GDP per capita (current US$),1960,1240.385955
1327,High income: OECD,GDP per capita (current US$),1960,1448.142479
1437,Latin America & Caribbean (all income levels),GDP per capita (current US$),1960,367.811643


In [40]:
indicators1 = indicators1.pivot_table(index=['iyear', 'country_txt'], columns='IndicatorName')
indicators1.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,Value
Unnamed: 0_level_1,IndicatorName,GDP per capita (current US$)
iyear,country_txt,Unnamed: 2_level_2
1960,Afghanistan,59.787681
1960,Algeria,244.823735
1960,Australia,1806.804876
1960,Austria,935.460427
1960,Bahamas,1550.337434
1960,Bangladesh,88.689453
1960,Belgium,1273.691659
1960,Belize,304.910281
1960,Benin,93.022585
1960,Bermuda,1902.402119


In [41]:
#indicators1 = indicators1.reset_index()
#indicators1[indicators1.country_txt == 'Germany']

Let's obtain a GDP value for every **Country** and **Year**

In [42]:
#indicators1 = indicators1.fillna(0)
indicators1['GDP per capita (current US$)'] = (indicators1.take([0], axis=1))
indicators1.columns = indicators1.columns.droplevel(1)
indicators2 = indicators1.drop(columns="Value")
indicators2 = indicators2.reset_index()
indicators2

Unnamed: 0,iyear,country_txt,GDP per capita (current US$)
0,1960,Afghanistan,59.787681
1,1960,Algeria,244.823735
2,1960,Australia,1806.804876
3,1960,Austria,935.460427
4,1960,Bahamas,1550.337434
...,...,...,...
10338,2014,Vietnam,2052.294202
10339,2014,West Bank and Gaza,2965.903675
10340,2014,World,10721.417039
10341,2014,Zambia,1721.623274


Here we compute the mean of the GDP for each country, for each year starting from 1970.

In [43]:
#dataset_sum = indicators2.join(gtds1.set_index('country_txt'), on ='country_txt')
#dataset1_sum = dataset_sum .drop_duplicates()
indicators3 = indicators2.loc[indicators2['iyear'] >= 1970]
indicators3 = indicators3.groupby(['iyear','country_txt']).agg('mean')
indicators3

Unnamed: 0_level_0,Unnamed: 1_level_0,GDP per capita (current US$)
iyear,country_txt,Unnamed: 2_level_1
1970,Afghanistan,157.258461
1970,Algeria,334.259555
1970,Andorra,3238.091462
1970,Arab World,254.249138
1970,Argentina,1317.487535
...,...,...
2014,Vietnam,2052.294202
2014,West Bank and Gaza,2965.903675
2014,World,10721.417039
2014,Zambia,1721.623274


Indicators.csv only provides values up to 2014, but the years 2015-2017 are very important in the Global Terrorism Database because they contain a large number of events, and we want to avoid losing this data. For this reason, we compute the mean GDP per capita from 2007 to 2014 for each country and use it as the GDP value for the years 2015-2017.
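
A minimal sketch of this idea on toy data (all values are invented, and the column name `gdp` is a shortened stand-in for `GDP per capita (current US$)`): compute the per-country mean over 2007-2014, then attach it to the rows of the later years.

```python
import pandas as pd

# Toy GDP table
gdp = pd.DataFrame({
    "iyear": [2006, 2007, 2014, 2007, 2014],
    "country_txt": ["Italy", "Italy", "Italy", "Spain", "Spain"],
    "gdp": [100.0, 200.0, 400.0, 10.0, 30.0],
})

# Per-country mean over 2007-2014 (both bounds inclusive)
recent_mean = (
    gdp[gdp["iyear"].between(2007, 2014)]
    .groupby("country_txt")["gdp"]
    .mean()
)

# Hypothetical event rows for years without GDP data
events = pd.DataFrame({"iyear": [2015, 2017], "country_txt": ["Italy", "Spain"]})

# Look up each country's 2007-2014 mean
events["gdp"] = events["country_txt"].map(recent_mean)
print(events)
```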

In [44]:
indicators4 = indicators3.reset_index()
indicators4 = indicators4.loc[indicators4['iyear'] >= 2007]
indicators4.shape

(1811, 3)

We compute the mean of the years 2007-2014

In [45]:
indicators5 = indicators4.groupby(['country_txt'])[['GDP per capita (current US$)']].agg('mean')
indicators5

Unnamed: 0_level_0,GDP per capita (current US$)
country_txt,Unnamed: 1_level_1
Afghanistan,550.876966
Albania,4230.374340
Algeria,4900.977722
Andorra,42919.043839
Angola,4549.188850
...,...
West Bank and Gaza,2392.371081
World,9791.766978
Yemen,1296.057492
Zambia,1485.258349


In [46]:
indicators3 = indicators3.reset_index()
test = indicators3[indicators3['country_txt'] == 'Zambia']
test = test[test['iyear'] <= 1984]
test

Unnamed: 0,iyear,country_txt,GDP per capita (current US$)
148,1970,Zambia,427.38775
301,1971,Zambia,381.726755
454,1972,Zambia,417.564185
607,1973,Zambia,524.134969
760,1974,Zambia,605.091713
916,1975,Zambia,490.19944
1073,1976,Zambia,531.50316
1233,1977,Zambia,470.71646
1392,1978,Zambia,508.108213
1553,1979,Zambia,585.491104


In addition, there are 111 null values in the GDP column. We apply the same approach as before, but compute the mean GDP over two separate ranges of years (1970-1992 and 1993-2014) so that the values we fill in are more precise.
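
The step that actually fills the null values is not shown in this section, so the following is only a sketch of one way it could be done, on invented toy data. Each row is labelled with its period, and a missing GDP value is replaced by the mean of the same country within the same period. The column names `gdp`, `period` and `gdp_filled` are assumptions.

```python
import numpy as np
import pandas as pd

# Toy GDP series for one country with two gaps
gdp = pd.DataFrame({
    "iyear": [1980, 1990, 1992, 1993, 2000, 2010],
    "country_txt": ["Zambia"] * 6,
    "gdp": [400.0, np.nan, 600.0, 1000.0, np.nan, 3000.0],
})

# Label each row with its period: up to 1992, or 1993 onwards
gdp["period"] = np.where(gdp["iyear"] <= 1992, "1970-1992", "1993-2014")

# Mean GDP of the same country in the same period, aligned to every row
period_mean = gdp.groupby(["country_txt", "period"])["gdp"].transform("mean")

# Fill only the missing entries with the matching period mean
gdp["gdp_filled"] = gdp["gdp"].fillna(period_mean)
print(gdp)
```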

In [47]:
#indicators3 = indicators3.reset_index()
indicators_year1 = indicators3[indicators3['iyear'] <= 1992]
indicators_year1 = indicators_year1.groupby(['country_txt'])[['GDP per capita (current US$)']].agg('mean')
indicators_year1 = indicators_year1.reset_index()
#indicators_year1[indicators_year1['country_txt'] == 'Namibia']

In [48]:
indicators_year2 = indicators3[(indicators3['iyear'] > 1992) & (indicators3['iyear'] <= 2014)]
indicators_year2 = indicators_year2.groupby(['country_txt'])[['GDP per capita (current US$)']].agg('mean')
indicators_year2 = indicators_year2.reset_index()
indicators_year2[indicators_year2['country_txt'] == 'West Bank and Gaza']

Unnamed: 0,country_txt,GDP per capita (current US$)
234,West Bank and Gaza,1752.297159


In [49]:
#indicators_year3 = indicators3[(indicators3['iyear'] > 1999) & (indicators3['iyear'] <= 2014)]
#indicators_year3 = indicators_year3.groupby(['country_txt'])[['GDP per capita (current US$)']].agg('mean')
#indicators_year3 = indicators_year3.reset_index()
#indicators_year3

<a id="Join"></a>
### Join

We now join all the pieces of the processed dataset.

In [50]:
indicators3 = indicators3.groupby(['iyear','country_txt']).agg('mean')
finale = ds3.join(ds2).join(ds4).join(ds5).join(ds6).join(ds7).join(indicators3)
finale

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,nkill,ransom_Unkn,ransom_No,ransom_Ok,nwound,property_Unkn,property_No,property_Ok,individual_No,individual_Ok,Firearms,Explosives,Melee,GDP per capita (current US$)
iyear,country_txt,country,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
1970,Argentina,11,1.0,0,4,0,0.0,0,1,3,4,0,3,0,0,1317.487535
1970,Dominican Republic,58,1.0,0,1,0,0.0,0,1,0,1,0,0,0,0,329.860648
1970,Germany,362,7.0,0,1,0,9.0,0,0,1,1,0,0,0,0,2750.719742
1970,Germany,499,0.0,0,1,0,0.0,0,0,1,1,0,0,1,0,2750.719742
1970,United Kingdom,603,9.0,0,6,0,0.0,0,6,0,6,0,6,0,0,2347.544318
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2017,United States,217,13.0,1,22,2,30.0,0,12,13,7,18,9,1,6,
2017,Venezuela,222,3.0,0,4,0,0.0,0,3,1,4,0,2,1,0,
2017,West Bank and Gaza,155,11.0,1,18,1,15.0,4,8,8,20,0,4,1,6,
2017,Yemen,228,85.0,1,48,0,121.0,11,22,16,49,0,10,32,0,


We normalize the count columns and nwound by the total number of events for each country and year; nkill is left as a raw total.

In [51]:
finale["ransom_Unkn"] = finale["ransom_Unkn"]/total_count["country"]
finale["ransom_No"] = finale["ransom_No"]/total_count["country"]
finale["ransom_Ok"] = finale["ransom_Ok"]/total_count["country"]
finale["nwound"] = finale["nwound"]/total_count["country"]
finale["property_Unkn"] = finale["property_Unkn"]/total_count["country"]
finale["property_Ok"] = finale["property_Ok"]/total_count["country"]
finale["property_No"] = finale["property_No"]/total_count["country"]
finale["individual_Ok"] = finale["individual_Ok"]/total_count["country"]
finale["individual_No"] = finale["individual_No"]/total_count["country"]
finale["Firearms"] = finale["Firearms"]/total_count["country"]
finale["Melee"] = finale["Melee"]/total_count["country"]
finale["Explosives"] = finale["Explosives"]/total_count["country"]
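
The repeated divisions above can be expressed as a single row-wise `div` over a list of columns. This is a minimal sketch on toy data; `n_events` stands in for `total_count["country"]`, and both objects share the same index so that the rows align.

```python
import pandas as pd

# Shared (year, country) index
idx = pd.MultiIndex.from_tuples(
    [(1970, "Italy"), (1970, "Spain")], names=["iyear", "country_txt"]
)

# Toy counts per weapon type and total events per row
counts = pd.DataFrame({"Firearms": [3, 0], "Explosives": [1, 2]}, index=idx)
n_events = pd.Series([4, 2], index=idx)

# Divide every listed column by the number of events in the same row
cols = ["Firearms", "Explosives"]
shares = counts[cols].div(n_events, axis=0)
print(shares)
```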

Let's drop the columns that don't interest us

In [52]:
finale = finale.drop(columns=["ransom_Ok", "ransom_Unkn", "property_No", "property_Unkn", "individual_Ok"])
#finale = finale.rename(columns={"weaptype1":"cnt"})
finale = finale.reset_index()

In [53]:
finale

Unnamed: 0,iyear,country_txt,country,nkill,ransom_No,nwound,property_Ok,individual_No,Firearms,Explosives,Melee,GDP per capita (current US$)
0,1970,Argentina,11,1.0,1.000000,0.000000,0.750000,1.00,0.750000,0.000000,0.00,1317.487535
1,1970,Dominican Republic,58,1.0,1.000000,0.000000,0.000000,1.00,0.000000,0.000000,0.00,329.860648
2,1970,Germany,362,7.0,1.000000,9.000000,1.000000,1.00,0.000000,0.000000,0.00,2750.719742
3,1970,Germany,499,0.0,1.000000,0.000000,1.000000,1.00,0.000000,1.000000,0.00,2750.719742
4,1970,United Kingdom,603,9.0,1.000000,0.000000,0.000000,1.00,1.000000,0.000000,0.00,2347.544318
...,...,...,...,...,...,...,...,...,...,...,...,...
2065,2017,United States,217,13.0,0.880000,1.200000,0.520000,0.28,0.360000,0.040000,0.24,
2066,2017,Venezuela,222,3.0,1.000000,0.000000,0.250000,1.00,0.500000,0.250000,0.00,
2067,2017,West Bank and Gaza,155,11.0,0.900000,0.750000,0.400000,1.00,0.200000,0.050000,0.30,
2068,2017,Yemen,228,85.0,0.979592,2.469388,0.326531,1.00,0.204082,0.653061,0.00,


**NaN**

- 323 missing values in GDP per capita; all other columns are complete.

In [54]:
finale.isna().sum()

iyear                             0
country_txt                       0
country                           0
nkill                             0
ransom_No                         0
nwound                            0
property_Ok                       0
individual_No                     0
Firearms                          0
Explosives                        0
Melee                             0
GDP per capita (current US$)    323
dtype: int64

In [55]:
add = finale.loc[finale["iyear"]>2014]
add = add.drop(columns='GDP per capita (current US$)')
add = add.join(indicators5, on='country_txt', how='left')
add.head()

Unnamed: 0,iyear,country_txt,country,nkill,ransom_No,nwound,property_Ok,individual_No,Firearms,Explosives,Melee,GDP per capita (current US$)
1858,2015,Afghanistan,4,1189.0,0.930667,4.234667,0.301333,1.0,0.338667,0.477333,0.016,550.876966
1859,2015,Algeria,6,2.0,1.0,1.333333,0.0,1.0,0.333333,0.333333,0.0,4900.977722
1860,2015,Argentina,11,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,11671.397676
1861,2015,Armenia,12,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,3451.795782
1862,2015,Bangladesh,19,11.0,0.910448,3.208955,0.119403,1.0,0.059701,0.895522,0.014925,792.973643
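
The join in cell 55 only works if `indicators5` is indexed by country name, which is an assumption here because its construction is not shown in this part of the notebook. A minimal sketch with made-up numbers (the frames `events` and `gdp_recent` are hypothetical stand-ins) shows how `DataFrame.join(..., on=...)` matches a column of the left frame against the index of the right one:

```python
import pandas as pd

# Hypothetical stand-in for the post-2014 rows of `finale` (values are made up).
events = pd.DataFrame({
    "iyear": [2015, 2016, 2015],
    "country_txt": ["Aland", "Aland", "Borduria"],
    "nkill": [3.0, 1.0, 0.0],
})

# Hypothetical stand-in for `indicators5`: one GDP value per country,
# with the country name as the index (assumption).
gdp_recent = pd.DataFrame(
    {"GDP per capita (current US$)": [1000.0]},
    index=pd.Index(["Aland"], name="CountryName"),
)

# `on="country_txt"` matches that column against the index of `gdp_recent`.
joined = events.join(gdp_recent, on="country_txt", how="left")
print(joined)
```

Every year of a country receives the same value, and a country missing from the right frame keeps `NaN`, so a left join never drops rows.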


In [56]:
base = finale.loc[finale["iyear"]<=2014]
base.head()

Unnamed: 0,iyear,country_txt,country,nkill,ransom_No,nwound,property_Ok,individual_No,Firearms,Explosives,Melee,GDP per capita (current US$)
0,1970,Argentina,11,1.0,1.0,0.0,0.75,1.0,0.75,0.0,0.0,1317.487535
1,1970,Dominican Republic,58,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,329.860648
2,1970,Germany,362,7.0,1.0,9.0,1.0,1.0,0.0,0.0,0.0,2750.719742
3,1970,Germany,499,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,2750.719742
4,1970,United Kingdom,603,9.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,2347.544318


In [57]:
finish = pd.concat([base, add])  # DataFrame.append was removed in pandas 2.0
print(finish.shape)
print(finish.isna().sum())
print(base.isna().sum())
print(add.isna().sum())
finish.tail(10)

(2070, 12)
iyear                             0
country_txt                       0
country                           0
nkill                             0
ransom_No                         0
nwound                            0
property_Ok                       0
individual_No                     0
Firearms                          0
Explosives                        0
Melee                             0
GDP per capita (current US$)    111
dtype: int64
iyear                             0
country_txt                       0
country                           0
nkill                             0
ransom_No                         0
nwound                            0
property_Ok                       0
individual_No                     0
Firearms                          0
Explosives                        0
Melee                             0
GDP per capita (current US$)    111
dtype: int64
iyear                           0
country_txt                     0
country                        

Unnamed: 0,iyear,country_txt,country,nkill,ransom_No,nwound,property_Ok,individual_No,Firearms,Explosives,Melee,GDP per capita (current US$)
2060,2017,Tunisia,208,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,4212.927817
2061,2017,Turkey,209,18.0,0.954545,0.772727,0.318182,1.0,0.318182,0.454545,0.0,10143.447154
2062,2017,Uganda,213,0.0,0.5,0.0,0.0,1.0,0.5,0.0,0.0,584.007799
2063,2017,Ukraine,214,8.0,0.782609,0.956522,0.478261,1.0,0.130435,0.608696,0.0,3377.059863
2064,2017,United Kingdom,603,13.0,0.898551,1.42029,0.536232,0.927536,0.318841,0.15942,0.072464,42504.796322
2065,2017,United States,217,13.0,0.88,1.2,0.52,0.28,0.36,0.04,0.24,50085.766478
2066,2017,Venezuela,222,3.0,1.0,0.0,0.25,1.0,0.5,0.25,0.0,11365.374196
2067,2017,West Bank and Gaza,155,11.0,0.9,0.75,0.4,1.0,0.2,0.05,0.3,2392.371081
2068,2017,Yemen,228,85.0,0.979592,2.469388,0.326531,1.0,0.204082,0.653061,0.0,1296.057492
2069,2017,Zambia,230,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1485.258349


In [58]:
finish[finish['GDP per capita (current US$)'].isnull()]

Unnamed: 0,iyear,country_txt,country,nkill,ransom_No,nwound,property_Ok,individual_No,Firearms,Explosives,Melee,GDP per capita (current US$)
24,1972,West Bank and Gaza,155,1.0,1.000000,3.000000,1.000000,1.0,1.000000,0.000000,0.000000,
35,1973,Myanmar,138,0.0,0.000000,0.000000,0.000000,1.0,0.000000,0.000000,0.000000,
68,1976,Ethiopia,65,0.0,1.000000,0.000000,0.000000,1.0,0.000000,0.000000,0.000000,
74,1976,Namibia,139,2.0,1.000000,0.000000,0.000000,1.0,1.000000,0.000000,0.000000,
88,1977,Ethiopia,65,1.0,1.000000,0.000000,0.000000,1.0,0.000000,0.000000,0.000000,
...,...,...,...,...,...,...,...,...,...,...,...,...
1777,2013,Syria,200,425.0,0.931818,9.488636,0.340909,1.0,0.113636,0.806818,0.022727,
1785,2013,Venezuela,222,0.0,1.000000,7.000000,0.000000,1.0,1.000000,0.000000,0.000000,
1846,2014,Syria,200,548.0,0.961538,8.015385,0.369231,1.0,0.061538,0.769231,0.030769,
1855,2014,Venezuela,222,0.0,1.000000,0.000000,1.000000,1.0,0.000000,0.000000,0.000000,


In [59]:
# Rows from 1970 to 1992 that still lack a GDP value
null1 = finish[(finish["iyear"] <= 1992) & (finish["iyear"] >= 1970)]
null1 = null1[null1['GDP per capita (current US$)'].isnull()]
# Both frames carry a GDP column, so the merge suffixes them with _x (left) and _y (right)
null1 = null1.merge(indicators_year1, on='country_txt')
null1 = null1.rename(columns={'GDP per capita (current US$)_y': 'GDP per capita (current US$)'})
null1 = null1.drop(columns=['GDP per capita (current US$)_x'])
print(null1.shape)
null1

(37, 12)


Unnamed: 0,iyear,country_txt,country,nkill,ransom_No,nwound,property_Ok,individual_No,Firearms,Explosives,Melee,GDP per capita (current US$)
0,1976,Ethiopia,65,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,230.452348
1,1977,Ethiopia,65,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,230.452348
2,1979,Ethiopia,65,100.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,230.452348
3,1976,Namibia,139,2.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1952.760768
4,1978,Namibia,139,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1952.760768
5,1979,Namibia,139,14.0,1.0,1.375,0.625,1.0,0.375,0.375,0.0,1952.760768
6,1978,Angola,8,0.0,1.0,20.0,1.0,1.0,0.0,1.0,0.0,750.038549
7,1979,Angola,8,1.0,1.0,12.0,1.0,1.0,0.0,1.0,0.0,750.038549
8,1982,Angola,8,2.0,1.0,0.0,0.5,1.0,0.0,0.0,0.0,750.038549
9,1983,Angola,8,58.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,750.038549


In [60]:
# Rows from 1993 to 2014 that still lack a GDP value
null2 = finish[(finish["iyear"] <= 2014) & (finish["iyear"] > 1992)]
null2 = null2[null2['GDP per capita (current US$)'].isnull()]
# Keep the GDP value from indicators_year2 (_y) and discard the empty one (_x)
null2 = null2.merge(indicators_year2, on='country_txt')
null2 = null2.rename(columns={'GDP per capita (current US$)_y': 'GDP per capita (current US$)'})
null2 = null2.drop(columns=['GDP per capita (current US$)_x'])
print(null2.shape)
null2

(49, 12)


Unnamed: 0,iyear,country_txt,country,nkill,ransom_No,nwound,property_Ok,individual_No,Firearms,Explosives,Melee,GDP per capita (current US$)
0,1994,Afghanistan,4,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,406.075392
1,1996,Afghanistan,4,15.0,1.0,30.0,1.0,1.0,0.0,1.0,0.0,406.075392
2,1997,Afghanistan,4,4.0,1.0,37.0,1.0,1.0,0.0,1.0,0.0,406.075392
3,1998,Afghanistan,4,8.0,1.0,30.0,1.0,1.0,0.0,1.0,0.0,406.075392
4,1999,Afghanistan,4,5.0,1.0,2.5,0.5,1.0,0.0,1.0,0.0,406.075392
5,2000,Afghanistan,4,25.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,406.075392
6,1994,Iraq,95,51.0,1.0,40.0,1.0,1.0,1.0,0.0,0.0,4294.063668
7,1997,Iraq,95,1.0,1.0,0.0,0.666667,1.0,0.333333,0.0,0.0,4294.063668
8,1998,Iraq,95,3.0,1.0,14.0,1.0,1.0,0.0,1.0,0.0,4294.063668
9,1999,Iraq,95,27.0,1.0,13.714286,0.571429,1.0,0.285714,0.571429,0.0,4294.063668
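
Cells 59 and 60 fill the gaps by merging on `country_txt`. Because both frames carry a column named `GDP per capita (current US$)`, pandas appends its default suffixes `_x` (left) and `_y` (right), which is why the code renames the `_y` column and drops the `_x` one. The sketch below reproduces this with made-up data; `period_mean` is a hypothetical stand-in for `indicators_year1` / `indicators_year2`, assumed to hold one value per country.

```python
import numpy as np
import pandas as pd

GDP = "GDP per capita (current US$)"

# Made-up rows whose GDP value is missing.
missing = pd.DataFrame({
    "iyear": [1976, 1977, 1978],
    "country_txt": ["Aland", "Aland", "Borduria"],
    GDP: [np.nan, np.nan, np.nan],
})

# Hypothetical per-country replacement values (assumption: one row per country).
period_mean = pd.DataFrame({
    "country_txt": ["Aland"],
    GDP: [230.0],
})

# The default merge is an inner join, so countries without a match are dropped.
filled = missing.merge(period_mean, on="country_txt")
print(list(filled.columns))  # shows the _x / _y suffixes

# Keep the right-hand value and discard the empty left-hand column.
filled = filled.rename(columns={GDP + "_y": GDP}).drop(columns=[GDP + "_x"])
print(filled)
```

Note that the row for the unmatched country disappears from `filled`: an inner merge silently removes rows that have no replacement value.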


In [61]:
# Unused third period, kept for reference:
# null3 = finish[(finish["iyear"] <= 2014) & (finish["iyear"] > 1999)]
# null3 = null3[null3['GDP per capita (current US$)'].isnull()]
# null3 = null3.merge(indicators_year3, on='country_txt')
# null3 = null3.rename(columns = {'GDP per capita (current US$)_y':'GDP per capita (current US$)'})
# null3 = null3.drop(columns = {'GDP per capita (current US$)_x'})
# print(null3.shape)
# null3

In [62]:
null_final = pd.concat([null1, null2])  # DataFrame.append was removed in pandas 2.0
#null_final = pd.concat([null_final, null3])
print(null_final.shape)

(86, 12)


In [63]:
finish = finish[finish['GDP per capita (current US$)'].notna()]
print(finish.shape)
finish = pd.concat([finish, null_final])  # DataFrame.append was removed in pandas 2.0
print(finish.shape)
print(finish.isna().sum())

(1959, 12)
(2045, 12)
iyear                           0
country_txt                     0
country                         0
nkill                           0
ransom_No                       0
nwound                          0
property_Ok                     0
individual_No                   0
Firearms                        0
Explosives                      0
Melee                           0
GDP per capita (current US$)    0
dtype: int64
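
The shapes above are worth a second look: 2070 rows minus the 111 with missing GDP gives 1959, and adding back the 86 filled rows gives 2045, so 25 rows are no longer in `finish`. This suggests that the inner merges in cells 59 and 60 found no replacement value for them. A hedged way to list such rows is a left merge with `indicator=True`; the data below is made up, and `period_mean` with its `gdp_fill` column is a hypothetical stand-in for the lookup frames.

```python
import numpy as np
import pandas as pd

GDP = "GDP per capita (current US$)"

# Made-up rows with missing GDP.
missing = pd.DataFrame({
    "iyear": [1976, 1978, 1980],
    "country_txt": ["Aland", "Borduria", "Borduria"],
    GDP: [np.nan, np.nan, np.nan],
})

# Hypothetical lookup frame that only knows one of the two countries.
period_mean = pd.DataFrame({"country_txt": ["Aland"], "gdp_fill": [230.0]})

# how="left" keeps every row; the `_merge` column records whether a match was found.
checked = missing.merge(period_mean, on="country_txt", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", ["iyear", "country_txt"]]
print(unmatched)
```

Inspecting `unmatched` before dropping anything makes it explicit which country-years leave the dataset.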


In [64]:
finish.tail()

Unnamed: 0,iyear,country_txt,country,nkill,ransom_No,nwound,property_Ok,individual_No,Firearms,Explosives,Melee,GDP per capita (current US$)
44,2013,Syria,200,425.0,0.931818,9.488636,0.340909,1.0,0.113636,0.806818,0.022727,1216.151653
45,2014,Syria,200,548.0,0.961538,8.015385,0.369231,1.0,0.061538,0.769231,0.030769,1216.151653
46,2013,Venezuela,222,0.0,1.0,7.0,0.0,1.0,1.0,0.0,0.0,6231.917341
47,2014,Venezuela,222,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,6231.917341
48,2014,Yemen,228,233.0,0.94958,2.865546,0.487395,1.0,0.285714,0.554622,0.008403,775.971029


In [65]:
finish

Unnamed: 0,iyear,country_txt,country,nkill,ransom_No,nwound,property_Ok,individual_No,Firearms,Explosives,Melee,GDP per capita (current US$)
0,1970,Argentina,11,1.0,1.000000,0.000000,0.750000,1.0,0.750000,0.000000,0.000000,1317.487535
1,1970,Dominican Republic,58,1.0,1.000000,0.000000,0.000000,1.0,0.000000,0.000000,0.000000,329.860648
2,1970,Germany,362,7.0,1.000000,9.000000,1.000000,1.0,0.000000,0.000000,0.000000,2750.719742
3,1970,Germany,499,0.0,1.000000,0.000000,1.000000,1.0,0.000000,1.000000,0.000000,2750.719742
4,1970,United Kingdom,603,9.0,1.000000,0.000000,0.000000,1.0,1.000000,0.000000,0.000000,2347.544318
...,...,...,...,...,...,...,...,...,...,...,...,...
44,2013,Syria,200,425.0,0.931818,9.488636,0.340909,1.0,0.113636,0.806818,0.022727,1216.151653
45,2014,Syria,200,548.0,0.961538,8.015385,0.369231,1.0,0.061538,0.769231,0.030769,1216.151653
46,2013,Venezuela,222,0.0,1.000000,7.000000,0.000000,1.0,1.000000,0.000000,0.000000,6231.917341
47,2014,Venezuela,222,0.0,1.000000,0.000000,1.000000,1.0,0.000000,0.000000,0.000000,6231.917341


In [66]:
finish = finish.groupby(['iyear','country_txt','country']).mean()
finish

iyear,country_txt,country,nkill,ransom_No,nwound,property_Ok,individual_No,Firearms,Explosives,Melee,GDP per capita (current US$)
1970,Argentina,11,1.0,1.000000,0.000000,0.750000,1.00,0.750000,0.000000,0.00,1317.487535
1970,Dominican Republic,58,1.0,1.000000,0.000000,0.000000,1.00,0.000000,0.000000,0.00,329.860648
1970,Germany,362,7.0,1.000000,9.000000,1.000000,1.00,0.000000,0.000000,0.00,2750.719742
1970,Germany,499,0.0,1.000000,0.000000,1.000000,1.00,0.000000,1.000000,0.00,2750.719742
1970,United Kingdom,603,9.0,1.000000,0.000000,0.000000,1.00,1.000000,0.000000,0.00,2347.544318
...,...,...,...,...,...,...,...,...,...,...,...
2017,United States,217,13.0,0.880000,1.200000,0.520000,0.28,0.360000,0.040000,0.24,50085.766478
2017,Venezuela,222,3.0,1.000000,0.000000,0.250000,1.00,0.500000,0.250000,0.00,11365.374196
2017,West Bank and Gaza,155,11.0,0.900000,0.750000,0.400000,1.00,0.200000,0.050000,0.30,2392.371081
2017,Yemen,228,85.0,0.979592,2.469388,0.326531,1.00,0.204082,0.653061,0.00,1296.057492
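
Grouping by `iyear`, `country_txt`, and `country` moves those three columns into a `MultiIndex` and averages any rows that share the same key. A small sketch with made-up rows (not the real data) shows both effects, and how `reset_index()` would bring the keys back as ordinary columns if that is preferred:

```python
import pandas as pd

# Made-up rows; the first two share the same (iyear, country_txt, country) key.
rows = pd.DataFrame({
    "iyear": [2013, 2013, 2014],
    "country_txt": ["Aland", "Aland", "Aland"],
    "country": [1, 1, 1],
    "nkill": [2.0, 4.0, 5.0],
})

# The grouping keys become a three-level index; duplicates are averaged.
grouped = rows.groupby(["iyear", "country_txt", "country"]).mean()
print(grouped)

# Optional: turn the index levels back into regular columns.
flat = grouped.reset_index()
print(flat)
```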


In [67]:
print(finish.isna().sum())

nkill                           0
ransom_No                       0
nwound                          0
property_Ok                     0
individual_No                   0
Firearms                        0
Explosives                      0
Melee                           0
GDP per capita (current US$)    0
dtype: int64


In [68]:
finish = finish.fillna(0)
print("Nan Values: ",finish.isna().sum().sum())

Nan Values:  0


Let's save the dataset as **"fin1shed_country.csv.gz"**

In [69]:
finish.to_csv('fin1shed_country.csv.gz', compression="gzip")
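
When the file is read back, the three grouping keys have to be named as index columns, otherwise they come back as ordinary columns. A hedged sketch of the round trip, using a temporary file and made-up rows rather than the real dataset:

```python
import os
import tempfile

import pandas as pd

# Made-up frame with the same three-level index layout as `finish`.
frame = pd.DataFrame(
    {"nkill": [3.0, 5.0]},
    index=pd.MultiIndex.from_tuples(
        [(2013, "Aland", 1), (2014, "Aland", 1)],
        names=["iyear", "country_txt", "country"],
    ),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv.gz")  # hypothetical file name
    frame.to_csv(path, compression="gzip")
    # On reading, pandas infers gzip compression from the .gz extension.
    restored = pd.read_csv(path, index_col=["iyear", "country_txt", "country"])

print(restored)
```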