# **Database cleanup steps** 

The queries and reshaping steps below bring this dataset into a form consistent with the other datasets used in this project.

**1.**

Importing dependencies is the first step in manipulating the data with pandas.


In [41]:
import pandas as pd


**2.**

Next, the path to the original CSV file is specified.

In [42]:
terrorism_csv = "Resources/globalterrorismdb_0718dist.csv"

**3.**

Following that, the data is read into a pandas dataframe named `terrorism_data`.

In [43]:
terrorism_data = pd.read_csv(terrorism_csv, encoding="ISO-8859-1", low_memory=False)

**4.**

A new dataframe is then created from only the columns that will be needed when combining this data with the other datasets.

In [44]:
terrorism_df = terrorism_data[['iyear', 'imonth', 'country_txt', 'region_txt', 'gname', 'attacktype1_txt', 'weaptype1_txt']]
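As a possible alternative to reading every column and then selecting a subset, `pd.read_csv` accepts a `usecols` argument that loads only the named columns. The sketch below is not what this notebook does; it uses a tiny in-memory stand-in for the raw file, and the extra `eventid` column is included only to illustrate a column that gets skipped.

```python
import io

import pandas as pd

# Hypothetical miniature of the raw CSV, with one column ("eventid")
# that is not needed downstream.
raw_csv = io.StringIO(
    "eventid,iyear,imonth,country_txt\n"
    "1,1970,7,Dominican Republic\n"
    "2,1970,0,Mexico\n"
)

# Only the listed columns are parsed; everything else is ignored.
wanted = ["iyear", "imonth", "country_txt"]
subset = pd.read_csv(raw_csv, usecols=wanted)

print(list(subset.columns))
print(subset.shape)
```

On a wide file this can reduce memory use, since the unused columns are never loaded.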

**5.**

The first 5 rows of the new terrorism dataframe are then output for visual inspection.

In [45]:
terrorism_df.head(5)

|   | iyear | imonth | country_txt | region_txt | gname | attacktype1_txt | weaptype1_txt |
|---|---|---|---|---|---|---|---|
| 0 | 1970 | 7 | Dominican Republic | Central America & Caribbean | MANO-D | Assassination | Unknown |
| 1 | 1970 | 0 | Mexico | North America | 23rd of September Communist League | Hostage Taking (Kidnapping) | Unknown |
| 2 | 1970 | 1 | Philippines | Southeast Asia | Unknown | Assassination | Unknown |
| 3 | 1970 | 1 | Greece | Western Europe | Unknown | Bombing/Explosion | Explosives |
| 4 | 1970 | 1 | Japan | East Asia | Unknown | Facility/Infrastructure Attack | Incendiary |


**6.**

The data types of the columns are then checked to ensure they are compatible with the other data going into the relational database.

In [46]:
terrorism_df.dtypes

iyear               int64
imonth              int64
country_txt        object
region_txt         object
gname              object
attacktype1_txt    object
weaptype1_txt      object
dtype: object
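If the visual check needs to be repeatable, one option is to compare each column against an expected kind (integer or text). This is a minimal sketch: the `EXPECTED_KINDS` mapping and the `kind_mismatches` helper are hypothetical names introduced here, and the sample frame is a stand-in built from rows shown in the `head()` output.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_string_dtype

# Assumed target schema: which columns should hold integers and which text.
EXPECTED_KINDS = {
    "iyear": "integer",
    "imonth": "integer",
    "country_txt": "text",
    "gname": "text",
}


def kind_mismatches(df, expected):
    """Return the columns that are missing or do not match the expected kind."""
    checks = {"integer": is_integer_dtype, "text": is_string_dtype}
    bad = []
    for column, kind in expected.items():
        if column not in df.columns or not checks[kind](df[column]):
            bad.append(column)
    return bad


# Stand-in for terrorism_df, using two rows from the head() output.
sample = pd.DataFrame({
    "iyear": [1970, 1970],
    "imonth": [7, 0],
    "country_txt": ["Dominican Republic", "Mexico"],
    "gname": ["MANO-D", "23rd of September Communist League"],
})
print(kind_mismatches(sample, EXPECTED_KINDS))

# A year column that was read as text would be flagged.
broken = sample.assign(iyear=sample["iyear"].astype(str))
print(kind_mismatches(broken, EXPECTED_KINDS))
```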

**7.**

The shape of the dataframe is then checked.

In [47]:
terrorism_df.shape

(181691, 7)

**8.**

The dataframe is then checked for NULL values.

In [48]:
terrorism_df.isnull().any()

iyear              False
imonth             False
country_txt        False
region_txt         False
gname              False
attacktype1_txt    False
weaptype1_txt      False
dtype: bool
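A result of `False` for every column only means that no cell is empty. The rows shown by `head()` include a month of `0` and the literal string `Unknown`, which suggests that this dataset records missing information as placeholder values that `isnull()` cannot see. The sketch below counts such placeholders on a small stand-in frame; treating `0` and `"Unknown"` as "missing" is an assumption based on that output.

```python
import pandas as pd

# Stand-in for terrorism_df, with values copied from the head() output.
sample = pd.DataFrame({
    "iyear": [1970, 1970, 1970],
    "imonth": [7, 0, 1],
    "gname": ["MANO-D", "23rd of September Communist League", "Unknown"],
    "weaptype1_txt": ["Unknown", "Unknown", "Unknown"],
})

# isnull() finds nothing, because the placeholders are ordinary values.
has_nulls = bool(sample.isnull().any().any())
print(has_nulls)

# Count the assumed placeholders: month 0 and the string "Unknown".
unknown_months = int((sample["imonth"] == 0).sum())
unknown_text = (sample[["gname", "weaptype1_txt"]] == "Unknown").sum()

print(unknown_months)
print(unknown_text)
```

Running the same counts on `terrorism_df` would show how much of each column is effectively unknown before the data is combined with other sources.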

**9.**

A query then isolates the records from May 1983 through the end of 2013. The years 1984 to 2013 are kept in full, while 1983 is included only from May onward.


In [49]:
terrorism_df = terrorism_df.query('(iyear > 1983 and iyear <= 2013) or (iyear == 1983 and imonth >= 5)')
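To see which rows this filter keeps, it can be run on a handful of boundary cases and compared against an equivalent boolean mask. The sketch below uses a small made-up frame rather than the real data, and writes the filter with `and`/`or` throughout.

```python
import pandas as pd

# Made-up boundary cases around the start and end of the date range.
sample = pd.DataFrame({
    "iyear":  [1982, 1983, 1983, 1983, 1984, 2013, 2014],
    "imonth": [12,   0,    4,    5,    1,    12,   1],
})

# The filter expressed as a query string.
queried = sample.query(
    "(iyear > 1983 and iyear <= 2013) or (iyear == 1983 and imonth >= 5)"
)

# The same condition as an explicit boolean mask.
mask = (
    ((sample["iyear"] > 1983) & (sample["iyear"] <= 2013))
    | ((sample["iyear"] == 1983) & (sample["imonth"] >= 5))
)
masked = sample[mask]

kept = list(zip(queried["iyear"].tolist(), queried["imonth"].tolist()))
print(kept)
print(queried.equals(masked))
```

Note that a 1983 row with `imonth` equal to `0` is dropped, since `0` does not satisfy `imonth >= 5`.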

**10.**

The shape of the dataframe is then checked once again.

In [50]:
terrorism_df.shape

(106687, 7)

**11.**

The dataframe is then written to a CSV file.

In [51]:
terrorism_df.to_csv('terrorismCleanData.csv', index=False)
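A quick way to confirm the export is to read the file back and compare its columns and shape with the dataframe that was written. The sketch below does this with a small stand-in frame and a temporary directory, so it does not touch the real output file.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the cleaned dataframe.
sample = pd.DataFrame({
    "iyear": [1983, 1984],
    "imonth": [5, 1],
    "country_txt": ["Greece", "Japan"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "terrorismCleanData.csv")
    # index=False keeps the row index out of the file, so no extra
    # unnamed column appears when the file is read back.
    sample.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(list(reloaded.columns))
print(reloaded.shape == sample.shape)
```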