# The Sparks Foundation

# Kiran L. Ware, Data Science and Business Analytics Intern

Task 4: Perform exploratory data analysis (EDA) on the 'Global Terrorism' dataset
    * As a security/defense analyst, try to find the hot zones of terrorism.

* Steps followed in this task:
    * Import required libraries
    * Reading the dataset
    * Preprocessing given data
    * EDA
    * Visualization
    * Conclusion

* Import Required Libraries
    * numpy
    * pandas
    * matplotlib
    * seaborn
    * plotly

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import plotly.express as px

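* Read the dataset using the pandas library. chardet first guesses the file encoding from the first 100,000 bytes; the file is then read as ISO-8859-1.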

In [None]:
import chardet
with open('K:\\machine learning\\sparks foundation internship\\globalterrorism.csv', 'rb') as rawdata:
    result = chardet.detect(rawdata.read(100000))
result

In [None]:
df = pd.read_csv('K:\\machine learning\\sparks foundation internship\\globalterrorism.csv',encoding='ISO-8859-1')
df.head()
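
The encoding argument matters because the file contains non-ASCII characters. The following is a minimal sketch of that point, assuming a made-up two-row sample (`raw_bytes`, `sample` and `utf8_ok` are hypothetical names, not part of the real dataset):

```python
import io
import pandas as pd

# Hypothetical two-row sample (not the real dataset); "Bogotá" has a non-ASCII character.
raw_bytes = "city,nkill\nBogotá,3\nLima,0\n".encode("ISO-8859-1")

# Decoding the same bytes as UTF-8 fails, which is why the encoding is passed explicitly.
try:
    raw_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Reading with the matching encoding recovers the text correctly.
sample = pd.read_csv(io.BytesIO(raw_bytes), encoding="ISO-8859-1")
print(utf8_ok, sample.loc[0, "city"])
```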

* Check basic information about the data: shape, column names, null values, etc.

In [None]:
df.shape

In [None]:
df.columns.values

* The dataset has 135 columns, far more than we need for this analysis, so we keep only the most relevant ones.

In [None]:
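# .copy() makes the subset an independent dataframe, so later in-place edits do not trigger SettingWithCopyWarning
df = df[["iyear", "imonth", "iday","country", "country_txt", "region_txt", "city","success", "attacktype1_txt",
         "nkill", "propvalue", "targtype1_txt","latitude","gname","longitude", "targsubtype1_txt","target1",
         "weaptype1_txt", "weapdetail"]].copy()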

In [None]:
df.head()

* This gives us the reduced dataframe; from here we proceed step by step.
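
As an aside, the same subset could probably be obtained at load time with the `usecols` parameter of `pd.read_csv`, which avoids holding all 135 columns in memory. This is only a sketch on a hypothetical miniature CSV (`csv_text`, `wanted` and `subset` are illustrative names), not the approach used in this notebook:

```python
import io
import pandas as pd

# Hypothetical miniature CSV standing in for the real file.
csv_text = (
    "eventid,iyear,country_txt,nkill,summary\n"
    "1,1970,Mexico,1,text\n"
    "2,1970,Peru,0,text\n"
)

# Only the listed columns are parsed; the others are skipped while reading.
wanted = ["iyear", "country_txt", "nkill"]
subset = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(subset.columns.tolist())
```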

In [None]:
df.shape

In [None]:
df.columns

* Rename the columns to more readable names

In [None]:
df.rename(columns = {'iyear':'Year', "imonth":"Month","iday":"Day", "country":"Country_no",
                     "country_txt":"Country","region_txt":"Region","city":"City", "propvalue":"property_value",
                     "attacktype1_txt":"Attack_type", "targtype1_txt":"Target_type","gname":"group_name",
                     "targsubtype1_txt":"Target_sub_type", "target1":"Target", "nkill":"no_of_kills",
                     "weaptype1_txt":"Weapon_type", "weapdetail":"Weapon_detail"}, inplace = True)

Here we check for Null values

In [None]:
df.isnull().sum()

* 8 columns contain null values, so we impute values where they are needed.
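
To judge which columns are worth imputing, it can help to view the null counts as percentages as well. A minimal sketch on a hypothetical toy dataframe (`toy` and `missing` are illustrative names, and the values are invented):

```python
import numpy as np
import pandas as pd

# Toy data with invented values, only to show the shape of the summary.
toy = pd.DataFrame({
    "City": ["Lima", None, "Kabul", "Mosul"],
    "no_of_kills": [2.0, np.nan, np.nan, 5.0],
    "Year": [1984, 1992, 2012, 2014],
})

# Count and percentage of missing values per column, keeping only columns that have any.
missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "pct_missing": (toy.isnull().mean() * 100).round(1),
})
missing = missing[missing["n_missing"] > 0].sort_values("n_missing", ascending=False)
print(missing)
```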

In [None]:
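# Assign the result back instead of calling fillna(inplace=True) on a column selection,
# which may not update df in recent pandas versions.
df["City"] = df["City"].fillna("Unknown")
df["Target"] = df["Target"].fillna("Unknown")
df["property_value"] = df["property_value"].fillna(0)
df["no_of_kills"] = df["no_of_kills"].fillna(0)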

* We impute values in the City, Target, property_value and no_of_kills columns.
* Null values in the remaining columns are left as they are, because those columns are not essential to this analysis and do not affect the others.

In [None]:
df.isnull().sum()

* The null values in those columns have been replaced successfully.

In [None]:
df.info()

# Exploratory Data Analysis

* Now we visualize the data using the plotting libraries.

* First, a countplot of the number of attacks per year worldwide

In [None]:
from matplotlib import style
style.use('ggplot')
plt.figure(figsize=(25,9))
sns.countplot(x ='Year', data = df)
plt.title("No of Attacks [1970-2017]",fontsize=25)  
plt.grid(color = 'g',linestyle = '-.',linewidth = 0.5)

* From the graph above:
    * 2014 has the highest number of attacks.
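
A reading taken from a chart can be cross-checked numerically with `value_counts`. The sketch below uses a hypothetical toy `Year` column with invented values; on the real dataframe the same two lines would be applied to `df["Year"]`:

```python
import pandas as pd

# Toy data with invented values; each row stands for one attack.
toy = pd.DataFrame({"Year": [2012, 2013, 2014, 2014, 2014, 2015, 2015]})

# Attacks per year in chronological order, then the year with the highest count.
attacks_per_year = toy["Year"].value_counts().sort_index()
peak_year = attacks_per_year.idxmax()
print(peak_year, attacks_per_year.max())
```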

* Next we compare the number of successful attacks with the number of failed ones, year by year.

In [None]:
plt.figure(figsize=(25,9))   
plt.title("No of Attacks 'Success' & 'Failure'")
sns.set_style('darkgrid') 
sns.countplot(x ='Year', hue = "success", data = df)
plt.grid(color = 'black',linestyle = '-.',linewidth = 0.5)
plt.show()

* Blue bars show the number of successful attacks and red bars the number of failed ones.

* The number of successful attacks also peaks in 2014.
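
Raw counts do not show whether attacks became more or less likely to succeed. Since `success` is coded as 0/1, its mean per year gives a success rate. A minimal sketch on hypothetical toy data (`toy` and `summary` are illustrative names, values invented):

```python
import pandas as pd

# Toy data with invented values; success is 1 for a successful attack, 0 otherwise.
toy = pd.DataFrame({
    "Year": [2013, 2013, 2014, 2014, 2014, 2014],
    "success": [1, 0, 1, 1, 1, 0],
})

# Per year: total attacks, successful attacks, and the share that succeeded.
summary = toy.groupby("Year")["success"].agg(
    attacks="count", successes="sum", success_rate="mean"
)
print(summary)
```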

* Now we plot the distribution of attacks by region.

In [None]:
plt.figure(figsize=(25,9))
sns.set_style('darkgrid')
plt.title("Region-Wise Attack Count",fontsize=25) 
sns.countplot(x ='Region', data = df)
plt.grid(color = 'black',linestyle = '-.',linewidth = 0.5)
plt.show()

* The 'Middle East & North Africa' and 'South Asia' regions face the most attacks.

In [None]:
fig1 = px.scatter(df, x="Year", y="Region", color="Attack_type",size="no_of_kills", hover_name="Country",
                 log_x=True, size_max=60,title='Attack type in each region [1970-2017]')
fig1.show()

* From the plot above:
    * North America faces the most Hijacking attacks.
    * Sub-Saharan Africa faces the most Armed Assault attacks.
    * Middle East & North Africa faces the most Hostage Taking (Kidnapping) attacks.
* Bombing/Explosion is the most frequently used attack type.
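
Statements of the form "region X faces the most attacks of type Y" are hard to read off a bubble chart, so a cross-tabulation is a useful check. The sketch below runs on a hypothetical toy dataframe with invented rows; the column names match the renamed dataframe, but nothing here reproduces the real counts:

```python
import pandas as pd

# Toy data with invented rows; each row stands for one attack.
toy = pd.DataFrame({
    "Region": [
        "North America", "North America",
        "South Asia", "South Asia", "South Asia", "South Asia",
        "Sub-Saharan Africa", "Sub-Saharan Africa",
        "Middle East & North Africa",
    ],
    "Attack_type": [
        "Hijacking", "Hijacking",
        "Bombing/Explosion", "Bombing/Explosion", "Bombing/Explosion", "Hijacking",
        "Armed Assault", "Armed Assault",
        "Bombing/Explosion",
    ],
})

# Rows are regions, columns are attack types, cells are attack counts.
table = pd.crosstab(toy["Region"], toy["Attack_type"])

# For each attack type, the region with the highest count.
top_region_per_type = table.idxmax(axis=0)

# The attack type with the highest count overall.
top_type_overall = table.sum(axis=0).idxmax()
print(top_region_per_type)
print(top_type_overall)
```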

In [None]:
px.scatter_mapbox(df, lat="latitude", lon="longitude", color="Attack_type",size="no_of_kills",hover_name="City",
                  color_continuous_scale=px.colors.cyclical.IceFire, size_max=25,
                  mapbox_style="carto-positron",title='Number of Kills distribution in each City')

* Hottest places (highest number of kills)
    * North-western edge of South America (Peru)
    * Guatemala (southern part of North America)
    * India-Pakistan border
    * India-China border
    * Nigeria-Chad border
    * Afghanistan-Pakistan border
    * Lebanon-Syria-Iraq

In [None]:
# count() gives the number of non-null entries per column, so gr['Year'] holds the number of attacks per country
gr = df.groupby(['Country'],as_index=False).count()
fig = px.choropleth(gr,locations='Country',locationmode='country names',color='Year',hover_name='Country',
                    projection='orthographic',title='Total Number of Attacks in [1970-2017]',labels={'Year':'Attacks'})
fig.show()

* Yellow marks the countries with the most attacks and purple those with the fewest.

* Iraq has the highest number of attacks, followed by Pakistan, Afghanistan and then India.
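
The choropleth colors each country by the `Year` column of a `groupby(...).count()` result, which equals the number of attacks only as long as `Year` has no missing values. A sketch of a more explicit alternative, `size()`, on hypothetical toy data (`toy` and `attacks` are illustrative names, rows invented):

```python
import pandas as pd

# Toy data with invented rows; one City value is missing on purpose.
toy = pd.DataFrame({
    "Country": ["Iraq", "Iraq", "Iraq", "India", "India", "Peru"],
    "City": ["Baghdad", None, "Mosul", "Srinagar", "Imphal", "Lima"],
})

# size() counts rows per group regardless of missing values.
attacks = (
    toy.groupby("Country").size()
    .reset_index(name="Attacks")
    .sort_values("Attacks", ascending=False)
)

# count() on a column with nulls undercounts: Iraq has 3 rows but only 2 non-null cities.
city_counts = toy.groupby("Country")["City"].count()
print(attacks)
print(city_counts["Iraq"])
```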

In [None]:
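sns.set_style('darkgrid')
# catplot creates its own figure, so a separate plt.figure() call is not needed.
# kind='bar' plots the mean of no_of_kills for each year (seaborn's default estimator).
sns.catplot(x ='Year', y ='no_of_kills', data = df, kind ='bar',height=10, aspect=27/13,color='r')
plt.title('Average Number of Kills per Attack, by Year',fontsize=25)
plt.grid(color = 'black',linestyle = '-.',linewidth = 0.5)
plt.show()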

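With `kind='bar'`, seaborn plots the mean of `no_of_kills` for each year by default, so the tallest bars (1998 and 2004) mark the years with the highest average number of kills per attack, not necessarily the highest total.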
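
The difference between the mean and the total matters here: a year with a few very deadly attacks can have the highest mean while another year has the highest total. A minimal sketch on hypothetical toy data (the years and kill counts below are invented and say nothing about the real dataset):

```python
import pandas as pd

# Toy data with invented values: one very deadly attack in 2000, four smaller ones in 2001.
toy = pd.DataFrame({
    "Year": [2000, 2001, 2001, 2001, 2001],
    "no_of_kills": [200, 100, 90, 80, 70],
})

# Total kills and mean kills per attack, side by side for each year.
per_year = toy.groupby("Year")["no_of_kills"].agg(total="sum", mean_per_attack="mean")
year_highest_total = per_year["total"].idxmax()
year_highest_mean = per_year["mean_per_attack"].idxmax()
print(per_year)
```

On the real dataframe, the `total` column of the same aggregation is the quantity to use for a statement about the year with the most kills.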

In [None]:
fig2 = px.pie(df, values='no_of_kills', names='Attack_type', title='Number of Kills per Attack Type')
fig2.update_layout(legend_title_text='ATTACK TYPES')

* Number of kills per attack type
    * Armed Assault and Bombing/Explosion account for the most kills.
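
The percentages shown on the pie chart can also be computed directly, which makes them easier to quote. A minimal sketch on hypothetical toy data (`toy`, `kills` and `share_pct` are illustrative names, values invented):

```python
import pandas as pd

# Toy data with invented values.
toy = pd.DataFrame({
    "Attack_type": ["Armed Assault", "Bombing/Explosion", "Armed Assault", "Assassination"],
    "no_of_kills": [30, 40, 20, 10],
})

# Total kills per attack type, expressed as a percentage of all kills.
kills = toy.groupby("Attack_type")["no_of_kills"].sum()
share_pct = (kills / kills.sum() * 100).round(1).sort_values(ascending=False)
print(share_pct)
```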

In [None]:
fig3 = px.pie(df, values='no_of_kills', names='Weapon_type', title='Number of Kills by each type of Weapon')
fig3.update_layout(legend=dict(orientation='v',
    yanchor="top",
    y=1.3,
    xanchor="right",
    x=2.5
))
fig3.update_layout(legend_title_text='Types Of WEAPONS')
fig3.show()

* Number of kills per weapon type
    * Firearms and Explosives account for the most kills.

In [None]:
fig4 = px.pie(df, values='no_of_kills', names='Target_type', title='Number of Kills in each type of target')
fig4.update_layout(legend_title_text='TARGET AREA')

* The hottest target types are:
    * Private Citizens & Property
    * Military
    * Police and Government-related targets

# Hot Zones of Terrorism

* Hottest countries
    * Iraq, Pakistan, Afghanistan and India are the hottest zones of terrorism.
* Hottest regions
    * The 'Middle East & North Africa' and 'South Asia' regions face the most attacks.
* Hottest places (highest number of kills)
    * North-western edge of South America (Peru)
    * Guatemala (southern part of North America)
    * India-Pakistan border
    * India-China border
    * Nigeria-Chad border
    * Afghanistan-Pakistan border
    * Lebanon-Syria-Iraq
* Hottest target types
    * Private Citizens & Property
    * Military
    * Police and Government-related targets

* The highest number of attacks took place in 2014.
* Between 2010 and 2017, Hostage Taking (Barricade Incident, Kidnapping) attacks are widespread.
* Armed Assault alone accounts for 38.9% of all kills.

# Thank You..!