
## <b> The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.</b>

# <b> Explore and analyze the data to discover key findings pertaining to terrorist activities. </b>

In [None]:
# Import statements
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
file_path = "/content/drive/MyDrive/AlmaBetter /Capstone Project/Global Terrorism Analysis/Global Terrorism Data.csv"
gt_df = pd.read_csv(file_path,encoding = "ISO-8859-1")

In [None]:
# Uncomment the next line to display every column (the dataset has 135 columns).
# pd.set_option('display.max_columns', None)

In [None]:
gt_df.head()

In [None]:
gt_df.tail() 

In [None]:
gt_df.info()

In [None]:
gt_df.describe()

In [None]:
print(gt_df.columns)      

In [None]:
print(pd.DataFrame(gt_df.isna().sum()).T)

Most columns contain null values. Some are roughly 40-50% null, so dropping those columns is the better option.
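
One way to act on that observation is to measure the share of missing values per column and keep only the columns under a cutoff. The sketch below is illustrative: the helper name `drop_sparse_columns`, the 0.4 cutoff and the toy frame are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(frame: pd.DataFrame, max_null_fraction: float = 0.4) -> pd.DataFrame:
    """Return a copy of `frame` without columns whose null share exceeds the cutoff."""
    null_fraction = frame.isna().mean()  # per-column share of missing values
    keep = null_fraction[null_fraction <= max_null_fraction].index
    return frame.loc[:, keep].copy()

# Toy stand-in for gt_df (hypothetical values)
toy = pd.DataFrame({
    "iyear": [1970, 1971, 1972, 1973],
    "nkill": [1.0, np.nan, 0.0, 2.0],               # 25% missing, kept
    "approxdate": [np.nan, np.nan, np.nan, "x"],    # 75% missing, dropped
})
trimmed = drop_sparse_columns(toy)
print(list(trimmed.columns))
```

On the real data the same call would be `drop_sparse_columns(gt_df)`, with the cutoff adjusted to taste.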

In [None]:
# Rename the columns of interest, then keep only those that are useful and do not have a high share of null values

gt_df.rename(columns = {"attacktype1_txt":'attack_type',"country_txt":'country_name',"iday":'day', "imonth":'month', "iyear":'year',"natlty1_txt":'nationality',
              "region_txt":'region',"targtype1_txt":'target',"weaptype1_txt":'weapon',"gname":'gang'}, inplace = True)
# .copy() makes df independent of gt_df, which avoids SettingWithCopyWarning on later edits
df = gt_df[['attack_type','city' ,'country_name' ,'gang' ,'day' ,'month' ,'year' 
            ,'nkill' ,'success' ,'nationality' ,'provstate' ,'region' ,'target1' 
            ,'target' ,'weapon' ,'latitude' ,'longitude']].copy()
df.head(3)


### Important insights to draw from the data
##### Q1: Number of attacks per year
##### Q2: What are the major attack types used worldwide?
##### Q3: Geographic representation on a map: where have attacks occurred since 1970?
##### Q4: Top 6 most attacked regions in India, by attack type
##### Q5: Weapons used per attack type in India
##### Q6: Number of attacks per attack type in the 10 most attacked countries
##### Q7: Which terrorist group has carried out the most attacks?

## Number of attacks per year

In [None]:
Atk_pr_yr = df.groupby(['year'])['year'].count()

In [None]:
Atk_pr_yr.plot(kind='bar', colormap='Dark2', title='Number of attacks per year', xlabel='Year', ylabel='Number of attacks', figsize=(15,10))

The highest number of attacks occurred in 2014 (more than 16,000), followed by 2015 and 2016. The lowest number occurred in 1973.
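
Reading extremes off a bar chart is error-prone, so the same figures can be pulled directly from the grouped counts. This is a minimal sketch on a toy `year` column (hypothetical values); on the notebook's data the series would be `Atk_pr_yr`.

```python
import pandas as pd

# Toy stand-in for df['year'] (hypothetical values)
toy = pd.DataFrame({"year": [1970, 1970, 1971, 2014, 2014, 2014]})

attacks_per_year = toy.groupby("year")["year"].count()

busiest_year = attacks_per_year.idxmax()    # year with the most rows
quietest_year = attacks_per_year.idxmin()   # year with the fewest rows
top_three = attacks_per_year.nlargest(3)    # three busiest years, descending

print(busiest_year, quietest_year)
print(top_three)
```

Note that a year with no rows at all does not appear in the grouped result, so `idxmin` only considers years that have at least one record.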

## What are the major attack types used worldwide?

In [None]:
Atk_typ = df.groupby(['attack_type'])['attack_type'].count()
Atk_typ 

In [None]:
# Two decimals with a percent sign, matching the figures quoted in the text
Atk_typ.plot(kind = 'pie', figsize=(6,6), autopct = '%.2f%%')
plt.title('Worldwide attack types')

Worldwide, 48.57% of attacks were Bombing/Explosion, followed by Armed Assault (23.48%) and Assassination (about 10%). The least used attack types are Hostage Taking (Barricade Incident) and Hijacking.
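
The percentages shown on the pie can also be computed as a table, which is easier to quote accurately. The sketch below uses a toy `attack_type` column (hypothetical values) and `value_counts(normalize=True)`; it is one possible approach, not the notebook's own method.

```python
import pandas as pd

# Toy stand-in for df['attack_type'] (hypothetical values)
toy = pd.DataFrame({
    "attack_type": ["Bombing/Explosion"] * 5 + ["Armed Assault"] * 3 + ["Assassination"] * 2
})

# Share of each attack type in percent, sorted from most to least frequent
share = toy["attack_type"].value_counts(normalize=True).mul(100).round(2)
print(share)
```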

## Geographic representation on a map: where have attacks occurred since 1970?

In [None]:
fig = px.scatter_geo(df,lat='latitude',lon='longitude')
fig.update_layout(title = 'Attacks throughout the world', title_x=0.5)
fig.show()

The map suggests that Greenland, Australia, Sweden, Norway, Alaska (USA), northern Canada and north-eastern Russia have recorded very few attacks and are among the more peaceful places in the world.
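
An empty area on the map can mean few recorded attacks, but it can also hide rows that have no coordinates, since those cannot be drawn. The notebook does not report how many rows lack `latitude` or `longitude`, so the check below is a suggested sanity test on toy data (hypothetical values).

```python
import numpy as np
import pandas as pd

# Toy stand-in for the coordinate columns of df (hypothetical values)
toy = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "latitude": [28.6, np.nan, 33.3, 34.5],
    "longitude": [77.2, 74.8, np.nan, 69.2],
})

# Only rows with both coordinates present can appear on the map
plottable = toy.dropna(subset=["latitude", "longitude"])
missing_share = 1 - len(plottable) / len(toy)

print(f"{len(plottable)} of {len(toy)} rows can be drawn; {missing_share:.0%} lack coordinates")
```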

## Top 6 most attacked regions in India, by attack type

In [None]:
prov_df = df.loc[df['country_name'] == 'India' ]
prov_df = prov_df.groupby(['provstate'])['provstate'].count().reset_index(name='count').sort_values(['count'],ascending = False).head(6)

In [None]:
prov_df.head(6)

In [None]:
plt.figure(1, figsize = (21,30))
count = 1
for prov in prov_df['provstate']:
  # Filter on country as well: the same province name can occur in more than one country
  new_df = df.loc[(df['country_name'] == 'India') & (df['provstate'] == prov)]
  prov_atk_typ = new_df.groupby(['attack_type'])['attack_type'].count()
  plt.subplot(3,2,count)
  plt.title(prov)
  prov_atk_typ.plot(kind = 'barh')
  #prov_atk_typ.plot(kind = 'pie', autopct = '%.2f%%', labeldistance= 1.2) # use this line instead of the one above for a pie chart
  count+=1

These are the most attacked states in India since 1970. Jammu and Kashmir leads, followed by Assam, Manipur, Chhattisgarh, Punjab and Jharkhand. The mix of attack types varies from state to state, with Bombing/Explosion again the most common.
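
The per-state breakdown can also be expressed as a single table with `pd.crosstab`, which makes the states directly comparable. The sketch below uses toy rows (hypothetical values) and filters on `country_name` first, because a province name such as Punjab exists in more than one country.

```python
import pandas as pd

# Toy stand-in for df (hypothetical values)
toy = pd.DataFrame({
    "country_name": ["India", "India", "India", "Pakistan", "India"],
    "provstate": ["Punjab", "Assam", "Punjab", "Punjab", "Assam"],
    "attack_type": ["Bombing/Explosion", "Armed Assault", "Bombing/Explosion",
                    "Armed Assault", "Bombing/Explosion"],
})

india = toy[toy["country_name"] == "India"]
top_states = india["provstate"].value_counts().head(6).index  # up to 6 most attacked states
subset = india[india["provstate"].isin(top_states)]

# Rows: states, columns: attack types, cells: number of attacks
table = pd.crosstab(subset["provstate"], subset["attack_type"])
print(table)
```

The Pakistani Punjab row is excluded, so the Punjab line of the table counts Indian incidents only.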

## Weapons used per attack type in India

In [None]:
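# Name the count column explicitly instead of relying on the default column label 0
ind = df.loc[df['country_name'] == 'India'].groupby(['attack_type', 'weapon']).size().reset_index(name='count')

fig = px.treemap(ind, path=['attack_type', 'weapon'],
                 values='count',
                 color='weapon')
fig.show()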

In India, Bombing/Explosion is the most frequent attack type, and it relies mostly on Explosives. Armed Assault comes next, with Firearms as the most used weapon; others include Melee, Explosives and Incendiary.
Based on the treemap, Firearms are the most preferred weapon outside of Bombing/Explosion.
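
The dominant weapon for each attack type can be confirmed without reading tile sizes off the treemap. This is a minimal sketch on toy rows (hypothetical values) that picks, per attack type, the weapon with the highest count.

```python
import pandas as pd

# Toy stand-in for the India rows of df (hypothetical values)
toy = pd.DataFrame({
    "attack_type": ["Bombing/Explosion", "Bombing/Explosion",
                    "Armed Assault", "Armed Assault", "Armed Assault"],
    "weapon": ["Explosives", "Explosives", "Firearms", "Firearms", "Melee"],
})

counts = toy.groupby(["attack_type", "weapon"]).size().reset_index(name="count")

# For each attack type, keep the row whose count is largest
top_rows = counts.loc[counts.groupby("attack_type")["count"].idxmax()]
top_weapon = top_rows.set_index("attack_type")["weapon"]
print(top_weapon)
```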

## Number of attacks per attack type in the 10 most attacked countries

In [None]:
df.head() 

In [None]:
per_country_attack = df.groupby(['country_name','attack_type'])['attack_type'].size().reset_index(name='size')

In [None]:
attack = per_country_attack.groupby(['country_name','attack_type'])['size'].sum()

In [None]:
# Total attacks per country, sorted from most to least attacked
nation_atk_count = per_country_attack.groupby('country_name')['size'].sum().sort_values(ascending = False)
# Keep a dict so the later cell can take the first 10 keys
sort_nation_atk_count = nation_atk_count.to_dict()
print(sort_nation_atk_count)


In [None]:
atk_df = per_country_attack[per_country_attack['country_name'].isin(list(sort_nation_atk_count.keys())[0:10])]

In [None]:
atk_df.head()

In [None]:
pivot = atk_df.pivot(index='country_name', columns='attack_type', values='size')
pivot

In [None]:
ax = sns.heatmap(pivot,annot=False,linewidths=1, linecolor = 'white', cmap="icefire")  
plt.show()

Among the 10 most attacked countries, Iraq stands out: it suffered more than 18,000 Bombing/Explosion attacks, the largest count for any country and attack type in the heatmap. Afghanistan is second, again for Bombing/Explosion.
Ranking the attack types across these 10 countries gives: 1. Bombing/Explosion, 2. Armed Assault, 3. Assassination.
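
The country selection, the pivot and the ranking of attack types can be done in a few pandas calls. The sketch below is an alternative to the cells above, run on toy rows (hypothetical values) with `top_n = 2` so the result stays readable; the notebook's case would use 10.

```python
import pandas as pd

# Toy stand-in for df (hypothetical values)
toy = pd.DataFrame({
    "country_name": ["Iraq"] * 4 + ["India"] * 2 + ["Peru"],
    "attack_type": ["Bombing/Explosion"] * 3 + ["Armed Assault"]
                   + ["Bombing/Explosion", "Armed Assault"] + ["Assassination"],
})

top_n = 2  # the notebook's analysis would use 10
top_countries = toy["country_name"].value_counts().head(top_n).index
subset = toy[toy["country_name"].isin(top_countries)]

# Rows: countries, columns: attack types, cells: number of attacks
pivot = pd.crosstab(subset["country_name"], subset["attack_type"])

# Attack types ranked by their total across the selected countries
ranking = pivot.sum().sort_values(ascending=False)
print(pivot)
print(ranking)
```

A table built this way has zeros rather than missing values where a country has no attacks of a given type, which keeps the heatmap free of blank cells.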

## Which terrorist group has carried out the most attacks?

In [None]:
df.head(2)

In [None]:
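# Ten most frequent group names, in ascending order of attack count
new = df.groupby(['gang'])['gang'].count().reset_index(name='count').sort_values(['count']).tail(10)
# Drops the row with index label 3408; the label is hard-coded for this dataset (presumably the 'Unknown' group)
new.drop(3408, inplace=True)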

plt.figure(figsize = (8,8))
plt.pie(new['count'],labels= new['gang'] , autopct = '%.2f', labeldistance=0.9)
#plt.legend()
plt.show()

The percentages in this pie are shares of the ten most frequent group labels, not of all attacks. When the "Unknown" label is kept, it accounts for about 70.50% of that subset.
Among named groups, the Taliban is the most active (6.37% of the subset), followed by ISIL and Shining Path (SL).
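
Because a pie chart normalises whatever rows it is given, it helps to compute the shares explicitly and state the denominator. The sketch below uses toy counts (hypothetical values) to show the share of all attacks next to the share among named groups only.

```python
import pandas as pd

# Toy stand-in for df['gang'] (hypothetical values)
toy = pd.DataFrame({"gang": ["Unknown"] * 5 + ["Taliban"] * 3 + ["ISIL"] * 2})

counts = toy["gang"].value_counts()

# Share of every attack in the data, 'Unknown' included
share_all = (counts / counts.sum() * 100).round(2)

# Share among attacks attributed to a named group
named = counts.drop("Unknown", errors="ignore")
share_named = (named / named.sum() * 100).round(2)

print(share_all)
print(share_named)
```

Dropping by name rather than by index label keeps the filter valid if the data or the row order changes.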

In [None]:
%pip install geopandas
import geopandas as gpd

In [None]:
shp_gdf = gpd.read_file('/content/drive/MyDrive/Datasets/Shapefile/India_State_Boundary.shp')
shp_gdf.reset_index(inplace=True)
shp_gdf.sort_values('Name', inplace = True)

In [None]:
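# india_df is not defined earlier in the notebook; it is rebuilt here from df (assumption)
india_df = df.loc[df['country_name'] == 'India']
states_count = india_df.groupby(['provstate'])['provstate'].count()
states_count = states_count.to_frame(name='Values')


In [None]:
# The state names form the index; label it 'State'
states_count.index.name = 'State'
states_count.head()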

In [None]:
num_of_attacks = [0,292,24,1151,688,47,979,0,208,5,85,50,24,2454,887,71,98,2454,0,75,302,1100,294,27,115,649,2,949,43,4,164,24,117,201,24,650]

In [None]:
shp_gdf['Attacks']=num_of_attacks
shp_gdf.head()
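
A hand-typed list has to stay in exactly the same order as the shapefile rows, which is easy to get wrong. A less fragile option is to look the counts up by state name. The sketch below uses plain pandas frames as toy stand-ins (hypothetical values): `shapes` mimics the `Name` column of `shp_gdf` and `india` mimics the India rows of `df`. It assumes the shapefile names and the `provstate` values are spelled the same way, which should be checked on the real data.

```python
import pandas as pd

# Toy stand-ins (hypothetical values)
shapes = pd.DataFrame({"Name": ["Assam", "Goa", "Punjab"]})
india = pd.DataFrame({"provstate": ["Punjab", "Assam", "Punjab"]})

counts = india["provstate"].value_counts()

# Look up each shapefile name; states with no recorded attacks get 0
shapes["Attacks"] = shapes["Name"].map(counts).fillna(0).astype(int)

# State names present in the attack data but absent from the shapefile
unmatched = sorted(set(counts.index) - set(shapes["Name"]))

print(shapes)
print(unmatched)
```

A non-empty `unmatched` list points to spelling differences between the two sources that need a manual mapping.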

In [None]:
fig, ax = plt.subplots(1, figsize=(12, 12))
ax.axis('off')
ax.set_title('Statewise Attacks in India',
             fontdict={'fontsize': '15', 'fontweight' : '3'})
# GeoDataFrame.plot returns the Axes, so the result is not assigned to `fig`
shp_gdf.plot(column='Attacks', cmap='OrRd', linewidth=0.5, ax=ax, edgecolor='0.2', legend=True)