<a href="https://colab.research.google.com/github/NEEL5252/Global-Data-Analysis/blob/main/Analyzing_of_the_global_terrorism_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **PROJECT NAME**: Global Terrorism Dataset

### **PROJECT TYPE:** EDA
### **CONTRIBUTOR:** Individual

# **PROJECT SUMMARY**

Using the Global Terrorism Database (GTD), which covers 1970 to 2017, this project explores key aspects of global terrorism through exploratory data analysis. The findings cover the areas with the most and fewest attacks, the trend in annual attack frequency, attack success rates, and the countries with the highest and lowest fatalities. It also examines the number of injuries in each country, fatalities and injuries by region, the weapons used, and the geographical distribution of attacks. The study aims to support policymaking, counterterrorism efforts, and academic research by giving clearer insight into how terrorism varies across time and place, and so to help stakeholders take a more proactive approach to global security.


# GITHUB link:


https://github.com/NEEL5252

# Problem Statement:

1. Attacks overall by areas
2. Total assaults for the year, whether they went down or up.
3. The number of successful and unsuccessful assaults.
4. The top 10 nations in terms of both kill rates and kills per capita
5. The total number of victims injured in terrorist acts worldwide
6. The overall death toll by geographic region
7. The overall number of victims per location of injuries
8. The total number of weapons used
9. The location of all the assaults that took place.


# Business Objective

# **LET'S START THE ANALYSIS**

## **PHASE - 1**


```
#  Importing libraries, importing dataset and read the dataset
```



In [155]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px
import seaborn as sns
import plotly.graph_objects as go

In [156]:
# Mount the google drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [157]:
# Load the csv file / Read the csv file
globalTd = pd.read_csv("/content/drive/MyDrive/Global Terrarisom dataSet/Global Terrorism Data.csv", encoding='ISO-8859-1')


Columns (4,6,31,33,61,62,63,76,79,90,92,94,96,114,115,121) have mixed types. Specify dtype option on import or set low_memory=False.
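The mixed-type warning above names two remedies: pass `dtype` for the affected columns, or set `low_memory=False`. A minimal sketch of both follows. It assumes a tiny in-memory CSV (`csv_text`, made up for illustration) instead of the real file, and the choice of `approxdate` as the column to declare is only an example.

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for the real GTD file.
csv_text = "eventid,iyear,approxdate\n1,1970,\n2,1970,January 1970\n3,1971,\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"approxdate": "string"},  # declare the type of a column known to be mixed
    low_memory=False,                # parse the file in one pass so type inference sees every row
)

print(df.dtypes)
```

Declaring `dtype` is the more explicit option; `low_memory=False` trades extra memory during parsing for consistent type inference.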



In [158]:
# Print the globalTd data
globalTd.head()

Unnamed: 0,eventid,iyear,imonth,iday,approxdate,extended,resolution,country,country_txt,region,...,addnotes,scite1,scite2,scite3,dbsource,INT_LOG,INT_IDEO,INT_MISC,INT_ANY,related
0,197000000001,1970,7,2,,0,,58,Dominican Republic,2,...,,,,,PGIS,0,0,0,0,
1,197000000002,1970,0,0,,0,,130,Mexico,1,...,,,,,PGIS,0,1,1,1,
2,197001000001,1970,1,0,,0,,160,Philippines,5,...,,,,,PGIS,-9,-9,1,1,
3,197001000002,1970,1,0,,0,,78,Greece,8,...,,,,,PGIS,-9,-9,1,1,
4,197001000003,1970,1,0,,0,,101,Japan,4,...,,,,,PGIS,-9,-9,1,1,


In [159]:
# Shape of the globalTd dataset
globalTd.shape

(181691, 135)

In [160]:
# Count the null / missing values in each column of the dataset
nullcounts = globalTd.isnull().sum()
# Total number of null / missing cells across the whole dataset
totalNull = nullcounts.sum()
print(totalNull)

13853997


### Insights of the dataset:-


The dataset has 181,691 rows and 135 columns. In total, 13,853,997 cells are null or missing. Among other details, the data records the year, attack type, number of fatalities, weapon type, number of injuries, and city of each attack. Columns dominated by null or missing values need to be removed; this is handled in the data wrangling phase.
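The single total hides which columns are responsible for the missing data. One possible way to break it down per column is sketched here on a small made-up frame (`sample` and its values are illustrative, not taken from the GTD):

```python
import numpy as np
import pandas as pd

# Tiny stand-in for globalTd; the values are made up for illustration only.
sample = pd.DataFrame({
    "iyear": [1970, 1970, 1971, 1972],
    "nkill": [1.0, np.nan, 0.0, np.nan],
    "approxdate": [np.nan, np.nan, np.nan, "1972-01-05"],
})

# Number of missing entries in each column, largest first.
missing_count = sample.isnull().sum().sort_values(ascending=False)

# Share of missing entries per column, as a percentage of all rows.
missing_pct = (sample.isnull().mean() * 100).round(2)

# Combine both views; pandas aligns the two Series on the column names.
summary = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
print(summary)
```

Run against `globalTd`, the same two lines would show which columns are mostly empty and are therefore candidates for removal.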



## **PHASE - 2:-**


```
# Understanding the dataset
```



In [161]:
# Describing our dataset
globalTd.describe()

Unnamed: 0,eventid,iyear,imonth,iday,extended,country,region,latitude,longitude,specificity,...,ransomamt,ransomamtus,ransompaid,ransompaidus,hostkidoutcome,nreleased,INT_LOG,INT_IDEO,INT_MISC,INT_ANY
count,181691.0,181691.0,181691.0,181691.0,181691.0,181691.0,181691.0,177135.0,177134.0,181685.0,...,1350.0,563.0,774.0,552.0,10991.0,10400.0,181691.0,181691.0,181691.0,181691.0
mean,200270500000.0,2002.638997,6.467277,15.505644,0.045346,131.968501,7.160938,23.498343,-458.6957,1.451452,...,3172530.0,578486.5,717943.7,240.378623,4.629242,-29.018269,-4.543731,-4.464398,0.09001,-3.945952
std,1325957000.0,13.25943,3.388303,8.814045,0.208063,112.414535,2.933408,18.569242,204779.0,0.99543,...,30211570.0,7077924.0,10143920.0,2940.967293,2.03536,65.720119,4.543547,4.637152,0.568457,4.691325
min,197000000000.0,1970.0,0.0,0.0,0.0,4.0,1.0,-53.154613,-86185900.0,1.0,...,-99.0,-99.0,-99.0,-99.0,1.0,-99.0,-9.0,-9.0,-9.0,-9.0
25%,199102100000.0,1991.0,4.0,8.0,0.0,78.0,5.0,11.510046,4.54564,1.0,...,0.0,0.0,-99.0,0.0,2.0,-99.0,-9.0,-9.0,0.0,-9.0
50%,200902200000.0,2009.0,6.0,15.0,0.0,98.0,6.0,31.467463,43.24651,1.0,...,15000.0,0.0,0.0,0.0,4.0,0.0,-9.0,-9.0,0.0,0.0
75%,201408100000.0,2014.0,9.0,23.0,0.0,160.0,10.0,34.685087,68.71033,1.0,...,400000.0,0.0,1273.412,0.0,7.0,1.0,0.0,0.0,0.0,0.0
max,201712300000.0,2017.0,12.0,31.0,1.0,1004.0,12.0,74.633553,179.3667,5.0,...,1000000000.0,132000000.0,275000000.0,48000.0,7.0,2769.0,1.0,1.0,1.0,1.0


In [162]:
# Summary statistics for the number of people killed and wounded per attack
globalTd[['nkill', 'nwound']].describe()

Unnamed: 0,nkill,nwound
count,171378.0,165380.0
mean,2.403272,3.167668
std,11.545741,35.949392
min,0.0,0.0
25%,0.0,0.0
50%,0.0,0.0
75%,2.0,2.0
max,1570.0,8191.0


In [163]:
# Count the successful attacks
globalTd[globalTd['success'] == 1]['success'].sum()

161632

In [164]:
# Different types of weapons used in terrorist attacks
globalTd['weaptype1_txt'].unique()

array(['Unknown', 'Explosives', 'Incendiary', 'Firearms', 'Chemical',
       'Melee', 'Sabotage Equipment',
       'Vehicle (not to include vehicle-borne explosives, i.e., car or truck bombs)',
       'Fake Weapons', 'Radiological', 'Other', 'Biological'],
      dtype=object)

In [165]:
# Types of the attacks
globalTd['attacktype1_txt'].unique()

array(['Assassination', 'Hostage Taking (Kidnapping)',
       'Bombing/Explosion', 'Facility/Infrastructure Attack',
       'Armed Assault', 'Hijacking', 'Unknown', 'Unarmed Assault',
       'Hostage Taking (Barricade Incident)'], dtype=object)

### Description

From the above description:
1. The dataset records 181,691 terrorist attacks from 1970 through 2017.
2. Latitude and longitude are missing for about 4,556 records (only 177,135 of 181,691 have a latitude), so the exact location of those attacks is unknown.
3. The `nkill` count of 171,378 is the number of records with a reported fatality figure, not the number of deaths; on average about 2.4 people were killed per recorded attack.
4. Likewise, the `nwound` count of 165,380 is the number of records with a reported injury figure; on average about 3.2 people were injured per recorded attack.
5. Many columns are not needed for this analysis; they are removed in the data wrangling phase.
6. Of the 181,691 attacks, 161,632 were successful and the remaining 20,059 were unsuccessful.
7. A range of weapon types was used, e.g. explosives, firearms, and chemical weapons.
8. The attacks took many forms, e.g. assassination, bombing, hijacking, and kidnapping.
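In the `describe()` output, `count` is the number of non-null rows, so total deaths and injuries have to be computed with a sum. A minimal sketch of the difference, using a made-up four-row frame (`sample` and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for globalTd with the three columns discussed above.
sample = pd.DataFrame({
    "success": [1, 1, 0, 1],
    "nkill": [1.0, np.nan, 0.0, 3.0],
    "nwound": [0.0, 2.0, np.nan, np.nan],
})

# count() ignores NaN and returns how many rows have a value.
records_with_nkill = sample["nkill"].count()

# sum() also skips NaN and returns the actual total.
total_killed = sample["nkill"].sum()
total_wounded = sample["nwound"].sum()

# Successful attacks, and the remainder that were unsuccessful.
successful = int((sample["success"] == 1).sum())
unsuccessful = len(sample) - successful

print(records_with_nkill, total_killed, total_wounded, successful, unsuccessful)
```

Applying `globalTd[['nkill', 'nwound']].sum()` to the real data would give the overall totals.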

## **PHASE - 3**


```
Data Wrangling
```


1. Remove the columns that will not be used in our analysis, including those that contain a large number of null and missing values

In [166]:
# List every column in the dataset; those not in the keep-list below will be dropped
deleteColumns = list(globalTd.columns)

In [167]:
# Columns to keep (the analysis is based on these columns)
# Note: 'evenid' does not match the real column name 'eventid', so eventid is dropped as well
notDeleteColumns = ['evenid', 'iyear', 'country', 'country_txt', 'region', 'region_txt',
                    'provstate', 'city', 'latitude', 'longitude', 'location',
                    'doubtterr', 'alternative', 'multiple','success',
                    'suicide', 'attacktype1', 'attacktype1_txt', 'targtype1',
                    'targtype1_txt', 'targsubtype1', 'targsubtype1_txt', 'corp1',
                    'weaptype1', 'weaptype1_txt', 'nkill', 'nwound']

# Converting it into array
notDeleteColumnsArray = np.array(notDeleteColumns)

In [168]:
# Drop every column that is not in the keep-list
columnsToDrop = [c for c in deleteColumns if c not in notDeleteColumns]
globalTd.drop(columns=columnsToDrop, inplace=True)

In [169]:
globalTd.head()

Unnamed: 0,iyear,country,country_txt,region,region_txt,provstate,city,latitude,longitude,location,...,attacktype1_txt,targtype1,targtype1_txt,targsubtype1,targsubtype1_txt,corp1,weaptype1,weaptype1_txt,nkill,nwound
0,1970,58,Dominican Republic,2,Central America & Caribbean,,Santo Domingo,18.456792,-69.951164,,...,Assassination,14,Private Citizens & Property,68.0,Named Civilian,,13,Unknown,1.0,0.0
1,1970,130,Mexico,1,North America,Federal,Mexico city,19.371887,-99.086624,,...,Hostage Taking (Kidnapping),7,Government (Diplomatic),45.0,"Diplomatic Personnel (outside of embassy, cons...",Belgian Ambassador Daughter,13,Unknown,0.0,0.0
2,1970,160,Philippines,5,Southeast Asia,Tarlac,Unknown,15.478598,120.599741,,...,Assassination,10,Journalists & Media,54.0,Radio Journalist/Staff/Facility,Voice of America,13,Unknown,1.0,0.0
3,1970,78,Greece,8,Western Europe,Attica,Athens,37.99749,23.762728,,...,Bombing/Explosion,7,Government (Diplomatic),46.0,Embassy/Consulate,,6,Explosives,,
4,1970,101,Japan,4,East Asia,Fukouka,Fukouka,33.580412,130.396361,,...,Facility/Infrastructure Attack,7,Government (Diplomatic),46.0,Embassy/Consulate,,8,Incendiary,,


All the columns that won't be needed in the subsequent analysis have been removed.
Some of them consisted largely of null entries, while others duplicated information already available in the columns that were kept.
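When filtering by a keep-list, a misspelled name silently drops the intended column. The sketch below shows one way to surface such names before trimming. `sample` and `keep` are hypothetical stand-ins, and the misspelling is deliberate.

```python
import pandas as pd

# Made-up stand-in for globalTd.
sample = pd.DataFrame({
    "eventid": [1, 2],
    "iyear": [1970, 1971],
    "approxdate": [None, None],
    "nkill": [1.0, 0.0],
})

# Hypothetical keep-list; 'evenid' is deliberately misspelled.
keep = ["evenid", "iyear", "nkill"]

# Names in the keep-list that do not exist in the data (typos surface here).
unknown = [c for c in keep if c not in sample.columns]

# Keep only the requested columns that are really present, in keep-list order.
trimmed = sample[[c for c in keep if c in sample.columns]]

print("Not found:", unknown)
print(trimmed.columns.tolist())
```

Checking `unknown` first makes it obvious when a column such as `eventid` is about to be lost by accident.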


## **PHASE - 4**


```
# DATA VISUALIZATION PART
```



### **PROBLEM: 1** Attacks overall by areas

In [170]:
# Code to calculate number of attacks by countries
totalAttacks = globalTd['country_txt'].value_counts().reset_index()

In [171]:
# Rename the columns so that the table is easier to read
totalAttacks.rename(columns={'index':'country', 'country_txt':'total Attacks'}, inplace=True)

In [172]:
totalAttacks.head()

Unnamed: 0,country,total Attacks
0,Iraq,24636
1,Pakistan,14368
2,Afghanistan,12731
3,India,11960
4,Colombia,8306
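The rename above relies on `value_counts().reset_index()` producing columns called `index` and `country_txt`. To the best of my knowledge this changed in pandas 2.0, where the result columns are `country_txt` and `count` instead. A sketch that does not depend on those defaults, shown on made-up data (`sample` is an assumption for illustration):

```python
import pandas as pd

# Made-up stand-in for the country_txt column of globalTd.
sample = pd.DataFrame(
    {"country_txt": ["Iraq", "Iraq", "India", "Iraq", "India", "Japan"]}
)

counts = (
    sample.groupby("country_txt")
    .size()                                     # number of rows per country
    .reset_index(name="total Attacks")          # name the count column explicitly
    .rename(columns={"country_txt": "country"})
    .sort_values("total Attacks", ascending=False, ignore_index=True)
)

print(counts)
```

Because both column names are set explicitly, the later steps that use `'country'` and `'total Attacks'` would not depend on the installed pandas version.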


In [173]:
# Count the total number of attacks around the world
numberOfAttacks = totalAttacks['total Attacks'].sum()
print(numberOfAttacks)

181691


In [174]:
# Calculate the percentage (How many percent of the total number of attacks have occurred in any country?)
totalAttacks['prc'] = totalAttacks['total Attacks'].apply(lambda x: round((x/numberOfAttacks) *100, 2))

In [175]:
# Sort the table alphabetically by country
totalAttacks.sort_values(by='country', inplace=True)

In [176]:
totalAttacks.head()

Unnamed: 0,country,total Attacks,prc
2,Afghanistan,12731,7.01
97,Albania,80,0.04
17,Algeria,2743,1.51
203,Andorra,1,0.0
47,Angola,499,0.27




```
🔘 To make the table easier to interpret, let's split the countries into three groups.
1. Countries that account for more than 1% of all terrorist attacks
2. Countries that account for more than 0% but at most 1% of all terrorist attacks
3. Countries whose share of attacks rounds to 0%, i.e. those with the fewest attacks

```
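The three groups described above could be assigned with a small helper, as in this sketch. The function name `attack_share_group` and the group labels are hypothetical, and `sample` holds only a few illustrative rows in the shape of `totalAttacks`.

```python
import pandas as pd

# A few illustrative rows in the same shape as totalAttacks.
sample = pd.DataFrame({
    "country": ["Iraq", "Albania", "Andorra", "Angola"],
    "prc": [13.56, 0.04, 0.0, 0.27],
})

def attack_share_group(prc):
    """Map a rounded percentage share to one of the three groups."""
    if prc > 1:
        return "above 1%"
    if prc > 0:
        return "between 0% and 1%"
    return "rounds to 0%"

sample["group"] = sample["prc"].apply(attack_share_group)

# Number of countries that fall into each group.
group_sizes = sample["group"].value_counts()
print(sample)
print(group_sizes)
```

A share of exactly 1% lands in the middle group here, which matches the "more than 1%" wording of the first group.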



In [177]:
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots

# Create a line plot using Plotly Express
line_fig = px.line(totalAttacks, x='country', y='prc', title='Line Plot of Percentage')
line_fig.update_traces(line=dict(color='blue'))

# Create a bar plot using Plotly Express
bar_fig = px.bar(totalAttacks, x='country', y='total Attacks', title='Bar Plot of Total Attacks')
bar_fig.update_traces(marker_color='green')

# Combine the two plots side by side in one figure
fig = make_subplots(rows=1, cols=2, subplot_titles=('Percentage', 'Total Attacks'))

# Add the line and bar plots to the subplot
fig.add_trace(line_fig.data[0], row=1, col=1)
fig.add_trace(bar_fig.data[0], row=1, col=2)

# Set layout for subplot
fig.update_layout(width=1500, showlegend=False)

# Show the plot
fig.show()


### **PROBLEM: 2** Total assaults for the year, whether they went down or up.

In [178]:
# Fetch the data and count the total attacks per year
attacksPY = globalTd['iyear'].value_counts().reset_index()

In [179]:
# Rename the columns
attacksPY.rename(columns={'index':'year', 'iyear':'totalAttacks'}, inplace=True)

In [180]:
# Sort the table by the year
attacksPY.sort_values(by=['year'], inplace=True)

In [181]:
attacksPY.head()

Unnamed: 0,year,totalAttacks
42,1970,651
46,1971,471
44,1972,568
45,1973,473
43,1974,581


In [182]:
attacksPY['prc'] = round((attacksPY['totalAttacks'] / numberOfAttacks) * 100, 2)

In [183]:
attacksPY.head()

Unnamed: 0,year,totalAttacks,prc
42,1970,651,0.36
46,1971,471,0.26
44,1972,568,0.31
45,1973,473,0.26
43,1974,581,0.32


In [190]:
# Plot a line chart of each year's share of all attacks
fig = px.line(attacksPY, x='year', y='prc',
              hover_data={'year': True, 'totalAttacks': True, 'prc': True},
              labels={'year': 'Year', 'totalAttacks': 'Attacks', 'prc': 'Percentage'},
              markers=True, line_shape='spline', color_discrete_sequence=["DarkRed"])

# Update the layout (axis titles, chart title, width)
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Share of all attacks (%)',
    title='Attacks per year as a share of all recorded attacks',
    width=900
)

# Plot/ show the plot
fig.show()
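To answer "went down or up" numerically rather than only from the chart, the year-over-year change can be computed directly. The sketch below uses the five yearly totals shown in the table above; the column names `change`, `pct_change`, and `direction` and the helper `direction` are assumptions introduced for illustration.

```python
import pandas as pd

# The first five rows of attacksPY as shown above.
yearly = pd.DataFrame({
    "year": [1970, 1971, 1972, 1973, 1974],
    "totalAttacks": [651, 471, 568, 473, 581],
})
yearly = yearly.sort_values("year", ignore_index=True)

# Absolute and relative change compared with the previous row.
yearly["change"] = yearly["totalAttacks"].diff()
yearly["pct_change"] = (yearly["totalAttacks"].pct_change() * 100).round(2)

def direction(delta):
    """Label a year-over-year change; the first year has no predecessor."""
    if pd.isna(delta):
        return "n/a"
    if delta > 0:
        return "up"
    if delta < 0:
        return "down"
    return "flat"

yearly["direction"] = yearly["change"].apply(direction)
print(yearly)
```

Note that `diff()` compares consecutive rows, so if a year were absent from the data the change would span more than one year.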