
# **Project Name**    - EDA ON GLOBAL TERRORISM DATASET



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary**


The Global Terrorism Database (GTD) is an open-source database of terrorist attacks around the world from 1970 through 2017. It includes systematic data on domestic as well as international terrorist incidents that occurred during this period. Terrorism is a threat of violence that creates fear in a population. In this project, we try to find the hot zones of terrorism and other patterns in the data.

# **Objective**

The primary objective of this project is to explore and analyze the data to understand the patterns and trends of terrorism, such as the locations, frequency, types of attacks, and perpetrators involved. The EDA process involves data cleaning, feature engineering, data visualization, and statistical analysis.

# **GitHub Link -**

https://github.com/Vaibhav-Dangar/EDA-GLOBAL-TERRORISM-DATASET

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will give those students an advantage during Star Student selection.

     [ Note: Deployment Ready Code means that the whole .ipynb notebook can be executed in one go without a single error logged. ]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way, following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# missingno offers a convenient way to visualize the distribution of NaN values
import missingno as msno
import seaborn as sns




### Dataset Loading

In [None]:
# Load Dataset (the CSV is stored on Google Drive, so mount the drive first)
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
filepath = '/content/drive/MyDrive/Global Terrorism Data.csv'
# Read the CSV with ISO-8859-1 (Latin-1) encoding
df = pd.read_csv(filepath, encoding='ISO-8859-1')



### Dataset First View

In [None]:
# Dataset First Look
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, cols = df.shape
print('Rows:', rows)
print('Columns:', cols)

### Dataset Information

In [None]:
# Dataset Info
df.info()

In [None]:
df.describe()

In [None]:
df.shape  # (rows, columns); shape is an attribute, so it is not called with ()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# Number of rows that exactly duplicate an earlier row (all columns identical)
df.duplicated().sum()


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Re-read the file, also treating the strings "N/a" and "na" as missing values
missing_value = ["N/a", "na", np.nan]
df1 = pd.read_csv('/content/drive/MyDrive/Global Terrorism Data.csv', na_values=missing_value, encoding='ISO-8859-1')
df1.isnull().sum()



In [None]:
# Replace missing casualty counts with 0 so they can be summed later
df['nkill'] = df['nkill'].fillna(0)
df['nwound'] = df['nwound'].fillna(0)

In [None]:
# Visualizing the missing values
msno.bar(df)

### What did you know about your dataset?

The GTD is an open-source database of terrorist attacks around the world from 1970 through 2017. It includes systematic data on domestic as well as international terrorist incidents during this period, and records which terrorist organizations carried out attacks in which countries. We use it to find the hot zones of terrorism and other patterns.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
list(df.columns.values)

In [None]:
# Dataset Describe
df.describe()

### Variables Description 
iyear-->This field contains the year in which the incident occurred. 

country_txt-->This field identifies the country or location where the incident occurred. 

region_txt-->This field identifies the region in which the incident occurred. 

city-->This field contains the name of the city, village, or town in which the incident occurred.

attacktype2_txt-->The second recorded method of attack (up to three attack types are recorded for each incident). It captures the general method of attack and often reflects the broad class of tactics used. 

targtype2_txt-->The general type of the second target/victim (information on up to three targets/victims is recorded for each incident).

targsubtype2_txt-->The target subtype variable captures the more specific target category and provides the next level of designation for each target type.

target2-->This is the specific person, building, installation, etc., that was targeted.

natlty2_txt-->The nationality of the second target that was attacked.

gname-->This field contains the name of the group that carried out the attack

nkill-->This field stores the number of total confirmed fatalities for the incident. The number includes all victims and attackers who died as a direct result of the incident. 

property-->Categorical Variable
“Yes” appears if there is evidence of property damage from the incident.

nwound-->This field stores the total number of confirmed wounded people in the incident.

weapsubtype2_txt-->The sub-type of the second weapon recorded for the incident (information on up to four weapon types and sub-types is recorded for each case).

propextent_txt-->Categorical Variable
If “Property Damage?” is “Yes,” then one of the following four categories describes the extent
of the property damage:
1 = Catastrophic (likely ≥ $1 billion)
2 = Major (likely ≥ $1 million but < $1 billion)
3 = Minor (likely < $1 million)
4 = Unknown

propvalue-->Numeric Variable
If “Property Damage?” is “Yes,” then the exact U.S. dollar amount (at the time of the incident) of total damages is listed.
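The four `propextent_txt` categories above correspond to the numeric codes 1 to 4. A minimal sketch, assuming those codes: the `PROPEXTENT_LABELS` dictionary and the toy `codes` series are hypothetical and not part of the dataset. Mapping codes to labels and storing them as an ordered categorical keeps tables and charts in severity order rather than alphabetical order.

```python
import pandas as pd

# Hypothetical lookup table built from the four categories listed above
PROPEXTENT_LABELS = {
    1: "Catastrophic (likely >= $1 billion)",
    2: "Major (likely >= $1 million but < $1 billion)",
    3: "Minor (likely < $1 million)",
    4: "Unknown",
}

# Toy sample of property-damage extent codes (not real GTD data)
codes = pd.Series([3, 3, 2, 4, 1, 3])

# Map codes to labels and keep them in severity order (code 1 first, code 4 last)
extent = pd.Categorical(
    codes.map(PROPEXTENT_LABELS),
    categories=list(PROPEXTENT_LABELS.values()),
    ordered=True,
)

# sort_index() on an ordered categorical index follows the category order
counts = pd.Series(extent).value_counts().sort_index()
print(counts)
```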


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in df:
  print(column)
  print(df[column].unique())
  print(df[column].nunique())
  print(" ")

## ***3. Cleaning the Data***

### Data Wrangling Code

In [None]:

# getting the requisite columns
terrirism_cols = df[['iyear','country_txt','region_txt','city','attacktype2_txt','targtype2_txt','targsubtype2_txt','target2','natlty2_txt','gname','motive','nkill','property','nwound','weapsubtype2_txt','propextent_txt','propvalue']]
terrirism_cols.info()

In [None]:
# Outlier detection: box plots of the numeric columns (text columns are skipped),
# one subplot per column so that each column gets its own scale
terrirism_cols.select_dtypes(include='number').plot(kind='box', subplots=True, figsize=(15,8))
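A box plot draws its whiskers up to 1.5 times the interquartile range (IQR) beyond the quartiles (matplotlib's default `whis=1.5`) and plots the points outside that range as outliers. The sketch below counts those points directly; `iqr_outliers` is a hypothetical helper and the toy `nkill` values are made up for illustration.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of s lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # NaN values fail both comparisons, so they are never reported as outliers
    return s[(s < lower) | (s > upper)]

# Toy fatality counts (not real GTD data): most attacks kill few people, one kills many
nkill = pd.Series([0, 0, 1, 1, 2, 2, 3, 4, 150])
print(iqr_outliers(nkill))
```

With heavily skewed columns such as `nkill` or `propvalue`, many legitimate large values fall outside the whiskers, so these points are worth inspecting rather than dropping automatically.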

In [None]:
# Count the unique values in the propvalue and country_txt columns

print(terrirism_cols.propvalue.nunique())
terrirism_cols.country_txt.nunique()

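### What manipulations have you done, and what insights did you find?

We selected the 17 columns needed for the analysis into `terrirism_cols` and counted the unique values in them.

205 of 249 countries (as of 2017) reported terrorist activity at least once between 1970 and 2017.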

In [None]:
terrirism_cols.region_txt.value_counts()  # Number of terrorist attacks in each region

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code: number of attacks per region
region_counts = terrirism_cols.region_txt.value_counts()
plt.figure(figsize = (10,5))
plt.bar(x=region_counts.index, height=region_counts.values)
plt.title('Region-wise reported terrorist activities from 1970-2017')
plt.xticks(rotation=90)
plt.xlabel('Regions')
plt.ylabel('No. of terrorist activities')
plt.show()

##### 1. Why did you pick the specific chart?

A vertical bar chart allows fast data exploration and comparison of values between different groups, and it lets the reader recognize patterns or trends far more easily.

##### 2. What is/are the insight(s) found from the chart?

The regions most affected by terrorist activities are the Middle East & North Africa and South Asia.
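To put a number on "most affected", `value_counts(normalize=True)` returns each region's share of all attacks instead of raw counts. A minimal sketch on a hypothetical toy sample (not real GTD data); in the notebook the same call would be made on `terrirism_cols.region_txt`.

```python
import pandas as pd

# Toy sample of region labels (not real GTD data)
regions = pd.Series([
    "Middle East & North Africa", "South Asia", "Middle East & North Africa",
    "South Asia", "Western Europe", "Middle East & North Africa",
    "Sub-Saharan Africa", "South Asia",
])

# Share of attacks per region, as percentages rounded to one decimal place
share = (regions.value_counts(normalize=True) * 100).round(1)
print(share)

# Combined share of the two most affected regions
top_two = share.head(2).sum()
print(f"Top two regions: {top_two}% of attacks")
```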

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code
pd.crosstab(terrirism_cols.iyear,terrirism_cols.region_txt).plot(kind='area',figsize=(15,6))
plt.title('Terrorist activities in each year in different regions')
plt.xlabel('Year')
plt.ylabel('Attacks')
plt.show()

Correlation Analysis

In [None]:
# Correlation analysis (numeric columns only; text columns cannot be correlated)
plt.figure(figsize=(7,10))
sns.heatmap(np.round(terrirism_cols.corr(numeric_only=True),2),annot=True,cmap='BuPu')
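`DataFrame.corr()` only works on numeric data; in pandas 2.0 and later it raises a `ValueError` when text columns such as `country_txt` are present, so the numeric columns have to be selected first (or `numeric_only=True` passed, available since pandas 1.5). A minimal sketch on a hypothetical toy frame (not real GTD data):

```python
import pandas as pd

# Toy frame mixing numeric and text columns (not real GTD data)
df_toy = pd.DataFrame({
    "iyear": [2014, 2015, 2016, 2017],
    "nkill": [10, 8, 6, 4],
    "nwound": [20, 18, 10, 9],
    "country_txt": ["Iraq", "Iraq", "India", "Nigeria"],
})

# Keep only numeric columns, then compute the pairwise Pearson correlation
corr = df_toy.select_dtypes(include="number").corr()
print(corr.round(2))
```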

##### 1. Why did you pick the specific chart?

A stacked area chart shows each region's contribution over time effectively. Area charts are most commonly used to show trends rather than to convey specific values.
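With `kind='area'`, pandas stacks the series by default (`stacked=True`), so the top edge of the chart is the total number of attacks per year. The sketch below checks this on a hypothetical toy sample (not real GTD data): the row sums of the `pd.crosstab` table equal the yearly attack counts.

```python
import pandas as pd

# Toy incident list (not real GTD data): one row per attack
attacks = pd.DataFrame({
    "iyear": [2015, 2015, 2015, 2016, 2016, 2017],
    "region_txt": ["South Asia", "Middle East & North Africa", "South Asia",
                   "South Asia", "Western Europe", "Middle East & North Africa"],
})

# Year x region table of attack counts; this is what the area chart plots
table = pd.crosstab(attacks.iyear, attacks.region_txt)

# The stacked area's top edge is the row total, i.e. attacks per year
yearly_total = table.sum(axis=1)
print(table)
print(yearly_total)
```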

##### 2. What is/are the insight(s) found from the chart?

The number of attacks increased rapidly from 2000 to 2017. As the chart shows, after 2010 the Middle East & North Africa and South Asia were the most heavily affected regions.




##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 3

In [None]:
# Top 14 groups by number of attacks (.iloc[1:] skips the first entry, 'Unknown')
terrirism_cols.gname.value_counts().head(15).iloc[1:]

In [None]:
# Chart - 3 visualize the top 10 terrorist organizations with the most terror attacks
# .iloc[1:11] skips the first entry ('Unknown') and keeps the next 10 groups
top_groups = terrirism_cols.gname.value_counts().iloc[1:11]
plt.figure(figsize=(8,8))
sns.barplot(x=top_groups.values, y=top_groups.index, palette='rocket')
plt.title('Top 10 Terrorist Organizations with Highest Terror Attacks',fontsize=15)
plt.xticks(rotation=90)
plt.xlabel('No. of Attacks',fontsize=15)
plt.ylabel('Terrorist Organization',fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

The benefit of a horizontal bar chart is that long labels (organization names) are easier to display. The seaborn palette gives each bar a different color, which makes the chart easier to read and interpret.
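The chart skips the first entry of `value_counts()`, presumably because the most frequent `gname` value is 'Unknown' (attacks with no identified perpetrator). Dropping that label explicitly is a little more robust than skipping by position, since it still works if 'Unknown' is not first. A minimal sketch; the `top_named_groups` helper and the toy names are hypothetical.

```python
import pandas as pd

def top_named_groups(gname: pd.Series, n: int = 10) -> pd.Series:
    """Attack counts for the n most active groups, excluding the 'Unknown' placeholder."""
    counts = gname.value_counts()
    # errors="ignore" keeps this working even if no 'Unknown' label is present
    return counts.drop("Unknown", errors="ignore").head(n)

# Toy perpetrator names (not real GTD data)
names = pd.Series(["Unknown"] * 5 + ["Taliban"] * 3 + ["Group B"] * 2 + ["Group C"])
print(top_named_groups(names, n=2))
```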

##### 2. What is/are the insight(s) found from the chart?

The Taliban carried out the most terror attacks among the identified groups.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization of the most frequently attacked secondary targets (target2)
# .iloc[1:6] skips the most frequent entry and keeps the next five
data = terrirism_cols.target2.value_counts().iloc[1:6]
# Use the actual target names as labels so they always match the slices
label_for_target = data.index.tolist()
plt.pie(data, labels=label_for_target, radius=1.2, autopct='%0.2f%%', shadow=True)
plt.legend(loc='center left', bbox_to_anchor=(1.1, 0.5), shadow=True, fancybox=True)
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart makes it easy to compare each target's share at a glance, so the data is easy to visualize and interpret.

##### 2. What is/are the insight(s) found from the chart?

Soldiers are the target most affected by terrorists.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization of deaths in different years
plt.figure(figsize=(7,9))
terrirism_cols.groupby(['iyear'])['nkill'].sum().plot(kind='bar',colormap="summer")
plt.title("Number of deaths in different years")
plt.xlabel('Year')
plt.ylabel('No.of deaths')
plt.xticks(rotation=90)
plt.show()



##### 1. Why did you pick the specific chart?

I chose this chart because it makes it easy to compare the total number of deaths across years.

##### 2. What is/are the insight(s) found from the chart?

Deaths from terrorism were highest between 2014 and 2016, with the peak in 2014 at roughly 45k deaths.
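Peaks and averages like these can be read off the yearly totals directly instead of being estimated from the bars: `idxmax()` gives the peak year, and a label-based slice such as `.loc[2014:2016]` gives the window to average. A minimal sketch on hypothetical toy totals (not GTD figures); in the notebook the input would be `terrirism_cols.groupby('iyear')['nkill'].sum()`.

```python
import pandas as pd

# Hypothetical yearly death totals (illustrative only, not GTD figures)
deaths_per_year = pd.Series({2013: 300, 2014: 450, 2015: 400, 2016: 350, 2017: 280})

peak_year = deaths_per_year.idxmax()      # year with the most deaths
peak_value = deaths_per_year.max()
# Label-based slicing on the year index is inclusive at both ends
avg_2014_2016 = deaths_per_year.loc[2014:2016].mean()

print(f"Peak: {peak_year} with {peak_value} deaths")
print(f"Average 2014-2016: {avg_2014_2016:.1f}")
```

An average over a window can never exceed the largest value in that window, which is a quick sanity check for summary statements of this kind.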

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Terrorist attacks rose sharply over this period, and counter-terrorism organizations should take a close look at this trend.

#### Chart - 6

In [None]:
# Chart - 6 visualization of wounded people in different regions
plt.figure(figsize=(7,7))
terrirism_cols.groupby(['region_txt'])['nwound'].sum().sort_values(ascending=False).plot(kind='bar',colormap="rocket")
plt.title("Number of people wounded in each region",fontsize=15)
plt.xlabel('Region',fontsize=15)
plt.ylabel('No.of wounded',fontsize=15)
plt.xticks(rotation=90)
plt.show()






##### 1. Why did you pick the specific chart?

A sorted bar chart is easy to read and makes it simple to compare the number of wounded people across regions.

##### 2. What is/are the insight(s) found from the chart?

The Middle East & North Africa and South Asia have the highest numbers of wounded people, with more than 150,000 (1.5 lakh) people wounded in these regions.



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization of the number of people killed in the top 10 countries
plt.figure(figsize=(7,7))
terrirism_cols.groupby(['country_txt'])['nkill'].sum().sort_values(ascending=False).head(10).plot(kind='bar',colormap='viridis')
plt.title("Top 10 countries by number of people killed",fontsize=15)
plt.xlabel('Country',fontsize=15)
plt.ylabel('No. of Killed',fontsize=15)
plt.xticks(rotation=90)
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart efficiently compares the number of people killed in different countries.

##### 2. What is/are the insight(s) found from the chart?

Iraq has the highest number of people killed.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization of the number of people killed by the Taliban each year
terr_df_tal = terrirism_cols[terrirism_cols.gname=='Taliban']
plt.figure(figsize=(7,7))
terr_df_tal.groupby(['iyear'])['nkill'].sum().plot(kind='bar',colormap='RdBu')
plt.title('People killed by the Taliban per year',fontsize=15)
plt.xlabel('Years',fontsize=15)
plt.ylabel('No.of people killed',fontsize=15)
plt.xticks(rotation=90)
plt.grid()
plt.show()








##### 1. Why did you pick the specific chart?

This bar chart shows the number of people killed by the Taliban each year, which makes year-to-year changes easy to see.

##### 2. What is/are the insight(s) found from the chart?

The Taliban killed more than 5k people in 2015. A decreasing trend can be seen after that, but the number is still very high.
The Taliban became more active after 2006.
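Whether a trend is really decreasing can be checked with the year-over-year change: `diff()` gives the absolute change from the previous year and `pct_change()` the relative change. A minimal sketch on hypothetical toy totals (not GTD figures); in the notebook the input would be `terr_df_tal.groupby('iyear')['nkill'].sum()`.

```python
import pandas as pd

# Hypothetical yearly deaths attributed to one group (illustrative only)
deaths = pd.Series({2014: 4000, 2015: 5000, 2016: 4500, 2017: 4050})

change = pd.DataFrame({
    "deaths": deaths,
    "abs_change": deaths.diff(),                          # change from the previous year
    "pct_change": (deaths.pct_change() * 100).round(1),   # % change from the previous year
})
print(change)

# Years after the peak year, and whether deaths fell in every one of them
after_peak = change.loc[deaths.idxmax():].iloc[1:]
print("Declining every year after the peak:", bool((after_peak["abs_change"] < 0).all()))
```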

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization of the number of people killed in India each year
terr_df_ind = terrirism_cols[terrirism_cols.country_txt=='India']
plt.figure(figsize=(7,7))
terr_df_ind.groupby(['iyear'])['nkill'].sum().plot(kind='bar',colormap='winter')
plt.title('People killed in India per year',fontsize=15)
plt.xlabel('Years',fontsize=15)
plt.ylabel('No. of people killed',fontsize=15)
plt.xticks(rotation=90)
plt.grid()
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart makes it easy to see and understand the number of people killed by terrorists in India each year.

##### 2. What is/are the insight(s) found from the chart?

In 1992, terrorists killed roughly 1,000 or more people in India.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization of the number of people killed in India by weapon sub-type (weapsubtype2_txt)
# Sort before head(10) so the pie shows the 10 deadliest sub-types, not the first 10 alphabetically
weapon_kill_info = terr_df_ind.groupby(['weapsubtype2_txt'])['nkill'].sum().sort_values(ascending=False).head(10)

label_for_weapon = weapon_kill_info.index.tolist()
data1 = weapon_kill_info.values
plt.figure(figsize=(6,6))
plt.axis("equal")
plt.pie(data1, labels=label_for_weapon, radius=2.5, autopct='%0.2f%%', shadow=True, textprops={'fontsize': 14})
plt.legend(loc='center', shadow=True, fancybox=True)
plt.show()

##### 1. Why did you pick the specific chart?

This pie chart helps show which weapon sub-types account for the largest share of people killed.

##### 2. What is/are the insight(s) found from the chart?

About 32.30% of the people killed in India (among the weapon sub-types shown in the chart) were killed with semi-automatic rifles.
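The `autopct` percentages in a pie chart are shares of the slices actually drawn, so the 32.30% figure is relative to the weapon sub-types in the chart, not to all deaths in India. A minimal sketch of the same calculation on hypothetical toy totals (not GTD figures), showing how a share changes once the categories left out of the chart are included:

```python
import pandas as pd

# Hypothetical deaths per weapon sub-type (illustrative only, not GTD figures)
deaths = pd.Series({"Rifle A": 30, "Explosive B": 20, "Knife C": 10, "Other D": 40})

# Share within the top 3 categories only (what a pie of the top 3 would display)
top3 = deaths.sort_values(ascending=False).head(3)
share_in_chart = top3 * 100 / top3.sum()

# Share of all deaths, including the category left out of the chart
share_overall = deaths * 100 / deaths.sum()

print(share_in_chart.round(2))
print(share_overall.round(2))
```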

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The government should pay attention to, and take the necessary action against, rifle makers whose weapons are delivered to terrorist organizations.

#### Chart - 11

In [None]:
# Chart - 11 visualization code: most attacked regions and countries in 2017
terr_df_2017 = terrirism_cols[terrirism_cols.iyear==2017]
plt.figure(figsize=(15,5))

# Left panel: number of attacks per region in 2017
plt.subplot(1,2,1)
region_counts_2017 = terr_df_2017['region_txt'].value_counts()
plt.bar(region_counts_2017.index, region_counts_2017.values, color='red')
plt.title('Most Attacked Region in 2017',fontsize=15)
plt.xlabel('Regions',fontsize=15)
plt.ylabel('Attacks',fontsize=15)
plt.xticks(rotation=90)

# Right panel: top 10 countries that were attacked most in 2017
plt.subplot(1,2,2)
country_counts_2017 = terr_df_2017['country_txt'].value_counts().head(10)
plt.bar(country_counts_2017.index, country_counts_2017.values, color='blue')
plt.title('Most Attacked Country in 2017',fontsize=15)
plt.xlabel('Country',fontsize=15)
plt.ylabel('Attacks',fontsize=15)
plt.xticks(rotation=90)
plt.show()



##### 1. Why did you pick the specific chart?

Side-by-side bar charts effectively show the number of attacks in different regions and countries in 2017.

##### 2. What is/are the insight(s) found from the chart?

In 2017, Iraq was the country most affected by terrorist activity.

The Middle East & North Africa and South Asia were the most affected regions in 2017.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Companies are reluctant to invest their money in regions or cities with the highest levels of terrorist activity, so these regions and cities fall behind economically. Attacks of this kind hold back their overall development.

# **Conclusion**


--> The Middle East & North Africa was the most affected of the top affected regions, and this region also accounts for a large share of the people wounded or killed.

--> The number of attacks increased rapidly between 2000 and 2017.

--> The Taliban was especially active from 2013 to 2015, and it is responsible for the most terror attacks among identified groups.

--> Soldiers and civilians are the targets most affected by terror attacks.

--> Deaths from terror attacks were highest between 2014 and 2016, with the peak in 2014 at roughly 45k deaths.

--> India was heavily affected by terrorism in 1991 and 1992, with a large number of deaths in these two years.

--> The Taliban's death toll was highest between 2015 and 2017, exceeding 5k people in 2015.

--> The Taliban became more active after 2006.

--> Semi-automatic rifles were the deadliest weapon sub-type in India, accounting for about 32% of the people killed among the weapon sub-types charted.

--> Iraq has the highest number of people killed.

--> In 2017, Iraq was the country most affected by terrorist activity.

--> The Middle East & North Africa and South Asia were the most affected regions in 2017.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***