<a href="https://colab.research.google.com/github/Dharitri-2022ds/EDA-Global-Terrorism-Analysis-/blob/main/Almabetter_Capstone_Module2_EDA_of_Global_terrorism.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  **Global Terrorism Analysis**


##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**
The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017.

The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks.

The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland.


# **GitHub Link -**


https://github.com/Dharitri-2022ds/EDA-Global-Terrorism-Analysis-

# **Problem Statement**




In this project, I analyse the Global Terrorism dataset.
This dataset contains information about terrorist attacks, such as location, date, type of attack, target, number of casualties, and more.

#### **Define Your Business Objective?**

The objective of this project is to explore and analyze the data to gain insights and generate visualisations that can help us better understand the nature and trends of global terrorism.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import missingno as msno
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns


### Dataset Loading

In [None]:
# Load Dataset
#Mount google drive for accessing the dataset of Global Terrorism
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#File path of Global Terrorism dataset in google drive
drive.mount("/content/drive", force_remount=True)

file_path = "/content/drive/MyDrive/Colab Notebooks/Global Terrorism Data.csv"
df = pd.read_csv(file_path,encoding='latin-1')
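
Since the guidelines ask for exception handling, the load step could be wrapped in a small helper that fails with a clear message when the file is missing. This is only a sketch: `load_csv` is a hypothetical name, and `low_memory=False` is an assumption intended to reduce mixed-dtype warnings on wide files.

```python
from pathlib import Path

import pandas as pd


def load_csv(path: str, encoding: str = "latin-1") -> pd.DataFrame:
    """Read a CSV file, raising a descriptive error if it does not exist."""
    csv_path = Path(path)
    if not csv_path.is_file():
        # Fail early with the path that was actually checked
        raise FileNotFoundError(f"Dataset not found at {csv_path}")
    return pd.read_csv(csv_path, encoding=encoding, low_memory=False)
```

In this notebook it would be called as `df = load_csv(file_path)` after the drive has been mounted.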

### Dataset First View

In [None]:
# Dataset First Look
df.head(5)

Here we can see the first 5 rows of the dataset. Similarly, we can also view the last 5 rows of the dataset.

In [None]:
df.tail(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

Number of rows in the dataset is 181691.
Number of columns in the dataset is 135.

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(df[df.duplicated()])


Number of duplicate rows: 0

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

### What did you know about your dataset?
This dataset contains 181,691 rows and 135 columns, with no duplicate rows.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

**eventid:** A unique identifier for each incident.

**iyear:** The year in which the incident occurred.

**imonth and iday:** The month and the day in which the incident occurred respectively.

**country and region:** The country and region where the incident occurred.

**provstate:** The province or state where the incident took place.

**city:** The city where the incident occurred.

**latitude and longitude:** Geographic coordinates of the incident.

**specificity:** Indicates the level of geographic specificity (e.g. city, region) of the incident's location.

**Attack information:**

**attacktype1, attacktype2, attacktype3:** The primary, secondary, and tertiary attack types, categorizing the nature of the attack (e.g. assassination, bombing, hijacking).

**weaptype1, weaptype2, weaptype3:** The primary, secondary, and tertiary weapon types (e.g. explosives, firearms, chemical weapons).

**Target information:**

**targtype1, targtype2, targtype3:** The primary, secondary, and tertiary target types of the attack (e.g. civilians, military, government).

**nkill:** The number of confirmed fatalities in the incident.

**nwound:** The number of confirmed injuries in the incident.

**nkillter:** The number of perpetrators killed during the incident.

**Perpetrator information:**

**gname:** The name of the group responsible for the attack.

**motive:** The perceived motive behind the attack.

**guncertain1, guncertain2, guncertain3:** Indicates whether the attribution to the primary, secondary, or tertiary perpetrator group is uncertain.

**Information:**

**summary:** A narrative description of the incident.

**dbsource:** The source of the data for the incident.

**related:** Indicates whether the incident is related to another incident in the database.

**propextent:** Extent of property damage (e.g. unknown, major, minor).

**propvalue:** Value of the property damage.

**ishostkid:** Indicates whether hostages were taken during the incident.

**Miscellaneous information:**

**ransom:** Indicates if a ransom was demanded or paid.

**ransompaid:** Amount of ransom paid, if applicable.

**hostkidoutcome:** The outcome of the hostage-taking incident.




### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
  print(f"No. of unique values in {col}: {df[col].nunique()}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Percentage of missing values per column, sorted from most to least missing
null_sum = (df.isnull().sum() / len(df)) * 100
percent_null = null_sum.sort_values(ascending=False)

# Columns with more than 50% missing values
high_null_column = percent_null[percent_null > 50]

# Columns with at most 50% missing values (selected by threshold rather than by value membership)
less_than_50_null_column = percent_null[percent_null <= 50]

print(f'Number of columns with at most 50% missing values: {len(less_than_50_null_column)}')
less_null_column = less_than_50_null_column.index.tolist()
print(less_null_column)
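
The same split can be sketched as a reusable helper. The function name `split_columns_by_missing` and the toy frame are hypothetical and only stand in for the real data; the 50% threshold mirrors the cell above.

```python
import numpy as np
import pandas as pd


def split_columns_by_missing(frame: pd.DataFrame, threshold: float = 50.0):
    """Return (kept, dropped) column names based on the percentage of nulls."""
    # isnull().mean() gives the fraction of missing values per column
    percent_null = frame.isnull().mean() * 100
    dropped = percent_null[percent_null > threshold].index.tolist()
    kept = percent_null[percent_null <= threshold].index.tolist()
    return kept, dropped


# Toy frame with made-up values, used only to show the mechanics
toy = pd.DataFrame({
    "eventid": [1, 2, 3, 4],
    "nkill": [0.0, np.nan, 2.0, 1.0],              # 25% missing
    "ransompaid": [np.nan, np.nan, np.nan, 5.0],   # 75% missing
})

kept_cols, dropped_cols = split_columns_by_missing(toy)
print(kept_cols)
print(dropped_cols)
```

Returning both lists makes it easy to review what is being discarded before subsetting the frame.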

In [None]:
new_df = df[less_null_column]


In [None]:
check= new_df.columns.tolist()

In [None]:
selected_columns_1 = ['eventid','iyear','imonth','iday','country_txt','region_txt','city','multiple','success','suicide','attacktype1_txt','targtype1_txt','targsubtype1_txt','corp1','target1','natlty1_txt','weaptype1_txt','nkill','nwoundus','claimed','individual','INT_LOG','doubtterr','INT_MISC','specificity','gname','ishostkid','INT_ANY','guncertain1','provstate']

In [None]:
len(check)-len(selected_columns_1)

In [None]:
print(list((set(check)) - (set(selected_columns_1))))

In [None]:
df_1= df[selected_columns_1]
df_1.T

In [None]:
(df_1.isnull().sum()/len(df_1))*100

In [None]:
df_1.info()

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1  **Number of Terrorist Attacks Per Year**

In [None]:
# Chart - 1 visualization code
#Calculate attacks per year
attack_per_year = df_1['iyear'].value_counts().sort_index()


In [None]:
# Set figure size
plt.rcParams['figure.figsize'] = (12, 5)


ax = sns.countplot(x='iyear', data=df_1,palette='viridis')

ax.set(xlabel='Year', ylabel='Number of Terrorist Attacks')
plt.xticks(rotation=90)
ax.set_title('Number of Terrorist Attacks Per Year', fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

This chart provides a clearer representation of the variation of terrorist attacks over the years.

##### 2. What is/are the insight(s) found from the chart?

In [None]:
# Calculate the count of terrorist attacks per year and store it as a dictionary.
count_year= df_1['iyear'].value_counts().to_dict()

# Calculate the percentage increase in attacks from 1970 to 2017.
rate= ((count_year[2017]-count_year[1970])/count_year[1970])* 100

# Print the counts of attacks in 1970 and 2017.
print(count_year[1970],'attacks happened in 1970 &',count_year[2017],'attacks happened in 2017')

# Print the percentage increase in attacks from 1970 to 2017.
print('So the number of attacks has increased by',np.round(rate,2),'% from 1970 to 2017')

In [None]:
# Calculate the average number of terror attacks per year for the first six years (1970 to 1975, inclusive).
terror_attack_first_6_years = df_1[df_1['iyear'] <= 1975]['iyear'].value_counts().mean()

# Display the calculated average number of terror attacks for the specified period
print(f"The average number of terror attacks per year for the first six years (1970-1975) is: {terror_attack_first_6_years}")

#Calculating Overall Average(1970-2017)
mean_of_terror_attack = df_1['iyear'].value_counts().mean()
print(f"The overall average number of terror attacks per year from 1970 to 2017 is: {mean_of_terror_attack}")

# Calculating the average number of terrorists attack per year for the last 5 years(2013-2017).
terror_attack_last_5_years = df_1[df_1['iyear'] >= 2013]['iyear'].value_counts().mean()

# Display the calculated average number of terror attacks for the specified period
print(f"The average number of terror attacks per year for the last five years (2013-2017) is: {terror_attack_last_5_years}")

### Insights found from the chart:
- 651 attacks happened in 1970 and 10,900 attacks happened in 2017.
- The number of attacks therefore increased by 1574.35% from 1970 to 2017.
- The average number of terror attacks per year for the first six years (1970-1975) is about 580.67.
- The overall average number of terror attacks per year from 1970 to 2017 is about 3865.77.
- The average number of terror attacks per year for the last five years (2013-2017) is 13678.2.
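
The percentage increase quoted above follows from the two yearly counts. A minimal sketch of the arithmetic, where `percent_change` is a hypothetical helper name:

```python
def percent_change(start: float, end: float) -> float:
    """Percentage change from start to end, relative to start."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start * 100


# Counts reported in the insights: 651 attacks in 1970 and 10900 in 2017
increase = percent_change(651, 10900)
print(round(increase, 2))  # 1574.35
```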

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


From 1970, terrorist activity rises in a wave until 1992, then declines until 2004, and from 2004 to 2015 it rises again in a much steeper wave.

No, there are no insights that lead to negative growth.

The wave pattern suggests that a drop in terrorist activity is possible, but that it may be followed by another rise that is steeper and larger than the previous waves.

Over the years, the number of terrorist attacks has increased, which suggests that the measures taken to prevent attacks have not been sufficient.

More preventive measures need to be taken against these terrorist groups and against the smuggling of weapons.
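
One way to make such waves easier to read is to smooth the yearly counts and look at year-over-year change. The sketch below uses made-up counts purely to show the mechanics; in the notebook the series would presumably be `df_1['iyear'].value_counts().sort_index()`.

```python
import pandas as pd

# Made-up yearly counts, only to illustrate the calculation
yearly = pd.Series([10, 30, 20, 60, 40, 80], index=range(2000, 2006), name="attacks")

# Centred 3-year rolling mean; the first and last years have no full window
smoothed = yearly.rolling(window=3, center=True).mean()

# Year-over-year percentage change
yoy = yearly.pct_change() * 100

print(smoothed)
print(yoy.round(1))
```

A wider window gives a smoother curve at the cost of hiding short spikes, so the window size is a judgment call.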

#### Chart - 2  Top 30 Countries by Number of Attacks

In [None]:
# Chart - 2 visualization code

# Set the default figure size for this and subsequent plots
plt.rcParams['figure.figsize'] = (12, 5)


ax = sns.countplot(x='country_txt', data=df_1,order=df_1['country_txt'].value_counts().index[:30], palette='colorblind')

ax.set(xlabel='Country', ylabel='Count of Terrorist Attacks')
plt.xticks(rotation=75)
ax.set_title('Top 30 Countries by Number of Attacks', fontsize=15)
plt.show()



##### 1. Why did you pick the specific chart?

A bar graph, especially when ordered by count, effectively highlights the ranking and makes it easy to identify the countries with the most attacks.

The data is categorical (countries) with an associated numerical value (number of attacks), and bar graphs represent this type of data clearly.

##### 2. What is/are the insight(s) found from the chart?

- Iraq has the highest number of terrorist attacks, followed by Pakistan and then Afghanistan.
- Among the 30 countries shown, South Africa and Nicaragua have the fewest terrorist attacks.



#### Chart - 3  Regions by Number of Attacks

In [None]:
terror_region = pd.crosstab(df_1['iyear'],df_1['region_txt'])
terror_region.plot(color= sns.color_palette('bright',12),kind='area')
fig = plt.gcf()
fig.set_size_inches(15,6)
plt.title('Variation in the Number of Terrorist Activities Across Regions Over the Years')
plt.ylabel('Number of Attacks')
plt.xlabel('Year')
plt.show()

##### 1. Why did you pick the specific chart?


It effectively displays the overall trend of terrorist activities for each region over the years. The filled areas provide a clear visual representation of the rise and fall in the number of attacks.

#### Chart - 4   Most Frequently Used Weapons in Attacks

In [None]:
# Chart - 4 visualization code
plt.rcParams['figure.figsize'] = (12, 5)


# Keep only the six most frequent weapon types
ax = sns.countplot(x='weaptype1_txt', data=df_1,order=df_1['weaptype1_txt'].value_counts().index[:6], palette='colorblind')

ax.set(xlabel='Weapon', ylabel='Count')
plt.xticks(rotation=75)
ax.set_title('Most Frequently Used Weapons in Attacks', fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

A count plot was chosen because the goal is to visualize how often each weapon type is used in attacks.

##### 2. What is/are the insight(s) found from the chart?

Explosives are the most frequently used weapon type in terrorist attacks, followed by Firearms; among the six types shown, Chemical weapons are the least used.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5  Most Frequent Targets of Terrorists





In [None]:
# Chart - 5 visualization code
plt.rcParams['figure.figsize'] = (12, 5)


ax = sns.countplot(x='targtype1_txt', data=df_1,order=df_1['targtype1_txt'].value_counts().index, palette='flare')

ax.set(xlabel='Type of Targets', ylabel='Count')
plt.xticks(rotation=75)
ax.set_title('Most Frequent Targets of Terrorists', fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?


Countplots allow for easy comparison of the frequency or count of each target type. This helps quickly identify the most and least targeted categories.

##### 2. What is/are the insight(s) found from the chart?

The most frequent target of terrorists is Private Citizens & Property, followed by Military and then Police.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Knowing that private citizens and property are the most frequent targets of terrorism highlights the widespread impact of these attacks and the need for comprehensive security measures.

This understanding can inform strategies for:
- Increased security measures in public areas, transportation hubs, and other places where people gather.
- Building partnerships between law enforcement and communities to enhance vigilance and response capabilities.

This could support business development and growth in these areas, along with providing employment opportunities for citizens.


#### Chart - 6  Most Frequent Type of Attacks

In [None]:
# Chart - 6 visualization code
plt.rcParams['figure.figsize'] = (12, 5)


ax = sns.countplot(x='attacktype1_txt', data=df_1,order=df_1['attacktype1_txt'].value_counts().index, palette='flare')

ax.set(xlabel='Type of Attack', ylabel='Count')
plt.xticks(rotation=75)
ax.set_title('Most Frequent Type of Attacks ', fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

The most frequent type of attack is Bombing/Explosion, followed by Armed Assault and then Assassination.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

In [None]:
df_1['gname'].value_counts()

#### Chart - 7  Most Number of Attacks Done By Terrorist Organisation

In [None]:
# Chart - 7 visualization code
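plt.rcParams['figure.figsize'] = (16, 8)


# index[1:50] skips the most frequent label (presumably 'Unknown') and keeps the next 49 groups
ax = sns.countplot(y='gname', data=df_1,order=df_1['gname'].value_counts().index[1:50], palette='flare')

# With y='gname' the bars are horizontal: counts on the x-axis, groups on the y-axis
ax.set(xlabel='Number of Attacks', ylabel='Terrorist Organisation')
plt.xticks(rotation=75)
ax.set_title('Most Number of Attacks Done By Terrorist Organisation', fontsize=15)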
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

The Taliban carried out the highest number of terrorist attacks, followed by the Islamic State of Iraq and the Levant (ISIL) and then Shining Path (SL).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

In [None]:
print('\n')
print(('*')*20)
print('Value counts of object-type columns with fewer than 50 categories')
print('\n')
print(('*')*20)
cat_col= df_1.select_dtypes('object').columns.to_list()
for col in cat_col:
  if len(df_1[col].value_counts()) < 50:
    print(df_1[col].value_counts())
    print('\n')
    print(('*')*20)

#### Chart - 8  Showing Successful Terrorist Attack

In [None]:
# Chart - 8 visualization code
ax= df_1['success'].value_counts().plot(kind='pie')
ax.set_title('Successful Terrorist Attack',fontsize=12)
plt.show()
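
Because a pie chart without labels does not show the actual proportions, it can help to print the shares alongside it. The sketch below uses a made-up series in place of `df_1['success']`, assuming the column is coded 1 for a successful attack and 0 for a failed one.

```python
import pandas as pd

# Toy stand-in for df_1['success'] (1 = successful, 0 = failed); values are made up
success = pd.Series([1, 1, 1, 0, 1, 0, 1, 1], name="success")

# normalize=True returns fractions instead of raw counts
share = success.value_counts(normalize=True).mul(100).round(1)
print(share)
```

The same percentages can be drawn on the pie itself by passing `autopct='%1.1f%%'` to `plot(kind='pie')`.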

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9  Number of Multiple Attack

In [None]:
# Chart - 9 visualization code
ax= df_1['multiple'].value_counts().plot(kind='pie')
ax.set_title('Number of Multiple Attack',fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10  Number Of Suicide Attack

In [None]:
# Chart - 10 visualization code
ax= df_1['suicide'].value_counts().plot(kind='pie')
ax.set_title('Number of Suicide Attack',fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

# Countries' Performance Against Terrorist Attacks

In [None]:
success_country= df_1.groupby('country_txt').agg({'eventid':'count','success':'sum'}).reset_index()
success_country.columns=['Country','Total_Attacks','Attack_Success']
success_country['Attack_Failed']=success_country['Total_Attacks']-success_country['Attack_Success']
success_country['Failure_Rate_Attack']=(success_country['Attack_Failed']/success_country['Total_Attacks'])*100
success_country['Success_Rate_Attack']=(success_country['Attack_Success']/success_country['Total_Attacks'])*100
success_country= success_country.round(2)
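
The success and failure rates computed above can be checked on a small example. The toy frame below is made up, and the named-aggregation form is just one possible way to write the same `groupby`.

```python
import pandas as pd

# Made-up incidents for two hypothetical countries
toy = pd.DataFrame({
    "country_txt": ["A", "A", "A", "A", "B", "B"],
    "eventid": [1, 2, 3, 4, 5, 6],
    "success": [1, 1, 0, 1, 0, 0],
})

rates = (
    toy.groupby("country_txt")
    .agg(Total_Attacks=("eventid", "count"), Attack_Success=("success", "sum"))
    .reset_index()
)
# Rates are percentages of all attacks recorded for the country
rates["Success_Rate_Attack"] = (rates["Attack_Success"] / rates["Total_Attacks"] * 100).round(2)
rates["Failure_Rate_Attack"] = (100 - rates["Success_Rate_Attack"]).round(2)
print(rates)
```

Country B shows why a minimum number of attacks matters: with only two incidents, a single outcome swings the rate by 50 percentage points.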


## Top 10 Countries Able to Successfully Tackle Terrorist Attack

In [None]:
able_stop_terror_attack = success_country[success_country['Total_Attacks']>=5].sort_values('Failure_Rate_Attack',ascending=False)[:10]
plot_able_stop_terror_attack = able_stop_terror_attack.loc[:,['Country','Failure_Rate_Attack']]
plt.rcParams['figure.figsize'] = (12, 5)
ax=sns.barplot(x='Country',y='Failure_Rate_Attack',data=plot_able_stop_terror_attack,palette='viridis')
ax.set(xlabel='Countries', ylabel='Failure Rate of Terror Attacks (%)')
plt.xticks(rotation=75)
ax.set_title('Top 10 Countries Able to Successfully Tackle Terrorist Attacks (At Least 5 Attacks)', fontsize=15)
plt.show()



Brunei has the highest rate of failed terrorist attacks, followed by Ireland and then New Zealand.

## Top 10 Countries Unable to Tackle Terrorist Attack

In [None]:
unable_stop_terror_attack = success_country[success_country['Total_Attacks']>=5].sort_values('Success_Rate_Attack',ascending=False)[:10]
plot_unable_stop_terror_attack = unable_stop_terror_attack.loc[:,['Country','Success_Rate_Attack','Total_Attacks']]
plt.rcParams['figure.figsize'] = (12, 5)
ax=sns.barplot(x='Country',y='Success_Rate_Attack',data=plot_unable_stop_terror_attack,palette='viridis')
ax.set(xlabel='Countries', ylabel='Success Rate of Terror Attacks (%)')
plt.xticks(rotation=75)
ax.set_title('Top 10 Countries Unable to Tackle Terrorist Attacks (At Least 5 Attacks)', fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

The country least able to tackle terrorist attacks is Benin (West Africa), followed by Bhutan and then Djibouti (East Africa).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

## Trends in Terrorist Attacks

In [None]:
attack_per_year=df_1.groupby(['iyear','targtype1_txt','target1','success','weaptype1_txt','gname'])['eventid'].count().to_frame(name='Total_Attacks').reset_index()


In [None]:
ax= sns.relplot(col="weaptype1_txt",y="Total_Attacks",col_wrap=4,hue='weaptype1_txt',x="iyear",kind='line',ci=None,data=attack_per_year,height=4,aspect=1)
ax.set_xticklabels(rotation=45)
plt.show()
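
One caveat about the plot above: `attack_per_year` is grouped by six columns, and seaborn's line plot averages the y-values that share an x-value by default, so the curves likely show the mean size of those fine-grained groups rather than yearly totals. The sketch below uses made-up rows to contrast the two groupings.

```python
import pandas as pd

# Made-up incidents, only to illustrate the difference between the two groupings
toy = pd.DataFrame({
    "eventid": [1, 2, 3, 4, 5, 6],
    "iyear": [2000, 2000, 2000, 2001, 2001, 2001],
    "weaptype1_txt": ["Explosives", "Explosives", "Firearms",
                      "Explosives", "Firearms", "Firearms"],
    "gname": ["G1", "G2", "G1", "G1", "G1", "G2"],
})

# Grouping by many columns yields one small count per fine-grained combination
fine = toy.groupby(["iyear", "weaptype1_txt", "gname"])["eventid"].count()

# Grouping by year and weapon alone gives the yearly total per weapon type
totals = (
    toy.groupby(["iyear", "weaptype1_txt"])["eventid"]
    .count()
    .rename("Total_Attacks")
    .reset_index()
)
print(totals)
```

Plotting `totals` with `x="iyear"` and `y="Total_Attacks"` would then give one point per year and weapon type, with nothing left to average.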

#### Chart - 12

In [None]:
# Chart - 12 visualization code
ax= sns.relplot(col="targtype1_txt",y="Total_Attacks",col_wrap=5,hue='targtype1_txt',x="iyear",kind='line',ci=None,data=attack_per_year,height=4,aspect=1)
ax.set_xticklabels(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code
ax = sns.lineplot(x="iyear",y="Total_Attacks",hue='success',ci=None,data=attack_per_year)
ax.tick_params(rotation=45)
plt.show()

In [None]:
ax = sns.lineplot(x="iyear",y="Total_Attacks",ci=None,data=attack_per_year)
ax.tick_params(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
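
This cell is empty in the notebook. A minimal sketch of how it might be filled is shown below, assuming the numeric columns of `df_1` are the ones of interest. The helper names and the toy frame are hypothetical, and the plotting libraries are imported inside the plotting function so that the numeric part runs on its own.

```python
import pandas as pd


def numeric_correlation(frame: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between the numeric columns of a frame."""
    return frame.select_dtypes("number").corr()


def plot_correlation_heatmap(corr: pd.DataFrame) -> None:
    """Draw an annotated heatmap of a correlation matrix."""
    # Imported here so the correlation step does not depend on plotting libraries
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_title("Correlation Heatmap")
    plt.show()


# Toy frame with made-up values
toy = pd.DataFrame({
    "nkill": [0, 1, 2, 3],
    "nwound": [0, 2, 4, 6],         # perfectly correlated with nkill in this toy data
    "success": [1, 1, 0, 1],
    "city": ["a", "b", "c", "d"],   # non-numeric, ignored by the correlation
})
corr = numeric_correlation(toy)
print(corr.round(2))
```

In the notebook, `plot_correlation_heatmap(numeric_correlation(df_1))` would draw the chart. Columns such as `eventid` are identifiers rather than measurements and may be worth dropping first.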

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***