<a href="https://colab.research.google.com/github/SiddharthDNathan/Global-Terrorism-EDA/blob/main/Global_Terrorism_EDA_Submission_according_to_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Global Terrorism EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

Summary:

The Global Terrorism Database™ (GTD), maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, serves as a comprehensive repository of terrorism-related events worldwide since 1970. With over 200,000 meticulously documented records, the GTD provides invaluable insights into the patterns, causes, and consequences of terrorist activities, facilitating research, analysis, and informed decision-making for various stakeholders.

The GTD is compiled with a documented, transparent methodology. Incidents are collected systematically and are included only if they meet predefined inclusion criteria and pass definitional filters, which keeps the records consistent and accurate. The database is updated periodically, and both its data and its methodology are revised over time to reflect how terrorism changes.

The GTD encompasses a diverse range of variables, organized into categories such as incident details, attack methods, target demographics, perpetrator characteristics, casualty information, and more. Each variable is meticulously defined, allowing users to conduct nuanced analyses and derive meaningful insights into the multifaceted nature of terrorism worldwide.

Key features of the GTD include its public availability for search, browsing, and download on the GTD website, along with a commercial distribution partnership established in 2019 with CHC Global. However, access to the database is subject to the terms of the End User License Agreement, ensuring responsible use and adherence to ethical guidelines in handling sensitive information related to terrorism.

For researchers, policymakers, and practitioners alike, the GTD serves as a powerful tool for understanding terrorism dynamics, identifying emerging trends and hotspots, evaluating the effectiveness of counterterrorism measures, and informing evidence-based strategies to mitigate the impact of terrorism on societies and communities worldwide.

In conclusion, the GTD stands as a testament to the collaborative efforts of academia, government agencies, and research institutions in addressing one of the most pressing challenges of our time. By providing a comprehensive and reliable source of terrorism data, the GTD empowers stakeholders to advance knowledge, foster dialogue, and ultimately contribute to a safer and more secure world.



# **GitHub Link -**

https://github.com/SiddharthDNathan/Global-Terrorism-EDA


# **Problem Statement**


**To explore and analyze the data to discover the statistics of various incidents of Global Terrorism**

#### **Define Your Business Objective?**

**To gather insights to help in decision making for prevention of such future incidents**

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     These additional credits will give an advantage during Star Student selection.
       
             [ Note: - Deployment Ready Code means that the whole .ipynb notebook can be executed in one go
                       without a single error being logged. ]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. For each chart, make sure the following questions are answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

In [None]:
# Mount Google Drive so the dataset stored there can be read (works only inside Google Colab)
from google.colab import drive
drive.mount('/content/drive')

### Dataset Loading

In [None]:
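# Load Dataset
# Read with ISO-8859-1 (Latin-1) encoding, since the file contains characters that are not valid UTF-8.
# low_memory=False makes pandas infer each column's dtype from the whole file,
# which avoids mixed-type columns (DtypeWarning) in this wide dataset.
df = pd.read_csv('/content/drive/MyDrive/TerrorismData.csv', encoding='iso-8859-1', low_memory=False)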
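Point 2 of the General Guidelines rewards exception handling and notebooks that run end to end. A minimal sketch of one way to get there, assuming the CSV may be missing when the notebook runs outside Colab: the hypothetical helper `load_gtd` checks the path and wraps the same `pd.read_csv` call, so a wrong path fails with an explicit message naming the path. The demonstration reads a one-row stand-in file, not the real dataset.

```python
import os
import tempfile

import pandas as pd


def load_gtd(path, encoding='iso-8859-1'):
    """Read the GTD CSV, failing with a clear message if the file is missing or empty.

    Hypothetical helper for this notebook; not part of pandas or the GTD distribution.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"Dataset not found at {path!r}; mount Google Drive or fix the path."
        )
    try:
        return pd.read_csv(path, encoding=encoding, low_memory=False)
    except pd.errors.EmptyDataError as err:
        raise ValueError(f"The file at {path!r} is empty.") from err


# Demonstration on a one-row stand-in file (not real GTD data)
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.csv')
    with open(sample_path, 'w', encoding='iso-8859-1') as fh:
        fh.write("eventid,iyear,country_txt\n197000000001,1970,Côte d'Ivoire\n")
    sample_gtd = load_gtd(sample_path)

print(sample_gtd.shape)  # (1, 3)
```

In the notebook it would be called as `df = load_gtd('/content/drive/MyDrive/TerrorismData.csv')`.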

### Dataset First View

In [None]:
# Dataset First Look
df

In [None]:
df.head()

In [None]:
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isna().sum()
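Raw counts are hard to compare across 135 columns, so expressing them as a share of all rows helps decide which columns are too sparse to use. A small sketch with a hypothetical helper `missing_summary`, shown on a toy frame rather than the GTD data:

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Missing-value count and percentage per column, most incomplete first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(frame) * 100).round(1),
    })
    return summary.sort_values('missing', ascending=False)


# Toy frame standing in for the GTD data
toy_missing = pd.DataFrame({
    'nkill': [1.0, np.nan, 3.0, np.nan],
    'city': ['A', 'B', None, 'D'],
    'iyear': [1970, 1971, 1972, 1973],
})
missing_table = missing_summary(toy_missing)
print(missing_table)
```

On the real data, `missing_summary(df).head(20)` would list the 20 most incomplete columns.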

### What did you know about your dataset?

The Global Terrorism Database™ (GTD), maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, is an event-level database documenting over 200,000 terrorist incidents worldwide since 1970. Its codebook, which is updated periodically, describes the data collection methodology, the inclusion criteria, and the definition of each variable, so that attacks are documented transparently and consistently. The variables cover incident details, attack methods, target information, perpetrator data, and consequences such as casualties and property damage. The GTD is available for public use on its website, with commercial distribution managed by CHC Global since 2019, subject to an End User License Agreement. The GTD team invites feedback and questions by email to improve the database's usefulness and accuracy. For this notebook, the cells above give the number of rows and columns (`df.shape`), the data type of every column (`df.info()`), and the count of missing values per column (`df.isna().sum()`).

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Check Unique Values for each variable.

In [None]:
# Number of distinct values in each column; dropna=False also counts NaN as a value,
# matching len(df[col].unique())
df.nunique(dropna=False)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# The raw data has 135 columns, and not all of them are useful for this analysis.
# Step 1 of data cleaning: keep only the columns needed to answer our questions and
# provide insights, chosen with the help of the GTD codebook.
df = df.loc[:, ['eventid', 'iyear', 'imonth', 'iday', 'extended', 'country_txt', 'region_txt',
                'city', 'latitude', 'longitude', 'vicinity', 'crit1', 'multiple', 'success',
                'suicide', 'attacktype1_txt', 'targtype1_txt', 'natlty1_txt', 'gname', 'nperps',
                'claimed', 'weaptype1_txt', 'nkill', 'nkillter', 'nwound', 'propextent_txt',
                'ishostkid', 'ransom', 'nreleased']].copy()

In [None]:
# Make the dataset analysis-ready.
# Rename columns to descriptive names that make their contents clearer.

df.rename(columns =
                  {'iyear':'year',
                   'imonth':'month',
                   'iday':'day',
                   'country_txt' : 'country',
                   'region_txt' : 'region',
                   'crit1' : 'crit',
                   'attacktype1_txt' : 'attacktype',
                   'targtype1_txt' : 'targettype',
                   'natlty1_txt' : 'nationalityofvic',
                   'gname' : 'organisation',
                   'claimed' : 'claimedresp',
                   'weaptype1_txt' : 'weapontype',
                   'nkill' : 'nkilled',
                   'nkillter' : 'nkillonlyter',
                   'nwound' : 'nwounded',
                   'propextent_txt' : 'propdamageextent',
                   'ishostkid' : 'victimkidnapped',
                   'ransom' : 'ransomdemanded',
                   }, inplace = True)
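`DataFrame.rename` silently skips keys that are not column names, so a typo in the mapping above would leave a column with its old name without any warning. Passing `errors='raise'` turns that into a `KeyError`. A sketch on a toy frame using three of the mappings above:

```python
import pandas as pd

# Three of the mappings used in the rename cell above
rename_map = {'iyear': 'year', 'country_txt': 'country', 'gname': 'organisation'}
sample_raw = pd.DataFrame({'iyear': [1970], 'country_txt': ['Mexico'], 'gname': ['Unknown']})

# errors='raise': a key that is not a column raises KeyError instead of being skipped
renamed = sample_raw.rename(columns=rename_map, errors='raise')
print(renamed.columns.tolist())  # ['year', 'country', 'organisation']

try:
    sample_raw.rename(columns={'imonth': 'month'}, errors='raise')  # 'imonth' is not in sample_raw
except KeyError as err:
    print('Rename failed:', err)
```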

### What all manipulations have you done and insights you found?

The original DataFrame has 135 columns, and not all of them are useful for this analysis. The first data-cleaning step was therefore to create a new DataFrame containing only the columns needed, selected with the help of the GTD codebook.

Furthermore, the columns were renamed to improve readability (for example, `iyear` became `year` and `gname` became `organisation`).
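One further cleaning step that may be worth considering: several GTD columns use negative codes rather than blanks for missing information. Assuming, per our reading of the GTD codebook, that `-9` means "unknown" in yes/no columns such as `claimedresp`, `victimkidnapped` and `ransomdemanded`, and `-99` means "unknown" in count columns such as `nperps` and `nreleased` (verify against the codebook before relying on this), those codes would otherwise be treated as real numbers by the correlation heatmap and pair plot. The hypothetical helper `unknown_to_nan` replaces them with `NaN`:

```python
import numpy as np
import pandas as pd

# ASSUMED sentinel codes (check the GTD codebook): -9 = unknown in yes/no columns,
# -99 = unknown in count columns
UNKNOWN_CODES = {
    'claimedresp': [-9],
    'victimkidnapped': [-9],
    'ransomdemanded': [-9],
    'nperps': [-99],
    'nreleased': [-99],
}


def unknown_to_nan(frame, codes=UNKNOWN_CODES):
    """Return a copy of `frame` with the assumed 'unknown' codes replaced by NaN."""
    cleaned = frame.copy()
    for col, values in codes.items():
        if col in cleaned.columns:
            cleaned[col] = cleaned[col].replace(values, np.nan)
    return cleaned


toy_codes = pd.DataFrame({'claimedresp': [1, 0, -9], 'nperps': [3, -99, 2]})
cleaned_toy = unknown_to_nan(toy_codes)
print(cleaned_toy)
```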

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

In [None]:
df  # Final DataFrame used for analysis and visualization

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Number of attacks per year
plt.figure(figsize=(20, 7))
sns.countplot(x='year', data=df)
plt.xticks(rotation=90)  # Vertical year labels so they do not overlap
plt.ylabel('Count', fontsize=12)
plt.xlabel('Year', fontsize=12)
plt.title('Number of Terrorist Attacks by Year', fontsize=12)
plt.show()
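Each bar in the count plot is simply the number of rows for that year, so the same figures can be read off as numbers, for example to name the peak year when answering the questions below. A sketch on toy data (the column name `year` matches the renamed dataset; the values are invented):

```python
import pandas as pd

# Toy stand-in: one row per attack, with the renamed 'year' column
toy_years = pd.DataFrame({'year': [1970, 1970, 1971, 1972, 1972, 1972]})

# Same numbers the count plot draws, in chronological order
attacks_per_year = toy_years['year'].value_counts().sort_index()
peak_year = attacks_per_year.idxmax()

print(attacks_per_year.tolist())            # [2, 1, 3]
print(peak_year, attacks_per_year.max())    # 1972 3
```

On the real data, replace `toy_years` with `df`.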

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Number of attacks per region, sorted from most to least attacked
sns.countplot(y='region', data=df, order=df['region'].value_counts().index)
plt.ylabel('Region')
plt.xlabel('Count')
plt.title('Number of Terrorist Attacks by Region')
plt.show()
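To state regional differences as proportions rather than raw counts, `value_counts(normalize=True)` returns each region's fraction of all attacks. A sketch on a toy column that uses real GTD region names with invented counts:

```python
import pandas as pd

# Toy stand-in using real GTD region names; the counts are invented
toy_regions = pd.DataFrame({'region': [
    'Middle East & North Africa', 'South Asia', 'South Asia', 'Western Europe',
]})

# Fraction of all attacks per region, as a percentage
region_share = (toy_regions['region'].value_counts(normalize=True) * 100).round(1)
print(region_share)
```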

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Top 10 most attacked countries; the percentages are shares of the attacks
# among these 10 countries only, not of all attacks
country_counts = df['country'].value_counts().head(10)
plt.figure(figsize=(10, 10))
sns.set(font_scale=1.2)
sns.set_palette('pastel')
country_counts.plot.pie(autopct='%1.1f%%', startangle=140)
plt.title('Share of Attacks Among the 10 Most-Attacked Countries')
plt.ylabel('')
plt.show()
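The pie's percentages are computed from the top 10 countries only, so they add up to 100% even though those countries account for only part of all attacks. The sketch below contrasts the two denominators on toy counts, using the top 2 of three countries for brevity (the counts are invented for illustration):

```python
import pandas as pd

# Toy stand-in for df['country']; counts are invented
toy_countries = pd.Series(['Iraq'] * 5 + ['Pakistan'] * 3 + ['India'] * 2, name='country')

top = toy_countries.value_counts().head(2)
share_within_top = (top / top.sum() * 100).round(1)           # what the pie labels show
share_of_all = (top / len(toy_countries) * 100).round(1)      # share of every attack in the data

print(share_within_top.to_dict())  # {'Iraq': 62.5, 'Pakistan': 37.5}
print(share_of_all.to_dict())      # {'Iraq': 50.0, 'Pakistan': 30.0}
```

On the real data the same idea uses `df['country'].value_counts().head(10)` and `len(df)`.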

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code
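plt.figure(figsize=(20, 8))
sns.set(font_scale=0.8)

# Bars sorted from the most to the least frequently attacked target type
ax = sns.countplot(x='targettype', data=df, order=df['targettype'].value_counts().index)

# Rotate the long category names so they do not overlap
ax.tick_params(axis='x', labelrotation=60)
plt.xlabel('Target Types', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.title('Number of Attacks by Target Type', fontsize=12)
plt.show()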

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code
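# Total casualties = deaths + injuries; if either value is missing the sum is NaN,
# and those attacks are left out of the histogram
df['total_casualties'] = df['nkilled'] + df['nwounded']

# Keep attacks with at most 200 casualties BEFORE binning, so the 30 bins span 0-200.
# (Setting only plt.xlim would spread the bins over the full range, whose maximum is
# far above 200, leaving almost everything in the first bin.)
casualties = df.loc[df['total_casualties'] <= 200, 'total_casualties']
plt.figure(figsize=(10, 6))
sns.histplot(casualties, bins=30, kde=True)
plt.title('Distribution of Total Casualties in Terrorist Attacks (0-200)')
plt.xlabel('Total Casualties')
plt.ylabel('Frequency')
plt.show()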
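Adding two Series with `+` gives `NaN` whenever either value is missing, so attacks with an unknown number of deaths or injuries are excluded from the casualty histogram. `Series.add(..., fill_value=0)` is an alternative that treats a single missing value as zero and keeps `NaN` only when both are missing; it undercounts rather than drops, so which to use is a judgment call. A toy comparison:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for df['nkilled'] and df['nwounded']
killed = pd.Series([2.0, np.nan, np.nan, 0.0])
wounded = pd.Series([3.0, 4.0, np.nan, 1.0])

plain_sum = killed + wounded                      # NaN whenever either value is missing
filled_sum = killed.add(wounded, fill_value=0)    # NaN only when both values are missing

print(plain_sum.tolist())   # [5.0, nan, nan, 1.0]
print(filled_sum.tolist())  # [5.0, 4.0, nan, 1.0]
```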

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(12,7))
sns.countplot(y='weapontype', data=df, order=df['weapontype'].value_counts().index)
plt.title('Weapon Types Used in Terrorist Attacks')
plt.xlabel('Number of Attacks')
plt.ylabel('Weapon Type')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code
kidnapping_df = df[df['victimkidnapped'] == 1]
plt.figure(figsize=(15,6))
sns.countplot(x='year', data=kidnapping_df)
plt.xticks(rotation=90)
plt.title('Incidents Involving Hostages or Kidnapping by Year')
plt.ylabel('Number of Kidnappings')
plt.xlabel('Year')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code
claimed_attacks = df[df['claimedresp'] == 1]
plt.figure(figsize=(12,7))
sns.countplot(y='organisation', data=claimed_attacks, order=claimed_attacks['organisation'].value_counts().iloc[:10].index)
plt.title('Top 10 Organizations by Number of Claimed Attacks')
plt.xlabel('Number of Attacks')
plt.ylabel('Organisation')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(12,7))
sns.countplot(y='propdamageextent', data=df, order=df['propdamageextent'].value_counts().index)
plt.title('Extent of Property Damage in Terrorist Attacks')
plt.xlabel('Number of Incidents')
plt.ylabel('Damage Extent')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code
plt.figure(figsize=(15,6))
df.groupby('year')['nkilled'].sum().plot(kind='line', color='red', marker='o')
plt.title('Yearly Fatalities Due to Terrorism')
plt.xlabel('Year')
plt.ylabel('Number of Fatalities')
plt.grid(True)
plt.show()
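`groupby(...).sum()` treats missing `nkilled` values as zero, so a year whose fatality counts are all unrecorded would appear as 0 rather than as missing; `min_count=1` keeps such groups as `NaN`. Dividing by the number of attacks also separates "more attacks" from "deadlier attacks". A sketch on toy data (years and values are invented):

```python
import numpy as np
import pandas as pd

# Toy stand-in: one row per attack
toy_fatal = pd.DataFrame({
    'year': [2000, 2000, 2001, 2002, 2002],
    'nkilled': [np.nan, np.nan, 4.0, 1.0, np.nan],
})

yearly = toy_fatal.groupby('year')['nkilled'].agg(
    total='sum',               # missing values count as 0
    attacks='size',            # all attacks, including unknown fatalities
    known='count',             # attacks with a recorded fatality count
    mean_per_attack='mean',    # average over attacks with a recorded count
)
default_sum = toy_fatal.groupby('year')['nkilled'].sum()             # 2000 -> 0.0
strict_sum = toy_fatal.groupby('year')['nkilled'].sum(min_count=1)   # 2000 -> NaN
print(yearly)
```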


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code
plt.figure(figsize=(12,7))
sns.boxplot(x='nkilled', y='attacktype', data=df)
plt.title('Distribution of Fatalities by Attack Type')
plt.xlabel('Number of Fatalities')
plt.ylabel('Attack Type')
plt.xlim(0, 50)  # Limiting x-axis for better readability
plt.show()
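`plt.xlim(0, 50)` only crops the view: the boxes and whiskers are still computed from every attack, including those with more than 50 deaths. A table of the same statistics can make this explicit; on the toy data below (values invented), one very deadly bombing pulls the mean far above the median, which is the value the box's middle line shows:

```python
import pandas as pd

# Toy stand-in with real GTD attack-type labels and invented fatality counts
toy_attacks = pd.DataFrame({
    'attacktype': ['Bombing/Explosion'] * 3 + ['Armed Assault'] * 3,
    'nkilled': [0, 1, 120, 2, 3, 4],
})

# count, mean, median ('50%') and maximum deaths per attack type
fatality_stats = toy_attacks.groupby('attacktype')['nkilled'].describe()[['count', 'mean', '50%', 'max']]
print(fatality_stats)
```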


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code
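plt.figure(figsize=(10, 7))
# Focus on the more common outcomes: keep attacks with at most 100 deaths and 100 injuries
# BEFORE binning, so the 50-hexagon grid covers 0-100 instead of the full range
# (rows with missing values fail the comparison and are dropped)
common = df[(df['nkilled'] <= 100) & (df['nwounded'] <= 100)]
plt.hexbin(common['nkilled'], common['nwounded'], gridsize=50, cmap='Reds', mincnt=1)
plt.colorbar(label='Number of Attacks')
plt.title('Density of Terrorist Attacks by Fatalities and Injuries')
plt.xlabel('Number of Fatalities')
plt.ylabel('Number of Injuries')
plt.show()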



##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
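# Correlation Heatmap visualization code
# Only numeric columns can be correlated: numeric_only=True skips text columns such as
# 'country' (required from pandas 2.0, where corr() raises on non-numeric data).
# 'eventid' is an identifier, not a measurement, so it is left out.
plt.figure(figsize=(20, 20))
sns.heatmap(df.drop(columns='eventid').corr(numeric_only=True), cmap='YlGnBu', annot=True, fmt='.2f')
plt.show()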
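With around twenty numeric columns, the annotated heatmap is dense. A hypothetical helper `top_correlations` lists the strongest pairs as a table instead, counting each pair once (it uses Pearson correlation, the `corr` default). Shown on a toy frame with invented values:

```python
import pandas as pd


def top_correlations(frame, n=5):
    """Return the n variable pairs with the largest absolute correlation, each pair once."""
    corr = frame.corr(numeric_only=True)
    cols = list(corr.columns)
    # Walk the upper triangle only, so (a, b) and (b, a) are not both listed
    rows = [(a, b, corr.loc[a, b]) for i, a in enumerate(cols) for b in cols[i + 1:]]
    table = pd.DataFrame(rows, columns=['var1', 'var2', 'corr'])
    table['abs_corr'] = table['corr'].abs()
    return table.sort_values('abs_corr', ascending=False).head(n).reset_index(drop=True)


toy_numeric = pd.DataFrame({
    'nkilled': [0, 1, 2, 3, 4],
    'nwounded': [0, 2, 4, 6, 8],
    'success': [1, 0, 1, 0, 1],
    'country': ['A', 'B', 'C', 'D', 'E'],   # text column, skipped by numeric_only=True
})
strongest = top_correlations(toy_numeric, n=2)
print(strongest)
```

On the real data, `top_correlations(df.drop(columns='eventid'), n=10)` would list the ten strongest pairs.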

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Columns to compare: number killed, plus the hostage/kidnapping and ransom flags
# (the two flags are coded yes/no values, so their panels show a few discrete values)
numerical_cols = ['nkilled', 'victimkidnapped', 'ransomdemanded']

# Subset the DataFrame to these columns
df_numerical = df[numerical_cols]

# Create the pair plot
sns.pairplot(df_numerical)
plt.suptitle('Pairplot of Numerical Variables', y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?

Data Integration: Ensure comprehensive integration of terrorism data with internal business data (e.g., operational, supply chain, and employee data) for a holistic risk assessment.

Continuous Monitoring: Implement a robust monitoring system to continuously track and analyze terrorism data, enabling timely identification of emerging threats and proactive risk management.

Investment in Security Measures: Allocate resources towards implementing security measures such as surveillance systems, access controls, and employee training to enhance resilience against potential terrorist threats.

Scenario Planning: Develop scenario-based contingency plans considering various terrorism-related scenarios to minimize disruption to business operations and ensure continuity.

Regular Reviews: Conduct periodic reviews and updates of risk assessments and mitigation strategies in response to evolving terrorism trends and changes in the business environment.

# **Conclusion**

In conclusion, leveraging comprehensive analysis of terrorism data enables businesses to proactively assess risks, implement effective security measures, and safeguard operations, thereby fostering resilience and ensuring continued success in dynamic operating environments.
