# **Project Name**    -



##### **Project Type**    -EDA
##### **Contribution**    -Individual
##### **Author**      - Deepak Choudhary

# **Project Summary -**


The objective of this project is to conduct an exploratory data analysis (EDA) of the Global Terrorism Database (GTD), an open-source dataset that contains comprehensive information on both domestic and international terrorist attacks occurring globally from 1970 through 2017. Developed and maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, the database encompasses details of over 180,000 recorded terrorist incidents. The project's principal aim is to dig deep into this expansive dataset, identify significant trends, patterns, and insights pertaining to terrorism-related activities, and present these discoveries visually for an enhanced understanding.

A critical aspect of this project is the extensive use of Python libraries tailored for data analysis and visualization. The cornerstone of data manipulation, including loading the dataset, cleaning data, and executing sophisticated aggregation operations, will be the Pandas library. This powerful, high-performance tool offers efficient data structures and makes the handling of large datasets effortless.

To facilitate advanced numerical operations and speed up computation, the project employs the NumPy library. Given its proficiency in handling multi-dimensional arrays and matrices, NumPy is the perfect companion for data processing operations.

The project doesn't stop at numerical data analysis; it brings the extracted insights to life through vivid, informative visualizations, courtesy of the Matplotlib and Seaborn libraries. These libraries provide an array of visualization styles, enabling the display of data in ways that are both appealing and informative. From bar plots and scatter plots to histograms and heatmaps, the project will utilize a minimum of five different visualizations to reveal relationships between variables and provide a graphical representation of the dataset's characteristics.

Exploring the GTD through this project will pave the way for an intricate understanding of terrorism patterns over the past decades. The goal is to unveil potential trends in attack frequency, most targeted countries, preferred methods of attack, types of weapons used, casualties, and the evolution of terrorist organizations, among other relevant dimensions.

By examining these factors, the project aims to provide a detailed overview of global terrorism trends, informing counter-terrorism strategies and policies. Additionally, the findings may also help understand the characteristics of regions prone to attacks and the reasons behind their vulnerability.

In conclusion, this project offers a data-driven exploration into the dark world of terrorism, aiming to shed light on the complex patterns hidden within the enormity of the GTD. The end product of this project will be an array of valuable insights that have the potential to contribute substantially to ongoing counter-terrorism efforts and inform future research in this field. The combination of data manipulation, numerical computation, and graphic visualization is expected to yield a robust and comprehensive exploration of the dataset, leading to substantial key findings pertaining to global terrorism.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Using exploratory data analysis (EDA) techniques on the GTD, identify the hot zones of terrorism globally and discern the evolving patterns of terrorist activities. What insights related to security issues can be derived from this analysis that could be instrumental in shaping counter-terrorism strategies?



***Define Your Business Objective***

The business objective of this project is to leverage the data contained within the Global Terrorism Database (GTD) to derive actionable insights into terrorist activities worldwide from 1970 to 2017. By conducting a comprehensive exploratory data analysis (EDA), the goal is to identify the key patterns, trends, and correlations related to global terrorism, thereby enabling better-informed decision-making for security analysts, policy-makers, and counter-terrorism agencies.

Specifically, the objectives include:

*   Identification of global "hot zones" for terrorist activities: By determining the most affected regions, we can better understand where resources might be best allocated to prevent future attacks.
*   Analysis of frequency and intensity of attacks: Understanding how these have evolved over time can provide insights into the changing dynamics of terrorism and allow for more accurate risk assessments.
*   Examination of methodologies and weapons used in attacks: This can shed light on the operational preferences of terrorist organizations and potentially provide early indicators of future threats.
*   Assessment of casualty trends: This can help identify the most devastating types of attacks and allow for targeted response planning to minimize human loss.
*   Unveiling patterns related to terrorist organizations: This can potentially aid in understanding their strategies, thereby supporting intelligence agencies in their counter-terrorism efforts.



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart follows the format below and answers the questions that come after it.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px


### Dataset Loading

In [None]:
#load dataset
from google.colab import drive
drive.mount('/content/drive/')

In [None]:
data_path = "/content/drive/MyDrive/alma beter data set/Global Terrorism Data.csv"
df = pd.read_csv(data_path, encoding='ISO-8859-1')
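
The guidelines above ask for exception handling, so the loading step could be wrapped in a small helper. The following is only a sketch: `load_gtd` is a hypothetical name that does not appear elsewhere in this notebook, and `low_memory=False` is an optional choice that may help pandas infer one consistent dtype per column in a wide file like this one.

```python
from pathlib import Path

import pandas as pd


def load_gtd(csv_path, encoding="ISO-8859-1"):
    """Load the GTD CSV, raising a clear error if the file is missing.

    `load_gtd` is a hypothetical helper written for illustration.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"GTD file not found: {path}")
    # low_memory=False makes pandas read the file in one pass, so each
    # column gets a single inferred dtype instead of per-chunk guesses.
    return pd.read_csv(path, encoding=encoding, low_memory=False)
```

In the notebook this would be called as `df = load_gtd(data_path)`, which fails early with a readable message when the Drive path is wrong.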

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, cols = df.shape
print(f'There are {rows} rows and {cols} columns in the dataset.')

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_rows = df.duplicated().sum()
print(f'There are {duplicate_rows} duplicate rows in the dataset.')

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
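missing_values = df.isnull().sum()   # missing count per column (a Series)
total_missing = missing_values.sum() # single total across all columns
print(f'There are {total_missing} missing values in the dataset.')
# Per-column breakdown
print(missing_values)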

In [None]:
# Visualizing the missing values
import missingno as msno
#visualizing the missing values as matrix
msno.matrix(df)
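
As a complement to the matrix view, a per-column table of missing counts and percentages can make the worst columns easier to spot. This is a minimal sketch; `missing_summary` is a hypothetical helper, and the toy frame below exists only so the block runs on its own.

```python
import pandas as pd


def missing_summary(frame, top=10):
    """Return the `top` columns with the most missing values."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps, worst first.
    summary = summary[summary["missing"] > 0]
    return summary.sort_values("missing", ascending=False).head(top)


# Toy frame for illustration only; in the notebook, pass `df` instead.
toy = pd.DataFrame({
    "a": [1, None, None, 4],
    "b": [1, 2, 3, 4],
    "c": [None, 2, 3, 4],
})
print(missing_summary(toy))
```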

### What did you know about your dataset?

Dataset Size: The dataset is quite large, containing 181,691 entries or rows.

Feature Quantity: The dataset contains 135 features or columns.

Data Types: The dataset has a mix of data types. There are 55 features with floating point numbers (float64), 22 features with integers (int64), and 58 features with objects (object). The object datatype in pandas typically means the column contains string (text) data.

Memory Usage: The dataset uses over 187.1 MB of memory.

Missing Values: There are some columns with a large number of missing values. For example, the 'approxdate' column has 172,452 missing values and the 'related' column has 156,653 missing values. However, several columns do not have any missing values, such as 'eventid', 'iyear', 'imonth', 'iday', 'INT_LOG', 'INT_IDEO', 'INT_MISC', and 'INT_ANY'.
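
The figures quoted above can be gathered programmatically rather than read off the `df.info()` output. The sketch below assumes nothing beyond pandas; `dataset_overview` is a hypothetical helper name, and the toy frame is only there to make the block runnable on its own.

```python
import pandas as pd


def dataset_overview(frame):
    """Collect shape, dtype counts and memory usage in one dictionary."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        # Number of columns per dtype, keyed by the dtype name.
        "dtype_counts": frame.dtypes.astype(str).value_counts().to_dict(),
        # Shallow estimate (deep=False is the default), comparable to the
        # figure that df.info() prints by default.
        "memory_mb": frame.memory_usage().sum() / 1024 ** 2,
    }


# Toy frame for illustration only; in the notebook, pass `df` instead.
toy = pd.DataFrame({"x": [1, 2], "y": [1.5, 2.5]})
print(dataset_overview(toy))
```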

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns = df.columns
print("columns in the dataset")
for column in columns:
  print(column)

In [None]:
# Dataset Describe
summary = df.describe()
print(summary)


### Variables Description

eventid: Unique ID for each event or terrorist attack.

iyear: Year the terrorist attack occurred.

imonth: Month the terrorist attack occurred.

iday: Day the terrorist attack occurred.

country_txt: Name of the country where the terrorist attack occurred.

region_txt: Name of the region where the terrorist attack occurred.

city: City where the terrorist attack occurred.

attacktype1_txt: The general method of attack employed.

target1: The specific person, building, installation, etc., that was targeted.

nkill: Number of confirmed fatalities for the incident.

nwound: Number of confirmed non-fatal injuries.

gname: Name of the group that carried out the attack.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_countries = df["country_txt"].unique()
print(unique_countries)

print() #this will leave gap

unique_years = df['iyear'].unique()
print(unique_years)
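
Printing every unique value works for a couple of columns, but a count of distinct values per column gives a quicker overview of all of them. A minimal sketch, where `unique_counts` is a hypothetical helper and the toy rows are made up for illustration:

```python
import pandas as pd


def unique_counts(frame, columns=None):
    """Return the number of distinct non-null values per column, largest first."""
    subset = frame if columns is None else frame[columns]
    return subset.nunique().sort_values(ascending=False)


# Toy frame for illustration only; in the notebook, pass `df` instead.
toy = pd.DataFrame({
    "country_txt": ["Iraq", "Iraq", "India"],
    "iyear": [1970, 1971, 1972],
})
print(unique_counts(toy))
```

In the notebook, `unique_counts(df, ["country_txt", "region_txt", "iyear"])` would restrict the overview to the columns of interest.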

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
print(df.isnull().sum())

In [None]:
pd.set_option("display.max_rows" , None)
print(df.dtypes)

In [None]:
pd.reset_option('display.max_rows')

In [None]:
df.rename(columns={'iyear':'Year','imonth':'Month','iday':'Day','country_txt':'Country','provstate':'state','region_txt':'Region','attacktype1_txt':'AttackType','target1':'Target','nkill':'Killed','nwound':'Wounded','summary':'Summary','gname':'Group','targtype1_txt':'Target_type','weaptype1_txt':'Weapon_type','motive':'Motive'},inplace=True)

In [None]:
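# Keep only the columns needed for the analysis; .copy() makes `data` an independent DataFrame
data = df[['Year','Month','Day','Country','state','Region','city','latitude','longitude','AttackType','Killed','Wounded','Target','Summary','Group','Target_type','Weapon_type','Motive']].copy()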

In [None]:
data.head()
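
A possible next wrangling step is to handle the missing casualty counts before any totals are computed. The sketch below assumes the renamed `Killed` and `Wounded` columns and treats a missing count as zero, which is a simplification rather than something established by the dataset. `add_casualties` and the `Casualties` column are hypothetical names.

```python
import pandas as pd


def add_casualties(frame):
    """Return a copy with missing Killed/Wounded set to 0 and a Casualties column."""
    out = frame.copy()
    # Assumption: a missing count is treated as zero for totals. This
    # understates casualties wherever the true figure is simply unknown.
    out[["Killed", "Wounded"]] = out[["Killed", "Wounded"]].fillna(0)
    out["Casualties"] = out["Killed"] + out["Wounded"]
    return out


# Toy frame for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({"Killed": [1, None], "Wounded": [None, 3]})
print(add_casualties(toy))
```

Returning a copy keeps the original frame untouched, so the charts below can still be drawn from the unmodified counts if preferred.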

### What all manipulations have you done and insights you found?

The dataset contains 135 columns, most of which are not needed for this analysis. We therefore renamed the relevant columns for better readability and extracted only the 18 columns required for the charts below.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(15,7))
sns.countplot(data=data, x='Year')
plt.title('Count of terrorist activities each year')
plt.xticks(rotation=90)
plt.show()
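
The same yearly counts can also be drawn as a line, which some readers find easier to follow for a trend over time. This is a sketch of one way to do it, assuming the renamed `Year` column. `attacks_per_year` and `plot_attacks_per_year` are hypothetical helpers, and years without any recorded incident simply do not appear in the counts.

```python
import matplotlib.pyplot as plt
import pandas as pd


def attacks_per_year(frame):
    """Count incidents per year, ordered chronologically."""
    return frame["Year"].value_counts().sort_index()


def plot_attacks_per_year(frame):
    """Draw the yearly incident counts as a line and return the Axes."""
    counts = attacks_per_year(frame)
    fig, ax = plt.subplots(figsize=(15, 7))
    ax.plot(counts.index, counts.values, marker="o")
    ax.set_title("Count of terrorist activities each year")
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of attacks")
    return ax


# Toy frame for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({"Year": [1970, 1970, 1972]})
print(attacks_per_year(toy))
```

In the notebook, `plot_attacks_per_year(data)` followed by `plt.show()` would display the figure.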




##### 1. Why did you pick the specific chart?

A count plot (a bar chart of incidents per year) was chosen because it provides an excellent visual representation of the trend over time.

##### 2. What is/are the insight(s) found from the chart?

The insight that can be gained is the trend over the years. We can see whether the frequency of attacks is increasing, decreasing, or remaining relatively stable.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These insights are crucial for anticipating future trends, which could help law enforcement and security agencies plan resources and strategies. However, if the trend shows an increase in terrorist activities, this could lead to a negative impact as it indicates a growing problem.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(15,7))
sns.countplot(data=data, x='Region')
plt.title('Count of terrorist activities by region')
plt.xticks(rotation=90)
plt.show()
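
The bars are easier to compare when they are sorted and accompanied by each region's share of all attacks. The following sketch assumes the renamed `Region` column; `region_share` is a hypothetical helper and the toy region labels are placeholders.

```python
import pandas as pd


def region_share(frame):
    """Return attack counts and percentage share per region, largest first."""
    counts = frame["Region"].value_counts()  # already sorted descending
    share = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"attacks": counts, "share_percent": share})


# Toy frame for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({"Region": ["A", "A", "A", "B"]})
print(region_share(toy))
```

Passing `order=region_share(data).index` to `sns.countplot` would draw the regions from most to least affected.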

##### 1. Why did you pick the specific chart?

A bar plot is suitable for categorical data and helps in comparing the number of terrorist activities in each region.



##### 2. What is/are the insight(s) found from the chart?

We can see which regions experience the most terrorist activities, providing insight into geographical hotspots of terrorism.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This information is useful for focusing resources and counter-terrorism efforts on the most affected areas. A high frequency of attacks in a particular region could discourage investment and tourism, leading to negative growth.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(15,7))
sns.lineplot(data=data, x="Year", y="Killed", estimator="sum")
plt.title("Number of people killed by terror attack")
plt.xticks(rotation=90)
plt.show()
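
The yearly totals behind this chart can also be computed directly with a group-by, which makes the numbers available for inspection and avoids the bootstrapped confidence band that `sns.lineplot` draws by default. This is a sketch assuming the renamed `Year` and `Killed` columns; `fatalities_per_year` is a hypothetical helper.

```python
import pandas as pd


def fatalities_per_year(frame):
    """Sum confirmed fatalities per year; missing values are skipped by sum()."""
    return frame.groupby("Year")["Killed"].sum()


# Toy frame for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({"Year": [1970, 1970, 1971], "Killed": [1, None, 5]})
print(fatalities_per_year(toy))
```

The resulting Series can be plotted with `fatalities_per_year(data).plot()` or inspected with `.idxmax()` to find the deadliest year.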

##### 1. Why did you pick the specific chart?

A line plot was chosen to observe the trend of casualties over time.

##### 2. What is/are the insight(s) found from the chart?

The insight is the severity of terrorist activities over the years in terms of human lives lost.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This could influence policy making, disaster management planning, insurance, and healthcare provisions. An increasing trend could lead to negative growth by discouraging population stability, investment, and development.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(17,7))
sns.countplot(data=data, x='AttackType')
plt.title('Attack types')
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot is used to compare the frequencies of different categories - in this case, attack types.

##### 2. What is/are the insight(s) found from the chart?

We can learn about the most commonly used methods in terrorist attacks.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can help in developing and implementing measures to prevent and respond to these specific types of attacks. If certain types of attacks are prevalent, it may signify a failure to adequately address those threats, possibly leading to negative impacts.
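
Frequency alone does not show which attack types are the most lethal, so a small table combining counts with fatalities may help when prioritising responses. The sketch below assumes the renamed `AttackType` and `Killed` columns; `attack_type_stats` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd


def attack_type_stats(frame):
    """Per attack type: number of attacks, total and mean fatalities."""
    stats = frame.groupby("AttackType")["Killed"].agg(
        attacks="size",        # all incidents, including those with unknown fatalities
        total_killed="sum",    # missing values are skipped
        mean_killed="mean",    # average over incidents with a known count
    )
    return stats.sort_values("attacks", ascending=False)


# Toy frame for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({
    "AttackType": ["Bombing", "Bombing", "Assassination"],
    "Killed": [2, 4, 1],
})
print(attack_type_stats(toy))
```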

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(15,7))

# Get the count of each group
group_data = data[data['Group'] != 'Unknown']['Group'].value_counts().head(10)

# Use this count for plotting
sns.barplot(x=group_data.index, y=group_data.values)

plt.title('Top 10 terrorist groups with the highest number of attacks')
plt.xticks(rotation=90)
plt.xlabel('Group')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

A bar plot is suitable for comparing the number of attacks by different terrorist groups.

##### 2. What is/are the insight(s) found from the chart?


We can identify which groups are responsible for the most terrorist activities.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This information could be important for intelligence agencies in prioritizing threats and focusing their counter-terrorism efforts. If a particular group is increasingly active, it could contribute to instability and negative growth.
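
Ranking groups by fatalities instead of by number of attacks can give a different picture of which groups to prioritise. A minimal sketch, assuming the renamed `Group` and `Killed` columns and the same `'Unknown'` label that the chart code filters out; `top_groups_by_fatalities` is a hypothetical helper and the toy group names are placeholders.

```python
import pandas as pd


def top_groups_by_fatalities(frame, n=10):
    """Return the n named groups with the highest total fatalities."""
    known = frame[frame["Group"] != "Unknown"]
    totals = known.groupby("Group")["Killed"].sum()
    return totals.sort_values(ascending=False).head(n)


# Toy frame for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({
    "Group": ["Unknown", "Group A", "Group A", "Group B"],
    "Killed": [100, 1, 2, 5],
})
print(top_groups_by_fatalities(toy))
```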

## ***5. Solution to Business Objective***

What do you suggest the client do to achieve the business objective?

Based on the exploratory data analysis conducted on the Global Terrorism Dataset, there are several recommendations that could be provided to a client interested in using this information to decrease the impact of terrorism, and thereby meet the stated business objective.

Focus on Hotspot Regions: The regions with the highest frequencies of terrorist activities should be prioritized for intervention efforts. These regions may need more robust security measures, targeted socio-economic programs to address root causes of terrorism, or more substantial international assistance.

Understand Yearly Trends: Keeping track of the rise or fall of terrorist incidents over the years could help forecast potential future threats and adjust counter-terrorism strategies accordingly.

Prioritize Major Threat Groups: Our analysis shows that certain terrorist groups are more active than others. Intelligence efforts should be concentrated on these high-impact groups to prevent future attacks.

Target Most Common Attack Types: Understanding the most common types of attacks used by terrorists can help in developing preventive measures and response strategies. For instance, if bombings are the most common attack type, more resources could be directed towards bomb detection and disposal.
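
The first two recommendations combine region and time, which the single-variable charts above do not show together. A cross-tabulation is one way to look at both at once. This is only a sketch, assuming the renamed `Region` and `Year` columns; `attacks_by_region_and_year` is a hypothetical helper and the toy region labels are placeholders.

```python
import pandas as pd


def attacks_by_region_and_year(frame):
    """Return a Region x Year table of incident counts."""
    return pd.crosstab(frame["Region"], frame["Year"])


# Toy frame for illustration only; in the notebook, pass `data` instead.
toy = pd.DataFrame({
    "Region": ["R1", "R1", "R2"],
    "Year": [1970, 1971, 1970],
})
print(attacks_by_region_and_year(toy))
```

Passing the resulting table to `sns.heatmap` would give a multivariate view of where and when activity was concentrated.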

# **Conclusion**

The Exploratory Data Analysis (EDA) conducted on the Global Terrorism Dataset provided significant insights into trends and patterns in global terrorism from 1970 through 2017. With the help of the Python libraries Pandas, Matplotlib, Seaborn, and NumPy, we were able to handle, visualize and interpret complex data related to terrorist activities.

Through this analysis, we identified trends over time, regional hotspots, dominant terrorist groups, and preferred modes of attacks. All these findings are crucial for devising effective counter-terrorism strategies and interventions.

The process underscored the power of data-driven decision-making. By using EDA, we were able to transform raw data into meaningful insights. For instance, understanding that certain regions are more prone to terrorist attacks or that specific terrorist groups are more active allows security agencies and policymakers to allocate resources more efficiently, thereby potentially saving lives and property.

However, while this data analysis provides a robust foundation, it's important to acknowledge that addressing terrorism requires more than just understanding past data. It necessitates a comprehensive approach that includes current intelligence, geopolitical considerations, and on-the-ground realities.

To conclude, this project demonstrates the potential of data analysis in informing and shaping counter-terrorism efforts. It provides a useful starting point for further study and action, emphasizing the importance of continuous data collection, analysis, and interpretation in tackling global security challenges like terrorism.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***