# **Project Name**    -



##### **Project Type**    - EDA(Global Terrorism Dataset)
##### **Contribution**    - Individual
##### **Author -** Kapil Musle

# **Project Summary -**

The objective of this project is to conduct an exploratory data analysis (EDA) of the Global Terrorism Database (GTD), an open-source dataset that contains comprehensive information on both domestic and international terrorist attacks occurring globally from 1970 to 2017. Developed and maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, the database encompasses details of over 180,000 recorded terrorist incidents. The project's principal aim is to dig deep into this expansive dataset, identify significant trends, patterns, and insights pertaining to terrorism-related activities, and present these discoveries visually for an enhanced understanding.

A critical aspect of this project is the extensive use of Python libraries tailored for data analysis and visualization. The cornerstone of data manipulation, including loading the dataset, cleaning data, and executing sophisticated aggregation operations, will be the Pandas library. This powerful, high-performance tool offers efficient data structures and makes the handling of large datasets effortless.

To facilitate advanced numerical operations and speed up computation, the project employs the NumPy library. Given its proficiency in handling multi-dimensional arrays and matrices, NumPy is the perfect companion for data processing operations.

The project doesn't stop at numerical data analysis; it brings the extracted insights to life through vivid, informative visualizations, courtesy of the Matplotlib and Seaborn libraries. These libraries provide an array of visualization styles, enabling the display of data in ways that are both appealing and informative. From bar plots and scatter plots to histograms and heatmaps, the project will utilize a minimum of five different visualizations to reveal relationships between variables and provide a graphical representation of the dataset's characteristics.

Exploring the GTD through this project will pave the way for an intricate understanding of terrorism patterns over the past decades. The goal is to unveil potential trends in attack frequency, most targeted countries, preferred methods of attack, types of weapons used, casualties, and the evolution of terrorist organizations, among other relevant dimensions.

By examining these factors, the project aims to provide a detailed overview of global terrorism trends, informing counter-terrorism strategies and policies. Additionally, the findings may also help understand the characteristics of regions prone to attacks and the reasons behind their vulnerability.

In conclusion, this project offers a data-driven exploration into the dark world of terrorism, aiming to shed light on the complex patterns hidden within the enormity of the GTD. The end product of this project will be an array of valuable insights that have the potential to contribute substantially to ongoing counter-terrorism efforts and inform future research in this field. The combination of data manipulation, numerical computation, and graphic visualization is expected to yield a robust and comprehensive exploration of the dataset, leading to substantial key findings pertaining to global terrorism.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


By employing exploratory data analysis (EDA) methods on the Global Terrorism Database (GTD), the goal is to pinpoint global hotspots of terrorism and understand the changing trends in terrorist activities. Through this analysis, we aim to extract insights pertinent to security concerns, which could play a crucial role in formulating effective counter-terrorism strategies.

As a security/defence analyst, identify the hot zones of terrorism. What security issues and insights can you derive through EDA?

#### **Define Your Business Objective?**

The business objective of this project is to leverage the data contained within the Global Terrorism Database (GTD) to derive actionable insights into terrorist activities worldwide from 1970 to 2017. By conducting a comprehensive exploratory data analysis (EDA), the goal is to identify the key patterns, trends, and correlations related to global terrorism, thereby enabling better-informed decision-making for security analysts, policy-makers, and counter-terrorism agencies.

Specifically, the objectives include:

*   **Identification of global "hot zones" for terrorist activities:** By determining the most affected regions, we can better understand where resources might be best allocated to prevent future attacks.
*   **Analysis of frequency and intensity of attacks:** Understanding how these have evolved over time can provide insights into the changing dynamics of terrorism and allow for more accurate risk assessments.
*   **Examination of methodologies and weapons used in attacks:** This can shed light on the operational preferences of terrorist organizations and potentially provide early indicators of future threats.
*   **Assessment of casualty trends:** This can help identify the most devastating types of attacks and allow for targeted response planning to minimize human loss.
*   **Unveiling patterns related to terrorist organizations:** This can potentially aid in understanding their strategies, thereby supporting intelligence agencies in their counter-terrorism efforts.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
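As a rough illustration of the UBM rule, the sketch below runs one univariate, one bivariate and one multivariate summary on a small made-up table. The column names follow the renamed columns used later in this notebook; the rows themselves are hypothetical and are not taken from the GTD.

```python
import pandas as pd

# Hypothetical rows, only meant to show the three levels of analysis
toy = pd.DataFrame({
    "Year": [2013, 2013, 2014, 2014, 2014],
    "Region": ["South Asia", "Western Europe", "South Asia",
               "South Asia", "Western Europe"],
    "Attack Type": ["Bombing/Explosion", "Armed Assault", "Bombing/Explosion",
                    "Armed Assault", "Bombing/Explosion"],
    "Killed": [4, 0, 10, 2, 1],
})

# U - Univariate: distribution of a single categorical variable
attack_type_counts = toy["Attack Type"].value_counts()

# B - Bivariate (Numerical - Categorical): fatalities summed per region
killed_by_region = toy.groupby("Region")["Killed"].sum()

# M - Multivariate: fatalities broken down by year AND region
killed_pivot = toy.pivot_table(index="Year", columns="Region",
                               values="Killed", aggfunc="sum", fill_value=0)

print(attack_type_counts, killed_by_region, killed_pivot, sep="\n\n")
```

Each of these summaries can then be passed to a matching chart type, for example a bar chart for the univariate counts and a heatmap for the pivot table.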





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive/')

In [None]:
import pandas as pd
# Load the dataset from a CSV file with 'latin1' encoding
df = pd.read_csv('/content/drive/MyDrive/Project/Mod 2 (Numerical prog in python)/Global Terrorism Data.csv', encoding='latin1')


### Dataset First View

In [None]:
# Dataset First Look
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, cols = df.shape
print(f"The dataset has {rows} rows and {cols} columns")

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_rows = df.duplicated().sum()
print(f"There are {duplicate_rows} duplicate rows in the dataset")

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = df.isnull().sum()
print(missing_values)

In [None]:
# Visualizing the missing values
plt.figure(figsize=(8, 6))
sns.heatmap(df.isnull(), cmap='viridis', cbar=False)
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

**Dataset Size:** The dataset is quite large, containing 181,691 entries or rows.

**Feature Quantity:** The dataset contains 135 features or columns.

**Data Types:** The dataset has a mix of data types. There are 55 features with floating point numbers (float64), 22 features with integers (int64), and 58 features with objects (object). The object datatype in pandas typically means the column contains string (text) data.

**Memory Usage:** The dataset uses over 187.1 MB of memory.

**Missing Values:** Some columns have a large number of missing values. For example, the 'approxdate' column has 172,452 missing values and the 'related' column has 156,653 missing values. However, several columns do not have any missing values, such as 'eventid', 'iyear', 'imonth', 'iday', 'INT_LOG', 'INT_IDEO', 'INT_MISC', and 'INT_ANY'.

**Duplicate Values:** There are no duplicate values in the dataset.
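The raw missing-value counts above are easier to compare when expressed as a share of all rows. The following is a minimal sketch of such a summary; `missing_summary` is a hypothetical helper, and the toy frame only imitates a few GTD column names with made-up values.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the GTD frame (hypothetical values, not real data)
toy = pd.DataFrame({
    "eventid": [1, 2, 3, 4],
    "approxdate": [np.nan, np.nan, np.nan, "1970-01-05"],
    "related": [np.nan, "2, 3", np.nan, "5"],
})

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the missing count and percentage per column, worst columns first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(frame) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)

summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(df)` on the full dataset should list columns such as 'approxdate' and 'related' near the top, which helps decide which columns to drop before the analysis.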

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns = df.columns

print("Columns in the dataset: ")

for column in columns:
  print(column)

In [None]:
# Dataset Describe:  dataset summary is provided in the Data wrangling section

### Variables Description

**eventid:** Unique ID for each event or terrorist attack.

**iyear:** Year the terrorist attack occurred.

**imonth:** Month the terrorist attack occurred.

**iday:** Day the terrorist attack occurred.

**country_txt:** Name of the country where the terrorist attack occurred.

**region_txt:** Name of the region where the terrorist attack occurred.

**city:** City where the terrorist attack occurred.

**attacktype1_txt:** The general method of attack employed.

**target1:** The specific person, building, installation, etc., that was targeted.

**nkill:** Number of confirmed fatalities for the incident.

**nwound:** Number of confirmed non-fatal injuries.

**gname:** Name of the group that carried out the attack.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_country = df['country_txt'].unique()
print(unique_country)

print()                 # this statement leaves a blank line in the output.

unique_years = df['iyear'].unique()
print(unique_years)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Selecting the relevant columns for our analysis.
# .copy() returns an independent DataFrame, so the in-place edits below do not act on a slice of the original.
df = df[["iyear","imonth","iday","country_txt","region_txt","provstate","city",
       "latitude","longitude","location","summary","attacktype1_txt","targtype1_txt",
       "gname","motive","weaptype1_txt","nkill","nwound","addnotes"]].copy()
df

In [None]:
# renaming the columns to their appropriate names:
df.rename(columns={"iyear":"Year","imonth":"Month","iday":"Day","country_txt":"Country",
                   "region_txt":"Region","provstate":"Province/State","city":"City",
                   "latitude":"Latitude","longitude":"Longitude","location":"Location",
                   "summary":"Summary","attacktype1_txt":"Attack Type","targtype1_txt":"Target Type",
                   "gname":"Group Name","motive":"Motive","weaptype1_txt":"Weapon Type",
                   "nkill":"Killed","nwound":"Wounded","addnotes":"Add Notes"},inplace=True)

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.info()

In [None]:
# checking the shape of the modified dataset.
rows, cols = df.shape
print(f"There are {rows} rows and {cols} columns in the modified dataset")

In [None]:
# Checking for null values in the modified dataset
df.isnull().sum()

In [None]:
# Filling missing values in the Killed and Wounded columns with 0 (missing counts are treated as zero)
df['Killed'] = df['Killed'].fillna(0)
df['Wounded'] = df["Wounded"].fillna(0)

In [None]:
# Creating a new column "Casualty"
df['Casualty'] = df['Killed']+df['Wounded']
df.tail()

In [None]:
#dataset describe:
df.describe()

### What all manipulations have you done and insights you found?



*   Selected 19 relevant columns out of the original 135 for the analysis.

*   Renamed the selected columns to descriptive names.

*   Filled missing values in the "Killed" and "Wounded" columns with 0.

*   Created a new column named "Casualty" (Killed + Wounded).

*   The data covers terrorist activity from 1970 to 2017.

*   Maximum number of people killed in a single event: 1570

*   Maximum number of people wounded in a single event: 8191

*   Maximum number of total casualties in a single event: 9574
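The wrangling steps listed above can also be collected into one function that leaves its input untouched. This is only a sketch under assumptions: `wrangle` and `RENAMES` are hypothetical names, the toy frame uses a handful of the raw GTD column names with made-up values, and missing counts are treated as zero exactly as in the cells above.

```python
import numpy as np
import pandas as pd

# Toy rows using raw GTD column names (values are made up)
raw = pd.DataFrame({
    "iyear": [1970, 2014, 2017],
    "country_txt": ["Iraq", "Iraq", "Peru"],
    "nkill": [1.0, np.nan, 3.0],
    "nwound": [np.nan, 5.0, 2.0],
    "extra": ["x", "y", "z"],   # stands in for the columns that are dropped
})

# Hypothetical subset of the rename mapping used in this notebook
RENAMES = {"iyear": "Year", "country_txt": "Country",
           "nkill": "Killed", "nwound": "Wounded"}

def wrangle(frame: pd.DataFrame) -> pd.DataFrame:
    """Select, rename, fill and derive columns without mutating the input."""
    out = frame[list(RENAMES)].rename(columns=RENAMES)
    out[["Killed", "Wounded"]] = out[["Killed", "Wounded"]].fillna(0)
    out["Casualty"] = out["Killed"] + out["Wounded"]
    return out

clean = wrangle(raw)
print(clean)
```

Keeping the steps in a function makes it possible to rerun the notebook from any cell, because the raw frame is never overwritten.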

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

### **The storytelling and visualizations in this project will be based on these parameters:**


*   Year Wise Attacks.
*   Region wise Attacks.

*   Country Wise Attacks - Top 10.
*   City Wise Attacks - Top 10.

*   Terrorist Group wise Attacks.
*   'Attack Type' wise Attacks.

*   Target Type Wise Attacks.
*   Group + Country wise Attacks - Top 10.

*   Humanity affected(World-wide) by Terrorist Attacks from 1970 - 2017.


### **All of the above parameters will be calculated and visualized on the basis of:**

1.   Number of Attacks.
2.   Total Casualty.
3.   Total people Killed.
4.   Total people Wounded.
















#### **Chart - 1  --- Number of attacks each year**





In [None]:
# Grouping the data by year and counting the number of attacks in each year
attacks_by_year = df.groupby('Year').size()

# Plotting the number of attacks in each year
plt.figure(figsize=(12, 6))
attacks_by_year.plot(kind='bar', color='skyblue')
plt.title('Number of Terrorist Attacks Each Year (1970-2017)')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.show()


##### 1. Why did you pick the specific chart?

I chose a bar chart to visualize the number of attacks in each year for the following reasons:

1. **Comparison:** Bar charts are effective for comparing the number of attacks across different years. The length of each bar represents the number of attacks in a specific year, making it easy to compare values visually.

2. **Time Series Data:** Since the data spans multiple years (1970-2017), a bar chart is suitable for displaying this time series data in a clear and organized manner.

3. **Categorical Data:** Bar charts are commonly used to represent categorical data, such as years in this case. Each bar corresponds to a specific category (year), making it intuitive for viewers to understand the distribution of attacks over time.

4. **Clarity:** Bar charts are simple and straightforward, making them easy to interpret for a wide audience. They provide a quick visual summary of the number of attacks in each year without overwhelming viewers with complex visuals.

5. **Highlighting Trends:** Bar charts can help identify trends, spikes, or patterns in the data. By visualizing the number of attacks in each year, you can easily spot any significant changes or fluctuations over time.


##### 2. What is/are the insight(s) found from the chart?

(i). The highest number of attacks (16,903) occurred in 2014.

(ii). The lowest number of attacks (471) occurred in 1971.
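The peak and lowest years do not have to be read off the bars; they can be pulled directly from the grouped series. A minimal sketch on hypothetical rows (the real `attacks_by_year` series from the cell above can be used in the same way):

```python
import pandas as pd

# Hypothetical rows: one row per attack
toy = pd.DataFrame({"Year": [1970, 1970, 1971, 2014, 2014, 2014]})

# Same grouping as in the chart code: number of attacks per year
attacks_by_year = toy.groupby("Year").size()

peak_year = attacks_by_year.idxmax()   # year with the most attacks
low_year = attacks_by_year.idxmin()    # year with the fewest attacks

print(f"Most attacks: {attacks_by_year.max()} in {peak_year}")
print(f"Fewest attacks: {attacks_by_year.min()} in {low_year}")
```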

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



*   The chart shows that terrorist activity has increased substantially since about 2005.
*   Countries and their military organisations can use this trend to develop strategies proactively and mitigate terrorism risks.
*   The chart can prompt an evaluation of the readiness of military organizations to respond to the evolving threat landscape.
*   Analyzing the number of terrorist attacks over time can have both positive and negative impacts on businesses, depending on how the findings are used.

#### **Chart - 2 Total Casualties (Killed + Wounded) in each Year**

In [None]:
cas= df[['Year','Killed']].groupby('Year').sum()
cas.head()

In [None]:
yw=df[["Year","Wounded"]].groupby("Year").sum()
yw.head()

In [None]:
# Group the data by year and calculate the total casualties for each year
total_casualties_per_year = df.groupby('Year')[['Killed', 'Wounded']].sum().sum(axis=1)

# Create a bar chart to visualize the total casualties per year
plt.figure(figsize=(12, 6))
total_casualties_per_year.plot(kind='bar', color='skyblue')
plt.title('Total Casualties (Killed + Wounded) per Year')
plt.xlabel('Year')
plt.ylabel('Total Casualties')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', linewidth=0.5)
plt.show()


##### 1. Why did you pick the specific chart?

The bar chart was chosen as a visualization method for displaying the total casualties (killed + wounded) per year for the following reasons:

**Comparison:** Bar charts are effective for comparing the total casualties across different years at a glance. The length of each bar represents the magnitude of casualties in a specific year, making it easy to compare the values visually.

**Clear Representation:** Bar charts provide a clear and straightforward representation of the data. The axis labels and bar heights make it easy for viewers to interpret the total casualties for each year.

**Categorical Data:** Bar charts are well-suited for displaying categorical data, such as years, where each bar represents a distinct category (year) and the height of the bar represents the value (total casualties).

**Ease of Interpretation:** Bar charts are familiar to most people and are easy to interpret, making them a popular choice for presenting data in a visually appealing and understandable manner.

##### 2. What is/are the insight(s) found from the chart?

*   The number of casualties (killed + wounded) was highest in 2014, followed by 2015, 2016 and 2017.
*   The number of casualties (killed + wounded) was lowest in 1970 and 1971.
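Total casualties rise partly because the number of attacks rises, so it can help to look at casualties per attack alongside the totals. The sketch below is one possible way to compute this with a named aggregation; the rows are hypothetical and the `casualties_per_attack` column is an illustrative addition, not part of the original analysis.

```python
import pandas as pd

# Hypothetical rows: one row per attack
toy = pd.DataFrame({
    "Year": [1970, 1970, 2014, 2014, 2014],
    "Killed": [1, 0, 10, 2, 0],
    "Wounded": [0, 2, 5, 8, 0],
})
toy["Casualty"] = toy["Killed"] + toy["Wounded"]

# Number of attacks and total casualties per year in a single pass
yearly = toy.groupby("Year").agg(
    attacks=("Casualty", "size"),
    casualties=("Casualty", "sum"),
)

# Average severity of an attack in each year
yearly["casualties_per_attack"] = yearly["casualties"] / yearly["attacks"]
print(yearly)
```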

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Here are some ways in which these insights could be beneficial:

**Risk Assessment and Mitigation:** Understanding the trends in terrorist attacks and the total number of people killed can help businesses assess and mitigate risks associated with operating in high-risk areas. This information can inform security strategies, crisis response plans, and risk management practices.

**Resource Allocation:** Businesses can use these insights to allocate resources more effectively, such as investing in security measures, insurance, or emergency response systems in regions with a higher incidence of terrorist attacks.

**Market Opportunities:** Companies in the security, defense, or risk assessment industries can leverage this information to identify market opportunities for providing services and products that address security and risk-related concerns.

Regarding insights that could lead to negative growth, one potential scenario could be if the data reveals a significant increase in the total number of people killed in terrorist attacks in regions where a business operates. This could have adverse effects on business operations and growth for the following reasons:

**Security Concerns:** A rise in terrorist attacks and casualties could lead to increased security concerns for businesses operating in affected regions. This could result in higher security costs, disruptions to operations, and potential loss of consumer confidence.

**Economic Impact:** Persistent insecurity due to a high number of casualties in terrorist attacks can negatively impact the local economy, leading to reduced consumer spending, decreased investor confidence, and potential market instability.

**Reputation Risk:** Businesses operating in regions with high levels of violence and casualties may face reputational risks, as stakeholders could associate the company with the negative events occurring in the area.

#### **Chart - 3 Region wise Attacks from year 1970-2017 through area plot**

In [None]:

# Filter the DataFrame for the years 1970 to 2017
filtered_df = df[(df['Year'] >= 1970) & (df['Year'] <= 2017)]

# Group the data by year and region and count the number of attacks in each combination
region_year_attacks = filtered_df.groupby(['Year', 'Region']).size().reset_index(name='Attack_Count')

# Create a line plot with one line per region
# (the stacked area version is built from a cross-tabulation in the next cell)
plt.figure(figsize=(12, 6))
sns.set(style='whitegrid')
sns.lineplot(x='Year', y='Attack_Count', hue='Region', data=region_year_attacks, marker='o')
plt.title('Region-wise Attacks from 1970 to 2017')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')
plt.legend(title='Region', loc='upper left', bbox_to_anchor=(1, 1))
plt.tight_layout()
plt.show()


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Filter the data for the years 1970 to 2017
df_filtered = df[(df['Year'] >= 1970) & (df['Year'] <= 2017)]

# Create a cross-tabulation of 'Year' and 'Region' columns
cross_tab = pd.crosstab(df_filtered['Year'], df_filtered['Region'])

# Plot the cross-tabulation as a stacked area plot
cross_tab.plot(kind='area', stacked=True, figsize=(15, 6))
plt.title('Region-wise Attacks from 1970 to 2017')
plt.ylabel('Number of Attacks')
plt.show()


##### 1. Why did you pick the specific chart?

*   A stacked area chart is effective for illustrating the total number of attacks and how each region's contribution evolves over the years.
*   It provides a clear view of how the distribution of attacks among regions changes from year to year.

##### 2. What is/are the insight(s) found from the chart?

*   "Middle East & North Africa" suffered the highest number of terrorist attacks.
*   "Australasia & Oceania" has the lowest number of terrorist attacks.
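The regional ranking can be checked numerically from the same cross-tabulation that feeds the area plot. The sketch below uses hypothetical rows; `normalize="index"` is a standard `pd.crosstab` option that turns each year's counts into shares, which shows how a region's contribution shifts over time.

```python
import pandas as pd

# Hypothetical rows: one row per attack
toy = pd.DataFrame({
    "Year": [1970, 1970, 2014, 2014, 2014, 2014],
    "Region": ["Western Europe", "South Asia",
               "Middle East & North Africa", "Middle East & North Africa",
               "South Asia", "Middle East & North Africa"],
})

# Attack counts per year (rows) and region (columns)
cross_tab = pd.crosstab(toy["Year"], toy["Region"])

# Total attacks per region over the whole period, largest first
region_totals = cross_tab.sum().sort_values(ascending=False)

# Share of each year's attacks that fall in each region (rows sum to 1)
share = pd.crosstab(toy["Year"], toy["Region"], normalize="index")

print(region_totals)
print(share.round(2))
```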



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Impact:** Industries related to security technologies, risk assessment tools, and crisis management solutions may find increased demand in regions facing higher terrorism risks. Businesses in these sectors may explore opportunities to provide innovative solutions to address security challenges.

Knowledge about the regions with a lower incidence of terrorism, such as Australasia & Oceania, can inform market expansion strategies. Businesses may consider these regions as relatively safer environments for investment and market entry.

**Negative Impact:** Insights about regions with higher terrorism risks can impact insurance and financial planning. Businesses operating in high-risk areas may face higher insurance premiums, and financial planning may need to account for potential disruptions due to security incidents.

#### **Chart - 4 Country wise Attacks - Top 10**

In [None]:
ct=df["Country"].value_counts().head(10)
ct

In [None]:
# Load your dataset containing the terrorist attack data
# Assuming df is your DataFrame

# Filter the DataFrame for the years 1970 to 2017
filtered_df = df[(df['Year'] >= 1970) & (df['Year'] <= 2017)]

# Group the data by "Country" and count the number of attacks in each country
country_attacks = filtered_df['Country'].value_counts().head(10)

# Create a bar plot for the top 10 countries with the most attacks
plt.figure(figsize=(12, 6))
country_attacks.plot(kind='bar', color='skyblue')

plt.title('Top 10 Countries with the Most Terrorist Attacks (1970-2017)')
plt.xlabel('Country')
plt.ylabel('Number of Attacks')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The choice of a **bar chart** for visualizing country-wise terrorist attacks for the top 10 countries is based on several factors:

**Comparison of Values:** Bar charts are effective for comparing the numerical values of different categories, in this case, the number of attacks in each country. The length of each bar directly represents the value it represents, making it easy to compare the number of attacks across countries.

**Top-N Analysis:** Bar charts are commonly used to highlight the top N values in a dataset. In this scenario, we are interested in identifying and comparing the top 10 countries with the highest number of terrorist attacks, making a bar chart a suitable choice for this specific analysis.

**Simplicity and Clarity:** Bar charts are simple and intuitive, making them easy to interpret for a wide range of audiences. They provide a clear and straightforward visualization of the data without unnecessary complexity.

**Categorical Data:** Bar charts are well-suited for displaying categorical data, such as countries in this case. Each bar represents a distinct category (country), making it easy to understand which countries have the highest number of attacks.

Overall, the bar chart is a widely used and effective visualization tool for presenting categorical data and comparing values across different categories, making it a suitable choice for visualizing the top 10 countries with the most terrorist attacks in this context.

##### 2. What is/are the insight(s) found from the chart?

*   Iraq recorded the highest number of terrorist attacks among all countries.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Impact:** The government of the most affected country (Iraq) should invest in security infrastructure and services. Businesses operating in the defense and security sector might find opportunities for government contracts to address security concerns.

#### **Chart - 5 City wise Attacks - Top 10**

In [None]:
# Top 10 cities by number of attacks, excluding rows where the city is recorded as 'Unknown'
city = df.loc[df["City"] != "Unknown", "City"].value_counts().head(10)
city

In [None]:
# Chart - 5 visualization code

# Load your dataset containing the terrorist attack data
# Assuming df is your DataFrame

# Filter the DataFrame for the years 1970 to 2017
filtered_df = df[(df['Year'] >= 1970) & (df['Year'] <= 2017)]

# Count the attacks per city, excluding rows where the city is recorded as 'Unknown'
city_attacks = filtered_df.loc[filtered_df['City'] != 'Unknown', 'City'].value_counts().head(10)

# Create a horizontal bar plot for the top 10 cities with the most attacks
plt.figure(figsize=(10, 6))
city_attacks.sort_values().plot(kind='barh', color='salmon')

plt.title('Top 10 Cities with the Most Terrorist Attacks (1970-2017)')
plt.xlabel('Number of Attacks')
plt.ylabel('City')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The choice of a **horizontal bar chart** for visualizing the top 10 cities with the most terrorist attacks was made based on several considerations:

1. **Comparison of Values**: Horizontal bar charts are effective for comparing values across different categories. In this case, we are comparing the number of terrorist attacks in each city, making it easy to see which cities have the highest number of attacks.

2. **Space for Labels**: Since city names can be long and may not fit well on a vertical bar chart without rotation, a horizontal bar chart allows for longer labels to be displayed without overlap, making it easier to read the city names.

3. **Readability**: The horizontal orientation of the bars can sometimes make it easier for viewers to compare values, especially when there are multiple bars to compare.

4. **Top 10 Ranking**: A horizontal bar chart is a common choice for displaying rankings, such as the top 10 cities with the most attacks, as it visually emphasizes the differences in values.

5. **Aesthetics**: Horizontal bar charts can also be visually appealing and provide a different perspective compared to vertical bar charts, adding variety to the visualization.

Overall, the choice of a horizontal bar chart was made to effectively communicate the data on the top 10 cities with the most terrorist attacks in a clear and visually engaging manner, considering the specific characteristics of the data and the need for easy comparison and readability.

##### 2. What is/are the insight(s) found from the chart?

**Baghdad**, the capital of Iraq, suffered the highest number of terrorist attacks.

**Athens**, the capital of Greece, has the fewest attacks among the top 10 cities.
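A top-10 ranking is only meaningful if placeholder labels are kept out of it. The sketch below shows one way to do that by value rather than by position; `top_n` is a hypothetical helper and the assumption that the placeholder label is spelled 'Unknown' should be checked against the actual column.

```python
import pandas as pd

def top_n(series: pd.Series, n: int = 10, exclude=("Unknown",)) -> pd.Series:
    """Return the n most frequent values, ignoring the labels in `exclude`."""
    kept = series[~series.isin(exclude)]
    return kept.value_counts().head(n)

# Hypothetical rows: one row per attack, with a placeholder label and a missing value
cities = pd.Series(["Unknown", "Unknown", "Unknown",
                    "Baghdad", "Baghdad", "Lima", None])

top_cities = top_n(cities, n=2)
print(top_cities)
```

Filtering by label keeps the ranking correct even if the placeholder is not the most frequent value, which a positional slice such as `[1:11]` cannot guarantee.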

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Impact:** Companies providing emergency response and recovery services might find an increased demand for their services in areas prone to terrorist attacks. This could lead to positive business growth opportunities.

**Negative Impact:** Frequent terrorist attacks in a region, especially the capital city, can have a significant negative impact on the tourism and hospitality industry. Businesses in these sectors may experience reduced tourist inflow and face challenges in attracting investments.

#### **Chart - 6 Method Of Attack**

In [None]:
at=df["Attack Type"].value_counts()
at

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Create a pie chart for types of attacks
attack_counts = df['Attack Type'].value_counts()
plt.figure(figsize=(10, 6))
plt.pie(attack_counts, labels=attack_counts.index, autopct='%1.1f%%', startangle=140, colors=plt.cm.tab20.colors)
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Types of Attacks', fontsize=12)
plt.show()


In [None]:
# Chart - 6 visualization code (bar chart)
at.plot(kind="bar",color="red",figsize=(15,6))
plt.title("Types of Attacks",fontsize=13)
plt.xlabel("Attack Types",fontsize=13)
plt.xticks(fontsize=12)
plt.ylabel("Number of Attacks",fontsize=13)
plt.show()

##### 1. Why did you pick the specific chart?

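**Pie charts** are good for showing each category's share of the whole and for emphasizing dominant categories. Here, the pie chart highlights which methods of attack account for the largest share of incidents, and the bar chart beneath it makes the absolute counts easy to compare.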

##### 2. What is/are the insight(s) found from the chart?

The pie and bar charts above both show that Bombing/Explosion is the most frequently used method of attack.
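The percentages printed on the pie chart can be computed directly, and very small slices can be merged so that the labels do not overlap. This is a sketch under assumptions: the rows are hypothetical, and both the 15% threshold and the "Other" label are illustrative choices rather than part of the original analysis.

```python
import pandas as pd

# Hypothetical rows: one row per attack
attack_types = pd.Series(
    ["Bombing/Explosion"] * 6 + ["Armed Assault"] * 3 + ["Hijacking"] * 1
)

# Share of each attack type (fractions that sum to 1)
shares = attack_types.value_counts(normalize=True)

# Merge categories below the threshold into a single "Other" slice
threshold = 0.15
large = shares[shares >= threshold]
small = shares[shares < threshold]
grouped = pd.concat([large, pd.Series({"Other": small.sum()})])

print((grouped * 100).round(1))
```

The `grouped` series can be passed to `plt.pie` in place of the raw counts when a chart has many thin slices.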

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insight that the most common method of attack is "Bombing/Explosion" can have both positive and negative implications for businesses, depending on the context and industry. Here are some considerations:

**Positive Business Impact:**
1. **Security Investments**: Understanding that bombings and explosions are the most common methods of attack can prompt businesses to invest in robust security measures to protect their assets, employees, and customers. This can lead to increased security spending, which may enhance overall safety and mitigate potential risks.

2. **Crisis Management Planning**: Businesses can use this insight to develop comprehensive crisis management plans that specifically address the threat of bombings and explosions. Being prepared for such scenarios can help minimize disruptions to operations and maintain business continuity during crises.

3. **Market Opportunities**: Businesses that offer security solutions, such as surveillance systems, threat detection technologies, or security consulting services, may find increased demand for their products and services in response to the prevalent threat of bombings and explosions.

**Negative Growth Implications:**
1. **Decreased Consumer Confidence**: Heightened awareness of bombings and explosions as common attack methods can instill fear and uncertainty among consumers. This may lead to decreased consumer confidence, reduced foot traffic in physical locations, and lower sales for businesses, particularly in high-risk areas.

2. **Increased Operational Costs**: Implementing enhanced security measures in response to the threat of bombings and explosions can result in increased operational costs for businesses. This may include expenses related to security personnel, equipment upgrades, insurance premiums, and compliance with regulatory requirements, potentially impacting profitability.

3. **Supply Chain Disruptions**: Terrorist attacks involving bombings and explosions can disrupt supply chains, leading to delays in production, distribution, and delivery of goods and services. Businesses that rely on efficient supply chains may experience negative growth due to disruptions caused by security concerns and logistical challenges.

#### **Chart - 7 Type of Target**

In [None]:
ta=df["Target Type"].value_counts()
ta

In [None]:
# Chart - 7 visualization code
import matplotlib.pyplot as plt
import seaborn as sns

# Set the figure size
plt.figure(figsize=(13, 6))

# Create the count plot for the 'Target Type' column
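# Pass the column as the keyword argument `x`; recent Seaborn versions
# do not accept the data vector as the first positional argument
sns.countplot(x=df['Target Type'], order=df['Target Type'].value_counts().index, palette='magma')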

# Rotate x-axis labels for better readability
plt.xticks(rotation=90)

# Set x-axis label
plt.xlabel('Type')

# Set plot title
plt.title('Type of Target')

# Display the plot
plt.show()


##### 1. Why did you pick the specific chart?

A **count plot** was chosen, created with `sns.countplot()` from the Seaborn library.

A count plot shows the number of observations in each category of a categorical variable. Here it displays how many incidents were recorded for each target type: the x-axis holds the categories and the y-axis holds the count for each category.

The `order` parameter sets the order in which the categories are drawn (here, most frequent first), and the `palette` parameter sets the colors of the bars.

Overall, a count plot is a useful visualization for understanding the distribution of categorical data.
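The link between `value_counts()` and the bars of a count plot can be sketched without plotting anything. This is a minimal illustration on a made-up frame (the rows in `toy` are hypothetical and are not taken from the GTD); only the column name `Target Type` comes from the notebook.

```python
import pandas as pd

# Toy data for illustration only; these rows are NOT taken from the GTD.
toy = pd.DataFrame({
    "Target Type": ["Military", "Business", "Military", "Police",
                    "Military", "Business"]
})

# value_counts() returns one count per category, sorted highest first.
# These counts are the bar heights a count plot would draw.
counts = toy["Target Type"].value_counts()

# The notebook passes this index as `order=` to sns.countplot(),
# which is why the bars run from most to least frequent.
bar_order = list(counts.index)

print(counts)
print(bar_order)
```

Without `order=`, the categories would be drawn in the order Seaborn encounters them, which makes ranking harder to read.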

##### 2. What is/are the insight(s) found from the chart?

Private citizens and property are the most frequently affected targets, followed by the military.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights derived from analyzing the types of targets, particularly the high number of incidents aimed at private citizens and property, can have both positive and negative impacts on businesses:

**Positive Business Impact:**
1. Increased Security Investments
2. Crisis Management Planning
3. Market Opportunities

**Negative Business Impact:**
1. Decreased Consumer Confidence
2. Heightened Operational Costs
3. Supply Chain Disruptions

In conclusion, while insights into the types of attacks targeting private citizens and property can prompt positive actions such as increased security investments and improved crisis management, they can also lead to negative consequences such as decreased consumer confidence, higher operational costs, and supply chain disruptions. Businesses must carefully balance addressing security risks with mitigating potential negative impacts to maintain their operations and reputation.

#### **Chart - 8 Terrorist Group wise Attacks - Top 10**

In [None]:
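# Skip the first entry (assumed to be "Unknown") and keep the next nine groups.
# Note: this slice yields nine groups; widen it to [1:11] if ten are wanted.
grp=df["Group Name"].value_counts().iloc[1:10]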
grp

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(15, 6))
grp.plot(kind="pie", autopct='%1.1f%%', colors=["red", "orange", "yellow", "green", "blue", "purple", "brown", "pink", "gray", "cyan"])

# Adding title and labels
plt.title("Terrorist Group-wise Attacks - Top 10", fontsize=13)
plt.xlabel("Terrorist groups", fontsize=13)
plt.ylabel("")  # Empty ylabel for better aesthetics

# Display the chart
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are good for emphasizing dominant or significant categories. In this case, the chart highlights the terrorist groups that carried out the highest number of attacks.

##### 2. What is/are the insight(s) found from the chart?

*   The Taliban carried out the most attacks.
*   Among the groups shown in the chart, Boko Haram carried out the fewest attacks.
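The percentages that `autopct` prints on a pie chart are relative to the slices that are plotted, not to every attack in the dataset. A minimal sketch with made-up counts shows the difference (the group names and numbers are hypothetical, not GTD values):

```python
import pandas as pd

# Toy counts for illustration only; NOT real GTD figures.
group_counts = pd.Series(
    {"Unknown": 50, "Group A": 30, "Group B": 15, "Group C": 5},
    name="attacks",
)

# Skip the first entry ("Unknown" here), as the notebook's slice does.
named = group_counts.iloc[1:]

# Share within the plotted slices: this is what autopct prints.
share_of_plotted = (named / named.sum() * 100).round(1)

# Share of every recorded attack, including "Unknown".
share_of_all = (named / group_counts.sum() * 100).round(1)

print(pd.DataFrame({"% of plotted": share_of_plotted,
                    "% of all": share_of_all}))
```

When reading the pie chart, the percentages therefore describe each group's share among the plotted groups only.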

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### **Chart - 9 Group + Country wise - Top10**

In [None]:
gca=df[["Group Name","Country"]].value_counts().drop("Unknown")
top_10_countries = gca.head(10)
top_10_countries

In [None]:
# Chart - 9 visualization code

plt.figure(figsize=(10, 10))
plt.pie(top_10_countries, labels=top_10_countries.index, autopct="%1.1f%%", startangle=140,  colors=["red", "yellow", "green", "orange", "blue", "purple", "brown", "pink", "gray", "cyan"])

# Adding a title
plt.title("Distribution of Attacks by a Particular Group in Top 10 Countries", fontsize=13)

# Display the plot
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are good for emphasizing dominant or significant categories. In this case, it helps to highlight the "Distribution of Attacks by a Particular Group in Top 10 Countries".

##### 2. What is/are the insight(s) found from the chart?

*   Most attacks: "Taliban" in "Afghanistan"
*   Fewest attacks among the ten pairs shown: "Boko Haram" in "Nigeria"
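Counting group and country pairs relies on `value_counts()` applied to two columns, which returns a Series indexed by (group, country) tuples. The sketch below uses a made-up frame (hypothetical rows, not GTD data) and drops the "Unknown" group by naming the index level explicitly, which is one way to make the intent of the `drop` call clearer:

```python
import pandas as pd

# Toy data for illustration only; these rows are NOT taken from the GTD.
toy = pd.DataFrame({
    "Group Name": ["Unknown", "Group A", "Group A", "Group B", "Unknown", "Group A"],
    "Country":    ["X",       "X",       "X",       "Y",       "Y",       "Z"],
})

# One count per (Group Name, Country) pair, sorted highest first.
pair_counts = toy[["Group Name", "Country"]].value_counts()

# Remove every pair whose group is "Unknown", whatever the country.
known_pairs = pair_counts.drop("Unknown", level="Group Name")

# The most frequent attributed pair, returned as a (group, country) tuple.
top_pair = known_pairs.idxmax()

print(known_pairs)
print(top_pair)
```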

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Based on this insight, foreign investment in Afghanistan cannot be recommended, because the country is heavily affected by Taliban attacks.

#### **Chart - 10 Geographical Patterns of Attacks**

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(12, 8))
attacks_heatmap = df.groupby(['Region', 'Country']).size().unstack().fillna(0)
sns.heatmap(attacks_heatmap, cmap='Reds')
plt.title('Geographical Patterns of Attacks')
plt.show()


##### 1. Why did you pick the specific chart?

A **heatmap** was chosen to represent the frequency of attacks across regions and countries. The color intensity indicates the relative frequency of attacks in each region-country combination, with darker colors representing higher frequencies.

Heatmaps are useful for visualizing data in a grid format, where the color intensity provides a quick way to understand patterns and relationships within the data. In this case, the heatmap will help visualize the geographical patterns of terrorist attacks based on the provided data.
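The grid behind the heatmap comes from a `groupby(...).size().unstack()` chain. A minimal sketch on a made-up frame (hypothetical regions and countries, not GTD data) shows the shape of that grid and how the rows can be summed to rank regions:

```python
import pandas as pd

# Toy data for illustration only; these rows are NOT taken from the GTD.
toy = pd.DataFrame({
    "Region":  ["R1", "R1", "R1", "R2", "R2"],
    "Country": ["A",  "A",  "B",  "C",  "C"],
})

# Rows are regions, columns are countries, cells are incident counts.
# Combinations that never occur become NaN and are filled with 0.
matrix = toy.groupby(["Region", "Country"]).size().unstack().fillna(0)

# Summing across each row gives a per-region total for ranking.
region_totals = matrix.sum(axis=1).sort_values(ascending=False)

print(matrix)
print(region_totals)
```

In this toy frame each country belongs to one region, so every column has a single non-zero cell and most of the grid is zero. If the real data has the same property, the per-region totals may be easier to read than the full grid.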

##### 2. What is/are the insight(s) found from the chart?

The Middle East and North Africa region is particularly prone to attacks.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### **Chart - 11 Trend of Terrorist Incidents Before and After the War on Terror**

In [None]:
# Chart - 11 visualization code
import matplotlib.pyplot as plt

# Filter data for incidents before and after the War on Terror
before_war = df[df['Year'] < 2001]
after_war = df[df['Year'] >= 2001]

# Group data by year and count the number of terrorist incidents
before_counts = before_war['Year'].value_counts().sort_index()
after_counts = after_war['Year'].value_counts().sort_index()

# Plotting the trend before and after the War on Terror
plt.figure(figsize=(12, 6))
plt.plot(before_counts.index, before_counts.values, label='Before War on Terror')
plt.plot(after_counts.index, after_counts.values, label='After War on Terror')
plt.axvline(x=2001, color='red', linestyle='--', label='Start of War on Terror')
plt.xlabel('Year')
plt.ylabel('Number of Terrorist Incidents')
plt.title('Trend of Terrorist Incidents Before and After the War on Terror')
plt.legend()
plt.grid(True)
plt.show()


##### 1. Why did you pick the specific chart?

A **line plot** was chosen for this chart.

A line plot is a type of chart that displays data points connected by straight line segments. It is commonly used to visualize trends over a continuous interval, such as time. In this case, the line plot is used to show the trend of terrorist incidents over time, specifically before and after the War on Terror.

In the line plot:
- The x-axis represents the years.
- The y-axis represents the number of terrorist incidents.
- Two lines are plotted: one for the number of incidents before the War on Terror and one for the number of incidents after the War on Terror.
- A vertical dashed line at the year 2001 marks the start of the War on Terror.

Line plots are effective for showing trends and patterns in data over time and are commonly used in various fields for visualizing time series data.
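The two lines can also be compared numerically, for example by the average number of incidents per year on each side of 2001. This is a minimal sketch on made-up years (hypothetical rows, not GTD data); the labels `before_2001` and `from_2001` are names chosen here for illustration.

```python
import pandas as pd

# Toy data for illustration only; these rows are NOT taken from the GTD.
toy = pd.DataFrame({"Year": [1999, 1999, 2000, 2001, 2001, 2001, 2002, 2002]})

# Incidents per year in chronological order.
per_year = toy["Year"].value_counts().sort_index()

# Same split as the notebook: strictly before 2001 vs. 2001 onwards.
before = per_year[per_year.index < 2001]
after = per_year[per_year.index >= 2001]

# Average incidents per year in each period.
summary = pd.Series({"before_2001": before.mean(), "from_2001": after.mean()})

print(per_year)
print(summary)
```

A year with no rows does not appear in `value_counts()` at all, so the line plot simply connects the neighbouring years across such a gap.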

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### **Chart - 12 - Correlation Heatmap**

In [None]:
# Chart - 12 visualization code
# Assuming 'df' is your DataFrame with the relevant columns
# Select only numeric columns for correlation calculation
numeric_df = df.select_dtypes(include=['float64', 'int64'])

# Create a correlation matrix
correlation_matrix = numeric_df.corr()

# Create a heatmap of the correlation matrix
fig, axes = plt.subplots(1, 1, figsize=(15, 6))
sns.heatmap(correlation_matrix, annot=True, ax=axes)
plt.show()


##### 1. Why did you pick the specific chart?

When dealing with a large number of variables, a correlation heatmap helps in performing multivariate analysis. It allows us to explore relationships between multiple pairs of variables simultaneously.

##### 2. What is/are the insight(s) found from the chart?

*   Positive correlation (values closer to 1) between variables indicates that as one variable increases, the other variable tends to increase as well.
*   Negative correlation (values closer to -1) suggests that as one variable increases, the other variable tends to decrease.
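The two cases above can be checked on a tiny made-up frame. This is a sketch only: the columns `x`, `up`, and `down` are hypothetical and do not correspond to GTD columns.

```python
import pandas as pd

# Toy data for illustration only; NOT GTD columns.
toy = pd.DataFrame({
    "x":    [1, 2, 3, 4, 5],
    "up":   [2, 4, 6, 8, 10],   # rises exactly in step with x
    "down": [10, 8, 6, 4, 2],   # falls exactly in step with x
})

# DataFrame.corr() computes pairwise Pearson correlation by default.
corr = toy.corr()

print(corr.round(2))
```

Here `x` and `up` correlate at 1 and `x` and `down` at -1, because both relationships are perfectly linear. Real columns usually fall somewhere in between, and a correlation near 0 only rules out a linear relationship.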

#### **Chart - 13 - Total Number of Terrorist Attacks per Country and Region (Bar Plot)**

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# 'df' is the DataFrame loaded and cleaned earlier in the notebook

# Create a figure with two subplots
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Plot for the total number of terrorist attacks per country
sns.barplot(x=df['Country'].value_counts()[:20].values, y=df['Country'].value_counts()[:20].index, ax=axes[0], palette='magma')
axes[0].set_title('Terrorist Attacks per Country')

# Plot for the total number of terrorist attacks per region
sns.barplot(x=df['Region'].value_counts().values, y=df['Region'].value_counts().index, ax=axes[1])
axes[1].set_title('Terrorist Attacks per Region')

plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Bar plots are effective for showing the frequency or count of occurrences for each category (in this case, countries and regions) in a clear and visually appealing manner. They help in identifying patterns, trends, and differences in the distribution of terrorist activities across various locations.
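The ranking that the bar plot displays can be computed once and reused, together with the share of all incidents that the top entries account for. This is a minimal sketch on a made-up frame (hypothetical countries, not GTD data), with a top 2 standing in for the notebook's top 20:

```python
import pandas as pd

# Toy data for illustration only; these rows are NOT taken from the GTD.
toy = pd.DataFrame({"Country": ["A", "A", "A", "B", "B", "C", "D"]})

# Count once and reuse the result for both axes of the bar plot.
country_counts = toy["Country"].value_counts()

# Keep the N most frequent countries (N = 2 here, 20 in the notebook).
top_n = country_counts.head(2)

# Fraction of all incidents that fall in the top-N countries.
top_share = top_n.sum() / country_counts.sum()

print(top_n)
print(f"Top-2 share of all incidents: {top_share:.1%}")
```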

##### 2. What is/are the insight(s) found from the chart?

We can derive several insights from the output:

**High-Risk Countries:** Identify countries with a high frequency of terrorist attacks. This information can help businesses assess the security risks associated with operating in these regions and develop appropriate security measures.

**Regional Trends:** Understand the distribution of terrorist activities across different regions. This insight can guide businesses in assessing regional security threats and tailoring their security strategies accordingly.

**Hotspots:** Identify specific countries or regions that serve as hotspots for terrorist activities. This information can be crucial for businesses to prioritize security investments and crisis management efforts in these high-risk areas.

**Regional Disparities:** Compare the number of attacks across different regions to identify disparities in terrorist activities. Understanding these variations can help businesses tailor their security measures based on regional threat levels.

**Opportunities for Security Solutions:** Recognize regions with a high frequency of attacks as potential markets for security solutions and services. Businesses can leverage this information to identify opportunities for offering security solutions tailored to the needs of these high-risk areas.

**Risk Assessment:** Use the insights to conduct a comprehensive risk assessment, considering both country-specific and regional factors that influence the security landscape. This assessment can inform decision-making processes related to security investments and crisis management strategies.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**To achieve the business objective of decreasing the impact of terrorism based on the exploratory data analysis conducted on the Global Terrorism Dataset, the following recommendations can be provided to the client:**

1. **Focus on Hotspot Regions**: Prioritize regions with the highest frequencies of terrorist activities for intervention efforts. Implement robust security measures, socio-economic programs addressing root causes, and seek international assistance in these areas to mitigate risks and enhance security.

2. **Understand Yearly Trends**: Monitor and analyze yearly trends in terrorist incidents to anticipate potential future threats. This data can aid in adapting and optimizing counter-terrorism strategies to effectively respond to evolving threats and ensure preparedness.

3. **Prioritize Major Threat Groups**: Identify and prioritize high-impact terrorist groups based on the analysis. Allocate intelligence and resources to monitor and counter these groups effectively, aiming to prevent future attacks and disrupt their operations.

4. **Target Most Common Attack Types**: Develop preventive measures and response strategies tailored to the most common types of terrorist attacks identified in the analysis. For example, if bombings are prevalent, invest in bomb detection technologies, training, and response protocols to enhance security and mitigate risks associated with this specific attack method.

By implementing these recommendations, the client can strategically leverage the insights from the data analysis to enhance security measures, optimize resource allocation, and proactively address terrorism threats, ultimately working towards the business objective of reducing the impact of terrorism and ensuring a safer environment for their operations and stakeholders.
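Recommendation 2 (monitoring yearly trends) could be supported by a simple year-over-year table. The sketch below is one possible approach rather than part of the original analysis, and the yearly counts are made up for illustration (not GTD figures):

```python
import pandas as pd

# Toy yearly incident counts for illustration only; NOT real GTD figures.
per_year = pd.Series([10, 12, 18, 9], index=[2010, 2011, 2012, 2013])

# Percentage change relative to the previous year (first year is NaN).
yoy_change = per_year.pct_change() * 100

# Two-year rolling mean to smooth out single-year spikes.
rolling_mean = per_year.rolling(window=2).mean()

trend = pd.DataFrame({
    "incidents": per_year,
    "yoy_change_%": yoy_change.round(1),
    "rolling_mean_2y": rolling_mean,
})
print(trend)
```

On the real data, `per_year` could be built with `df['Year'].value_counts().sort_index()`, as in Chart 11.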

# **Conclusion**

The Exploratory Data Analysis (EDA) conducted on the Global Terrorism Dataset provided significant insights into trends and patterns in global terrorism from 1970 through 2017. With the help of the Python libraries Pandas, Matplotlib, Seaborn, and NumPy, we were able to handle, visualize and interpret complex data related to terrorist activities.

Through this analysis, we identified trends over time, regional hotspots, dominant terrorist groups, and preferred modes of attacks. All these findings are crucial for devising effective counter-terrorism strategies and interventions.

To conclude, the main insights derived from the above EDA are:

1) The number of attacks has increased over time, and the number of people killed is higher than the number of attacks.

2) Iraq has the most attacks.

3) The Middle East and North Africa region is the most targeted.

4) Bombing/Explosion is the most common attack type.

5) Private citizens and property are the most frequent targets.

6) The Taliban and ISIL are the most active organisations.
