
# **Project Name**    - Global Terrorism Analysis




##### **Project Type**    - Exploratory Data Analysis (EDA)

# **Project Summary -**

The Global Terrorism Database (GTD) contains detailed information on over 180,000 terrorist attacks worldwide from 1970 to 2017. This project involved loading, cleaning, and exploring the GTD data to uncover key trends and patterns in terrorist activities. Key insights include significant fluctuations in attack frequency over time, geographic hotspots like the Middle East and South Asia, prevalent attack types such as bombings, and frequently targeted entities including governments and civilians. Visualizations such as count plots, area charts, bar plots, histograms, and heatmaps communicate these findings, which are valuable for policy makers, security agencies, researchers, and public awareness.









# **GitHub Link -**

https://github.com/Neetu-Verm/Data_Science-.git

# **Problem Statement**


The problem statement for the EDA of global terrorism data involves examining and analyzing a comprehensive dataset of terrorist incidents worldwide. The dataset includes information about many aspects of terrorist attacks, such as location, date, type of attack, target, number of casualties, and more. The goal of the EDA is to gain insights, discover patterns, and generate visualizations that help us better understand the nature and trends of global terrorism.








#### **Define Your Business Objective?**

To systematically analyze the Global Terrorism Database (GTD) to uncover trends, geographic hotspots, and patterns in terrorist activities, thereby providing valuable insights and actionable intelligence for policy makers, security agencies, researchers, and the general public to enhance decision-making and improve counter-terrorism strategies.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code and deployment-ready code will be a plus. Those students will be awarded some additional credits.

     The additional credits will have advantages over other students during Star Student selection.

     [ Note: Deployment-ready code is defined as a whole .ipynb notebook that executes in one go without a single error logged. ]

3.   Every piece of logic should have proper comments.
4.   You may add as many charts as you want. For each and every chart, answer the following questions:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import missingno as msno


### Dataset Loading

In [None]:
# Load Dataset
gtd_df = pd.read_csv('/content/Global Terrorism Data.csv',encoding='ISO-8859-1')

### Dataset First View

In [None]:
# Dataset First Look
print(gtd_df.head(5))

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
num_rows, num_cols = gtd_df.shape
print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_cols}")

### Dataset Information

In [None]:
# Dataset Info
gtd_df.info()

#### Duplicate Values

In [None]:
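# Count fully duplicated rows (duplicated() on its own only returns a boolean mask)
duplicate_count = gtd_df.duplicated().sum()
print(f"Number of duplicate rows: {duplicate_count}")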

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values_count = gtd_df.isnull().sum()
print(missing_values_count)


In [None]:
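# Visualizing the missing values
# Each highlighted cell is a missing entry; row tick labels are hidden because there are too many rows
plt.figure(figsize=(10,6))
sns.heatmap(gtd_df.isna(), cbar=False, yticklabels=False)
plt.xlabel('column')
plt.ylabel('row')
plt.title('Missing values per column')
plt.show()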


### What did you know about your dataset?

The dataset covers global terrorist incidents from 1970 to 2017, including attack details, casualties, and perpetrator information. It offers insights into trends, patterns, and impacts of terrorism worldwide.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

print(gtd_df.columns)


In [None]:
# Dataset Describe
print(gtd_df.describe())


### Variables Description

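Key variables include iyear, imonth and iday (date of the incident), country/country_txt and region/region_txt (numeric code and name of the incident location), city, attacktype1/attacktype1_txt (primary attack method), targtype1 (primary target type, with targtype2 and targtype3 for additional targets), and nkill and nwound (numbers killed and wounded).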









### Check Unique Values for each variable.

In [None]:
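# Check Unique Values for each variable.
# Number of distinct values per column (NaN is not counted)
print(gtd_df.nunique())

# Full arrays of unique values, stored in a dict keyed by column name for later inspection
unique_value = {}
for column in gtd_df.columns:
    unique_value[column] = gtd_df[column].unique()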


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
from sklearn.impute import SimpleImputer

# Handle missing values using the most frequent value for each column
imputer = SimpleImputer(strategy='most_frequent')
imputed_data = imputer.fit_transform(gtd_df)

# Check the shape of imputed data
print("Shape of imputed data:", imputed_data.shape)
print("Shape of original data:", gtd_df.shape)

# Convert back to a DataFrame. get_feature_names_out() returns the names of the columns the imputer
# actually kept (columns that are entirely empty are dropped), so names stay aligned with the data
gtd_df_imputed = pd.DataFrame(imputed_data, columns=imputer.get_feature_names_out(), index=gtd_df.index)

# fit_transform returns an object array for mixed-type data; restore numeric dtypes where possible
gtd_df_imputed = gtd_df_imputed.infer_objects()

# Report any columns dropped by the imputer
dropped = set(gtd_df.columns) - set(gtd_df_imputed.columns)
print("Columns dropped by the imputer:", dropped if dropped else "none")



### What all manipulations have you done and insights you found?

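Loaded the Global Terrorism Database, checked it for duplicate rows and missing values, and filled missing values with the most frequent value in each column using SimpleImputer. The imputed copy is stored in gtd_df_imputed. The charts below work on the original gtd_df and drop missing values wherever a chart needs complete data.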
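The most-frequent-value imputation described above can also be done in plain pandas, which keeps each column's dtype and name. The sketch below runs on a small hypothetical frame with GTD-style column names and invented values. It illustrates the technique and is not the notebook's SimpleImputer pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame using GTD-style column names; values are invented for illustration
toy = pd.DataFrame({
    "country_txt": ["Iraq", "Iraq", None, "India"],
    "nkill": [1.0, np.nan, 1.0, 4.0],
    "attacktype1_txt": ["Bombing/Explosion", None, "Armed Assault", "Bombing/Explosion"],
})

# mode() returns a frame of modal values; its first row holds one mode per column
# (NaN is ignored when finding the mode)
modes = toy.mode().iloc[0]

# fillna with a Series fills each column with its own value, aligned by column name
toy_imputed = toy.fillna(modes)

print(toy_imputed)
print("Remaining missing values:", int(toy_imputed.isna().sum().sum()))
```

On the real data the same two steps would apply to gtd_df. One caveat of this choice: filling nkill or nwound with their mode replaces unknown casualty counts with the most common value. Totals computed on the imputed frame therefore mix observed and imputed numbers.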








## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.subplots(figsize=(25,8))
sns.countplot(x='iyear',data=gtd_df,ec='black')
plt.xticks(rotation=90)
plt.xlabel('year',fontsize=20)
plt.ylabel('count',fontsize=20)
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?


I chose a countplot to visualize the yearly frequency of terrorist attacks, highlighting trends and fluctuations over time.




##### 2. What is/are the insight(s) found from the chart?

The chart reveals significant fluctuations in the frequency of terrorist attacks over the years, with notable peaks during certain periods. This indicates periods of heightened global terrorism activity.
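The peaks and drops in the chart can also be quantified directly. A countplot only draws years that appear in the data, so a year with no records would silently vanish from the axis. Reindexing over the full year range avoids that. This is a sketch on hypothetical toy years; on the real data, `years` would be gtd_df['iyear'].

```python
import pandas as pd

# Hypothetical toy sample of incident years (invented values, not GTD data)
years = pd.Series([1970, 1970, 1972, 1972, 1972, 1975], name="iyear")

# Count incidents per year; sort_index puts the years in chronological order
per_year = years.value_counts().sort_index()

# Reindex over the full year range so years with no incidents appear as 0
full_range = range(per_year.index.min(), per_year.index.max() + 1)
per_year = per_year.reindex(full_range, fill_value=0)

# Year-over-year change highlights the rises and falls seen in the chart
yoy_change = per_year.diff()

peak_year = per_year.idxmax()
print(per_year)
print("Peak year:", peak_year, "with", per_year.max(), "incidents")
```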








##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the gained insights can help create a positive business impact by informing policy makers and security agencies about trends in terrorism, aiding in resource allocation, threat assessment, and the development of targeted counter-terrorism strategies.








#### Chart - 2

In [None]:
# Chart - 2 visualization
a=gtd_df.iyear
b=gtd_df.region_txt
pd.crosstab(a,b).plot(kind='area',figsize=(20,6))
plt.xlabel('year')
plt.ylabel('count')
plt.title('Terrorist activities by region in each year')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

I chose an area chart to show the distribution and trends of terrorist activities across different regions over time.

##### 2. What is/are the insight(s) found from the chart?

The chart highlights regional trends in terrorism, showing which regions experienced increases or decreases in attacks over the years.
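Because the area chart stacks raw counts, a region's band can grow simply because the total number of attacks grew. Converting each year's row into shares separates a region's relative rise or fall from the overall trend. A minimal sketch on hypothetical toy incidents; on the real data the two inputs would be gtd_df['iyear'] and gtd_df['region_txt'].

```python
import pandas as pd

# Hypothetical toy incidents (invented values, not GTD data)
toy = pd.DataFrame({
    "iyear": [2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "region_txt": ["South Asia", "South Asia", "Western Europe",
                   "South Asia", "Middle East & North Africa",
                   "Middle East & North Africa", "Middle East & North Africa"],
})

# Raw counts: one row per year, one column per region (the table behind the area chart)
counts = pd.crosstab(toy["iyear"], toy["region_txt"])

# normalize='index' divides each row by its total, giving each region's share of that year's attacks
shares = pd.crosstab(toy["iyear"], toy["region_txt"], normalize="index")

# Change in share between the first and last year shows which regions grew or shrank
share_change = shares.iloc[-1] - shares.iloc[0]
print(shares.round(2))
print(share_change.sort_values(ascending=False))
```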


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, these insights help prioritize regional security measures and allocate resources effectively, enhancing counter-terrorism efforts.


#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Count attacks per region using region_txt (readable names; 'region' holds numeric codes)
r_type=gtd_df['region_txt'].value_counts().to_frame().reset_index()
r_type.columns=['region_name','count']
plt.figure(figsize=(12,6))
sns.barplot(y='region_name',x='count',data=r_type,orient='h',palette='flare',ec='black')
plt.grid(True)
plt.title('Number of total attacks in each region')
plt.ylabel('region')
plt.xlabel('number of attacks')
plt.show()


##### 1. Why did you pick the specific chart?

I chose a horizontal bar plot to clearly compare the total number of attacks across different regions.

##### 2. What is/are the insight(s) found from the chart?

The chart identifies the regions with the highest and lowest total number of terrorist attacks, highlighting geographic hotspots of terrorism.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, these insights guide policymakers and security agencies in focusing resources and preventive measures on the most affected regions.
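To say how much of the total each hotspot accounts for, the same counts can be expressed as shares, and a cumulative share shows how few regions cover most incidents. A sketch on hypothetical toy labels; on the real data, `regions` would be gtd_df['region_txt'].

```python
import pandas as pd

# Hypothetical toy region labels (invented values, not GTD data)
regions = pd.Series(["South Asia", "South Asia", "Middle East & North Africa",
                     "South Asia", "Western Europe"], name="region_txt")

# normalize=True returns each region's fraction of all incidents instead of raw counts
share = regions.value_counts(normalize=True)

# Cumulative share (largest regions first) shows how concentrated incidents are
cumulative = share.cumsum()
print((share * 100).round(1))
print(cumulative)
```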

#### Chart - 4

In [None]:
# Chart - 4 visualization code
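# Ten countries with the most attacks; country_txt holds names ('country' is a numeric code)
top_10_country=gtd_df['country_txt'].value_counts()[:10].to_frame().reset_index()
top_10_country.columns=['country_name','count']
plt.figure(figsize=(15,5))
sns.barplot(data=top_10_country,x='country_name',y='count',ec='black',lw=1)
plt.xticks(rotation=45)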
plt.grid(True)
plt.title('top 10 countries with most attacks',fontsize=20)
plt.xlabel('country',fontsize=15)
plt.ylabel('count',fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar plot to effectively display and compare the number of terrorist attacks in the top 10 most affected countries.


##### 2. What is/are the insight(s) found from the chart?

The chart highlights the countries with the highest frequency of terrorist attacks, identifying key hotspots for global terrorism.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, these insights assist in prioritizing international aid, diplomatic efforts, and security cooperation with the most affected countries to combat terrorism.













#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Ten cities with the most attacks, excluding the 'Unknown' placeholder used when the city was not identified
top10_city=gtd_df.loc[gtd_df['city']!='Unknown','city'].value_counts()[:10].to_frame().reset_index()
top10_city.columns=['city','count']
plt.figure(figsize=(12,5))
sns.barplot(x='city',y='count',data=top10_city,ec='black',palette='Set3')
plt.title('Most affected cities',fontsize=15)
plt.xlabel('city',fontsize=12)
plt.ylabel('number of attacks',fontsize=12)
plt.xticks(rotation=45)
plt.grid(True)
plt.show()


##### 1. Why did you pick the specific chart?

I chose a bar plot to show the top 10 cities with the highest number of terrorist attacks, making comparisons straightforward.

##### 2. What is/are the insight(s) found from the chart?

The chart identifies the cities most frequently targeted by terrorist attacks, highlighting urban areas with significant security concerns.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the insights can guide urban security enhancements and targeted counter-terrorism measures in the most affected cities.
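Placeholder labels can crowd out real cities in a "most affected" ranking. The GTD codebook records an unidentified city as 'Unknown' (worth confirming on your copy with gtd_df['city'].eq('Unknown').sum()). A minimal sketch of removing such placeholders before ranking, on a hypothetical toy column; the PLACEHOLDERS set is an assumption to extend as needed.

```python
import pandas as pd

# Hypothetical toy city column (invented values); 'Unknown' stands in for unidentified locations
cities = pd.Series(["Baghdad", "Unknown", "Unknown", "Unknown", "Karachi", "Baghdad", "Lima"],
                   name="city")

# Assumed placeholder labels to exclude; extend this set if your copy uses other markers
PLACEHOLDERS = {"Unknown"}

# Drop missing values and placeholders, then rank the remaining cities by incident count
known = cities.dropna()
known = known[~known.isin(PLACEHOLDERS)]
top_cities = known.value_counts().head(10)
print(top_cities)
```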






#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Top 15 countries by number of attacks (country_txt holds names; 'country' is a numeric code)
attacked=gtd_df['country_txt'].value_counts()[:15].rename('attacks').to_frame()
# Total fatalities per country: nkill is the number killed (sum skips missing values)
kills=gtd_df.groupby('country_txt')['nkill'].sum().rename('kills').to_frame()
# Left join keeps only the 15 most-attacked countries
attacked.merge(kills,how='left',left_index=True,right_index=True).plot.bar(width=0.6)
fig=plt.gcf()
fig.set_size_inches(20,16)
plt.title('Attacks vs kills in the 15 most-attacked countries',fontsize=20,weight='bold')
plt.ylabel('count',fontsize=15)
plt.xlabel('country',fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar plot to compare the number of attacks and total fatalities across the 15 countries with the most attacks.

##### 2. What is/are the insight(s) found from the chart?

The chart illustrates the relationship between the number of attacks and fatalities in the most affected countries, showing which countries combine high attack counts with relatively low or high fatalities.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.
Yes, understanding this relationship aids in assessing the severity of attacks and developing strategies to address both attack frequency and lethality in targeted countries.
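Side-by-side bars make it hard to read lethality at a glance. Dividing fatalities by attacks gives one number per country that separates frequent-but-less-deadly from rarer-but-deadlier profiles. A sketch on a hypothetical toy frame with country labels "A" and "B"; on the real data the same steps would run on gtd_df.

```python
import numpy as np
import pandas as pd

# Hypothetical toy incidents (invented values, not GTD data)
toy = pd.DataFrame({
    "country_txt": ["A", "A", "A", "A", "B", "B"],
    "nkill": [0.0, 1.0, np.nan, 1.0, 10.0, 6.0],
})

# Attacks = number of incident rows per country (rows with unknown nkill still count as attacks)
attacks = toy.groupby("country_txt").size().rename("attacks")

# Kills = sum of nkill per country; sum skips NaN
kills = toy.groupby("country_txt")["nkill"].sum().rename("kills")

summary = pd.concat([attacks, kills], axis=1)
summary["kills_per_attack"] = summary["kills"] / summary["attacks"]
print(summary.sort_values("kills_per_attack", ascending=False))
```

Because incidents with an unknown nkill still count as attacks but add nothing to kills, the ratio reads as a lower bound on average lethality.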








#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Same comparison restricted to incidents that record a secondary target (targtype2 is not missing);
# targtype2 is a category code, so it is used as a filter rather than summed
secondary=gtd_df[gtd_df['targtype2'].notna()]
attacked=secondary['country_txt'].value_counts()[:15].rename('attacks').to_frame()
kills=secondary.groupby('country_txt')['nkill'].sum().rename('kills').to_frame()
attacked.merge(kills,how='left',left_index=True,right_index=True).plot.bar(width=0.6)
fig=plt.gcf()
fig.set_size_inches(20,16)
plt.title('Attacks vs kills in the 15 most-attacked countries (incidents with a secondary target)',fontsize=20,weight='bold')
plt.ylabel('count',fontsize=15)
plt.xlabel('country',fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar plot to compare the number of attacks and fatalities across the top 15 countries, this time restricted to incidents that also record a secondary target type (targtype2).

##### 2. What is/are the insight(s) found from the chart?

The chart reveals the relationship between attack frequency and fatalities for incidents with a secondary target in the most affected countries, indicating where high attack rates are associated with high or low fatalities.



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, these insights can inform targeted interventions and aid in allocating resources effectively to address high-impact regions and reduce fatalities.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# Same comparison restricted to incidents that record a third target (targtype3 is not missing);
# targtype3 is a category code, so it is used as a filter rather than summed
tertiary=gtd_df[gtd_df['targtype3'].notna()]
attacked=tertiary['country_txt'].value_counts()[:15].rename('attacks').to_frame()
kills=tertiary.groupby('country_txt')['nkill'].sum().rename('kills').to_frame()
attacked.merge(kills,how='left',left_index=True,right_index=True).plot.bar(width=0.6)
fig=plt.gcf()
fig.set_size_inches(20,16)
plt.title('Attacks vs kills in the 15 most-attacked countries (incidents with a third target)',fontsize=20,weight='bold')
plt.ylabel('count',fontsize=15)
plt.xlabel('country',fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar plot to compare the number of attacks and fatalities in the top 15 countries, this time restricted to incidents that record a third target type (targtype3).

##### 2. What is/are the insight(s) found from the chart?

The chart shows the relationship between attack frequency and fatalities for these multi-target incidents in the top 15 countries, giving insight into how severe such attacks are in different countries.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, understanding this data helps in crafting targeted counter-terrorism strategies and improving response measures by focusing on high-impact countries.







#### Chart - 9

In [None]:
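# Chart - 9 visualization code
# Attacks per country, ordered from most to least attacked so the long tail is readable
plt.subplots(figsize=(25,8))
sns.countplot(x='country_txt',data=gtd_df,ec='black',order=gtd_df['country_txt'].value_counts().index)
plt.xticks(rotation=90)
plt.xlabel('country',fontsize=20)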
plt.ylabel('count',fontsize=20)
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

The count plot shows the number of events per country, highlighting which countries have the most and the fewest events.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals the countries with the highest and lowest event counts, identifying where activity is concentrated and where it is minimal.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

These insights help focus resources on high-activity countries and improve response. On the negative side, the countries with the highest counts are high-risk areas that need mitigation.








#### Chart - 10

In [None]:
# Chart - 10 visualization code

# Select columns for the histogram
columns_histogram = ['nkill', 'nwound']

# Drop rows with missing values in the selected columns
gtd_df_histogram = gtd_df[columns_histogram].dropna()

# Create histograms
plt.figure(figsize=(14, 6))

# Histogram for 'nkill'; a log-scaled y-axis keeps the rare high-casualty bins visible
plt.subplot(1, 2, 1)
sns.histplot(gtd_df_histogram['nkill'], bins=30, kde=True, color='blue')
plt.yscale('log')
plt.title('Distribution of Fatalities (nkill)')
plt.xlabel('Number of Fatalities')
plt.ylabel('Frequency (log scale)')

# Histogram for 'nwound'; same log-scaled y-axis
plt.subplot(1, 2, 2)
sns.histplot(gtd_df_histogram['nwound'], bins=30, kde=True, color='red')
plt.yscale('log')
plt.title('Distribution of Injuries (nwound)')
plt.xlabel('Number of Injuries')
plt.ylabel('Frequency (log scale)')

plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Side-by-side histograms show the distribution of fatalities (nkill) and injuries (nwound) per attack, making it easy to see how casualty counts are spread.


##### 2. What is/are the insight(s) found from the chart?

Both distributions are heavily right-skewed: most attacks cause few or no casualties, while a small number of attacks cause very large numbers of deaths or injuries.



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Knowing that a small share of attacks produces very high casualty counts helps prioritize prevention and emergency-response capacity for mass-casualty events. The long tail is also the negative signal: a few extreme attacks can drive a large share of total harm.
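Histograms of casualty counts tend to be dominated by one tall bar near zero. A quick numeric check of the skew, plus the log1p transform that is a common remedy before plotting, is sketched below on hypothetical toy values. These are not GTD figures.

```python
import numpy as np
import pandas as pd

# Hypothetical toy fatality counts (invented) with a long right tail
nkill = pd.Series([0, 0, 0, 1, 0, 2, 1, 0, 50, 0], dtype=float, name="nkill")

# A mean far above the median is a quick sign of right skew
print("mean:", nkill.mean(), "median:", nkill.median())
print("share of incidents with zero fatalities:", (nkill == 0).mean())

# log1p computes log(1 + x), so zero counts stay finite (log1p(0) == 0)
log_kill = np.log1p(nkill)
print("skewness raw:", round(nkill.skew(), 2), "after log1p:", round(log_kill.skew(), 2))
```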

#### Chart - 11

In [None]:
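# Chart - 11 visualization code
# Count attacks for every region / attack-type pair (region_txt and attacktype1_txt hold readable labels)
attack_by_region = pd.crosstab(gtd_df['region_txt'], gtd_df['attacktype1_txt'])

# Stacked bars: one bar per region, split by attack type
attack_by_region.plot(kind='bar', stacked=True, figsize=(14,7), colormap='tab10')
plt.xlabel('region')
plt.ylabel('number of attacks')
plt.title('Distribution of attack types by region')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y',linestyle='--',alpha=0.7)
plt.legend(title='attack type', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
plt.show()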

##### 1. Why did you pick the specific chart?

The bar chart shows the distribution of attack types across different regions, highlighting which regions experience specific attack types.



##### 2. What is/are the insight(s) found from the chart?

Reveals which attack types are prevalent in different regions, indicating regional vulnerabilities and attack patterns.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Helps target security measures by region. Insights could reveal high-risk regions, necessitating focused intervention strategies to mitigate threats.
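Stacked raw counts favor regions with many attacks, which makes it hard to compare the attack-type mix of smaller regions. Normalizing each region's row to percentages puts all regions on the same scale. A minimal sketch on hypothetical toy incidents; on the real data the inputs would be gtd_df['region_txt'] and gtd_df['attacktype1_txt'].

```python
import pandas as pd

# Hypothetical toy incidents (invented values, not GTD data)
toy = pd.DataFrame({
    "region_txt": ["South Asia", "South Asia", "South Asia",
                   "Western Europe", "Western Europe", "Western Europe"],
    "attacktype1_txt": ["Bombing/Explosion", "Armed Assault", "Bombing/Explosion",
                        "Assassination", "Assassination", "Bombing/Explosion"],
})

# normalize='index' turns each region's row into fractions of that region's attacks;
# multiplying by 100 gives percentages that sum to 100 per region
mix = pd.crosstab(toy["region_txt"], toy["attacktype1_txt"], normalize="index") * 100

# Most common attack type in each region
dominant = mix.idxmax(axis=1)
print(mix.round(1))
print(dominant)
```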

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# imonth = 0 marks an unknown month in the GTD, so those records are excluded
plt.subplots(figsize=(25,8))
sns.countplot(x='imonth',data=gtd_df[gtd_df['imonth']>0],ec='black')
plt.xticks(rotation=90)
plt.xlabel('month',fontsize=20)
plt.ylabel('count',fontsize=20)
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?


To visualize the distribution of events by month, highlighting monthly trends and variations.

##### 2. What is/are the insight(s) found from the chart?

Reveals months with peak and low event occurrences, indicating seasonal patterns in the data.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Insights aid in resource planning and the timing of interventions. Negative growth insights could indicate high-risk months needing increased security measures.
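The GTD codebook records an unknown month or day as 0, so an unfiltered countplot shows a spurious "0" bar that is not a real month. A sketch of counting only valid months on hypothetical toy values follows. Reindexing to 1..12 keeps months with no incidents on the axis, which matters when judging seasonality.

```python
import pandas as pd

# Hypothetical toy months (invented values); 0 marks an unknown month, as in GTD-style data
imonth = pd.Series([1, 1, 0, 3, 12, 12, 12, 0], name="imonth")

# Keep only valid calendar months before counting
known = imonth[imonth.between(1, 12)]

# Count per month and reindex to 1..12 so months with no incidents still appear
per_month = known.value_counts().reindex(range(1, 13), fill_value=0)

print(per_month)
print("Busiest month:", per_month.idxmax(), "| quietest month:", per_month.idxmin())
print("Records with unknown month:", int((imonth == 0).sum()))
```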














#### Chart - 13

In [None]:
# Chart - 13 visualization code
# iday = 0 marks an unknown day in the GTD, so those records are excluded
plt.subplots(figsize=(25,8))
sns.countplot(x='iday',data=gtd_df[gtd_df['iday']>0],ec='black')
plt.xticks(rotation=90)
plt.xlabel('day',fontsize=20)
plt.ylabel('count',fontsize=20)
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

To visualize the distribution of events by day, identifying daily trends and patterns in event occurrences.


##### 2. What is/are the insight(s) found from the chart?

Identifies days with the highest and lowest event counts, showing potential daily patterns or anomalies.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Helps allocate resources and plan interventions based on daily trends. Negative growth insights could indicate high-risk days needing increased security measures.






#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

# Select numeric columns for the correlation heatmap.
# Coded categorical columns such as country and region are left out: their numeric codes
# have no natural order, so correlations computed on them are not meaningful
columns_subset = ['iyear', 'imonth', 'iday', 'nkill', 'nwound']

# Drop rows with missing values in the selected columns
gtd_df_subset = gtd_df[columns_subset].dropna()

# Calculate the (Pearson) correlation matrix
corr = gtd_df_subset.corr()

# Create the correlation heatmap on a fixed -1..1 color scale
plt.figure(figsize=(10,8))
sns.heatmap(corr, annot=True, cmap='coolwarm', linewidths=0.5, vmin=-1, vmax=1)
plt.title('Correlation Heatmap')
plt.show()



##### 1. Why did you pick the specific chart?

A correlation heatmap is chosen to visualize the pairwise correlations between the numeric variables of the dataset (year, month, day, fatalities and injuries), showing how strongly each pair moves together.


##### 2. What is/are the insight(s) found from the chart?

Insights include identifying strong positive or negative correlations between these variables. For example, a high positive correlation between nkill and nwound would indicate that attacks that kill more people also tend to wound more, while correlations near zero for iyear, imonth and iday would suggest that the date of an attack says little about its casualty count.

This visualization helps in understanding the interdependencies between different aspects of terrorist activities, guiding policy-making and resource allocation strategies accordingly.
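Casualty counts such as nkill and nwound are typically long-tailed, and DataFrame.corr() defaults to Pearson correlation, which a few extreme incidents can dominate. Computing a rank-based Spearman correlation alongside it is a simple robustness check. A sketch on hypothetical toy numbers, not GTD data:

```python
import pandas as pd

# Hypothetical toy casualty counts (invented): one extreme incident in nkill
toy = pd.DataFrame({
    "nkill":  [0, 1, 2, 3, 4, 100],
    "nwound": [1, 0, 3, 2, 5, 4],
})

# Pearson measures linear association and is sensitive to extreme values
pearson = toy.corr(method="pearson").loc["nkill", "nwound"]

# Spearman correlates ranks, so the single outlier carries far less weight
spearman = toy.corr(method="spearman").loc["nkill", "nwound"]

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

Here the ranks of the two columns move together closely, but the single extreme nkill value pulls the Pearson coefficient well below the Spearman one.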








#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Select a subset of columns for the pairplot (choosing numerical columns is often best for pairplots)
columns_subset = ['iyear', 'imonth', 'iday', 'nkill', 'nwound']  # Adjust the column names as per your dataset

# Drop rows with missing values in the selected columns
gtd_df_subset = gtd_df[columns_subset].dropna()

# Create the pairplot
sns.pairplot(gtd_df_subset)
plt.show()


##### 1. Why did you pick the specific chart?

A pair plot is chosen because it shows the pairwise relationships between year, month, day, fatalities and injuries in a single grid of scatter plots, with each variable's distribution on the diagonal. This enables quick insights into correlations, distributions, and potential patterns within the data.

##### 2. What is/are the insight(s) found from the chart?

Insights include understanding how these variables relate to each other. For instance, you can observe whether higher fatality counts coincide with higher injury counts, or whether casualty levels change over the years. Such patterns indicate associations and dependencies, not causal relationships.
This visualization is useful for exploratory data analysis to uncover relationships and patterns that may not be immediately apparent from individual variables alone.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

To achieve the business objective of improving security and policy-making based on terrorism data, I suggest leveraging data-driven insights to prioritize resource allocation, enhance surveillance, and implement targeted intervention strategies effectively.








# **Conclusion**

Harnessing data insights from terrorism analytics enables informed policy decisions, enhances security measures, and fosters proactive strategies, crucial for mitigating risks and safeguarding communities against evolving threats globally.








### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***