# **Project Name - EDA on Global Terrorism**



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Author**    - Allan Cheerakunnil Alex

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from wordcloud import WordCloud

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive/')

### Dataset First View

In [None]:
# Dataset First Look
data_path = "/content/drive/MyDrive/Data Science/AlmaBetter Projects/Python project 2/"

# Loading the Global Terrorism Dataset
data = pd.read_csv(data_path + 'Global Terrorism Data.csv', encoding='latin-1')
data.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, cols = data.shape
print(f'There are {rows} rows and {cols} columns in the dataset.')

### Dataset Information

In [None]:
# Dataset Info
data.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_rows = data.duplicated().sum()

print(f'There are {duplicate_rows} duplicate rows in the dataset.')

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = data.isnull().sum()
print(missing_values)

In [None]:
# Visualize missing values with a seaborn heatmap
# (a missingno matrix would work as well)
plt.figure(figsize=(10, 6))
sns.heatmap(data.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

In [None]:
# Dataset Size
print("Dataset Size:", len(data))

# Feature Quantity
print("Number of Features:", len(data.columns))

# Data Types
print("Data Types:")
print(data.dtypes.value_counts())

# Memory Usage
print("Memory Usage:")
data.info(memory_usage='deep')  # info() prints its own report and returns None

# Missing Values
print("Missing Values:")
missing_values = data.isnull().sum()
print(missing_values[missing_values > 0].sort_values(ascending=False))

- **Dataset Size:** The dataset is quite large, containing 181,691 entries or rows.

- **Feature Quantity:** The dataset contains 135 features or columns.

- **Data Types:** The dataset has a mix of data types. There are 55 features with floating-point numbers (float64), 22 features with integers (int64), and 58 features with objects (object). The object datatype in pandas typically means the column contains string (text) data.

- **Memory Usage:** The dataset uses over 626.8 MB of memory.

- **Missing Values:** There are some columns with a large number of missing values. For example, the 'gsubname3' column has 181,671 missing values, 'weapsubtype4' and 'weapsubtype4_txt' columns have 181,621 missing values each, 'weaptype4' and 'weaptype4_txt' columns have 181,618 missing values each. However, several columns do not have any missing values, such as 'eventid', 'iyear', 'imonth', 'iday', 'INT_LOG', 'INT_IDEO', 'INT_MISC', and 'INT_ANY'. There are also columns like 'guncertain1' with 380 missing values, 'ishostkid' with 178 missing values, 'specificity' with 6 missing values, 'doubtterr' and 'multiple' with 1 missing value each.
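Raw missing-value counts are easier to compare when expressed as a share of all rows. The following is a minimal sketch of that idea; the helper name `missing_summary` and the toy frame are illustrative assumptions, not part of the original notebook, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "eventid": [1, 2, 3, 4],
    "nkill": [0.0, np.nan, 2.0, 1.0],
    "gsubname3": [np.nan, np.nan, np.nan, "x"],
})
report = missing_summary(toy)
print(report)
```

On the real data the same call would be `missing_summary(data)`; columns that are almost entirely empty are natural candidates to drop.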

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns = data.columns

print("Columns in the dataset")

for column in columns:
  print(column)

In [None]:
# Dataset Describe
summary = data.describe()

print(summary)

### Variables Description

**eventid**: A unique identifier for each terrorist incident.

**iyear, imonth, iday**: Date components of the incident, indicating the year, month, and day, respectively.

**country, country_txt**: Numeric and textual representations of the country where the incident occurred.

**region, region_txt**: Numeric and textual representations of the region where the incident occurred.

**provstate**: The name or abbreviation of the province or state where the incident occurred.

**city**: The name of the city or location where the incident occurred.

**attacktype1, attacktype1_txt**: Numeric and textual representations of the primary method of attack.

**targtype1, targtype1_txt**: Numeric and textual representations of the primary target type.

**weaptype1, weaptype1_txt**: Numeric and textual representations of the primary weapon type used.

**nkill**: Number of confirmed kills.

**nwound**: Number of confirmed injuries.

**gname**: The name of the terrorist group responsible for the incident.

**summary**: A brief description or summary of the incident.

**motive**: The perceived motive or reason behind the terrorist incident.

**related**: Information on related incidents.

**ishostkid**: Indicates whether hostages were taken (1 if hostages taken, 0 if not).
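Several variables above come as a numeric code paired with a text label (for example `country` and `country_txt`). Before dropping one column of such a pair, it can be worth checking that the two really map one-to-one. The sketch below shows one way to do this; `is_one_to_one` is a hypothetical helper and the toy values are illustrative only.

```python
import pandas as pd


def is_one_to_one(df: pd.DataFrame, code_col: str, text_col: str) -> bool:
    """Check that every code maps to exactly one label and vice versa."""
    pairs = df[[code_col, text_col]].drop_duplicates()
    return bool(pairs[code_col].is_unique and pairs[text_col].is_unique)


# Toy frames (illustrative values, not taken from the dataset)
consistent = pd.DataFrame({
    "country": [1, 1, 2],
    "country_txt": ["Country A", "Country A", "Country B"],
})
inconsistent = pd.DataFrame({
    "country": [1, 1, 2],
    "country_txt": ["Country A", "Country X", "Country B"],
})

print(is_one_to_one(consistent, "country", "country_txt"))
print(is_one_to_one(inconsistent, "country", "country_txt"))
```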

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_countries = data['country_txt'].unique()
print(unique_countries)

print()  # blank line to separate the two outputs

unique_year = data['iyear'].unique()
print(unique_year)

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
print(data.isnull().sum())

In [None]:
pd.set_option('display.max_rows', None)
print(data.dtypes)

In [None]:
pd.reset_option('display.max_rows')

In [None]:
data.rename(columns={
    'iyear': 'Year', 'imonth': 'Month', 'iday': 'Day',
    'country_txt': 'Country', 'provstate': 'state', 'region_txt': 'Region',
    'attacktype1_txt': 'AttackType', 'target1': 'Target',
    'nkill': 'Killed', 'nwound': 'Wounded', 'summary': 'Summary', 'gname': 'Group',
    'targtype1_txt': 'Target_type', 'weaptype1_txt': 'Weapon_type', 'motive': 'Motive',
}, inplace=True)

In [None]:
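# Keep only the columns needed for the analysis; .copy() gives an independent frame
data = data[['Year', 'Month', 'Day', 'Country', 'state', 'Region', 'city', 'latitude', 'longitude', 'AttackType', 'Killed', 'Wounded', 'Target', 'Summary', 'Group', 'Target_type', 'Weapon_type', 'Motive']].copy()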

In [None]:
data.head()

In [None]:
print(data.dtypes)

### What all manipulations have you done and insights you found?

**With 135 columns, the dataset is too wide to analyse comfortably in full. The columns of interest were therefore renamed to more readable names, and only the features needed for the analysis were kept.**
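The rename-then-select step can also be written as a small reusable function that fails loudly when an expected column is absent. This is only a sketch under assumptions: `tidy_columns` and the shortened `RENAME_MAP` are hypothetical names introduced here, and the toy frame contains made-up values.

```python
import pandas as pd

# Shortened mapping for illustration; the notebook uses a longer one
RENAME_MAP = {"iyear": "Year", "country_txt": "Country", "nkill": "Killed"}


def tidy_columns(df: pd.DataFrame, rename_map: dict) -> pd.DataFrame:
    """Rename the mapped columns and keep only those, as an independent copy."""
    absent = [col for col in rename_map if col not in df.columns]
    if absent:
        raise KeyError(f"Columns not found in the dataset: {absent}")
    return df.rename(columns=rename_map)[list(rename_map.values())].copy()


# Toy frame with one extra column that should be dropped (values are made up)
toy = pd.DataFrame({
    "iyear": [2001, 2002],
    "country_txt": ["Country A", "Country B"],
    "nkill": [3.0, 0.0],
    "unused_column": ["x", "y"],
})
tidy = tidy_columns(toy, RENAME_MAP)
print(tidy)
```

Returning a copy keeps later modifications of the reduced frame separate from the original data.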

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1: line chart of the number of attacks per year
plt.figure(figsize=(12, 6))
attacks_per_year = data['Year'].value_counts().sort_index()
sns.lineplot(x=attacks_per_year.index, y=attacks_per_year.values)
plt.title('Number of Terrorist Attacks Over the Years')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')
plt.show()

##### 1. Why did you pick the specific chart?

**The line chart was chosen to represent the number of terrorist attacks over the years. A line chart is suitable for showing trends and patterns over a continuous variable, in this case, the progression of attacks over different years.**

##### 2. What is/are the insight(s) found from the chart?

**The line chart visually depicts the trend in the number of terrorist attacks over the years. It helps in identifying whether there is a significant increase, decrease, or any noticeable pattern in the frequency of attacks.**
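To put numbers on the trend rather than reading it off the line, one option is to compute the year-over-year percentage change and the peak year. The sketch below assumes a `Year` column like the one created during wrangling; `yearly_trend` is a hypothetical helper and the toy years are made up.

```python
import pandas as pd


def yearly_trend(years: pd.Series) -> pd.DataFrame:
    """Count attacks per year and add the year-over-year percentage change."""
    counts = years.value_counts().sort_index()
    trend = pd.DataFrame({"attacks": counts})
    trend["pct_change"] = trend["attacks"].pct_change().mul(100).round(1)
    return trend


# Toy series standing in for data['Year'] (values are made up)
toy_years = pd.Series([2000, 2000, 2001, 2001, 2001, 2001, 2002])
trend = yearly_trend(toy_years)
peak_year = trend["attacks"].idxmax()

print(trend)
print("Year with the most attacks:", peak_year)
```

On the notebook's data the call would be `yearly_trend(data['Year'])`. The first year has no previous year to compare against, so its change is `NaN`.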

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can be valuable for businesses, governments, or organizations involved in security and risk management. Understanding the trend in terrorist attacks over the years allows for better preparation, resource allocation, and planning to address security concerns. It can contribute to the development of strategies aimed at preventing and mitigating the impact of terrorist incidents.**

**For businesses operating in regions affected by terrorism, this information can be crucial for risk assessment and business continuity planning. It may influence decisions related to security investments, insurance coverage, and overall risk management.**

**Government agencies can use this data to enhance security measures, allocate resources effectively, and develop policies to counter terrorism. The insights gained can contribute to the formulation of evidence-based counterterrorism strategies.**

#### Chart - 2

In [None]:
pd.crosstab(data.Year, data.Region).plot(kind='area',figsize = (15,6))
plt.title('Terrorist Activities by Region in each Year')
plt.ylabel('Number of Attacks')
plt.show()

##### 1. Why did you pick the specific chart?

**The area plot depicting terrorist activities by region over each year was chosen. This chart is suitable for visualizing the trends and patterns of terrorist activities in different regions over time. The stacked areas provide a sense of the overall volume and the relative contribution of each region to the total.**

##### 2. What is/are the insight(s) found from the chart?

**The chart offers insights into how terrorist activities have evolved over the years across different regions. By observing the areas under each curve, one can identify regions that have consistently experienced high levels of terrorist activities and those that have seen fluctuations. It also helps in comparing the overall distribution of attacks across regions.**
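The relative contribution of each region can also be computed directly instead of being estimated from the stacked areas. A minimal sketch, assuming `Year` and `Region` columns as in the wrangled data, uses the `normalize="index"` option of `pd.crosstab` so that each year's row sums to 100%. The toy frame is made up.

```python
import pandas as pd

# Toy frame standing in for the wrangled dataset (values are made up)
toy = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000, 2001, 2001],
    "Region": ["Region A", "Region A", "Region A", "Region B",
               "Region A", "Region B"],
})

# Share of each year's attacks that occurred in each region, in percent
region_share = pd.crosstab(toy["Year"], toy["Region"], normalize="index").mul(100)
print(region_share)
```

Passing `data['Year']` and `data['Region']` instead of the toy columns would give the equivalent table for the full dataset.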

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can be beneficial for various stakeholders, including government agencies, security organizations, and businesses operating in regions prone to terrorism.**

**Government Agencies: Government bodies can use this information to allocate resources effectively and prioritize regions that require heightened security measures. It aids in the development of regional-specific counterterrorism strategies.**

**Security Organizations: Private security firms can tailor their services based on the historical patterns of terrorist activities in different regions. Understanding the dynamics allows for better preparation and risk mitigation strategies.**

**Businesses: Companies operating in regions with a history of terrorism can use this information for risk assessment and business continuity planning. It helps in identifying regions with higher security risks, allowing businesses to implement targeted security measures.**

#### Chart - 3

In [None]:
# Filter the data for the 10 years with the most recorded attacks
top_10_years = data['Year'].value_counts().head(10).index
filtered_data = data[data['Year'].isin(top_10_years)]

# Group by year and calculate the sum of killed and wounded
casualties_per_year = filtered_data.groupby('Year')[['Killed', 'Wounded']].sum()

# Create a stacked bar chart
casualties_per_year.plot(kind='bar', stacked=True, figsize=(12, 6), colormap='viridis')
plt.title('Casualties (Killed and Wounded) in the Top 10 Years')
plt.xlabel('Year')
plt.ylabel('Number of Casualties')
plt.show()


##### 1. Why did you pick the specific chart?

**The stacked bar chart was chosen to represent the casualties (both killed and wounded) in the 10 years with the most recorded attacks. It is effective for visualizing the total casualties in each of those years, broken down into killed and wounded.**

##### 2. What is/are the insight(s) found from the chart?

**The chart provides a visual overview of the casualties (killed and wounded) caused by terrorist attacks in the top 10 years. The stacked bars show the contribution of each year to the total number of casualties, allowing for comparison and identification of the years with the highest impact.**
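The chart selects years by number of attacks, which is not necessarily the same as the years with the most casualties. As a complementary sketch, the years can be ranked by total casualties instead. `casualties_by_year` is a hypothetical helper, the column names follow the wrangled data, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def casualties_by_year(df: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Sum killed and wounded per year and return the top_n years by total."""
    # groupby().sum() skips NaN, so incidents with unknown counts add nothing
    totals = df.groupby("Year")[["Killed", "Wounded"]].sum()
    totals["Total"] = totals["Killed"] + totals["Wounded"]
    return totals.nlargest(top_n, "Total")


# Toy frame standing in for the wrangled dataset (values are made up)
toy = pd.DataFrame({
    "Year": [2014, 2014, 2015, 2016],
    "Killed": [5.0, np.nan, 2.0, 10.0],
    "Wounded": [1.0, 3.0, np.nan, 0.0],
})
top_years = casualties_by_year(toy, top_n=2)
print(top_years)
```

Because missing counts are skipped rather than estimated, the totals are lower bounds on the recorded casualties.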

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can be valuable for various stakeholders, including government agencies, security organizations, and businesses operating in regions affected by terrorism.**

**Government Agencies: Security agencies can use this information to assess the overall impact of terrorist attacks on public safety. It helps in resource allocation, emergency response planning, and the development of policies aimed at reducing casualties.**

**Security Organizations: Private security firms can tailor their services based on the historical patterns of casualties. Understanding the trends allows for better preparation and risk mitigation strategies to minimize the impact of future attacks.**

**Businesses: Companies operating in regions with a high risk of terrorism can use this information for risk assessment and business continuity planning. It helps in understanding the potential impact on employees and infrastructure, allowing businesses to implement measures to enhance safety.**

#### Chart - 4

In [None]:
# Get the top 5 target types
top_target_types = data['Target_type'].value_counts().nlargest(5)

# Create a pie chart for the distribution of the top 5 target types
plt.figure(figsize=(8, 8))
top_target_types.plot.pie(autopct='%1.1f%%', colors=sns.color_palette('viridis'), startangle=90)
plt.title('Distribution of Top 5 Target Types')
plt.ylabel('')
plt.show()


##### 1. Why did you pick the specific chart?

**A pie chart was chosen to represent the distribution of the top 5 target types. Pie charts are effective for displaying the proportion of different categories within a whole. In this case, it helps visualize the percentage distribution of attacks across the selected target types.**

##### 2. What is/are the insight(s) found from the chart?

**The chart shows how attacks are split among the five most frequently attacked target types. Because only the top 5 categories are plotted, each slice is a share of the attacks on those five types, not of all attacks in the dataset.**
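Since the pie percentages are computed within the top 5 only, it can help to report each category's share of all attacks alongside them. The following sketch does this for any categorical column; `top_category_shares` is a hypothetical helper and the toy categories are made up.

```python
import pandas as pd


def top_category_shares(series: pd.Series, n: int = 5) -> pd.DataFrame:
    """Return the n most frequent categories with two percentage columns."""
    counts = series.value_counts()  # NaN values are excluded by default
    top = counts.nlargest(n)
    return pd.DataFrame({
        "attacks": top,
        "pct_of_top": top / top.sum() * 100,     # what the pie chart displays
        "pct_of_all": top / counts.sum() * 100,  # share of all non-missing rows
    })


# Toy series standing in for data['Target_type'] (values are made up)
toy_targets = pd.Series(["A"] * 5 + ["B"] * 3 + ["C"] * 2)
shares = top_category_shares(toy_targets, n=2)
print(shares)
```

The gap between the two percentage columns indicates how much of the data falls outside the plotted categories.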

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Security Planning: The insights gained from this chart can be valuable for security planning and resource allocation. Understanding which types of targets are more frequently attacked allows for the implementation of targeted security measures.**

**Risk Management: Businesses and organizations can use this information for risk assessment. Knowing the types of targets that are more susceptible to attacks helps in developing risk mitigation strategies.**

**Public Awareness: The information from this chart can also contribute to public awareness and education. By understanding the common targets, the public, as well as relevant authorities, can be better prepared and vigilant.**

#### Chart - 5

In [None]:
# Create a word cloud for the most common target keywords
plt.figure(figsize=(12, 8))
# Drop missing targets and cast to str so that join() only receives strings
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(' '.join(data['Target'].dropna().astype(str)))
plt.imshow(wordcloud, interpolation='bilinear')
plt.title('Word Cloud: Most Common Target Keywords')
plt.axis('off')
plt.show()


##### 1. Why did you pick the specific chart?

**The word cloud was chosen to visually represent the most common keywords in the free-text "Target" column. It provides a quick and intuitive overview of the prominent words by scaling each word according to its frequency.**

##### 2. What is/are the insight(s) found from the chart?

**The word cloud reveals the words or phrases that occur most frequently in the "Target" column of the dataset. Larger words in the cloud indicate higher frequency. It helps identify patterns, trends, or recurring themes in the targets of terrorist attacks.**
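A word cloud shows relative sizes but not actual counts. A simple frequency table can back it up, as sketched below with the standard library only. The helper `top_words`, its small stopword set, and the toy targets are illustrative assumptions; the tokenization is deliberately naive and will not match the word cloud's own processing exactly.

```python
import re
from collections import Counter


def top_words(texts, n=10, stopwords=frozenset({"the", "of", "and", "a"})):
    """Count lowercase words across texts, skipping non-strings and stopwords."""
    counter = Counter()
    for text in texts:
        if not isinstance(text, str):
            continue  # skip missing values such as None or NaN
        words = re.findall(r"[a-z']+", text.lower())
        counter.update(word for word in words if word not in stopwords)
    return counter.most_common(n)


# Toy values standing in for data['Target'] (made up)
toy_targets = ["Police Station", "Police Patrol", "Village of civilians",
               None, "Civilians"]
frequencies = top_words(toy_targets, n=2)
print(frequencies)
```

On the notebook's data, `top_words(data['Target'])` would list the most frequent words with their counts.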

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from the word cloud can be valuable for understanding the common targets of terrorist attacks. This information might be useful for businesses or organizations involved in risk management, security, or international affairs. It can aid in making informed decisions related to security measures, threat assessments, and preparedness strategies, potentially contributing to a safer environment.**

**I am truly grateful for the opportunity to share my insights and knowledge.**