<a href="https://colab.research.google.com/github/bharathkumar7887/global-terrorism/blob/main/global_terrorism_analysis_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - ***Global Terrorism Dataset***



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project summary**


The objective of this project is to conduct an exploratory data analysis (EDA) of the Global Terrorism Database (GTD), an open-source dataset that contains comprehensive information on both domestic and international terrorist attacks occurring globally from 1970 through 2017. Developed and maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, the database encompasses details of over 180,000 recorded terrorist incidents. The project's principal aim is to dig deep into this expansive dataset, identify significant trends, patterns, and insights pertaining to terrorism-related activities, and present these discoveries visually for an enhanced understanding.

A critical aspect of this project is the use of Python libraries for data analysis and visualization. The Pandas library is the main tool for data manipulation, including loading the dataset, cleaning the data, and running aggregation operations. Its DataFrame and Series structures are well suited to large tabular datasets such as the GTD.

To facilitate advanced numerical operations and speed up computation, the project employs the NumPy library. Given its proficiency in handling multi-dimensional arrays and matrices, NumPy is the perfect companion for data processing operations.

The project doesn't stop at numerical data analysis; it brings the extracted insights to life through vivid, informative visualizations, courtesy of the Matplotlib and Seaborn libraries. These libraries provide an array of visualization styles, enabling the display of data in ways that are both appealing and informative. From bar plots and scatter plots to histograms and heatmaps, the project will utilize a minimum of five different visualizations to reveal relationships between variables and provide a graphical representation of the dataset's characteristics.

Exploring the GTD through this project will pave the way for an intricate understanding of terrorism patterns over the past decades. The goal is to unveil potential trends in attack frequency, most targeted countries, preferred methods of attack, types of weapons used, casualties, and the evolution of terrorist organizations, among other relevant dimensions.

By examining these factors, the project aims to provide a detailed overview of global terrorism trends, informing counter-terrorism strategies and policies. Additionally, the findings may also help understand the characteristics of regions prone to attacks and the reasons behind their vulnerability.

In conclusion, this project offers a data-driven exploration into the dark world of terrorism, aiming to shed light on the complex patterns hidden within the enormity of the GTD. The end product of this project will be an array of valuable insights that have the potential to contribute substantially to ongoing counter-terrorism efforts and inform future research in this field. The combination of data manipulation, numerical computation, and graphic visualization is expected to yield a robust and comprehensive exploration of the dataset, leading to substantial key findings pertaining to global terrorism.

# **GitHub Link -**

# **Problem Statement**


Using exploratory data analysis (EDA) techniques on the GTD, identify the hot zones of terrorism globally and discern the evolving patterns of terrorist activities. What insights related to security issues can be derived from this analysis that could be instrumental in shaping counter-terrorism strategies?

#### **Define Your Business Objective?**

The business objective of this project is to leverage the data contained within the Global Terrorism Database (GTD) to derive actionable insights into terrorist activities worldwide from 1970 to 2017. By conducting a comprehensive exploratory data analysis (EDA), the goal is to identify the key patterns, trends, and correlations related to global terrorism, thereby enabling better-informed decision-making for security analysts, policy-makers, and counter-terrorism agencies.

Specifically, the objectives include:

*   Identification of global "hot zones" for terrorist activities: By determining the most affected regions, we can better understand where resources might be best allocated to prevent future attacks.
*   Analysis of frequency and intensity of attacks: Understanding how these have evolved over time can provide insights into the changing dynamics of terrorism and allow for more accurate risk assessments.
*   Examination of methodologies and weapons used in attacks: This can shed light on the operational preferences of terrorist organizations and potentially provide early indicators of future threats.
*   Assessment of casualty trends: This can help identify the most devastating types of attacks and allow for targeted response planning to minimize human loss.
*   Unveiling patterns related to terrorist organizations: This can potentially aid in understanding their strategies, thereby supporting intelligence agencies in their counter-terrorism efforts.

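As a rough illustration of the first two objectives, the sketch below counts attacks per region and aggregates attacks and fatalities per year. It runs on a small hypothetical `toy_df` whose column names (`eventid`, `iyear`, `region_txt`, `nkill`) follow the GTD columns described later in this notebook; the rows are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy rows standing in for the GTD (values are invented)
toy_df = pd.DataFrame({
    "eventid": [1, 2, 3, 4, 5],
    "iyear": [1970, 1970, 1971, 1971, 1971],
    "region_txt": ["South Asia", "South Asia", "Western Europe",
                   "South Asia", "South America"],
    "nkill": [1.0, 0.0, 2.0, None, 3.0],
})

# "Hot zones": number of recorded attacks per region, largest first
attacks_by_region = toy_df["region_txt"].value_counts()

# Frequency and intensity over time: attacks and fatalities per year
# (missing nkill values are skipped by sum)
yearly = toy_df.groupby("iyear").agg(
    attacks=("eventid", "count"),
    killed=("nkill", "sum"),
)

print(attacks_by_region)
print(yearly)
```

The same two aggregations could be applied to the full DataFrame once it is loaded, assuming the column names match.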
# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.


5. You have to create at least 20 logical & meaningful charts having important insights.

[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]







# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from matplotlib import animation
pd.options.mode.chained_assignment = None
import seaborn as sns

from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
import folium

### Dataset Loading

In [None]:
# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Importing the dataset
data_path='/content/drive/MyDrive/eda_project/Global_Terrorism_Data.csv'
terrorism_df=pd.read_csv(data_path,encoding='ISO-8859-1' )

### Dataset First View

In [None]:
# Dataset first look
terrorism_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns
rows,columns=terrorism_df.shape
print(f'rows ={rows}')
print(f'columns={columns}')


### Dataset Information

In [None]:
# Dataset Info
terrorism_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_rows=terrorism_df.duplicated().sum()
print(f'duplicate rows={duplicate_rows}')

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(terrorism_df.isnull().sum())

In [None]:
# Visualizing the missing values

# Checking Null Value by plotting Heatmap
sns.heatmap(terrorism_df.isnull(), cbar=False)

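The heatmap shows where values are missing but not how many. As a complementary sketch, the share of missing values per column can be computed with `isnull().mean()`. The `toy_df` below is a hypothetical stand-in for `terrorism_df`, with invented values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for terrorism_df (values are invented)
toy_df = pd.DataFrame({
    "nkill": [1.0, np.nan, 0.0, 3.0],
    "nwound": [np.nan, np.nan, 2.0, 1.0],
    "country_txt": ["Iraq", "India", "Peru", "Iraq"],
})

# isnull() gives a boolean frame; the column mean is the fraction missing
missing_pct = toy_df.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

Sorting in descending order puts the sparsest columns first, which may help when deciding which columns to drop.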
### What did you know about your dataset?

eventid: Unique ID for each event or terrorist attack.

iyear: Year the terrorist attack occurred.

imonth: Month the terrorist attack occurred.

iday: Day the terrorist attack occurred.

country_txt: Name of the country where the terrorist attack occurred.

region_txt: Name of the region where the terrorist attack occurred.

city: City where the terrorist attack occurred.

attacktype1_txt: The general method of attack employed.

target1: The specific person, building, installation, etc., that was targeted.

nkill: Number of confirmed fatalities for the incident.

nwound: Number of confirmed non-fatal injuries.

gname: Name of the group that carried out the attack.


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
terrorism_df.columns

In [None]:
# Dataset Describe
terrorism_df.describe(include='all')

### Variables Description

**eventid**: unique ID for each terrorist attack or event

**iyear**: year the attack took place

**imonth**: month the attack took place

**iday**: day the attack took place

Remaining columns (descriptions not given): **approxdate**, **extended**, **resolution**, **country**, **country_txt**, **region**, ..., **addnotes**, **scite1**, **scite2**, **scite3**, **dbsource**, **INT_LOG**, **INT_IDEO**, **INT_MISC**, **INT_ANY**, **related**

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable(most important)

# unique countries
unique_countries=terrorism_df['country_txt'].unique()
print(unique_countries)

In [None]:
# unique year
unique_year=terrorism_df['iyear'].unique()
print(unique_year)

In [None]:
# unique attack types
unique_attack_type=terrorism_df['attacktype1_txt'].unique()
print(unique_attack_type)

In [None]:
#unique terrorist groups
unique_terrorist_groups=terrorism_df['gname'].unique()
print(unique_terrorist_groups)

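The cells above inspect four columns one at a time. To get a quick overview of every column at once, `DataFrame.nunique()` returns the number of distinct values per column. This is a minimal sketch on a hypothetical `toy_df` with invented rows; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical toy frame standing in for terrorism_df (values are invented)
toy_df = pd.DataFrame({
    "country_txt": ["Iraq", "India", "Iraq"],
    "iyear": [1970, 1971, 1971],
    "gname": ["Unknown", "Unknown", "Unknown"],
})

# Number of distinct non-null values in each column, largest first
unique_counts = toy_df.nunique().sort_values(ascending=False)
print(unique_counts)
```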
## 3. ***Data Wrangling***

In [None]:
# renaming columns to make more sense
terrorism_df.rename(
    columns={
        'iyear': 'Year', 'imonth': 'Month', 'iday': 'Day',
        'country_txt': 'Country', 'region_txt': 'Region', 'provstate': 'State',
        'attacktype1_txt': 'Attack_type', 'targtype1_txt': 'Target_type',
        'gname': 'Terrorist_group', 'weaptype1_txt': 'Weapons_used',
        'nkill': 'People_killed', 'nwound': 'People_wounded',
    },
    inplace=True,
)

In [None]:
print(list(terrorism_df.columns))

In [None]:
# Selecting only the columns that are relevant to our analysis
selected_columns=['eventid', 'Year', 'Month', 'Day','Country' ,'Region', 'State', 'city', 'latitude', 'longitude','Attack_type','Target_type','natlty1_txt','Terrorist_group','Weapons_used','People_killed', 'People_wounded']
selected_df = terrorism_df[selected_columns].copy()  # copy so later edits do not affect terrorism_df
selected_df.head()

In [None]:
# checking for null values
selected_df.isna().sum()

### Data Wrangling Code

### What all manipulations have you done and insights you found?

The dataset contains 135 columns, and most of them are not needed for this analysis. We therefore renamed the key columns for better readability and then extracted only the columns required for the analysis.

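The rename-then-select step can be wrapped in a small helper that fails early with a clear message if an expected column is absent. This is only a sketch: `rename_and_select` is a hypothetical helper that is not part of the notebook, and `toy_df` contains invented rows.

```python
import pandas as pd

def rename_and_select(df, rename_map, keep):
    """Rename columns, then return a copy holding only the `keep` columns."""
    renamed = df.rename(columns=rename_map)
    # Report every missing column at once instead of failing on the first
    missing = [col for col in keep if col not in renamed.columns]
    if missing:
        raise KeyError(f"columns not found after renaming: {missing}")
    return renamed[keep].copy()

# Hypothetical toy frame standing in for terrorism_df (values are invented)
toy_df = pd.DataFrame({
    "iyear": [1970, 1971],
    "country_txt": ["Peru", "Iraq"],
    "nkill": [1.0, 0.0],
    "summary": ["a", "b"],
})

toy_selected = rename_and_select(
    toy_df,
    {"iyear": "Year", "country_txt": "Country", "nkill": "People_killed"},
    ["Year", "Country", "People_killed"],
)
print(toy_selected)
```

Returning a copy keeps later modifications of the selected frame from touching the original data.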


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

**1) What is the trend of terrorist attacks over the years?**

In [None]:
f = plt.figure(figsize=(14, 7))

sns.set(font_scale=1.1)
# One bar per year; bar height is the number of recorded attacks
year_count = sns.countplot(x='Year', data=selected_df, palette='flare')
plt.xticks(rotation=70)  # rotate the year labels so they do not overlap
plt.ylabel('No. of attacks', fontsize=12)
plt.xlabel('Year', fontsize=12)
plt.title('Number of Terrorist Attacks Year by Year', fontsize=12)
plt.show()

**1. Why did you pick the specific chart?**

**2) Which are the top ten countries with the most terrorist attacks?**

In [None]:
fig = plt.figure(figsize=(13, 7))
# Count attacks per country once and keep the ten largest
top_countries = selected_df['Country'].value_counts().head(10)
terror_country = sns.barplot(x=top_countries.index, y=top_countries.values, palette='RdYlGn')
plt.xticks(rotation=70)  # rotate the country labels so they do not overlap
terror_country.set_xlabel('Country', fontsize=12)
terror_country.set_ylabel('No. of attacks', fontsize=12)
plt.title('Top 10 Countries: Most Attacks by Terrorist Groups', fontsize=12)
plt.show()

**3) Which type of terrorist attack has resulted in the highest number of deaths?**

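No code accompanies this question in the notebook. One possible approach, sketched here on a hypothetical `toy_df` with invented values, is to group by the renamed `Attack_type` column and sum `People_killed`. Rows with missing fatality counts are skipped by `sum`, so the totals cover confirmed deaths only.

```python
import pandas as pd

# Hypothetical toy frame standing in for selected_df (values are invented)
toy_df = pd.DataFrame({
    "Attack_type": ["Bombing/Explosion", "Armed Assault", "Bombing/Explosion",
                    "Assassination", "Armed Assault"],
    "People_killed": [5.0, 2.0, None, 1.0, 4.0],
})

# Total confirmed deaths per attack type, largest first
deaths_by_attack = (
    toy_df.groupby("Attack_type")["People_killed"]
    .sum()
    .sort_values(ascending=False)
)
print(deaths_by_attack)
```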
**4) What are the main targets of terrorist attacks?**

In [None]:
from wordcloud import WordCloud
# Target types with missing values removed
targets = selected_df.Target_type.dropna()
plt.subplots(figsize=(20,10))
# Build the word cloud from all target-type labels joined into one string
wordcloud = WordCloud(background_color = 'white',
                     width = 512,
                     height = 384,).generate(' '.join(targets))
plt.axis('off')
plt.imshow(wordcloud)
plt.title('Most Popular Targets',
        fontdict={'family': 'serif',
        'color':  'black',
        'weight': 'bold',
        'size': 26,})
plt.show()
