<a href="https://colab.research.google.com/github/Naivaidya3008/GTD-project/blob/main/GTD_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Project Name - Global Terrorism Database project

Project Type - EDA/Global Terrorism Database project
Contribution - Team
Team Member 1 - NAIVAIDYA TRIPATHI
Team Member 2 - MOHIT
Team Member 3 - SOHIL SINGHANIA
Team Member 4 - TILAK R

# **Project Summary -**



---

The objective of the project is to conduct an exploratory data analysis (EDA) of the Global Terrorism Database (GTD), an open-source dataset that contains comprehensive information on both domestic and international terrorist attacks occurring globally from 1970 through 2017. Developed and maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, the database encompasses details of over 180,000 recorded terrorism incidents. The project's principal aim is to delve deep into this expansive dataset, identify significant trends, patterns, and insights pertaining to terrorism-related activities, and visually present these discoveries for an enhanced understanding.

A critical aspect of this project is the extensive use of Python libraries for data analysis and visualization. The Pandas library is the cornerstone of data manipulation: loading the dataset, cleaning the data, and performing aggregations such as group-by counts and cross-tabulations. Its high-performance data structures make large datasets like this one practical to work with.

To support numerical operations and speed up computation, the project also uses the NumPy library, whose efficient multi-dimensional arrays make it well suited to data processing.

The project doesn't stop at numerical data analysis; it brings the extracted insights to life through informative visualizations built with the Matplotlib, Seaborn, and Plotly libraries. These libraries provide a wide range of visualization styles, allowing the data to be displayed in ways that are both appealing and informative. From bar plots and scatter plots to histograms and heatmaps, the project will use a minimum of five different visualizations to reveal relationships between variables and provide a graphical representation of the dataset's characteristics.

Exploring the GTD through this project should provide a detailed understanding of terrorism patterns over the past decades. The goal is to uncover potential trends in attack frequency, the most targeted countries, preferred methods of attack, types of weapons used, casualties, and the evolution of terrorist organizations, among other relevant dimensions.

By examining these factors, the project aims to provide a detailed overview of global terrorism trends, informing counter-terrorism strategies and policies. Additionally, the findings may help identify the characteristics that make certain targets prone to attack and the reasons behind their vulnerability.

In conclusion, this project offers a data-driven exploration into the dark world of terrorism, aiming to shed light on the complex patterns hidden within the sheer size of the GTD. The end product will be a set of insights that could contribute to ongoing counter-terrorism efforts and inform future research in this field. The combination of data manipulation, numerical computation, and graphical visualization is expected to yield a robust and comprehensive exploration of the dataset and key findings on global terrorism.

---

# **GitHub Link -**

https://github.com/Naivaidya3008/GTD-project


# **Problem Statement**


Using exploratory data analysis (EDA) techniques on the Global Terrorism Database (GTD), we aim to identify the global hot zones of terrorism and discern evolving patterns in terrorist activities. This analysis may yield valuable insights related to security issues that could play a crucial role in shaping effective counter-terrorism strategies.

#### **Define Your Business Objective?**

The business objective of this project is to leverage the data contained within the Global Terrorism Database (GTD) to derive actionable insights into terrorist activities worldwide from 1970 to 2017. By conducting a comprehensive exploratory data analysis (EDA), the goal is to identify the key patterns, trends, and correlations related to global terrorism, thereby enabling better-informed decision-making for security analysts, policymakers, and counter-terrorism agencies.

Specifically, the objectives include:

Identification of global "hot zones" of terrorist activity: By determining the most affected regions, we can better understand where resources might best be allocated to prevent future attacks.

Analysis of the frequency and intensity of attacks: Understanding how these have evolved over time can provide insights into the changing dynamics of terrorism and allow for more accurate risk assessments.

Examination of methodologies and weapons used in attacks: This can shed light on the operational preferences of terrorist organizations and potentially provide early indicators of future threats.

Assessment of casualty trends: This can help identify the most devastating types of attacks and allow for targeted response planning to minimize human loss.

Unveiling patterns related to terrorist organizations: This can potentially aid in understanding their strategies, thereby supporting intelligence agencies in their counter-terrorism efforts.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     These additional credits will give students an advantage during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart is followed by answers to the questions below, in this format:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px


### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive/')

In [None]:
data_set = "/content/drive/MyDrive/GTD Dataset/Global Terrorism Data.csv"
df = pd.read_csv(data_set, encoding="ISO-8859-1")

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, cols = df.shape
print(f'There are {rows} rows and {cols} columns in the dataset.')

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_rows = df.duplicated().sum()
print(f'There are {duplicate_rows} duplicate rows in the dataset.')

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = df.isnull().sum()
print(missing_values)

In [None]:
# Visualizing the missing values
import missingno as msno

# Visualizing the missing values as a matrix
msno.matrix(df)



### What did you know about your dataset?

**DATASET SIZE**: The dataset is quite large, containing 181,691 entries or rows.

**FEATURE QUANTITY**: The dataset contains 135 features or columns.

**DATA TYPES**: The dataset has a mix of data types: 55 features are floating-point numbers (float64), 22 are integers (int64), and 58 are objects (object). In Pandas, the object dtype typically means the column contains string (text) data.

**MEMORY USAGE**: The dataset uses more than 187.1 MB of memory.

**MISSING VALUES**: Some columns have a large number of missing values. For example, the 'approxdate' column has 172,452 missing values and the 'related' column has 156,653. However, several columns have no missing values, such as 'eventid', 'iyear', 'imonth', 'iday', 'INT_LOG', 'INT_IDEO', 'INT_MISC', and 'INT_ANY'.
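The figures above come from `df.shape`, `df.info()` and `df.isnull().sum()`. As a hedged sketch, they can be gathered in one place with a small helper; `summarize_dataset` is a hypothetical name, and the toy frame below stands in for the real data. Note that `df.info()` reports shallow memory usage (hence the "+" in "187.1+ MB"), whereas `memory_usage(deep=True)` also counts the text stored in object columns, so it will usually report a larger figure for this dataset.

```python
import pandas as pd


def summarize_dataset(frame: pd.DataFrame, top_n: int = 5) -> dict:
    """Collect shape, dtype mix, memory use and missing-value facts for a DataFrame."""
    rows, cols = frame.shape
    # Number of columns per dtype, e.g. {'float64': 55, 'int64': 22, 'object': 58}
    dtype_counts = frame.dtypes.astype(str).value_counts().to_dict()
    # deep=True also measures the strings held inside object columns
    memory_mb = frame.memory_usage(deep=True).sum() / 1024 ** 2
    # Missing values per column, largest first
    missing = frame.isnull().sum().sort_values(ascending=False)
    return {
        "rows": rows,
        "cols": cols,
        "dtype_counts": dtype_counts,
        "memory_mb": round(float(memory_mb), 1),
        "most_missing": missing.head(top_n).to_dict(),
        "no_missing": missing[missing == 0].index.tolist(),
    }


# Toy frame standing in for the GTD (values are made up)
toy = pd.DataFrame({
    "eventid": [1, 2, 3],
    "iyear": [1970, 1971, 1971],
    "approxdate": [None, "1971-05", None],
    "nkill": [1.0, None, 3.0],
})
summary = summarize_dataset(toy, top_n=2)
print(summary["dtype_counts"])
print(summary["most_missing"])
```

On the real data the same call would be `summarize_dataset(df)`.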

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns = df.columns
print("Columns in Dataset:")

for column in columns:
    print(column)


In [None]:
# Dataset Describe
summary = df.describe()
print(summary)

### Variables Description

eventid: Unique ID for each event or terrorist attack.

iyear: Year the terrorist attack occurred.

imonth: Month the terrorist attack occurred.

country_txt: Name of the country where the terrorist attack occurred.

region_txt: Name of the region where the terrorist attack occurred.

city: City where the terrorist attack occurred.

attacktype1_txt: The general method of attack employed.

target1: The specific person, building, installation, etc., that was targeted.

nkill: Number of confirmed fatalities for the incident.

nwound: Number of confirmed non-fatal injuries.

gname: Name of the group that carried out the attack.


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

unique_countries = df['country_txt'].unique()

print("Unique Countries:")
print(unique_countries)

print()

unique_year = df['iyear'].unique()
print("Unique Years:")
print(unique_year)



## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
print(df.isnull().sum())

In [None]:
pd.set_option('display.max_rows', None)
print(df.dtypes)

In [None]:
pd.reset_option('display.max_rows')

In [None]:
df.rename(columns={'iyear':'Year','imonth':'Month','iday':'Day','country_txt':'Country','provstate':'state','region_txt':'Region',
                       'attacktype1_txt':'AttackType','targtype1_txt':'Target','nkill':'fatalities','nwound':'injuries',
                       'summary':'Summary','gname':'Group','weaptype1_txt':'Weapon'},inplace=True)


In [None]:
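# Total casualties = fatalities + injuries; add() with fill_value=0 keeps the known
# count when only one of the two is missing (the result is NaN only if both are missing)
df['Casualties'] = df['fatalities'].add(df['injuries'], fill_value=0)

In [None]:
df_most_selected = df[['Year','Month','Day','Country','state','Region','AttackType','Target','fatalities','injuries','success','Summary','Group','Weapon','Casualties','city','longitude','latitude']]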


In [None]:
# Summary statistics for the numeric columns of the selected data frame
df_most_selected.describe()

In [None]:
df.columns

### What all manipulations have you done and insights you found?

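Since the dataset contains 135 columns, many of which are irrelevant to this analysis or largely empty, learning all of them would not be worthwhile. We therefore renamed the key columns for readability (for example, 'iyear' to 'Year', 'nkill' to 'fatalities', and 'gname' to 'Group'), added a combined casualties column (fatalities plus injuries), and extracted only the 18 columns needed for the analysis into df_most_selected.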
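A minimal sketch of these wrangling steps on a toy frame, assuming only a subset of the renames. One caveat worth making explicit: a plain `fatalities + injuries` yields NaN whenever either value is missing, so an incident with 5 recorded deaths and an unrecorded injury count would get no total at all. `Series.add(..., fill_value=0)` keeps the known part; whether treating a missing count as 0 is acceptable is an analysis assumption. The helper name `prepare_gtd` and the reduced column list are illustrative.

```python
import numpy as np
import pandas as pd

# Subset of the renaming used in this notebook (raw GTD name -> readable name)
RENAME_MAP = {
    "iyear": "Year",
    "country_txt": "Country",
    "region_txt": "Region",
    "attacktype1_txt": "AttackType",
    "nkill": "fatalities",
    "nwound": "injuries",
    "gname": "Group",
}


def prepare_gtd(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename key columns, add a casualty total, and keep only the analysis columns."""
    frame = raw.rename(columns=RENAME_MAP)
    # Plain '+' returns NaN if either side is NaN; with fill_value=0 a single
    # missing side counts as 0, and the result is NaN only if both are missing
    frame["Casualties"] = frame["fatalities"].add(frame["injuries"], fill_value=0)
    keep = list(RENAME_MAP.values()) + ["Casualties"]
    return frame[keep]


# Invented rows; 'extra_col' plays the role of the columns we drop
raw = pd.DataFrame({
    "iyear": [1980, 1981, 1982],
    "country_txt": ["A", "B", "C"],
    "region_txt": ["R1", "R2", "R1"],
    "attacktype1_txt": ["Bombing/Explosion", "Armed Assault", "Assassination"],
    "nkill": [5.0, np.nan, np.nan],
    "nwound": [np.nan, 2.0, np.nan],
    "gname": ["Unknown", "G1", "G2"],
    "extra_col": [0, 0, 0],
})
clean = prepare_gtd(raw)
print(clean["Casualties"].tolist())  # [5.0, 2.0, nan]
```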

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Count plot: one bar per year, bar height = number of recorded attacks
plt.figure(figsize=(15, 11))
sns.countplot(data=df, x='Year')
plt.title('Count of Terrorist Activities Each Year')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')
plt.xticks(rotation=90)
plt.show()


##### 1. Why did you pick the specific chart?

A count plot (a bar chart of attacks per year) was chosen because the bar heights, read from left to right, give a clear visual representation of the trend over time.



##### 2. What is/are the insight(s) found from the chart?

The insight that can be gained is the trend of terrorist activities over the years. We can see whether the frequency of attacks is increasing, decreasing, or remaining relatively stable.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights are crucial for predicting future trends, which could help law enforcement and security agencies plan resources and strategies. However, if the trend shows an increase in terrorist activities, this could lead to a negative impact, as it indicates a growing problem.
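To back the "increasing, decreasing, or stable" reading with numbers, one option is to count attacks per year, compute the year-over-year change, and fit a straight line with `np.polyfit`; the sign of the slope gives the overall direction. This is a hedged sketch on invented counts, and `yearly_trend` is a hypothetical helper. A single linear slope will hide any rise-and-fall shape in a long series, so it complements the chart rather than replacing it.

```python
import numpy as np
import pandas as pd


def yearly_trend(years: pd.Series) -> tuple[pd.DataFrame, float]:
    """Return attacks per year with year-over-year % change, plus a linear slope."""
    counts = years.value_counts().sort_index()
    table = pd.DataFrame({"attacks": counts})
    # Change relative to the previous row (which may skip a year if one is absent)
    table["yoy_pct"] = table["attacks"].pct_change() * 100
    # Degree-1 least-squares fit: the slope is the average change in attacks per year
    slope, _intercept = np.polyfit(
        table.index.to_numpy(dtype=float),
        table["attacks"].to_numpy(dtype=float),
        1,
    )
    return table, float(slope)


# One entry per incident, like df['Year']; the counts here are invented
toy_years = pd.Series([2000] * 10 + [2001] * 12 + [2002] * 15 + [2003] * 18)
table, slope = yearly_trend(toy_years)
print(table)
print(f"slope: {slope:.2f} attacks/year")  # slope: 2.70 attacks/year
```

On the real data this would be called as `yearly_trend(df['Year'])`.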

#### Chart - 2

In [None]:

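# Chart - 2 visualization code
# Cross-tabulate attacks: one row per year, one column per region
cross_tab = pd.crosstab(df['Year'], df['Region'])

# Area chart; pandas stacks the regional areas by default
cross_tab.plot(kind='area', figsize=(15, 6))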

plt.title('Terrorist Activities by Region in Each Year')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')

plt.show()

##### 1. Why did you pick the specific chart?

An area chart effectively visualizes both the total number of terrorist attacks over the years and each region's contribution to that total, since the regional areas are stacked on top of one another.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the yearly variations in the number of attacks by region, helping to identify trends, security hotspots, and potential areas of concern or improvement over time.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The impact of insights depends on the specific findings; insights indicating declining attacks can positively impact business expansion, while rising attacks in key areas may lead to negative impacts necessitating risk management and potential disruptions.
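The stacked area chart shows absolute counts, so a region's band can grow simply because overall activity grew. A hedged sketch of the complementary view, assuming the renamed `Year` and `Region` columns: `pd.crosstab(..., normalize='index')` turns each year's row into shares that sum to 1, and `idxmax(axis=1)` names the leading region in each year. The toy rows below are invented.

```python
import pandas as pd

# Invented incidents; on the real data use df[['Year', 'Region']]
toy = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "Region": ["South Asia", "South Asia", "Western Europe",
               "Middle East & North Africa", "Middle East & North Africa",
               "Middle East & North Africa", "South Asia"],
})

# Each row (year) is divided by its own total, so every row sums to 1.0
shares = pd.crosstab(toy["Year"], toy["Region"], normalize="index")
# Region with the largest share of that year's attacks
leader = shares.idxmax(axis=1)
print(shares.round(2))
print(leader)
```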


#### Chart - 3

In [None]:


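# Chart - 3 visualization code
# Total fatalities per year (.sum() skips NaN values)
yearly_fatalities = df.groupby('Year')['fatalities'].sum()

plt.figure(figsize=(15, 7))
sns.lineplot(x=yearly_fatalities.index, y=yearly_fatalities.values)
plt.title('Number of People Killed by Terror Attacks per Year')
plt.xlabel('Year')
plt.ylabel('Fatalities')
plt.xticks(rotation=90)
plt.show()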


##### 1. Why did you pick the specific chart?

A line plot was chosen to observe the trend of fatalities over time.

##### 2. What is/are the insight(s) found from the chart?

The insight is the severity of terrorist activities over the years in terms of human lives lost.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This could influence policy-making, disaster management planning, insurance, and healthcare provision. An increasing trend could lead to negative growth by undermining population stability and discouraging investment and development.
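Total yearly deaths mix two effects: how many attacks happened and how deadly each one was. Dividing yearly fatalities by the yearly attack count separates them. A minimal sketch on invented rows, assuming the renamed `Year` and `fatalities` columns; note that `sum()` skips NaN, so incidents with an unrecorded death toll add nothing to the numerator while still counting as attacks in the denominator.

```python
import numpy as np
import pandas as pd

# Invented incidents; on the real data use df[['Year', 'fatalities']]
toy = pd.DataFrame({
    "Year": [2010, 2010, 2010, 2011, 2011],
    "fatalities": [0.0, 4.0, np.nan, 10.0, 2.0],
})

yearly = toy.groupby("Year").agg(
    attacks=("fatalities", "size"),      # every incident, including NaN rows
    total_killed=("fatalities", "sum"),  # NaN values are skipped
)
# Average number of people killed per recorded attack
yearly["killed_per_attack"] = yearly["total_killed"] / yearly["attacks"]
print(yearly)
```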

#### Chart - 4

In [None]:
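# Chart - 4 visualization code
# Bars ordered from the most to the least frequent attack type
plt.figure(figsize=(15, 7))
sns.countplot(data=df, x='AttackType', order=df['AttackType'].value_counts().index)
plt.title('Number of Attacks by Attack Type')
plt.xlabel('Attack Type')
plt.ylabel('Number of Attacks')
plt.xticks(rotation=90)
plt.show()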

##### 1. Why did you pick the specific chart?

A bar plot (count plot) is used to compare the frequencies of different categories, in this case attack types.

##### 2. What is/are the insight(s) found from the chart?

We can learn which methods are most commonly used in terrorist attacks.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can help in developing and implementing measures to prevent and respond to these specific types of attacks. If certain types of attacks are prevalent, it may signify a failure to adequately address those threats, possibly leading to negative impacts.
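To state which methods dominate rather than reading bar heights by eye, `value_counts(normalize=True)` gives each attack type's share of all incidents, and a cumulative sum shows how few types cover most attacks. A hedged sketch on invented data, assuming the renamed `AttackType` column; the 80% cutoff is an arbitrary illustrative threshold.

```python
import pandas as pd

# Invented attack types; on the real data use df['AttackType']
toy_types = pd.Series(
    ["Bombing/Explosion"] * 9 + ["Armed Assault"] * 6
    + ["Assassination"] * 3 + ["Hijacking"] * 2
)

share = toy_types.value_counts(normalize=True)  # sorted largest first
cumulative = share.cumsum()
# Smallest number of attack types that together cover at least 80% of incidents
n_needed = int((cumulative < 0.8).sum()) + 1
print(share)
print(f"{n_needed} attack types cover >= 80% of incidents")
```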



#### Chart - 5

In [None]:
# Chart - 5 visualization code

# Top 10 groups by number of attacks, excluding attacks with an unknown perpetrator
group_data = df[df['Group'] != 'Unknown']['Group'].value_counts().head(10)

plt.figure(figsize=(12, 6))
sns.barplot(x=group_data.index, y=group_data.values)
plt.title('Top 10 Terrorist Groups with Highest Number of Attacks')
plt.xticks(rotation=90)
plt.xlabel('Group')
plt.ylabel('Count')
plt.show()




##### 1. Why did you pick the specific chart?

A bar plot is suitable for comparing the number of attacks carried out by different terrorist groups.


##### 2. What is/are the insight(s) found from the chart?

We can identify which known groups are responsible for the largest number of terrorist attacks.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This information could be important for intelligence agencies in prioritizing threats and focusing their counter-terrorism efforts. If a particular group is increasingly active, it could contribute to instability and negative growth.
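"Increasingly active" can be made concrete by comparing each group's attack count in a recent window of years with the window just before it. A sketch under stated assumptions: the renamed `Year` and `Group` columns, a hypothetical helper `rising_groups`, and an arbitrary 3-year window; attacks attributed to 'Unknown' are dropped, mirroring the Chart 5 filter. The toy rows are invented.

```python
import pandas as pd


def rising_groups(frame: pd.DataFrame, last_year: int, window: int = 3) -> pd.DataFrame:
    """Compare attacks per group in the latest `window` years with the `window` years before."""
    known = frame[frame["Group"] != "Unknown"]
    # between() is inclusive on both ends
    recent = known[known["Year"].between(last_year - window + 1, last_year)]
    earlier = known[known["Year"].between(last_year - 2 * window + 1, last_year - window)]
    table = pd.DataFrame({
        "earlier": earlier["Group"].value_counts(),
        "recent": recent["Group"].value_counts(),
    }).fillna(0).astype(int)  # a group absent from one window counts as 0 attacks there
    table["change"] = table["recent"] - table["earlier"]
    # Largest increase first
    return table.sort_values("change", ascending=False)


toy = pd.DataFrame({
    "Year": [2011, 2012, 2013, 2014, 2015, 2016, 2016, 2015, 2012, 2014],
    "Group": ["G1", "G1", "G1", "G2", "G2", "G2", "G2", "G1", "Unknown", "Unknown"],
})
result = rising_groups(toy, last_year=2016, window=3)
print(result)
```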


#### Chart - 6

In [None]:
# Total fatalities per country (summed; .count() would only count non-null rows)
terr = df.groupby('Country', as_index=False)['fatalities'].sum()

In [None]:
import plotly.express as px

# Assuming 'terr' is your DataFrame with attack data, and it has columns 'Country' and 'fatalities'
fig = px.choropleth(
    terr,
    locations='Country',
    locationmode='country names',
    color='fatalities',
    color_continuous_scale='Viridis',
    title='Total Fatalities by Country',
    hover_name='Country',
    range_color=(terr['fatalities'].min(), terr['fatalities'].max()),
)

fig.update_geos(
    projection_type="orthographic",
    showcoastlines=True,  # Show coastlines on the map
    coastlinecolor="Black",
    coastlinewidth=0.5,
)

fig.update_layout(
    geo=dict(
        showland=True,  # Show land areas
        landcolor="rgb(217, 217, 217)",
    )
)

fig.show()
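A note on the aggregation behind this map: `df.groupby('Country').count()` returns, for every column, the number of non-null rows per country, so its `fatalities` column is the number of incidents with a recorded death toll, not the number of people killed. Summing gives the total that the map title describes. A small demonstration on invented rows with placeholder country names:

```python
import numpy as np
import pandas as pd

# Invented incidents; 'Country A' / 'Country B' are placeholders
toy = pd.DataFrame({
    "Country": ["Country A", "Country A", "Country A", "Country B"],
    "fatalities": [10.0, 5.0, np.nan, 2.0],
})

# Non-null row counts per country (what .count() measures)
counted = toy.groupby("Country", as_index=False).count()
# Total people killed per country (NaN values are skipped)
summed = toy.groupby("Country", as_index=False)["fatalities"].sum()
print(counted)
print(summed)
```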

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Chart - 7

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Get user input for the specific year; non-numeric input is treated as invalid
raw_year = input("Enter a year (1970-2017): ")
try:
    user_year = int(raw_year)
except ValueError:
    user_year = None

# Check if the input year exists in the dataset
if user_year is not None and user_year in df['Year'].values:
    # Extract data for the specified year
    data_for_year = df.loc[df['Year'] == user_year]

    # Pie chart of the yearly totals over all incidents (.sum() skips NaN values)
    labels = ['Fatalities', 'Injuries']
    sizes = [data_for_year['fatalities'].sum(), data_for_year['injuries'].sum()]
    colors = ['#FF0000', '#FFFF00']  # Red for fatalities, yellow for injuries
    explode = (0.1, 0)  # Explode the 1st slice (Fatalities)

    plt.figure(figsize=(6, 6))
    plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', startangle=140)
    plt.title(f'Distribution of Fatalities and Injuries in {user_year}')

    # Display the plot
    plt.show()
else:
    print(f"\033[91mInvalid input: Data for the year {raw_year} is not available in the dataset. Please enter a valid year.\033[0m")


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code (line plot)
import matplotlib.pyplot as plt

# Uses the renamed columns: df['Year'] holds the year, df['fatalities'] the number
# of fatalities and df['injuries'] the number of injuries for each incident

# Group the data by year and calculate the total fatalities and injuries for each year
yearly_data = df.groupby('Year')[['fatalities', 'injuries']].sum().reset_index()

# Create a line plot to show the trend over the years
plt.figure(figsize=(12, 6))
plt.plot(yearly_data['Year'], yearly_data['fatalities'], label='Fatalities', marker='o')
plt.plot(yearly_data['Year'], yearly_data['injuries'], label='Injuries', marker='o')

plt.xlabel('Year')
plt.ylabel('Count')
plt.title('Number of Fatalities and Injuries Over the Years')
plt.legend()
plt.grid(True)

plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the exploratory data analysis conducted on the Global Terrorism Database, there are several recommendations for a client who wants to use this information to reduce the impact of terrorism and thereby meet the stated business objective.

Focus on Hotspot Regions: The regions with the highest frequencies of terrorist activity should be prioritized for intervention efforts. These regions may need more security measures, targeted socio-economic programs to address the root causes of terrorism, or more substantial international assistance.

Understand Yearly Trends: Tracking the rise or fall of terrorist incidents over the years could help forecast potential future threats and adjust counter-terrorism strategies accordingly.

Prioritize Major Threat Groups: Our analysis shows that certain terrorist groups are more active than others. Intelligence efforts should be concentrated on these high-impact groups to prevent future attacks.

Target the Most Common Attack Types: Understanding the most common types of attack used by terrorists can help in developing preventive measures and response strategies. For instance, if bombings are the most common attack type, more resources could be directed towards bomb detection and disposal.
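For the trend-tracking recommendation, the simplest possible forecast is a straight line fitted to the last few years of counts and extended forward. The sketch below is a naive baseline under explicit assumptions, not a validated forecasting model: `naive_trend_forecast` is a hypothetical helper, the 5-year lookback is arbitrary, and the counts are invented. Any real forecast would need backtesting against held-out years.

```python
import numpy as np
import pandas as pd


def naive_trend_forecast(yearly_counts: pd.Series, horizon: int = 2, lookback: int = 5) -> pd.Series:
    """Extrapolate a straight line fitted to the last `lookback` years of counts."""
    recent = yearly_counts.sort_index().tail(lookback)
    years = recent.index.to_numpy(dtype=float)
    slope, intercept = np.polyfit(years, recent.to_numpy(dtype=float), 1)
    future_years = np.arange(years[-1] + 1, years[-1] + 1 + horizon)
    # Clip at zero: a number of attacks cannot be negative
    predicted = np.clip(slope * future_years + intercept, 0, None)
    return pd.Series(predicted, index=future_years.astype(int), name="forecast")


# Invented yearly attack counts, e.g. what df['Year'].value_counts() would give
counts = pd.Series([100, 120, 140, 160, 180], index=[2010, 2011, 2012, 2013, 2014])
fc = naive_trend_forecast(counts, horizon=2)
print(fc)  # about 200 for 2015 and 220 for 2016
```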

# **Conclusion**

The Exploratory Data Analysis (EDA) conducted on the Global Terrorism Database (GTD) provided significant insights into trends and patterns in global terrorism from 1970 through 2017. With the help of the Python libraries Pandas, NumPy, Matplotlib, Seaborn, and Plotly, we were able to handle, visualize, and interpret complex data related to terrorist activities.

Through this analysis, we identified trends over time, regional hotspots, dominant terrorist groups, and preferred modes of attack. All of these findings are crucial for devising effective counter-terrorism strategies and interventions.

The process underscored the power of data-driven decision-making. By using EDA, we were able to transform raw data into meaningful insights. For instance, knowing that certain regions are more prone to terrorist attacks, or that specific terrorist groups are more active, allows security agencies and policymakers to allocate resources more efficiently, thereby potentially saving lives and property.

However, while this analysis provides a robust foundation, it is important to acknowledge that addressing terrorism requires more than understanding past data. It requires a comprehensive approach that includes current intelligence, geopolitical considerations, and on-the-ground realities.

To conclude, this project demonstrates the potential of data analysis in informing and shaping counter-terrorism efforts. It provides a useful starting point for further study and action, emphasizing the importance of continuous data collection, analysis, and interpretation in tackling global security threats such as terrorism.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***