In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Project Name: Global Terrorism EDA

Contributor: Meghna Phanse

Contribution: Individual

In [None]:
df = pd.read_excel("/kaggle/input/global-terrorism-index-2023/Global Terrorism Index 2023.xlsx")

# First Look at Dataset

In [None]:
df.head()

# Introduction 

The dataset "Global Terrorism Index 2023" is an Excel file containing Global Terrorism Index (GTI) data. The index is a composite measure of the impact of terrorism on countries worldwide. It provides insights into the frequency, severity, and overall impact of terrorist incidents, along with rankings and scores for various nations. The rows explored later in this notebook cover the years 2012 to 2022, one row per country and year. Below is an introduction to the variables present in this dataset:

1. iso3c: This variable represents the ISO 3166-1 alpha-3 country code, which is a three-letter code used to uniquely identify countries or territories. It is a standardized way of identifying nations.

2. Country: This variable contains the names of the countries or territories included in the dataset. It represents the locations for which the Global Terrorism Index is calculated.

3. Rank: This variable denotes the rank of each country or territory in the Global Terrorism Index for a given year. The rank indicates a country's position relative to others in terms of the impact of terrorism. Rank 1 goes to the most affected country (in this dataset it appears alongside a score of 10.0), so numerically larger ranks indicate a lower impact.

4. Score: The score variable represents the Global Terrorism Index score assigned to each country or territory for a given year. This score quantifies the overall impact of terrorism, with higher scores indicating a greater impact.

5. Incidents: This variable likely indicates the total number of terrorist incidents recorded in each country or territory during the given year. It provides insight into the frequency of terrorist activities.

6. Fatalities: The fatalities variable is likely a count of the total number of fatalities resulting from terrorist incidents in each country or territory during the given year. It measures the human cost of terrorism.

7. Injuries: This variable probably represents the total number of injuries caused by terrorist incidents in each country or territory during the given year. It provides information about the physical harm caused by terrorism.

8. Hostages: The hostages variable may indicate the number of individuals taken as hostages in terrorist incidents within each country or territory during the given year. This variable relates to the impact on individuals' safety and security.

9. Year: This variable specifies the year for which the data is recorded. In the rows inspected in this notebook it ranges from 2012 to 2022, which makes it essential for tracking changes over time.

# Importing Libraries

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from datetime import datetime
from datetime import date
import folium

# Knowing Dataset

In [None]:
df.columns

In [None]:
df.isnull().sum()

In [None]:
df.info()

In [None]:
# Checking Duplicate columns
df.columns.value_counts()

In [None]:
df.describe()

There seem to be no Null Values or duplicate columns.
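
`df.columns.value_counts()` only detects repeated column names. A minimal sketch of a row-level check is shown here on a small hypothetical frame (`toy`, with made-up values) standing in for `df`. It assumes each country should appear at most once per year.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are illustrative, not from the dataset
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "B"],
    "Year": [2012, 2013, 2012, 2012, 2013],
    "Incidents": [3, 0, 5, 5, 1],
})

# Rows that are identical to an earlier row in every column
n_duplicate_rows = int(toy.duplicated().sum())

# Rows that repeat an earlier (Country, Year) pair, even if other columns differ
n_duplicate_keys = int(toy.duplicated(subset=["Country", "Year"]).sum())

print(n_duplicate_rows, n_duplicate_keys)
```

Running the same two lines on `df` would show whether any country-year appears more than once.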

In [None]:
df.sample(6)

# Check Unique values for variables

In [None]:
df[df['Country']=='Algeria'] # experimenting

It appears to display historical data for Algeria from 2012 to 2022, including information such as the Global Terrorism Index (Rank and Score), the number of incidents, fatalities, injuries, and hostages for each year. This subset allows you to see how these variables have changed over time in Algeria.

In [None]:
df[df['Fatalities'] == df['Hostages']]  # rows where the two counts are equal

In some rows the number of fatalities equals the number of hostages. This does not by itself show that all hostages were killed: both columns are yearly totals per country, and rows where both values are zero also satisfy the condition.
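
To see how many of these matches are trivial, the equal rows can be split into those where both counts are zero and those where both are positive. This is a sketch on a hypothetical frame (`toy`) with made-up values; the same filters should apply to `df`.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are illustrative only
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Fatalities": [0, 4, 7, 0],
    "Hostages": [0, 4, 2, 3],
})

# All rows where the two counts coincide
equal = toy[toy["Fatalities"] == toy["Hostages"]]

# Trivial matches: nothing happened in either column
both_zero = equal[equal["Fatalities"] == 0]

# Non-trivial matches: equal and greater than zero
nonzero_equal = equal[equal["Fatalities"] > 0]

print(len(equal), len(both_zero), len(nonzero_equal))
```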

In [None]:
len(df[df['Fatalities'] == df['Hostages']])

In [None]:
df[df['Incidents']== 0]

It shows countries and years where there were no reported terrorist incidents. This could be useful for identifying regions or time periods with low terrorist activity.

In [None]:
df['Country'].unique()

In [None]:
df['iso3c'].unique()

In [None]:
df['Incidents'].max()

In [None]:
df['Fatalities'].max()

In [None]:
df['Fatalities'].idxmax()

In [None]:
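# Look up the row by the label returned by idxmax instead of a hard-coded position
df.loc[df['Fatalities'].idxmax()]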

In [None]:
df['Country'].value_counts()[:3]

In [None]:
len(df[df['Incidents'] == 0])

In [None]:
len(df[df['Fatalities'] == 0])

Insights from the above manipulations:

The maximum number of incidents recorded in a single row (one country in one year) is 1,673.

The maximum number of fatalities recorded in a single row is 4,514. This is a yearly total for one country, not the toll of a single attack.

The highest fatality count belongs to Iraq in 2016, which had a Global Terrorism Index rank of 1 and a score of 10.0, with 1,545 incidents, 4,514 fatalities, 4,514 injuries, and 20 hostages.

Iraq, Vietnam, and Oman each have 11 entries, which matches one row per year from 2012 to 2022. Because `value_counts()[:3]` only shows the first three countries, this does not single them out; other countries may have the same number of rows.

There are 1,082 rows with no recorded terrorist incidents and 1,269 rows with no recorded fatalities, so a large portion of the dataset represents country-years without terrorist activity or without deaths from it.
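
The zero counts are easier to interpret as shares of all rows. The sketch below uses a hypothetical frame (`toy`) with made-up values; on `df` the same expressions would turn the counts above into proportions.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are illustrative only
toy = pd.DataFrame({
    "Incidents": [0, 0, 12, 3],
    "Fatalities": [0, 0, 0, 9],
})

# The mean of a boolean mask is the fraction of rows where it is True
share_no_incidents = (toy["Incidents"] == 0).mean()
share_no_fatalities = (toy["Fatalities"] == 0).mean()

print(f"No incidents: {share_no_incidents:.0%}, no fatalities: {share_no_fatalities:.0%}")
```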

# Data Visualization, Storytelling & Experimenting with charts: Understand the relationships between variables

In [None]:
# correlation matrix
cm = df[['Rank', 'Score', 'Incidents', 'Fatalities', 'Injuries', 'Hostages', 'Year']].corr()
f, ax = plt.subplots(figsize = (10,8))
sns.heatmap(cm, vmax=0.9, square = True, annot = True)

In [None]:
# Calculate the correlation matrix
correlation_matrix = df[['Score', 'Incidents', 'Fatalities', 'Injuries', 'Hostages']].corr()

# Print the correlation matrix
print(correlation_matrix)

The high positive correlations for "Fatalities - Injuries" and "Incidents - Fatalities" mean that country-years with more fatalities also tend to record more injuries, and country-years with more incidents tend to record more fatalities. Because each row is a yearly total for one country, this describes overall exposure to terrorism rather than the severity of individual attacks, although more lethal methods or larger-scale attacks could contribute.

Country-years with many fatalities are therefore likely to involve many injured individuals as well, which may require more extensive medical and emergency response resources.

While correlation indicates a statistical relationship, it does not imply causation. Further investigation and analysis are needed to understand the underlying factors driving this correlation. Factors such as the type of attack, the use of explosives or weaponry, the target location, and security measures can all influence the number of fatalities and injuries in terrorist incidents.

Incidents and Fatalities have a high correlation of 0.860467, suggesting that as the number of incidents increases, the number of fatalities tends to increase as well.

Similarly, Incidents and Injuries have a high correlation of 0.863206, indicating a strong positive relationship between these two variables.

Fatalities and Injuries also have a high correlation of 0.912888, which is the highest in this matrix. This suggests that incidents with more fatalities also tend to have more injuries.

Hostages has the lowest correlations with all other variables, suggesting that the number of hostages taken may not be strongly influenced by the other variables in this dataset.
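
The figures above come from `corr()` with its default Pearson method, which a few extreme rows can dominate when counts are heavily skewed. As a possible robustness check (not part of the original analysis), a rank-based Spearman correlation can be compared against it. The frame `toy` is hypothetical and its values are made up.

```python
import pandas as pd

# Hypothetical stand-in for df: a monotonic relationship with one extreme row
toy = pd.DataFrame({
    "Incidents": [1, 2, 3, 4, 100],
    "Fatalities": [2, 3, 5, 8, 9],
})

# Pearson measures linear association and is sensitive to the outlier
pearson = toy.corr(method="pearson").loc["Incidents", "Fatalities"]

# Spearman works on ranks, so it only asks whether the ordering agrees
spearman = toy.corr(method="spearman").loc["Incidents", "Fatalities"]

print(round(pearson, 3), round(spearman, 3))
```

If the two methods broadly agree on `df`, the reported correlations are less likely to be driven by a handful of very large rows.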

In [None]:
# Group the data by year and sum the incident counts
yearly_totals = df.groupby('Year')['Incidents'].sum().reset_index()

# Create a line chart to visualize the trend
plt.figure(figsize=(10, 6))
plt.plot(yearly_totals['Year'], yearly_totals['Incidents'], marker='o', linestyle='-', color='red')
plt.title('Terrorism Incidents Over Time (2012-2022)')
plt.xlabel('Year')
plt.ylabel('Total Incidents')
plt.grid(True)

# Show the plot
plt.show()

The graph shows an overall upward trend in the total number of incidents over this period, indicating an increase in terrorism incidents worldwide.

The year 2020 stands out with the highest number of incidents, exceeding 5500. This suggests a significant surge in such incidents during that year.

Conversely, 2016 had the lowest number of incidents, with just over 3500. This indicates a relatively peaceful period compared to other years.

There’s a sharp increase in incidents from 2018 to 2020, which might suggest a period of heightened global conflict or instability.
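
Statements about sharp increases can be backed with year-over-year percentage changes. This is a sketch on a hypothetical yearly table (`yearly`, made-up numbers) shaped like `yearly_totals`; the column name `yoy_pct` is introduced here for illustration.

```python
import pandas as pd

# Hypothetical yearly totals; the numbers are illustrative, not from the dataset
yearly = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021],
    "Incidents": [100, 150, 300, 240],
})

# Percentage change relative to the previous year (the first year has no predecessor)
yearly["yoy_pct"] = yearly["Incidents"].pct_change() * 100

# Year with the highest total
peak_year = int(yearly.loc[yearly["Incidents"].idxmax(), "Year"])

print(yearly)
print(peak_year)
```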

In [None]:
# Group the data by country and calculate the sum of incidents
total_incidents_by_country = df.groupby('Country')['Incidents'].sum().reset_index()

# Sort the data in descending order to get the top 5 countries with the highest total incidents
top_5_countries = total_incidents_by_country.sort_values(by='Incidents', ascending=False).head(5)

# Print the top 5 countries with the highest number of total incidents
print(top_5_countries)

Iraq has the highest number of incidents, with a total of 11,183. This suggests that Iraq has been the most affected by these incidents.

Afghanistan follows next with 4,443 incidents, which is less than half of those in Iraq but still significant.

Pakistan and Somalia have also experienced a high number of incidents, with 3,358 and 2,951 respectively.

India is listed as well with a total of 2,872 incidents.

In [None]:
df['Year'] = df['Year'].astype(int)

# Group the data by year and country and calculate the sum of incidents
yearly_totals = df.groupby(['Year', 'Country'])['Incidents'].sum().reset_index()

# Pivot the table to have years as columns and countries as rows
pivot_table = yearly_totals.pivot(index='Country', columns='Year', values='Incidents').fillna(0)

# Calculate the increase in incidents for each country from 2012 to 2022
increase = pivot_table[2022] - pivot_table[2012]

# Sort countries based on the increase in incidents in descending order
top_countries = increase.sort_values(ascending=False).head(5).index

# Filter the data for the top 5 countries
top_countries_data = pivot_table.loc[top_countries]

# Create a line chart to visualize the trend
plt.figure(figsize=(12, 8))

for country in top_countries_data.index:
    plt.plot(top_countries_data.columns, top_countries_data.loc[country], marker='o', linestyle='-', label=country)

plt.title('Trend of Incidents for Top 5 Countries with Highest Increase (2012-2022)')
plt.xlabel('Year')
plt.ylabel('Total Incidents')
plt.legend(loc='upper right')
plt.grid(True)

# Show the plot
plt.show()

Myanmar shows the highest increase in incidents, with a sharp rise from 2018 to 2022. This suggests a significant escalation in incidents during this period.

Both Burkina Faso and Mali show a steady increase in incidents over the years, indicating a persistent issue in these countries.

Chile shows a slight increase in incidents from 2012 to 2018, but then a decrease from 2018 to 2022. This could indicate an improvement in the situation or effective countermeasures implemented.

Somalia shows a slight increase in incidents over the years, suggesting a gradual escalation.

In [None]:
# Ensure the "Year" column is treated as an integer
df['Year'] = df['Year'].astype(int)

# Group the data by year and country and calculate the sum of incidents
yearly_totals = df.groupby(['Year', 'Country'])['Incidents'].sum().reset_index()

# Pivot the table to have years as columns and countries as rows
pivot_table = yearly_totals.pivot(index='Country', columns='Year', values='Incidents').fillna(0)

# Calculate the decrease in incidents for each country from 2012 to 2022
decrease = pivot_table[2012] - pivot_table[2022]

# Sort countries based on the decrease in incidents in descending order
top_countries_decrease = decrease.sort_values(ascending=False).head(5).index

# Filter the data for the top 5 countries with the greatest decrease
top_countries_decrease_data = pivot_table.loc[top_countries_decrease]

# Create a line chart to visualize the trend of decreasing incidents
plt.figure(figsize=(12, 8))

for country in top_countries_decrease_data.index:
    plt.plot(top_countries_decrease_data.columns, top_countries_decrease_data.loc[country], marker='o', linestyle='-', label=country)

plt.title('Trend of Decreasing Incidents for Top 5 Countries (2012-2022)')
plt.xlabel('Year')
plt.ylabel('Total Incidents')
plt.legend(loc='upper right')
plt.grid(True)

# Show the plot
plt.show()

Iraq and Pakistan show a decreasing trend in incidents over the years. This could suggest that measures taken in these countries have been effective in reducing these incidents.

Russia, Afghanistan, and Yemen appear to rise during parts of the period. Because this chart selects countries by the size of their decrease from 2012 to 2022, each of them still ends 2022 below its 2012 level (provided the computed decrease is positive), so any increase is confined to intermediate years.

Iraq had the highest number of incidents in 2012, but the number has decreased significantly by 2022.

Yemen had the lowest number of incidents of these five countries in 2012, so its line should be read with the same caveat.
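
The increase and decrease cells repeat the same group, pivot, and subtract steps. They could be folded into one helper, sketched below. The function name `incident_change` and the frame `toy` are hypothetical, and the values are made up.

```python
import pandas as pd


def incident_change(frame, start_year, end_year):
    """Return incidents in end_year minus incidents in start_year, per country.

    Hypothetical helper; countries missing in a year are treated as zero.
    """
    pivot = frame.pivot_table(
        index="Country", columns="Year", values="Incidents",
        aggfunc="sum", fill_value=0,
    )
    return pivot[end_year] - pivot[start_year]


# Hypothetical stand-in for df; the values are illustrative only
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Year": [2012, 2022] * 3,
    "Incidents": [10, 2, 1, 30, 5, 5],
})

change = incident_change(toy, 2012, 2022)

# Positive values are increases, negative values are decreases
largest_increase = change.nlargest(1)
largest_decrease = change.nsmallest(1)

print(change)
```

With a single signed series, `nlargest(5)` and `nsmallest(5)` give both rankings, and the sign shows directly whether a listed country actually rose or fell between the two years.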

In [None]:
# Initialize an empty DataFrame to store the results
top_countries_by_year = pd.DataFrame(columns=['Year', 'Country', 'Incidents'])

# Create a figure with subplots for each year
fig, axs = plt.subplots(nrows=3, ncols=4, figsize=(16, 12))
fig.suptitle('Top 10 Countries with Highest Incidents by Year (2012-2022)')

# Loop through the years from 2012 to 2022
for i, year in enumerate(range(2012, 2023)):
    # Filter the dataset for the current year
    incidents_year = df[df['Year'] == year]
    
    # Group the data by country and calculate the sum of incidents
    top_countries_year = incidents_year.groupby('Country')['Incidents'].sum().reset_index()
    
    # Sort the data in descending order to get the top 10 countries
    top_10_countries_year = top_countries_year.sort_values(by='Incidents', ascending=False).head(10)
    
    # Add the year to the results
    top_10_countries_year['Year'] = year
    
    # Concatenate the results for the current year to the overall results DataFrame
    top_countries_by_year = pd.concat([top_countries_by_year, top_10_countries_year], ignore_index=True)
    
    # Create a bar chart for the current year
    row, col = i // 4, i % 4
    ax = axs[row, col]
    ax.barh(top_10_countries_year['Country'], top_10_countries_year['Incidents'], color='pink')
    ax.set_title(f'Year {year}')
    ax.set_xlabel('Total Incidents')
    ax.set_ylabel('Country')
    
# Hide the unused 12th subplot (11 years in a 3x4 grid)
axs[2, 3].set_visible(False)

# Adjust layout
plt.tight_layout(rect=[0, 0.03, 1, 0.95])

# Show the plots
plt.show()

The countries that consistently appear in the top 10 across these years are Afghanistan, Pakistan, and Iraq. This suggests that these countries have been persistently affected by these incidents over this period.
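
Reading persistence off eleven small charts is error-prone, so the claim can also be checked by counting how often each country appears in the yearly top N. This is a sketch on a hypothetical frame (`toy`, made-up values) with `top_n = 2`; for the notebook's data `top_n` would be 10.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are illustrative only
toy = pd.DataFrame({
    "Year": [2012, 2012, 2012, 2013, 2013, 2013],
    "Country": ["A", "B", "C", "A", "B", "C"],
    "Incidents": [9, 5, 1, 2, 8, 7],
})

top_n = 2

# Sort once, then keep the first top_n rows within each year
top_per_year = (
    toy.sort_values("Incidents", ascending=False)
       .groupby("Year")
       .head(top_n)
)

# Number of years in which each country made the top_n
appearances = top_per_year["Country"].value_counts()

# Countries present in the top_n for every year in the data
n_years = toy["Year"].nunique()
always_in_top = appearances[appearances == n_years].index.tolist()

print(appearances)
print(always_in_top)
```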

In [None]:
# Group the data by year and calculate the sum of fatalities
total_fatalities_by_year = df.groupby('Year')['Fatalities'].sum().reset_index()

# Sort the data in descending order to get the top 5 years with the highest total fatalities
top_5_years = total_fatalities_by_year.sort_values(by='Fatalities', ascending=False).head(5)

# Print the top 5 years with the highest number of total fatalities
print(top_5_years)

The year 2015 had the highest number of fatalities, with a total of 10,881. This suggests that 2015 was the most deadly year among those listed.

2016 follows next with 10,372 fatalities, which is slightly less than 2015 but still significant.

2013 and 2014 also had a high number of fatalities, with 10,317 and 10,129 respectively.

2012 is listed as well with a total of 9,227 fatalities.

In [None]:
# Filter the dataset for the year 2015
fatalities_2015 = df[(df['Year'] == 2015)]

# Group the data by country and calculate the sum of fatalities for 2015
fatalities_by_country_2015 = fatalities_2015.groupby('Country')['Fatalities'].sum().reset_index()

# Sort the data in descending order to get the top 10 countries with the most fatalities in 2015
top_10_countries_2015 = fatalities_by_country_2015.sort_values(by='Fatalities', ascending=False).head(10)

# Print the top 10 countries with the most fatalities in 2015
print(top_10_countries_2015)

In [None]:
# Filter the dataset for the year 2015
fatalities_2015 = df[df['Year'] == 2015]

# Group the data by country and calculate the sum of fatalities for 2015
fatalities_by_country_2015 = fatalities_2015.groupby('Country')['Fatalities'].sum().reset_index()

# Sort the data in descending order to get the top 10 countries with the most fatalities in 2015
top_10_countries_2015 = fatalities_by_country_2015.sort_values(by='Fatalities', ascending=False).head(10)

# Create a bar chart to visualize the top 10 countries with the most fatalities in 2015
plt.figure(figsize=(12, 8))
plt.barh(top_10_countries_2015['Country'], top_10_countries_2015['Fatalities'], color='pink')
plt.title('Top 10 Countries with Most Fatalities in 2015')
plt.xlabel('Total Fatalities')
plt.ylabel('Country')

# Show the plot
plt.show()

Iraq had the highest number of fatalities, with a total of 2,974. This suggests that Iraq was the most affected by these incidents in 2015.

Nigeria follows next with 2,003 fatalities, roughly a third fewer than Iraq but still significant.

Afghanistan also had a high number of fatalities, with 1,008.

Other countries like Pakistan, Egypt, Yemen, Somalia, Cameroon, Turkey, and Libya also experienced a considerable number of fatalities ranging from 234 to 658.

In [None]:
# Filter the dataset for rows where "Rank" is equal to 1
rank_1_data = df[df['Rank'] == 1]

# Count the occurrences of each unique country with rank 1
rank_1_counts = rank_1_data['Country'].value_counts()

# Print the unique countries and their respective counts
for country, count in rank_1_counts.items():
    print(f'Country: {country}, Count: {count}')

Iraq has ranked first 7 times. This suggests that Iraq has frequently been at the top of this ranking.

Afghanistan has also ranked first, but less frequently than Iraq, with a total of 4 times.

In [None]:
# Filter the dataset for Iraq and Afghanistan
iraq_data = df[df['Country'] == 'Iraq']
afghanistan_data = df[df['Country'] == 'Afghanistan']

# Create a line plot to visualize incidents (you can change 'Incidents' to the desired variable)
plt.figure(figsize=(12, 8))
plt.plot(iraq_data['Year'], iraq_data['Incidents'], label='Iraq', marker='o', linestyle='-', color='black')
plt.plot(afghanistan_data['Year'], afghanistan_data['Incidents'], label='Afghanistan', marker='o', linestyle='-', color='red')

plt.title('Incidents in Iraq and Afghanistan (2012-2022)')
plt.xlabel('Year')
plt.ylabel('Incidents')
plt.legend()
plt.grid(True)

# Show the plot
plt.show()

Both Iraq (represented by the black line) and Afghanistan (represented by the red line) show a downward trend in incidents from 2016 and 2018 respectively to 2022. This suggests a decrease in these incidents over these periods in both countries.

Iraq had peaks in the number of incidents in 2014 and 2016, suggesting that these years were particularly challenging.

Afghanistan had a peak in the number of incidents in 2018, indicating a significant increase in incidents during that year.

In [None]:
# Filter the dataset for Iraq and Afghanistan
iraq_data = df[df['Country'] == 'Iraq']
afghanistan_data = df[df['Country'] == 'Afghanistan']

# Create a line plot to visualize fatalities
plt.figure(figsize=(12, 8))
plt.plot(iraq_data['Year'], iraq_data['Fatalities'], label='Iraq', marker='o', linestyle='-', color='black')
plt.plot(afghanistan_data['Year'], afghanistan_data['Fatalities'], label='Afghanistan', marker='o', linestyle='-', color='red')

plt.title('Fatalities in Iraq and Afghanistan (2012-2022)')
plt.xlabel('Year')
plt.ylabel('Fatalities')
plt.legend()
plt.grid(True)

# Show the plot
plt.show()

Both Iraq (represented by the black line) and Afghanistan (represented by the red line) show a significant number of fatalities over this period, indicating the severe impact of these incidents in both countries.

Iraq had the highest number of fatalities in 2014, with around 4,000. This suggests that 2014 was a particularly deadly year in Iraq.

Afghanistan had its highest number of fatalities in 2018, with around 3,500, indicating a significant surge in deadly incidents during that year.

In [None]:
total_incidents_by_country.head()

In [None]:
%pip install geopandas

In [None]:
# Create a Folium map centered around the world
world_map = folium.Map(location=[0, 0], zoom_start=2)

# Display the map
world_map

In [None]:
import geopandas as gpd

In [None]:
# Create a Choropleth layer with the GeoJSON data (replace with your GeoJSON data)
folium.Choropleth(
    geo_data='/kaggle/input/world-geojson/custom.geo.json',  # Replace with your GeoJSON file path
    data=total_incidents_by_country,
    columns=['Country', 'Incidents'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',  # Color scheme (sequential yellow-orange-red palette)
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Terrorist Incidents by Country',
).add_to(world_map)

In [None]:
world_map
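
A choropleth only colors a country when its name in the data matches the GeoJSON property named in `key_on` exactly. The sketch below lists names that would fail to match. The country names and the inline `geojson` dictionary are hypothetical stand-ins; whether the real file stores names under `properties.name` is assumed from the `key_on` argument above.

```python
# Hypothetical names; in the notebook this would be set(df["Country"].unique())
dataset_countries = {"Iraq", "Myanmar", "United States of America"}

# Hypothetical miniature GeoJSON; the real one would be loaded with json.load()
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Iraq"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Myanmar"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "United States"}, "geometry": None},
    ],
}

# Names available on the map side
geo_names = {feature["properties"]["name"] for feature in geojson["features"]}

# Dataset countries that no map feature matches
unmatched = sorted(dataset_countries - geo_names)

print(unmatched)
```

Any names reported this way could be renamed in `total_incidents_by_country` before building the map, so that those countries are colored by their incident totals.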

# Conclusion

The exploratory data analysis of the global terrorism dataset has revealed several key insights into the nature of terrorist incidents and their impact over time. These insights provide a deeper understanding of the patterns, trends, and correlations within the dataset:

* Correlation Analysis:

1. The strong positive correlations between "Fatalities - Injuries," "Incidents - Fatalities," and "Incidents - Injuries" highlight the interconnectedness of these variables. Country-years with more incidents tend to record more fatalities and more injuries, and fatalities and injuries rise together.

2. Conversely, the "Hostages" variable shows the lowest correlations with other variables, suggesting that the number of hostages taken may not be strongly influenced by the presence of other variables in the dataset.

* Temporal Trends:

1. The analysis of temporal trends revealed an overall upward trajectory in the total number of incidents, indicating a global increase in terrorism incidents over time. The year 2020 stands out as a period of significant surge in incidents, while 2016 was comparatively peaceful.

2. There was a notable escalation in incidents from 2018 to 2020, indicating a period of heightened global conflict or instability.

* Country-Specific Findings:

1. Iraq emerged as the most affected country, experiencing the highest number of incidents and fatalities. Afghanistan, Pakistan, and Somalia also faced significant challenges in terms of the number of incidents.

2. Myanmar exhibited a sharp rise in incidents from 2018 to 2022, indicating a significant escalation in violence during this period. Burkina Faso and Mali showed persistent issues with a steady increase in incidents.

3. In contrast, Iraq and Pakistan demonstrated a decreasing trend in incidents, potentially suggesting effective measures to reduce such incidents. Russia, Afghanistan, and Yemen were also among the five countries with the largest decrease between 2012 and 2022, although their counts appear to rise in some intermediate years.

* Yearly Analysis:

1. The yearly analysis revealed fluctuating trends in incidents and fatalities, with some years experiencing peaks, signifying particularly challenging periods.

2. Iraq and Afghanistan showed significant fluctuations in both incidents and fatalities, with several years experiencing intense violence.