<a href="https://colab.research.google.com/github/Dipu1764/Global-Terrorism-project/blob/main/Global_Terrorism_Data_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Global Terrorism Data



# **Project Summary -**

***Global Terrorism Analysis***

In this project, we analyzed a global terrorism dataset, focusing on trends, patterns, and insights using a range of data visualization techniques. The dataset included details on terrorist attacks like time, location, attack type, and targets.

**Temporal Trends:**

A bar chart of monthly attacks showed higher terrorist activity during specific months (e.g., June-August), suggesting seasonality in attacks.

Yearly trends indicated spikes in terrorist activities in certain regions, particularly South Asia and the Middle East, reflecting ongoing conflicts.

**Regional Distribution:**

Bar charts of attacks by region revealed that South Asia and the Middle East are the most impacted by terrorism. These regions face long-standing conflicts and instability.

The stacked bar chart comparing attack types across regions showed that bombings are the most common method worldwide, while armed assaults and kidnappings vary regionally.

**Attack Type and Targets:**

Governments, private citizens, and businesses are the most frequently targeted, as shown by the bar chart of target types. Terrorists aim to disrupt political and social stability.

**Terrorist Groups:**

A horizontal bar chart highlighted that groups like the Taliban and ISIS dominate terrorist activities globally, particularly in recent years.


# **GitHub Link -**

https://github.com/Dipu1764

# **Problem Statement**


***Problem Statement: Global Terrorism Data Analysis***

The aim of this project is to analyze global terrorism data to uncover patterns and trends related to terrorist activities worldwide. Using various visualization techniques, we explore the frequency and distribution of terrorist attacks based on different factors such as time (year, month), region, attack type, target type, and terrorist group.

**Specifically, we address the following questions:**

What are the temporal trends in terrorist attacks over the years and months?

Which regions and countries experience the highest frequency of terrorist attacks?

What are the most common types of terrorist attacks and their primary targets?

Which terrorist groups are responsible for the majority of attacks?

How do attack types vary by region, and how are they distributed over time?

By answering these questions, we aim to provide actionable insights into global **terrorism** trends, which could inform strategies for improving national security and **counter-terrorism** efforts.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that the following questions are answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ML algorithms for model creation. Make sure that the following questions are answered for each algorithm.


*   Explain the ML model used and its performance using an evaluation metric score chart.


*   Cross-Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with an updated evaluation metric score chart.

*   Explain what each evaluation metric indicates for the business and the business impact of the ML model used.


# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import time
from IPython.display import clear_output
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
terrorism = pd.read_csv('/content/Global Terrorism Data.csv', encoding='latin1')

### Dataset First View

In [None]:
# Dataset First Look
terrorism.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
terrorism.shape

### Dataset Information

In [None]:
# Dataset Info
terrorism.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
terrorism.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
terrorism.isnull().sum()

In [None]:
# Visualizing the missing values
terrorism.isnull().sum().plot(kind='bar', figsize=(10, 6))
plt.title('Missing Values in Terrorism Dataset')
plt.xlabel('Columns')
plt.ylabel('Number of Missing Values')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()


### What did you know about your dataset?

1 - **Shape of the Dataset:** The dataset has 181,691 rows and 135 columns.

2 - **Preview of Data:** The head() method shows the first five rows, which gives a quick look at the column layout and typical values.

3 - **Data Types & Non-Null Counts:** The info() method lists each column's data type and non-null count.

4 - **Missing Values:** isnull().sum() gives the number of missing values in each column.

5 - **Duplicate Rows:** duplicated().sum() counts rows that are exact copies of an earlier row.
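
With 135 columns, the raw output of isnull().sum() is long and hard to scan. The sketch below shows one possible way to condense it into a short table of only the columns that have gaps, with percentages. The helper name `missing_summary` and the toy frame are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the columns that contain missing values, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        'missing': counts,
        'percent': (counts / len(df) * 100).round(1),
    })
    # Keep only columns with at least one missing value
    return summary[summary['missing'] > 0].sort_values('missing', ascending=False)

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    'iyear': [1970, 1971, 1972, 1973],
    'nkill': [1.0, np.nan, 0.0, np.nan],
    'summary': [None, None, None, 'text'],
})

report = missing_summary(toy)
print(report)
```

On the real data the same call would be `missing_summary(terrorism)`; plotting only `report['percent'].head(20)` would likely be more readable than a bar for every column.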

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
terrorism.columns

In [None]:
# Dataset Describe
terrorism.describe()

### Variables Description

**1 - columns Description** -- The columns attribute of a pandas DataFrame returns an Index containing all the column names (i.e., variables) present in the dataset.

This command helps you understand what kind of data the dataset captures.


**2 - describe() Description** -- The describe() function in pandas generates descriptive statistics for the numerical columns in the DataFrame. It provides an overview of the central tendency, spread, and shape of the distribution of the dataset’s numerical features.

**A - Central Tendency:** By looking at the mean and median, you can understand the central point around which your data is clustered.

**B - Dispersion:** The standard deviation (std) and quartiles give you an idea of how spread out the values are.

**C - Extremes:** The min and max values show the range of data.

**D - Missing Values:** If the count is less than the total number of rows, that indicates missing values in that particular column.
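
As a small illustration of points A to D, the sketch below runs describe() on a single toy column. The values are invented; only the column name `nkill` comes from the dataset.

```python
import numpy as np
import pandas as pd

# Toy casualty column with one missing entry (values are made up)
toy = pd.DataFrame({'nkill': [0.0, 2.0, np.nan, 10.0]})

stats = toy['nkill'].describe()
print(stats)

# D - Missing values: count excludes NaN, so the gap to len() is the number missing
missing = len(toy) - int(stats['count'])

# A - Central tendency: a mean well above the median (the 50% row)
# points to a few large values pulling the average up
mean_above_median = stats['mean'] > stats['50%']
```

Here the count is 3 for 4 rows, so one value is missing, and the mean (4.0) sits above the median (2.0) because of the single large value.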

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values = terrorism.nunique()
print(unique_values)

# Find the unique values in the 'country' column
unique_countries = terrorism['country'].unique()
print(unique_countries)


# Find the number of unique values in all categorical columns
categorical_columns = terrorism.select_dtypes(include=['object']).nunique()
print(categorical_columns)


# Find the number of unique values in all numerical columns
numerical_columns = terrorism.select_dtypes(include=['number']).nunique()
print(numerical_columns)


## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# 1. Renaming columns for better readability (optional)
terrorism.rename(columns={
    'iyear': 'year',
    'imonth': 'month',
    'iday': 'day',
    'nkill': 'num_killed',
    'nwound': 'num_wounded'
}, inplace=True)

# 2. Handling Missing Values
# Option 1: Drop rows with missing casualty counts (kept as a separate copy).
# A bare dropna() would discard any row that has a gap in any column.
terrorism_no_missing = terrorism.dropna(subset=['num_killed', 'num_wounded'])

# Option 2: Fill missing casualty counts with 0 (used in the rest of the notebook)
terrorism['num_killed'] = terrorism['num_killed'].fillna(0)
terrorism['num_wounded'] = terrorism['num_wounded'].fillna(0)

# 3. Dropping Irrelevant Columns
# The drop itself is applied in step 7 so the cleaned copy also gets the new column
columns_to_drop = ['eventid', 'propcomment', 'scite1', 'scite2', 'scite3']

# 4. Converting Data Types
terrorism['year'] = terrorism['year'].astype(int)

# 5. Filtering the Data
terrorism_filtered = terrorism[terrorism['year'] >= 2000]

# 6. Creating New Columns
terrorism['total_victims'] = terrorism['num_killed'] + terrorism['num_wounded']

# 7. Dropping duplicate rows, then the irrelevant columns listed in step 3
terrorism_cleaned = terrorism.drop_duplicates().drop(columns=columns_to_drop, errors='ignore')

# 8. Resetting the index after dropping rows
terrorism_cleaned = terrorism_cleaned.reset_index(drop=True)

# 9. Display cleaned dataset
print(terrorism_cleaned.head())



### What all manipulations have you done and insights you found?

**1 - Renaming Columns:**
We renamed columns to make them more understandable and intuitive: iyear to year, imonth to month, iday to day, nkill to num_killed, and nwound to num_wounded.

Insight:
Clear and readable column names make the dataset easier to work with and interpret. This step improves code readability but does not directly yield insights.

**2 - Handling Missing Values:**
We either dropped rows with missing values using dropna() or filled missing values in the num_killed and num_wounded columns with 0 using fillna(0).

Insight:
Handling missing values ensures that statistical calculations (like sums and averages) are accurate. If many rows were dropped, it could indicate poor data quality, especially in columns with critical information like num_killed. Filling missing values with zeros ensures that we treat those rows as non-lethal or non-injury incidents.

**3 - Dropping Irrelevant Columns:**
We dropped columns that are likely irrelevant for analysis, such as eventid, propcomment, scite1, scite2, and scite3.

Insight:
These columns may not contribute useful information for certain types of analysis (e.g., these might be textual or reference-based columns). By dropping them, we focus on the key variables like attack type, region, casualties, etc.

**4 - Converting Data Types:**
We ensured that the year column is of integer type.

Insight:
This conversion ensures that year-based operations (e.g., filtering by year or time series analysis) work properly. If a column is misrepresented as a string, this could lead to issues when performing numerical or date-based operations.

**5 - Filtering Data:**
We filtered the dataset to only include incidents that occurred from the year 2000 onward.

Insight:
Filtering by year allows us to focus on more recent events. This could help in trend analysis or understanding modern terrorism patterns. For example, comparing attack types before and after 2000 could reveal shifts in global terrorism strategies.

**6 - Creating New Columns:**
We created a new column total_victims by adding num_killed and num_wounded.

Insight:
This new column provides a comprehensive measure of the impact of each attack. You can now sort or group data based on total victims to understand which regions or attack types are the deadliest. It can also help in summarizing the human cost of terrorism.

**7 - Dropping Duplicates:**
We removed any duplicate rows in the dataset using drop_duplicates().

Insight:
Removing duplicates ensures that the analysis isn't biased by repeated records of the same event. This step improves data quality and prevents inflated statistics (e.g., counting the same attack multiple times).

**8 - Resetting Index:**
After dropping rows, we reset the index to maintain a clean DataFrame.

Insight:
This step doesn’t provide new insights but helps maintain the cleanliness of the DataFrame after rows are dropped or filtered out.
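
Two of the choices above have side effects that are easy to miss, and a toy example makes them concrete. The sketch below is an illustration only: the `city` column and all values are invented, and whether records that match on everything except the ID are true duplicates in the real data is a judgment call.

```python
import numpy as np
import pandas as pd

# Hypothetical records; values are invented for illustration only
toy = pd.DataFrame({
    'eventid': [1, 2, 3, 4],
    'year': [2001, 2001, 2001, 2002],
    'city': ['A', 'B', 'B', 'C'],
    'num_killed': [2.0, np.nan, np.nan, 4.0],
})

# Effect of fillna(0): totals are unchanged, but averages are pulled down
sum_before = toy['num_killed'].sum()                # NaN is skipped: 2 + 4
sum_after = toy['num_killed'].fillna(0).sum()       # 2 + 0 + 0 + 4
mean_before = toy['num_killed'].mean()              # (2 + 4) / 2
mean_after = toy['num_killed'].fillna(0).mean()     # (2 + 0 + 0 + 4) / 4

# Effect of a unique ID column: no row is ever a full duplicate
full_row_duplicates = int(toy.duplicated().sum())
duplicates_ignoring_id = int(toy.duplicated(subset=['year', 'city']).sum())

print(mean_before, mean_after, full_row_duplicates, duplicates_ignoring_id)
```

Filling with 0 is therefore safe for sums such as total_victims but lowers any per-attack average, and drop_duplicates() will find nothing while a unique identifier such as eventid is still among the compared columns.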

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Which country had the highest number of terrorist attacks?

In [None]:
# Chart - 1
# Count attacks per country using the country-name column.
# 'country' already exists in the dataset as a numeric code, so renaming
# 'country_txt' to 'country' would create two columns with the same name.
country_attacks = terrorism['country_txt'].value_counts()


# visualization code
# 2. Find the country with the highest number of attacks
most_attacked_country = country_attacks.idxmax()
highest_attacks = country_attacks.max()

print(f"The country with the highest number of attacks is: {most_attacked_country} with {highest_attacks} attacks.")

# 3. Visualize the top 10 countries with the most attacks
top_10_countries = country_attacks.head(10)

# Plotting the data
plt.figure(figsize=(12, 8))
top_10_countries.plot(kind='bar', color='coral')

# Add titles and labels
plt.title('Top 10 Countries with the Highest Number of Terrorist Attacks', fontsize=16)
plt.xlabel('Country', fontsize=14)
plt.ylabel('Number of Attacks', fontsize=14)

# Rotate x-axis labels for better readability
plt.xticks(rotation=45, ha='right', fontsize=12)

# Add gridlines for better visualization
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Show plot
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

**1 -** Clearly compare different countries.

**2 -** Visually emphasize the top 10 countries.

**3 -** Provide easy interpretation of categorical data.

##### 2. What is/are the insight(s) found from the chart?

**Insights from the Chart (Top 10 Countries with Most Terrorist Attacks):**

**1 - Concentration of Attacks:** A small number of countries dominate the global landscape in terms of terrorist attacks, with one country having significantly more attacks than others.

**2 -Country with Highest Attacks:** The country with the highest number of attacks stands out clearly, indicating it is the most affected by terrorism in the dataset.

**3 -Disparity in Attack Numbers:** There is a noticeable drop in the number of attacks between the top-ranking country and the others, showing an uneven distribution of terrorist activities globally.

**4 - Regional Hotspots:** The countries in the top 10 are typically in regions of political instability or conflict, pointing to potential regional hotspots for terrorism.

**5 - Policy Implication:** Countries with the highest attack counts may need targeted counterterrorism policies and international assistance.

**6 - Patterns in Geography:** The chart suggests that certain geographic areas experience more frequent attacks, potentially due to sociopolitical factors unique to those regions.

**7 - Global Distribution:** While the attacks are global, the chart shows that a few countries disproportionately experience the majority of terrorist incidents.

**8 - Ranking Clarity:** The visual ranking helps quickly identify the most affected countries, aiding in prioritizing regions for counterterrorism measures.

**9 - Comparative Analysis:** The chart allows for a simple comparison between countries, making it easy to see which nations are significantly more impacted by terrorism.

**10 - Focus Areas for Research:** The top 10 countries may warrant further study to understand the underlying causes and drivers of terrorism in these regions.

#### Chart - 2 Which attack type is most frequently used in terrorist incidents?

In [None]:
# Chart - 2 visualization code
# Count attack types
attack_types = terrorism['attacktype1_txt'].value_counts()

# Visualization using Donut Chart
plt.figure(figsize=(8, 8))
plt.pie(attack_types.head(10), labels=attack_types.head(10).index, autopct='%1.1f%%', colors=plt.cm.Set3.colors, wedgeprops=dict(width=0.4))
plt.title('Top 10 Attack Types (Donut Chart)', fontsize=16)
plt.show()




##### 1. Why did you pick the specific chart?

I chose the donut chart because it visually emphasizes the proportional distribution of the top 10 attack types, making it easy to compare their relative frequencies. Its compact, circular design adds clarity by focusing on the most frequent categories. The hollow center also enhances visual appeal and reduces clutter compared to a regular pie chart.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that Bombing/Explosion is the most common attack type, accounting for a significant portion of incidents. Armed Assault and Assassination are also prevalent, but to a lesser extent. These top three attack types dominate the dataset, highlighting the preferred methods used by terrorists.
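
The phrase "a significant portion" can be backed by explicit shares. A minimal sketch, assuming the same `attacktype1_txt` column but using invented counts rather than the real data:

```python
import pandas as pd

# Toy attack-type column (the counts are made up for illustration)
attack_type = pd.Series(
    ['Bombing/Explosion'] * 5 + ['Armed Assault'] * 3 + ['Assassination'] * 2,
    name='attacktype1_txt',
)

# normalize=True returns fractions of the total instead of raw counts
shares = attack_type.value_counts(normalize=True)
print((shares * 100).round(1))
```

Applied to the notebook's data, `terrorism['attacktype1_txt'].value_counts(normalize=True)` would give the same percentages that the donut chart prints on its wedges, in a form that can be quoted in the text.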

#### Chart - 3  Which terrorist group is responsible for the most attacks?

In [None]:
# Chart - 3 visualization code
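# Count attacks per terrorist group, excluding incidents with no attributed group
# (errors='ignore' avoids a KeyError if the 'Unknown' label is absent)
terrorist_groups = terrorism['gname'].value_counts().drop('Unknown', errors='ignore')

# Visualization using Horizontal Bar Chart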
plt.figure(figsize=(10, 6))
terrorist_groups.head(10).plot(kind='barh', color='teal')
plt.title('Top 10 Terrorist Groups by Number of Attacks (Horizontal Bar Chart)', fontsize=16)
plt.xlabel('Number of Attacks', fontsize=12)
plt.ylabel('Terrorist Group', fontsize=12)
plt.show()



##### 1. Why did you pick the specific chart?

I chose the horizontal bar chart because it effectively displays categorical data with longer labels, such as terrorist group names, in a clear and readable format. It allows for easy comparison of the top 10 groups by the number of attacks, making it straightforward to see which groups are most active. The horizontal layout also helps accommodate longer text labels and provides a better visual hierarchy for categorical comparisons.

##### 2. What is/are the insight(s) found from the chart?

**1 - Most Active Groups:** The chart highlights the top 10 terrorist groups with the highest number of attacks, showing which groups are the most active.

**2 -Prolific Groups:** A small number of groups are responsible for a significant portion of the total attacks, indicating a concentration of terrorist activity among these organizations.

**3 - Prioritization for Security:** Identifying these top groups helps in prioritizing counterterrorism efforts and resource allocation to address the most influential and dangerous groups.
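
The concentration claim in point 2 can be checked with a cumulative share. The sketch below uses placeholder group names and invented counts; only the `'Unknown'` label mirrors the notebook.

```python
import pandas as pd

# Placeholder group labels; counts are invented for illustration
gname = pd.Series(
    ['Group A'] * 6 + ['Group B'] * 3 + ['Group C'] * 1 + ['Unknown'] * 10
)

# Exclude unattributed incidents, as the chart does
known = gname[gname != 'Unknown']
counts = known.value_counts()

# Share of attributed attacks covered by the top 1, top 2, ... groups
cumulative_share = (counts.cumsum() / counts.sum()).round(2)
print(cumulative_share)
```

Reading off the value at the tenth position of the real series would state how much of the attributed total the top 10 groups account for.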

#### Chart - 4 What are the most common target types for terrorist attacks?

In [None]:
# Chart - 4 visualization code
# Count target types
target_types = terrorism['targtype1_txt'].value_counts()

# Visualization
plt.figure(figsize=(10, 6))
target_types.plot(kind='bar', color='orange')
plt.title('Frequency of Target Types', fontsize=16)
plt.xlabel('Target Type', fontsize=12)
plt.ylabel('Number of Attacks', fontsize=12)
plt.show()


##### 1. Why did you pick the specific chart?

I chose the bar chart because it clearly displays the frequency of each target type, making it easy to compare the number of attacks across different categories. The vertical bars effectively highlight the distribution of attacks and identify the most and least common target types, providing a straightforward visual comparison.

##### 2. What is/are the insight(s) found from the chart?

The chart shows which target types are most frequently attacked, revealing that some types are significantly more common than others. This information helps in understanding attack patterns and focusing security efforts on the most targeted types.

#### Chart - 5 How does the frequency of attacks vary by region over time?

In [None]:
# Chart - 5 visualization code
# Group by year and region
attacks_by_region_year = terrorism.groupby(['year', 'region_txt']).size().unstack(fill_value=0)

# Visualization using Heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(attacks_by_region_year, cmap='YlGnBu', annot=False, linewidths=0.5)
plt.title('Frequency of Attacks by Region Over Time', fontsize=16)
plt.xlabel('Region', fontsize=12)
plt.ylabel('Year', fontsize=12)
plt.show()


##### 1. Why did you pick the specific chart?

I chose the heatmap because it effectively visualizes the frequency of attacks across different regions over time. The heatmap uses color gradients to show variations in attack frequency, making it easy to identify trends and patterns at a glance. This chart is particularly useful for comparing multiple regions and observing changes over years.

##### 2. What is/are the insight(s) found from the chart?

The heatmap reveals how attack frequencies vary by region over the years. It highlights regions with consistently high or low attack rates and identifies periods of increased or decreased terrorist activity. This helps in understanding regional trends and prioritizing security measures based on historical attack patterns.

#### Chart - 6 What are the trends in the number of attacks by different terrorist groups over the past decade?

In [None]:
# Chart - 6 visualization code
# Filter data for the most recent ten years covered by the dataset
latest_year = terrorism['year'].max()
recent_data = terrorism[terrorism['year'] > latest_year - 10]

# Count attacks per group and year (rows: groups, columns: years)
attacks_by_group_year = recent_data.groupby(['gname', 'year']).size().unstack(fill_value=0)

# Sum across years (axis=1) to get one total per group
total_attacks_by_group = attacks_by_group_year.sum(axis=1)

# Keep the 10 most active groups; plotting every group would be unreadable
top_groups = total_attacks_by_group.sort_values(ascending=False).head(10)

# Visualization using Horizontal Bar Chart (ascending order puts the largest bar on top)
plt.figure(figsize=(12, 8))
top_groups.sort_values().plot(kind='barh', color='skyblue')
plt.title('Top 10 Terrorist Groups by Total Attacks (Past Decade)', fontsize=16)
plt.xlabel('Total Number of Attacks', fontsize=12)
plt.ylabel('Terrorist Group', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?

I chose the horizontal bar chart because it provides a clear and straightforward comparison of the total number of attacks by different terrorist groups over the past decade. The horizontal layout accommodates longer group names and makes it easy to rank and identify the most active groups.

##### 2. What is/are the insight(s) found from the chart?

The chart highlights which terrorist groups were most active over the past decade, showing their total number of attacks. It reveals which groups are the primary contributors to terrorist activities and helps prioritize counterterrorism efforts based on the volume of attacks.

#### Chart - 7 How are terrorist attacks distributed across the months of the year?

In [None]:
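# Chart - 7 visualization code
# Count attacks per month, keeping only valid months (1-12);
# records with an unknown month are coded 0 and are excluded here
valid_months = terrorism.loc[terrorism['month'].between(1, 12), 'month']
attacks_by_month = valid_months.value_counts().sort_index()

# Visualization using Bar Chart
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
plt.figure(figsize=(12, 6))
attacks_by_month.plot(kind='bar', color='darkorange')
plt.title('Number of Terrorist Attacks by Month', fontsize=16)
plt.xlabel('Month', fontsize=12)
plt.ylabel('Number of Attacks', fontsize=12)
# Bars sit at positions 0..n-1, so label each position with its month name
plt.xticks(ticks=range(len(attacks_by_month)), labels=[month_names[m - 1] for m in attacks_by_month.index], rotation=45)
plt.show()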



##### 1. Why did you pick the specific chart?

The bar chart is chosen for visualizing the number of terrorist attacks by month because it clearly displays the distribution of attacks across different months in a straightforward and easily interpretable manner. The bar chart format allows for a quick comparison of attack frequencies by month, highlighting any seasonal patterns or anomalies in the data.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals the distribution of terrorist attacks across different months.

**1 - Seasonal Trends:** Certain months may show higher or lower frequencies of attacks, indicating potential seasonal patterns.

**2 - Peak Months:** Specific months may have significantly higher numbers of attacks, which could be linked to geopolitical events or seasonal factors.
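
Raw monthly totals compare months of different lengths, so a short month can look quieter than it is. One possible adjustment, sketched below with invented counts, is to divide by the number of days in each month. Using the month lengths of a single non-leap year (2015 here) is an assumption; over many years it slightly understates February's length.

```python
import calendar
import pandas as pd

# Hypothetical monthly attack counts (not taken from the dataset)
month_counts = pd.Series({1: 310, 2: 280, 3: 310})

# Days per month in a non-leap year; monthrange returns (weekday, number_of_days)
days_in_month = pd.Series(
    {m: calendar.monthrange(2015, m)[1] for m in month_counts.index}
)

# Attacks per calendar day, comparable across months of different length
attacks_per_day = month_counts / days_in_month
print(attacks_per_day)
```

In this toy case February has the lowest raw count but the same daily rate as January and March, so the apparent dip is only an effect of month length.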

#### Chart - 8 How has the number of attacks in each month changed over the years?

In [None]:
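# Chart - 8 visualization code
# Count attacks per year and month (rows: years, columns: months 1-12);
# records with an unknown month (coded 0) are excluded
known_month = terrorism[terrorism['month'].between(1, 12)]
monthly_attacks = known_month.groupby(['year', 'month']).size().unstack(fill_value=0)

# Reset index and reshape to long format to prepare for FacetGrid
monthly_attacks = monthly_attacks.reset_index()
monthly_attacks_melted = monthly_attacks.melt(id_vars=['year'], var_name='month', value_name='number_of_attacks')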

# Map month numbers to names
monthly_attacks_melted['month'] = monthly_attacks_melted['month'].map({
    1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun',
    7: 'Jul', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'
})

# Visualization using Facet Grid Line Plot
g = sns.FacetGrid(monthly_attacks_melted, col='month', col_wrap=3, height=4, aspect=1.5, sharey=False)
g.map(sns.lineplot, 'year', 'number_of_attacks', marker='o')

# Adjust titles and labels
g.set_titles("{col_name}")
g.set_axis_labels('Year', 'Number of Attacks')
g.fig.suptitle('Monthly Terrorist Attacks Trend Over the Years', fontsize=16)
plt.subplots_adjust(top=0.9)  # Adjust title position
plt.show()



##### 1. Why did you pick the specific chart?

The Facet Grid Line Plot is chosen for its ability to display trends for each month separately. This visualization is particularly useful for:

**1 - Detailed Analysis:** It allows a clear view of how terrorist attacks vary month-by-month over multiple years, facilitating comparisons between months.

**2 - Trend Identification:** The line plot format helps in observing trends and fluctuations in attack frequencies over time for each month, making it easier to identify patterns and anomalies.

##### 2. What is/are the insight(s) found from the chart?

**1 - Monthly Trends:** The chart reveals whether certain months consistently experience higher or lower numbers of attacks. For instance, it might show that some months consistently have more attacks, indicating possible seasonal patterns or specific time-related factors influencing attack frequency.

**2 - Yearly Fluctuations:** By displaying each month separately, it helps in understanding how the frequency of attacks changes year-over-year within each month. This can highlight any significant increases or decreases in attacks during specific months over the years.


#### Chart - 9 Number of Terrorist Attacks by Region in 2016

In [None]:
# Chart - 9 visualization code

data_2016 = terrorism[terrorism['year'] == 2016]

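# Count the number of attacks per region ('region_txt' holds the region names;
# 'region' holds numeric codes, which would give unreadable axis labels)
attacks_by_region_2016 = data_2016['region_txt'].value_counts()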

# Check if attacks_by_region_2016 is empty
if attacks_by_region_2016.empty:
    print("No terrorist attacks found for the year 2016.")
else:
    # Visualization using Bar Chart
    plt.figure(figsize=(12, 8))
    attacks_by_region_2016.plot(kind='bar', color='purple')
    plt.title('Number of Terrorist Attacks by Region in 2016', fontsize=16)
    plt.xlabel('Region', fontsize=12)
    plt.ylabel('Number of Attacks', fontsize=12)
    plt.xticks(rotation=45)
    plt.show()


##### 1. Why did you pick the specific chart?

The bar chart is chosen to visualize the number of terrorist attacks by region for the year 2016 because:

**Clarity:** Bar charts are effective for comparing categorical data across different regions, making it easy to see which regions experienced the highest or lowest number of attacks.

**Direct Comparison:** It provides a straightforward way to compare attack frequencies between regions, helping to identify regional hotspots or areas with relatively fewer incidents.

##### 2. What is/are the insight(s) found from the chart?

**Regional Distribution:** The chart highlights which regions experienced the highest and lowest numbers of terrorist attacks in 2016, revealing geographical patterns in terrorist activity for that year.

**Focus Areas:** It helps identify regions that may require more focused counter-terrorism efforts or resources based on the frequency of attacks, potentially guiding policy and security measures for the most affected regions.

#### Chart - 10 How does the frequency of terrorist attacks vary by region and attack type?

In [None]:
# Chart - 10 visualization code
# Group by region name and attack type, then count the number of attacks
attacks_by_region_type = terrorism.groupby(['region_txt', 'attacktype1_txt']).size().unstack(fill_value=0)

# Visualization using Stacked Bar Chart
# DataFrame.plot creates its own figure, so no separate plt.figure() call is needed
attacks_by_region_type.plot(kind='bar', stacked=True, colormap='tab20', figsize=(14, 8))
plt.title('Frequency of Terrorist Attacks by Region and Attack Type', fontsize=16)
plt.xlabel('Region', fontsize=12)
plt.ylabel('Number of Attacks', fontsize=12)
plt.xticks(rotation=45)
plt.legend(title='Attack Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The stacked bar chart was chosen for its ability to show both the total number of attacks and the composition of different attack types within each region. This type of chart is effective for:

**Comparative Analysis:** It allows for comparison of total attack frequencies across regions while simultaneously breaking down the data into attack types.

**Detail and Overview:** It provides a detailed view of how each attack type contributes to the overall number of attacks in each region, facilitating insights into regional attack patterns and preferences.

##### 2. What is/are the insight(s) found from the chart?

**Regional Attack Profiles:** The chart reveals which regions experience the highest number of attacks and the distribution of attack types within those regions. This helps in identifying the dominant attack methods used in different regions.

**Attack Type Prevalence:** By showing the stacked segments, it provides insight into the relative prevalence of each attack type within regions, indicating whether certain types of attacks are more common in specific areas.

#### Chart - 11 - Correlation Heatmap ( How do the frequencies of different attack types correlate with the number of attacks in various regions? )

In [None]:
# Correlation Heatmap visualization code
# Create 0/1 dummy variables for attack types and region names
attack_dummies = pd.get_dummies(terrorism['attacktype1_txt'], dtype=int)
region_dummies = pd.get_dummies(terrorism['region_txt'], dtype=int)

# Combine the two sets of dummy variables into one frame
data_with_dummies = pd.concat([attack_dummies, region_dummies], axis=1)

# Compute the correlation matrix
correlation_matrix = data_with_dummies.corr()

# Keep only the region-vs-attack-type block (rows: regions, columns: attack types)
attack_region_correlation = correlation_matrix.loc[region_dummies.columns, attack_dummies.columns]

# Check if the DataFrame is empty
if attack_region_correlation.empty:
    print("Error: attack_region_correlation is empty. Check your filtering logic.")
else:
    # Visualization using Heatmap
    plt.figure(figsize=(14, 10))
    sns.heatmap(attack_region_correlation, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
    plt.title('Correlation Between Attack Types and Regions', fontsize=16)
    plt.xlabel('Attack Types', fontsize=12)
    plt.ylabel('Regions', fontsize=12)
    plt.show()

##### 1. Why did you pick the specific chart?

The Correlation Heatmap was chosen because it effectively visualizes the strength and direction of relationships between multiple variables. In this case:

**Insight into Relationships:** It provides a clear visual representation of how attack types correlate with regions, helping to identify which types of attacks are more prevalent in specific regions.

**Pattern Detection:** It allows for easy detection of patterns and associations that might not be immediately obvious from raw data alone, making it easier to understand complex relationships.

##### 2. What is/are the insight(s) found from the chart?

**Regional Attack Patterns:** The heatmap highlights how certain attack types are associated with specific regions. For example, it might reveal that specific regions experience particular types of attacks more frequently than others.

**Correlation Strength:** It shows the strength of correlations between different attack types and regions, indicating whether some attack types are strongly associated with certain regions or if their presence is more random across different areas.
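
Correlations between 0/1 dummy columns are usually small and can be hard to interpret. A complementary view, sketched below on invented rows, is a row-normalized cross-tabulation that gives each region's attack-type mix directly. This is offered as an alternative reading of the same question, not as what the heatmap above computes.

```python
import pandas as pd

# Toy incidents; the rows are made up for illustration
toy = pd.DataFrame({
    'region_txt': ['South Asia'] * 4 + ['Western Europe'] * 2,
    'attacktype1_txt': (
        ['Bombing/Explosion'] * 3 + ['Armed Assault']
        + ['Bombing/Explosion', 'Assassination']
    ),
})

# normalize='index' makes each row (region) sum to 1
mix = pd.crosstab(toy['region_txt'], toy['attacktype1_txt'], normalize='index')
print(mix.round(2))
```

Passing `mix` to `sns.heatmap` would produce a chart where each cell reads as "share of this region's attacks that used this method".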

#### Chart - 12 - Pair Plot ( What are the relationships between the month of attack, region, and extended duration of terrorist events? )

In [None]:
# Pair Plot visualization code
# Select relevant columns ('imonth' was renamed to 'month' during data wrangling)
# and rename them for better readability in the plot
data_for_pairplot = terrorism[['month', 'region', 'extended']].rename(
    columns={'month': 'Month', 'region': 'Region', 'extended': 'Extended'}
)

# Visualize using Pair Plot
sns.pairplot(data_for_pairplot, diag_kind='kde', plot_kws={'alpha':0.5, 's':40})
plt.suptitle('Pair Plot of Month, Region, and Extended Duration of Terrorist Events', y=1.02, fontsize=16)
plt.show()



##### 1. Why did you pick the specific chart?

The Pair Plot was chosen to visualize potential relationships between multiple variables (month of attack, region, and event duration). It allows for an easy-to-understand comparison of how these factors interact, while also showing their individual distributions.

##### 2. What is/are the insight(s) found from the chart?

The Pair Plot reveals any trends or patterns, such as whether certain regions experience more extended attacks during specific months. It also helps identify correlations between variables, like regions prone to prolonged events or months with higher attack occurrences.

# **Conclusion**

**Conclusion:**

The analysis revealed that terrorism is concentrated in specific regions and tends to peak at certain times. These findings can help guide counter-terrorism strategies, focusing efforts where they are most needed. Visualizations helped uncover key trends, offering insights into the nature and spread of terrorist attacks worldwide.

### ***Hurrah! You have successfully completed your Data Visualization Capstone Project !!!***