<a href="https://colab.research.google.com/github/MdAfasar27/Global-Terrorism-Dataset-Exploratory-Data-Analysis/blob/main/Global_Terrorism_Dataset_EDA_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - **Global Terrorism Dataset**



##### **Project Type**    - Exploratory Data Analysis
##### **Contribution**    - Individual
##### **Team Member 1 -     Md Afasar**


# **Project Summary -**

- **Project Overview**:
  - **Database**: The Global Terrorism Database (GTD) is an open-source resource providing comprehensive information on terrorist attacks worldwide from 1970 to 2017.
  - **Scope**: The dataset includes over 180,000 recorded attacks, offering an opportunity to explore and analyze patterns, trends, and characteristics of terrorist activities.
  - **Objective**: Uncover key findings and insights to inform stakeholders and enhance the understanding of terrorism.

- **Methodology**:
  - **Data Manipulation**: Use Pandas for data manipulation and aggregation to extract relevant information and create meaningful datasets.
  - **Visualization Development**: Employ Matplotlib and Seaborn to create at least five visualizations illustrating the behavior of the target variable and relationships between different factors.

- **Focus Areas for Visualization**:
  - **Frequency and Distribution**: Analyze attacks over time and across different geographies.
  - **Attack Types**: Identify the most common types of attacks.
  - **Organizations**: Highlight the terrorist organizations responsible for attacks.
  - **Casualties**: Examine the number of casualties resulting from attacks.

- **Key Assumptions**:
  - **Middle Eastern Countries**: Hypothesize that these countries are more prone to terrorist attacks.
  - **Testing Assumptions**: Use statistical techniques and visualizations to validate this and other assumptions.

- **Additional Factors**:
  - **Organization Type**: Explore the impact of different types of terrorist organizations.
  - **Attack Targets**: Investigate the targets of attacks.
  - **Regional Analysis**: Examine how different regions are affected by terrorism.

- **Stakeholder Benefits**:
  - **Policymakers**: Develop more effective strategies to prevent and respond to terrorist attacks.
  - **Law Enforcement Agencies**: Enhance response plans based on identified trends and patterns.
  - **Researchers**: Gain a comprehensive overview of terrorism for future studies.

- **Analysis Review**:
  - **Rigorous Review**: Regularly review and refine assumptions and findings to ensure accuracy and rigor in the analysis.

- **Final Report**:
  - **Key Findings**: Present insights and recommendations.
  - **Visualizations**: Provide clear and informative visualizations.
  - **Recommendations**: Offer strategies for stakeholders to understand and combat terrorism.

- **Conclusion**:
  - **Contribution**: Enhance understanding of terrorism through comprehensive analysis of the GTD.
  - **Techniques**: Utilize data manipulation, visualization, and statistical methods.
  - **Impact**: Inform stakeholders to help develop strategies to prevent and respond to terrorist attacks.


# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


 The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland. Explore and analyze the data to discover key findings pertaining to terrorist activities.

#### **Define Your Business Objective?**

The primary business objective of this project is to leverage the Global Terrorism Database (GTD) to gain actionable insights into the patterns, trends, and characteristics of terrorist activities worldwide. These insights aim to inform and support stakeholders, including policymakers, law enforcement agencies, and researchers, in their efforts to understand, prevent, and respond to terrorism.

**Specific Goals:**

1. Identify Trends and Patterns:
   - Analyze the frequency and distribution of terrorist attacks over time and across different geographic regions.
   - Determine the most common types of terrorist attacks and the organizations responsible.
2. Validate Assumptions:
   - Test the hypothesis that Middle Eastern countries are more prone to terrorist attacks using statistical techniques and visualizations.
   - Explore additional factors that may influence the likelihood and impact of terrorist attacks, such as the type of organization, target, and region.
3. Develop Visualizations:
   - Create at least five comprehensive visualizations to illustrate key findings, making complex data more accessible and understandable for stakeholders.
4. Inform Stakeholder Strategies:
   - Provide detailed insights and recommendations that stakeholders can use to develop more effective counter-terrorism strategies.
   - Enhance the understanding of terrorism patterns to improve prevention, preparedness, and response measures.
5. Highlight Data Strengths and Limitations:
   - Offer a thorough overview of the GTD, including its strengths and limitations, and suggest areas for future research to improve data quality and analysis.


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd    # Pandas for data manipulation, aggregation
import numpy as np     # NumPy for computationally efficient operations
import matplotlib.pyplot as plt      # Matplotlib and Seaborn for visualising the target variable and its relationships
import seaborn as sns
from wordcloud import WordCloud
# Setting display options
pd.set_option('display.max_columns', None)


### Dataset Loading

In [None]:
# Load Dataset
data_path = '/content/sample_data/Global Terrorism Data.csv'
df = pd.read_csv(data_path, encoding='ISO-8859-1')



### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
# Select and display information about the columns in the DataFrame 'df'
# whose data type is either 'object' (typically strings) or 'int64' (integers).
# select_dtypes() filters the DataFrame to these data types,
# and info() then prints a summary of the selected columns.

df.select_dtypes(include=["object",'int64']).info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_counts= df.duplicated().sum()

# Display Dataset Duplicate Value Count
print("Number of Duplicate Value Count :",duplicate_counts)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()
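
With 135 columns, the raw output of `isnull().sum()` is hard to scan. The sketch below is one possible way to condense it into a table of only the columns that have gaps, with the share of missing rows. The helper name `missing_summary` and the sample frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns that have gaps."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns with at least one missing value, worst first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Small synthetic frame standing in for the real dataset
sample = pd.DataFrame({
    "nkill": [1.0, None, 3.0, None],
    "nwound": [0.0, 2.0, None, 1.0],
    "iyear": [1970, 1971, 1972, 1973],
})
report = missing_summary(sample)
print(report)
```

In the notebook, calling `missing_summary(df)` would give the same kind of table for the full dataset.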

In [None]:
# Visualizing the missing values

plt.figure(figsize=(12, 8))  #setting the figure size for visualization

sns.heatmap(df.isnull(), cbar=False, cmap='viridis')  # creating heatmap to visualise missing values in the Dataframe

plt.title('Missing Values Heatmap')  #title of visualization

plt.show()   # display the visualization

### What did you know about your dataset?

The dataset has 34475 rows and 135 columns.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

Here is a description of the key variables in the Global Terrorism Database (GTD) dataset:

1. **iyear**: The year in which the incident occurred.
2. **imonth**: The month in which the incident occurred.
3. **iday**: The day of the month on which the incident occurred.
4. **country_txt**: The name of the country where the incident occurred.
5. **region_txt**: The name of the region where the incident occurred.
6. **city**: The name of the city or location where the incident occurred.
7. **attacktype1_txt**: The general method of attack used in the incident (e.g., Bombing/Explosion, Armed Assault, Assassination).
8. **targtype1_txt**: The general type of target/victim in the incident (e.g., Private Citizens & Property, Military, Government).
9. **weaptype1_txt**: The general type of weapon used in the incident (e.g., Explosives/Bombs/Dynamite, Firearms, Incendiary).
10. **nkill**: The number of confirmed fatalities in the incident.
11. **nwound**: The number of confirmed non-fatal injuries in the incident.
12. **natlty1_txt**: The nationality of the target/victim in the incident.
13. **gname**: The name of the terrorist group responsible for the incident.
14. **claimed**: Indicates whether the terrorist group claimed responsibility for the incident (1: Yes, 0: No).
15. **success**: Indicates whether the attack was successful (1: Yes, 0: No).
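16. **casualties**: The total number of casualties (fatalities + non-fatal injuries) in the incident. This is a derived column (`nkill` + `nwound`) created in the data wrangling step, not a column of the original dataset.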

These variables provide key information about each terrorist incident, including details about the location, method of attack, targets, casualties, and perpetrators. Analyzing these variables can help in understanding patterns, trends, and impacts of terrorism globally.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in df.columns:
    print(f'Number of unique values in {column}: {df[column].nunique()}')

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Creating a new column for total casualties
df['casualties'] = df['nkill'] + df['nwound']

# Dropping irrelevant columns for this analysis
columns_to_drop = ['eventid', 'provstate', 'latitude', 'longitude', 'specificity', 'doubtterr', 'alternative']
# errors='ignore' lets the cell be re-run after the columns are already gone
df = df.drop(columns=columns_to_drop, errors='ignore')
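
Adding `nkill` and `nwound` with `+` produces NaN whenever either value is missing, so an incident with known deaths but unknown injuries ends up with no casualty figure at all. If treating a single missing value as zero is acceptable for the analysis (an assumption that should be checked against the project's goals), a row-wise sum with `min_count=1` is one alternative. The helper name `total_casualties` and the sample data are hypothetical.

```python
import pandas as pd


def total_casualties(frame: pd.DataFrame) -> pd.Series:
    """Sum nkill and nwound per row; the result is NaN only when both are missing."""
    # min_count=1 requires at least one non-missing value for a numeric result
    return frame[["nkill", "nwound"]].sum(axis=1, min_count=1)


# Synthetic rows: both known, one missing, both missing
sample = pd.DataFrame({"nkill": [2.0, None, None], "nwound": [3.0, 4.0, None]})
sample["casualties"] = total_casualties(sample)
print(sample)
```

Rows where both counts are unknown stay NaN, so they can still be told apart from incidents with zero casualties.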

### What all manipulations have you done and insights you found?

1. Data Cleaning:
   - Handling missing values: Check for missing values in remaining columns and decide on appropriate strategies like imputation or removal.
   - Data type conversions: Ensure columns have appropriate data types. For example, date columns should be in datetime format.
   - Removing duplicates: Check for and remove any duplicate rows in the dataset.

2. Exploratory Data Analysis (EDA):
   - Distribution of target variable (casualties): Explore the distribution of total casualties to understand its range and variability.
   - Temporal trends: Analyze the number of incidents over time (year, month, day) to identify any patterns or trends.
   - Geographic analysis: Explore the distribution of incidents across different regions or countries.
   - Attack types: Investigate the frequency and distribution of different types of attacks (e.g., bombings, shootings).

3. Feature Engineering:
   - Extracting additional features from existing ones: For example, creating new features from the date column such as day of the week or season.
   - Encoding categorical variables: Convert categorical variables into numerical representations using techniques like one-hot encoding or label encoding.

4. Insights and Conclusions:
   - Identify hotspots: Determine regions or countries with the highest number of incidents or casualties.
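
The date-related steps above can be sketched as follows. The dataset stores the date in three integer columns (`iyear`, `imonth`, `iday`), and this sketch assumes, per the GTD codebook, that an unknown month or day is coded as 0; such rows cannot form a valid date and become `NaT` here. The helper name `build_event_date` and the new column names are illustrative.

```python
import pandas as pd


def build_event_date(frame: pd.DataFrame) -> pd.Series:
    """Combine iyear/imonth/iday into a datetime; invalid dates become NaT."""
    parts = frame[["iyear", "imonth", "iday"]].rename(
        columns={"iyear": "year", "imonth": "month", "iday": "day"}
    )
    # errors="coerce" turns impossible dates (e.g. month 0) into NaT instead of raising
    return pd.to_datetime(parts, errors="coerce")


# Synthetic rows: one complete date, one with unknown month and day
sample = pd.DataFrame({"iyear": [2001, 1995], "imonth": [9, 0], "iday": [11, 0]})
sample["event_date"] = build_event_date(sample)
sample["weekday"] = sample["event_date"].dt.day_name()
print(sample)
```

Features such as day of the week are then available through the `.dt` accessor, while incidents with incomplete dates remain identifiable as missing.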


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# A countplot is a type of categorical plot that shows the count of observations in each category of a categorical variable
plt.figure(figsize=(12, 6))
sns.countplot(x='iyear', data=df, palette='viridis')
plt.title('Number of Attacks per Year')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

A count plot is suitable for visualizing the distribution of a categorical variable. Here, treating each year as a category, it shows how many terrorist attacks were recorded in each year.



##### 2. What is/are the insight(s) found from the chart?

Insights from the chart:

The count plot displays the number of terrorist attacks that occurred in each year, providing insight into the trend and frequency of attacks over time.

It helps identify any significant increases or decreases in the number of attacks in specific years, highlighting periods of heightened or reduced terrorist activity.
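
To back the visual impression of increases and decreases with numbers, the yearly counts can be turned into year-over-year percentage changes. This is a minimal sketch on synthetic data; the helper name `yearly_change` is an assumption, not part of the original notebook.

```python
import pandas as pd


def yearly_change(frame: pd.DataFrame, year_col: str = "iyear") -> pd.DataFrame:
    """Count attacks per year and compute the percentage change from the previous year."""
    counts = frame[year_col].value_counts().sort_index()
    return pd.DataFrame({
        "attacks": counts,
        "pct_change": counts.pct_change() * 100,
    })


# Synthetic sample: 2 attacks in 1970, 4 in 1971, 1 in 1972
sample = pd.DataFrame({"iyear": [1970, 1970, 1971, 1971, 1971, 1971, 1972]})
trend = yearly_change(sample)
print(trend)
```

Note that `pct_change` compares consecutive rows, so a year with no recorded attacks is skipped rather than counted as zero.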

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.





*    The insights gained from the chart can inform strategic planning and resource allocation for businesses operating in regions prone to terrorist attacks.
*   Understanding the trend of attacks over time can help businesses anticipate and prepare for potential security risks, enabling them to implement appropriate security measures and crisis management plans.



#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(12, 6))
sns.countplot(y='attacktype1_txt', data=df, palette='viridis', order=df['attacktype1_txt'].value_counts().index)
plt.title('Distribution of Attack Types')
plt.xlabel('Count')
plt.ylabel('Attack Type')
plt.show()

##### 1. Why did you pick the specific chart?

**A horizontal countplot is suitable for showing the distribution of attack types, making it easier to read long category names**



##### 2. What is/are the insight(s) found from the chart?

* The count plot displays the number of occurrences of each type of terrorist attack, providing insight into the prevalence and distribution of various attack types.
* It helps identify the most common types of terrorist attacks, such as bombings, armed assaults, or hostage-taking, which can inform risk assessment and security planning for businesses.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive impact: Insights inform security measures, crisis plans, and resource allocation, safeguarding stakeholders and assets.

* Negative impact: High prevalence of destructive attacks raises safety concerns, leads to increased costs, disruptions, and reputational damage, urging proactive security measures.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(15, 10))
attacks_by_country_year = df.groupby(['iyear', 'country_txt']).size().unstack().fillna(0)
sns.heatmap(attacks_by_country_year, cmap='viridis') #A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors.
plt.title('Heatmap of Attacks by Country and Year')
plt.xlabel('Country')
plt.ylabel('Year')
plt.show()


##### 1. Why did you pick the specific chart?

* A heatmap is useful for visualizing the intensity of attacks over time across different countries.

##### 2. What is/are the insight(s) found from the chart?

* The heatmap displays the intensity of terrorist attacks by country and year. With the `viridis` colormap, brighter (yellow-green) cells indicate higher frequencies of attacks and dark purple cells indicate few or none.

* It helps identify countries and time periods with elevated levels of terrorist activity, highlighting areas of heightened risk and vulnerability.
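
The project summary hypothesizes that Middle Eastern countries are more prone to attacks. One simple first check, sketched below, is each region's share of all recorded attacks. The sketch assumes the region label `Middle East & North Africa` as it appears in `region_txt`; the helper name `region_share` and the sample rows are illustrative, and a share of attacks alone does not account for population or reporting differences.

```python
import pandas as pd


def region_share(frame: pd.DataFrame, region_col: str = "region_txt") -> pd.Series:
    """Return each region's share of all recorded attacks, largest first."""
    return frame[region_col].value_counts(normalize=True)


# Tiny synthetic sample standing in for the real dataset
sample = pd.DataFrame({
    "region_txt": [
        "Middle East & North Africa",
        "Middle East & North Africa",
        "South Asia",
        "Western Europe",
    ]
})
shares = region_share(sample)
print(shares)
```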

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive impact: Insights guide strategic decisions, resource allocation, and security measures for risk mitigation.

* Negative impact: Concentrated attacks in specific regions raise safety concerns, lead to increased costs, disruptions, and reputational damage, urging proactive risk management strategies.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(12, 6))
sns.boxplot(x='attacktype1_txt', y='casualties', data=df, palette='viridis')
plt.title('Casualties by Attack Type')
plt.xlabel('Attack Type')
plt.ylabel('Casualties')
plt.xticks(rotation=90)
plt.show()


##### 1. Why did you pick the specific chart?

**A boxplot is effective for showing the distribution and outliers in casualties for different attack types.**

##### 2. What is/are the insight(s) found from the chart?

* The box plot displays the median, quartiles, and potential outliers of casualties for each attack type, providing insight into the central tendency and variability of casualties within each category.

* It helps identify variations in casualty levels across different types of terrorist attacks, such as whether certain attack types tend to result in higher or lower casualty counts.
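
Because casualty counts are heavily skewed, the boxes in such a plot can be compressed near zero by a few extreme outliers. A numeric summary per attack type, as in this sketch, may make the comparison easier. The helper name `casualties_by_attack_type` and the sample values are hypothetical.

```python
import pandas as pd


def casualties_by_attack_type(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize casualties per attack type, ordered by median."""
    return (
        frame.groupby("attacktype1_txt")["casualties"]
        .agg(["count", "median", "mean", "max"])
        .sort_values("median", ascending=False)
    )


# Synthetic sample with one extreme outlier among the bombings
sample = pd.DataFrame({
    "attacktype1_txt": ["Bombing/Explosion"] * 3 + ["Assassination"] * 2,
    "casualties": [1.0, 3.0, 50.0, 1.0, 1.0],
})
summary = casualties_by_attack_type(sample)
print(summary)
```

Comparing the median with the mean and maximum shows how strongly outliers drive the average for each attack type.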

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive impact: Insights guide decision-making, resource allocation, and security measures, enhancing preparedness and response to mitigate risks.

* Negative impact: High casualties from certain attack types raise security concerns, lead to increased risks, disruptions, and reputational damage, urging proactive risk mitigation strategies.

#### Chart - 5

In [None]:
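# Chart - 5 visualization code
plt.figure(figsize=(12, 8))
# Build the cloud from per-group attack counts so each organization name stays intact
# (joining all names into one string would split them into separate words)
group_counts = df['gname'].dropna().value_counts().to_dict()
wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(group_counts)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud of Active Organizations')
plt.show()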


##### 1. Why did you pick the specific chart?

* The specific chart chosen is a word cloud, which is suitable for visualizing the frequency of words or terms in a text corpus. In this case, it allows for the exploration of the most active terrorist organizations based on the frequency of their occurrences in the dataset.



##### 2. What is/are the insight(s) found from the chart?

* The word cloud visually represents the relative prevalence of different terrorist organizations based on the size of their names in the cloud.
* It helps identify the most active and prominent terrorist organizations, as larger words indicate higher frequencies of occurrence in the dataset.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 * Identifying the most active terrorist organizations can help in targeting counter-terrorism efforts and intelligence operations.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(12, 6))
sns.countplot(y='targtype1_txt', data=df, palette='viridis', order=df['targtype1_txt'].value_counts().index)
plt.title('Distribution of Target Types')
plt.xlabel('Count')
plt.ylabel('Target Type')
plt.show()

##### 1. Why did you pick the specific chart?

* The specific chart chosen is a horizontal count plot, which is suitable for visualizing the distribution of a categorical variable (target type) by frequency. This chart allows for easy comparison of the occurrence of different target types.



##### 2. What is/are the insight(s) found from the chart?

* The count plot displays the number of occurrences of each target type, providing insight into the prevalence of targeting specific entities or objectives in terrorist attacks.
* It helps identify which types of targets are most frequently attacked, such as civilians, military personnel, government officials, or private property.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive impact: Insights guide security measures, risk assessments, and crisis plans, enhancing preparedness and safeguarding operations.

* Negative impact: Targeting critical infrastructure or specific sectors raises security concerns, leads to increased risks, disruptions, and economic impacts, urging proactive risk mitigation strategies.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(12, 6))
sns.countplot(y='weaptype1_txt', data=df, palette='viridis', order=df['weaptype1_txt'].value_counts().index)
plt.title('Distribution of Weapon Types')
plt.xlabel('Count')
plt.ylabel('Weapon Type')
plt.show()

##### 1. Why did you pick the specific chart?

**A horizontal countplot shows the distribution of weapon types used in attacks, making it easier to read long category names.**

##### 2. What is/are the insight(s) found from the chart?

 **The chart indicates that 'Explosives/Bombs/Dynamite' are the most commonly used weapons in terrorist attacks.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Identifying the most frequently used weapons can assist in developing targeted countermeasures and detection methods.**

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(12, 6))
sns.boxplot(x='region_txt', y='casualties', data=df, palette='viridis')
plt.title('Casualties by Region')
plt.xlabel('Region')
plt.ylabel('Casualties')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

* The specific chart chosen is a boxplot, which is suitable for visualizing the distribution of a continuous variable (casualties) across different categories (regions). In this case, it allows for easy comparison of casualty levels across various regions.

##### 2. What is/are the insight(s) found from the chart?

* The boxplot reveals the central tendency, spread, and presence of outliers in casualties across different regions.
* It helps identify regions with higher or lower casualty rates, providing insight into the severity and impact of terrorist attacks in those areas.
* Any significant differences in casualty levels between regions can be easily discerned.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive business impact:
The insights gained from the chart can aid policymakers and security agencies in allocating resources and implementing measures to mitigate the impact of terrorism in regions with higher casualty rates.
Understanding the variation in casualty levels across different regions can inform the development of targeted intervention strategies to enhance security and minimize harm.

* Negative growth:
While the insights themselves may not directly lead to negative growth, regions with consistently high casualty rates may indicate areas of heightened instability and insecurity. This could potentially deter investment, tourism, and economic development in those regions, leading to negative socio-economic consequences.


#### Chart - 9

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(12, 6))
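# lineplot aggregates rows that share an x value with the mean by default,
# so this line shows the mean casualties per attack in each year
sns.lineplot(x='iyear', y='casualties', data=df, errorbar=None)  # errorbar=None replaces the deprecated ci=None (seaborn >= 0.12)
plt.title('Trend of Casualties over the Years')
plt.xlabel('Year')
plt.ylabel('Mean Casualties per Attack')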
plt.show()

##### 1. Why did you pick the specific chart?

* The specific chart chosen is a line plot, which is suitable for visualizing trends and changes in a continuous variable (casualties) over time (years). In this case, it allows for the examination of how casualties from terrorist attacks have evolved over the years.

##### 2. What is/are the insight(s) found from the chart?

*  Insights from the chart:
   - The line plot reveals the overall trend in casualties from terrorist attacks over the years, showing whether casualty levels have increased, decreased, or remained stable.
   - It helps identify periods of significant change or spikes in casualty rates, which may coincide with particular events, conflicts, or policy interventions.
   - Any long-term patterns or trends in casualty rates can be observed, providing insight into the effectiveness of counter-terrorism efforts over time.
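
seaborn's `lineplot` aggregates repeated x values with the mean by default, so a line drawn from per-incident rows reflects the average casualties per attack, not the yearly total. The sketch below computes both figures side by side so the two trends can be compared; the helper name `casualties_per_year` and the sample values are assumptions for illustration.

```python
import pandas as pd


def casualties_per_year(frame: pd.DataFrame) -> pd.DataFrame:
    """Return total casualties, mean per attack, and number of rows with data, per year."""
    return frame.groupby("iyear")["casualties"].agg(["sum", "mean", "count"])


# Synthetic sample: two attacks in 2000, one larger attack in 2001
sample = pd.DataFrame({
    "iyear": [2000, 2000, 2001],
    "casualties": [2.0, 4.0, 10.0],
})
per_year = casualties_per_year(sample)
print(per_year)
```

A year can have a rising total but a falling mean when the number of attacks grows faster than the casualties, which is why the two series are worth reading together.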


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive business impact:
The insights gained from the chart can inform policymakers, security agencies, and businesses about the changing nature and magnitude of the terrorist threat over time.
By understanding historical trends in casualties, stakeholders can better anticipate future risks and tailor their strategies and responses accordingly, potentially reducing the impact of terrorist attacks on businesses and communities.
* Negative growth:
If the analysis reveals a consistent upward trend in casualties over the years, it may indicate a worsening security situation and increased risk of terrorist attacks. This could lead to negative consequences for businesses, such as heightened security costs, decreased consumer confidence, and disruptions to operations.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(12, 6))
sns.countplot(y='natlty1_txt', data=df, palette='viridis', order=df['natlty1_txt'].value_counts().index)
plt.title('Distribution of Victim Nationalities')
plt.xlabel('Count')
plt.ylabel('Nationality')
plt.show()

##### 1. Why did you pick the specific chart?

* The specific chart chosen is a count plot, which is suitable for visualizing the distribution of categorical variables. In this case, it allows for the exploration of the distribution of victim nationalities, providing insights into the composition of victims across different countries.

##### 2. What is/are the insight(s) found from the chart?


   - The count plot displays the frequency of victims from various nationalities, offering an overview of which nationalities are most affected by terrorist attacks.
   - It helps identify the most common nationalities among victims, highlighting potential patterns or trends in targeting specific groups or regions.
   - Any disparities in victim nationalities can be observed, which may indicate areas of vulnerability or targeting by terrorist organizations.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive business impact:
   - The insights gained from the chart can be valuable for businesses operating in regions with high numbers of victims from specific nationalities.
   - Understanding the distribution of victim nationalities can inform risk assessment and crisis management strategies, allowing businesses to better protect employees, customers, and assets.
    

* Negative growth:
   - If the analysis reveals a disproportionately high number of victims from certain nationalities, particularly those associated with key markets or business operations, it may raise concerns about safety and security for employees and customers from those countries.
   - Terrorist attacks targeting specific nationalities can lead to negative perceptions of safety and stability in affected regions, potentially deterring tourism, investment, and business activities.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
plt.figure(figsize=(12, 6))
sns.countplot(x='success', data=df, palette='viridis')
plt.title('Success Rate of Terrorist Attacks')
plt.xlabel('Success (1: Yes, 0: No)')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

 * The choice of a count plot in the presented visualization is apt for assessing the distribution of terrorist attack outcomes categorized as either successful or unsuccessful. This type of plot provides a clear representation of the frequency of each outcome category, facilitating an understanding of the overall success rate of terrorist attacks recorded in the dataset.

##### 2. What is/are the insight(s) found from the chart?


   - The count plot displays the frequency of successful and unsuccessful terrorist attacks, offering an overview of the distribution of success outcomes.
   - It helps identify the proportion of attacks that are successful versus unsuccessful, providing insight into the effectiveness of terrorist tactics and security measures.
   - This chart shows only the overall split. Comparing success rates across regions or time periods requires grouping the data by `region_txt` or `iyear` first, which may reveal variations in counter-terrorism capabilities or vulnerabilities.
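
Since `success` is coded as 1 or 0, its mean within a group is that group's success rate. The following sketch breaks the rate down by region under that assumption; the helper name `success_rate_by` and the sample rows are hypothetical.

```python
import pandas as pd


def success_rate_by(frame: pd.DataFrame, group_col: str = "region_txt") -> pd.Series:
    """Return the share of successful attacks per group, highest first."""
    # The mean of a 0/1 column equals the proportion of ones
    return frame.groupby(group_col)["success"].mean().sort_values(ascending=False)


# Synthetic sample: 3 of 4 attacks succeed in one region, 1 of 2 in the other
sample = pd.DataFrame({
    "region_txt": ["South Asia"] * 4 + ["Western Europe"] * 2,
    "success": [1, 1, 0, 1, 1, 0],
})
rates = success_rate_by(sample)
print(rates)
```

Passing `group_col="iyear"` would give the same breakdown over time.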


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive business impact:
   - The insights gained from the chart can inform risk assessment and security planning for businesses operating in regions prone to terrorist attacks.
   - Understanding the success rate of terrorist attacks can help businesses implement appropriate security measures to mitigate risks and protect employees, customers, and assets.
    

* Negative growth:
   - If the analysis reveals a high success rate of terrorist attacks in certain regions or industries, it may raise concerns about safety and security for businesses operating in those areas.
   - Terrorist attacks with high success rates can lead to disruptions to business operations, damage to infrastructure, and loss of life or property, resulting in negative economic impacts and reputational damage.
    

#### Chart - 12

In [None]:
# Chart - 12 visualization code
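# Note: gpd.datasets was removed in GeoPandas 1.0, so this cell needs geopandas < 1.0
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
df['count'] = 1  # helper column so attacks can be summed per country
attacks_by_country = df.groupby('country_txt')['count'].sum().reset_index()
# Left merge keeps every country on the map; countries whose names differ
# between the two sources stay unmatched and are drawn without a count
world = world.merge(attacks_by_country, how='left', left_on='name', right_on='country_txt')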
world.plot(column='count', cmap='Reds', figsize=(15, 10), legend=True)
plt.title('Geographical Distribution of Terrorist Attacks')
plt.show()
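
A left merge on country names silently leaves a country without a count when the two sources spell its name differently. The sketch below is one way to list such names before plotting, using the `indicator=True` option of pandas `merge`. It works on plain DataFrames; the helper name `unmatched_countries` and the sample names are illustrative assumptions, not values taken from the real map file.

```python
import pandas as pd


def unmatched_countries(attacks, map_names, attack_col="country_txt", map_col="name"):
    """Return dataset country names that have no exact match in the map table."""
    merged = attacks[[attack_col]].drop_duplicates().merge(
        map_names[[map_col]],
        how="left",
        left_on=attack_col,
        right_on=map_col,
        indicator=True,  # adds a _merge column: 'both' or 'left_only'
    )
    return sorted(merged.loc[merged["_merge"] == "left_only", attack_col])


# Synthetic tables with one spelling mismatch
attacks = pd.DataFrame({"country_txt": ["Iraq", "United States", "Iraq"]})
map_names = pd.DataFrame({"name": ["Iraq", "United States of America"]})
missing = unmatched_countries(attacks, map_names)
print(missing)
```

Any names returned this way could then be mapped to the map's spelling with a small rename dictionary before the merge.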

##### 1. Why did you pick the specific chart?

* The specific chart chosen is a choropleth map, which is suitable for visualizing spatial distributions or patterns across geographic regions. In this case, it allows for the exploration of the geographical distribution of terrorist attacks by country.



##### 2. What is/are the insight(s) found from the chart?

* The choropleth map displays the intensity of terrorist attacks by country, with darker shades indicating higher frequencies of attacks.
* It helps identify regions or countries with elevated levels of terrorist activity, highlighting areas of heightened security risk and vulnerability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive impact: Insights guide global operations, supply chain management, and travel risk assessments, enhancing security and safeguarding operations.

* Negative impact: High frequencies of attacks in certain regions raise safety concerns, increase security risks, disrupt operations, and damage reputation, leading to negative growth and economic impacts.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
plt.figure(figsize=(10, 8))
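# Percentages are shares within the top 10 only, not of all attacks;
# the top 10 may include an 'Unknown' (unattributed) category
df['gname'].value_counts().head(10).plot(kind='pie', autopct='%1.1f%%', colors=sns.color_palette('Set3', 10))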
plt.title('Top 10 Terrorist Groups Responsible')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

* The specific chart chosen is a pie chart, which is suitable for visualizing the proportion of different categories within a whole. In this case, it allows for the exploration of the top 10 terrorist groups responsible for attacks.




##### 2. What is/are the insight(s) found from the chart?

* The pie chart displays the relative proportions of attacks attributed to the top 10 terrorist groups, showing which groups are most active. The percentages are shares within these ten groups, not shares of all recorded attacks.
* It helps identify the most prominent terrorist organizations by attack frequency.
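
Because the pie chart normalizes over the plotted groups only, it can help to compute the shares explicitly. The following is a minimal sketch on toy data (the group names and counts are made up), assuming that unattributed incidents are labeled `'Unknown'` in `gname`.

```python
import pandas as pd

# Toy stand-in for df['gname']; the real column has one entry per attack.
gname = pd.Series(['Unknown'] * 5 + ['Group A'] * 3 + ['Group B'] * 2)

counts = gname.value_counts()

# Share of ALL attacks, including unattributed ones.
share_of_all = (counts / counts.sum() * 100).round(1)

# Share among attributed attacks only (assumes 'Unknown' marks unattributed incidents).
attributed = counts.drop('Unknown', errors='ignore')
share_of_attributed = (attributed / attributed.sum() * 100).round(1)

print(share_of_all)
print(share_of_attributed)
```

Comparing the two series shows how much the apparent dominance of a group depends on whether unattributed attacks are counted.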


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive impact:
Insights from the pie chart inform decision-making for security measures, risk assessments, and crisis management plans. Understanding top terrorist groups helps businesses assess specific threats and implement targeted security measures and response strategies.

* Negative impact:
A high proportion of attacks by notorious groups raises safety and security concerns in affected areas. Businesses in these regions may face increased security risks, operational disruptions, and reputational damage, leading to negative growth and economic impacts.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
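plt.figure(figsize=(12, 8))
# Keep only the numeric columns; corr() computes pairwise Pearson correlations
numerical_features = df.select_dtypes(include=[np.number])
correlation_matrix = numerical_features.corr()
# annot=True prints each coefficient inside its cell
sns.heatmap(correlation_matrix, annot=True, cmap='viridis')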
plt.title('Correlation Heatmap of Numerical Features')
plt.show()

##### 1. Why did you pick the specific chart?

* A correlation heatmap is ideal for visualizing the relationships between numerical features. It provides a clear and comprehensive view of how variables are correlated with each other.


##### 2. What is/are the insight(s) found from the chart?

* The heatmap shows the strength and direction of correlations between numerical features, with annotations indicating the correlation coefficients. This helps identify pairs of features that are strongly positively or negatively correlated.
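
With many numeric columns, the strongest pairs can be hard to pick out by eye. As a rough sketch (the toy values below are invented and only the column names mirror the dataset), the correlation matrix can be flattened into a ranked list of pairs:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the numeric columns of df.
toy = pd.DataFrame({
    'nkill': [0, 1, 2, 5, 10],
    'nwound': [0, 2, 4, 11, 19],
    'iyear': [2001, 1995, 2010, 1988, 2003],
})
corr = toy.corr()

# Keep the upper triangle only, so each pair appears once and the diagonal is dropped.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank by absolute strength while keeping the sign, which gives the direction.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

The same steps should apply to `correlation_matrix` from the cell above, where a short ranked list is easier to report than a dense annotated grid.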

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
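# pairplot builds its own figure, so a separate plt.figure() call is not needed
selected_features = ['nkill', 'nwound', 'casualties', 'iyear']
g = sns.pairplot(df[selected_features])
# Title the whole grid rather than only the last subplot
g.figure.suptitle('Pair Plot of Selected Numerical Features', y=1.02)
plt.show()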


##### 1. Why did you pick the specific chart?

* The specific chart chosen is a pair plot, which is suitable for visualizing pairwise relationships between multiple numerical variables. In this case, it allows for the exploration of correlations and patterns among the selected numerical features (nkill, nwound, casualties, and iyear).

##### 2. What is/are the insight(s) found from the chart?

* The pair plot provides a visual overview of the relationships between the selected numerical features, including any linear or non-linear correlations.
* It helps identify potential associations between variables, such as the relationship between the number of people killed (nkill) and the number wounded (nwound), or how casualties change over the years (iyear).
* Outliers or unusual patterns can be spotted, pointing to potential anomalies or areas for further investigation.
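
A per-incident scatter against `iyear` can hide the yearly trend, so an aggregated view may complement the pair plot. The sketch below uses invented toy rows and assumes that `casualties` is the sum of `nkill` and `nwound` with missing values treated as zero; the notebook's own definition may differ.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a few rows of df.
toy = pd.DataFrame({
    'iyear': [2000, 2000, 2001, 2001, 2001],
    'nkill': [1, 0, 3, np.nan, 2],
    'nwound': [2, 1, 0, 4, np.nan],
})

# Assumption: casualties = killed + wounded, with missing counts treated as zero.
toy['casualties'] = toy['nkill'].fillna(0) + toy['nwound'].fillna(0)

# Yearly total, average per incident, and the single worst incident.
yearly = toy.groupby('iyear')['casualties'].agg(['sum', 'mean', 'max'])
print(yearly)
```

The `max` column gives a quick pointer to years in which a single incident dominates the total, which is where the outliers seen in the pair plot are likely to sit.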

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?


* Based on the analysis, I recommend focusing on regions and periods with high attack frequencies, prioritizing measures to address the most common attack types and target categories. Furthermore, enhancing security measures around high-casualty events and improving emergency response protocols can help in minimizing the impact of terrorist activities.

* To achieve the business objective:

1. **Assess Risks**: Identify high-risk areas and industries prone to terrorist activities.

2. **Implement Security Measures**: Enhance security protocols and crisis management plans.

3. **Train Employees**: Provide training to recognize and respond to security threats.

4. **Utilize Technology**: Invest in advanced technologies for threat detection and communication.

5. **Communicate Effectively**: Develop crisis communication plans and manage reputation.

6. **Monitor and Adapt**: Continuously evaluate and update security measures based on evolving threats.

# **Conclusion**


The analysis of the Global Terrorism Database provided significant insights into the patterns and trends of terrorist activities worldwide. Key findings include the most common attack types, target categories, and weapons used, as well as trends over time and across regions. These insights can inform effective counter-terrorism strategies and resource allocation to mitigate the impact of terrorist activities.


### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***