<a href="https://colab.research.google.com/github/akshayarsul/capstone-project-1/blob/main/Copy_of_EDA_Capstone%20project_GTD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Global Terrorism Dataset. 



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -** Exploratory Data Analysis of the Global Terrorism Dataset

 

The Global Terrorism Dataset (GTD) is a comprehensive collection of information on global terrorist incidents. In this capstone project, we aim to conduct an Exploratory Data Analysis (EDA) of the GTD to gain insights into the patterns, trends, and characteristics of global terrorism.

Terrorism is a global issue that poses significant threats to international peace and security. Understanding the underlying factors, target locations, and tactics used by terrorist groups is crucial for developing effective counterterrorism strategies. The GTD provides a wealth of data spanning several decades, making it an invaluable resource for studying terrorism.

The primary objective of this capstone project is to analyze and visualize the GTD using various statistical and data visualization techniques. We will explore the dataset to uncover patterns and trends related to terrorist incidents, examine the distribution of attacks across different regions and countries, and investigate the factors that contribute to the severity of attacks.

To achieve our objectives, we will follow a structured approach. First, we will perform data cleaning and preprocessing to ensure the dataset is in a suitable format for analysis. This step includes handling missing values, standardizing variables, and resolving inconsistencies in the data.

Next, we will conduct descriptive analysis to understand the basic characteristics of the GTD. We will calculate summary statistics, such as mean, median, and mode, to gain insights into variables like the number of attacks, fatalities, and injuries. Furthermore, we will examine the temporal distribution of terrorist incidents to identify any significant spikes or trends over time.

One crucial aspect of our analysis will be the geographical exploration of terrorism. We will visualize the distribution of attacks across different regions and countries using maps and heatmaps. This analysis will help us identify high-risk areas and evaluate the effectiveness of counterterrorism efforts in different regions.

Furthermore, we will investigate the factors that contribute to the severity of attacks. We will explore variables such as attack type, target type, and weapons used to assess their impact on the number of casualties. By identifying these factors, we can gain insights into the motivations and strategies of terrorist groups.

Throughout the project, we will employ a variety of data visualization techniques, including bar charts, line plots, scatter plots, and heatmaps, to effectively communicate our findings. These visualizations will enhance our understanding of the data and enable us to present the results in a clear and concise manner.

In conclusion, this capstone project aims to conduct an in-depth exploratory analysis of the Global Terrorism Dataset. By leveraging statistical analysis and data visualization techniques, we seek to uncover patterns, trends, and characteristics of terrorism incidents worldwide. The insights gained from this analysis will contribute to a better understanding of global terrorism and support the development of more targeted counterterrorism strategies.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2017. The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 180,000 attacks. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland. Explore and analyze the data to discover key findings pertaining to terrorist activities.

#### **Define Your Business Objective?**

1. Identify Key Trends: Analyze the temporal, spatial, and demographic trends of terrorism incidents worldwide. This includes understanding patterns in the frequency, severity, and location of attacks over time.

2. Uncover Attack Characteristics: Examine the various characteristics of terrorist attacks, such as the attack type, target type, weapon type, and the groups responsible. Identify the most prevalent forms of terrorism and their impact on different target types.

3. Investigate Geographical Hotspots: Identify regions and countries that are most affected by terrorism. Analyze the distribution of attacks across different regions, countries, and specific locations to highlight the areas requiring greater attention and counterterrorism efforts.

4. Study Terrorist Organizations: Explore the activities of different terrorist organizations, their affiliations, and their level of involvement in various attacks. Identify the most active and dangerous terrorist groups globally.

5. Assess Counterterrorism Measures: Evaluate the effectiveness of counterterrorism measures implemented by different countries or regions. Analyze the impact of counterterrorism initiatives and identify potential gaps or areas for improvement.

6. Provide Data-driven Insights: Summarize the findings and insights from the EDA process in a clear and concise manner. Present visualizations, statistics, and trends to facilitate decision-making processes and policy development related to counterterrorism strategies.

By achieving these objectives, the Global Terrorism Dataset EDA Capstone Project aims to contribute to the broader understanding of terrorism and support efforts to combat this global threat more effectively.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Loading the global terrorism dataset
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')
data.head()
 




In [None]:
from google.colab import drive
drive.mount('/content/drive')
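
The guidelines above ask for exception handling, and the loading cell reads the CSV before Google Drive is mounted. A minimal sketch of a more defensive loader is shown below. The helper name `load_gtd` and the Drive path are assumptions made for illustration; adjust them to wherever the file actually lives.

```python
from pathlib import Path

import pandas as pd


def load_gtd(candidate_paths, encoding="latin-1"):
    """Return the first existing CSV among candidate_paths as a DataFrame."""
    for path in map(Path, candidate_paths):
        if path.is_file():
            # low_memory=False avoids mixed-dtype warnings on wide CSV files
            return pd.read_csv(path, encoding=encoding, low_memory=False)
    raise FileNotFoundError(f"None of these files exist: {list(candidate_paths)}")


# Hypothetical locations: the working directory first, then a mounted Drive folder.
CANDIDATE_PATHS = [
    "global terrorism data.csv",
    "/content/drive/MyDrive/global terrorism data.csv",
]

try:
    data = load_gtd(CANDIDATE_PATHS)
    print("Shape of the dataset:", data.shape)
except FileNotFoundError as err:
    print(f"Dataset not loaded: {err}")
```

If the file is on Drive, the mount cell has to run before this one.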

### Dataset First View

In [None]:
# Dataset first look
print("Shape of the dataset:", data.shape)



### Dataset Columns count

In [None]:
# Dataset Columns count
print("Number of columns:", len(data.columns))
data.columns

### Dataset Information

In [None]:
# Dataset Info
print("\nDataset information:")
data.info()  # info() prints its report directly and returns None


#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# Count fully duplicated rows (reuses the DataFrame loaded above)
duplicate_count = data.duplicated().sum()

# Display the count
print("Number of duplicate rows:", duplicate_count)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = data.isnull()
missing_values_count = missing_values.sum(axis=0)
missing_values_df = pd.DataFrame(missing_values_count, columns=['Missing Values Count'])

print(missing_values_df)

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6))
sns.heatmap(data.isnull(), cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()
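
With well over a hundred columns, the raw counts and the heatmap are hard to read side by side. As a rough sketch, the same information can be condensed into a table sorted by the percentage of missing values. The function name `missing_summary` and the tiny `sample` frame are illustrative assumptions; the sample values are made up.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return per-column missing counts and percentages, most-missing first."""
    summary = pd.DataFrame({
        "missing_count": df.isnull().sum(),
        "missing_pct": (df.isnull().mean() * 100).round(2),
    })
    return summary.sort_values("missing_pct", ascending=False)


# Tiny illustrative frame: column names follow the GTD, values are invented.
sample = pd.DataFrame({
    "iyear": [1970, 1971, 1972, 1973],
    "nkill": [1.0, np.nan, 0.0, 2.0],
    "motive": [np.nan, np.nan, np.nan, "made-up text"],
})
print(missing_summary(sample))
```

On the real data, `missing_summary(data).head(20)` would list the sparsest columns, which helps decide which ones to leave out before dropping rows.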

### What did you know about your dataset?

1. Importing Libraries: Begin by importing the necessary Python libraries such as pandas, numpy, matplotlib, and seaborn.

2. Loading the Dataset: Read the dataset into a pandas DataFrame using the appropriate function, such as 'read_csv()'.

3. Data Cleaning: Clean the dataset by handling missing values, removing irrelevant columns, and converting data types if necessary. You may need to consult the dataset documentation or explore the data to understand its structure and decide which cleaning steps are required.

4. Data Visualization: Create visualizations using matplotlib, seaborn, or other visualization libraries to better understand the patterns and relationships within the data. This can include plots such as bar charts, histograms, scatter plots, and heatmaps.

5. Drawing Conclusions: Summarize your findings and draw conclusions based on the patterns, relationships, and trends discovered during the EDA process.


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns 
data.columns

In [None]:
# Dataset Describe
print(data.describe())

### Variables Description 

1. Event ID: A unique identifier for each terrorism event.

2. Year: The year in which the event occurred.

3. Month: The month in which the event occurred.

4. Day: The day of the month on which the event occurred.

5. Country: The country where the event took place.

6. Region: The region (e.g., Africa, Asia, Europe, Middle East) where the event occurred.

7. City: The city or location where the event occurred.

8. Latitude: The latitude coordinate of the location.

9. Longitude: The longitude coordinate of the location.

10. Attack Type: The type of attack (e.g., bombing/explosion, assassination, armed assault).

11. Target Type: The type of target (e.g., business, government, military, police).

12. Weapon Type: The type of weapon(s) used in the attack.

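13. Casualties: The number of people killed (`nkill`) plus the number wounded (`nwound`). The dataset stores these as two separate columns, so the total has to be computed.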

14. Group Name: The name of the group responsible for the attack, if known.

15. Motive: The motive or ideology behind the attack.

16. Summary: A brief summary or description of the event.

These variables provide various aspects of the terrorism incidents, allowing you to analyze patterns, trends, and relationships within the dataset. By performing exploratory data analysis, you can gain insights into the frequency, geographical distribution, attack types, and other relevant information related to global terrorism.
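
Since the casualty total is described above as killed plus wounded, a small helper can derive it from the `nkill` and `nwound` columns. This is only a sketch: the name `add_casualties` is hypothetical, and treating a missing count as 0 is an assumption that may undercount incidents with unknown figures.

```python
import numpy as np
import pandas as pd


def add_casualties(df):
    """Return a copy of df with a 'casualties' column (killed + wounded)."""
    out = df.copy()
    # Assumption: a missing count is treated as 0 so one unknown value
    # does not blank out the whole total.
    out["casualties"] = out["nkill"].fillna(0) + out["nwound"].fillna(0)
    return out


# Made-up values for illustration only.
sample = pd.DataFrame({"nkill": [1.0, np.nan, 3.0], "nwound": [2.0, 5.0, np.nan]})
print(add_casualties(sample))
```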

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in data.columns:
    unique_values = data[column].unique()
    print(f'Unique values for {column}:')
    print(unique_values)
    print()
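
Printing every unique value for every column produces a very long output on a dataset this wide. One possible alternative, sketched below, is a compact table with the number of distinct values per column and a few examples. `cardinality_summary` and the `sample` frame are illustrative names, not part of the original notebook.

```python
import pandas as pd


def cardinality_summary(df, max_examples=5):
    """Return the number of distinct non-null values per column plus a few examples."""
    rows = []
    for column in df.columns:
        values = df[column].dropna().unique()
        rows.append({
            "column": column,
            "n_unique": len(values),
            "examples": list(values[:max_examples]),
        })
    return pd.DataFrame(rows).set_index("column")


# Made-up values for illustration only.
sample = pd.DataFrame({
    "region_txt": ["A", "B", "A", None],
    "iyear": [1970, 1970, 1971, 1972],
})
print(cardinality_summary(sample))
```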

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Select relevant columns for analysis
columns_to_keep = ['iyear', 'imonth', 'iday', 'country', 'region', 'success', 'suicide',
                   'attacktype1', 'attacktype1_txt', 'targtype1', 'targtype1_txt',
                   'gname', 'weaptype1', 'weaptype1_txt']

# Rename columns for clarity
column_names = {'iyear': 'Year', 'imonth': 'Month', 'iday': 'Day', 'country': 'Country',
                'region': 'Region', 'success': 'Success', 'suicide': 'Suicide',
                'attacktype1': 'AttackTypeCode', 'attacktype1_txt': 'AttackType',
                'targtype1': 'TargetTypeCode', 'targtype1_txt': 'TargetType',
                'gname': 'GroupName', 'weaptype1': 'WeaponTypeCode', 'weaptype1_txt': 'WeaponType'}

# Keep only the selected columns before renaming. Dropping rows with missing
# values across every raw column would discard nearly all rows, because many
# GTD columns are mostly empty.
data = data[columns_to_keep].rename(columns=column_names)

# The *_txt columns already hold the labels for the numeric codes, so the
# AttackType and TargetType columns need no manual code-to-label mapping.

# Treat the 'Unknown' label as a missing value
data = data.replace('Unknown', np.nan)

# Drop rows with missing values (this also removes attacks whose group,
# attack type, target type, or weapon type is unknown)
data = data.dropna()

# Reset the index
data = data.reset_index(drop=True)

# Save the cleaned dataset
data.to_csv('cleaned_global_terrorism_dataset.csv', index=False)















### What all manipulations have you done and insights you found?

Data Manipulations:

1. Data Cleaning: Choosing the columns relevant to the analysis, replacing the 'Unknown' label with NaN, dropping rows with missing values, and resetting the index.

2. Renaming: Giving the selected columns readable names such as `Year`, `AttackType`, `TargetType`, and `GroupName`, then saving the result to 'cleaned_global_terrorism_dataset.csv'.

3. Aggregation: Summarizing data at different levels of granularity, like aggregating attacks by year, region, or attack type for the charts below.

4. Data Filtering: Selecting specific subsets of columns for individual charts, such as the attack type and casualty columns.


Insights:

1. Trend Analysis: Identifying long-term trends in terrorism activities, such as changes in the frequency or severity of attacks over time.

2. Geographic Patterns: Analyzing the distribution of attacks across countries, regions, or cities to identify high-risk areas or areas of improvement.

3. Attack Types: Examining the prevalence and impact of different attack types to understand the methods employed by terrorist groups.

4. Target Analysis: Investigating the types of targets most frequently attacked and their associated consequences.

5. Group Identification: Studying the involvement of different terrorist groups, their activities, and their evolution over time.

6. Casualty Analysis: Analyzing the number of casualties, injuries, or fatalities resulting from terrorist attacks to understand their impact on human lives.
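
The aggregation step listed above (attacks by year or region) could look roughly like the sketch below. The function names and the `sample` frame are illustrative assumptions; the column names `iyear` and `region_txt` are the ones used by the chart code in this notebook.

```python
import pandas as pd


def attacks_per_year(df):
    """Count recorded attacks for each year."""
    return df.groupby("iyear").size().rename("attacks")


def attacks_by_region_and_decade(df):
    """Cross-tabulate attack counts: one row per decade, one column per region."""
    decade = (df["iyear"] // 10 * 10).rename("decade")
    return pd.crosstab(decade, df["region_txt"])


# Made-up values for illustration only.
sample = pd.DataFrame({
    "iyear": [1970, 1975, 1981, 1981],
    "region_txt": ["A", "A", "B", "A"],
})
print(attacks_per_year(sample))
print(attacks_by_region_and_decade(sample))
```

Grouping by decade rather than by year keeps the table small enough to read at a glance; the yearly series is better suited to a line plot.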

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')

# Perform necessary preprocessing and filtering if required

# Count the number of terrorist attacks by region
attacks_by_region = data['region_txt'].value_counts()

# Plot the bar chart
plt.figure(figsize=(10, 6))
attacks_by_region.plot(kind='bar')
plt.xlabel('Region')
plt.ylabel('Number of Attacks')
plt.title('Number of Terrorist Attacks by Region')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I picked a bar chart because `region_txt` is a categorical variable and the quantity of interest is a count per category. Each region is represented by one bar, and the height of the bar is the number of recorded attacks in that region.

Why this chart?

1. **Comparison:** A bar chart allows for easy comparison of the number of terrorist incidents across regions. By visually comparing the heights of the bars, you can quickly identify regions with high or low frequencies of incidents.

2. **Top contributors:** `value_counts()` returns the regions in descending order of attack count, so the bars are already sorted and the most affected regions appear first.

3. **Readability:** The number of regions is small enough for every bar to be labeled, and rotating the labels by 45 degrees keeps the longer region names legible.

4. **Subset analysis:** Because the dataset also records the attack type and target type, the chart can be extended by splitting each bar by one of these variables to show which kinds of attacks dominate in each region.

##### 2. What is/are the insight(s) found from the chart?

The chart ranks regions by the number of recorded attacks. The main points to read from it are:

1. Most affected regions: The leftmost bars identify the regions with the highest number of recorded attacks.

2. Concentration: Comparing the tallest bars with the rest shows whether attacks are concentrated in a few regions or spread fairly evenly.

3. Least affected regions: The rightmost bars identify the regions with comparatively few recorded attacks.

4. Limitation: The chart counts incidents only. It does not show severity (casualties) or change over time, which the later charts address.
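
To put numbers on the bar heights, a small ranking table with each region's share of attacks can accompany the chart. This is a sketch under assumptions: `region_ranking` is a hypothetical helper and the `sample` values are invented.

```python
import pandas as pd


def region_ranking(df, column="region_txt"):
    """Rank categories by attack count with percentage and cumulative share."""
    counts = df[column].value_counts()  # already sorted in descending order
    table = counts.to_frame("attacks")
    table["share_pct"] = (counts / counts.sum() * 100).round(1)
    table["cumulative_pct"] = (counts.cumsum() / counts.sum() * 100).round(1)
    return table


# Made-up values for illustration only.
sample = pd.DataFrame({"region_txt": ["A", "A", "A", "B"]})
print(region_ranking(sample))
```

The cumulative column shows how many regions it takes to account for a given share of all recorded attacks.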

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.


Positive Business Impact:

By analyzing the chart, we can identify which regions record the most terrorist attacks. This information can help businesses and organizations in several ways:

1. Risk assessment: Businesses can use this data to assess the level of risk associated with operating in certain regions or countries. They can make informed decisions about investing in high-risk areas or taking necessary security measures to mitigate potential threats.
2. Crisis management: Having historical data on terrorist attacks allows businesses to develop effective crisis management plans and response strategies. They can allocate resources appropriately and be better prepared to handle any emergencies.
3. Insurance and security services: Insurance companies and security service providers can utilize this data to offer tailored insurance plans or security solutions to businesses operating in regions prone to terrorism. They can adjust premiums and coverage based on historical attack patterns.

Negative Growth:

While the insights gained from the dataset can help businesses mitigate risks and improve security measures, there are potential negative impacts as well:

1. Economic instability: High levels of terrorism can lead to economic instability in certain regions. Businesses may face challenges in attracting investments or maintaining a stable consumer market due to security concerns. This can negatively impact growth and profitability.
2. Disrupted operations: In regions with a higher number of terrorist attacks, businesses may face frequent disruptions to their operations. This can result in increased costs, loss of productivity, and potential damage to the business reputation.

It is important to note that the impact of terrorism on businesses can vary significantly depending on the industry, location, and other contextual factors. Businesses should carefully evaluate the insights gained from the dataset in the context of their specific operations and assess the potential risks and opportunities.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')

# Perform necessary data preprocessing and analysis
# ...

# Calculate the number of incidents by region
incidents_by_region = data['region_txt'].value_counts()

# Plotting the pie chart
plt.figure(figsize=(8, 8))
plt.pie(incidents_by_region, labels=incidents_by_region.index, autopct='%1.1f%%')
plt.title('Distribution of Terrorism Incidents by Region')
plt.axis('equal')

# Display the chart
plt.show()





##### 1. Why did you pick the specific chart?

I picked the pie chart for this visualization because it is an effective way to show the distribution of categorical data. In the context of analyzing the Global Terrorism Dataset, a pie chart can help us understand the proportion of different categories or groups within a specific variable. It allows for a quick visual comparison of the relative sizes of these categories, making it easier to identify patterns or trends in the data.

##### 2. What is/are the insight(s) found from the chart?

1. Distribution of Categories: Analyze the distribution of different categories represented in the pie chart. Identify the proportions of each category and compare their sizes.

2. Dominant Category: Identify the category with the largest slice in the pie chart. This category represents the most prevalent or dominant aspect of the data.

3. Minor Categories: Analyze the smaller slices in the pie chart, which represent less prevalent categories. Look for patterns or trends among these categories.

4. Comparisons: Compare the sizes of different slices to draw conclusions about their relative proportions. Determine whether certain categories are significantly larger or smaller than others.

5. Overlapping Labels: Slices in a pie chart cannot overlap, but the labels of very small slices can collide and become hard to read. Regions with only a small share of incidents are easier to compare in a bar chart or when grouped into a single "Other" slice.

6. Additional Insights: Consider any additional information or context provided by the dataset that can contribute to a deeper understanding of the pie chart. This may include categorical labels, specific time periods, or geographical regions.
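
When analyzing the smaller slices, it can help to merge the categories below some share into one slice so the labels stay readable. The sketch below shows one way to do that; the helper name `collapse_small_slices`, the 5% threshold, and the sample counts are all assumptions for illustration.

```python
import pandas as pd


def collapse_small_slices(counts, min_share=0.05, other_label="Other"):
    """Merge categories whose share is below min_share into one 'Other' entry."""
    shares = counts / counts.sum()
    large = counts[shares >= min_share]
    small_total = counts[shares < min_share].sum()
    if small_total > 0:
        large = pd.concat([large, pd.Series({other_label: small_total})])
    return large


# Made-up counts for illustration only.
sample_counts = pd.Series({"A": 60, "B": 37, "C": 2, "D": 1})
print(collapse_small_slices(sample_counts))
```

The result can be passed to `plt.pie` in place of the raw `value_counts()` output, using its index as the labels.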

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact:

The analysis of the dataset can provide insights into regions that are more prone to terrorist attacks. This information can be utilized by businesses to make informed decisions about their operations, investments, and resource allocation in different regions. For example, if a particular region has a high frequency of attacks, businesses might consider increasing security measures or diversifying their operations to minimize potential risks.

Negative growth:

On the other hand, regions with a high frequency of terrorist attacks might experience a negative impact on economic activities. Businesses operating in those regions could face challenges such as reduced consumer confidence, disrupted supply chains, or increased security costs. This could lead to a decline in business growth or even closure in extreme cases.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Global Terrorism Dataset
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')

# Visualization 3: Line Chart of Casualties over Time
# Casualties = killed + wounded; missing counts are treated as 0
data['casualties'] = data['nkill'].fillna(0) + data['nwound'].fillna(0)
plt.figure(figsize=(12, 6))
casualties_over_time = data.groupby('iyear')['casualties'].sum()
sns.lineplot(x=casualties_over_time.index, y=casualties_over_time.values, color='r')
plt.xticks(rotation=45)
plt.xlabel('Year')
plt.ylabel('Number of Casualties')
plt.title('Casualties over Time')
plt.show()



  



##### 1. Why did you pick the specific chart?

I picked the Line Chart of Casualties over Time because it is a useful visualization for analyzing trends and patterns in the number of casualties caused by terrorism over a period of time. This chart allows us to see the fluctuation in casualty numbers over different time intervals and identify any significant changes or spikes in the data.

By plotting the casualties over time, we can gain insights into the overall impact of terrorism and observe if there are any noticeable patterns, such as increasing or decreasing trends or specific periods with higher casualties. Because the data is aggregated by year, the chart cannot show seasonal variation within a year. This information can be valuable for understanding the severity of the terrorist activities and their temporal distribution.

The line chart is a common choice for visualizing time-series data because it effectively represents the progression of values over time. It provides a clear and intuitive visualization that allows for easy interpretation of the casualty trends. Additionally, the line chart enables us to compare multiple categories or subcategories of casualties, such as different types of attacks or regions, by plotting them on the same chart and using different colored lines.

Overall, the Line Chart of Casualties over Time is a suitable choice for analyzing the temporal patterns of terrorism casualties and gaining insights into the impact of terrorism over a specific period.
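
The comparison of several regions on one line chart, mentioned above, needs the data reshaped so that each region becomes its own column. A minimal sketch follows, assuming casualties are killed plus wounded with missing counts treated as 0; `yearly_casualties_by_region` and the sample values are illustrative.

```python
import numpy as np
import pandas as pd


def yearly_casualties_by_region(df):
    """Return a year x region table of total casualties (killed + wounded)."""
    casualties = df["nkill"].fillna(0) + df["nwound"].fillna(0)
    return df.assign(casualties=casualties).pivot_table(
        index="iyear",
        columns="region_txt",
        values="casualties",
        aggfunc="sum",
        fill_value=0,
    )


# Made-up values for illustration only.
sample = pd.DataFrame({
    "iyear": [2000, 2000, 2001],
    "region_txt": ["A", "B", "A"],
    "nkill": [1.0, np.nan, 2.0],
    "nwound": [1.0, 3.0, np.nan],
})
print(yearly_casualties_by_region(sample))
```

Calling `.plot(kind="line")` on the returned table draws one colored line per region on a shared time axis.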

##### 2. What is/are the insight(s) found from the chart?

1. The line chart visually represents the trend of casualties over time due to global terrorism incidents.

2. By observing the line chart, you can identify any significant changes or patterns in the number of casualties over the years.

3. The chart can help in understanding the overall impact and severity of terrorism incidents globally.

4. Steep increases or decreases in the line indicate years in which the casualty count changed rapidly, while the height of the line shows whether casualties in a given year were high or low.

5. Peaks in the chart may correspond to specific events or periods with a higher number of terrorist attacks resulting in casualties.

6. The chart can be used to analyze the effectiveness of counter-terrorism measures over time, as well as the changing nature of terrorist tactics.

7. By analyzing the chart, policymakers and researchers can gain insights into the long-term trends and patterns in global terrorism incidents and casualties.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

The insights gained from analyzing the Global Terrorism Dataset can help create a positive business impact in several ways:

1. Risk Assessment: By analyzing the patterns and trends in terrorism incidents, businesses can identify high-risk areas and take appropriate measures to mitigate risks. This could involve implementing enhanced security measures, adjusting travel plans, or diversifying operations to minimize exposure to potential threats.

2. Crisis Management: Understanding the historical patterns of terrorism can assist businesses in developing effective crisis management plans. This includes establishing communication protocols, conducting drills, and training employees to respond appropriately in the event of an incident.

3. Business Continuity Planning: Insights gained from analyzing the dataset can help businesses develop robust business continuity plans that ensure minimal disruption in the face of a terrorism-related incident. This may involve establishing backup facilities, creating redundant systems, and implementing disaster recovery strategies.

Insights Leading to Negative Growth:

While the analysis of the dataset can provide valuable insights, some findings could lead to negative growth for businesses. For example:

1. High-Risk Market Identification: If the analysis reveals that a particular market or region consistently experiences high levels of terrorism, businesses may decide to avoid or reduce their operations in those areas. While this decision might be necessary for safety reasons, it could result in missed business opportunities and potential revenue loss.

2. Increased Security Costs: Discovering a significant rise in terrorism incidents or changing patterns may lead businesses to increase their security measures. While this is essential for ensuring the safety of employees and assets, it can also lead to increased expenses and impact profitability.

3. Consumer Perception: If the analysis highlights a specific brand or industry being targeted by terrorists, it could negatively impact consumer perception and lead to a decline in sales or reputation. Businesses might need to invest in marketing strategies to regain consumer trust and counteract any negative effects.

It's important to approach the analysis with a balanced perspective, taking into account both the potential positive and negative impacts on business growth and sustainability.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
import pandas as pd
import matplotlib.pyplot as plt


# Load the Global Terrorism Dataset
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')


# Filter relevant columns for analysis
filtered_data = data[['eventid', 'attacktype1_txt', 'nkill', 'nwound']].copy()

# Total casualties per incident (killed + wounded); missing counts are treated as 0
filtered_data['casualties'] = filtered_data['nkill'].fillna(0) + filtered_data['nwound'].fillna(0)

# Draw one box per attack type from the per-incident values. Summing per group
# first would leave a single number per attack type and hide the distribution.
fig, ax = plt.subplots(figsize=(10, 6))
filtered_data.boxplot(column='casualties', by='attacktype1_txt', ax=ax, grid=False, rot=45)
ax.set_yscale('symlog')  # casualty counts are heavily skewed
fig.suptitle('')  # remove the automatic "Boxplot grouped by ..." title
ax.set_title('Distribution of Casualties by Attack Type')
ax.set_ylabel('Number of Casualties')
ax.set_xlabel('Attack Type')
plt.show()






##### 1. Why did you pick the specific chart?

I picked the specific chart, which is a Box Plot showing the distribution of casualties by attack type, because it allows us to visually analyze and compare the spread, skewness, and outliers in the casualties across different attack types. This chart provides a comprehensive overview of the casualties' distribution within each attack type and helps identify any significant differences or patterns between them.

##### 2. What is/are the insight(s) found from the chart?

1. The box plot provides an overview of the distribution of casualties across different attack types.

2. The "Armed Assault" attack type has the widest distribution of casualties, ranging from a few casualties to a large number of casualties. This suggests that armed assaults can vary significantly in their impact.

3. The "Bombing/Explosion" attack type also shows a wide distribution, indicating that these types of attacks can result in varying numbers of casualties.

4. Other attack types, such as "Assassination" and "Hostage Taking," have relatively smaller distributions of casualties, suggesting that these types of attacks are generally associated with lower casualty counts.

5. The box plots also provide information about the median (middle line inside the box), the interquartile range (box height), and the presence of outliers (individual data points outside the whiskers). These statistics can help in understanding the central tendency and spread of casualties for each attack type.
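
The statistics named in the last point can also be computed directly. The following is a minimal sketch on toy data; the values and the helper name `box_stats` are hypothetical and are not taken from the GTD. It assumes the common box plot convention that outliers lie more than 1.5 times the interquartile range beyond the quartiles.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the GTD)
toy = pd.DataFrame({
    'attacktype1_txt': ['Bombing/Explosion'] * 5 + ['Assassination'] * 5,
    'casualties': [0, 2, 4, 6, 50, 0, 1, 1, 2, 3],
})

def box_stats(values: pd.Series) -> pd.Series:
    """Return the statistics a box plot draws for one group (hypothetical helper)."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    # Points beyond 1.5 * IQR from the quartiles are counted as outliers
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    n_outliers = int(((values < lower_fence) | (values > upper_fence)).sum())
    return pd.Series({'median': median, 'iqr': iqr, 'n_outliers': n_outliers})

# One row of statistics per attack type
stats = pd.DataFrame(
    {name: box_stats(group) for name, group in toy.groupby('attacktype1_txt')['casualties']}
).T
print(stats)
```

Reading the numbers next to the plot makes it easier to tell whether a tall box reflects a genuinely wide spread or only a handful of extreme incidents.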

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

1. Positive business impact:

Identifying attack types that cause the highest casualties can help businesses in high-risk areas take appropriate security measures to protect their employees and assets.
Understanding the distribution of casualties by attack type can assist insurance companies in assessing and pricing terrorism risk policies accurately.

2. Insights leading to negative growth:

If the visualization reveals a particular attack type consistently causing high casualties, it could lead to negative growth in industries or businesses that are exposed to that attack type. For example, if bombings or armed assaults are frequent, it might impact the tourism industry or discourage foreign investment.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load the Global Terrorism Dataset
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')


# Convert the 'iyear' column to datetime format
data['iyear'] = pd.to_datetime(data['iyear'], format='%Y')

# Group the data by year and region and count the number of attacks
# (year/region pairs with no recorded attacks are filled with 0)
attacks_by_region = data.groupby(['iyear', 'region_txt']).size().unstack(fill_value=0)

# Create a stacked area plot
attacks_by_region.plot(kind='area', stacked=True, figsize=(12, 6))

# Set the title and labels
plt.title('Terrorist Attacks by Region Over the Years')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')

# Display the plot
plt.show()


##### 1. Why did you pick the specific chart?

I picked the stacked area plot to visualize the attacks by region over the years because it shows both the distribution and the trend of terrorist attacks across regions. The thickness of each band gives the number of attacks in one region, and the top edge of the stack gives the total across all regions, so we can see the overall total as well as the relative contribution of each region and how those proportions change over time. This makes the chart suitable for examining the temporal and regional patterns in the Global Terrorism Dataset.

##### 2. What is/are the insight(s) found from the chart?

There are six insights from the chart:

1. The stacked area plot shows the number of attacks by region over the years.

2. The height of each colored area represents the number of attacks in a specific region in a particular year.

3. It allows us to compare the contribution of each region to the overall number of attacks and how it changes over time.

4. We can observe the trends and patterns in the distribution of attacks across different regions.

5. By analyzing the plot, we can identify regions that have experienced a significant increase or decrease in terrorist attacks over the years.

6. The plot provides a visual representation of the changing dynamics of terrorism in different regions, helping in understanding the geographical variations in attacks.
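
The relative contribution of each region can also be expressed as a share of each year's total. The sketch below uses toy data with hypothetical counts; only the column names `iyear` and `region_txt` come from the notebook.

```python
import pandas as pd

# Toy incident list (hypothetical rows, not taken from the GTD)
toy = pd.DataFrame({
    'iyear': [2000, 2000, 2000, 2001, 2001, 2001, 2001],
    'region_txt': ['South Asia', 'South Asia', 'Western Europe',
                   'South Asia', 'Western Europe', 'Western Europe', 'Western Europe'],
})

# Rows = years, columns = regions, values = number of attacks
counts = toy.groupby(['iyear', 'region_txt']).size().unstack(fill_value=0)

# Divide each row by its total so that every year sums to 1
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(2))
```

Plotting `shares` instead of `counts` as a stacked area would show proportions only, which can be easier to read when the yearly totals differ widely.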

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact:

The gained insights from the analysis of the Global Terrorism Dataset can indeed help create a positive business impact. By understanding the trends and patterns of terrorist attacks across regions and over the years, businesses can make informed decisions related to risk management, security measures, and resource allocation. For example, businesses operating in regions with a high number of terrorist attacks may implement enhanced security protocols, ensure employee safety, and invest in risk mitigation strategies. Additionally, the insights can also inform businesses about potential areas of growth and development in regions with a low occurrence of attacks.


Negative growth:

The insights from the dataset analysis can also reveal potential areas that might lead to negative growth. For instance, regions with a significant increase in the number of terrorist attacks over the years may experience a decline in economic activities, tourism, and investment. This decline can be attributed to heightened security concerns, loss of public trust, and increased instability. Moreover, businesses operating in such regions may face challenges in attracting and retaining talent, securing financing, and expanding their operations. It is crucial for businesses to be aware of these insights to make informed decisions about their operations and consider strategies for risk mitigation and diversification.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset into a pandas DataFrame
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')

# Group the data by year and region, and count the number of attacks
# (rows = years, columns = regions, so each bar is one year split by region)
attacks_by_year_region = data.groupby(['iyear', 'region_txt']).size().unstack(fill_value=0)

# Plot the stacked bar chart; passing figsize here avoids creating an empty extra figure
attacks_by_year_region.plot(kind='bar', stacked=True, figsize=(12, 8))

# Set the chart title and axes labels
plt.title('Number of Attacks by Region and Year')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')

# Show the legend
plt.legend(title='Region')

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

I picked a stacked bar chart of attacks by region and year because it gives a clear visual summary of the data. The chart lets us observe how terrorist attacks are distributed across regions over time and spot any trends or patterns. Each bar shows the total number of attacks in one year, and the segments within the bar show the relative contribution of each region in that year. This helps us understand the overall impact of terrorism across regions and identify significant variations or changes over the years.

##### 2. What is/are the insight(s) found from the chart?

1. Identifying regions and years with the highest number of attacks: The chart allows you to visually compare the number of attacks across different regions and years. You can observe which regions and specific years had the highest number of attacks, helping to understand the patterns and trends in global terrorism.

2. Comparing the distribution of attacks across regions over time: The stacked bar chart enables you to compare the proportion of attacks in each region for different years. This can reveal whether certain regions consistently have a higher number of attacks or if there are fluctuations over time.

3. Analyzing changes in attack patterns: By examining the stacked bar chart, you can identify any significant changes in the distribution of attacks over the years. For example, you might notice a decrease or increase in attacks in a particular region during specific years, indicating shifts in terrorist activities.
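
The comparisons listed above can be checked numerically on the same year-by-region table that feeds the chart. This is a minimal sketch on toy data with hypothetical counts; `peak_year_by_region` and `busiest_year` are illustrative names.

```python
import pandas as pd

# Toy incident list (hypothetical rows, not taken from the GTD)
toy = pd.DataFrame({
    'iyear': [2000, 2000, 2001, 2001, 2001, 2002, 2002],
    'region_txt': ['South Asia', 'Western Europe', 'South Asia', 'South Asia',
                   'Western Europe', 'Western Europe', 'Western Europe'],
})

# Rows = years, columns = regions, values = number of attacks
counts = toy.groupby(['iyear', 'region_txt']).size().unstack(fill_value=0)

# For each region, the year with the most attacks
peak_year_by_region = counts.idxmax()

# The year with the most attacks across all regions
busiest_year = counts.sum(axis=1).idxmax()

print(peak_year_by_region)
print('Busiest year overall:', busiest_year)
```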


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact:

1. Identification of regions with decreasing terrorism: The visualization may reveal regions with declining numbers of attacks over the years. This information can be valuable for businesses looking to expand into areas that have shown improvements in terms of security and stability.

Negative growth insights:

1. Regions with increasing terrorism: If certain regions show a consistent increase in attacks over the years, it could indicate an unstable or volatile environment. This information can be crucial for businesses operating or planning to expand in those regions, as it may pose risks to their operations and growth.

2. Impact on specific industries: By analyzing the regions and industries targeted by terrorist attacks, businesses operating in those sectors can assess the potential negative growth implications. For example, if attacks are frequently targeting the tourism industry in a particular region, businesses in that sector may face challenges and negative growth prospects.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset into a pandas DataFrame
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')

# Create a scatter plot of wounded vs. killed, with one point per incident
plt.figure(figsize=(10, 8))
plt.scatter(data['nwound'], data['nkill'], alpha=0.5)
plt.xlabel('Number of Wounded')
plt.ylabel('Number of Killed')
plt.title('Relationship between Number of Killed and Number of Wounded')
plt.show()


##### 1. Why did you pick the specific chart?

I picked the Scatter Plot chart for the Global Terrorism Dataset EDA Capstone Project because it is a powerful visualization tool that allows me to analyze the relationship between two numerical variables. With this chart, I can easily identify patterns, clusters, and outliers in the data, providing insights into the nature and distribution of terrorist incidents worldwide.

Additionally, a Scatter Plot can be used to visualize trends or correlations between variables, such as the relationship between the number of people wounded and the number of people killed in each incident. By examining the scatter plot, I can identify any potential association between these two variables, which can help in understanding the severity of attacks.

In summary, the Scatter Plot is an ideal chart choice for this project as it allows me to explore the relationship between two numerical variables and uncover patterns, trends, and potential correlations in the Global Terrorism Dataset.
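
A correlation coefficient can complement the visual impression from a scatter plot. The sketch below uses toy values that are hypothetical; only the column names `nkill` and `nwound` come from the notebook. It computes the Pearson coefficient and, as a rank-based alternative that is less sensitive to extreme incidents, the Spearman coefficient obtained as the Pearson correlation of the ranks.

```python
import pandas as pd

# Toy incidents (hypothetical values, not taken from the GTD)
toy = pd.DataFrame({
    'nkill': [0, 1, 2, 3, 40],
    'nwound': [1, 2, 5, 6, 90],
})

# Pearson correlation measures the linear association
pearson = toy['nkill'].corr(toy['nwound'])

# Spearman correlation is the Pearson correlation of the ranks,
# so a single extreme incident has less influence on it
spearman = toy['nkill'].rank().corr(toy['nwound'].rank())

print(f'Pearson:  {pearson:.3f}')
print(f'Spearman: {spearman:.3f}')
```

On real data the two coefficients can differ noticeably when a few incidents have very large casualty counts, which is worth checking before describing the relationship as strong or weak.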

##### 2. What is/are the insight(s) found from the chart?

Insights from the scatter plot:

1. Relationship between killed and wounded: The scatter plot places each incident by its number of wounded (x-axis) and number of killed (y-axis). It provides an overview of how the two casualty counts relate across incidents.

2. Clusters: Dense areas on the scatter plot indicate the combinations of killed and wounded counts that occur most often.

3. Patterns and Trends: By analyzing the scatter plot, you can check whether incidents with more wounded also tend to have more killed, or whether the two counts vary independently.

4. Outliers: Scatter plots can also highlight outliers, which are data points that deviate significantly from the general pattern. These outliers may represent unique or significant incidents that stand out from the overall dataset.

5. Data Discrepancies: If the scatter plot shows implausible values, such as negative counts, or if there are any inconsistencies in the dataset, it could indicate potential data quality issues that require further investigation.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact:

If the scatter plot shows that most incidents involve few killed and few wounded, it suggests that mass-casualty attacks are the exception. This insight can create a positive business impact by helping businesses and insurers size their risk realistically, which supports investment, tourism, and economic growth in regions affected by terrorism.

Negative growth:

On the other hand, if the scatter plot shows many incidents with high numbers of both killed and wounded, it suggests that severe attacks are a recurring threat. This insight can lead to negative growth in various sectors such as tourism, investment, and business activities in the affected regions. It may also result in increased security expenditures, which can have a negative impact on business profitability.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset into a pandas DataFrame
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')

# Perform necessary data preprocessing and filtering
# ...

# Create a histogram of the number of terrorist attacks by year
plt.figure(figsize=(10, 6))
plt.hist(data['iyear'], bins=range(data['iyear'].min(), data['iyear'].max() + 2), color='skyblue', edgecolor='black')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')
plt.title('Histogram of Terrorist Attacks by Year')
plt.xticks(range(data['iyear'].min(), data['iyear'].max() + 2, 2))
plt.show()

##### 1. Why did you pick the specific chart?

1. Distribution Analysis: Histograms are effective in visualizing the distribution of a single variable. They provide insights into the shape, central tendency, spread, and outliers present in the data.

2. Bin Selection: Histograms allow you to divide the data into bins or intervals, making it easier to identify patterns and understand the frequency of values falling within each bin.

3. Data Exploration: By examining the histogram, you can quickly identify the most common values or ranges of values in your dataset, providing a general overview of the data distribution.

4. Data Comparison: Histograms can also be useful for comparing the distributions of different variables or different subsets of your dataset. This can help in identifying any similarities or differences between them.

5. Data Preprocessing: Histograms can assist in identifying potential data quality issues, such as missing values or outliers, which can guide further data preprocessing steps.
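
Bin selection matters for integer data such as years. The sketch below uses NumPy on toy years (hypothetical values) to show why the bin edges need to run one step past the maximum year when each bin should cover exactly one year.

```python
import numpy as np

# Toy years (hypothetical values, not taken from the GTD)
years = np.array([1970, 1970, 1971, 1973, 1973, 1973])

# N one-year bins need N + 1 edges, so the edges run from the first year to
# last year + 1. np.arange excludes its stop value, hence max + 2.
bin_edges = np.arange(years.min(), years.max() + 2)

counts, edges = np.histogram(years, bins=bin_edges)

for year, count in zip(edges[:-1], counts):
    print(year, count)
```

With edges that stop at the maximum year instead, the last two years would share one bin, because the final bin of a histogram includes its right edge.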

##### 2. What is/are the insight(s) found from the chart?

1. The histogram shows the distribution of terrorist attacks by year.

2. The x-axis represents the year, while the y-axis represents the count of terrorist attacks.

3. The histogram allows us to visualize the frequency of terrorist attacks in each year.

4. It provides an overview of which years have a higher or lower number of attacks.

5. The binning can be adjusted to change the granularity of the histogram. Here each bin covers one year; wider bins would show multi-year totals instead.

6. A kernel density estimate (KDE) could be overlaid to show a smoothed estimate of the distribution, although the chart above does not include one.

7. The histogram can help identify years that have experienced a disproportionately high number of terrorist attacks compared to others.

8. It can be used to explore patterns and trends in terrorism over time, providing valuable insights for further analysis and decision-making.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact:

The gained insights from the histogram visualization can be used to make informed business decisions. For example, understanding how the number of terrorist attacks is distributed over the years can help companies or organizations assess risks and allocate resources accordingly. It can guide the development of security measures, contingency plans, or insurance policies to mitigate potential threats.

Insights leading to negative growth:

While it's crucial to consider insights that lead to positive business impact, it's equally important to identify insights that may result in negative growth. These insights could indicate areas where there are potential risks, threats, or negative trends. For instance, if the histogram reveals a significant increase in the number of attacks in recent years, it could indicate an escalating security challenge that might hinder business operations, growth, or investment.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Read the dataset into a pandas DataFrame
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')

# Filter relevant columns and remove missing values
data_filtered = data[['region_txt', 'nkill', 'nwound']].dropna()

# Create a violin plot
plt.figure(figsize=(12, 8))
sns.violinplot(x='region_txt', y='nkill', data=data_filtered, inner="quartile", cut=0)
plt.xticks(rotation=45)
plt.title('Violin Plot of Number of Killed by Region')
plt.xlabel('Region')
plt.ylabel('Number of Killed')
plt.show()

# Create a violin plot for the number of wounded
plt.figure(figsize=(12, 8))
sns.violinplot(x='region_txt', y='nwound', data=data_filtered, inner="quartile", cut=0)
plt.xticks(rotation=45)
plt.title('Violin Plot of Number of Wounded by Region')
plt.xlabel('Region')
plt.ylabel('Number of Wounded')
plt.show()




##### 1. Why did you pick the specific chart?

I picked the violin plot because it is a useful visualization for understanding the distribution of a continuous variable across different categories or groups. In the context of a global terrorism dataset, a violin plot can be used to analyze the distribution of certain variables, such as the number of casualties or the duration of attacks, across different regions or terrorist groups.

The violin plot combines a box plot and a kernel density plot, providing a compact representation of the data distribution. It displays the median, quartiles, and overall shape of the distribution, while also showing the density of data points at different values. This allows for a comprehensive understanding of the data, including measures of central tendency and the presence of outliers.

By using a violin plot in the EDA (Exploratory Data Analysis) phase of a global terrorism dataset analysis, we can visually compare the distribution of relevant variables across different categories, such as regions, countries, or years. This can help identify patterns, outliers, or differences in the data distribution, providing valuable insights for further analysis and modeling.

##### 2. What is/are the insight(s) found from the chart?

The insight(s) that can be derived from the chart may include:

1. Comparison of the distribution of casualties across regions: The violin plots show the distribution of the number of killed and the number of wounded for each region. You can observe the shape, spread, and central tendency of the data for each region. This can help identify regions where attacks tend to result in higher or lower casualties.

2. Outliers and extreme values: Violin plots can highlight any outliers or extreme values in the dataset. These can represent uncommon or significant incidents that stand out from the rest of the data. Identifying such outliers may help in understanding exceptional cases or unusual patterns related to terrorism incidents.

3. Comparison of the range of casualties: The vertical extent of each violin shows the range of casualties in a region, while its width at a given value shows how many incidents fall near that value. A tall violin suggests higher variability in the number of casualties, while a short violin that is wide near the bottom indicates that most incidents are concentrated at low casualty counts. This information can help in understanding the consistency or variability in the impact of attacks across regions.

These are just a few possible insights that can be derived from the violin plot. The specific insights will depend on the variables chosen and the questions being explored in the analysis.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact:

1. If the violin plots reveal that attacks in a region typically cause few casualties, this can inform businesses operating in that region. The information may support increased investment, business expansion, or better-targeted safety measures.

2. If the plots show that certain regions have higher casualty counts per attack compared to others, businesses can make informed decisions about their operations, such as adjusting security budgets or considering alternative locations.

Negative growth:

1. If the plots indicate that attacks in a specific region frequently cause many casualties, it can lead to negative growth as businesses might face security challenges, decreased consumer confidence, or disrupted supply chains.

2. If the plots reveal long upper tails in a region, meaning that rare but very severe attacks occur there, it may discourage potential investors or affect the growth of sectors that depend on stability.

Ultimately, the interpretation of the gained insights and their impact on business will depend on the specific circumstances and goals of the organization. It's important to thoroughly analyze the data and consider other factors beyond the dataset itself to make informed decisions.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset into a pandas DataFrame
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')

# Filter relevant columns
data = data[['region_txt', 'weaptype1_txt']]

# Count attacks for each region and weapon type (rows = regions, columns = weapon types)
grouped_data = data.groupby(['region_txt', 'weaptype1_txt']).size().unstack(fill_value=0)

# Plot the stacked area chart; passing figsize here avoids creating an empty extra figure
grouped_data.plot(kind='area', stacked=True, figsize=(12, 6))

# Set chart title and labels
plt.title('Weapon Types Used in Terrorist Attacks by Region')
plt.xlabel('Region')
plt.ylabel('Number of Attacks')
plt.xticks(range(len(grouped_data.index)), grouped_data.index, rotation=45, ha='right')

# Show the chart
plt.legend(loc='upper left')
plt.show()

##### 1. Why did you pick the specific chart?

I picked the area chart to visualize the weapon types used in terrorist attacks across different regions because it shows how the attacks in each region are split across weapon types. The stacked areas show the cumulative count of attacks per region, allowing us to compare the relative prevalence of different weapons in different regions. Because the x-axis is a set of regions rather than a time axis, the chart should be read as a comparison between regions and not as a trend over time.

##### 2. What is/are the insight(s) found from the chart?

Insights from the chart can include:

1. Identification of regions where specific weapon types are most prevalent: The area chart allows you to compare the distribution of weapon types used in terrorist attacks across different regions. You can analyze which regions have a higher concentration of certain weapon types, helping to identify patterns or trends.

2. Comparison of weapon types across regions: By visualizing the data using an area chart, you can easily compare the relative proportions of different weapon types in each region. This can help in understanding the variation in weapon preferences across different parts of the world.

3. Identification of regions with diverse weapon usage: The area chart can reveal regions where there is a wide variety of weapon types used in terrorist attacks. This can indicate areas of heightened conflict or diverse terrorist organizations operating within a particular region.
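
The proportions and the diversity of weapon types described above can also be tabulated. The following sketch uses `pd.crosstab` on toy data; the rows are hypothetical, and only the column names `region_txt` and `weaptype1_txt` come from the notebook.

```python
import pandas as pd

# Toy incident list (hypothetical rows, not taken from the GTD)
toy = pd.DataFrame({
    'region_txt': ['South Asia', 'South Asia', 'South Asia',
                   'Western Europe', 'Western Europe', 'Western Europe'],
    'weaptype1_txt': ['Explosives', 'Explosives', 'Firearms',
                      'Explosives', 'Firearms', 'Incendiary'],
})

# Share of each weapon type within each region (every row sums to 1)
shares = pd.crosstab(toy['region_txt'], toy['weaptype1_txt'], normalize='index')

# Number of distinct weapon types recorded in each region
n_types = (shares > 0).sum(axis=1)

print(shares.round(2))
print(n_types)
```

Normalizing by row removes the effect of regions having very different totals, so that the comparison is about weapon mix rather than overall attack volume.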

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact:

1. If the analysis reveals that a particular region has a higher frequency of attacks involving certain weapon types, businesses operating in that region can take appropriate security measures to mitigate the risks and ensure the safety of their employees and assets.

2. If the analysis indicates a decreasing trend in attacks involving more lethal weapons in a specific region, it could provide some level of assurance and confidence to potential investors or businesses considering expansion in that area.

Negative growth impact:

1. If the analysis shows an increasing trend in attacks involving more destructive weapon types in a particular region, it could lead to negative growth by discouraging businesses from operating or expanding in that area due to heightened security risks.

2. In case the analysis reveals a specific region where attacks involving unconventional weapons (e.g., chemical, biological) are increasing, it could create fear and instability, leading to economic downturn and negative growth.

#### Chart - 11 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')

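# Perform data preprocessing and feature engineering if needed

# Keep only numeric columns; correlation is not defined for text columns
numeric_data = data.select_dtypes(include='number')

# Compute correlation matrix
corr_matrix = numeric_data.corr()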

# Create a heatmap using seaborn
plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap of Global Terrorism Dataset')

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

I picked the correlation heatmap visualization chart because it is an effective way to display the correlation between different variables in a dataset. In the context of a Global Terrorism Dataset EDA (Exploratory Data Analysis) capstone project, understanding the correlations between different attributes or features can provide valuable insights into the relationships and patterns within the data. By using a correlation heatmap, we can visually identify the strength and direction of these correlations, which can help in identifying potential factors that contribute to terrorism incidents.

The correlation heatmap is particularly useful in this project because it allows us to analyze multiple variables simultaneously. It uses color gradients to represent the correlation values, making it easy to identify strong positive or negative correlations at a glance. This visualization helps us explore the interdependencies between different features and identify any significant relationships that exist.

Overall, the correlation heatmap is a powerful tool for gaining a deeper understanding of the relationships between variables in the Global Terrorism Dataset, which can aid in making informed decisions and drawing meaningful conclusions from the data.

##### 2. What is/are the insight(s) found from the chart?

Insights from the correlation heatmap can include:

1. Positive correlation: Variables with a high positive correlation coefficient tend to increase or decrease together. For example, the number of killed (`nkill`) and the number of wounded (`nwound`) might be positively correlated, indicating that attacks that kill more people also tend to wound more.

2. Negative correlation: Variables with a high negative correlation coefficient have an inverse relationship. A negative coefficient between two columns would mean that incidents with high values in one column tend to have low values in the other.

3. No correlation: Variables with a correlation coefficient close to zero have no significant linear relationship between them.

These insights can help you understand the interplay between different variables in the Global Terrorism Dataset and identify patterns or trends in the data. Note that a correlation describes an association and does not by itself show that one variable causes the other.
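
The three cases can be illustrated with a small correlation matrix. This is a sketch on toy data with hypothetical values, constructed so that the coefficients are easy to verify by eye. Text columns are excluded with `select_dtypes` before calling `corr`.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the GTD)
toy = pd.DataFrame({
    'iyear': [2000, 2001, 2002, 2003],
    'nkill': [1, 2, 3, 4],         # rises with the year
    'nwound': [8, 6, 4, 2],        # falls with the year
    'region_txt': ['A', 'B', 'A', 'B'],
})

# Correlation is only defined for numeric columns, so drop the text column first
corr = toy.select_dtypes(include='number').corr()
print(corr.round(2))
```

In this toy table `iyear` and `nkill` have a coefficient of +1, while `nkill` and `nwound` have a coefficient of -1. Real data will rarely be this clean.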








#### Chart - 12 - Pair Plot 

In [None]:
# Pair Plot visualization code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('global terrorism data.csv', encoding='latin-1')

# Select numeric columns for the Pair Plot (text columns cannot be drawn as scatter plots)
columns = ['iyear', 'imonth', 'iday', 'nkill', 'nwound']

# Subset the DataFrame with the selected columns; copy to avoid modifying a view of `data`
subset_data = data[columns].copy()

# Replace any missing values in the casualty columns with 0
subset_data[['nkill', 'nwound']] = subset_data[['nkill', 'nwound']].fillna(0)

# Plot the Pair Plot using seaborn
sns.pairplot(subset_data, diag_kind='kde')

# Set the title for the whole figure
plt.suptitle('Pair Plot of Global Terrorism Dataset', y=1.02)

# Display the plot
plt.show()

##### 1. Why did you pick the specific chart?

The pair plot visualization is a great choice for exploring relationships between multiple variables in a dataset. It allows us to visualize pairwise relationships between different variables, which can be useful for identifying patterns, correlations, and potential insights.

By using a pair plot, we can create scatter plots for each pair of variables in the dataset, and also include additional visual elements such as histograms and kernel density estimations along the diagonal of the plot. This enables us to visualize the distribution of individual variables as well.

The pair plot is particularly suitable for exploratory data analysis (EDA) because it provides a comprehensive view of the relationships between variables in a single plot. This can help us identify interesting trends, patterns, or outliers, and guide us in further analysis or feature selection.

Overall, the pair plot is a powerful visualization tool for understanding the structure and relationships within a dataset, making it a valuable choice for the Global Terrorism Dataset EDA Capstone Project.

##### 2. What is/are the insight(s) found from the chart?

Insights from the pair plot visualization can vary depending on the specific columns you choose to include. However, generally, a pair plot can provide the following insights:

1. Correlation: It helps identify the relationships between different pairs of variables. Positive correlations (values moving in the same direction) appear as point clouds that slope upward, while negative correlations (values moving in opposite directions) appear as point clouds that slope downward. When there is no correlation, the points form a cloud with no clear slope.

2. Distributions: It allows you to visualize the distributions of individual variables along the diagonal of the pair plot. This can provide insights into the data's skewness, kurtosis, and potential outliers.

3. Outliers: Pair plots can highlight potential outliers, which are data points that deviate significantly from the majority of the data. These outliers may be worth investigating further as they can provide valuable insights or indicate data quality issues.

4. Clusters or patterns: By examining the scatterplots, you may identify clusters or patterns in the data. Clusters can indicate groups with similar characteristics, and patterns can reveal relationships or trends.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

To achieve the business objective of analyzing the Global Terrorism Dataset through the EDA Capstone Project, I suggest the following steps:

1. Data Cleaning and Preprocessing: Begin by cleaning the dataset, handling missing values, and removing any irrelevant or duplicate information. Preprocess the data to ensure consistency and uniformity.

2. Exploratory Data Analysis (EDA): Perform a thorough exploration of the dataset to gain insights into the patterns, trends, and characteristics of global terrorism. This can include statistical summaries, data visualization, and correlation analysis to identify meaningful relationships between variables.

3. Feature Engineering: Extract relevant features from the dataset that can contribute to the analysis. This may involve creating new variables, aggregating data, or transforming existing features to enhance the quality of information available.

4. Identify Key Insights: Analyze the EDA results to identify key insights regarding global terrorism. This could involve examining trends over time, geographical hotspots, target types, attack methods, or any other relevant factors. These insights can help the client understand the nature and dynamics of global terrorism.

5. Provide Recommendations: Based on the identified insights, provide actionable recommendations to the client. These recommendations may include policy changes, security measures, or targeted interventions to mitigate the impact of terrorism. It's important to align the recommendations with the specific goals and priorities of the client to ensure they are practical and effective.

6. Visualization and Reporting: Create visualizations, charts, and graphs to effectively communicate the findings and recommendations to the client. Prepare a comprehensive report summarizing the analysis, insights, and recommendations in a clear and concise manner.
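Steps 1 to 4 above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's actual pipeline: the column names (`eventid`, `iyear`, `country_txt`, `nkill`, `nwound`) are taken from the GTD codebook and assumed not to have been renamed, the rows are made-up toy data, and the helpers `clean`, `add_features`, and `attacks_per_year` are hypothetical.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT real GTD records.
raw = pd.DataFrame({
    "eventid":     [1, 2, 2, 3, 4],
    "iyear":       [2001, 2001, 2001, 2002, 2002],
    "country_txt": ["A", "B", "B", "A", None],
    "nkill":       [2.0, None, None, 5.0, 1.0],
    "nwound":      [1.0, 4.0, 4.0, None, 0.0],
})


def clean(df):
    """Step 1: drop duplicate events and handle missing values."""
    out = df.drop_duplicates(subset="eventid").copy()
    out["country_txt"] = out["country_txt"].fillna("Unknown")
    # Assumption: a missing count is treated as 0. This can understate
    # casualties, so the choice should be revisited for a real analysis.
    out[["nkill", "nwound"]] = out[["nkill", "nwound"]].fillna(0)
    return out


def add_features(df):
    """Step 3: derive a combined casualties column."""
    out = df.copy()
    out["casualties"] = out["nkill"] + out["nwound"]
    return out


def attacks_per_year(df):
    """Steps 2 and 4: count incidents per year to look for trends."""
    return df.groupby("iyear").size()


events = add_features(clean(raw))
yearly = attacks_per_year(events)
by_country = (
    events.groupby("country_txt")["casualties"].sum().sort_values(ascending=False)
)

print(yearly)
print(by_country)
```

Keeping each step in its own small function makes it easier to change one decision, such as how missing casualty counts are filled, without touching the rest of the analysis.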

By following these steps, the client can gain a deep understanding of global terrorism through the EDA Capstone Project. The insights and recommendations derived from the analysis will assist the client in making informed decisions and developing strategies to address the challenges posed by terrorism.

# **Conclusion**

In this Global Terrorism Dataset Exploratory Data Analysis (EDA) Capstone Project, we analyzed a comprehensive dataset containing information about terrorist attacks worldwide. Using Python, we applied various data exploration and visualization techniques to gain insights into the nature and patterns of global terrorism.

Throughout the project, we focused on answering key questions related to terrorism, such as the most affected countries, the preferred targets of terrorists, the types of weapons used, and the trends over time. By applying statistical analysis and visualizations, we were able to uncover valuable insights.

Some of the key findings from our analysis include:

1. Most affected countries: The countries with the highest number of terrorist attacks were [list the top affected countries]. This information can help prioritize resources and interventions in these regions.

2. Preferred targets: The analysis revealed that [describe the preferred targets], indicating the areas where counter-terrorism efforts need to be concentrated.

3. Types of weapons used: [Describe the most commonly used weapons] were frequently employed in terrorist attacks. This information can guide law enforcement agencies in developing strategies to combat terrorism.

4. Trends over time: By analyzing the dataset over different years, we observed [describe the trends], highlighting the changing dynamics of global terrorism and the need for adaptive security measures.
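The four findings above are each a frequency count over one column, so they could be computed along the following lines. This is a hedged sketch only: it assumes the GTD codebook column names `country_txt`, `targtype1_txt`, `weaptype1_txt`, and `iyear`, uses made-up toy rows with placeholder labels, and the helper `top_categories` is hypothetical.

```python
import pandas as pd

# Toy rows with placeholder labels; these are NOT results from the GTD.
toy = pd.DataFrame({
    "iyear":         [2010, 2010, 2011, 2011, 2011, 2012],
    "country_txt":   ["A", "B", "A", "A", "C", "B"],
    "targtype1_txt": ["T1", "T2", "T1", "T1", "T2", "T3"],
    "weaptype1_txt": ["W1", "W1", "W2", "W1", "W1", "W2"],
})


def top_categories(df, column, n=3):
    """Return the n most frequent values of a column with their counts."""
    return df[column].value_counts().head(n)


findings = {
    "most_affected_countries": top_categories(toy, "country_txt"),
    "preferred_targets":       top_categories(toy, "targtype1_txt"),
    "weapons_used":            top_categories(toy, "weaptype1_txt"),
    # Incidents per year, in chronological order, for the trend over time.
    "attacks_per_year":        toy["iyear"].value_counts().sort_index(),
}

for name, result in findings.items():
    print(name)
    print(result.to_string())
```

Run against the real dataframe instead of `toy`, the same calls would yield the values needed for each finding.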

Overall, this project demonstrated the power of data analysis and visualization in understanding the complex phenomenon of global terrorism. By leveraging Python programming, we were able to extract meaningful insights from the dataset and provide a foundation for further research and decision-making in counter-terrorism efforts.

This project is based on historical data, so the findings should be revisited as new data becomes available to keep them accurate and relevant. Nonetheless, the techniques and methodologies employed in this EDA Capstone Project can serve as a starting point for deeper investigations and inform policies aimed at combating terrorism worldwide.

By utilizing data-driven approaches, we can work towards creating a safer and more secure global environment, minimizing the impact of terrorism, and fostering international cooperation to address this global challenge.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***