<a href="https://colab.research.google.com/github/Abhisek358/Global---Terrorism--Dataset/blob/main/Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -

## ***Global Terrorism Dataset***



##### **Project Type**    - EDA
##### **Contribution**    - Team
##### **Team Member 1 -** Abhisek Swain
##### **Team Member 2 -** Aditya Raj
##### **Team Member 3 -** Lucky Saxsena
##### **Team Member 4 -** Swarup Sunil Mane
##### **Team Member 5 -** Talari Venkatesh

# **Project Summary -**

The "Exploring and Analyzing Global Terrorism Trends" project delves into the vast and comprehensive Global Terrorism Database (GTD) spanning from 1970 to 2017. This open-source repository, meticulously curated by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland, contains a staggering record of over 180,000 terrorist incidents worldwide. The project's primary objective is to extract invaluable insights, patterns, and trends from this extensive dataset to enhance our understanding of global terrorism and contribute to better counter-terrorism strategies.

# **GitHub Link -**

# **Problem Statement**


Using exploratory data analysis (EDA) techniques on the GTD, identify the hot zones of terrorism globally and discern the evolving patterns of terrorist activities. What security-related insights can be derived from this analysis that could be instrumental in shaping counter-terrorism strategies?

#### **Define Your Business Objective?**

The business objective is to leverage the GTD's data to extract actionable insights that support efforts to counter terrorism comprehensively, efficiently, and effectively, ultimately contributing to improved security and safety on a global scale.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import statsmodels.api as sm

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')
df=pd.read_csv('/content/drive/MyDrive/Global Terrorism Data.csv',encoding='latin-1')
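Since the guidelines ask for exception handling, the loading step could be wrapped in a small helper. This is only a sketch: `load_gtd` is a hypothetical name that does not appear elsewhere in the notebook, and the `latin-1` encoding is taken from the cell above.

```python
from pathlib import Path

import pandas as pd


def load_gtd(csv_path, encoding="latin-1"):
    """Read the GTD CSV file, failing early with a clear message if it is missing.

    `load_gtd` is a hypothetical helper name used for illustration only.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {path}; check the Drive mount and the file name."
        )
    # low_memory=False reads the file in a single pass, so pandas infers
    # one consistent dtype per column instead of guessing chunk by chunk
    return pd.read_csv(path, encoding=encoding, low_memory=False)
```

In the notebook it would be called as `df = load_gtd('/content/drive/MyDrive/Global Terrorism Data.csv')` after mounting the drive.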

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows,cols = df.shape
print(f'There are {rows} rows and {cols} columns in the dataset')

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_rows = df.duplicated().sum()
print(f'There are {duplicate_rows} duplicate rows in the dataset')

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
nan_count=df.isnull().sum()
# Keep only the columns that actually contain missing values
missing_values=nan_count[nan_count>0]
missing_values

In [None]:
# Visualizing the missing values
plt.figure(figsize=(14,6))
sns.heatmap(df.isnull(), cbar=False)
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

Dataset Size : The dataset is quite large, containing 181,691 entries or rows.

Feature Quantity : The dataset contains 135 features or columns.

Data Types : The dataset has a mix of data types. There are 55 features with floating-point numbers (float64), 22 features with integers (int64), and 58 features with objects (object). The object data type in pandas typically means the column contains string (text) data.

Memory Usage : The dataset uses over 187.1 MB of memory.

Missing Values: Some columns have a large number of missing values. For example, the 'approxdate' column has 172,452 missing values and the 'related' column has 156,653 missing values.
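The figures above can be collected programmatically rather than read off `df.info()`. The sketch below uses a small toy DataFrame in place of the real dataset; `summarize_frame` and `toy_frame` are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd


def summarize_frame(frame, top_n=5):
    """Return the shape, dtype counts, memory use (MB) and most-missing columns."""
    missing = frame.isnull().sum()
    return {
        "shape": frame.shape,
        # Number of columns per dtype, e.g. {'float64': 55, 'int64': 22, ...}
        "dtype_counts": frame.dtypes.astype(str).value_counts().to_dict(),
        # Shallow memory usage, the same kind of figure df.info() prints by default
        "memory_mb": round(frame.memory_usage(deep=False).sum() / 1024**2, 1),
        # Columns with the most missing values, largest first
        "most_missing": missing[missing > 0]
        .sort_values(ascending=False)
        .head(top_n)
        .to_dict(),
    }


# Toy stand-in for the GTD DataFrame (values are made up)
toy_frame = pd.DataFrame({
    "iyear": [1970, 1971, 1972],
    "nkill": [1.0, np.nan, 0.0],
    "approxdate": [None, None, "1972-01-05"],
})
summary = summarize_frame(toy_frame)
print(summary)
```

On the real data the call would be `summarize_frame(df)`.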

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns = df.columns
print("columns in the dataset")
for column in columns:
  print(column)

In [None]:
# Dataset Describe
df.describe(include='all')

In [None]:
# Rename the columns used in this analysis for clarity and consistency
# (columns such as 'latitude', 'longitude' and 'summary' keep their original names)
df.rename(columns={
    'iyear': 'Year', 'imonth': 'Month', 'iday': 'day',
    'gname': 'Group', 'country_txt': 'Country', 'region_txt': 'Region',
    'provstate': 'State', 'city': 'City',
    'attacktype1_txt': 'Attacktype', 'targtype1_txt': 'Targettype',
    'weaptype1_txt': 'Weapon', 'nkill': 'kill', 'nwound': 'Wound',
}, inplace=True)

### Variables Description

Year: The year in which the incident occurred.

Month: The month in which the incident occurred.

Day: The day of the month on which the incident occurred.

Country: The country where the incident took place.

State: The state or province within the country.

Region: The geographical region of the incident.

City: The city where the incident occurred.

Latitude: The latitude coordinate of the incident location.

Longitude: The longitude coordinate of the incident location.

Attacktype: The type of attack (Assassination, Bombing/Explosion).

Kill: The number of people killed in the incident.

Wound: The number of people wounded in the incident.

Target1: A description of the primary target of the attack.

Summary: A brief summary or description of the incident.

Group: The group responsible for the attack.

Targettype: The type of target that was attacked (Government, Military).

Weapon: The weapon or method used in the attack (Explosives, Firearms).

Motive: The suspected motive or reason behind the attack.

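INT_LOG: A variable indicating whether the attack was logistically international, that is, whether the perpetrator group crossed a border to carry it out (0 for no, 1 for yes, -9 for unknown).

INT_IDEO: A variable indicating whether the attack was ideologically international, that is, whether the perpetrator group attacked a target of a different nationality (0 for no, 1 for yes, -9 for unknown).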

Success: A binary variable indicating the success of the attack (0 for unsuccessful, 1 for successful).

Individual: A binary variable indicating if the attack was carried out by an individual (0 for no, 1 for yes).

Multiple: A binary variable indicating whether the attack was part of a multiple incident, that is, one of several connected attacks (0 for no, 1 for yes).

Dbsource: The source of the data.

Nkillter: The number of terrorists killed in the incident.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in df.columns.tolist():
  print("No. of unique values in ",i,"is",df[i].nunique(),".")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# NaN percentage in every column of the dataset
nan_percentage=(nan_count/len(df))*100
nan_percentage

In [None]:
# Identify columns with a high percentage of missing values (>= 50%)
column_with_high_nan=nan_percentage[nan_percentage>=50]
#Iterate through columns with a high percentage of missing values
# and print the column names along with their missing value percentage
for key,values in column_with_high_nan.items():
  print(key,values) # These are the list of columns with high missing values
print(len(column_with_high_nan))

In [None]:
# Identify columns with a lower percentage of missing values (< 50%)
missing_values_less_then_50=nan_percentage[nan_percentage<50]
print(missing_values_less_then_50)
print(len(missing_values_less_then_50))  # number of columns with less than 50% missing values
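The two cells above split the columns at a 50% missing-value threshold. The same split can be sketched as a single report table; `missing_report` and `toy_missing` are hypothetical names, and the toy values stand in for the real dataset.

```python
import pandas as pd


def missing_report(frame, threshold=50.0):
    """Tabulate missing counts and percentages, flagging columns at or above the threshold."""
    report = pd.DataFrame({
        "missing_count": frame.isnull().sum(),
        "missing_pct": (frame.isnull().mean() * 100).round(2),
    })
    # Same cut-off as the cells above: >= threshold counts as "high"
    report["high_missing"] = report["missing_pct"] >= threshold
    return report.sort_values("missing_pct", ascending=False)


# Toy stand-in: 'a' is complete, 'b' is 50% missing, 'c' is 75% missing
toy_missing = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1, None, None, 4],
    "c": [None, None, None, "x"],
})
report = missing_report(toy_missing)
print(report)
```

Keeping the counts, percentages, and flag in one table makes it easier to justify which columns are dropped and which are kept.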

In [None]:
# Select columns with few missing values that are relevant for the analysis.
# .copy() makes df1 an independent DataFrame, so later edits do not trigger SettingWithCopyWarning
df1=df[['Year','Month','day','Country','State','Region','City','latitude','longitude',"Attacktype",'kill','Wound','target1','summary','Group','Targettype','Weapon','motive','INT_LOG','INT_IDEO','success','individual','multiple','dbsource','nkillter']].copy()

In [None]:
df1.head()

In [None]:
df1.columns

In [None]:
# Number of missing values in each selected column of the new DataFrame
df1.isnull().sum()

In [None]:
# Describe the new DataFrame
df1.describe()

In [None]:
# Identify the columns that have more than 0 missing values
df1.isnull().sum()[df1.isnull().sum()>0]

In [None]:
# Handling missing values of the 'kill' column
df1['kill'].isna().sum()

In [None]:
# Draw a Q-Q plot to check whether df1['kill'] is normally distributed
# (missing values are dropped first so they do not distort the plot)
sm.qqplot(df1['kill'].dropna(), line='s')
plt.title("Q-Q Plot")
plt.show()

**This graph clearly indicates that the data is not normally distributed.**

In [None]:
# Handle missing values in the 'kill' column by replacing them with zeros
df1['kill'] = df1['kill'].fillna(0)
# Convert the cleaned 'kill' column to integers
df1['kill'] = df1['kill'].astype(int)

Missing values in the "kill" column are filled in a way that aligns with the distribution of the data. Most values are zero (0th to 50th percentiles), while the mean of the column is 2.4 and the standard deviation (11.54) is relatively high, so a reasonable choice is to fill missing values with the median, which is 0.
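The reasoning for a median fill can be checked numerically. The sketch below uses a small made-up series, not the real 'kill' column, to show how a few large values pull the mean away from the median; `toy_kills` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Made-up casualty counts with two missing entries and one extreme value
toy_kills = pd.Series([0, 0, 0, 1, np.nan, 0, 2, 150, np.nan, 0], name="kill")

# describe() ignores NaN and reports count, mean, std and the quartiles
print(toy_kills.describe())

median_fill = toy_kills.median()   # robust to the extreme value
mean_value = toy_kills.mean()      # pulled upward by the extreme value

# Fill with the median, then convert to integers as in the notebook
filled_kills = toy_kills.fillna(median_fill).astype(int)
print(filled_kills.tolist())
```

Because the distribution is heavily right-skewed, the mean would overstate a typical incident, which is why the median is the safer fill value here.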

In [None]:
# Handling missing values in the 'Wound' column
df1['Wound'].isna().sum()

In [None]:
# Handle missing values in the 'Wound' column by replacing them with zeros
df1['Wound'] = df1['Wound'].fillna(0)
# Convert the cleaned 'Wound' column to integers
df1['Wound'] = df1['Wound'].astype(int)

Missing values in the "Wound" column are filled in a way that aligns with the distribution of the data. Most values are zero (0th to 50th percentiles), while the mean is 3.16 and the standard deviation (35.94) is relatively high, so a reasonable choice is to fill missing values with the median, which is 0.

In [None]:
# Handle missing values in the 'nkillter' column by replacing them with zeros
df1['nkillter'] = df1['nkillter'].fillna(0)

In [None]:
# Handle missing values in the 'multiple' column by replacing them with 0
df1['multiple'] = df1['multiple'].fillna(0)

The 'multiple' column has only one missing value. Since the column is binary (0 or 1), the missing value can be filled with either 0 or 1; here it is filled with 0.

In [None]:
# Handling missing values in the city,target1 and state  columns
print(df1['City'].isna().sum())
print(df1['State'].isna().sum())
print(df1['target1'].isna().sum())

In [None]:
# Counting the number of records where 'City' is labeled as 'Unknown'
print(len(df1[df1['City']=='Unknown']))
# Counting the number of records where 'State' is labeled as 'Unknown'
print(len(df1[df1['State']=='Unknown']))
# Counting the number of records where 'target1' is labeled as 'Unknown'
print(len(df1[df1['target1']=='Unknown']))

In [None]:
# Handle missing values in the 'City', 'target1' and 'State' columns by replacing them with 'Unknown'
df1 = df1.fillna(value={'City':'Unknown', 'target1': 'Unknown','State':'Unknown'})

Missing values in the 'City', 'State', and 'target1' columns are replaced with 'Unknown' because this value already appears frequently in those columns.
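A quick before-and-after count shows the effect of folding missing values into the existing 'Unknown' label. This is a sketch on made-up city names; `toy_cities` is a hypothetical name.

```python
import pandas as pd

# Made-up 'City' values: two explicit 'Unknown' labels and two missing entries
toy_cities = pd.Series(["Baghdad", "Unknown", None, "Unknown", "Lima", None], name="City")

unknown_before = int((toy_cities == "Unknown").sum())
missing_before = int(toy_cities.isna().sum())

# Fold the missing entries into the label the column already uses
cities_filled = toy_cities.fillna("Unknown")
unknown_after = int((cities_filled == "Unknown").sum())

print(unknown_before, missing_before, unknown_after)
```

After the fill, 'Unknown' covers both records that were explicitly labeled that way and records that had no value, so the two cases can no longer be told apart.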

### What all manipulations have you done and insights you found?

The manipulations performed so far are summarized below.

Handling Missing Values:

Missing values in the 'kill' column were replaced with zeros, and the column was converted to integers after cleaning. Missing values in the 'Wound' column were replaced with zeros, and the column was converted to integers after cleaning. Missing values in the 'nkillter' and 'multiple' columns were replaced with zeros.

Counting Unknown Values:

The number of records labeled 'Unknown' was counted for each of the 'City', 'State', and 'target1' columns.

Handling Missing Values in the 'City', 'target1', and 'State' Columns:

Missing values in the 'City', 'target1', and 'State' columns were replaced with 'Unknown'.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Data distribution: create histograms to visualize the distribution
# of each numeric column in the DataFrame
df1.hist(figsize=(20,10))
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Histograms were chosen as the specific chart because they are a suitable choice for visualizing the distribution of numeric data. They display how data is distributed across different value ranges, making it easy to identify patterns, central tendencies, and any skewness in the data.



##### 2. What is/are the insight(s) found from the chart?

The insights that can be derived from histograms include:

1. Central Tendency: The peak of the histogram shows where the values concentrate; for roughly symmetric data this lies close to the mean and median.
2. Spread: The width of the histogram gives an idea of the spread or variability of the data.
3. Skewness: The shape of the histogram (symmetrical or skewed) indicates the distribution's skewness (positive or negative).
4. Outliers: Extreme values or outliers can be observed as values that fall far from the bulk of the data.
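Skewness and outliers can also be quantified instead of judged by eye. The sketch below uses a made-up series and the common 1.5 × IQR rule, which is an assumption of this example rather than a method used elsewhere in the notebook; `toy_wounded` is a hypothetical name.

```python
import pandas as pd

# Made-up wounded counts: mostly small values plus one extreme incident
toy_wounded = pd.Series([0, 0, 0, 0, 1, 1, 2, 3, 5, 200])

# Positive skewness means a long right tail
skewness = toy_wounded.skew()

# Flag values above Q3 + 1.5 * IQR as potential outliers
q1 = toy_wounded.quantile(0.25)
q3 = toy_wounded.quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = toy_wounded[toy_wounded > upper_fence]

print(f"skewness={skewness:.2f}, upper fence={upper_fence}, outliers={outliers.tolist()}")
```

On the real data the same calls could be applied to `df1['kill']` or `df1['Wound']` to back up what the histograms suggest.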



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

## Positive Business Impact:

The insights gained from the histograms can be valuable for various business decisions:

Understanding the distribution of sales, customer demographics, or any other relevant data can inform marketing strategies.

Detecting outliers can help identify potential issues in quality control or fraud detection.

Identifying central tendencies can guide pricing or inventory decisions.

Ultimately, the positive business impact depends on the specific context and how these insights are applied.

## Negative Business Impact:

Histograms themselves don't lead to negative growth; instead, they reveal information that can be used to prevent or address potential negative outcomes. For example:

If the histogram reveals a skewed distribution of customer satisfaction scores, indicating many dissatisfied customers, it can be a warning sign of negative growth. Action can then be taken to improve customer satisfaction and prevent further decline.



#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Identify the years with a higher frequency of attacks
# Count the attacks per year, sort chronologically, and build a DataFrame.
# rename_axis/reset_index(name=...) give the same column names on pandas 1.x and 2.x
year_count=df1['Year'].value_counts().sort_index().rename_axis('year').reset_index(name='value')
# Create a bar plot to visualize the number of attacks per year
plt.figure(figsize=(12, 8))
sns.barplot(x='year',y='value',data=year_count)    # Create the bar plot
# Add labels and a title to the plot for clarity
plt.xlabel('Year')               # X-axis label
plt.ylabel('Number of attacks')  # Y-axis label
plt.title('Number of Attacks per Year')
# Rotate the X-axis labels for better readability
plt.xticks(rotation=90)
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart (specifically a barplot) was chosen as the specific chart because it is effective at displaying the frequency or count of discrete categories, in this case, the number of terrorist attacks per year. Bar charts are particularly suitable when you want to compare and visualize data across different categories or groups.

##### 2. What is/are the insight(s) found from the chart?

The insights that can be derived from the bar chart of the number of terrorist attacks per year include:

Trends Over Time: By looking at the bars, you can identify trends in the number of attacks over the years. For example, you can see whether there are any significant increases or decreases in attacks over time.

Key Years: It's easy to spot years with exceptionally high or low numbers of attacks.

Periodic Patterns: If there are recurring patterns in the data (e.g., spikes in certain years), those patterns become apparent.
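The trend and key-year observations can be backed by numbers. This sketch uses a made-up list of incident years; `toy_years` is a hypothetical name, and on the real data `df1['Year']` would take its place.

```python
import pandas as pd

# One entry per incident (made-up years)
toy_years = pd.Series(
    [2012, 2012, 2013, 2013, 2013, 2014, 2014, 2014, 2014, 2015, 2015],
    name="Year",
)

# Attacks per year in chronological order
per_year = toy_years.value_counts().sort_index()

# Year-over-year change: positive values are increases, negative are decreases
yoy_change = per_year.diff()

# Year with the highest number of attacks
peak_year = per_year.idxmax()

print(per_year.to_dict())
print(yoy_change.to_dict())
print("Peak year:", peak_year)
```

The `diff()` output makes increases and decreases explicit, which is harder to read precisely from bar heights alone.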

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

## Positive Business Impact:

The insights gained from this chart can be valuable for various purposes:

Risk Assessment: Organizations, governments, or security agencies can use this information to assess the security situation and allocate resources accordingly.

Policy Decisions: Governments may use this data to formulate policies to counter terrorism or address specific issues in high-impact years.

Resource Allocation: Businesses operating in regions affected by terrorism can use this data for risk assessment and resource allocation.

The positive impact depends on how the insights are applied and the specific goals of the organization or government.

## Negative Business Impact:

The bar chart itself doesn't indicate negative growth, but it can reveal negative trends or alarming patterns. For example:

If the chart shows a consistent upward trend in the number of attacks, it may signal a worsening security situation, potentially leading to negative growth in affected regions or industries.

If specific years have disproportionately high numbers of attacks, it could indicate negative impacts on local economies and stability.

Negative growth would be a result of the underlying security issues highlighted by the chart rather than the chart itself.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Create a cross-tabulation of the 'Year' and 'Region' columns and then plot it as a stacked area plot
# This shows the number of terrorist attacks by region over the years
pd.crosstab(df1.Year, df1.Region).plot(kind='area',stacked=True,figsize=(20,10))
plt.ylabel('Number of Attacks',fontsize=25)  # Y-axis label
plt.xlabel("Years",fontsize=25)              # X-axis label
plt.title('Terrorist Activities (Region) In Each Year',fontsize=30) # Title of the plot
plt.show()

##### 1. Why did you pick the specific chart?

The specific chart, a stacked area chart, was chosen because it effectively visualizes the temporal trends and distribution of terrorist activities across regions over multiple years, allowing for easy comparison and understanding of regional contributions to the total number of attacks.

##### 2. What is/are the insight(s) found from the chart?

From the provided stacked area chart titled "Terrorist Activities (Region) In Each Year," we can gain the following insights:

1. **Temporal Trends**: The chart illustrates how the total number of terrorist attacks has evolved over the years. It allows us to identify periods of increased or decreased activity.

2. **Regional Contributions**: Each colored segment in the stacked area represents a specific region's contribution to the total number of attacks in a given year. This provides insights into which regions have the highest and lowest levels of terrorist activity.

3. **Changes in Dominant Regions**: Over time, the chart reveals whether certain regions consistently dominate in terms of attacks or if there are shifts in which regions are most active.

4. **Periods of Stability and Instability**: By examining the fluctuations in the stacked areas, it's possible to identify periods of stability and instability in different regions or overall.

5. **Comparison of Regions**: The chart allows for easy visual comparison between regions, making it clear which regions have a significant impact on the overall trend.

Overall, this chart provides a comprehensive view of how terrorist activities are distributed across regions and how they change over time. It aids in understanding historical patterns, identifying regions of concern, and potentially informing security and policy decisions. Specific insights will depend on the data within the "df1" DataFrame and the context of the analysis.
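Regional contributions and dominant regions can also be read from a normalized cross-tabulation. The sketch below uses made-up rows with the same 'Year' and 'Region' column names as `df1`; `toy_attacks` is a hypothetical name.

```python
import pandas as pd

# Made-up incidents: one row per attack
toy_attacks = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Region": [
        "South Asia", "South Asia", "Western Europe",
        "South Asia", "Western Europe", "Western Europe",
    ],
})

# normalize='index' turns each year's counts into fractions that sum to 1
region_share = pd.crosstab(toy_attacks["Year"], toy_attacks["Region"], normalize="index") * 100

# Region with the largest share of attacks in each year
dominant_region = region_share.idxmax(axis=1)

print(region_share.round(1))
print(dominant_region)
```

Shares make shifts in regional dominance visible even in years when the total number of attacks is small.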

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights from the chart may have limited direct impact on businesses but can influence strategic decisions and risk management:

## Positive Business Impact:

Risk Mitigation: Businesses operating in regions with higher attacks can use insights to bolster security measures, potentially reducing the likelihood of security incidents.

Informed Expansion: For businesses considering expansion, understanding regional trends can aid in making informed decisions about entering or avoiding specific markets.

## Negative Business Impact:

Operational Challenges: Regions with consistent attacks may entail higher operational costs due to increased security measures, insurance premiums, and potential disruptions.

Market Instability: Operating in high-risk regions can lead to market instability, potentially hindering growth and long-term profitability.

In essence, while the insights can support risk management, they also pose challenges for businesses in terms of costs, stability, and reputation. Balancing opportunities with risks is essential for making informed decisions and potentially mitigating negative growth.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Count the occurrences of each country in the 'Country' column
# and keep the 10 most frequent countries.
# rename_axis/reset_index(name=...) give the same column names on pandas 1.x and 2.x
country_count=df1['Country'].value_counts().head(10).rename_axis('Country').reset_index(name='Values')
plt.figure(figsize=(10, 6))
sns.barplot(x='Country', y='Values', data=country_count)
plt.xlabel('Country')            # X-axis label
plt.ylabel('Number of attacks')  # Y-axis label
plt.title('Top 10 Countries by Number of Attacks')
plt.xticks(rotation=50)
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar chart to visualize the top 10 most frequent countries in the 'Country' column because it is an effective way to represent the frequency of occurrences for each country in a clear and straightforward manner.

##### 2. What is/are the insight(s) found from the chart?

From the chart we can understand the following:

Most Frequent Countries: The chart shows the top 10 countries with the highest number of recorded terrorist attacks. This provides a quick overview of which countries are most affected by terrorism according to the dataset.

Comparison: You can easily compare the frequency of attacks in these top 10 countries, identifying the countries with the highest and lowest occurrences.

Distribution: It highlights the distribution of attacks across these countries, emphasizing which ones are significantly more affected than others.

Outliers: The chart can also help identify any outliers or countries with exceptionally high numbers of attacks, which may require special attention in terms of risk assessment and mitigation.
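To judge how much more affected the top countries are, the raw counts can be converted into shares of all recorded attacks. This is a sketch on made-up data; `toy_countries` is a hypothetical name and `df1['Country']` would replace it in the notebook.

```python
import pandas as pd

# One entry per incident (made-up distribution)
toy_countries = pd.Series(["Iraq"] * 5 + ["Pakistan"] * 3 + ["India"] * 2, name="Country")

# Attack counts for the two most affected countries in the toy data
top_counts = toy_countries.value_counts().head(2)

# Share of all incidents, and the running total across the top countries
share_pct = (top_counts / len(toy_countries) * 100).round(1)
cumulative_pct = share_pct.cumsum()

print(share_pct.to_dict())
print(cumulative_pct.to_dict())
```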

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights from this chart are unlikely to directly create a positive business impact as they pertain to terrorist activities. However, they can be valuable for security and risk assessment, which indirectly impacts businesses by helping them mitigate potential risks and ensure the safety of their operations and assets. Negative growth can occur if businesses operate in regions with high terrorism-related risks, which may lead to increased costs, operational disruptions, and reputational damage.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
iraq_data = df1[df1['Country'] == 'Iraq']

#Group the data by terrorist group and count the incidents
group_counts = iraq_data['Group'].value_counts()

# Find the group with the highest activity
most_active_group = group_counts.idxmax()
print(f"The most active terrorist group in Iraq is {most_active_group} with {group_counts.max()} incidents.")

# Create a DataFrame for the most active group in Iraq
active_group_in_country = iraq_data[iraq_data['Group'] == most_active_group]

#  Create a scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(active_group_in_country['longitude'], active_group_in_country['latitude'], s=20, alpha=0.5)
plt.xlabel('Longitude') # X-axis label
plt.ylabel('Latitude')  # y-axis label
plt.title(f'Scatter Plot of Incidents for {most_active_group} in Iraq')
plt.grid(True)

plt.show()

print(f"The most active terrorist group is: {most_active_group}")
print(f"The country with the highest activity for this group is: Iraq")


In [None]:
iraq_data = df1[df1['Country'] == 'Iraq']

# Filter out incidents with "unknown" group
iraq_data = iraq_data[iraq_data['Group'] != 'Unknown']

# Group the data by terrorist group and count the incidents
group_counts = iraq_data['Group'].value_counts()

# Find the group with the highest activity
most_active_group = group_counts.idxmax()
print(f"The most active terrorist group in Iraq is {most_active_group} with {group_counts.max()} incidents.")

# Create a DataFrame for the most active group in Iraq
active_group_in_country = iraq_data[iraq_data['Group'] == most_active_group]

# Create a scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(active_group_in_country['longitude'], active_group_in_country['latitude'], s=20, alpha=0.5)
plt.xlabel('Longitude')  # X-axis label
plt.ylabel('Latitude')   # Y-axis label
plt.title(f'Scatter Plot of Incidents for {most_active_group} in Iraq')
plt.grid(True)

plt.show()

print(f"The most active terrorist group is: {most_active_group}")
print(f"The country with the highest activity for this group is: Iraq")

##### 1. Why did you pick the specific chart?

I chose a scatter plot to visualize the geographic distribution of terrorist incidents attributed to the most active terrorist group in Iraq. Here's why I selected this specific chart:

Geographic Insights: A scatter plot is a suitable choice when you want to visualize the geographic distribution of data points, in this case, the locations of terrorist incidents. It helps us understand where these incidents are concentrated.

##### 2. What is/are the insight(s) found from the chart?

Insights from the chart:

Geographic Concentration: The scatter plot shows the locations (latitude and longitude) of terrorist incidents attributed to the most active terrorist group in Iraq. It allows us to see if these incidents are concentrated in specific regions or dispersed across the country.

Hotspots: By observing clusters or patterns in the plot, you can identify potential hotspots of terrorist activity. These insights can be valuable for security assessments and decision-making.
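One simple way to turn visual clusters into a ranked list of hotspots is to bin the coordinates into grid cells and count incidents per cell. The sketch below assumes a 0.5 degree grid and uses made-up coordinates; `toy_points` and `cell_size` are hypothetical names, and the approach is an illustration rather than the method used in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up incident coordinates
toy_points = pd.DataFrame({
    "latitude": [33.31, 33.34, 33.29, 36.34, 36.36, 30.51],
    "longitude": [44.36, 44.40, 44.42, 43.13, 43.12, 47.78],
})

# Snap every point to the south-west corner of its 0.5 x 0.5 degree cell
cell_size = 0.5
cells = np.floor(toy_points / cell_size) * cell_size

# Count incidents per cell; the largest counts are the candidate hotspots
hotspots = (
    cells.groupby(["latitude", "longitude"])
    .size()
    .sort_values(ascending=False)
)
print(hotspots)
```

A smaller `cell_size` gives finer hotspots but fewer incidents per cell, so the grid resolution is a judgment call.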

ISIL carried out the highest number of attacks in Iraq.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights might not necessarily lead to a positive business impact but can be crucial for security and risk management:

## Positive Impact:

Businesses operating in Iraq or nearby regions can use this information to assess security risks and implement appropriate security measures to protect their personnel and assets.

## Negative Growth:

The presence of a highly active terrorist group in a region can lead to negative growth due to increased security costs, potential disruptions to operations, and reputational risks.

It's crucial for businesses to be aware of such security challenges and have risk mitigation strategies in place to address potential negative impacts.

In summary, while the insights from the scatter plot may not directly result in positive business growth, they are essential for risk assessment and security planning, which are critical for maintaining stability and minimizing negative impacts in regions affected by terrorism.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Identify the most active terrorist groups in the world
# Exclude incidents whose group is recorded as 'Unknown'
known_groups=df1[df1['Group']!='Unknown']
# value_counts() counts the occurrences of each terrorist group; keep the top 10.
# rename_axis/reset_index(name=...) give the same column names on pandas 1.x and 2.x
most_active_group=known_groups['Group'].value_counts().head(10).rename_axis('Group').reset_index(name='values')
# Plot the most active groups
plt.figure(figsize=(12,6))
sns.barplot(x='Group',y='values',data=most_active_group)
plt.xlabel('Terrorist Group')
plt.ylabel('Number of Attack Attempts')
plt.xticks(rotation=90)
plt.title('Most Active Groups')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is a suitable choice for visualizing the number of attack attempts by different terrorist groups because it allows for easy comparison between groups. Each bar represents a terrorist group, and the height of the bar corresponds to the number of attack attempts, making it straightforward to identify which groups are the most active.

##### 2. What is/are the insight(s) found from the chart?

The chart provides insights into the most active terrorist groups in the world based on the number of attack attempts. By examining the chart, you can quickly identify the top 10 most active groups, as they are represented by the highest bars.

Taliban: With 7,478 recorded attack attempts, the Taliban is the most active terrorist group during the analyzed period.

Islamic State of Iraq and the Levant (ISIL): ISIL ranks second in activity, with 5,613 recorded attack attempts.

Shining Path (SL): The Shining Path group is among the top three most active groups, with 4,555 recorded attack attempts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

## Positive Business Impact:

For most businesses, this information may not directly lead to a positive business impact, but it could be valuable for broader geopolitical analysis and risk assessment.

## Negative Business Impact:

While the chart provides insights into the most active terrorist groups, it doesn't directly indicate negative growth.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Which groups have killed the most people in attacks?
# Exclude incidents whose group is 'Unknown', sum casualties per group,
# and keep the 10 groups with the highest number of people killed
df2=df1[df1['Group']!='Unknown'].groupby(['Group'])[['kill','Wound']].sum().sort_values(by='kill',ascending=False).reset_index().head(10)
plt.figure(figsize=(15,10))
sns.barplot(x='Group',y='kill',data=df2)
plt.xlabel('Group name')
plt.ylabel('Number of people killed')
plt.title('Groups Responsible for the Highest Number of Deaths')
plt.xticks(rotation=75)
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar chart to visualize the terrorist groups responsible for the highest number of fatalities in attacks. Here's why I selected this specific chart:

1. Comparison: A bar chart is effective for comparing the number of fatalities caused by different terrorist groups. It allows for a clear visual comparison between groups.

##### 2. What is/are the insight(s) found from the chart?

Insights from the chart:

ISIL is the deadliest terrorist group in the chart, having killed the highest number of people. The Taliban and Boko Haram rank second and third. If these groups issue threats, counter-terrorism agencies need to take them seriously.
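The ranking above depends on leaving out attacks that have no attributed group. A minimal sketch of that step, assuming unattributed incidents carry the label `'Unknown'` in the `Group` column (an assumption), filters by name instead of by row position. The helper `top_groups_by_fatalities` and the toy rows are hypothetical and are not GTD values.

```python
import pandas as pd

# Toy stand-in for df1; group names and numbers are invented for illustration.
toy = pd.DataFrame({
    "Group": ["Unknown", "Group A", "Group B", "Group A", "Unknown", "Group C"],
    "kill":  [50, 10, 7, 5, 20, 1],
    "Wound": [30, 4, 2, 6, 10, 0],
})

def top_groups_by_fatalities(df, n=10, unknown_label="Unknown"):
    """Rank groups by total fatalities, excluding the unattributed label by name."""
    # Assumption: unattributed attacks are labelled `unknown_label` in 'Group'.
    attributed = df[df["Group"] != unknown_label]
    totals = attributed.groupby("Group")[["kill", "Wound"]].sum()
    return totals.sort_values("kill", ascending=False).head(n).reset_index()

ranking = top_groups_by_fatalities(toy, n=2)
print(ranking)
```

Filtering by label keeps the result correct even if the unattributed rows do not happen to rank first, and it returns the full `n` groups rather than `n - 1`.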

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights might not necessarily lead to a positive business impact, but they are crucial for risk assessment and security planning:

## Positive Impact:
Businesses operating in regions affected by these high-fatality terrorist groups can use this information to better understand the security landscape and develop strategies to mitigate risks.

## Negative Growth:

The presence of such deadly terrorist groups can lead to negative growth due to increased security costs, potential disruptions to operations, and reputational risks.
Failure to address the threat posed by these groups could result in severe consequences, including harm to personnel and damage to assets.
In summary, the insights from the bar chart are essential for understanding the impact of terrorist groups and are critical for risk assessment and security planning. Businesses operating in regions where these groups are active should use this information to inform their security strategies and risk management efforts.



#### Chart - 8

In [None]:
# Chart - 8 visualization code
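# Identify the most commonly used weapon types in terrorist attacks
# autopct='%1.1f%%' prints the percentage on each slice of the pie chart
# Note: head() keeps the five most frequent types, so the percentages are shares of those five
df1.value_counts('Weapon').head().plot(kind='pie',figsize=(10,6),autopct='%1.1f%%')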

##### 1. Why did you pick the specific chart?

1. Clear Comparison: A pie chart provides a clear and concise way to compare the proportions of different categories (in this case, weapon types). It allows viewers to easily see the relative distribution of each category at a glance.

2. Percentage Representation: Pie charts represent data as percentages of the whole, making it straightforward to understand the proportion of each weapon type in relation to the total number of incidents.

##### 2. What is/are the insight(s) found from the chart?

Insights:

1. Dominance of Explosives: Explosives are the most commonly used weapon type, at around 51% of the incidents shown in the chart.

2. Significant Use of Firearms: Firearms are the second most frequently used weapon category, at around 32%, a notably smaller share than explosives.

3. Use of Unknown Weapons: The presence of an "Unknown" category highlights the challenge of identifying the specific weapon type in some incidents. This emphasizes the importance of improving data accuracy and intelligence in counter-terrorism efforts.

4. Incendiary Devices and Melee Weapons: Incendiary devices and melee weapons represent smaller but still significant portions of the chart. This indicates that these weapon types are used in a notable number of incidents.
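Because the pie chart is drawn from `.head()`, the percentages printed by `autopct` are shares of the five plotted categories rather than of every incident. The sketch below uses an invented toy series (not GTD data) to show how both kinds of share could be computed and how they differ. The names `share_of_all` and `share_of_top` are illustrative.

```python
import pandas as pd

# Toy stand-in for df1['Weapon']; categories and counts are invented.
weapons = pd.Series(
    ["Explosives"] * 5 + ["Firearms"] * 3 + ["Unknown"] * 2,
    name="Weapon",
)

counts = weapons.value_counts()

# Share of each weapon type among all incidents.
share_of_all = (counts / counts.sum() * 100).round(1)

# Share among only the top-k plotted categories (here k=2 for the toy data).
top = counts.head(2)
share_of_top = (top / top.sum() * 100).round(1)

print(share_of_all)
print(share_of_top)
```

If the categories outside the top five are rare, the two sets of percentages will be close. Checking both makes it clear which one a quoted figure refers to.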

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

## Positive Business Impacts:

1. Security Service Providers: Companies that offer security services, including those specializing in explosives detection, firearm security, and threat assessment, may see increased demand for their expertise and solutions.
2. Intelligence Gathering: Counter-terrorism departments must gather intelligence on terrorist organizations, their tactics, and their access to different weapon types. This information helps in understanding potential threats and planning counter-measures.
3. Explosives Detection: Invest in advanced explosives detection technology and equipment to identify and neutralize explosive threats. This includes bomb-sniffing dogs, X-ray scanners, and chemical detectors.
4. Incendiary Device Detection: Develop capabilities for detecting and neutralizing incendiary devices, particularly in public places where fires can cause significant damage.

## Negative Impact:
1. Disruption of Operations: Acts of terrorism involving weapons, such as bombings or shootings, can disrupt business operations. This disruption may result in physical damage to facilities, loss of productivity, and supply chain interruptions.
2. Tourism Downturn: Regions affected by violent conflicts or terrorism often experience a decline in tourism. This can negatively impact businesses in the tourism and hospitality sectors, including hotels, restaurants, and entertainment venues.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
# What is the most common target type,and percentage
df1.value_counts('Targettype').head().plot(kind='pie',figsize=(10,8),autopct='%1.1f%%')

##### 1. Why did you pick the specific chart?

I chose a pie chart to visualize the distribution of the most common target types because it's a suitable choice for representing the composition of a categorical variable (in this case, different target types) and showing the percentage distribution of each category.

##### 2. What is/are the insight(s) found from the chart?

Insights:

1. Private Citizens & Property Predominance: Attacks on private citizens and property make up the largest share of the chart, at around 31%. This suggests that terrorists often target civilians and civilian infrastructure.

2. Military and Police Attacks: Military and police targets also account for a substantial portion of terrorist attacks. This highlights the vulnerability of security forces and their involvement in counter-terrorism efforts.

3. Government and Business Targets: Government (General) and Business targets follow closely in terms of attack frequency.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can help inform decision-making and risk assessment:

## Positive Impact:
1. Businesses, government agencies, and organizations can use this information to assess the risk to specific target types and allocate resources for security and risk management accordingly.

2. For businesses operating in regions with a high incidence of attacks on specific target types, these insights can inform security measures and contingency plans.

## Negative Growth:

1. Depending on the nature of the business, attacks on certain target types may pose higher risks. For example, if a business operates in a sector frequently targeted by terrorists, it may experience negative growth due to security concerns, increased costs, and potential disruptions.

In summary, the insights gained from the pie chart can be valuable for assessing the prevalence of different target types in terrorist attacks. These insights can be used to inform security and risk management strategies, potentially mitigating negative impacts on businesses and organizations operating in regions with specific target type vulnerabilities.



#### Chart - 10

In [None]:
# Chart - 10 visualization
# Number of attacks per year
num_attack_df1 = df1.groupby('Year').size()
num_attack_df1.name = "number of attacks"
# Total number of people killed and wounded per year
terroristtrends = df1.groupby('Year').agg({'kill':'sum','Wound':'sum'})
terroristtrends = pd.concat([terroristtrends,num_attack_df1],axis=1)
# New column 'victims': people killed plus people wounded per year
terroristtrends['victims']=terroristtrends['kill']+terroristtrends['Wound']
# Plot victims per year, then overlay the number of attacks on the same axes
fig = px.line(terroristtrends,x=terroristtrends.index, y='victims', title='Terrorist attacks trends',template='plotly_dark')
fig.data[0].name="number of victims"
fig.update_traces(showlegend=True)
fig.add_scatter(x = terroristtrends.index, y = terroristtrends['number of attacks'], mode ='lines',name='number of attacks')


fig.update_layout(xaxis_title='Year',yaxis_title='Terrorism Trends')
fig.show()



##### 1. Why did you pick the specific chart?

I chose a line chart because it shows how the number of terrorist attacks and the number of victims (combined fatalities and wounded individuals) change over time, from 1970 to 2017, and it lets the two trends be compared on the same axes.

##### 2. What is/are the insight(s) found from the chart?

Insights

1. Overall Increase in Attacks: The chart reveals that the total number of terrorist attacks has generally increased over the years, particularly from the mid-2000s onwards. This suggests a rising trend in global terrorism incidents.

2. Fluctuations in Victims: The number of victims (killed plus wounded) fluctuates between 1970 and 2017. Most victims of terrorism fall between 2012 and 2017, and 2014 experienced the most lethal attacks.

3. Correlation between Attacks and Victims: There appears to be a correlation between the number of attacks and the number of victims. When the number of attacks increases, the number of victims also tends to rise.
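The apparent correlation between attacks and victims can be quantified rather than read off the chart. A minimal sketch, assuming a yearly table shaped like `terroristtrends` with the columns `number of attacks` and `victims`; the values below are invented for illustration and are not GTD figures.

```python
import pandas as pd

# Toy yearly table shaped like terroristtrends; the numbers are invented.
trends = pd.DataFrame(
    {"number of attacks": [10, 20, 30, 40], "victims": [15, 45, 50, 90]},
    index=pd.Index([2001, 2002, 2003, 2004], name="Year"),
)

# Pearson correlation coefficient between the two yearly series.
r = trends["number of attacks"].corr(trends["victims"])
print(f"Pearson r = {r:.2f}")
```

A value close to 1 would support the visual impression. Note that two series which both grow over time can correlate strongly for that reason alone, so the coefficient describes association, not cause.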

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can be valuable for various purposes, including security, risk assessment, and policy development:

## Positive Impact:
1. Businesses and organizations operating in regions or industries prone to terrorism can use this information to assess the security risks they face and develop strategies to mitigate these risks.

2. Government agencies and security organizations can use this data to allocate resources effectively and enhance counterterrorism efforts.

## Negative Growth:

1. The positive correlation between the number of attacks and the number of victims implies that an increase in terrorist attacks often leads to a higher number of casualties. This could have negative implications for businesses and regions experiencing such spikes.

2. Negative growth may occur if a business operates in an area with a significant increase in terrorist attacks, as it may lead to higher security costs, operational disruptions, and potential harm to personnel and assets.

In summary, the insights gained from the chart provide a historical perspective on terrorism trends. While they are crucial for understanding the security landscape, they also highlight the potential risks and challenges that businesses and organizations may face in regions affected by terrorism.

#### Chart - 11 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(15,10))
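# Shows how strongly each numeric column is related to the others in the dataset
# Only numeric columns are passed to corr(), since correlation is not defined for text columns
sns.heatmap(df1.select_dtypes(include='number').corr().round(2),annot=True, cmap='BuPu')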

##### 1. Why did you pick the specific chart?

1. Correlation Exploration: A correlation heatmap is specifically designed to display the correlation coefficients between pairs of variables in a dataset.
2. Numeric Data: Heatmaps are most suitable for datasets with many numeric attributes, such as the Global Terrorism Database (GTD).
3. Annotated Values: The annotated value in each cell of the heatmap gives the precise correlation coefficient.

##### 2. What is/are the insight(s) found from the chart?

Insights:

1. Geographical Trends: The "latitude" and "longitude" variables show negative correlations with other variables, indicating that specific geographic coordinates may not be strongly correlated with attack-related factors.
2. Fatalities and Injuries: The "kill" variable is positively correlated with the "Wound" (number of injuries) variable, indicating that as the number of people killed increases, the number of injured people also tends to increase.
3. Terrorists Killed and Fatalities: The "kill" variable is positively correlated with the "nkillter" (terrorists killed) column, which shows that as the number of fatalities increases, the number of terrorists killed also tends to increase.
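The relationships read off the heatmap can also be listed as a ranked table of column pairs, which is easier to scan when there are many columns. This is a sketch on an invented toy frame (not GTD data), using the same column names as the notebook. The list `pairs` is an illustrative name.

```python
from itertools import combinations

import pandas as pd

# Toy stand-in for df1; values are invented for illustration.
toy = pd.DataFrame({
    "kill": [0, 1, 3, 10, 2],
    "Wound": [1, 2, 5, 30, 3],
    "nkillter": [0, 0, 1, 2, 0],
    "Group": ["A", "B", "A", "C", "B"],  # text column, excluded below
})

# Correlation matrix over numeric columns only.
corr = toy.select_dtypes(include="number").corr()

# One entry per unordered column pair, sorted by absolute correlation.
pairs = sorted(
    ((a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)),
    key=lambda item: abs(item[2]),
    reverse=True,
)

for a, b, value in pairs:
    print(f"{a} vs {b}: {value:.2f}")
```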

#### Chart - 12 - Pair Plot

In [None]:
# Pair Plot visualization code
# Select the columns of interest for the pair plot
columns_of_interest = [ 'kill','Wound','nkillter',]
#Create a subset DataFrame with the selected columns
subset_df = df1[columns_of_interest]

# Create a pair plot using seaborn
sns.pairplot(subset_df)
plt.show()


##### 1. Why did you pick the specific chart?

1. Relationship Exploration: Pair plots are primarily used to gain insights into relationships between variables. They provide a visual representation of pairwise interactions, making it easier to identify patterns and correlations.
2. Multivariate Analysis: Pair plots are particularly useful when dealing with multiple variables (columns) in a dataset. They allow us to visualize how each variable relates to every other variable, providing a comprehensive view of data relationships.

##### 2. What is/are the insight(s) found from the chart?

Insights:

1. nkillter and kill: There is an upward diagonal trend between these columns, which means the nkillter (terrorists killed) and kill columns are positively correlated.
2. Wound and kill: There is a slight upward diagonal trend between these columns, which means the Wound (injured) and kill columns are positively correlated.







## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Geospatial Intelligence: Terrorism increased significantly in the Middle East and North Africa and South Asia regions from 2014 to 2017.

Solution: Invest in geospatial intelligence to monitor and analyze hotspots of terrorist activity in these regions. This will enable proactive responses and resource allocation.
2. Targeted Threat Assessment: Private Citizens & Property are the most frequently targeted.

Solution: Prioritize threat assessments for civilian areas, critical infrastructure, and public spaces to enhance security measures and protect civilians.
3. Focus on Active Groups: Identify and monitor the most active terrorist groups, such as Taliban, ISIL, and Boko Haram.

Solution: Allocate intelligence resources to track and disrupt the operations of these groups, dismantling their networks and preventing attacks.
4. Weapon and Attack Type Analysis: Explosives and firearms are the most commonly used weapons in attacks.

Solution: Enhance border security and implement stricter controls on firearms and explosives to reduce the availability of these weapons to terrorists.
5. Cross-Agency Collaboration:

Solution: Encourage collaboration and information sharing between national and international counter-terrorism agencies, promoting a coordinated response to global threats.
6. Public Awareness:

Solution: Educate the public about recognizing signs of radicalization and reporting suspicious activities to law enforcement.

# **Conclusion**

1). Terrorist activity has increased very rapidly since 2010.

2). The Middle East and North Africa region is the most affected by terrorist activity.

3). Iraq, which the GTD places in the Middle East and North Africa region, has suffered heavily.

4). The most frequent entry for the responsible group is "Unknown", meaning no one claimed responsibility for those attacks.

5). ISIL has killed the most people in the world.

6). The most common attack method was bombing/explosion.

7). Most terrorist activity targeted private citizens and property, to cause the greatest damage.

8). Terrorist activity was highest in 2014, and casualties were also highest that year.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***