# Global Terrorism

### Goal: as a security/defence analyst, identify the hot zones of terrorism.

### 1. Importing and cleaning data

#### 1.1 Importing libraries

In [None]:
# Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Importing data
data = pd.read_csv('globalterrorismdb_0718dist.csv')

In [None]:
# Setting display options to show all rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [None]:
data.head()

In [None]:
# Checking shape of the data
data.shape

###### - We can see the data has 181691 rows and 135 columns.

#### 1.2 Cleaning the data.

In [None]:
# Checking for null values
data.isnull().sum()

In [None]:
# Finding the percentage of null values and the number of unique values in each column
def missing_values(data):
    data_null= pd.DataFrame()
    for col in data.columns:
        data_null.loc[col, 'Unique']=len(data[col].unique())
        data_null.loc[col, '% of null values']=round(data[col].isnull().sum()*100/len(data),2)
        data_null.loc[col, 'dtype']=data[col].dtype
        
    return data_null.sort_values(ascending= False, by= '% of null values')

In [None]:
missing_values(data)

###### - It can be observed that most of the variables have more than 80% missing values. These variables will be dropped because they do not help the analysis.
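
The same per-column summary can also be computed without an explicit loop. This is a minimal sketch on a small made-up frame; `summarize_missing` and `toy` are illustrative names, not part of the notebook above.

```python
import numpy as np
import pandas as pd

def summarize_missing(df):
    """Return unique counts, null percentages and dtypes for every column."""
    summary = pd.DataFrame({
        # dropna=False counts NaN as a value, like len(df[col].unique())
        'Unique': df.nunique(dropna=False),
        '% of null values': (df.isnull().mean() * 100).round(2),
        'dtype': df.dtypes.astype(str),
    })
    return summary.sort_values(by='% of null values', ascending=False)

# Hypothetical toy data, not taken from the GTD file
toy = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [np.nan, np.nan, np.nan, 1.0],
    'c': ['x', None, 'y', 'y'],
})
missing_summary = summarize_missing(toy)
print(missing_summary)
```

`isnull().mean()` gives the fraction of missing values directly, so no division by `len(df)` is needed.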

In [None]:
# Renaming columns for better understanding and analysis
data.rename(columns={'iyear':'Year', 'imonth':'Month','iday':'day','provstate':'Province','attacktype1':'Attack_type',
                     'attacktype1_txt':'Attacktype_text','targtype1':'Target_type','targtype1_txt':'Targetype_text',
                     'targsubtype1':'Target_subtype','targsubtype1_txt':'Targetsubtype_text','corp1':'Corporation',
                     'target1':'Target','natlty1_txt':'Nationality','gname':'Gang','weaptype1_txt':'Weapon_used',
                     'weapsubtype1_txt':'Weapon_subtype','nkill':'Killed','nwound':'Wounded','ishostkid':'Hostages/Kidnappings'
                     },inplace= True)

In [None]:
data.describe()

In [None]:
# Looking for unique values in various columns.
for features in data:
    print(features ,':', len(data[features].unique()))

In [None]:
# Keeping only the columns with less than 20% null values (columns with 20% or more nulls are dropped).
data= data.loc[:,data.isnull().sum()/len(data)<0.20]

In [None]:
data.head()

In [None]:
data.shape

###### - Almost 65% of the columns are dropped, because the filter above keeps only the columns with less than 20% null values.
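
The cut-off passed to the filter decides how aggressive the pruning is. The sketch below uses a made-up frame to compare a 20% cut-off with a looser 80% one; `keep_dense_columns` and `toy` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

def keep_dense_columns(df, max_null_fraction):
    """Keep columns whose fraction of nulls is strictly below the threshold."""
    null_fraction = df.isnull().mean()
    return df.loc[:, null_fraction < max_null_fraction]

# Hypothetical toy data
toy = pd.DataFrame({
    'full': [1, 2, 3, 4, 5],                               # 0% null
    'some_gaps': [1, np.nan, np.nan, 4, 5],                # 40% null
    'mostly_empty': [np.nan, np.nan, np.nan, np.nan, 5],   # 80% null
})

strict = keep_dense_columns(toy, 0.20)   # same cut-off as data.isnull().sum()/len(data) < 0.20
lenient = keep_dense_columns(toy, 0.80)  # a looser alternative
print(list(strict.columns))
print(list(lenient.columns))
```

With the strict cut-off only `full` survives, while the lenient one also keeps `some_gaps`.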

In [None]:
# Dropping these columns as they won't be used in the analysis.
data=data.drop(columns=['eventid','dbsource','specificity','guncertain1','INT_LOG','INT_IDEO','INT_MISC','INT_ANY'])

In [None]:
data.head()

In [None]:
# Converting date columns into date time format
data['Date'] = pd.to_datetime(data[['Year','Month','day']], errors = 'coerce')
data.head()
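
`errors = 'coerce'` matters here because, as far as I understand the GTD codebook, an unknown month or day is stored as 0. A small sketch on made-up rows shows how such values turn into `NaT` instead of raising an error; the `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical rows: one valid date, one unknown month (0), one impossible date
toy = pd.DataFrame({
    'Year':  [2014, 2014, 2015],
    'Month': [5, 0, 2],
    'day':   [17, 12, 30],
})

# Invalid combinations become NaT rather than stopping the conversion
toy['Date'] = pd.to_datetime(toy[['Year', 'Month', 'day']], errors='coerce')
print(toy)
```

Rows with `NaT` can then be removed with `dropna(subset=['Date'])`.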

In [None]:
# Deriving more specific date-related columns, as they will be used further on.
data['week'] = data['Date'].dt.isocalendar().week
data['day_name'] = data['Date'].dt.day_name()
data['weekday'] = data['Date'].dt.weekday
data['is_weekend'] = np.where(((data['Date']).dt.dayofweek) < 5,0,1)
data.dropna(subset = ['Date','latitude','longitude'], inplace = True)
data.reset_index(inplace = True, drop = True)
data['casualities'] = data['Killed']+data['Wounded']
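
Note that `Killed + Wounded` is missing whenever either value is missing. The sketch below, on made-up numbers, compares the plain sum with a row-wise sum that treats a single missing value as zero; which behaviour is right depends on the analysis, so this is only an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in both columns
toy = pd.DataFrame({
    'Killed':  [2.0, np.nan, 0.0, np.nan],
    'Wounded': [3.0, 4.0, np.nan, np.nan],
})

# Plain addition: NaN if either side is NaN
plain_sum = toy['Killed'] + toy['Wounded']

# Row-wise sum: NaN only when both values are missing (min_count=1)
lenient_sum = toy[['Killed', 'Wounded']].sum(axis=1, min_count=1)

print(pd.DataFrame({'plain': plain_sum, 'lenient': lenient_sum}))
```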

In [None]:
data.head()

In [None]:
# checking for null values again.
data.isnull().sum()

In [None]:
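# Imputing the missing values

def Nan_imputation(df, feature):
    dtype = df[feature].dtype
    # Numeric columns (bool excluded): fill with the median and replace the original column
    if pd.api.types.is_numeric_dtype(dtype) and not pd.api.types.is_bool_dtype(dtype):
        df[feature + 'med'] = df[feature].fillna(df[feature].median())
        df.drop(feature, axis=1, inplace=True)

    # Text, categorical or boolean columns: fill with the most frequent value
    elif dtype.name in ('object', 'category', 'bool'):
        df[feature + 'mod'] = df[feature].fillna(df[feature].mode()[0])
        df.drop(feature, axis=1, inplace=True)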
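
As a usage illustration, median and mode imputation can be tried on a small made-up frame. The helper below is a hypothetical, non-mutating variant (`impute_column` is not the notebook's function) that returns a filled copy and keeps the original column name.

```python
import numpy as np
import pandas as pd

def impute_column(df, feature):
    """Return a copy of df with NaNs in `feature` filled (median or mode)."""
    out = df.copy()
    if pd.api.types.is_numeric_dtype(out[feature]):
        out[feature] = out[feature].fillna(out[feature].median())
    else:
        out[feature] = out[feature].fillna(out[feature].mode()[0])
    return out

# Hypothetical toy data
toy = pd.DataFrame({
    'Killed': [1.0, np.nan, 3.0, 10.0],
    'Weapon_used': ['Explosives', None, 'Firearms', 'Explosives'],
})
filled = impute_column(impute_column(toy, 'Killed'), 'Weapon_used')
print(filled)
```

The median (3.0) is less sensitive to a few very deadly attacks than the mean would be, which is one reason to prefer it for skewed counts.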

In [None]:
# checking the dtypes again
data.dtypes

### 2. Analysing and visualizing the data

In [None]:
print('Country with the highest number of terrorist attacks:', data['country_txt'].value_counts().index[0])
print('Region with the highest number of terrorist attacks:', data['region_txt'].value_counts().index[0])
print('Year with the highest number of terrorist attacks:', data['Year'].value_counts().index[0])
print('Maximum number of deaths in a single terrorist attack:', data['Killed'].max())

In [None]:
attacks = data.country_txt.value_counts()[:25].to_frame() # Top 25 countries by number of attacks.
attacks.columns = ['Attacks']
kills = data.groupby(['country_txt'])['Killed'].sum().sort_values(ascending= False).to_frame()
attacks.merge(kills, how='left', left_index=True, right_index=True).plot.bar(color= sns.color_palette('copper',2))
fig=plt.gcf()
fig.set_size_inches(18,6)
plt.ylabel('Count')
plt.xlabel('Country')
plt.show()

#### Conclusion-
- Iraq has seen the most terrorist attacks and also the most deaths.
- Afghanistan has seen the second-highest number of deaths.
- These counts suggest that attacks concentrate in densely populated areas, although the chart does not measure population density directly.
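
Attack counts and deaths can also be computed in a single grouped aggregation, which makes it easy to add a deaths-per-attack ratio. The sketch below uses made-up numbers; `toy` and `Killed_per_attack` are illustrative names only.

```python
import pandas as pd

# Hypothetical toy data, not real GTD figures
toy = pd.DataFrame({
    'country_txt': ['Iraq', 'Iraq', 'Iraq', 'India', 'India', 'Peru'],
    'Killed':      [10, 5, 0, 2, 1, 4],
})

by_country = (
    toy.groupby('country_txt')
       .agg(Attacks=('Killed', 'size'), Killed=('Killed', 'sum'))
       .sort_values('Attacks', ascending=False)
)
# Average number of deaths per recorded attack
by_country['Killed_per_attack'] = by_country['Killed'] / by_country['Attacks']
print(by_country)
```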

In [None]:
plt.subplots(figsize=(15,6))
sns.countplot(x='Year', data=data, palette = 'afmhot_r', edgecolor= sns.color_palette('dark',5))
plt.title('Number of Terrorist activities every year')
plt.xticks(rotation=90)
plt.show()

#### Conclusion-
- Terrorist activity rose sharply after 2004.
- The highest number of terrorist attacks was recorded in 2014.
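
One way to put a number on how fast activity grows is the year-over-year percentage change of the yearly counts. This is a sketch on a made-up series of years (`toy_years` is hypothetical); the same two lines could be applied to the `Year` column.

```python
import pandas as pd

# Hypothetical toy data: one entry per attack, holding the year it happened
toy_years = pd.Series([2003, 2003, 2004, 2004, 2004,
                       2005, 2005, 2005, 2005, 2005, 2005])

# Attacks per year in chronological order
attacks_per_year = toy_years.value_counts().sort_index()

# Percentage change relative to the previous year (first year has no reference)
growth = attacks_per_year.pct_change() * 100
print(pd.DataFrame({'attacks': attacks_per_year, 'growth_%': growth}))
```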

In [None]:
terror_regions = pd.crosstab(data.Year, data.region_txt)
terror_regions.plot()
fig=plt.gcf()
fig.set_size_inches(25,10)
plt.legend(loc="upper left")
plt.show()

#### Conclusion-
- The Middle East region has seen the most terrorist attacks so far in the 21st century.
- South Asia is not far behind and is the second most affected region on the world map.
- The data does not identify causes, but possible reasons behind these attacks are oil in the Middle East and poverty in South Asia.

In [None]:
plt.subplots(figsize=(15,6))
sns.countplot(x='Month', data=data, palette= 'Paired',edgecolor='Black')
plt.title('Number of terrorist activities each month')
plt.show()

#### Conclusion-
- The world has seen the most terror acts in May.
- The number of attacks does not differ much from month to month.

In [None]:
plt.subplots(figsize=(15,6))
sns.countplot(x='day', data=data,palette = 'Paired', edgecolor= 'Black')
plt.title("Number of terrorist attacks for each day")
plt.show()

#### Conclusion-
- Terrorists take no holidays or breaks: attacks are spread across every day of the month.
- Every day of the month has at least 5500 recorded attacks except the 31st, because only seven months have a 31st day.
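
To compare days of the month fairly, the raw counts can be divided by how often each day actually occurs in the calendar. The sketch below counts those occurrences with the standard `calendar` module; `day_of_month_occurrences` is a hypothetical helper and the year range is an assumption to be replaced by the range of the data.

```python
import calendar

def day_of_month_occurrences(start_year, end_year):
    """Count how many times each day-of-month (1..31) occurs in the year range."""
    counts = {day: 0 for day in range(1, 32)}
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            days_in_month = calendar.monthrange(year, month)[1]
            for day in range(1, days_in_month + 1):
                counts[day] += 1
    return counts

# One non-leap year as an example
occurrences = day_of_month_occurrences(2001, 2001)
print(occurrences[1], occurrences[29], occurrences[30], occurrences[31])
```

Dividing the attack count of each day by its number of occurrences gives attacks per calendar occurrence, which removes the built-in disadvantage of the 29th, 30th and 31st.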

In [None]:
plt.subplots(figsize=(15,6))
sns.countplot(x='region_txt', data= data, palette= "copper_r", order= data['region_txt'].value_counts().index)
plt.xticks(rotation=90)
plt.xlabel('Regions')
plt.title('Number of terrorist attacks by region')
plt.show()

#### Conclusion- 
- The Middle East and South Asia have seen the most terrorist activity, possibly because of oil and poverty respectively.
- Many of the countries in these regions are also politically unstable.
- Some of these countries have even seen military coups.

In [None]:
plt.subplots(figsize=(15,6))
sns.countplot(x='Targetype_text', data=data,palette= sns.color_palette('copper_r'),edgecolor = 'Black', order = data["Targetype_text"].value_counts().index)
plt.xticks(rotation=90)
plt.title('Terrorist targets')
plt.xlabel('Target type')
plt.show()

#### Conclusion- 
- Terrorists mostly target private citizens.
- After private citizens, military personnel are targeted the most, possibly because of cross-border activity.

In [None]:

data['success']=data['success'].replace({1:'Successful Operation', 0:'Unsuccessful Operation'})
data['success'].value_counts().plot(kind='pie', autopct="%1.1f%%", cmap="Pastel2")
plt.title('Successful vs Unsuccessful Operations')
plt.show()

#### Conclusion-
- Almost 90% of the recorded operations were successful, which suggests failures and negligence in the domestic intelligence and security services of most countries.
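
The overall share hides differences between attack types. A sketch of a per-type success rate on made-up rows follows; it assumes `success` still holds the original 0/1 codes, so on the real data it would have to run before the values are relabelled as text.

```python
import pandas as pd

# Hypothetical toy data with success coded as 1 (successful) or 0 (unsuccessful)
toy = pd.DataFrame({
    'Attacktype_text': ['Bombing/Explosion', 'Bombing/Explosion',
                        'Assassination', 'Assassination', 'Assassination',
                        'Armed Assault'],
    'success': [1, 1, 1, 0, 0, 1],
})

# The mean of a 0/1 column is the share of successes
success_rate = (
    toy.groupby('Attacktype_text')['success']
       .mean()
       .mul(100)
       .round(1)
       .sort_values()
)
print(success_rate)
```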

In [None]:
pd.crosstab(data.Weapon_used, data.Attacktype_text).plot.barh(stacked= True, width=1,color=sns.color_palette('RdYlBu_r',10))
fig= plt.gcf()
plt.ylabel('Weapon Type')
plt.xlabel('Number of attacks')
fig.set_size_inches(12,8)

#### Conclusion-
- Explosives are the weapons terrorists use most often.

In [None]:
pd.crosstab(data.region_txt, data.Attacktype_text).plot.barh(stacked=True, width=1, color=sns.color_palette('RdYlBu_r',8))
fig=plt.gcf()
fig.set_size_inches(12,8)
plt.ylabel('Region')
plt.xlabel('Number of attacks')
plt.title('Most affected regions by terrorism')
plt.show()

#### Conclusion- 
- As seen earlier, the Middle East is the region with the most attacks, and most of them involve explosives.
- The same holds beyond the Middle East: in every other region, bombings and explosions also account for the largest share of terror activity.
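
Because regions differ a lot in total attacks, row-normalised shares can make the comparison between regions easier than raw counts. A minimal sketch on made-up rows, using the `normalize='index'` option of `pd.crosstab`:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'region_txt': ['South Asia', 'South Asia', 'South Asia', 'South Asia',
                   'Western Europe', 'Western Europe'],
    'Attacktype_text': ['Bombing/Explosion', 'Bombing/Explosion',
                        'Bombing/Explosion', 'Armed Assault',
                        'Bombing/Explosion', 'Assassination'],
})

# Each row sums to 1: the share of every attack type within a region
shares = pd.crosstab(toy['region_txt'], toy['Attacktype_text'], normalize='index')
print(shares)
```

`shares.plot.barh(stacked=True)` would then draw bars of equal length, so the mix of attack types can be read directly.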

In [None]:
# Plotting geo-coordinates country-wise using Plotly Mapbox
# Countries with no recorded attacks do not appear in this data
import random
import plotly.graph_objects as go
mapbox_access_token = "pk.eyJ1IjoibWF0c3VqanUiLCJhIjoiY2tmcXFiczFiMGRpdzMybzBxZmxtaTVxbiJ9.0zdao0fZdKyGb7CO8dPAVg"
def geo_coordinate(data , country = None , color =None):
    if country is not None:
        data = data[data['country_txt'] == country]
        random.seed(210)
        zoom = 5
    else:
        data = data
        country = 'Whole World'
        zoom = 2
    
    fig = go.Figure()
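    # Stacking columns along the last axis; the order must match the customdata[i] indices in the hover template
    new_customdatadf = np.stack(
    (
        data["casualities"],      # customdata[0]
        data["city"],             # customdata[1]
        data["Year"],             # customdata[2]
        data["Province"],         # customdata[3]
        data['country_txt'],      # customdata[4]
        data['Gang'],             # customdata[5]
        data['Attacktype_text']   # customdata[6], text label instead of the numeric code
    ),
    axis=-1,
    )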
    fig.add_traces(
        go.Scattermapbox(
            lon=data["longitude"],
            lat=data["latitude"],
            mode="markers",
            marker=dict(size=10, allowoverlap=False, opacity=0.7, color=color),
            # text=df_sub["casualities_median"],
            customdata=new_customdatadf,  # we have to first stack the columns along the last axis
            hovertemplate="""<extra></extra>lat: %{lat}<br>long: %{lon}<br>casualities: %{customdata[0]}
            <br>city: %{customdata[1]}<br>State: %{customdata[3]}<br>Country: %{customdata[4]}<br>Group 
            responsible: %{customdata[5]}<br>Attack_type: %{customdata[6]}<br>attack happened in:
            %{customdata[2]}""",
        ),
    )

    fig.update_layout(
        title=dict(
            text=f"<b>Satellite Overview of {country}</b>",
            font=dict(family="Cabin Sketch", size=20, color="black",),
            xanchor="left",
            xref="container",
        ),
        uirevision="foo",
        hovermode="closest",
        hoverdistance=2,
        mapbox=dict(
            accesstoken=mapbox_access_token,
            style="dark",
            center=dict(
                lat=random.choice(data["latitude"].tolist()),
                lon=random.choice(data["longitude"].tolist()),
            ),
            zoom=zoom,
        ),
        annotations = [dict(showarrow=False,
        text='(Zoom In/Zoom Out to see all points properly)',
        xanchor='right',
        x=1,
        yanchor='top',
        y=1.1
                )]
    )
    return fig
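
Because the hover template addresses the stacked columns by position, the template indices and the stacked array can easily drift apart. A small sanity check could look like the sketch below; `check_customdata_indices` is a hypothetical helper, not a Plotly function, and the template is a shortened example.

```python
import re

def check_customdata_indices(hovertemplate, n_columns):
    """Return the customdata indices used in the template and those out of range."""
    used = sorted({int(i) for i in re.findall(r"customdata\[(\d+)\]", hovertemplate)})
    out_of_range = [i for i in used if i >= n_columns]
    return used, out_of_range

# Shortened example template that refers to index 6
template = ("casualities: %{customdata[0]}<br>city: %{customdata[1]}"
            "<br>Attack_type: %{customdata[6]}")

# With only 6 stacked columns (indices 0..5), index 6 cannot be resolved
used, out_of_range = check_customdata_indices(template, n_columns=6)
print(used, out_of_range)
```

An empty `out_of_range` list means every placeholder in the template has a matching stacked column.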

In [None]:
geo_coordinate(data , color = 'Crimson' , country = 'Iraq')


#### Iraq has seen the most terror activity, so the map above shows how its attacks are distributed.
#### Conclusion-
- As the map above shows, most of the terror activity is concentrated around the capital, Baghdad. A plausible reason is that unrest in the capital unsettles the whole country.
- After Baghdad, much of the terror activity has taken place in the neighbouring provinces, which are densely populated.

In [None]:
geo_coordinate(data , color = 'Crimson' , country = 'Pakistan')


#### Conclusion-
- The main group responsible for attacks in Pakistan is "Tehrik-i-Taliban Pakistan" (the Taliban group for this region), although the "Baloch Republican Army" has been increasing its attacks more recently.
- Karachi appears to be the most frequent target, which is consistent with it being a densely populated city of Pakistan.

In [None]:
geo_coordinate(data , color = 'Crimson' , country = 'India')


#### Conclusion-
- India has also seen a fair amount of terrorism despite being a peace-loving country.
- Most of the activity has happened near the borders with Pakistan and Bangladesh.
- There are also many red dots along the coast, which suggests that terrorists have entered the country by sea as well.

In [None]:
geo_coordinate(data , color = 'Crimson'  )


#### Conclusion-
- Almost every country has seen a fair amount of terror, but the Middle East and South Asia are the main hot spots.
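
Hot zones can also be ranked numerically instead of being read off the map. One simple approach, sketched here on made-up coordinates, is to bin latitude and longitude into a grid and count attacks per cell; the one-degree `cell_size` is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical toy coordinates
toy = pd.DataFrame({
    'latitude':  [33.31, 33.34, 33.90, 24.86, 24.91, 28.61],
    'longitude': [44.36, 44.40, 44.10, 67.01, 67.08, 77.21],
})

cell_size = 1.0  # grid resolution in degrees (assumption)

# Snap every point to the lower-left corner of its grid cell
grid = toy.assign(
    lat_cell=(toy['latitude'] // cell_size) * cell_size,
    lon_cell=(toy['longitude'] // cell_size) * cell_size,
)

# Number of attacks per cell, busiest cells first
hot_cells = grid.groupby(['lat_cell', 'lon_cell']).size().sort_values(ascending=False)
print(hot_cells)
```

A smaller `cell_size` gives finer hot spots but fewer attacks per cell, so the resolution is a trade-off.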


### Summary

#### The word terrorism is not new to any of us, but we should keep in mind that terrorists and their groups have their own agendas and are formed to fulfil them.
#### For example, when the Soviet Union invaded Afghanistan, the US did not want direct involvement in the fight but also did not want to remain a spectator, so it began arming and training local Afghan fighters to resist the Soviet troops. Groups such as the Taliban later emerged from this conflict, and when the Americans set foot in Afghanistan after the Soviets had left, the Taliban gave them a hard time as well.
#### Therefore, we can say that most terrorist activity is a result of global politics and differences in ideology.
