## After exploring the terrorism data in EDA.ipynb, we now analyze it and present the results

### Import libraries

In [2]:
import os
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Main Objective

Analyze the data and draw conclusions on the distribution and nature of terrorist incidents recorded around the world. In your analysis, include maps that visualize the location of different incidents.

# Questions & Answers:

- How has the number of terrorist activities changed over the years? Are there certain regions where this trend is different from the global averages?
- Is the number of incidents and the number of casualties correlated? Can you spot any irregularities or outliers?
- What are the most common methods of attacks? Does it differ in various regions or in time?
- Plot the locations of attacks on a map to visualize their regional spread.

# Warning
    This file contains information analyzed for educational purposes only; its conclusions are not official.

# Q1 
## (How has the number of terrorist activities changed over the years? Are there certain regions where this trend is different from the global averages?)

## Initialize clean dataset

In [3]:
base_path = "../datasets/datasets_processed/datasets_for_analysis" # dataset path
dfs = [] # dataframe list
for root, dirs, files in os.walk(base_path): # append dataframes in dfs list
    for file in files:
        if file.endswith('.csv'):
            file_path = os.path.join(root, file)
            df = pd.read_csv(file_path, index_col=0)
            dfs.append(df)
df_total = pd.concat(dfs, ignore_index=True) # concat dataframes

## Create a visualization of how terrorist activity changed over the years

In [4]:

country_stats = df_total.groupby(['country_txt', 'iyear']).agg(
    num_attacks=('eventid', 'count'),
    latitude=('latitude', 'mean'),
    longitude=('longitude', 'mean')
).reset_index()

country_stats['iyear'] = country_stats['iyear'].astype(int)
country_stats = country_stats.sort_values('iyear')

fig = px.scatter_mapbox(country_stats,
                        lat="latitude",
                        lon="longitude",
                        size="num_attacks",
                        color="num_attacks",
                        hover_name="country_txt",
                        hover_data={"num_attacks": True, "iyear": True, "latitude": False, "longitude": False},
                        zoom=1,
                        height=800,
                        title="Number of attacks per country",
                        color_continuous_scale=px.colors.sequential.Plasma,
                        size_max=40,
                        animation_frame='iyear',
                        animation_group='country_txt')

fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":40,"l":0,"b":0})
fig.update_layout(coloraxis_colorbar=dict(title="N. attacks"))
fig.update_traces(marker=dict(opacity=0.7))

fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 800
fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 600
fig.layout.sliders[0].currentvalue.prefix = "Year: "

fig.show()

## Periods

**Between 1970 and 1980:**
- Europe stands out in this decade: the majority of attacks took place on this continent.
- In the late 1970s, terrorist attacks increased in Colombia and Central America.

**In the 80s:**
- Central America, Colombia, Peru and Chile saw the most terrorist attacks of the decade, followed by India, Sri Lanka, the Philippines and South Africa.

**In the 90s:**
- At the beginning of the decade, Colombia and Peru were the countries with the most terrorist attacks.
- In the middle of the decade, Pakistan, India and Sri Lanka had the most terrorist attacks, followed by Colombia, Algeria and Europe.
- the late 1990s saw the fewest cases of terrorist attacks since 

**2000-2017:**
- The number of terrorist attacks stays low until 2004.
- From 2004 to 2017, Iraq became the country with the most terrorist attacks in the world, followed by Pakistan, India, Afghanistan and the African continent.


## **We can conclude that:**
- 1970-1976 and 2000-2003 are the periods with the fewest terrorist attacks compared with the other years.
- Overall, the number of terrorist attacks grew strongly over the period, though not steadily: the early 2000s were a low point before the sharp rise after 2004.
- In regions such as South America, terrorist attacks increased from the late 1970s until the early 2000s.
- From 2000 to 2017, Asia was the continent with the most terrorist attacks.
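
The regional conclusions above come from reading the animated map. A possible way to check them numerically is to compute, for each year, the share of all attacks that falls in each region and watch how that share moves over time. The sketch below is only an illustration: `sample` is a made-up miniature stand-in for `df_total`, and `regional_vs_global` is a hypothetical helper name; only the column names `eventid`, `iyear` and `region_txt` are taken from the dataset.

```python
import pandas as pd

# Made-up miniature stand-in for df_total (one row per incident).
sample = pd.DataFrame({
    "eventid": range(1, 13),
    "iyear": [1990] * 4 + [2000] * 2 + [2010] * 6,
    "region_txt": [
        "South America", "South America", "South America", "South Asia",  # 1990
        "South America", "South Asia",                                    # 2000
        "South America", "South Asia", "South Asia",
        "South Asia", "South Asia", "South Asia",                         # 2010
    ],
})


def regional_vs_global(df):
    """Return attacks per region and year, global attacks per year, and regional shares."""
    # Rows are years, columns are regions, values are attack counts.
    regional = (
        df.groupby(["iyear", "region_txt"])["eventid"]
        .count()
        .unstack(fill_value=0)
    )
    # Global number of attacks per year.
    global_total = regional.sum(axis=1)
    # Fraction of each year's attacks that happened in each region.
    share = regional.div(global_total, axis=0)
    return regional, global_total, share


regional, global_total, share = regional_vs_global(sample)
print(share)
```

A region whose share rises or falls markedly while the global total moves the other way is a candidate for "different from the global trend". Applied to the real data, the call would be `regional_vs_global(df_total)`.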

# Q2
## (Is the number of incidents and the number of casualties correlated? Can you spot any irregularities or outliers?)

### Drop rows where the casualties are unknown (NaN)

In [5]:
df_nonan = df_total.dropna(subset=['nkill', 'nwound'])
df_nonan

Unnamed: 0,eventid,iyear,imonth,iday,extended,country_txt,region_txt,provstate,city,latitude,...,gname,guncertain1,individual,nkill,nkillus,nkillter,nwound,nwoundus,nwoundte,ishostkid
0,199001000001,1990,1,0,0,Lebanon,Middle East & North Africa,Beirut,Beirut,33.888523,...,Unknown,0.0,0,0.0,,,0.0,,,1.0
1,199001010001,1990,1,1,0,India,South Asia,Jammu and Kashmir,Srinagar,34.083740,...,Unknown,0.0,0,0.0,,,0.0,,,0.0
2,199001010002,1990,1,1,0,India,South Asia,Jammu and Kashmir,Srinagar,34.083740,...,Unknown,0.0,0,0.0,,,0.0,,,0.0
3,199001010003,1990,1,1,0,India,South Asia,Jammu and Kashmir,Srinagar,34.083740,...,Unknown,0.0,0,0.0,,,0.0,,,0.0
4,199001010004,1990,1,1,0,Bolivia,South America,Cochabamba,Cochabamba,-17.382789,...,Alejo Calatayu,0.0,0,0.0,,,0.0,,,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
168583,197812290004,1978,12,29,0,United States,North America,New York,New York City,40.697132,...,Omega-7,0.0,0,0.0,,,0.0,,,0.0
168585,197812290007,1978,12,29,0,El Salvador,Central America & Caribbean,La Paz,Zacatecoluca,13.500000,...,Armed Forces of National Resistance (FARN),0.0,0,0.0,,,0.0,,,1.0
168586,197812290008,1978,12,29,0,El Salvador,Central America & Caribbean,San Vicente,San Vicente,13.641210,...,Armed Forces of National Resistance (FARN),0.0,0,0.0,,,0.0,,,1.0
168587,197812300001,1978,12,30,0,Namibia,Sub-Saharan Africa,Erongo,Swakopmund,-22.684698,...,South-West Africa People's Organization (SWAPO),0.0,0,0.0,,,60.0,,,0.0
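
Note that `dropna(subset=['nkill', 'nwound'])` removes a row when either of the two columns is missing, so every count computed from `df_nonan` refers to this reduced set of incidents. The toy example below (invented values, not rows from the dataset) shows the behaviour and one way to report how many rows were lost.

```python
import numpy as np
import pandas as pd

# Invented rows: only eventid 1 has both casualty fields filled in.
toy = pd.DataFrame({
    "eventid": [1, 2, 3, 4],
    "nkill": [0.0, np.nan, 2.0, np.nan],
    "nwound": [1.0, 3.0, np.nan, np.nan],
})

# A row is dropped as soon as one of the listed columns is NaN.
kept = toy.dropna(subset=["nkill", "nwound"])
dropped = len(toy) - len(kept)
print(f"kept {len(kept)} of {len(toy)} rows, dropped {dropped}")
```

Running the same two lines on `df_total` and `df_nonan` would show how much of the dataset the casualty analysis actually covers.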


## Group total deaths and injuries by country and year, ordered by year

In [22]:
df_casualties_country = df_nonan.groupby(['country_txt', 'iyear']).agg(
        num_deaths=('nkill', 'sum'),
        num_injuries=('nwound', 'sum'),
        latitude=('latitude', 'mean'),
        num_attemps=('eventid', 'count'),
        longitude=('longitude', 'mean')).reset_index()

df_casualties_country['iyear'] = df_casualties_country['iyear'].astype(int)
df_casualties_country = df_casualties_country.sort_values('iyear')
df_casualties_country['total_victims'] = df_casualties_country['num_deaths'] + df_casualties_country['num_injuries']
df_casualties_country.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3351 entries, 2770 to 3350
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   country_txt    3351 non-null   object 
 1   iyear          3351 non-null   int64  
 2   num_deaths     3351 non-null   float64
 3   num_injuries   3351 non-null   float64
 4   latitude       3303 non-null   float64
 5   num_attemps    3351 non-null   int64  
 6   longitude      3303 non-null   float64
 7   total_victims  3351 non-null   float64
dtypes: float64(5), int64(2), object(1)
memory usage: 235.6+ KB


In [23]:
fig = px.scatter_mapbox(df_casualties_country,
                        lat="latitude",
                        lon="longitude",
                        size="total_victims",
                        color="num_deaths",
                        hover_name="country_txt",
                        hover_data={"num_deaths": True, "num_injuries": True, "iyear": True, "latitude": False, "longitude": False},
                        zoom=1,
                        height=800,
                        title="Number of victims per country",
                        color_continuous_scale=px.colors.sequential.amp,
                        size_max=40,
                        animation_frame='iyear',
                        animation_group='country_txt')

fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":40,"l":0,"b":0})
fig.update_layout(coloraxis_colorbar=dict(title="N. deaths"))
fig.update_traces(marker=dict(opacity=0.7))

fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 600
fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 400
fig.layout.sliders[0].currentvalue.prefix = "Year: "

fig.show()

In [19]:
df_casualties = df_nonan.groupby(['iyear']).agg(
    num_deaths=('nkill', 'sum'),
    num_injuries=('nwound', 'sum'),
    latitude=('latitude', 'mean'),
    num_attemps=('eventid', 'count'),
    longitude=('longitude', 'mean')
).reset_index()

df_casualties['iyear'] = df_casualties['iyear'].astype(int)
df_casualties = df_casualties.sort_values('iyear')
df_casualties['total_victims'] = df_casualties['num_deaths'] + df_casualties['num_injuries']

## Incidents and casualties

In [20]:
fig = px.line(df_casualties, x="iyear", y="num_attemps")
fig.show()

In [21]:
import plotly.graph_objects as go
import pandas as pd


fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_casualties['iyear'],
    y=df_casualties['total_victims'],
    name='Total Victims',
    mode='lines+markers',
    line=dict(color='firebrick', width=2)
))

fig.add_trace(go.Scatter(
    x=df_casualties['iyear'],
    y=df_casualties['num_attemps'],
    name='Number of Attempts',
    mode='lines+markers',
    line=dict(color='royalblue', width=2)
))

fig.update_layout(
    title='Victims and Attempts',
    xaxis_title='Year',
    yaxis_title='Count',
    hovermode='x unified',
    template='plotly_white',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    )
)

fig.show()

In [26]:
df_casualties_country['decade'] = (df_casualties_country['iyear'] // 10) * 10
df = df_casualties_country.groupby(['decade']).agg(
    total_victims=('total_victims', 'sum'),
    num_attemps=('num_attemps', 'sum'),
    num_deaths=('num_deaths', 'sum'),
    num_injuries=('num_injuries', 'sum')
)
df

| decade | total_victims | num_attemps | num_deaths | num_injuries |
|---|---|---|---|---|
| 1970 | 8308.0 | 5166 | 2922.0 | 5386.0 |
| 1980 | 98566.0 | 21766 | 55646.0 | 42920.0 |
| 1990 | 131333.0 | 26541 | 58435.0 | 72898.0 |
| 2000 | 179265.0 | 18944 | 54549.0 | 124716.0 |
| 2010 | 402396.0 | 80016 | 159076.0 | 243320.0 |


## **We can conclude that:**
- In parts of the graph, notably the 2010s, the number of terrorist attacks and the total number of victims move together.
- 2001 is an outlier: the number of victims is well above the average for the 2000s.
- Total victims and `num_injuries` increased in every decade. The exceptions are `num_attemps` and `num_deaths`, which decreased in the 2000s compared with the previous decade.
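
The conclusions above are read off the charts. As a rough numerical complement, the sketch below computes a Pearson correlation between attempts and victims and a victims-per-attempt ratio that can help flag unusual periods. It uses only the decade totals from the table in this notebook; the names `decades`, `pearson_r` and `victims_per_attack` are introduced here for illustration.

```python
import pandas as pd

# Decade totals copied from the table in this notebook.
decades = pd.DataFrame(
    {
        "total_victims": [8308, 98566, 131333, 179265, 402396],
        "num_attemps": [5166, 21766, 26541, 18944, 80016],
    },
    index=pd.Index([1970, 1980, 1990, 2000, 2010], name="decade"),
)

# Pearson correlation between the number of attempts and the number of victims.
pearson_r = decades["num_attemps"].corr(decades["total_victims"])

# Average number of victims per recorded attempt in each decade.
victims_per_attack = decades["total_victims"] / decades["num_attemps"]

print(f"Pearson r: {pearson_r:.2f}")
print(victims_per_attack.round(2))
```

With only five data points the coefficient is indicative at most. Applying the same two computations to the yearly frame `df_casualties` would give a more meaningful estimate, and years with an unusually high victims-per-attempt ratio would be natural outlier candidates.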