<h1> Use this <a href="http://nbviewer.jupyter.org/github/AdaTeamElec/ADA2017-Homeworks/blob/master/project/IntroAda.ipynb">link</a> to properly display maps </h1>

# Intro

## Abstract
Terrorism is a subject widely covered in the media, and, unfortunately, we have become accustomed to its presence worldwide, particularly over the last decade. Nevertheless, the problem we are facing today is not new: some conflicts date back several decades, and some of them are still ongoing. Our goal is to track and visualize the evolution of terrorism over the past 50 years based on "The Global Terrorism Database". There are many questions we can ask ourselves about terrorism, such as "Is the EU less safe nowadays?", "Have attack methods and motives changed over the years?" or "Can we identify current/future conflict zones?". It would be presumptuous of us to claim that we are going to solve major issues, or even predict future attacks. However, by exploring the dataset and trying to answer these questions, we aim to gain an overview and a better understanding of the evolution of terrorism.

## Plan

1. [Raw data understanding and cleaning](#raw_data)
    1. [Field selections using documentation](#fields_select)
    2. [Data exploration ](#data_exploration)
2. [Data visualization](#data_viz)
    1. [ Worldmap heatmap all-time & over the years](#world_overview)
    2. [Some global evolutions over the years](#attacks_casualities)
3. [Groups](#groups)
    1. [Groups locations](#loc_groups)
    2. [Groups collaborations](#colla_groups)
    3. [Groups territories](#terr_groups)
4. [Events that marked the world](#events_world)
    1. [North America Bombings (1970)](#NAB_1970)
    2. [Northern Ireland religious conflict (1972-1973)](#EU_1972)
    3. [Northern Ireland and Basque Country (1975-1977)](#EU_1975)
    4. [Salvadoran Civil War (1981-1983)](#CA_1981)
    5. [South America Conflicts (1984-1987)](#SA_1984)
    6. [Middle East (2003-2007)](#ME_2003)
    7. [South Asia (2008-2013)](#ME_2008)
    8. [Middle East (2013-Today)](#ME_2013)
5. [References](#references)

---

# 1 Raw data understanding  <a id='raw_data'></a>

## 1.1 Field selections using documentation  <a id='fields_select'></a>

First of all, we need to take a close look at our dataset to sort out the relevant data we will use for our observations. The Global Terrorism Database contains 135 features and approximately 170'000 entries. To select the fields we keep, we used the official [documentation](http://start.umd.edu/gtd/downloads/Codebook.pdf) of the dataset, which describes each feature precisely. Here is a quick summary of the fields we decided to use for our project.

* `eventid` : the id of each entry, written as 12 digits (the first 8 digits are the date of the event and the last 4 digits are a sequential case number for the given day). We also use it as our index.
* `iyear`, `imonth`, `iday` : year, month and day of the event. On rare occasions the month or day is unknown (coded as 0).
* `country_txt` : name of the country where the event took place.
* `region_txt` : name of the region where the event took place.
* `city` : This field contains the name of the city, village, or town in which the incident occurred. If the city, village, or town for an incident is unknown, then this field contains the smallest administrative area below provstate which can be found for the incident (e.g., district).  
* `latitude` and `longitude` : Latitude and Longitude values where the event took place.
* `doubtterr` : boolean value set to 1 if there is doubt as to whether the incident is an act of terrorism and 0 if there is no such doubt.
* `success` : boolean value set as 1 if the incident was successful or 0 if it was not. As stated in the documentation, "Success of a terrorist strike is defined according to the tangible effects of the attack. Success is not judged in terms of the larger goals of the perpetrators. For example, a bomb that exploded in a building would be counted as a success even if it did not succeed in bringing the building down or inducing government repression." 
* `suicide` : boolean value set as 1 if the attack perpetrator did not intend to escape from the attack alive, 0 otherwise.
* `attacktype1_txt` : This field captures the general method of attack and often reflects the broad class of tactics used. It consists of nine categories, listed below:
    1. Assassination
    2. Armed Assault
    3. Bombing/Explosion
    4. Hijacking 
    5. Hostage taking (barricade incident) 
    6. Hostage taking (kidnapping)
    7. Facility/Infrastructure Attack
    8. Unarmed Assault
    9. Unknown 
* `targtype1_txt` : The target/victim type field captures the general type of target/victim. When a victim is attacked specifically because of his or her relationship to a particular person, such as a prominent figure, the target type reflects that motive. For example, if a family member of a government official is attacked because of his or her relationship to that individual, the type of target is “government.” This variable consists of the following 22 categories: <br>
    1. Business
    2. Government (General)
    3. Police
    4. Military
    5. Abortion related
    6. Airport & aircraft
    7. Government (Diplomatic), which differs from Government (General) in that it covers the representation of a government on foreign soil (embassy, consulate...)
    8. Educational institution
    9. Food or water supply
    10. Journalist & media
    11. Maritime facilities, including ports
    12. NGO
    13. Other
    14. Private citizens & property, including attacks in a public area against private citizens
    15. Religious figures/institutions
    16. Telecommunication
    17. Terrorists/non-state militias
    18. Tourists
    19. Transportation (other than aviation)
    20. Unknown
    21. Utilities, facilities for generation or transmission of energy
    22. Violent political parties
* `gname` : This field contains the name of the group that carried out the attack. In order to ensure consistency in the usage of group names for the database, the GTD database uses a standardized list of group names that have been established by project staff to serve as a reference for all subsequent entries.  
* `gname2` : This field is used to record the name of the second perpetrator when responsibility for the attack is attributed to more than one perpetrator. Conventions follow “Perpetrator Group” field.  
* `gname3` : name of the third perpetrator group, if any; same conventions as `gname2`.
* `nperps` : This field indicates the total number of terrorists participating in the incident. (In the instance of multiple perpetrator groups participating in one case, the total number of perpetrators, across groups, is recorded). There are often discrepancies in information on this value.   
* `weaptype1_txt` : This field records the general type of weapon used in the incident. It consists of the following categories: <br>
    1. Biological
    2. Chemical
    3. Radiological
    4. Nuclear
    5. Firearms
    6. Explosives/bombs/dynamite
    7. Fake weapons
    8. Incendiary
    9. Melee
    10. Vehicle
    11. Sabotage equipment 
    12. Other
    13. Unknown
* `nkill` : This field stores the number of total confirmed fatalities for the incident. The number includes all victims and attackers who died as a direct result of the incident.   
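* `nkillter` : This field stores the number of confirmed terrorist fatalities.
* `nwound` : This field records the number of confirmed non-fatal injuries to both perpetrators and victims. 
* `nwoundte` : This field records the number of confirmed non-fatal injuries to terrorists.
* `compclaim` : set to 1 if more than one group claimed separate (competing) responsibility for the attack, 0 otherwise, and -9 if unknown. We use it in the group collaboration analysis.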


We are now down to 24 of the original 135 fields (23 features plus `eventid`, used as index). Apart from the kept features, we explored some others, such as `weaptype2`, `weapsubtype` or `motive`, to see whether they would bring additional information and thus be worth using. However, we decided to drop them because they contain too many NaN or unknown entries. Our choice focused on fields that would allow us to answer the questions asked in the abstract, as well as fields relevant for a meaningful visualization of the data.


## 1.2 Data exploration  <a id='data_exploration'></a>

Let's begin by importing the libraries and creating a dataframe to explore the data further. As cautious aspiring data scientists, we explore each field in detail and check the proportion of uncategorized or Unknown-labeled entries to make sure each feature we kept contains relevant data. <br>
*NOTE: in this section we only explore the data, without drawing any conclusions or making assumptions about it. This will come later in our analysis.*

In [None]:
import pandas as pd
import os
import numpy as np
import datetime
import time
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display, HTML
import warnings

from scipy.sparse import csc_matrix
import networkx as nx
import utils.signedgraph as sg
import utils.heat_gsp as heat_gsp
from pygsp import graphs, filters

%pylab inline
%matplotlib inline

sns.set_context("notebook")
warnings.filterwarnings('ignore')

In [None]:
data_path = 'data'
gtd_path = os.path.join(data_path, 'globalterrorismdb_0617dist.csv')

In [None]:
fields = ['eventid', 'iyear', 'imonth', 'iday', 'country_txt', 'region_txt', 'city', 
          'latitude', 'longitude', 'doubtterr', 'attacktype1_txt',  'success', 
          'suicide', 'weaptype1_txt', 'targtype1_txt', 'gname', 'gname2', 
          'gname3', 'compclaim', 'nperps', 'nkill', 'nkillter', 'nwound', 'nwoundte']
date_fileds = ['iyear', 'imonth', 'iday']

df = pd.read_csv(gtd_path, encoding='latin', usecols=fields, index_col='eventid', low_memory=False)

In [None]:
print('Is index unique: {}'.format(df.index.is_unique))

In [None]:
df.dtypes

According to the documentation, the month or day (or both) can be set to 0 if the exact date of the attack is unknown. We created a function that sets such an unknown month or day to 1, so that every entry gets a valid date. We then count the proportion of uncertain dates within the dataset, just to make sure it is not too high.

In [None]:
# Per the documentation, month and day can be 0 (unknown): they are set to NaN below and replaced by 1 here
def parse_date(row):
    return datetime.date(row.iyear, int(row.imonth) if not np.isnan(row.imonth) else 1, 
                         int(row.iday) if not np.isnan(row.iday) else 1)

In [None]:
# Count number entries with uncertain date (either month or day)
df[date_fileds] = df[date_fileds].replace(0, np.nan)
n_uncertain = df[date_fileds].isnull().any(axis=1).sum()
df['date'] = df.apply(parse_date, axis=1)
print('Uncertain dates: {:.2f}%, ({}/{})'.format(100*n_uncertain/len(df), n_uncertain, len(df)))
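As an aside, the fallback used by `parse_date` (an unknown month or day replaced by 1) can also be written in a vectorized way with `pd.to_datetime`, which accepts a dataframe with `year`, `month` and `day` columns. This is a minimal sketch on a hypothetical three-row frame, not the code used in the notebook; note that it produces pandas timestamps rather than `datetime.date` objects.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the GTD date columns after the 0 -> NaN replacement
dates = pd.DataFrame({'iyear': [1970, 1985, 2001],
                      'imonth': [7, np.nan, 9],
                      'iday': [2, np.nan, np.nan]})

# Fall back to January / the 1st of the month when the value is unknown
# (same rule as parse_date), then let pandas assemble all dates in one call
parts = pd.DataFrame({'year': dates['iyear'],
                      'month': dates['imonth'].fillna(1).astype(int),
                      'day': dates['iday'].fillna(1).astype(int)})
parsed = pd.to_datetime(parts)

# Flag rows where either the month or the day was unknown
uncertain = dates[['imonth', 'iday']].isnull().any(axis=1)
print(parsed.dt.strftime('%Y-%m-%d').tolist())
print(uncertain.tolist())
```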

We check the proportion of entries without geographic coordinates. When the coordinates are unknown, we drop the row entirely, because we need the location information to represent the data on maps.

In [None]:
n_geo = len(df)
df.dropna(subset=('latitude', 'longitude'), inplace=True)
print('Entries without geographic coordinates dropped: {:.2f}%, ({}/{})'.format(
    100*(n_geo-len(df))/n_geo, n_geo-len(df), n_geo))

We check that each entry has a valid city, country and region. A few entries are missing a city name, but as we will rarely use this feature, we do not drop them.

In [None]:
print('Proportion of data without a valid city entry: {:.2f}%'.format(100*(1-df.city.value_counts().sum()/len(df))))
print('Proportion of data without a valid country entry: {:.2f}%'.format(100*(1-df.country_txt.value_counts().sum()/len(df))))
print('Proportion of data without a valid region entry: {:.2f}%'.format(100*(1-df.region_txt.value_counts().sum()/len(df))))

We now check the number of attacks categorized as doubtful terrorist attacks. We wanted to check this feature to make sure that the large majority of the dataset consists of attacks that are certainly terrorist attacks; had this not been the case, neither the dataset nor our study would have been relevant. <br>
About 15% of the attacks are flagged as doubtful. The value -9 marks cases for which the information was not available at all when the dataset was constructed; we decided to treat them as certain terrorist attacks.

In [None]:
print('Distribution of data in %:\n{}'.format(100*df.doubtterr.value_counts()/len(df)))
# Unknown (-9) entries are treated as certain terrorist attacks
df.loc[df.doubtterr < 0, 'doubtterr'] = 0
df.doubtterr = df.doubtterr.astype('category')
# rename_categories replaces the deprecated assignment to .cat.categories
df.doubtterr = df.doubtterr.cat.rename_categories(['N_DOUBT', 'DOUBT'])
print('\nDistribution of data in % (after cleaning):\n{}'.format(100*df.doubtterr.value_counts()/len(df)))

For the next two fields, the data is, as expected, binary and completely categorized.

In [None]:
print('Unique values in field: {}'.format(np.unique(df.success)))
print('Percentage of successful attacks: {:.2f}%'.format(100*df.success.mean()))

In [None]:
print('Unique values in field: {}'.format(np.unique(df.suicide)))
print('Percentage of suicide attacks: {:.2f}%'.format(100*df.suicide.mean()))

Next, we look at the proportion of each attack type and check for NaN values.

In [None]:
print('Type of attack and repartition in dataset in %')
100*df.attacktype1_txt.value_counts()/len(df)

We do the same for the distribution of target types and weapon types.

In [None]:
print('Type of target and repartition in dataset in %')
100*df.targtype1_txt.value_counts()/len(df)

In [None]:
print('Repartition of weapon type in dataset in %')
100*df.weaptype1_txt.value_counts()/len(df)

We now check the number of unique group names, i.e. the names of the groups that carried out the attacks. According to the documentation, these entries were standardized using a reference list of group names established by the project staff.

In [None]:
pd.Series(df[['gname', 'gname2', 'gname3']].values.ravel('K')).value_counts().head(10)

In [None]:
df[['gname', 'gname2', 'gname3']] = df[['gname', 'gname2', 'gname3']].replace({'Unknown': np.nan})
# nunique() ignores NaN, so missing or 'Unknown' names are not counted as a group
n_group = pd.Series(df[['gname', 'gname2', 'gname3']].values.ravel('K')).nunique()
print('Number of unique group names: {}'.format(n_group))

We now explore the numerical fields. First, we look at the number of perpetrators of an attack. As expected, it contains a large number of unknown entries, since it is not easy to know how many perpetrators took part in an attack. We decided to keep this field anyway to explore, if possible, the evolution of terrorist attacks, as the number of perpetrators could give some insight into it.

In [None]:
df.loc[df.nperps < 0, 'nperps'] = np.nan
print('Percentage of entries with unknown # perpetrators: {:.2f}%'.format(100*np.sum(df.nperps.isnull())/len(df)))
print('Range of # perpetrators: {} up to {}'.format(int(df.nperps.min()), int(df.nperps.max())))

We now look at the number of victims of terrorist attacks. As the dataset counts the total number of fatalities, perpetrators included, we also derive the number of non-terrorist fatalities (`nkillnter`) by subtracting the number of killed terrorists, clipping negative values to 0. The same logic applies to the number of wounded (`nwoundnter`).

In [None]:
df['nkillnter'] = df.nkill-df.nkillter.fillna(0)
df.loc[df.nkillnter < 0, 'nkillnter'] = 0

print('Range total # of victims: [{}, {}]'.format(df.nkill.min(), df.nkill.max()))
print('Range # of non terrorists victims: [{}, {}]'.format(df.nkillnter.min(), df.nkillnter.max()))
print('Range # of terrorists victims: [{}, {}]'.format(df.nkillter.min(), df.nkillter.max()))

In [None]:
df['nwoundnter'] = df.nwound-df.nwoundte.fillna(0)
df.loc[df.nwoundnter < 0, 'nwoundnter'] = 0

print('Range total # of wounded: [{}, {}]'.format(df.nwound.min(), df.nwound.max()))
print('Range # of non terrorists wounded: [{}, {}]'.format(df.nwoundnter.min(), df.nwoundnter.max()))
print('Range # of terrorists wounded: [{}, {}]'.format(df.nwoundte.min(), df.nwoundte.max()))

We now have a better understanding of our dataset and are confident that we can work with it. We can move further in our analysis and visualize the data.

---
# 2 Data Visualization  <a id='data_viz'></a>

## 2.1 Worldmap heatmap all-time & over the years  <a id='world_overview'></a>

Here we define basic helper functions for map plots. The first function extracts `latitude` and `longitude` (plus an optional weight) from the data to be plotted on folium maps. The next two functions plot the actual heat map of attacks (overall and year by year).

In [None]:
import folium
from folium.plugins import HeatMap, HeatMapWithTime
from folium.plugins import MarkerCluster

def get_data_longlat(df, val=None):
    if val is not None:
        df_t = df.loc[df[val] > 0]
        data_year = df_t[['latitude', 'longitude']].values
        return np.concatenate((data_year, np.expand_dims(df_t[val], axis=1)), axis=1)
    else:
        data_year = df[['latitude', 'longitude']].values
        return np.concatenate((data_year, np.ones((len(data_year), 1))), axis=1)

def get_heatmap_time(df, coord=[30., 5.], zoom=2):
    data_all = []
    year_label = []
    for year, d in df.groupby('iyear'):
        data_all.append(get_data_longlat(d).tolist())
        year_label.append('Year: {}'.format(year))
    m = folium.Map(coord, tiles='stamentoner', zoom_start=zoom)
    HeatMapWithTime(data_all, index=year_label, radius=10, max_opacity=1).add_to(m)
    return m
    
def get_heatmap(df, val=None, coord=[30., 5.], zoom=2, min_opacity=0.5, blur=5):
    data = get_data_longlat(df, val) 
    m = folium.Map(coord, tiles='stamentoner', zoom_start=zoom)
    HeatMap(data.tolist(), radius=5, min_opacity=min_opacity, blur=blur).add_to(m)
    return m
    
rand_seed = 0
np.random.seed(rand_seed)

Due to the large amount of data and the limitations of Folium, we decided to select a random subset of $n=50000$ samples. The first map displays those samples as a heat map. Even though only about a third of the dataset is displayed, we can assume that the spatial distribution is similar for the complete dataset, since the subset is drawn uniformly at random. We can clearly see conflict regions, for example in the Middle East or in India. However, this map has no temporal dimension. Therefore, we also display the evolution of the data over time, showing conflicts year by year.
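The claim that a uniform random subset keeps the same distribution can be checked on synthetic data. This minimal sketch draws 170'000 hypothetical attacks over four regions with made-up shares, takes a uniform sample of 50'000 without replacement (as `np.random.permutation(len(df))[:n]` does), and compares the regional shares. It uses NumPy's `default_rng` instead of the notebook's global seed; the regions and probabilities are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for the GTD: hypothetical regions with unequal shares
regions = rng.choice(['Middle East', 'South Asia', 'Europe', 'South America'],
                     size=170_000, p=[0.4, 0.3, 0.2, 0.1])
full = pd.Series(regions)

# Uniform sample without replacement, same idea as the notebook's subset
n = 50_000
sub = full.iloc[rng.permutation(len(full))[:n]]

share_full = full.value_counts(normalize=True).sort_index()
share_sub = sub.value_counts(normalize=True).sort_index()
print(pd.DataFrame({'full': share_full, 'subsample': share_sub}).round(3))

# Largest absolute difference in regional share between full data and subset
max_gap = (share_full - share_sub).abs().max()
print('max difference: {:.4f}'.format(max_gap))
```

With 50'000 samples the standard error of a share is a fraction of a percent, which is why the sampled map can be trusted for the overall spatial picture (rare, isolated locations may still be missed).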

In [None]:
n = 50000
id_sub = np.random.permutation(len(df))[:n]
df_sub = df.iloc[id_sub]

In [None]:
m_overall = get_heatmap(df_sub, coord=[30., 5.])
m_overall_time = get_heatmap_time(df_sub, coord=[30., 5.])

display(HTML('<h4>{}</h4>'.format('50 years of terrorism worldwide')))
display(m_overall)
display(HTML('<h4>{}</h4>'.format('Yearly evolution of terrorism worldwide')))
display(m_overall_time)

## 2.2 Some global evolutions over the years   <a id='attacks_casualities'></a>

### 2.2.1 Number of attacks per year

Here we focus on the evolution of the number of attacks, first from a worldwide perspective. Note that the year 1993 is missing from our data. It is unlikely that no terrorist attacks occurred during that year. According to the [Codebook](http://start.umd.edu/gtd/downloads/Codebook.pdf), the 1993 data were lost before START compiled the GTD:

> In addition, users familiar with the GTD’s Data Collection Methodology are aware that incidents 
of terrorism from 1993 are not present in the GTD because they were lost prior to START’s 
compilation of the GTD from multiple data collection efforts. 

Overall, we can clearly see that the number of attacks increased from 1970 to 1992, before decreasing until 2005. It then increased quickly until it peaked in 2014. The number of attacks is now decreasing again, but remains far higher than in the early 2000s.

In [None]:
year_span = 1+df.iyear.max()-df.iyear.min()
plt.figure(figsize=(16,5))
df.date.hist(bins=year_span, label='# Attacks')
plt.xlabel('Time'); plt.ylabel('# Attack'); plt.legend(); 
plt.title('Evolution of number of attacks worldwide', fontsize=12, fontweight='bold')

The number of attacks alone can be misleading. Nowadays, terrorism is unfortunately associated with isolated events causing a huge number of casualties (e.g. the Nice or Bataclan attacks). We therefore also plot the number of deaths linked to terrorism over the years. The number of deaths increased over the years to reach a peak of over 40'000 in 2015, which represents an average of <b>110 deaths per day</b>. Note that 1993 is still missing from our data (the year is simply absent from this plot).
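The per-day figure follows from dividing the yearly death toll by the number of days in the year, taking 40'000 as a round lower bound:

$$\frac{40\,000\ \text{deaths}}{365\ \text{days}} \approx 109.6 \approx 110\ \text{deaths per day}$$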

In [None]:
plt.figure(figsize=(16,5))
df.groupby('iyear').nkill.sum().plot(kind='bar', width=1)
plt.xlabel('Time'); plt.ylabel('# casualties'); plt.legend(); 
plt.title('Evolution of casualties over the years worldwide', fontsize=12, fontweight='bold')

### 2.2.2 Number of attacks per month for each year

In [None]:
MONTHS = ['January', 'February',  'March', 'April', 'May', 'June', 'July', 
          'August', 'September', 'October', 'November', 'December']

def get_2d_comp(df, x_col, y_col, hue=None, normalize=False):
    if hue is None:
        df_month_year = df.groupby([x_col, y_col]).size().reset_index(name='frequ')
    else:
        df_month_year = df.groupby([x_col, y_col])[hue].sum().reset_index(name='frequ')
    df_month_year =  df_month_year.pivot(index=y_col, columns=x_col, values='frequ')
    if normalize:
        df_month_year = df_month_year.div(df_month_year.sum(axis=0), axis=1)
    return df_month_year

Here we look at the number of attacks as a function of the month of the year. There is no visible tendency: from a worldwide perspective, there is no period of the year during which terrorist attacks are more frequent. We have to be careful, though, since local patterns can still exist (e.g. in South America, the Middle East, etc.).

In [None]:
df_month_year = get_2d_comp(df,'iyear', 'imonth')
df_month_year.index = MONTHS
plt.figure(figsize=(16,4))
sns.heatmap(df_month_year)
plt.title('Number of attacks over the years and months', fontsize=12, fontweight='bold')

### 2.2.3 Number of attacks per region and year

Another way to look at the attack data is to plot it as a function of the region. We chose to normalize the data, so that for each year (column of the heatmap) the values sum to 1. This keeps the recent peaks of attacks from dominating the plot and highlights past conflict periods. We can now distinguish, for example, a period of trouble in South America from 1980 to 1990. As expected, the Middle East and South Asia (Afghanistan, Pakistan, India) are the regions with the most terrorist activity nowadays, and hence the most dangerous areas.
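The normalization performed by `get_2d_comp(..., normalize=True)` can be illustrated on a tiny hypothetical attack log: after pivoting, each year is a column, and dividing by the column totals makes every year sum to 1, so a year with 3 attacks weighs as much as a year with 5. The counts below are made up; the pivot and division mirror the notebook's function.

```python
import pandas as pd

# Tiny hypothetical attack log: one row per attack
toy = pd.DataFrame({'iyear': [1980, 1980, 1980, 2014, 2014, 2014, 2014, 2014],
                    'region_txt': ['South America', 'South America', 'Europe',
                                   'Middle East', 'Middle East', 'Middle East',
                                   'South Asia', 'Europe']})

# Count attacks per (year, region) and pivot: rows = regions, columns = years
counts = (toy.groupby(['iyear', 'region_txt']).size()
             .reset_index(name='frequ')
             .pivot(index='region_txt', columns='iyear', values='frequ'))

# Divide each column by its total (NaN cells are skipped) so every year sums to 1
shares = counts.div(counts.sum(axis=0), axis=1)
print(shares.fillna(0).round(2))
```

In 1980, South America gets 2/3 of the year's attacks even though it has only 2 attacks, which is exactly the effect that makes past conflicts visible next to today's much larger counts.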

In [None]:
df_reg_year = get_2d_comp(df,'iyear', 'region_txt', normalize=True)
plt.figure(figsize=(14,4))
sns.heatmap(df_reg_year.fillna(0))
plt.title('Region-wise attacks over the years', fontsize=12, fontweight='bold')

We use the same approach here, but this time we consider the number of deaths instead of the number of events. Here too, the data are normalized per year. We can see a distinct peak for North America in 2001, which highlights the tragic attacks of September 11 in New York.

In [None]:
df_reg_year_ca = get_2d_comp(df,'iyear', 'region_txt', 'nkill', normalize=True)
plt.figure(figsize=(14,4))
sns.heatmap(df_reg_year_ca.fillna(0))
plt.title('Region-wise casualties over the years', fontsize=12, fontweight='bold')

### 2.2.4 Weapons and Targets

Here we look at the weapons used and the targets attacked over the years. Note that we create a field `year_f`, a float approximation of the date, because seaborn does not handle date entries in this kind of plot. We also remove unknown values, since they are not relevant to plot in our case.

In [None]:
# New dataframe with weapons and years, months (.copy() avoids SettingWithCopyWarning)
weapon_df = df[['weaptype1_txt', 'iyear', 'imonth']].copy()
# January -> year + 0, December -> year + 11/12 (so December stays in its own year)
weapon_df['year_f'] = weapon_df['iyear'] + (weapon_df['imonth'] - 1)/12
weapon_df = weapon_df[weapon_df.weaptype1_txt != 'Unknown']

# New dataframe with targets and years, months
target_df = df[['targtype1_txt', 'iyear', 'imonth']].copy()
target_df['year_f'] = target_df['iyear'] + (target_df['imonth'] - 1)/12
target_df = target_df[target_df.targtype1_txt != 'Unknown']

As we can see, the four most used weapon types in these terrorist attacks are explosives, incendiary weapons, firearms and, over the last 25 years, melee weapons. These have barely changed since the beginning of the dataset. Interestingly, chemical weapons, sabotage equipment and vehicles have been increasing over the last ~10 years, probably linked to the evolution of technology around the world.

We are aware that working on a subsample of the main dataset can introduce a bias, as we have seen in class.
Nevertheless, for the target plot below we use a random subsample of 10'000 entries, because it yields a more readable graph from which we can discern tendencies (all other analyses still use the whole dataset).

In [None]:
plt.figure(figsize=(16, 10))
sns.stripplot(x=weapon_df["year_f"], y=weapon_df["weaptype1_txt"], jitter=True)

The first five rows appear to have been the preferred targets for the past 50 years, without much evolution. Some targets have been increasing lately, such as transportation, media, educational institutions, religious institutions and political parties. Terrorism is a means of creating fear, so its spread to new targets makes sense, as such targets are part of everyday life nowadays.

In [None]:
id_sub = np.random.permutation(len(target_df))[:10000]
r = target_df.iloc[id_sub]

plt.figure(figsize=(16, 10))
sns.stripplot(x=r["year_f"], y=r["targtype1_txt"], jitter=True)

### 2.2.5 Insight into the 25 deadliest attacks of all time

The next map shows the 25 deadliest terrorist attacks of all time. Clicking on a marker displays more information about an attack. These attacks are not representative of the whole dataset, but we see that 10 of the 25 deadliest attacks took place in the Middle East and are recent.

In [None]:
m = folium.Map(location=[30., 5.], tiles='Stamen Terrain', zoom_start=2)
marker_cluster = MarkerCluster().add_to(m)

# Sort once and keep the 25 deadliest attacks
top25 = df.sort_values(by='nkill', ascending=False).head(25)
for _, row in top25.iterrows():
    # Popup text; quotes are removed from group names to keep the HTML popup valid
    info = ('Terror attack in {}'
            '<br>Date: {}.{}.{}'
            '<br>Target: {}'
            '<br>Perpetrators: {} with {}'
            '<br>Casualties: {}').format(
        row['city'], int(row['iday']), int(row['imonth']), row['iyear'],
        row['targtype1_txt'],
        str(row['gname']).replace("'", ""), row['weaptype1_txt'],
        int(row['nkill']))
    folium.Marker([row['latitude'], row['longitude']], popup=info,
                  icon=folium.Icon(color='red', icon='info-sign')).add_to(marker_cluster)

display(HTML('<h4>{}</h4>'.format('Interactive map - Top 25 most deadly attacks recorded')))
display(m)

Note that <b>half</b> of the deadliest attacks occurred during the last 8 years!

In [None]:
data = df.sort_values(by='nkill', ascending=False)[:25]
# With 25 sorted dates, the median is the 13th one (position 12)
print('Median date: ', data.sort_values('date').iloc[len(data)//2].date)

---

# 3. Groups  <a id='groups'></a>

## 3.1 Groups locations  <a id='loc_groups'></a>

To gain better insight into the data, we want to locate each terrorist group. Of course, it is not possible to get the exact location of a group, but we can estimate it. As a proof of concept, we define the location of each group as the medoid of its attacks, i.e. the attack location with the smallest total distance to all the group's other attacks (a median-like central point). To do so, we use a variant of the k-means algorithm named k-medoids. It follows the same logic, except that the cluster centers are points of the dataset (which is usually not the case for k-means). This way we avoid odd locations such as the middle of the sea. We used the k-medoids implementation by GitHub user [letiantian](https://github.com/letiantian/kmedoids).
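With a single cluster, k-medoids reduces to picking the point whose summed distance to all other points is smallest. The sketch below computes that special case directly with scikit-learn's `haversine_distances`, which expects `[latitude, longitude]` in radians. It is not the letiantian implementation used in the notebook, and the coordinates are hypothetical: five attacks around Baghdad and one outlier in Paris.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_KM = 6371.0

def single_medoid(latlon_deg):
    """Index of the point minimising the summed great-circle distance
    to all other points (the k-medoids solution for k = 1)."""
    # haversine_distances expects [lat, lon] in radians and returns unit-sphere distances
    D = haversine_distances(np.radians(latlon_deg)) * EARTH_RADIUS_KM
    return int(np.argmin(D.sum(axis=1)))

# Hypothetical attack coordinates of one group: a cluster plus one far outlier
coords = np.array([[33.30, 44.40],
                   [33.35, 44.40],
                   [33.25, 44.40],
                   [33.30, 44.45],
                   [33.30, 44.35],
                   [48.86, 2.35]])
idx = single_medoid(coords)
print(idx, coords[idx])
```

Unlike a mean of the coordinates, which the Paris outlier would drag several hundred kilometres away, the medoid stays on an actual attack location in the dense cluster.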

Here we build the function that computes, for a given terrorist group: the number of attacks (`frequ`), the number of casualties (`nkill`), its coordinates (`latitude`, `longitude`) and its country (`country`).

In [None]:
from collections import OrderedDict
from sklearn.metrics.pairwise import haversine_distances
from utils import kmedoids

def estimate_home(df, gname):
    # Get only attacks in which the group took part (as gname, gname2 or gname3)
    group_entries = (df[['gname', 'gname2', 'gname3']] == gname).any(axis=1)
    # Extract number of attacks and number of casualties
    frequ = np.sum(group_entries)
    n_death = df.loc[group_entries, 'nkill'].fillna(0).sum()
    # Pairwise great-circle distances; haversine expects [lat, lon] in radians
    coord = df.loc[group_entries, ['latitude', 'longitude']].values
    D = haversine_distances(np.radians(coord))
    # Single cluster: its medoid is the estimated home of the group
    M, C = kmedoids.kMedoids(D, 1)
    # Get coordinates (in degrees) and country of the selected attack
    coord = coord[M[0]]
    country = df.loc[group_entries].iloc[M[0]].country_txt
    return frequ, n_death, coord[0], coord[1], country

We can now compute our data. We first look for the unique group names in our dataset, then compute all the statistics for each group. Since this operation takes some time, we print the progress and save the results at the end.

In [None]:
# Look for unique names of groups in dataset (value_counts drops NaN)
groups = pd.Series(df[['gname', 'gname2', 'gname3']].values.ravel('K')).value_counts().index.values
# Compute statistics for each group
rows = []
for i, gname in enumerate(groups):
    if i % 500 == 0:
        print('{}/{} Computed homes'.format(i, len(groups)))
    rows.append(estimate_home(df, gname))
# Gather the results in a dataframe indexed by group name
df_groups = pd.DataFrame(rows, index=groups,
                         columns=['frequ', 'nkill', 'latitude', 'longitude', 'country'])
# Save results to file to avoid recomputing them at each run
df_groups.to_csv(os.path.join(data_path, 'groups_stats.csv'))

We can see that we have, for each group: the number of attacks (`frequ`), the number of casualties (`nkill`), the coordinates (`latitude`, `longitude`) and the country (`country`).

In [None]:
df_groups = pd.read_csv(os.path.join(data_path, 'groups_stats.csv'), index_col=0)
df_groups.head()

Here we define the basic functions to get a color according to the number of casualties and a logarithmic scale of the values (see the next cell for an explanation).

In [None]:
from folium import LinearColormap

def get_ln_value(value, offset=0, factor=1):
    return np.log(1 + offset + factor*value)

def get_info(row):
    return  '<strong> {}</strong> <br># of attacks: {} <br>Casualties: {}'.format(
        row.name.replace('\'', ''), int(row.frequ), int(row.nkill))

kill_max = get_ln_value(df_groups.nkill.max())
linear_kill = LinearColormap(['green', 'yellow', 'red'], vmin=0, vmax=kill_max)

We can now display the group locations directly on a map. We want to display two important pieces of information: the number of casualties and the number of attacks. A group that performs many attacks does not necessarily aim to hurt the population. Therefore, we chose the following display (only groups with more than 20 attacks are shown):

- The size of the circle gives an estimate of the total number of attacks: a small circle means few attacks.
- The color of the circle goes from green to red. A green circle means the group did not kill many people; on the contrary, a red circle indicates a large number of casualties.

Note that we use a logarithmic scale for both circle size and color. This means that if one group performed twice as many attacks as another, its circle will not be twice as big.
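To make the effect of the logarithmic scale concrete, the sketch below reuses the same mapping as `get_ln_value`, i.e. $\ln(1 + \text{offset} + \text{factor} \cdot x)$ with the default offset 0 and factor 1, and prints the radius obtained for a few hypothetical attack counts.

```python
import numpy as np

def get_ln_value(value, offset=0, factor=1):
    # Same mapping as in the notebook: ln(1 + offset + factor * value)
    return np.log(1 + offset + factor * value)

# Radius obtained for a few hypothetical attack counts
for frequ in [10, 100, 200, 1000]:
    print(frequ, round(float(get_ln_value(frequ)), 2))

# Doubling the number of attacks adds only about ln(2) to the radius
ratio = get_ln_value(200) / get_ln_value(100)
print('radius ratio 200 vs 100 attacks: {:.2f}'.format(ratio))
```

A group with 200 attacks gets a circle only about 15% larger than a group with 100 attacks, which keeps very active groups from hiding all the others on the map.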

In [None]:
m = folium.Map(location=[30., 5.], zoom_start=2, tiles='Stamen Toner')

for i, ids in enumerate(df_groups.loc[df_groups.frequ > 20].index):
    coord = df_groups.loc[ids, ['latitude', 'longitude']].values
    popup = get_info(df_groups.loc[ids])
    radius = get_ln_value(df_groups.loc[ids, 'frequ'])
    c_kill = linear_kill(get_ln_value(df_groups.loc[ids, 'nkill']))
    folium.CircleMarker(location=coord, radius=radius, 
                        color=c_kill, fill_color=c_kill, fill_opacity= 0.8, 
                        fill=True, popup=popup, weight=1).add_to(m)

display(HTML('<h3>{}</h3>'.format('Interactive map - Estimated location of terrorist groups')))
display(m)

## 3.2 Groups collaborations  <a id='colla_groups'></a>

Here we investigate the collaboration between groups. Some attacks were performed by two or more groups, so we can assume that these groups are allies. However, we have to be careful: our dataset also contains a field named `compclaim`, which tells us whether the groups made competing claims of responsibility for the attack. If this is the case, the two groups are considered enemies.

Since there can be more than two groups, we match each pair together (group1/group2, group1/group3, group2/group3). Some `compclaim` values are set to -9 (unknown) when no information is available; we chose to discard those entries by setting them to NaN.

In the end, we have 833 collaborations/competitions in our data.
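The pair-wise matching can be illustrated on a tiny hypothetical table (the group names `A`, `B`, ... are placeholders, not values from the dataset). Each attack with up to three perpetrators yields the pairs (1, 2), (1, 3) and (2, 3); pairs with a missing group or an unknown `compclaim` (-9) are dropped, mirroring the logic of the cell below. This sketch stacks the pairs with `pd.concat` rather than reshaping a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical attacks: one claimed by three groups, one by two, one with unknown compclaim
attacks = pd.DataFrame({
    'gname':     ['A', 'C', 'E'],
    'gname2':    ['B', 'D', 'F'],
    'gname3':    ['X', np.nan, np.nan],
    'compclaim': [0, 1, -9],
})

# Build every pair (1,2), (1,3), (2,3) and stack them into one 3-column table
pairs = pd.concat([
    attacks[[a, b, 'compclaim']].rename(columns={a: 'gname', b: 'gname2'})
    for a, b in [('gname', 'gname2'), ('gname', 'gname3'), ('gname2', 'gname3')]
], ignore_index=True)

# -9 means "unknown": treat it as missing, then drop incomplete pairs
pairs['compclaim'] = pairs['compclaim'].replace({-9: np.nan})
pairs = pairs.dropna(subset=['gname', 'gname2', 'compclaim'])
print(pairs)
```

Only four pairs survive here: A/B, A/X and B/X (cooperations) and C/D (competition); the pair E/F is dropped because its `compclaim` is unknown.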

In [None]:
# Create a 2D table matching the groups pair-wise: (gname, gname2), (gname, gname3), (gname2, gname3)
frames = [ df[['gname', 'gname2', 'compclaim']].values, 
           df[['gname', 'gname3', 'compclaim']].values, 
           df[['gname2', 'gname3', 'compclaim']].values ]
df_multi = pd.DataFrame(np.array(frames).reshape((-1, 3)))
df_multi.columns = ['gname', 'gname2', 'compclaim']
df_multi.compclaim = df_multi.compclaim.replace({-9: np.nan})
# Drop rows with NaN (incomplete or unknown pairs)
df_multi.dropna(subset=['gname', 'gname2', 'compclaim'], inplace=True)
print('Number of collaborations/competitions:', len(df_multi))
df_multi.head()

To make the creation of our graph easier, we build a new DataFrame with both the number of cooperations (`n_coop`) and the number of competitions (`n_comp`) for each pair of groups.

In [None]:
# Count, for each pair of groups, the attacks claimed together (compclaim == 0) and the competing claims (compclaim == 1)
df_coop = df_multi.loc[df_multi.compclaim == 0].groupby(['gname', 'gname2']).size().reset_index(name='n_coop')
df_comp = df_multi.loc[df_multi.compclaim == 1].groupby(['gname', 'gname2']).size().reset_index(name='n_comp')
df_link = pd.merge(df_coop, df_comp, how='outer').fillna(0)
df_link.head()

We can now create our [adjacency matrices](https://en.wikipedia.org/wiki/Adjacency_matrix). Each group is represented as a node, so the number of rows/columns equals the number of unique groups in our data. Two matrices are created (`W_coop`, `W_comp`): the first one only contains cooperations between groups, the second one only competitions. Note that if group A is a friend of group B, then group B is also a friend of group A. Knowing that, we build an undirected graph by making `W_coop` and `W_comp` symmetric.
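The effect of the symmetrization `0.5*(W.T + W)` can be checked on a toy example with three hypothetical groups, where each pair was recorded in one direction only.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Hypothetical cooperation counts: group 0 -> group 1 (3 joint attacks), group 1 -> group 2 (1 joint attack)
data = np.array([3.0, 1.0])
rows = np.array([0, 1])
cols = np.array([1, 2])
W = csc_matrix((data, (rows, cols)), shape=(3, 3))

# Symmetrize so that the graph is undirected: W_sym[i, j] == W_sym[j, i]
W_sym = 0.5 * (W.T + W)
print(W_sym.toarray())
```

Note that a pair recorded in only one direction ends up with half of its count on each side (here 1.5 instead of 3). This uniform scaling does not change which groups are connected.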

In [None]:
groups = pd.unique(df_link[['gname', 'gname2']].values.ravel('K'))
d = dict(zip(groups, np.arange(len(groups))))

# Create adjacency matrices with the number of attacks claimed together as edge weights between nodes (i.e. groups)
W_coop = csc_matrix((df_link['n_coop'],
                     (df_link['gname'].map(d), df_link['gname2'].map(d))), shape=(len(d), len(d)))
W_coop = 0.5*(W_coop.T+W_coop)

W_comp = csc_matrix((df_link['n_comp'],
                     (df_link['gname'].map(d), df_link['gname2'].map(d))), shape=(len(d), len(d)))
W_comp = 0.5*(W_comp.T+W_comp)

Our matrix `W` is the difference between the number of cooperations and the number of competitions, so the resulting adjacency matrix has negative entries. Such graphs are known as [signed graphs](https://en.wikipedia.org/wiki/Signed_graph). To locate our clusters, we only focus on groups that have at least one friendship or animosity link (entry in `W` != 0). This way, we avoid trivial solutions coming from unconnected graphs.

Note that we only keep connected components with at least 8 groups, to have more data and more relevant clusters.
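The component filtering can be sketched on a toy signed matrix. This sketch uses `scipy.sparse.csgraph.connected_components` instead of networkx (a substitution, not what the notebook uses) and a minimum size of 3 instead of 8 so that the example stays small; the cooperation and competition counts are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical symmetric cooperation and competition counts for 5 groups
W_coop = np.zeros((5, 5))
W_comp = np.zeros((5, 5))
W_coop[0, 1] = W_coop[1, 0] = 2   # groups 0 and 1 cooperate
W_comp[1, 2] = W_comp[2, 1] = 1   # groups 1 and 2 compete
W_coop[3, 4] = W_coop[4, 3] = 1   # groups 3 and 4 form a separate pair

# Signed adjacency: positive = friendship, negative = animosity
W = csr_matrix(W_coop - W_comp)

# Connectivity only depends on the non-zero entries, whatever their sign
n_comp, labels = connected_components(W, directed=False)
min_size = 3
clusters = [np.flatnonzero(labels == c).tolist() for c in range(n_comp)
            if np.sum(labels == c) >= min_size]
print(clusters)
```

Groups 0, 1 and 2 form one component even though one of their links is hostile; the pair {3, 4} is discarded by the size filter.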

In [None]:
W = W_coop - W_comp
G = nx.from_scipy_sparse_matrix(W)
clusters = [list(g) for g in nx.connected_components(G) if len(g) >= 8]

Now that we have located connected groups, we need to cluster them to see which ones work together or fight each other. Each positive entry ($W_{ij} > 0$) indicates a potential friendship; conversely, each negative entry ($W_{ij} < 0$) indicates a potential animosity.

Based on [Spectral Analysis of Signed Graphs for Clustering, Prediction and Visualization](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.155.1809&rep=rep1&type=pdf) \[1\], we implemented the normalized signed [Laplacian](https://en.wikipedia.org/wiki/Laplacian_matrix) to represent the graph, and we cluster it with a spectral graph clustering approach (using the eigenvalues and eigenvectors of the Laplacian).
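As a reference, here is a minimal NumPy sketch of the normalized signed Laplacian as defined in [1]: signed degrees $\bar{D}_{ii} = \sum_j |W_{ij}|$ and $\bar{L}_{sym} = I - \bar{D}^{-1/2} W \bar{D}^{-1/2}$. It is not the code of `sg.compute_L`, which is not shown here. On a balanced toy graph (two hypothetical factions, positive links inside each faction and negative links between them), the eigenvector of the smallest eigenvalue already separates the factions by its sign; with more than two clusters one would rather run k-means on several eigenvectors.

```python
import numpy as np

def normalized_signed_laplacian(W):
    """L_sym = I - D^{-1/2} W D^{-1/2}, with signed degrees D_ii = sum_j |W_ij|."""
    d = np.abs(W).sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# Hypothetical balanced signed graph: {0, 1, 2} and {3, 4, 5} are two factions
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0      # friendship inside a faction
for i, j in [(2, 3), (0, 5)]:
    W[i, j] = W[j, i] = -1.0     # animosity between factions

L = normalized_signed_laplacian(W)
eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order

# For a balanced graph the smallest eigenvalue is 0, and the sign of its
# eigenvector tells which faction each node belongs to
labels = (eigvecs[:, 0] > 0).astype(int)
print(eigvals[0], labels)
```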

Note that, to determine the number of clusters, we use the $\text{SignedRatioCut}(X,Y)$ defined in [1] as a loss function that we try to minimize.
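For a two-way partition $(X, Y)$, our reading of [1] is that the signed cut counts twice the positive weight between $X$ and $Y$ plus the negative weight inside each part, and that the signed ratio cut scales it by $1/|X| + 1/|Y|$ to favor balanced partitions. The function below is a minimal sketch of that reading (it is not the `sg.estimate_ncluster` code, which is not shown). On hypothetical balanced factions, the correct split costs 0 while a wrong split is penalized.

```python
import numpy as np

def signed_ratio_cut(W, in_x):
    """SignedRatioCut(X, Y) for the split given by the boolean mask in_x (Y is the rest)."""
    in_y = ~in_x
    pos = np.clip(W, 0, None)      # positive part of W
    neg = np.clip(-W, 0, None)     # magnitude of the negative part of W
    # Friendly links cut between X and Y, plus hostile links kept inside X and inside Y
    signed_cut = (2 * pos[np.ix_(in_x, in_y)].sum()
                  + neg[np.ix_(in_x, in_x)].sum()
                  + neg[np.ix_(in_y, in_y)].sum())
    return signed_cut * (1.0 / in_x.sum() + 1.0 / in_y.sum())

# Hypothetical balanced graph: factions {0, 1, 2} and {3, 4, 5}
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
for i, j in [(2, 3), (0, 5)]:
    W[i, j] = W[j, i] = -1.0

good_split = np.array([True, True, True, False, False, False])
bad_split = np.array([True, True, False, True, False, False])
print(signed_ratio_cut(W, good_split), signed_ratio_cut(W, bad_split))
```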

In [None]:
groups_clustering = {}

for i, cluster in enumerate(clusters):
    print('Cluster length: {}\nNames: {}\n'.format(len(cluster), groups[cluster]))
    # Extract the adjacency matrix of this specific cluster
    W = sg.get_w_signed(W_coop, W_comp, cluster, binary=False)
    # Compute the normalized signed Laplacian
    L = sg.compute_L(W, normalized=True)
    # Get estimate of number of clusters
    ncluster = sg.estimate_ncluster(W, L, plotloss=False)
    
    # Get estimated clustering
    est_cgt = sg.get_clusters(L, ncluster)
    # Save results as dictionary
    groups_clustering[i] = {'W': W, 'ids': cluster, 'ncluster': ncluster, 'est_cgt': est_cgt}

Here we display, on the left, the unclustered graph (connected groups), laid out using the first two non-null eigenvectors of the Laplacian matrix, and on the right, the clustered graph. Node colors represent the cluster each group belongs to, and link colors describe the relationship (blue = friend, red = enemy).

In [None]:
plt.figure(figsize=(16, 12*len(groups_clustering)))

for i, item in enumerate(groups_clustering):
    ax = plt.subplot(len(groups_clustering), 1, i+1)
    ax.set_title('Clustered (N={}) and organized graph'.format(groups_clustering[item]['ncluster']), fontsize=14)
    sg.draw_graph(groups_clustering[item]['W'], 
                  groups_clustering[item]['est_cgt'], 
                  labels=groups[groups_clustering[item]['ids']], ax=ax, reorder=True, offset=4e-1)
    

## 3.3 Groups territories <a id='terr_groups'></a>

### 3.3.0 Graph Signal Processing (GSP) and Heat Kernel 

Here we focus on the Middle East region. We have reports on local attacks, but it is difficult to determine the zone of influence (territory) of each group: there is no clear boundary, since some groups commit attacks in several countries.

Here we propose a method to estimate the territories. We use a graph representation, since it keeps the structure of the information and does not require interpolation \[3\]. Each point (node) of the graph represents the location of at least one terrorist attack. Nodes are connected using kNN with k=20 (an arbitrary choice). We use local graph normalization \[2\] to be independent of the point density, so that city areas have the same importance as countryside areas. The signal on the graph is an estimate of the intensity of the attacks.
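The graph construction can be sketched as follows, assuming it resembles the self-tuning weights of [2]: each node $i$ gets a local scale $\sigma_i$ equal to the distance to its $k$-th nearest neighbor, and an edge between kNN neighbors $i$ and $j$ gets the weight $\exp(-d_{ij}^2 / (\sigma_i \sigma_j))$. The function `knn_graph_self_tuning` is a hypothetical stand-in for `heat_gsp.knn_graph`, whose code is not shown; it treats latitude/longitude as planar coordinates and uses a brute-force distance matrix, which is fine for a few thousand points but not for very large graphs.

```python
import numpy as np
from scipy.sparse import csr_matrix

def knn_graph_self_tuning(coords, k=20):
    """Hypothetical kNN graph with locally scaled Gaussian weights (Zelnik-Manor & Perona)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Brute-force pairwise Euclidean distances (lat/lon treated as planar coordinates)
    dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    # k nearest neighbors of each node; column 0 is the node itself (assumes unique coordinates)
    nn = np.argsort(dist, axis=1)[:, 1:k + 1]
    # Local scale: distance to the k-th nearest neighbor
    sigma = dist[np.arange(n), nn[:, -1]]
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    weights = np.exp(-dist[rows, cols] ** 2 / (sigma[rows] * sigma[cols]))
    W = csr_matrix((weights, (rows, cols)), shape=(n, n))
    # kNN is not symmetric: keep an edge if either endpoint selected the other
    return W.maximum(W.T)

# Usage on random hypothetical points
rng = np.random.default_rng(0)
points = rng.uniform(size=(50, 2))
W = knn_graph_self_tuning(points, k=5)
```

Dividing by $\sigma_i \sigma_j$ is what makes the graph density independent: in a dense city the local scales are small, so nearby attacks are still connected with weights comparable to those of distant countryside attacks.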

Each group has its own signal as:

$$ f^{\text{group}}_i = \log(1+N_i) + \log\Big(1 + \sum_{k=1}^{N_i} d_k\Big) $$

where $N_i$ is the number of attacks performed by the group at node $i$, and $\sum_{k=1}^{N_i} d_k$ is the total number of deaths caused by those attacks ($d_k$ being the number of deaths of the $k$-th attack). We use logarithmic values because an attack with 10 deaths is not 10 times more important than an attack with a single casualty; this also limits the influence of outliers that would otherwise bias our data. The second step is to use the heat kernel $\hat{g}(\lambda_l) = e^{-t \lambda_l}$, as described in \[3\], to make the signal spread along the graph. We choose a large value, $t=100$, to allow the signal to spread properly and obtain smooth results.
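The per-node signal $f^{group}_i$ of one group can be sketched with a pandas aggregation over the `id_loc` and `nkill` columns; the attack records below are hypothetical, and this is only an illustration of the formula, not the code of `heat_gsp.get_signal_attack`.

```python
import numpy as np
import pandas as pd

# Hypothetical attacks of one group: two at location 'a', one at location 'b'
attacks = pd.DataFrame({
    'id_loc': ['a', 'a', 'b'],
    'nkill':  [3.0, np.nan, 0.0],   # NaN = unknown number of deaths
})

# N_i = number of attacks at node i, deaths = total deaths at node i (NaN ignored by sum)
per_node = attacks.groupby('id_loc').agg(N=('nkill', 'size'), deaths=('nkill', 'sum'))
signal = np.log1p(per_node['N']) + np.log1p(per_node['deaths'])
print(signal)
```

`np.log1p(x)` computes $\log(1+x)$, so node `a` gets $\log 3 + \log 4$ and node `b` gets $\log 2 + \log 1$.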

A node is considered to "belong" to group A if the signal of group A at this node is the highest among all groups. Note that there is a lower threshold of $5 \cdot 10^{-2}$ below which a node does not belong to any group.
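The assignment rule can be sketched on a toy matrix of diffused signals (rows are nodes, columns are groups; the values are made up). Prepending a constant column equal to the threshold means that an `argmax` of 0 stands for "no group", while an `argmax` of $k \geq 1$ stands for column $k-1$ of the original signal matrix.

```python
import numpy as np

threshold = 5e-2
# Hypothetical diffused signals for 3 nodes and 2 groups
signal_heat = np.array([[0.30, 0.10],    # node 0: group 0 dominates
                        [0.02, 0.04],    # node 1: both below the threshold
                        [0.20, 0.60]])   # node 2: group 1 dominates

# Column 0 holds the threshold, so argmax == 0 means "no group"
with_thresh = np.concatenate((threshold * np.ones((signal_heat.shape[0], 1)), signal_heat), axis=1)
id_groups = np.argmax(with_thresh, axis=1)
print(id_groups)  # index k >= 1 refers to group k - 1
```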


### 3.3.1 Practical case and estimation 


For this example, we only consider the Middle East & North Africa region.

In [None]:
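# Take only Middle East & North Africa points (copy to avoid modifying a view of df)
df_ME = df.loc[df.region_txt == 'Middle East & North Africa'].copy()
# Unique location identifier built from the coordinates
df_ME['id_loc'] = df_ME.latitude.astype(str) + '_' + df_ME.longitude.astype(str)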

As explained in the introduction of this section, we take the locations of terrorist attacks as the nodes of the graph. We remove duplicate locations, since each location must appear only once as a node.

In [None]:
# Drop duplicates to build map
nodes = df_ME.drop_duplicates(subset='id_loc')
nodes = nodes.reset_index().reset_index()
nodes = nodes[['latitude', 'longitude', 'id_loc', 'index']].set_index('id_loc')
W = heat_gsp.knn_graph(nodes)
G = graphs.Graph(W)
G.estimate_lmax()
print('Are nodes unique: {}'.format(nodes.index.is_unique))
print('Adjacency matrix shape: {}'.format(W.shape))

We also discard groups with 10 attacks or fewer, which leaves a bit more than 100 groups.

In [None]:
np.random.seed(0)
frequ_min = 10
cls_groups = pd.Series(df_ME[['gname', 'gname2', 'gname3']].values.flatten()).value_counts()
cls_groups = cls_groups[cls_groups.values > frequ_min].index.values
# Shuffle the group order (seeded above for reproducibility)
cls_groups = np.random.permutation(cls_groups)
print('Number of groups with more than {} attacks: {}'.format(frequ_min, len(cls_groups)))

Here we split the time range into 10 periods. The periods do not have the same duration: we split the dates such that each interval contains the same number of attacks.
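The difference between equal-count and equal-duration periods can be seen on hypothetical dates. Picking boundaries by row position, as done below with `np.linspace`, only yields equal-count periods if the rows are sorted chronologically; the sketch sorts the dates explicitly for that reason.

```python
import numpy as np
import pandas as pd

# Hypothetical attack dates: few attacks early on, many in recent years
dates = pd.to_datetime(['1975-01-01', '1990-06-01', '2010-01-01', '2014-01-01',
                        '2014-06-01', '2015-01-01', '2015-06-01', '2016-01-01',
                        '2016-06-01'])
dates = np.sort(dates.values)

n_time_interval = 4
# Boundaries at evenly spaced row positions: each period holds about the same number of attacks
positions = np.linspace(0, len(dates) - 1, n_time_interval + 1).astype(int)
boundaries = dates[positions]
print(boundaries)
```

The first period spans 35 years while the last ones span a single year, which is the intended behavior when recent years concentrate most of the attacks.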

In [None]:
n_time_interval = 10
intervals = np.linspace(0, len(df_ME)-1, n_time_interval+1).astype(int)
dates = df_ME.iloc[intervals].date.values

We compute the signal on the graph for each group, as described above, for the last time interval (`dates[9]` to `dates[10]`).

In [None]:
# Attacks of the last time interval
df_time_interval = df_ME.loc[np.logical_and(df_ME.date >= dates[9], df_ME.date <= dates[10])]
signal = heat_gsp.get_signal_attack(df_time_interval, nodes, cls_groups)

Here we can see the attacks of ISIS for the period concerned. There are clearly a lot of black points (no ISIS attacks), and it is difficult to estimate from this plot the region where ISIS is based.

In [None]:
id_ISIS = np.where(cls_groups == 'Islamic State of Iraq and the Levant (ISIL)')[0][0]
heat_gsp.plot_map_group(nodes, signal[:, id_ISIS], title='Attacks performed by {}'.format(cls_groups[id_ISIS]))

Now we make our signal spread, and we can clearly distinguish the zone that ISIS controls. As expected, it is centered in northern Iraq and eastern Syria, where the group is mainly based.

In [None]:
g_heat = filters.Heat(G, tau=100)
signal_heat = g_heat.filter(signal, method='chebyshev')
heat_gsp.plot_map_group(nodes, signal_heat[:,id_ISIS], 
                        title='Attacks performed by {} - Spread'.format(cls_groups[id_ISIS]))
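What the heat filter does can be sketched with an exact eigendecomposition on a tiny hypothetical path graph: the signal is projected on the Laplacian eigenvectors, each component is damped by $e^{-t\lambda_l}$, and the result is projected back. PyGSP's `filters.Heat` avoids the full eigendecomposition with a Chebyshev polynomial approximation (`method='chebyshev'`) and, to our understanding, defines its kernel relative to $\lambda_{max}$, which `G.estimate_lmax()` provides; the sketch below uses the plain kernel $e^{-t\lambda_l}$ and an arbitrary $t$.

```python
import numpy as np

def heat_diffusion(W, f, t):
    """Exact heat diffusion exp(-t L) f with the combinatorial Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)
    # Damp each graph frequency by exp(-t * lambda_l), then go back to the node domain
    return U @ (np.exp(-t * lam) * (U.T @ f))

# Hypothetical path graph 0 - 1 - 2 - 3 - 4 with one attack at node 0
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
f = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

f_heat = heat_diffusion(W, f, t=1.0)
print(np.round(f_heat, 3))
```

The initial spike spreads to the neighboring nodes with values decreasing along the path, while the total amount of signal is preserved; this is why isolated attacks turn into a smooth zone of influence.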

We do the same for each group and, for each node, keep the group with the highest value. In the end, we get this approximation of the territories for late 2016. We can distinguish the main groups, for example ISIS and the PKK. In Israel we can see Hezbollah, in Egypt the Muslim Brotherhood, etc.

In [None]:
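# Prepend a constant column equal to the threshold: argmax == 0 means "no group", argmax == k means cls_groups[k-1]
signal_heat_thresh = np.concatenate((5e-2*np.ones((signal_heat.shape[0], 1)), signal_heat), axis=1)
id_groups = np.argmax(signal_heat_thresh, axis=1)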

In [None]:
heat_gsp.plot_map_cls(nodes, id_groups, cls_groups, 'Groups repartition in late 2016')

---

# 4. Events that marked the world   <a id='events_world'></a>

In [None]:
from collections import OrderedDict

def get_actor_details(df, gname):
    # Rows where the group appears as one of the (up to three) perpetrators
    group_entries = (df['gname'] == gname) | (df['gname2'] == gname) | (df['gname3'] == gname)
    # Extract number of attacks, number of casualties, most used weapon type and most frequent country
    frequ = np.sum(group_entries)
    n_death = df.loc[group_entries, 'nkill'].fillna(0).sum()
    name_weapon = df.loc[group_entries, 'weaptype1_txt'].value_counts().index[0]
    country = df.loc[group_entries, 'country_txt'].value_counts().index[0]
    return frequ, n_death, name_weapon, country

def get_main_actors(df):
    groups = pd.Series(df[['gname', 'gname2', 'gname3']].values.ravel('K')).value_counts().index.values
    df_details = pd.DataFrame(index = groups,  
                              data = OrderedDict(( ('frequ', np.nan), ('nkill', np.nan), 
                                                   ('weapon', np.nan), ('country', np.nan) )) )
    for i, g in enumerate(groups):
        df_details.iloc[i] = get_actor_details(df, g)
        
    return df_details.sort_values('frequ', ascending=False)

Here we build on the previous results on attacks around the world (region-wise) and arbitrarily set a threshold to highlight the most critical periods. For each region, we take the peak value and flag the years whose value reaches $0.8 \times \text{max value}$. This threshold is never lower than 0.2, which eliminates small events. We therefore get critical periods for each region; in the next part, we take a look at those periods to identify their causes.
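The thresholding rule can be reproduced on a small made-up table shaped like `df_reg_year` (regions as rows, years as columns): a year is flagged when its value reaches 80% of the region's peak, with the threshold floored at 0.2.

```python
import pandas as pd

# Hypothetical normalized attack counts per region (rows) and year (columns)
df_reg_year = pd.DataFrame({1970: [0.90, 0.05], 1971: [0.75, 0.10], 1972: [0.30, 0.12]},
                           index=['Region A', 'Region B'])

thresh = 0.8 * df_reg_year.max(axis=1)   # 80% of each region's peak
thresh[thresh < 0.20] = 0.2              # floor: a region whose peak is below 0.2 is never flagged
flags = df_reg_year.subtract(thresh, axis=0) >= 0
print(flags)
```

Region A gets its two peak years flagged, while Region B, whose values all stay below 0.2, gets none.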

In [None]:
thresh = 0.8*df_reg_year.max(axis=1)
thresh[thresh < 0.20] = 0.2
thresh = df_reg_year.subtract(thresh, axis=0) >= 0

plt.figure(figsize=(14,4))
sns.heatmap(thresh); plt.title('Thresholded events', fontsize='12', fontweight='bold')

## 4.1 North America Bombings (1970)   <a id='NAB_1970'></a>

http://time.com/4501670/bombings-of-america-burrough/

The first very troubled period takes place in the early 70s, in North America. After some research, we found that at that time there was an intense bombing activity across the whole US territory, conducted by several groups, mostly left-wing extremists, whose goal was “to damage symbols of American power, such as empty courthouses and university buildings, a Pentagon bathroom, the U.S. Capitol”. Although very violent, with more than 2500 explosions recorded by the FBI in 18 months, this dark era of American history is nearly forgotten. <br>
Our exploration of the dataset confirms the information found online. Among the five most active terror groups in the US in 1970, most were left-wing extremists (Left-Wing Militants, Student Radicals, Strikers...), and the most used means of attack was explosives. <br> 
The maps displayed below show respectively the attack frequency and the number of casualties. The first map confirms that a large number of attacks were registered in 1970; however, the second map shows that despite all those attacks, only a fraction of them were lethal.

In [None]:
df_NA_1970 = df.loc[df.region_txt=='North America']
df_NA_1970 = df_NA_1970.loc[np.logical_and(df_NA_1970.iyear >= 1970, df_NA_1970.iyear <= 1970)]

In [None]:
m_frequ = get_heatmap(df_NA_1970, coord=[35., -95.], zoom=4, min_opacity=0.9, blur=5)
m_kill = get_heatmap(df_NA_1970, val='nkill', coord=[35., -95.], zoom=4, min_opacity=0.9, blur=5)

display(HTML('<h3>{}</h3>'.format('Group details'))); display(get_main_actors(df_NA_1970).head(5))
display(HTML('<h3>{}</h3>'.format('Attack frequencies'))); display(m_frequ)
display(HTML('<h3>{}</h3>'.format('Attack casualties'))); display(m_kill)

## 4.2 Northern Ireland religious conflict (1972-1973)    <a id='EU_1972'></a>

https://en.wikipedia.org/wiki/The_Troubles 

The second focus zone brings us to Europe, more precisely to Northern Ireland, where an ethno-nationalist conflict was raging between Irish nationalists (republicans), who wanted a united Ireland, and unionists (loyalists), who wanted Northern Ireland to remain part of the United Kingdom. The conflict was plagued by terror actions conducted by the Irish Republican Army (IRA), mainly in Northern Ireland but also in the rest of the United Kingdom. A ceasefire was declared at the end of 1974. <br>
During this period, the most active group in Europe was the IRA, and the most used means of attack were firearms, followed by explosives. Although this conflict could be seen as a civil war, the IRA was categorized as a terrorist group because of its actions against a sovereign state (the United Kingdom). <br>
The maps clearly show an intense terror activity in the region of the conflict, as well as a large number of casualties on the casualty map.

In [None]:
df_EU_1972 = df.loc[df.region_txt=='Western Europe']
df_EU_1972 = df_EU_1972.loc[np.logical_and(df_EU_1972.iyear >= 1972, df_EU_1972.iyear <= 1973)]

In [None]:
m_frequ = get_heatmap(df_EU_1972, coord=[50., -5.], zoom=5, min_opacity=0.5, blur=1)
m_kill = get_heatmap(df_EU_1972, val='nkill', coord=[50., -5.], zoom=5, min_opacity=0.5, blur=1)

display(HTML('<h3>{}</h3>'.format('Group details'))); display(get_main_actors(df_EU_1972).head(5))
display(HTML('<h3>{}</h3>'.format('Attack frequencies'))); display(m_frequ)
display(HTML('<h3>{}</h3>'.format('Attack casualties'))); display(m_kill)

## 4.3 Northern Ireland and Basque Country (1975-1977)  <a id='EU_1975'></a>

For this third focus, we see two activity peaks in Europe. The first one is the continuation of the Northern Ireland conflict: after a year of ceasefire, the number of attacks rose again, and the IRA once more became the most active terrorist group in Europe.

The second activity peak visible on the map is located in Spain. After Franco's death in 1975, and during its transition to democracy, Spain had to face renewed activity from regional separatist organisations. We see two regions of terrorist activity in Spain: Catalonia and the Basque Country. The activity in the Basque Country is clearly more intense: the separatist group ETA conducted numerous terror attacks to claim the independence of its region. The casualty map shows that the group was not only targeting infrastructure but was also killing many people.

During this period, the most used means of attack were explosives, firearms and incendiaries.


In [None]:
df_EU_1975 = df.loc[df.region_txt=='Western Europe']
df_EU_1975 = df_EU_1975.loc[np.logical_and(df_EU_1975.iyear >= 1975, df_EU_1975.iyear <= 1977)]

In [None]:
m_frequ = get_heatmap(df_EU_1975, coord=[50., -5.], zoom=5, min_opacity=0.5, blur=1)
m_kill = get_heatmap(df_EU_1975, val='nkill', coord=[50., -5.], zoom=5, min_opacity=0.5, blur=1)

display(HTML('<h3>{}</h3>'.format('Group details'))); display(get_main_actors(df_EU_1975).head(5))
display(HTML('<h3>{}</h3>'.format('Attack frequencies'))); display(m_frequ)
display(HTML('<h3>{}</h3>'.format('Attack casualties'))); display(m_kill)

## 4.4 Salvadoran Civil War and Guatemalan Civil War  (1981-1983)    <a id='CA_1981'></a>

https://en.wikipedia.org/wiki/Salvadoran_Civil_War
https://en.wikipedia.org/wiki/Guatemalan_Civil_War

The fourth region and period of focus brings us to Central America, during the deadly years of civil war in El Salvador and Guatemala from 1981 to 1983.
The principal belligerents of the first war were the Salvadoran government and the Farabundo Martí National Liberation Front (FMLN). <br>
The Nicaraguan Democratic Force also stands out among the main actors of this period; note, however, that it was a Contra group fighting the Sandinista government of Nicaragua, not the Guatemalan government.

Since these were armed conflicts, the most used weapons for terror attacks were firearms, followed by explosives.

In [None]:
df_CA_1981 = df.loc[df.region_txt=='Central America & Caribbean']
df_CA_1981 = df_CA_1981.loc[np.logical_and(df_CA_1981.iyear >= 1981, df_CA_1981.iyear <= 1983)]

In [None]:
m_frequ = get_heatmap(df_CA_1981, coord=[15., -85.], zoom=5)
m_kill = get_heatmap(df_CA_1981, val='nkill', coord=[15., -85.], zoom=5)

display(HTML('<h3>{}</h3>'.format('Group details'))); display(get_main_actors(df_CA_1981).head(5))
display(HTML('<h3>{}</h3>'.format('Attack frequencies'))); display(m_frequ)
display(HTML('<h3>{}</h3>'.format('Attack casualties'))); display(m_kill)

## 4.5 South America Conflicts (1984-1987)    <a id='SA_1984'></a>

During the 80s, South America was hit by multiple conflicts. On the first map, we can see three main zones: Colombia, Peru and Chile.

1. Colombia: this conflict period was the result of the war between the narcotraffickers and the police. The names of Pablo Escobar and the Medellín Cartel are still known today. (More information: [Colombian conflict](https://en.wikipedia.org/wiki/Colombian_conflict#1980s)).
2. Peru: [the Shining Path](https://en.wikipedia.org/wiki/Shining_Path) was a communist militant group that wanted to establish a dictatorship of the proletariat (communist ideology), including a [cultural revolution](https://en.wikipedia.org/wiki/Shining_Path). We can observe that they caused a large number of casualties.

In [None]:
df_SA_1984 = df.loc[df.region_txt=='South America']
df_SA_1984 = df_SA_1984.loc[np.logical_and(df_SA_1984.iyear >= 1984, df_SA_1984.iyear <= 1987)]

In [None]:
m_frequ = get_heatmap(df_SA_1984, coord=[-20., -60.], zoom=3, min_opacity=0.3, blur=1)
m_kill = get_heatmap(df_SA_1984, val='nkill', coord=[-20., -60.], zoom=3, min_opacity=0.3, blur=1)

display(HTML('<h3>{}</h3>'.format('Group details'))); display(get_main_actors(df_SA_1984).head(5))
display(HTML('<h3>{}</h3>'.format('Attack frequencies'))); display(m_frequ)
display(HTML('<h3>{}</h3>'.format('Attack casualties'))); display(m_kill)

## 4.6 Middle East (2003-2007)  <a id='ME_2003'></a>

Here we can see multiple zones of conflict. We mainly have Algeria and the [Algerian Islamic Extremists](https://en.wikipedia.org/wiki/Armed_Islamic_Group_of_Algeria). We can also see both the Israeli-Palestinian conflict and the emergence of Al-Qaida in Iraq. Note that the attacks were mainly carried out with explosives and bombs.

In [None]:
df_ME_2003 = df.loc[df.region_txt=='Middle East & North Africa']
df_ME_2003 = df_ME_2003.loc[np.logical_and(df_ME_2003.iyear >= 2003, df_ME_2003.iyear <= 2007)]

In [None]:
m_frequ = get_heatmap(df_ME_2003, coord=[32., 25.], zoom=4, min_opacity=0.5, blur=2)
m_kill = get_heatmap(df_ME_2003, val='nkill', coord=[32., 25.], zoom=4, min_opacity=0.5, blur=2)

display(HTML('<h3>{}</h3>'.format('Group details'))); display(get_main_actors(df_ME_2003).head(5))
display(HTML('<h3>{}</h3>'.format('Attack frequencies'))); display(m_frequ)
display(HTML('<h3>{}</h3>'.format('Attack casualties'))); display(m_kill)

## 4.7 South Asia (2008-2013)  <a id='ME_2008'></a>

The first group highlighted is the Taliban in Afghanistan, with a huge number of casualties. As in the previous Middle East results, most attacks used bombs and explosives. Next comes the communist party in India, which mainly uses firearms. Note that it has the highest ratio between the number of attacks and the number of casualties.

In [None]:
df_SAsia_2008 = df.loc[df.region_txt=='South Asia']
df_SAsia_2008 = df_SAsia_2008.loc[np.logical_and(df_SAsia_2008.iyear >= 2008, df_SAsia_2008.iyear <= 2013)]

In [None]:
m_frequ = get_heatmap(df_SAsia_2008, coord=[20., 80.], zoom=4, min_opacity=0.3, blur=1)
m_kill = get_heatmap(df_SAsia_2008, val='nkill', coord=[20., 80.], zoom=4, min_opacity=0.3, blur=1)

display(HTML('<h3>{}</h3>'.format('Group details'))); display(get_main_actors(df_SAsia_2008).head(5))
display(HTML('<h3>{}</h3>'.format('Attack frequencies'))); display(m_frequ)
display(HTML('<h3>{}</h3>'.format('Attack casualties'))); display(m_kill)

## 4.8 Middle East (2013-Today)    <a id='ME_2013'></a>

Here we can see that we are no longer mainly dealing with Al-Qaida: ISIS has clearly overtaken Al-Qaida in terms of territorial domination. We can also see a rising problem in Yemen with a branch of Al-Qaida. It is also interesting to note that the means of attack changed in Palestine: there are now more attacks with melee weapons (probably mainly knives).

In [None]:
df_ME_2013 = df.loc[df.region_txt=='Middle East & North Africa']
df_ME_2013 = df_ME_2013.loc[np.logical_and(df_ME_2013.iyear >= 2013, df_ME_2013.iyear <= 2017)]

In [None]:
m_frequ = get_heatmap(df_ME_2013, coord=[32., 25.], zoom=4, min_opacity=0.5, blur=2)
m_kill = get_heatmap(df_ME_2013, val='nkill', coord=[32., 25.], zoom=4, min_opacity=0.5, blur=2)

display(HTML('<h3>{}</h3>'.format('Group details'))); display(get_main_actors(df_ME_2013).head(5))
display(HTML('<h3>{}</h3>'.format('Attack frequencies'))); display(m_frequ)
display(HTML('<h3>{}</h3>'.format('Attack casualties'))); display(m_kill)

# 5. References <a id='references'></a>

\[1\] J. Kunegis, S. Schmidt and A. Lommatzsch, [Spectral Analysis of Signed Graphs for Clustering, Prediction and Visualization](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.155.1809&rep=rep1&type=pdf)

\[2\] L. Zelnik-Manor and P. Perona, [Self-Tuning Spectral Clustering](http://papers.nips.cc/paper/2619-self-tuning-spectral-clustering.pdf)

\[3\] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega and P. Vandergheynst, [The Emerging Field of Signal Processing on Graphs](https://arxiv.org/pdf/1211.0053.pdf)