# Table of contents
1. [Introduction](#Introduction)
2. [Import data](#Import-data)
3. [Terrorist attacks by latitude/longitude](#Terrorist-attacks-by-latitude/longitude)
4. [Terrorist attacks for years](#Terrorist-attacks-for-years)
5. [Terrorist attacks per holidays](#Terrorist-attacks-per-holidays)
6. [Terrorist attacks per weekday](#Terrorist-attacks-per-weekday)
7. [Terrorist attacks by target](#Terrorist-attacks-by-target)
8. [Terrorist attacks by weapon](#Terrorist-attacks-by-weapon)
9. [Terrorist attacks to cities](#Terrorist-attacks-to-cities)
10. [Terrorist attacks per Moscow district's](#Terrorist-attacks-per-Moscow-district's)



# Introduction

The **aim** of this work was to learn the Plotly library and to practise drawing graphs and diagrams.
The inspiration came from this
**[kernel](https://www.kaggle.com/abigaillarion/terrorist-attacks-in-united-states) (by [Abigail Larion](https://www.kaggle.com/abigaillarion))**, so I decided to carry out a similar analysis for Russia.

Let's begin.

In [1]:
import numpy as np
import pandas as pd
import urllib.request
import time

from datetime import datetime, date

from IPython.display import display, HTML
# import plotly.plotly as py
import plotly.graph_objs as go

from plotly import tools
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

import math

import ast
import string
import json

pd.options.mode.chained_assignment = None
mapbox_access_token = 'pk.eyJ1IjoibWFrcy1zaCIsImEiOiJjaXpzaTJ1dXQwMDR6MnhvMWtzejBoZWZsIn0.Ih8a2l8sQDnkMzEXuuPJKA'

# Import data

In [2]:
terror_data = pd.read_csv('data/globalterrorismdb_0616dist.csv', encoding='ISO-8859-1',
                          usecols=[0, 1, 2, 3, 8, 11, 12, 13, 14, 26, 29, 35, 84, 100, 103])
terror_data = terror_data.rename(
    columns={'eventid':'id', 'iyear':'year', 'imonth':'month', 'iday':'day',
             'country_txt':'country', 'provstate':'state', 'targtype1_txt':'target',
             'weaptype1_txt': 'weapon', 'attacktype1_txt': 'attacktype', 'nkill':'fatalities', 
             'nwound':'injuries', 'addnotes': 'info'})
terror_data['fatalities'] = terror_data['fatalities'].fillna(0).astype(int)
terror_data['injuries'] = terror_data['injuries'].fillna(0).astype(int)
terror_data = terror_data.dropna(how='any', subset=['latitude', 'longitude'])
print('Matrix size:', terror_data.shape)

Matrix size: (152253, 15)
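
The cleaning steps in the cell above can be tried on a small frame first. The sketch below uses made-up rows (the `toy` frame and its values are assumptions, not GTD data) and applies the same rename, `fillna(0)` and `dropna` logic. Note that filling unknown casualty counts with zero treats "unknown" as "none", which may undercount victims.

```python
import pandas as pd

# Toy rows standing in for the GTD extract; the values are made up.
toy = pd.DataFrame({
    'nkill': [2.0, None, 0.0],
    'nwound': [None, 5.0, 1.0],
    'latitude': [43.3, None, 55.7],
    'longitude': [45.7, 47.5, 37.6],
})
toy = toy.rename(columns={'nkill': 'fatalities', 'nwound': 'injuries'})

# Unknown casualty counts are treated as zero, as in the cell above.
for col in ['fatalities', 'injuries']:
    toy[col] = toy[col].fillna(0).astype(int)

# Rows without coordinates cannot be drawn on a map, so they are dropped.
toy = toy.dropna(how='any', subset=['latitude', 'longitude'])
print(toy)
```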


Let's extract data about Russia and make some adjustments.

In [3]:
terror_ru = terror_data[terror_data['country'] == 'Russia']

terror_ru.loc[:, 'day'] = terror_ru\
        .apply(lambda row: str(row['id'])[6:8] if row['day'] == 0 else row['day'], axis=1)
terror_ru.loc[:, 'date'] = pd.to_datetime(terror_ru[['day', 'month', 'year']])

terror_ru = terror_ru.drop_duplicates(['date', 'latitude', 'longitude', 'fatalities'])
terror_ru['text'] = terror_ru['date'].dt.strftime('%B %-d, %Y') + '<br>' +\
                     terror_ru['fatalities'].astype(str) + ' Killed, ' +\
                     terror_ru['injuries'].astype(str) + ' Injured'

terror_peryear = np.asarray(terror_ru.groupby('year').year.count())
terror_years = np.arange(1992, 2016)
terror_years = np.delete(terror_years, [1])        
        
terror_ru.head(3)        

Unnamed: 0,id,year,month,day,country,state,city,latitude,longitude,success,attacktype,target,weapon,fatalities,injuries,date,text
49616,199201060001,1992,1,6,Russia,,Grozny,43.316667,45.683333,0,Assassination,Military,Firearms,0,0,1992-01-06,"January 6, 1992<br>0 Killed, 0 Injured"
49626,199201060012,1992,1,6,Russia,,Grozny,43.316667,45.683333,1,Armed Assault,Military,Firearms,1,0,1992-01-06,"January 6, 1992<br>1 Killed, 0 Injured"
49756,199201170003,1992,1,17,Russia,,Makhachkala,42.978368,47.491066,0,Assassination,Government (General),Explosives/Bombs/Dynamite,0,1,1992-01-17,"January 17, 1992<br>0 Killed, 1 Injured"
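
The date repair in the cell above relies on the event id encoding the date as `yyyymmdd` followed by a sequence number. A minimal sketch of the same idea on hypothetical events (the `toy` frame is an assumption) is shown here; it uses `Series.where` instead of a row-wise `apply` and converts the day to an integer before assembling the timestamp.

```python
import pandas as pd

# Hypothetical events: the second one has an unknown day (0) in the `day` column.
toy = pd.DataFrame({
    'id': [199201060001, 199503150002],
    'year': [1992, 1995],
    'month': [1, 3],
    'day': [6, 0],
})

# Characters 6-7 of the event id encode the day; use them when `day` is 0.
day_from_id = toy['id'].astype(str).str[6:8].astype(int)
toy['day'] = toy['day'].where(toy['day'] != 0, day_from_id)

# to_datetime assembles a timestamp from columns named year, month and day.
toy['date'] = pd.to_datetime(toy[['year', 'month', 'day']])
print(toy['date'].dt.strftime('%Y-%m-%d').tolist())
```

If an id also carried `00` as its day, `pd.to_datetime` would raise; passing `errors='coerce'` is one way to turn such rows into `NaT` instead.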


The preview already shows that some values are missing (N/A). Let's take a closer look at the missing data.

In [4]:
features = terror_ru.columns.tolist()

trace1 = go.Bar(
    x=features,
    y=[terror_ru[feature].describe()['count'] for feature in features],
    name='Observed Data',
    marker=dict(
        color='#009999'
    )    
)
trace2 = go.Bar(
    x=features,
    y=[terror_ru.shape[0] - terror_ru[feature].describe()['count'] for feature in features],
    name='Missing Data',
    marker=dict(
        color='#BF3030'
    )    
)

data = [trace1, trace2]
layout = go.Layout(
    title='Missingness Map',
    barmode='relative',
    xaxis=dict(
        title='Feature',
    ),
    yaxis=dict(
        title='Count',
    ),    
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

Thus, so far so good.
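
The observed and missing counts plotted above can also be computed directly with `isna` and `notna`, without calling `describe` for every column. This is a sketch on a made-up frame (`toy` is an assumption), not the notebook's data.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'city': ['Grozny', 'Makhachkala', None],
    'state': [np.nan, np.nan, 'Dagestan'],
    'fatalities': [0, 1, 3],
})

missing = toy.isna().sum()      # missing values per column
observed = toy.notna().sum()    # non-missing values per column
print(pd.DataFrame({'observed': observed, 'missing': missing}))
```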

# Terrorist attacks by latitude/longitude

Let's plot the deadliest terrorist attacks of 2015 (seven or more fatalities) on the globe.

In [5]:
trace1 = go.Scattergeo(
    geo='geo3',
    lon=terror_data[(terror_data['year']==2015) & (terror_data['fatalities']>=7)]['longitude'],
    lat=terror_data[(terror_data['year']==2015) & (terror_data['fatalities']>=7)]['latitude'],
    mode='markers',
    marker=go.Marker(
        size = 3,
        opacity=0.7,
        color='#A60000',
    ),
    text=terror_data[(terror_data['year']==2015) & (terror_data['fatalities']>=7)]['country']
)

data=[trace1]

layout = go.Layout(
    title='Terrorist attacks with 7 or more fatalities in 2015',
    height=700,
    dragmode='zoom',
    geo3=dict(
        projection=dict(
            type='orthographic', 
        ),
        scope='world', 
        showlakes=True,
        showocean=True,
        showland=True,
        showcountries=True,
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

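Now let's look at Russia year by year in more detail by building an animation. Red markers are attacks with fatalities (sized by the number killed); teal markers are attacks without fatalities (sized by the number injured).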

In [6]:
# make figure
figure = {
    'data': [],
    'layout': {
        'title': 'Terrorist Attacks by Latitude/Longitude in Russia (1992-2015)',
        'width': 1000,
        'height': 700,
        'autosize': True,
        'hovermode': 'closest',
        'showlegend': False,
        'mapbox': {
            'accesstoken': mapbox_access_token,
            'bearing': 0,
            'pitch': 0,
            'zoom': 2,
            'style': 'light',
            'center': {
                'lat': 64.25,
                'lon': 94.15
            }
        }
    },
    'frames': [],
    'config': {'scrollzoom': True}
}

# make slider
figure['layout']['slider'] = {
    'args': [
        'slider.value', {
            'duration': 1000,
            'ease': 'cubic-in-out'
        }
    ],
    'initialValue': str(terror_years[0]),
    'plotlycommand': 'animate',
    'values': [str(year) for year in terror_years],
    'visible': True
}


figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 1100, 'redraw': False},
                         'fromcurrent': True, 'transition': {'duration': 1000, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False}, 'mode': 'immediate',
                'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]

sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 900, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}

# make initial data
figure['data'] = go.Data([
    go.Scattermapbox(
        lat=list(terror_ru[(terror_ru.fatalities > 0) & (terror_ru.year == terror_years[0])]['latitude']),
        lon=list(terror_ru[(terror_ru.fatalities > 0) & (terror_ru.year == terror_years[0])]['longitude']),
        mode='markers',
        marker=go.Marker(
            size = list(terror_ru[(terror_ru.fatalities > 0) & (terror_ru.year == terror_years[0])]['fatalities'] ** 0.255 * 8),
            opacity=0.5,
            color='#FF0000'
        ),
        text=list(terror_ru[(terror_ru.fatalities > 0) & (terror_ru.year == terror_years[0])]['text']),
        hoverinfo='text'
    ),
    go.Scattermapbox(
        lat=list(terror_ru[(terror_ru.fatalities == 0) & (terror_ru.year == terror_years[0])]['latitude']),
        lon=list(terror_ru[(terror_ru.fatalities == 0) & (terror_ru.year == terror_years[0])]['longitude']),
        mode='markers',
        marker=go.Marker(
            size = list(terror_ru[(terror_ru.fatalities == 0) & (terror_ru.year == terror_years[0])]['injuries'] ** 0.255 * 8),
            opacity=0.5,
            color='#009999'
        ),
        text=list(terror_ru[(terror_ru.fatalities == 0) & (terror_ru.year == terror_years[0])]['text']),
        hoverinfo='text'
    )]
)

# make frames
for year in terror_years:
    frame = {'data': [], 'name': str(year)}
    data_year = go.Data([
        go.Scattermapbox(
            lat=list(terror_ru[(terror_ru.fatalities > 0) & (terror_ru.year == year)]['latitude']),
            lon=list(terror_ru[(terror_ru.fatalities > 0) & (terror_ru.year == year)]['longitude']),
            mode='markers',
            marker=go.Marker(
                size = list(terror_ru[(terror_ru.fatalities > 0) & (terror_ru.year == year)]['fatalities'] ** 0.255 * 8),
                opacity=0.5,
                color='#FF0000'
            ),
            text=list(terror_ru[(terror_ru.fatalities > 0) & (terror_ru.year == year)]['text']),
            hoverinfo='text'
        ),
        go.Scattermapbox(
            lat=list(terror_ru[(terror_ru.fatalities == 0) & (terror_ru.year == year)]['latitude']),
            lon=list(terror_ru[(terror_ru.fatalities == 0) & (terror_ru.year == year)]['longitude']),
            mode='markers',
            marker=go.Marker(
                size = list(terror_ru[(terror_ru.fatalities == 0) & (terror_ru.year == year)]['injuries'] ** 0.255 * 8),
                opacity=0.5,
                color='#009999'
            ),
            text=list(terror_ru[(terror_ru.fatalities == 0) & (terror_ru.year == year)]['text']),
            hoverinfo='text'
        )
    ])
    frame['data'].extend(data_year)
    figure['frames'].append(frame)

    slider_step = {'args': [
        [str(year)],
        {'frame': {'duration': 900, 'redraw': False},
         'mode': 'immediate',
       'transition': {'duration': 900}}
     ],
     'label': str(year),
     'method': 'animate'}
    sliders_dict['steps'].append(slider_step)
    
figure['layout']['sliders'] = [sliders_dict]

#make plot:
iplot(figure)
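
The animation cell repeats the same filtering for the initial data and for every frame. One possible way to cut the repetition is a small helper that returns both traces for a given year as plain dicts, which Plotly figure dicts generally accept in place of `go.Scattermapbox` objects. The helper name `year_traces` and the `toy` frame are hypothetical.

```python
import pandas as pd

def year_traces(df, year):
    """Return two scattermapbox trace dicts for one year (hypothetical helper)."""
    in_year = df[df['year'] == year]
    groups = [
        (in_year[in_year['fatalities'] > 0], 'fatalities', '#FF0000'),
        (in_year[in_year['fatalities'] == 0], 'injuries', '#009999'),
    ]
    traces = []
    for part, size_col, color in groups:
        traces.append({
            'type': 'scattermapbox',
            'lat': part['latitude'].tolist(),
            'lon': part['longitude'].tolist(),
            'mode': 'markers',
            'marker': {
                # Same scaling as the cell above: size grows slowly with the count.
                'size': (part[size_col] ** 0.255 * 8).tolist(),
                'opacity': 0.5,
                'color': color,
            },
            'text': part['text'].tolist(),
            'hoverinfo': 'text',
        })
    return traces

toy = pd.DataFrame({
    'year': [1992, 1992, 1994],
    'fatalities': [1, 0, 4],
    'injuries': [0, 16, 2],
    'latitude': [43.3, 42.9, 55.7],
    'longitude': [45.7, 47.5, 37.6],
    'text': ['a', 'b', 'c'],
})
frames = [{'name': str(y), 'data': year_traces(toy, y)} for y in [1992, 1994]]
```

With such a helper, the initial data would simply be `year_traces(terror_ru, terror_years[0])`, and the frame loop would call it once per year.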

# Terrorist attacks for years
Let's look at the number of terrorist attacks per year.

In [12]:
trace1 = go.Bar(
    x = terror_years,
    y = terror_peryear.cumsum(),
    name = 'Total number',
    marker=dict(
        color='#009999'
    )
)

trace2 = go.Scatter(
    x = terror_years,
    y = terror_peryear,
    name = 'Per year',
    mode = 'lines+markers',
    marker = dict(
        size = 5,
        symbol = 'diamond',
        color='#BF3030',
    ),
    line = dict(
        width = 2,
        color='#BF3030',
    ),
)


layout = go.Layout(
    title = 'Terrorist Attacks for years in Russia (1992-2015)',
    barmode='group',
    xaxis = dict(
        title = 'Year',
    ),
    yaxis = dict(
        title = 'Number of attacks',
    ),
    legend=dict(
        x=0,
        y=1
    )
)

data = [trace1, trace2]

fig = dict(data = data, layout = layout)
iplot(fig)

The Russian Federation suffered most from terrorism in 2010.
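
The notebook builds the year axis with `np.arange(1992, 2016)` and removes 1993 by position, which only works while exactly that year is absent from the data. A sketch of an alternative, assuming nothing beyond pandas, is to take both the years and the counts from the same `groupby` result so they cannot drift apart. The `toy` frame is made up.

```python
import pandas as pd

# Hypothetical events; 1993 is absent on purpose.
toy = pd.DataFrame({'year': [1992, 1992, 1994, 1995, 1995, 1995]})

per_year = toy.groupby('year').size()   # attacks per year, indexed by year
years = per_year.index.to_numpy()       # only years that actually occur
running_total = per_year.cumsum()       # cumulative number of attacks

print(years.tolist())
print(per_year.tolist())
print(running_total.tolist())
```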

# Terrorist attacks per holidays

Next, let's find out which kinds of days were chosen for attacks: workdays, pre-holidays, or holidays (including weekends).

Unfortunately, I could only find production calendars from 1999 onwards, so this analysis covers 1999 to 2015.

In [25]:
# import calendar
dates = pd.read_csv('data/data-20161107T1038-structure-20161107T1038.csv', engine='python', skiprows=1, skipfooter=4,
                    usecols=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], index_col=0, 
                    names=['year', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12'])
dates.head()

year,1,2,3,4,5,6,7,8,9,10,11,12
1999,"1,2,3,4,6*,7,9,10,16,17,23,24,30,31",67131420212728,678131420212728,"3,4,10,11,17,18,24,25,30*",12348910151622232930,"5,6,11*,12,13,14,19,20,26,27",3410111718242531,178141521222829,45111218192526,23910161723243031,678131420212728,"4,5,11,12,13,18,19,25,26,31*"
2000,"1,2,3,4,6*,7,8,9,15,16,22,23,29,30",56121319202627,"4,5,7*,8,11,12,18,19,25,26",1289151622232930,"1,2,6,7,8*,9,13,14,20,21,27,28",3410111217182425,1289151622232930,56121319202627,239101617232430,178141521222829,457111218192526,"2,3,9,10,11*,12,16,17,23,24,30,31"
2001,12678131420212728,34101117182425,"3,4,7*,8,10,11,17,18,24,25,31","1,7,8,14,15,21,22,28,29,30*","1,2,5,6,8*,9,12,13,19,20,26,27","2,3,9,10,11*,12,16,17,23,24,30",178141521222829,45111218192526,1289151622232930,67131420212728,"3,4,6*,7,10,11,17,18,24,25",128912151622232930
2002,12567121319202627,"2,3,9,10,16,17,22*,23,24,25","2,3,7*,8,9,10,16,17,23,24,30,31","6,7,13,14,20,21,28,30*","1,2,3,4,5,8*,9,10,11,12,19,25,26","1,2,8,9,11*,12,15,16,22,23,29,30",67131420212728,3410111718242531,178141521222829,56121319202627,"2,3,6*,7,8,9,16,17,23,24,30","1,7,8,11*,12,13,14,21,22,28,29,31*"
2003,"1,2,3,5*,6,7,11,12,18,19,25,26",12891516222324,"1,2,7*,8,9,10,15,16,22,23,29,30","5,6,12,13,19,20,26,27,30*","1,2,3,4,8*,9,10,11,17,18,24,25,31","1,7,8,11*,12,13,14,15,22,28,29",56121319202627,23910161723243031,67131420212728,45111218192526,"1,2,6*,7,8,9,15,16,22,23,29,30","6,7,11*,12,13,14,20,21,27,28,31*"


This dataframe lists, for each year and month, the non-working days (holidays and weekends) together with the pre-holidays, which are marked with '*'. Let's separate the pre-holidays from the holidays and weekends.

In [26]:
pre_holidays = []
holidays = []
for year in dates.index:
    for month in dates.columns:
        for day in dates.loc[year, month].split(','):
            if day.endswith('*'):
                pre_holidays.append(datetime(year, int(month), int(day[:len(day)-1]), 0, 0))
            else:
                holidays.append(datetime(year, int(month), int(day), 0, 0))
                
# build a daily calendar covering the analysed period
calendar = pd.DataFrame(data={
    'date': pd.date_range('1/1/1999', '12/31/2015', freq='D'),
})                
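
The parsing loop above can be illustrated on a single calendar cell using only the standard library. The function name `parse_cell` is hypothetical; the input string is the April 1999 cell from the table shown earlier.

```python
from datetime import datetime

def parse_cell(year, month, cell):
    """Split one calendar cell into (non-working days, pre-holidays)."""
    non_working, pre = [], []
    for token in cell.split(','):
        token = token.strip()
        if token.endswith('*'):
            # A trailing '*' marks a pre-holiday; strip it before converting.
            pre.append(datetime(year, month, int(token[:-1])))
        else:
            non_working.append(datetime(year, month, int(token)))
    return non_working, pre

# Cell for April 1999 as shown in the table above.
off, pre = parse_cell(1999, 4, '3,4,10,11,17,18,24,25,30*')
print(len(off), pre)
```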

In [27]:
def determ(date):
    if date in holidays:
        return 'holiday'
    if date in pre_holidays:
        return 'pre-holiday'
    else: return 'workday'

calendar['status'] = calendar['date'].apply(determ)
calendar['year'] = calendar['date'].apply(lambda x: x.year)
calendar = calendar.groupby(['year', 'status']).count().unstack()['date']

calendar.head(3)

year,holiday,pre-holiday,workday
1999,114,4,247
2000,116,4,246
2001,114,5,246
2002,115,8,242
2003,115,8,242
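
The classification above checks each date against Python lists, which means a linear scan per lookup. A sketch of the same logic with sets, where membership tests take constant time on average, could look like this. The dates in `holidays` and `pre_holidays` are hypothetical; in the notebook they would come from the parsed calendar.

```python
from datetime import date, timedelta

# Hypothetical sets; in the notebook they would come from the parsed calendar.
holidays = {date(1999, 1, 1), date(1999, 1, 2), date(1999, 1, 3)}
pre_holidays = {date(1999, 1, 6)}

def day_status(d):
    """Classify a date; set membership avoids scanning a list."""
    if d in holidays:
        return 'holiday'
    if d in pre_holidays:
        return 'pre-holiday'
    return 'workday'

start = date(1999, 1, 1)
week = [start + timedelta(days=i) for i in range(7)]
counts = {}
for d in week:
    status = day_status(d)
    counts[status] = counts.get(status, 0) + 1
print(counts)
```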


In [18]:
terror_ru['holiday'] = terror_ru['date'].apply(lambda x: x.to_pydatetime() in holidays).astype(int)
terror_ru['pre_holiday'] = terror_ru['date'].apply(lambda x: x.to_pydatetime() in pre_holidays).astype(int)

t_years = np.arange(1999, 2016)
t_peryear = np.asarray(terror_ru[terror_ru['year']>1998].groupby('year').year.count())
t_peryear_hol = np.asarray(terror_ru[terror_ru['year']>1998].groupby('year').sum().holiday)
t_peryear_pre_hol = np.asarray(terror_ru[terror_ru['year']>1998].groupby('year').sum().pre_holiday)

In [19]:
trace1 = go.Bar(
    x = t_years,
    y = calendar['pre-holiday'],
    name = 'days',    
    marker=dict(
        color='#009999'
    )
)

trace2 = go.Bar(
    x = t_years,
    y = t_peryear_pre_hol,
    name = 'Number of attacks',
    marker=dict(
        color='#A60000'
    )    
)

trace3 = go.Bar(
    x = t_years,
    y = calendar['holiday'],
    name = 'days',
    marker=dict(
        color='#009999'
    )    
)

trace4 = go.Bar(
    x = t_years,
    y = t_peryear_hol,
    name = 'Number of attacks',
    marker=dict(
        color='#A60000'
    )    
)

trace5 = go.Bar(
    x = t_years,
    y = calendar['workday'],
    name = 'days',
    marker=dict(
        color='#009999'
    )    
)

trace6 = go.Bar(
    x = t_years,
    y = t_peryear - t_peryear_hol - t_peryear_pre_hol,
    name = 'Number of attacks',
    marker=dict(
        color='#A60000'
    )
)

fig = tools.make_subplots(rows=2, cols=2, specs=[[{}, {}], [{'colspan': 2}, None]],
                          subplot_titles=('Pre-holidays',
                                          'Holidays', 
                                          'Workdays'))


fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 1)
fig.append_trace(trace3, 1, 2)
fig.append_trace(trace4, 1, 2)
fig.append_trace(trace5, 2, 1)
fig.append_trace(trace6, 2, 1)

fig['layout']['xaxis1'].update(title='years')
fig['layout']['xaxis2'].update(title='years')
fig['layout']['xaxis3'].update(title='years', showgrid=False)
fig['layout'].update(showlegend=False, height=600, barmode='group',
                     title='Terrorist Attacks by type of the day in Russia (amount) (1999-2015)')

iplot(fig)

This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]
[ (2,1) x3,y3           -      ]



It is more informative, however, to look at the number of attacks relative to the number of days of each type. Let's create a grouped bar chart.

In [20]:
trace1 = go.Bar(
    x = t_years,
    y = (t_peryear - t_peryear_hol - t_peryear_pre_hol) / np.asarray(calendar['workday']),
    name = 'Workdays',
    marker=dict(
        color='#FF0000'
    ) 
)

trace2 = go.Bar(
    x = t_years,
    y = t_peryear_hol / np.asarray(calendar['holiday']),
    name = 'Holidays',
    marker=dict(
        color='#FF7400'
    )    
)

trace3 = go.Bar(
    x = t_years,
    y = t_peryear_pre_hol / np.asarray(calendar['pre-holiday']),
    name = 'Pre-holidays',
    marker=dict(
        color='#CD0074'
    )    
)

data = [trace1, trace2, trace3]

layout = go.Layout(
    title='Terrorist Attacks by type of the day in proportion in Russia<br>(relative indicator) (1999-2015)',
    barmode='group',
    xaxis=dict(
        title='year',
    ),
    yaxis=dict(
        title='ratio of attacks to days',
    ),
    legend=dict(
        x=0,
        y=1
    )
)

figure = dict(data = data, layout = layout)
iplot(figure)
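
The relative indicator plotted above is simply the number of attacks on days of a given type divided by the number of such days in that year. A small sketch with made-up counts (none of these numbers come from the dataset) shows why the normalisation matters: raw counts favour workdays only because there are more of them.

```python
import numpy as np

# Made-up counts for two years: attacks and calendar days per day type.
attacks = {'workday': np.array([120, 90]), 'holiday': np.array([57, 58])}
days = {'workday': np.array([240, 250]), 'holiday': np.array([114, 116])}

# Attacks per calendar day of each type; comparable across day types.
rate = {kind: attacks[kind] / days[kind] for kind in attacks}
print(rate)
```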

# Terrorist attacks per weekday

In [21]:
terror_ru['weekday'] = terror_ru['date'].apply(lambda x: datetime.isoweekday(x))

trace = go.Bar(
    x=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    y=terror_ru.groupby('weekday').weekday.count().tolist(),
    marker=dict(
        color=['#009999'] * 5 + ['#A60000'] * 2
    ),
    
)

layout = go.Layout(
    title = 'Terrorist Attacks by Weekday in Russia (1992-2015)',
    xaxis = dict(
        title = 'Weekday',
    ),
    yaxis = dict(
        title = 'Number of attacks',
    )
)

data = [trace]

figure = dict(data = data, layout = layout)
iplot(figure)

In Russia, terrorist attacks happen least often on Saturdays.
Among the weekdays, most attacks fall on Thursdays and the fewest on Wednesdays.
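
The weekday chart pairs a fixed list of seven labels with the result of a `groupby` count. If some weekday had no attacks at all, the count list would be shorter and the labels would shift. A standard-library sketch that always yields seven values, using hypothetical dates, is given here.

```python
from collections import Counter
from datetime import date

# Hypothetical attack dates.
dates = [date(1992, 1, 6), date(1992, 1, 17), date(2015, 12, 31)]

# isoweekday() returns 1 for Monday through 7 for Sunday.
counts = Counter(d.isoweekday() for d in dates)

# Fill in zeros so the list always has seven entries, Monday first.
per_weekday = [counts.get(day, 0) for day in range(1, 8)]
print(per_weekday)
```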

# Terrorist attacks by target

The targets of the terrorist attacks were combined into groups.

In [9]:
target_codes = []

for attack in terror_ru['target'].values:
    if attack in ['Business', 'Journalists & Media', 'NGO']:
        target_codes.append(1)
    elif attack in ['Government (General)', 'Government (Diplomatic)']:
        target_codes.append(2)
    elif attack == 'Educational Institution':
        target_codes.append(5)
    elif attack == 'Police':
        target_codes.append(6)
    elif attack == 'Military':
        target_codes.append(7)
    elif attack == 'Religious Figures/Institutions':
        target_codes.append(8)
    elif attack in ['Airports & Aircraft', 'Maritime', 'Transportation']:
        target_codes.append(9)
    elif attack in ['Food or Water Supply', 'Telecommunication', 'Utilities']:
        target_codes.append(10)
    else:
        target_codes.append(3)

terror_ru['target'] = target_codes
target_categories = ['Business', 'Government', 'Individuals', 'Education',
                     'Police', 'Military', 'Religion', 'Transportation', 'Infrastructure']

# terrorist attacks by target
target_count = np.asarray(terror_ru.groupby('target').target.count())
target_percent = np.round(target_count / sum(target_count) * 100, 2)

# terrorist attack fatalities by target
target_fatality = np.asarray(terror_ru.groupby('target')['fatalities'].sum())

# terrorist attack injuries by target
target_injury = np.asarray(terror_ru.groupby('target')['injuries'].sum())

In [10]:
target_text = []
for i in range(0, 9):
    target_text.append(target_categories[i] + ' (' + target_percent[i].astype(str) 
                       + '%)<br>' + target_fatality[i].astype(str) + ' Killed, '
                       + target_injury[i].astype(str) + ' Injured')

data = [go.Scatter(
        x = target_injury,
        y = target_fatality,
        text = target_text,
        mode = 'markers',
        hoverinfo = 'text',
        marker = dict(
            size = target_count / 6.5,
            opacity = 0.6,
            color = '#A60000'
        )
        )
       ]

layout = go.Layout(
    title = 'Terrorist Attacks by Target in Russia (1992-2015)',
    hovermode='closest',
    xaxis = dict(
        title = 'Injuries',
        type = 'log',
        tickmode = 'auto',
        nticks = 2,
        showline = True,
        showgrid = False
    ),
    yaxis = dict(
        title = 'Fatalities',
        type = 'log',
        tickmode = 'auto',
        nticks = 2,
        showline = True,
        showgrid = False
    )
)

figure = dict(data = data, layout = layout)
iplot(figure)

Attacks on educational institutions are the rarest (1.19%), yet they caused a very large number of casualties. This is probably due to the tragedy in [Beslan](https://en.wikipedia.org/wiki/Beslan_school_siege).
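
The grouping of targets above is written as an `if`/`elif` chain that produces numeric codes, which are later matched to category names by position. A lookup table keyed by the raw label is one alternative that keeps each label and its group side by side. The name `TARGET_GROUPS` is hypothetical; the labels and groups are copied from the cell above.

```python
# Group labels taken from the if/elif chain above; TARGET_GROUPS is a hypothetical name.
TARGET_GROUPS = {
    'Business': 'Business',
    'Journalists & Media': 'Business',
    'NGO': 'Business',
    'Government (General)': 'Government',
    'Government (Diplomatic)': 'Government',
    'Educational Institution': 'Education',
    'Police': 'Police',
    'Military': 'Military',
    'Religious Figures/Institutions': 'Religion',
    'Airports & Aircraft': 'Transportation',
    'Maritime': 'Transportation',
    'Transportation': 'Transportation',
    'Food or Water Supply': 'Infrastructure',
    'Telecommunication': 'Infrastructure',
    'Utilities': 'Infrastructure',
}

def target_group(raw):
    """Map a raw target type to its group; anything unlisted is 'Individuals'."""
    return TARGET_GROUPS.get(raw, 'Individuals')

print(target_group('NGO'), target_group('Private Citizens & Property'))
```

On a dataframe, the same table could be applied with `terror_ru['target'].map(TARGET_GROUPS).fillna('Individuals')`.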

# Terrorist attacks by weapon

In [11]:
# terrorist attack weapons grouped in categories
weapon_codes = []

for attack in terror_ru['weapon'].values:
    if attack in ['Explosives/Bombs/Dynamite', 'Sabotage Equipment']:
        weapon_codes.append(1)
    elif attack == 'Incendiary':
        weapon_codes.append(2)
    elif attack in ['Firearms', 'Fake Weapons']:
        weapon_codes.append(3)
    elif attack == 'Melee':
        weapon_codes.append(5)
    elif attack in ['Chemical', 'Radiological']:
        weapon_codes.append(7)
    else:
        weapon_codes.append(4)

terror_ru['weapon'] = weapon_codes
weapon_categories = ['Explosives', 'Flammables', 'Firearms', 'Miscellaneous',
                     'Knives', 'Chemicals']

# terrorist attacks by weapon
weapon_count = np.asarray(terror_ru.groupby('weapon').weapon.count())
weapon_percent = np.round(weapon_count / sum(weapon_count) * 100, 2)

# terrorist attack fatalities by weapon
weapon_fatality = np.asarray(terror_ru.groupby('weapon')['fatalities'].sum())

# terrorist attack injuries by weapon
weapon_injury = np.asarray(terror_ru.groupby('weapon')['injuries'].sum())

weapon_text = []
for i in range(0, 6):
    weapon_text.append(weapon_categories[i] + ' (' + weapon_percent[i].astype(str) 
                       + '%)<br>' + weapon_fatality[i].astype(str) + ' Killed, '
                       + weapon_injury[i].astype(str) + ' Injured')
    
data = [go.Scatter(
        x = weapon_injury,
        y = weapon_fatality,
        text = weapon_text,
        mode = 'markers',
        hoverinfo = 'text',
        marker = dict(
            size = (weapon_count + 50) / 10,
            opacity = 0.6,
            color = '#A60000')
        )]

layout = go.Layout(
         title = 'Terrorist Attacks by Weapon in Russia (1992-2015)',
         xaxis = dict(
             title = 'Injuries',
             type = 'log',
             tickmode = 'auto',
             nticks = 4,
             showline = True,
             showgrid = False
         ),
         yaxis = dict(
             title = 'Fatalities',
             type = 'log',
             tickmode = 'auto',
             nticks = 3,
             showline = True,
             showgrid = False)
         )


figure = dict(data = data, layout = layout)
iplot(figure)

More than half of the attacks in Russia were carried out with explosives.
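
The percentage shares used in the hover text can be computed per group name rather than per position, so that a group with no attacks cannot shift the labels or cause an index error in a fixed-length loop. The sketch below uses the standard library and a made-up list of ten grouped labels.

```python
from collections import Counter

# Hypothetical grouped weapon labels for ten attacks.
weapons = ['Explosives'] * 6 + ['Firearms'] * 3 + ['Flammables']

counts = Counter(weapons)
total = sum(counts.values())
# Percentage share of each group, rounded to two decimals as in the cell above.
share = {name: round(n / total * 100, 2) for name, n in counts.items()}
print(share)
```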

# Terrorist attacks to cities

Let's make a bar chart of the most frequently attacked cities in Russia (20 or more attacks):

In [7]:
cities = terror_ru['city'].value_counts()[terror_ru['city'].value_counts()>=20]
# remove unknown city
cities = cities[cities.index!='Unknown']

trace = go.Bar(
    x=cities.index,
    y=cities,
    marker=dict(
        color='#009999'
    )    
)

data = [trace]
layout = go.Layout(
    title='Most frequently attacked cities in Russia (1992-2015)',
    xaxis=dict(
        title='City',
    ),
    yaxis=dict(
        title='Number of attacks',
    ),  
    annotations=[
        dict(x=xi,y=yi,
             text=str(yi),
             xanchor='center',
             yanchor='bottom',
             showarrow=False,
        ) for xi, yi in zip(cities.index, cities)
    ]
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)
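
The city filter above calls `value_counts` twice and removes the placeholder label afterwards. A compact sketch of the same selection on a made-up series, with the threshold lowered from 20 to 2 to suit the toy data, is shown here.

```python
import pandas as pd

# Hypothetical city column; the threshold is lowered from 20 to 2 for the toy data.
city = pd.Series(['Grozny', 'Grozny', 'Unknown', 'Unknown',
                  'Moscow', 'Grozny', 'Nazran', 'Moscow'])

counts = city.value_counts()                          # most frequent first
frequent = counts[counts >= 2]                        # keep cities at or above the threshold
frequent = frequent.drop('Unknown', errors='ignore')  # drop the placeholder label
print(frequent.to_dict())
```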

The same cities can also be placed on a plane with injuries and fatalities as coordinates.

In [8]:
fat = terror_ru[terror_ru['city'].isin(cities.index)].groupby('city').sum()['fatalities']
inj = terror_ru[terror_ru['city'].isin(cities.index)].groupby('city').sum()['injuries']

trace = go.Scatter(
    x=inj,
    y=fat,
    mode='markers',
    marker=go.Marker(
        size = (fat+inj) ** 0.65,
        opacity=0.7,
        color='#A60000',
    ),
    text=terror_ru[terror_ru['city'].isin(cities.index)].groupby('city').sum().index,
    hoverinfo='text',
)

data = [trace]

layout = go.Layout(
    title='Most frequently attacked cities in Russia (1992-2015)',
    xaxis=dict(
        title='injuries',
    ),
    yaxis=dict(
        title='fatalities',
    ),
    hovermode='closest',
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

This chart shows that Nazran, which ranks third by number of attacks, accounts for considerably fewer victims. Vladikavkaz, in turn, with only 22 attacks, ranks third by number of casualties.

Almost all the cities on this chart are in the Caucasus. The exception is Moscow. Let's look at it in more detail.

# Terrorist attacks per Moscow district's

For completeness, let's consider the Moscow data for the whole period covered by the dataset. Moscow is represented inconsistently in the source data: 
* Different spellings: Moscow or Moscow-city,
* Different field usage: sometimes Moscow is written in the `state` field while the `city` field holds the district,

so let's define a function that determines the Moscow district from geographic coordinates:

In [22]:
# library for geo-json
from shapely.geometry import shape, Point, mapping

url = 'http://gis-lab.info/data/mos-adm/mo.geojson'
moscow_data = urllib.request.urlopen(url).read().decode('utf-8')
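# GeoJSON is JSON, so parse it with json.loads rather than ast.literal_eval
moscow = json.loads(moscow_data)

polygons = [shape(d['geometry']) for d in moscow['features']]

def determine_district(lon, lat):
    # index of the first district polygon containing the point, or -1 if none does
    try:
        return list(map(lambda poly: Point(lon, lat).within(poly), polygons)).index(True)
    except ValueError:
        return -1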
    
terror_data['Msc_district'] = terror_data\
    .apply(lambda row: determine_district(row['longitude'], row['latitude']), axis=1)

# only Moscow
terror_msc = terror_data[terror_data['Msc_district'] >= 0]    

HTTPError: HTTP Error 404: Not Found
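
Since the GeoJSON download fails here, the district lookup cannot be demonstrated on the real boundaries. The idea behind it can still be sketched without Shapely using the classic ray casting test. This is a swapped-in technique for illustration, not what `Point.within` does internally, and the two square "districts" are hypothetical shapes rather than real Moscow boundaries.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting test: True if (lon, lat) lies inside the ring of (lon, lat) pairs."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def find_district(lon, lat, rings):
    """Return the index of the first ring containing the point, or -1."""
    for idx, ring in enumerate(rings):
        if point_in_polygon(lon, lat, ring):
            return idx
    return -1

# Two hypothetical square "districts" side by side (not real Moscow boundaries).
rings = [
    [(37.0, 55.0), (37.5, 55.0), (37.5, 55.5), (37.0, 55.5)],
    [(37.5, 55.0), (38.0, 55.0), (38.0, 55.5), (37.5, 55.5)],
]
print(find_district(37.7, 55.2, rings), find_district(39.0, 55.2, rings))
```

Returning `-1` for points outside every polygon mirrors the convention of `determine_district`, so the `>= 0` filter used for `terror_msc` would work unchanged.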

In [None]:
# use sigmoid function to determine the color
def sigmoid(x):
    return 1 / (1 + math.exp(-0.3*x))

polygons = []
districts = []
num_of_acts = terror_msc['Msc_district'].value_counts().apply(sigmoid)
for i, district in enumerate(moscow['features']):
    if i in num_of_acts.index:
        color = 'rgba(163, 22, 19, {})'.format((num_of_acts[i]))
    else:
        color = 'rgba(240, 248, 255, 0.3)'
    d = {"type": "FeatureCollection"}
    d['features'] = [district]
    districts.append(dict(
            sourcetype = 'geojson',
            source = d,
            type = 'fill',
            color = color
        )
    )
    polygons.append(shape(d['features'][0]['geometry']))
    
data = go.Data([
    go.Scattermapbox(
        mode='markers',
    )
])
layout = go.Layout(
    title='The number of attacks per district of Moscow (1970-2015)',
    height=800,
    autosize=True,
    hovermode='closest',
    mapbox=dict(
        layers=districts,
        accesstoken=mapbox_access_token,
        bearing=0,
        center=dict(
            lat=55.75222,
            lon=37.61556
        ),
        pitch=0,
        zoom=10,
        style='light'
    ),
)

fig = dict(data=data, layout=layout)
iplot(fig)

Unfortunately, the data are very dirty and raw. This explains the huge number of attacks in the Tverskoy district: evidently, for some attacks the coordinates of Moscow itself (the Kremlin) were recorded instead of the exact location of the incident.
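
The colour scaling in the map cell can be sketched on its own. The sigmoid has the same form as in the notebook; the helper `district_color` is a hypothetical name that wraps the two colour branches. Because the sigmoid of 1 is already about 0.57, any district with at least one attack gets an opacity above one half, and the scale saturates quickly, which is worth keeping in mind when reading the map.

```python
import math

def sigmoid(x):
    """Squash a count into (0, 1); same form as in the map cell."""
    return 1 / (1 + math.exp(-0.3 * x))

def district_color(n_attacks):
    """Return an rgba fill whose opacity grows with the number of attacks."""
    if n_attacks == 0:
        return 'rgba(240, 248, 255, 0.3)'
    return 'rgba(163, 22, 19, {:.2f})'.format(sigmoid(n_attacks))

for n in (0, 1, 5, 30):
    print(n, district_color(n))
```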