# DRIVE SAFE IN GENEVA


## Table of Contents

<p><div class="lev1"> <a href='#1Project_planning'><span class="toc-item-num">1.&nbsp;&nbsp;</span>Project planning</a> </div> 
<div class="lev1"><a href='#2Dataset_description'><span class="toc-item-num">2.&nbsp;&nbsp;</span>Dataset description</a></div> 
<div class="lev1"><a href='#3Understand_data'><span class="toc-item-num">3.&nbsp;&nbsp;</span>Understand the data</a></div> 
<div class="lev1"><a href='#31Libraries'><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Libraries import and dataset load</a></div>
<div class="lev1"><a href='#32Feature'><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Feature subgroup exploration</a></div>
<div class="lev1"><a href='#321Time'><span class="toc-item-num">3.2.1&nbsp;&nbsp;</span>Time</a></div>
<div class="lev1"><a href='#322Localisation'><span class="toc-item-num">3.2.2&nbsp;&nbsp;</span>Localisation</a></div>
<div class="lev1"><a href='#323Conditions'><span class="toc-item-num">3.2.3&nbsp;&nbsp;</span>Conditions</a></div>
<div class="lev1"><a href='#324Accident'><span class="toc-item-num">3.2.4&nbsp;&nbsp;</span>Accident type</a></div>
</p>

## 3.1. Libraries import and dataset load
<a id='31Libraries'></a>

In [44]:
# Import libraries
import pandas as pd
import numpy as np
import branca.colormap as cm # for color steps
%matplotlib inline 

import folium

from pyproj import Proj, transform

from datetime import datetime
from datetime import date, time
from dateutil.parser import parse

In [45]:
# Read the dataset
acc_data = '../_data/OTC_ACCIDENTS.csv'
compt_trafic_data = '../_data/OTC_COMPTAGE_TRAFIC.csv'
acc_df = pd.read_csv(acc_data, sep=';', encoding='latin-1')
compt_trafic_df = pd.read_csv(compt_trafic_data, sep=';', encoding='latin-1')

In [46]:
# Show the df to have a better idea
acc_df.head(3)

Unnamed: 0,ID_ACCIDENT,DATE_,GROUPE_ACCIDENT,CAUSE,COMMUNE,CONDITIONS_LUMINEUSES,CONDITIONS_METEO,CONSEQUENCES,COOR_X,COOR_Y,...,NB_MOTOS_50,NB_MOTOS_125,NB_MOTOS_11KW,NB_VOITURES_TOURISME,NB_VOITURES_LIVRAISON,NB_CAMIONS,NB_BUS,NB_TRAM,E,N
0,876245.0,2010-11-30 00:00:00,Dérapage ou perte de maîtrise,Inattention et distraction - Manque d'attentio...,Genève,Nuit,Chute de neige,Avec blessés légers,2500774.0,1117364.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2500774.0,1117364.0
1,879408.0,2010-12-08 00:00:00,Autres,Utilisation inadéquate du véhicule - Stationne...,Genève,Jour,Beau,Autres,2498974.0,1118100.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2498974.0,1118100.0
2,877254.0,2010-12-02 00:00:00,Dérapage ou perte de maîtrise,Inobservation de signaux ou de la signalisatio...,Vandoeuvres,Jour,Couvert,Avec blessés légers,2504618.0,1119635.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2504618.0,1119635.0


In [47]:
# Shape of the dataset
acc_df.shape

(19231, 35)

In [48]:
# Information of the dataset
acc_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19231 entries, 0 to 19230
Data columns (total 35 columns):
ID_ACCIDENT              19231 non-null float64
DATE_                    19231 non-null object
GROUPE_ACCIDENT          19231 non-null object
CAUSE                    19231 non-null object
COMMUNE                  19231 non-null object
CONDITIONS_LUMINEUSES    19231 non-null object
CONDITIONS_METEO         19231 non-null object
CONSEQUENCES             19231 non-null object
COOR_X                   19231 non-null float64
COOR_Y                   19231 non-null float64
ETAT_ROUTE               19231 non-null object
GENRE_ROUTE              19231 non-null object
HEURE                    19230 non-null object
JOUR                     19231 non-null object
LOCALITE                 19231 non-null object
NB_ENFANTS_IMPLIQUES     19231 non-null float64
NB_ENFANTS_ECOLE         19231 non-null float64
NB_BLESSES_LEGERS        19231 non-null float64
NB_BLESSES_GRAVES        19231 non-null

The Geneva accidents dataset has 19'231 data points with 35 features. Most of them can be grouped into the following feature subgroups:

1) **Time**: DATE_, HEURE, JOUR

2) **Localisation**: COMMUNE, COOR_X, COOR_Y, LOCALITE, E, N 

3) **Conditions**: CONDITIONS_LUMINEUSES, CONDITIONS_METEO, ETAT_ROUTE, GENRE_ROUTE

4) **Accident type**: GROUPE_ACCIDENT, CAUSE 

5) **Number and type of people**: NB_ENFANTS_IMPLIQUES, NB_ENFANTS_ECOLE, NB_BLESSES_LEGERS, NB_BLESSES_GRAVES, NB_TUES, NB_PIETONS  

6) **Number of vehicles involved**: NB_BICYCLETTES, NB_VAE_25, NB_VAE_45, NB_CYCLOMOTEURS, NB_MOTOS_50, NB_MOTOS_125, NB_MOTOS_11KW, NB_VOITURES_TOURISME, NB_VOITURES_LIVRAISON, NB_CAMIONS, NB_BUS, NB_TRAM
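
The grouping above can also be written down as a dictionary, which makes it easy to check how many columns it covers. This is only a sketch: the names `feature_groups` and `ungrouped` are hypothetical helpers introduced here, not part of the notebook. The six groups list 33 columns; `ID_ACCIDENT` and `CONSEQUENCES`, which both appear in the preview above, are not assigned to any group and are presumably the remaining two.

```python
# Feature subgroups exactly as listed above (column names taken from the notebook).
feature_groups = {
    'Time': ['DATE_', 'HEURE', 'JOUR'],
    'Localisation': ['COMMUNE', 'COOR_X', 'COOR_Y', 'LOCALITE', 'E', 'N'],
    'Conditions': ['CONDITIONS_LUMINEUSES', 'CONDITIONS_METEO',
                   'ETAT_ROUTE', 'GENRE_ROUTE'],
    'Accident type': ['GROUPE_ACCIDENT', 'CAUSE'],
    'Number and type of people': ['NB_ENFANTS_IMPLIQUES', 'NB_ENFANTS_ECOLE',
                                  'NB_BLESSES_LEGERS', 'NB_BLESSES_GRAVES',
                                  'NB_TUES', 'NB_PIETONS'],
    'Number of vehicles involved': ['NB_BICYCLETTES', 'NB_VAE_25', 'NB_VAE_45',
                                    'NB_CYCLOMOTEURS', 'NB_MOTOS_50',
                                    'NB_MOTOS_125', 'NB_MOTOS_11KW',
                                    'NB_VOITURES_TOURISME',
                                    'NB_VOITURES_LIVRAISON', 'NB_CAMIONS',
                                    'NB_BUS', 'NB_TRAM'],
}

# Flatten the groups into one list of column names.
grouped = [col for cols in feature_groups.values() for col in cols]
n_grouped = len(grouped)


def ungrouped(columns):
    """Return the columns that do not belong to any subgroup (hypothetical helper)."""
    return [col for col in columns if col not in grouped]


print(n_grouped)  # 33
# With the notebook's DataFrame loaded, one could call: ungrouped(acc_df.columns)
```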


## 3.2 Feature subgroup exploration
<a id='32Feature'></a>

### 3.2.1 Time
<a id='321Time'></a>

An important feature of the accidents dataset is time: it answers the question of when the accidents happen. This feature will be very useful for our further analysis. The time features are:
- Date
- Hour
- Day of the week

Let's check the format of these three features:

In [49]:
print(acc_df.DATE_.head(2))
print(acc_df.HEURE.head(2))
print(acc_df.JOUR.head(2))

0    2010-11-30 00:00:00
1    2010-12-08 00:00:00
Name: DATE_, dtype: object
0    1899-12-30 21:00:00
1    1899-12-30 14:00:00
Name: HEURE, dtype: object
0       Mardi
1    Mercredi
Name: JOUR, dtype: object


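All three features are stored as strings (dtype `object`), and `HEURE` carries a placeholder date (1899-12-30) in front of the actual time of day. To use them, `DATE_` and `HEURE` need to be converted to datetime. In addition, to enrich our analysis, we will create the `YEAR`, `MONTH` and `DAY` (day of the month) columns to look for more correlations with the accidents: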

In [50]:
# Datetime format (vectorised conversion of the whole column)
acc_df['DATE_'] = pd.to_datetime(acc_df['DATE_'])
acc_df['HEURE'] = pd.to_datetime(acc_df['HEURE'])

# Create new time features
acc_df['YEAR'] = acc_df['DATE_'].dt.year
acc_df['MONTH'] = acc_df['DATE_'].dt.month
acc_df['DAY'] = acc_df['DATE_'].dt.day
# HEURE has one missing value: fill it with the first value of the column (negligible error)
acc_df['HEURE'] = acc_df['HEURE'].fillna(acc_df['HEURE'].iloc[0])
#acc_df.info()
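
As a small illustration of the conversion, the sketch below parses a few rows in the same string format as the output above and derives the hour of the day and the weekday. The toy DataFrame and the column names `HOUR` and `WEEKDAY` are assumptions made for this example; the notebook itself only creates `YEAR`, `MONTH` and `DAY`.

```python
import pandas as pd

# Toy rows in the raw string format printed above; the last HEURE is missing.
toy = pd.DataFrame({
    'DATE_': ['2010-11-30 00:00:00', '2010-12-08 00:00:00', '2010-12-02 00:00:00'],
    'HEURE': ['1899-12-30 21:00:00', '1899-12-30 14:00:00', None],
})

toy['DATE_'] = pd.to_datetime(toy['DATE_'])
toy['HEURE'] = pd.to_datetime(toy['HEURE'])  # missing values become NaT

# Only the time of day in HEURE is meaningful; 1899-12-30 is a placeholder date.
toy['HOUR'] = toy['HEURE'].dt.hour           # NaN where HEURE is missing
# Weekday derived from the date: 0 = Monday ... 6 = Sunday.
toy['WEEKDAY'] = toy['DATE_'].dt.dayofweek

print(toy[['DATE_', 'HOUR', 'WEEKDAY']])
```

The derived weekday can serve as a cross-check against the `JOUR` column: 2010-11-30 comes out as 1 (Tuesday), matching `Mardi` in the first row.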

### 3.2.2 Localisation
<a id='322Localisation'></a>

Another important question to ask is WHERE? The localisation features help to answer this question and to relate it to the other features. 
The localisation features used here are:
- **COOR_X**: X coordinate in the 'epsg:2056' reference system
- **COOR_Y**: Y coordinate in the 'epsg:2056' reference system

E and N give the same information as COOR_X and COOR_Y, which is why they can be dropped. The COOR_X and COOR_Y coordinates will be projected to GPS coordinates (longitude and latitude), also called the 'epsg:4326' reference system. For this, `Proj` and `transform` from the pyproj library will be used:

In [51]:
# Projection definition: Swiss 'epsg:2056' to GPS 'epsg:4326'
p1 = Proj(init='epsg:2056')
p2 = Proj(init='epsg:4326')

# Helper function: project the coordinates of row i in place
def coord_proj(acc_df, i, p1, p2):
    x1 = acc_df.at[i, 'COOR_X']
    y1 = acc_df.at[i, 'COOR_Y']
    x2, y2 = transform(p1, p2, x1, y1)  # x2 = longitude, y2 = latitude
    acc_df.at[i, 'COOR_X'] = x2
    acc_df.at[i, 'COOR_Y'] = y2
    return acc_df

# Project data (every row, including the last one)
for i in acc_df.index:
    acc_df = coord_proj(acc_df, i, p1, p2)
# Delete redundant columns
#del acc_df['N']
#del acc_df['E']
acc_df.head(3)

Unnamed: 0,ID_ACCIDENT,DATE_,GROUPE_ACCIDENT,CAUSE,COMMUNE,CONDITIONS_LUMINEUSES,CONDITIONS_METEO,CONSEQUENCES,COOR_X,COOR_Y,...,NB_VOITURES_TOURISME,NB_VOITURES_LIVRAISON,NB_CAMIONS,NB_BUS,NB_TRAM,E,N,YEAR,MONTH,DAY
0,876245.0,2010-11-30,Dérapage ou perte de maîtrise,Inattention et distraction - Manque d'attentio...,Genève,Nuit,Chute de neige,Avec blessés légers,6.153116,46.200401,...,1.0,0.0,0.0,0.0,0.0,2500774.0,1117364.0,2010,11,30
1,879408.0,2010-12-08,Autres,Utilisation inadéquate du véhicule - Stationne...,Genève,Jour,Beau,Autres,6.129641,46.206753,...,1.0,0.0,0.0,0.0,0.0,2498974.0,1118100.0,2010,12,8
2,877254.0,2010-12-02,Dérapage ou perte de maîtrise,Inobservation de signaux ou de la signalisatio...,Vandoeuvres,Jour,Couvert,Avec blessés légers,6.202445,46.221384,...,1.0,0.0,0.0,0.0,0.0,2504618.0,1119635.0,2010,12,2
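
To sanity-check the projected values without pyproj, the sketch below uses the approximate series that swisstopo publishes for converting 'epsg:2056' coordinates to longitude and latitude. This is a swapped-in technique, not what the notebook does: the function name `lv95_to_wgs84` is hypothetical, and the roughly metre-level accuracy of the approximation is an assumption, so it is only suitable as a plausibility check.

```python
def lv95_to_wgs84(east, north):
    """Approximate conversion from 'epsg:2056' (E, N in metres) to (lon, lat) in degrees.

    Based on swisstopo's approximate formulas; not a replacement for pyproj.
    """
    # Auxiliary values: offsets from the projection origin, in units of 1000 km.
    y = (east - 2600000.0) / 1000000.0
    x = (north - 1200000.0) / 1000000.0

    # Longitude and latitude in units of 10000 arc seconds.
    lon = (2.6779094
           + 4.728982 * y
           + 0.791484 * y * x
           + 0.1306 * y * x ** 2
           - 0.0436 * y ** 3)
    lat = (16.9023892
           + 3.238272 * x
           - 0.270978 * y ** 2
           - 0.002528 * x ** 2
           - 0.0447 * y ** 2 * x
           - 0.0140 * x ** 3)

    # Convert to decimal degrees.
    return lon * 100.0 / 36.0, lat * 100.0 / 36.0


# First accident of the preview: E = 2500774, N = 1117364.
lon, lat = lv95_to_wgs84(2500774.0, 1117364.0)
print(lon, lat)
```

The result should agree with the projected `COOR_X` and `COOR_Y` of the first row (6.153116, 46.200401) to about four decimal places.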


# LET'S GO FOR THE VISUALIZATION

In [52]:
m = folium.Map(location=[46.254423, 6.142972], tiles='cartodbpositron', zoom_start=11)
folium.GeoJson(open("map-3.geojson",encoding = "utf-8-sig").read()).add_to(m)
m

# Associate each accident with the correct polygon

In [53]:
import json
from shapely.geometry import shape, Point

# load GeoJSON file containing sectors (same encoding as in the map cell above)
with open('map-3.geojson', encoding='utf-8-sig') as f:
    js = json.load(f)

# adding an ID to every region in the json file:
for i in range(len(js['features'])):
    js['features'][i]['id']=i
# features 8 and 9 share the ID 0
js['features'][9]['id']=0
js['features'][8]['id']=0

### ASSIGNING EACH ACCIDENT ITS CORRECT REGION
acc_df['Region ID'] = 0  # default for accidents that fall in no polygon
# build each polygon once instead of once per accident
polygons = [(feature['id'], shape(feature['geometry'])) for feature in js['features']]
for i in acc_df.index:
    point = Point(acc_df.at[i, 'COOR_X'], acc_df.at[i, 'COOR_Y'])
    for region_id, polygon in polygons:
        if polygon.contains(point):
            acc_df.at[i, 'Region ID'] = region_id
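
To make the assignment logic concrete, here is a minimal pure-Python sketch of the same idea on two toy squares. It uses a simple ray-casting test as a stand-in for shapely's `contains` and ignores holes and multi-part polygons, so it is an illustration rather than a replacement. The names `point_in_ring`, `assign_region` and the toy `regions` are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the closed ring of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k]
        x2, y2 = ring[(k + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_region(x, y, regions, default=0):
    """Return the ID of the last region containing the point, else `default`."""
    region_id = default
    for rid, ring in regions:
        if point_in_ring(x, y, ring):
            region_id = rid
    return region_id


# Two adjacent unit squares with IDs 1 and 2.
regions = [
    (1, [(0, 0), (1, 0), (1, 1), (0, 1)]),
    (2, [(1, 0), (2, 0), (2, 1), (1, 1)]),
]

print(assign_region(0.5, 0.5, regions))  # 1
print(assign_region(1.5, 0.5, regions))  # 2
print(assign_region(5.0, 5.0, regions))  # 0 (no region contains the point)
```

As in the notebook, a point outside every polygon keeps the default ID 0. Since features 0, 8 and 9 also carry ID 0 there, unmatched accidents cannot be told apart from accidents in those regions.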

### Let's visualize the total number of accidents per year, 2010-2016

In [54]:
# The widgets moved from IPython.html.widgets to the ipywidgets package
from ipywidgets import interact

In [55]:


def _maps(year):
    """Choropleth of the number of accidents per region for the given year."""
    m = folium.Map(location=[46.254423, 6.142972], tiles='cartodbpositron', zoom_start=11)
    accidents_by_region = acc_df.loc[acc_df['YEAR']==year].groupby('Region ID').count()['ID_ACCIDENT']
    m.choropleth(geo_data = js,
                 data = accidents_by_region,
                 key_on = 'id',
                 fill_color = 'YlOrRd', fill_opacity=1,
                 line_color = 'black', line_weight = 1, line_opacity=0.9,
                 smooth_factor = 1,
                 reset = True, # False by default, set to True to remove previous layers
                 highlight = True, # hovering
                 legend_name='Accidents in {}'.format(year))
    return m
year_list = np.arange(2010,2017)  # 2010 to 2016 inclusive

m = interact(_maps,year=year_list)
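
The series passed to the choropleth is simply a count of accidents per region for one year. The sketch below reproduces that step on a toy DataFrame; the data and the function name `accidents_by_region` are made up for illustration.

```python
import pandas as pd

# Toy accidents: three in 2015 (two in region 0, one in region 3), two in 2016.
toy = pd.DataFrame({
    'ID_ACCIDENT': [1, 2, 3, 4, 5],
    'YEAR': [2015, 2015, 2015, 2016, 2016],
    'Region ID': [0, 0, 3, 3, 3],
})


def accidents_by_region(df, year):
    """Number of accidents per region for one year (hypothetical helper)."""
    return df.loc[df['YEAR'] == year].groupby('Region ID')['ID_ACCIDENT'].count()


counts_2015 = accidents_by_region(toy, 2015)
print(counts_2015.to_dict())  # {0: 2, 3: 1}
```

Regions without any accident in the selected year do not appear in the series at all, so the map has no value to colour them with.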

In [58]:
# do one per dataset
def get_feature_to_plot(year,feature, value, normalize):
    """Accidents per region in `year` where `feature` equals `value`.
    If `normalize` is True, return the share of all accidents of that region and year."""
    serie = acc_df.loc[(acc_df['YEAR']==year) & (acc_df[feature]==value)].groupby('Region ID').count()['ID_ACCIDENT']
    if normalize:
        total_number_of_accidents_in_year_per_region = acc_df.loc[(acc_df['YEAR']==year)].groupby('Region ID').count()['ID_ACCIDENT']
        serie = serie / total_number_of_accidents_in_year_per_region
    serie = serie.fillna(0)  # regions with accidents but none matching `value`
    print(serie)
    return serie


def serie_color(feature,serie_to_plot,colormap):
    """Return the fill color for a region; black if the region has no data.
    The color depends on the colormap settings."""

    if feature['id'] not in serie_to_plot.index:
        return '#000000'
    else:
        return colormap(serie_to_plot[feature['id']])
    
def create_colormap(index = 1, min_=0, max_=10,steps=10 ):
    """Return a fixed step colormap (index 1) or a linear colormap
    scaled to [min_, max_] and cut into `steps` bins (index 2)."""
    if index == 1:
        step = cm.StepColormap(
            ['green', 'yellow', 'red'],
            vmin=3, vmax=10,
            index=[3, 4, 8, 10],
            caption='step'
            )
    elif index == 2:
        step = cm.linear.Set1.scale(min_, max_).to_step(steps)
    else:
        raise ValueError('index must be 1 or 2')
    return step


def create_map (year, feature, value, normalize):
    m_serie = get_feature_to_plot(year, feature, value, normalize) # get the serie
    colormap = create_colormap(2,m_serie.min(),m_serie.max(),steps = 5) # create a colorfeature
    m = folium.Map(location=[46.254423, 6.142972], tiles='cartodbpositron', zoom_start=11)
    
    folium.GeoJson(js,
                  style_function=lambda feature: {
                'fillColor': serie_color(feature,m_serie,colormap),
                'color' : 'black',
                'weight' : 2,
                'dashArray' : '5, 5'
                }).add_to(m)
    
    colormap.caption = 'Accidents'
    m.add_child(colormap)
   
    return m

year_list = np.arange(2010,2017)


def create_interactive_map(feature, normalize=True):
    value_list = list(acc_df[feature].value_counts().index)
    m = interact(create_map, year = year_list, feature = feature, value = value_list, normalize= normalize)
    return m
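
The normalisation used by `get_feature_to_plot` divides, for each region, the number of accidents with the selected value by the total number of accidents in that region and year. A minimal sketch on toy data follows; the DataFrame and the helper name `share_by_region` are assumptions for illustration, while the values `Nuit` and `Jour` come from the dataset preview.

```python
import pandas as pd

# Toy accidents for one year, spread over regions 0, 3 and 5.
toy = pd.DataFrame({
    'ID_ACCIDENT': [1, 2, 3, 4, 5, 6],
    'YEAR': [2015, 2015, 2015, 2015, 2015, 2015],
    'Region ID': [0, 0, 0, 3, 3, 5],
    'CONDITIONS_LUMINEUSES': ['Nuit', 'Jour', 'Jour', 'Nuit', 'Nuit', 'Jour'],
})


def share_by_region(df, year, feature, value):
    """Share of a region's accidents in `year` for which `feature` equals `value`."""
    in_year = df.loc[df['YEAR'] == year]
    total = in_year.groupby('Region ID')['ID_ACCIDENT'].count()
    matching = (in_year.loc[in_year[feature] == value]
                .groupby('Region ID')['ID_ACCIDENT'].count())
    # The division aligns on 'Region ID'; regions without a match become NaN, then 0.
    return (matching / total).fillna(0)


share = share_by_region(toy, 2015, 'CONDITIONS_LUMINEUSES', 'Nuit')
print(share)
```

Region 5 has one accident but none at night, so it is missing from the numerator and ends up with a share of 0 after `fillna`, whereas a region with no accidents at all in that year would still be absent from the result.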

In [59]:
create_interactive_map('GROUPE_ACCIDENT')

<function __main__.create_map>

In [60]:
create_interactive_map('CAUSE')

<function __main__.create_map>

In [61]:
create_interactive_map('COMMUNE')

<function __main__.create_map>

In [62]:
create_interactive_map('CONDITIONS_LUMINEUSES')

<function __main__.create_map>

In [63]:
create_interactive_map('CONDITIONS_METEO')

<function __main__.create_map>

In [64]:
create_interactive_map('CONSEQUENCES')

<function __main__.create_map>

In [65]:
def get_number_to_plot(year,feature):
    # Select the column before summing so that only `feature` is aggregated
    serie = acc_df.loc[(acc_df['YEAR']==year)].groupby('Region ID')[feature].sum()
    serie = serie.fillna(0)
    return serie

def create_NB_map (year, feature):
    m_serie = get_number_to_plot(year, feature) # get the serie
    colormap = create_colormap(2,m_serie.min(),m_serie.max(),steps = 5) # create a colorfeature
    m = folium.Map(location=[46.254423, 6.142972], tiles='cartodbpositron', zoom_start=11)
    
    folium.GeoJson(js,
                  style_function=lambda feature: {
                'fillColor': serie_color(feature,m_serie,colormap),
                'color' : 'black',
                'weight' : 2,
                'dashArray' : '5, 5'
                }).add_to(m)
    
    colormap.caption = 'Accidents'
    m.add_child(colormap)
   
    return m



def create_interactive_ABSOLUTE_NB_map():
    year_list = np.arange(2010,2017)
    feature_list = ['NB_ENFANTS_IMPLIQUES', 'NB_ENFANTS_ECOLE']
    m = interact(create_NB_map, year = year_list, feature = feature_list)
    return m
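
For the `NB_` columns the per-region value is a sum rather than a count. The sketch below shows that aggregation on toy data, selecting the column before calling `sum()` so that text or datetime columns are never aggregated (how a full-frame `sum()` treats such columns has varied across pandas versions). The toy DataFrame and the name `number_by_region` are hypothetical.

```python
import pandas as pd

# Toy accidents with one text column and one numeric NB_ column.
toy = pd.DataFrame({
    'YEAR': [2015, 2015, 2015, 2016],
    'Region ID': [0, 0, 3, 3],
    'COMMUNE': ['Genève', 'Genève', 'Vandoeuvres', 'Vandoeuvres'],
    'NB_ENFANTS_IMPLIQUES': [1.0, 0.0, 2.0, 1.0],
})


def number_by_region(df, year, feature):
    """Sum of one numeric column per region for one year."""
    return df.loc[df['YEAR'] == year].groupby('Region ID')[feature].sum()


totals = number_by_region(toy, 2015, 'NB_ENFANTS_IMPLIQUES')
print(totals.to_dict())  # {0: 1.0, 3: 2.0}
```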

In [66]:
create_interactive_ABSOLUTE_NB_map()

<function __main__.create_NB_map>

In [67]:
blesses_et_tues = ['NB_BLESSES_LEGERS',
       'NB_BLESSES_GRAVES', 'NB_TUES']

vehicles = ['NB_BICYCLETTES',
       'NB_VAE_25', 'NB_VAE_45', 'NB_CYCLOMOTEURS', 'NB_MOTOS_50',
       'NB_MOTOS_125', 'NB_MOTOS_11KW', 'NB_VOITURES_TOURISME',
       'NB_VOITURES_LIVRAISON', 'NB_CAMIONS', 'NB_BUS', 'NB_TRAM']

In [68]:
def get_number_to_plot_blesses_et_tues(year,feature,normalize=False):
    cum_df = acc_df.loc[(acc_df['YEAR']==year)].groupby('Region ID').sum()[blesses_et_tues]
    serie = cum_df[feature]
    print(serie)
    if normalize==True:
        tot_nb = cum_df.sum(axis=1)
        serie  = serie/tot_nb
    serie.fillna(0,inplace=True)
    return serie

def create_normalized_map_blesse_et_tues (year,feature,normalize=False):
    m_serie = get_number_to_plot_blesses_et_tues(year, feature,normalize) # get the serie
    colormap = create_colormap(2,m_serie.min(),m_serie.max(),steps = 5) # create a colorfeature
    m = folium.Map(location=[46.254423, 6.142972], tiles='cartodbpositron', zoom_start=11)
    
    folium.GeoJson(js,
                  style_function=lambda feature: {
                'fillColor': serie_color(feature,m_serie,colormap),
                'color' : 'black',
                'weight' : 2,
                'dashArray' : '5, 5'
                }).add_to(m)
    
    colormap.caption = 'Accidents'
    m.add_child(colormap)
   
    return m

def create_interactive_map_blesse_et_tues():
    year_list = np.arange(2010,2017)
    m = interact(create_normalized_map_blesse_et_tues, year = year_list, feature = blesses_et_tues,normalize=False)
    return m



def get_number_to_plot_vehicles(year,feature,normalize=False):
    cum_df = acc_df.loc[(acc_df['YEAR']==year)].groupby('Region ID').sum()[vehicles]
    serie = cum_df[feature]
    print(serie)
    if normalize==True:
        tot_nb = cum_df.sum(axis=1)
        serie  = serie/tot_nb
    serie.fillna(0,inplace=True)
    return serie

def create_normalized_map_vehicles (year,feature,normalize=False):
    m_serie = get_number_to_plot_vehicles(year, feature,normalize) # get the serie
    colormap = create_colormap(2,m_serie.min(),m_serie.max(),steps = 5) # create a colorfeature
    m = folium.Map(location=[46.254423, 6.142972], tiles='cartodbpositron', zoom_start=11)
    
    folium.GeoJson(js,
                  style_function=lambda feature: {
                'fillColor': serie_color(feature,m_serie,colormap),
                'color' : 'black',
                'weight' : 2,
                'dashArray' : '5, 5'
                }).add_to(m)
    
    colormap.caption = 'Accidents'
    m.add_child(colormap)
   
    return m

def create_interactive_map_vehicles():
    year_list = np.arange(2010,2017)
    m = interact(create_normalized_map_vehicles, year = year_list, feature = vehicles,normalize=False)
    return m
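
When `normalize=True`, the two `get_number_to_plot_*` functions divide one column by the row total of its group of columns, which gives, for example, the share of slightly injured people among all casualties of a region. A minimal sketch of that row-wise normalisation on toy data follows; the numbers and the variable name `shares` are invented for illustration.

```python
import pandas as pd

blesses_et_tues = ['NB_BLESSES_LEGERS', 'NB_BLESSES_GRAVES', 'NB_TUES']

# Toy accidents: region 5 has an accident without any casualty.
toy = pd.DataFrame({
    'Region ID': [0, 0, 3, 5],
    'NB_BLESSES_LEGERS': [2.0, 1.0, 0.0, 0.0],
    'NB_BLESSES_GRAVES': [1.0, 0.0, 1.0, 0.0],
    'NB_TUES': [0.0, 0.0, 1.0, 0.0],
})

# Casualties per region and severity.
cum_df = toy.groupby('Region ID')[blesses_et_tues].sum()

# Divide every row by its total; 0 / 0 gives NaN, which is replaced by 0.
shares = cum_df.div(cum_df.sum(axis=1), axis=0).fillna(0)
print(shares)
```

Each row of `shares` sums to 1, except for regions without any casualty, where all shares are 0.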

In [69]:
create_interactive_map_blesse_et_tues()

<function __main__.create_normalized_map_blesse_et_tues>

In [70]:
create_interactive_map_vehicles()

<function __main__.create_normalized_map_vehicles>