# Road Traffic Accidents in Switzerland

Our project goal is to scrape all traffic accidents from the accident map at http://map.donneesaccidents.ch/

## Data scraping strategy

Accessing http://map.donneesaccidents.ch/ redirects to:<br>
https://map.geo.admin.ch/?topic=vu&lang=fr&bgLayer=ch.swisstopo.pixelkarte-grau&layers=ch.astra.unfaelle-personenschaeden_alle&layers_timestamp=&catalogNodes=1318


Postman parses the following query parameters:
<code>
topic:vu
lang:en
bgLayer:ch.swisstopo.pixelkarte-grau
layers:ch.astra.unfaelle-personenschaeden_alle
layers_timestamp:
catalogNodes:1318
</code>
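
The same breakdown can be obtained without Postman. The following is a minimal sketch using only the Python standard library on the redirect URL shown above; the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "https://map.geo.admin.ch/?topic=vu&lang=fr"
    "&bgLayer=ch.swisstopo.pixelkarte-grau"
    "&layers=ch.astra.unfaelle-personenschaeden_alle"
    "&layers_timestamp=&catalogNodes=1318"
)

# keep_blank_values=True keeps empty parameters such as layers_timestamp
params = parse_qs(urlsplit(url).query, keep_blank_values=True)

for name, values in params.items():
    print(f"{name}:{values[0]}")
```

Each value is returned as a list because a query string may repeat a parameter name.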

The most important one is layers:ch.astra.unfaelle-personenschaeden_alle.<br>
It names the selected data layer, "Accidents with personal injury", which holds all the geolocated dots for that category.
<img src="Resources/images/layer_selector.png">

Selecting all kinds of accidents returns the following:<br>
<img src="Resources/images/layer_selector_all.png">
with the following layer parameters:<br>
layers:<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_alle,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_getoetete,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_fussgaenger,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_fahrraeder,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_motorraeder<br>
layers_timestamp:,,,,<br>
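
The two values above follow a simple pattern: the layer names are comma-separated, and `layers_timestamp` carries one empty slot per layer. A small sketch that reproduces both strings (the name `LAYERS` is chosen here for illustration):

```python
LAYERS = [
    "ch.astra.unfaelle-personenschaeden_alle",
    "ch.astra.unfaelle-personenschaeden_getoetete",
    "ch.astra.unfaelle-personenschaeden_fussgaenger",
    "ch.astra.unfaelle-personenschaeden_fahrraeder",
    "ch.astra.unfaelle-personenschaeden_motorraeder",
]

# Comma-separated list of layer identifiers
layers_param = ",".join(LAYERS)

# One empty timestamp per layer: five empty strings joined by four commas
layers_timestamp_param = ",".join([""] * len(LAYERS))

print(layers_timestamp_param)
```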

Next, we want all the data for each layer. Selecting a dot on the map sends a query to the server for the related data.
To retrieve everything, we select all the entries on the map at once by Ctrl-clicking over the whole area.

This sends one query for each layer in the "layers" parameter:
<code>
geometry:443999.04209536605,39001.6733318335,870499.0420953662,303001.67333183356
geometryFormat:geojson
geometryType:esriGeometryEnvelope
imageDisplay:1536,759,96
lang:en
layers:all:<i>LAYER_PARAM</i>
mapExtent:269999.04209536605,9501.673331833561,1037999.042095366,389001.67333183356
returnGeometry:true
tolerance:5
</code><br>
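
As a sketch, the parameters listed above can be assembled into a dictionary for a given layer. The function name `build_identify_params` is hypothetical, and the endpoint URL is not given in this document, so the sketch stops at the dictionary; it could be passed as `params=` to `requests.get` once the endpoint is known.

```python
def build_identify_params(layer):
    """Return the query parameters observed above for one layer (hypothetical helper)."""
    return {
        "geometry": "443999.04209536605,39001.6733318335,"
                    "870499.0420953662,303001.67333183356",
        "geometryFormat": "geojson",
        "geometryType": "esriGeometryEnvelope",
        "imageDisplay": "1536,759,96",
        "lang": "en",
        # LAYER_PARAM in the listing above is replaced by the layer identifier
        "layers": f"all:{layer}",
        "mapExtent": "269999.04209536605,9501.673331833561,"
                     "1037999.042095366,389001.67333183356",
        "returnGeometry": "true",
        "tolerance": "5",
    }


params = build_identify_params("ch.astra.unfaelle-personenschaeden_getoetete")
```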
This query does not select every dot on the map, however. Clicking the "load more results" button on the 'accidents with fatalities' layer sends:
<code>
geometry:443999.04209536605,39001.6733318335,870499.0420953662,303001.67333183356
geometryFormat:geojson
geometryType:esriGeometryEnvelope
imageDisplay:1536,759,96
lang:en
layers:all:ch.astra.unfaelle-personenschaeden_getoetete
mapExtent:136199.04209536605,-28148.32666816644,1134599.042095366,465201.67333183356
<b>offset:200</b>
returnGeometry:true
tolerance:5
</code>
Pressing "load more" until it is no longer available ends at offset=1200 (for a total of 1337 objects), i.e. the server returns data entries in pages of 200.
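
The paging behaviour described above can be sketched as a loop over offsets. This is an illustration under assumptions, not the project's actual scraper: `fetch_page` is a hypothetical callable that returns the records for one offset, and the stop condition (a page shorter than 200) is assumed rather than confirmed by the server's documentation. A fake in-memory fetcher stands in for the network call.

```python
PAGE_SIZE = 200  # observed step of the offset parameter


def fetch_all(fetch_page, page_size=PAGE_SIZE):
    """Collect records page by page until a short page signals the end.

    fetch_page is a hypothetical callable: offset -> list of records.
    """
    records = []
    offset = 0
    while True:
        page = fetch_page(offset)
        records.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return records


# Offline stand-in for the server: 1337 fake records served 200 at a time
fake_records = list(range(1337))
requested_offsets = []


def fake_fetch(offset):
    requested_offsets.append(offset)
    return fake_records[offset:offset + PAGE_SIZE]


all_records = fetch_all(fake_fetch)
print(len(all_records), requested_offsets)
```

With 1337 records the last requested offset is 1200, which matches what the "load more" button shows.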

## JSON Data scraping

In [1]:
import requests
import json

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

from Scripts.helpers import *
from Scripts.plots import *


import pprint

%load_ext autoreload
%autoreload 2
#from bs4 import BeautifulSoup

In [2]:
pp = pprint.PrettyPrinter(indent=4)

In [3]:
#import raw data
data = import_data(all_data = True)

Processing layer : ch.astra.unfaelle-personenschaeden_alle
Layer processed : 90600 records

Processing layer : ch.astra.unfaelle-personenschaeden_getoetete
Layer processed : 1343 records

Processing layer : ch.astra.unfaelle-personenschaeden_fussgaenger
Layer processed : 11738 records

Processing layer : ch.astra.unfaelle-personenschaeden_fahrraeder
Layer processed : 18104 records

Processing layer : ch.astra.unfaelle-personenschaeden_motorraeder
Layer processed : 19676 records

Whole dataset processed : 141461 records
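
A quick sanity check on the log above: the reported total equals the plain sum of the per-layer counts, so if a record appears in several layers it is counted once per layer. The sketch below only redoes that arithmetic with the numbers printed above.

```python
# Record counts copied from the import log above
layer_counts = {
    "ch.astra.unfaelle-personenschaeden_alle": 90600,
    "ch.astra.unfaelle-personenschaeden_getoetete": 1343,
    "ch.astra.unfaelle-personenschaeden_fussgaenger": 11738,
    "ch.astra.unfaelle-personenschaeden_fahrraeder": 18104,
    "ch.astra.unfaelle-personenschaeden_motorraeder": 19676,
}

total = sum(layer_counts.values())
print(total)
```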



In [4]:
json_data_preprocessed = preprocess_data(data)

In [5]:
print("Data entry example after clean and reformat:\n")
json_data_preprocessed[0]

Data entry example after clean and reformat:



{'accidenttype_fr': 'dérapage ou perte de maîtrise',
 'accidenttypecode': 0,
 'accidentyear': 2014,
 'canton': 'AG',
 'coordinates': [630182.0, 232290.0],
 'day': 'samedi',
 'fsocommunecode': '4279',
 'id': 'F8D9FD92E6AC0196E0430A8394274F00',
 'label': 'Schleuder- oder Selbstunfall',
 'layerName': 'Accidents avec dommages corporels',
 'month': 'mai',
 'roadtype_fr': 'route principale',
 'roadtypecode': 432,
 'severitycategory_fr': 'accident avec blessés légers',
 'severitycategorycode': 'ULV',
 'time': '01h-02h'}

In [6]:
df = pd.DataFrame.from_dict(json_data_preprocessed)
df.set_index('id', inplace=True)
df.columns = ['accident_type','accident_type_code','accident_year','canton','coordinates','day','fsocommunecode','label','layerName','month','road_type','road_type_code','severity_category','severity_category_code','time']
df.sample(3)

| id | accident_type | accident_type_code | accident_year | canton | coordinates | day | fsocommunecode | label | layerName | month | road_type | road_type_code | severity_category | severity_category_code | time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1DCCE59B4E9A00F8E0530A8394274F10 | accident lors d'un dépassement ou lors d'un ch... | 1 | 2015 | SO | [597045.0, 215853.0] | mardi | 2461 | Überholunfall oder Fahrstreifenwechsel | Accidents avec dommages corporels | juillet | route principale | 432 | accident avec blessés légers | ULV | 13h-14h |
| C7D89DCEFB510080E0430A8394279F87 | dérapage ou perte de maîtrise | 0 | 2012 | GR | [791058.0, 181902.0] | vendredi | 3851 | Schleuder- oder Selbstunfall | Accidents avec dommages corporels | août | route principale | 432 | accident avec blessés légers | ULV | 15h-16h |
| DF1A87D80E0501FCE0430A8394279877 | accident en s'engageant sur une route | 4 | 2013 | LU | [665672.0, 214562.0] | mardi | 1024 | Einbiegeunfall | Accidents avec dommages corporels | mai | route principale | 432 | accident avec blessés légers | ULV | 10h-11h |
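
Assigning a list to `df.columns` relies on the columns arriving in a fixed order. A possible alternative, sketched here on a single record taken from the example entry above (a subset of its fields), is to rename with an explicit mapping so that each new name is tied to its source field. The names `sample`, `RENAMES` and `sample_df` are illustrative.

```python
import pandas as pd

# One record copied from the example entry above (subset of fields)
sample = [{
    "id": "F8D9FD92E6AC0196E0430A8394274F00",
    "accidenttype_fr": "dérapage ou perte de maîtrise",
    "accidenttypecode": 0,
    "accidentyear": 2014,
    "roadtype_fr": "route principale",
    "roadtypecode": 432,
    "severitycategory_fr": "accident avec blessés légers",
    "severitycategorycode": "ULV",
}]

# Explicit old-name -> new-name mapping, independent of column order
RENAMES = {
    "accidenttype_fr": "accident_type",
    "accidenttypecode": "accident_type_code",
    "accidentyear": "accident_year",
    "roadtype_fr": "road_type",
    "roadtypecode": "road_type_code",
    "severitycategory_fr": "severity_category",
    "severitycategorycode": "severity_category_code",
}

sample_df = pd.DataFrame(sample).set_index("id").rename(columns=RENAMES)
print(sample_df.columns.tolist())
```

Columns absent from the mapping (such as `canton` or `day` in the full dataset) would keep their original names.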


In [7]:
plot_all_features(df)

Plotting all features
plotting feature accident_type
plotting feature accident_type_code
plotting feature accident_year
plotting feature canton
plotting feature day
plotting feature label
plotting feature layerName
plotting feature month
plotting feature road_type
plotting feature road_type_code
plotting feature severity_category
plotting feature severity_category_code
plotting feature time
-> Done plotting


## Data analysis

1) Accidents over time<br>
2) Correlation between the number/type of accidents and location (e.g. Valais and drunk driving)<br>
3) Track anomalies (start/end of a series of accidents) and try to find their cause<br>

In [8]:
plot_feature_combination(df, ['accident_year', 'month'])

In [9]:
plot_feature_combination(df, ['canton','day'])

In [10]:
plot_feature_combination(df, ['road_type','day'])

In [11]:
plot_feature_combination(df, ['time', 'severity_category'])
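
`plot_feature_combination` is a project helper whose implementation is not shown here. As an assumption about what such a plot rests on, the counts for a pair of features can be tabulated with `pd.crosstab`. The sketch below uses made-up toy rows, not the real dataset, and is not a description of the helper itself.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the real dataset
toy = pd.DataFrame({
    "time": ["01h-02h", "01h-02h", "13h-14h", "13h-14h", "13h-14h"],
    "severity_category": ["light", "severe", "light", "light", "severe"],
})

# Rows: time slots, columns: severity categories, cells: number of accidents
counts = pd.crosstab(toy["time"], toy["severity_category"])
print(counts)
```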

In [12]:
len(df.fsocommunecode.unique())


2191

In [13]:
plot_feature_combination(df, ['accident_type_code', 'severity_category_code'])

In [14]:
plot_all_feature_combinations(df, ['coordinates'], 2)

***Generating feature combinations***
-> Done
***Plotting feature combinations***
-> Done
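
The `coordinates` column holds `[x, y]` lists, which are awkward to group or plot directly. One possible preparation step, sketched on the three sampled rows shown earlier, is to split it into two numeric columns. The column names `x` and `y` are chosen here for illustration.

```python
import pandas as pd

# Coordinates copied from the three sampled rows shown earlier
coords_df = pd.DataFrame(
    {"coordinates": [[597045.0, 215853.0],
                     [791058.0, 181902.0],
                     [665672.0, 214562.0]]},
    index=["1DCCE59B4E9A00F8E0530A8394274F10",
           "C7D89DCEFB510080E0430A8394279F87",
           "DF1A87D80E0501FCE0430A8394279877"],
)

# Expand each [x, y] list into two columns aligned on the same index
xy = pd.DataFrame(coords_df["coordinates"].tolist(),
                  index=coords_df.index, columns=["x", "y"])
coords_df = coords_df.join(xy)
print(coords_df[["x", "y"]])
```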
