# Road Traffic Accidents in Switzerland

Our project goal is to scrape all traffic accidents from the accident map at http://map.donneesaccidents.ch/

## Data scraping strategy

Accessing http://map.donneesaccidents.ch/ redirects to:<br>
https://map.geo.admin.ch/?topic=vu&lang=fr&bgLayer=ch.swisstopo.pixelkarte-grau&layers=ch.astra.unfaelle-personenschaeden_alle&layers_timestamp=&catalogNodes=1318


Postman parses the following query parameters:
<code>
topic:vu
lang:en
bgLayer:ch.swisstopo.pixelkarte-grau
layers:ch.astra.unfaelle-personenschaeden_alle
layers_timestamp:
catalogNodes:1318
</code>
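
The same breakdown can be reproduced without Postman. The snippet below is a minimal sketch using only the Python standard library, applied to the redirect URL quoted above (which carries `lang=fr`); the helper name `query_params` is made up for this illustration.

```python
from urllib.parse import urlsplit, parse_qs

URL = (
    "https://map.geo.admin.ch/?topic=vu&lang=fr"
    "&bgLayer=ch.swisstopo.pixelkarte-grau"
    "&layers=ch.astra.unfaelle-personenschaeden_alle"
    "&layers_timestamp=&catalogNodes=1318"
)


def query_params(url):
    """Return the query-string parameters of `url` as a flat dict (hypothetical helper)."""
    query = urlsplit(url).query
    # keep_blank_values=True preserves empty parameters such as layers_timestamp
    parsed = parse_qs(query, keep_blank_values=True)
    # Every key appears only once in this URL, so keep the first value of each
    return {key: values[0] for key, values in parsed.items()}


params = query_params(URL)
for key, value in params.items():
    print(f"{key}:{value}")
```
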

The most important one is layers:ch.astra.unfaelle-personenschaeden_alle.<br>
It is the layer that holds every geolocated point for "Accidents with personal injury", the data layer currently selected.
<img src="Resources/images/layer_selector.png">

Selecting all kinds of accidents returns the following:<br>
<img src="Resources/images/layer_selector_all.png">
with layer parameters :<br>
layers:<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_alle,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_getoetete,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_fussgaenger,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_fahrraeder,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_motorraeder<br>
layers_timestamp:,,,,<br>

Now we want all the data for each layer. Selecting a dot on the map queries the server for the related data.
To retrieve everything, we select all the entries on the map at once, which is done by Ctrl-clicking over the whole area.

This sends one query for each value of the "layers" parameter:
<code>
geometry:443999.04209536605,39001.6733318335,870499.0420953662,303001.67333183356
geometryFormat:geojson
geometryType:esriGeometryEnvelope
imageDisplay:1536,759,96
lang:en
layers:all:<i>LAYER_PARAM</i>
mapExtent:269999.04209536605,9501.673331833561,1037999.042095366,389001.67333183356
returnGeometry:true
tolerance:5
</code><br>
This does not select all the dots on the map, however, so let's try the "load more results" button on the "accidents with fatalities" layer. We get:
<code>
geometry:443999.04209536605,39001.6733318335,870499.0420953662,303001.67333183356
geometryFormat:geojson
geometryType:esriGeometryEnvelope
imageDisplay:1536,759,96
lang:en
layers:all:ch.astra.unfaelle-personenschaeden_getoetete
mapExtent:136199.04209536605,-28148.32666816644,1134599.042095366,465201.67333183356
<b>offset:200</b>
returnGeometry:true
tolerance:5
</code>
Pressing "load more" until no more results are available gives a final offset of 1200 (for a total of 1337 objects), i.e. the data entries are loaded 200 at a time.
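
The offset-based paging observed here can be sketched as a loop that requests pages until one comes back with fewer than 200 entries. This is only an illustration under assumptions: `build_params`, `fetch_all` and `fake_fetch_page` are hypothetical names, the parameter values are copied from the first query above, and the HTTP call is replaced by a stand-in so that the logic can be checked offline. In real use, `fetch_page` would send the parameters to the map server and return the list of results from the JSON response.

```python
PAGE_SIZE = 200  # page size observed with the "load more results" button


def build_params(layer, offset=0):
    """Query parameters for one page of one layer (values copied from the request above)."""
    params = {
        "geometry": "443999.04209536605,39001.6733318335,870499.0420953662,303001.67333183356",
        "geometryFormat": "geojson",
        "geometryType": "esriGeometryEnvelope",
        "imageDisplay": "1536,759,96",
        "lang": "en",
        "layers": f"all:{layer}",
        "mapExtent": "269999.04209536605,9501.673331833561,1037999.042095366,389001.67333183356",
        "returnGeometry": "true",
        "tolerance": "5",
    }
    if offset:
        # The first request carries no offset; later pages add it
        params["offset"] = str(offset)
    return params


def fetch_all(fetch_page, layer, page_size=PAGE_SIZE):
    """Collect every record of `layer`, raising `offset` until a short page is returned."""
    records, offset = [], 0
    while True:
        page = fetch_page(build_params(layer, offset))
        records.extend(page)
        if len(page) < page_size:
            break  # fewer results than a full page: nothing left to load
        offset += page_size
    return records


def fake_fetch_page(params, total=1337):
    """Stand-in for the HTTP request: pretends the layer holds `total` records."""
    offset = int(params.get("offset", 0))
    return list(range(offset, min(offset + PAGE_SIZE, total)))


records = fetch_all(fake_fetch_page, "ch.astra.unfaelle-personenschaeden_getoetete")
print(len(records))
```

With 1337 records the loop issues seven requests, the last one at offset 1200, which matches what the "load more" button shows.
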

## JSON Data scraping

In [1]:
import requests
import json

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

from Scripts.helpers import *
from Scripts.plots import *


import pprint
#from bs4 import BeautifulSoup

In [2]:
pp = pprint.PrettyPrinter(indent=4)

In [10]:
#import raw data
data = import_data(all_data = False)
data[1]

Processing layer : ch.astra.unfaelle-personenschaeden_alle
Break in loop while
Layer processed : 603 records

Processing layer : ch.astra.unfaelle-personenschaeden_getoetete
Break in loop while
Layer processed : 603 records

Processing layer : ch.astra.unfaelle-personenschaeden_fussgaenger
Break in loop while
Layer processed : 603 records

Processing layer : ch.astra.unfaelle-personenschaeden_fahrraeder
Break in loop while
Layer processed : 603 records

Processing layer : ch.astra.unfaelle-personenschaeden_motorraeder
Break in loop while
Layer processed : 603 records

Whole dataset processed : 3015 records



{'bbox': [538884.0, 152061.0, 538884.0, 152061.0],
 'featureId': 'ED164DF7603600FEE0430A839427FAFB',
 'geometry': {'coordinates': [[538884.0, 152061.0]], 'type': 'MultiPoint'},
 'geometryType': 'Feature',
 'id': 'ED164DF7603600FEE0430A839427FAFB',
 'layerBodId': 'ch.astra.unfaelle-personenschaeden_alle',
 'layerName': 'Accidents avec dommages corporels',
 'properties': {'accidentday_de': 'Freitag / 17h-18h / November 2013',
  'accidentday_fr': 'vendredi / 17h-18h / novembre 2013',
  'accidentday_it': 'Venerdì / 17h-18h / Novembre 2013',
  'accidenttype_de': 'Fussgängerunfall',
  'accidenttype_fr': 'accident impliquant des piétons',
  'accidenttype_it': 'Incidente con pedoni',
  'accidenttypecode': 8,
  'accidentyear': 2013,
  'canton': 'VD',
  'fsocommunecode': '5586',
  'label': 'Fussgängerunfall',
  'roadtype_de': 'Nebenstrasse',
  'roadtype_fr': 'route secondaire',
  'roadtype_it': 'Strada secondaria',
  'roadtypecode': 433,
  'severitycategory_de': 'Unfall mit Schwerverletzten',
  

In [4]:
#translate data from german
json_data_preprocessed = preprocess_data(data)

In [5]:
print("Data entry example after clean and reformat:\n")
json_data_preprocessed[0]

Data entry example after clean and reformat:



{'accidenttype_fr': 'dérapage ou perte de maîtrise',
 'accidenttypecode': 0,
 'accidentyear': 2011,
 'canton': 'GE',
 'coordinates': [501537.0, 124408.0],
 'day': 'vendredi',
 'fsocommunecode': '6622',
 'id': 'A774A9B811D400CAE0430A83942700CA',
 'label': 'Schleuder- oder Selbstunfall',
 'layerName': 'Accidents avec dommages corporels',
 'month': 'juillet',
 'roadtype_fr': 'route principale',
 'roadtypecode': 432,
 'severitycategory_fr': 'accident avec blessés graves',
 'severitycategorycode': 'USV',
 'time': '19h-20h'}
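
Comparing the raw and the preprocessed entries, the cleaning step appears to split `accidentday_fr` into `day`, `time` and `month`, and to unwrap the single-point `MultiPoint` geometry. The real `preprocess_data` lives in `Scripts.helpers` and is not shown here, so the following is only a guess at that transformation; `split_accident_day` and `flatten_feature` are hypothetical names, and only a few of the fields are carried over.

```python
def split_accident_day(accidentday):
    """Split 'vendredi / 17h-18h / novembre 2013' into day, time and month."""
    day, time, month_year = (part.strip() for part in accidentday.split("/"))
    month = month_year.split()[0]  # drop the year, already stored in accidentyear
    return {"day": day, "time": time, "month": month}


def flatten_feature(feature, lang="fr"):
    """Reduce one raw feature to a flat record (subset of fields, for illustration)."""
    props = feature["properties"]
    entry = {
        "id": feature["id"],
        "layerName": feature["layerName"],
        # MultiPoint holding a single point: keep that one coordinate pair
        "coordinates": feature["geometry"]["coordinates"][0],
        "accidentyear": props["accidentyear"],
        "canton": props["canton"],
    }
    entry.update(split_accident_day(props[f"accidentday_{lang}"]))
    return entry


# Subset of the raw record printed earlier in the notebook
raw = {
    "id": "ED164DF7603600FEE0430A839427FAFB",
    "layerName": "Accidents avec dommages corporels",
    "geometry": {"coordinates": [[538884.0, 152061.0]], "type": "MultiPoint"},
    "properties": {
        "accidentday_fr": "vendredi / 17h-18h / novembre 2013",
        "accidentyear": 2013,
        "canton": "VD",
    },
}

entry = flatten_feature(raw)
print(entry)
```
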

In [6]:
df = pd.DataFrame.from_dict(json_data_preprocessed)
df.set_index('id', inplace=True)
df.sample(5)

| id | accidenttype_fr | accidenttypecode | accidentyear | canton | coordinates | day | fsocommunecode | label | layerName | month | roadtype_fr | roadtypecode | severitycategory_fr | severitycategorycode | time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28D0634E478C00DAE0530A8394278B29 | dérapage ou perte de maîtrise | 0 | 2015 | FR | [554723.0, 149961.0] | samedi | 2321 | Schleuder- oder Selbstunfall | Accidents avec la part. de motos | décembre | route principale | 432 | accident avec blessés légers | ULV | 21h-22h |
| EF646051FB4102FEE0430A8394277C00 | accident en traversant une route | 5 | 2013 | BE | [613910.0, 191136.0] | jeudi | 612 | Überqueren der Fahrbahn | Accidents avec dommages corporels | novembre | route principale | 432 | accident avec blessés légers | ULV | 12h-13h |
| 25C36F0ABA43015EE0530A839427DB43 | accident en quittant une route | 3 | 2015 | TG | [713950.0, 254074.0] | samedi | 4726 | Abbiegeunfall | Accidents avec la part. de vélos | novembre | route secondaire | 433 | accident avec blessés graves | USV | 14h-15h |
| F0F128CC05E70058E0430A839427247B | dérapage ou perte de maîtrise | 0 | 2014 | ZH | [676678.0, 258361.0] | mardi | 86 | Schleuder- oder Selbstunfall | Accidents mortels | janvier | route principale | 432 | accident avec tués | UGT | 22h-23h |
| 2715BA20BD370068E0530A8394271104 | dérapage ou perte de maîtrise | 0 | 2015 | BE | [596767.0, 186404.0] | samedi | 855 | Schleuder- oder Selbstunfall | Accidents avec la part. de vélos | novembre | route secondaire | 433 | accident avec blessés légers | ULV | 14h-15h |


In [7]:
plot_all_features(df)

Plotting all features
plotting feature accidenttype_fr
plotting feature accidenttypecode
->    Key Error : 'accidenttypecode'
plotting feature accidentyear
plotting feature canton
plotting feature coordinates
->    Type Error : unhashable type: 'list'
plotting feature day
plotting feature label
plotting feature layerName
plotting feature month
plotting feature roadtype_fr
plotting feature roadtypecode
plotting feature severitycategory_fr
plotting feature severitycategorycode
plotting feature time
Done plotting
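
The `Type Error` reported for `coordinates` is what Python raises whenever a list is used where a hashable value is needed, for example when counting occurrences. `plot_all_features` is defined in `Scripts.plots` and is not shown here, so the assumption is that it counts the values of each column. A minimal sketch of the problem and one possible workaround (converting each pair to a tuple), using two coordinate pairs from the records above:

```python
from collections import Counter

coordinates = [
    [501537.0, 124408.0],
    [554723.0, 149961.0],
    [501537.0, 124408.0],
]

try:
    Counter(coordinates)  # lists are mutable, hence unhashable
except TypeError as error:
    message = str(error)
    print(f"Type Error : {message}")

# Tuples are immutable and hashable, so they can be counted
counts = Counter(tuple(pair) for pair in coordinates)
print(counts.most_common(1))
```
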


# Data analysis

1) Accidents over time<br>
2) Correlation between the number/type of accidents and their location (e.g. drunk driving in Valais)<br>
3) Track anomalies (end/start of a series of accidents) and try to find their cause<br>

In [9]:
plot_feature_combination(df, ['month', 'accidentyear', 'canton'])

Plotting features : ['month', 'accidentyear', 'canton']
Done plotting ['month', 'accidentyear', 'canton']
