# Road Traffic Accidents in Switzerland

Our project goal is to scrape all traffic accidents from the accident map at http://map.donneesaccidents.ch/

## Data scraping strategy

http://map.donneesaccidents.ch/ redirects to: <br>
https://map.geo.admin.ch/?topic=vu&lang=fr&bgLayer=ch.swisstopo.pixelkarte-grau&layers=ch.astra.unfaelle-personenschaeden_alle&layers_timestamp=&catalogNodes=1318


Postman parses the following parameters : 
<code>
topic:vu
lang:en
bgLayer:ch.swisstopo.pixelkarte-grau
layers:ch.astra.unfaelle-personenschaeden_alle
layers_timestamp:
catalogNodes:1318
</code>
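
The same split into parameters can be reproduced in Python with the standard library. This is only a minimal sketch: it parses the redirect URL quoted above, and the variable names `MAP_URL` and `params` are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# Redirect URL quoted above, split over several lines for readability
MAP_URL = (
    "https://map.geo.admin.ch/?topic=vu&lang=fr"
    "&bgLayer=ch.swisstopo.pixelkarte-grau"
    "&layers=ch.astra.unfaelle-personenschaeden_alle"
    "&layers_timestamp=&catalogNodes=1318"
)

# keep_blank_values=True keeps parameters with an empty value, such as layers_timestamp
params = parse_qs(urlsplit(MAP_URL).query, keep_blank_values=True)

for name, values in params.items():
    print("{}:{}".format(name, values[0]))
```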

The most important one is layers:ch.astra.unfaelle-personenschaeden_alle.<br>
This is the currently selected data layer, "Accidents with personal injury"; it contains all the georeferenced dots for those accidents.
<img src="layer_selector.png">

Selecting all kinds of accidents returns the following:<br>
<img src="layer_selector_all.png">
with layer parameters :<br>
layers:<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_alle,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_getoetete,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_fussgaenger,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_fahrraeder,<br>
    &nbsp;ch.astra.unfaelle-personenschaeden_motorraeder<br>
layers_timestamp:,,,,<br>
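
As a rough sketch, the two comma-separated parameters can be built from a list of layer names. The other parameters are assumed to stay the same as in the single-layer URL, and `LAYERS`, `params` and `map_url` are illustrative names.

```python
from urllib.parse import urlencode

LAYERS = [
    "ch.astra.unfaelle-personenschaeden_alle",
    "ch.astra.unfaelle-personenschaeden_getoetete",
    "ch.astra.unfaelle-personenschaeden_fussgaenger",
    "ch.astra.unfaelle-personenschaeden_fahrraeder",
    "ch.astra.unfaelle-personenschaeden_motorraeder",
]

params = {
    "topic": "vu",
    "lang": "fr",
    "bgLayer": "ch.swisstopo.pixelkarte-grau",
    "layers": ",".join(LAYERS),
    # one empty timestamp per layer, hence len(LAYERS) - 1 commas
    "layers_timestamp": ",".join([""] * len(LAYERS)),
    "catalogNodes": "1318",  # assumption: unchanged from the single-layer URL
}

# safe="," keeps the commas readable instead of encoding them as %2C
map_url = "https://map.geo.admin.ch/?" + urlencode(params, safe=",")
print(map_url)
```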

Now we want all the data for each layer. Selecting a dot on the map sends a query to the server for the related data.
What we want is to select every entry on the map in order to retrieve all the data. This is done by Ctrl-clicking the whole area.

This sends one query per layer in the "layers" parameter:
<code>
geometry:443999.04209536605,39001.6733318335,870499.0420953662,303001.67333183356
geometryFormat:geojson
geometryType:esriGeometryEnvelope
imageDisplay:1536,759,96
lang:en
layers:all:<i>LAYER_PARAM</i>
mapExtent:269999.04209536605,9501.673331833561,1037999.042095366,389001.67333183356
returnGeometry:true
tolerance:5
</code><br>
This does not select all the dots on the map, however. Clicking the "load more results" button on the 'accidents with fatalities' layer sends:
<code>
geometry:443999.04209536605,39001.6733318335,870499.0420953662,303001.67333183356
geometryFormat:geojson
geometryType:esriGeometryEnvelope
imageDisplay:1536,759,96
lang:en
layers:all:ch.astra.unfaelle-personenschaeden_getoetete
mapExtent:136199.04209536605,-28148.32666816644,1134599.042095366,465201.67333183356
<b>offset:200</b>
returnGeometry:true
tolerance:5
</code>
Pressing "load more" until no further results are available ends at offset=1200 (for a total of 1337 objects), i.e. the data entries are loaded 200 at a time.
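
The arithmetic behind these offsets can be checked quickly. The sketch below assumes that the first query (sent without an `offset` parameter) is equivalent to offset 0; `PAGE_SIZE` and `TOTAL_OBJECTS` are illustrative names.

```python
import math

PAGE_SIZE = 200       # step between two consecutive offsets, as observed above
TOTAL_OBJECTS = 1337  # total reported for the fatalities layer

# offsets to request: 0, 200, ... up to the last one below the total
offsets = list(range(0, TOTAL_OBJECTS, PAGE_SIZE))
n_requests = math.ceil(TOTAL_OBJECTS / PAGE_SIZE)

print(offsets)     # [0, 200, 400, 600, 800, 1000, 1200]
print(n_requests)  # 7
```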

## JSON Data scraping

In [214]:
import requests
import json

import numpy as np
import pandas as pd

from Scripts.helpers import *


import pprint
#from bs4 import BeautifulSoup

In [215]:
pp = pprint.PrettyPrinter(indent=4)

In [216]:
#def import_data():
#    url_0="https://api3.geo.admin.ch/rest/services/all/MapServer/identify?geometry=446000.0000000001,37750,860500.0000000002,317750.00000000006&geometryFormat=geojson&geometryType=esriGeometryEnvelope&imageDisplay=1536,759,96&lang=fr&layers=all:ch.astra.unfaelle-personenschaeden_getoetete&mapExtent=276000,250,1044000,379750&returnGeometry=true&tolerance=5"
#    url_1="https://api3.geo.admin.ch/rest/services/all/MapServer/identify?geometry=446000.0000000001,37750,860500.0000000002,317750.00000000006&geometryFormat=geojson&geometryType=esriGeometryEnvelope&imageDisplay=1536,759,96&lang=fr&layers=all:ch.astra.unfaelle-personenschaeden_getoetete&mapExtent=276000,250,1044000,379750&offset=200&returnGeometry=true&tolerance=5"
#    r0=requests.get(url_0)
#    r1=requests.get(url_1)
#    json_data0=(json.loads(str(r0.text))).get('results')
#    json_data1=(json.loads(str(r1.text))).get('results')
#    data = json_data0+json_data1 #merge two lists into one
#    return data

In [302]:
def import_data():
    layers =["ch.astra.unfaelle-personenschaeden_alle",
             "ch.astra.unfaelle-personenschaeden_getoetete",
             "ch.astra.unfaelle-personenschaeden_fussgaenger",
             "ch.astra.unfaelle-personenschaeden_fahrraeder",
             "ch.astra.unfaelle-personenschaeden_motorraeder"]
    data = []
    
    for layer in layers :

        offset = 0
        continue_=True
        while(continue_):
            print(layer, offset)
            query='https://api3.geo.admin.ch/rest/services/all/MapServer/identify?geometry=446000.0000000001,37750,860500.0000000002,317750.00000000006&geometryFormat=geojson&geometryType=esriGeometryEnvelope&imageDisplay=1536,759,96&lang=fr&layers=all:{}&mapExtent=276000,250,1044000,379750&offset={}&returnGeometry=true&tolerance=5'.format(layer, offset)
            #query='https://api3.geo.admin.ch/rest/services/all/MapServer/identify?geometry=446000.0000000001,37750,860500.0000000002,317750.00000000006&geometryFormat=geojson&geometryType=esriGeometryEnvelope&imageDisplay=1536,759,96&lang=fr&layers=all:ch.astra.unfaelle-personenschaeden_getoetete&mapExtent=276000,250,1044000,379750&offset={0}&returnGeometry=true&tolerance=5'.format(offset)

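            r=requests.get(query)
            r.raise_for_status() #stop on HTTP errors instead of parsing an error page

            #fall back to an empty list if the answer has no 'results' key
            json_data=r.json().get('results', [])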
    
            if(len(json_data) == 0):
                #no more data to scrape
                continue_=False
            else :
                #set offset to the beginning of the next file to import
                offset+=200
                data+=json_data #merge two lists into one


    return data
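
The long query string used in `import_data` can also be assembled from a dictionary, which makes each parameter easier to read and change. This is only a sketch using the standard library; `IDENTIFY_URL` and `build_identify_url` are hypothetical names that are not part of the notebook.

```python
from urllib.parse import urlencode

IDENTIFY_URL = "https://api3.geo.admin.ch/rest/services/all/MapServer/identify"

def build_identify_url(layer, offset):
    """Return the identify URL for one layer and one page offset."""
    params = {
        "geometry": "446000.0000000001,37750,860500.0000000002,317750.00000000006",
        "geometryFormat": "geojson",
        "geometryType": "esriGeometryEnvelope",
        "imageDisplay": "1536,759,96",
        "lang": "fr",
        "layers": "all:" + layer,
        "mapExtent": "276000,250,1044000,379750",
        "offset": offset,
        "returnGeometry": "true",
        "tolerance": 5,
    }
    # safe=",:" leaves commas and colons unescaped, matching the hand-written query
    return IDENTIFY_URL + "?" + urlencode(params, safe=",:")

print(build_identify_url("ch.astra.unfaelle-personenschaeden_getoetete", 200))
```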

In [303]:
#Remove all items we are not interested in
#(German and Italian variants of fields that we keep in French)
def clean_data(data_list):
    unwanted_keys = ['accidentday_de', 'accidentday_it',
                     'accidenttype_de', 'accidenttype_it',
                     'severitycategory_it', 'severitycategory_de']

    #note: list.copy() is shallow, so the property dicts of data_list are modified in place
    data_list_copy = data_list.copy()
    for item in data_list_copy :
        properties = item.get('properties')
        for key in unwanted_keys:
            #pop with a default does not fail if a key is missing
            properties.pop(key, None)

    return data_list_copy

In [304]:
#reformat data so we have each property as a feature of the event
def reformat_data(data_list):
    data_list_reformat = []
    
    for item in data_list:
        properties = item.get('properties')
        del item['properties']
        item = dict(item, **properties)
        data_list_reformat.append(item)
    
    return data_list_reformat
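
To illustrate what this flattening does, here is a small sketch on a made-up record. The record and its values are hypothetical; only the nesting under `properties` mirrors the scraped entries. Unlike `reformat_data`, this variant leaves the input untouched.

```python
# Hypothetical record, shaped like one entry of the 'results' list
record = {
    "id": "A1",
    "layerBodId": "ch.astra.unfaelle-personenschaeden_alle",
    "properties": {"canton": "VD", "accidentyear": 2011},
}

# Copy every top-level key except 'properties', then merge the properties in
flat = {key: value for key, value in record.items() if key != "properties"}
flat.update(record["properties"])

print(flat)
```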
    
    

In [305]:
#import and clean the data
#data = import_data()
json_data_clean = reformat_data(clean_data(import_data()))

[offsets printed in steps of 200 from 0 up to 35000; the rest of the output is truncated]

In [306]:
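#json_data0 and json_data1 are not defined in the cells shown here; they presumably
#come from an earlier run of the two-request version (see the commented-out import_data above)
print("each request gets 201 entry points")
print(len(json_data0))
print(len(json_data1))

print("Data entry example :\n")

#note: this call to import_data() downloads every layer again just to show one entry
pp.pprint(import_data()[0])

print("Data entry example after clean and reformat:\n")
pp.pprint(json_data_clean[0])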

each request gets 201 entry points
201
201
Data entry example :

[offsets printed in steps of 200 from 0 up to 32800; the rest of the output is truncated]

In [307]:
df = pd.DataFrame.from_dict(json_data_clean)
df.set_index('id', inplace=True)
df.head()

| id | accidentday_fr | accidenttype_fr | accidenttypecode | accidentyear | bbox | canton | featureId | fsocommunecode | geometry | geometryType | label | layerBodId | layerName | roadtype_de | roadtype_fr | roadtype_it | roadtypecode | severitycategory_fr | severitycategorycode | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9F99051DA8F610C8E0430A865E3310C8 | jeudi / 10h-11h / février 2011 | accident impliquant des piétons | 8 | 2011 | [621566.0, 237728.0, 621566.0, 237728.0] | SO | 9F99051DA8F610C8E0430A865E3310C8 | 2407 | {'coordinates': [[621566.0, 237728.0]], 'type'... | Feature | Fussgängerunfall | ch.astra.unfaelle-personenschaeden_alle | Accidents avec dommages corporels | Hauptstrasse | route principale | Strada principale | 432 | accident avec blessés légers | ULV | Feature |
| 9F9D824A940A508CE0430A865E33508C | jeudi / 17h-18h / février 2011 | accident par tamponnement | 2 | 2011 | [670272.0, 213320.0, 670272.0, 213320.0] | LU | 9F9D824A940A508CE0430A865E33508C | 1051 | {'coordinates': [[670272.0, 213320.0]], 'type'... | Feature | Auffahrunfall | ch.astra.unfaelle-personenschaeden_alle | Accidents avec dommages corporels | Hauptstrasse | route principale | Strada principale | 432 | accident avec blessés légers | ULV | Feature |
| 9F9FC69C25AF70AAE0430A865E3370AA | jeudi / 10h-11h / février 2011 | accident par tamponnement | 2 | 2011 | [539283.0, 153891.0, 539283.0, 153891.0] | VD | 9F9FC69C25AF70AAE0430A865E3370AA | 5586 | {'coordinates': [[539283.0, 153891.0]], 'type'... | Feature | Auffahrunfall | ch.astra.unfaelle-personenschaeden_alle | Accidents avec dommages corporels | Hauptstrasse | route principale | Strada principale | 432 | accident avec blessés légers | ULV | Feature |
| 9FC221266992E0F0E0430A865E33E0F0 | jeudi / 09h-10h / février 2011 | collision frontale | 6 | 2011 | [615262.0, 252360.0, 615262.0, 252360.0] | SO | 9FC221266992E0F0E0430A865E33E0F0 | 2480 | {'coordinates': [[615262.0, 252360.0]], 'type'... | Feature | Frontalkollision | ch.astra.unfaelle-personenschaeden_alle | Accidents avec dommages corporels | Nebenstrasse | route secondaire | Strada secondaria | 433 | accident avec blessés légers | ULV | Feature |
| A050E131D26460AAE0430A865E3360AA | jeudi / 06h-07h / février 2011 | accident lors d'un dépassement ou lors d'un ch... | 1 | 2011 | [533718.0, 167987.0, 533718.0, 167987.0] | VD | A050E131D26460AAE0430A865E3360AA | 5529 | {'coordinates': [[533718.0, 167987.0]], 'type'... | Feature | Überholunfall oder Fahrstreifenwechsel | ch.astra.unfaelle-personenschaeden_alle | Accidents avec dommages corporels | Autobahn | autoroute | Autostrada | 430 | accident avec blessés légers | ULV | Feature |


In [309]:
df2 = df.groupby('accidenttype_fr').count()
df2

| accidenttype_fr | accidentday_fr | accidenttypecode | accidentyear | bbox | canton | featureId | fsocommunecode | geometry | geometryType | label | layerBodId | layerName | roadtype_de | roadtype_fr | roadtype_it | roadtypecode | severitycategory_fr | severitycategorycode | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| accident en parquant | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 | 1930 |
| accident en quittant une route | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 | 12941 |
| accident en s'engageant sur une route | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 | 18188 |
| accident en traversant une route | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 | 7572 |
| accident impliquant des animaux | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 | 783 |
| accident impliquant des piétons | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 | 22975 |
| accident lors d'un dépassement ou lors d'un changement de voie de circulation | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 | 6991 |
| accident par tamponnement | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 | 23301 |
| autres | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 | 991 |
| collision frontale | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 | 5031 |
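
Every column shows the same number because `count()` reports the non-null values of each column separately. When only the number of rows per accident type is needed, `size()` gives a single column. The sketch below uses a small made-up DataFrame (`toy`) rather than the scraped data.

```python
import pandas as pd

# Hypothetical toy data with the same column names as df
toy = pd.DataFrame({
    "accidenttype_fr": ["collision frontale", "autres", "collision frontale"],
    "canton": ["VD", "SO", "LU"],
})

# size() returns one row count per group instead of one count per column
counts = toy.groupby("accidenttype_fr").size().sort_values(ascending=False)
print(counts)
```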


# Data analysis

1) Accidents over time<br>
2) Correlation between the number/type of accidents and location (Valais, drunk driving)<br>
3) Track anomalies (end/start of a series of accidents) and try to find their cause<br>
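
For the time-based analysis, the `accidentday_fr` strings first have to be split into usable parts. The sketch below assumes that every value follows the pattern seen in the sample rows ("weekday / HHh-HHh / month year") and that months are spelled as in standard French; `MONTHS_FR`, `ACCIDENTDAY_RE` and `parse_accidentday` are hypothetical names.

```python
import re

# French month names to month numbers (assumed spelling)
MONTHS_FR = {
    "janvier": 1, "février": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8,
    "septembre": 9, "octobre": 10, "novembre": 11, "décembre": 12,
}

# Pattern inferred from the sample rows, e.g. "jeudi / 10h-11h / février 2011"
ACCIDENTDAY_RE = re.compile(r"^(\w+) / (\d{2})h-(\d{2})h / (\w+) (\d{4})$")

def parse_accidentday(text):
    """Split an accidentday_fr string into (weekday, start_hour, month, year).

    Returns None when the string does not match the assumed pattern.
    """
    match = ACCIDENTDAY_RE.match(text)
    if match is None:
        return None
    weekday, start_hour, _end_hour, month_name, year = match.groups()
    month = MONTHS_FR.get(month_name)
    if month is None:
        return None
    return weekday, int(start_hour), month, int(year)

print(parse_accidentday("jeudi / 10h-11h / février 2011"))
```

Values that do not match return `None`, so unexpected formats can be counted and inspected instead of silently producing wrong dates.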