# Carrefour dataframe

Here we import the dataset of the crossroads (carrefours) of Geneva to take a brief look at its structure and to see whether, and how, it could be useful for our analysis.

In [10]:
# Import libraries
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 
from pyproj import Proj, transform
import folium

# Read the dataset
carr_data = '../_data/GMO_CARREFOUR.csv'
carr_df = pd.read_csv(carr_data, sep=';', encoding='latin-1')

carr_df.dtypes

ID_GM_CARREFOUR    float64
TYPE                object
COMMENTAIRE         object
DATE_MAJ            object
RAYON              float64
E                  float64
N                  float64
dtype: object

In [11]:
carr_df.TYPE.value_counts()

Autres carrefours                      5852
Impasse                                1272
Signalisation lumineuse                 379
Changement voie                         216
Giratoire                               198
Frontière                               119
Stop toutes directions                   32
Giratoire à signalisation lumineuse       7
Name: TYPE, dtype: int64

We see that only a few basic pieces of information are provided:
* TYPE: the type of carrefour
* COMMENTAIRE: basically no details are provided in this column
* DATE_MAJ: 'date de mise à jour', the date of the last update of the entry in the dataset
* RAYON: the radius of roundabouts
* E, N: the easting and northing coordinates (in the Swiss 'epsg:2056' reference system, not yet GPS coordinates)
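
These observations can be checked with a few pandas calls. The following is a minimal sketch on a small made-up frame that reuses the column names of the dataset; `toy_df` and all of its values are illustrative and are not taken from the real file.

```python
import numpy as np
import pandas as pd

# Toy frame with the same column names as carr_df (all values are made up)
toy_df = pd.DataFrame({
    'TYPE': ['Giratoire', 'Impasse', 'Autres carrefours', 'Giratoire'],
    'COMMENTAIRE': [np.nan, np.nan, np.nan, np.nan],
    'DATE_MAJ': ['2016-02-19 14:07:06', '2008-04-18 16:20:43',
                 '2008-04-18 16:20:43', '2008-04-18 16:20:43'],
    'RAYON': [15.0, np.nan, np.nan, 12.0],
})

# Share of missing values per column
missing_share = toy_df.isna().mean()

# Parse the update date into a proper datetime column
toy_df['DATE_MAJ'] = pd.to_datetime(toy_df['DATE_MAJ'])

# Number of rows that have a radius, per type of carrefour
rayon_by_type = toy_df.groupby('TYPE')['RAYON'].count()

print(missing_share)
print(rayon_by_type)
```

On the real data, the same three calls applied to `carr_df` would show how empty COMMENTAIRE is and whether RAYON is filled only for roundabouts.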

## Converting the coordinates to the standard epsg:4326 reference system
 
The localisation features are:
- **E**: X coordinate (easting) in the 'epsg:2056' reference system
- **N**: Y coordinate (northing) in the 'epsg:2056' reference system

The E-N coordinates will be projected to GPS coordinates (longitude and latitude), i.e. the 'epsg:4326' reference system. For this, Proj and transform from the pyproj library will be used:

In [4]:
# Projection definitions: Swiss LV95 (epsg:2056) to WGS 84 (epsg:4326)
p1 = Proj(init='epsg:2056')
p2 = Proj(init='epsg:4326')

# Helper function: project row i in place.
# Afterwards E holds the longitude and N the latitude.
def coord_proj(carr_df, i, p1, p2):
    x1 = carr_df.at[i, 'E']
    y1 = carr_df.at[i, 'N']
    x2, y2 = transform(p1, p2, x1, y1)
    # Series.set_value was removed from pandas; .at sets a single cell
    carr_df.at[i, 'E'] = x2
    carr_df.at[i, 'N'] = y2
    return carr_df

# Project data (iterate over every row label, including the last one)
for i in carr_df.index:
    carr_df = coord_proj(carr_df, i, p1, p2)
# Delete unused columns
#del acc_df['N']
#del acc_df['E']
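
Recent pyproj releases deprecate `Proj(init=...)` and the module-level `transform` function in favour of the `Transformer` class. A minimal sketch of the same projection with that API, applied to whole columns at once, could look as follows. The frame `toy_df`, its values and the new column names `lon` and `lat` are assumptions made for illustration; the first point is the LV95 origin near Bern.

```python
import pandas as pd
from pyproj import Transformer

# Toy frame with E/N in the Swiss LV95 system (illustrative values)
toy_df = pd.DataFrame({'E': [2600000.0, 2500000.0],
                       'N': [1200000.0, 1118000.0]})

# always_xy=True fixes the axis order to
# (easting, northing) in and (longitude, latitude) out
transformer = Transformer.from_crs('EPSG:2056', 'EPSG:4326', always_xy=True)

# Transform both columns in one call instead of looping over rows
lon, lat = transformer.transform(toy_df['E'].values, toy_df['N'].values)
toy_df['lon'] = lon
toy_df['lat'] = lat

print(toy_df)
```

Writing the result to separate columns keeps the original E and N values available, which makes it easier to re-run the cell without projecting the data twice.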

Let's look at the dataframe and plot some of the carrefours on a map to manually check the precision of the coordinates.

In [5]:
carr_df.head(3)

| | ID_GM_CARREFOUR | TYPE | COMMENTAIRE | DATE_MAJ | RAYON | E | N |
|---|---|---|---|---|---|---|---|
| 0 | 227920.0 | Autres carrefours | | 2008-04-18 16:20:43 | | 5.997844 | 46.158031 |
| 1 | 226972.0 | Autres carrefours | | 2008-04-18 16:20:43 | | 6.117844 | 46.192797 |
| 2 | 226973.0 | Autres carrefours | | 2008-04-18 16:20:43 | | 6.118345 | 46.193008 |


In [7]:
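# Create the map visualization
Geneve_coord = [46.2004013, 6.1531163]
m = folium.Map(location=Geneve_coord,
               tiles='OpenStreetMap',
               zoom_start=10)
# Keep only the roundabouts. Assumed definition: giratoire_df is not
# defined in the cells shown here.
giratoire_df = carr_df[carr_df.TYPE == 'Giratoire']
# Add a marker for the first 20 roundabouts (folium expects [latitude, longitude])
for i in giratoire_df.index[0:20]:
    folium.Marker([giratoire_df.N.loc[i], giratoire_df.E.loc[i]], popup="Giratoire, %d" % i).add_to(m)
m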

The markers line up with the roundabouts on the map: the coordinates are really precise!

### Possible use of this dataframe for milestone 3:
Our idea is to use the GPS coordinates of the accidents and of the carrefours to associate each accident with a carrefour. By studying the kind of accident and the kind of carrefour where it happened, we want to look for possible correlations between the two. The study we originally had in mind was the following:

**Is it possible with this data to predict the change in accident risk when a specific carrefour is replaced with a roundabout (or vice versa)?**

To answer this question, a possible approach could be the following:

* Link the accidents to the carrefour where they happened
* Analyze the accidents linked to the different kinds of carrefours, to identify a possible cause-effect relation between the type of carrefour and the kind of accident
* Find particularly dangerous carrefours (those with a lot of accidents linked to them)
* Suggest replacing them
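
The first two steps could be sketched as follows. This is only an illustration under stated assumptions: the accidents table `toy_acc`, its column names `lon` and `lat`, the 50 m threshold and every value in both frames are made up, and a nearest-neighbour match on great-circle distance is just one possible way of linking an accident to a carrefour.

```python
import numpy as np
import pandas as pd

# Carrefours in the layout used above: E holds the longitude, N the latitude (made-up rows)
toy_carr = pd.DataFrame({
    'TYPE': ['Giratoire', 'Impasse', 'Signalisation lumineuse'],
    'E': [6.2062, 6.1178, 5.9978],
    'N': [46.2453, 46.1928, 46.1580],
})
# Hypothetical accidents table; the column names 'lon' and 'lat' are assumptions
toy_acc = pd.DataFrame({
    'lon': [6.2060, 6.1180, 6.1500],
    'lat': [46.2455, 46.1930, 46.2000],
})

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * np.arcsin(np.sqrt(a))

# Distance matrix: one row per accident, one column per carrefour
dist = haversine_m(toy_acc['lon'].values[:, None], toy_acc['lat'].values[:, None],
                   toy_carr['E'].values[None, :], toy_carr['N'].values[None, :])

# Nearest carrefour per accident; accidents farther than max_dist_m stay unlinked (-1)
max_dist_m = 50.0  # arbitrary threshold for this sketch
nearest = dist.argmin(axis=1)
nearest_dist = dist.min(axis=1)
toy_acc['carrefour'] = np.where(nearest_dist <= max_dist_m, toy_carr.index[nearest], -1)
toy_acc['TYPE'] = toy_acc['carrefour'].map(toy_carr['TYPE'])

# Number of linked accidents per type of carrefour
print(toy_acc['TYPE'].value_counts())
```

The full distance matrix is fine for a toy example; for the complete dataset a spatial index (for example a k-d tree) would likely scale better.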

On the internet we also found a document attesting to the construction of a roundabout in 2014 (the middle of our time series). We could analyse the distribution of accidents before and after the installation of this particular roundabout to validate our conclusions!

Here's the data of this roundabout:

Roundabout (giratoire) built (completed) in 2014: a possible case study for comparing carrefour vs. giratoire
https://www.ge.ch/construction/pdf/chantiers/Communique_de_presse_Route_de_Thonon-giratoire_FINAL_28_06_2013.pdf

In [9]:
giratoire_df.loc[7741]

ID_GM_CARREFOUR                 226855
TYPE                         Giratoire
COMMENTAIRE                        NaN
DATE_MAJ           2016-02-19 14:07:06
RAYON                               15
E                              6.20622
N                              46.2453
Name: 7741, dtype: object
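
A before/after comparison around this roundabout could be sketched as follows. The roundabout coordinates are the E and N values shown above; the accidents table `toy_acc`, its column names, the 100 m radius and all accident values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Roundabout 7741 from the output above (E = longitude, N = latitude)
rb_lon, rb_lat = 6.20622, 46.2453

# Hypothetical accidents; column names and all values are made up
toy_acc = pd.DataFrame({
    'date': pd.to_datetime(['2011-03-01', '2012-07-15', '2013-01-20',
                            '2015-05-02', '2016-09-30', '2012-02-02']),
    'lon': [6.2062, 6.2063, 6.2061, 6.2062, 6.2300, 6.2400],
    'lat': [46.2453, 46.2452, 46.2454, 46.2453, 46.2600, 46.2500],
})

# Equirectangular approximation of the distance in metres
# (adequate over a few hundred metres)
dx = np.radians(toy_acc['lon'] - rb_lon) * np.cos(np.radians(rb_lat)) * 6371000.0
dy = np.radians(toy_acc['lat'] - rb_lat) * 6371000.0
toy_acc['dist_m'] = np.hypot(dx, dy)

# Keep accidents within an arbitrary 100 m radius and split them at the construction year
near = toy_acc[toy_acc['dist_m'] <= 100.0].copy()
near['period'] = np.where(near['date'].dt.year < 2014, 'before', 'after')
counts = near['period'].value_counts()

print(counts)
```

In this sketch accidents from 2014 itself would count as 'after'. Raw counts would also need to be normalised by the length of each period before drawing any conclusion.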