### Parse the timetable file to load it into Neo4j
The A* implementation in Neo4j's Graph Data Science library uses the haversine distance as its heuristic function, so we need to combine the trip durations in `timetables.csv` with the latitude and longitude of the stations in `stops.txt`.
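For reference, the haversine formula gives the great-circle distance between two points from their latitudes and longitudes. The sketch below assumes a mean Earth radius of 6371 km and returns kilometres. The radius and output unit that GDS uses internally may differ, so treat this only as an illustration of the formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Gare de Le Havre -> Gare de Paris-St-Lazare, coordinates from stops.txt
print(round(haversine_km(49.492653, 0.124835, 48.877865, 2.324433)))
```

Only the coordinates are needed, which is why each station must carry its latitude and longitude.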


In [1]:

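```python
import pandas as pd
import matplotlib.pyplot as plt

# Trips with their route ('trajet') and duration ('duree'); tab-separated
timetables = pd.read_csv('data_sncf/timetables.csv', delimiter='\t')

# Station names and coordinates (stop_name, stop_lat, stop_lon)
stops = pd.read_csv('data_sncf/stops.txt')
```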


In [2]:
```python
print(timetables['trajet'].size)
```


1575


In [3]:
```python
# 'trajet' reads "<origin> - <destination>": split on the last ' - '
timetables['start'] = timetables['trajet'].map(lambda x: ' - '.join(x.split(' - ')[:-1]))
timetables['dest'] = timetables['trajet'].map(lambda x: x.split(' - ')[-1])
```
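In [3] treats everything after the last `' - '` as the destination and everything before it as the origin. A vectorised sketch of the same split uses `Series.str.rpartition`, which, like the lambdas, yields an empty origin and the whole string as destination when there is no separator. The toy DataFrame (including the separator-less row) is illustrative only.

```python
import pandas as pd

# Toy data for illustration; the notebook reads data_sncf/timetables.csv
timetables = pd.DataFrame({
    "trajet": [
        "Gare de Le Havre - Gare de Paris-St-Lazare",
        "Gare de Belfort-Ville - Gare de Lyon-Perrache",
        "Gare de Caen",  # hypothetical row without a separator
    ]
})

# rpartition splits on the LAST ' - ' into columns 0 (before), 1 (sep), 2 (after)
parts = timetables["trajet"].str.rpartition(" - ")
timetables["start"] = parts[0]
timetables["dest"] = parts[2]
print(timetables[["start", "dest"]])
```

Splitting on the last separator keeps hyphenated names such as `Gare de Paris-St-Lazare` intact, since they contain `-` but not `' - '`.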


# Remove missing destinations and check that every station is in stops

In [4]:
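```python
# Drop trips whose destination was parsed as the string 'nan'
timetables = timetables.drop(timetables[timetables['dest'] == 'nan'].index)
# Destinations missing from stops.txt (an empty result means none)
timetables[~timetables['dest'].isin(stops['stop_name'])]
```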

| Unnamed: 0 | trip_id | trajet | duree | start | dest |
|---|---|---|---|---|---|


In [5]:
```python
# Origins missing from stops.txt (an empty result means none)
timetables[~timetables['start'].isin(stops['stop_name'])]
```

| Unnamed: 0 | trip_id | trajet | duree | start | dest |
|---|---|---|---|---|---|
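The two checks in In [4] and In [5] can be combined into one set difference that lists every station used by a trip but absent from `stops.txt`. An empty set means every trip endpoint can be given coordinates. The frames below are toy stand-ins, with `Gare de Dreux` deliberately left out of `stops` for illustration.

```python
import pandas as pd

# Illustrative stand-ins for the notebook's `stops` and `timetables`
stops = pd.DataFrame({"stop_name": ["Gare de Caen", "Gare de Paris-St-Lazare"]})
timetables = pd.DataFrame({
    "start": ["Gare de Caen", "Gare de Dreux"],
    "dest": ["Gare de Paris-St-Lazare", "Gare de Paris-St-Lazare"],
})

known = set(stops["stop_name"])
# Every station name used by a trip, as origin or destination
used = set(timetables["start"]) | set(timetables["dest"])
missing = used - known
print(sorted(missing))  # ['Gare de Dreux']

# If anything is missing, keep only trips whose two endpoints are known
kept = timetables[timetables["start"].isin(known) & timetables["dest"].isin(known)]
```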


# Add coordinates to timetables

In [6]:
```python
# Look up each station's coordinates in stops (first match if a name repeats)
timetables['lat_start'] = timetables['start'].map(lambda x: stops.loc[stops['stop_name'] == x, 'stop_lat'].iloc[0])
timetables['lon_start'] = timetables['start'].map(lambda x: stops.loc[stops['stop_name'] == x, 'stop_lon'].iloc[0])
timetables['lat_dest'] = timetables['dest'].map(lambda x: stops.loc[stops['stop_name'] == x, 'stop_lat'].iloc[0])
timetables['lon_dest'] = timetables['dest'].map(lambda x: stops.loc[stops['stop_name'] == x, 'stop_lon'].iloc[0])
```
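In [6] scans `stops` once per row and per coordinate. A sketch of an equivalent lookup builds a name-indexed table once. `drop_duplicates(keep='first')` reproduces the `.iloc[0]` choice when a name appears several times in `stops.txt` (GTFS files often list a stop area and its stop points under the same name; whether that applies here is an assumption). The data below are illustrative, and the duplicate `Gare de Caen` row is made up.

```python
import pandas as pd

# Illustrative data; the second "Gare de Caen" row is a made-up duplicate
stops = pd.DataFrame({
    "stop_name": ["Gare de Caen", "Gare de Caen", "Gare de Paris-St-Lazare"],
    "stop_lat": [49.176544, 49.0, 48.877865],
    "stop_lon": [-0.348270, -0.3, 2.324433],
})
timetables = pd.DataFrame({
    "start": ["Gare de Caen"],
    "dest": ["Gare de Paris-St-Lazare"],
})

# One row per name, keeping the first occurrence like .iloc[0] does
coords = stops.drop_duplicates("stop_name", keep="first").set_index("stop_name")

for end in ("start", "dest"):
    # Series.map with a Series argument looks values up by its index
    timetables[f"lat_{end}"] = timetables[end].map(coords["stop_lat"])
    timetables[f"lon_{end}"] = timetables[end].map(coords["stop_lon"])

print(timetables)
```

One behavioural difference: an unknown name yields `NaN` here, whereas `.iloc[0]` on an empty selection raises an `IndexError`.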

In [7]:
```python
timetables
```

| Unnamed: 0 | trip_id | trajet | duree | start | dest | lat_start | lon_start | lat_dest | lon_dest |
|---|---|---|---|---|---|---|---|---|---|
| 0 | OCESN003100F140147152 | Gare de Le Havre - Gare de Paris-St-Lazare | 138 | Gare de Le Havre | Gare de Paris-St-Lazare | 49.492653 | 0.124835 | 48.877865 | 2.324433 |
| 1 | OCESN003190F040047309 | Gare de Dieppe - Gare de Paris-St-Lazare | 145 | Gare de Dieppe | Gare de Paris-St-Lazare | 49.921243 | 1.081128 | 48.877865 | 2.324433 |
| 2 | OCESN003198F030037315 | Gare de Paris-St-Lazare - Gare de Rouen-Rive-D... | 97 | Gare de Paris-St-Lazare | Gare de Rouen-Rive-Droite | 48.877865 | 2.324433 | 49.449030 | 1.094154 |
| 3 | OCESN003300F030037323 | Gare de Cherbourg - Gare de Paris-St-Lazare | 194 | Gare de Cherbourg | Gare de Paris-St-Lazare | 49.633498 | -1.621473 | 48.877865 | 2.324433 |
| 4 | OCESN003313F380387526 | Gare de Caen - Gare de Paris-St-Lazare | 149 | Gare de Caen | Gare de Paris-St-Lazare | 49.176544 | -0.348270 | 48.877865 | 2.324433 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1570 | OCESN895822F0500552575 | Gare de Belfort-Ville - Gare de Lyon-Perrache | 244 | Gare de Belfort-Ville | Gare de Lyon-Perrache | 47.632447 | 6.853924 | 45.748785 | 4.825941 |
| 1571 | OCESN895830F0200252600 | Gare de Lons-le-Saunier - Gare de Lyon-Perrache | 103 | Gare de Lons-le-Saunier | Gare de Lyon-Perrache | 46.668398 | 5.550877 | 45.748785 | 4.825941 |
| 1572 | OCESN895880F0500552634 | Gare de Belfort-Ville - Gare de Lons-le-Saunier | 144 | Gare de Belfort-Ville | Gare de Lons-le-Saunier | 47.632447 | 6.853924 | 46.668398 | 5.550877 |
| 1573 | OCESN895940F0200252654 | Gare de Besançon-Viotte - Gare de Lons-le-Saunier | 89 | Gare de Besançon-Viotte | Gare de Lons-le-Saunier | 47.247038 | 6.021912 | 46.668398 | 5.550877 |


In [66]:
```python
timetables.to_csv('travel.csv')
```
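`to_csv` also writes the DataFrame index as an unnamed first column, which the `LOAD CSV` statement below simply ignores. As an optional variant, this sketch writes only the columns that statement reads (`row.trip_id`, `row.duree`, `row.start`, ...), without the index, and checks the header. The single row is copied from the output above, and an in-memory buffer stands in for `travel.csv`.

```python
import io
import pandas as pd

# Columns read by the LOAD CSV statement
NEO4J_COLUMNS = ["trip_id", "duree", "start", "dest",
                 "lat_start", "lon_start", "lat_dest", "lon_dest"]

timetables = pd.DataFrame({  # one row copied from the timetables output
    "trip_id": ["OCESN003313F380387526"], "duree": [149],
    "start": ["Gare de Caen"], "dest": ["Gare de Paris-St-Lazare"],
    "lat_start": [49.176544], "lon_start": [-0.348270],
    "lat_dest": [48.877865], "lon_dest": [2.324433],
})

buf = io.StringIO()  # stand-in for 'travel.csv'
timetables[NEO4J_COLUMNS].to_csv(buf, index=False)
header = buf.getvalue().splitlines()[0]
print(header)
```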

# How to import the data into Neo4j to use the Graph Data Science API

### Set up Neo4j
1. Download Neo4j Desktop: https://neo4j.com/download/
2. Create a new project and add a new DBMS.
3. Click the DBMS, open the Plugins tab and install the Graph Data Science Library.


### Load data in neo4j

Put `travel.csv` in the `import` directory of your DBMS (on Ubuntu: `~/.config/'Neo4j Desktop'/Application/relate-data/dbmss/dbms-<dbmss id>/import`).

Open Neo4j Browser and run:

```cypher
LOAD CSV WITH HEADERS FROM "file:///travel.csv" AS row
MERGE (s1:Station {name: row.start, longitude: toFloat(row.lon_start), latitude: toFloat(row.lat_start)})
MERGE (s2:Station {name: row.dest, longitude: toFloat(row.lon_dest), latitude: toFloat(row.lat_dest)})
// Directed patterns: an undirected MERGE would match the relationship
// created on the previous line and never create the reverse one
MERGE (s1)-[:TRAVEL_TO {tripID: row.trip_id, duration: toInteger(row.duree)}]->(s2)
MERGE (s2)-[:TRAVEL_TO {tripID: row.trip_id, duration: toInteger(row.duree)}]->(s1)
```
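The two `MERGE` statements on `TRAVEL_TO` are meant to make every trip traversable in both directions. A plain-Python sketch of that intended structure reads the CSV with the standard `csv` module and records one adjacency entry per direction. `load_adjacency` is a hypothetical helper, and the one-row string stands in for `travel.csv`.

```python
import csv
import io
from collections import defaultdict


def load_adjacency(f):
    """Hypothetical helper: station -> [(neighbour, duration, trip_id)],
    with one entry per direction for every CSV row, plus station coordinates."""
    adj = defaultdict(list)
    coords = {}
    for row in csv.DictReader(f):
        s, d = row["start"], row["dest"]
        duration = int(row["duree"])
        adj[s].append((d, duration, row["trip_id"]))  # start -> dest
        adj[d].append((s, duration, row["trip_id"]))  # dest -> start
        coords[s] = (float(row["lat_start"]), float(row["lon_start"]))
        coords[d] = (float(row["lat_dest"]), float(row["lon_dest"]))
    return adj, coords


sample = io.StringIO(
    "trip_id,duree,start,dest,lat_start,lon_start,lat_dest,lon_dest\n"
    "OCESN003313F380387526,149,Gare de Caen,Gare de Paris-St-Lazare,"
    "49.176544,-0.348270,48.877865,2.324433\n"
)
adj, coords = load_adjacency(sample)
print(dict(adj))
```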

Run `MATCH (n) RETURN n` to check that the data were imported properly.

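### Project the graph

Create an in-memory GDS graph named `travel` containing the `Station` nodes with their coordinates and the `TRAVEL_TO` relationships with their duration: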

```cypher
CALL gds.graph.project(
    'travel',
    'Station',
    'TRAVEL_TO',
    {
        nodeProperties: ['latitude', 'longitude'],
        relationshipProperties: 'duration'
    }
)
```

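### Test an A* request

Find the fastest route from Gare de Cherbourg to Gare de Dreux, using `duration` as the relationship weight: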

```cypher
MATCH (source:Station {name: 'Gare de Cherbourg'}), (target:Station {name: 'Gare de Dreux'})
CALL gds.shortestPath.astar.stream('travel', {
    sourceNode: source,
    targetNode: target,
    latitudeProperty: 'latitude',
    longitudeProperty: 'longitude',
    relationshipWeightProperty: 'duration'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN
    index,
    gds.util.asNode(sourceNode).name AS sourceNodeName,
    gds.util.asNode(targetNode).name AS targetNodeName,
    totalCost,
    [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS nodeNames,
    costs,
    nodes(path) AS path
ORDER BY index
```
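To illustrate what the procedure computes, here is a minimal pure-Python A* over a few stations from the timetables output above. It is a sketch of the technique, not the GDS implementation. A* only guarantees the cheapest route if the heuristic never overestimates the remaining cost. So, assuming `duree` is in minutes, this sketch turns the haversine distance into a lower bound on travel time by dividing it by an assumed top speed (`MAX_SPEED_KMH`, a hypothetical parameter) instead of using the raw distance.

```python
import heapq
import math

MAX_SPEED_KMH = 320.0  # assumption: no trip averages faster than this


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (mean Earth radius of 6371 km assumed)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def astar(adj, coords, source, target):
    """Return (total_minutes, path), or (math.inf, []) if target is unreachable."""

    def h(node):
        # Lower bound on the remaining minutes: straight-line distance at top speed
        return haversine_km(*coords[node], *coords[target]) / MAX_SPEED_KMH * 60

    best = {source: 0}  # cheapest known duration from source
    parent = {source: None}
    frontier = [(h(source), 0, source)]  # (f = g + h, g, station)
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if g > best[node]:
            continue  # stale entry: a cheaper route was found later
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g, path[::-1]
        for nxt, minutes in adj.get(node, []):
            ng = g + minutes
            if ng < best.get(nxt, math.inf):
                best[nxt] = ng
                parent[nxt] = node
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return math.inf, []


# Stations, coordinates and durations copied from the timetables output
coords = {
    "Gare de Cherbourg": (49.633498, -1.621473),
    "Gare de Caen": (49.176544, -0.348270),
    "Gare de Paris-St-Lazare": (48.877865, 2.324433),
    "Gare de Rouen-Rive-Droite": (49.449030, 1.094154),
    "Gare de Le Havre": (49.492653, 0.124835),
}
trips = [
    ("Gare de Cherbourg", "Gare de Paris-St-Lazare", 194),
    ("Gare de Caen", "Gare de Paris-St-Lazare", 149),
    ("Gare de Paris-St-Lazare", "Gare de Rouen-Rive-Droite", 97),
    ("Gare de Le Havre", "Gare de Paris-St-Lazare", 138),
]
adj = {}
for a, b, minutes in trips:  # both directions, like the two MERGE statements
    adj.setdefault(a, []).append((b, minutes))
    adj.setdefault(b, []).append((a, minutes))

print(astar(adj, coords, "Gare de Cherbourg", "Gare de Rouen-Rive-Droite"))
# → (291, ['Gare de Cherbourg', 'Gare de Paris-St-Lazare', 'Gare de Rouen-Rive-Droite'])
```

Lowering `MAX_SPEED_KMH` makes the heuristic more conservative: the result stays optimal but more stations may be explored.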