# Ma rando

Ce notebook permet de préparer les données pour l'exploration des randonnées. 

## Imports et constantes

In [None]:
import os
import unicodedata
from pathlib import Path
from typing import Any

import geopandas as gpd
import geopy.distance
import pandas as pd
import requests
from anyascii import anyascii
from bs4 import BeautifulSoup
from shapely.geometry import Point
from tqdm import tqdm

In [108]:
%reload_ext dotenv
%dotenv

In [109]:
VISORANDO_URL = "http://www.visorando.com"

DATA_DIR = Path("../data")

PRIM_API_BASE_URL = "https://prim.iledefrance-mobilites.fr/marketplace/v2/navitia/"
PRIM_API_JOURNEY_ENDPOINT = "journeys"
PRIM_API_KEY = os.getenv("PRIM_API_KEY", "")
MY_LOCATION = os.getenv("MY_LOCATION", "")

In [82]:
!wget -O ../data/emplacement-gares-idf.csv "https://data.iledefrance-mobilites.fr/api/explore/v2.1/catalog/datasets/emplacement-des-gares-idf/exports/csv?lang=fr&timezone=Europe%2FBerlin&use_labels=true&delimiter=%3B"
!wget -O ../data/departements.geojson "https://raw.githubusercontent.com/gregoiredavid/france-geojson/5d34ee6d0140c29f785fdb047d9329f1aab58833/departements.geojson"

--2025-06-15 14:35:48--  https://data.iledefrance-mobilites.fr/api/explore/v2.1/catalog/datasets/emplacement-des-gares-idf/exports/csv?lang=fr&timezone=Europe%2FBerlin&use_labels=true&delimiter=%3B
Resolving data.iledefrance-mobilites.fr (data.iledefrance-mobilites.fr)... 18.200.140.238, 52.211.64.165
Connecting to data.iledefrance-mobilites.fr (data.iledefrance-mobilites.fr)|18.200.140.238|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘../data/emplacement-gares-idf.csv’

../data/emplacement     [  <=>               ] 487,50K  1,59MB/s    in 0,3s    

2025-06-15 14:35:48 (1,59 MB/s) - ‘../data/emplacement-gares-idf.csv’ saved [499204]

--2025-06-15 14:35:50--  https://raw.githubusercontent.com/gregoiredavid/france-geojson/5d34ee6d0140c29f785fdb047d9329f1aab58833/departements.geojson
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8003::154, 2606:50c0:8002::154, ...
Connecting

## Récupération des données

Les fonctions suivantes permettent de récupérer les données de randonnées à partir de l'API visorando.

### Trouver les randonnées dans un rayon autour d'une gare

Pour trouver les randonnées proches d'une gare, on calcule les coordonnées d'une bbox carrée de 2km de côté centré autour de la gare. On envoie ensuite ces coordonnées à l'API visorando pour récupérer les randonnées correspondantes.

In [83]:
def compute_bbox(coords: list[float], distance: float = 1) -> list[float]:
    """Compute a square bbox around given coordinates."""
    
    top_right = geopy.distance.distance(kilometers=distance).destination(
        coords, bearing=45
    )
    bottom_left = geopy.distance.distance(kilometers=distance).destination(
        coords, bearing=255
    )

    bbox = [
        bottom_left.longitude,
        top_right.longitude,
        bottom_left.latitude,
        top_right.latitude,
    ]

    return ",".join([str(coord) for coord in bbox])

In [84]:
def search_hikes(bbox: str) -> list:
    """Search hikes starting within a given bbox. Coordinates should be comma-separated,
    in lon1lon2lat1lat2 format."""
    
    params = {
        "component": "rando",
        "task": "searchCircuitV2",
        "bbox": bbox,
    }

    headers = {"X-Requested-With": "XMLHttpRequest"}

    resp = requests.get(url=VISORANDO_URL, params=params, headers=headers)
    if resp.status_code == 200:
        return resp.json()
    else:
        raise Exception(f"Received status code {resp.status_code}")

### Récupérer les informations importantes pour chaque randonnée

À partir de l'ID de la randonnée, on peut retrouver sa page dédiée, qui contient des informations telles que la distance, le dénivelé, la difficulté, etc. On récupère les informations dans la page à l'aide de BeautifulSoup.

In [85]:
def get_hike_info(hike_id: str):
    """Gather info about hike: distance, climb, departure and arrival, etc."""

    params = {"component": "rando", "task": "searchCircuitV2", "loc": hike_id}
    rando_resp = requests.get(url=f"{VISORANDO_URL}/index.php", params=params)
    soup = BeautifulSoup(rando_resp.content, "html.parser")
    hike_info = {}

    link = soup.find("link", rel="canonical")["href"]
    hike_info["Lien"] = link
    hike_info["Identifiant"] = hike_id

    hike_data = soup.find_all(
        lambda x: x.has_attr("class") and "vr-walk-datasheet--dataset" in x.get("class")
    )
    hike_info.update(_enrich_hike_info_dict(hike_data))

    return hike_info

def _enrich_hike_info_dict(hike_data: list) -> dict[str, Any]:
    """Parse hike data to build dictionary of hike info"""

    data_to_keep = [
        "Distance",
        "Difficulté",
        "Dénivelé positif",
        "Dénivelé négatif",
        "Départ",
        "Arrivée",
        "Retour point de depart",
    ]

    def _parse_coords(coords: str):
        return [float(coord.strip()[2:-1]) for coord in coords.split("/")]

    hike_info = {}
    for data_fact in hike_data:
        contents = data_fact.contents
        for data_tag in data_to_keep:
            if contents[1].text.startswith(data_tag):
                hike_info[data_tag] = unicodedata.normalize(
                    "NFKC", contents[-1].strip()
                )

    if ["Retour point de depart"] == "Oui":
        hike_info["Arrivée"] = hike_info["Départ"]
    hike_info["Départ"] = (
        _parse_coords(hike_info["Départ"]) if hike_info.get("Départ") else None
    )
    hike_info["Arrivée"] = (
        _parse_coords(hike_info["Arrivée"]) if hike_info.get("Arrivée") else None
    )

    return hike_info

In [86]:
def search_and_download_hike_info(
    station_name: str, station_coords: list[float], distance=1
) -> list[dict]:
    """Search for hikes starting within a bbox and download corresponding gpx files.
    Station coords must be in latitude, longitude format."""
    
    bbox = compute_bbox(station_coords, distance=distance)
    hikes = search_hikes(bbox)

    hike_ids = [hike["R_id"] for hike in hikes]
    hike_infos = []
    for hike_id in hike_ids:
        hike_info = get_hike_info(hike_id)
        hike_info["Gare départ"] = station_name
        if hike_info["Retour point de depart"] == "Oui":
            hike_info["Gare arrivée"] = station_name
            hike_info["Arrivée"] = hike_info["Départ"]
        else:
            hike_info["Gare arrivée"] = None
        hike_infos.append(hike_info)
    return hike_infos

### Construire le tableau

C'est parti. On commence par récupérer la liste des gares d'IdF et le nettoyer pour ne garder que les noms, coordonnées GPS, etc et renommer les colonnes.

Ensuite on utilise ce fichier comme point de départ pour construire notre tableau : pour chaque gare, on va chercher toutes les randonnées autour de cette gare, puis on cherche les infos pertinentes.

In [95]:
def clean_station_file(
        in_file: Path,
        out_file: Path | None = None,
        dept_file: Path = DATA_DIR / "departements.csv", 
        save: bool = False
    ) -> pd.DataFrame:
    """Clean the île-de-France stations list file to keep relevant information."""
    
    print("Nettoyage du fichier des gares...")
    gares = pd.read_csv(
        in_file,
        sep=";",
        usecols=["Geo Point", "nom_long", "res_com", "mode", "id_ref_ZdC"],
    ).rename(
        columns={
            "Geo Point": "geo_point",
            "nom_long": "nom",
            "res_com": "ligne",
            "id_ref_ZdC": "id",
        }
    )
    gares = gares[gares["mode"].isin(["TRAIN", "RER"])].drop(columns=["mode"])
    gares["Lignes"] = gares["nom"].map(
        lambda cell: gares[gares["nom"] == cell]["ligne"].to_list()
    )
    gares = gares.drop(columns=["ligne"]).drop_duplicates(subset=["nom"])
    gares = _fill_dept_info(gares, dept_file)
    if save:
        if not out_file:
            raise ValueError("Must specify out_file if save set to True.")
        gares.to_csv(out_file, index=False)

    return gares


def _fill_dept_info(stations_df: pd.DataFrame, dept_file: Path = DATA_DIR / "departements.geojson") -> pd.DataFrame:
    """For each station present in stations_df, find the corresponding department and add its code and name to the dataframe."""
    
    departements = gpd.GeoDataFrame.from_file(DATA_DIR / dept_file)
    departements["Département"] = departements["code"] + " - " + departements["nom"]
    departements = departements.drop(columns=["code", "nom"])
    stations_gdf = gpd.GeoDataFrame(
        stations_df,
        geometry=stations_df["geo_point"].map(
            lambda cell: Point(float(cell.split(",")[1]), float(cell.split(",")[0]))
        )
    ).set_crs(epsg=4326)
    return gpd.tools.sjoin(stations_gdf, departements, predicate="within", how="left")


In [96]:
def fill_arrival_station(hikes_df: pd.DataFrame, stations_df: pd.DataFrame) -> None:
    """Use stations GPS coordinates to fill the name of Arrival station."""
    
    stations_df["geo_point"] = stations_df["geo_point"].map(
        lambda cell: [float(coord.strip()) for coord in cell.split(",")]
    )
    hikes_df.loc[hikes_df[hikes_df["Gare arrivée"].isna()].index, "Gare arrivée"] = (
        hikes_df.loc[hikes_df[hikes_df["Gare arrivée"].isna()].index, "Arrivée"].map(
            lambda cell: _find_min_distance(cell, stations_df)
        )
    )

def _find_min_distance(point: list[float], stations_df: pd.DataFrame) -> str | None:
    stations_df["distances"] = stations_df["geo_point"].map(
        lambda cell: geopy.distance.distance(point, cell).kilometers
    )
    min_dist = stations_df["distances"].idxmin()
    return stations_df["nom"][min_dist]

In [97]:
def build_hike_table(
    stations_df: pd.DataFrame, out_file: str | Path = "", save: bool = False
) -> pd.DataFrame:
    """Build the full hike table with hike info and departure/arrival stations."""
    
    hike_infos = []
    print(f"Recherche de randonnées à proximité de {len(stations_df)} gares.")
    for i in tqdm(range(len(stations_df))):
        try:
            nom_gare = stations_df.loc[i, "nom"]
            coords_gare = stations_df.loc[i, "geo_point"]
            hike_infos += search_and_download_hike_info(nom_gare, coords_gare)
        except Exception:
            pass
    hikes_df = pd.DataFrame.from_dict(hike_infos, orient="columns")
    print(f"{len(hikes_df)} randonnées trouvées. Recherche des gares d'arrivée...")
    fill_arrival_station(hikes_df, stations_df)
    hikes_df = hikes_df.dropna()
    hikes_df = hikes_df[
        [
            "Identifiant",
            "Distance",
            "Dénivelé positif",
            "Dénivelé négatif",
            "Difficulté",
            "Gare départ",
            "Gare arrivée",
            "Retour point de depart",
            "Lien",
        ]
    ]
    print(f"{len(hikes_df)} randonnées accessibles trouvées !")
    if save:
        hikes_df.to_csv(out_file, index=False)

    return hikes_df

### Optionnel : calculer le temps de trajet entre le domicile et la gare de départ

Pour cela, on utilise l'API PRIM d'IdF mobilités. Il faut renseigner une clé API que l'on peut obtenir en créant un compte sur https://prim.iledefrance-mobilites.fr/.

In [98]:
def get_journeys(
    start_point: str,
    end_point: str,
    departure_time: str | None = None,
    arrival_time: str | None = None,
) -> dict:
    """Get journey options between a starting and ending point"""
    
    url = "/".join((PRIM_API_BASE_URL, PRIM_API_JOURNEY_ENDPOINT))
    headers = {"apiKey": PRIM_API_KEY}
    params = {
        "from": start_point,
        "to": end_point,
    }
    if departure_time or arrival_time:
        params["datetime_represents"] = "departure" if departure_time else "arrival"
        params["datetime"] = departure_time or arrival_time

    resp = requests.get(url=url, headers=headers, params=params)
    if resp.status_code == 200:
        return resp.json().get("journeys")
    else:
        raise Exception(f"Received response {resp.content}")

In [99]:
def get_minimum_journey_time(
    start_point: str, end_point: str, departure_time: str | None = None
) -> int:
    """Find the quickest journey between two points and return duration in minutes."""
    journeys = get_journeys(start_point, end_point, departure_time)
    if journeys:
        return min([journey["duration"] // 60 + 1 for journey in journeys])
    else:
        return -1

In [100]:
def fill_station_times(
    stations_df: pd.DataFrame,
    start_point: str = MY_LOCATION,
    departure_time: str = "20250301T080000",
) -> None:
    """Compute journey times from home location to each station."""
    
    tqdm.pandas(desc="Calcul des temps de trajet pour chaque gare...")
    temps_trajet = stations_df["id"].progress_apply(
        lambda cell: get_minimum_journey_time(
            start_point=start_point,
            end_point=f"stop_area:IDFM:{cell}",
            departure_time=departure_time,
        )
    )
    return temps_trajet

## Fonction principale

In [None]:
def main():
    stations_df = clean_station_file(
        in_file=DATA_DIR / "emplacement-gares-idf.csv", 
        out_file=DATA_DIR /"gares.csv",
        dept_file=DATA_DIR / "departements.geojson",
        save=False
    )
    if MY_LOCATION and PRIM_API_KEY:
        stations_df["Temps de trajet"] = fill_station_times(stations_df)
        stations_df.to_csv(DATA_DIR / "gares.csv", index=False)
    hikes = build_hike_table(stations_df=stations_df, out_file=DATA_DIR / "hikes.csv", save=True)

    
    hikes = (
        pd.merge(hikes, stations_df[["nom", "Lignes", "Département"]], left_on="Gare départ", right_on="nom").drop(columns=["nom"])
    )
    hikes.to_csv(DATA_DIR / "hikes.csv", index=False)

In [113]:
main()

Nettoyage du fichier des gares...


Recherche de randonnées à proximité de 472 gares.


100%|██████████| 472/472 [01:56<00:00,  4.05it/s] 


229 randonnées trouvées. Recherche des gares d'arrivée...
229 randonnées accessibles trouvées !


Unnamed: 0,Identifiant,Distance,Dénivelé positif,Dénivelé négatif,Difficulté,Gare départ,Gare arrivée,Retour point de depart,Lien,Lignes,Département
0,920935,"8,02 km",+ 10 m,- 11 m,Facile,Asnières-sur-Seine,Gennevilliers,Non,https://www.visorando.com/randonnee-le-vieil-a...,"[TRAIN J, TRAIN L]",92 - Hauts-de-Seine
1,358913,"16,75 km",+ 87 m,- 87 m,Moyenne,La Ferté-Milon,Mareuil-sur-Ourcq,Non,https://www.visorando.com/randonnee-la-ferte-m...,[TRAIN P],02 - Aisne
2,195652,"21,26 km",+ 268 m,- 263 m,Difficile,La Ferté-Milon,La Ferté-Milon,Oui,https://www.visorando.com/randonnee-de-l-ourcq...,[TRAIN P],02 - Aisne
3,919844,"9,66 km",+ 51 m,- 50 m,Facile,Fontaine-le-Port,Fontaine-le-Port,Oui,https://www.visorando.com/randonnee-le-long-de...,[TRAIN R],77 - Seine-et-Marne
4,511231,"17,81 km",+ 164 m,- 151 m,Moyenne,Fontaine-le-Port,Champagne-sur-Seine,Non,https://www.visorando.com/randonnee-de-fontain...,[TRAIN R],77 - Seine-et-Marne
...,...,...,...,...,...,...,...,...,...,...,...
224,571131,"14,29 km",+ 134 m,- 136 m,Moyenne,Us,Us,Oui,https://www.visorando.com/randonnee-la-vallee-...,[TRAIN J],95 - Val-d'Oise
225,18012796,"23,85 km",+ 235 m,- 238 m,Difficile,Us,Us,Oui,https://www.visorando.com/randonnee-de-us-a-br...,[TRAIN J],95 - Val-d'Oise
226,14303599,"22,65 km",+ 233 m,- 233 m,Difficile,Us,Us,Oui,https://www.visorando.com/randonnee-de-us-a-sa...,[TRAIN J],95 - Val-d'Oise
227,4535009,"6,14 km",+ 32 m,- 32 m,Facile,Villiers-le-Bel-Gonesse-Arnouville,Villiers-le-Bel-Gonesse-Arnouville,Oui,https://www.visorando.com/randonnee-sur-les-tr...,[RER D],95 - Val-d'Oise
