# Renfe Scraping Software

This notebook walks through scraping Renfe data with the `RenfeScraper` class, which scrapes the Renfe website and saves the results in a structured format. It produces two output files:
- `trips.csv`: Contains information about the trips, including the service ID, trip ID, and other relevant details.
- `stops.csv`: Contains information about the stops, including the stop ID, arrival and departure times, and other relevant details.
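As a rough sketch of how the two files relate, the snippet below groups stop rows by trip using only the standard library. The column names (`trip_id`, `stop_id`, `arrival`, `departure`) and the trip ID `trip_A` are assumptions made for illustration; the real schema written by the scraper may differ.

```python
import csv
import io
from collections import defaultdict

# Illustrative stand-in for stops.csv. The column names and the trip ID are
# assumptions for this sketch, not the scraper's actual schema.
STOPS_CSV = """trip_id,stop_id,arrival,departure
trip_A,60000,0,387
trip_A,70600,446,447
trip_A,04040,472,473
trip_A,71801,565,0
"""


def group_stops_by_trip(stops_file):
    """Map each trip ID to its list of stop rows, in file order."""
    stops_by_trip = defaultdict(list)
    for row in csv.DictReader(stops_file):
        stops_by_trip[row['trip_id']].append(row)
    return dict(stops_by_trip)


stops_by_trip = group_stops_by_trip(io.StringIO(STOPS_CSV))
for trip_id, stops in stops_by_trip.items():
    print(trip_id, [stop['stop_id'] for stop in stops])
```

`csv.DictReader` keeps every field as a string, so station IDs with a leading zero (such as `04040`) are preserved rather than being parsed as integers.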

## 0. Import Libraries

In [1]:
%load_ext autoreload
%autoreload 2

import datetime
import sys

sys.path.append('..')  # Add the parent directory to the path before importing from src

from src.robin.scraping.entities import DataLoader, SupplySaver
from src.robin.scraping.renfe.entities import RenfeScraper

## 1. Scrape available stations from main menu

In [2]:
scraper = RenfeScraper(stations_csv_path='../data/renfe/adif_renfe_stations.csv')

for station_id, station_name in scraper.available_stations.items():
    print(f'{station_id}: {station_name}')

31412: A Coruña
94707: Abrantes
60911: Alicante / Alacant
60600: Albacete
06008: Alcantarilla-Los Romanos
60400: Alcázar de San Juan
55020: Algeciras
56312: Almería
99003: Altet Bus
99115: Aguadulce Bus
87912: Aix En Provence
99114: Andorra-Bus
ANTEQ: Antequera (TODAS)
87814: Avignon
10400: Avila
37606: Badajoz
BARCE: Barcelona (TODAS)
87078: Beziers
65318: Benicassim
BILBA: Bilbao (TODAS)
54400: Bobadilla
11014: Burgos Rosa Manzano
35400: Cáceres
51405: Cádiz
70600: Calatayud
50417: Campus Rabanales
61307: Cartagena
65300: Castellón /Castelló
37200: Ciudad Real
50500: Córdoba
CUENC: Cuenca (TODAS)
92201: Denia-Bus
60905: Elda-Petrer
03410: Elche AV/Elx AV
94428: Entroncamento
92157: Estepona Bus
21010: Ferrol
79309: Figueres
79333: Figueres Bus
04307: Figueres Vilafant
69110: Gandía
GIJON: Gijón
79300: Girona
05000: Granada
GUADA: Guadalajara (TODAS)
43019: Huelva
74200: Huesca
IRUN-: Irun-Hendaya (TODAS)
80100: Pamplona/Iruña
99103: Jaca-Bus
03100: Jaén
64100: Xàtiva/Játiva
97639: Ja

In [3]:
scraper.stations_df

Unnamed: 0,ADIF_ID,RENFE_ID,STATION_NAME,LATITUD,LONGITUD,DIRECION,CP,POBLACION,PROVINCIA,PAIS,CERCANIAS,FEVE,COMUN
0,01001,01001,EL SORBITO (APD-CGD),37.208475,-5.706642,,,ALCALÁ DE GUADAÍRA,SEVILLA,ESPAÑA,NO,NO,
1,01002,01002,LA TRINIDAD (APT-CGD),,,,,ALCALÁ DE GUADAÍRA,SEVILLA,ESPAÑA,NO,NO,
2,01003,01003,ARAHAL,37.268081,-5.548514,"Calle Virgen de los Dolores, S/N",41600,ARAHAL,SEVILLA,ESPAÑA,NO,NO,
3,01004,01004,PARADAS (APD-CGD),,,,,PARADAS,SEVILLA,ESPAÑA,NO,NO,
4,01005,01005,MARCHENA,37.334282,-5.425519,"Avenida Maestro Santos Ruano, 8",41620,MARCHENA,SEVILLA,ESPAÑA,NO,NO,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2772,99501,99501,ANDORRA,,,,,ANDORRA,PONTEVEDRA,ESPAÑA,NO,NO,
2773,99800,99800,CERCEDILLA TURÍSTICO,,,,,CERCEDILLA,MADRID,ESPAÑA,SI,NO,
2774,99801,99801,PUERTO NAVACERRADA,,,,,NAVACERRADA,MADRID,ESPAÑA,NO,NO,
2775,99802,99802,COTOS,,,,,RASCAFRÍA,MADRID,ESPAÑA,NO,NO,


## 2. Scrape Renfe services

The following cell scrapes the Renfe services between two stations for a given start date and number of days. The `origin` and `destination` variables hold the IDs of the departure and arrival stations, respectively. The `day`, `month`, and `year` variables define the start date, and `range_days` sets the number of days to scrape.
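As a minimal sketch of how a start date and `range_days` could expand into the individual days to query: this assumes `range_days` counts consecutive days starting at the start date, which is an interpretation and not the confirmed `RenfeScraper` behavior. `dates_to_scrape` is a hypothetical helper, not part of the library.

```python
import datetime


def dates_to_scrape(init_date, range_days):
    """Return range_days consecutive dates starting at init_date (hypothetical helper)."""
    return [init_date + datetime.timedelta(days=offset) for offset in range(range_days)]


start = datetime.date(year=2025, month=5, day=2)
print(dates_to_scrape(start, range_days=1))
print(dates_to_scrape(start, range_days=3))
```

Under this assumption, `range_days=1` covers only the start date itself.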

In [5]:
origin = '60000'
destination = '71801'

day = 2
month = 5
year = 2025
range_days = 1

date = datetime.date(day=day, month=month, year=year)
scraper.scrape(origin=origin,
               destination=destination,
               init_date=date,
               range_days=range_days,  # Use the variable defined above instead of a hard-coded value
               save_path='../data/renfe/2025/')

Date:  2025-05-02
Search url:  https://horarios.renfe.com/HIRRenfeWeb/buscar.do?O=MADRI&D=BARCE&AF=2025&MF=05&DF=02&SF=5&ID=s
{'60000': (0, 387), '70600': (446, 447), '04040': (472, 473), '71801': (565, 0)}
Unexpected exception formatting exception. Falling back to standard exception


Traceback (most recent call last):
  File "/Users/Shared/anaconda3/envs/robin/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3526, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
    
  File "/var/folders/_n/lz98rzln403f9jgxhtvhf_wr0000gn/T/ipykernel_25366/1437514964.py", line 10, in <module>
    scraper.scrape(origin=origin,
  File "/Users/david/PycharmProjects/robin/src/robin/scraping/renfe/entities.py", line 739, in scrape
    df_trips, _ = self.scrape_trips(
  File "/Users/david/PycharmProjects/robin/src/robin/scraping/renfe/entities.py", line 839, in scrape_trips
    new_df_trips = self.driver.scrape_trips(origin_id=origin_id, destination_id=destination_id, date=date)
    
  File "/Users/david/PycharmProjects/robin/src/robin/scraping/renfe/entities.py", line 453, in scrape_trips
    df_trips = self._get_df_trips(trips, date)
    
  File "/Users/david/PycharmProjects/robin/src/robin/scraping/renfe/entities.py", line 470, in _get_df_trips
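To make the log lines printed before the traceback easier to read, the sketch below parses the search URL's query string and converts the stop values to clock times. Two assumptions are made here and are not confirmed by the source: that each tuple is `(arrival, departure)`, and that the values are minutes since midnight, with `0` acting as a placeholder at the first and last stop. `minutes_to_hhmm` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit

# Search URL and stop dictionary copied from the cell output above.
search_url = ('https://horarios.renfe.com/HIRRenfeWeb/buscar.do'
              '?O=MADRI&D=BARCE&AF=2025&MF=05&DF=02&SF=5&ID=s')
stops = {'60000': (0, 387), '70600': (446, 447), '04040': (472, 473), '71801': (565, 0)}

# parse_qs returns a list per key; keep the single value of each parameter.
params = {key: values[0] for key, values in parse_qs(urlsplit(search_url).query).items()}
print(params)


def minutes_to_hhmm(minutes):
    """Format minutes since midnight as HH:MM (assumed interpretation of the values)."""
    hours, mins = divmod(minutes, 60)
    return f'{hours:02d}:{mins:02d}'


for stop_id, (arrival, departure) in stops.items():
    print(stop_id, minutes_to_hhmm(arrival), minutes_to_hhmm(departure))
```

The parsed parameters show that the request used the grouped codes `MADRI` and `BARCE` for origin and destination, even though the cell passed the station IDs `60000` and `71801`.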
    

## 3. Load scraped data

The following cell loads scraped stop times from a CSV file written by the scraper. The `DataLoader` class reads this file together with the Renfe stations CSV, and `build_supply_entities()` creates the supply entities from them.

In [28]:
data_loader = DataLoader(stops_path='../data/renfe/2025/stop_times/stopTimes_MADRI_60911_2025-05-02_2025-05-03.csv',
                         renfe_stations_path='../data/renfe/renfe_stations.csv')

data_loader.build_supply_entities()
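The stop-times path in the cell above appears to encode the origin, destination, start date, and end date. The sketch below rebuilds that path from its parts. The naming pattern is inferred from this single example and may not match what the scraper actually writes; `stop_times_path` is a hypothetical helper, not part of the library.

```python
import datetime
from pathlib import PurePosixPath


def stop_times_path(save_path, origin, destination, init_date, range_days):
    """Build a stop-times CSV path following the pattern inferred from the example."""
    end_date = init_date + datetime.timedelta(days=range_days)
    # Formatting a date in an f-string yields its ISO form, e.g. 2025-05-02.
    file_name = f'stopTimes_{origin}_{destination}_{init_date}_{end_date}.csv'
    return PurePosixPath(save_path) / 'stop_times' / file_name


path = stop_times_path('../data/renfe/2025/', 'MADRI', '60911',
                       datetime.date(2025, 5, 2), range_days=1)
print(path)
```

Building the path from the same variables passed to the scraper helps keep the loaded file in step with the origin, destination, and dates that were actually scraped.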

## 4. Save supply entities

The following cell saves the supply entities to a YAML file. The `SupplySaver` class takes the services built by the `DataLoader`, and its `to_yaml` method writes them to the given output path.

In [29]:
supply_saver = SupplySaver(data_loader.services)
supply_saver.to_yaml(output_path='../data/renfe/2025/dummy_supply_2_April_MADRI_60911_2025.yaml')