# Renfe Scraping Software

This notebook shows how to scrape Renfe data with the `RenfeScraper` class. The class scrapes the Renfe website and saves the results in a structured format. It produces two output files:
- `trips.csv`: Contains information about the trips, including the service ID, trip ID, and other relevant details.
- `stops.csv`: Contains information about the stops, including the stop ID, arrival and departure times, and other relevant details.

## 0. Import libraries

In [62]:
%load_ext autoreload
%autoreload 2

import datetime
import sys

# Add the parent directory to the path before importing from src
sys.path.append('..')

from src.robin.scraping.entities import DataLoader, SupplySaver
from src.robin.scraping.renfe.entities import RenfeScraper



## 1. Scrape available stations from main menu

In [63]:
scraper = RenfeScraper(stations_csv_path='../data/renfe/renfe_stations.csv')

for station_id, station_name in scraper.available_stations.items():
    print(f'{station_id}: {station_name}')

31412: A Coruña
94707: Abrantes
60911: Alicante / Alacant
60600: Albacete
06008: Alcantarilla-Los Romanos
60400: Alcázar de San Juan
55020: Algeciras
56312: Almería
99003: Altet Bus
99115: Aguadulce Bus
87912: Aix En Provence
99114: Andorra-Bus
ANTEQ: Antequera (TODAS)
87814: Avignon
10400: Avila
37606: Badajoz
BARCE: Barcelona (TODAS)
87078: Beziers
65318: Benicassim
BILBA: Bilbao (TODAS)
54400: Bobadilla
11014: Burgos Rosa Manzano
35400: Cáceres
51405: Cádiz
70600: Calatayud
50417: Campus Rabanales
61307: Cartagena
65300: Castellón /Castelló
37200: Ciudad Real
50500: Córdoba
CUENC: Cuenca (TODAS)
92201: Denia-Bus
60905: Elda-Petrer
03410: Elche AV/Elx AV
94428: Entroncamento
92157: Estepona Bus
21010: Ferrol
79309: Figueres
79333: Figueres Bus
04307: Figueres Vilafant
69110: Gandía
GIJON: Gijón
79300: Girona
05000: Granada
GUADA: Guadalajara (TODAS)
43019: Huelva
74200: Huesca
IRUN-: Irun-Hendaya (TODAS)
80100: Pamplona/Iruña
99103: Jaca-Bus
03100: Jaén
64100: Xàtiva/Játiva
97639: Ja
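
To find a station code without reading the whole listing, the mapping can be filtered by name. The sketch below assumes `available_stations` behaves like a plain `dict` of code to name, as the loop above suggests. The `normalize` and `find_stations` helpers are hypothetical, and the sample dictionary is copied from a few lines of the output above.

```python
import unicodedata
from typing import Dict


def normalize(text: str) -> str:
    """Lowercase and strip accents so 'cordoba' matches 'Córdoba'."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def find_stations(stations: Dict[str, str], query: str) -> Dict[str, str]:
    """Return the entries whose name contains the query, ignoring case and accents."""
    needle = normalize(query)
    return {code: name for code, name in stations.items() if needle in normalize(name)}


# Sample copied from the output above; with the real scraper,
# pass scraper.available_stations instead (assumed to be a dict).
sample = {
    '50500': 'Córdoba',
    'BARCE': 'Barcelona (TODAS)',
    '79309': 'Figueres',
    '79333': 'Figueres Bus',
    '04307': 'Figueres Vilafant',
}

print(find_stations(sample, 'cordoba'))
print(find_stations(sample, 'figueres'))
```

The codes are kept as strings because some of them start with a zero (`04307`) and others are alphabetic group codes (`BARCE`).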

In [64]:
scraper.stations_df

Unnamed: 0,stop_id,stop_name,renfe_id,stop_lat,stop_lon
0,00000,Unknown,00000,0.000000,0.000000
1,31412,A Corunya,31412,43.352761,-8.409755
2,60911,AlicanteAlacant,60911,38.344450,-0.495053
3,60600,Albacete-Los Llanos,60600,38.999384,-1.848450
4,60400,Alcazar de San Juan,60400,39.395628,-3.205744
...,...,...,...,...,...
92,13200,Bilbao-Abando Indalecio Prieto,BILBA,43.259609,-2.929150
93,66100,Cuenca,CUENC,40.067340,-2.136471
94,15410,GijonXixon,GIJON,43.535175,-5.698318
95,70200,Guadalajara,GUADA,40.644103,-3.182230
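
In the table above, `renfe_id` and `stop_id` coincide for most stations but differ for the grouped codes (for example, `BILBA` maps to `13200`). The sketch below builds a lookup from one to the other using only the standard library, on a few rows copied from the output. The `renfe_to_stop_id` helper is hypothetical. If `stations_df` is a pandas DataFrame, as the output suggests, an index on `renfe_id` would serve the same purpose.

```python
import csv
import io
from typing import Dict

# Rows copied from the table above (index column dropped)
SAMPLE_CSV = """stop_id,stop_name,renfe_id,stop_lat,stop_lon
31412,A Corunya,31412,43.352761,-8.409755
13200,Bilbao-Abando Indalecio Prieto,BILBA,43.259609,-2.929150
66100,Cuenca,CUENC,40.067340,-2.136471
15410,GijonXixon,GIJON,43.535175,-5.698318
"""


def renfe_to_stop_id(csv_text: str) -> Dict[str, str]:
    """Map each renfe_id to its stop_id, keeping ids as strings to preserve leading zeros."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row['renfe_id']: row['stop_id'] for row in reader}


mapping = renfe_to_stop_id(SAMPLE_CSV)
print(mapping['BILBA'])  # 13200
print(mapping['31412'])  # 31412
```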


## 2. Scrape Renfe services

The following cell scrapes Renfe services for a given route and date range. `origin` and `destination` are the station codes of the departure and arrival stations. `day`, `month`, and `year` set the first date to scrape, and `range_days` sets how many days to scrape starting from that date.

In [75]:
origin = '60000'
destination = '71801'

day = 2
month = 5
year = 2025
range_days = 1

date = datetime.date(day=day, month=month, year=year)
scraper.scrape(origin=origin,
               destination=destination,
               init_date=date,
               range_days=range_days,  # Use the variable defined above instead of a hard-coded value
               save_path='../data/renfe/2025/')

Date:  2025-05-02
Search url:  https://horarios.renfe.com/HIRRenfeWeb/buscar.do?O=MADRI&D=BARCE&AF=2025&MF=05&DF=02&SF=5&ID=s
MADRID PTA. ATOCHA - ALMUDENA GRANDES   06.27
CALATAYUD 07.26 07.27
ZARAGOZA-DELICIAS 07.52 07.53
BARCELONA-SANTS 09.25  
{'MADRID PTA. ATOCHA - ALMUDENA GRANDES': (' ', '06.27'), 'CALATAYUD': ('07.26', '07.27'), 'ZARAGOZA-DELICIAS': ('07.52', '07.53'), 'BARCELONA-SANTS': ('09.25', ' ')}
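
The dictionary printed above maps each station to an `(arrival, departure)` pair of `HH.MM` strings, with a blank string where the service starts or ends. As a minimal sketch of how such a structure could be post-processed, the code below converts the strings to `datetime.time` objects and computes the end-to-end travel time. The `parse_time` helper is hypothetical and not part of `RenfeScraper`. The calculation assumes the service does not cross midnight.

```python
import datetime
from typing import Optional

# Dictionary copied from the output above: station -> (arrival, departure)
stops = {
    'MADRID PTA. ATOCHA - ALMUDENA GRANDES': (' ', '06.27'),
    'CALATAYUD': ('07.26', '07.27'),
    'ZARAGOZA-DELICIAS': ('07.52', '07.53'),
    'BARCELONA-SANTS': ('09.25', ' '),
}


def parse_time(value: str) -> Optional[datetime.time]:
    """Convert 'HH.MM' to a time; blank strings (no arrival or departure) become None."""
    value = value.strip()
    if not value:
        return None
    return datetime.datetime.strptime(value, '%H.%M').time()


parsed = {name: (parse_time(arr), parse_time(dep)) for name, (arr, dep) in stops.items()}

# Travel time from the first departure to the last arrival
times = list(parsed.values())
first_departure = times[0][1]
last_arrival = times[-1][0]
day = datetime.date(2025, 5, 2)
duration = (datetime.datetime.combine(day, last_arrival)
            - datetime.datetime.combine(day, first_departure))
print(duration)  # 2:58:00
```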


StaleElementReferenceException: Message: stale element reference: stale element not found
  (Session info: chrome=134.0.6998.166); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#stale-element-reference-exception
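
Selenium raises the `StaleElementReferenceException` shown above when a previously located element is no longer attached to the page, typically because the page re-rendered between the lookup and its use. A common mitigation is to retry the lookup. The sketch below shows the retry pattern in isolation: `retry`, `FakeStaleError`, and `flaky_lookup` are hypothetical stand-ins, and the notebook does not show whether `RenfeScraper` offers a place to hook in such a retry.

```python
import time


def retry(action, exceptions, attempts=3, delay=0.5):
    """Call action() until it succeeds, retrying on the given exception types."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # Out of attempts: surface the original error
            time.sleep(delay)


class FakeStaleError(Exception):
    """Stand-in for Selenium's StaleElementReferenceException."""


calls = {'count': 0}


def flaky_lookup():
    """Simulate an element lookup that fails twice before succeeding."""
    calls['count'] += 1
    if calls['count'] < 3:
        raise FakeStaleError('stale element reference')
    return 'BARCELONA-SANTS'


result = retry(flaky_lookup, FakeStaleError, delay=0)
print(result, calls['count'])
```

With Selenium, `exceptions` would be `selenium.common.exceptions.StaleElementReferenceException`, and `action` should locate the element again on each attempt rather than reuse a stored reference, since the stored reference is what has gone stale.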


## 3. Load scraped data

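The following cell loads scraped data from a stop-times CSV file, together with the stations file. The `DataLoader` class reads both and builds the supply entities. Note that the file loaded here corresponds to the `MADRI` to `60911` (Alicante / Alacant) route rather than the route requested in Section 2.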

In [28]:
data_loader = DataLoader(stops_path='../data/renfe/2025/stop_times/stopTimes_MADRI_60911_2025-05-02_2025-05-03.csv',
                         renfe_stations_path='../data/renfe/renfe_stations.csv')

data_loader.build_supply_entities()
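
The stop-times file name above appears to encode the origin, destination, and date range of the scrape. The sketch below rebuilds the path from those parameters. The pattern is inferred from this single file name and is an assumption, as is the idea that the end date equals the start date plus `range_days`. The `stop_times_path` helper is hypothetical.

```python
import datetime
from pathlib import Path


def stop_times_path(save_path, origin, destination, init_date, range_days):
    """Build the stop-times CSV path, assuming the end date is init_date + range_days."""
    end_date = init_date + datetime.timedelta(days=range_days)
    filename = f'stopTimes_{origin}_{destination}_{init_date}_{end_date}.csv'
    return Path(save_path) / 'stop_times' / filename


path = stop_times_path('../data/renfe/2025/', 'MADRI', '60911',
                       datetime.date(2025, 5, 2), range_days=1)
print(path.as_posix())
```

Deriving the path this way would keep the loading cell in step with the scraping parameters, provided the inferred naming scheme matches what the scraper actually writes.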

## 4. Save supply entities

The following cell saves the supply entities to a YAML file. The `SupplySaver` class takes the services built by the `DataLoader`, and its `to_yaml` method writes them to the given output path.

In [29]:
supply_saver = SupplySaver(data_loader.services)
supply_saver.to_yaml(output_path='../data/renfe/2025/dummy_supply_2_April_MADRI_60911_2025.yaml')