# Renfe Scraping

This notebook shows how to scrape Renfe data with the `RenfeScraper` class, which collects data from the Renfe website and saves it in a structured format. It produces two output files:
- `trips.csv`: Contains information about the trips, including the service ID, stops, and other relevant details.
- `stops.csv`: Contains information about the stops, including the stop ID, arrival and departure times, and other relevant details.

## 0. Import Libraries

In [1]:
%load_ext autoreload
%autoreload 2

import datetime

from robin.scraping.entities import DataLoader
from robin.scraping.renfe.entities import RenfeScraper
from robin.supply.entities import Supply
from robin.supply.saver.entities import SupplySaver

import ipywidgets as widgets
from IPython.display import display

## 1. Scrape available stations from main menu

The next cell initializes an instance of the `RenfeScraper` class and iterates through the available stations provided by the scraper. For each station, it prints the station ID and station name. This step is useful for identifying the station codes required for scraping specific routes in later steps.

In [2]:
scraper = RenfeScraper()

for station_id, station_name in scraper.available_stations.items():
    print(f'{station_id}: {station_name}')

31412: A Coruña
94707: Abrantes
60911: Alicante / Alacant
60600: Albacete
06008: Alcantarilla-Los Romanos
60400: Alcázar de San Juan
55020: Algeciras
56312: Almería
99003: Altet Bus
99115: Aguadulce Bus
87912: Aix En Provence
99114: Andorra-Bus
02003: Antequera (TODAS)
87814: Avignon
10400: Avila
37606: Badajoz
71801: Barcelona (TODAS)
87078: Beziers
65318: Benicassim
13200: Bilbao (TODAS)
54400: Bobadilla
11014: Burgos Rosa Manzano
35400: Cáceres
51405: Cádiz
70600: Calatayud
50417: Campus Rabanales
61307: Cartagena
65300: Castellón /Castelló
37200: Ciudad Real
50500: Córdoba
66100: Cuenca (TODAS)
92201: Denia-Bus
60905: Elda-Petrer
03410: Elche AV/Elx AV
94428: Entroncamento
92157: Estepona Bus
21010: Ferrol
79309: Figueres
79333: Figueres Bus
04307: Figueres Vilafant
69110: Gandía
15410: Gijón
79300: Girona
05000: Granada
70200: Guadalajara (TODAS)
43019: Huelva
74200: Huesca
11600: Irun-Hendaya (TODAS)
80100: Pamplona/Iruña
99103: Jaca-Bus
03100: Jaén
64100: Xàtiva/Játiva
97639: Ja
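Because the station list is long, it can help to search it by name instead of scanning the printed output. The following is a minimal sketch, assuming a mapping shaped like `scraper.available_stations` (station ID to station name). The `find_stations` and `normalize` helpers are hypothetical and not part of `RenfeScraper`; the sample entries are copied from the output above.

```python
import unicodedata

# A few entries copied from the output above; in the notebook this
# would be `scraper.available_stations` (station ID -> station name).
stations = {
    '31412': 'A Coruña',
    '71801': 'Barcelona (TODAS)',
    '50500': 'Córdoba',
    '51405': 'Cádiz',
    '79309': 'Figueres',
    '79333': 'Figueres Bus',
    '04307': 'Figueres Vilafant',
}


def normalize(text: str) -> str:
    """Lowercase the text and strip accents so 'cordoba' matches 'Córdoba'."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(c for c in decomposed if not unicodedata.combining(c)).lower()


def find_stations(query: str, stations: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: return the stations whose name contains `query`."""
    needle = normalize(query)
    return {sid: name for sid, name in stations.items() if needle in normalize(name)}


for station_id, station_name in find_stations('figueres', stations).items():
    print(f'{station_id}: {station_name}')
```

Matching on accent-stripped, lowercased names avoids having to type `Córdoba` or `A Coruña` exactly as Renfe spells them.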

## 2. Scrape Renfe services



To help select the origin and destination, the next cell creates two dropdown widgets with the `ipywidgets` library. The `available_stations` dictionary populates the dropdown options: its keys are station names and its values are the station IDs needed for scraping.

- The `origin` widget is initialized with a default value of `'60000'` (`Madrid (TODAS)`).
- The `destination` widget is initialized with a default value of `'71801'` (`Barcelona (TODAS)`).

In [3]:
available_stations = {station_name: station_id for station_id, station_name in scraper.available_stations.items()}

origin = widgets.Dropdown(
    options=available_stations,
    description='Origin:',
    style={'description_width': 'initial'},
    value='60000' # Madrid (TODAS)
)
destination = widgets.Dropdown(
    options=available_stations,
    description='Destination:',
    style={'description_width': 'initial'},
    value='71801' # Barcelona (TODAS)
)

display(origin, destination)

Dropdown(description='Origin:', index=65, options={'A Coruña': '31412', 'Abrantes': '94707', 'Alicante / Alaca…

Dropdown(description='Destination:', index=16, options={'A Coruña': '31412', 'Abrantes': '94707', 'Alicante / …

In [4]:
print('Origin:', origin.value, '->', scraper.available_stations[origin.value])
print('Destination:', destination.value, '->', scraper.available_stations[destination.value])

Origin: 60000 -> Madrid (TODAS)
Destination: 71801 -> Barcelona (TODAS)
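Inverting an ID-to-name dictionary into a name-to-ID dictionary silently drops entries if two IDs share the same name. Whether that happens in the real station list is not verified here; the sketch below only shows one way to detect it. The `invert_unique` helper is hypothetical, and the sample entries are copied from the outputs above.

```python
from collections import Counter

# Sample entries copied from the outputs above (station ID -> station name).
stations_by_id = {
    '60000': 'Madrid (TODAS)',
    '71801': 'Barcelona (TODAS)',
    '79309': 'Figueres',
}


def invert_unique(mapping: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: invert a dict, refusing to drop duplicate values."""
    counts = Counter(mapping.values())
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f'Duplicate station names: {duplicates}')
    return {name: station_id for station_id, name in mapping.items()}


stations_by_name = invert_unique(stations_by_id)
print(stations_by_name['Madrid (TODAS)'])
```

If the check raises, the dropdown options could be built from `(name, id)` tuples with a disambiguated label instead of a plain dictionary.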


The following cell scrapes the Renfe services for a specific date and range of days. The `origin` and `destination` variables specify the departure and arrival stations, respectively. The `day`, `month`, and `year` variables specify the date for which the services are to be scraped. The `range_days` variable specifies the number of days to scrape.

In [5]:
day = 1
month = 6
year = 2025
range_days = 1

date = datetime.date(day=day, month=month, year=year)
scraper.scrape(
    origin=origin.value,
    destination=destination.value,
    init_date=date,
    range_days=range_days,  # Reuse the variable so later cells build matching file names
    all_pairs=False,
    save_path='../data/scraping/renfe/'
)

2025-04-23 14:30:29.239 | INFO     | robin.scraping.renfe.entities:scrape_trips:793 - Scraping trips for MADRI - BARCE on 2025-06-01
2025-04-23 14:30:29.239 | INFO     | robin.scraping.renfe.entities:_get_renfe_schedules_url:320 - https://horarios.renfe.com/HIRRenfeWeb/buscar.do?O=MADRI&D=BARCE&AF=2025&MF=06&DF=01&SF=7&ID=s
2025-04-23 14:30:39.309 | SUCCESS  | robin.scraping.renfe.entities:scrape:692 - Scraped 22 trips between MADRI and BARCE from 2025-06-01 to 2025-06-02
2025-04-23 14:30:39.310 | INFO     | robin.scraping.renfe.entities:scrape:693 - First five trips:
               service_id stop_id  arrival  departure
0  06301_01-06-2025-06.27   60000        0          0
1  06301_01-06-2025-06.27   70600       59         60
2  06301_01-06-2025-06.27   04040       85       
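The URL in the log above suggests how the query is derived from the date. The sketch below rebuilds that query string under stated assumptions: `AF`, `MF`, and `DF` are read as year, month, and day, and `SF` is guessed to be the ISO weekday (2025-06-01 is a Sunday, and the log shows `SF=7`). None of these meanings is confirmed by the source. `build_schedules_url` is a hypothetical function and does not reproduce the internals of `_get_renfe_schedules_url`; the `MADRI` and `BARCE` codes are taken verbatim from the log.

```python
import datetime
from urllib.parse import urlencode

# Base URL copied from the log output above.
BASE_URL = 'https://horarios.renfe.com/HIRRenfeWeb/buscar.do'


def build_schedules_url(origin_code: str, destination_code: str, date: datetime.date) -> str:
    """Hypothetical reconstruction of the logged URL; parameter meanings are assumed."""
    params = {
        'O': origin_code,           # Origin code as it appears in the log
        'D': destination_code,      # Destination code as it appears in the log
        'AF': f'{date.year:04d}',   # ASSUMPTION: year
        'MF': f'{date.month:02d}',  # ASSUMPTION: zero-padded month
        'DF': f'{date.day:02d}',    # ASSUMPTION: zero-padded day
        'SF': date.isoweekday(),    # ASSUMPTION: ISO weekday (Monday=1 .. Sunday=7)
        'ID': 's',                  # Copied verbatim; meaning unknown
    }
    return f'{BASE_URL}?{urlencode(params)}'


url = build_schedules_url('MADRI', 'BARCE', datetime.date(2025, 6, 1))
print(url)
```

For the date used in this notebook, the sketch yields the same string as the logged URL, which is consistent with the assumptions but does not prove them.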

## 3. Load scraped data

The `DataLoader` class loads the scraped data. It reads the stop times and prices from the given CSV files. The `seat_components` dictionary maps each seat name found in the prices CSV to a (hard type, soft type) pair, and `seat_quantity` maps each hard type to the number of seats available.

In [6]:
# The different seats with their hard and soft types
seat_components = {
    'Básica': (1, 1),
    'Básico': (1, 1),
    'Elige': (1, 2),
    'Elige Confort': (1, 3),
    'Prémium': (2, 4)
}

# The number of seats available for each hard type
seat_quantity = {
    1: 250,
    2: 50
}

end_date = date + datetime.timedelta(days=range_days)
data_loader = DataLoader(
    stops_path=f'../data/scraping/renfe/stopTimes/stopTimes_{origin.value}_{destination.value}_{date}_{end_date}.csv',
    prices_path=f'../data/scraping/renfe/prices/prices_{origin.value}_{destination.value}_{date}_{end_date}.csv',
    seat_components=seat_components,
    seat_quantity=seat_quantity
)
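To make the relationship between the two dictionaries concrete, the sketch below groups the seat names by hard type and prints the seat count configured for each group. It assumes that seats sharing a hard type draw from the same pool of seats, which is how the comments in the cell read but is not confirmed here. `seats_by_hard_type` is a hypothetical helper, not part of `DataLoader`.

```python
# Same dictionaries as in the cell above.
seat_components = {
    'Básica': (1, 1),
    'Básico': (1, 1),
    'Elige': (1, 2),
    'Elige Confort': (1, 3),
    'Prémium': (2, 4),
}
seat_quantity = {1: 250, 2: 50}


def seats_by_hard_type(components: dict[str, tuple[int, int]]) -> dict[int, list[str]]:
    """Hypothetical helper: group seat names by their hard type."""
    grouped: dict[int, list[str]] = {}
    for seat_name, (hard_type, _soft_type) in components.items():
        grouped.setdefault(hard_type, []).append(seat_name)
    return grouped


for hard_type, names in seats_by_hard_type(seat_components).items():
    print(f'Hard type {hard_type}: {seat_quantity[hard_type]} seats for {names}')
```

A quick check like this also catches a seat name whose hard type has no entry in `seat_quantity`, since the lookup would raise a `KeyError`.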

In [7]:
for service in data_loader.services:
    print(service)

Service id: 06301_01-06-2025-06.27 
	Date of service: 2025-06-01 
	Stops: ['60000', '70600', '04040', '71801'] 
	Line times (relative): [(0, 0), (59, 60), (85, 86), (178, 178)] 
	Line times (absolute): [('06:27', '06:27'), ('07:26', '07:27'), ('07:52', '07:53'), ('09:25', '09:25')] 
	Train Service Provider: Renfe 
	Time Slot: 38710 
	Rolling Stock: S-114 
	Prices: 
		('60000', '71801'): {Básica: 69.0} 
	Tickets sold (seats): 
		Básica: 0 
	Tickets sold (hard type): 
		1: 0
		2: 0 
	Tickets sold per each pair (seats): 
		('60000', '70600'): {Básica: 0}
		('60000', '04040'): {Básica: 0}
		('60000', '71801'): {Básica: 0}
		('70600', '04040'): {Básica: 0}
		('70600', '71801'): {Básica: 0}
		('04040', '71801'): {Básica: 0} 
	Capacity constraints: None 

Service id: 03073_01-06-2025-07.27 
	Date of service: 2025-06-01 
	Stops: ['60000', '04007', '04040', '78400', '04104', '71801', '79206', '04307'] 
	Line times (relative): [(0, 0), (25, 26), (84, 85), (128, 130), (157, 159), (199, 207), (245
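The printed service shows two views of the same schedule: relative times in minutes from the first departure, and absolute clock times. It also lists one entry per origin-destination pair of stops. The sketch below reproduces both from the values of the first service, using only the standard library; `to_absolute` is a hypothetical helper and does not reflect how the library computes these fields.

```python
import datetime
from itertools import combinations

# Values copied from the first service printed above.
stops = ['60000', '70600', '04040', '71801']
relative_times = [(0, 0), (59, 60), (85, 86), (178, 178)]  # (arrival, departure) in minutes
departure = datetime.datetime(2025, 6, 1, 6, 27)


def to_absolute(start: datetime.datetime, times: list[tuple[int, int]]) -> list[tuple[str, str]]:
    """Hypothetical helper: convert minute offsets into 'HH:MM' clock times."""
    def fmt(minutes: int) -> str:
        return (start + datetime.timedelta(minutes=minutes)).strftime('%H:%M')
    return [(fmt(arrival), fmt(leave)) for arrival, leave in times]


absolute_times = to_absolute(departure, relative_times)
print(absolute_times)

# Every pair of stops in travel order, matching the "per each pair" listing.
pairs = list(combinations(stops, 2))
print(pairs)
```

With four stops there are six pairs, in the same order as the ticket listing above.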

## 4. Save supply entities

The following cell uses the `SupplySaver` class to save the supply to a YAML file, which can then be used as input to Robin. The file name is built from the origin, destination, and date range.

In [8]:
supply_saver = SupplySaver(services=data_loader.services)
supply_saver.to_yaml(output_path=f'../data/scraping/renfe/supply_{origin.value}_{destination.value}_{date}_{end_date}.yaml')

Finally, the supply data is loaded back from the previously saved YAML file.

In [9]:
supply = Supply.from_yaml(path=f'../data/scraping/renfe/supply_{origin.value}_{destination.value}_{date}_{end_date}.yaml')
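The same file name is written out twice, once when saving and once when loading. A small helper keeps the two in sync. This is a minimal sketch: `supply_path` is a hypothetical function, not part of Robin, and it assumes the naming pattern used in the cells above.

```python
import datetime
from pathlib import Path


def supply_path(base_dir: str, origin_id: str, destination_id: str,
                start: datetime.date, range_days: int) -> Path:
    """Hypothetical helper: build the supply YAML path used in this notebook."""
    end = start + datetime.timedelta(days=range_days)
    # A `datetime.date` formats as ISO 'YYYY-MM-DD' inside an f-string.
    return Path(base_dir) / f'supply_{origin_id}_{destination_id}_{start}_{end}.yaml'


path = supply_path('../data/scraping/renfe', '60000', '71801', datetime.date(2025, 6, 1), 1)
print(path.as_posix())
```

The resulting path can then be passed (as a string, if required) to both `to_yaml` and `from_yaml`.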