# Renfe Scraping Software

This notebook shows how to scrape Renfe schedule data with the `RenfeScraper` class, which downloads data from the Renfe website and saves it in a structured format. Two output files are produced:
- `trips.csv`: Contains information about the trips, including the service ID, trip ID, and other relevant details.
- `stops.csv`: Contains information about the stops, including the stop ID, arrival and departure times, and other relevant details.

## 0. Import Libraries

In [1]:
%load_ext autoreload
%autoreload 2

import datetime
import sys

# Add the parent directory (project root) to the path *before* importing from `src`
sys.path.append('..')

from src.robin.scraping.entities import DataLoader, SupplySaver
from src.robin.scraping.renfe.entities import RenfeScraper

## 1. Scrape available stations from the main menu

In [2]:
scraper = RenfeScraper(stations_csv_path='../data/renfe/adif_renfe_stations.csv')

for station_id, station_name in scraper.available_stations.items():
    print(f'{station_id}: {station_name}')

31412: A Coruña
94707: Abrantes
60911: Alicante / Alacant
60600: Albacete
06008: Alcantarilla-Los Romanos
60400: Alcázar de San Juan
55020: Algeciras
56312: Almería
99003: Altet Bus
99115: Aguadulce Bus
87912: Aix En Provence
99114: Andorra-Bus
ANTEQ: Antequera (TODAS)
87814: Avignon
10400: Avila
37606: Badajoz
BARCE: Barcelona (TODAS)
87078: Beziers
65318: Benicassim
BILBA: Bilbao (TODAS)
54400: Bobadilla
11014: Burgos Rosa Manzano
35400: Cáceres
51405: Cádiz
70600: Calatayud
50417: Campus Rabanales
61307: Cartagena
65300: Castellón /Castelló
37200: Ciudad Real
50500: Córdoba
CUENC: Cuenca (TODAS)
92201: Denia-Bus
60905: Elda-Petrer
03410: Elche AV/Elx AV
94428: Entroncamento
92157: Estepona Bus
21010: Ferrol
79309: Figueres
79333: Figueres Bus
04307: Figueres Vilafant
69110: Gandía
GIJON: Gijón
79300: Girona
05000: Granada
GUADA: Guadalajara (TODAS)
43019: Huelva
74200: Huesca
IRUN-: Irun-Hendaya (TODAS)
80100: Pamplona/Iruña
99103: Jaca-Bus
03100: Jaén
64100: Xàtiva/Játiva
97639: Ja
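The menu mixes individual stations with city groups such as `BARCE: Barcelona (TODAS)` (*TODAS* is Spanish for "all"). A minimal sketch for separating the two, assuming group entries can be recognised by the `(TODAS)` suffix in their name; `split_station_groups` is a hypothetical helper, not part of `RenfeScraper`. Matching on the name rather than on the ID format is deliberate: `GIJON` has a non-numeric ID but no `(TODAS)` suffix.

```python
# Minimal sketch: split the station menu into city groups and individual stations.
# ASSUMPTION: group entries are exactly those whose name ends in "(TODAS)",
# as in the output above; this is inferred, not documented behaviour.
def split_station_groups(stations: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (groups, individual) dictionaries keyed by station ID."""
    groups = {sid: name for sid, name in stations.items() if name.endswith('(TODAS)')}
    individual = {sid: name for sid, name in stations.items() if sid not in groups}
    return groups, individual


# A few entries copied from the output above; IDs stay strings so '06008' keeps its zero.
sample = {
    '31412': 'A Coruña',
    'ANTEQ': 'Antequera (TODAS)',
    'BARCE': 'Barcelona (TODAS)',
    'GIJON': 'Gijón',
    '06008': 'Alcantarilla-Los Romanos',
}
groups, individual = split_station_groups(sample)
print(sorted(groups))      # ['ANTEQ', 'BARCE']
print(sorted(individual))  # ['06008', '31412', 'GIJON']
```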

In [3]:
scraper.stations_df

| Unnamed: 0 | ADIF_ID | RENFE_ID | STATION_NAME | LATITUD | LONGITUD | DIRECION | CP | POBLACION | PROVINCIA | PAIS | CERCANIAS | FEVE | COMUN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01001 | 01001 | EL SORBITO (APD-CGD) | 37.208475 | -5.706642 | | | ALCALÁ DE GUADAÍRA | SEVILLA | ESPAÑA | NO | NO | |
| 1 | 01002 | 01002 | LA TRINIDAD (APT-CGD) | | | | | ALCALÁ DE GUADAÍRA | SEVILLA | ESPAÑA | NO | NO | |
| 2 | 01003 | 01003 | ARAHAL | 37.268081 | -5.548514 | Calle Virgen de los Dolores, S/N | 41600 | ARAHAL | SEVILLA | ESPAÑA | NO | NO | |
| 3 | 01004 | 01004 | PARADAS (APD-CGD) | | | | | PARADAS | SEVILLA | ESPAÑA | NO | NO | |
| 4 | 01005 | 01005 | MARCHENA | 37.334282 | -5.425519 | Avenida Maestro Santos Ruano, 8 | 41620 | MARCHENA | SEVILLA | ESPAÑA | NO | NO | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2772 | 99501 | 99501 | ANDORRA | | | | | ANDORRA | PONTEVEDRA | ESPAÑA | NO | NO | |
| 2773 | 99800 | 99800 | CERCEDILLA TURÍSTICO | | | | | CERCEDILLA | MADRID | ESPAÑA | SI | NO | |
| 2774 | 99801 | 99801 | PUERTO NAVACERRADA | | | | | NAVACERRADA | MADRID | ESPAÑA | NO | NO | |
| 2775 | 99802 | 99802 | COTOS | | | | | RASCAFRÍA | MADRID | ESPAÑA | NO | NO | |
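The ID columns keep their leading zeros (`01001`), and station IDs elsewhere in the notebook (`06008`, `03410`) are strings too. If you read `adif_renfe_stations.csv` yourself with pandas, default type inference parses those columns as integers and drops the zeros. A minimal sketch using two rows copied from the table above; how `RenfeScraper` loads the file internally is not shown here.

```python
import io

import pandas as pd

# Two rows copied from the table above, in the same comma-separated layout.
csv_text = """ADIF_ID,RENFE_ID,STATION_NAME,LATITUD,LONGITUD
01003,01003,ARAHAL,37.268081,-5.548514
99802,99802,COTOS,,
"""

# Default type inference turns the IDs into integers and loses the leading zero.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred['RENFE_ID'].tolist())  # [1003, 99802]

# Forcing the ID columns to str keeps them comparable with keys such as '60000' or '06008'.
stations = pd.read_csv(io.StringIO(csv_text), dtype={'ADIF_ID': str, 'RENFE_ID': str})
print(stations['RENFE_ID'].tolist())  # ['01003', '99802']

# Look up a single station by its Renfe ID.
row = stations.loc[stations['RENFE_ID'] == '01003'].iloc[0]
print(row['STATION_NAME'], row['LATITUD'], row['LONGITUD'])
```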


## 2. Scrape Renfe services

The following cell scrapes Renfe services between two stations over a range of dates. `origin` and `destination` are the departure and arrival station IDs; as the log shows, `60000` and `71801` are queried as the `MADRI` and `BARCE` groups. `day`, `month`, and `year` set the first service date, `range_days` sets the number of days to scrape, and `save_path` is the directory where the scraped files are written.
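A minimal sketch of the date window implied by `init_date` and `range_days`, assuming the scraper covers `range_days` consecutive service dates starting at `init_date` and reports an exclusive end date. That assumption matches the log below (`from 2025-06-05 to 2025-06-06` for `range_days=1`) and the stop-times file name, but `scrape_dates` and `date_window` are hypothetical helpers, not part of `RenfeScraper`.

```python
import datetime


def scrape_dates(init_date: datetime.date, range_days: int) -> list[datetime.date]:
    """Hypothetical helper: the service dates covered by a scrape of `range_days` days."""
    return [init_date + datetime.timedelta(days=offset) for offset in range(range_days)]


def date_window(init_date: datetime.date, range_days: int) -> tuple[datetime.date, datetime.date]:
    """Hypothetical helper: start (inclusive) and end (exclusive, by assumption) of the window."""
    return init_date, init_date + datetime.timedelta(days=range_days)


start = datetime.date(year=2025, month=6, day=5)
print(scrape_dates(start, 1))  # [datetime.date(2025, 6, 5)]
print(date_window(start, 1))   # (datetime.date(2025, 6, 5), datetime.date(2025, 6, 6))
```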

In [4]:
origin = '60000'
destination = '71801'

day = 5
month = 6
year = 2025
range_days = 1

date = datetime.date(day=day, month=month, year=year)
scraper.scrape(origin=origin,
               destination=destination,
               init_date=date,
               range_days=range_days,  # use the variable instead of a hard-coded value
               all_pairs=False,
               save_path='../data/renfe/2025/')

2025-04-05 00:41:11.420 | INFO     | src.robin.scraping.renfe.entities:scrape_trips:768 - Scraping trips for MADRI - BARCE on 2025-06-05
2025-04-05 00:41:11.420 | INFO     | src.robin.scraping.renfe.entities:_get_renfe_schedules_url:323 - https://horarios.renfe.com/HIRRenfeWeb/buscar.do?O=MADRI&D=BARCE&AF=2025&MF=06&DF=05&SF=4&ID=s
2025-04-05 00:41:25.477 | SUCCESS  | src.robin.scraping.renfe.entities:scrape:673 - Scraped 109 trips between MADRI and BARCE from 2025-06-05 to 2025-06-06
2025-04-05 00:41:25.478 | INFO     | src.robin.scraping.renfe.entities:scrape:674 - First 5 rows of trips:
               service_id stop_id  arrival  departure
0  06301_05-06-2025-06.12   60000        0          0
1  06301_05-06-2025-06.12   71801      157        157
2  03063_05-06-2025-06.27  
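The second log line shows the schedule query the scraper sends. A sketch that rebuilds that URL from a date with `urllib.parse.urlencode`, useful for checking a query by hand. `build_schedules_url` is a hypothetical helper; `AF`, `MF`, and `DF` evidently hold year, month, and day, but reading `SF` as the ISO weekday (Monday = 1) is an assumption that merely agrees with the logged `SF=4` for Thursday 2025-06-05, and `ID=s` is copied verbatim without interpretation.

```python
import datetime
from urllib.parse import urlencode

# Base URL copied from the log above.
SCHEDULES_URL = 'https://horarios.renfe.com/HIRRenfeWeb/buscar.do'


def build_schedules_url(origin: str, destination: str, date: datetime.date) -> str:
    """Hypothetical helper that reproduces the query string seen in the log."""
    params = {
        'O': origin,                   # origin station or group ID, e.g. 'MADRI'
        'D': destination,              # destination station or group ID, e.g. 'BARCE'
        'AF': f'{date.year:04d}',      # year
        'MF': f'{date.month:02d}',     # zero-padded month
        'DF': f'{date.day:02d}',       # zero-padded day
        'SF': str(date.isoweekday()),  # ASSUMPTION: day of week, Monday = 1
        'ID': 's',                     # copied verbatim from the log
    }
    return f'{SCHEDULES_URL}?{urlencode(params)}'


url = build_schedules_url('MADRI', 'BARCE', datetime.date(2025, 6, 5))
print(url)
# https://horarios.renfe.com/HIRRenfeWeb/buscar.do?O=MADRI&D=BARCE&AF=2025&MF=06&DF=05&SF=4&ID=s
```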

## 3. Load scraped data

The following cell loads the scraped stop times, which are written to the `stop_times/` subdirectory of `save_path`, together with the Renfe stations CSV. `build_supply_entities()` then builds the supply entities, which are exposed as `data_loader.services`.
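Conceptually, building services from the stop-times file means grouping rows by `service_id` and keeping each service's stops and relative times in order. A minimal sketch with the standard `csv` module, assuming the file has the `service_id`, `stop_id`, `arrival`, `departure` columns shown in the scraping log, one row per stop in travel order; the sample rows are rebuilt from the services printed in this notebook. `group_stop_times` is a hypothetical helper, not `DataLoader`'s implementation.

```python
import csv
import io
from collections import defaultdict
from collections.abc import Iterable

# ASSUMPTION: column names as in the scraping log; arrival/departure are minutes
# relative to the first departure of the service.
sample_csv = """service_id,stop_id,arrival,departure
06301_05-06-2025-06.12,60000,0,0
06301_05-06-2025-06.12,71801,157,157
03063_05-06-2025-06.27,60000,0,0
03063_05-06-2025-06.27,70600,59,60
03063_05-06-2025-06.27,04040,85,86
03063_05-06-2025-06.27,71801,178,178
"""


def group_stop_times(rows: Iterable[dict[str, str]]) -> dict[str, dict[str, list]]:
    """Collect each service's ordered stops and (arrival, departure) minute offsets."""
    services = defaultdict(lambda: {'stops': [], 'times': []})
    for row in rows:
        service = services[row['service_id']]
        service['stops'].append(row['stop_id'])  # kept as str to preserve leading zeros
        service['times'].append((int(row['arrival']), int(row['departure'])))
    return dict(services)


services = group_stop_times(csv.DictReader(io.StringIO(sample_csv)))
print(services['03063_05-06-2025-06.27']['stops'])  # ['60000', '70600', '04040', '71801']
print(services['06301_05-06-2025-06.12']['times'])  # [(0, 0), (157, 157)]
```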

In [5]:
data_loader = DataLoader(stops_path='../data/renfe/2025/stop_times/stopTimes_MADRI_BARCE_2025-06-05_2025-06-06.csv',
                         renfe_stations_path='../data/renfe/adif_renfe_stations.csv')

data_loader.build_supply_entities()

In [7]:
for service in data_loader.services:
    print(service)

Service id: 06301_05-06-2025-06.12 
	Date of service: 2025-06-05 
	Stops: ['60000', '71801'] 
	Line times (relative): [(0, 0), (157, 157)] 
	Line times (absolute): [('06:12', '06:12'), ('08:49', '08:49')] 
	Train Service Provider: [1, Renfe, ['1']] 
	Time Slot: [37210, 6:12:00, 6:22:00, 6:17:00, 0:10:00] 
	Rolling Stock: [1, S-114, {1: 250, 2: 50}] 
	Prices: 
		('60000', '71801'): {Básica: 25.0} 
	Tickets sold (seats): {Básica: 0} 
	Tickets sold (hard type): {1: 0, 2: 0} 
	Tickets sold per each pair (seats): {('60000', '71801'): {Básica: 0}} 
	Tickets sold per each pair (hard type): {('60000', '71801'): {1: 0}} 
	Capacity constraints: None 

Service id: 03063_05-06-2025-06.27 
	Date of service: 2025-06-05 
	Stops: ['60000', '70600', '04040', '71801'] 
	Line times (relative): [(0, 0), (59, 60), (85, 86), (178, 178)] 
	Line times (absolute): [('06:27', '06:27'), ('07:26', '07:27'), ('07:52', '07:53'), ('09:25', '09:25')] 
	Train Service Provider: [1, Renfe, ['1']] 
	Time Slot: [38710, 6:
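Each printed service pairs relative times (minutes after the first departure) with absolute clock times: 06:12 plus 157 minutes gives the 08:49 arrival of `06301_05-06-2025-06.12`. A sketch that recovers the train number and first departure from a service ID and converts the offsets to clock times. The ID layout `<train>_<DD-MM-YYYY>-<HH.MM>` is inferred from the printed IDs, and both helpers are hypothetical.

```python
import datetime


def parse_service_id(service_id: str) -> tuple[str, datetime.datetime]:
    """Split an ID such as '06301_05-06-2025-06.12' into (train number, first departure).

    ASSUMPTION: the <train>_<DD-MM-YYYY>-<HH.MM> layout is inferred from the printed services.
    """
    train, stamp = service_id.split('_', maxsplit=1)
    return train, datetime.datetime.strptime(stamp, '%d-%m-%Y-%H.%M')


def absolute_times(start: datetime.datetime, relative: list[tuple[int, int]]) -> list[tuple[str, str]]:
    """Turn (arrival, departure) offsets in minutes into 'HH:MM' clock times."""
    def clock(minutes: int) -> str:
        return (start + datetime.timedelta(minutes=minutes)).strftime('%H:%M')
    return [(clock(arr), clock(dep)) for arr, dep in relative]


train, start = parse_service_id('03063_05-06-2025-06.27')
print(train, start.date())  # 03063 2025-06-05
print(absolute_times(start, [(0, 0), (59, 60), (85, 86), (178, 178)]))
# [('06:27', '06:27'), ('07:26', '07:27'), ('07:52', '07:53'), ('09:25', '09:25')]
```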

## 4. Save supply entities

The following cell saves the supply entities to a YAML file. `SupplySaver` takes the services built by the `DataLoader`, and its `to_yaml` method writes them to `output_path`.

In [6]:
supply_saver = SupplySaver(data_loader.services)
supply_saver.to_yaml(output_path='../data/renfe/2025/supply_2025_06_06_MADRI_BARCE.yaml')