Salvador Rocher Espinosa
Data Analytics, August 2019, Barcelona
As someone who lives and works in a Spanish city 400 km away from home, I have found that the train is the most convenient way to travel back and forth. As a frequent user, I have grown baffled by the pricing pattern when buying tickets: sometimes prices stay around the same levels, and other times they move well outside the most common ones.
This prompted me to learn more about the pricing system of Spanish long-distance rail transport.
“Do train ticket prices really change over the days?”
And if so,
“Is there an optimal moment to buy them?”
The initial hypothesis is that prices do change over the days and, in particular, that they rise as the departure day approaches.
Bonus question: "Are there relevant intraday ticket price differences?"
In this project, only Renfe’s long-distance routes were considered.
The dataset is sourced from a Renfe scraping process carried out by thegurus.tech (link below), in which prices for trains departing on the sampled routes were checked several times a day in a loop. The trains whose prices were checked span nearly three months, from April 12th, 2019 to July 7th, 2019.
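Because each train's price was checked repeatedly before departure, the hypothesis can be explored by grouping the checks by how many days ahead of departure they were made. The following is only a minimal sketch using the Python standard library: the sample rows, the tuple layout, and the function name `mean_price_by_days_ahead` are assumptions made for illustration and are not taken from the actual dataset or notebooks.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical price checks: (check time, departure time, price in euros).
# These rows are invented for illustration; they are not real Renfe data.
SAMPLE_CHECKS = [
    ("2019-04-12 10:00", "2019-04-20 08:30", 40.0),
    ("2019-04-12 18:00", "2019-04-20 08:30", 42.0),
    ("2019-04-18 10:00", "2019-04-20 08:30", 55.0),
    ("2019-04-19 21:00", "2019-04-20 08:30", 60.0),
]

FMT = "%Y-%m-%d %H:%M"


def mean_price_by_days_ahead(checks):
    """Average the observed price for each whole number of days before departure."""
    buckets = defaultdict(list)
    for checked_at, departs_at, price in checks:
        lead = datetime.strptime(departs_at, FMT) - datetime.strptime(checked_at, FMT)
        buckets[lead.days].append(price)  # whole days left until departure
    return {days: mean(prices) for days, prices in sorted(buckets.items())}


print(mean_price_by_days_ahead(SAMPLE_CHECKS))
```

With a table like this, a price that grows as the number of days ahead shrinks would be consistent with the initial hypothesis; the real analysis is in the notebooks.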
This is the workflow I envisaged for this project:
1) Question formulation
2) Data fetch
3) Getting to know the raw data
4) Data wrangling
5) Data analysis and visualizations
6) Conclusions
7) Presentation
The ipynb files in the "The-code" folder correspond to these workflow steps as follows:
- Getting to know data 1 --> 3)
- Getting to know data 2 --> 3)
- Data wrangling 1 --> 4)
- Data wrangling 2 --> 4)
- Paper. Analysis + figures --> 5), 6)
Note: steps 1) and 2) have no corresponding ipynb files; they are explained in this README file. The presentation, step 7), can be found in the repo alongside this README file.