This repository has been archived by the owner on Dec 9, 2020. It is now read-only.

chrisroos/southeastern-daily-performance


Intro

Southeastern publish Daily Performance reports. They're not very usable as they are. This project contains a library, and some tools, to convert that data to splendid CSV.

If you're interested in analysing the data (rather than using this library to convert it) then check out the southeastern daily performance website.

Installation

$ gem install southeastern-daily-performance

Usage

# To convert a single html report to csv
$ sedpr-to-csv <location-of-html>

# To convert a directory containing multiple html reports to multiple csv reports
$ convert-all-html-data <location-of-html-reports> <location-of-csv-output>

Examples

Explicitly download html and convert local file

$ curl "http://www.southeasternrailway.co.uk/index.php/cms/pages/view/132" > sedpr.html
$ sedpr-to-csv sedpr.html

Implicitly download html and convert

$ sedpr-to-csv http://www.southeasternrailway.co.uk/index.php/cms/pages/view/132

Notes

Combining all csv files into one big file

$ echo "Date,Problem,Scheduled departure time,Scheduled departure station,Scheduled arrival station,Affect on service" > combined.csv
$ cat /path/to/csv/files/*.csv >> combined.csv
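
One thing to watch: the `*.csv` glob also matches `*.overview.csv` files, which have different columns. If the overview files sit in the same directory as the detail files (an assumption, the layout depends on how you ran the conversion), a loop that skips them keeps the combined file consistent. A minimal sketch follows; `combine_detail_csvs` is a hypothetical helper, not something shipped with the gem, and it assumes the output file is written outside the csv directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gem): combine detail csv files only.
# Assumes the output file lives OUTSIDE the csv directory, so it is never
# picked up by the glob on a later run.
combine_detail_csvs() {
  csv_dir=$1
  output=$2

  # Write the header row first, as in the manual example.
  echo "Date,Problem,Scheduled departure time,Scheduled departure station,Scheduled arrival station,Affect on service" > "$output"

  for file in "$csv_dir"/*.csv; do
    case "$file" in
      *.overview.csv) continue ;;  # overview files have different columns
    esac
    [ -f "$file" ] || continue     # the glob matched nothing
    cat "$file" >> "$output"
  done
}
```

Usage would look like `combine_detail_csvs /path/to/csv/files combined.csv`, run from a directory other than the one holding the csv files.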

Combining all csv overview files into one file

$ echo "Date,Services scheduled,Services run,Services within 5 minutes of schedule" > combined.overview.csv
$ cat /path/to/csv/files/*.overview.csv >> combined.overview.csv

TODO

  • 2010-04-21 breaks the parser...

  • I should now be in a position to use origin and destination stations rather than using the affect on service to parse the routes. This should remove these warnings: "Warning. Unknown, or missing, affect on service: '10:24 Dover Priory - Charing Cross'"

  • Don't generate empty csv files when the data doesn't exist (e.g. 1st-5th Dec 2010).

  • Don't generate csv overview files containing 0s when the data doesn't exist (e.g. 1st-5th Dec 2010). Instead, emit a warning if the necessary data can't be found.
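
Until the empty-file items above are addressed, the empty files can be spotted after the fact, outside the tool. The sketch below is only a workaround: `list_empty_csvs` is a hypothetical helper, not part of the gem, and it only finds zero-byte files (it does not detect overview files that contain 0s).

```shell
#!/bin/sh
# Hypothetical helper (not part of the gem): print every zero-byte csv file
# under the given directory so it can be reviewed or deleted.
list_empty_csvs() {
  find "$1" -type f -name '*.csv' -size 0
}
```

For example, `list_empty_csvs /path/to/csv/files` prints one path per empty file and prints nothing when every csv file has content.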

About

Southeastern produce a daily performance report in horrendous html (http://www.southeasternrailway.co.uk/index.php/cms/pages/view/132). This converts it to csv.
