Southeastern produce a daily performance report in horrendous HTML (http://www.southeasternrailway.co.uk/index.php/cms/pages/view/132). This project converts it to CSV.

Intro

Southeastern publish Daily Performance reports. They're not very usable as they are. This project contains a library, and some tools, to convert that data to splendid CSV.

If you're interested in analysing the data (rather than using this library to convert it), then check out the southeastern daily performance website.

Installation

$ gem install southeastern-daily-performance

Usage

# To convert a single html report to csv
$ sedpr-to-csv <location-of-html>

# To convert a directory containing multiple html reports to multiple csv reports
$ convert-all-html-data <location-of-html-reports> <location-of-csv-output>

Examples

Explicitly download html and convert local file

$ curl "http://www.southeasternrailway.co.uk/index.php/cms/pages/view/132" > sedpr.html
$ sedpr-to-csv sedpr.html

Implicitly download html and convert

$ sedpr-to-csv http://www.southeasternrailway.co.uk/index.php/cms/pages/view/132

Notes

Combining all csv files into one big file

$ echo "Date,Problem,Scheduled departure time,Scheduled departure station,Scheduled arrival station,Affect on service" > combined.csv
$ cat /path/to/csv/files/*.csv >> combined.csv

Combining all csv overview files into one file

$ echo "Date,Services scheduled,Services run,Services within 5 minutes of schedule" > combined.overview.csv
$ cat /path/to/csv/files/*.overview.csv >> combined.overview.csv
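One caveat with the shell commands above: if the overview files live in the same directory as the detail files, the `*.csv` glob also matches `*.overview.csv` (and `combined.csv` itself, if it is written there). The following is a minimal Ruby sketch of the same concatenation that keeps the two kinds of file apart. `combine_csv` is a hypothetical helper, not part of this gem, and it assumes the individual CSV files contain no header row.

```ruby
# Hypothetical helper, not part of the gem: concatenates header-less CSV
# files from one directory under a single header line.
# With overview: false only the detail files are used; with overview: true
# only the *.overview.csv files are used.
def combine_csv(dir, header, output, overview: false)
  output_path = File.expand_path(output)

  inputs = Dir.glob(File.join(dir, "*.csv")).sort.select do |path|
    # Never read the file we are about to write.
    next false if File.expand_path(path) == output_path
    path.end_with?(".overview.csv") == overview
  end

  File.open(output, "w") do |out|
    out.puts(header)
    inputs.each do |path|
      data = File.read(path)
      next if data.empty?
      out.write(data)
      # Make sure each file's last row ends with a newline.
      out.write("\n") unless data.end_with?("\n")
    end
  end

  inputs.size # number of files combined
end

# Example calls (paths are placeholders):
#   combine_csv("/path/to/csv/files",
#               "Date,Problem,Scheduled departure time,Scheduled departure station,Scheduled arrival station,Affect on service",
#               "combined.csv")
#   combine_csv("/path/to/csv/files",
#               "Date,Services scheduled,Services run,Services within 5 minutes of schedule",
#               "combined.overview.csv", overview: true)
```

Sorting the file list keeps the combined rows in filename order, which is date order only if the files are named by ISO date; that naming is an assumption here.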

TODO

  • 2010-04-21 breaks the parser...

  • I should now be in a position to use origin and destination stations rather than using the affect on service to parse the routes. This should remove these warnings: "Warning. Unknown, or missing, affect on service: '10:24 Dover Priory - Charing Cross'"

  • Don't generate empty csv files when the data doesn't exist (e.g. 1st-5th Dec 2010)

  • Don't generate csv overview files containing 0s when the data doesn't exist (e.g. 1st-5th Dec 2010) - i.e. emit a warning if the necessary data can't be found.
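
The last two items could be handled with a guard around the point where a file is written. The following is a rough sketch only: `write_csv_unless_empty` is a hypothetical name, and how the library actually collects its rows is not shown in this README, so the `rows` argument (an array of arrays) is an assumption.

```ruby
require "csv"

# Hypothetical helper, not part of the gem: writes rows to a CSV file only
# when there is data, and emits a warning on stderr otherwise.
# Returns true if a file was written, false if it was skipped.
def write_csv_unless_empty(path, rows)
  if rows.nil? || rows.empty?
    warn "Warning. No data found; not writing #{path}"
    return false
  end

  CSV.open(path, "w") do |csv|
    rows.each { |row| csv << row }
  end
  true
end
```

For the overview files, a similar check would need to distinguish "no data found" from a genuine count of zero, for example by leaving missing values as `nil` rather than defaulting them to 0 before the file is written.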
