Southeastern publish Daily Performance reports. They're not very usable as they are. This project contains a library, and some tools, to convert that data to splendid CSV.
If you're interested in analysing the data (rather than using this library to convert it), then check out the southeastern daily performance website.
$ gem install southeastern-daily-performance
# To convert a single html report to csv
$ sedpr-to-csv <location-of-html>
# To convert a directory containing multiple html reports to multiple csv reports
$ convert-all-html-data <location-of-html-reports> <location-of-csv-output>
$ curl "http://www.southeasternrailway.co.uk/index.php/cms/pages/view/132" > sedpr.html
$ sedpr-to-csv sedpr.html
$ sedpr-to-csv http://www.southeasternrailway.co.uk/index.php/cms/pages/view/132
$ echo "Date,Problem,Scheduled departure time,Scheduled departure station,Scheduled arrival station,Affect on service" > combined.csv
$ cat /path/to/csv/files/*.csv >> combined.csv
$ echo "Date,Services scheduled,Services run,Services within 5 minutes of schedule" > combined.overview.csv
$ cat /path/to/csv/files/*.overview.csv >> combined.overview.csv
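One thing to watch with the commands above: if the overview files sit in the same directory as the other CSV files, the plain `*.csv` glob matches the `*.overview.csv` files too, so their rows end up in combined.csv as well. A minimal sketch of a combine step that keeps the two apart, assuming that single-directory layout and assuming the converted files carry no header row of their own. The function name `combine_reports` is hypothetical and not part of the gem.

```shell
# Hypothetical helper, not part of the gem: combine the per-day CSV files
# from one directory without mixing overview rows into combined.csv.
combine_reports() {
  src_dir=$1   # directory holding the converted CSV files
  out_dir=$2   # directory to write the two combined files to

  # Header rows, copied from the commands above.
  echo "Date,Problem,Scheduled departure time,Scheduled departure station,Scheduled arrival station,Affect on service" > "$out_dir/combined.csv"
  echo "Date,Services scheduled,Services run,Services within 5 minutes of schedule" > "$out_dir/combined.overview.csv"

  for f in "$src_dir"/*.csv; do
    [ -e "$f" ] || continue   # the glob matched nothing
    case "$f" in
      # Skip earlier output in case src_dir and out_dir are the same directory.
      */combined.csv|*/combined.overview.csv) ;;
      *.overview.csv) cat "$f" >> "$out_dir/combined.overview.csv" ;;
      *)              cat "$f" >> "$out_dir/combined.csv" ;;
    esac
  done
}
```

It could then be called as `combine_reports /path/to/csv/files .` to write both combined files into the current directory.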
- 2010-04-21 breaks the parser...
- I should now be in a position to use origin and destination stations, rather than the affect on service, to parse the routes. This should remove these warnings: "Warning. Unknown, or missing, affect on service: '10:24 Dover Priory - Charing Cross'"
- Don't generate empty CSV files when the data doesn't exist (e.g. 1st-5th Dec 2010).
- Don't generate CSV overview files containing 0s when the data doesn't exist (e.g. 1st-5th Dec 2010), i.e. emit a warning if the necessary data can't be found.
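Until the empty-file item is fixed in the converter itself, a clean-up pass over the output directory is one possible workaround. This is only a sketch: it assumes "empty" means a zero-byte file, and the function name `remove_empty_csvs` and the warning text are hypothetical, not part of the gem. It does nothing about overview files that contain 0s.

```shell
# Hypothetical workaround, not part of the gem: delete zero-byte CSV files
# from an output directory, warning about each one on stderr.
remove_empty_csvs() {
  dir=$1   # directory holding the converted CSV files
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if [ ! -s "$f" ]; then    # -s is true only for files larger than zero bytes
      echo "Warning. No data found, removing empty file: $f" >&2
      rm -- "$f"
    fi
  done
}
```

For example, `remove_empty_csvs /path/to/csv/files` could be run after convert-all-html-data and before combining the files.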