Citybike usage statistics #1

ilarischeinin opened this Issue Oct 20, 2016 · 0 comments


The HSL Developer Community page links to Citybike usage statistics (warning: the page is over 8 MB) and describes them as "experimental". I downloaded them and thought I'd share some things I discovered here.

The data is available as small JSON files with timestamps included in the filenames. Some of them are packaged into zip files (at the moment for May, June, and July), and the rest are listed directly on that page. The first available file is stats_20160510T095601Z:

{"total_rentals":18235,"total_distance":49107964,"total_duration":24587450}
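To work with the files, each filename timestamp has to be paired with the JSON body. Below is a minimal sketch in Python, assuming names of the form `stats_YYYYMMDDTHHMMSSZ` (an optional `.json` extension is tolerated, since I'm not sure whether every file has one) and that the trailing `Z` marks UTC. The function name `parse_stats_file` is hypothetical.

```python
import json
from datetime import datetime, timezone

def parse_stats_file(filename: str, text: str) -> dict:
    """Turn one stats_<timestamp> file into a flat record.

    Assumes names like 'stats_20160510T095601Z' (optionally ending in
    '.json') and treats the trailing 'Z' as UTC.
    """
    stem = filename.rsplit("/", 1)[-1]  # drop any directory part
    if stem.endswith(".json"):
        stem = stem[: -len(".json")]
    stamp = stem.removeprefix("stats_")
    ts = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    data = json.loads(text)
    return {
        "timestamp": ts,
        "total_rentals": data["total_rentals"],
        "total_distance": data["total_distance"],
        "total_duration": data["total_duration"],
    }

record = parse_stats_file(
    "stats_20160510T095601Z",
    '{"total_rentals":18235,"total_distance":49107964,"total_duration":24587450}',
)
print(record["timestamp"].isoformat(), record["total_rentals"])
# prints 2016-05-10T09:56:01+00:00 18235
```

Making the timestamps timezone-aware up front avoids confusion later, when comparing them with Finnish local time.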

The timestamp in the file name represents 2016-05-10 at 09:56:01 UTC (the trailing Z), and if I recall correctly, the system opened on 2 May, so some data is missing from the very beginning. As shown above, all files include three statistics: total_rentals, total_distance, and total_duration. The variable names suggest they are cumulative totals as of the moment given by the timestamp in each file name. However, this is what they look like plotted over time:

[Figure 1: total_rentals, total_distance, and total_duration over time, full period]

Some apparent issues:

  1. There are negative values for total_duration in May.
  2. There certainly weren't over 1 billion total_rentals in May.
  3. There are clear discontinuities (and the values sometimes go down), which one wouldn't expect from cumulative totals.
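The first two issues are range problems that can be flagged automatically. Below is a minimal sketch, assuming records are dicts like the ones above; the function `range_problems` and its `max_rentals` bound are hypothetical, with 1 billion chosen only to mirror the figure in point 2, not as any official limit.

```python
from datetime import datetime, timezone

FIELDS = ("total_rentals", "total_distance", "total_duration")

def range_problems(records, max_rentals=1_000_000_000):
    """Flag values that no cumulative total should take.

    Returns (timestamp, field, problem) tuples. max_rentals is an
    illustrative sanity bound, not a documented limit.
    """
    flagged = []
    for rec in records:
        for field in FIELDS:
            if rec[field] < 0:
                flagged.append((rec["timestamp"], field, "negative"))
        if rec["total_rentals"] > max_rentals:
            flagged.append((rec["timestamp"], "total_rentals", "implausibly large"))
    return flagged

# Made-up values for illustration only, not taken from the HSL files.
sample = [
    {"timestamp": datetime(2016, 5, 20, tzinfo=timezone.utc),
     "total_rentals": 1_500_000_000, "total_distance": 10, "total_duration": -3},
]
for problem in range_problems(sample):
    print(problem)
```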

Leaving out the month of May, the data looks like this:

[Figure 2: total_rentals, total_distance, and total_duration over time, excluding May]

And since total_rentals isn't really visible on that scale, here's that one separately:

[Figure 3: total_rentals over time, excluding May]

The data clearly does not increase monotonically, as one would expect of cumulative totals.
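A cumulative total should never go down between consecutive samples, so the decreases can be listed by comparing each value with the previous one. This is a sketch assuming time-sorted records; `find_drops` and `is_monotonic` are hypothetical helper names, and the sample numbers are made up.

```python
from datetime import datetime

def find_drops(records, field):
    """List every point where a supposedly cumulative `field` goes down.

    records must be sorted by timestamp. Returns tuples of
    (timestamp, previous value, current value, change).
    """
    drops = []
    for prev, cur in zip(records, records[1:]):
        change = cur[field] - prev[field]
        if change < 0:
            drops.append((cur["timestamp"], prev[field], cur[field], change))
    return drops

def is_monotonic(records, field):
    """True if `field` never decreases from one sample to the next."""
    return not find_drops(records, field)

# Made-up values for illustration only.
toy = [
    {"timestamp": datetime(2016, 5, 10, 20, 0), "total_rentals": 100},
    {"timestamp": datetime(2016, 5, 10, 21, 0), "total_rentals": 40},
    {"timestamp": datetime(2016, 5, 10, 22, 0), "total_rentals": 45},
]
print(find_drops(toy, "total_rentals"))
```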

To see in more detail how those decreases happen, here is a look at the first few days, from 10 May until 14 May:

[Figure 4: total_rentals, total_distance, and total_duration from 10 May until 14 May]

The transition between 11 and 12 May looks like one would expect: most rentals happen during the day, with a slowdown at night. All the other transitions, however, look odd: the numbers drop and nearly erase the day's increase in the total. These drops seem to happen at exactly 21:00:00 UTC, which corresponds to midnight in Finland during daylight saving time (UTC+3).
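The 21:00 UTC to midnight correspondence can be checked directly. The sketch below uses a fixed UTC+3 offset for Finnish summer time (EEST), which covers the whole downloaded range because summer time ended on 30 October in 2016; for data crossing a DST switch, `zoneinfo.ZoneInfo("Europe/Helsinki")` would be the safer choice. The helper `is_local_midnight` and the example timestamp are illustrative, not values read from the files.

```python
from datetime import datetime, timedelta, timezone

# Finnish summer time (EEST) is UTC+3. A fixed offset is only valid between
# the DST switches; use zoneinfo.ZoneInfo("Europe/Helsinki") otherwise.
EEST = timezone(timedelta(hours=3), "EEST")

def is_local_midnight(ts_utc):
    """True if an aware timestamp falls exactly on 00:00:00 Finnish summer time."""
    local = ts_utc.astimezone(EEST)
    return (local.hour, local.minute, local.second) == (0, 0, 0)

example = datetime(2016, 5, 12, 21, 0, 0, tzinfo=timezone.utc)
print(example.astimezone(EEST).isoformat())  # 2016-05-13T00:00:00+03:00
print(is_local_midnight(example))            # True
```

Applied to the timestamps of the detected drops, a check like `all(is_local_midnight(ts) for ts in drop_times)` would show whether every reset lines up with local midnight.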

I'm not sure what to make of this, but thought I'd share it. The kind folks at HSLdevcom created this repository so that I could post the issue here.

If anyone is interested, the data I downloaded (from 2016-05-10 09:56:01 until 2016-10-18 17:38:01) is available here: citybike-stats.csv.gz
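For anyone picking up citybike-stats.csv.gz, here is a minimal loading sketch using only the standard library. It assumes a header row with columns named `timestamp`, `total_rentals`, `total_distance`, and `total_duration`, timestamps written like `2016-05-10 09:56:01`, and integer values; those column names and formats are guesses, so adjust them to match the actual file. `parse_rows` and `load_stats_csv` are hypothetical names.

```python
import csv
import gzip
from datetime import datetime

# Assumed column names; the real header of citybike-stats.csv.gz may differ.
COLUMNS = ("total_rentals", "total_distance", "total_duration")

def parse_rows(lines):
    """Parse CSV lines (header first) into dicts sorted by timestamp."""
    rows = []
    for row in csv.DictReader(lines):
        rec = {"timestamp": datetime.fromisoformat(row["timestamp"])}
        for col in COLUMNS:
            rec[col] = int(row[col])  # assumes integer values in the file
        rows.append(rec)
    rows.sort(key=lambda r: r["timestamp"])
    return rows

def load_stats_csv(path="citybike-stats.csv.gz"):
    """Decompress the gzipped CSV on the fly and parse it."""
    with gzip.open(path, "rt", newline="") as fh:
        return parse_rows(fh)

# rows = load_stats_csv()  # requires the downloaded file
```

Keeping the parsing separate from the file handling makes it easy to try on a few pasted lines before running it on the full download.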
