# Edinburgh bike counter

Arjan Geers

Analysis of [Edinburgh bike counter data](https://data.edinburghopendata.info/dataset/bike-counter-data-set-cluster).

## Preamble

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import calendar
import math
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from bikecounter.data import get_edinburgh_bike_counter_data

In [None]:
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

%matplotlib inline
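# The 'seaborn' style was renamed to 'seaborn-v0_8' in matplotlib 3.6
style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
plt.style.use(style)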

## Get data

Download and read the data for all bike counters. Each counter has one or more 'channels', which correspond to directions such as northbound and southbound. For now, we'll consider only the total count across all channels.

In [None]:
df = get_edinburgh_bike_counter_data()
df.tail()

In [None]:
# Define variables for later convenience
daily = df.resample('D').sum()
counters = list(df.columns)
n_counters = len(counters)
n_cols = 5  # for grid plots
n_rows = math.ceil(n_counters / n_cols)  # for grid plots

## Bike counter operating days

When were the bike counters turned on and counting bikes?

In [None]:
def get_datetime_ranges(datetimes, bin_size='D'):
    """Given a sorted sequence of datetimes, return a list
    of datetime ranges [(start, end), (...)] that
    correspond to periods without gaps at the bin_size
    resolution.

    """
    if len(datetimes) == 0:
        return []  # no data at all, so there are no ranges
    datetime_ranges = []
    start = datetimes[0]
    for i in range(1, len(datetimes)):
        # A gap is a jump of more than one bin between neighbours
        if (datetimes[i].to_period(bin_size) -
            datetimes[i - 1].to_period(bin_size)).n > 1:
            end = datetimes[i - 1]
            datetime_ranges.append((start, end))
            start = datetimes[i]
    end = datetimes[-1]
    datetime_ranges.append((start, end))
    return datetime_ranges
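
The same gap detection can also be sketched without an explicit loop. The variant below is an illustration, not part of the original analysis: the name `get_datetime_ranges_vectorized` and the `max_gap` parameter are hypothetical, and it assumes the timestamps are normalized to midnight (as they are after `resample('D')`), so that a step of more than one day is equivalent to skipping at least one daily bin.

```python
import pandas as pd


def get_datetime_ranges_vectorized(datetimes, max_gap=pd.Timedelta(days=1)):
    """Hypothetical vectorized variant: split a sorted DatetimeIndex
    into (start, end) ranges wherever consecutive entries are more
    than max_gap apart.

    """
    if len(datetimes) == 0:
        return []
    s = pd.Series(datetimes)
    # A new range starts wherever the step from the previous entry exceeds max_gap
    starts_new_range = s.diff() > max_gap
    # The cumulative sum gives every entry the id of the range it belongs to
    range_id = starts_new_range.cumsum()
    grouped = s.groupby(range_id)
    return list(zip(grouped.min(), grouped.max()))


# Small made-up example with gaps after 3 January and after 8 January
days = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03',
                       '2015-01-07', '2015-01-08', '2015-01-12'])
ranges = get_datetime_ranges_vectorized(days)
for start, end in ranges:
    print(start.date(), end.date())
```

On the made-up dates above this yields three ranges: 1 to 3 January, 7 to 8 January, and the single day 12 January.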

In [None]:
fig, ax = plt.subplots(figsize=(8, 12))
for y, counter in enumerate(counters):
    operating_days = daily[counter].loc[daily[counter] > 0].index
    operating_day_ranges = get_datetime_ranges(operating_days, bin_size='D')
    for operating_day_range in operating_day_ranges:
        ax.hlines(y, operating_day_range[0], operating_day_range[1])

ax.set_title('Bike counter operating days')
ax.set_yticks(range(n_counters))
ax.set_yticklabels(counters);

The plot shows two distinct sets of bike counters: in early 2015, the first set stopped operating and the second set started. Only two bike counters span the entire time period. The records also contain many data gaps, both long and short.

Are there any days of the week on which the bike counters were operating less (or more) often?

In [None]:
fig, ax = plt.subplots(n_rows, n_cols, figsize=(16, 16))
for i, counter in enumerate(counters):
    current_ax = ax[i // n_cols][i % n_cols]
    operating_days = daily[counter].loc[daily[counter] > 0].index
    # Reindex so that all seven days are present, even a day without any data
    operating_dayofweek_counts = (operating_days.dayofweek.value_counts()
                                  .reindex(range(7), fill_value=0))
    operating_dayofweek_counts.plot(ax=current_ax,
                                    kind='bar',
                                    title=counter,
                                    ylim=(0, 350))
    current_ax.set_xticklabels(list(calendar.day_abbr), rotation=0)
    
fig.suptitle('Distribution of operating days over the days of the week', y=1.02)
plt.tight_layout()

All days appear to be equally represented in the dataset, so we can just average over all available data to look at hourly and daily trends.
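
The visual impression of balance can be backed up with a number. The sketch below uses synthetic dates rather than the real counter data, and the `relative_spread` measure is an assumption chosen for illustration: it divides the difference between the best and worst represented day of the week by the mean count, so values near zero indicate a balanced sample.

```python
import pandas as pd

# Synthetic stand-in for `operating_days`: 52 full weeks starting on a Monday
operating_days_demo = pd.to_datetime('2018-01-01') + pd.to_timedelta(range(52 * 7), unit='D')

# Number of operating days per day of the week (0 = Monday, ..., 6 = Sunday)
dayofweek_counts = (pd.Series(operating_days_demo.dayofweek)
                    .value_counts()
                    .reindex(range(7), fill_value=0))

# (max - min) / mean: close to 0 when all days are equally represented
relative_spread = ((dayofweek_counts.max() - dayofweek_counts.min())
                   / dayofweek_counts.mean())
print(relative_spread)
```

For a real counter, `operating_days_demo` would be replaced by the `operating_days` index computed in the cell above.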

## Hourly and daily trends

Let's look at the hourly trend first.

In [None]:
hourly_trend = df.pivot_table(index=df.index.hour, aggfunc='mean')

fig, ax = plt.subplots(n_rows, n_cols, figsize=(16, 16))
for i, counter in enumerate(counters):
    current_ax = ax[i // n_cols][i % n_cols]
    hourly_trend[counter].plot(ax=current_ax,
                               title=counter,
                               ylim=(0, 125))
    
fig.suptitle('Hourly trend of bike activity', y=1.02)
plt.tight_layout()

Because the y-axis limits are the same in every panel, it is easy to see that some roads are much busier than others. Nearly all counters show a bimodal traffic pattern with one peak in the morning and one in the afternoon, corresponding to the rush hours.
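
The peak hours can also be read off numerically instead of from the plots. The following is a minimal sketch on synthetic data: the counter name `demo_counter` and the count profile are made up, and splitting the day at noon to separate the two peaks is an assumption that suits a commuter pattern but may not fit every counter.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts for one hypothetical counter over four weeks,
# built with peaks at 08:00 and 17:00
index = pd.Timestamp('2018-01-01') + pd.to_timedelta(np.arange(28 * 24), unit='h')
hours = index.hour.to_numpy()
counts = (10
          + 60 * np.exp(-(hours - 8) ** 2 / 2)
          + 50 * np.exp(-(hours - 17) ** 2 / 2))
df_demo = pd.DataFrame({'demo_counter': counts}, index=index)

# Mean count per hour of the day (same idea as the pivot table above)
hourly_trend_demo = df_demo.groupby(df_demo.index.hour).mean()

# Busiest hour before noon and from noon onwards (.loc slices by hour label)
morning_peak = hourly_trend_demo.loc[:11, 'demo_counter'].idxmax()
afternoon_peak = hourly_trend_demo.loc[12:, 'demo_counter'].idxmax()
print(morning_peak, afternoon_peak)
```

Applied to the real `hourly_trend`, the same two `idxmax` calls could be run per counter to compare rush-hour timing across locations.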

How about the daily trend?

In [None]:
daily_trend = df.pivot_table(index=df.index.dayofweek, aggfunc='mean')

fig, ax = plt.subplots(n_rows, n_cols, figsize=(16, 16))
for i, counter in enumerate(counters):
    current_ax = ax[i // n_cols][i % n_cols]
    daily_trend[counter].plot(ax=current_ax,
                              kind='bar',
                              title=counter)
    current_ax.set_xticklabels(list(calendar.day_abbr), rotation=0)
    
fig.suptitle('Daily trend of bike activity', y=1.02)
plt.tight_layout()

Note that we use flexible y-axis limits to highlight the relative variation throughout the week. For most counters, bike activity is much higher on weekdays than at the weekend. Silverknowes, Cramond, and Dalmeny, to the northwest of Edinburgh, are more popular with cyclists at the weekend.
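
One way to pick out the weekend-oriented counters without inspecting every panel is to compare the mean weekend count with the mean weekday count. The sketch below uses made-up numbers for two hypothetical counters (`commuter_route` and `leisure_route`); the ratio threshold of 1 is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical mean counts per day of the week (0 = Monday, ..., 6 = Sunday)
daily_trend_demo = pd.DataFrame({
    'commuter_route': [100, 110, 105, 108, 95, 40, 35],
    'leisure_route': [20, 22, 21, 19, 25, 60, 70],
})

# .loc slices by label and includes both endpoints
weekday_mean = daily_trend_demo.loc[0:4].mean()
weekend_mean = daily_trend_demo.loc[5:6].mean()

# A ratio above 1 means the counter is busier at the weekend
weekend_ratio = weekend_mean / weekday_mean
weekend_counters = weekend_ratio[weekend_ratio > 1].index.tolist()
print(weekend_counters)
```

With the real `daily_trend` in place of `daily_trend_demo`, the counters with a ratio above 1 should correspond to those described above as more popular at the weekend.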