This notebook assumes you have a folder called `challenge_data`, downloaded from `http://challenge.openbikes.co/`, in the same directory as the notebook.
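
Before running the cells, it can help to check that the data folder is where the notebook expects it. The sketch below is only an illustration: the helper name `missing_paths` is hypothetical, and the two paths it checks are inferred from the files this notebook reads (`test-blank.csv` and one CSV per station under `<city>/stations/`).

```python
from pathlib import Path

def missing_paths(root, city, station):
    """Return the expected files that are absent under `root`.

    Hypothetical helper: the layout is inferred from the paths read in
    this notebook, not from any official description of the dataset.
    """
    root = Path(root)
    expected = [
        root / 'test-blank.csv',
        root / city / 'stations' / '{}.csv'.format(station),
    ]
    return [str(path) for path in expected if not path.is_file()]

# An empty list means both files were found.
print(missing_paths('challenge_data', 'toulouse', '00003-pomme'))
```

If the list is not empty, the folder is probably missing or sits in a different directory than the notebook.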

In [1]:
import pandas as pd

## Utility functions

In [2]:
def get_station_updates(city, station):
    df = pd.read_csv('challenge_data/{}/stations/{}.csv'.format(city, station))
    df['moment'] = pd.to_datetime(df['moment'])
    return df

## Submission

In [3]:
# Compute the mean number of bikes at a station at each hour of the day
def mean_bikes_per_hour(city, station):
    df = get_station_updates(city, station)
    df['hour'] = df['moment'].dt.hour
    means = {
        hour: rows['bikes'].mean()
        for hour, rows in df.groupby('hour')
    }
    return means

In [12]:
print(mean_bikes_per_hour('toulouse', '00003-pomme'))

{0: 3.301532033426184, 1: 2.312913907284768, 2: 2.1444533120510774, 3: 1.563621533442088, 4: 0.9473276529821844, 5: 0.9241748438893844, 6: 1.2878048780487805, 7: 1.2994555353901995, 8: 2.949879518072289, 9: 7.8939597315436245, 10: 11.791435768261964, 11: 12.976853836262324, 12: 13.962654218072691, 13: 11.69225721784777, 14: 11.180851063829786, 15: 11.939585528626624, 16: 12.267051238547676, 17: 12.132993095166617, 18: 11.399108138238573, 19: 6.370210314030539, 20: 5.279234290470246, 21: 6.254257193188491, 22: 6.568677792041078, 23: 5.235756385068762}
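
To make the computation concrete, here is a minimal sketch of the same hour-of-day average on a tiny hand-made table. The timestamps and bike counts are invented for illustration; only the column names `moment` and `bikes` come from the challenge data.

```python
import pandas as pd

# Toy updates for one imaginary station (values invented for illustration).
updates = pd.DataFrame({
    'moment': pd.to_datetime([
        '2016-04-01 08:05:00',
        '2016-04-01 08:40:00',
        '2016-04-01 09:15:00',
        '2016-04-02 08:20:00',
        '2016-04-02 09:50:00',
    ]),
    'bikes': [4, 6, 10, 2, 12],
})

# Extract the hour of day, then average the bike counts within each hour.
updates['hour'] = updates['moment'].dt.hour
toy_means = updates.groupby('hour')['bikes'].mean().to_dict()

print(toy_means)
```

The three updates recorded during hour 8 (4, 6 and 2 bikes) average to 4.0, and the two recorded during hour 9 (10 and 12 bikes) average to 11.0, regardless of which day they were recorded on.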


Let's run this for every station we have to predict.

In [13]:
to_predict_df = pd.read_csv('challenge_data/test-blank.csv', index_col=0)
to_predict_df['moment'] = pd.to_datetime(to_predict_df.index)
to_predict_df['hour'] = to_predict_df['moment'].dt.hour

Build a nested dictionary that maps each city to its stations, and each station to its mean number of bikes per hour.

In [11]:
means = {
    city_name: {
        station_name: mean_bikes_per_hour(city_name, station_name)
        for station_name in rows['station'].unique()
    }
    for city_name, rows in to_predict_df.groupby('city')
}
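
The shape of this nested dictionary is easier to see on toy data. The sketch below rebuilds it with invented city and station names, and with a hypothetical stand-in, `fake_hourly_means`, in place of `mean_bikes_per_hour` so that it runs without the challenge files.

```python
import pandas as pd

# Toy stand-in for the rows to predict (names and values are invented).
toy_to_predict = pd.DataFrame({
    'city': ['toulouse', 'toulouse', 'lyon'],
    'station': ['station-a', 'station-b', 'station-c'],
    'hour': [10, 18, 10],
})

def fake_hourly_means(city, station):
    # Hypothetical replacement for mean_bikes_per_hour: returns the same
    # constant for every hour so the example needs no data files.
    return {hour: float(len(station)) for hour in range(24)}

toy_means = {
    city_name: {
        station_name: fake_hourly_means(city_name, station_name)
        for station_name in rows['station'].unique()
    }
    for city_name, rows in toy_to_predict.groupby('city')
}

# Three levels of keys: city, then station, then hour of day.
print(sorted(toy_means))
print(sorted(toy_means['toulouse']))
print(toy_means['toulouse']['station-a'][10])
```

Note that with the real `mean_bikes_per_hour`, an hour that never appears in a station's history has no key, so looking it up would raise a `KeyError`.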

In [16]:
to_predict_df['bikes'] = to_predict_df.apply(lambda r: means[r['city']][r['station']][r['hour']], axis=1).tolist()
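
The row-by-row `apply` is fine for a test set of this size. As a possible alternative, and not the method used in this notebook, the same lookup can be sketched as a merge: flatten the nested dictionary into a table and join it on city, station and hour. All names and values below are invented toy data.

```python
import pandas as pd

# Toy hourly means in the same nested layout as `means` (values invented).
toy_means = {
    'toulouse': {
        'station-a': {10: 9.5, 18: 7.0},
        'station-b': {10: 3.0, 18: 1.5},
    },
}

toy_to_predict = pd.DataFrame(
    {
        'city': ['toulouse', 'toulouse', 'toulouse'],
        'station': ['station-a', 'station-b', 'station-a'],
        'hour': [10, 18, 18],
    },
    index=pd.to_datetime([
        '2016-10-05 10:00:00',
        '2016-10-05 18:00:00',
        '2016-10-06 18:00:00',
    ]),
)

# Flatten the nested dictionary into one row per (city, station, hour).
lookup = pd.DataFrame(
    [
        (city, station, hour, bikes)
        for city, stations in toy_means.items()
        for station, hours in stations.items()
        for hour, bikes in hours.items()
    ],
    columns=['city', 'station', 'hour', 'bikes'],
)

# A left merge keeps every row to predict, in order; rows without a
# matching mean get NaN instead of raising an error.
merged = toy_to_predict.merge(lookup, on=['city', 'station', 'hour'], how='left')

# The merge resets the index, so assign the values back by position.
toy_to_predict['bikes'] = merged['bikes'].to_numpy()
print(toy_to_predict)
```

One difference in behaviour is worth keeping in mind: a missing (city, station, hour) combination produces `NaN` here, whereas the dictionary lookup raises a `KeyError`.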

In [17]:
del to_predict_df['moment']
del to_predict_df['hour']

In [18]:
to_predict_df

Unnamed: 0,city,station,bikes
2016-10-05 10:10:00,toulouse,00229-iut-rangueil,9.800427
2016-10-05 10:30:00,toulouse,00229-iut-rangueil,9.800427
2016-10-05 11:00:00,toulouse,00229-iut-rangueil,9.773834
2016-10-05 12:00:00,toulouse,00229-iut-rangueil,9.363546
2016-10-05 14:00:00,toulouse,00229-iut-rangueil,9.591378
2016-10-05 18:00:00,toulouse,00229-iut-rangueil,7.392437
2016-10-06 10:00:00,toulouse,00229-iut-rangueil,9.800427
2016-10-07 10:00:00,toulouse,00229-iut-rangueil,9.800427
2016-10-08 10:00:00,toulouse,00229-iut-rangueil,9.800427
2016-10-09 10:00:00,toulouse,00229-iut-rangueil,9.800427
