I have always been a big fan of city bike sharing services. I use the 'Bicing' service in Barcelona and I think it's an immensely practical way of getting around a modern city. While the data set released by the 'Bicing' service is less than impressive, the DC bike share program has put out a rather clean data set that could be useful for visualizing a few interesting trends. This data set is fairly old now, but I doubt human behavior in urban settings has changed significantly in the last couple of years. Any conclusions we draw should still largely hold, at least for the DC area and, to a lesser extent, for other cities (think different weather patterns, mobility cultures, etc.).

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

# Any results you write to the current directory are saved as output.

A couple more imports to help us plot the data:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Now, let's import the data, use the 'datetime' column as the index, and tell pandas to parse that index as datetime objects.

In [None]:
data = pd.read_csv("../input/train.csv", index_col = 'datetime', parse_dates = True)
data.head()
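A quick sanity check on the parsed index can save some confusion later, since every plot in this notebook groups on parts of the timestamp. The helper below is only a sketch; `check_datetime_index` is a name made up for this example and is not part of pandas.

```python
import pandas as pd


def check_datetime_index(df):
    """Summarize whether df's index looks like a usable datetime index.

    Hypothetical helper for illustration only.
    """
    idx = df.index
    return {
        # True only if read_csv really parsed the column as datetimes
        "is_datetime": isinstance(idx, pd.DatetimeIndex),
        # True if the rows are in chronological order
        "is_sorted": bool(idx.is_monotonic_increasing),
        # True if any timestamp appears more than once
        "has_duplicates": bool(idx.has_duplicates),
        "start": idx.min(),
        "end": idx.max(),
    }
```

Calling `check_datetime_index(data)` on the frame loaded above should report `is_datetime` as `True` if `parse_dates = True` did its job.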

In [None]:
data.describe()

The describe method gives us a nice little summary of our data. We see, for instance, that the mean temperature in the DC area is about 20 °C (the max is 41!), that the biking service is used about 192 times per hour on average, and so on.

Now, let's see how bike usage changes across seasons. Presumably, the warmer months should have higher usage. We will plot the average temperature for each season alongside the corresponding bike usage. We can also split the user base into casual and registered users to see if there are any differences in behavior patterns. Perhaps registered users remain committed even in the colder seasons, as they are likely to be using the bikes for their daily commute?

In [None]:

by_season = data.groupby('season').mean()
by_season.index = ['spring', 'summer', 'fall', 'winter']
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 2)
by_season['count'].plot(kind = 'bar')
ax1.set_ylabel('mean_count')
ax1.set_title('Total users')
ax2 = fig.add_subplot(2, 2, 1)
by_season['temp'].plot(kind = 'bar')
ax2.set_ylabel('mean_temp')
ax2.set_title('Temperature')
ax3 = fig.add_subplot(2, 2, 3)
by_season['registered'].plot(kind = 'bar')
ax3.set_ylabel('mean_registered')
ax3.set_title('Registered users')
ax4 = fig.add_subplot(2, 2, 4)
by_season['casual'].plot(kind = 'bar')
ax4.set_ylabel('mean_casual')
ax4.set_title('Casual users')
plt.tight_layout()
plt.show()

The first thing we notice, of course, is that there are more users in fall and summer than in spring and winter. At first I was surprised that the winter months have more users than spring, but the temperature plot shows that, the way seasons are defined in this data set, the average temperature in spring is actually lower than in winter. We also see that registered users do remain more committed to their biking habits in the colder months, just as we expected. Since our hypothesis is that registered users are using the service for their commutes, we should be able to see this by plotting the hourly use.
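Before turning to the hourly view, the season definition itself can be inspected. The sketch below lists which calendar months fall under each season code, which should make it clear why 'spring' comes out colder than 'winter'. It assumes the `season` column used above and a datetime index; `months_per_season` is a hypothetical helper name.

```python
import pandas as pd


def months_per_season(df, season_col="season"):
    """Map each season code to the sorted calendar months (1-12) it covers.

    Hypothetical helper; assumes df has a DatetimeIndex.
    """
    # Month number of every row, with a plain positional index
    months = pd.Series(df.index.month, name="month")
    # Group the months by the season code found on the same row
    grouped = months.groupby(df[season_col].to_numpy()).unique()
    return {season: sorted(int(m) for m in vals) for season, vals in grouped.items()}
```

Running `months_per_season(data)` returns a dictionary keyed by the raw season codes (before they are relabelled for plotting).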

In [None]:

by_time = data.groupby(data.index.time).mean()
fig2 = plt.figure()
ax = fig2.add_subplot(1,1,1)
by_time['count'].plot(label = 'total_count')
by_time['registered'].plot(label = 'registered')
by_time['casual'].plot(label = 'casual')
ax.set_ylabel('hourly mean')
ax.legend(loc = 'upper left')
plt.tight_layout()
plt.show()

With registered users, we clearly see the double-peak structure corresponding to the commutes to and from work. There is also slightly higher usage on the way back than on the way in. I can completely relate to being in a hurry to make a meeting in the morning and ditching the bike for the metro. It's good to know there are others out there!
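To put numbers on the two peaks rather than reading them off the plot, something like the following could be used. It is a rough sketch that simply splits the day at noon, which is my own simplification, and `commute_peaks` is a hypothetical name.

```python
import pandas as pd


def commute_peaks(df, column="registered"):
    """Return the busiest hour before noon and the busiest hour from noon on.

    Each entry is an (hour, mean value of `column` in that hour) pair.
    Hypothetical helper; assumes df has a DatetimeIndex.
    """
    # Mean usage for each hour of the day (0-23)
    hourly = df[column].groupby(df.index.hour).mean()
    morning = hourly[hourly.index < 12]
    evening = hourly[hourly.index >= 12]
    return {
        "morning": (int(morning.idxmax()), float(morning.max())),
        "evening": (int(evening.idxmax()), float(evening.max())),
    }
```

Comparing `commute_peaks(data, 'registered')` with `commute_peaks(data, 'casual')` gives a compact view of how differently the two groups use the service over the day.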

Plotting the data by day of the week should show a drop-off in registered users at the weekend, as they are mostly using the service for their commute to work...

In [None]:

by_weekday = data.groupby(data.index.dayofweek).mean()
by_weekday.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
fig3 = plt.figure()
ax = fig3.add_subplot(1,1,1)
by_weekday['count'].plot(label = 'total_count')
by_weekday['registered'].plot(label = 'registered')
by_weekday['casual'].plot(label = 'casual')
ax.set_ylabel('hourly mean')
ax.legend(loc = 'best')
plt.tight_layout()
plt.show()

Voilà!
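The two views can also be combined: if registered users are commuters, the double peak should show up on weekdays and largely disappear on Saturdays and Sundays. Here is a minimal sketch, assuming a datetime index; `hourly_profile_by_daytype` is a hypothetical helper, and the weekday/weekend split ignores public holidays.

```python
import numpy as np
import pandas as pd


def hourly_profile_by_daytype(df, column="registered"):
    """Mean of `column` per hour of day, in separate weekday/weekend columns.

    Hypothetical helper; assumes df has a DatetimeIndex.
    """
    # dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend
    daytype = np.where(df.index.dayofweek >= 5, "weekend", "weekday")
    profile = df[column].groupby([df.index.hour, daytype]).mean().unstack()
    profile.index.name = "hour"
    return profile
```

`hourly_profile_by_daytype(data).plot()` draws both curves on one set of axes.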

One other classification that the data set provides is what I will call 'type of day'. This is a little weird, really: heavy rain for one hour presumably results in very different behavior from a day when it pours most of the time, and yet I assume that both days are classified the same way.
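One way to probe this assumption is to count how many distinct weather codes appear on each calendar day. If every day shows exactly one code, the classification is per day; if not, it varies within the day. This is a sketch only, assuming the `weather` column used below and a datetime index; `weather_codes_per_day` is a hypothetical name.

```python
import pandas as pd


def weather_codes_per_day(df, weather_col="weather"):
    """Count the distinct weather codes recorded on each calendar day.

    Hypothetical helper; assumes df has a DatetimeIndex.
    """
    # normalize() resets the time of day to midnight, leaving one key per date
    days = df.index.normalize()
    return df[weather_col].groupby(days).nunique()
```

`weather_codes_per_day(data).value_counts().sort_index()` then shows how many days have one, two, or more distinct codes.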

In [None]:
by_daytype = data.groupby('weather').mean()
by_daytype.index = ['Clear', 'Cloudy', 'Light snow / rain', 'Heavy snow / rain']
fig4 = plt.figure()
ax1 = fig4.add_subplot(1,3,1)
by_daytype['count'].plot(kind = 'bar')
ax1.set_title('Total users')
ax2 = fig4.add_subplot(1,3,2)
by_daytype['registered'].plot(kind = 'bar')
ax2.set_title('Registered users')
ax3 = fig4.add_subplot(1,3,3)
by_daytype['casual'].plot(kind = 'bar')
ax3.set_title('Casual users')
plt.tight_layout()
plt.show()

While casual users seem to respond to the day's weather conditions as one would expect, registered users seem rather resilient on heavy rain days. One explanation could be that the light snow/rain days are more likely to occur in the winter and spring months, when overall usage is lower, as we saw above. Perhaps the heavy rain days occur mostly in the fall/spring months (and only for a few hours a day), when users are more committed to their biking habits.
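This explanation can be checked by counting how often each weather code occurs in each season. The counts are also a useful reminder that a bar showing a mean may rest on very few rows. The sketch below uses `pd.crosstab` on the `season` and `weather` columns from the cells above; `weather_by_season` is a hypothetical wrapper name.

```python
import pandas as pd


def weather_by_season(df, season_col="season", weather_col="weather"):
    """Count rows for every (season, weather) combination.

    Rows of the result are season codes, columns are weather codes.
    Hypothetical wrapper around pd.crosstab.
    """
    return pd.crosstab(df[season_col], df[weather_col])
```

`weather_by_season(data)` gives the raw counts; dividing each row by its total, for example with `table.div(table.sum(axis=1), axis=0)`, turns them into within-season proportions.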