In [1]:
%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (15, 5)






Let's go back to the bike dataset.

# 4.1 Adding a 'weekday' column to our dataframe

First, we need to load up the data again with `read_csv` and the right parameters.

In [4]:
df = pd.read_csv('../data/bikes.csv', encoding='latin1', sep=';', parse_dates=True, index_col='Date', dayfirst=True)
df

Date,Berri 1,Brébeuf (données non disponibles),Côte-Sainte-Catherine,Maisonneuve 1,Maisonneuve 2,du Parc,Pierre-Dupuy,Rachel1,St-Urbain (données non disponibles)
2012-01-01,35,,0,38,51,26,10,16,
2012-01-02,83,,1,68,153,53,6,43,
2012-01-03,135,,2,104,248,89,3,58,
2012-01-04,144,,1,116,318,111,8,61,
2012-01-05,197,,2,124,330,97,13,95,
2012-01-06,146,,0,98,244,86,4,75,
2012-01-07,98,,2,80,108,53,6,54,
2012-01-08,95,,1,62,98,64,11,63,
2012-01-09,244,,2,165,432,198,12,173,
2012-01-10,397,,3,238,563,275,18,241,
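
As a rough illustration of what those parameters do, here is a minimal sketch that parses a three-row excerpt from an in-memory string instead of the real file. The semicolon-separated, day-first layout of `raw` is an assumption about how `bikes.csv` is stored on disk; the counts are the first three Berri 1 values shown above.

```python
import io
import pandas as pd

# Assumed on-disk layout: ';' separators and dd/mm/yyyy dates (hypothetical excerpt)
raw = "Date;Berri 1\n01/01/2012;35\n02/01/2012;83\n03/01/2012;135\n"

sample = pd.read_csv(
    io.StringIO(raw),
    sep=';',            # columns are separated by semicolons, not commas
    index_col='Date',   # use the Date column as the index
    parse_dates=True,   # parse the index as dates
    dayfirst=True,      # read 02/01/2012 as 2 January, not 1 February
)
print(sample.index)
```

Without `dayfirst=True`, an ambiguous date such as 02/01/2012 would be read month-first, which is why that parameter matters for this kind of file.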


Next up, we're just going to look at the Berri bike path. Berri is a street in Montreal, with a pretty important bike path. I use it mostly on my way to the library now, but I used to take it to work sometimes when I worked in Old Montreal. 

So we're going to select just the Berri bike path column. Selecting a single column gives us a Series, and we take a copy of it so that later changes don't modify the original dataframe.

In [5]:
dfberri = df['Berri 1'].copy()
dfberri[0:3]

Date
2012-01-01     35
2012-01-02     83
2012-01-03    135
Name: Berri 1, dtype: int64
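
To see why the copy matters, here is a small sketch on a made-up three-row dataframe (the name `demo` is hypothetical; the values are the first three Berri 1 counts). Changing the copy leaves the original untouched.

```python
import pandas as pd

# Tiny stand-in for the real dataframe (hypothetical name `demo`)
dates = pd.to_datetime(['2012-01-01', '2012-01-02', '2012-01-03'])
demo = pd.DataFrame({'Berri 1': [35, 83, 135]}, index=dates)

berri_copy = demo['Berri 1'].copy()  # independent copy of the column
berri_copy.iloc[0] = -1              # modify the copy only

print(demo['Berri 1'].iloc[0])       # the original value is still there
print(berri_copy.iloc[0])
```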

Next, we need to add a 'weekday' column. Firstly, we can get the weekday from the index. We haven't talked about indexes yet, but the index is what's on the left of the dataframe above, under 'Date'. It's basically all the days of the year.

You can see that actually some of the days are missing -- only 310 days of the year are actually there. Who knows why.

Pandas has a bunch of really great time series functionality. Let's look at the index for the first few rows, and then ask it for the weekday number of each date:

In [6]:
#from datetime import datetime
#now = datetime.today()
#now.weekday()
dfberri[0:6].index


DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03', '2012-01-04',
               '2012-01-05', '2012-01-06'],
              dtype='datetime64[ns]', name='Date', freq=None)

In [7]:
dfberri[0:6].index.weekday

Int64Index([6, 0, 1, 2, 3, 4], dtype='int64', name='Date')

We actually want the weekday name, though:

In [8]:
j_semaine = dfberri.index.day_name()  # replaces weekday_name, which was removed in pandas 1.0
j_semaine

#df1[jour] = dfberri.index.weekday_name

Index(['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
       'Saturday', 'Sunday', 'Monday', 'Tuesday',
       ...
       'Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday',
       'Friday', 'Saturday', 'Sunday', 'Monday'],
      dtype='object', name='Date', length=310)

In the numbered output above, 0 is Monday and 6 is Sunday. I found out that 0 was Monday by checking on a calendar.
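
Instead of checking a calendar, the numbering can also be confirmed from pandas itself. The sketch below builds one week starting on the dataset's first date and pairs each weekday number with its name; it assumes a pandas version that provides `day_name()` (0.23 or later), and the names `week` and `mapping` are made up for the example.

```python
import pandas as pd

# One full week starting on 2012-01-01, the first date in the dataset
week = pd.date_range('2012-01-01', periods=7, freq='D')

# Pair each weekday number with its name
mapping = {int(n): name for n, name in zip(week.weekday, week.day_name())}
for number in sorted(mapping):
    print(number, mapping[number])
```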

Now that we know how to *get* the weekday, we can add it as a column in our dataframe like this:

In [9]:
df1 = pd.DataFrame(dfberri, columns=['jour semaine', 'Berri 1'])
df1['jour semaine'] = j_semaine
df1

Date,jour semaine,Berri 1
2012-01-01,Sunday,35
2012-01-02,Monday,83
2012-01-03,Tuesday,135
2012-01-04,Wednesday,144
2012-01-05,Thursday,197
2012-01-06,Friday,146
2012-01-07,Saturday,98
2012-01-08,Sunday,95
2012-01-09,Monday,244
2012-01-10,Tuesday,397
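
Another way to get the same kind of result, sketched here on three rows, is to turn the Series into a one-column dataframe with `to_frame()` and then assign the new column. This is only an alternative under the assumption that `day_name()` is available in your pandas version; the names `berri` and `frame` are made up for the example, and the column order differs from the table above.

```python
import pandas as pd

dates = pd.date_range('2012-01-01', periods=3, freq='D')
berri = pd.Series([35, 83, 135], index=dates, name='Berri 1')

frame = berri.to_frame()                        # one-column dataframe named 'Berri 1'
frame['jour semaine'] = frame.index.day_name()  # weekday name for each date in the index
print(frame)
```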


# 4.2 Adding up the cyclists by weekday

This turns out to be really easy!

Dataframes have a `.groupby()` method that is similar to SQL groupby, if you're familiar with that. I'm not going to explain more about it right now -- if you want to know more, [the documentation](http://pandas.pydata.org/pandas-docs/stable/groupby.html) is really good.

In this case, `df1.groupby(['jour semaine'], sort=False).sum()` means "Group the rows by weekday and then add up all the values with the same weekday". Passing `sort=False` keeps the groups in the order they first appear in the data instead of sorting them alphabetically.

In [28]:
p = df1.groupby(['jour semaine'], sort=False).sum()
p

jour semaine,Berri 1
Sunday,99310
Monday,134298
Tuesday,135305
Wednesday,152972
Thursday,160131
Friday,141771
Saturday,101578
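
To make the grouping easier to follow by hand, here is a sketch on just the first ten days shown earlier, where Sunday, Monday and Tuesday each occur twice. The names `small` and `totals` are made up for the example, and `day_name()` assumes pandas 0.23 or later.

```python
import pandas as pd

# First ten days of the Berri 1 counts, as shown in the table earlier
dates = pd.date_range('2012-01-01', periods=10, freq='D')
counts = [35, 83, 135, 144, 197, 146, 98, 95, 244, 397]
small = pd.DataFrame(
    {'jour semaine': list(dates.day_name()), 'Berri 1': counts},
    index=dates,
)

# Rows sharing a weekday are added together; sort=False keeps first-seen order
totals = small.groupby('jour semaine', sort=False)['Berri 1'].sum()
print(totals)
```

For example, the Sunday total is 35 + 95 = 130 and the Tuesday total is 135 + 397 = 532.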


Because we grouped by weekday name rather than by the numbers 0 to 6, the rows are already labelled and easy to read.

So it looks like Montrealers are commuter cyclists -- they bike much more during the week. Neat!
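
One possible way to graph those totals is a bar chart. The sketch below re-enters the seven totals from the table above by hand so that it runs on its own; the axis label and the `Agg` backend line are assumptions for the example, and inside a notebook with `%matplotlib inline` the backend line should be left out.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, only needed outside a notebook
import pandas as pd

# Totals per weekday, copied from the groupby output above
totals = pd.Series(
    [99310, 134298, 135305, 152972, 160131, 141771, 101578],
    index=['Sunday', 'Monday', 'Tuesday', 'Wednesday',
           'Thursday', 'Friday', 'Saturday'],
    name='Berri 1',
)

ax = totals.plot(kind='bar')   # one bar per weekday
ax.set_ylabel('cyclists')
print(totals.idxmax())         # weekday with the highest total
```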

# 4.3 Putting it together

Let's put all that together, to prove how easy it is. It only takes a few lines of pandas!

If you want to play around, try changing `sum` to `max`, `numpy.median`, or any other function you like. Below, we use `mean` and `min`.

In [35]:
m = df1.groupby(['jour semaine'], sort=False).mean()
m

jour semaine,Berri 1
Sunday,2206.888889
Monday,2984.4
Tuesday,3075.113636
Wednesday,3476.636364
Thursday,3639.340909
Friday,3222.068182
Saturday,2308.590909


In [36]:
mn = df1.groupby(['jour semaine'], sort=False).min()
mn

jour semaine,Berri 1
Sunday,35
Monday,83
Tuesday,135
Wednesday,138
Thursday,92
Friday,75
Saturday,32
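
If you would rather see several of these statistics side by side, one option is to pass a list of function names to `.agg()`. This is a sketch on the first ten days only, not the full year, so the numbers differ from the tables above; the names `berri`, `frame` and `summary` are made up for the example.

```python
import pandas as pd

# First ten days of the Berri 1 counts
dates = pd.date_range('2012-01-01', periods=10, freq='D')
berri = pd.Series([35, 83, 135, 144, 197, 146, 98, 95, 244, 397],
                  index=dates, name='Berri 1')

frame = berri.to_frame()
frame['jour semaine'] = frame.index.day_name()

# One row per weekday, one column per statistic
summary = (frame.groupby('jour semaine', sort=False)['Berri 1']
                .agg(['sum', 'mean', 'min', 'max']))
print(summary)
```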
