You need to group individual rows by time periods.
Use resample to group rows by chunks of time.

In [35]:
# Load libraries
import pandas as pd
import numpy as np

In [36]:
# Create date range
time_index = pd.date_range('06/06/2017', periods=100000, freq='30s')

In [37]:
# Create DataFrame
dataframe = pd.DataFrame(index=time_index)

In [38]:
dataframe.head()

2017-06-06 00:00:00
2017-06-06 00:00:30
2017-06-06 00:01:00
2017-06-06 00:01:30
2017-06-06 00:02:00


In [39]:
# Create column of random values
dataframe['Sale_Amount'] = np.random.randint(1, 10, 100000)

In [40]:
dataframe.head(2)

                     Sale_Amount
2017-06-06 00:00:00            5
2017-06-06 00:00:30            5


In [41]:
# Group rows by week, calculate sum per week
dataframe.resample('W').sum()

            Sale_Amount
2017-06-11        86908
2017-06-18       101083
2017-06-25       100093
2017-07-02       100440
2017-07-09       101119
2017-07-16        10371


Our standard Titanic dataset does not contain a datetime column, so for this
recipe we have generated a simple DataFrame where each row is an individual
sale. For each sale we know its date and time and its dollar amount. This data
isn't realistic, because every sale takes place exactly 30 seconds apart and is
a whole-dollar amount, but for the sake of simplicity let us pretend.

Notice that the date and time of each sale form the index of the DataFrame. This
is because resample, by default, groups on the index, which must hold
datetime-like values.
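
If the timestamps sit in an ordinary column rather than the index, one option is to pass the column name through resample's on parameter; another is to call set_index first. The following is a minimal sketch of both, assuming a hypothetical frame named sales with a Sale_Time column (neither name comes from the recipe above):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the datetimes are a regular column, not the index
sales = pd.DataFrame({
    'Sale_Time': pd.date_range('2017-06-06', periods=1000, freq='30s'),
    'Sale_Amount': np.random.randint(1, 10, 1000),
})

# Option 1: tell resample which column holds the datetimes
by_column = sales.resample('W', on='Sale_Time')['Sale_Amount'].sum()

# Option 2: move the datetimes into the index, then resample as usual
by_index = sales.set_index('Sale_Time')['Sale_Amount'].resample('W').sum()
```

Both options should produce the same weekly totals; which one reads better depends on whether the rest of the analysis wants a datetime index.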

Using resample we can group the rows by a wide array of time periods (offsets)
and then calculate some statistic on each time group:

In [42]:
# Group by two weeks, calculate mean
dataframe.resample('2W').mean()


            Sale_Amount
2017-06-11     5.029398
2017-06-25     4.989484
2017-07-09     4.998983
2017-07-23     4.986058

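Several statistics can also be computed for each time group in a single call by passing a list of function names to agg. This is a small sketch on a shortened version of the same kind of data (10,000 rows instead of 100,000, an assumption made only to keep the example quick); the variable name daily is illustrative:

```python
import numpy as np
import pandas as pd

# Smaller stand-in for the recipe's data: one sale every 30 seconds
time_index = pd.date_range('2017-06-06', periods=10000, freq='30s')
dataframe = pd.DataFrame({'Sale_Amount': np.random.randint(1, 10, 10000)},
                         index=time_index)

# Group by calendar day and compute sum, mean, and row count per day
daily = dataframe['Sale_Amount'].resample('D').agg(['sum', 'mean', 'count'])
```

The result has one row per day and one column per statistic, which is often easier to scan than three separate resample calls.
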

In [43]:
# Group by month, count rows (pandas 2.2 and later spell this alias 'ME')
dataframe.resample('M').count()

            Sale_Amount
2017-06-30        72000
2017-07-31        28000


You might notice that in the two-week and monthly outputs each row of the
datetime index is a single date, even though it stands for a whole period. This
is because, for weekly and monthly offsets, resample by default labels each time
group with its right "edge" (the last label). We can control this behavior using
the label parameter:

In [45]:
# Group by week, count rows, label each group by its left edge
dataframe.resample('W', label='left').count()

            Sale_Amount
2017-06-04        17280
2017-06-11        20160
2017-06-18        20160
2017-06-25        20160
2017-07-02        20160
2017-07-09         2080
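
To check that label changes only how each group is named and not which rows fall into it, the two labelings can be compared side by side. This is a sketch that rebuilds the recipe's DataFrame; the names right, left, and shift are illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the recipe's data: 100,000 sales, one every 30 seconds
time_index = pd.date_range('2017-06-06', periods=100000, freq='30s')
dataframe = pd.DataFrame({'Sale_Amount': np.random.randint(1, 10, 100000)},
                         index=time_index)

# Weekly counts with the default (right-edge) labels and with left-edge labels
right = dataframe.resample('W').count()
left = dataframe.resample('W', label='left').count()

# Difference between the two sets of labels, group by group
shift = right.index - left.index
```

The counts in right and left should match row for row, while shift should be seven days for every group: the same weeks, named by their starting boundary instead of their ending one.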
