In [1]:
import pandas as pd
import numpy as np

In [2]:
rng = pd.date_range('1/1/2011', periods = 72, freq = 'H')
ts = pd.Series(list(range(len(rng))), index = rng)

In [3]:
ts.head()

2011-01-01 00:00:00    0
2011-01-01 01:00:00    1
2011-01-01 02:00:00    2
2011-01-01 03:00:00    3
2011-01-01 04:00:00    4
Freq: H, dtype: int64

In [4]:
converted = ts.asfreq('45Min', method = 'ffill')

### What does the above code do to the size and content of your series?

In [5]:
ts.shape

(72,)

In [6]:
converted.shape

(95,)
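The jump from 72 to 95 rows can be checked by hand: the hourly index spans 71 hours, and a 45-minute grid fits 94 whole steps into that span, plus the starting point. A minimal sketch of that arithmetic follows. The names `span`, `step`, and `expected_len` are illustrative, and the lowercase `'h'` and `'45min'` aliases are used on the assumption that newer pandas releases warn about `'H'`.

```python
import pandas as pd

# Rebuild the same hourly series as above
rng = pd.date_range('1/1/2011', periods=72, freq='h')
ts = pd.Series(range(len(rng)), index=rng)

span = rng[-1] - rng[0]            # 71 hours between first and last timestamp
step = pd.Timedelta(minutes=45)    # the new, finer spacing

# Whole 45-minute steps that fit in the span, plus the starting timestamp
expected_len = span // step + 1
print(expected_len)

converted = ts.asfreq('45min', method='ffill')
print(len(converted))
```

Both numbers should agree, because asfreq() only lays a regular grid between the first and last timestamps and then looks up (or fills) a value at each grid point.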

In [7]:
converted[1:10]

2011-01-01 00:45:00    0
2011-01-01 01:30:00    1
2011-01-01 02:15:00    2
2011-01-01 03:00:00    3
2011-01-01 03:45:00    3
2011-01-01 04:30:00    4
2011-01-01 05:15:00    5
2011-01-01 06:00:00    6
2011-01-01 06:45:00    6
Freq: 45T, dtype: int64

In [8]:
ts.asfreq('45Min')[1:10]

2011-01-01 00:45:00    NaN
2011-01-01 01:30:00    NaN
2011-01-01 02:15:00    NaN
2011-01-01 03:00:00    3.0
2011-01-01 03:45:00    NaN
2011-01-01 04:30:00    NaN
2011-01-01 05:15:00    NaN
2011-01-01 06:00:00    6.0
2011-01-01 06:45:00    NaN
Freq: 45T, dtype: float64

In [9]:
ts[1:10]

2011-01-01 01:00:00    1
2011-01-01 02:00:00    2
2011-01-01 03:00:00    3
2011-01-01 04:00:00    4
2011-01-01 05:00:00    5
2011-01-01 06:00:00    6
2011-01-01 07:00:00    7
2011-01-01 08:00:00    8
2011-01-01 09:00:00    9
Freq: H, dtype: int64

### Take a look at the specs for .asfreq(). What are your options for filling in missing data?
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.asfreq.html

In [10]:
bfill = ts.asfreq('45Min', method = 'bfill')

In [11]:
bfill[1:10]

2011-01-01 00:45:00    1
2011-01-01 01:30:00    2
2011-01-01 02:15:00    3
2011-01-01 03:00:00    3
2011-01-01 03:45:00    4
2011-01-01 04:30:00    5
2011-01-01 05:15:00    6
2011-01-01 06:00:00    6
2011-01-01 06:45:00    7
Freq: 45T, dtype: int64
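To put the filling options side by side, here is a small sketch on a four-point series. It assumes the `method` and `fill_value` arguments described in the linked asfreq() documentation; the variable names are only for illustration.

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=4, freq='h')
small = pd.Series(range(4), index=rng)

no_fill = small.asfreq('30min')                    # new timestamps become NaN
ffilled = small.asfreq('30min', method='ffill')    # carry the previous value forward
bfilled = small.asfreq('30min', method='bfill')    # pull the next value backward
constant = small.asfreq('30min', fill_value=-1)    # use a fixed placeholder instead

print(pd.DataFrame({'none': no_fill, 'ffill': ffilled,
                    'bfill': bfilled, 'fill_value': constant}))
```

Note that leaving the gaps as NaN changes the dtype to float64, as in the In [8] output above, while the filled variants stay integer.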

### How can you go to less frequent rather than more frequent?

In [12]:
converted = ts.asfreq('3H')

In [13]:
converted[1:10]

2011-01-01 03:00:00     3
2011-01-01 06:00:00     6
2011-01-01 09:00:00     9
2011-01-01 12:00:00    12
2011-01-01 15:00:00    15
2011-01-01 18:00:00    18
2011-01-01 21:00:00    21
2011-01-02 00:00:00    24
2011-01-02 03:00:00    27
Freq: 3H, dtype: int64

In [14]:
ts[1:10]

2011-01-01 01:00:00    1
2011-01-01 02:00:00    2
2011-01-01 03:00:00    3
2011-01-01 04:00:00    4
2011-01-01 05:00:00    5
2011-01-01 06:00:00    6
2011-01-01 07:00:00    7
2011-01-01 08:00:00    8
2011-01-01 09:00:00    9
Freq: H, dtype: int64
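When going to a lower frequency, asfreq() simply keeps the values that sit on the new grid and drops the rest. A short sketch of that, compared with resample(), which looks at every value in each 3-hour bin (the 12-point series and variable names are illustrative):

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=12, freq='h')
hourly = pd.Series(range(12), index=rng)

picked = hourly.asfreq('3h')                   # keeps 00:00, 03:00, 06:00, 09:00 only
first_in_bin = hourly.resample('3h').first()   # first value of each 3-hour bin
summed = hourly.resample('3h').sum()           # uses all three values in each bin

print(picked.tolist())
print(first_in_bin.tolist())
print(summed.tolist())
```

On a regular series like this one, asfreq() and resample().first() give the same numbers; the difference only shows up once you aggregate.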

### Resampling
.resample() is a time-based groupby, followed by a reduction method on each of its groups. It is very flexible and accepts many parameters that control the frequency conversion and the resampling operation. The reduction is chained onto the resampler as a method such as .mean() or .sum(); for anything else, .agg() or .apply() accept a function name or a NumPy function that takes an array and produces an aggregated value:
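To make the "time-based groupby" idea concrete, the sketch below computes the same 2-hour means once with resample() and once with groupby() and pd.Grouper. The six-point series and the variable names are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=6, freq='h')
hourly = pd.Series(range(6), index=rng)

# The same bins, expressed two ways
via_resample = hourly.resample('2h').mean()
via_groupby = hourly.groupby(pd.Grouper(freq='2h')).mean()

# Several reductions at once, and a custom one
summary = hourly.resample('2h').agg(['mean', 'sum'])
value_range = hourly.resample('2h').apply(lambda x: x.max() - x.min())

print(summary)
```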

In [15]:
# Let's try the more flexible .resample()
ts.resample('2H').mean()[1:10]

2011-01-01 02:00:00     2.5
2011-01-01 04:00:00     4.5
2011-01-01 06:00:00     6.5
2011-01-01 08:00:00     8.5
2011-01-01 10:00:00    10.5
2011-01-01 12:00:00    12.5
2011-01-01 14:00:00    14.5
2011-01-01 16:00:00    16.5
2011-01-01 18:00:00    18.5
Freq: 2H, dtype: float64

In [16]:
ts.resample('2H').sum()[1:10]

2011-01-01 02:00:00     5
2011-01-01 04:00:00     9
2011-01-01 06:00:00    13
2011-01-01 08:00:00    17
2011-01-01 10:00:00    21
2011-01-01 12:00:00    25
2011-01-01 14:00:00    29
2011-01-01 16:00:00    33
2011-01-01 18:00:00    37
Freq: 2H, dtype: int64

In [17]:
# What's particularly useful is that we can use resample to even out irregular time series
# Pick 10 random positions without replacement; .iloc makes the positional lookup explicit
irreg_ts = ts.iloc[np.random.choice(len(ts), size = 10, replace = False)]

In [18]:
irreg_ts

2011-01-03 12:00:00    60
2011-01-03 14:00:00    62
2011-01-02 03:00:00    27
2011-01-02 23:00:00    47
2011-01-01 19:00:00    19
2011-01-03 19:00:00    67
2011-01-01 17:00:00    17
2011-01-03 02:00:00    50
2011-01-01 20:00:00    20
2011-01-01 09:00:00     9
dtype: int64

In [19]:
irreg_ts.asfreq('D')

Series([], Freq: D, dtype: int64)

### Why didn't that work?
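The time series is not ordered! In the pandas version used here, .asfreq() appears to build its new index from the first timestamp to the last one, and in this shuffled series the first timestamp is later than the last, so the resulting range is empty.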
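A cheap guard is to check whether the index is sorted before changing frequency. The sketch below uses a hand-picked, deliberately shuffled subset (an assumption for reproducibility, rather than the random sample above).

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='h')
hourly = pd.Series(range(72), index=rng)

# Deliberately out of order: Jan 3, Jan 2, Jan 1, Jan 2
shuffled = hourly.iloc[[60, 27, 9, 47]]
print(shuffled.index.is_monotonic_increasing)

# Sort first, then convert
ordered = shuffled.sort_index()
print(ordered.index.is_monotonic_increasing)

daily = ordered.asfreq('D')
print(daily)
```

After sorting, the daily grid starts at the earliest timestamp (09:00 on January 1), so only grid points that coincide with an observation carry a value.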

In [20]:
irreg_ts = irreg_ts.sort_index()
irreg_ts

2011-01-01 09:00:00     9
2011-01-01 17:00:00    17
2011-01-01 19:00:00    19
2011-01-01 20:00:00    20
2011-01-02 03:00:00    27
2011-01-02 23:00:00    47
2011-01-03 02:00:00    50
2011-01-03 12:00:00    60
2011-01-03 14:00:00    62
2011-01-03 19:00:00    67
dtype: int64

In [21]:
irreg_ts.asfreq('D')

2011-01-01 09:00:00    9.0
2011-01-02 09:00:00    NaN
2011-01-03 09:00:00    NaN
Freq: D, dtype: float64

In [22]:
irreg_ts.resample('D').count()

2011-01-01    4
2011-01-02    2
2011-01-03    4
Freq: D, dtype: int64

In [23]:
irreg_ts.resample('D').mean()

2011-01-01    16.25
2011-01-02    37.00
2011-01-03    59.75
Freq: D, dtype: float64

In [24]:
irreg_ts.resample('D').var()

2011-01-01     24.916667
2011-01-02    200.000000
2011-01-03     50.916667
Freq: D, dtype: float64
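The three daily summaries above can also be collected in one table with .agg(). This sketch rebuilds the sorted irregular series from the timestamps printed in In [20] so that it is reproducible; `positions` and `daily_stats` are illustrative names.

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='h')
hourly = pd.Series(range(72), index=rng)

# The ten positions that appear in the sorted irregular series above
positions = [9, 17, 19, 20, 27, 47, 50, 60, 62, 67]
irreg = hourly.iloc[positions]

# One row per day, one column per reduction
daily_stats = irreg.resample('D').agg(['count', 'mean', 'var'])
print(daily_stats)
```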

# Try

(1) What if you want to go to a higher frequency, but you don't want to back fill or forward fill? Why might you want to do that?

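**Answer: We can set the method to None or leave it unspecified. Timestamps that have no value will then be NaN. This keeps the gaps visible instead of inventing data, so we can decide later how (or whether) to fill or interpolate them.**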

(2) What is the difference between .resample() and .asfreq()?

**Answer: asfreq() only selects the values that fall on the new timestamps (optionally filling the gaps), while resample() groups the values into time bins and lets you apply different aggregation functions to each bin.**

(3) How can I forward-fill only a few days? (hint: .fillna())

**Answer: Pass a limit to the fill, which caps how many consecutive periods are filled. See In [26] and In [27] below.**

(4) What are some helpful functions to use with a Resampler object?

**Answer: mean(), var(), count(), sum(), ffill() (pad() is an older alias of ffill() that recent pandas versions may no longer provide)**

In [25]:
irreg_ts

2011-01-01 09:00:00     9
2011-01-01 17:00:00    17
2011-01-01 19:00:00    19
2011-01-01 20:00:00    20
2011-01-02 03:00:00    27
2011-01-02 23:00:00    47
2011-01-03 02:00:00    50
2011-01-03 12:00:00    60
2011-01-03 14:00:00    62
2011-01-03 19:00:00    67
dtype: int64

In [26]:
irreg_ts.resample('6H').fillna(method='ffill', limit=1)

2011-01-01 06:00:00     NaN
2011-01-01 12:00:00     9.0
2011-01-01 18:00:00    17.0
2011-01-02 00:00:00    20.0
2011-01-02 06:00:00    27.0
2011-01-02 12:00:00     NaN
2011-01-02 18:00:00     NaN
2011-01-03 00:00:00    47.0
2011-01-03 06:00:00    50.0
2011-01-03 12:00:00    60.0
2011-01-03 18:00:00    62.0
Freq: 6H, dtype: float64

In [27]:
irreg_ts.resample('6H').fillna(method='ffill', limit=2)

2011-01-01 06:00:00     NaN
2011-01-01 12:00:00     9.0
2011-01-01 18:00:00    17.0
2011-01-02 00:00:00    20.0
2011-01-02 06:00:00    27.0
2011-01-02 12:00:00    27.0
2011-01-02 18:00:00     NaN
2011-01-03 00:00:00    47.0
2011-01-03 06:00:00    50.0
2011-01-03 12:00:00    60.0
2011-01-03 18:00:00    62.0
Freq: 6H, dtype: float64
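To answer question (3) literally, with days as the unit, here is a minimal sketch on a two-point series. It calls .ffill(limit=...) directly, on the assumption that newer pandas releases prefer this over Resampler.fillna(method=...); the series `sparse` is made up for illustration.

```python
import pandas as pd

# Two observations four days apart
sparse = pd.Series([1, 5], index=pd.to_datetime(['2011-01-01', '2011-01-05']))

# Upsample to daily and forward-fill at most two days after each observation
limited = sparse.resample('D').ffill(limit=2)

# The same idea with asfreq(): create the gaps first, then fill them
limited_asfreq = sparse.asfreq('D').ffill(limit=2)

print(limited)
```

January 2 and 3 take the value from January 1, while January 4 stays NaN because it is the third consecutive gap.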