#### Pandas Tutorial - Part 31

This notebook covers advanced time series functionality in pandas, focusing on:
- DatetimeIndex partial string indexing
- Slicing with date strings in DataFrames
- Slicing with date strings in MultiIndex DataFrames
- Slicing with UTC offset
- Time resampling with various parameters

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

##### DatetimeIndex Partial String Indexing

pandas lets you select from a time series with partial date strings: a string such as '2013-1' matches every timestamp in that month, so you can pick out date ranges without building Timestamp objects by hand.

In [2]:
# Create a DataFrame with DatetimeIndex
rng = pd.date_range('2013-01-01', periods=84960, freq='1min')
dft = pd.DataFrame({'A': np.random.randn(len(rng))}, index=rng)
dft.head()

Unnamed: 0,A
2013-01-01 00:00:00,0.077939
2013-01-01 00:01:00,-0.849062
2013-01-01 00:02:00,0.473848
2013-01-01 00:03:00,0.36029
2013-01-01 00:04:00,1.164333


### Slicing by Date Range

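You can slice with partial strings to select a range of dates. Both endpoints are included, and a partial stop string such as '2013-2-28' extends to the last timestamp of that day.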

In [3]:
# This specifies a stop time that includes all of the times on the last day
dft['2013-1':'2013-2-28']

Unnamed: 0,A
2013-01-01 00:00:00,0.077939
2013-01-01 00:01:00,-0.849062
2013-01-01 00:02:00,0.473848
2013-01-01 00:03:00,0.360290
2013-01-01 00:04:00,1.164333
...,...
2013-02-28 23:55:00,0.368791
2013-02-28 23:56:00,-0.704480
2013-02-28 23:57:00,-0.187321
2013-02-28 23:58:00,-1.404094


In [4]:
# This specifies an exact stop time (and is not the same as the above)
dft['2013-1':'2013-2-28 00:00:00']

Unnamed: 0,A
2013-01-01 00:00:00,0.077939
2013-01-01 00:01:00,-0.849062
2013-01-01 00:02:00,0.473848
2013-01-01 00:03:00,0.360290
2013-01-01 00:04:00,1.164333
...,...
2013-02-27 23:56:00,2.149572
2013-02-27 23:57:00,-0.451749
2013-02-27 23:58:00,-1.497939
2013-02-27 23:59:00,1.037084
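
The difference between the two stops above can be checked by counting rows. The following is a minimal sketch on a fresh frame; `demo` is a hypothetical name, and its column holds a simple counter rather than the random data used in this notebook. It uses `.loc`, which accepts the same partial strings.

```python
import numpy as np
import pandas as pd

# Same minute-level index as above, with a counter column for determinism
rng = pd.date_range("2013-01-01", periods=84960, freq="1min")
demo = pd.DataFrame({"A": np.arange(len(rng))}, index=rng)

# Day-resolution stop: every minute of 2013-02-28 is included
whole_day = demo.loc["2013-1":"2013-2-28"]

# Exact stop: the slice ends at midnight on 2013-02-28
exact = demo.loc["2013-1":"2013-2-28 00:00:00"]

print(len(whole_day), whole_day.index[-1])
print(len(exact), exact.index[-1])
```

The second slice is shorter by 1439 rows, which are the remaining minutes of 28 February.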


In [5]:
# We are stopping on the included end-point as it is part of the index
dft['2013-1-15':'2013-1-15 12:30:00']

Unnamed: 0,A
2013-01-15 00:00:00,-0.768540
2013-01-15 00:01:00,0.286499
2013-01-15 00:02:00,-1.342423
2013-01-15 00:03:00,-0.217816
2013-01-15 00:04:00,0.545183
...,...
2013-01-15 12:26:00,-0.859744
2013-01-15 12:27:00,0.180850
2013-01-15 12:28:00,0.269518
2013-01-15 12:29:00,-0.445406


##### DatetimeIndex with MultiIndex

DatetimeIndex partial string indexing also works on a DataFrame with a MultiIndex.

In [6]:
# Create a DataFrame with a MultiIndex
dft2 = pd.DataFrame(np.random.randn(20, 1),
                    columns=['A'],
                    index=pd.MultiIndex.from_product(
                        [pd.date_range('20130101', periods=10, freq='12h'),
                         ['a', 'b']]))
dft2


Unnamed: 0,Unnamed: 1,A
2013-01-01 00:00:00,a,-0.072053
2013-01-01 00:00:00,b,1.708024
2013-01-01 12:00:00,a,-0.026038
2013-01-01 12:00:00,b,0.005478
2013-01-02 00:00:00,a,0.068893
2013-01-02 00:00:00,b,1.125223
2013-01-02 12:00:00,a,1.169113
2013-01-02 12:00:00,b,-0.594967
2013-01-03 00:00:00,a,-0.273745
2013-01-03 00:00:00,b,3.100614


In [7]:
# Selecting a specific date
dft2.loc['2013-01-05']

Unnamed: 0,Unnamed: 1,A
2013-01-05 00:00:00,a,1.752398
2013-01-05 00:00:00,b,-0.631273
2013-01-05 12:00:00,a,-0.18806
2013-01-05 12:00:00,b,0.285072


In [8]:
# Using IndexSlice for more complex selections
idx = pd.IndexSlice
dft2 = dft2.swaplevel(0, 1).sort_index()
dft2.loc[idx[:, '2013-01-05'], :]

Unnamed: 0,Unnamed: 1,A
a,2013-01-05 00:00:00,1.752398
a,2013-01-05 12:00:00,-0.18806
b,2013-01-05 00:00:00,-0.631273
b,2013-01-05 12:00:00,0.285072
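
As a cross-check on the partial-string selection, the sketch below compares it with an explicit boolean mask built from the datetime level. The names `frame`, `by_string`, and `by_mask` are hypothetical, and the column is a counter so the result is reproducible.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=10, freq="12h")
frame = pd.DataFrame(
    {"A": np.arange(20)},
    index=pd.MultiIndex.from_product([dates, ["a", "b"]]),
)

# Partial string on the datetime level (the first level here)
by_string = frame.loc["2013-01-05"]

# The same rows selected with a mask over the level values
level_dates = frame.index.get_level_values(0)
by_mask = frame[level_dates.normalize() == pd.Timestamp("2013-01-05")]

print(by_string["A"].tolist())
print(by_mask["A"].tolist())
```

The mask is more verbose, but it makes explicit what the partial string does: it keeps every row whose timestamp falls on that calendar day.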


##### Slicing with UTC Offset

Slicing with string indexing also honors UTC offset (new in pandas 0.25.0).

In [9]:
# Create a DataFrame with timezone information
df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
df

Unnamed: 0,0
2019-01-01 00:00:00-08:00,0


In [10]:
# Slicing with timezone-aware timestamps
df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']

Unnamed: 0,0
2019-01-01 00:00:00-08:00,0
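
To see why the row is selected, it helps to convert the slice bounds into the index's time zone. This is a small sketch; `start`, `stop`, and `selected` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame([0], index=pd.DatetimeIndex(["2019-01-01"], tz="US/Pacific"))

start = pd.Timestamp("2019-01-01 12:00:00+04:00")
stop = pd.Timestamp("2019-01-01 13:00:00+04:00")

# Express both bounds in the time zone of the index
print(start.tz_convert("US/Pacific"))
print(stop.tz_convert("US/Pacific"))

# The same slice via .loc, using the offset-bearing strings directly
selected = df.loc["2019-01-01 12:00:00+04:00":"2019-01-01 13:00:00+04:00"]
print(len(selected))
```

Noon at +04:00 is midnight Pacific time on the same date, so the single index entry sits exactly on the lower bound and is included.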


##### Time Resampling

Resampling is a convenient way to convert time series data from one frequency to another. It can be used to:
- Downsample: reduce the frequency (e.g., from seconds to minutes)
- Upsample: increase the frequency (e.g., from days to hours)

In [11]:
# Create a time series
rng = pd.date_range('1/1/2012', periods=100, freq='s')
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts.head()


2012-01-01 00:00:00    244
2012-01-01 00:00:01    229
2012-01-01 00:00:02    435
2012-01-01 00:00:03     86
2012-01-01 00:00:04    349
Freq: s, dtype: int64

### Basic Resampling Operations

In [12]:
# Resample to 5-minute frequency and compute the mean
ts.resample('5Min').mean()

2012-01-01    256.35
Freq: 5min, dtype: float64

In [13]:
# Resample to 5-minute frequency and compute open, high, low, close (OHLC)
ts.resample('5Min').ohlc()

Unnamed: 0,open,high,low,close
2012-01-01,244,498,14,83


In [14]:
# Resample to 5-minute frequency and compute the max
ts.resample('5Min').max()

2012-01-01    498
Freq: 5min, dtype: int64
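
Several statistics can also be computed in one pass with `agg`. The sketch below uses a small hypothetical series (`small`, ten one-minute observations holding the values 0 to 9) so the numbers are easy to verify by hand.

```python
import pandas as pd

# Ten one-minute observations with values 0..9
small = pd.Series(
    range(10),
    index=pd.date_range("2012-01-01", periods=10, freq="min"),
)

# One row per 5-minute bin, one column per aggregation
summary = small.resample("5min").agg(["min", "max", "sum"])
print(summary)
```

The first bin covers the values 0 to 4 and the second covers 5 to 9, so the sums are 10 and 35.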

### Downsampling with 'closed' Parameter

For downsampling, the 'closed' parameter can be set to 'left' or 'right' to specify which end of the interval is closed.

In [15]:
# Resample with closed='right'
ts.resample('5Min', closed='right').mean()

2011-12-31 23:55:00    244.000000
2012-01-01 00:00:00    256.474747
Freq: 5min, dtype: float64

In [16]:
# Resample with closed='left' (default)
ts.resample('5Min', closed='left').mean()

2012-01-01    256.35
Freq: 5min, dtype: float64
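
The effect of 'closed' is easier to follow on deterministic data. In this sketch, `small` is a hypothetical series of ten one-minute observations holding the values 0 to 9.

```python
import pandas as pd

small = pd.Series(
    range(10),
    index=pd.date_range("2012-01-01", periods=10, freq="min"),
)

# closed='left': bins are [00:00, 00:05) and [00:05, 00:10)
left = small.resample("5min", closed="left").sum()

# closed='right': bins are (23:55, 00:00], (00:00, 00:05] and (00:05, 00:10]
right = small.resample("5min", closed="right").sum()

print(left)
print(right)
```

With closed='right', the observation at 00:00 belongs to the bin that ends at 00:00, which is why an extra bin labeled 23:55 on the previous day appears.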

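### Using 'label' and 'offset' Parameters

Parameters like 'label' and 'offset' are used to manipulate the resulting labels:
- 'label' specifies whether the result is labeled with the beginning or the end of the interval
- 'offset' shifts the bin edges by a timedelta, which changes the labels and also which observations fall into each bin; it replaces 'loffset', which only shifted the labels and was removed in pandas 2.0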

In [17]:
# Default: label='left'
ts.resample('5Min').mean()

2012-01-01    256.35
Freq: 5min, dtype: float64

In [18]:
# Explicitly setting label='left'
ts.resample('5Min', label='left').mean()

2012-01-01    256.35
Freq: 5min, dtype: float64

In [20]:
# Using offset to shift the bin edges (and therefore the labels) by one second
ts.resample('5Min', label='left', offset='1s').mean()

2011-12-31 23:55:01    244.000000
2012-01-01 00:00:01    256.474747
Freq: 5min, dtype: float64
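
The sketch below contrasts three ways of changing labels on a small hypothetical series (`small`, values 0 to 9 at one-minute steps): label='right', the 'offset' parameter used in the cell above, and shifting the result index after aggregation. The last option is one way to get a label-only shift, which 'offset' does not provide.

```python
import pandas as pd

small = pd.Series(
    range(10),
    index=pd.date_range("2012-01-01", periods=10, freq="min"),
)

# Same bins as the default, labeled with the right edge of each bin
right_labels = small.resample("5min", label="right").sum()

# Bin edges moved by one minute: the grouping itself changes
shifted_bins = small.resample("5min", offset="1min").sum()

# Default bins, with only the labels moved afterwards
relabeled = small.resample("5min").sum()
relabeled.index = relabeled.index + pd.Timedelta("1min")

print(right_labels)
print(shifted_bins)
print(relabeled)
```

Only `shifted_bins` produces three bins with different sums, because moving the edges reassigns observations; the other two keep the sums 10 and 35 and differ only in their labels.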

### Warning about Default Values

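The default value for both 'label' and 'closed' is 'left' for all frequency offsets except 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W' (spelled 'ME', 'YE', 'QE', 'BME', 'BYE', and 'BQE' from pandas 2.2 onward), which all default to 'right'. The 'left' default can unintentionally introduce look-ahead, where the value for a later time is pulled back to an earlier label, as in the following example with the business-day frequency.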

In [21]:
# Example showing how values can be pulled back
s = pd.date_range('2000-01-01', '2000-01-05').to_series()
s.iloc[2] = pd.NaT
s.dt.day_name()

2000-01-01     Saturday
2000-01-02       Sunday
2000-01-03          NaN
2000-01-04      Tuesday
2000-01-05    Wednesday
Freq: D, dtype: object

In [22]:
# Default: label='left', closed='left'; Sunday's value is pulled back to Friday 1999-12-31
s.resample('B').last().dt.day_name()

1999-12-31       Sunday
2000-01-03          NaN
2000-01-04      Tuesday
2000-01-05    Wednesday
Freq: B, dtype: object

In [23]:
# Using label='right', closed='right' to push Sunday to Monday
s.resample('B', label='right', closed='right').last().dt.day_name()

2000-01-03       Sunday
2000-01-04      Tuesday
2000-01-05    Wednesday
2000-01-06          NaN
Freq: B, dtype: object
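
The 'right' default for the listed frequencies can be seen with a weekly resample. This is a sketch on a hypothetical daily series (`daily`, values 1 to 7 for 2000-01-01 through 2000-01-07); 'W' means weeks ending on Sunday.

```python
import pandas as pd

daily = pd.Series(
    range(1, 8),
    index=pd.date_range("2000-01-01", periods=7, freq="D"),
)

# Default for 'W': closed='right', label='right', so labels are the Sundays ending each week
weekly = daily.resample("W").sum()
print(weekly)

# Explicit left/left for comparison: labels are the Sundays starting each bin
weekly_left = daily.resample("W", label="left", closed="left").sum()
print(weekly_left)
```

With the default, each weekly total is stamped with the last day of its bin, so a label never precedes the data it summarizes. With left labels, the total is stamped with a date earlier than most of the values it contains, which is the look-ahead risk described above.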

##### Upsampling

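For upsampling, you specify how to fill the gaps that are created (for example, asfreq leaves them as NaN and ffill carries the last value forward). The fill methods accept a 'limit' parameter that caps how many consecutive missing values are filled.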

In [24]:
# Upsampling from secondly to every 250 milliseconds with asfreq (no interpolation)
ts[:2].resample('250ms').asfreq()


2012-01-01 00:00:00.000    244.0
2012-01-01 00:00:00.250      NaN
2012-01-01 00:00:00.500      NaN
2012-01-01 00:00:00.750      NaN
2012-01-01 00:00:01.000    229.0
Freq: 250ms, dtype: float64

In [25]:
# Upsampling with forward fill (ffill) to fill the gaps
ts[:2].resample('250ms').ffill()


2012-01-01 00:00:00.000    244
2012-01-01 00:00:00.250    244
2012-01-01 00:00:00.500    244
2012-01-01 00:00:00.750    244
2012-01-01 00:00:01.000    229
Freq: 250ms, dtype: int64
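
The 'limit' parameter and linear interpolation can be sketched on a two-point series. Here `pair` is a hypothetical series with the values 0.0 and 4.0 one second apart.

```python
import pandas as pd

pair = pd.Series(
    [0.0, 4.0],
    index=pd.date_range("2012-01-01", periods=2, freq="s"),
)

# Forward fill at most two consecutive gaps; the third gap stays NaN
limited = pair.resample("250ms").ffill(limit=2)

# Linear interpolation between the two known points
linear = pair.resample("250ms").interpolate()

print(limited)
print(linear)
```

Forward fill repeats the last known value, while interpolation produces evenly spaced intermediate values. Which one is appropriate depends on whether the quantity is held constant between observations or changes continuously.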

##### Conclusion

In this notebook, we've explored advanced time series functionality in pandas, including:

1. DatetimeIndex partial string indexing for convenient date range selection
2. Working with DatetimeIndex in MultiIndex DataFrames
3. Slicing with timezone-aware timestamps
4. Time resampling with various parameters and options
5. Downsampling with the 'closed' parameter
6. Using 'label' and 'offset' parameters
7. Upsampling with asfreq and forward fill

These features make pandas a powerful tool for time series analysis and manipulation.