#### Pandas Tutorial - Part 31

This notebook covers advanced time series functionality in pandas, focusing on:
- DatetimeIndex partial string indexing
- Slicing with date strings in DataFrames
- Slicing with date strings in MultiIndex DataFrames
- Slicing with UTC offset
- Time resampling with various parameters

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

##### DatetimeIndex Partial String Indexing

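Pandas lets you select rows from a time series by passing date strings of any resolution (year, month, day, hour, and so on). A partial string matches every timestamp that falls inside the period it names, so '2013-01' selects all of January 2013.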

In [None]:
# Create a DataFrame with a DatetimeIndex
# 84,960 one-minute periods cover 2013-01-01 00:00 through 2013-02-28 23:59
rng = pd.date_range('2013-01-01', periods=84960, freq='1min')
dft = pd.DataFrame({'A': np.random.randn(len(rng))}, index=rng)
dft.head()
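To make the effect of string resolution concrete, here is a minimal sketch that rebuilds a one-minute frame shaped like `dft` (the name `demo` and the integer column are assumptions for illustration) and selects by day, by hour, and by month with `.loc`.

```python
import numpy as np
import pandas as pd

# One-minute index covering January and February 2013, like dft above
rng = pd.date_range('2013-01-01', periods=84960, freq='1min')
demo = pd.DataFrame({'A': np.arange(len(rng))}, index=rng)

one_day = demo.loc['2013-01-15']      # every minute of that day
one_hour = demo.loc['2013-01-15 12']  # every minute of the 12:00 hour
one_month = demo.loc['2013-02']       # every minute of February

print(len(one_day), len(one_hour), len(one_month))
```

The coarser the string, the more rows it matches: a day covers 1,440 minutes, an hour covers 60.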

###### Slicing by Date Range

You can slice using partial string notation to select ranges of dates.

In [None]:
# This specifies a stop time that includes all of the times on the last day
dft['2013-1':'2013-2-28']

In [None]:
# This specifies an exact stop time (and is not the same as the above)
dft['2013-1':'2013-2-28 00:00:00']

In [None]:
# We are stopping on the included end-point as it is part of the index
dft['2013-1-15':'2013-1-15 12:30:00']
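The difference between a day-resolution stop and an exact stop time can be checked directly. The sketch below assumes a frame shaped like `dft` (rebuilt here as `demo` so the block runs on its own) and compares the last timestamp returned by each slice.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2013-01-01', periods=84960, freq='1min')
demo = pd.DataFrame({'A': np.arange(len(rng))}, index=rng)

# Day-resolution stop: includes every minute of 2013-02-28
whole_last_day = demo.loc['2013-1':'2013-2-28']

# Exact stop: ends at 2013-02-28 00:00:00 (inclusive)
exact_stop = demo.loc['2013-1':'2013-2-28 00:00:00']

print(whole_last_day.index[-1])
print(exact_stop.index[-1])
print(len(whole_last_day) - len(exact_stop))
```

Both endpoints are inclusive, but the day-resolution string expands to the end of that day, while the full timestamp is taken literally.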

##### DatetimeIndex with MultiIndex

DatetimeIndex partial string indexing also works on a DataFrame with a MultiIndex.

In [None]:
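# Create a DataFrame with a MultiIndex (timestamp x letter)
# '12h' replaces the uppercase '12H' alias, which is deprecated as of pandas 2.2
dft2 = pd.DataFrame(np.random.randn(20, 1),
                    columns=['A'],
                    index=pd.MultiIndex.from_product(
                        [pd.date_range('20130101', periods=10, freq='12h'),
                         ['a', 'b']]))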
dft2

In [None]:
# Selecting a specific date
dft2.loc['2013-01-05']

In [None]:
# Using IndexSlice for more complex selections
idx = pd.IndexSlice
dft2 = dft2.swaplevel(0, 1).sort_index()
dft2.loc[idx[:, '2013-01-05'], :]
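As a cross-check on what the partial string selects in a MultiIndex, the following sketch compares it with an explicit boolean mask on the datetime level. The frame `demo` and its integer column are assumptions made so that the result is deterministic.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.date_range('2013-01-01', periods=10, freq='12h'), ['a', 'b']])
demo = pd.DataFrame({'A': np.arange(20)}, index=index)

# Partial string match on the datetime level (level 0)
by_string = demo.loc['2013-01-05']

# Explicit mask: keep rows whose level-0 timestamp falls on that day
days = demo.index.get_level_values(0).normalize()
by_mask = demo[days == pd.Timestamp('2013-01-05')]

print(by_string)
print(by_string['A'].tolist() == by_mask['A'].tolist())
```

Both approaches return the four rows for 2013-01-05 (two timestamps, two letters each); the string form is simply shorter.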

##### Slicing with UTC Offset

Slicing with string indexing also honors UTC offset (new in pandas 0.25.0).

In [None]:
# Create a DataFrame with timezone information
df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
df

In [None]:
# Slicing with offset-aware strings: 12:00+04:00 is 00:00 in US/Pacific,
# so the single row at 2019-01-01 00:00:00-08:00 is selected
df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']
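To see why that slice matches the row, it can help to convert the bounds into the index's time zone. This is a minimal sketch; the names `df_tz`, `start`, and `end` are chosen for illustration, and the boolean mask stands in for the string slice.

```python
import pandas as pd

df_tz = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))

start = pd.Timestamp('2019-01-01 12:00:00+04:00')
end = pd.Timestamp('2019-01-01 13:00:00+04:00')

# Express the bounds in the index's time zone
print(start.tz_convert('US/Pacific'))
print(end.tz_convert('US/Pacific'))

# Explicit mask covering the same closed interval as the string slice
mask = (df_tz.index >= start) & (df_tz.index <= end)
print(df_tz[mask])
```

Comparisons between timezone-aware values are made on the underlying instant, so the +04:00 bounds and the US/Pacific index line up without manual conversion.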

##### Time Resampling

Resampling is a convenient way to convert time series data from one frequency to another. It can be used to:
- Downsample: reduce the frequency (e.g., from seconds to minutes)
- Upsample: increase the frequency (e.g., from days to hours)

In [None]:
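# Create a time series of 100 one-second observations
# 's' replaces the uppercase 'S' alias, which is deprecated as of pandas 2.2
rng = pd.date_range('1/1/2012', periods=100, freq='s')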
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts.head()

###### Basic Resampling Operations

In [None]:
# Resample to 5-minute frequency and compute the mean
ts.resample('5Min').mean()

In [None]:
# Resample to 5-minute frequency and compute open, high, low, close (OHLC)
ts.resample('5Min').ohlc()

In [None]:
# Resample to 5-minute frequency and compute the max
ts.resample('5Min').max()
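Because `ts` spans only 100 seconds, every observation lands in a single 5-minute bin. The sketch below uses a deterministic series (`demo`, an assumption for illustration) to show this, and contrasts it with a 30-second target frequency that produces several bins.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2012-01-01', periods=100, freq='s')
demo = pd.Series(np.arange(100), index=rng)

five_min = demo.resample('5min').mean()    # all 100 seconds fall in one bin
thirty_sec = demo.resample('30s').mean()   # four bins: 0-29, 30-59, 60-89, 90-99

print(five_min)
print(thirty_sec)
```

Picking a target frequency that is much coarser than the data's span collapses the series to one row, which is worth keeping in mind when reading the outputs above.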

###### Downsampling with 'closed' Parameter

For downsampling, the 'closed' parameter can be set to 'left' or 'right' to specify which end of the interval is closed.

In [None]:
# Resample with closed='right'
ts.resample('5Min', closed='right').mean()

In [None]:
# Resample with closed='left' (default)
ts.resample('5Min', closed='left').mean()
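The effect of 'closed' is easier to see on data that straddles a bin edge. The following sketch assumes a small deterministic series (`demo`, ten one-minute values 0 through 9) and sums it into 5-minute bins both ways.

```python
import pandas as pd

rng = pd.date_range('2012-01-01', periods=10, freq='min')
demo = pd.Series(range(10), index=rng)

# Bins are [00:00, 00:05) and [00:05, 00:10)
left = demo.resample('5min', closed='left').sum()

# Bins are (23:55, 00:00], (00:00, 00:05] and (00:05, 00:10]
right = demo.resample('5min', closed='right').sum()

print(left)
print(right)
```

With closed='right', the observation at exactly 00:00 belongs to the bin that ends at 00:00, so an extra bin labeled with the previous day's 23:55 appears.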

###### Using 'label' and 'loffset' Parameters

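The 'label' and 'loffset' parameters control the labels of the resulting bins:
- 'label' specifies whether each bin is labeled with the start ('left') or the end ('right') of its interval
- 'loffset' shifted the output labels by a fixed time offset; it was deprecated in pandas 1.1 and removed in pandas 2.0, so the same adjustment is now made by adding the offset to the index after resampling

In [None]:
# Default: label='left'
ts.resample('5Min').mean()

In [None]:
# Explicitly setting label='left'
ts.resample('5Min', label='left').mean()

In [None]:
# Adjust the output labels by 1 second
# (equivalent to the removed loffset='1s': shift the index after resampling)
resampled = ts.resample('5Min', label='left').mean()
resampled.index = resampled.index + pd.Timedelta('1s')
resampled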
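A small deterministic series makes the label options easy to compare. This sketch (the series `demo` is an assumption for illustration) shows label='left', label='right', and a label shift applied by adding a `pd.Timedelta` to the index, which is one way to adjust labels on pandas versions where `loffset` is not available.

```python
import pandas as pd

rng = pd.date_range('2012-01-01', periods=10, freq='min')
demo = pd.Series(range(10), index=rng)

left_labels = demo.resample('5min', label='left').sum()    # labels 00:00, 00:05
right_labels = demo.resample('5min', label='right').sum()  # labels 00:05, 00:10

# Shift the labels by one second without changing the binning
shifted = left_labels.copy()
shifted.index = shifted.index + pd.Timedelta('1s')

print(left_labels)
print(right_labels)
print(shifted)
```

In all three results the values are identical; only the timestamps attached to the bins differ.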

###### Warning about Default Values

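The default value of both 'label' and 'closed' is 'left' for all frequency offsets except 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W', which default to 'right'. (pandas 2.2 renamed the first six to 'ME', 'YE', 'QE', 'BME', 'BYE', and 'BQE'.) These defaults can unintentionally introduce look-ahead, where the value for a later time is pulled back to an earlier label, as in the following example with the business-day frequency.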

In [None]:
# Daily series from Saturday 2000-01-01 to Wednesday 2000-01-05, with Monday's value set to NaT
s = pd.date_range('2000-01-01', '2000-01-05').to_series()
s.iloc[2] = pd.NaT
s.dt.day_name()

In [None]:
# Default for 'B': label='left', closed='left'
# Sunday's value is pulled back to the previous Friday
s.resample('B').last().dt.day_name()

In [None]:
# Using label='right', closed='right' to push Sunday to Monday
s.resample('B', label='right', closed='right').last().dt.day_name()
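The 'right' default for the weekly frequency can be observed with a few daily values. The sketch below is an illustration under assumed data (`demo` holds 1 through 5 for Saturday through Wednesday) and compares the default behavior of 'W' with an explicit left-closed, left-labeled variant.

```python
import pandas as pd

# Saturday 2000-01-01 through Wednesday 2000-01-05
days = pd.date_range('2000-01-01', '2000-01-05', freq='D')
demo = pd.Series([1, 2, 3, 4, 5], index=days)

# 'W' means weeks ending on Sunday; by default bins are right-closed
# and labeled with the closing Sunday
weekly_default = demo.resample('W').sum()

# Overriding both defaults moves Sunday into the following week's bin
# and labels each bin with its opening Sunday
weekly_left = demo.resample('W', label='left', closed='left').sum()

print(weekly_default)
print(weekly_left)
```

With the defaults, Saturday and Sunday are summed under the label 2000-01-02, so each label is the last day its bin can contain and no later value leaks into an earlier label.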

##### Upsampling

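When upsampling, you choose how to fill the gaps that the new, finer index creates: 'asfreq' leaves them as NaN, while 'ffill' carries the last observed value forward. The 'limit' parameter caps how many consecutive new rows are filled.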

In [None]:
# Upsampling from secondly to every 250 milliseconds with asfreq (no interpolation)
# '250ms' replaces the 'L' alias, which is deprecated as of pandas 2.2
ts[:2].resample('250ms').asfreq()

In [None]:
# Upsampling with forward fill (ffill) to fill the gaps
# '250ms' replaces the 'L' alias, which is deprecated as of pandas 2.2
ts[:2].resample('250ms').ffill()
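The 'limit' parameter mentioned above is not exercised by the cells in this section, so here is a minimal sketch of it. The two-point series `demo` is an assumption chosen so the filled and unfilled rows are easy to spot.

```python
import pandas as pd

rng = pd.date_range('2012-01-01', periods=2, freq='s')
demo = pd.Series([10, 20], index=rng)

gaps = demo.resample('250ms').asfreq()               # three new rows, all NaN
filled_all = demo.resample('250ms').ffill()          # every new row takes the value 10
filled_some = demo.resample('250ms').ffill(limit=2)  # only the first two new rows are filled

print(gaps)
print(filled_all)
print(filled_some)
```

Capping the fill is useful when a long gap should stay visibly missing rather than be papered over with a stale value.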

##### Conclusion

In this notebook, we've explored advanced time series functionality in pandas, including:

1. DatetimeIndex partial string indexing for convenient date range selection
2. Working with DatetimeIndex in MultiIndex DataFrames
3. Slicing with timezone-aware timestamps
4. Time resampling with various parameters and options
5. Downsampling with the 'closed' parameter
6. Using 'label' and 'loffset' parameters
7. Upsampling with asfreq and forward fill

These features make pandas a powerful tool for time series analysis and manipulation.