#### Pandas Tutorial - Part 32

This notebook covers time-related functionality in pandas, focusing on:
- Sparse resampling
- Aggregation with resampling
- Timedeltas and their operations

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
from functools import partial
from pandas.tseries.frequencies import to_offset

%matplotlib inline

##### Upsampling (Continued from Part 31)

When upsampling, you choose how the newly created gaps are filled (for example, forward fill) and can pass the `limit` parameter to cap how many consecutive gap rows get filled.

In [None]:
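# Create a time series at one-second frequency
# ('s' is the seconds alias; the uppercase 'S' is deprecated in recent pandas)
rng = pd.date_range('1/1/2012', periods=100, freq='s')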
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts.head()

In [None]:
# Upsampling from secondly to every 250 milliseconds with forward fill
# ('ms' replaces the deprecated 'L' alias)
ts[:2].resample('250ms').ffill()

In [None]:
# Upsampling with forward fill, filling at most 2 consecutive gaps
ts[:2].resample('250ms').ffill(limit=2)
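
To make the effect of `limit` concrete, here is a minimal sketch that counts the gaps left unfilled in each case. The values and the names `small`, `filled`, and `capped` are made up for illustration.

```python
import pandas as pd

# Two observations one second apart (illustrative values)
idx = pd.date_range('2012-01-01', periods=2, freq='s')
small = pd.Series([10, 20], index=idx)

# 250 ms bins between the two points: .000, .250, .500, .750, 1.000 -> 5 rows
filled = small.resample('250ms').ffill()
capped = small.resample('250ms').ffill(limit=2)

print(filled.isna().sum())  # every gap is filled
print(capped.isna().sum())  # the third consecutive gap stays NaN
```

With three gap rows between the two observations, `limit=2` fills the first two and leaves the last one as NaN.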

##### Sparse Resampling

A sparse time series has far fewer points than the number of bins in the time span you want to resample. Naively resampling such a series to a finer frequency generates a large number of empty intermediate bins.

Since resample is a time-based groupby, we can efficiently resample only the groups that are not all NaN.

In [None]:
# Create a sparse time series
rng = pd.date_range('2014-1-1', periods=100, freq='D') + pd.Timedelta('1s')
ts = pd.Series(range(100), index=rng)
ts.head()

In [None]:
# Resampling over the full range creates a row for every 3-minute bin,
# almost all of them empty ('min' replaces the deprecated 'T' alias)
ts.resample('3min').sum().head()

In [None]:
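# Floor a timestamp to the start of its frequency bin.
# Named floor_to_freq so it does not shadow the built-in round();
# .nanos is used because the Tick .delta attribute is deprecated in recent pandas.
def floor_to_freq(t, freq):
    step = to_offset(freq).nanos  # bin width in nanoseconds
    return pd.Timestamp((t.value // step) * step)

# Only aggregate the bins that actually contain points
ts.groupby(partial(floor_to_freq, freq='3min')).sum()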
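
As a rough size comparison, the sketch below rebuilds the same kind of series and counts the rows produced by each approach. It uses `DatetimeIndex.floor` as a shorthand for the flooring helper; the names `sparse`, `dense`, and `occupied` are illustrative only.

```python
import pandas as pd

# 100 points, one per day, each offset by one second
rng = pd.date_range('2014-01-01', periods=100, freq='D') + pd.Timedelta('1s')
sparse = pd.Series(range(100), index=rng)

# Dense: one row per 3-minute bin across the whole span
dense = sparse.resample('3min').sum()

# Sparse: floor each timestamp to its bin and group only the occupied bins
occupied = sparse.groupby(sparse.index.floor('3min')).sum()

print(len(dense), len(occupied))
```

Both results carry the same total, but the grouped version holds one row per observed bin instead of one row per possible bin.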

##### Aggregation with Resampling

As with the aggregation API, the groupby API, and the window functions API, you can select columns from a Resampler before aggregating, so only those columns are computed.

In [None]:
# Create a DataFrame with multiple columns
df = pd.DataFrame(np.random.randn(1000, 3),
                  index=pd.date_range('1/1/2012', freq='s', periods=1000),
                  columns=['A', 'B', 'C'])
df.head()

In [None]:
# Resample into 3-minute bins and apply mean to all columns
r = df.resample('3min')
r.mean()

In [None]:
# Select a specific column for resampling
r['A'].mean()

In [None]:
# Select multiple columns for resampling
r[['A', 'B']].mean()
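
Column selection combines with `agg`, which accepts a list of functions or a per-column dict. The following is a small sketch under the same setup; the seeded generator and the names `frame`, `a_stats`, and `mixed` are assumptions added for reproducibility and illustration.

```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)  # seeded so the example is repeatable
frame = pd.DataFrame(rs.randn(1000, 3),
                     index=pd.date_range('2012-01-01', freq='s', periods=1000),
                     columns=['A', 'B', 'C'])
r = frame.resample('3min')

# Several aggregations on one column -> one output column per function
a_stats = r['A'].agg(['sum', 'mean', 'std'])

# A different aggregation per column via a dict
mixed = r.agg({'A': 'sum', 'B': 'mean'})

print(a_stats.shape, list(mixed.columns))
```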

##### Timedeltas

Timedeltas are differences between times, expressed in units such as days, hours, minutes, and seconds. They can be positive or negative, and pandas accepts them in several input formats.

###### Creating Timedeltas

There are multiple ways to create Timedeltas:

In [None]:
# Using strings
pd.Timedelta('1 days')

In [None]:
# Using keyword arguments
pd.Timedelta(days=1, seconds=1)

In [None]:
# From a datetime.timedelta object
pd.Timedelta(datetime.timedelta(days=1, seconds=1))

In [None]:
# Constructing a negative Timedelta
pd.Timedelta('-1 days')

In [None]:
# Timedeltas can be added
pd.Timedelta('2 days') + pd.Timedelta('00:00:02') + pd.Timedelta('00:00:00.000123')
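
The different constructors describe the same underlying value. As a quick check, this sketch builds one duration three ways and compares the results; the variable names are illustrative.

```python
import datetime
import pandas as pd

from_string = pd.Timedelta('1 days 00:00:01')
from_kwargs = pd.Timedelta(days=1, seconds=1)
from_stdlib = pd.Timedelta(datetime.timedelta(days=1, seconds=1))

# All three forms compare equal
print(from_string == from_kwargs == from_stdlib)

# The duration can be read back as components or as total seconds
print(from_string.days, from_string.seconds)
print(from_string.total_seconds())  # 86401.0

# abs() turns a negative Timedelta into its positive counterpart
negative = pd.Timedelta('-1 days')
print(abs(negative) == pd.Timedelta('1 days'))
```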

###### to_timedelta Function

Using the top-level `pd.to_timedelta`, you can convert a scalar, array, list, or Series from a recognized timedelta format/value into a Timedelta type.

In [None]:
# Parse a single string to a Timedelta
pd.to_timedelta('1 days 06:05:01.00003')

In [None]:
# Parse microseconds
pd.to_timedelta('15.5us')

In [None]:
# Parse a list/array of strings
pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])

In [None]:
# Using the unit keyword argument to specify the unit of the Timedelta
pd.to_timedelta(np.arange(5), unit='s')  # seconds

In [None]:
pd.to_timedelta(np.arange(5), unit='d')  # days
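
The return type follows the input: a list or array gives a `TimedeltaIndex`, a Series gives a Series, and unparseable markers such as `'nan'` become `NaT`. A short sketch of this, with illustrative names and values:

```python
import numpy as np
import pandas as pd

# List of strings -> TimedeltaIndex; 'nan' is parsed as NaT
parsed = pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
print(type(parsed).__name__, parsed.isna().tolist())

# Numeric input needs a unit to be meaningful
secs = pd.to_timedelta(np.arange(5), unit='s')
print(secs[4] == pd.Timedelta(seconds=4))

# Series in, Series out; fractional values are allowed
hours = pd.to_timedelta(pd.Series([1.5, 2.5]), unit='h')
print(hours.iloc[0] == pd.Timedelta(minutes=90))
```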

###### Timedelta Limitations

Pandas represents Timedeltas at nanosecond resolution using 64-bit integers. The 64-bit integer limits therefore determine the Timedelta limits, which come to roughly 292 years in either direction.

In [None]:
# Minimum Timedelta value
pd.Timedelta.min

In [None]:
# Maximum Timedelta value
pd.Timedelta.max
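
The link to the integer limits can be checked directly. This sketch compares the nanosecond counts behind `pd.Timedelta.min` and `pd.Timedelta.max` with NumPy's `int64` bounds; the smallest `int64` value is not available because pandas reserves it as the `NaT` sentinel.

```python
import numpy as np
import pandas as pd

i64 = np.iinfo(np.int64)

# .value is the number of nanoseconds stored in the Timedelta
print(pd.Timedelta.max.value == i64.max)
print(pd.Timedelta.min.value == i64.min + 1)  # i64.min itself encodes NaT

# Approximate span in years on either side of zero
years = pd.Timedelta.max.days / 365.25
print(round(years, 1))
```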

##### Operations with Timedeltas

Timedeltas work element-wise on Series and DataFrames. Subtracting one datetime64[ns] Series from another, or subtracting a Timestamp from such a Series, produces a timedelta64[ns] Series.

In [None]:
# Create a Series of dates
s = pd.Series(pd.date_range('2012-1-1', periods=3, freq='D'))

# Create a Series of timedeltas
td = pd.Series([pd.Timedelta(days=i) for i in range(3)])

# Create a DataFrame with both
df = pd.DataFrame({'A': s, 'B': td})
df

In [None]:
# Add a datetime and a timedelta
df['C'] = df['A'] + df['B']
df

In [None]:
# Check the data types
df.dtypes

In [None]:
# Subtract dates to get timedeltas
s - s.max()

In [None]:
# Subtract a datetime from a Series of dates
s - datetime.datetime(2011, 1, 1, 3, 5)

In [None]:
# Add a timedelta to a Series of dates
s + datetime.timedelta(minutes=5)

In [None]:
# Add a pandas offset to a Series of dates
s + pd.offsets.Minute(5)

In [None]:
# Add multiple offsets
s + pd.offsets.Minute(5) + pd.offsets.Milli(5)

In [None]:
# Subtract a scalar Timestamp (the first element) to get a timedelta64[ns] Series
y = s - s[0]
y

In [None]:
# Series of timedeltas with NaT values
y = s - s.shift()
y
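
Once you have a timedelta Series, a common next step is to turn it into plain numbers. The sketch below shows two ways to do that, along with how `NaT` propagates; the names `dates` and `gaps` are illustrative.

```python
import pandas as pd

dates = pd.Series(pd.date_range('2012-01-01', periods=3, freq='D'))

# Difference between consecutive rows; the first row has no predecessor -> NaT
gaps = dates - dates.shift()
print(gaps.dtype)

# Dividing by a Timedelta gives floats in that unit (NaT becomes NaN)
in_hours = gaps / pd.Timedelta(hours=1)
print(in_hours.tolist())

# total_seconds() is the accessor-based equivalent for seconds
in_seconds = gaps.dt.total_seconds()
print(in_seconds.tolist())
```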

##### Conclusion

In this notebook, we've explored:

1. Upsampling with forward fill and limits
2. Sparse resampling techniques
3. Aggregation with resampling for DataFrames
4. Creating and working with Timedeltas
5. Operations with Timedeltas and datetime objects

These features provide powerful tools for time series analysis and manipulation in pandas.