#### Pandas Tutorial - Part 32

This notebook covers time-related functionality in pandas, focusing on:
- Upsampling with fill limits (continued from Part 31)
- Sparse resampling
- Aggregation with resampling
- Timedeltas and their operations

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
from functools import partial
from pandas.tseries.frequencies import to_offset

%matplotlib inline

##### Upsampling (Continued from Part 31)

When upsampling, you choose a fill method (such as forward fill) and can pass the `limit` parameter to cap how many consecutive new slots are filled in each gap.

In [2]:
# Create a time series
rng = pd.date_range('1/1/2012', periods=100, freq='s')  # 's' replaces the deprecated 'S' alias
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts.head()


2012-01-01 00:00:00    449
2012-01-01 00:00:01     27
2012-01-01 00:00:02    284
2012-01-01 00:00:03    135
2012-01-01 00:00:04    274
Freq: s, dtype: int64

In [3]:
# Upsampling from secondly to every 250 milliseconds with forward fill
ts[:2].resample('250ms').ffill()  # 'ms' replaces the deprecated 'L' alias


2012-01-01 00:00:00.000    449
2012-01-01 00:00:00.250    449
2012-01-01 00:00:00.500    449
2012-01-01 00:00:00.750    449
2012-01-01 00:00:01.000     27
Freq: 250ms, dtype: int64

In [4]:
# Upsampling with forward fill and a limit
ts[:2].resample('250ms').ffill(limit=2)


2012-01-01 00:00:00.000    449.0
2012-01-01 00:00:00.250    449.0
2012-01-01 00:00:00.500    449.0
2012-01-01 00:00:00.750      NaN
2012-01-01 00:00:01.000     27.0
Freq: 250ms, dtype: float64
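
Forward fill is only one way to populate the new slots. The sketch below uses a small two-point series (the values and the names `demo` and `upsampled` are made up for illustration) to compare leaving gaps empty, backward filling with a limit, and linear interpolation.

```python
import numpy as np
import pandas as pd

# Two known points one second apart (illustrative values, not from the notebook)
idx = pd.date_range('2012-01-01', periods=2, freq='s')
demo = pd.Series([10.0, 20.0], index=idx)

upsampled = demo.resample('250ms')

as_is = upsampled.asfreq()               # new slots stay NaN
back_filled = upsampled.bfill(limit=2)   # fill backward, at most 2 slots per gap
interpolated = upsampled.interpolate()   # linear interpolation between known points

print(pd.DataFrame({'asfreq': as_is,
                    'bfill(limit=2)': back_filled,
                    'interpolate': interpolated}))
```

With `limit=2`, the slot farthest from the next known value stays NaN, mirroring the forward-fill case above.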

##### Sparse Resampling

A sparse time series has far fewer points than the number of bins in the time range being resampled. Naively resampling such a series can generate a large number of intermediate, mostly empty values.

Because resample is a time-based groupby, we can instead group the points by their bin label and aggregate only the bins that actually contain data.

In [5]:
# Create a sparse time series
rng = pd.date_range('2014-1-1', periods=100, freq='D') + pd.Timedelta('1s')
ts = pd.Series(range(100), index=rng)
ts.head()

2014-01-01 00:00:01    0
2014-01-02 00:00:01    1
2014-01-03 00:00:01    2
2014-01-04 00:00:01    3
2014-01-05 00:00:01    4
Freq: D, dtype: int64

In [6]:
# Resampling to the full range of the series (generates many intermediate values)
ts.resample('3min').sum().head()  # 'min' replaces the deprecated 'T' alias


2014-01-01 00:00:00    0
2014-01-01 00:03:00    0
2014-01-01 00:06:00    0
2014-01-01 00:09:00    0
2014-01-01 00:12:00    0
Freq: 3min, dtype: int64

In [7]:
# Define a function that floors a timestamp to a given frequency
# (named floor_to_freq so it does not shadow the built-in round)
def floor_to_freq(t, freq):
    step = to_offset(freq).nanos  # bin width in nanoseconds
    return pd.Timestamp((t.value // step) * step)

# Only resample those groups where we have points
ts.groupby(partial(floor_to_freq, freq='3min')).sum()


2014-01-01     0
2014-01-02     1
2014-01-03     2
2014-01-04     3
2014-01-05     4
              ..
2014-04-06    95
2014-04-07    96
2014-04-08    97
2014-04-09    98
2014-04-10    99
Length: 100, dtype: int64
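
The same idea can be sketched without a custom function by flooring the index directly with `DatetimeIndex.floor` and grouping on the result. The names `sparse`, `binned`, and `full` are illustrative; the series is rebuilt here so the block runs on its own.

```python
import pandas as pd

# Same shape as the sparse series above: one point per day, offset by one second
rng = pd.date_range('2014-01-01', periods=100, freq='D') + pd.Timedelta('1s')
sparse = pd.Series(range(100), index=rng)

# Floor each timestamp to its 3-minute bin, then group on the floored labels
binned = sparse.groupby(sparse.index.floor('3min')).sum()

# For comparison: a plain resample materializes every 3-minute bin in the range
full = sparse.resample('3min').sum()

print(len(binned), len(full))
```

Only bins that contain at least one point appear in `binned`, whereas `full` also holds every empty bin in between.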

##### Aggregation with Resampling

As with the aggregating API, the groupby API, and the window functions API, a Resampler can be indexed by column so that only the selected columns are resampled.

In [8]:
# Create a DataFrame with multiple columns
df = pd.DataFrame(np.random.randn(1000, 3),
                  index=pd.date_range('1/1/2012', freq='s', periods=1000),
                  columns=['A', 'B', 'C'])
df.head()


                            A         B         C
2012-01-01 00:00:00 -0.281079 -0.344620  0.070935
2012-01-01 00:00:01 -1.256503 -0.619734  0.473987
2012-01-01 00:00:02  0.184864 -1.291176 -0.229560
2012-01-01 00:00:03 -0.531484  0.682944 -1.229333
2012-01-01 00:00:04  0.653548 -1.145591 -1.712690


In [9]:
# Resample and apply mean to all columns
r = df.resample('3min')
r.mean()


                            A         B         C
2012-01-01 00:00:00  0.031888  0.028999 -0.152893
2012-01-01 00:03:00 -0.088150  0.001123 -0.011401
2012-01-01 00:06:00  0.150925  0.005198  0.000454
2012-01-01 00:09:00 -0.026687  0.082603  0.006907
2012-01-01 00:12:00 -0.023432  0.129062  0.002456
2012-01-01 00:15:00  0.089310  0.102311 -0.154284


In [10]:
# Select a specific column for resampling
r['A'].mean()

2012-01-01 00:00:00    0.031888
2012-01-01 00:03:00   -0.088150
2012-01-01 00:06:00    0.150925
2012-01-01 00:09:00   -0.026687
2012-01-01 00:12:00   -0.023432
2012-01-01 00:15:00    0.089310
Freq: 3min, Name: A, dtype: float64

In [11]:
# Select multiple columns for resampling
r[['A', 'B']].mean()

                            A         B
2012-01-01 00:00:00  0.031888  0.028999
2012-01-01 00:03:00 -0.088150  0.001123
2012-01-01 00:06:00  0.150925  0.005198
2012-01-01 00:09:00 -0.026687  0.082603
2012-01-01 00:12:00 -0.023432  0.129062
2012-01-01 00:15:00  0.089310  0.102311
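
Beyond a single function such as `mean`, a Resampler also accepts `agg`, as the groupby API does. The following sketch assumes a frame shaped like the one above (rebuilt with a seeded generator so it runs standalone; `frame`, `a_stats`, and `per_column` are illustrative names).

```python
import numpy as np
import pandas as pd

gen = np.random.default_rng(0)
frame = pd.DataFrame(gen.standard_normal((1000, 3)),
                     index=pd.date_range('2012-01-01', freq='s', periods=1000),
                     columns=['A', 'B', 'C'])
r = frame.resample('3min')

# Several aggregations on one column: one output column per function
a_stats = r['A'].agg(['sum', 'mean', 'std'])

# A different aggregation per column, passed as a dict
per_column = r.agg({'A': 'sum', 'B': 'std'})

print(a_stats)
print(per_column)
```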


##### Timedeltas

Timedeltas are differences between times, expressed in units such as days, hours, minutes, and seconds. They can be positive or negative and can be constructed from several input formats.

###### Creating Timedeltas

There are multiple ways to create Timedeltas:

In [12]:
# Using strings
pd.Timedelta('1 days')

Timedelta('1 days 00:00:00')

In [13]:
# Using keyword arguments
pd.Timedelta(days=1, seconds=1)

Timedelta('1 days 00:00:01')

In [14]:
# From a datetime.timedelta object
pd.Timedelta(datetime.timedelta(days=1, seconds=1))

Timedelta('1 days 00:00:01')

In [15]:
# Constructing a negative Timedelta
pd.Timedelta('-1 days')

Timedelta('-1 days +00:00:00')

In [16]:
# Timedeltas can be added
pd.Timedelta('2 days') + pd.Timedelta('00:00:02') + pd.Timedelta('00:00:00.000123')

Timedelta('2 days 00:00:02.000123')
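
Once built, a Timedelta can be broken back into its parts. This short sketch reuses the sum above (the name `total` is illustrative) and reads a few of its attributes.

```python
import pandas as pd

total = pd.Timedelta('2 days') + pd.Timedelta('00:00:02') + pd.Timedelta('00:00:00.000123')

print(total.days)             # whole days
print(total.seconds)          # whole seconds left over within the last day
print(total.components)       # named tuple: days, hours, minutes, seconds, ...
print(total.total_seconds())  # the full duration as a float number of seconds

# Dividing by another Timedelta gives a plain float in that unit
print(total / pd.Timedelta(hours=1))
```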

###### to_timedelta Function

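The top-level `pd.to_timedelta` converts a scalar, array, list, or Series in a recognized timedelta format into timedelta values: a scalar becomes a `Timedelta`, a list or array becomes a `TimedeltaIndex`, and a Series becomes a Series of dtype timedelta64[ns].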

In [17]:
# Parse a single string to a Timedelta
pd.to_timedelta('1 days 06:05:01.00003')

Timedelta('1 days 06:05:01.000030')

In [18]:
# Parse microseconds
pd.to_timedelta('15.5us')

Timedelta('0 days 00:00:00.000015500')

In [19]:
# Parse a list/array of strings
pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])

TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT], dtype='timedelta64[ns]', freq=None)

In [20]:
# Using the unit keyword argument to specify the unit of the Timedelta
pd.to_timedelta(np.arange(5), unit='s')  # seconds

TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', '0 days 00:00:02',
                '0 days 00:00:03', '0 days 00:00:04'],
               dtype='timedelta64[ns]', freq=None)

In [21]:
pd.to_timedelta(np.arange(5), unit='d')  # days

TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
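
Real input often contains entries that are not valid durations. As a minimal sketch, passing `errors='coerce'` turns anything unparseable into NaT instead of raising; the sample strings and the names `raw` and `parsed` are made up for illustration.

```python
import pandas as pd

raw = ['1 days 06:05:01', 'not a duration', '90 min']

# Unparseable entries become NaT rather than raising an error
parsed = pd.to_timedelta(raw, errors='coerce')

print(parsed)
print(parsed.isna())
```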

###### Timedelta Limitations

Pandas represents Timedeltas at nanosecond resolution using 64-bit integers, so the limits of a 64-bit integer determine the Timedelta limits.

In [22]:
# Minimum Timedelta value
pd.Timedelta.min

Timedelta('-106752 days +00:12:43.145224193')

In [23]:
# Maximum Timedelta value
pd.Timedelta.max

Timedelta('106751 days 23:47:16.854775807')
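
To connect these limits to the integer range, the sketch below reads the underlying nanosecond count through `.value` and compares it with the int64 bounds. The variable names are illustrative, and the year figure is a rough conversion using 365.25-day years.

```python
import numpy as np
import pandas as pd

int64_max = np.iinfo(np.int64).max   # 2**63 - 1

# .value exposes the underlying count of nanoseconds
max_ns = pd.Timedelta.max.value
min_ns = pd.Timedelta.min.value

# Rough size of the representable range in years, in each direction
approx_years = max_ns / (365.25 * 24 * 60 * 60 * 1e9)

print(max_ns == int64_max)
print(min_ns == -int64_max)
print(round(approx_years, 1))
```

The minimum is one step above the smallest int64 because, as far as I know, pandas reserves that value for NaT.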

##### Operations with Timedeltas

Timedeltas work element-wise on Series and DataFrames. Subtracting a datetime64[ns] Series or a Timestamp from a datetime64[ns] Series produces a timedelta64[ns] Series.

In [24]:
# Create a Series of dates
s = pd.Series(pd.date_range('2012-1-1', periods=3, freq='D'))

# Create a Series of timedeltas
td = pd.Series([pd.Timedelta(days=i) for i in range(3)])

# Create a DataFrame with both
df = pd.DataFrame({'A': s, 'B': td})
df

           A      B
0 2012-01-01 0 days
1 2012-01-02 1 days
2 2012-01-03 2 days


In [25]:
# Add a datetime and a timedelta
df['C'] = df['A'] + df['B']
df

           A      B          C
0 2012-01-01 0 days 2012-01-01
1 2012-01-02 1 days 2012-01-03
2 2012-01-03 2 days 2012-01-05


In [26]:
# Check the data types
df.dtypes

A     datetime64[ns]
B    timedelta64[ns]
C     datetime64[ns]
dtype: object

In [27]:
# Subtract dates to get timedeltas
s - s.max()

0   -2 days
1   -1 days
2    0 days
dtype: timedelta64[ns]

In [28]:
# Subtract a datetime from a Series of dates
s - datetime.datetime(2011, 1, 1, 3, 5)

0   364 days 20:55:00
1   365 days 20:55:00
2   366 days 20:55:00
dtype: timedelta64[ns]

In [29]:
# Add a timedelta to a Series of dates
s + datetime.timedelta(minutes=5)

0   2012-01-01 00:05:00
1   2012-01-02 00:05:00
2   2012-01-03 00:05:00
dtype: datetime64[ns]

In [30]:
# Add a pandas offset to a Series of dates
s + pd.offsets.Minute(5)

0   2012-01-01 00:05:00
1   2012-01-02 00:05:00
2   2012-01-03 00:05:00
dtype: datetime64[ns]

In [31]:
# Add multiple offsets
s + pd.offsets.Minute(5) + pd.offsets.Milli(5)

0   2012-01-01 00:05:00.005
1   2012-01-02 00:05:00.005
2   2012-01-03 00:05:00.005
dtype: datetime64[ns]

In [32]:
# Subtract a scalar Timestamp (the first element) to get a timedelta64[ns] Series
y = s - s[0]
y

0   0 days
1   1 days
2   2 days
dtype: timedelta64[ns]

In [33]:
# Series of timedeltas with NaT values
y = s - s.shift()
y

0      NaT
1   1 days
2   1 days
dtype: timedelta64[ns]
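
A timedelta Series that contains NaT can still be filled, converted, and reduced. The following is a sketch of a few common follow-up steps, with the series rebuilt so the block runs on its own; `filled`, `whole_days`, `in_hours`, and `total` are illustrative names.

```python
import pandas as pd

s = pd.Series(pd.date_range('2012-01-01', periods=3, freq='D'))
y = s - s.shift()                      # first element is NaT

filled = y.fillna(pd.Timedelta(0))     # replace NaT with a zero-length duration
whole_days = y.dt.days                 # day component; NaT becomes NaN
in_hours = y / pd.Timedelta(hours=1)   # durations as float hours
total = y.sum()                        # NaT is skipped by default

print(filled)
print(whole_days)
print(in_hours)
print(total)
```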

##### Conclusion

In this notebook, we've explored:

1. Upsampling with forward fill and limits
2. Sparse resampling techniques
3. Aggregation with resampling for DataFrames
4. Creating and working with Timedeltas
5. Operations with Timedeltas and datetime objects

These features provide powerful tools for time series analysis and manipulation in pandas.