# Timestamp, DatetimeIndex, Timedelta, TimedeltaIndex, Period, PeriodIndex

Timestamped data is the most basic type of time series data. A Timestamp represents a single point in time.

```python
pd.Timestamp.today()

ts = pd.Timestamp('Sep 20, 2019 20:42:51')

ts.year, ts.month, ts.month_name(), ts.day, ts.day_name(), ts.hour, ts.minute, ts.second, ts.dayofweek, ts.dayofyear, ts.daysinmonth
(2019, 9, 'September', 20, 'Friday', 20, 42, 51, 4, 263, 30)
```

A collection of Timestamp objects is a DatetimeIndex.

```python
pd.date_range('2014-08', periods=5, freq='2M')
DatetimeIndex(['2014-08-31', '2014-10-31', '2014-12-31', '2015-02-28', '2015-04-30'],
              dtype='datetime64[ns]', freq='2M')
```

A difference between Timestamp objects is a Timedelta, and a collection of Timedelta objects is a TimedeltaIndex.

```python
ts = pd.date_range('2018-11-29 16:15:05', periods=4, freq='D')

ts - pd.Timestamp("2018-01-01")
TimedeltaIndex(['332 days 16:15:05', '333 days 16:15:05', '334 days 16:15:05', '335 days 16:15:05'],
               dtype='timedelta64[ns]', freq=None)

(ts - pd.Timestamp("2018-01-01")) / pd.Timedelta('5H')
Float64Index([1596.8502777777778, 1601.6502777777778, 1606.4502777777777, 1611.2502777777777],
             dtype='float64')
```


A Period represents a span of time. Arithmetic between Period objects with different frequencies is not allowed.

```python
p = pd.Period('2014-07-01 12:00', freq='2H')

p - 1                              # Period('2014-07-01 10:00', '2H')
p + pd.offsets.Hour(1)             # Period('2014-07-01 13:00', '2H')

p.asfreq('M')                      # Period('2014-07', 'M')
```
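The restriction on mixed-frequency arithmetic can be seen in a minimal sketch. The variable names are illustrative, and the sketch assumes a pandas version (0.24 or later) where subtracting two Periods returns an offset object and a frequency mismatch raises `IncompatibleFrequency`, a `ValueError` subclass.

```python
import pandas as pd

monthly_a = pd.Period('2014-07', freq='M')
monthly_b = pd.Period('2014-09', freq='M')
daily = pd.Period('2014-08-15', freq='D')

# Same frequency: the difference is an offset counting whole periods.
months_apart = (monthly_b - monthly_a).n

# Different frequencies: pandas refuses to subtract.
try:
    monthly_a - daily
    mixed_ok = True
except ValueError:          # IncompatibleFrequency subclasses ValueError
    mixed_ok = False

# Converting to a common frequency first makes the arithmetic valid.
converted_apart = (daily.asfreq('M') - monthly_a).n
```

Converting with asfreq() before the arithmetic is one way around the restriction.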

# Date offsets

* pd.DateOffset(), also available as pd.offsets.DateOffset()
* pd.offsets.BDay(), ...

```python
ts = pd.Timestamp('2020-5-15 15:30:00')
ts.day_name()                 # Friday

ts + pd.DateOffset(days=2)    # Timestamp('2020-05-17 15:30:00')
ts + pd.offsets.BDay(2)       # Timestamp('2020-05-19 15:30:00')
ts + 2*pd.offsets.BDay()      # Timestamp('2020-05-19 15:30:00')

ts + pd.offsets.BusinessHour(2, start='07:00')    # Timestamp('2020-05-18 07:30:00')
```

## rollforward(), rollback()

```python
offset = pd.offsets.BusinessHour(start='07:00')              # defaults are n=1, start='09:00', end='17:00'; start is overridden here

offset.rollforward(pd.Timestamp('2020-5-15 22:30:00'))       # Timestamp('2020-05-18 07:00:00')
offset.rollback(pd.Timestamp('2020-5-15 00:30:00'))          # Timestamp('2020-05-14 17:00:00')
```
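As a further sketch of the same two methods with a different offset (MonthEnd is chosen here only for illustration): rollforward() and rollback() move a timestamp only when it is not already on the offset, whereas adding the offset always moves it.

```python
import pandas as pd

month_end = pd.offsets.MonthEnd()

mid_month = pd.Timestamp('2020-05-15')
on_offset = pd.Timestamp('2020-05-31')

forward = month_end.rollforward(mid_month)     # next month end
back = month_end.rollback(mid_month)           # previous month end
unchanged = month_end.rollforward(on_offset)   # already a month end, so not moved
added = on_offset + month_end                  # addition moves to the following month end
```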

# Generating Date/Time objects

## pd.date_range(), pd.bdate_range(), pd.period_range(), pd.timedelta_range()

* The default frequency for date_range is a calendar day.
* The default frequency for bdate_range is a business day.

```python
pd.date_range('2014-08', periods=5, freq='2M')
DatetimeIndex(['2014-08-31', '2014-10-31', '2014-12-31', '2015-02-28', '2015-04-30'],
              dtype='datetime64[ns]', freq='2M')

pd.period_range('2014-08', periods=5, freq='2M')
PeriodIndex(['2014-08', '2014-10', '2014-12', '2015-02', '2015-04'], dtype='period[2M]', freq='2M')
```
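The two default frequencies, and timedelta_range(), can be compared in a short sketch. The start date and variable names are arbitrary choices for illustration; 2020-05-15 is a Friday, so the business-day range skips the weekend.

```python
import pandas as pd

cal = pd.date_range('2020-05-15', periods=4)       # default freq: calendar day
bus = pd.bdate_range('2020-05-15', periods=4)      # default freq: business day

# timedelta_range builds a TimedeltaIndex; here in 12-hour steps
tdr = pd.timedelta_range(start='0 days', periods=3, freq=pd.offsets.Hour(12))

cal_days = cal.day_name().tolist()
bus_days = bus.day_name().tolist()
```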

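## datetime.strptime(), datetime.strftime()

pd.datetime was an alias for the standard library's datetime.datetime. It was deprecated in pandas 1.0 and removed in 2.0, so import datetime directly.

* datetime.strptime(str, format): str -> datetime
* datetime.strftime(dt, format) or dt.strftime(format): date/time object -> str

```python
from datetime import datetime

datetime.strptime('1901-03', '%Y-%m')                # datetime.datetime(1901, 3, 1, 0, 0)

# Consider the file test.csv:
# "Date", "Values"
# "1-01", 12.0
# "1-02", 8.2
# "1-03", 9.4
# ...

# date_parser and squeeze=True require pandas < 2.0 (squeeze was removed and date_parser deprecated in 2.0)
pd.read_csv('test.csv', index_col=0, parse_dates=True, 
            date_parser=lambda x: datetime.strptime("200"+x, "%Y-%m"), squeeze=True)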
Date
2001-01-01  12.0
2001-02-01   8.2
2001-03-01   9.4
...


pd.Timestamp('2020-03-15 18:30:00').strftime('%B %d, %Y, %r')   # 'March 15, 2020, 06:30:00 PM'
```

# Changing types, formats, values

## pd.to_datetime(), pd.to_timedelta()

```python
pd.to_datetime('2015-02-04')
Timestamp('2015-02-04 00:00:00')

pd.to_datetime(['2015-02-15','2016-04-25'])
DatetimeIndex(['2015-02-15', '2016-04-25'], dtype='datetime64[ns]', freq=None)

pd.to_datetime(pd.DataFrame({'year':[2015,2016], 'month':[2,4], 'day':[15,25]}))
0   2015-02-15
1   2016-04-25
dtype: datetime64[ns]
```
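A few more conversions, as a hedged sketch: an explicit `format`, the `errors='coerce'` option, and pd.to_timedelta(), which the examples above do not cover. The input strings are made up for illustration.

```python
import pandas as pd

# An explicit format removes day/month ambiguity.
ts = pd.to_datetime('04/02/2015', format='%d/%m/%Y')

# errors='coerce' turns unparseable entries into NaT instead of raising.
idx = pd.to_datetime(['15/02/2015', 'not a date'],
                     format='%d/%m/%Y', errors='coerce')

# Strings or numbers with a unit become Timedelta / TimedeltaIndex.
td = pd.to_timedelta('1 days 06:05:01')
tdi = pd.to_timedelta([1, 2], unit='h')
```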

## to_period(), to_timestamp()

```python
rng = pd.date_range('1/1/2020', periods=3, freq='M')
s = pd.Series(np.random.randn(len(rng)).round(2), index=rng)
s
2020-01-31    0.04
2020-02-29    0.99
2020-03-31   -1.81
Freq: M, dtype: float64
        
s = s.to_period()
s
2020-01    0.04
2020-02    0.99
2020-03   -1.81
Freq: M, dtype: float64
        
s.to_timestamp()
2020-01-01    0.04
2020-02-01    0.99
2020-03-01   -1.81
Freq: MS, dtype: float64       
        
s.to_timestamp(freq='M')
2020-01-31    0.04
2020-02-29    0.99
2020-03-31   -1.81
Freq: M, dtype: float64        
        
# 'Q-NOV': quarterly frequency, year ends in November        
prng = pd.period_range('1990Q1', '1992Q4', freq='Q-NOV')        
prng.to_timestamp()
DatetimeIndex(['1989-12-01', '1990-03-01', '1990-06-01', '1990-09-01',
               '1990-12-01', '1991-03-01', '1991-06-01', '1991-09-01',
               '1991-12-01', '1992-03-01', '1992-06-01', '1992-09-01'],
              dtype='datetime64[ns]', freq='QS-DEC')
```

## replace()

```python
ts = pd.Timestamp('2020-5-15 15:30:00')

ts.replace(hour=5, minute=45)
Timestamp('2020-05-15 05:45:00')
```

## normalize()

normalize() sets the time component to midnight, preserving tz information.

```python
pd.date_range('2014-08-12', periods=4, freq='10H30min').normalize()
DatetimeIndex(['2014-08-12', '2014-08-12', '2014-08-12', '2014-08-13'], dtype='datetime64[ns]', freq=None)
```
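The tz-preserving behaviour can be checked on a single tz-aware Timestamp. This is a small sketch; the 'Europe/Paris' zone is an arbitrary choice and assumes time zone data is available to pandas.

```python
import pandas as pd

aware = pd.Timestamp('2020-08-04 02:20:00', tz='Europe/Paris')
midnight = aware.normalize()      # local midnight, same time zone

same_zone = str(midnight.tz) == str(aware.tz)
```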

## tz_localize(), tz_convert()

```python
s = pd.Series([1], index=pd.DatetimeIndex(['2020-08-04 02:20:00']))
s
2020-08-04 02:20:00    1
dtype: int64
    
s = s.tz_localize('CET')
s
2020-08-04 02:20:00+02:00    1
dtype: int64
    
s.tz_convert('US/Eastern')
2020-08-03 20:20:00-04:00    1
dtype: int64
```


## shift(), tshift()

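shift(periods=1, freq=None, axis=0, fill_value=None) is a general method not limited to dates/times. Without freq it shifts the values and leaves the index alone; with freq it shifts the index and leaves the values alone. tshift() was deprecated in pandas 1.1 and removed in 2.0; shift() with freq replaces it.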


```python
ts = pd.Series(range(3), index=pd.date_range('2013-05-08', '2013-05-10'))
ts
2013-05-08    0
2013-05-09    1
2013-05-10    2
Freq: D, dtype: int64
        
ts.shift(1)                            # shift values
2013-05-08    NaN
2013-05-09    0.0
2013-05-10    1.0
Freq: D, dtype: float64
        
ts.shift(7, freq=pd.offsets.BDay())    # shift index; tshift() can also be used
2013-05-17    0
2013-05-20    1
2013-05-21    2
Freq: B, dtype: int64
        
ts.tshift(1)                           # same result as ts.shift(1, freq='D')
2013-05-09    0
2013-05-10    1
2013-05-11    2
Freq: D, dtype: int64
```


## asfreq()

```python
ts = pd.Series(range(3), index=pd.date_range('2020-4-10', periods=3, freq=pd.offsets.BDay(3)))
ts
2020-04-10    0
2020-04-15    1
2020-04-20    2
Freq: 3B, dtype: int64
        
ts.asfreq(pd.offsets.BDay(), method='pad')
2020-04-10    0
2020-04-13    0
2020-04-14    0
2020-04-15    1
2020-04-16    1
2020-04-17    1
2020-04-20    2
Freq: B, dtype: int64        
```

For a PeriodIndex, asfreq() takes how='S' or how='E' ('Start' or 'End') to choose which end of each period is mapped to the new frequency.

```python
prng = pd.period_range('1990Q1', '1991Q4', freq='Q-NOV')
prng
PeriodIndex(['1990Q1', '1990Q2', '1990Q3', '1990Q4', '1991Q1', '1991Q2',
             '1991Q3', '1991Q4'],
            dtype='period[Q-NOV]', freq='Q-NOV')

prng.asfreq('M','E')
PeriodIndex(['1990-02', '1990-05', '1990-08', '1990-11', '1991-02', '1991-05',
             '1991-08', '1991-11'],
            dtype='period[M]', freq='M')

prng.asfreq('M','S')
PeriodIndex(['1989-12', '1990-03', '1990-06', '1990-09', '1990-12', '1991-03',
             '1991-06', '1991-09'],
            dtype='period[M]', freq='M')
```

# Indexing, Selecting

```python
s = pd.Series(range(5), index=pd.date_range('2014-08', periods=5, freq='2M'))
s
2014-08-31    0
2014-10-31    1
2014-12-31    2
2015-02-28    3
2015-04-30    4
Freq: 2M, dtype: int64
        
s['2015']
2015-02-28    3
2015-04-30    4
Freq: 2M, dtype: int64
        
s['2014-10':'2015-01']
2014-10-31    1
2014-12-31    2
Freq: 2M, dtype: int64

s[s.index.dayofweek == 5]
2015-02-28    3
Freq: 2M, dtype: int64       
```


## between_time(), at_time()

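Note: between_time() is different from between(). between_time() selects rows by the time of day of the index, whereas between() tests whether the values lie within a range.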


```python
ts = pd.Series(range(4), index=pd.date_range('2020-05-15', periods=4, freq='1D2H30min'))
ts
2020-05-15 00:00:00    0
2020-05-16 02:30:00    1
2020-05-17 05:00:00    2
2020-05-18 07:30:00    3
Freq: 1590T, dtype: int64
        
ts.between_time('2:00', '6:00')
2020-05-16 02:30:00    1
2020-05-17 05:00:00    2
Freq: 1590T, dtype: int64
        
ts.between_time('6:00', '2:00') # start_time later than end_time wraps past midnight, selecting the times outside 2:00-6:00
2020-05-15 00:00:00    0
2020-05-18 07:30:00    3
Freq: 4770T, dtype: int64
        
ts.at_time('2:30')
2020-05-16 02:30:00    1
Freq: 1590T, dtype: int64
```
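To make the contrast with between() concrete, here is a minimal sketch on a series shaped like the one above. The bounds passed to between() are arbitrary.

```python
import pandas as pd

idx = pd.date_range('2020-05-15', periods=4,
                    freq=pd.Timedelta(days=1, hours=2, minutes=30))
ts = pd.Series(range(4), index=idx)

# between() looks at the values and returns a boolean mask.
value_mask = ts.between(0, 1)

# between_time() looks at the time of day in the index and returns the rows.
by_clock = ts.between_time('2:00', '6:00')
```

The mask from between() can be used as `ts[value_mask]`, while between_time() already returns the filtered rows.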

## first(), truncate()

* first(offset): convenience method for selecting the initial period of a time series, with the length given as a date offset.

* truncate(before=None, after=None, ...): drops rows before and after the given index values. This is a general method not limited to date/time objects.

```python
ts = pd.DataFrame({'A': range(4)}, index=pd.date_range('2015-03-27', periods=4, freq='2D'))

ts.first('3D')
            A
2015-03-27  0
2015-03-29  1

ts.first(pd.offsets.MonthBegin(1))
            A
2015-03-27  0
2015-03-29  1
2015-03-31  2

ts.truncate('2015-03-29','2015-03-31')
            A
2015-03-29  1
2015-03-31  2
```

# Resampling


## resample()

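resample() works like a time-based groupby(): it bins the index by frequency, and an aggregation is then applied to each bin. The frequency can be given as a short string alias such as 'D' or 'Y', or as an object from the pd.offsets module. For example, resample('Y') is equivalent to resample(pd.offsets.YearEnd()).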


Upsampling:

* Increase the frequency of the samples, such as from months to days
* Fill in values for the new, finer-grained timestamps, for example by interpolation

```python
# ts is a Series 
ts.index.freq        # <MonthBegin>
ts.resample('D').interpolate(method='linear')            # Upsampling + Interpolation
ts.resample('D').interpolate(method='spline', order=2)
```
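A self-contained version of the linear case might look like the following sketch. The three monthly values are invented so that the interpolated series rises by exactly 1 per day.

```python
import pandas as pd

# Month-start observations: Jan 1, Feb 1, Mar 1 of 2020
idx = pd.date_range('2020-01-01', periods=3, freq=pd.offsets.MonthBegin())
ts = pd.Series([0.0, 31.0, 60.0], index=idx)

# Upsample to daily frequency, then fill the new rows by linear interpolation
daily = ts.resample('D').interpolate(method='linear')

mid_jan = daily[pd.Timestamp('2020-01-16')]
mid_feb = daily[pd.Timestamp('2020-02-15')]
```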


Downsampling:

* Decrease the frequency of the samples, such as from days to months
* Aggregate each bin with a summary statistic, such as the mean

```python
ts = pd.Series(range(60), index=pd.date_range('5/8/2015', periods=60, freq='S'))
ts
2015-05-08 00:00:00     0
2015-05-08 00:00:01     1
2015-05-08 00:00:02     2
        ...
2015-05-08 00:00:58    58
2015-05-08 00:00:59    59
Freq: S, dtype: int64
        
ts.resample('20S').mean()
2015-05-08 00:00:00     9.5
2015-05-08 00:00:20    29.5
2015-05-08 00:00:40    49.5
Freq: 20S, dtype: float64
        
ts.resample('20S', label='right').mean()
2015-05-08 00:00:20     9.5
2015-05-08 00:00:40    29.5
2015-05-08 00:01:00    49.5
Freq: 20S, dtype: float64
        
ts.resample('500L').asfreq()
2015-05-08 00:00:00.000     0.0
2015-05-08 00:00:00.500     NaN
2015-05-08 00:00:01.000     1.0
2015-05-08 00:00:01.500     NaN
        ...
2015-05-08 00:00:58.500     NaN
2015-05-08 00:00:59.000    59.0
Freq: 500L, Length: 119, dtype: float64
            
            
# Assume df is a DataFrame whose index is of type DatetimeIndex with freq='S'
df.resample('5T').mean()  # 5 min

df.resample('5T')['col1'].mean()
df.resample('5T')[['col1','col2']].mean()
df.resample('5T')['col1'].agg(['mean', 'std'])
df.resample('5T').agg({'col1': ['sum', 'mean'], 'col2': ['mean', 'std']})

df.resample('Q').mean()   # QuarterEnd
```




## rolling(), expanding()

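rolling() is a time-based window operation: it keeps the original index and computes one result per row from a trailing window. resample() is a frequency-based window operation: it builds a new index at the target frequency and aggregates each bin. expanding() uses a window that grows from the start of the series.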

```python
ts = pd.Series([1,5,2,8], index=pd.date_range('5/8/2015', periods=4, freq=pd.offsets.BDay()))
ts
2015-05-08    1
2015-05-11    5
2015-05-12    2
2015-05-13    8
Freq: B, dtype: int64
        
ts.rolling(2).mean()     # moving average
2015-05-08    NaN
2015-05-11    3.0        # (ts.iloc[0] + ts.iloc[1])/2
2015-05-12    3.5        # (ts.iloc[1] + ts.iloc[2])/2
2015-05-13    5.0        # (ts.iloc[2] + ts.iloc[3])/2
Freq: B, dtype: float64

ts.expanding(2).mean()   # cumulative mean; min_periods=2
2015-05-08         NaN
2015-05-11    3.000000   # (ts.iloc[0] + ts.iloc[1])/2
2015-05-12    2.666667   # (ts.iloc[0] + ts.iloc[1] + ts.iloc[2])/3
2015-05-13    4.000000   # (ts.iloc[0] + ts.iloc[1] + ts.iloc[2] + ts.iloc[3])/4
Freq: B, dtype: float64
        
# Here df is a DataFrame whose index is of type DatetimeIndex.
df.rolling(60).mean().plot()
```
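The difference between the two kinds of window shows up most clearly on an irregular index. The sketch below uses a '2D' offset for both; the dates and values are illustrative only.

```python
import pandas as pd

idx = pd.to_datetime(['2015-05-08', '2015-05-09', '2015-05-11', '2015-05-12'])
ts = pd.Series([1, 5, 2, 8], index=idx)

# rolling: one result per input row, window covers the 2 days up to that row
rolled = ts.rolling('2D').sum()

# resample: new regular index, each row aggregates one 2-day bin
binned = ts.resample('2D').sum()
```

Here `rolled` has the same four timestamps as `ts`, while `binned` has three rows starting at 2015-05-08, 2015-05-10 and 2015-05-12.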

# Grouping


## Using groupby() with pd.Grouper()

```python
s.head(3)
Date
1981-01-01     2.4
1981-01-02     5.0
1981-01-03    14.4
dtype: float64
        
s.groupby(pd.Grouper(freq='A')).mean().head(3)
Date
1981-12-31     9.978082
1982-12-31     9.932603
1983-12-31    10.267945
Freq: A-DEC, dtype: float64

s.groupby(pd.Grouper(freq='M')).mean().head(3)
Date
1981-01-31    10.729032
1981-02-28     9.221429
1981-03-31    10.154839
Freq: M, dtype: float64

        
# One column per year; assumes every year has the same number of observations
df = pd.DataFrame()
for name, gp in s.groupby(pd.Grouper(freq='A')):
    df[name.year] = gp.values

df.head(3)
   1981  1982  1983  ...
0   2.4   9.0  18.4  ...
1   5.0  16.8   2.4  ...
2  14.4   0.2  15.6  ...
```
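Since the series `s` above comes from data not shown here, the following sketch uses a small synthetic series instead. It also suggests that grouping with pd.Grouper(freq=...) gives the same monthly means as resample() with the same frequency.

```python
import pandas as pd

# Synthetic daily data: Jan 1 to Feb 29, 2020, values 0..59
idx = pd.date_range('2020-01-01', periods=60)
s = pd.Series(range(60), index=idx, dtype='float64')

month_end = pd.offsets.MonthEnd()
by_group = s.groupby(pd.Grouper(freq=month_end)).mean()
by_resample = s.resample(month_end).mean()
```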