# Working with Time Series

[Book](https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html)

## Dates and times in Python

`datetime` and `dateutil`

In [1]:
from datetime import datetime
datetime(year=2015, month=7, day=4)

datetime.datetime(2015, 7, 4, 0, 0)

In [2]:
datetime(2015, 7, 4)

datetime.datetime(2015, 7, 4, 0, 0)

In [3]:
from dateutil import parser
date = parser.parse("4th of July, 2015")
date

datetime.datetime(2015, 7, 4, 0, 0)

[`strftime` formats](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)

In [4]:
date.strftime('%A')

'Saturday'
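As a small sketch of how the format directives linked above work in both directions, the following uses only the standard library. `strftime` turns a `datetime` into text, and `strptime` parses text back using the same directives. The variable names are illustrative only.

```python
from datetime import datetime

date = datetime(2015, 7, 4)

# Format a datetime as text using strftime directives
weekday = date.strftime('%A')          # full weekday name (locale dependent)
iso_like = date.strftime('%Y-%m-%d')   # zero-padded year-month-day

# Parse text back into a datetime with the matching directives
parsed = datetime.strptime('2015-07-04', '%Y-%m-%d')

print(weekday, iso_like, parsed == date)
```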

## NumPy

`datetime64`

In [5]:
import numpy as np
date = np.array('2015-07-04', dtype=np.datetime64)
date

array('2015-07-04', dtype='datetime64[D]')

In [6]:
date + np.arange(12)

array(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
       '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
       '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
      dtype='datetime64[D]')

In [7]:
np.arange(12)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

## Pandas

`Timestamp`, `DatetimeIndex`

In [8]:
import pandas as pd
date = pd.to_datetime("4th of July, 2015")
date

Timestamp('2015-07-04 00:00:00')

In [9]:
date.strftime('%A')

'Saturday'

In [10]:
date + pd.to_timedelta(np.arange(12), 'D')

DatetimeIndex(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
               '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
               '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
              dtype='datetime64[ns]', freq=None)

## Indexing by Time

### Construct series with time index

In [11]:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
data

2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64

In [12]:
data['2014-07-04':'2015-07-04']

2014-07-04    0
2014-08-04    1
2015-07-04    2
dtype: int64

In [13]:
data['2015']

2015-07-04    2
2015-08-04    3
dtype: int64

## Pandas Time Series Data Structures

- `Timestamp`, `DatetimeIndex`
- `Period`, `PeriodIndex`
- `Timedelta`, `TimedeltaIndex`

`pd.to_datetime()`: given a single date, it returns a `Timestamp`; given a list of dates, it returns a `DatetimeIndex`.
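To make the distinction between the three kinds of objects concrete, here is a minimal sketch of their scalar forms. The dates are arbitrary and chosen only for illustration.

```python
import pandas as pd

ts = pd.Timestamp('2015-07-04')           # a single moment in time
period = pd.Period('2015-07', freq='M')   # a span: the whole month of July 2015
delta = pd.Timestamp('2015-07-06') - ts   # a duration between two timestamps

print(type(ts).__name__, type(period).__name__, type(delta).__name__)
print(period.start_time, period.end_time)
print(delta)
```

A `Period` has a start and an end, while a `Timestamp` is a single point; subtracting two timestamps gives a `Timedelta`.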

In [14]:
dates = pd.to_datetime([datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708'])
dates

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
               '2015-07-08'],
              dtype='datetime64[ns]', freq=None)

In [15]:
# Convert DatetimeIndex to PeriodIndex
dates.to_period('D')

PeriodIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
             '2015-07-08'],
            dtype='period[D]')

In [16]:
# Subtracting a date from a DatetimeIndex creates a TimedeltaIndex
dates - dates[0]

TimedeltaIndex(['0 days', '1 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq=None)

## Regular sequences

- `pd.date_range()` for timestamps
- `pd.period_range()` for periods
- `pd.timedelta_range()` for time deltas

In [17]:
pd.date_range('2015-07-03', '2015-07-10')

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [18]:
pd.date_range('2015-07-03', periods=8)

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [19]:
pd.date_range('2015-07-03', periods=8, freq='H')

DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 01:00:00',
               '2015-07-03 02:00:00', '2015-07-03 03:00:00',
               '2015-07-03 04:00:00', '2015-07-03 05:00:00',
               '2015-07-03 06:00:00', '2015-07-03 07:00:00'],
              dtype='datetime64[ns]', freq='H')

In [20]:
pd.period_range('2015-07', periods=8, freq='M')

PeriodIndex(['2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12',
             '2016-01', '2016-02'],
            dtype='period[M]')

In [21]:
pd.timedelta_range(0, periods=10, freq='H')

TimedeltaIndex(['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                '0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
                '0 days 09:00:00'],
               dtype='timedelta64[ns]', freq='H')


## Frequencies and Offsets

Fundamental to these Pandas time series tools is the concept of a frequency or date offset. Just as we saw the `D` (day) and `H` (hour) codes above, we can use such codes to specify any desired frequency spacing. The following table summarizes the main codes available:

|Code|Description|Code|Description|
|---|---|---|---|
|D|Calendar day|B|Business day|
|W|Weekly|||
|M|Month end|BM|Business month end|
|Q|Quarter end|BQ|Business quarter end|
|A|Year end|BA|Business year end|
|H|Hours|BH|Business hours|
|T|Minutes|||
|S|Seconds|||
|L|Milliseconds|||
|U|Microseconds|||
|N|Nanoseconds|||

The monthly, quarterly, and annual frequencies are all marked at the end of the specified period. By adding an `S` suffix to any of these, they instead will be marked at the beginning:

|Code|Description|Code|Description|
|---|---|---|---|
|MS|Month start|BMS|Business month start|
|QS|Quarter start|BQS|Business quarter start|
|AS|Year start|BAS|Business year start|
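As a quick sketch of the difference between start and end markers, the following compares month starts with month ends. The month-end alias is spelled `M` in the table above but `ME` from pandas 2.2 onward, so this example passes an offset object (`pd.offsets.MonthEnd()`) for the end-of-month case to stay independent of the alias spelling.

```python
import pandas as pd

# 'MS' marks the first calendar day of each month
starts = pd.date_range('2015-01-01', periods=3, freq='MS')

# An offset object avoids version-specific alias spellings for month end
ends = pd.date_range('2015-01-01', periods=3, freq=pd.offsets.MonthEnd())

print(starts)
print(ends)
```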

Additionally, you can change the month used to mark any quarterly or annual code by adding a three-letter month code as a suffix:

- Q-JAN, BQ-FEB, QS-MAR, BQS-APR, etc.
- A-JAN, BA-FEB, AS-MAR, BAS-APR, etc.
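For instance, a quarter-start frequency anchored with a month suffix can be sketched as follows. The `QS-FEB` code is chosen here only as an illustration; it places quarter starts in February, May, August, and November.

```python
import pandas as pd

# Quarter starts in a cycle that includes February
quarters = pd.date_range('2015-01-01', periods=4, freq='QS-FEB')

print(quarters)
```

Because the start date 2015-01-01 is not itself a quarter start under this code, the range begins at the next one, 2015-02-01.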

In the same way, the split-point of the weekly frequency can be modified by adding a three-letter weekday code:

- W-SUN, W-MON, W-TUE, W-WED, etc.
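A minimal sketch of the weekday anchor: plain `W` marks Sundays, while `W-MON` moves the mark to Mondays. The start date is arbitrary.

```python
import pandas as pd

# Default weekly frequency is anchored on Sunday
sundays = pd.date_range('2015-07-01', periods=3, freq='W')

# The weekday suffix moves the anchor to Monday
mondays = pd.date_range('2015-07-01', periods=3, freq='W-MON')

print(sundays.day_name().tolist())
print(mondays.day_name().tolist())
```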

On top of this, codes can be combined with numbers to specify other frequencies. For example, for a frequency of 2 hours 30 minutes, we can combine the hour (H) and minute (T) codes as follows:


In [22]:
pd.timedelta_range(0, periods=9, freq="2H30T")

TimedeltaIndex(['0 days 00:00:00', '0 days 02:30:00', '0 days 05:00:00',
                '0 days 07:30:00', '0 days 10:00:00', '0 days 12:30:00',
                '0 days 15:00:00', '0 days 17:30:00', '0 days 20:00:00'],
               dtype='timedelta64[ns]', freq='150T')

In [23]:
from pandas.tseries.offsets import BDay
pd.date_range('2015-07-01', periods=15, freq=BDay())

DatetimeIndex(['2015-07-01', '2015-07-02', '2015-07-03', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10',
               '2015-07-13', '2015-07-14', '2015-07-15', '2015-07-16',
               '2015-07-17', '2015-07-20', '2015-07-21'],
              dtype='datetime64[ns]', freq='B')

## Resampling, shifting and windowing

In [24]:
from frozendict import frozendict
from pandas_datareader import data as pdr

import yfinance as yfin


yfin.pdr_override()

spy = pdr.get_data_yahoo('SPY', start='2022-10-24', end='2022-12-23')

print(spy)

[*********************100%***********************]  1 of 1 completed
                  Open        High         Low       Close   Adj Close  \
Date                                                                     
2022-10-24  375.890015  380.059998  373.109985  378.869995  370.341309   
2022-10-25  378.790009  385.250000  378.670013  384.920013  376.255157   
2022-10-26  381.619995  387.579987  381.350006  382.019989  373.420380   
2022-10-27  383.070007  385.000000  379.329987  379.980011  371.426361   
2022-10-28  379.869995  389.519989  379.679993  389.019989  380.262817   
2022-10-31  386.440002  388.399994  385.260010  386.209991  377.516113   
2022-11-01  390.140015  390.390015  383.290009  384.519989  375.864166   
2022-11-02  383.899994  388.630005  374.760010  374.869995  366.431366   
2022-11-03  371.470001  374.200012  368.790009  371.010010  362.658295   
2022-11-04  377.000000  378.869995  370.000000  376.350006  367.878052   
2022-11-07  377.709991  380.570007  375.529