# Working with Time Series

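Date and time data comes in three main forms:
- Time stamps: particular moments in time (e.g., July 4th, 2015 at 7:00 a.m.).
- Time intervals and periods: the length of time between a particular start point and end point (e.g., the year 2015).
- Time deltas or durations: an exact length of time (e.g., 22.56 seconds).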


## Dates and Times in Python

### Native Python dates and times: datetime and dateutil

In [1]:
from datetime import datetime 
datetime(year=1997, month=7, day=4)

datetime.datetime(1997, 7, 4, 0, 0)

In [2]:
from dateutil import parser
date = parser.parse("4th of July, 2015")
date

datetime.datetime(2015, 7, 4, 0, 0)

In [3]:
date.strftime('%A')

'Saturday'

https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior <br>
http://labix.org/python-dateutil
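When the layout of a date string is known in advance, the format codes from the first link can be used for parsing as well as for printing. The following is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Parse a string whose layout is known in advance (no guessing involved).
parsed = datetime.strptime("2015-07-04", "%Y-%m-%d")

# Format the same moment back out using the same format codes.
iso_text = parsed.strftime("%Y-%m-%d")

# weekday() counts from Monday = 0, so a Saturday is 5.
weekday_number = parsed.weekday()
print(parsed, iso_text, weekday_number)
```

Unlike dateutil's parser, strptime does not guess: a string that does not match the format raises a ValueError.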

### Typed arrays of times: NumPy's datetime64

The datetime64 dtype encodes dates as 64-bit integers, so arrays of dates can be represented compactly.

In [5]:
import numpy as np
date = np.array('2015-07-04', dtype=np.datetime64)
date

array('2015-07-04', dtype='datetime64[D]')

In [8]:
date + np.arange(12)

array(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
       '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
       '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
      dtype='datetime64[D]')
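Because the dates are stored as integers, subtraction and comparison are vectorized as well. A small sketch, with illustrative variable names:

```python
import numpy as np

start = np.datetime64('2015-07-04')
end = np.datetime64('2015-07-15')

# Subtracting two datetime64 values yields a timedelta64 in the shared unit (days).
gap = end - start

# Comparisons are elementwise: which of the 12 days fall after July 10?
days = start + np.arange(12)
late = days > np.datetime64('2015-07-10')
print(gap, late.sum())
```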

One detail of timedelta64 and datetime64 is that they are built on a *fundamental time unit*. Because a datetime64 is stored in 64 bits, the range of encodable times is 2^64 times this unit, so there is a trade-off between *time resolution* and *maximum time span*. With nanosecond resolution, for example, only 2^64 nanoseconds can be encoded, which is just under 600 years. NumPy infers the unit from the input, e.g.

In [9]:
np.datetime64('2015-07-04')

numpy.datetime64('2015-07-04')

In [10]:
np.datetime64('2015-07-04 12:00')

numpy.datetime64('2015-07-04T12:00')

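Note that datetime64 values are timezone naive (since NumPy 1.11): the string is stored as written rather than being interpreted in the computer's local time zone. The fundamental unit can also be forced explicitly: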

In [11]:
np.datetime64('2015-07-04 12:59:59.50', 'ns')

numpy.datetime64('2015-07-04T12:59:59.500000000')

| Code | Meaning | Time span (relative) | Time span (absolute) |
|------|---------|----------------------|----------------------|
| Y | Year | ± 9.2e18 years | [9.2e18 BC, 9.2e18 AD] |
| M | Month | ± 7.6e17 years | [7.6e17 BC, 7.6e17 AD] |
| W | Week | ± 1.7e17 years | [1.7e17 BC, 1.7e17 AD] |
| D | Day | ± 2.5e16 years | [2.5e16 BC, 2.5e16 AD] |
| h | Hour | ± 1.0e15 years | [1.0e15 BC, 1.0e15 AD] |
| m | Minute | ± 1.7e13 years | [1.7e13 BC, 1.7e13 AD] |
| s | Second | ± 2.9e11 years | [2.9e11 BC, 2.9e11 AD] |
| ms | Millisecond | ± 2.9e8 years | [2.9e8 BC, 2.9e8 AD] |
| us | Microsecond | ± 2.9e5 years | [290301 BC, 294241 AD] |
| ns | Nanosecond | ± 292 years | [1678 AD, 2262 AD] |
| ps | Picosecond | ± 106 days | [1969 AD, 1970 AD] |
| fs | Femtosecond | ± 2.6 hours | [1969 AD, 1970 AD] |
| as | Attosecond | ± 9.2 seconds | [1969 AD, 1970 AD] |
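The nanosecond figures can be checked with a little arithmetic. This is a rough sketch that assumes a mean Gregorian year of 365.2425 days; the constant and variable names are illustrative.

```python
import numpy as np

SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # assumption: mean Gregorian year

# 2^64 distinct nanosecond ticks cover just under 600 years in total...
full_span_years = 2**64 / 1e9 / SECONDS_PER_YEAR

# ...which is about 292 years on either side of the 1970 epoch.
half_span_years = 2**63 / 1e9 / SECONDS_PER_YEAR

# The largest signed 64-bit tick count, interpreted as nanoseconds since 1970.
latest = np.datetime64(np.iinfo(np.int64).max, 'ns')
print(round(full_span_years, 1), round(half_span_years, 1), latest)
```

The last line should land in the year 2262, matching the upper bound in the nanosecond row.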

### Dates and times in pandas

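pandas provides a Timestamp object, which combines the ease of use of datetime and dateutil with the compact storage and vectorized interface of numpy.datetime64. A group of Timestamp objects can be used to build a DatetimeIndex, which can index the data in a Series or DataFrame.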

In [12]:
import pandas as pd
date = pd.to_datetime("4th of July, 2015")
date

Timestamp('2015-07-04 00:00:00')

In [13]:
date.strftime("%A")

'Saturday'

In [15]:
date + pd.to_timedelta(np.arange(12), 'D')

DatetimeIndex(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
               '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
               '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
              dtype='datetime64[ns]', freq=None)
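The resulting DatetimeIndex also exposes vectorized date attributes. A brief sketch, starting from an ISO-formatted date rather than the free-form string above; variable names are illustrative.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp('2015-07-04')
dates = start + pd.to_timedelta(np.arange(12), unit='D')

# day_name() is the vectorized counterpart of strftime('%A').
weekday_names = dates.day_name()

# dayofweek counts from Monday = 0, so 5 and 6 are the weekend.
weekend = dates[dates.dayofweek >= 5]
print(list(weekday_names[:3]), len(weekend))
```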

## Indexing by Time

In [16]:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
data

2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64

We can make use of any of the indexing patterns from normal Series:

In [17]:
data['2014-07-04':'2015-07-04']

2014-07-04    0
2014-08-04    1
2015-07-04    2
dtype: int64

In [19]:
# partial-string indexing: passing just a year selects every entry from that year
data['2015']

2015-07-04    2
2015-08-04    3
dtype: int64
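Partial-string indexing is not limited to years. The sketch below rebuilds the same Series and uses .loc, which makes the label-based lookup explicit; the variable names are illustrative.

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)

# A year selects every entry in that year.
by_year = data.loc['2015']

# A year-month selects every entry in that month.
by_month = data.loc['2014-08']

# Partial strings also work as slice bounds; the end bound is inclusive.
window = data.loc['2014-08':'2015-07']
print(by_year, by_month, window, sep='\n')
```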

## Pandas Time Series Data Structures

- For time stamps, pandas provides the Timestamp type. The associated index structure is DatetimeIndex.
- For time periods, pandas provides the Period type. The associated index structure is PeriodIndex.
- For time deltas or durations, pandas provides the Timedelta type. The associated index structure is TimedeltaIndex.

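Timestamp and DatetimeIndex are the most important of these. pd.to_datetime() can parse a wide variety of formats. Note that pandas 2.0 and later infer a single format for the whole list by default, so a mixed-format list like the one below may need format='mixed'.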

In [21]:
dates = pd.to_datetime([datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708'])
dates

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
               '2015-07-08'],
              dtype='datetime64[ns]', freq=None)

to_period() converts a DatetimeIndex to a PeriodIndex, given a frequency code:

In [22]:
dates.to_period('D')

PeriodIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
             '2015-07-08'],
            dtype='period[D]', freq='D')
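A Period differs from a Timestamp in that it stands for a whole span rather than a single instant. A short sketch of that distinction, with illustrative variable names:

```python
import pandas as pd

day = pd.Period('2015-07-04', freq='D')

# A daily period covers the whole day, from its first instant to its last.
span_start = day.start_time
span_end = day.end_time

# Integer arithmetic moves by whole periods.
next_day = day + 1
print(span_start, span_end, next_day)
```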

A TimedeltaIndex is created, for example, when one date is subtracted from another:

In [23]:
dates - dates[0]

TimedeltaIndex(['0 days', '1 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq=None)
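Durations can be turned back into plain numbers or added to dates. The sketch below rebuilds the same five dates from ISO strings; the variable names are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(['2015-07-03', '2015-07-04', '2015-07-06',
                        '2015-07-07', '2015-07-08'])

# Subtracting a Timestamp from a DatetimeIndex gives a TimedeltaIndex.
elapsed = dates - dates[0]

# .days extracts the whole-day component of each duration.
whole_days = list(elapsed.days)

# Adding a Timedelta shifts every date by the same amount.
shifted = dates + pd.Timedelta(days=1)
print(whole_days, shifted[0])
```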

## Regular sequences: pd.date_range()

Like Python's range() or NumPy's arange(), pd.date_range() takes a start date, an end date (or a number of periods), and an optional frequency code. The frequency defaults to one day:

In [24]:
pd.date_range('2015-07-03', '2015-07-10')

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [25]:
pd.date_range('2015-07-03', periods=8)

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')
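The two calls above describe the same range in different ways, which can be confirmed directly. A minimal check, with illustrative variable names:

```python
import pandas as pd

# Same range, specified once by its endpoint and once by its length.
by_endpoint = pd.date_range('2015-07-03', '2015-07-10')
by_count = pd.date_range('2015-07-03', periods=8)

# equals() compares the two indexes element by element.
same = by_endpoint.equals(by_count)
print(same)
```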

In [27]:
pd.date_range('2015-07-03', periods=8, freq='H')

DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 01:00:00',
               '2015-07-03 02:00:00', '2015-07-03 03:00:00',
               '2015-07-03 04:00:00', '2015-07-03 05:00:00',
               '2015-07-03 06:00:00', '2015-07-03 07:00:00'],
              dtype='datetime64[ns]', freq='H')

pd.period_range() and pd.timedelta_range() build PeriodIndex and TimedeltaIndex objects in the same way:

In [28]:
pd.period_range('2015-07', periods=8, freq='M')

PeriodIndex(['2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12',
             '2016-01', '2016-02'],
            dtype='period[M]', freq='M')

In [29]:
pd.timedelta_range(0, periods=10, freq='H')

TimedeltaIndex(['00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00',
                '05:00:00', '06:00:00', '07:00:00', '08:00:00', '09:00:00'],
               dtype='timedelta64[ns]', freq='H')

## Frequencies and Offsets

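| Code | Description |
|------|-------------|
| D | Calendar day |
| B | Business day |
| W | Weekly |
| M | Month end |
| BM | Business month end |
| Q | Quarter end |
| BQ | Business quarter end |
| A | Year end |
| BA | Business year end |
| H | Hours |
| BH | Business hours |
| T | Minutes |
| S | Seconds |
| L | Milliseconds |
| U | Microseconds |
| N | Nanoseconds |

These are the codes used in these notes. pandas 2.2 and later deprecate several of them in favor of new spellings (for example h, min, s, ms, us, and ns for the sub-day codes, and ME, QE, and YE for the month-, quarter-, and year-end codes).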



| Code | Description |
|------|-------------|
| MS | Month start |
| BMS | Business month start |
| QS | Quarter start |
| BQS | Business quarter start |
| AS | Year start |
| BAS | Business year start |

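The month used to mark any quarterly or annual code can be changed by adding a three-letter month suffix:
- Q-JAN, BQ-FEB, QS-MAR etc
- A-JAN, BA-FEB, AS-MAR etc

Similarly, the split point of a weekly frequency can be changed by adding a three-letter day suffix:
- W-SUN, W-MON etc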
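As an illustration of the weekly suffix, the sketch below asks for weeks anchored on Mondays. The start date and variable name are illustrative choices.

```python
import pandas as pd

# 2015-07-01 is a Wednesday, so the range starts at the next Monday.
mondays = pd.date_range('2015-07-01', periods=3, freq='W-MON')

# dayofweek counts from Monday = 0.
print(mondays, list(mondays.dayofweek))
```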

Codes can also be combined with numbers to specify other frequencies; e.g. 2H30T means 2 hours 30 minutes:

In [35]:
pd.timedelta_range(0, periods=9, freq="2H30T")

TimedeltaIndex(['00:00:00', '02:30:00', '05:00:00', '07:30:00', '10:00:00',
                '12:30:00', '15:00:00', '17:30:00', '20:00:00'],
               dtype='timedelta64[ns]', freq='150T')
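The same spacing can be written without string codes by passing a Timedelta as the frequency, which sidesteps any differences in alias spelling between pandas versions. A sketch, with illustrative variable names:

```python
import pandas as pd

# 2 hours 30 minutes, expressed as a duration object instead of a string code.
step = pd.Timedelta(hours=2, minutes=30)

steps = pd.timedelta_range(start=pd.Timedelta(0), periods=9, freq=step)
print(steps)
```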

All of these codes correspond to offset objects in the pd.tseries.offsets module, which can also be used directly, e.g. for business days:


In [49]:
from pandas.tseries.offsets import BDay
pd.date_range('2015-07-01', periods=5, freq=BDay())

DatetimeIndex(['2015-07-01', '2015-07-02', '2015-07-03', '2015-07-06',
               '2015-07-07'],
              dtype='datetime64[ns]', freq='B')
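Offset objects also work in plain date arithmetic, not only as a freq argument. A small sketch, with illustrative variable names:

```python
import pandas as pd
from pandas.tseries.offsets import BDay

friday = pd.Timestamp('2015-07-03')

# Adding one business day skips the weekend and lands on Monday.
next_business_day = friday + BDay()

# Offsets take a multiplier: two business days after Friday is Tuesday.
two_later = friday + BDay(2)
print(next_business_day, two_later)
```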

## Resampling, Shifting and Windowing

This section is not covered in these notes.