# Working with Time Series

Pandas was developed in the context of financial modeling, so it contains an extensive set of tools for working with dates, times, and time-indexed data.

## Dates and Times in Python
### Native Python dates and times: `datetime` and `dateutil`

In [1]:
from datetime import datetime
datetime(year=2018, month=6, day=26)

datetime.datetime(2018, 6, 26, 0, 0)

In [2]:
from dateutil import parser
date = parser.parse('26th of July, 2018')
date

datetime.datetime(2018, 7, 26, 0, 0)

In [4]:
# Print day of week
date.strftime('%A')

'Thursday'

## Typed arrays of times: NumPy's `datetime64`

In [5]:
import numpy as np
date = np.array('2018-07-04', dtype=np.datetime64)
date

array('2018-07-04', dtype='datetime64[D]')

In [7]:
date + np.arange(12)

array(['2018-07-04', '2018-07-05', '2018-07-06', '2018-07-07',
       '2018-07-08', '2018-07-09', '2018-07-10', '2018-07-11',
       '2018-07-12', '2018-07-13', '2018-07-14', '2018-07-15'],
      dtype='datetime64[D]')

In [8]:
# Day-based datetime
np.datetime64('2018-06-18')

numpy.datetime64('2018-06-18')

In [10]:
# Minute-based datetime
np.datetime64('2018-06-06 12:00')

numpy.datetime64('2018-06-06T12:00')

In [11]:
# Nanosecond-based datetime
np.datetime64('2018-06-18 12:55:56.50', 'ns')

numpy.datetime64('2018-06-18T12:55:56.500000000')
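
As a minimal sketch of why the unit matters, subtracting two `datetime64` values yields a `timedelta64`, and mixing units promotes the result to the finer unit. The variable names `start`, `end`, and `noon` are illustrative and not part of the original notebook.

```python
import numpy as np

# Two day-resolution dates; their difference is a timedelta64 measured in days.
start = np.datetime64('2018-07-04')
end = np.datetime64('2018-07-15')
elapsed = end - start
print(elapsed)  # 11 days

# A minute-resolution value minus a day-resolution value
# is expressed in the finer unit (minutes).
noon = np.datetime64('2018-07-04 12:00')
half_day = noon - start
print(half_day)  # 720 minutes
```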

## Dates and times in pandas


In [12]:
import pandas as pd
date = pd.to_datetime("4th of July, 2018")
date

Timestamp('2018-07-04 00:00:00')

In [13]:
date.strftime('%A')

'Wednesday'

In [14]:
date + pd.to_timedelta(np.arange(12), 'D')

DatetimeIndex(['2018-07-04', '2018-07-05', '2018-07-06', '2018-07-07',
               '2018-07-08', '2018-07-09', '2018-07-10', '2018-07-11',
               '2018-07-12', '2018-07-13', '2018-07-14', '2018-07-15'],
              dtype='datetime64[ns]', freq=None)

## Pandas Time Series: Indexing by Time

The Pandas time series tools become most useful when you begin to _index data by timestamps_. For example, we can construct a `Series` object with time-indexed data:

In [15]:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
data

2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64

In [17]:
data['2014-07-04':'2015-07-05']

2014-07-04    0
2014-08-04    1
2015-07-04    2
dtype: int64

In [18]:
data['2015']

2015-07-04    2
2015-08-04    3
dtype: int64
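
The same idea extends to coarser date strings. The sketch below rebuilds the `data` series from above and uses `.loc` with partial date strings; the names `july_2015` and `span` are illustrative. It assumes a pandas version that supports partial string indexing on a `DatetimeIndex`, which has been available for a long time.

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)

# A year-month string selects every entry that falls in that month.
july_2015 = data.loc['2015-07']

# Slicing with partial strings includes both endpoints:
# from the start of August 2014 through the end of July 2015.
span = data.loc['2014-08':'2015-07']

print(july_2015)
print(span)
```

Using `.loc` makes it explicit that the selection is label-based rather than positional.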

## Pandas Time Series Data Structures

In [19]:
dates = pd.to_datetime([datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708'])
dates

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
               '2015-07-08'],
              dtype='datetime64[ns]', freq=None)

In [20]:
dates.to_period('D')

PeriodIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
             '2015-07-08'],
            dtype='period[D]', freq='D')

In [21]:
dates - dates[0]

TimedeltaIndex(['0 days', '1 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq=None)
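
Each of these index types has a scalar counterpart. As a small sketch (the names `stamp`, `period`, and `delta` are illustrative), pulling a single element out of each index shows the corresponding `Timestamp`, `Period`, and `Timedelta` objects:

```python
import pandas as pd

dates = pd.to_datetime(['2015-07-03', '2015-07-04', '2015-07-06'])

# One element of a DatetimeIndex is a Timestamp.
stamp = dates[0]

# One element of a PeriodIndex is a Period (here, a one-day span).
period = dates.to_period('D')[0]

# One element of a TimedeltaIndex is a Timedelta (a duration).
delta = (dates - dates[0])[1]

print(type(stamp).__name__, type(period).__name__, type(delta).__name__)
```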

## Regular sequence: `pd.date_range()`

In [22]:
pd.date_range('2015-07-03', '2015-07-10')

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [23]:
pd.date_range('2015-07-03', periods=8)

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [28]:
# Hourly frequency
pd.date_range('2015-07-03', periods=8, freq='H')

DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 01:00:00',
               '2015-07-03 02:00:00', '2015-07-03 03:00:00',
               '2015-07-03 04:00:00', '2015-07-03 05:00:00',
               '2015-07-03 06:00:00', '2015-07-03 07:00:00'],
              dtype='datetime64[ns]', freq='H')

In [25]:
# Monthly frequency, expressed as periods
pd.period_range('2015-07', periods=8, freq='M')

PeriodIndex(['2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12',
             '2016-01', '2016-02'],
            dtype='period[M]', freq='M')

In [26]:
# Durations increasing by one hour
pd.timedelta_range(0, periods=10, freq='H')

TimedeltaIndex(['00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00',
                '05:00:00', '06:00:00', '07:00:00', '08:00:00', '09:00:00'],
               dtype='timedelta64[ns]', freq='H')

In [29]:
pd.timedelta_range(0, periods=9, freq="2H30T")

TimedeltaIndex(['00:00:00', '02:30:00', '05:00:00', '07:30:00', '10:00:00',
                '12:30:00', '15:00:00', '17:30:00', '20:00:00'],
               dtype='timedelta64[ns]', freq='150T')
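
The string codes used above depend on the pandas version: releases from 2.2 onward deprecate the uppercase `H` and `T` aliases in favor of `h` and `min`. One version-independent alternative, sketched here with an illustrative variable name `steps`, is to pass a `pd.Timedelta` object as the frequency instead of a string:

```python
import pandas as pd

# A 2 hour 30 minute step, written without relying on any string alias.
step = pd.Timedelta(hours=2, minutes=30)

# Nine durations starting at zero, each 2h30m after the previous one.
steps = pd.timedelta_range(start='0 days', periods=9, freq=step)
print(steps)
```

This should produce the same nine durations as the `"2H30T"` call above, ending at 20 hours.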

In [31]:
from pandas.tseries.offsets import BDay
pd.date_range('2018-07-01', periods=5, freq=BDay())

DatetimeIndex(['2018-07-02', '2018-07-03', '2018-07-04', '2018-07-05',
               '2018-07-06'],
              dtype='datetime64[ns]', freq='B')
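
Because 2018-07-01 falls on a Sunday, the range above starts on the following Monday. The sketch below (variable names are illustrative) checks that the `'B'` string alias gives the same result as the `BDay()` offset object, and shows that adding `BDay()` to a Friday skips the weekend:

```python
import pandas as pd
from pandas.tseries.offsets import BDay

# The offset object and the 'B' alias describe the same business-day frequency.
with_offset = pd.date_range('2018-07-01', periods=5, freq=BDay())
with_alias = pd.date_range('2018-07-01', periods=5, freq='B')
print(with_offset.equals(with_alias))

# Offsets also work in arithmetic: one business day after
# Friday 2018-07-06 is Monday 2018-07-09.
next_day = pd.Timestamp('2018-07-06') + BDay()
print(next_day)
```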