# Working with Time Series
## Dates and Times in Python
### Native Python dates and times: datetime and dateutil

In [1]:
from datetime import datetime
datetime(year=2015, month=7, day=4)

datetime.datetime(2015, 7, 4, 0, 0)

In [4]:
from dateutil import parser
date = parser.parse("4th of July, 2015")

In [5]:
date.strftime('%A')

'Saturday'
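Native datetime objects also support arithmetic through datetime.timedelta. Here is a minimal sketch using the same July 4, 2015 date. The variable names are illustrative only.

```python
from datetime import datetime, timedelta

# Build a native datetime for July 4, 2015
independence_day = datetime(year=2015, month=7, day=4)

# Adding a timedelta shifts the datetime by a fixed duration
next_day = independence_day + timedelta(days=1)

print(next_day)                             # 2015-07-05 00:00:00
print(next_day.strftime('%A'))              # Sunday
print((next_day - independence_day).days)   # 1
```

This works one object at a time, which is exactly the limitation the typed NumPy arrays below address.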

### Typed arrays of times: NumPy's datetime64

In [6]:
import numpy as np
date = np.array('2015-07-04', dtype=np.datetime64)
date

array('2015-07-04', dtype='datetime64[D]')

In [7]:
date + np.arange(12)

array(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
       '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
       '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
      dtype='datetime64[D]')
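Because the array has a fixed datetime64 dtype, subtraction and comparisons are vectorized as well. This is a short sketch with illustrative variable names:

```python
import numpy as np

start = np.datetime64('2015-07-04')
dates = start + np.arange(12)          # twelve consecutive days, dtype datetime64[D]

# Subtracting two datetime64 values yields a timedelta64 in the shared unit
span = dates[-1] - dates[0]
print(span)                            # 11 days

# Comparisons are vectorized too: keep only the dates after July 10
later = dates[dates > np.datetime64('2015-07-10')]
print(later)
```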

In [8]:
np.datetime64('2015-07-04')

numpy.datetime64('2015-07-04')

In [9]:
np.datetime64('2015-07-04 12:00')

numpy.datetime64('2015-07-04T12:00')
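NumPy infers the time unit from the input string, and the unit can also be forced with a second argument. A minimal sketch that inspects the unit with np.datetime_data:

```python
import numpy as np

# The unit is inferred from the precision of the input string...
print(np.datetime_data(np.datetime64('2015-07-04').dtype))        # ('D', 1)
print(np.datetime_data(np.datetime64('2015-07-04 12:00').dtype))  # ('m', 1)

# ...or requested explicitly, here nanoseconds
t = np.datetime64('2015-07-04 12:59:59.50', 'ns')
print(t)                               # 2015-07-04T12:59:59.500000000
print(np.datetime_data(t.dtype))       # ('ns', 1)
```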

Description of date and time codes

| Code | Meaning | Time span (relative) | Time span (absolute) |
| --- | --- | --- | --- |
| Y | Year | ± 9.2e18 years | [9.2e18 BC, 9.2e18 AD] |
| M | Month | ± 7.6e17 years | [7.6e17 BC, 7.6e17 AD] |
| W | Week | ± 1.7e17 years | [1.7e17 BC, 1.7e17 AD] |
| D | Day | ± 2.5e16 years | [2.5e16 BC, 2.5e16 AD] |
| h | Hour | ± 1.0e15 years | [1.0e15 BC, 1.0e15 AD] |
| m | Minute | ± 1.7e13 years | [1.7e13 BC, 1.7e13 AD] |
| s | Second | ± 2.9e11 years | [2.9e11 BC, 2.9e11 AD] |
| ms | Millisecond | ± 2.9e8 years | [2.9e8 BC, 2.9e8 AD] |
| us | Microsecond | ± 2.9e5 years | [290301 BC, 294241 AD] |
| ns | Nanosecond | ± 292 years | [1678 AD, 2262 AD] |
| ps | Picosecond | ± 106 days | [1969 AD, 1970 AD] |
| fs | Femtosecond | ± 2.6 hours | [1969 AD, 1970 AD] |
| as | Attosecond | ± 9.2 seconds | [1969 AD, 1970 AD] |

For the types of data we see in the real world, a useful default is datetime64[ns], as it can encode the full range of modern dates (roughly 1678 to 2262 AD) with nanosecond precision.
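The spans in the table follow from datetime64 storing a signed 64-bit count of ticks since 1970-01-01. The sketch below reproduces a few rows. It assumes a 365.25-day year, so the figures are approximate.

```python
import numpy as np

# Largest tick count a signed 64-bit integer can hold (about 9.2e18)
max_ticks = np.iinfo(np.int64).max

seconds_per_year = 365.25 * 24 * 3600   # Julian year, an approximation
spans = {}
for unit, seconds_per_tick in [('s', 1.0), ('ms', 1e-3), ('us', 1e-6), ('ns', 1e-9)]:
    spans[unit] = max_ticks * seconds_per_tick / seconds_per_year
    print(f"{unit:>2}: +/- {spans[unit]:.3g} years")

# The latest representable nanosecond timestamp
print(np.datetime64(max_ticks, 'ns'))
```

For nanoseconds this gives roughly ±292 years around 1970, which is where the 1678 to 2262 window comes from.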

### Dates and times in Pandas: Best of both worlds

In [10]:
import pandas as pd
date = pd.to_datetime("4th of July, 2015")
date

Timestamp('2015-07-04 00:00:00')

In [11]:
date.strftime('%A')

'Saturday'

In [12]:
date + pd.to_timedelta(np.arange(12), 'D')

DatetimeIndex(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
               '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
               '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'],
              dtype='datetime64[ns]', freq=None)
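The resulting DatetimeIndex exposes vectorized versions of the Timestamp accessors. This sketch builds one week starting on the same date, using pd.Timestamp directly instead of parsing the English string.

```python
import numpy as np
import pandas as pd

date = pd.Timestamp('2015-07-04')
week = date + pd.to_timedelta(np.arange(7), 'D')

# Vectorized counterparts of Timestamp.strftime('%A') and friends
print(week.day_name())   # Saturday, Sunday, Monday, ...
print(week.dayofweek)    # Monday=0 ... Sunday=6
```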

## Pandas Time Series: Indexing by Time

In [14]:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04','2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
data

2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64

In [15]:
data['2014-07-04':'2015-07-04']

2014-07-04    0
2014-08-04    1
2015-07-04    2
dtype: int64

In [16]:
data['2015']

2015-07-04    2
2015-08-04    3
dtype: int64
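The same selections can be written with .loc, which makes label-based (and partial-string) indexing explicit. The index attributes can also be used as boolean masks. This is a short sketch on the same Series:

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04', '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)

print(data.loc['2014'])       # both 2014 rows
print(data.loc['2014-08':])   # everything from August 2014 onward

# Datetime attributes of the index work as boolean masks
july = data[data.index.month == 7]
print(july)                   # the two July entries
```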

## Pandas Time Series Data Structures

- For *time stamps*, Pandas provides the Timestamp type. As mentioned before, it is essentially a replacement for Python's native datetime, but is based on the more efficient numpy.datetime64 data type. The associated index structure is DatetimeIndex.
- For *time periods*, Pandas provides the Period type. This encodes a fixed-frequency interval based on numpy.datetime64. The associated index structure is PeriodIndex.
- For *time deltas* or *durations*, Pandas provides the Timedelta type. Timedelta is a more efficient replacement for Python's native datetime.timedelta type, and is based on numpy.timedelta64. The associated index structure is TimedeltaIndex.
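The three scalar types can be compared side by side in a small sketch with illustrative names: a Timestamp is a single instant, a Period is a whole interval, and a Timedelta is a duration.

```python
import pandas as pd

stamp = pd.Timestamp('2015-07-04')        # a single instant
period = pd.Period('2015-07', freq='M')   # the whole month of July 2015
delta = pd.Timedelta(days=1, hours=2)     # a duration of 26 hours

# A Period knows its boundaries, so it can tell whether an instant falls inside it
print(period.start_time <= stamp <= period.end_time)   # True
print(stamp.to_period('M') == period)                  # True

# Timestamp + Timedelta gives another Timestamp
print(stamp + delta)                                   # 2015-07-05 02:00:00
print(delta.total_seconds())                           # 93600.0
```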

The pd.to_datetime() function parses a wide variety of date formats. Passing a single date to pd.to_datetime() yields a Timestamp; passing a sequence of dates yields a DatetimeIndex by default:

In [17]:
dates = pd.to_datetime([datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708'])
dates

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
               '2015-07-08'],
              dtype='datetime64[ns]', freq=None)
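The string '07-07-2015' above happens to be unambiguous because day and month are equal. In general, an all-numeric date string can be read either way. This sketch shows pandas' default month-first reading and two ways to override it:

```python
import pandas as pd

# Ambiguous input: read month-first by default
default = pd.to_datetime('07-08-2015')
print(default)                       # 2015-07-08 00:00:00

# dayfirst=True flips the interpretation
day_first = pd.to_datetime('07-08-2015', dayfirst=True)
print(day_first)                     # 2015-08-07 00:00:00

# An explicit format string removes the guesswork entirely
explicit = pd.to_datetime('07-08-2015', format='%d-%m-%Y')
print(explicit)                      # 2015-08-07 00:00:00
```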

In [18]:
# Any DatetimeIndex can be converted to a PeriodIndex with the to_period() method,
# given a frequency code ('D' for daily here)
dates.to_period('D')

PeriodIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
             '2015-07-08'],
            dtype='period[D]', freq='D')
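With a coarser frequency code, to_period() maps each timestamp to the period that contains it. This is a minimal sketch using the same five dates:

```python
import pandas as pd

dates = pd.to_datetime(['2015-07-03', '2015-07-04', '2015-07-06',
                        '2015-07-07', '2015-07-08'])

# Every date falls in the same monthly period
monthly = dates.to_period('M')
print(monthly)
print(monthly.nunique())             # 1

# to_timestamp() converts periods back to timestamps (start of each period by default)
print(monthly.to_timestamp()[0])     # 2015-07-01 00:00:00
```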

In [19]:
dates - dates[0]

TimedeltaIndex(['0 days', '1 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq=None)
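A TimedeltaIndex has its own accessors, and adding it back to the reference date recovers the original timestamps. A short sketch:

```python
import pandas as pd

dates = pd.to_datetime(['2015-07-03', '2015-07-04', '2015-07-06',
                        '2015-07-07', '2015-07-08'])
offsets = dates - dates[0]

print(offsets.days)                          # integer day counts: 0, 1, 3, 4, 5
print(offsets / pd.Timedelta(hours=1))       # the same durations in hours

# Reference date + durations gives back the original index
print((dates[0] + offsets).equals(dates))    # True
```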

### Regular sequences: pd.date_range()
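To make the creation of regular sequences more convenient, Pandas offers one function for each kind of time series object:

- pd.date_range() for timestamps
- pd.period_range() for periods
- pd.timedelta_range() for time deltas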

In [20]:
pd.date_range('2015-07-03', '2015-07-10')

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [21]:
pd.date_range('2015-07-03', periods=8)

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

In [22]:
pd.date_range('2015-07-03', periods=8, freq='H')

DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 01:00:00',
               '2015-07-03 02:00:00', '2015-07-03 03:00:00',
               '2015-07-03 04:00:00', '2015-07-03 05:00:00',
               '2015-07-03 06:00:00', '2015-07-03 07:00:00'],
              dtype='datetime64[ns]', freq='H')

In [23]:
pd.period_range('2015-07', periods=8, freq='M')

PeriodIndex(['2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12',
             '2016-01', '2016-02'],
            dtype='period[M]', freq='M')

In [24]:
pd.timedelta_range(0, periods=10, freq='H')

TimedeltaIndex(['00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00',
                '05:00:00', '06:00:00', '07:00:00', '08:00:00', '09:00:00'],
               dtype='timedelta64[ns]', freq='H')

## Frequencies and Offsets

Listing of Pandas frequency codes

| Code | Description | Code | Description |
| --- | --- | --- | --- |
| D | Calendar day | B | Business day |
| W | Weekly | | |
| M | Month end | BM | Business month end |
| Q | Quarter end | BQ | Business quarter end |
| A | Year end | BA | Business year end |
| H | Hours | BH | Business hours |
| T | Minutes | | |
| S | Seconds | | |
| L | Milliseconds | | |
| U | Microseconds | | |
| N | Nanoseconds | | |

The monthly, quarterly, and annual frequencies are all marked at the end of the specified period. Adding an S suffix to any of these marks it instead at the beginning.

| Code | Description |
| --- | --- |
| MS | Month start |
| BMS | Business month start |
| QS | Quarter start |
| BQS | Business quarter start |
| AS | Year start |
| BAS | Business year start |
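The difference between end-anchored and start-anchored codes shows up directly in the generated dates. Recent pandas releases (2.2 and later) spell the month-end alias 'ME' instead of 'M'. To avoid depending on either spelling, this sketch passes the pd.offsets.MonthEnd object for the end-anchored case.

```python
import pandas as pd

# End-anchored: the last day of each month
month_ends = pd.date_range('2015-01-01', periods=3, freq=pd.offsets.MonthEnd())
print(month_ends)     # 2015-01-31, 2015-02-28, 2015-03-31

# The S suffix moves the anchor to the first day of each month
month_starts = pd.date_range('2015-01-01', periods=3, freq='MS')
print(month_starts)   # 2015-01-01, 2015-02-01, 2015-03-01
```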

Additionally, you can change the month used to mark any quarterly or annual code by adding a three-letter month code as a suffix:

- Q-JAN, BQ-FEB, QS-MAR, BQS-APR, etc.
- A-JAN, BA-FEB, AS-MAR, BAS-APR, etc.

In the same way, you can modify the split-point of the weekly frequency by adding a three-letter weekday code:

- W-SUN, W-MON, W-TUE, W-WED, etc.
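Both kinds of suffix can be seen in a small sketch: quarter starts anchored on February, and a weekly frequency split on Mondays. July 1, 2015 was a Wednesday, so the first Monday generated is July 6.

```python
import pandas as pd

# Quarters anchored on February start in Feb, May, Aug and Nov
quarters = pd.date_range('2015-01-01', periods=4, freq='QS-FEB')
print(quarters)

# Weekly frequency anchored on Mondays
mondays = pd.date_range('2015-07-01', periods=3, freq='W-MON')
print(mondays)        # 2015-07-06, 2015-07-13, 2015-07-20
```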

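Codes can also be combined with numbers to specify other frequencies, such as an interval of 2 hours 30 minutes:

In [25]: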
pd.timedelta_range(0, periods=9, freq="2H30T")

TimedeltaIndex(['00:00:00', '02:30:00', '05:00:00', '07:30:00', '10:00:00',
                '12:30:00', '15:00:00', '17:30:00', '20:00:00'],
               dtype='timedelta64[ns]', freq='150T')

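Frequencies can also be given as offset objects from the pd.tseries.offsets module, such as a business-day offset:

In [26]: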
from pandas.tseries.offsets import BDay
pd.date_range('2015-07-01', periods=5, freq=BDay())

DatetimeIndex(['2015-07-01', '2015-07-02', '2015-07-03', '2015-07-06',
               '2015-07-07'],
              dtype='datetime64[ns]', freq='B')
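BDay skips weekends only. That is why 2015-07-03 appears above, even though it was the observed US Independence Day holiday. For holiday-aware business days, pandas provides CustomBusinessDay together with a holiday calendar. This sketch assumes the US federal calendar is the one you want.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import BDay, CustomBusinessDay

# Friday July 3 plus one business day lands on Monday July 6
print(pd.Timestamp('2015-07-03') + BDay(1))

# A holiday-aware offset also skips the observed holiday on Friday July 3, 2015
us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())
workdays = pd.date_range('2015-07-01', periods=5, freq=us_bday)
print(workdays)       # 07-01, 07-02, 07-06, 07-07, 07-08
```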