# Time Series / Date functionality



## Background

pandas has proven very successful as a tool for working with [time series data](http://pandas.pydata.org/pandas-docs/stable/timeseries.html), especially in the financial data analysis space. Using the NumPy `datetime64` and `timedelta64` dtypes, we have consolidated a large number of features from other Python libraries like `scikits.timeseries` as well as created a tremendous amount of new functionality for manipulating time series data.

In working with time series data, we will frequently seek to:
* generate sequences of fixed-frequency dates and time spans
* conform or convert time series to a particular frequency
* compute “relative” dates based on various non-standard time increments (e.g. 5 business days before the last business day of the year), or “roll” dates forward or backward

pandas provides a relatively compact and self-contained set of tools for performing the above tasks.



# Learning Outcomes
* Introduction
* Overview
* Time Stamps vs. Time Spans
* Converting to Timestamps
* Generating Ranges of Timestamps
* Timestamp limitations
* DatetimeIndex
* DateOffset objects
* Time series-related instance methods
* Resampling
* Time Span Representation
* Converting between Representations
* Representing out-of-bounds spans
* Time Zone Handling

# Introduction

In [1]:
import pandas as pd
import numpy as np
print("Pandas version : {}".format(pd.__version__))
print("Numpy version : {}".format(np.__version__))

Pandas version : 0.22.0
Numpy version : 1.14.3


Create a range of dates:

In [2]:
rng = pd.date_range('1/1/2011', periods=72, freq='H')
rng[:5]

DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
               '2011-01-01 02:00:00', '2011-01-01 03:00:00',
               '2011-01-01 04:00:00'],
              dtype='datetime64[ns]', freq='H')

Index pandas objects with dates:

In [3]:
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts[:5]

2011-01-01 00:00:00   -0.330390
2011-01-01 01:00:00   -0.377608
2011-01-01 02:00:00   -1.266870
2011-01-01 03:00:00    1.598714
2011-01-01 04:00:00   -2.795055
Freq: H, dtype: float64

Change frequency and fill gaps:

In [4]:
# to 45 minute frequency and forward fill
converted = ts.asfreq('45Min', method='pad')
converted.head()

2011-01-01 00:00:00   -0.330390
2011-01-01 00:45:00   -0.330390
2011-01-01 01:30:00   -0.377608
2011-01-01 02:15:00   -1.266870
2011-01-01 03:00:00    1.598714
Freq: 45T, dtype: float64

Resample:

In [5]:
# Daily means
ts.resample('D').mean()

2011-01-01    0.116912
2011-01-02   -0.571520
2011-01-03    0.115185
Freq: D, dtype: float64
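
The sketch below makes the difference between `asfreq` and `resample` concrete. It replaces the random data with the predictable values 0 to 71, purely so the results can be checked by hand. It also passes `pd.offsets.Hour()` instead of the `'H'` alias, because some string aliases have been renamed in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Hourly series with predictable values 0.0, 1.0, ..., 71.0 (three full days)
rng = pd.date_range('2011-01-01', periods=72, freq=pd.offsets.Hour())
ts = pd.Series(np.arange(72, dtype=float), index=rng)

# asfreq only samples the existing value at each new timestamp (midnight here)
sampled = ts.asfreq('D')

# resample aggregates every observation that falls inside each daily bin
daily_mean = ts.resample('D').mean()

# For daily bins this matches grouping on the calendar date of each timestamp
by_date = ts.groupby(ts.index.normalize()).mean()

print(sampled.tolist())     # [0.0, 24.0, 48.0]
print(daily_mean.tolist())  # [11.5, 35.5, 59.5]
```

`asfreq` keeps one existing observation per new timestamp, while `resample` combines all observations in each bin.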

# Overview

| Class	| Remarks	| How to create| 
| - | - | - |
| Timestamp| 	Represents a single time stamp	| to_datetime, Timestamp
| DatetimeIndex	| Index of Timestamp	| to_datetime, date_range, DatetimeIndex
| Period| 	Represents a single time span	| Period
| PeriodIndex| 	Index of Period	| period_range, PeriodIndex
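
The four classes in the table can be created side by side as shown below. This is a sketch with arbitrary dates. It also shows `period_range`, which the table lists but which is not otherwise demonstrated here.

```python
import pandas as pd

# Timestamp: a single point in time
stamp = pd.Timestamp('2012-05-01 09:30')

# DatetimeIndex: an index of Timestamps (three consecutive days)
dt_index = pd.date_range('2012-05-01', periods=3, freq='D')

# Period: a single span of time (the whole of May 2012)
period = pd.Period('2012-05', freq='M')

# PeriodIndex: an index of Periods (three consecutive months)
p_index = pd.period_range('2012-05', periods=3, freq='M')

# A point in time can be tested against the boundaries of a span
inside = period.start_time <= stamp <= period.end_time
```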

# Time Stamps vs. Time Spans

Time-stamped data is the most basic type of time series data: it associates values with points in time. For pandas objects, this means labelling the values with those points in time, represented as `Timestamp` objects.

In [6]:
import datetime

In [7]:
pd.Timestamp(datetime.datetime(2012, 5, 1))

Timestamp('2012-05-01 00:00:00')

In [8]:
pd.Timestamp('2012-05-01')

Timestamp('2012-05-01 00:00:00')

While points in time are useful, it is often more natural to associate values with a span of time, such as a day or a month. pandas provides the `Period` class for this. You can state the frequency explicitly or let `Period` infer it from the format of the date string.

#### by inference

In [9]:
pd.Period('2011-01')

Period('2011-01', 'M')

#### state explicitly

In [10]:
pd.Period('2012-05', freq='D')

Period('2012-05-01', 'D')
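
A `Period` covers a whole span rather than an instant. The sketch below, with arbitrary example dates, shows the inferred and explicit frequencies and the span boundaries exposed by `start_time` and `end_time`:

```python
import pandas as pd

# Frequency inferred from the string: '2011-01' is read as a monthly period
monthly = pd.Period('2011-01')

# Frequency stated explicitly: a similar string pinned to a single day
daily = pd.Period('2012-05', freq='D')

# A Period is bounded by the first and last instants of its span
first_instant = monthly.start_time
last_instant = monthly.end_time

# Any timestamp between those boundaries belongs to the period
mid_month = pd.Timestamp('2011-01-15 12:00')
contained = first_instant <= mid_month <= last_instant
```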

`Timestamp` and `Period` objects can serve as the index of a `Series` or `DataFrame`. Lists of `Timestamp` and `Period` objects are **automatically coerced** to `DatetimeIndex` and `PeriodIndex`, respectively.

#### `Timestamp` being coerced into `DatetimeIndex`

In [11]:
dates = [pd.Timestamp('2012-05-01'), pd.Timestamp('2012-05-02'), pd.Timestamp('2012-05-03')]

In [12]:
dates

[Timestamp('2012-05-01 00:00:00'),
 Timestamp('2012-05-02 00:00:00'),
 Timestamp('2012-05-03 00:00:00')]

In [13]:
ts = pd.Series(np.random.randn(3), dates)

In [14]:
type(ts.index)

pandas.core.indexes.datetimes.DatetimeIndex

Note that the list of `Timestamp` objects has been coerced into a `DatetimeIndex`.

In [15]:
ts.index

DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

In [16]:
ts

2012-05-01   -0.155346
2012-05-02    0.590148
2012-05-03    0.869051
dtype: float64

#### `Period` being coerced into `PeriodIndex`

In [17]:
periods = [pd.Period('2012-01'), pd.Period('2012-02'), pd.Period('2012-03')]

In [18]:
ts = pd.Series(np.random.randn(3), periods)

In [19]:
type(ts.index)

pandas.core.indexes.period.PeriodIndex

In [20]:
ts.index

PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')

In [21]:
ts

2012-01    0.101763
2012-02   -0.030161
2012-03    0.650238
Freq: M, dtype: float64

pandas allows you to capture both representations and convert between them. Under the hood, pandas represents timestamps using instances of `Timestamp` and sequences of timestamps using instances of `DatetimeIndex`. For regular time spans, pandas uses `Period` objects for scalar values and `PeriodIndex` for sequences of spans.
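
Converting between the two representations can be sketched with `DatetimeIndex.to_period` and `PeriodIndex.to_timestamp`. The dates are arbitrary illustrations:

```python
import pandas as pd

# Timestamps on three different days, two of them in the same month
stamps = pd.DatetimeIndex(['2012-05-01', '2012-05-20', '2012-06-03'])

# Timestamp -> Period: each timestamp maps to the monthly span containing it
months = stamps.to_period('M')

# Period -> Timestamp: each span is represented by its start (the default)
starts = months.to_timestamp()
```

Converting to periods discards the position within the month, so converting back yields the first day of each month rather than the original dates.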

# Converting to Timestamps

To convert a Series or list-like object of date-like objects (e.g. strings, epochs, or a mixture), use the `to_datetime` function. When passed a Series, it returns a Series (with the same index), while a list-like is converted to a `DatetimeIndex`:

#### Mixed-format strings to datetimes

In [22]:
pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None]))

0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]

In [23]:
pd.to_datetime(['2005/11/23', '2010.12.31'])

DatetimeIndex(['2005-11-23', '2010-12-31'], dtype='datetime64[ns]', freq=None)
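
The return type of `to_datetime` follows its input. The sketch below, with illustrative values, shows a scalar, a list, and a Series, plus `errors='coerce'`, which turns unparseable entries into `NaT` instead of raising. It deliberately uses a single date format per call, because recent pandas versions infer the format from the first element and may reject mixed formats.

```python
import pandas as pd

# A scalar string gives a single Timestamp
scalar = pd.to_datetime('2010-01-10')

# A list gives a DatetimeIndex
from_list = pd.to_datetime(['2005-11-23', '2010-12-31'])

# A Series gives a Series of datetime64 values that keeps the original index
from_series = pd.to_datetime(pd.Series(['2009-07-31', None], index=['a', 'b']))

# errors='coerce' turns unparseable values into NaT instead of raising
coerced = pd.to_datetime(['2009-07-31', 'not a date'], errors='coerce')
```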

#### Day-first dates

If your dates start with the day, you can pass the `dayfirst` flag:

In [24]:
pd.to_datetime(['04-01-2012 10:00'], dayfirst=True)

DatetimeIndex(['2012-01-04 10:00:00'], dtype='datetime64[ns]', freq=None)

In [25]:
pd.to_datetime(['14-01-2012', '01-14-2012'], dayfirst=True)

DatetimeIndex(['2012-01-14', '2012-01-14'], dtype='datetime64[ns]', freq=None)

As the example above shows, `dayfirst` isn’t strict: if a date can’t be parsed with the day first, it is parsed as if `dayfirst` were **False**.
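
An explicit `format`, by contrast, is strict. A minimal sketch, reusing the day-first string from the example above:

```python
import pandas as pd

# With an explicit format, the day/month order is fixed rather than guessed
jan_14 = pd.to_datetime('14-01-2012', format='%d-%m-%Y')

# A string that does not fit the format is an error, not a silent reinterpretation
try:
    pd.to_datetime('14-01-2012', format='%m-%d-%Y')  # month 14 does not exist
    raised = False
except ValueError:
    raised = True

# errors='coerce' turns such a mismatch into NaT instead of raising
mismatch = pd.to_datetime('14-01-2012', format='%m-%d-%Y', errors='coerce')
```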

Whenever possible, explicitly specify a `format` string. This can speed up the conversion considerably, and a format of **`'%Y%m%d'`** takes an even faster path. An explicit format also removes any ambiguity about whether the day or the month comes first.

Convert to 2012, Jan 10th:

In [29]:
pd.to_datetime('10-01-2012', format="%d-%m-%Y")

Timestamp('2012-01-10 00:00:00')

Convert to 2012, Oct 1st:

In [30]:
pd.to_datetime('10-01-2012', format="%m-%d-%Y")

Timestamp('2012-10-01 00:00:00')

You can also pass a `DataFrame` of integer or string columns to assemble into a `Series` of timestamps.

In [26]:
df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5],
                   'hour': [2, 3]})

In [27]:
pd.to_datetime(df)

0   2015-02-04 02:00:00
1   2016-03-05 03:00:00
dtype: datetime64[ns]

`pd.to_datetime` looks for standard designations of the datetime components in the column names, including:
* required: `year`, `month`, `day`
* optional: `hour`, `minute`, `second`, `millisecond`, `microsecond`, `nanosecond`
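
The sketch below, with illustrative values, assembles timestamps from more components. Optional columns such as `minute` and `second` are combined with the required ones, and dropping a required column raises a `ValueError`.

```python
import pandas as pd

# Required columns plus the optional hour, minute and second components
parts = pd.DataFrame({'year': [2015, 2016],
                      'month': [2, 3],
                      'day': [4, 5],
                      'hour': [2, 3],
                      'minute': [30, 45],
                      'second': [15, 0]})
assembled = pd.to_datetime(parts)

# Leaving out a required column such as 'day' raises a ValueError
try:
    pd.to_datetime(parts[['year', 'month']])
    missing_day_raised = False
except ValueError:
    missing_day_raised = True
```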

## Epoch Timestamps

To read more about epoch time, see [UNIX Epoch Time](https://en.wikipedia.org/wiki/Unix_time).


It’s also possible to convert integer or float epoch times. The default unit for these is nanoseconds, since that is how `Timestamp`s are stored. However, epochs are often stored in another unit, which can be specified with the `unit` argument:

In [31]:
pd.to_datetime([1349720105, 1349806505, 1349892905,
                1349979305, 1350065705], unit='s')

DatetimeIndex(['2012-10-08 18:15:05', '2012-10-09 18:15:05',
               '2012-10-10 18:15:05', '2012-10-11 18:15:05',
               '2012-10-12 18:15:05'],
              dtype='datetime64[ns]', freq=None)

In [32]:
pd.to_datetime([1349720105100, 1349720105200, 1349720105300,
                1349720105400, 1349720105500 ], unit='ms')

DatetimeIndex(['2012-10-08 18:15:05.100000', '2012-10-08 18:15:05.200000',
               '2012-10-08 18:15:05.300000', '2012-10-08 18:15:05.400000',
               '2012-10-08 18:15:05.500000'],
              dtype='datetime64[ns]', freq=None)
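
Going the other way, from a `Timestamp` back to epoch seconds, can be done by subtracting the epoch and floor-dividing by a one-second `Timedelta`. A minimal sketch using the first value above:

```python
import pandas as pd

# Epoch seconds -> Timestamp (tz-naive, on the UTC clock)
stamp = pd.to_datetime(1349720105, unit='s')

# Timestamp -> epoch seconds: subtract the epoch, floor-divide by one second
seconds = (stamp - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')

# Timestamp.value reports nanoseconds since the epoch
nanos = stamp.value
```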

# Generating Ranges of Timestamps

If we need timestamps on a regular frequency, we can use the pandas functions `date_range` and `bdate_range` to create timestamp indexes.

#### month-end generation

In [33]:
index = pd.date_range('2000-1-1', periods=1000, freq='M')

In [34]:
index

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-30', '2000-10-31',
               ...
               '2082-07-31', '2082-08-31', '2082-09-30', '2082-10-31',
               '2082-11-30', '2082-12-31', '2083-01-31', '2083-02-28',
               '2083-03-31', '2083-04-30'],
              dtype='datetime64[ns]', length=1000, freq='M')

#### business day generation

In [35]:
index = pd.bdate_range('2012-1-1', periods=250)

In [36]:
index

DatetimeIndex(['2012-01-02', '2012-01-03', '2012-01-04', '2012-01-05',
               '2012-01-06', '2012-01-09', '2012-01-10', '2012-01-11',
               '2012-01-12', '2012-01-13',
               ...
               '2012-12-03', '2012-12-04', '2012-12-05', '2012-12-06',
               '2012-12-07', '2012-12-10', '2012-12-11', '2012-12-12',
               '2012-12-13', '2012-12-14'],
              dtype='datetime64[ns]', length=250, freq='B')
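
Besides `periods`, both functions accept an explicit `start` and `end`. The sketch below uses arbitrary dates. It also passes an offset object as `freq`, here `pd.offsets.MonthEnd()`, which sidesteps the month-end string alias that has been renamed in recent pandas releases:

```python
import pandas as pd

# start + end + freq: every calendar day from Jan 1 to Jan 10, 2012 inclusive
days = pd.date_range('2012-01-01', '2012-01-10', freq='D')

# bdate_range skips Saturdays and Sundays (2012-01-01 is a Sunday)
bdays = pd.bdate_range('2012-01-01', '2012-01-10')

# An offset object can be passed as freq, e.g. calendar month ends
month_ends = pd.date_range('2012-01-01', periods=3, freq=pd.offsets.MonthEnd())
```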

*****

# DatetimeIndex

One of the main uses for `DatetimeIndex` is as an index for pandas objects. The `DatetimeIndex` class contains many timeseries related optimizations:
* A large range of dates for various offsets are pre-computed and cached under the hood in order to make generating subsequent date ranges very fast (just have to grab a slice)
* Fast shifting using the `shift` and `tshift` methods on pandas objects
* Unioning of overlapping DatetimeIndex objects with the same frequency is very fast (important for fast data alignment)
* Quick access to date fields via properties such as `year`, `month`, etc.
* Regularization functions like `snap` and very fast `asof` logic

`DatetimeIndex` objects have all the basic functionality of regular `Index` objects, plus a smorgasbord of advanced time series-specific methods for easy frequency processing.

`DatetimeIndex` can be used like a regular index and offers all of its intelligent functionality like selection, slicing, etc.
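
A small sketch of some of the features listed above, using five hypothetical business-day observations. It moves the index with `shift(freq=...)` rather than `tshift`, which later pandas releases deprecated and then removed.

```python
import pandas as pd

# Five business days, Monday 2012-01-02 through Friday 2012-01-06
idx = pd.bdate_range('2012-01-02', periods=5)
ser = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# shift() moves the data against a fixed index (the first value becomes NaN)
lagged = ser.shift(1)

# shift(freq=...) moves the index instead and leaves the data untouched
moved = ser.shift(1, freq=pd.offsets.BDay())

# Date fields are exposed as vectorized properties of the index
weekdays = ser.index.dayofweek

# asof returns the last value at or before a timestamp (Saturday -> Friday's value)
friday_value = ser.asof(pd.Timestamp('2012-01-07'))
```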

In [37]:
start = datetime.datetime(2011, 1, 1)
end = datetime.datetime(2012, 1, 1)

In [38]:
rng = pd.date_range(start, end, freq='BM')

In [39]:
ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [40]:
ts.index

DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
               '2011-05-31', '2011-06-30', '2011-07-29', '2011-08-31',
               '2011-09-30', '2011-10-31', '2011-11-30', '2011-12-30'],
              dtype='datetime64[ns]', freq='BM')

In [41]:
ts[:5].index

DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
               '2011-05-31'],
              dtype='datetime64[ns]', freq='BM')

In [42]:
ts[::2].index

DatetimeIndex(['2011-01-31', '2011-03-31', '2011-05-31', '2011-07-29',
               '2011-09-30', '2011-11-30'],
              dtype='datetime64[ns]', freq='2BM')

## DatetimeIndex Partial String Indexing

In [43]:
ts

2011-01-31    0.049414
2011-02-28   -1.092563
2011-03-31    0.092017
2011-04-29    0.806781
2011-05-31   -0.340045
2011-06-30   -0.435513
2011-07-29   -0.672560
2011-08-31   -0.710764
2011-09-30    0.206956
2011-10-31   -0.399248
2011-11-30    0.635448
2011-12-30    1.672199
Freq: BM, dtype: float64

In [44]:
ts['1/31/2011']

0.04941356118978495

In [45]:
ts['10/31/2011':'12/31/2011']

2011-10-31   -0.399248
2011-11-30    0.635448
2011-12-30    1.672199
Freq: BM, dtype: float64

In [46]:
ts['2011']

2011-01-31    0.049414
2011-02-28   -1.092563
2011-03-31    0.092017
2011-04-29    0.806781
2011-05-31   -0.340045
2011-06-30   -0.435513
2011-07-29   -0.672560
2011-08-31   -0.710764
2011-09-30    0.206956
2011-10-31   -0.399248
2011-11-30    0.635448
2011-12-30    1.672199
Freq: BM, dtype: float64

In [47]:
ts['2011-6']

2011-06-30   -0.435513
Freq: BM, dtype: float64

This type of slicing works on a `DataFrame` with a `DatetimeIndex` as well. Since partial string selection is a form of label slicing, the endpoints **will be** included. This includes matching times on an included date. Here’s an example:

In [48]:
dft = pd.DataFrame(np.random.randn(100,1),
                   columns=['A'],
                   index=pd.date_range('20000101',periods=100,freq='M'))

In [49]:
dft.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100 entries, 2000-01-31 to 2008-04-30
Freq: M
Data columns (total 1 columns):
A    100 non-null float64
dtypes: float64(1)
memory usage: 1.6 KB


In [50]:
dft.loc['2005']

| | A |
| - | - |
| 2005-01-31 | -1.165165 |
| 2005-02-28 | 0.664051 |
| 2005-03-31 | -0.752272 |
| 2005-04-30 | -0.18419 |
| 2005-05-31 | -0.32183 |
| 2005-06-30 | -0.611597 |
| 2005-07-31 | 0.36899 |
| 2005-08-31 | 1.066317 |
| 2005-09-30 | -1.391748 |
| 2005-10-31 | -0.028552 |


In [51]:
dft['2003-1':'2003-2']

| | A |
| - | - |
| 2003-01-31 | -0.043946 |
| 2003-02-28 | 0.334384 |
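
The endpoint rule matters most when the index is finer than the string. In the sketch below, which uses made-up 6-hourly data, the end string `'2013-01-02'` keeps every observation on that day, not just midnight:

```python
import pandas as pd

# Twelve observations every 6 hours: 2013-01-01 00:00 through 2013-01-03 18:00
idx = pd.date_range('2013-01-01', periods=12, freq=pd.offsets.Hour(6))
s = pd.Series(range(12), index=idx)

# Both endpoint strings cover whole days, so all of 2013-01-02 is included
window = s.loc['2013-01-01':'2013-01-02']

# A single day string selects all four observations on that day
one_day = s.loc['2013-01-02']
```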


## Partial String Indexing versus Datetime Indexing

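An alternative to partial string selection is indexing with `datetime` objects. A partial string selects every row within the span it describes, while a `datetime` object matches that exact timestamp:
* Partial string selection: **`dft.loc['2003-1']`**
* Datetime object selection: **`dft.loc[datetime.datetime(2003, 1, 31)]`**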
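
Below is a minimal sketch of the difference on a small hypothetical month-end frame. The partial string returns a `DataFrame` slice, while the `datetime` object returns a single row as a `Series`. `datetime` is used as a module here, matching the `import datetime` earlier in this document.

```python
import datetime
import pandas as pd

# Month-end rows for the first half of 2003
df = pd.DataFrame({'A': range(6)},
                  index=pd.date_range('2003-01-31', periods=6, freq=pd.offsets.MonthEnd()))

# A partial string is a span: '2003-02' matches every row in February 2003
by_string = df.loc['2003-02']

# A datetime object is an exact point: it selects one row, returned as a Series
by_datetime = df.loc[datetime.datetime(2003, 2, 28)]

# Datetime bounds in a slice are exact as well (both ends inclusive)
by_range = df.loc[datetime.datetime(2003, 1, 31):datetime.datetime(2003, 3, 31)]
```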

## Time/Date Components

There are several time/date properties that you can access from a `Timestamp` or from a collection of timestamps such as a `DatetimeIndex`.

| Property | Description |
| - | - |
| year | The year of the datetime |
| month | The month of the datetime |
| day | The day of the datetime |
| hour | The hour of the datetime |
| minute | The minutes of the datetime |
| second | The seconds of the datetime |
| microsecond | The microseconds of the datetime |
| nanosecond | The nanoseconds of the datetime |
| date | Returns datetime.date |
| time | Returns datetime.time |
| dayofyear | The ordinal day of the year |
| weekofyear | The week ordinal of the year |
| week | The week ordinal of the year |
| dayofweek | The number of the day of the week, with Monday=0 and Sunday=6 |
| weekday | The number of the day of the week, with Monday=0 and Sunday=6 |
| weekday_name | The name of the day of the week (e.g. Friday) |
| quarter | Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, etc. |
| days_in_month | The number of days in the month of the datetime |
| is_month_start | Logical indicating if first day of month (defined by frequency) |
| is_month_end | Logical indicating if last day of month (defined by frequency) |
| is_quarter_start | Logical indicating if first day of quarter (defined by frequency) |
| is_quarter_end | Logical indicating if last day of quarter (defined by frequency) |
| is_year_start | Logical indicating if first day of year (defined by frequency) |
| is_year_end | Logical indicating if last day of year (defined by frequency) |

In [52]:
ts = pd.Timestamp('2010-01-31 22:30:23')

In [53]:
ts.year

2010

In [54]:
ts.minute

30

In [55]:
ts.second

23

In [56]:
ts.weekofyear

4

In [57]:
ts.dayofweek

6
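
The same properties are available element-wise on a `Series` of datetimes through the `.dt` accessor. A sketch with four arbitrary consecutive days:

```python
import pandas as pd

# Four consecutive days spanning the end of January 2010
s = pd.Series(pd.date_range('2010-01-29', periods=4, freq='D'))

# The .dt accessor applies each property to every element of the Series
print(s.dt.dayofweek.tolist())     # [4, 5, 6, 0]  (Friday through Monday)
print(s.dt.dayofyear.tolist())     # [29, 30, 31, 32]
print(s.dt.is_month_end.tolist())  # [False, False, True, False]
```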

# DateOffset objects

 | Class name | 	Description | 
  | - | - | 
 | DateOffset | 	Generic offset class, defaults to 1 calendar day | 
 | BDay	 | business day (weekday) | 
 | CDay | 	custom business day (experimental) | 
 | Week	 | one week, optionally anchored on a day of the week | 
 | WeekOfMonth	 | the x-th day of the y-th week of each month | 
 | LastWeekOfMonth	 | the x-th day of the last week of each month | 
 | MonthEnd	 | calendar month end | 
 | MonthBegin | 	calendar month begin | 
 | BMonthEnd | 	business month end | 
 | BMonthBegin | 	business month begin | 
 | CBMonthEnd | 	custom business month end | 
 | CBMonthBegin | 	custom business month begin | 
 | QuarterEnd	 | calendar quarter end | 
 | QuarterBegin | 	calendar quarter begin | 
 | BQuarterEnd | 	business quarter end | 
 | BQuarterBegin	 | business quarter begin | 
 | FY5253Quarter | 	retail (aka 52-53 week) quarter | 
 | YearEnd | 	calendar year end | 
 | YearBegin | 	calendar year begin | 
 | BYearEnd | 	business year end | 
 | BYearBegin | 	business year begin | 
 | FY5253	 | retail (aka 52-53 week) year | 
 | BusinessHour	 | business hour | 
 | CustomBusinessHour | 	custom business hour | 
 | Hour | 	one hour | 
 | Minute | 	one minute | 
 | Second | 	one second | 
 | Milli | 	one millisecond | 
 | Micro	 | one microsecond | 
 | Nano	 | one nanosecond | 

The basic `DateOffset` takes the same arguments as `dateutil.relativedelta`, which works as follows:

In [58]:
from dateutil.relativedelta import relativedelta

In [59]:
d = datetime.datetime(2008, 8, 18, 9, 0)
d

datetime.datetime(2008, 8, 18, 9, 0)

In [60]:
d + relativedelta(months=4, days=5)

datetime.datetime(2008, 12, 23, 9, 0)

In [61]:
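# Explicitly import the offset classes used in the examples below
from pandas.tseries.offsets import DateOffset, BDay, BMonthEnd, Day, Hour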

In [62]:
d + DateOffset(months=4, days=5)

Timestamp('2008-12-23 09:00:00')

The key features of a `DateOffset` object are:
* it can be added / subtracted to/from a datetime object to obtain a shifted date
* it can be multiplied by an integer (positive or negative) so that the increment will be applied multiple times
* it has `rollforward` and `rollback` methods for moving a date forward or backward to the next or previous “offset date”

Subclasses of `DateOffset` define an `apply` method that implements custom date-increment logic, such as adding business days:

In [63]:
d

datetime.datetime(2008, 8, 18, 9, 0)

In [64]:
d - 5 * BDay()

Timestamp('2008-08-11 09:00:00')

In [65]:
d + BMonthEnd()

Timestamp('2008-08-29 09:00:00')

In [66]:
d

datetime.datetime(2008, 8, 18, 9, 0)

In [67]:
offset = BMonthEnd()

In [68]:
offset.rollforward(d)

Timestamp('2008-08-29 09:00:00')

In [69]:
offset.rollback(d)

Timestamp('2008-07-31 09:00:00')
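
One subtlety of `rollforward` is that a date already on the offset is left unchanged, whereas adding the offset always moves to the next offset date. The following minimal sketch also shows integer multiplication:

```python
import pandas as pd
from pandas.tseries.offsets import BDay, BMonthEnd

# 2008-08-29 (a Friday) is already the last business day of August 2008
on_offset = pd.Timestamp('2008-08-29')

# rollforward leaves a date that is already on the offset unchanged...
rolled = BMonthEnd().rollforward(on_offset)

# ...while adding the offset always moves to the next offset date
added = on_offset + BMonthEnd()

# Multiplying an offset by an integer applies it repeatedly (Friday -> Wednesday)
three_bdays = pd.Timestamp('2008-08-22') + 3 * BDay()
```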

### Time Reset

To reset the time of day to midnight, pass the `normalize=True` keyword when creating the offset instance. With `normalize=True`, the result is normalized after the offset is applied.

In [80]:
day = Day()

In [81]:
day.apply(pd.Timestamp('2014-01-01 09:30'))

Timestamp('2014-01-02 09:30:00')

In [82]:
day = Day(normalize=True)

In [83]:
day.apply(pd.Timestamp('2014-01-01 09:00'))

Timestamp('2014-01-02 00:00:00')

In [84]:
hour = Hour()

In [85]:
hour.apply(pd.Timestamp('2014-01-01 22:00'))

Timestamp('2014-01-01 23:00:00')

In [86]:
hour = Hour(normalize=True)

In [87]:
hour.apply(pd.Timestamp('2014-01-01 22:00'))

Timestamp('2014-01-01 00:00:00')

In [88]:
hour.apply(pd.Timestamp('2014-01-01 23:00'))

Timestamp('2014-01-02 00:00:00')
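
The same results can be obtained with the `+` operator instead of `apply`. The sketch below assumes that newer pandas releases may reject `normalize=True` on fixed-duration (`Tick`) offsets such as `Hour`; check this against your installed version. It therefore normalizes through a calendar-based offset (`BDay`) or through `Timestamp.normalize()`.

```python
import pandas as pd
from pandas.tseries.offsets import BDay, Day

start = pd.Timestamp('2014-01-01 09:30')  # a Wednesday

# Adding an offset with '+' keeps the time of day by default
next_bday = start + BDay()

# normalize=True on a calendar-based offset resets the result to midnight
next_bday_midnight = start + BDay(normalize=True)

# Timestamp.normalize() performs the same reset after any arithmetic
next_day_midnight = (start + Day()).normalize()
```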

# Summary

[Reference Doc](http://pandas.pydata.org/pandas-docs/stable/timeseries.html)

***