# Chapter 09 -- Panda Time Series and Date Handling--DRAFT

## Topics Covered:

<a href="http://nbviewer.jupyter.org/github/RandyBetancourt/PythonForSASUsers/blob/master/Chapter%2009%20--%20Panda%20Time%20Series%20and%20Date%20Handling#Definitions.ipynb"> Definitions </a>

<a href="http://nbviewer.jupyter.org/github/RandyBetancourt/PythonForSASUsers/blob/master/Chapter%2009%20--%20Panda%20Time%20Series%20and%20Date%20Handling"> Creating and manipulating a fixed-frequency of datetime spans </a>

Convert time series from one frequency to another

Increment 'non-standard' datetime intervals (e.g. business week)

Chapter 8, <a href="http://nbviewer.jupyter.org/github/RandyBetancourt/PythonForSASUsers/blob/master/Chapter%2008%20--%20Python%20Date%2C%20Time%2C%20and%20%20Timedelta%20Objects.ipynb"> Understanding Date Time and TimeDelta objects </a> provided a short introduction to Python's built-in datetime capabilities.  In this chapter we illustrate pandas time series and date handling.  



In [1]:
from datetime import date, time, datetime, timedelta
import numpy as np
import pandas as pd
from pandas import Series, DataFrame, Index

## Definitions

To begin, we need to distinguish between the object types used to represent datetimes.  While a bit pedantic, it helps to clarify the behaviors of these objects.  pandas time series use the NumPy datetime64 and timedelta64 dtypes.

Recall, you can always return an object's type with the built-in type() function:

    type()
    
Examples work better than prose.  Consider the assignments in the cell below.

In [2]:
a_date = date(2016, 10, 24)
a_datetime = datetime(2016, 10, 24)

In [3]:
print(a_date)
print(a_datetime)

2016-10-24
2016-10-24 00:00:00


In [4]:
a_date == a_datetime

False

Surprised?  I was at first, but it does make logical sense.  After all, the cell below illustrates that they are instances of two different classes from the datetime module.

In [5]:
print(type(a_date))
print(type(a_datetime))

<class 'datetime.date'>
<class 'datetime.datetime'>
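
If the goal is to compare calendar days rather than objects of different classes, one option is to convert both values to the same class first.  The cell below is a minimal sketch that reuses the values assigned above; the names same_day and same_instant are introduced here only for illustration.

```python
from datetime import date, datetime

a_date = date(2016, 10, 24)
a_datetime = datetime(2016, 10, 24)

# Reduce the datetime to its date part so both sides are datetime.date objects
same_day = a_datetime.date() == a_date

# Or promote the date to a datetime at midnight so both sides are datetime.datetime objects
same_instant = datetime.combine(a_date, datetime.min.time()) == a_datetime

print(same_day, same_instant)
```

Both comparisons evaluate to True because each side now belongs to the same class.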


And in case you were wondering about SAS:

````
    data _null_;

    a_date = '24Oct2016'd;
    a_datetime = '24Oct2016:00:00:00'dt;

    if a_date = a_datetime then
       put 'True';
    else
       put 'False';
    run;

    False
````

Python also distinguishes between datetimes and timestamps.  Again, examples work better than prose.  The os.path.getatime() function returns the last access time for a file, expressed as the number of seconds since the epoch.

In [6]:
file = "lines.html"
from os import path

a_time = path.getatime(file)

af_time = datetime.fromtimestamp(a_time)

In [7]:
print('value returned:', a_time)
print('value returned:', af_time)

value returned: 1477433492.8224854
value returned: 2016-10-25 16:11:32.822485


In [8]:
print('Type for a_time is', type(a_time), 'and Type for af_time is', type(af_time))

Type for a_time is <class 'float'> and Type for af_time is <class 'datetime.datetime'>


A timestamp is a time value that represents a count of the number of seconds from the start of an epoch.  This is similar to SAS datetime values, which represent an offset in seconds from an epoch beginning at midnight, January 1, 1960.
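
To make the epoch arithmetic concrete, the sketch below converts the float returned earlier into a timezone-aware datetime and then re-expresses it as an offset from the SAS epoch.  It assumes the common Unix epoch of January 1, 1970 UTC, and it treats the SAS epoch as UTC purely for illustration, since SAS datetime values carry no time zone.  The names unix_epoch, sas_epoch, epoch_gap, and sas_style are hypothetical.

```python
from datetime import datetime, timezone

# Epoch start points; treating the SAS epoch as UTC is an assumption for this sketch
unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
sas_epoch = datetime(1960, 1, 1, tzinfo=timezone.utc)

# The float returned by path.getatime() in the cell above
a_time = 1477433492.8224854

# Interpret the float as seconds since the Unix epoch, in UTC
as_utc = datetime.fromtimestamp(a_time, tz=timezone.utc)

# Seconds separating the two epochs (ten years, three of them leap years)
epoch_gap = (unix_epoch - sas_epoch).total_seconds()

# The same instant expressed as seconds since the SAS epoch
sas_style = a_time + epoch_gap

print(as_utc)
print(epoch_gap)
```

The earlier output showed 16:11:32 because datetime.fromtimestamp() without a tz argument converts to the local time of the machine running the cell.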

In [37]:
pdt = pd.Timestamp('2016-10-24')

In [38]:
type(pdt)

pandas.tslib.Timestamp
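
A pandas Timestamp is a subclass of the standard library datetime class, so it compares equal to an equivalent datetime and also carries extra convenience methods.  The cell below is a small sketch of that relationship; the variable names are illustrative only.

```python
from datetime import datetime

import pandas as pd

pdt = pd.Timestamp('2016-10-24')

# Timestamp inherits from datetime.datetime
is_datetime = isinstance(pdt, datetime)

# Equality holds against an equivalent standard library datetime
equal_to_datetime = pdt == datetime(2016, 10, 24)

# pandas adds helpers such as day_name()
weekday = pdt.day_name()

print(is_datetime, equal_to_datetime, weekday)
```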

## Creating and manipulating a fixed-frequency of date and time spans

The pd.date_range() function generates a DatetimeIndex, which is applied to a pandas Series or DataFrame to provide datetime interval indexing.  We will see examples of its construction methods, and later we will use indexers that take advantage of the DatetimeIndex.

In [9]:
rng = pd.date_range('1/1/2016', periods=90, freq='D')

Display the first 10 dates in the DatetimeIndex:

In [10]:
rng[:10]

DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
               '2016-01-09', '2016-01-10'],
              dtype='datetime64[ns]', freq='D')

In [11]:
ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [12]:
type(ts)

pandas.core.series.Series
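
As a preview of the indexers that a DatetimeIndex enables, the sketch below selects rows using date strings.  It rebuilds the same 90-day index but fills the Series with sequential integers instead of random numbers, an assumption made here so the results are reproducible.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2016-01-01', periods=90, freq='D')
ts = pd.Series(np.arange(len(rng)), index=rng)

# A partial date string selects every observation in that month
feb = ts.loc['2016-02']

# A slice of date strings includes both end points
window = ts.loc['2016-01-15':'2016-01-20']

print(len(feb), len(window))
```

February 2016 has 29 days, so the first selection returns 29 rows, and the six-day window returns 6 rows because label-based slices are inclusive at both ends.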

Time-stamped data in pandas represents a point in time.  A Period, by contrast, represents a span of time.

In the cell below, the Period's frequency is inferred from the datetime string.

In [14]:
pd.Period('2016-01-01')

Period('2016-01-01', 'D')

Return its type:

In [15]:
type(pd.Period('2016-01-01'))

pandas._period.Period

The frequency can also be set explicitly:

In [16]:
pd.Period('2016-05', freq='D')

Period('2016-05-01', 'D')
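
Because a Period is a span rather than a point, it has a start and an end, and arithmetic moves it by whole units of its frequency.  The following is a minimal sketch using a monthly Period; the variable names are illustrative.

```python
import pandas as pd

# With no freq argument, a year-month string is inferred as a monthly Period
p = pd.Period('2016-05')

inferred_freq = p.freqstr   # monthly
first_instant = p.start_time
last_instant = p.end_time

# Adding an integer shifts the Period by that many units of its frequency
next_month = p + 1

print(inferred_freq, first_instant, last_instant, next_month)
```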

Timestamp and Period objects can serve as an index.  Lists of Timestamp and Period objects are coerced into a DatetimeIndex and a PeriodIndex, respectively.

In [17]:
dates = [pd.Timestamp('2012-05-01'), pd.Timestamp('2012-05-02'), pd.Timestamp('2012-05-03')]

In [18]:
dates

[Timestamp('2012-05-01 00:00:00'),
 Timestamp('2012-05-02 00:00:00'),
 Timestamp('2012-05-03 00:00:00')]

In [19]:
ts = pd.Series(np.random.randn(3), dates)

In [20]:
ts

2012-05-01    1.095364
2012-05-02    1.274572
2012-05-03   -0.116212
dtype: float64

In [21]:
type(ts.index)

pandas.tseries.index.DatetimeIndex

In [22]:
ts.index

DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)
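
The same coercion applies to Period objects.  The sketch below mirrors the Timestamp example with a list of monthly Periods and fixed values in place of random numbers; the names periods and ps are introduced here for illustration.

```python
import pandas as pd

periods = [pd.Period('2012-05', freq='M'),
           pd.Period('2012-06', freq='M'),
           pd.Period('2012-07', freq='M')]

# A list of Period objects passed as the index becomes a PeriodIndex
ps = pd.Series([1.0, 2.0, 3.0], index=periods)

index_type = type(ps.index).__name__
print(index_type)
```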

Convert a date string with pd.to_datetime():

In [23]:
pd.to_datetime('2016/11/30')

Timestamp('2016-11-30 00:00:00')

In [24]:
type(pd.to_datetime('2016/11/30'))

pandas.tslib.Timestamp

Convert a date string with the pd.Timestamp() constructor:

In [25]:
pd.Timestamp('2016/11/30')

Timestamp('2016-11-30 00:00:00')

In [26]:
type(pd.Timestamp('2016/11/30'))

pandas.tslib.Timestamp
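
Unlike the Timestamp constructor, pd.to_datetime() also accepts a list of strings.  The sketch below passes an explicit format and uses errors='coerce' so an unparseable value becomes NaT instead of raising an error.  The sample strings are made up for illustration.

```python
import pandas as pd

# A list of strings is converted to a DatetimeIndex
parsed = pd.to_datetime(['2016/11/30', '2016/12/01'], format='%Y/%m/%d')

# errors='coerce' turns values that do not match the format into NaT
with_bad_value = pd.to_datetime(['2016/11/30', 'not a date'],
                                format='%Y/%m/%d', errors='coerce')

print(parsed)
print(with_bad_value)
```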

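You can also assemble datetimes from a DataFrame whose columns hold the year, month, and day components as integers or strings.  pd.to_datetime() returns the assembled values as a Series.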

In [45]:
df = pd.DataFrame({'year': [2014, 2015, 2016],
                   'month': [1, 2, 3],
                   'day': [1, 2, 3]})
df1 = pd.to_datetime(df)

In [48]:
print(type(df))
print(type(df1))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
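
To see what the assembled Series holds, the sketch below repeats the construction and inspects its dtype and values.  The name assembled is used here in place of df1 to make clear that the result is a Series, not a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2015, 2016],
                   'month': [1, 2, 3],
                   'day': [1, 2, 3]})

# Each row's year, month, and day are combined into one datetime value
assembled = pd.to_datetime(df)

as_strings = list(assembled.dt.strftime('%Y-%m-%d'))

print(assembled.dtype)
print(as_strings)
```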


In [28]:
from datetime import datetime, date, time
start = datetime(2016, 1, 1)
end = datetime(2016, 12, 31)
rng = pd.date_range(start,end)

In [29]:
rng

DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
               '2016-01-09', '2016-01-10',
               ...
               '2016-12-22', '2016-12-23', '2016-12-24', '2016-12-25',
               '2016-12-26', '2016-12-27', '2016-12-28', '2016-12-29',
               '2016-12-30', '2016-12-31'],
              dtype='datetime64[ns]', length=366, freq='D')

In [30]:
start = datetime(2016, 1, 1)
end = datetime(2016, 12, 31)
b_rng = pd.bdate_range(start,end)

In [31]:
b_rng

DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
               '2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
               '2016-01-13', '2016-01-14',
               ...
               '2016-12-19', '2016-12-20', '2016-12-21', '2016-12-22',
               '2016-12-23', '2016-12-26', '2016-12-27', '2016-12-28',
               '2016-12-29', '2016-12-30'],
              dtype='datetime64[ns]', length=261, freq='B')
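
The difference between the two ranges is that pd.bdate_range() keeps only Monday through Friday.  The sketch below checks this with the dayofweek attribute, where Monday is 0 and Sunday is 6.  Note that business days here are weekdays only; holidays such as January 1 are not excluded by default.

```python
import pandas as pd

rng = pd.date_range('2016-01-01', '2016-12-31')
b_rng = pd.bdate_range('2016-01-01', '2016-12-31')

# Every date in the business-day range falls on a weekday
weekdays_only = bool((b_rng.dayofweek < 5).all())

# The calendar range includes Saturdays and Sundays
has_weekends = bool((rng.dayofweek >= 5).any())

print(len(rng), len(b_rng), weekdays_only, has_weekends)
```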

In [32]:
rng = pd.date_range(start, end, freq='BM')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.index

DatetimeIndex(['2016-01-29', '2016-02-29', '2016-03-31', '2016-04-29',
               '2016-05-31', '2016-06-30', '2016-07-29', '2016-08-31',
               '2016-09-30', '2016-10-31', '2016-11-30', '2016-12-30'],
              dtype='datetime64[ns]', freq='BM')

Return the index for the first 5 observations:

In [33]:
ts[:5].index

DatetimeIndex(['2016-01-29', '2016-02-29', '2016-03-31', '2016-04-29',
               '2016-05-31'],
              dtype='datetime64[ns]', freq='BM')

Return every nth observation; a step of 2 returns every other one:

In [34]:
ts[::2]

2016-01-29   -0.037426
2016-03-31    0.445821
2016-05-31   -0.823287
2016-07-29    1.406608
2016-09-30   -1.649926
2016-11-30    0.897852
Freq: 2BM, dtype: float64

In [35]:
ts[::6]

2016-01-29   -0.037426
2016-07-29    1.406608
Freq: 6BM, dtype: float64
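
The step slices above select by position, not by date.  The sketch below makes that explicit with .iloc and uses sequential integers so the result is reproducible.  It passes a pd.offsets.BMonthEnd() object rather than an alias string; this is a choice made for this sketch, since alias spellings can differ between pandas releases.

```python
import numpy as np
import pandas as pd

# Last business day of each month in 2016
rng = pd.date_range('2016-01-01', '2016-12-31', freq=pd.offsets.BMonthEnd())
ts = pd.Series(np.arange(len(rng)), index=rng)

# Positional slicing with a step: every 2nd and every 6th observation
every_other = ts.iloc[::2]
every_sixth = ts.iloc[::6]

print(every_other.index)
print(every_sixth.index)
```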

## Navigation

<a href="http://nbviewer.jupyter.org/github/RandyBetancourt/PythonForSASUsers/tree/master/"> Return to Chapter List </a>    