### Working with Time Series

Pandas contains a wide range of tools for working with dates, times, and time-indexed data.

Time stamps reference particular moments in time (for example, June 23rd, 2011, at 4:00 p.m.).

Time intervals and periods reference a length of time between a particular beginning and end point, for example the year 2011. Periods are a special case of time intervals in which each interval is of uniform length and does not overlap, for example the 24-hour-long periods that make up days.

Time deltas or durations reference an exact length of time (e.g., a duration of 22.56 seconds).
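The three concepts above can be sketched with the pandas scalar types that are introduced later in this notebook. This is only a minimal illustration; the variable names `moment`, `day`, and `duration` are made up for the example.

```python
import pandas as pd

# A time stamp: one particular moment in time.
moment = pd.Timestamp("2011-06-23 16:00")

# A period: a fixed-length, non-overlapping interval, here the calendar
# day that contains the moment above.
day = pd.Period("2011-06-23", freq="D")

# A time delta: an exact duration, independent of any calendar position.
duration = pd.Timedelta(seconds=22.56)

print(moment)
print(day.start_time, day.end_time)               # first and last instant of the day
print(day.start_time <= moment <= day.end_time)   # the moment lies inside the period
print(moment + duration)                          # the moment shifted by the duration
```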

### Dates and Times in Python



In [2]:
# Python's built-in datetime module covers basic date and time handling

from datetime import datetime

datetime(year=2011, month=6, day=5)

datetime.datetime(2011, 6, 5, 0, 0)

In [3]:
# using dateutil, we can parse dates from a variety of string formats

from dateutil import parser

date = parser.parse("4th of July, 2011")
date

datetime.datetime(2011, 7, 4, 0, 0)

In [4]:
# we can also print the day of the week

date.strftime('%A')

'Monday'

Just as lists of Python numerical variables are suboptimal compared to NumPy-style typed numerical arrays, lists of Python datetime objects are suboptimal compared to typed arrays of encoded dates.
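As a rough illustration of that difference, the sketch below stores the same five dates once as a Python list and once as a typed NumPy array (the `datetime64` dtype is covered in the next section). The variable names are illustrative only.

```python
from datetime import datetime, timedelta

import numpy as np

# A plain Python list: every element is a separate datetime object.
py_dates = [datetime(2011, 7, 4) + timedelta(days=i) for i in range(5)]

# The same dates as a typed array: each element is one 64-bit integer.
np_dates = np.array(py_dates, dtype="datetime64[us]")

print(np_dates.dtype)      # the array has a single, uniform dtype
print(np_dates.itemsize)   # bytes used per element

# Shifting every date by a week needs an explicit loop over the list ...
py_shifted = [d + timedelta(days=7) for d in py_dates]

# ... but is a single vectorized expression on the typed array.
np_shifted = np_dates + np.timedelta64(7, "D")
print(np_shifted)
```

Both approaches give the same dates; the typed array simply avoids per-element Python object overhead.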


### Typed arrays of times: NumPy's datetime64

In [12]:
# the datetime64 dtype encodes dates as 64-bit integers and thus allows arrays
# of dates to be represented very compactly. The datetime64 requires a very 
# specific input format

import numpy as np
date = np.array('2012-03-04', dtype=np.datetime64)
date

array('2012-03-04', dtype='datetime64[D]')

In [13]:
# we can now easily do vectorized operations on this datetime

date + np.arange(12)

array(['2012-03-04', '2012-03-05', '2012-03-06', '2012-03-07',
       '2012-03-08', '2012-03-09', '2012-03-10', '2012-03-11',
       '2012-03-12', '2012-03-13', '2012-03-14', '2012-03-15'],
      dtype='datetime64[D]')

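Because NumPy datetime64 arrays have a uniform type, this kind of operation can be done much more quickly than when working with Python's built-in datetime objects, especially as arrays get large. One other important detail of datetime64 arrays is that they are built on a fundamental time unit: because each value is limited to 64 bits, there is a trade-off between time resolution and maximum time span. At nanosecond resolution, for example, the representable range is a little under 600 years.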
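The trade-off can be estimated with a little arithmetic. The sketch below assumes a signed 64-bit tick count centered on the 1970 epoch and an average year of 365.25 days, so the printed figures are approximations rather than exact limits. The names `units_in_seconds` and `spans` are illustrative.

```python
import numpy as np

# datetime64 stores a signed 64-bit count of time units, so the span is
# roughly 2**63 units on either side of the epoch.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

units_in_seconds = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

spans = {}
for unit, seconds in units_in_seconds.items():
    spans[unit] = 2**63 * seconds / SECONDS_PER_YEAR
    print(f"datetime64[{unit}]: about +/- {spans[unit]:,.0f} years")

# Every unit uses the same 8 bytes; only the meaning of one tick changes.
print(np.dtype("datetime64[ns]").itemsize, np.dtype("datetime64[D]").itemsize)
```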

In [16]:
np.datetime64('2015-07-04')

numpy.datetime64('2015-07-04')

In [18]:
# a minute-based datetime; the unit is inferred from the input string.
# Note: NumPy 1.11 and later treat datetime64 as timezone-naive; older
# releases interpreted the string in the local time zone of the machine
# executing the code.

np.datetime64('2012-03-12 12:03')

numpy.datetime64('2012-03-12T12:03')

In [24]:
# let us work with nanoseconds


np.datetime64('2012-03-12 12:03:32.34', 'ns')


numpy.datetime64('2012-03-12T12:03:32.340000000')

### Dates and Times in Pandas

Pandas builds upon all the aforementioned tools to provide a Timestamp object, which combines the ease of use of datetime and dateutil with the efficient storage and vectorized interface of numpy.datetime64.

From a group of these Timestamp objects, pandas can construct a DatetimeIndex that can be used to index data in a Series or a DataFrame.

In [42]:
# we can use pandas tools to parse a flexibly formatted date string, and use
# format codes to output the day of the week
import pandas as pd

date = pd.to_datetime("4th of July, 2012")
date

Timestamp('2012-07-04 00:00:00')

In [46]:
# formatting the day of the week

date.strftime('%A')


'Wednesday'

In [48]:
# NumPy-style vectorized operations also work directly on this object

date + pd.to_timedelta(np.arange(12), 'D')

DatetimeIndex(['2012-07-04', '2012-07-05', '2012-07-06', '2012-07-07',
               '2012-07-08', '2012-07-09', '2012-07-10', '2012-07-11',
               '2012-07-12', '2012-07-13', '2012-07-14', '2012-07-15'],
              dtype='datetime64[ns]', freq=None)

### Pandas Time Series

In [53]:
# The pandas time series tools become really handy when you index the data
# by timestamps.

index = pd.DatetimeIndex(['2011-03-23', '2011-04-12', '2012-03-21',
                         '2012-04-13'])
data = pd.Series([0,1,2,3], index=index)
data

2011-03-23    0
2011-04-12    1
2012-03-21    2
2012-04-13    3
dtype: int64

In [62]:
# the usual Series indexing patterns apply, e.g. positional slicing

data.iloc[0:3]

2011-03-23    0
2011-04-12    1
2012-03-21    2
dtype: int64

In [63]:
data['2011-03-23': '2012-03-21']

2011-03-23    0
2011-04-12    1
2012-03-21    2
dtype: int64

In [64]:
# passing just a year selects every entry from that year
data['2012']

2012-03-21    2
2012-04-13    3
dtype: int64

In [65]:
data.loc['2011']

2011-03-23    0
2011-04-12    1
dtype: int64
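The same partial-string selection also works at finer granularity. The sketch below rebuilds the `data` Series from the cells above so that it runs on its own, then selects by month and by a month-to-month slice using `.loc`. The result names are illustrative.

```python
import pandas as pd

index = pd.DatetimeIndex(["2011-03-23", "2011-04-12", "2012-03-21", "2012-04-13"])
data = pd.Series([0, 1, 2, 3], index=index)

# A year-month string selects every entry that falls inside that month.
march_2011 = data.loc["2011-03"]

# Partial strings also work as slice bounds; both ends are inclusive,
# so the end bound covers all of March 2012.
april_2011_to_march_2012 = data.loc["2011-04":"2012-03"]

print(march_2011)
print(april_2011_to_march_2012)
```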

### Pandas Time Series Data Structures

For time stamps, pandas provides the Timestamp type, which is essentially a replacement for Python's native datetime and is more efficient because it is built on the numpy.datetime64 data type. The associated index structure is DatetimeIndex.

For time periods, pandas provides the Period type, which encodes a fixed-frequency interval based on numpy.datetime64. The associated index structure is PeriodIndex.

For time deltas or durations, pandas provides the Timedelta type. It is a more efficient replacement for Python's native datetime.timedelta type and is based on numpy.timedelta64. The associated index structure is TimedeltaIndex.

The most used of these date/time objects are the Timestamp and DatetimeIndex objects. They are most commonly created with the pd.to_datetime() function, which can parse a wide variety of formats. Passing a single date to pd.to_datetime() yields a Timestamp, while passing a series of dates yields a DatetimeIndex.

In [73]:
# passing a single date to give out a timestamp datatype
single_date = pd.to_datetime(datetime(2011,3,12))
single_date


Timestamp('2011-03-12 00:00:00')

In [75]:
# passing several dates gives a DatetimeIndex

date_variety = pd.to_datetime(['5th of july 2013', '4th of may 2011'])
date_variety

DatetimeIndex(['2013-07-05', '2011-05-04'], dtype='datetime64[ns]', freq=None)

In [67]:
dates = pd.to_datetime([datetime(2011,3,23), '4th of june 2011',
                       '5th of july 2013', '4th of may 2011', '20130405'])
dates

DatetimeIndex(['2011-03-23', '2011-06-04', '2013-07-05', '2011-05-04',
               '2013-04-05'],
              dtype='datetime64[ns]', freq=None)

In [83]:
# converting DatetimeIndex to PeriodIndex with the addition of the freq code
dates.to_period('D')

PeriodIndex(['2011-03-23', '2011-06-04', '2013-07-05', '2011-05-04',
             '2013-04-05'],
            dtype='period[D]', freq='D')

In [84]:
# a TimedeltaIndex is created when one date is subtracted from another

dates - dates[0]

TimedeltaIndex(['0 days', '73 days', '835 days', '42 days', '744 days'], dtype='timedelta64[ns]', freq=None)
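As a small follow-up to the two conversions above, the sketch below uses a shorter, ISO-formatted list of dates (chosen here only to keep the example self-contained) to show a coarser period frequency and how to read the day counts out of a TimedeltaIndex.

```python
import pandas as pd

dates = pd.to_datetime(["2011-03-23", "2011-06-04", "2013-07-05"])

# A coarser frequency code: each date maps to the month that contains it.
months = dates.to_period("M")
print(months)

# Differences are Timedelta values; .days exposes them as plain integers.
elapsed = dates - dates[0]
print(elapsed.days)
```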

### Regular Sequences

What if you want to create a regular sequence of dates? Pandas makes this easy with the pd.date_range() function for timestamps, the pd.period_range() function for periods, and the pd.timedelta_range() function for time deltas.

Just as np.arange() and range() turn a start point, an end point, and an optional step size into a sequence, pd.date_range() accepts a start date, an end date, and an optional frequency code to create a regular sequence of dates.

By default, the frequency is one day.



In [97]:
# the date_range() pandas function

pd.date_range(start='2011-04-19', end='2011-04-30', freq='2D') # frequency of 2 days

DatetimeIndex(['2011-04-19', '2011-04-21', '2011-04-23', '2011-04-25',
               '2011-04-27', '2011-04-29'],
              dtype='datetime64[ns]', freq='2D')

In [98]:
# the date range can also be specified with a start point and a number of periods

pd.date_range('2011-9-21', periods=8)

DatetimeIndex(['2011-09-21', '2011-09-22', '2011-09-23', '2011-09-24',
               '2011-09-25', '2011-09-26', '2011-09-27', '2011-09-28'],
              dtype='datetime64[ns]', freq='D')

In [108]:
pd.date_range(start='2011-04-19', periods=8, freq='H') # hourly frequency (pandas 2.2+ prefers the lowercase alias 'h')

DatetimeIndex(['2011-04-19 00:00:00', '2011-04-19 01:00:00',
               '2011-04-19 02:00:00', '2011-04-19 03:00:00',
               '2011-04-19 04:00:00', '2011-04-19 05:00:00',
               '2011-04-19 06:00:00', '2011-04-19 07:00:00'],
              dtype='datetime64[ns]', freq='H')

In [106]:
# to create a sequence of period or time delta values, we can use
# pd.period_range() and pd.timedelta_range()

pd.period_range('2011-07', periods=6, freq='M') # periods increasing by month

PeriodIndex(['2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12'], dtype='period[M]', freq='M')

In [111]:
# increase by hour using timedelta

pd.timedelta_range(0, periods=10, freq='H')

TimedeltaIndex(['00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00',
                '05:00:00', '06:00:00', '07:00:00', '08:00:00', '09:00:00'],
               dtype='timedelta64[ns]', freq='H')
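The range functions are closely related. As a minimal sketch, adding a range of time deltas to a single Timestamp should give the same dates as calling pd.date_range() directly. A daily frequency is used here, and the variable names are illustrative.

```python
import pandas as pd

start = pd.Timestamp("2011-04-19")

# Eight daily steps expressed as durations: 0 days, 1 day, ..., 7 days.
steps = pd.timedelta_range(start="0 days", periods=8, freq="D")

# Adding the durations to a single Timestamp broadcasts over the index.
shifted = start + steps

# The same eight dates built directly.
direct = pd.date_range(start, periods=8, freq="D")

print(shifted)
print(list(shifted) == list(direct))
```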

In [None]:
### Frequencies and Offsets