In today's lecture, we'll be looking at the time series and date functionality in pandas. Manipulating
dates and times is quite flexible in pandas, and this lets us conduct further analysis such as time series
analysis, which we will talk about soon. In fact, pandas was originally created by Wes McKinney to handle date and time data when he worked as a consultant for hedge funds, so it's quite robust in this area.

In [1]:
# Let's bring in pandas and numpy as usual
import pandas as pd
import numpy as np

### Timestamp

In [2]:
# Pandas has four main time related classes. Timestamp, DatetimeIndex, Period, and PeriodIndex. First, let's
# look at Timestamp. It represents a single timestamp and associates values with points in time.

# For example, let's create a timestamp using a string 9/1/2019 10:05AM, and here we have our timestamp.
# Timestamp is interchangeable with Python's datetime in most cases.
pd.Timestamp('9/1/2019 10:05AM')

Timestamp('2019-09-01 10:05:00')

In [3]:
# We can also create a timestamp by passing multiple parameters such as year, month, day, hour,
# and minute separately
pd.Timestamp(2019, 12, 20, 0, 0) # So pd.Timestamp and we just pass these in order.

Timestamp('2019-12-20 00:00:00')

In [4]:
# Timestamp also has some useful methods, such as isoweekday(), which returns the weekday of the timestamp.
# Note that 1 represents Monday and 7 represents Sunday.
pd.Timestamp(2019, 12, 20, 0, 0).isoweekday()
# And this is actually quite commonly done to just get a simple numeric value.

5

In [5]:
# You can extract the specific year, month, day, hour, minute, or second from a timestamp as well
pd.Timestamp(2019, 12, 20, 5, 2, 23).second # the seconds component, which is the sixth positional argument

23
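
As a small sketch of the points above, the following pulls several components out of one Timestamp and checks that it behaves like Python's datetime. The variable names (`ts`, `parts`, `same_moment`) are just illustrative.

```python
import datetime as dt
import pandas as pd

ts = pd.Timestamp(2019, 12, 20, 5, 2, 23)

# Individual components are available as plain attributes.
parts = (ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)
print(parts)

# day_name() gives the weekday as text, isoweekday() as a number (Monday=1).
print(ts.day_name(), ts.isoweekday())

# Timestamp is a subclass of datetime.datetime, so the two can be compared directly.
same_moment = ts == dt.datetime(2019, 12, 20, 5, 2, 23)
print(isinstance(ts, dt.datetime), same_moment)
```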

### Period

In [6]:
# Suppose we weren't interested in a specific point in time and instead wanted a span of time. This is where
# the Period class comes into play. Period represents a single time span, such as a specific day or month.

# Here we are creating a period that is January 2016,
pd.Period('1/2016')

Period('2016-01', 'M')

In [7]:
# You'll notice when we print that out that the granularity of the period is M for month, since that was the
# finest grained piece we provided. Here's an example of a period that is March 5th, 2016.
pd.Period('3/5/2016')
# So we just do pd.Period, pass in our date, and we'll see that it's got the granularity of day which is 'D'.

Period('2016-03-05', 'D')

In [8]:
# Period objects represent the full timespan that you specify. Arithmetic on periods is very easy and
# intuitive. For instance, if we want to find out what is 5 months after January 2016, we simply add 5
pd.Period('1/2016') + 5

Period('2016-06', 'M')

In [9]:
# From the result, you can see we get June 2016. If we want to find out two days before March 5th 2016, we
# simply subtract 2
pd.Period('3/5/2016') - 2

Period('2016-03-03', 'D')

In [10]:
# The key here is that the period object encapsulates the granularity for arithmetic
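
To make the granularity point concrete, here is a minimal sketch (the names `month`, `day` and `year` are illustrative). It shows that a Period has a start and an end, and that adding an integer always moves by whole units of the period's own frequency.

```python
import pandas as pd

month = pd.Period('2016-01', freq='M')
day = pd.Period('2016-03-05', freq='D')
year = pd.Period('2016')  # only a year given, so the granularity is annual

# A Period covers a whole span, with a first and a last instant.
print(month.start_time, month.end_time)

# The same kind of integer arithmetic means months, days, or years
# depending on the granularity of the Period.
print(month + 5)  # five months later
print(day - 2)    # two days earlier
print(year + 1)   # one year later
```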

### DatetimeIndex and PeriodIndex

In [11]:
# An index made up of Timestamps is a DatetimeIndex. Let's look at a quick example because this is really quite handy when you start to
# deal with time series data. First, let's create our example series t1, using the Timestamps of September 1st, 2nd and 3rd
# of 2016. When we look at the series, each Timestamp is the index and has a value associated with it, in this case, a, b and c.

t1 = pd.Series(list('abc'), [pd.Timestamp('2016-09-01'), pd.Timestamp('2016-09-02'), 
                             pd.Timestamp('2016-09-03')])
# So here we do pd.Series and give it the list 'abc', so these are our data values. The index is the list next
# to it, where we create three Timestamps: the first day of September, then the second day of
# September, then the third day. Let's take a look at what this creates as a series.
t1

2016-09-01    a
2016-09-02    b
2016-09-03    c
dtype: object

In [12]:
# Looking at the type of our series index, we see that it's DatetimeIndex.
type(t1.index)
# Remember the dtype above refers to the data values in the series, not the index. So here we can see that the index is a
# DatetimeIndex.

pandas.core.indexes.datetimes.DatetimeIndex

In [13]:
# Similarly, we can create a period-based index as well. Which I think is really quite nice.
t2 = pd.Series(list('def'), [pd.Period('2016-09'), pd.Period('2016-10'), 
                             pd.Period('2016-11')])
# So we'll create some new series, def, and we'll just create our period objects in that list for the index.
t2

2016-09    d
2016-10    e
2016-11    f
Freq: M, dtype: object

In [14]:
# Looking at the type of t2.index, we can see that it's PeriodIndex.
type(t2.index)

pandas.core.indexes.period.PeriodIndex
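
As an alternative sketch, the same two kinds of index can be built without constructing each Timestamp or Period by hand. This assumes `pd.to_datetime` and `pd.period_range` as the constructors, which is one option among several, and then looks a value up by its time label.

```python
import pandas as pd

# A DatetimeIndex built from strings, and a PeriodIndex built from a range.
t1 = pd.Series(list('abc'),
               index=pd.to_datetime(['2016-09-01', '2016-09-02', '2016-09-03']))
t2 = pd.Series(list('def'),
               index=pd.period_range('2016-09', periods=3, freq='M'))

print(type(t1.index).__name__)
print(type(t2.index).__name__)

# Values can be looked up by the matching Timestamp or Period.
print(t1.loc[pd.Timestamp('2016-09-02')])
print(t2.loc[pd.Period('2016-10', freq='M')])
```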

### Converting to Datetime

In [15]:
# Now, let's look into how to convert to Datetime. Suppose we have a list of dates as strings and we want to
# create a new dataframe

# I'm going to try a bunch of different date formats
d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']
# So if you take human-entered data, this is what you're going to get. The same goes for data that comes from multiple different
# systems. Let's say you're joining dataframes and you're bringing in date fields from multiple places.

# And now let's just create some random data here
ts3 = pd.DataFrame(np.random.randint(10, 100, (4,2)), index=d1, 
                   columns=list('ab'))
# So we'll just create a new dataframe, just with some random integers. And then we're going to set our index to d1 and some 
# columns. And let's take a look at that.
ts3

Unnamed: 0,a,b
2 June 2013,85,16
"Aug 29, 2014",21,50
2015-06-26,29,86
7/12/16,47,50


In [16]:
# Using pandas to_datetime, pandas will try to convert these to Datetime and put them in a standard format, because this is ugly
# and horrible to deal with

ts3.index = pd.to_datetime(ts3.index)
# So pandas has this nice built-in to_datetime that tries to figure out the format for us. (On pandas 2.0 and later, a list that
# mixes formats like this one needs pd.to_datetime(ts3.index, format='mixed').) So let's take a look at ts3.
ts3

Unnamed: 0,a,b
2013-06-02,85,16
2014-08-29,21,50
2015-06-26,29,86
2016-07-12,47,50


In [17]:
# to_datetime() also has options to change the date parse order. For example, we
# can pass in the argument dayfirst=True to parse dates in the European day-first order, if we think our data is in that
# format.

pd.to_datetime('4.7.12', dayfirst=True)
# So here I'll take '4.7.12'. You see there's some ambiguity here about what's the month and what's the day. And, I guess,
# even what's the year could be ambiguous. So we just say dayfirst=True.

Timestamp('2012-07-04 00:00:00')
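
When the format is known in advance, one way to remove the guesswork entirely is to spell it out. The sketch below uses the `format` and `errors` parameters of `to_datetime`. The sample strings are made up for illustration.

```python
import pandas as pd

# An explicit strptime-style format: day, month, two-digit year, separated by dots.
explicit = pd.to_datetime('4.7.12', format='%d.%m.%y')
print(explicit)

# errors='coerce' turns anything unparseable into NaT instead of raising.
cleaned = pd.to_datetime(['2016-07-12', 'not a date'], errors='coerce')
print(cleaned)
```

Coercing to NaT is handy for a first pass over messy data, but it can also hide real parsing problems, so it is worth counting the NaT values afterwards.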

### Timedelta

In [18]:
# Timedeltas are differences in times. A Timedelta is not the same as a Period, although it may feel like one at first: a
# Period is a span anchored at a particular place on the calendar, while a Timedelta is just a duration. For instance, if we
# take the difference between September 3rd and September 1st, we get a Timedelta of two days.
pd.Timestamp('9/3/2016')-pd.Timestamp('9/1/2016')
# So we can just subtract one Timestamp from another. The resulting object is a Timedelta, and here you see it says
# that it's 2 days. It's printed out in a nice human-readable form, but the object itself captures that it's 2 days.

Timedelta('2 days 00:00:00')

In [19]:
# We can also do something like find the date and time 12 days and three hours past September 2nd
# at 8:10 AM. So we don't have to use round numbers.
pd.Timestamp('9/2/2016 8:10AM') + pd.Timedelta('12D 3H')
# So here we've taken a Timestamp which is very precise, 8:10 AM, and we add to it a Timedelta of 12
# days and 3 hours. And we can see what that time is.

Timestamp('2016-09-14 11:10:00')
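
Here is a short sketch of the same calculation using keyword arguments instead of a string, along with a look inside the Timedelta. The names `delta` and `start` are illustrative.

```python
import pandas as pd

# Keyword arguments are an alternative to the '12D 3H' string form.
delta = pd.Timedelta(days=12, hours=3)
print(delta)

# The duration is stored as whole days plus leftover seconds.
print(delta.days, delta.seconds)
print(delta.total_seconds())

start = pd.Timestamp('2016-09-02 08:10')
later = start + delta
print(later)

# Subtracting the two Timestamps gives the Timedelta back.
print(later - start == delta)
```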

### Offset

In [20]:
# Offset is similar to timedelta, but it follows specific calendar duration rules. Offset allows flexibility
# in terms of types of time intervals. Besides hour, day, week, month, etc it also has business day, end of
# month, semi month begin etc. So very non-traditional time series, but things that we would use in business all the time.

# Let's create a timestamp and see what day of the week it is
pd.Timestamp('9/4/2016').weekday()
# So we create a new Timestamp and ask which weekday it is. We get 6, which is Sunday: unlike isoweekday(), weekday() counts
# from 0 for Monday to 6 for Sunday.

6

In [21]:
# Now we can move the timestamp a week ahead. So we take that Timestamp and add an
# offset of one week.
pd.Timestamp('9/4/2016') + pd.offsets.Week()

Timestamp('2016-09-11 00:00:00')

In [22]:
# Now let's try this with month end, which gives us the last day of September
pd.Timestamp('9/4/2016') + pd.offsets.MonthEnd()
# So here we take a Timestamp and add an offset that rolls it forward to the month end. And so now we actually have the
# month end date.

Timestamp('2016-09-30 00:00:00')
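
To see how an offset differs from a Timedelta, the sketch below adds "one day" to a Friday in both ways, and then applies MonthEnd to dates in two different months. The variable names are illustrative.

```python
import pandas as pd

friday = pd.Timestamp('2016-09-02')

# A Timedelta is a fixed duration: Friday plus one day is Saturday.
plus_one_day = friday + pd.Timedelta(days=1)

# A business-day offset follows calendar rules: it skips the weekend.
plus_one_bday = friday + pd.offsets.BDay()

print(plus_one_day.day_name(), plus_one_bday.day_name())

# MonthEnd moves by a different number of days depending on the month.
sept_end = pd.Timestamp('2016-09-04') + pd.offsets.MonthEnd()
oct_end = pd.Timestamp('2016-10-04') + pd.offsets.MonthEnd()
print(sept_end, oct_end)
```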

### Working with Dates in a Dataframe

In [23]:
# Next, let's look at a few tricks for working with dates in a DataFrame. Suppose we want to look at nine
# measurements, taken biweekly (i.e. every two weeks), every Sunday, starting in October 2016. Using date_range,
# we can create this DatetimeIndex. In date_range, we have to specify either the start or the end date. If it is not explicitly
# specified, by default, the date is considered the start date. Then we have to specify the number of periods and
# a frequency. Here, we set it to "2W-SUN", which means biweekly on Sunday.

# So like regex, there's a sort of mini language for describing these frequencies. We're not going to go deep into time series
# data in this course, but if you end up dealing with it a lot, the
# pandas documentation is phenomenal.

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
# So here we'll do dates = pd.date_range. We set our start, we say we want 9 periods. And this is our frequency, and then look 
# at the DatetimeIndex that comes out of that. And we see that it's well set up for it.
dates

DatetimeIndex(['2016-10-02', '2016-10-16', '2016-10-30', '2016-11-13',
               '2016-11-27', '2016-12-11', '2016-12-25', '2017-01-08',
               '2017-01-22'],
              dtype='datetime64[ns]', freq='2W-SUN')

In [24]:
# There are many other frequencies that you can specify. For example, one common one is the business day.
pd.date_range('10-01-2016', periods=9, freq='B')

DatetimeIndex(['2016-10-03', '2016-10-04', '2016-10-05', '2016-10-06',
               '2016-10-07', '2016-10-10', '2016-10-11', '2016-10-12',
               '2016-10-13'],
              dtype='datetime64[ns]', freq='B')

In [25]:
# Another common one is quarterly but setting the start of the quarter in a specific month like June.
pd.date_range('04-01-2016', periods=12, freq='QS-JUN')

DatetimeIndex(['2016-06-01', '2016-09-01', '2016-12-01', '2017-03-01',
               '2017-06-01', '2017-09-01', '2017-12-01', '2018-03-01',
               '2018-06-01', '2018-09-01', '2018-12-01', '2019-03-01'],
              dtype='datetime64[ns]', freq='QS-JUN')
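
A quick way to convince yourself what a frequency string does is to check the index it produces. This sketch verifies the "2W-SUN" range and also shows `date_range` with a start and an end instead of a number of periods. The names `gaps` and `bdays` are illustrative.

```python
import pandas as pd

dates = pd.date_range('2016-10-01', periods=9, freq='2W-SUN')

# Every date falls on a Sunday (dayofweek counts Monday=0 ... Sunday=6).
print(dates.dayofweek)

# Consecutive dates are exactly 14 days apart.
gaps = dates[1:] - dates[:-1]
print(gaps)

# start + end + freq also works; pandas then works out the number of periods.
bdays = pd.date_range(start='2016-10-01', end='2016-10-13', freq='B')
print(len(bdays))
```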

In [26]:
# Now, let's go back to our weekly on Sunday example and create a DataFrame using these dates, and some random
# data, and see what we can do with it.

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN') # the frequency is every two weeks, on Sunday
df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                  'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)
# And now we're just going to fill in to that some data. So I'm going to create a column called Count 1. We'll make these some 
# random numbers and a cumulative sum of these. And then Count 2 as well, and set the index to dates. Okay, so there's our 
# dummy data.
df

Unnamed: 0,Count 1,Count 2
2016-10-02,98,116
2016-10-16,99,115
2016-10-30,108,120
2016-11-13,106,125
2016-11-27,111,119
2016-12-11,116,128
2016-12-25,121,124
2017-01-08,129,115
2017-01-22,127,124


In [27]:
# First, we can check what day of the week a specific date is. For example, here we can see that all the dates
# in our index are on a Sunday, which matches the frequency that we set.
df.index.day_name()
# So if we say df.index.day_name(), we see they all come back as Sunday. (Older pandas exposed this as the weekday_name
# attribute, which has since been removed.) If we had more variable dates as our index, we would see the variation here.

Index(['Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday',
       'Sunday', 'Sunday'],
      dtype='object')

In [28]:
# We can also use diff() to find the difference between each date's value. And this is just on the data frame itself, df.diff().
df.diff()

Unnamed: 0,Count 1,Count 2
2016-10-02,,
2016-10-16,1.0,-1.0
2016-10-30,9.0,5.0
2016-11-13,-2.0,5.0
2016-11-27,5.0,-6.0
2016-12-11,5.0,9.0
2016-12-25,5.0,-4.0
2017-01-08,8.0,-9.0
2017-01-22,-2.0,9.0
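
As a rough sketch of what diff() computes, the following compares it with subtracting a shifted copy of the data by hand. It uses the first four "Count 1" values from the table above as fixed data, since the original numbers were random.

```python
import pandas as pd

dates = pd.date_range('2016-10-02', periods=4, freq='2W-SUN')
counts = pd.Series([98, 99, 108, 106], index=dates)

# diff() is each value minus the value in the previous row.
by_diff = counts.diff()

# The same thing written out by hand with shift().
by_shift = counts - counts.shift(1)

print(by_diff)
print(by_diff.equals(by_shift))
```

The first row is NaN because there is no earlier value to subtract.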


In [29]:
# Suppose we want to know what the mean count is for each month in our DataFrame. We can do this using
# resample. Converting from a higher frequency to a lower frequency is called downsampling (we'll talk about this
# in a little more detail in another lecture)
df.resample('M').mean()
# So df.resample to monthly, and let's look at the mean values. And so here we've collapsed the dates on a monthly cadence.
# (On pandas 2.2 and later the month-end alias is spelled 'ME', and 'M' emits a deprecation warning.) We haven't had to write
# our own group by and figure it all out ourselves. We're able to use the resampling built-in.

Unnamed: 0,Count 1,Count 2
2016-10-31,101.666667,117.0
2016-11-30,108.5,122.0
2016-12-31,118.5,126.0
2017-01-31,128.0,119.5
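
For comparison, here is roughly what the hand-written group by would look like. This is a sketch, not what resample does internally: it groups on the month of each date via `to_period('M')`, and it hard-codes the values from the table above because the original data was random.

```python
import pandas as pd

dates = pd.date_range('2016-10-02', periods=9, freq='2W-SUN')
df = pd.DataFrame({'Count 1': [98, 99, 108, 106, 111, 116, 121, 129, 127],
                   'Count 2': [116, 115, 120, 125, 119, 128, 124, 115, 124]},
                  index=dates)

# Label every row with its calendar month, then average within each month.
monthly = df.groupby(df.index.to_period('M')).mean()
print(monthly)
```

The means match the resample output above. The difference is the index: this version is labelled by monthly periods rather than by month-end dates.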


In [30]:
# Now let's talk about datetime indexing and slicing, which is a wonderful feature of the pandas DataFrame.
# For instance, we can use partial string indexing to find values from a particular year. The key here is that we pass a
# string and pandas matches it against the datetime index.
df.loc['2017']
# So if we're just interested in 2017, we can say df.loc['2017'] and that returns all the rows from that year. (Older pandas
# also accepted df['2017'], but selecting rows that way was removed in pandas 2.0, so .loc is the safe spelling.)

Unnamed: 0,Count 1,Count 2
2017-01-08,129,115
2017-01-22,127,124


In [31]:
# Or we can do it for a particular month
df.loc['2016-12']
# So remember, these are all string values that we're matching on.

Unnamed: 0,Count 1,Count 2
2016-12-11,116,128
2016-12-25,121,124


In [32]:
# Or we can even slice on a range of dates. For example, here we only want the values from December 2016
# onwards.
df['2016-12':] # So we can just say we want df['2016-12':] onwards.

Unnamed: 0,Count 1,Count 2
2016-12-11,116,128
2016-12-25,121,124
2017-01-08,129,115
2017-01-22,127,124


In [33]:
# And the same again for all of 2016
df.loc['2016']

Unnamed: 0,Count 1,Count 2
2016-10-02,98,116
2016-10-16,99,115
2016-10-30,108,120
2016-11-13,106,125
2016-11-27,111,119
2016-12-11,116,128
2016-12-25,121,124
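
The selections above can be reproduced with fixed data as a sketch. The values are copied from the tables in this lecture, and everything goes through .loc. Note that a string slice includes both of its end points.

```python
import pandas as pd

dates = pd.date_range('2016-10-02', periods=9, freq='2W-SUN')
df = pd.DataFrame({'Count 1': [98, 99, 108, 106, 111, 116, 121, 129, 127],
                   'Count 2': [116, 115, 120, 125, 119, 128, 124, 115, 124]},
                  index=dates)

year_2017 = df.loc['2017']                        # every row in 2017
from_december = df.loc['2016-12':]                # December 2016 onwards
window = df.loc['2016-10-30':'2016-11-27']        # both end dates are included

print(year_2017)
print(from_december)
print(window)
```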


Okay, so that's how the date functionality in pandas works. It's been kind of a whirlwind tour. We're going to talk about resampling in another lecture, and that should make it a little clearer how we use all of this. Again, if you have to deal with a lot of datetime data, this lecture is worth coming back to as a refresher. The datetime functionality in pandas is really quite phenomenal, and the documentation describes it in more detail.