# Lecture 9
# Section 5: Working with Time Series and Dates

In this section we will look at the time series and date functionality in Pandas.

Pandas has four main time-related classes:

1. `Timestamp`
2. `DatetimeIndex`
3. `Period`
4. `PeriodIndex`

First, let's look at `Timestamp`. 

## Timestamp

`Timestamp` represents a single timestamp and associates values with points in time.

For example, let's create a timestamp from the string `2020-05-04T14:15`, where `2020-05-04` stands for 4th May 2020 and `T14:15` for the time 14:15. `Timestamp` is interchangeable with Python's `datetime` in most cases.

In [None]:
import pandas as pd
import numpy as np
pd.Timestamp('2020-05-04T14:15')

Or let's create a timestamp explicitly by passing `year`, `month` and `day`, and the time as `hour` and `minute`.

In [None]:
pd.Timestamp(year=2020, month=5, day=4, hour=14, minute=15)
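
As a small sketch of that interchangeability (the variable names are illustrative and not part of the lecture), a `Timestamp` passes an `isinstance` check against `datetime.datetime` and compares equal to the matching `datetime` object:

```python
import datetime as dt

import pandas as pd

ts = pd.Timestamp('2020-05-04T14:15')
py_dt = dt.datetime(2020, 5, 4, 14, 15)

# Timestamp is a subclass of datetime.datetime
print(isinstance(ts, dt.datetime))

# The two objects describe the same point in time
print(ts == py_dt)

# Timestamp adds convenience accessors on top of the datetime fields
print(ts.year, ts.month, ts.day, ts.day_name())
```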

## Period

Suppose we weren't interested in a specific point in time, and instead wanted a span of time. This is where `Period` comes into play.

`Period` represents a single time span, such as a specific day or month.

Here we are creating a period that is January 2020:


In [None]:
pd.Period('1/2020')

And here's an example of a period that is May 4th, 2020:


In [None]:
pd.Period('5/4/2020') # Note: the notation month/day/year is common in the US

Please note: The date notation used in the United States (month/day/year) differs from that used in almost all other countries. Unfortunately, this notation is frequently used in software code, as seen above. To avoid confusion, use the international **ISO 8601** date notation YYYY-MM-DD or another explicit notation.
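
The following sketch, using illustrative variable names, suggests that the US-style string and the ISO-style string produce the same daily period. It also prints a few attributes that make the "span" nature of a `Period` visible:

```python
import pandas as pd

us_style = pd.Period('5/4/2020')     # month/day/year
iso_style = pd.Period('2020-05-04')  # year-month-day, unambiguous
print(us_style == iso_style)

month = pd.Period('2020-01')
print(month.freqstr)     # frequency inferred from the string
print(month.start_time)  # first instant of the span
print(month.end_time)    # last instant of the span
print(month + 1)         # adding an integer moves to the next period
```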

When timestamps are used as the index of a Series or DataFrame, that index is a `DatetimeIndex`. Let's look at a quick example.

## DatetimeIndex

First, let's create our example series `t1`. We will use `Timestamp` objects for March 1st, 2nd and 3rd of 2020 as the index.

In [None]:
t1 = pd.Series(list('abc'), [pd.Timestamp('2020-03-01'), pd.Timestamp('2020-03-02'), pd.Timestamp('2020-03-03')])
t1

When we look at the series, each `Timestamp` is the index and has a value associated with it, in this case, `a`, `b` and `c`.

Looking at the type of our series index, we see that it's `DatetimeIndex`.

In [None]:
type(t1.index)
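
As a hedged alternative to building the index one `Timestamp` at a time, a `DatetimeIndex` can also be constructed directly from a list of date strings. The sketch below (the names `idx` and `series` are illustrative) then looks up a value by timestamp and reads a date component from the whole index at once:

```python
import pandas as pd

# Build the index directly from ISO-formatted strings
idx = pd.DatetimeIndex(['2020-03-01', '2020-03-02', '2020-03-03'])
series = pd.Series(list('abc'), index=idx)

# Look up the value stored at a given point in time
print(series.loc[pd.Timestamp('2020-03-02')])

# Date components are available for the whole index
print(series.index.day.tolist())
```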

Similarly, an index made up of periods is a `PeriodIndex`.

## PeriodIndex

Let's create another example series `t2`. This time, we'll use the values `d`, `e`, and `f` and match them with the periods January, February and March 2020.

In [None]:
t2 = pd.Series(list('def'), [pd.Period('2020-01'), pd.Period('2020-02'), pd.Period('2020-03')])
t2

Looking at the type of `t2.index`, we can see that it's `PeriodIndex`.

In [None]:
type(t2.index)
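
A minimal sketch, assuming we want the same three months without listing each `Period` by hand: `pd.period_range` generates the `PeriodIndex`, and `to_timestamp` converts it to a `DatetimeIndex` anchored at the start of each period. The names `months`, `by_period` and `by_timestamp` are illustrative.

```python
import pandas as pd

# Three consecutive monthly periods starting in January 2020
months = pd.period_range('2020-01', periods=3, freq='M')
by_period = pd.Series(list('def'), index=months)
print(type(by_period.index))

# Convert each period to the timestamp at its start
by_timestamp = by_period.to_timestamp()
print(type(by_timestamp.index))
print(by_timestamp.index[0])
```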

Now, let's look into how to convert to Datetime.

## Converting to Datetime

Suppose we have a list of dates as strings.

We create a DataFrame with these dates as the index and some randomly generated data.

In [None]:
d1 = ['2 January 2020', 'Aug 29, 2019', '2020-03-26', '7/12/2019']
ts3 = pd.DataFrame(np.random.randint(10, 100, (4,2)), index=d1, columns=list('ab'))
ts3

Looking at the index we can see that it’s pretty messy and the dates are all in different formats.

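Pandas `to_datetime` will try to convert these strings to timestamps and put them in a standard format. Note that pandas 2.0 and later expect one consistent format by default, so a mix of formats like this one needs `format='mixed'`.

In [None]:
# format='mixed' (pandas >= 2.0) infers the format of each element individually;
# on older versions, drop the format argument
ts3.index = pd.to_datetime(ts3.index, format='mixed')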
ts3

`to_datetime` also has options to change the date parse order. For example, we can pass the argument `dayfirst=True` to parse the date in the day-first order common in **Europe**.

In [None]:
pd.to_datetime('4.7.12', dayfirst=True)
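
To make the effect of the parse order concrete, here is a small sketch with an ambiguous date string. The variable names are illustrative, and the explicit `format` string is one way (not the only one) to remove the ambiguity altogether:

```python
import pandas as pd

ambiguous = '04/07/2012'

# Default: the first number is read as the month (April 7th)
month_first = pd.to_datetime(ambiguous)

# dayfirst=True: the first number is read as the day (July 4th)
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# An explicit format leaves nothing to guess
explicit = pd.to_datetime('04.07.2012', format='%d.%m.%Y')

print(month_first, day_first, explicit)
```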

## Timedeltas

Timedeltas are differences in times. We can see that when we take the difference between 3rd March and 1st March, we get a Timedelta of two days.

In [None]:
pd.Timestamp('2020-03-03')-pd.Timestamp('2020-03-01')

We can also find the date and time 12 days and three hours after 6:10 PM on April 2nd, 2020.

In [None]:
# lowercase 'h' for hours; uppercase 'H' is deprecated in recent pandas versions
pd.Timestamp('2020-04-02 18:10') + pd.Timedelta('12D 3h')
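
As a sketch of the same calculation without a unit string, `Timedelta` also accepts keyword arguments. The names `delta` and `result` are illustrative:

```python
import pandas as pd

# 12 days and 3 hours, spelled out with keyword arguments
delta = pd.Timedelta(days=12, hours=3)
result = pd.Timestamp('2020-04-02 18:10') + delta
print(result)

# A Timedelta can be broken back down into its parts
print(delta.days)             # whole days
print(delta.total_seconds())  # full duration in seconds
```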

## Working with Dates in a Dataframe

Next we will look at some tricks for working with dates in a DataFrame.

Suppose we want to look at nine measurements, taken bi-weekly, every Sunday, starting in January 2020. Using `date_range`, we can create this `DatetimeIndex`.

In [None]:
dates = pd.date_range('2020-01-01', periods=9, freq='2W-SUN')
dates
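
Because January 1st, 2020 is a Wednesday, the range does not start on the date we passed in. The sketch below (illustrative variable names) checks where the range starts and ends and confirms the two-week spacing:

```python
import pandas as pd

dates = pd.date_range('2020-01-01', periods=9, freq='2W-SUN')

# The start is rolled forward to the first Sunday on or after it
print(dates[0], dates[-1])

# Gaps between consecutive dates
gaps = dates.to_series().diff().dropna()
print(gaps.unique())
```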

Now let's create a DataFrame with these dates as the index and some random data, and see what we can do with it.

In [None]:
df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                  'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)
df

First, we can check what day of the week a specific date is. For example, here we can see that all the dates in our index are on a Sunday.

In [None]:
df.index.day_name()

We can use `diff` to compute the difference between each row and the row before it.

In [None]:
df.diff()
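
Since the DataFrame above holds random numbers, here is a sketch with fixed, made-up values so the result of `diff` can be checked by hand. The first row has no predecessor, so its difference is `NaN`:

```python
import pandas as pd

idx = pd.date_range('2020-01-05', periods=3, freq='2W-SUN')
counts = pd.Series([100, 104, 101], index=idx)  # made-up values

changes = counts.diff()
print(changes)
```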

Suppose we wanted to know the mean count for each month in our DataFrame.

We can do this using `resample`.

Resampling groups the rows into time bins of a given frequency, so that each bin can then be aggregated. We can apply various frequencies to resample our time series data.
The most commonly used time series frequencies are:
- **W** : weekly frequency
- **M** : month end frequency
- **SM** : semi-month end frequency (15th and end of month)
- **Q** : quarter end frequency

Note that pandas 2.2 and later spell the month, semi-month and quarter end aliases `ME`, `SME` and `QE`, and issue a deprecation warning for the older spellings.

Here we use the monthly frequency via the argument `M` to calculate the mean count for each month.

In [None]:
df.resample('M').mean()
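
To show what the monthly bins contain, here is a sketch with made-up values. It passes a `pd.offsets.MonthEnd()` object instead of a string alias, which is an assumption made here to sidestep the alias spelling differences between pandas versions. The `groupby` line is an equivalent formulation for comparison:

```python
import pandas as pd

idx = pd.to_datetime(['2020-01-05', '2020-01-19', '2020-02-02', '2020-02-16'])
counts = pd.Series([10, 20, 30, 50], index=idx)  # made-up values

# One bin per calendar month, labelled with the month end
monthly = counts.resample(pd.offsets.MonthEnd()).mean()
print(monthly)

# Same means, grouped by monthly period instead
by_period = counts.groupby(counts.index.to_period('M')).mean()
print(by_period)
```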

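We can use partial string indexing to find values from a particular year. Selecting rows this way goes through `.loc`; in pandas 2.0 and later, plain `df['2020']` looks for a column named `2020` instead.

In [None]:
df.loc['2020']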

We can also do this for a particular month, or we can even slice on a range of dates.

For example, here we only want the values from March 2020:

In [None]:
df.loc['2020-03']

Here we want the values from March 2020 onwards:

In [None]:
df['2020-03':]

Another cool thing we can do is change the frequency of our dates in our DataFrame using `asfreq`.

In [None]:
df.asfreq('W')

If we use this to change the frequency from bi-weekly to weekly, we'll end up with missing values every other week. So let's use the forward fill `ffill` method on those missing values.

In [None]:
df.asfreq('W', method='ffill')
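
The effect of the fill is easier to see on fixed values. In this sketch (made-up numbers, illustrative names), the bi-weekly series gains a new row for each skipped Sunday, first as `NaN` and then filled from the previous measurement:

```python
import pandas as pd

idx = pd.date_range('2020-01-05', periods=3, freq='2W-SUN')
biweekly = pd.Series([1.0, 2.0, 3.0], index=idx)  # made-up values

# New weekly rows have no data yet
weekly = biweekly.asfreq('W')
print(weekly)

# Forward fill copies the last known value into each gap
filled = biweekly.asfreq('W', method='ffill')
print(filled)
```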

One last thing (preparing for next week) is plotting time series.

Importing `matplotlib.pyplot` and using the IPython magic `%matplotlib inline` will allow you to visualize the time series in the notebook.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

df.plot()