# Time Series

---

Time series data is an important form of structured data in many fields, such as finance, economics, ecology, neuroscience, and physics. Anything that is observed or measured at many points in time forms a time series. Many time series have a fixed frequency, meaning that observations arrive at regular intervals, but a time series can also be irregular. **We will not deal with irregular frequency time series**.

In today's lecture, we'll look at the time series and date functionality in Pandas. Pandas handles dates and times flexibly, which makes many kinds of time-based analysis possible. In fact, Wes McKinney originally created Pandas while working at AQR Capital Management, an investment management firm, largely to work with financial time series data.


### Lecture outline

---

* Date and Time data types


* Dealing with Datetime Objects


* Indexing, Selection, Sub-setting


* Periods and Period Arithmetic


* Date and Time Conversion


* Time Shifting


* Resampling


* Moving Window Functions

#### Reference


[Timeseries](https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#timeseries)


[Time series / date functionality](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html)


[Time deltas](https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html)


[Windowing Operations](https://pandas.pydata.org/pandas-docs/stable/user_guide/window.html)


[datetime — Basic date and time types](https://docs.python.org/3/library/datetime.html)


[Python Datetime](https://www.w3schools.com/python/python_datetime.asp)

In [None]:
import pandas as pd

import numpy as np

import datetime

## Date and Time data types

---

Pandas has four main time related classes:

* `Timestamp`


* `DatetimeIndex`


* `Period`


* `PeriodIndex`


Before we look at these time classes, we need to know what a `datetime` object is.

### datetime

---

Python has a built-in module, `datetime`, for working with date and time objects. Dates and times are objects with date- and time-specific characteristics, so when we manipulate them we manipulate objects, not strings.


The `datetime` module consists of the following types:


* `date` - Store calendar date (year, month, day) using the Gregorian calendar


* `time` - Store time of day as hours, minutes, seconds, and microseconds


* `datetime` - Stores both date and time


* `timedelta` - Represents the difference between two datetime values (as days, seconds, and microseconds)


* `tzinfo` - Base type for storing time zone information

In [None]:
now = datetime.datetime.now()

now

In [None]:
type(now) # datetime object

We can extract those different characteristics from datetime object by using appropriate methods or attributes.

In [None]:
now.date() # Extract date

now.time() # Extract time

now.year # Extract year

now.month # Extract month

now.day # Extract day

now.hour # Extract hour

now.minute # Extract minute

now.second # Extract second

now.microsecond # Extract microsecond
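
The `timedelta` type from the list above is not used in the cells so far. As a minimal sketch, with two dates made up purely for illustration, subtracting `datetime` objects yields a `timedelta`, and adding a `timedelta` shifts a `datetime`:

```python
import datetime

# Two illustrative points in time (values chosen only for this example)
start = datetime.datetime(2021, 1, 5, 10, 15)
end = datetime.datetime(2021, 1, 7, 12, 45)

# Subtracting two datetime objects gives a timedelta
elapsed = end - start
print(elapsed.days, elapsed.seconds)  # whole days and the leftover seconds

# Adding a timedelta to a datetime gives a new datetime
one_week_later = start + datetime.timedelta(weeks=1)
print(one_week_later)
```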

### Timestamp

---

`Timestamp` represents a single timestamp and associates values with points in time. In other words, it's a specific instant in time.


For example, let's create a timestamp using a string `1/5/2021 10:05AM`, and here we have our timestamp.
Timestamp is interchangeable with Python's `datetime` in most cases.

In [None]:
pd.Timestamp('1/5/2021 10:05AM')

We can also create a timestamp by passing multiple parameters such as year, month, day, hour, and minute separately.

In [None]:
pd.Timestamp(2021, 1, 5, 10, 15)

In [None]:
pd.Timestamp(2021, 1, 5, 10, 15).isoweekday() # Return the day of the week represented by the date. Monday == 1 … Sunday == 7

As with Python's built-in `datetime` module, we can extract different parts of a timestamp object by using the appropriate methods and attributes.

In [None]:
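# Keyword arguments keep the meaning of each field explicit; nanosecond is kept below 1000
single_timestamp = pd.Timestamp(year=2021, month=1, day=5, hour=10, minute=15, second=23, microsecond=154, nanosecond=450)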

single_timestamp

In [None]:
single_timestamp.date() # Extract date

single_timestamp.time() # Extract time

single_timestamp.year # Extract year

single_timestamp.month # Extract month as a number January == 1...December == 12

single_timestamp.month_name() # Return actual name of the month

single_timestamp.week # Return week number

single_timestamp.weekday() # Return weekday as a number Monday == 0 … Sunday == 6

single_timestamp.day_name() # Return actual name of the weekday

single_timestamp.hour # Extract hour

single_timestamp.minute # Extract minute

single_timestamp.second # Extract second

single_timestamp.microsecond # Extract microsecond

single_timestamp.nanosecond # Extract nanosecond
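
The two weekday numbering schemes are easy to mix up, so here is a small sketch that puts them side by side for the same timestamp. It also checks the earlier remark that a `Timestamp` is largely interchangeable with a plain `datetime`. The variable names are only illustrative.

```python
import datetime
import pandas as pd

ts = pd.Timestamp(2021, 1, 5, 10, 15)  # a Tuesday

# weekday() counts from Monday == 0, isoweekday() counts from Monday == 1
print(ts.weekday(), ts.isoweekday(), ts.day_name())

# A Timestamp can be compared with, and converted to, a plain datetime
py_dt = ts.to_pydatetime()
print(isinstance(ts, datetime.datetime))
print(ts == datetime.datetime(2021, 1, 5, 10, 15))
```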

### Period

---

If we are interested in a span of time rather than a single instant, we use the `Period` object instead of a datetime. A `Period` represents a fixed span of time, for example January 2021.

In [None]:
pd.Period(value="1/2021", freq="M") # A period object that represents January 2021

In [None]:
pd.Period(value='1/5/2021', freq="D") # More granular period object - January 5th, 2021

<div class="alert alert-info">

**Note:** We can extract date and time characteristics from a `Period` object as we did in case of `Timestamp`
    

> [**pandas.Period**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Period.html)

</div>
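
As a small sketch of the note above, the cell below reads a few attributes from a monthly `Period`. Alongside the calendar fields it shares with `Timestamp`, a `Period` also exposes the boundaries of the span it covers. The variable name `january` is only illustrative.

```python
import pandas as pd

january = pd.Period("2021-01", freq="M")

print(january.year, january.month)  # calendar fields, as on a Timestamp
print(january.days_in_month)        # length of the span in days
print(january.start_time)           # first instant covered by the period
print(january.end_time)             # last instant covered by the period
```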

### DatetimeIndex and PeriodIndex

---

The `PeriodIndex` class stores a sequence of `Period` objects and can serve as an axis index in any Pandas data structure. The `DatetimeIndex` class stores a sequence of `Timestamp` objects and can also serve as an axis index.



[pandas.DatetimeIndex](https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.html)


[pandas.PeriodIndex](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.PeriodIndex.html)

A Series indexed by Timestamps has a `DatetimeIndex`. In the series below, each Timestamp is an index label with a value associated with it, in this case `a`, `b`, `c`, `d`, and `e`.

In [None]:
t1 = pd.Series(list("abcde"), [pd.Timestamp('2021-01-05'),
                               pd.Timestamp('2021-01-06'),
                               pd.Timestamp('2021-01-07'),
                               pd.Timestamp('2021-01-08'),
                               pd.Timestamp('2021-01-09')])


t1

In [None]:
t1.index

In [None]:
type(t1.index) # Looking at the type of our series index, we see that it's DatetimeIndex

Similarly, we can create a `Period`-based index as well.

In [None]:
t2 = pd.Series(list("abcde"), [pd.Period('2021-01'),
                               pd.Period('2021-02'),
                               pd.Period('2021-03'),
                               pd.Period('2021-04'),
                               pd.Period('2021-05')])



t2

In [None]:
t2.index

In [None]:
type(t2.index) # Looking at the type of t2.index, we can see that it's PeriodIndex.
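
Typing every label by hand gets tedious. As a sketch, the same two indexes can be generated with `pd.date_range()` (covered later in this lecture) and `pd.period_range()`. The names `t1_alt` and `t2_alt` are made up for this example.

```python
import pandas as pd

# Same five daily timestamps as t1, generated instead of typed out
daily_index = pd.date_range(start="2021-01-05", periods=5, freq="D")

# Same five monthly periods as t2
monthly_index = pd.period_range(start="2021-01", periods=5, freq="M")

t1_alt = pd.Series(list("abcde"), index=daily_index)
t2_alt = pd.Series(list("abcde"), index=monthly_index)

print(type(t1_alt.index).__name__, type(t2_alt.index).__name__)
```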

### Timedelta

---

Timedeltas are differences in times, expressed in different units, e.g. days, hours, minutes, seconds. They can be both positive and negative. A Timedelta is not the same as a Period, but it is conceptually similar. For instance, if we take the difference between January 11th and January 10th, we get a Timedelta of one day.

In [None]:
pd.Timestamp('01/11/2021') - pd.Timestamp('01/10/2021')

We can also find what the date and time is for 12 days and 3 hours past January 2nd, at 8:10 AM.

In [None]:
pd.Timestamp('01/2/2021 8:10AM') + pd.Timedelta(days=12, hours=3) # Keyword units avoid unit-string aliases
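
To make the arithmetic above concrete, here is a small sketch that inspects a `Timedelta` and shows a negative one. It builds the `Timedelta` from keyword arguments, and the variable names are only illustrative.

```python
import pandas as pd

delta = pd.Timedelta(days=12, hours=3)

print(delta.days)             # whole days in the delta
print(delta.total_seconds())  # the full span expressed in seconds

# Timedeltas can be negative: an earlier date minus a later one
negative = pd.Timestamp("2021-01-10") - pd.Timestamp("2021-01-11")
print(negative)

# Adding a Timedelta moves a Timestamp by an exact amount of elapsed time
shifted = pd.Timestamp("2021-01-02 08:10") + delta
print(shifted)
```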

### Offset

---

An offset is similar to a Timedelta, but it follows specific calendar duration rules. Offsets allow more flexible time intervals: besides hour, day, week, month, etc., there are also business day, month end, semi-month begin, and others.

In [None]:
pd.Timestamp('1/5/2021').weekday() # Day of a week - 5th January

In [None]:
pd.Timestamp('1/5/2021').day_name() # Day of a week - 5th January

In [None]:
pd.Timestamp('1/4/2021') + pd.offsets.Week() # Move the timestamp one week ahead

In [None]:
pd.Timestamp('1/4/2021') + pd.offsets.MonthEnd() # Roll forward to the month end: the last day of January
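
The difference between "elapsed time" and "calendar rules" shows up most clearly around weekends and month ends. The sketch below uses dates chosen only for illustration and compares a one-day `Timedelta` with a one-business-day offset. It then applies `MonthEnd` to a date that already is a month end.

```python
import pandas as pd

friday = pd.Timestamp("2021-01-08")  # a Friday

# A Timedelta adds elapsed time, so one day later is Saturday
plus_one_day = friday + pd.Timedelta(days=1)

# A BusinessDay offset follows calendar rules and skips the weekend
plus_one_business_day = friday + pd.offsets.BusinessDay(1)

print(plus_one_day.day_name(), plus_one_business_day.day_name())

# MonthEnd rolls forward to the next month end; from a date that is
# already a month end it moves to the end of the following month
end_of_january = pd.Timestamp("2021-01-04") + pd.offsets.MonthEnd()
end_of_february = pd.Timestamp("2021-01-31") + pd.offsets.MonthEnd()
print(end_of_january, end_of_february)
```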

## Dealing with Datetime Objects

---

Next, let's look at a few tricks for working with dates in a DataFrame. Suppose we want nine measurements, taken every two weeks on a Sunday, starting in October 2020. Using the `date_range()` function, we can create this `DatetimeIndex`. In `date_range()`, we have to specify either the `start` or the `end` date; a date passed as the first positional argument is treated as the start date. Then we specify the number of periods and a frequency. Here, we set the frequency to `2W-SUN`, which means every second week on Sunday.

<div class="alert alert-info">

**Note:** Contains Spoilers!!!

</div>

In [None]:
dates = pd.date_range(start="10-01-2020", periods=9, freq="2W-SUN")


dates

There are many other frequencies that you can specify. For example, you can do business day.

In [None]:
pd.date_range(start="10-01-2020", periods=9, freq="B")

We can also use a quarterly frequency, with quarters starting in June.

In [None]:
pd.date_range(start="04-01-2020", periods=12, freq="QS-JUN")
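
As a quick sanity check on how `start`, `end`, and an anchored frequency interact, here is a minimal sketch with short ranges. The variable names are illustrative. Note that the start date itself does not have to fall on the anchor day.

```python
import pandas as pd

# Anchored at start: the first date is the first Sunday on or after 2020-10-01
biweekly = pd.date_range(start="2020-10-01", periods=3, freq="2W-SUN")
print(biweekly)
print(biweekly.day_name().unique())  # every date falls on the same weekday

# Anchored at end: the range is counted backwards from the end date
last_days = pd.date_range(end="2020-12-31", periods=3, freq="D")
print(last_days)
```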

Now, let's go back to our biweekly Sunday example and create a DataFrame using these dates and some random data, and see what we can do with it.

In [None]:
dates

In [None]:
np.random.seed(425)



dates = pd.date_range(start="10-11-2020", periods=20, freq="2W-SUN")


df = pd.DataFrame({"count_1": np.random.randint(1, 10, 20),
                   "count_2": np.random.randint(1, 10, 20)},
                  index=dates)


df

Here, we can see that all the dates in our index are on a Sunday, which matches the frequency that we set.

In [None]:
df.index.day_name()

We can also use `diff()` to find the first discrete difference between each date's value. We will talk about the `diff()` method later on.

In [None]:
df.diff()

Suppose we want to know the mean count for each month in our DataFrame. We can do this using
`resample()`. Converting from a higher frequency to a lower frequency is called `downsampling` (we'll talk about this in a moment).

In [None]:
df.resample("M").mean()

Now let's talk about datetime indexing and slicing, which is a wonderful feature of the pandas DataFrame. For instance, we can use partial string indexing to find values from a particular year.

In [None]:
df.loc["2020"] # Select only 2020 year

df.loc["2021"] # Select only 2021 year

df.loc["2020-12"] # Select particular year and month

df.loc["2020-12":] # Select range

## Indexing, Selection, Sub-setting

---

A time series behaves like any other Pandas Series when you index and select data based on labels.


> **While pandas does not force you to have a sorted date index, some of these methods may have unexpected or incorrect behavior if the dates are unsorted.**

In [None]:
df.index

In [None]:
df.shape

In [None]:
df[:10] # Select first 10 rows

df[10:15] # Select 5 rows

df[15:] # Select last 5 rows

In [None]:
df.loc["2020-10-11"] # Select one row

df.loc[datetime.datetime(2020, 10, 11)]


df.loc["2020-10"] # Select all row for a month

df.loc["2020"] # Select all rows for a year



df.loc[:"2020"] # Select year range

df.loc["2021":] # Select year range


df.loc["2021-01":"2021-05"] # Select month range

df.loc[datetime.datetime(2020, 10, 11): datetime.datetime(2021, 2, 28)]

In [None]:
df.truncate(before="2021-01-31") # Truncate all rows before this index value

df.truncate(after="2021-01-31") # Truncate all rows after this index value

df.truncate(before="2021-01-03", after="2021-04-11") # Truncate before and after these index values
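
The warning about unsorted dates at the start of this section is worth acting on before slicing. Here is a minimal sketch, on a tiny series whose dates are deliberately out of order, that checks the index and sorts it first. The names `unsorted` and `ordered` are made up for this example.

```python
import pandas as pd

# A small series whose dates are deliberately out of order
unsorted = pd.Series(
    [3, 1, 2],
    index=pd.to_datetime(["2021-03-01", "2021-01-01", "2021-02-01"]),
)
print(unsorted.index.is_monotonic_increasing)  # check before slicing

# Sort once, then label-based slicing behaves as described above
ordered = unsorted.sort_index()
print(ordered.loc["2021-01":"2021-02"])
```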

## Periods and Period Arithmetic

---

Periods represent timespans, like days, months, quarters, or years. The Period class represents this data type, requiring a string or integer and a frequency from the below table.


![alt text](images/base_ts_frq.png "Title")

The Period object below represents the full timespan from January 1, 2020, to December 31, 2020, inclusive.

In [None]:
first_period = pd.Period(value=2020, freq="A-DEC")


first_period

The frequency of our period object is annual, so adding or subtracting an integer shifts the period by that many years rather than by days.

In [None]:
first_period - 5

In [None]:
first_period + 3

If two periods have the same frequency, their difference is the number of units between them.

In [None]:
second_period = pd.Period(value="2015", freq="A-DEC")

In [None]:
second_period - first_period

In [None]:
first_period - second_period

It's not possible to add two period objects.

In [None]:
first_period + second_period

Also, it's not possible to perform arithmetic on period objects with different frequencies.

In [None]:
third_period = pd.Period(value="01-04-2021", freq="D")

In [None]:
first_period - third_period

In [None]:
first_period + third_period

The key here is that the `Period` object encapsulates the granularity for arithmetic.
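
The rules above can be sketched in one cell. The example below uses monthly periods rather than the annual ones from the lecture, purely to keep it compact. It reads the number of units from the `.n` attribute of the resulting offset and catches the mixed-frequency error as a `ValueError`, both of which assume a reasonably recent Pandas version.

```python
import pandas as pd

jan = pd.Period("2021-01", freq="M")
may = pd.Period("2021-05", freq="M")

# Integers shift a period by whole units of its own frequency (months here)
print(jan + 3)

# Same frequency: the difference is an offset; .n is the number of units
print((may - jan).n)

# Different frequencies: pandas raises an error (a ValueError subclass)
day = pd.Period("2021-01-04", freq="D")
try:
    may - day
    mixed_ok = True
except ValueError as err:
    mixed_ok = False
    print(type(err).__name__)
```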

### Period Frequency Conversion


---

It's not possible to do operations on period objects with different frequencies. However, we can convert them to a common frequency first and then operate on them.

In [None]:
first_period

Convert the annual period into a monthly period:

In [None]:
first_period.asfreq(freq="M", how="start") # Annual period to Monthly period

In [None]:
first_period_daily = first_period.asfreq(freq="D", how="start")


first_period_daily

In [None]:
third_period - first_period_daily
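
The `how` argument decides which end of the coarser span the converted period comes from. Here is a minimal sketch on a monthly period, with names chosen only for this example, that shows both choices and then subtracts two daily periods:

```python
import pandas as pd

january = pd.Period("2021-01", freq="M")

# how= picks which end of the coarser span the finer period comes from
first_day = january.asfreq("D", how="start")
last_day = january.asfreq("D", how="end")
print(first_day, last_day)

# After conversion both periods are daily, so subtraction is allowed
fourth = pd.Period("2021-01-04", freq="D")
print((fourth - first_day).n)  # number of days between the two periods
```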

## Date and Time Conversion

---

Converting strings into proper date objects, and vice versa, is a crucial operation when working with time series data. Some operations need a string representation of a date, while others need a proper date/time object.

### Converting Between String and Datetime

---

We can convert the string representation of a date into a proper date object, and vice versa, by using either Python's built-in `datetime` module or Pandas.

![alt text](images/format_table.png "Title")




[**See this link for full reference**](https://www.w3schools.com/python/python_datetime.asp)

**Datetime to String**

In [None]:
single_stamp = datetime.datetime(year=2021, month=1, day=5, hour=10, minute=45)


single_stamp

In [None]:
str(single_stamp) # From datetime to string

In [None]:
single_stamp.strftime(format="%Y-%m-%d")

single_stamp.strftime(format="%y-%m-%d")

single_stamp.strftime(format="%y-%m-%d-%H-%M")

In [None]:
single_stamp.strftime("%b")

single_stamp.strftime("%B")

**String to Datetime**

In [None]:
value = "2021-1-05"

In [None]:
pd.to_datetime(value) # from string to datetime

In [None]:
datetime.datetime.strptime(value, "%Y-%m-%d") # The format must match the string exactly; otherwise strptime raises a ValueError

In [None]:
date_strings = ["7/6/2011", "8/6/2011"]


[datetime.datetime.strptime(x, '%m/%d/%Y') for x in date_strings]

Pandas `to_datetime()` is less picky than `datetime.strptime()` about date formatting.

In [None]:
new_dates = ["2 June 2013", "Aug 29, 2014", "2015-06-26", "7/12/16"]

[pd.to_datetime(i) for i in new_dates]

`to_datetime()` has a `dayfirst` option to change the date parse order.

In [None]:
pd.to_datetime("4.7.12", dayfirst=True)

In [None]:
pd.to_datetime("2010/11/12", format="%Y/%m/%d") # We can even indicate format argument

Missing values in a time series are represented as `NaT` (Not a Time).

In [None]:
pd.to_datetime([None])
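
Three points from this section can be sketched in one cell: how `dayfirst` changes the reading of an ambiguous string, how unparseable input can become `NaT`, and how `strftime` and `strptime` undo each other when given the same format. The strings and variable names are made up for illustration. `errors="coerce"` is a `to_datetime()` option that the cells above do not use.

```python
import datetime
import pandas as pd

# The same string read month-first (the default) and day-first
month_first = pd.to_datetime("04/07/2012")
day_first = pd.to_datetime("04/07/2012", dayfirst=True)
print(month_first.date(), day_first.date())

# errors="coerce" turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(["2021-01-05", "not a date"], errors="coerce")
print(parsed)

# strftime and strptime round-trip when the same format is used for both
fmt = "%Y-%m-%d %H:%M"
stamp = datetime.datetime(2021, 1, 5, 10, 45)
round_trip = datetime.datetime.strptime(stamp.strftime(fmt), fmt)
print(round_trip == stamp)
```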

### Converting Timestamps to Periods

---

Series and DataFrame objects indexed by Timestamps can be converted to Periods with the `to_period()` method.

In [None]:
rng = pd.date_range("2000-01-01", periods=3, freq="M")


ts = pd.Series(np.random.randint(low=1, high=10, size=3), index=rng)


ts

In [None]:
type(ts.index)

In [None]:
new_ts = ts.to_period() # Convert Series from DatetimeIndex to PeriodIndex.


new_ts

In [None]:
type(new_ts.index)

### Converting Periods to Timestamps

---

Series and DataFrame objects indexed by Periods can be converted to Timestamps with the `to_timestamp()` method.

In [None]:
new_ts

In [None]:
new_ts.to_timestamp()

In [None]:
type(new_ts.to_timestamp().index)
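
One detail worth noticing is that the round trip does not restore the original dates. In the sketch below, month-end timestamps are typed out explicitly for the example. They become monthly periods, and `to_timestamp()` with its default `how="start"` returns the first day of each month.

```python
import pandas as pd

# Month-end timestamps, typed out explicitly for this example
month_ends = pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31"])
stamped = pd.Series([1, 2, 3], index=month_ends)

as_periods = stamped.to_period("M")  # each timestamp becomes its month
back = as_periods.to_timestamp()     # default how="start"

print(as_periods.index)
print(back.index)  # first day of each month, not the original month ends
```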

## Time Shifting

---

Shifting (Leading and Lagging) data refers to moving data backward and forward through time. Both Series and DataFrame have a `shift()` method for doing naive shifts forward or backward, leaving the index unmodified.

When we shift like this, missing data is introduced either at the start or the end of the time series.

In [None]:
np.random.seed(425)

ts = pd.Series(data=np.random.randint(low=1, high=10, size=7),
               index=pd.date_range(start="2021-01-11", periods=7))


ts

**Forward Shift**

In [None]:
pd.DataFrame(ts).shift(periods=2) # Shift the data forward by two periods; the index stays the same

If the `freq` argument is specified, the index values are shifted instead and the data is not realigned.

In [None]:
pd.DataFrame(ts).shift(periods=2, freq="D")

In [None]:
pd.DataFrame(ts)

**Backward Shift**

In [None]:
pd.DataFrame(ts).shift(periods=-2) # Shift the data backward by two periods; the index stays the same

In [None]:
pd.DataFrame(ts).shift(periods=-2, freq="D") # Shift index by desired number of periods

In [None]:
pd.DataFrame(ts)
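
A common reason to shift is to compare each observation with the previous one. Here is a minimal sketch on a small made-up series (the name `prices` is only illustrative) that computes the change by hand and checks it against `diff()`:

```python
import pandas as pd

prices = pd.Series(
    [10, 12, 9, 15],
    index=pd.date_range(start="2021-01-11", periods=4, freq="D"),
)

# Lag by one period: each date now holds the previous date's value
previous = prices.shift(1)

# Change from the previous observation; the first entry is NaN
change = prices - previous
print(change)
print(change.equals(prices.diff()))  # diff() performs the same computation

# With freq, the values stay put and the index moves instead
moved = prices.shift(2, freq="D")
print(moved.index[0])
```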

## Resampling

---

Resampling refers to the process of converting a time series from one frequency to another. Aggregating higher frequency data to lower frequency is called `downsampling`, while converting lower frequency to higher frequency is called `upsampling`.


`resample()` is a time-based `groupby()`, followed by a reduction method on each of its groups.

#### Reference


[Resampling](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-resampling)

In [None]:
date_range = pd.date_range("2021-01-01 12:00", periods=120, freq="D")

df = pd.Series(data=np.random.randint(low=1, high=20, size=120),
               index=date_range)


df

In [None]:
type(df.index)

In [None]:
df.index

### Downsampling

In [None]:
df

Aggregate data into month chunks by taking the sum of each group.

In [None]:
df.resample(rule="M").sum()

In [None]:
df.resample(rule="M").mean()

In [None]:
df.resample(rule="M", kind="period").sum() # Convert resulting index to "PeriodIndex"
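
To make the "time-based `groupby()`" description concrete, the sketch below builds the monthly buckets twice on a small made-up series: once with `resample()` and once by grouping rows on the month they fall in. It uses the month-start rule `"MS"`, which differs from the month-end rule in the cells above only in how each bucket is labelled.

```python
import numpy as np
import pandas as pd

# 60 daily observations: all of January and February 2021 plus 1 March
daily = pd.Series(
    np.arange(60),
    index=pd.date_range(start="2021-01-01", periods=60, freq="D"),
)

# Downsample with resample; "MS" labels each bucket by its month start
by_resample = daily.resample("MS").sum()

# The same buckets built by hand: group rows by the month they fall in
by_groupby = daily.groupby(daily.index.to_period("M")).sum()

print(by_resample)
print(by_groupby)
```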

#### Open-High-Low-Close (OHLC) resampling

---

In finance, a popular way to aggregate a time series is to compute four values for each bucket: the first (open), last (close), maximum (high), and minimum (low) values.





* (open, first)


* (high, max)


* (low, min)


* (close, last)

In [None]:
df.resample("M").ohlc()

In [None]:
df = df.resample(rule="W").mean() # Week average

df

### Upsampling


---

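Unlike downsampling, upsampling does not need an aggregation function. We can use the `asfreq()` method to convert to the higher frequency without any aggregation, which leaves the newly created rows empty (`NaN`), or a fill method such as `ffill()` to carry the last observation forward.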

In [None]:
df # Week average

In [None]:
df.resample(rule="D").asfreq()

In [None]:
df.resample(rule="D").ffill()
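
The effect of each upsampling choice is easier to see on a tiny series. Here is a minimal sketch with two made-up weekly observations, comparing `asfreq()`, `ffill()`, and linear interpolation. `interpolate()` is a third option that the cells above do not use.

```python
import pandas as pd

# Two weekly observations, one week apart (both Sundays)
weekly = pd.Series(
    [1.0, 8.0],
    index=pd.date_range(start="2021-01-03", periods=2, freq="W"),
)

gaps = weekly.resample("D").asfreq()               # new days are left as NaN
carried = weekly.resample("D").ffill()             # repeat the last known value
interpolated = weekly.resample("D").interpolate()  # straight line between points

print(pd.DataFrame({"asfreq": gaps, "ffill": carried, "interpolate": interpolated}))
```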

We can also resample our Series or DataFrame once and then apply several different functions to the result.

In [None]:
res = df.resample(rule="M")

In [None]:
res.aggregate(["sum", "mean", "std"]) # Apply several aggregations at once, referenced by name

## Moving Window Functions

---

An important class of array transformations for time series are statistics and other functions evaluated over a sliding window, that is, functions that aggregate over a moving partition of values.

### Rolling Functions

Let's create a simple Pandas Series to understand how the window functions work.

In [None]:
np.random.seed(425)

ts = pd.Series(data=np.random.randint(low=1, high=10, size=7),
               index=pd.date_range(start="2021-01-11", periods=7))


ts

In [None]:
ts.rolling(window=2).sum() # Rolling sum over a window of two periods

In [None]:
ts.rolling(window=2).mean() # Rolling mean over a window of two periods

In [None]:
ts.rolling(window=3).sum() # Rolling sum over a window of three periods
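
The leading `NaN` values in the results above appear because a window needs to be full before it produces a value. Here is a minimal sketch on a made-up series showing the default behaviour next to `min_periods=1`, a `rolling()` option not used in the cells above:

```python
import pandas as pd

values = pd.Series(
    [1, 2, 3, 4, 5],
    index=pd.date_range(start="2021-01-11", periods=5, freq="D"),
)

# A window of 3 needs three observations, so the first two results are NaN
strict = values.rolling(window=3).sum()

# min_periods=1 lets the window produce a result as soon as one value exists
relaxed = values.rolling(window=3, min_periods=1).sum()

print(pd.DataFrame({"value": values, "strict": strict, "relaxed": relaxed}))
```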

### Expanding Functions

---

Expanding functions start the window at the beginning of the time series and grow it until it encompasses the whole series.

In [None]:
ts.expanding().sum()

In [None]:
ts.expanding().mean()
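
Since the window always starts at the first observation, an expanding sum gives the same numbers as a cumulative sum. A small sketch on made-up values, with names chosen only for this example:

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

running_sum = values.expanding().sum()    # 1, 1+2, 1+2+3, ...
running_mean = values.expanding().mean()  # mean of all values seen so far

# An expanding sum gives the same numbers as a cumulative sum
print((running_sum == values.cumsum()).all())
print(running_mean.tolist())
```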

# Summary

---

Time series data requires its own kinds of analysis and transformation. Knowing how to work with time series data, whether regular or irregular, is a valuable skill to have.