# Time Series

---

Time series data is an important form of structured data in many fields, such as finance, economics, ecology, neuroscience, and physics. Anything that is observed or measured at many points in time forms a `Time Series`. Many time series have a fixed frequency, meaning that data points occur at regular intervals (for example, every day or every month). However, a time series can also be irregular, with no fixed interval between observations. **We will not deal with irregular frequency time series**.

In today's lecture, we'll look at the time series and date functionality in Pandas. Pandas makes manipulating dates and times quite flexible, which makes many kinds of analysis easier. In fact, Pandas was originally created by Wes McKinney, while he was working at a hedge fund, to handle financial date and time data.


### Lecture outline

---

* Date and Time data types


* Dealing with Datetime Objects


* Indexing, Selection, Sub-setting


* Periods and Period Arithmetic


* Date and Time Conversion


* Time Shifting


* Resampling


* Moving Average

#### Reference


[Timeseries](https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#timeseries)


[Time series / date functionality](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html)


[Time deltas](https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html)


[Windowing Operations](https://pandas.pydata.org/pandas-docs/stable/user_guide/window.html)


[datetime — Basic date and time types](https://docs.python.org/3/library/datetime.html)


[Python Datetime](https://www.w3schools.com/python/python_datetime.asp)

In [26]:
import pandas as pd

import numpy as np

import datetime

## Date and Time data types

---

Pandas has four main time related classes:

* `Timestamp`


* `DatetimeIndex`


* `Period`


* `PeriodIndex`


Before we look at these classes, we need to understand what a Python `datetime` object is.

### datetime

---

Python has the built-in module `datetime` for working with dates and times. Dates and times are represented as objects with date- and time-specific attributes, so when we manipulate them, we manipulate objects rather than strings.


The `datetime` module provides the following types:


* `date` - Stores a calendar date (year, month, day) using the Gregorian calendar


* `time` - Stores the time of day as hours, minutes, seconds, and microseconds


* `datetime` - Stores both date and time


* `timedelta` - Represents the difference between two datetime values (as days, seconds, and microseconds)


* `tzinfo` - Base type for storing time zone information
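To see how these types fit together, here is a minimal sketch using only the standard library. The dates are arbitrary, and `datetime.timezone.utc` is used as a ready-made concrete `tzinfo` instance.

```python
import datetime

# date: a calendar date only
d = datetime.date(2021, 1, 5)
# time: a time of day only
t = datetime.time(10, 15, 30)
# datetime: both parts combined into one object
dt = datetime.datetime.combine(d, t)
# timedelta: a duration; adding it to a datetime gives a new datetime
later = dt + datetime.timedelta(days=1, hours=2)
# tzinfo: timezone.utc is a concrete tzinfo we can attach to a datetime
aware = dt.replace(tzinfo=datetime.timezone.utc)

print(dt)           # 2021-01-05 10:15:30
print(later)        # 2021-01-06 12:15:30
print(later - dt)   # 1 day, 2:00:00
print(aware.tzinfo) # UTC
```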

In [53]:
now = datetime.datetime.now()

now

datetime.datetime(2021, 1, 11, 14, 26, 51, 485815)

In [54]:
type(now) # datetime object

datetime.datetime

We can extract the individual components of a datetime object through its methods and attributes.

In [58]:
now.date() # Extract date

now.time() # Extract time

now.year # Extract year

now.month # Extract month

now.day # Extract day

now.hour # Extract hour

now.minute # Extract minute

now.second # Extract second

now.microsecond # Extract microsecond

485815

### Timestamp

---

`Timestamp` represents a single timestamp, that is, a specific instant in time. It is Pandas' counterpart to Python's `datetime` and is interchangeable with it in most cases.


For example, let's create a timestamp from the string `1/5/2021 10:05AM`:

In [97]:
pd.Timestamp('1/5/2021 10:05AM')

Timestamp('2021-01-05 10:05:00')
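Since `Timestamp` is interchangeable with `datetime` in most cases, a short sketch of moving between the two may help. It relies on the fact that `pd.Timestamp` is a subclass of `datetime.datetime`; the example values are arbitrary.

```python
import datetime

import pandas as pd

py_dt = datetime.datetime(2021, 1, 5, 10, 5)

# A Timestamp can be built directly from a datetime ...
ts = pd.Timestamp(py_dt)
# ... and it is itself an instance of datetime.datetime
print(isinstance(ts, datetime.datetime))  # True

# Comparisons across the two types work as expected
print(ts == py_dt)  # True

# Convert back to a plain Python datetime when another library requires one
back = ts.to_pydatetime()
print(type(back).__name__)  # datetime
```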

We can also create a timestamp by passing the year, month, day, hour, and minute as separate arguments.

In [98]:
pd.Timestamp(2021, 1, 5, 10, 15)

Timestamp('2021-01-05 10:15:00')

In [100]:
pd.Timestamp(2021, 1, 5, 10, 15).isoweekday() # Return the day of the week represented by the date. Monday == 1 … Sunday == 7

2

As with Python's built-in `datetime` module, we can extract different parts of a timestamp object using the appropriate methods and attributes.

In [110]:
single_timestamp = pd.Timestamp(2021, 1, 5, 10, 15, 23, 158, 450) # year, month, day, hour, minute, second, microsecond, nanosecond

single_timestamp

Timestamp('2021-01-05 10:15:23.000158450')

In [132]:
single_timestamp.date() # Extract date

single_timestamp.time() # Extract time

single_timestamp.year # Extract year

single_timestamp.month # Extract month as a number January == 1...December == 12

single_timestamp.month_name() # Return actual name of the month

single_timestamp.week # Return week number

single_timestamp.weekday() # Return weekday as a number Monday == 0 … Sunday == 6

single_timestamp.day_name() # Return actual name of the weekday

single_timestamp.hour # Extract hour

single_timestamp.minute # Extract minute

single_timestamp.second # Extract second

single_timestamp.microsecond # Extract microsecond

single_timestamp.nanosecond # Extract nanosecond

450

### Period

---

If we are interested in a span of time rather than a single instant, we use a `Period` object instead of a datetime or Timestamp. A `Period` represents a fixed span of time, for example the month of January 2021.

In [135]:
pd.Period(value="1/2021", freq="M") # A period object for the month of January 2021

Period('2021-01', 'M')

In [137]:
pd.Period(value='1/5/2021', freq="D") # More granular period object - January 5th, 2021

Period('2021-01-05', 'D')

<div class="alert alert-info">

**Note:** We can extract date and time characteristics from a `Period` object just as we did with `Timestamp`.
    

> [**pandas.Period**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Period.html)

</div>
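As a small illustration of that note, the sketch below reads a few attributes from a monthly `Period`. `start_time`, `end_time`, and `days_in_month` are `Period` attributes; the chosen month is arbitrary.

```python
import pandas as pd

jan = pd.Period("2021-01", freq="M")

print(jan.year, jan.month)  # 2021 1
print(jan.days_in_month)    # 31
# A Period covers a whole span, so it has both a start and an end
print(jan.start_time)       # 2021-01-01 00:00:00
print(jan.end_time)         # 2021-01-31 23:59:59.999999999
```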

### DatetimeIndex and PeriodIndex

---

The `PeriodIndex` class stores a sequence of `Period` objects and can serve as an axis index in any Pandas data structure. Similarly, the `DatetimeIndex` class stores a sequence of `Timestamp` objects and can also serve as an axis index.



[pandas.DatetimeIndex](https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.html)


[pandas.PeriodIndex](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.PeriodIndex.html)

When we use a list of Timestamps as the index of a Series, Pandas stores it as a `DatetimeIndex`. In the series below, each Timestamp is an index label with a value associated with it, in this case `a`, `b`, `c`, `d`, and `e`.

In [143]:
t1 = pd.Series(list("abcde"), [pd.Timestamp('2021-01-05'),
                               pd.Timestamp('2021-01-06'),
                               pd.Timestamp('2021-01-07'),
                               pd.Timestamp('2021-01-08'),
                               pd.Timestamp('2021-01-09')])


t1

2021-01-05    a
2021-01-06    b
2021-01-07    c
2021-01-08    d
2021-01-09    e
dtype: object

In [144]:
t1.index

DatetimeIndex(['2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
               '2021-01-09'],
              dtype='datetime64[ns]', freq=None)

In [145]:
type(t1.index) # Looking at the type of our series index, we see that it's DatetimeIndex

pandas.core.indexes.datetimes.DatetimeIndex

Similarly, we can create a `Period-based` index as well.

In [146]:
t2 = pd.Series(list("abcde"), [pd.Period('2021-01'),
                               pd.Period('2021-02'),
                               pd.Period('2021-03'),
                               pd.Period('2021-04'),
                               pd.Period('2021-05')])



t2

2021-01    a
2021-02    b
2021-03    c
2021-04    d
2021-05    e
Freq: M, dtype: object

In [147]:
t2.index

PeriodIndex(['2021-01', '2021-02', '2021-03', '2021-04', '2021-05'], dtype='period[M]', freq='M')

In [148]:
type(t2.index) # Looking at the type of t2.index, we can see that it's PeriodIndex.

pandas.core.indexes.period.PeriodIndex
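The two index types can also be converted into each other. A minimal sketch using `DatetimeIndex.to_period()` and `PeriodIndex.to_timestamp()` on a few arbitrary dates (the variable names are illustrative only):

```python
import pandas as pd

days = pd.DatetimeIndex(["2021-01-05", "2021-01-20", "2021-02-03"])

# Each timestamp is mapped to the monthly period that contains it
months = days.to_period("M")
print([str(p) for p in months])  # ['2021-01', '2021-01', '2021-02']

# Converting back gives the start of each period by default
starts = months.to_timestamp()
print(starts)
```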

### Timedelta

---

Timedeltas are differences in times, expressed in different units, e.g. days, hours, minutes, and seconds. They can be positive or negative. A Timedelta is not the same as a Period, but the two are conceptually similar. For instance, if we subtract January 10th from January 11th, we get a Timedelta of one day.

In [61]:
pd.Timestamp('01/11/2021') - pd.Timestamp('01/10/2021')

Timedelta('1 days 00:00:00')

We can also find what the date and time is for 12 days and 3 hours past January 2nd, at 8:10 AM.

In [62]:
pd.Timestamp('01/2/2021 8:10AM') + pd.Timedelta(days=12, hours=3) # 12 days and 3 hours later

Timestamp('2021-01-14 11:10:00')
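A `Timedelta` also exposes its components. The sketch below reuses the 12 days and 3 hours from above, written with keyword arguments, and shows that reversing a subtraction yields a negative value.

```python
import pandas as pd

td = pd.Timedelta(days=12, hours=3)
print(td.days)             # 12
print(td.seconds)          # 10800 (the 3 hours beyond the whole days)
print(td.total_seconds())  # 1047600.0

# Subtracting in the other order gives a negative Timedelta
neg = pd.Timestamp("2021-01-10") - pd.Timestamp("2021-01-11")
print(neg.days)            # -1
```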

### Offset

---

An offset (`DateOffset`) is similar to a Timedelta, but instead of adding a fixed duration it follows calendar rules. Offsets also support more kinds of intervals: besides hours, days, weeks, and months, there are business days, month ends, semi-month beginnings, and so on.

In [76]:
pd.Timestamp('1/5/2021').weekday() # Day of the week as a number (Monday == 0) - 5th January

1

In [82]:
pd.Timestamp('1/5/2021').day_name() # Day of a week - 5th January

'Tuesday'

In [83]:
pd.Timestamp('1/4/2021') + pd.offsets.Week() # Move the timestamp one week ahead

Timestamp('2021-01-11 00:00:00')

In [85]:
pd.Timestamp('1/4/2021') + pd.offsets.MonthEnd() # Roll forward to the month end: the last day of January

Timestamp('2021-01-31 00:00:00')
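The difference between a fixed `Timedelta` and a calendar-aware offset shows up around weekends and month boundaries. A minimal sketch with arbitrary dates (January 29th, 2021 is a Friday):

```python
import pandas as pd

jan29 = pd.Timestamp("2021-01-29")  # a Friday

# A fixed Timedelta simply adds 24 hours and lands on Saturday
print(jan29 + pd.Timedelta(days=1))   # 2021-01-30 00:00:00
# A business-day offset skips the weekend
print(jan29 + pd.offsets.BDay())      # 2021-02-01 00:00:00

jan31 = pd.Timestamp("2021-01-31")
# DateOffset(months=1) follows the calendar and clips to February's last day
print(jan31 + pd.DateOffset(months=1))  # 2021-02-28 00:00:00
# MonthEnd on a date that is already a month end rolls to the next one
print(jan31 + pd.offsets.MonthEnd())    # 2021-02-28 00:00:00
```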

## Dealing with Datetime Objects

---

Next, let's look at a few tricks for working with dates in a DataFrame. Suppose we want to look at nine measurements, taken biweekly (every other week) on Sunday, starting in October 2020. We can create this DatetimeIndex with the `date_range()` function. We specify either a `start` or an `end` date (a date passed positionally, without a keyword, is treated as the start date), the number of `periods`, and a frequency `freq`. Here, we set the frequency to `2W-SUN`, which means every two weeks, anchored on Sunday.

<div class="alert alert-info">

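**Note:** This part previews methods such as `diff()`, `resample()`, and date-based indexing, which are covered in more detail later in the lecture.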

</div>

In [150]:
dates = pd.date_range(start="10-01-2020", periods=9, freq="2W-SUN")


dates

DatetimeIndex(['2020-10-04', '2020-10-18', '2020-11-01', '2020-11-15',
               '2020-11-29', '2020-12-13', '2020-12-27', '2021-01-10',
               '2021-01-24'],
              dtype='datetime64[ns]', freq='2W-SUN')
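`date_range()` can also be driven by other combinations of its parameters. This sketch follows the usual rule of specifying three of `start`, `end`, `periods`, and `freq`; the dates are chosen to line up with the biweekly index above.

```python
import pandas as pd

# start + end + freq: every Sunday in October 2020
october_sundays = pd.date_range(start="2020-10-01", end="2020-10-31", freq="W-SUN")
print(october_sundays.strftime("%Y-%m-%d").tolist())
# ['2020-10-04', '2020-10-11', '2020-10-18', '2020-10-25']

# end + periods + freq: count backwards from the last date
last_three = pd.date_range(end="2021-01-24", periods=3, freq="2W-SUN")
print(last_three.strftime("%Y-%m-%d").tolist())
# ['2020-12-27', '2021-01-10', '2021-01-24']
```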

There are many other frequencies we can specify. For example, `B` gives business days (Monday through Friday).

In [154]:
pd.date_range(start="10-01-2020", periods=9, freq="B")

DatetimeIndex(['2020-10-01', '2020-10-02', '2020-10-05', '2020-10-06',
               '2020-10-07', '2020-10-08', '2020-10-09', '2020-10-12',
               '2020-10-13'],
              dtype='datetime64[ns]', freq='B')

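We can also generate quarterly dates. `QS-JUN` means quarter start with the quarters anchored to June, so the dates fall on the first day of June, September, December, and March.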

In [153]:
pd.date_range(start="04-01-2020", periods=12, freq="QS-JUN")

DatetimeIndex(['2020-06-01', '2020-09-01', '2020-12-01', '2021-03-01',
               '2021-06-01', '2021-09-01', '2021-12-01', '2022-03-01',
               '2022-06-01', '2022-09-01', '2022-12-01', '2023-03-01'],
              dtype='datetime64[ns]', freq='QS-JUN')

Now, let's go back to our biweekly-on-Sunday example, create a DataFrame from these dates and some random data, and see what we can do with it.

In [161]:
dates

DatetimeIndex(['2020-10-04', '2020-10-18', '2020-11-01', '2020-11-15',
               '2020-11-29', '2020-12-13', '2020-12-27', '2021-01-10',
               '2021-01-24'],
              dtype='datetime64[ns]', freq='2W-SUN')

In [203]:
np.random.seed(425)



dates = pd.date_range(start="10-11-2020", periods=20, freq="2W-SUN")


df = pd.DataFrame({"count_1": np.random.randint(1, 10, 20),
                   "count_2": np.random.randint(1, 10, 20)},
                  index=dates)


df

|  | count_1 | count_2 |
|---|---|---|
| 2020-10-11 | 3 | 1 |
| 2020-10-25 | 1 | 7 |
| 2020-11-08 | 3 | 5 |
| 2020-11-22 | 4 | 6 |
| 2020-12-06 | 8 | 1 |
| 2020-12-20 | 5 | 2 |
| 2021-01-03 | 8 | 9 |
| 2021-01-17 | 5 | 5 |
| 2021-01-31 | 5 | 5 |
| 2021-02-14 | 8 | 9 |


Here, we can see that all the dates in our index are on a Sunday, which matches the frequency that we set.

In [204]:
df.index.day_name()

Index(['Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday',
       'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday',
       'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday'],
      dtype='object')

We can also use `diff()` to compute the first discrete difference, that is, the difference between each row's values and those of the previous row. We will talk about the `diff()` method later on.

In [205]:
df.diff()

|  | count_1 | count_2 |
|---|---|---|
| 2020-10-11 | NaN | NaN |
| 2020-10-25 | -2.0 | 6.0 |
| 2020-11-08 | 2.0 | -2.0 |
| 2020-11-22 | 1.0 | 1.0 |
| 2020-12-06 | 4.0 | -5.0 |
| 2020-12-20 | -3.0 | 1.0 |
| 2021-01-03 | 3.0 | 7.0 |
| 2021-01-17 | -3.0 | -4.0 |
| 2021-01-31 | 0.0 | 0.0 |
| 2021-02-14 | 3.0 | 4.0 |

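Conceptually, `diff()` subtracts the previous row from each row, which is the same as subtracting a copy of the data shifted down by one. A small sketch on the first four `count_1` values from above:

```python
import pandas as pd

s = pd.Series([3, 1, 3, 4])

# Each row minus the row before it; the first row has no predecessor, so it is NaN
print(s.diff().tolist())  # [nan, -2.0, 2.0, 1.0]

# The same result spelled out with shift()
manual = s - s.shift(1)
print(s.diff().equals(manual))  # True
```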

Suppose we want to know the mean count for each month in our DataFrame. We can do this using
`resample()`. Converting from a higher frequency to a lower frequency is called `downsampling` (we'll talk about this in a moment).

In [206]:
df.resample("M").mean() # "M" labels each month by its last day; pandas 2.2+ spells this alias "ME"

|  | count_1 | count_2 |
|---|---|---|
| 2020-10-31 | 2.0 | 4.0 |
| 2020-11-30 | 3.5 | 5.5 |
| 2020-12-31 | 6.5 | 1.5 |
| 2021-01-31 | 6.0 | 6.333333 |
| 2021-02-28 | 8.0 | 6.0 |
| 2021-03-31 | 3.0 | 5.0 |
| 2021-04-30 | 2.5 | 2.0 |
| 2021-05-31 | 3.5 | 5.0 |
| 2021-06-30 | 3.5 | 5.5 |
| 2021-07-31 | 7.0 | 7.0 |


Now let's talk about datetime indexing and slicing, which is a very useful feature of the pandas DataFrame. For instance, we can use partial string indexing to select the values from a particular year.

In [224]:
df.loc["2020"] # Select only 2020 year

df.loc["2021"] # Select only 2021 year

df.loc["2020-12"] # Select particular year and month

df.loc["2020-12":] # Select range

|  | count_1 | count_2 |
|---|---|---|
| 2020-12-06 | 8 | 1 |
| 2020-12-20 | 5 | 2 |
| 2021-01-03 | 8 | 9 |
| 2021-01-17 | 5 | 5 |
| 2021-01-31 | 5 | 5 |
| 2021-02-14 | 8 | 9 |
| 2021-02-28 | 8 | 3 |
| 2021-03-14 | 1 | 3 |
| 2021-03-28 | 5 | 7 |
| 2021-04-11 | 3 | 1 |


## Indexing, Selection, Sub-setting

---

A time series behaves like any other Pandas Series when we index and select data by label.


> **While pandas does not force you to have a sorted date index, some of these methods may have unexpected or incorrect behavior if the dates are unsorted.**
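A simple way to avoid that pitfall is to sort the index once before selecting by date. A minimal sketch with a deliberately unsorted, made-up index:

```python
import pandas as pd

idx = pd.to_datetime(["2021-03-01", "2021-01-15", "2021-02-10"])
s = pd.Series([3, 1, 2], index=idx)
print(s.index.is_monotonic_increasing)  # False

# Sort once up front so range-based selection behaves predictably
s = s.sort_index()
print(s.index.is_monotonic_increasing)  # True
print(s.loc["2021-01":"2021-02"].tolist())  # [1, 2]
```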

In [260]:
df.index

DatetimeIndex(['2020-10-11', '2020-10-25', '2020-11-08', '2020-11-22',
               '2020-12-06', '2020-12-20', '2021-01-03', '2021-01-17',
               '2021-01-31', '2021-02-14', '2021-02-28', '2021-03-14',
               '2021-03-28', '2021-04-11', '2021-04-25', '2021-05-09',
               '2021-05-23', '2021-06-06', '2021-06-20', '2021-07-04'],
              dtype='datetime64[ns]', freq='2W-SUN')

In [265]:
df.shape

(20, 2)

In [267]:
df[:10] # Select first 10 rows

df[10:15] # Select rows 10 through 14 (5 rows)

df[15:] # Select last 5 rows

|  | count_1 | count_2 |
|---|---|---|
| 2021-05-09 | 5 | 2 |
| 2021-05-23 | 2 | 8 |
| 2021-06-06 | 6 | 3 |
| 2021-06-20 | 1 | 8 |
| 2021-07-04 | 7 | 7 |


In [247]:
df.loc["2020-10-11"] # Select one row

df.loc[datetime.datetime(2020, 10, 11)]


df.loc["2020-10"] # Select all rows for a month

df.loc["2020"] # Select all rows for a year



df.loc[:"2020"] # Select all rows up to the end of 2020

df.loc["2021":] # Select all rows from 2021 onward


df.loc["2021-01":"2021-05"] # Select month range

df.loc[datetime.datetime(2020, 10, 11): datetime.datetime(2021, 2, 28)]

|  | count_1 | count_2 |
|---|---|---|
| 2020-10-11 | 3 | 1 |
| 2020-10-25 | 1 | 7 |
| 2020-11-08 | 3 | 5 |
| 2020-11-22 | 4 | 6 |
| 2020-12-06 | 8 | 1 |
| 2020-12-20 | 5 | 2 |
| 2021-01-03 | 8 | 9 |
| 2021-01-17 | 5 | 5 |
| 2021-01-31 | 5 | 5 |
| 2021-02-14 | 8 | 9 |


In [259]:
df.truncate(before="2021-01-31") # Truncate all rows before this index value

df.truncate(after="2021-01-31") # Truncate all rows after this index value

df.truncate(before="2021-01-03", after="2021-04-11") # Truncate before and after these index values

|  | count_1 | count_2 |
|---|---|---|
| 2021-01-03 | 8 | 9 |
| 2021-01-17 | 5 | 5 |
| 2021-01-31 | 5 | 5 |
| 2021-02-14 | 8 | 9 |
| 2021-02-28 | 8 | 3 |
| 2021-03-14 | 1 | 3 |
| 2021-03-28 | 5 | 7 |
| 2021-04-11 | 3 | 1 |


## Periods and Period Arithmetic

---

Periods represent timespans, like days, months, quarters, or years. The `Period` class represents this data type; to create one, we pass a string or integer value and a frequency from the table below.


![Base time series frequencies](images/base_ts_frq.png)

The following `Period` object represents the full timespan from January 1, 2020, to December 31, 2020, inclusive.

In [4]:
first_period = pd.Period(value=2020, freq="A-Dec")


first_period

Period('2020', 'A-DEC')

The frequency of our period object is annual (`A-DEC`, a year ending in December), so adding or subtracting an integer shifts the period by that many years, i.e. by its frequency unit, rather than by a number of days.

In [5]:
first_period - 5

Period('2015', 'A-DEC')

In [6]:
first_period + 3

Period('2023', 'A-DEC')

If two periods have the same frequency, their difference is an offset equal to the number of frequency units between them.

In [7]:
second_period = pd.Period(value="2015", freq="A-DEC")

In [8]:
second_period - first_period

<-5 * YearEnds: month=12>

In [9]:
first_period - second_period

<5 * YearEnds: month=12>
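The result of the subtraction is an offset rather than a plain integer. If we need the number itself, the offset's `n` attribute holds it. This sketch writes the annual frequency as `"Y"`, which newer pandas versions prefer over `"A-DEC"`; that spelling is an assumption about the installed pandas version.

```python
import pandas as pd

first_period = pd.Period(value=2020, freq="Y")
second_period = pd.Period(value="2015", freq="Y")

# The difference is a DateOffset; .n holds the integer count of frequency units
gap = first_period - second_period
print(gap.n)                           # 5
print((second_period - first_period).n)  # -5
```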

It's not possible to add two Period objects.

In [10]:
first_period + second_period

TypeError: unsupported operand type(s) for +: 'Period' and 'Period'

Also, it's not possible to do arithmetic on Period objects with different frequencies.

In [11]:
third_period = pd.Period(value="01-04-2021", freq="D")

In [12]:
first_period - third_period

IncompatibleFrequency: Input has different freq=D from Period(freq=A-DEC)

In [13]:
first_period + third_period

TypeError: unsupported operand type(s) for +: 'Period' and 'Period'

The key here is that the `Period` object encapsulates the granularity for arithmetic.

### Period Frequency Conversion


---

We cannot do arithmetic directly on periods with different frequencies. However, we can first convert one period to the other's frequency and then perform the operation.

In [14]:
first_period

Period('2020', 'A-DEC')

Convert the annual period into a monthly period:

In [20]:
first_period.asfreq(freq="M", how="start") # Annual period to Monthly period

Period('2020-01', 'M')

In [24]:
first_period_daily = first_period.asfreq(freq="D", how="start")


first_period_daily

Period('2020-01-01', 'D')

In [25]:
third_period - first_period_daily

<369 * Days>
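`asfreq()` also lets us choose which end of the original span to keep through its `how` argument. A sketch, again assuming the `"Y"` spelling of the annual frequency is available in the installed pandas version:

```python
import pandas as pd

year_2020 = pd.Period(value=2020, freq="Y")

# how="start" picks the first sub-period, how="end" the last one
print(year_2020.asfreq("M", how="start"))  # 2020-01
print(year_2020.asfreq("M", how="end"))    # 2020-12
print(year_2020.asfreq("D", how="end"))    # 2020-12-31
```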

## Date and Time Conversion

---


### Converting to Datetime

In [15]:
# Now, let's look into how to convert to Datetime. Suppose we have a list of dates as strings and we want to
# create a new dataframe

# I'm going to try a bunch of different date formats
d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']

# And just some random data
ts3 = pd.DataFrame(np.random.randint(10, 100, (4,2)), index=d1, 
                   columns=list('ab'))
ts3

|  | a | b |
|---|---|---|
| 2 June 2013 | 38 | 40 |
| Aug 29, 2014 | 94 | 99 |
| 2015-06-26 | 75 | 32 |
| 7/12/16 | 68 | 62 |


In [16]:
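# Using pandas to_datetime(), pandas will try to convert these strings to Datetime and put them in a standard format.
# pandas >= 2.0 infers a single format from the first string, so mixed formats need format="mixed"
# (omit that argument on pandas 1.x, which parses each string independently)

ts3.index = pd.to_datetime(ts3.index, format="mixed")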
ts3

|  | a | b |
|---|---|---|
| 2013-06-02 | 38 | 40 |
| 2014-08-29 | 94 | 99 |
| 2015-06-26 | 75 | 32 |
| 2016-07-12 | 68 | 62 |


In [17]:
# to_datetime() also has options to change the date parse order. For example, we
# can pass dayfirst=True to parse the date in day-first (European) order.

pd.to_datetime('4.7.12', dayfirst=True)

Timestamp('2012-07-04 00:00:00')
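Two other `to_datetime()` options are often useful with messy input: an explicit `format` string, which removes any guessing about day and month order, and `errors="coerce"`, which turns unparseable values into `NaT` instead of raising an error. A minimal sketch with made-up strings:

```python
import pandas as pd

# An explicit strftime-style format: day first, then month, then a 4-digit year
jan5 = pd.to_datetime("05/01/2021", format="%d/%m/%Y")
print(jan5)  # 2021-01-05 00:00:00

# errors="coerce" replaces values that cannot be parsed with NaT
parsed = pd.to_datetime(["2021-01-05", "not a date"], errors="coerce")
print(parsed)
```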

## Time Shifting

## Resampling

---

`resample()` is a time-based `groupby()`, followed by a reduction method on each of its groups.
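To make the `groupby()` analogy concrete, here is a minimal sketch showing that resampling to monthly sums matches an explicit `groupby()` on a `pd.Grouper` with the same frequency. It uses `"MS"` (month start) labels to sidestep the `"M"`/`"ME"` naming change in newer pandas; the values are arbitrary.

```python
import pandas as pd

# Six consecutive Sundays spanning January and early February 2021
dates = pd.date_range(start="2021-01-03", periods=6, freq="W-SUN")
s = pd.Series([8, 5, 5, 8, 8, 1], index=dates)

# Bucket the rows by calendar month, then reduce each bucket with sum()
by_resample = s.resample("MS").sum()

# The same grouping written as an explicit groupby on a time-based Grouper
by_groupby = s.groupby(pd.Grouper(freq="MS")).sum()

print(by_resample.tolist())                         # [34, 1]
print(by_resample.tolist() == by_groupby.tolist())  # True
```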

#### Reference


[Resampling](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-resampling)

## Moving Average

## Summary

---
