<a href="https://colab.research.google.com/github/brendenwest/cis276/blob/main/7_time_series.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Time Series Analysis

### Reading
- Murach's, Chapter 9
- https://wesmckinney.com/book/time-series.html

### Learning Outcomes

- Date & time conversion
- Basic Time Series data
- Handling date ranges & time periods
- Rolling time windows




### What is Time Series data

Time series data is an important form of structured data: any data recorded repeatedly at many points in time.

Time series can have `fixed frequency` - data points occurring at regular intervals (e.g. hourly, daily, etc).

Time series can also be `irregular` without a fixed unit of time or offset between units.
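
As a rough illustration of the difference, the sketch below builds one regular and one irregular index and asks pandas to infer a frequency. The dates are made up for the example; `pd.date_range` and `pd.infer_freq` are standard pandas functions.

```python
import pandas as pd

# Fixed frequency: one observation per calendar day (illustrative dates).
regular = pd.date_range("2024-01-01", periods=4, freq="D")

# Irregular: the gaps between observations vary.
irregular = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05", "2024-01-09"])

regular_freq = pd.infer_freq(regular)      # a frequency string for the daily data
irregular_freq = pd.infer_freq(irregular)  # None when no fixed frequency fits

print(regular_freq, irregular_freq)
```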

Data points typically carry a `timestamp` that identifies an instant in time. A timestamp can be a `date`, a `time`, or a combination of both.
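
A minimal sketch of those three forms, using the standard library's `date`, `time`, and `datetime` types. The specific values are arbitrary.

```python
from datetime import date, time, datetime

d = date(2024, 2, 12)           # calendar date only
t = time(9, 30)                 # time of day only
stamp = datetime.combine(d, t)  # date and time together

print(d, t, stamp)
```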


### Date & Time types

Python has built-in data types & modules for date and time data, such as `datetime`, `time`, & `calendar`.

`datetime` stores both the date and time down to the microsecond. It has numerous methods to create a datetime object from inputs and to output parts of a datetime object.

In [11]:
from datetime import datetime
now = datetime.now()
print("now - ", now)  # show complete datetime object
print("date parts - ", now.year, now.month, now.day)
then = datetime(2023, 4, 12, 13, 9, 16, 484533)
delta = now - then
delta

now -  2024-02-12 00:16:28.778890
date parts -  2024 2 12


datetime.timedelta(days=305, seconds=40032, microseconds=294357)

In [6]:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
delta

datetime.timedelta(days=926, seconds=56700)
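
Subtracting two `datetime` objects yields a `timedelta`, and a `timedelta` can be added back to a `datetime`. The short sketch below reuses the dates from the cell above and adds an arbitrary one-week shift for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2008, 6, 24, 8, 15)
end = datetime(2011, 1, 7)
delta = end - start

# Adding the difference back to the start recovers the end.
round_trip = start + delta

# A timedelta can also be built directly and used to shift a datetime.
one_week_later = start + timedelta(weeks=1)

print(delta.days, delta.total_seconds(), one_week_later)
```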

You can format datetime objects as strings. See the Python `datetime` documentation for a full list of format codes.

Python `datetime` objects also have locale-specific formatting options for countries & languages other than US English.

In [13]:
timestamp = datetime(2021, 1, 3)
str(timestamp)  # convert datetime to string
timestamp.strftime("%Y-%m-%d")  # timestamp to formatted string

'2021-01-03'
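
A few more format codes, as a sketch. The timestamp is arbitrary. Locale-dependent codes such as `%A` and `%x` are left out here because their output varies with the system locale.

```python
from datetime import datetime

timestamp = datetime(2021, 1, 3, 14, 5)

day_first = timestamp.strftime("%d/%m/%Y %H:%M")  # day/month/year, 24-hour clock
day_of_year = timestamp.strftime("%j")            # zero-padded day of the year
iso = timestamp.isoformat()                       # ISO 8601 string

print(day_first, day_of_year, iso)
```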

You can use many of the same format codes to convert strings to dates.

In [16]:
datetime.strptime("2021-01-03", "%Y-%m-%d")
datestrs = ["7/6/2021", "8/6/2021"]
[datetime.strptime(x, "%m/%d/%Y") for x in datestrs]

[datetime.datetime(2021, 7, 6, 0, 0), datetime.datetime(2021, 8, 6, 0, 0)]
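
`datetime.strptime` raises `ValueError` when a string does not match the format. One possible way to handle mixed-quality input is sketched below; `parse_us_date` is a hypothetical helper name, not part of the standard library.

```python
from datetime import datetime

def parse_us_date(text):
    """Return a datetime for a month/day/year string, or None if it does not match."""
    try:
        return datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return None

# The second and third strings do not fit the month/day/year pattern.
parsed = [parse_us_date(x) for x in ["7/6/2021", "2021-08-06", "13/1/2021"]]
print(parsed)
```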

### Pandas Time Series Basics

The `pandas.to_datetime` function parses many different kinds of date representations. Given a list of strings it returns a `DatetimeIndex`, and missing values such as `None` become `NaT` (Not a Time).

In [38]:
import pandas as pd
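# The two strings use different separators. pandas 2.0+ infers a single format from
# the first value, so mixed formats like these may require format="mixed".
datestrs = ["2021/07/06 12:00:00", "2021-08-06 00:00:00", None]
pd.to_datetime(datestrs)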

DatetimeIndex(['2021-07-06 12:00:00', '2021-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)
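
By default, `pd.to_datetime` raises an error on a string it cannot parse. The sketch below uses the `errors="coerce"` option to convert such entries to `NaT` instead. The input list is invented for the example.

```python
import pandas as pd

raw = ["2021-07-06", "not a date", None]

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")

print(parsed)
print(parsed.isna())
```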

A basic pandas time series object is a Series indexed by timestamps. Pandas time series can be sliced like any other Series.

In [39]:
import numpy as np
ts = pd.Series(np.random.standard_normal(3), index=pd.to_datetime(datestrs))
ts

2021-07-06 12:00:00   -0.210091
2021-08-06 00:00:00    0.544252
NaT                   -0.608974
dtype: float64
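
To illustrate that ordinary Series operations still apply, the sketch below uses a small series with fixed values (chosen so the results are easy to follow) and shows positional slicing and label alignment in arithmetic.

```python
import pandas as pd

dates = pd.to_datetime(["2021-07-06", "2021-07-07", "2021-07-08", "2021-07-09"])
daily = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

first_two = daily.iloc[:2]     # positional slice, as with any Series
every_other = daily.iloc[::2]  # every second observation

# Arithmetic aligns on the timestamps; dates missing from one side give NaN.
aligned = daily + every_other
print(aligned)
```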

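Scalar values from a `DatetimeIndex` are pandas `Timestamp` objects, which can generally be used wherever a `datetime` object is expected. Unlike `datetime`, which stops at microseconds, a `Timestamp` can store nanosecond precision. Older pandas versions also let a `Timestamp` store frequency information; that attribute was removed in pandas 2.0.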

In [40]:
ts.index[0]

Timestamp('2021-07-06 12:00:00')
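
A small sketch of both points: a `Timestamp` passes an `isinstance` check against `datetime`, and it keeps a nanosecond component that `datetime` has no field for. The timestamp string is arbitrary.

```python
from datetime import datetime
import pandas as pd

stamp = pd.Timestamp("2021-07-06 12:00:00.123456789")

is_datetime = isinstance(stamp, datetime)  # Timestamp subclasses datetime
micro = stamp.microsecond                  # microsecond part
nano = stamp.nanosecond                    # remaining nanoseconds beyond the microsecond

print(is_datetime, micro, nano)
```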

You can also index a time series with a date string. Slicing with timestamps that are not in the index performs a range query, returning every row that falls between the two bounds.

In [41]:
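ts['2021-07-06']  # select all rows on that date
# Drop the NaT label first: slicing with dates not in the index needs a sorted (monotonic) index.
ts_valid = ts[ts.index.notna()]
ts_valid["2021-01-01":"2021-09-01"]
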


2021-07-06 12:00:00   -0.210091
2021-08-06 00:00:00    0.544252
dtype: float64
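
Date strings can also be partial. The sketch below, using a longer made-up daily series, selects a whole year, a whole month, and a range with `.loc`. Both endpoints of a label-based slice are included.

```python
import pandas as pd

# 1000 consecutive days starting 2000-01-01 (illustrative data).
longer_ts = pd.Series(range(1000), index=pd.date_range("2000-01-01", periods=1000))

year_2001 = longer_ts.loc["2001"]                  # every row in 2001
may_2001 = longer_ts.loc["2001-05"]                # every row in May 2001
window = longer_ts.loc["2001-12-30":"2002-01-02"]  # both endpoints included

print(len(year_2001), len(may_2001), len(window))
```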

### Date Ranges & Frequencies
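
This section has no worked example in the notes yet. As a minimal sketch, `pd.date_range` can generate a `DatetimeIndex` from a start and end date, or from a start date plus a number of periods and a frequency string. The dates and frequencies below are arbitrary choices for illustration.

```python
import pandas as pd

# Start and end: daily frequency is the default.
daily = pd.date_range("2024-01-01", "2024-01-10")

# Business days only (Monday to Friday) within one week.
business = pd.date_range("2024-01-01", "2024-01-07", freq="B")

# Four consecutive Mondays, given a start date and a number of periods.
mondays = pd.date_range("2024-01-01", periods=4, freq="W-MON")

print(len(daily), len(business), mondays)
```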

### Time Periods
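
This section has no worked example in the notes yet. As a minimal sketch, a pandas `Period` represents a span of time (here a month) rather than a single instant, and `pd.period_range` builds a `PeriodIndex`. The months chosen are arbitrary.

```python
import pandas as pd

p = pd.Period("2024-01", freq="M")  # the whole of January 2024
next_month = p + 1                  # integer arithmetic shifts by whole periods

# Six consecutive monthly periods used as a Series index.
months = pd.period_range("2024-01", "2024-06", freq="M")
monthly = pd.Series(range(6), index=months)

print(p.start_time, next_month)
print(monthly)
```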

### Moving Windows
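
This section has no worked example in the notes yet. As a minimal sketch, `Series.rolling` computes statistics over a sliding window, sized either by a number of observations or by a time offset. The data below is made up so the results are easy to check by hand.

```python
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Window of 3 observations: the first two results are NaN (not enough data yet).
rolling_mean = s.rolling(3).mean()

# Window of 2 calendar days: each result covers the current and previous day.
rolling_2d = s.rolling("2D").sum()

print(rolling_mean)
print(rolling_2d)
```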