<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Working With Time Series Data

---

### Learning Objectives
 
**After this lesson, you will be able to:**
- Identify time series data.
- Explain the challenges of working with time series data.
- Use the `datetime` library to represent dates as objects.
- Preprocess time series data with Pandas.

---

### Lesson Guide

#### Time Series Data
- [What is a Time Series](#A)
- [The Datetime Library](#B)
- [Preprocessing Time Series Data with Pandas](#C)
- [Independent Practice](#D)
----

<h2><a id="A">What is a Time Series?</a></h2>

A **time series** is a sequence of data points indexed (or listed, or graphed) in time order. Most commonly, the observations are taken at successive, equally spaced points in time, and the time of each observation serves as the index.

Time series are commonly found in sales analysis, stock market trends, economic phenomena, and social science problems.

These data sets are often investigated to evaluate the long-term trends, forecast the future, or perform some other form of analysis.
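As a minimal sketch of the definition above, the snippet below builds a tiny daily series with Pandas. The dates, the values, and the name `daily_sales` are made up for illustration and are not part of the lesson's data.

```python
import pandas as pd

# Five equally spaced (daily) time points; the start date is arbitrary.
dates = pd.date_range(start="2017-01-02", periods=5, freq="D")

# Hypothetical observations, one per day, indexed by time.
daily_sales = pd.Series([10, 12, 9, 15, 11], index=dates, name="daily_sales")

print(daily_sales)
print(type(daily_sales.index))
```

The index is a `DatetimeIndex`, which is what lets Pandas treat the values as ordered in time.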

> **Check for Understanding:** List some examples of real-world time series data.

### Let's take a look at some Apple stock data to get a feel for what time series data look like.

In [1]:
import pandas as pd
from datetime import timedelta
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

aapl = pd.read_csv("data/aapl.csv", parse_dates=['Date'])
# Or use aapl['Date'] = pd.to_datetime(aapl['Date'])

Take a high-level look at the data. What are we looking at?

    - Date column: the trading date of each observation, starting from 2016-01-19
    - Open column: the stock price at the opening of the trading day (US dollars)
    - High column: the maximum stock price during the trading day
    - Low column: the minimum stock price during the trading day
    - Close column: the stock price at the close of the trading day
    - Volume column: the total number of shares traded during the trading day

In [2]:
aapl.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2017-01-13,119.11,119.62,118.81,119.04,26111948
1,2017-01-12,118.9,119.3,118.21,119.25,27086220
2,2017-01-11,118.74,119.93,118.6,119.75,27588593
3,2017-01-10,118.77,119.38,118.3,119.11,24462051
4,2017-01-09,117.95,119.43,117.94,118.99,33561948


In [48]:
aapl.shape

(251, 5)

In [3]:
aapl.dtypes

Date      datetime64[ns]
Open             float64
High             float64
Low              float64
Close            float64
Volume             int64
dtype: object

In [4]:
aapl.describe()

Unnamed: 0,Open,High,Low,Close,Volume
count,251.0,251.0,251.0,251.0,251.0
mean,105.1551,106.060518,104.39255,105.292191,36744950.0
std,7.905047,7.876708,7.995679,7.963102,16090590.0
min,90.0,91.67,89.47,90.34,11475920.0
25%,97.355,98.22,96.69,97.34,26651440.0
50%,106.27,107.27,105.5,106.1,32292340.0
75%,111.45,112.37,110.7,111.75,41373940.0
max,119.11,119.93,118.81,119.75,132224500.0


In [5]:
aapl['Date'].min()

Timestamp('2016-01-19 00:00:00')

In [21]:
# Time duration between first and last dates
aapl.Date.max() - aapl.Date.min()

Timedelta('360 days 00:00:00')

<h2><a id="B">The datetime Library</a></h2>

Because time is central to time series data, we need to represent dates and times in the many ways that humans interpret them (years, months, weekdays, hours, and so on).

Python's `datetime` module is great for dealing with time-related data, and Pandas builds on it with its own `datetime` series and objects.

In this lesson, we'll review these data types and learn a little more about each of them:

* `datetime` objects.
* `datetime` series.
* Timestamps.
* `timedelta()`.

### `datetime` Objects

Below, we'll import the `datetime` class from Python's `datetime` module, which we can use to create a `datetime` object by passing the different components of the date as arguments.

In [6]:
# datetime is part of Python's standard library, so no extra installation is needed.
from datetime import datetime

In [7]:
# Let's just set a random datetime — not the end of the world or anything.
lesson_date = datetime(2012, 12, 21, 12, 21, 12, 844089)

The components of the date are accessible via the object's attributes.

In [8]:
print("Micro-Second", lesson_date.microsecond)
print("Second", lesson_date.second)
print("Minute", lesson_date.minute)
print("Hour", lesson_date.hour)
print("Day", lesson_date.day)
print("Month",lesson_date.month)
print("Year", lesson_date.year)

Micro-Second 844089
Second 12
Minute 21
Hour 12
Day 21
Month 12
Year 2012
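Dates often arrive as text rather than as separate components. As a small sketch, the standard library can parse such a string with `datetime.strptime()` and format it back with `.strftime()`. The string `raw` and the format codes below are illustrative choices, not part of the lesson's data.

```python
from datetime import datetime

# A date written day-first, as text (hypothetical input).
raw = "21/12/2012 12:21:12"

# The format string tells strptime how to read each component.
parsed = datetime.strptime(raw, "%d/%m/%Y %H:%M:%S")

print(parsed.year, parsed.month, parsed.day)
print(parsed.strftime("%Y-%m-%d"))  # back to text, ISO style
print(parsed.weekday())             # Monday is 0, Sunday is 6
```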


### `timedelta()`

Suppose we want to add time to or subtract time from a date. Maybe we're using time as an index and want everything that happened in the week before a specific observation. A `timedelta` represents such a duration (the difference between two times). Its constructor accepts weeks, days, hours, minutes, seconds, milliseconds, and microseconds, but not months or years; internally it stores only days, seconds, and microseconds.

We can use a `timedelta` object to shift a `datetime` object. In the example below, we build an offset and then apply it to the current moment, which `datetime.now()` returns as a `datetime` object.

In [9]:
# Timedeltas represent time as an amount rather than as a fixed position.
offset = timedelta(days=1, seconds=20)

# The timedelta() has attributes that allow us to extract values from it.
print('offset days', offset.days)
print('offset seconds', offset.seconds)
print('offset microseconds', offset.microseconds)

offset days 1
offset seconds 20
offset microseconds 0


In [10]:
now = datetime.now()
print("Like Right Now: ", now)

Like Right Now:  2022-07-17 07:47:38.059979


The current time is particularly useful when using `timedelta()`.

In [11]:
print("Future: ", now + offset)
print("Past: ", now - offset)

Future:  2022-07-18 07:47:58.059979
Past:  2022-07-16 07:47:18.059979


*Note: The largest unit a `timedelta()` accepts is weeks, and the largest unit it stores is days. For instance, you can't say you want your offset to be two years, 44 days, and 12 hours; you have to convert those years to days.*
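The conversion mentioned in the note can be sketched as follows. This assumes a year is 365 days, which ignores leap years; the constant `DAYS_PER_YEAR` and the start date are illustrative.

```python
from datetime import datetime, timedelta

# Assumption: one year is treated as 365 days (leap years are ignored).
DAYS_PER_YEAR = 365

# Two years, 44 days, and 12 hours, expressed in units timedelta accepts.
offset = timedelta(days=2 * DAYS_PER_YEAR + 44, hours=12)

print(offset.days)     # whole days stored in the timedelta
print(offset.seconds)  # the remaining 12 hours, stored as seconds

start = datetime(2012, 12, 21)
print(start + offset)
```

If calendar-aware offsets (real months and years) are needed, a different tool is required; `timedelta` only measures fixed durations.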

You can read more about the `timedelta` class [here](https://docs.python.org/3/library/datetime.html).

### Guided Practice: Apple Stock Data

We can practice using `datetime` functions and objects using Apple stock data.

In [12]:
aapl.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2017-01-13,119.11,119.62,118.81,119.04,26111948
1,2017-01-12,118.9,119.3,118.21,119.25,27086220
2,2017-01-11,118.74,119.93,118.6,119.75,27588593
3,2017-01-10,118.77,119.38,118.3,119.11,24462051
4,2017-01-09,117.95,119.43,117.94,118.99,33561948


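If the `Date` column starts off as an object, or there are dollar/pound signs in the money columns, these need to be converted or stripped. Pass `regex=False` so that `$` is treated as a literal character rather than as a regular-expression anchor, then convert the result to a numeric type:

aapl['Open'] = aapl['Open'].str.replace('$', '', regex=False).astype(float)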
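The Apple data in this lesson loads cleanly, so here is a sketch of that cleanup on a small made-up frame. The DataFrame `raw` and its string values are hypothetical and only mimic a messy export.

```python
import pandas as pd

# Hypothetical messy input: dates as text and prices with a currency symbol.
raw = pd.DataFrame({
    "Date": ["2017-01-13", "2017-01-12"],
    "Open": ["$119.11", "$118.90"],
})

clean = raw.copy()

# regex=False makes "$" a literal character, not the end-of-string anchor.
clean["Open"] = clean["Open"].str.replace("$", "", regex=False).astype(float)

# Convert the text dates to datetime64 values.
clean["Date"] = pd.to_datetime(clean["Date"])

print(clean.dtypes)
```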

In [13]:
aapl.dtypes

Date      datetime64[ns]
Open             float64
High             float64
Low              float64
Close            float64
Volume             int64
dtype: object

<h2><a id="C">Preprocessing Time Series Data with Pandas</a></h2>

### Convert time data to a `datetime` object.

Overwrite the original `Date` column with one that's been converted to a `datetime` series.

In [41]:
aapl['Date'] = pd.to_datetime(aapl.Date)

We can see these changes reflected in the `Date` column structure.

In [42]:
aapl.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2017-01-13,119.11,119.62,118.81,119.04,26111948
1,2017-01-12,118.9,119.3,118.21,119.25,27086220
2,2017-01-11,118.74,119.93,118.6,119.75,27588593
3,2017-01-10,118.77,119.38,118.3,119.11,24462051
4,2017-01-09,117.95,119.43,117.94,118.99,33561948


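We can also confirm that the `Date` column has the `datetime64[ns]` dtype. (It already did here, because we passed `parse_dates=['Date']` when reading the file.)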

In [43]:
aapl.dtypes

Date      datetime64[ns]
Open             float64
High             float64
Low              float64
Close            float64
Volume             int64
dtype: object

### The `.dt` Attribute (like the `.str` accessor)

Pandas' `datetime` columns have a `.dt` accessor that exposes attributes and methods that are specific to dates. For example:

    aapl.Date.dt.day
    aapl.Date.dt.month
    aapl.Date.dt.year
    aapl.Date.dt.day_name()  # replaces the older .weekday_name attribute

And, there are many more!

In [15]:
aapl.Date.dt.day.head()

0    13
1    12
2    11
3    10
4     9
Name: Date, dtype: int64

In [16]:
aapl.Date.dt.dayofyear.head()

0    13
1    12
2    11
3    10
4     9
Name: Date, dtype: int64

In [17]:
aapl.Date.dt.month.head()

0    1
1    1
2    1
3    1
4    1
Name: Date, dtype: int64

In [18]:
aapl.Date.dt.year.head()

0    2017
1    2017
2    2017
3    2017
4    2017
Name: Date, dtype: int64

Check out the Pandas `.dt` [documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.html) for more information.
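As a short sketch of the `.dt` accessor beyond day, month, and year, the snippet below uses three dates taken from the Apple data. The variable name `dates` is illustrative.

```python
import pandas as pd

# Three trading dates from the Apple data, as a datetime series.
dates = pd.Series(pd.to_datetime(["2017-01-13", "2017-01-12", "2017-01-09"]))

print(dates.dt.day_name())  # weekday names as strings
print(dates.dt.dayofweek)   # Monday is 0, Sunday is 6
print(dates.dt.quarter)     # calendar quarter, 1 through 4
```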

### Timestamps

Timestamps are useful objects for comparisons. You can create a timestamp object using the `pd.to_datetime()` function and a string specifying the date. These objects are especially helpful when you need to perform logical filtering with dates.

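**Watch out:** for ambiguous strings such as `'1/2/2017'`, `pd.to_datetime()` assumes the American order (month, day, year) by default. If your dates are day-first, pass `dayfirst=True` or give an explicit format such as `format='%d/%m/%Y'`.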

In [19]:
ts = pd.to_datetime('1/1/2017')
ts

Timestamp('2017-01-01 00:00:00')
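The date `'1/1/2017'` reads the same in either order, so it hides the ambiguity. A sketch with a date where the order matters is shown below; the string `ambiguous` is an illustrative value.

```python
import pandas as pd

ambiguous = "3/4/2017"

# Default parsing reads this month-first: March 4, 2017.
month_first = pd.to_datetime(ambiguous)

# dayfirst=True reads it day-first: April 3, 2017.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# An explicit format removes any guessing.
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y")

print(month_first, day_first, explicit)
```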

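A `Timestamp` is Pandas' counterpart to Python's `datetime` object (it is a subclass of `datetime`) and is the type of each element in a `datetime` series, which makes it convenient for comparisons and filters against date columns.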

Let's use the timestamp `ts` as a comparison with our Apple stock data.

In [20]:
aapl.loc[aapl.Date >= ts, :].head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2017-01-13,119.11,119.62,118.81,119.04,26111948
1,2017-01-12,118.9,119.3,118.21,119.25,27086220
2,2017-01-11,118.74,119.93,118.6,119.75,27588593
3,2017-01-10,118.77,119.38,118.3,119.11,24462051
4,2017-01-09,117.95,119.43,117.94,118.99,33561948


We can also compute the time span between the first and last dates in a time series.

In [18]:
aapl.Date.max() - aapl.Date.min()

Timedelta('360 days 00:00:00')

> **Check for Understanding:** Why do we convert the DataFrame column containing the time information into a `datetime` object?

### Set `datetime` to Index the DataFrame

After converting the column containing time data from object to `datetime`, it is also useful to make that column the index of the DataFrame. With a sorted `datetime` index, selecting rows by date or date range can be much faster, and slicing with date strings becomes available.

In [22]:
aapl.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2017-01-13,119.11,119.62,118.81,119.04,26111948
1,2017-01-12,118.9,119.3,118.21,119.25,27086220
2,2017-01-11,118.74,119.93,118.6,119.75,27588593
3,2017-01-10,118.77,119.38,118.3,119.11,24462051
4,2017-01-09,117.95,119.43,117.94,118.99,33561948


Let's set the `Date` column as the index.

In [23]:
aapl.set_index('Date', inplace=True)

In [24]:
aapl.head()

Date,Open,High,Low,Close,Volume
2017-01-13,119.11,119.62,118.81,119.04,26111948
2017-01-12,118.9,119.3,118.21,119.25,27086220
2017-01-11,118.74,119.93,118.6,119.75,27588593
2017-01-10,118.77,119.38,118.3,119.11,24462051
2017-01-09,117.95,119.43,117.94,118.99,33561948


### Filtering by Date with Pandas

It is easy to filter by date using Pandas. Let's create a subset of data containing only the stock prices from 2017. With a `datetime` index, we can pass a partial date string such as `'2017'` or `'2016-01'` to `.loc` to select every row in that period.

In [41]:
aapl.loc['2017']

Date,Open,High,Low,Close,Volume
2017-01-13,119.11,119.62,118.81,119.04,26111948
2017-01-12,118.9,119.3,118.21,119.25,27086220
2017-01-11,118.74,119.93,118.6,119.75,27588593
2017-01-10,118.77,119.38,118.3,119.11,24462051
2017-01-09,117.95,119.43,117.94,118.99,33561948
2017-01-06,116.78,118.16,116.47,117.91,31751900
2017-01-05,115.92,116.86,115.81,116.61,22193587
2017-01-04,115.85,116.51,115.75,116.02,21118116
2017-01-03,115.8,116.33,114.76,116.15,28781865


In [42]:
aapl.loc['2016-01']

Date,Open,High,Low,Close,Volume
2016-01-29,94.79,97.34,94.35,97.34,64010141
2016-01-28,93.79,94.52,92.39,94.09,55557109
2016-01-27,96.04,96.63,93.34,93.42,132224500
2016-01-26,99.93,100.88,98.07,99.99,63538305
2016-01-25,101.52,101.53,99.21,99.44,51196375
2016-01-22,98.63,101.46,98.37,101.42,65562769
2016-01-21,97.06,97.88,94.94,96.3,52054521
2016-01-20,95.1,98.19,93.42,96.79,72008265
2016-01-19,98.41,98.65,95.5,96.66,52841349


In [55]:
aapl.loc['2016-02-01':'2016-03-01']

Date,Open,High,Low,Close,Volume
2016-03-01,97.65,100.77,97.42,100.53,50153943
2016-02-29,96.86,98.23,96.65,96.69,34876558
2016-02-26,97.2,98.02,96.58,96.91,28913208
2016-02-25,96.05,96.76,95.25,96.76,27393905
2016-02-24,93.98,96.38,93.32,96.1,36155642
2016-02-23,96.4,96.5,94.55,94.69,31686699
2016-02-22,96.31,96.9,95.92,96.88,34048195
2016-02-19,96.0,96.76,95.8,96.04,34485576
2016-02-18,98.84,98.89,96.09,96.26,38494442
2016-02-17,96.67,98.21,96.15,98.12,44390173


In [57]:
aapl[aapl.index.month == 7]

Date,Open,High,Low,Close,Volume
2016-07-29,104.19,104.55,103.68,104.21,27733688
2016-07-28,102.83,104.45,102.82,104.34,39869839
2016-07-27,104.26,104.35,102.75,102.95,92344820
2016-07-26,96.82,97.97,96.42,96.67,56239822
2016-07-25,98.25,98.84,96.92,97.34,40382921
2016-07-22,99.26,99.3,98.31,98.66,28313669
2016-07-21,99.83,101.0,99.13,99.43,32702028
2016-07-20,100.0,100.46,99.74,99.96,26275968
2016-07-19,99.56,100.0,99.34,99.87,23779924
2016-07-18,98.7,100.13,98.6,99.83,36493867


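There are a few things to note about indexing with time series. Unlike positional (integer) slicing, a label-based slice on dates includes the end point. If you want to slice a range of dates, sort the index first (for example, with `sort_index()`); range slicing on an unsorted time index may fail or return unexpected rows.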
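Both points can be sketched on a few rows. The closing prices below are copied from the Apple output above; the names `prices` and `subset` are illustrative.

```python
import pandas as pd

# Four closing prices from the Apple data, in the file's newest-first order.
dates = pd.to_datetime(["2016-03-01", "2016-02-29", "2016-02-26", "2016-02-25"])
prices = pd.DataFrame({"Close": [100.53, 96.69, 96.91, 96.76]}, index=dates)

# Sort the index so that range slicing behaves predictably.
prices = prices.sort_index()
print(prices.index.is_monotonic_increasing)

# Label-based slicing includes both end points.
subset = prices.loc["2016-02-26":"2016-03-01"]
print(subset)
```

The slice keeps the row for 2016-03-01, whereas a positional slice such as `prices.iloc[1:3]` would stop before its end position.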

> **Recap:** The steps for preprocessing time series data are to:
* Convert time data to a `datetime` object.
* Set `datetime` to index the DataFrame.

# Recap

* We use time series analysis to identify changes in values over time.
* The `datetime` library makes working with time data more convenient.
* To preprocess time series data with Pandas, you:
    1. Convert the time column to a `datetime` object.
    2. Set the time column as the index of the DataFrame.
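The two steps can be wrapped in a small helper, sketched below. The function name `preprocess_time_series` is hypothetical, and the final `sort_index()` call is an extra step added here because sorted indices make range slicing reliable.

```python
import pandas as pd

def preprocess_time_series(df, time_col):
    """Return a copy of df indexed by its parsed time column (hypothetical helper)."""
    out = df.copy()
    # Step 1: convert the time column to datetime.
    out[time_col] = pd.to_datetime(out[time_col])
    # Step 2: set it as the index; sorting is an added convenience.
    return out.set_index(time_col).sort_index()

# Two rows from the Apple data, with dates as plain strings.
raw = pd.DataFrame({"Date": ["2017-01-13", "2017-01-12"], "Close": [119.04, 119.25]})
tidy = preprocess_time_series(raw, "Date")
print(tidy)
```

Returning a new DataFrame rather than modifying the input keeps the original data available for comparison.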

<h2><a id="D">Independent Practice</a></h2>

**Instructor Note**: These are optional and can be assigned as student practice questions outside of class.

### 1) Create a `datetime` object representing today's date.

In [58]:
today_date = datetime(2022, 7, 17)

In [59]:
print(today_date.year)
print(today_date.month)
print(today_date.day)

2022
7
17


### 2) Load the UFO data set from the internet.

In [60]:
ufo = pd.read_csv('http://bit.ly/uforeports')

In [61]:
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


In [62]:
# Note that the Time column is stored as strings (dtype object)
ufo.dtypes

City               object
Colors Reported    object
Shape Reported     object
State              object
Time               object
dtype: object

In [63]:
ufo.shape

(18241, 5)

In [64]:
ufo.isnull().sum()

City                  25
Colors Reported    15359
Shape Reported      2644
State                  0
Time                   0
dtype: int64

### 3) Convert the `Time` column to a `datetime` object.

In [65]:
ufo['Time'] = pd.to_datetime(ufo.Time)

In [66]:
ufo.dtypes

City                       object
Colors Reported            object
Shape Reported             object
State                      object
Time               datetime64[ns]
dtype: object

### 4) Set the `Time` column as the index of the DataFrame.

In [67]:
# set_index returns a new DataFrame unless inplace=True; alternatively, reassign: ufo = ufo.set_index('Time')
ufo.set_index('Time', inplace=True)

### 5) Create a `timestamp` object for the date January 1, 1999.

In [68]:
timestamp = pd.Timestamp(1999, 1, 1)

### 6) Use the `timestamp` object to perform logical filtering on the DataFrame and create a subset of entries dated on or after January 1, 1999.

In [71]:
ufo.loc[ufo.index >= timestamp]

Time,City,Colors Reported,Shape Reported,State
1999-01-01 02:30:00,Loma Rica,,LIGHT,CA
1999-01-01 03:00:00,Bauxite,,,AR
1999-01-01 14:00:00,Florence,,CYLINDER,SC
1999-01-01 15:00:00,Lake Henshaw,,CIGAR,CA
1999-01-01 17:15:00,Wilmington Island,,LIGHT,GA
...,...,...,...,...
2000-12-31 23:00:00,Grant Park,,TRIANGLE,IL
2000-12-31 23:00:00,Spirit Lake,,DISK,IA
2000-12-31 23:45:00,Eagle River,,,WI
2000-12-31 23:45:00,Eagle River,RED,LIGHT,WI


In [72]:
#ufo[ufo.index < timestamp]

In [76]:
# Slicing on a time series object
ufo[timestamp :]

Time,City,Colors Reported,Shape Reported,State
1999-01-01 02:30:00,Loma Rica,,LIGHT,CA
1999-01-01 03:00:00,Bauxite,,,AR
1999-01-01 14:00:00,Florence,,CYLINDER,SC
1999-01-01 15:00:00,Lake Henshaw,,CIGAR,CA
1999-01-01 17:15:00,Wilmington Island,,LIGHT,GA
...,...,...,...,...
2000-12-31 23:00:00,Grant Park,,TRIANGLE,IL
2000-12-31 23:00:00,Spirit Lake,,DISK,IA
2000-12-31 23:45:00,Eagle River,,,WI
2000-12-31 23:45:00,Eagle River,RED,LIGHT,WI
