### Learning Objectives
 
**After this lesson, you will be able to:**
- Identify time series data.
- Explain the challenges of working with time series data.
- Use the `datetime` library to represent dates as objects.
- Preprocess time series data with Pandas.

---

### Lesson Guide

#### Time Series Data
- [What is a Time Series](#A)
- [The Datetime Library](#B)
- [Preprocessing Time Series Data with Pandas](#C)
- [Independent Practice](#D)
----

<h2><a id="A">What is a Time Series?</a></h2>

A **time series** is a series of data points that's indexed (or listed, or graphed) in time order. Most commonly, a time series is a sequence that's taken at successive equally spaced points in time. Time series are often represented as a set of observations that have a time-bound relation, which is represented as an index.

Time series are commonly found in sales analysis, stock market trends, economic phenomena, and social science problems.

These data sets are often investigated to evaluate the long-term trends, forecast the future, or perform some other form of analysis.

> **Check for Understanding:** List some examples of real-world time series data.

<h2><a id="B">The Datetime Library</a></h2>

Because time is central to time series data, we need to represent dates and times the way people actually use them: as years, months, days, hours, and so on.

Python's `datetime` library is great for dealing with time-related data, and Pandas builds on it with its own `datetime` series and objects.

In this lesson, we'll review these data types and learn a little more about each of them:

* `datetime` objects.
* `datetime` series.
* Timestamps.
* `timedelta()`.

### `datetime` Objects

Below, we'll load the `datetime` library, which we can use to create a `datetime` object by passing the components of the date as arguments.

In [1]:
# The datetime library is part of Python's standard library, so no extra installation is needed.
from datetime import datetime

In [2]:
# Let's set an arbitrary datetime.
lesson_date = datetime(2020, 8, 21, 12, 21, 12, 844089)

The components of the date are accessible via the object's attributes.

In [3]:
print("Microsecond", lesson_date.microsecond)
print("Second", lesson_date.second)
print("Minute", lesson_date.minute)
print("Hour", lesson_date.hour)
print("Day", lesson_date.day)
print("Month", lesson_date.month)
print("Year", lesson_date.year)

### `timedelta()`

Suppose we want to add time to or subtract time from a date. Maybe we're using time as an index and want to get everything that happened a week before a specific observation.

We can use a `timedelta` object to shift a `datetime` object. Here's an example:

In [4]:
# Import timedelta from the datetime library.
from datetime import timedelta

# Timedeltas represent time as an amount rather than as a fixed position.
offset = timedelta(days=1, seconds=20)

# A timedelta has attributes that allow us to extract values from it.
print('offset days', offset.days)
print('offset seconds', offset.seconds)
print('offset microseconds', offset.microseconds)

The `datetime.now()` method returns a `datetime` object for the current moment.

In [5]:
now = datetime.now()
print("Like Right Now: ", now)

The current time is particularly useful when using `timedelta()`.

In [6]:
print("Future: ", now + offset)
print("Past: ", now - offset)

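*Note: The largest unit a `timedelta()` accepts is weeks, and internally it stores only days, seconds, and microseconds. There is no months or years argument. For instance, you can't say you want your offset to be two years, 44 days, and 12 hours; you have to convert those years to days.*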

You can read more about the `timedelta` class [here](https://docs.python.org/3/library/datetime.html).
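As a minimal sketch of the note above, the following shows how a `timedelta` normalizes its arguments into days, seconds, and microseconds. The 365-day year is an assumption made purely for illustration; it ignores leap years.

```python
from datetime import datetime, timedelta

# timedelta accepts weeks, days, hours, minutes, seconds, milliseconds,
# and microseconds, but it has no months or years argument.
offset = timedelta(weeks=2, hours=12)

# Internally, only days, seconds, and microseconds are stored.
print("days:", offset.days)
print("seconds:", offset.seconds)
print("total seconds:", offset.total_seconds())

# Assumption for illustration: treat one year as 365 days.
long_offset = timedelta(days=2 * 365 + 44, hours=12)

start = datetime(2020, 8, 21)
print("shifted:", start + long_offset)
```

If calendar-aware offsets (true months or years) are needed, a `timedelta` alone is not enough, because those units do not have a fixed length.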

## Guided Practice: Airline Passenger Data

### Let's take a look at some airline passenger data to get a feel for what time series data look like.

In [7]:
import pandas as pd
from datetime import timedelta
%matplotlib inline

airlines_df = pd.read_csv("airline_passengers.csv")

In [8]:
airlines_df.head()

Unnamed: 0,Month,Thousands of Passengers
0,1949-01,112
1,1949-02,118
2,1949-03,132
3,1949-04,129
4,1949-05,121


In [9]:
airlines_df.describe()

Unnamed: 0,Thousands of Passengers
count,144.0
mean,280.298611
std,119.966317
min,104.0
25%,180.0
50%,265.5
75%,360.5
max,622.0


The `Month` column starts off with the `object` dtype, meaning Pandas is treating the dates as plain strings.

In [10]:
airlines_df.dtypes

Month                      object
Thousands of Passengers     int64
dtype: object

<h2><a id="C">Preprocessing Time Series Data with Pandas</a></h2>

### Convert time data to a `datetime` object.

Overwrite the original `Month` column with one that's been converted to a `datetime` series.

In [11]:
airlines_df['Month'] = pd.to_datetime(airlines_df['Month'])

We can see these changes reflected in the `Month` column structure.

In [12]:
airlines_df.head()

Unnamed: 0,Month,Thousands of Passengers
0,1949-01-01,112
1,1949-02-01,118
2,1949-03-01,132
3,1949-04-01,129
4,1949-05-01,121


We can also see that the dtype of the `Month` column has changed.

In [13]:
airlines_df.dtypes

Month                      datetime64[ns]
Thousands of Passengers             int64
dtype: object
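The conversion step can also be tried without the CSV file. The sketch below uses a small, hypothetical in-memory frame (`sample_df`) that mimics the airline data, and passes an explicit `format` so the year-month strings are parsed unambiguously.

```python
import pandas as pd

# Hypothetical miniature version of the airline data.
sample_df = pd.DataFrame({
    "Month": ["1949-01", "1949-02", "1949-03"],
    "Thousands of Passengers": [112, 118, 132],
})

# An explicit format avoids ambiguity when parsing year-month strings.
sample_df["Month"] = pd.to_datetime(sample_df["Month"], format="%Y-%m")

# The column now holds datetimes; each month maps to its first day.
print(sample_df.dtypes)
print(sample_df["Month"].iloc[0])
```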

### The `.dt` Attribute

Pandas' `datetime` columns have a `.dt` attribute that allows you to access attributes that are specific to dates. For example:

    airlines_df.Month.dt.day
    airlines_df.Month.dt.month
    airlines_df.Month.dt.year

And, there are many more!

In [14]:
airlines_df.Month.dt.year

0      1949
1      1949
2      1949
3      1949
4      1949
       ... 
139    1960
140    1960
141    1960
142    1960
143    1960
Name: Month, Length: 144, dtype: int64

In [15]:
airlines_df.Month.dt.dayofyear.head()

0      1
1     32
2     60
3     91
4    121
Name: Month, dtype: int64

Check out the Pandas `.dt` [documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.html) for more information.
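As a small self-contained sketch, the `.dt` accessor can be explored on a hand-built series. The `dates` series here is a hypothetical stand-in for `airlines_df.Month`.

```python
import pandas as pd

# Hypothetical stand-in for the converted Month column.
dates = pd.Series(pd.to_datetime(["1949-01-01", "1949-02-01", "1949-03-01"]))

# Attributes return one value per row.
years = dates.dt.year
day_of_year = dates.dt.dayofyear

# Some accessors are methods rather than attributes.
month_names = dates.dt.month_name()

print(years.tolist())
print(day_of_year.tolist())
print(month_names.tolist())
```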

### Timestamps

`Timestamp` is the Pandas equivalent of Python's `datetime`. Timestamps are useful objects for comparisons.

You can create a timestamp object using the `pd.to_datetime()` function and a string specifying the date. These objects are especially helpful when you need to perform logical filtering with dates.

In [16]:
ts = pd.to_datetime('3/9/1949')
ts

Timestamp('1949-03-09 00:00:00')

In [17]:
airlines_df[airlines_df.Month > '3/9/1949']

Unnamed: 0,Month,Thousands of Passengers
3,1949-04-01,129
4,1949-05-01,121
5,1949-06-01,135
6,1949-07-01,148
7,1949-08-01,148
...,...,...
139,1960-08-01,606
140,1960-09-01,508
141,1960-10-01,461
142,1960-11-01,390


Let's use the timestamp `ts` as a comparison with our airline passenger data.

In [18]:
airlines_df.loc[airlines_df.Month >= ts, :].head()

Unnamed: 0,Month,Thousands of Passengers
3,1949-04-01,129
4,1949-05-01,121
5,1949-06-01,135
6,1949-07-01,148
7,1949-08-01,148


We can even compute the span between the first and last dates in a time series.

In [19]:
airlines_df.Month.max() - airlines_df.Month.min()

Timedelta('4352 days 00:00:00')
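The comparison and subtraction steps above can be sketched on a short, hypothetical series of month-start dates (`months`), built here with `pd.date_range` so the example runs without the CSV file.

```python
import pandas as pd

# Hypothetical series of five month-start dates, January to May 1949.
months = pd.Series(pd.date_range("1949-01-01", periods=5, freq="MS"))

# A Timestamp can be compared against every element of the series at once.
cutoff = pd.Timestamp("1949-03-09")
after_cutoff = months[months >= cutoff]

# Subtracting two Timestamps yields a Timedelta.
span = months.max() - months.min()

print(after_cutoff.tolist())
print(span)
```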

> **Check for Understanding:** Why do we convert the DataFrame column containing the time information into a `datetime` object?

### Set `datetime` to Index the DataFrame

After converting the column containing time data from object to `datetime`, it is also useful to make the index of the DataFrame a `datetime`.

In [20]:
airlines_df.head()

Unnamed: 0,Month,Thousands of Passengers
0,1949-01-01,112
1,1949-02-01,118
2,1949-03-01,132
3,1949-04-01,129
4,1949-05-01,121


Let's set the `Month` column as the index.

In [21]:
airlines_df.set_index('Month', inplace=True)

In [22]:
airlines_df.head()

Month,Thousands of Passengers
1949-01-01,112
1949-02-01,118
1949-03-01,132
1949-04-01,129
1949-05-01,121


When working with a time series index, we may need to specify its frequency. Pandas does not always set this automatically when an existing column becomes the index, so it can be worth setting explicitly.

The syntax for specifying the index frequency is:
> <span style="color:blue">\# *Set the index frequency to MS*</span><br>
Series.index.freq = 'MS'

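In this example, *Series* is the name of the time series variable and *MS* is the frequency code for the start of the month (matching the data). The following table lists some of the frequency codes with a description of each. The full list is also available [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). Note that Pandas 2.2 and later rename several of these codes (for example, `YE` for year end, `YS` for year start, `QE` for quarter end, `ME` for month end, `h` for hour, `min` for minutes, and `s` for seconds) and deprecate the older spellings.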

| Frequency Code | Description | Example Data |
| --- | --- | --- |
| A | Year End | `2020-12-31` `2019-12-31` |
| AS | Year Start | `2020-01-01` `2019-01-01` |
| Q | Quarter End | `2020-03-31` `2020-06-30` |
| M | Month End | `2020-06-30` `2020-07-31` |
| **MS** | **Month Start** | `2020-05-01` `2020-06-01` |
| W | Week (ending Sunday by default) | `2020-04-12` `2020-04-19` |
| D | Day | `2020-08-14` `2020-08-15` |
| B | Business Day | `2020-06-26` `2020-06-29` |
| H | Hour | `2020-12-31 12:00:00` |
| T | Minutes | `2020-12-31 12:01:00` |
| S | Seconds | `2020-12-31 12:01:59` |

In [23]:
# Set the index frequency to MS

airlines_df.index.freq = 'MS'

In [24]:
airlines_df.index.rename('', inplace=True)
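As a rough sketch of the frequency step, the example below builds a small, hypothetical frame (`sample`) with a month-start index, checks what frequency Pandas would infer from the dates, and then sets it explicitly. The column name `passengers` is made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a month-start DatetimeIndex.
index = pd.to_datetime(["1949-01-01", "1949-02-01", "1949-03-01", "1949-04-01"])
sample = pd.DataFrame({"passengers": [112, 118, 132, 129]}, index=index)

# An index built from a list of dates starts with no frequency attached.
freq_before = sample.index.freq

# pd.infer_freq guesses the frequency code from the spacing of the dates.
inferred = pd.infer_freq(sample.index)

# Setting the frequency explicitly; Pandas checks that the dates conform to it.
sample.index.freq = "MS"

print(freq_before)
print(inferred)
print(sample.index.freqstr)
```

Comparing the inferred code with the one you intend to set is a cheap sanity check before assigning it.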

### Filtering by Date with Pandas

It is easy to filter by date using Pandas. Let's create a subset of data containing only the passenger counts from July 1958. We can pass the date to `.loc` as a string.

In [25]:
airlines_df.loc['July 1958']

Unnamed: 0,Thousands of Passengers
,
1958-07-01,491.0


There are a few things to note about indexing with time series. Unlike numeric indexing, the end index will be included. If you want to index with a range, the time indices must be sorted first.  
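The points about inclusive end points and sorting can be sketched as follows. The frame `sample` and its `passengers` column are hypothetical, and the data is generated rather than loaded from the airline file.

```python
import pandas as pd

# Hypothetical monthly frame covering all of 1958.
index = pd.date_range("1958-01-01", periods=12, freq="MS")
sample = pd.DataFrame({"passengers": list(range(12))}, index=index)

# A partial date string selects every row that falls inside that period.
one_month = sample.loc["1958-07"]

# Date slices include the end point, unlike positional slices.
first_quarter = sample.loc["1958-01":"1958-03"]

# If the rows are out of order, sort the index before slicing by range.
shuffled = sample.sample(frac=1, random_state=0)
sorted_again = shuffled.sort_index()

print(one_month)
print(first_quarter)
print(sorted_again.index.is_monotonic_increasing)
```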

> **Recap:** The steps for preprocessing time series data are to:
* Convert time data to a `datetime` object.
* Set `datetime` to index the DataFrame.

# Recap

* We use time series analysis to identify changes in values over time.
* The `datetime` library makes working with time data more convenient.
* To preprocess time series data with Pandas, you:
    1. Convert the time column to a `datetime` object.
    2. Set the time column as the index of the DataFrame.

<h2><a id="D">Independent Practice</a></h2>

**Instructor Note**: These are optional and can be assigned as student practice questions outside of class.

### 1) Create a `datetime` object representing today's date.

In [26]:
# A:

### 2) Load the UFO data set from the internet.

In [27]:
import pandas as pd
from datetime import timedelta
%matplotlib inline

ufo = pd.read_csv('http://bit.ly/uforeports')

ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


### 3) Convert the `Time` column to a `datetime` object.

In [28]:
# A:

### 4) Set the `Time` column to the index of the dataframe.

In [29]:
# A:

### 5) Create a `Timestamp` object for the date January 1, 1999.

In [30]:
# A:

### 6) Use the `Timestamp` object to perform logical filtering on the DataFrame and create a subset of entries dated on or after January 1, 1999.

In [31]:
# A: