<img src=images/logo.png align='right' width=200>

# Time Based Features

## Goal

Our next step is to get introduced to a real time series dataset and learn some fundamental analysis techniques.

In this notebook we shall focus on creating time-based features.

## Program

- [Reading in Time Series Data](#read)
- [Time Based Features](#tb)
- [Assignment](#as2)
- [Summary](#sum)


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

<a id='read'></a>

## Reading in Time Series Data
Again, we will use the *household power consumption* dataset. It comes from the [UCI ML repo](https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption) and contains detailed power consumption time series data of a single household in Sceaux, near Paris, between December 2006 and November 2010.
![](images/power.jpeg)

Let's load the data in again, taking care to parse the time information and set it as the index.

In [2]:
power = pd.read_csv('data/household_power_consumption.csv', 
                    parse_dates=['ts'], 
                    index_col='ts')
power.head()

| ts | consumption |
|---|---|
| 2006-12-16 17:24:00 | 52.266667 |
| 2006-12-16 17:25:00 | 72.333333 |
| 2006-12-16 17:26:00 | 70.566667 |
| 2006-12-16 17:27:00 | 71.8 |
| 2006-12-16 17:28:00 | 43.1 |
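
To see what `parse_dates` and `index_col` do, here is a minimal sketch on a small in-memory CSV. The rows are copied from the output above; using `io.StringIO` in place of the real file is an assumption made only to keep the example self-contained.

```python
import io

import pandas as pd

# A few rows in the same layout as the real file (assumed columns: ts, consumption)
csv_text = """ts,consumption
2006-12-16 17:24:00,52.266667
2006-12-16 17:25:00,72.333333
2006-12-16 17:26:00,70.566667
"""

sample = pd.read_csv(io.StringIO(csv_text),
                     parse_dates=['ts'],
                     index_col='ts')

# The index is now a DatetimeIndex rather than plain strings
print(type(sample.index).__name__)

# A DatetimeIndex allows selection by (partial) timestamp strings
from_1725 = sample.loc['2006-12-16 17:25':]
print(from_1725)
```

With the timestamps in the index, slicing by time and resampling work without any further conversion.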


<a id='tb'></a>
## Time Based Features

Let's aggregate the data into a daily data frame, since minute-level precision is more granular than we need here.

In [3]:
power_daily = power.resample('D').sum()
power_daily

| ts | consumption |
|---|---|
| 2006-12-16 | 14680.933333 |
| 2006-12-17 | 36946.666667 |
| 2006-12-18 | 19028.433333 |
| 2006-12-19 | 13131.900000 |
| 2006-12-20 | 20384.800000 |
| ... | ... |
| 2010-11-22 | 16924.600000 |
| 2010-11-23 | 16352.266667 |
| 2010-11-24 | 13769.466667 |
| 2010-11-25 | 17278.733333 |
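
As a rough illustration of what `resample('D')` does, the sketch below uses hypothetical data (two full days of one-per-minute readings, all equal to 1.0) so that the daily totals are easy to check by hand. The names `toy`, `daily_sum` and `daily_mean` are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two days of minute-level readings, each equal to 1.0
idx = pd.date_range('2021-01-01', periods=2 * 24 * 60, freq='min')
toy = pd.DataFrame({'consumption': np.ones(len(idx))}, index=idx)

# Group the rows into calendar days and aggregate each group
daily_sum = toy.resample('D').sum()    # total per day: 1440 minutes * 1.0
daily_mean = toy.resample('D').mean()  # average reading per day

print(daily_sum)
print(daily_mean)
```

Keep in mind that a sum depends on how many readings fall in each day. In the real data the first day, 2006-12-16, only starts at 17:24, so its total covers a partial day.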


We can add time-based features to our data.

In the example below we extract the day of the week and the quarter from the index, then filter on one of the new columns. Feel free to explore [other](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#api-series-dt) datetime properties and methods.

In [5]:
(
    power_daily
    .assign(weekday = power_daily.index.day_name())
    .assign(quarter = power_daily.index.quarter)
    .loc[lambda df: df['quarter'] == 3]  # keep only third-quarter days
).head()

| ts | consumption | weekday | quarter |
|---|---|---|---|
| 2007-07-01 | 10716.066667 | Sunday | 3 |
| 2007-07-02 | 6373.2 | Monday | 3 |
| 2007-07-03 | 9289.2 | Tuesday | 3 |
| 2007-07-04 | 6544.5 | Wednesday | 3 |
| 2007-07-05 | 9956.5 | Thursday | 3 |
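
A few of the other datetime properties can be sketched as follows. The data frame `toy` is hypothetical (one week of made-up values), and the column names are chosen only for illustration. Because the dates live in the index, the properties are read directly from the `DatetimeIndex`; on a regular datetime column you would go through the `.dt` accessor instead.

```python
import pandas as pd

# Hypothetical data: one week of daily values starting on Sunday 2007-07-01
idx = pd.date_range('2007-07-01', periods=7, freq='D')
toy = pd.DataFrame({'consumption': range(7)}, index=idx)

features = toy.assign(
    dayofweek=lambda df: df.index.dayofweek,        # Monday=0 ... Sunday=6
    is_weekend=lambda df: df.index.dayofweek >= 5,  # Saturday or Sunday
    dayofyear=lambda df: df.index.dayofyear,        # 1 ... 365 (or 366)
    month_name=lambda df: df.index.month_name(),
)

print(features)
```

Using a `lambda` inside `assign` means each feature is computed from the data frame that is being passed along the chain, not from a variable defined earlier.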


Time-based features let us group and compare consumption by calendar period.

In the example below we resample the data to monthly totals and add year, month and month name features. This lets us compare monthly consumption across the different years.

In [None]:
power_monthly = (
    power.resample('M').sum()  # 'M' = month-end frequency; pandas >= 2.2 prefers the alias 'ME'
    .assign(year = lambda df: df.index.year)
    .assign(month = lambda df: df.index.month)
    .assign(month_name = lambda df: df.index.month_name())
)
power_monthly.head()

In [None]:
# One line per year; casting year to str makes seaborn treat it as a category
sns.lineplot(data=power_monthly, x='month', y='consumption', hue=power_monthly['year'].astype(str))

In [None]:
import altair as alt

alt.Chart(power_monthly).mark_line().encode(
    x='month',
    y='consumption',
    color='year:N',
    tooltip = ['consumption', 'month_name']
).interactive()
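
The same year-over-year comparison can also be read from a table. The sketch below is only an illustration on hypothetical data (24 made-up monthly values); `toy` and `wide` are names invented for this example, and `pivot` is swapped in here as a tabular counterpart to the plots above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 24 monthly values covering 2007 and 2008
idx = pd.date_range('2007-01-01', periods=24, freq='MS')
toy = (
    pd.DataFrame({'consumption': np.arange(24.0)}, index=idx)
    .assign(year=lambda df: df.index.year)
    .assign(month=lambda df: df.index.month)
)

# One row per month, one column per year
wide = toy.pivot(index='month', columns='year', values='consumption')

print(wide)
```

Each row of `wide` then lines up the same calendar month across years, which is what the `hue` and `color` encodings do visually.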

<a id='as2'></a>
## Assignment: Weekday with most consumption

Which day of the week has the highest consumption on average? Does consumption drop on the weekend?

Bonus: make a bar plot to better illustrate weekly consumption patterns.

*You can load the answer below:*

In [None]:
# %load answers/power-weekday-most-consumption.py

<a id='sum'></a>
## Summary

We have covered:
- How to properly read in time series data in pandas, and why it is important to set the date as an index
- How to aggregate over time periods with pandas
- How to create time-based features to aid with analysis
