# Time Series

- https://www.dataquest.io/blog/tutorial-time-series-analysis-with-pandas/

## Content
- Time series data structures
- Time-based indexing
- Visualizing time series data
- Seasonality
- Frequencies
- Resampling
- Rolling windows
- Trends

In [4]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sys

%matplotlib inline

!python -V
print(sys.executable)

---
# Data Set

The dataset: Open Power System Data [Link](https://github.com/jenfly/opsd/raw/master/opsd_germany_daily.csv)

The data set is a daily time series from Open Power System Data (OPSD) for Germany, which has been rapidly expanding its renewable energy production in recent years.
It includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017. The CSV file can be downloaded from the link above.

Electricity production and consumption are reported as daily totals in gigawatt-hours (GWh). The columns of the data file are:

- *Date* - The date (yyyy-mm-dd format)
- *Consumption* - Electricity consumption in GWh
- *Wind* - Wind power production in GWh
- *Solar* - Solar power production in GWh
- *Wind+Solar* - Sum of wind and solar power production in GWh
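To make the column layout concrete, here is a minimal sketch that parses a few rows in the same format from an in-memory string instead of the real file. The numbers are made up for illustration and are not OPSD values. It checks that *Wind+Solar* is the row-wise sum of *Wind* and *Solar*, and that a plain `read_csv` leaves *Date* as text.

```python
import io

import pandas as pd

# Made-up rows in the same layout as opsd_germany_daily.csv (NOT real OPSD values)
csv_text = """Date,Consumption,Wind,Solar,Wind+Solar
2017-01-01,1100.0,200.0,10.0,210.0
2017-01-02,1400.0,350.0,12.0,362.0
2017-01-03,1450.0,500.0,8.0,508.0
"""

sample = pd.read_csv(io.StringIO(csv_text))

# All energy columns are daily totals in GWh; Wind+Solar should equal Wind + Solar
combined = sample['Wind'] + sample['Solar']
print((combined - sample['Wind+Solar']).abs().max())  # 0.0 for these rows

# Without parse_dates, the Date column is read as text, not as datetimes
print(pd.api.types.is_datetime64_any_dtype(sample['Date']))  # False
```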

---
# Creating a time series (data structure) DataFrame

**Todo:**
- check/adapt index
- check/adapt datetime
- add year/month/day columns for wrangling

In [None]:
# Open Power System Data
opsd_daily = pd.read_csv('opsd_germany_daily.csv')
opsd_daily.info()  # info is a method; without () it only shows the bound method

In [None]:
opsd_daily.shape

In [None]:
opsd_daily.head()

In [None]:
opsd_daily.dtypes

## Problem 1 - datetime
- The *Date* column is read as plain text (`object` dtype), not as a datetime.


In [None]:
opsd_daily['Date'] = pd.to_datetime(opsd_daily['Date'])
opsd_daily.dtypes
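If the column might contain malformed entries, a slightly more defensive variant passes the expected format explicitly and turns unparseable strings into `NaT` instead of raising an error. A minimal sketch on a toy Series; the malformed value is invented for illustration:

```python
import pandas as pd

# Toy date strings; the last one is deliberately malformed
raw = pd.Series(['2006-01-01', '2006-01-02', 'not a date'])

# format= matches the yyyy-mm-dd layout of the Date column;
# errors='coerce' maps anything that does not match to NaT
dates = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(dates.dtype)         # a datetime64 dtype
print(dates.isna().sum())  # 1
```

Counting the `NaT` values afterwards is a quick way to see how many rows failed to parse.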

## Problem 2 - index
- The DataFrame has a default integer index (0, 1, 2, ...) instead of using the dates as the index.

In [None]:
opsd_daily = opsd_daily.set_index('Date')
opsd_daily.head()

In [None]:
opsd_daily.index

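After `set_index('Date')`, it is worth confirming that the index really is a sorted, duplicate-free daily `DatetimeIndex`. A minimal sketch on a small synthetic frame (placeholder values, not OPSD data); `pd.infer_freq` returns `'D'` only when consecutive dates are exactly one day apart:

```python
import pandas as pd

# Synthetic frame with an already-converted Date column and placeholder values
frame = pd.DataFrame({
    'Date': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04']),
    'Consumption': [1100.0, 1400.0, 1450.0, 1420.0],
})
frame = frame.set_index('Date')

print(type(frame.index).__name__)           # DatetimeIndex
print(frame.index.is_monotonic_increasing)  # True
print(frame.index.is_unique)                # True
print(pd.infer_freq(frame.index))           # D
```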
## Solution while importing

In [None]:
df = pd.read_csv('opsd_germany_daily.csv', index_col=0, parse_dates=True)

In [None]:
df.head()


In [None]:
df.dtypes

In [None]:
df.index

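The same one-step import can be tried without the file by reading from an in-memory string. This is a sketch with made-up values. Naming the index column (`index_col='Date'`) instead of using position `0` makes the intent explicit, and `parse_dates=True` then parses that index:

```python
import io

import pandas as pd

# Made-up rows in the OPSD layout (NOT real values)
csv_text = """Date,Consumption,Wind,Solar,Wind+Solar
2012-01-01,1000.0,200.0,5.0,205.0
2012-01-02,1300.0,250.0,6.0,256.0
"""

# index_col='Date' moves the column into the index; parse_dates=True parses the index
sample = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

print(sample.index)
print(sample.dtypes)
```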
## Add year/month/day

In [None]:
df['Year'] = df.index.year
df['Month'] = df.index.month
df['Day'] = df.index.day
df.sample(5, random_state=0)
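Because the weekly pattern discussed below is about weekdays versus weekends, a weekday column is a natural addition. In current pandas the name comes from `DatetimeIndex.day_name()`. The `weekday_name` attribute seen in older code was deprecated and has since been removed. Here is a sketch on a synthetic one-week index with placeholder values:

```python
import pandas as pd

# Synthetic one-week daily index with placeholder consumption values
week = pd.DataFrame(
    {'Consumption': [1100.0, 1400.0, 1420.0, 1410.0, 1400.0, 1350.0, 1150.0]},
    index=pd.date_range('2017-01-01', periods=7, freq='D'),
)

week['Year'] = week.index.year
week['Month'] = week.index.month
week['Weekday Name'] = week.index.day_name()  # e.g. 'Sunday', 'Monday', ...

print(week[['Year', 'Month', 'Weekday Name']])
```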

---
# Time-based indexing

**Remarks**:
- select rows by date labels with `loc`
- partial string indexing also works, e.g. a whole month or year


In [None]:
df.head()

In [None]:
# time-based indexing: select a date range (both end labels are included)
df.loc['2015-01-01':'2015-01-03']

In [None]:
# partial string indexing: select a whole month
df.loc['2010-02']
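Two details of label-based slicing are easy to miss. The end label is included, and a partial string such as `'2010-02'` selects the whole month. Here is a sketch on a synthetic daily frame where the values are just a running counter:

```python
import numpy as np
import pandas as pd

# Synthetic daily frame covering 2010-2015; the values are only a counter
idx = pd.date_range('2010-01-01', '2015-12-31', freq='D')
frame = pd.DataFrame({'Consumption': np.arange(len(idx), dtype=float)}, index=idx)

three_days = frame.loc['2015-01-01':'2015-01-03']  # end label is inclusive
february = frame.loc['2010-02']                     # whole month via partial string
one_year = frame.loc['2012']                        # whole (leap) year

print(len(three_days), len(february), len(one_year))  # 3 28 366
```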

---
# Visualizing time series data

**Remarks**
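- `DataFrame.plot()` draws each column against the `DatetimeIndex`; `subplots=True` gives one panel per column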

## Yearly seasonality

In [None]:
# Use seaborn style defaults and set the default figure size
sns.set(rc={'figure.figsize':(20, 7)})

In [None]:
cols_plot = ['Consumption', 'Solar', 'Wind']
axes = df[cols_plot].plot(marker='.', alpha=0.5, linestyle='None', subplots=True)

In [None]:
axes = df[cols_plot].loc['2012':'2018'].plot(marker='.', alpha=0.5, linestyle='None', subplots=True)
for ax in axes:
    ax.set_ylabel('Daily Totals [GWh]')

### Observations
- Electricity consumption is highest in winter, presumably due to electric heating and increased lighting usage, and lowest in summer.
- Electricity consumption appears to split into two clusters — one with oscillations centered roughly around 1400 GWh, and another with fewer and more scattered data points, centered roughly around 1150 GWh. We might guess that these clusters correspond with weekdays and weekends, and we will investigate this further shortly.
- Solar power production is highest in summer, when sunlight is most abundant, and lowest in winter.
- Wind power production is highest in winter, presumably due to stronger winds and more frequent storms, and lowest in summer.
- There appears to be a strong increasing trend in wind power production over the years.
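The seasonal observations above can be checked numerically by averaging per calendar month. This sketch uses synthetic data with a built-in winter peak. The cosine shape and the numbers are assumptions for illustration, not OPSD values:

```python
import numpy as np
import pandas as pd

# Synthetic daily consumption: a baseline plus a cosine that peaks in early January
idx = pd.date_range('2015-01-01', '2016-12-31', freq='D')
day_of_year = idx.dayofyear.to_numpy()
seasonal = 150 * np.cos(2 * np.pi * (day_of_year - 1) / 365.25)
frame = pd.DataFrame({'Consumption': 1300 + seasonal}, index=idx)

# Mean consumption for each calendar month (1 = January, ..., 12 = December)
monthly_mean = frame.groupby(frame.index.month)['Consumption'].mean()

winter = monthly_mean.loc[[12, 1, 2]].mean()
summer = monthly_mean.loc[[6, 7, 8]].mean()
print(monthly_mean.round(1))
print(f'winter mean: {winter:.1f} GWh, summer mean: {summer:.1f} GWh')
```

On the real data, the same `groupby(df.index.month)` call works for *Solar* and *Wind* as well.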


## Weekly seasonality 

In [None]:
# df['Consumption'].loc['2015'].plot(linewidth=0.8)

# different version: select the year and the column in one .loc call
ax = df.loc['2017', 'Consumption'].plot()
ax.set_ylabel('Daily consumption [GWh]')

**Observation**

Electricity consumption drops sharply in early January and late December, during the holidays.



In [None]:
ax = df.loc['2017-01':'2017-02', 'Consumption'].plot(marker='o', linestyle='--')
ax.set_ylabel('Daily consumption [GWh]')


Consumption is highest on weekdays and lowest on weekends.
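To check the weekday/weekend split directly, group by whether the day of week is Saturday or Sunday (`dayofweek` 5 or 6). This sketch uses synthetic data in which weekends are assumed to be about 250 GWh lower, mirroring the two clusters (roughly 1400 and 1150 GWh) noted earlier. The noise level and seed are arbitrary choices:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily consumption for 2017: ~1400 GWh on weekdays, ~1150 GWh on weekends
idx = pd.date_range('2017-01-01', '2017-12-31', freq='D')
is_weekend = idx.dayofweek >= 5  # Monday=0, ..., Saturday=5, Sunday=6
base = np.where(is_weekend, 1150.0, 1400.0)
frame = pd.DataFrame({'Consumption': base + rng.normal(0, 30, len(idx))}, index=idx)

# Average consumption per day type, using 'Weekday' / 'Weekend' labels as group keys
day_type = np.where(is_weekend, 'Weekend', 'Weekday')
by_daytype = frame.groupby(day_type)['Consumption'].mean()
print(by_daytype.round(1))
```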

## Customizing time series plots

Because date/time ticks are handled a bit differently by `matplotlib.dates` than by the **DataFrame**’s `plot()` method, let’s create the plot directly in *matplotlib*.

Then we use `mdates.WeekdayLocator()` with `byweekday=mdates.MONDAY` to place a major x-axis tick on every Monday.

We also use `mdates.DateFormatter()` to improve the tick labels, using `strftime`-style format codes such as `%b %d` (abbreviated month name and day of the month).
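To see which ticks that combination should produce for January-February 2017, the pandas sketch below lists every Monday in the range and formats it with the same `%b %d` codes. It does not plot anything. It mirrors the intended tick positions and labels but is not matplotlib's own implementation:

```python
import pandas as pd

# Every Monday between 1 Jan and 28 Feb 2017 ('W-MON' = weekly, anchored on Monday)
mondays = pd.date_range('2017-01-01', '2017-02-28', freq='W-MON')

# The same strftime codes that DateFormatter('%b %d') uses for the tick labels
labels = list(mondays.strftime('%b %d'))

print(labels)
# ['Jan 02', 'Jan 09', 'Jan 16', 'Jan 23', 'Jan 30', 'Feb 06', 'Feb 13', 'Feb 20', 'Feb 27']
```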

In [None]:
import matplotlib.dates as mdates

In [None]:
fig, ax = plt.subplots()
ax.plot(df.loc['2017-01':'2017-02', 'Consumption'], marker='o', linestyle='--')
ax.set_ylabel('Daily Consumption (GWh)')
ax.set_title('Jan-Feb 2017 Electricity Consumption')

# Set x-axis major ticks to weekly interval, on Mondays
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MONDAY))

# Format x-tick labels as abbreviated month name and day of the month, e.g. 'Jan 02'
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))