<a href="https://colab.research.google.com/github/aaubs/ds-master/blob/main/courses/ds4b-m1-6-sml/notebooks/s2-sml-timeseries.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import altair as alt
import matplotlib.pyplot as plt

# Introduction to time series

What is time series analysis, and what are its benefits? Time series analysis studies a sequence of data points ordered in time. It is one of the most widely used forms of data analysis and is applied across many industries.

It helps companies understand and forecast patterns in their data, and the results can inform better business decisions. For example:

* If you’re a retailer, a time series analysis can help you forecast daily sales volumes to guide inventory decisions and the timing of marketing efforts.
* If you’re in the financial industry, a time series analysis can help you forecast stock prices to support investment decisions.
* If you’re an agricultural company, a time series analysis can be used for weather forecasting to guide planning decisions around planting and harvesting.

## Basics

- A **time series** is a sequence of data points or observations recorded in time order. The observations may be irregularly spaced, but most methods assume equally spaced time intervals. The recording frequency may be hourly, daily, weekly, monthly, quarterly, or annual.


- **Time-Series Forecasting** is the process of using a statistical model to predict future values of a time-series based on past results.


- **Time series analysis** encompasses statistical methods for analyzing time series data. These methods enable us to extract meaningful statistics, patterns, and other characteristics of the data. Time series are usually visualized as line charts. Understanding these inherent characteristics of the data is what lets us create meaningful and accurate forecasts.


- Time series are used in statistics, finance, and business. A very common example of time series data is the daily closing value of a stock index such as the NASDAQ or the Dow Jones. Other common applications are sales and demand forecasting, weather forecasting, econometrics, signal processing, pattern recognition, and earthquake prediction.
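
The definitions above can be made concrete with a small sketch. The data here is made up for illustration: it builds an equally spaced daily series, checks that pandas can infer its frequency, shows that a missing point breaks the regular spacing, and produces a naive forecast that simply repeats the last observed value. A naive forecast is only a baseline, not a statistical model in the full sense.

```python
import numpy as np
import pandas as pd

# Hypothetical daily observations: 10 days starting 2024-01-01
index = pd.date_range(start="2024-01-01", periods=10, freq="D")
series = pd.Series(np.arange(10, dtype=float), index=index, name="observation")

# An equally spaced series has a single frequency that pandas can infer
inferred = pd.infer_freq(series.index)

# Dropping one point makes the spacing irregular, so no frequency is inferred
irregular = series.drop(series.index[3])
inferred_irregular = pd.infer_freq(irregular.index)

# Naive forecast: repeat the last observed value for the next three days
future_index = pd.date_range(series.index[-1] + pd.Timedelta(days=1), periods=3, freq="D")
naive_forecast = pd.Series(series.iloc[-1], index=future_index, name="forecast")

print(inferred, inferred_irregular)
print(naive_forecast)
```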


## Components of a Time-Series


- **Trend** - The trend shows the general direction of the time series data over a long period of time. A trend can be increasing (upward), decreasing (downward), or horizontal (stationary).


- **Seasonality** - The seasonality component is a pattern that repeats with consistent timing, direction, and magnitude. An example is the increase in water consumption in summer due to hot weather.


- **Cyclical Component** - These are fluctuations with no fixed period. A cycle refers to the ups and downs, booms and slumps of a time series, mostly observed in business cycles. These cycles are not seasonal; they generally occur over a period of 3 to 12 years, depending on the nature of the time series.


- **Irregular Variation** - These are the fluctuations in the time series data which become evident when trend and cyclical variations are removed. These variations are unpredictable, erratic, and may or may not be random.

- **ETS Decomposition** - ETS Decomposition is used to separate different components of a time series. The term ETS stands for Error, Trend and Seasonality.

![timeseries seasons](https://github.com/aaubs/ds-master/blob/main/courses/ds4b-m1-6-sml/notebooks/aaubs.github.io/ds-master/data/Images/m1_sml_time_series_seasson.png?raw=1)
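
To see how these components can be separated, here is a minimal sketch of a classical additive decomposition on synthetic data. The series, its parameters, and all variable names are assumptions made for illustration. It uses a centered moving average rather than a fitted ETS model, so treat it as one simple way to split a series into trend, seasonal, and irregular parts.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (assumption): linear trend + yearly seasonality + noise
rng = np.random.default_rng(0)
index = pd.date_range("2015-01-01", periods=72, freq="MS")
t = np.arange(72)
trend_true = 100 + 2.0 * t
seasonal_true = 10 * np.sin(2 * np.pi * t / 12)
noise = rng.normal(scale=1.0, size=72)
y = pd.Series(trend_true + seasonal_true + noise, index=index)

# Trend: 2x12 moving average. Each 12-month mean spans one full seasonal cycle;
# averaging two adjacent means and shifting by 6 centers the estimate on each month.
ma12 = y.rolling(window=12).mean()
trend_est = ma12.rolling(window=2).mean().shift(-6)

# Seasonality: average detrended value for each calendar month
detrended = y - trend_est
seasonal_by_month = detrended.groupby(detrended.index.month).mean()
seasonal_est = pd.Series(
    seasonal_by_month.reindex(y.index.month).to_numpy(), index=y.index
)

# Irregular component: what remains after removing trend and seasonality
residual = y - trend_est - seasonal_est

print(pd.DataFrame({"trend": trend_est, "seasonal": seasonal_est,
                    "residual": residual}).dropna().head())
```

The first and last six months have no trend estimate because the moving average needs a full window on both sides.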

# Data example

In [None]:
from vega_datasets import data

## Air Passengers

In [None]:
data_passengers = pd.read_csv("https://raw.githubusercontent.com/aaubs/ds-master/main/data/air_passengers.csv")

In [None]:
data_passengers.head()

In [None]:
data_passengers.info()

In [None]:
data_passengers['Date'] = pd.to_datetime(data_passengers['Date'])
# Set the date as index 
data_passengers = data_passengers.set_index('Date')
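
Once the date column is the index, rows can be selected by date strings. The sketch below uses a small made-up frame instead of the downloaded CSV; the `Passengers` column name and all values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical stand-in for a CSV with a text date column
df = pd.DataFrame({
    "Date": ["1949-01-01", "1949-02-01", "1949-03-01", "1950-01-01", "1950-02-01"],
    "Passengers": [112, 118, 132, 115, 126],
})
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")

# Partial-string indexing: all rows from the year 1949
year_1949 = df.loc["1949"]

# Label-based slicing on a DatetimeIndex includes both endpoints
feb_to_jan = df.loc["1949-02":"1950-01"]

print(year_1949)
print(feb_to_jan)
```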

## Weather
https://altair-viz.github.io/user_guide/times_and_dates.html

In [None]:
data_temp = data.seattle_temps()
data_temp.head()

In [None]:
data_temp.info()

In [None]:
data_temp["date"] = pd.to_datetime(data_temp["date"])

In [None]:
alt.Chart(data_temp[data_temp.date < '2010-01-15']).mark_line().encode(
    x='date:T',
    y='temp:Q'
)

In [None]:
alt.Chart(data_temp[data_temp.date < '2010-01-15']).mark_rect().encode(
    alt.X('hoursminutes(date):O', title='hour of day'),
    alt.Y('monthdate(date):O', title='date'),
    alt.Color('temp:Q', title='temperature (F)')
)

In [None]:
alt.Chart(data_temp.resample('M', on='date').mean().reset_index()).mark_line().encode(
    x='date:T',
    y='temp:Q'
)
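
The resampling step in the cell above can be checked on a small example. This sketch uses made-up daily readings, constant within each month so the monthly means are easy to verify. It uses the `"MS"` alias, which labels each bin by month start, where the cell above uses `"M"` for month end; newer pandas versions spell the month-end alias `"ME"`.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings: 40 degrees throughout January, 50 throughout February
dates = pd.date_range("2010-01-01", "2010-02-28", freq="D")
temps = np.where(dates.month == 1, 40.0, 50.0)
readings = pd.DataFrame({"date": dates, "temp": temps})

# on="date" resamples by a column instead of the index;
# reset_index() turns the resulting DatetimeIndex back into a column for plotting
monthly = readings.resample("MS", on="date").mean().reset_index()

print(monthly)
```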

## Energy Usage
https://infovis.fh-potsdam.de/tutorials/infovis4time.html 

In [None]:
# data downloaded from OPSD - see filter elements on this page: https://data.open-power-system-data.org/time_series/2019-06-05

data_energy = pd.read_csv("https://data.open-power-system-data.org/index.php?package=time_series&version=2019-06-05&action=customDownload&resource=3&filter%5B_contentfilter_cet_cest_timestamp%5D%5Bfrom%5D=2015-01-01&filter%5B_contentfilter_cet_cest_timestamp%5D%5Bto%5D=2018-12-31&filter%5BRegion%5D%5B%5D=DE&filter%5BVariable%5D%5B%5D=load_actual_entsoe_transparency&filter%5BVariable%5D%5B%5D=solar_generation_actual&filter%5BVariable%5D%5B%5D=wind_generation_actual&downloadCSV=Download+CSV",
                   parse_dates=['utc_timestamp']) # parse timestamp column

In [None]:
data_energy.head()

In [None]:
data_energy = data_energy.drop(columns=["cet_cest_timestamp"])
data_energy.columns=["datetime", "load", "solar", "wind"]
data_energy["datetime"] = data_energy["datetime"].dt.tz_convert("Europe/Berlin")
data_energy = data_energy.set_index("datetime")
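
The time zone conversion matters around daylight saving changes. As a small illustration with made-up timestamps, the sketch below converts three consecutive UTC hours from the night of 25 March 2018, when Germany switched to summer time, and shows that local time skips from 01:00 to 03:00.

```python
import pandas as pd

# Hypothetical UTC timestamps spanning the switch to summer time in Germany
utc = pd.Series(pd.to_datetime(
    ["2018-03-25 00:00", "2018-03-25 01:00", "2018-03-25 02:00"], utc=True
))

# tz_convert keeps the same instants but expresses them in local time
local = utc.dt.tz_convert("Europe/Berlin")

print(local)
print(local.dt.hour.tolist())
```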

In [None]:
data_energy.head()

In [None]:
data_energy_day = data_energy.resample("D").sum()
data_energy_month = data_energy_day.resample("M").mean()

In [None]:
data_energy_day.head()

In [None]:
alt.Chart(data_energy_day.reset_index().melt("datetime")).mark_circle().encode(
    x='datetime',
    y='value',
    color='variable',
).properties(width=800, height=400)
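
The `reset_index().melt("datetime")` call reshapes the data from one column per variable into one row per timestamp and variable. That long format is what lets a single `color='variable'` encoding split the chart into series. A minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical wide table: one column per measured variable
wide = pd.DataFrame(
    {"load": [10.0, 12.0], "solar": [1.0, 2.0], "wind": [3.0, 4.0]},
    index=pd.DatetimeIndex(["2015-01-01", "2015-01-02"], name="datetime"),
)

# Long format: one row per (datetime, variable) pair
long = wide.reset_index().melt("datetime")

print(long)
```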

In [None]:
plot_day = alt.Chart(data_energy_day.reset_index().melt("datetime")).mark_line(strokeWidth=1).encode(
    x='datetime',
    y='value',
    color='variable',
).properties(width=800, height=400)

plot_day

In [None]:
plot_month = alt.Chart(data_energy_month.reset_index().melt("datetime")).mark_line(opacity=0.75, interpolate="basis").encode(
    x='datetime',
    y='value',
    color='variable',
).properties(width=800, height=400)

plot_month

In [None]:
plot_day + plot_month

## Bonus: Stocks

In [None]:
%pip install yfinance
import yfinance as yf

In [None]:
data_stocks = yf.download(tickers='META', period='5y', interval='1d')
# Show the downloaded data
data_stocks

In [None]:
# Date is the index, so reset it to make it available as a column
sns.lineplot(data=data_stocks.reset_index(),
             x="Date",
             y="Close")
plt.show()