In [None]:
# Optional: change the Jupyter Notebook theme to the GDD theme
from IPython.display import HTML
HTML(url='https://gdd.li/jupyter-theme')

![footer_logo](images/logo.png)
# Hackathon!

It is time to put your accumulated time series skills to work. You can pick one of the datasets offered below or work with your own. There is a range of questions based on the material we have covered, but you are encouraged to explore the data yourself and answer your own questions too!

Good luck!

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib inline
plt.rcParams['figure.figsize'] = (16, 4)

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures

# The Data
![cheese](images/cheese.jpeg)

### Dataset 1: Cheese Production in the Netherlands

This dataset describes the processing of raw cow's milk into cheese products by dairy factories in the Netherlands. The raw material for these products is cow's milk collected from Dutch dairy farmers as well as imported milk. Only cheeses made from cow's milk are included. Data comes from [CBS](https://opendata.cbs.nl/statline/#/CBS/en/dataset/7425eng/table?dl=31ECC).

In [None]:
cheese = pd.read_csv('data/cheese_production.csv')
cheese.head()

![car](images/car.jpeg)

### Dataset 2: Road Accidents in the United Kingdom

The UK government [collects](https://www.kaggle.com/tsiaras/uk-road-safety-accidents-and-vehicles) and publishes detailed information about traffic accidents across the country. This information includes, but is not limited to, geographical locations, weather conditions, vehicle types and number of casualties. This dataset focuses on the aggregate number of traffic accidents.

In [None]:
accidents = pd.read_csv('data/accidents.csv')
accidents.head()

### Dataset 3: Your Own Dataset

If you have another interesting time series dataset, feel free to focus on it instead, but try to answer the questions in the assignment below.

In [None]:
# Load your data here


# The Assignment

You are strongly encouraged to explore your chosen dataset and think of some questions you might want to answer. Write down some ideas here! 


However, we will also provide you with some questions to guide the exploration and analysis process. 

## Questions
Use the questions below to analyze the chosen time series, but do not feel restricted by them. It is also a good idea to come up with your own questions and answer them along the way.

*Note:* the $y$ value (cheese production, number of accidents, etc.) is referred to below as "*the value*".

### Preliminary 

#### 1. Is the date set as the index? If not, make sure it is set! 
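A minimal sketch of the usual pattern, on a tiny hypothetical frame. The column names `date` and `value` are assumptions, so check `cheese.head()` or `accidents.head()` for the real ones. `pd.to_datetime` parses the strings and `set_index` moves the column into the index.

```python
import pandas as pd

# Hypothetical raw frame; 'date' and 'value' are assumed column names
raw = pd.DataFrame({
    "date": ["2020-01-01", "2020-02-01", "2020-03-01"],
    "value": [10.0, 12.5, 11.0],
})

# Parse the date strings, move the column into the index and sort chronologically
raw["date"] = pd.to_datetime(raw["date"])
df = raw.set_index("date").sort_index()

print(type(df.index).__name__)  # DatetimeIndex
```

The same can be done at load time with `pd.read_csv(path, parse_dates=['date'], index_col='date')`, again assuming the column is called `date`.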

#### 2. What range of dates does the dataset cover? How frequent are the timestamps?
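One way to check the coverage and spacing, sketched on a synthetic monthly series rather than the real data. `pd.infer_freq` returns a frequency alias when the spacing is regular and `None` otherwise; in that case, counting the gaps between consecutive timestamps shows what is going on.

```python
import pandas as pd

# Synthetic monthly series with month-start timestamps, standing in for the real data
idx = pd.date_range("2015-01-01", periods=24, freq="MS")
series = pd.Series(range(24), index=idx, dtype=float)

start, end = series.index.min(), series.index.max()
# infer_freq returns an alias such as 'MS' (month start), or None if spacing is irregular
freq = pd.infer_freq(series.index)
# Distribution of gaps between consecutive timestamps, useful when infer_freq returns None
gaps = series.index.to_series().diff().value_counts()

print(start.date(), end.date(), freq)  # 2015-01-01 2016-12-01 MS
```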

#### 3. When are the highest and lowest values of $y$ observed, and what are they?
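With a `DatetimeIndex` in place, `idxmax()` and `idxmin()` return the timestamps of the extremes directly. A sketch on six made-up monthly values; on a DataFrame you would call them on the value column, e.g. `df['value'].idxmax()`, where `value` is an assumed column name.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=6, freq="MS")
series = pd.Series([5.0, 9.0, 2.0, 7.0, 9.5, 4.0], index=idx)

# idxmax/idxmin return the index label (here a Timestamp) of the extreme value;
# with ties, the first occurrence is returned
peak_date, low_date = series.idxmax(), series.idxmin()
print("max", series.max(), "on", peak_date.date())  # max 9.5 on 2020-05-01
print("min", series.min(), "on", low_date.date())   # min 2.0 on 2020-03-01
```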

#### 4. Plot the data. Do you see any yearly, monthly or other cycles? Is there a trend?
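Besides the raw line plot, averaging by calendar month is a quick way to spot a yearly cycle. A sketch on a synthetic series with a built-in trend and yearly cycle; the data and figure layout are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly series: upward trend + yearly cycle + noise
rng = np.random.default_rng(0)
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
t = np.arange(120)
series = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 120), index=idx)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4))
series.plot(ax=ax1, title="Raw series")

# Mean per calendar month: a clear pattern here hints at a yearly cycle
monthly_profile = series.groupby(series.index.month).mean()
monthly_profile.plot(kind="bar", ax=ax2, title="Mean by calendar month")
```

A strong trend leaks slightly into this profile (later months sit a little higher within each year), so removing the trend first gives a cleaner view of the cycle.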

### Data Analysis (aggregations)

#### 1. Use the `.resample()` method to compute quarterly averages. In which quarter are the highest values observed?
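A sketch on synthetic monthly data. It uses the `"QS"` alias, which labels each quarter by its first day; the quarter-end alias `"Q"` (spelled `"QE"` in recent pandas) would label by the last day instead. The question can be read two ways, so both the single best quarter and the best calendar quarter across years are shown.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range("2015-01-01", periods=36, freq="MS")
series = pd.Series(rng.normal(100, 5, 36), index=idx)

# Quarterly averages; "QS" labels each quarter by its first day
quarterly = series.resample("QS").mean()
print(quarterly.idxmax())  # the single quarter with the highest average

# Average each calendar quarter (Q1..Q4) across years to compare quarters in general
by_quarter = quarterly.groupby(quarterly.index.quarter).mean()
print(by_quarter.idxmax())
```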

#### 2. In which month do the values, on average, change the most compared to the previous month?
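A sketch using `diff()` for month-over-month changes and then grouping by calendar month. It reads "changes the most" as the largest average magnitude, which is an interpretation; the data is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with a yearly cycle
rng = np.random.default_rng(2)
idx = pd.date_range("2015-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(100 + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 48), index=idx)

# Change relative to the previous month; the first entry is NaN and mean() skips it
change = series.diff()
avg_change = change.groupby(change.index.month).mean()

# "Changes the most" read as largest average magnitude; use avg_change.idxmax()
# instead if only increases matter
biggest_month = avg_change.abs().idxmax()
print(biggest_month, avg_change[biggest_month])
```

For relative rather than absolute changes, `series.pct_change()` can replace `series.diff()`.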

#### 3. Plot two centered rolling means with different window sizes.
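`Series.rolling(..., center=True)` assigns each window's mean to the middle of the window rather than to its last timestamp. A sketch on a synthetic random walk; the window sizes 3 and 12 are illustrative choices.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk series, standing in for the real data
rng = np.random.default_rng(3)
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
series = pd.Series(50 + np.cumsum(rng.normal(0, 1, 120)), index=idx)

ax = series.plot(label="value", alpha=0.5)
# center=True assigns each window mean to the middle of the window, not its end
for window in (3, 12):
    smooth = series.rolling(window=window, center=True).mean()
    smooth.plot(ax=ax, label=f"{window}-month centered mean")
ax.legend()
```

An even window such as 12 has no single middle timestamp, so its centering is off by half a step; a wider window smooths more but leaves more missing values at both ends.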

### Modeling
#### 1. Fit and plot a linear model. Does it adequately represent the trend?
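A minimal sketch of a linear trend fitted with `LinearRegression`, using a running time step as the only feature. The series is synthetic, and using the position (one unit per month) rather than the date itself is an assumption that works for regularly spaced data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic monthly series with a linear trend plus noise
rng = np.random.default_rng(4)
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
series = pd.Series(50 + 0.3 * np.arange(120) + rng.normal(0, 3, 120), index=idx)

# Single feature: a running time step 0, 1, 2, ... (one unit = one month here)
X = np.arange(len(series)).reshape(-1, 1)
model = LinearRegression().fit(X, series.to_numpy())
trend = pd.Series(model.predict(X), index=series.index)

ax = series.plot(label="value")
trend.plot(ax=ax, label="linear trend")
ax.legend()
print(f"slope per month: {model.coef_[0]:.3f}, R^2: {model.score(X, series.to_numpy()):.3f}")
```

To judge whether the line is adequate, plot the residuals `series - trend`: curvature or steps left in them indicate that a straight line misses part of the trend.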

#### 2. Are there any noticeable break points? If so, add the corresponding dummies and interaction terms to the linear model.
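A sketch of one common way to model a break: a dummy that is 1 after the break (level shift) and its interaction with time (slope change). The break date and the synthetic series are made up. Measuring the interaction as time *since* the break is a design choice so that the dummy's coefficient reads directly as the jump at the break.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
t = np.arange(120)

# Hypothetical break date; on real data, read it off the plot
break_date = pd.Timestamp("2015-01-01")
t_break = idx.get_loc(break_date)
after = (idx >= break_date).astype(int)

# Synthetic series: after the break the level jumps by 10 and the slope rises by 0.5
y = 50 + 0.2 * t + after * (10 + 0.5 * (t - t_break)) + rng.normal(0, 1, 120)

X = pd.DataFrame({
    "t": t,
    "after_break": after,                     # dummy: level shift at the break
    "t_since_break": (t - t_break) * after,   # interaction: extra slope after the break
}, index=idx)
model = LinearRegression().fit(X, y)
fitted = pd.Series(model.predict(X), index=idx)

ax = pd.Series(y, index=idx).plot(label="value")
fitted.plot(ax=ax, label="trend with break")
ax.legend()
print(dict(zip(X.columns, model.coef_.round(2))))
```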

#### 3. Add seasonal dummies to the model. How frequent should they be to capture seasonality well?
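Monthly dummies can be built with the `OneHotEncoder`, `ColumnTransformer` and `Pipeline` imported at the top of the notebook. A sketch on synthetic data; `sparse_threshold=0` is set so the transformer always returns a dense array, and the column names `t` and `month` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic monthly series: trend + yearly cycle + noise
rng = np.random.default_rng(6)
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
t = np.arange(120)
y = 100 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 120)

X = pd.DataFrame({"t": t, "month": idx.month}, index=idx)

features = ColumnTransformer(
    # One dummy per calendar month; drop="first" avoids collinearity with the intercept
    [("season", OneHotEncoder(drop="first"), ["month"])],
    remainder="passthrough",   # keep the 't' column unchanged for the trend
    sparse_threshold=0,        # always return a dense array
)
model = Pipeline([("features", features), ("regression", LinearRegression())])
model.fit(X, y)
print(f"in-sample R^2: {model.score(X, y):.3f}")
```

Replacing `idx.month` with `idx.quarter` gives quarterly dummies. Since the monthly model contains the quarterly one, its in-sample R² can never be lower, so compare the two on held-out data rather than in-sample.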

#### 4. Add radial basis function (RBF) features to the model instead of dummies. Does it fit the data better now?
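RBF features replace hard 0/1 dummies with smooth Gaussian bumps centered at evenly spaced points in the year. The scikit-lego package offers a ready-made `RepeatingBasisFunction` transformer for this; the sketch below instead builds the features by hand with NumPy so it needs no extra dependency. The helper `rbf_features` is hypothetical, and the 12 centers, the width and the 365-day period are illustrative choices, not tuned values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def rbf_features(day_of_year, n_centers=12, period=365.0, width=None):
    """Gaussian bumps that repeat every `period` days, one per evenly spaced center."""
    centers = np.arange(n_centers) * period / n_centers
    width = period / n_centers if width is None else width
    # Circular distance, so late December counts as close to early January
    diff = day_of_year[:, None] - centers[None, :]
    dist = np.abs((diff + period / 2) % period - period / 2)
    return np.exp(-0.5 * (dist / width) ** 2)

# Synthetic monthly series: trend + yearly cycle + noise
rng = np.random.default_rng(7)
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
t = np.arange(120)
y = 100 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 120)

# Feature matrix: linear time step followed by 12 RBF columns
rbf = rbf_features(np.asarray(idx.dayofyear, dtype=float))
X = np.column_stack([t, rbf])
model = LinearRegression().fit(X, y)
print(f"in-sample R^2: {model.score(X, y):.3f}")
```

The width controls smoothness: a larger width blends neighboring months more, while a very small one makes the bumps behave like the dummies again.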
 

#### 5. Experiment with the linear model and try to find the best fit.
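When experimenting, adding features never lowers in-sample R², so a fairer comparison holds out the most recent observations. A sketch on synthetic data comparing two candidate feature sets; the two-year test window and the mean absolute error metric are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic monthly series: trend + yearly cycle + noise
rng = np.random.default_rng(8)
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
t = np.arange(120)
y = 100 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 120)

base = pd.DataFrame({"t": t, "month": idx.month}, index=idx)
candidates = {
    "trend only": base[["t"]],
    "trend + month dummies": pd.get_dummies(base, columns=["month"], drop_first=True, dtype=float),
}

# Chronological split: fit on the first 8 years, evaluate on the last 2 (no shuffling)
split = 96
scores = {}
for name, X in candidates.items():
    model = LinearRegression().fit(X.iloc[:split], y[:split])
    scores[name] = mean_absolute_error(y[split:], model.predict(X.iloc[split:]))

for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: test MAE = {mae:.2f}")
```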