This notebook describes the first task in DELTA and shows how to import the provided data.

### **1. Description of the First Task**

**1) What to submit?**

We ask you to submit one zip archive containing a Jupyter notebook and a CSV file with your predictions on the test set.

**2) Where to submit?**

Submit the zip file via the corresponding Moodle module.

**3) Where to find the data?**

The datasets for this task and the demo predictions file 'Demo_Predictions.csv' can be found on the main Moodle page under 'Task 1 Data'. Your predictions file should have the same format as the demo CSV file, and your predictions should be on the original scale of the train and test datasets (if you transform the target for training, invert the transformation before submitting). The files 'train_X.csv' and 'test_X.csv' contain the predictors of the train and test set, respectively. The file 'train_y.csv' contains the target variable of the train set. The second section of this notebook demonstrates how to import the data.
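
As a minimal sketch of the two format requirements above, the snippet below undoes a standardization of the target and writes the predictions with the row index and column layout of a demo frame. The demo frame, its column name `y`, the placeholder predictions, and the output file name `My_Predictions.csv` are hypothetical stand-ins; in practice the index and columns would come from the imported 'Demo_Predictions.csv'.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the imported demo file (the real column name may differ).
demo_predictions = pd.DataFrame({"y": [0.0, 0.0, 0.0]}, index=[10, 11, 12])

# Hypothetical train target, used only to illustrate scaling and unscaling.
train_y = pd.Series([100.0, 200.0, 300.0, 400.0])
y_mean, y_std = train_y.mean(), train_y.std()

# Placeholder for model output on the standardized target scale.
scaled_preds = np.array([-1.0, 0.0, 1.0])

# Invert the standardization so the predictions are on the original scale.
original_preds = scaled_preds * y_std + y_mean

# Reuse the demo frame's index and columns so the layout matches.
submission = pd.DataFrame(
    original_preds.reshape(-1, 1),
    index=demo_predictions.index,
    columns=demo_predictions.columns,
)

# Written to a temporary folder here; the file name is an assumption.
out_path = os.path.join(tempfile.mkdtemp(), "My_Predictions.csv")
submission.to_csv(out_path)

# Reading the file back like the demo file should reproduce the same layout.
reloaded = pd.read_csv(out_path, sep=",", header=0, index_col=0)
```

Comparing `reloaded` with `demo_predictions` (same shape, index, and column names) is a cheap check before zipping the submission.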


**4) What is expected?**

You are provided with a tabular dataset containing 19 predictor features and 1 continuous target variable. Your task is to build a neural network in Keras/TensorFlow to predict the target variable. The data is split into a train set of 1875 samples and a test set of 625 samples.

The Jupyter notebook should contain the following elements:

- Preprocessing of your datasets according to the requirements of neural networks (different features have different descriptive statistics!).
- Implementation of 3 benchmark models: a linear model, a tree-based model, and a neural network that has not been tuned.
- Tuning of a neural network using a hyperparameter optimization library of your choice. The code for hyperparameter tuning should include regularization.
- A well-annotated plot showing the train vs. validation loss of the best-performing neural network. Based on the plot, you should interpret the model fit in max. 2 sentences. Anything beyond these 2 sentences will not be considered in the evaluation!
- An interpretation of why you believe the best-performing architecture is suitable for your task, in max. 3 sentences.
- A table showing all results (on a validation subset) from the tuned network and your benchmarks.
- Well-documented code that is divided into functions, classes, etc.
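
To illustrate how the preprocessing, benchmark, and results-table items can fit together, here is a rough sketch using scikit-learn on synthetic data. It is not a reference solution: the synthetic data, the 80/20 validation split, the choice of `StandardScaler` and `RandomForestRegressor`, and all variable names are assumptions, and the two neural networks (untuned and tuned) are left out.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for train_X / train_y: 19 features on very different scales.
rng = np.random.default_rng(0)
columns = [str(i) for i in range(1, 20)]
X = pd.DataFrame(
    rng.normal(size=(400, 19)) * rng.uniform(1, 100, size=19), columns=columns
)
y = 0.5 * X["1"] + rng.normal(size=400)

# Hold out a validation subset (the 80/20 split is an assumption).
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training part only, then apply it to both parts,
# so that no validation information leaks into the preprocessing.
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_val_s = scaler.transform(X_tr), scaler.transform(X_val)

# Two of the three benchmarks; the untuned neural network would be added here.
models = {
    "Linear model": LinearRegression(),
    "Tree-based model": RandomForestRegressor(n_estimators=50, random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(X_tr_s, y_tr)
    rmse = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val_s))))
    rows.append({"model": name, "validation_rmse": rmse})

# One row per model, all evaluated on the same validation subset.
results = pd.DataFrame(rows).set_index("model")
print(results)
```

Evaluating every model on the same held-out subset keeps the rows of the table comparable; the neural networks would add further rows to `results`.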


**5) How will your submissions be evaluated?**

- 50% of your grade for the first task will be based on the points listed above.
- The remaining 50% will come from ranking your predictions by RMSE.
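
For reference, RMSE (root mean squared error) is the square root of the mean squared difference between the true and predicted values. The helper below is a small sketch of that standard definition; the function name `rmse` is our own, and the exact evaluation script used for the ranking is not specified here.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# One prediction is off by 2, so the mean squared error is 4 / 3.
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because the errors are squared before averaging, a few large errors weigh more heavily than many small ones.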

**6) Can I submit after the deadline?**

We understand that unexpected circumstances sometimes lead to delayed submissions. We will accept late submissions via e-mail, but please keep in mind that 2 points will be deducted for each day the submission is late.

### **2. Data Import**

To import the data with the code cell below, put the data files in the same folder as the notebook. Otherwise, you will have to adjust the file paths.

In [2]:
import pandas as pd

# Predictor features of the train set:
train_X = pd.read_csv('train_X.csv', sep=',', header=0, index_col=0)
# Target variable of the train set:
train_y = pd.read_csv('train_y.csv', sep=',', header=0, index_col=0)
# Predictor features of the test set:
test_X = pd.read_csv('test_X.csv', sep=',', header=0, index_col=0)
# Demo predictions file:
demo_predictions = pd.read_csv('Demo_Predictions.csv', sep=',', header=0, index_col=0)

In [17]:
train_X.head(5)

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
2247,7.251153,51.921083,-58.389635,3.255184,27.654863,-0.104062,7.421228,2.150974,-0.385294,5.874683,-0.162073,-90.981115,-3.904283,-3.260125,4.188588,1.599832,14.902995,3.006737,-1.09607
816,561.546314,49.383197,-82.180264,212.208694,14.874882,0.573039,-0.454757,-11.507246,-0.673895,-0.4829,2.57382,3.977701,57.294613,59.415319,-14.18177,-58.285078,1.731263,-0.173237,4.866928
2153,3.176991,4.047675,198.122697,-0.031795,-5.60916,0.270688,9.013148,4.761844,-0.162209,5.002455,0.313917,98.434435,-3.442229,-50.922848,47.704868,41.96865,21.377814,-4.523995,-1.123949
1162,4.921202,37.843248,-64.065346,-2.396584,20.127733,-1.303227,7.597263,2.581991,0.202523,-1.808456,-1.531472,-65.145109,-3.358736,-41.952477,41.554045,30.415441,18.108487,1.446456,-0.184203
480,524.827753,51.046876,-51.283559,202.486337,29.323544,1.270929,2.092076,-9.902368,-1.978757,-0.134721,-0.765519,-84.410329,60.194127,57.161506,-14.77551,-61.650444,1.070161,2.408323,3.341689
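
After the import, a quick consistency check can catch path or parsing mistakes early. The helper below is a hedged sketch: the function name `check_import` is hypothetical, and the expectation that `train_X` and `train_y` share the same row index is an assumption. It is demonstrated on synthetic frames with the sizes stated in the task description, not on the real files.

```python
import numpy as np
import pandas as pd


def check_import(train_X, train_y, test_X, n_features=19):
    """Raise AssertionError if the imported frames look inconsistent."""
    assert train_X.shape[1] == n_features, "unexpected number of train features"
    assert test_X.shape[1] == n_features, "unexpected number of test features"
    assert list(train_X.columns) == list(test_X.columns), "column mismatch"
    # Assumption: predictors and target of the train set share one row index.
    assert train_X.index.equals(train_y.index), "train_X and train_y rows differ"
    return train_X.shape, train_y.shape, test_X.shape


# Synthetic stand-ins with the sizes stated in the task description.
cols = [str(i) for i in range(1, 20)]
demo_train_X = pd.DataFrame(np.zeros((1875, 19)), columns=cols)
demo_train_y = pd.DataFrame({"y": np.zeros(1875)})
demo_test_X = pd.DataFrame(np.zeros((625, 19)), columns=cols)

shapes = check_import(demo_train_X, demo_train_y, demo_test_X)
```

With the real data, the call would be `check_import(train_X, train_y, test_X)` on the frames imported above.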
