
© 2022, Zaka AI, Inc. All Rights Reserved

# Regression
**Objective:** In this notebook exercise, we will work with the Boston House Price dataset and use regression to predict house prices in thousands of dollars.
We will load the data, create a baseline model, train and evaluate it, and finally try to improve its performance by standardizing the dataset and trying out wider and/or deeper network topologies.

### Importing the data from the GitHub repository

In [None]:
# clone git repo
!git clone https://github.com/zaka-ai/intro2dl.git

# change directory
%cd intro2dl/data/

## Regression with Boston House Price dataset

### 1. Load data

In this notebook, we are going to use the [**Boston house price dataset**](https://archive.ics.uci.edu/ml/datasets/Housing).

The dataset describes 13 numerical properties of houses in Boston suburbs and is concerned with modeling the price of houses in those suburbs in thousands of dollars. As such, this is a regression predictive modeling problem. Input attributes include things like crime rate, proportion of nonretail business acres, chemical concentrations and more.



In [None]:
from pandas import read_csv

# load dataset
# sep=r"\s+" splits on any run of whitespace (delim_whitespace is deprecated in recent pandas)
dataframe = read_csv("housing.csv", sep=r"\s+", header=None)
dataset = dataframe.values

# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]

### 2. Define Base Model

Create a Keras model with 1 hidden layer (size = input layer size).

We should define a `baseline_model()` function that creates the model, compiles it, and returns it.
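To make the target architecture concrete, here is a minimal pure-Python sketch of the forward pass such a network computes: 13 inputs, one hidden layer of 13 units, and a single linear output for the price. The ReLU activation, the helper names `relu`, `dense` and `forward`, and the toy weights are all assumptions made for illustration. This is not the solution to the exercise, which should be written with Keras `Dense` layers.

```python
def relu(values):
    # Rectified linear unit, applied element-wise (assumed activation)
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # weights[j] holds the input weights of output unit j
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    # Hidden layer with ReLU, then one linear output unit for regression
    hidden = relu(dense(x, hidden_w, hidden_b))
    return dense(hidden, [out_w], [out_b])[0]

n_inputs = 13
# Toy weights: the hidden layer copies its inputs, the output averages them
hidden_w = [[1.0 if i == j else 0.0 for i in range(n_inputs)]
            for j in range(n_inputs)]
hidden_b = [0.0] * n_inputs
out_w = [1.0 / n_inputs] * n_inputs
out_b = 0.0

x = [float(i) for i in range(n_inputs)]  # made-up feature vector 0..12
prediction = forward(x, hidden_w, hidden_b, out_w, out_b)
print(round(prediction, 6))
```

The output layer has no activation because the target is an unbounded real value, not a class probability.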

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# define baseline model
def baseline_model():
	# FILL BLANKS
	# build the model


	# compile the model


	return model

### 3. Evaluate baseline model

Evaluate the model using 10-fold cross-validation in the scikit-learn framework. Stratified splitting applies to classification labels; for a continuous target such as price, plain `KFold` with 10 splits is used.
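As a rough illustration of what k-fold splitting does, the sketch below generates contiguous folds without shuffling, for readability. The helper name `kfold_indices` is hypothetical, and the row count of 506 is the size the Boston dataset is commonly distributed with; treat it as an assumption about `housing.csv`.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample each
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

n_samples = 506  # assumed number of rows in housing.csv
folds = list(kfold_indices(n_samples, 10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)  # [51, 51, 51, 51, 51, 51, 50, 50, 50, 50]
```

Each sample lands in the held-out fold exactly once, so the model is trained and scored ten times on different splits.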

In [None]:
! pip install scikeras
from scikeras.wrappers import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

# evaluate model
estimator = KerasRegressor(model=baseline_model, epochs=10, batch_size=5, verbose=0)
# random_state is only accepted together with shuffle=True
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
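# request negative MSE explicitly so the printed value matches its "MSE" label
results = cross_val_score(estimator, X, Y, cv=kfold, scoring="neg_mean_squared_error")
print("Baseline: %.2f (%.2f) MSE" % (-results.mean(), results.std()))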
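The print statement labels the score as MSE (mean squared error). As a reminder of what that metric measures, here is a small pure-Python sketch; the function name and the price values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between targets and predictions
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [24.0, 21.6, 34.7]     # made-up prices, in thousands of dollars
predicted = [22.0, 21.6, 37.7]  # made-up model outputs
mse = mean_squared_error(actual, predicted)
print(round(mse, 3))
```

Because the errors are squared, the result is in squared thousands of dollars; taking the square root gives an error in the same units as the target.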

## Lift Performance By Standardizing The Dataset
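Standardizing the dataset refers to rescaling each input feature so that it has zero mean and unit variance (this is different from min-max scaling, which maps values to the range 0 to 1). This is done using `scikit-learn`'s `StandardScaler`. We will also build a pipeline that first standardizes the inputs and then builds, compiles, and trains the model. Because the scaler sits inside the pipeline, it is fit only on the training folds of each cross-validation split, which avoids leaking information from the held-out fold. The last step is to evaluate the performance of the model using cross-validation. This will show us whether standardizing the dataset improves the performance of a deep learning model.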
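A minimal sketch of the per-column computation that `StandardScaler` performs: subtract the column mean and divide by the population standard deviation. The helper name `standardize` and the toy column are assumptions for illustration only.

```python
from statistics import fmean, pstdev

def standardize(column):
    # z = (x - mean) / std, using the population standard deviation
    mean = fmean(column)
    std = pstdev(column)
    return [(v - mean) / std for v in column]

crime_rate = [0.1, 0.5, 2.0, 8.0, 20.0]  # made-up feature values
scaled = standardize(crime_rate)
print([round(v, 2) for v in scaled])
```

The scaled column has mean 0 and standard deviation 1, but its values are not confined to any fixed interval: here the smallest is negative and the largest exceeds 1.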

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(model=baseline_model, epochs=10, batch_size=5, verbose=0)))

pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
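# request negative MSE explicitly so the printed value matches its "MSE" label
results = cross_val_score(pipeline, X, Y, cv=kfold, scoring="neg_mean_squared_error")
print("Standardized: %.2f (%.2f) MSE" % (-results.mean(), results.std()))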

## Tune The Neural Network Topology
We can alter the architecture of the hidden layers of the neural network to observe changes in the results we get.

### Evaluate a wider network
A wider network is one whose hidden layer has more neurons than before. Let's create a network with 25 neurons in the hidden layer instead of 13, which is almost double.

In [None]:
def wider_model():
	# FILL BLANKS


	return model

Next, standardize the dataset and use the pipeline to build, compile, and train the model, then compute the score of this wider network.

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# evaluate wider model with standardized dataset
# FILL BLANKS
# pipeline


# evaluation



### Evaluate a deeper network
A deeper network is a network which has more hidden layers than the previous baseline network. Let's add another hidden layer for a total of two hidden layers with 13 neurons each and check the results we get.
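To compare the capacity of the three topologies discussed in this notebook, the sketch below counts the trainable parameters (weights plus biases) of each fully connected network. The layer sizes come from the text; the helper name `count_dense_params` is hypothetical.

```python
def count_dense_params(layer_sizes):
    """Trainable parameters of a fully connected net: weights plus biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

topologies = {
    "baseline": [13, 13, 1],      # 13 inputs, 13 hidden units, 1 output
    "wider": [13, 25, 1],         # 25 hidden units
    "deeper": [13, 13, 13, 1],    # two hidden layers of 13 units
}
param_counts = {name: count_dense_params(sizes)
                for name, sizes in topologies.items()}
print(param_counts)  # {'baseline': 196, 'wider': 376, 'deeper': 378}
```

The wider and deeper variants end up with nearly the same number of parameters, so any difference in their scores reflects how the capacity is arranged more than how much of it there is.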

In [None]:
def deeper_model():
	# FILL BLANKS

	# compile the model

	return model

Finally, standardize the dataset and use the pipeline to build, compile, and train the model, then compute the score of this deeper network.

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# evaluate deeper model with standardized dataset
# FILL BLANKS
