## Building a multilayer perceptron (MLP) in tf.keras

This notebook is structured to cover **binary and multiclass classification**, as well as **regression**, using different MLP implementations.

Main references:

* [TensorFlow 2 Tutorial: Get Started in Deep Learning With tf.keras](https://machinelearningmastery.com/tensorflow-tutorial-deep-learning-with-tf-keras/)
* [How to create an MLP classifier with TensorFlow 2 and Keras](https://www.machinecurve.com/index.php/2019/07/27/how-to-create-a-basic-mlp-classifier-with-the-keras-sequential-api/)

---

## The 5-Step Model Life-Cycle

Knowing the model life-cycle provides the backbone for both modeling a dataset and understanding the `tf.keras` API.

The five steps in the life-cycle are as follows:

* Define the model
  * `model = ...`
* Compile the model
  * `optimizer = ...`
  * `loss_function = ...`
  * `metrics = ...`
  * `model.compile(optimizer=optimizer, loss=loss_function, metrics=metrics)`
* Fit the model
  * `model.fit(X_training_features, y_training_labels, epochs=epochs, batch_size=batch_size, verbose=2)`
* Evaluate the model
  * `loss = model.evaluate(X_testing_features, y_testing_labels, verbose=0)`
* Make predictions
  * `y_predicted = model.predict(X_new_data)`
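
The five steps above can be sketched end to end on a small synthetic binary problem. This is only an illustration: the random data, the layer sizes, the optimizer, the loss, and the training settings are assumptions chosen to keep the example short, not choices made by this notebook.

```python
import numpy as np
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense

# synthetic data (assumption): 8 features, label is 1 when the features sum to > 0
rng = np.random.default_rng(0)
X = rng.normal(size=(96, 8)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")
X_train, y_train = X[:64], y[:64]
X_test, y_test = X[64:], y[64:]

# 1) define the model
model = Sequential([
    Input(shape=(8,)),
    Dense(10, activation="relu"),
    Dense(1, activation="sigmoid"),
])

# 2) compile the model (keyword arguments avoid any positional mix-up)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 3) fit the model
model.fit(X_train, y_train, epochs=2, batch_size=16, verbose=0)

# 4) evaluate the model: returns the loss followed by the compiled metrics
loss, acc = model.evaluate(X_test, y_test, verbose=0)

# 5) make predictions: one probability per input row
y_predicted = model.predict(X_test[:5], verbose=0)
print(loss, acc, y_predicted.shape)
```

Two epochs are far too few to learn anything useful; the point is only the order of the calls.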

---

## Step 1: Defining the model

### 1.1) the functional API

The [Keras functional API](https://www.tensorflow.org/guide/keras/functional) is a way to create models that are more flexible than the `tf.keras.Sequential` API. The functional API can handle models with non-linear topology, shared layers, and even multiple inputs or outputs.

The main idea is that a deep learning model is usually a *directed acyclic graph (DAG)* of layers. So the functional API is a way to build **graphs of layers**.

To build a model with the functional API, we need to explicitly connect the output of one layer to the input of the next layer.

Suppose we want to define a model that receives 8 features as input, has a hidden layer with 10 nodes and generates a prediction for a numerical value as output.

In [None]:
# example of a model defined with the functional API
from tensorflow.keras import Model
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense

# the input layer is defined through the Input class
x_in = Input(shape=(8,))
# Next, a fully connected layer can be connected to the input by calling 
# the layer and passing the input layer. This will return a reference to the 
# output connection in this new layer.
x_hidden = Dense(10)(x_in)
# Finally, we connect the output layer
x_out = Dense(1)(x_hidden)

# then, we define the model by specifying the input and output layers
model2 = Model(inputs=x_in, outputs=x_out)
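
To make the graph of layers concrete, the NumPy sketch below mimics what a model with 8 inputs, a 10-node hidden layer, and a single output computes. The weights are random placeholders (an assumption; a real model learns them), and since `Dense` is used without an activation, each layer is a plain affine map.

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder parameters with the same shapes as Dense(10) and Dense(1)
W1, b1 = rng.normal(size=(8, 10)), np.zeros(10)   # input (8) -> hidden (10)
W2, b2 = rng.normal(size=(10, 1)), np.zeros(1)    # hidden (10) -> output (1)

def forward(x):
    """Apply the two layers in order; a Dense layer without activation is linear."""
    hidden = x @ W1 + b1
    return hidden @ W2 + b2

x_batch = rng.normal(size=(4, 8))   # a batch of 4 samples with 8 features each
y_batch = forward(x_batch)

# trainable parameters: 8*10 + 10 for the hidden layer, 10*1 + 1 for the output
n_params = W1.size + b1.size + W2.size + b2.size
print(y_batch.shape, n_params)      # (4, 1) 101
```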

### 1.2) the `Sequential` API

We will be designing some MLP models using the `Sequential` API.
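
For comparison, here is a sketch of the same 8-10-1 architecture written with the `Sequential` API, where layers are simply stacked in order instead of being wired together by hand. The variable names are illustrative.

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense

# layers are added one after another; each feeds the next
model = Sequential()
model.add(Input(shape=(8,)))   # 8 input features
model.add(Dense(10))           # hidden layer with 10 nodes
model.add(Dense(1))            # single numerical output

# 8*10 + 10 weights and biases in the hidden layer, 10*1 + 1 in the output layer
n_params = model.count_params()
print(n_params)
```

The `Sequential` API is enough for a plain stack of layers; the functional API is only needed once the topology branches or merges.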

---

## CASE STUDY 1: MLP for binary classification

We will use the **Ionosphere** dataset to demonstrate an MLP for binary classification.

This dataset involves predicting whether a structure is in the atmosphere or not given radar returns.

The dataset will be downloaded automatically using Pandas, but you can learn more about it here.

* [Ionosphere Dataset](https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv) (csv).
* [Ionosphere Dataset Description](https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.names)


In [None]:
# importing necessary libraries
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

### 1.1 Data ingestion/acquisition

In [None]:
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv'
df = read_csv(path, header=None)

In [None]:
df.head()

### 1.2 Data pre-processing

We need to separate input features (`X`) and labels/targets (`y`).

We also use a `LabelEncoder` to encode the class labels (strings) as integer values 0 and 1. 

The model will be fit on 67 percent of the data (`X_train`, `y_train`), and the remaining 33 percent will be used for evaluation (`X_test`, `y_test`). We use the `train_test_split()` function for that.
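
The two pre-processing ideas can be sketched in plain NumPy on toy labels. This is a stand-in that imitates what `LabelEncoder` and `train_test_split()` do conceptually, not their actual implementation; the toy labels and the random seed are assumptions.

```python
import math
import numpy as np

labels = np.array(["g", "b", "g", "g", "b", "g"])   # toy string labels

# label encoding: the sorted unique classes receive the codes 0, 1, ...
classes, y = np.unique(labels, return_inverse=True)

# random split: shuffle the row indices, then hold out 33% (rounded up here) for testing
rng = np.random.default_rng(42)
indices = rng.permutation(len(labels))
n_test = math.ceil(0.33 * len(labels))
test_idx, train_idx = indices[:n_test], indices[n_test:]
y_train, y_test = y[train_idx], y[test_idx]

print(classes, y, len(y_train), len(y_test))
```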

In [None]:
# split the dataset into input features (X) and class labels/targets (y)
X, y = df.values[:, :-1], df.values[:, -1]
# ensure all data are floating point values
X = X.astype('float32')
# encode strings to integer
y = LabelEncoder().fit_transform(y)
# split into training (67%) and testing (33%) samples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

In [None]:
# determine the number of input features
n_features = X_train.shape[1]
n_features

### 1.3 Model definition using the `Sequential` API

* **How many layers should we use?**
* **How many nodes in each layer?**

The output layer has a single node with the `sigmoid` activation function, so the model predicts the probability of class 1.
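
As a small illustration of what the sigmoid output means, the sketch below maps a few hypothetical raw output values to probabilities and then to class labels. The input values and the 0.5 threshold are assumptions; 0.5 is a common convention, not a requirement.

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-2.0, 0.0, 3.0])    # hypothetical raw values of the output node
probs = sigmoid(logits)                # interpreted as the probability of class 1
labels = (probs >= 0.5).astype(int)    # threshold the probability to get a class label
print(probs.round(3), labels)
```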

In [None]:
# define model
# YOUR CODE HERE

For compiling the model, we need to specify:

* an [optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)
* a [loss function](https://www.tensorflow.org/api_docs/python/tf/keras/losses)
* the [metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics) for evaluating the model
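
For a sigmoid output, binary cross-entropy is the loss usually chosen at this step. The NumPy sketch below shows how that loss behaves; the helper name, the clipping constant `eps`, and the sample probabilities are assumptions made for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)   # avoid log(0)
    per_sample = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(per_sample))

y_true = np.array([1.0, 0.0, 1.0])
confident = binary_crossentropy(y_true, np.array([0.9, 0.1, 0.8]))   # mostly right
uncertain = binary_crossentropy(y_true, np.array([0.5, 0.5, 0.5]))   # always undecided
print(confident, uncertain)
```

Confident, correct predictions give a loss close to zero, while always predicting 0.5 gives a loss of ln 2, roughly 0.693.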

In [None]:
# compile the model
# YOUR CODE HERE

### 1.4 Training the model

The model should be trained based on training input features and labels.

You need to specify: i) the number of passes over the training set (`epochs`) and ii) the number of samples used per gradient update (`batch_size`).
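
The interplay between the two settings is simple arithmetic, sketched below. The numbers (235 training rows, a batch size of 32, 150 epochs) are hypothetical, and the sketch assumes the final, smaller batch of each epoch is still used.

```python
import math

def count_updates(n_train, batch_size, epochs):
    """Return (weight updates per epoch, total weight updates over all epochs)."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

steps_per_epoch, total_updates = count_updates(n_train=235, batch_size=32, epochs=150)
print(steps_per_epoch, total_updates)   # 8 1200
```

A smaller `batch_size` means more (noisier) updates per epoch; more `epochs` means more passes over the same data.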

In [None]:
# fit the model
# YOUR CODE HERE

### 1.5 Evaluating the model

For evaluation, we make use of testing input features and labels and usually keep track of `loss` and `accuracy` figures.

In [None]:
# evaluate the model
loss, acc = # YOUR CODE HERE
print('Test Accuracy: %.3f' % acc)

### 1.6 Using the model for classification

To predict a class label, the model should be fed a new (unseen) data instance.
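
One practical detail: a Keras model expects a batch of rows, so a single instance has to be wrapped into a 2-D array of shape `(1, n_features)`, and the result comes back as a 2-D array as well. The sketch below shows only the shape handling; the placeholder row and the `fake_output` array standing in for the model output are assumptions.

```python
import numpy as np

row = [0.1] * 34                             # placeholder instance with 34 feature values
batch = np.asarray([row], dtype="float32")   # wrap the row so it becomes a batch of one

# stand-in for what a single-output model returns for one row: shape (1, 1)
fake_output = np.array([[0.73]])
yhat = float(fake_output[0, 0])              # pull out the scalar before formatting it
print(batch.shape, 'Predicted: %.3f' % yhat)
```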

In [None]:
# make a prediction
row = [1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300]
yhat = # YOUR CODE HERE
print('Predicted: %.3f' % yhat)

---

## CASE STUDY 2: MLP for multiclass classification

We will use the **Iris flowers** dataset to demonstrate an MLP for multiclass classification.

This problem involves predicting the species of iris flower given measures of the flower.

In this notebook the dataset is loaded from Scikit-learn; the CSV version and its description are linked here for reference.

* [Iris Dataset](https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv) (csv).
* [Iris Dataset Description](https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.names).

In [None]:
from IPython.display import Image
Image(filename='/content/iris-dataset.png', width=600) 

In [None]:
from numpy import argmax
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

### Data exploration (just for demonstration)

The [Seaborn](https://seaborn.pydata.org/) library also has this dataset for demonstration purposes.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# load dataset from Seaborn
iris = sns.load_dataset('iris')
 
# plot style: "whitegrid" gives a white background with grid lines
# (use "darkgrid" for a gray background with grid lines)
sns.set_style("whitegrid")
 
# scatter plot of sepal_length vs. petal_length
# height sets the height of the plot
# hue colors the points by class (species)
sns.FacetGrid(iris, hue="species",
              height=6).map(plt.scatter,
                            'sepal_length',
                            'petal_length').add_legend()

### 2.1 Data ingestion/acquisition

We will use the Iris dataset available in the [Scikit-learn](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html) library.

In [None]:
# load the dataset
#path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
#df = read_csv(path, header=None)

from sklearn.datasets import load_iris
iris = load_iris()

### 2.2 Data pre-processing


In [None]:
# split into input and output columns
#X, y = df.values[:, :-1], df.values[:, -1]
X, y = iris.data, iris.target
# ensure all data are floating point values
X = X.astype('float32')
# iris.target is already integer-encoded (0, 1, 2), so LabelEncoder leaves it unchanged
y = LabelEncoder().fit_transform(y)
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# determine the number of input features
n_features = X_train.shape[1]

### 2.3 Model definition

Because this is a multiclass classification problem, the model must have one node for each class in the output layer and use the `softmax` activation function.

The loss function is the `sparse_categorical_crossentropy`, which is appropriate for integer encoded class labels (e.g. 0 for one class, 1 for the next class, etc.).
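
The following NumPy sketch illustrates both ideas: `softmax` turns the raw output values into class probabilities, and the sparse categorical cross-entropy picks out the probability assigned to the integer-encoded true class. The helper names, the sample values, and the clipping constant are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean negative log-probability of the true class, given integer labels."""
    picked = probs[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))

logits = np.array([[2.0, 1.0, 0.1],    # hypothetical raw outputs for 2 samples, 3 classes
                   [0.5, 2.5, 0.3]])
y_true = np.array([0, 1])              # integer-encoded class labels

probs = softmax(logits)
loss = sparse_categorical_crossentropy(y_true, probs)
predicted = probs.argmax(axis=1)       # the most probable class per sample
print(probs.round(3), loss, predicted)
```

With one-hot encoded labels the same value is obtained from the (non-sparse) categorical cross-entropy; the sparse variant just saves the encoding step.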

In [None]:
# define model
# YOUR CODE HERE

In [None]:
# compile the model
# YOUR CODE HERE

### 2.4 Model training

In [None]:
# fit the model
# YOUR CODE HERE

### 2.5 Model evaluation

In [None]:
# evaluate the model
loss, acc = # YOUR CODE HERE
print('Test Accuracy: %.3f' % acc)

### 2.6 Classification of new data

In [None]:
# make a prediction
row = [5.1,3.5,1.4,0.2]
yhat = # YOUR CODE HERE
print('Predicted: %s (class=%d)' % (yhat, argmax(yhat)))

---

## CASE STUDY 3: MLP for Regression

We will use the [California housing dataset](https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset) to demonstrate an MLP for regression predictive modeling.

This problem involves predicting the median house value of a California block group from aggregate properties of its houses and neighborhood.

The dataset will be downloaded automatically from `Scikit-learn`.


In [None]:
# importing necessary libraries
from numpy import sqrt
from sklearn.model_selection import train_test_split
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

### 3.1 Data ingestion/acquisition

Basic characteristics of the dataset:

* Number of instances: 20640
* Number of attributes: 8 numeric, predictive attributes + the target
* Attribute information:
  * `MedInc`: median income in block group
  * `HouseAge`: median house age in block group
  * `AveRooms`: average number of rooms per household
  * `AveBedrms`: average number of bedrooms per household
  * `Population`: block group population
  * `AveOccup`: average number of household members
  * `Latitude`: block group latitude
  * `Longitude`: block group longitude

The target variable is the `median house value` for California districts, expressed in hundreds of thousands of dollars ($100,000).



In [None]:
from sklearn.datasets import fetch_california_housing
# load the dataset
housing = fetch_california_housing()

In [None]:
housing.feature_names

In [None]:
housing.data[0]

In [None]:
housing.target_names

In [None]:
housing.target

### 3.2 Data pre-processing

In [None]:
# split into input and output columns
X, y = housing.data, housing.target
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# determine the number of input features
n_features = X_train.shape[1]

### 3.3 Model definition

This is a regression problem that involves predicting a single numerical value. As such, the output layer has a single node and uses the default `linear` activation function (i.e., no activation function). The mean squared error (`mse`) loss is minimized when fitting the model.

In [None]:
# define model
# YOUR CODE HERE

In [None]:
# compile the model
# YOUR CODE HERE

### 3.4 Model training

In [None]:
# fit the model
# YOUR CODE HERE

### 3.5 Model evaluation

Recall that this is a regression problem, not a classification one; therefore, classification accuracy does not apply.

For more on this, see [this tutorial](https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/).
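
Instead of accuracy, regression models are typically judged by an error measure such as the MSE or its square root (RMSE), which is expressed in the units of the target. The sketch below uses made-up target and prediction values; only the unit conversion ($100,000 per unit) comes from the dataset description.

```python
import numpy as np

y_true = np.array([4.5, 3.6, 3.5])   # hypothetical targets, in units of $100,000
y_pred = np.array([4.0, 3.0, 3.9])   # hypothetical model predictions

mse = float(np.mean((y_true - y_pred) ** 2))   # mean squared error
rmse = float(np.sqrt(mse))                     # back in the units of the target
rmse_dollars = rmse * 100_000                  # convert to dollars
print('MSE: %.3f, RMSE: %.3f (about $%.0f)' % (mse, rmse, rmse_dollars))
```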

In [None]:
# evaluate the model
error = # YOUR CODE HERE
print('MSE: %.3f, RMSE: %.3f' % (error, sqrt(error)))

### 3.6 Making predictions on new data

In [None]:
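# make a prediction
# the row must follow the California housing feature order (see housing.feature_names);
# here the first record of the dataset is reused for illustration
row = housing.data[0].tolist()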
yhat = # YOUR CODE HERE
print('Predicted: %.3f' % yhat)