<font size=4 color='blue'>

# <center> Class 7, November 11, 2020 </center>

<font size=4 color='blue'>

# <center> Study topic: Mortality from diabetes </center>

<font size=4 color='blue'>
    
## Information about the topic

<font size=4>

Progression of diabetes after one year.
    
In the present work, each patient is characterized by the following ten features: age, sex, body mass index, average blood pressure, and six blood serum measurements (S1, S2, S3, S4, S5, S6).

<font size=4 color='blue'>
    
## Quantification of this information

<font size=4>

Data are available for 442 patients (m = 442). The response of interest, Y, is a quantitative measure of disease progression one year after the start of the study. Y values range from 25 to 346.

Information source: [diabetes data](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)    

Original paper: [Least-Angle-Regression_2004](./Literatura/Least-Angle-Regression_2004.pdf)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time

In [None]:
# Data are available in the file diabetes.csv

df = pd.read_csv('diabetes.csv', sep ='\t')

In [None]:
# Showing the first 5 samples (features and target Y)

df.head()

In [None]:
# The describe() method generates a table with statistical information 
# for each of the features and the target.

df.describe()

## Histograms of the features that characterize the patients, and of the target Y

In [None]:
plt.figure(figsize=(20,8)) 

ax1 = plt.subplot(2,4,1)
ax2 = plt.subplot(2,4,2)
ax3 = plt.subplot(2,4,3)
ax4 = plt.subplot(2,4,4)

ax1.hist(df.AGE, bins=30, color='green',edgecolor='purple', alpha=0.5)
ax1.set_xlabel('Age (years)', size=15)
ax1.set_ylabel('Frequency', size=15)

ax2.hist(df.SEX, bins=30, color='orange',edgecolor='purple', alpha=0.5)
ax2.set_xlabel('Sex', size=15)

ax3.hist(df.BMI, bins=30, color='red',edgecolor='purple', alpha=0.5)
ax3.set_xlabel('Body_mass_index', size=15)

ax4.hist(df.BP, bins=30, color='blue',edgecolor='purple', alpha=0.5)
ax4.set_xlabel('Average_blood_pressure', size=15);

In [None]:
plt.figure(figsize=(20,8)) 

ax1 = plt.subplot(2,4,1)
ax2 = plt.subplot(2,4,2)
ax3 = plt.subplot(2,4,3)
ax4 = plt.subplot(2,4,4)

ax1.hist(df.S1, bins=30, color='green',edgecolor='purple', alpha=0.5)
ax1.set_xlabel('S1', size=15)
ax1.set_ylabel('Frequency', size=15)

ax2.hist(df.S2, bins=30, color='orange',edgecolor='purple', alpha=0.5)
ax2.set_xlabel('S2', size=15)

ax3.hist(df.S3, bins=30, color='red',edgecolor='purple', alpha=0.5)
ax3.set_xlabel('S3', size=15)

ax4.hist(df.S4, bins=30, color='blue',edgecolor='purple', alpha=0.5)
ax4.set_xlabel('S4', size=15);

In [None]:
plt.figure(figsize=(15,8)) 

ax1 = plt.subplot(2,3,1)
ax2 = plt.subplot(2,3,2)
ax3 = plt.subplot(2,3,3)

ax1.hist(df.S5, bins=30, color='green',edgecolor='purple', alpha=0.5)
ax1.set_xlabel('S5', size=15)
ax1.set_ylabel('Frequency', size=15)

ax2.hist(df.S6, bins=30, color='orange',edgecolor='purple', alpha=0.5)
ax2.set_xlabel('S6', size=15)

ax3.hist(df.Y, bins=30, color='purple',edgecolor='black', alpha=0.5)
ax3.set_xlabel('Y', size=15)


<font size=4>

The samples (the rows of the DataFrame) are randomly shuffled, so that any ordering present in the file does not carry over into the training and test sets.

In [None]:
np.random.seed(1)

df = df.sample(frac=1)

<font size=4>
    
The shuffled samples are divided into two sets: 90% for training and 10% for testing, i.e., for making inferences (predictions) with what has been learned.

In [None]:

test_ratio = 0.1

# Number of samples used for training (the first 90% of the shuffled rows)
n_train = int((1.0 - test_ratio)*len(df))

df_train = df.iloc[0:n_train, :]
df_test  = df.iloc[n_train:, :]
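As a compact sketch of the shuffle-and-split steps above, the hypothetical helper `shuffle_and_split` does both at once on a toy DataFrame with 442 rows standing in for `diabetes.csv`. Passing `random_state` to `DataFrame.sample` makes the permutation reproducible without relying on the global NumPy seed; this is an alternative, not what the cells above do.

```python
import numpy as np
import pandas as pd

def shuffle_and_split(df, test_ratio=0.1, seed=1):
    """Shuffle the rows of df, then split them into training and test parts.

    Follows the notebook's rule: the first int((1 - test_ratio) * len(df))
    shuffled rows are used for training and the remaining rows for testing.
    """
    shuffled = df.sample(frac=1, random_state=seed)  # random permutation of the rows
    n_train = int((1.0 - test_ratio) * len(shuffled))
    return shuffled.iloc[:n_train], shuffled.iloc[n_train:]

# Hypothetical stand-in with as many rows as diabetes.csv
demo = pd.DataFrame({"x": np.arange(442), "Y": 2.0 * np.arange(442)})
train, test = shuffle_and_split(demo)
print(train.shape, test.shape)  # (397, 2) (45, 2)
```

With 442 rows, `int(0.9 * 442) = 397` rows go to training and 45 to testing.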

In [None]:
print(df_train.shape)
print(df_test.shape)

<font size=4>

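To train the models, it is advisable that all variables have the same order of magnitude. For this reason, both the features (X) and the target (Y) are standardized with the mean and standard deviation of the training samples. The test samples are transformed with these same training statistics, so that no information from the test set is used during training: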

$$x_{norm} = \dfrac{x-\bar{x}}{\sigma}$$

In [None]:
df_train_norm = (df_train - df_train.mean()) / df_train.std()

In [None]:
df_test_norm = (df_test - df_train.mean()) / df_train.std()
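The key point of the normalization above is that the test set is transformed with the training statistics. A minimal sketch on synthetic data checks that each standardized training column has mean 0 and standard deviation 1, and that the transformation can be inverted to recover the original units. The helpers `standardize` and `destandardize` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

def standardize(train, test):
    """Z-score both sets with the mean and standard deviation of the TRAINING set."""
    mu, sigma = train.mean(), train.std()   # column-wise; pandas' std uses ddof=1
    return (train - mu) / sigma, (test - mu) / sigma, mu, sigma

def destandardize(values, mu, sigma):
    """Map standardized values back to their original units."""
    return values * sigma + mu

# Synthetic columns, loosely shaped like BMI and Y (not the real data)
rng = np.random.default_rng(0)
train = pd.DataFrame({"BMI": rng.normal(26, 4, 100), "Y": rng.normal(150, 77, 100)})
test = pd.DataFrame({"BMI": rng.normal(26, 4, 10), "Y": rng.normal(150, 77, 10)})

train_n, test_n, mu, sigma = standardize(train, test)
print(train_n.mean().round(6))   # every column is (numerically) 0
print(train_n.std().round(6))    # every column is 1
```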

<font size=4>
    
Histograms of the variables to be used in the training:

In [None]:
plt.figure(figsize=(20,8)) 

ax1 = plt.subplot(2,4,1)
ax2 = plt.subplot(2,4,2)
ax3 = plt.subplot(2,4,3)
ax4 = plt.subplot(2,4,4)

ax1.hist(df_train_norm.AGE, bins=30, color='green',edgecolor='purple', alpha=0.5)
ax1.set_xlabel('x1(Age)', size=15)
ax1.set_ylabel('Frequency', size=15)

ax2.hist(df_train_norm.SEX, bins=30, color='orange',edgecolor='purple', alpha=0.5)
ax2.set_xlabel('x2(Sex)', size=15)

ax3.hist(df_train_norm.BMI, bins=30, color='red',edgecolor='purple', alpha=0.5)
ax3.set_xlabel('x3(Body_mass_index)', size=15)

ax4.hist(df_train_norm.BP, bins=30, color='blue',edgecolor='purple', alpha=0.5)
ax4.set_xlabel('x4(Average_blood_pressure)', size=15);

In [None]:
plt.figure(figsize=(20,8)) 

ax1 = plt.subplot(2,4,1)
ax2 = plt.subplot(2,4,2)
ax3 = plt.subplot(2,4,3)
ax4 = plt.subplot(2,4,4)

ax1.hist(df_train_norm.S1, bins=30, color='green',edgecolor='purple', alpha=0.5)
ax1.set_xlabel('x5(S1)', size=15)
ax1.set_ylabel('Frequency', size=15)

ax2.hist(df_train_norm.S2, bins=30, color='orange',edgecolor='purple', alpha=0.5)
ax2.set_xlabel('x6(S2)', size=15)

ax3.hist(df_train_norm.S3, bins=30, color='red',edgecolor='purple', alpha=0.5)
ax3.set_xlabel('x7(S3)', size=15)

ax4.hist(df_train_norm.S4, bins=30, color='blue',edgecolor='purple', alpha=0.5)
ax4.set_xlabel('x8(S4)', size=15);

In [None]:
plt.figure(figsize=(20,8)) 

ax1 = plt.subplot(2,3,1)
ax2 = plt.subplot(2,3,2)
ax3 = plt.subplot(2,3,3)

ax1.hist(df_train_norm.S5, bins=30, color='green',edgecolor='purple', alpha=0.5)
ax1.set_xlabel('x9(S5)', size=15)
ax1.set_ylabel('Frequency', size=15)

ax2.hist(df_train_norm.S6, bins=30, color='orange',edgecolor='purple', alpha=0.5)
ax2.set_xlabel('x10(S6)', size=15)

ax3.hist(df_train_norm.Y, bins=30, color='purple',edgecolor='black', alpha=0.5)
ax3.set_xlabel('Y', size=15)


<font size=4>
The features X (every column except the last) and the target Y (the last column) are extracted from the DataFrames as NumPy arrays.

In [None]:
train_x = df_train_norm.values[:,:-1]
train_y = df_train_norm.values[:,-1:]

In [None]:
test_x = df_test_norm.values[:,:-1]
test_y = df_test_norm.values[:,-1:]

In [None]:
print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)

<font size=5 color='blue'>

# <center> Modeling different learning systems </center>




<font size=4 color='blue'>

# <center> Implemented using the Keras framework as frontend </center>


<font size=4 color='mediumvioletred'>
   
[Keras](https://keras.io/)

In [None]:
from keras.models import Sequential
from keras.layers import Input, Dense
from keras.layers import Activation
from keras.optimizers import SGD
from keras.models import Model
from keras.utils import plot_model
from keras import initializers
from keras import optimizers
import tensorflow as tf

[Sequential](https://keras.io/guides/sequential_model/)

[layers](https://keras.io/api/layers/): [Dense](https://keras.io/api/layers/core_layers/dense/), [Activation](https://keras.io/api/layers/activations/#relu-function)

[Optimizers](https://keras.io/api/optimizers/#available-optimizers)

[utils](https://keras.io/api/utils/)

[Keras API reference](https://keras.io/api/)

<font size=4 color='black'>

The features that determine the phenomenon are described by the vector $X = (x_1, x_2, \dots, x_k, \dots, x_K)$, called the feature vector.
    
The model assumes that the output $y$ varies linearly with each feature:
    $$ F(X) = \sum_{k=1}^K w_k x_k + b$$
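For a batch of samples stored as the rows of a matrix, the sum above becomes a matrix product. The sketch below evaluates $F(X)$ with NumPy. To our understanding this is the same computation a Keras `Dense(1, activation='linear')` layer performs (`dot(input, kernel) + bias`). The numbers are made up for illustration.

```python
import numpy as np

def linear_model_output(X, w, b):
    """F(X) = sum_k w_k * x_k + b, evaluated for every row of X at once.

    X has shape (m, K), w has shape (K, 1) and b is a scalar; the result has
    shape (m, 1), one prediction per sample.
    """
    return X @ w + b

X = np.array([[1.0, 2.0, 3.0],
              [0.0, -1.0, 0.5]])
w = np.array([[0.5], [-1.0], [2.0]])
b = 0.1
print(linear_model_output(X, w, b))   # [[4.6], [2.1]]
```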

<font size=5 color='blue'>

First model: The output $y$ depends linearly on each of the features.

In [None]:
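import networkx as nx

def sigmoid(z):
    """Logistic function, used by Network.feedforward."""
    return 1.0 / (1.0 + np.exp(-z))

class Network(object):
    """Fully connected network; in this notebook it is only used to draw architectures."""

    def __init__(self, sizes):
        self.num_layers = len(sizes)
        print("It has", self.num_layers, "layers,")
        self.sizes = sizes
        print("with the following number of nodes per layer", self.sizes)
        # One bias column vector per non-input layer
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        # One weight matrix of shape (nodes in next layer, nodes in current layer)
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, x_of_sample):
        """Return the output of the network F(x_of_sample)."""
        for b, w in zip(self.biases, self.weights):
            x_of_sample = sigmoid(np.dot(w, x_of_sample) + b)
        return x_of_sample

    def graph(self, sizes):
        """Draw each layer as a column of nodes, fully connected to the next layer."""
        a = []
        ps = {}
        Q = nx.Graph()
        for i in range(len(sizes)):
            Qi = nx.Graph()
            n = sizes[i]
            nodos = np.arange(n)
            Qi.add_nodes_from(nodos)
            l_i = Qi.nodes
            # Prefix the node names of layer i with 'Qi-' so that layers do not collide
            Q = nx.union(Q, Qi, rename=(None, 'Q%i-' % i))
            if len(l_i) == 1:
                ps['Q%i-0' % i] = [i/(len(sizes)), 1/2]
            else:
                for j in range(len(l_i)):
                    ps['Q%i-%i' % (i, j)] = [i/(len(sizes)), (1/(len(l_i)*len(l_i))) + (j/(len(l_i)))]
            a.insert(i, Qi)
        # Connect every node of layer i with every node of layer i+1
        for i in range(len(a)-1):
            for j in range(len(a[i])):
                for k in range(len(a[i+1])):
                    Q.add_edge('Q%i-%i' % (i, j), 'Q%i-%i' % (i+1, k))
        nx.draw(Q, pos=ps)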
                

In [None]:
n_x = train_x.shape[1] 
n_y = train_y.shape[1]
    
layers = [n_x, n_y]
net = Network(layers)
net.graph(layers)

<font size=5 color='blue'>
    
Definition of the architecture.
    
It includes the initialization of the weights and biases, as well as the activation functions.

In [None]:
np.random.seed(1)
tf.random.set_seed(1)  # Keras initializers use TensorFlow's random generator, not NumPy's

input_nodes = n_x     # The input layer has n_x nodes
output_nodes = n_y    # The output layer has n_y nodes

linear_model = Sequential()

# The first layer must be told the size of its input (input_dim),
# which is the number of nodes in the input layer of the network.

linear_model.add(Dense(output_nodes, kernel_initializer='uniform', bias_initializer='zeros',
                       input_dim=input_nodes, activation='linear'))

<font size=4>
    
**Parameter Initialization Strategies**

Training algorithms for deep learning models are usually iterative in nature and thus require the user to specify some initial point from which to begin the iterations. Moreover, training deep models is a sufficiently difficult task that most algorithms are strongly affected by the choice of initialization. The initial point can determine whether the algorithm converges at all, with some initial points being so unstable that the algorithm encounters numerical difficulties and fails altogether. When learning does converge, the initial point can determine how quickly learning converges and whether it converges to a point with high or low cost. Also, points of comparable cost can have wildly varying generalization error, and the initial point can affect the generalization as well.

     Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. p. 301
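To make the initialization choice in this notebook concrete, the sketch below draws weights the way we understand `kernel_initializer='uniform'` does in Keras: uniform on $[-0.05, 0.05]$, the default bounds of `RandomUniform`, with zero biases as in `bias_initializer='zeros'`. For comparison it also draws Glorot uniform weights, whose limit is $\sqrt{6/(\text{fan\_in}+\text{fan\_out})}$. Glorot uniform is the default initializer of `Dense`. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def uniform_init(fan_in, fan_out, limit=0.05):
    """Weights ~ U(-limit, limit), matching the default bounds of Keras' RandomUniform."""
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def glorot_uniform_init(fan_in, fan_out):
    """Glorot/Xavier uniform: limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W_small = uniform_init(10, 1)          # 10 inputs -> 1 output, as in linear_model
W_glorot = glorot_uniform_init(10, 1)
b = np.zeros(1)                        # zero bias
print(np.abs(W_small).max(), np.abs(W_glorot).max())
```

The Glorot limit for a 10 to 1 layer is $\sqrt{6/11} \approx 0.74$, so those initial weights are spread roughly fifteen times wider than with `'uniform'`.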

<font size=5 color='blue'>
Architecture Summary and Chart

In [None]:
plot_model(linear_model, to_file='linear_model.png', show_shapes=True, rankdir='TB', 
           expand_nested=True, show_layer_names=True, dpi=96)

In [None]:
#help(plot_model)

In [None]:
linear_model.summary()
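You can check the number of trainable parameters reported by `summary()` by hand. Every dense layer has one weight per connection plus one bias per node. A small sketch, assuming ten input features (the columns AGE to S6) and the layer sizes used in this notebook:

```python
def count_dense_params(layer_sizes):
    """Trainable parameters of a fully connected stack: weights + biases per layer."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

for sizes in ([10, 1], [10, 2, 1], [10, 3, 1], [10, 4, 1], [10, 5, 4, 1]):
    print(sizes, count_dense_params(sizes))
# [10, 1] 11
# [10, 2, 1] 25
# [10, 3, 1] 37
# [10, 4, 1] 49
# [10, 5, 4, 1] 84
```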

<font size=5  color='blue'>
    
Compiling the model. Includes the optimizer definition

In [None]:
# We define the optimizer and its hyperparameters: the learning rate,
# the momentum, and nesterov (whether to apply Nesterov momentum).
# Recent Keras versions name the learning rate `learning_rate` (the old `lr` alias is deprecated).

sgd = optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False)

linear_model.compile(loss='mean_squared_error', optimizer=sgd)

<font size=4>

**Stochastic gradient descent (SGD)** is an extension of the gradient descent algorithm.

A recurring problem in machine learning is that large training sets are necessary for good generalization, but large training sets are also more computationally expensive.

The insight of stochastic gradient descent is that the gradient is an expectation. The expectation may be approximately estimated using a small set of samples. Specifically, on each step of the algorithm, we can sample a minibatch of examples.
    
    Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. p. 151
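The minibatch idea in the quote can be sketched in a few lines of NumPy for the linear model $F(X) = \sum_k w_k x_k + b$ with the mean-squared-error cost. This is an illustration on synthetic, noiseless data, not Keras' implementation. `batch_size=32` matches the default of Keras' `fit()`. The learning rate and epoch count are arbitrary choices.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.05, epochs=200, batch_size=32, seed=0):
    """Minibatch SGD on the mean-squared-error cost of F(X) = X w + b.

    Each update uses the gradient estimated from one minibatch only.
    """
    rng = np.random.default_rng(seed)
    m, K = X.shape
    w = np.zeros((K, 1))
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(m)                # reshuffle the samples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            err = Xb @ w + b - yb                 # residuals on the minibatch
            grad_w = 2.0 * Xb.T @ err / len(idx)  # d(MSE)/dw on the minibatch
            grad_b = 2.0 * err.mean()             # d(MSE)/db on the minibatch
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_w = np.array([[1.0], [-2.0], [0.5]])
y = X @ true_w + 0.3                              # noiseless synthetic targets
w, b = sgd_linear_regression(X, y)
print(w.ravel().round(3), round(b, 3))
```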


<font size=5 color='blue'>
    
Training the learning system

In [None]:
# 10 % of the training data will be used to validate the training
validation_portion = 0.1
epochs = 600

history = linear_model.fit(train_x, train_y, epochs=epochs, validation_split = validation_portion, verbose=1)

# the "history" object contains the information generated during the training
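With `validation_split`, Keras holds out the last fraction of `train_x`/`train_y`, selected before any shuffling. This is harmless here because the rows were shuffled earlier. The hypothetical helper below computes how many samples remain for the gradient updates and how many minibatches that gives per epoch. It assumes floor rounding at the split point and the default batch size of 32.

```python
import math

def keras_like_validation_split(n_samples, validation_split=0.1, batch_size=32):
    """Return (training samples, validation samples, minibatches per epoch).

    Assumption: the split point is floor(n_samples * (1 - validation_split))
    and the validation samples are the LAST rows of the arrays.
    """
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    n_val = n_samples - split_at
    steps_per_epoch = math.ceil(split_at / batch_size)
    return split_at, n_val, steps_per_epoch

print(keras_like_validation_split(397))   # (357, 40, 12)
```

If the split above produced 397 training rows, 357 of them are used for the updates and 40 for validation. Each of the 600 epochs then performs 12 SGD updates.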

In [None]:
history.history

<font size=5 color='blue'>

Plots of the cost function versus epoch    

In [None]:
plt.figure(figsize=(10, 7))

plt.plot(history.history['loss'], color='red')
plt.plot(history.history['val_loss'], color='green')
plt.title('Cost function')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend(['cost_train', 'cost_validation'])
plt.show()


<font size=4>

The factors determining how well a machine learning algorithm will perform are its ability to:

1. Make the training error small.
2. Make the gap between training and test error small.

**Underfitting** occurs when the model is not able to obtain a sufficiently low error value on the training set.

**Overfitting** occurs when the gap between the training error and test error is too large.
<img src='images/bengio.png'>



    
    Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. p. 111
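You can see the two criteria above on a toy problem where model capacity is easy to vary: polynomials of increasing degree fitted by least squares to 12 noisy points. The lower-degree polynomials are special cases of the higher-degree ones, so the training error can only decrease as the degree grows. The test error, estimated here on 200 fresh points, typically falls and then rises again. The data are synthetic, and the exact numbers depend on the random draw.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(m):
    """Noisy samples of a smooth curve (synthetic data, for illustration only)."""
    x = rng.uniform(-1.0, 1.0, m)
    return x, np.sin(3.0 * x) + rng.normal(scale=0.2, size=m)

def mse(p, x, y):
    return float(np.mean((p(x) - y) ** 2))

x_train, y_train = make_data(12)    # small training set
x_test, y_test = make_data(200)     # large held-out set

results = {}
for degree in (1, 3, 9):
    # Polynomial.fit rescales x internally, which keeps the least-squares problem well conditioned
    p = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(p, x_train, y_train), mse(p, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, test MSE {results[degree][1]:.4f}")
```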
    
    
<font size=4>
    
When we compare the training and validation errors, we want to be mindful of two common situations. First, we want to watch out for cases when our training error and validation error are both substantial but there is a little gap between them. If the model is unable to reduce the training error, that could mean that our model is too simple (i.e., insufficiently expressive) to capture the pattern that we are trying to model. Moreover, since the generalization gap between our training and validation errors is small, we have reason to believe that we could get away with a more complex model. This phenomenon is known as **underfitting**.

On the other hand, as we discussed above, we want to watch out for the cases when our training error is significantly lower than our validation error, indicating severe **overfitting**. Note that overfitting is not always a bad thing. With deep learning especially, it is well known that the best predictive models often perform far better on training data than on holdout data. Ultimately, we usually care more about the validation error than about the gap between the training and validation errors.
    
 <img src='images/dive.png'>
 
 

    
     Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. pp 146
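Applying these criteria to the models below requires the final training and validation costs. A small helper reads them from the `history.history` dictionary returned by `fit`. It is hypothetical and not part of Keras.

```python
def summarize_history(history_dict):
    """Final training cost, final validation cost and the gap between them.

    history_dict is `history.history` after fit(..., validation_split=...),
    which stores one value per epoch under the keys 'loss' and 'val_loss'.
    """
    train_loss = history_dict["loss"][-1]
    val_loss = history_dict["val_loss"][-1]
    return {"train": train_loss, "validation": val_loss, "gap": val_loss - train_loss}

# Made-up numbers, only to show the call; use history.history from a real run instead
fake = {"loss": [1.2, 0.6, 0.45], "val_loss": [1.3, 0.8, 0.7]}
print(summarize_history(fake))
```

If both costs are high and the gap is small, the model is underfitting. If the training cost is low and the gap is large, it is overfitting.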

<font size=5 color='blue'>

Underfitting

In [None]:
#history_model.history

<font size = 5 color = 'blue'>
    
Evaluation of the learning. 
    
This is done using the test data.

In [None]:
test_loss = linear_model.evaluate(x=test_x, y=test_y)   # MSE on the standardized test targets

print("Loss = " + str(test_loss))
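The loss printed above is the mean squared error on the standardized target, so it is not in the units of Y. Since $y_{norm} = (y - \bar{y})/\sigma_Y$, every error is divided by $\sigma_Y$ and the MSE by $\sigma_Y^2$. The sketch below converts it back. In the notebook, the inputs would be the loss returned by `linear_model.evaluate` and `df_train.Y.std()`; the values below are hypothetical.

```python
import math

def mse_to_original_units(mse_standardized, y_std):
    """Undo the target standardization for an MSE value.

    Every error scales by y_std, so the MSE scales by y_std**2 and the RMSE by y_std.
    """
    mse = mse_standardized * y_std ** 2
    return mse, math.sqrt(mse)

print(mse_to_original_units(0.5, 2.0))   # (2.0, 1.4142135623730951)
```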

<font size=5 color='blue'>
    
# <center> ---The following is a new model--- </center> 

<font size=5 color='blue'>

Second model: The output $y$ does not depend linearly on the features. 
This nonlinearity is modeled with a sigmoid-type function, for example, the hyperbolic tangent.

<font size=4 color='black'>

The features that determine the phenomenon are described by the vector $X = (x_1, x_2, \dots, x_k, \dots, x_K)$.
    
Our model assumes that the output $y$ is a nonlinear function (the hyperbolic tangent) of a linear combination of the features:
    $$ z = \sum_{k=1}^K w_k x_k + b$$
    $$ F(X) = \tanh(z)= \frac{{e}^{2z} - 1}{{e}^{2z} + 1}$$
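The exponential form of $\tanh$ above can overflow in floating point. For large $z$, $e^{2z}$ becomes `inf`, and the quotient `inf/inf` gives `nan`. An equivalent form that only uses $e^{-2|z|}$ avoids this. The sketch below checks it against `np.tanh`. It also checks that the scaled curve $1.7159\tanh(2z/3)$ maps $\pm 1$ to approximately $\pm 1$. The function names are illustrative.

```python
import numpy as np

def tanh_stable(z):
    """tanh(z) rewritten with exp(-2|z|), which cannot overflow for large |z|."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-2.0 * np.abs(z))
    return np.sign(z) * (1.0 - e) / (1.0 + e)

def scaled_tanh(z):
    """The sigmoid-type curve 1.7159 * tanh(2z/3)."""
    return 1.7159 * tanh_stable(2.0 * np.asarray(z, dtype=float) / 3.0)

z = np.array([-400.0, -1.0, 0.0, 0.5, 1.0, 400.0])
print(np.allclose(tanh_stable(z), np.tanh(z)))   # True
print(scaled_tanh([-1.0, 1.0]))                  # close to [-1, 1]
```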

In [None]:
def tanh(z):
    """Hyperbolic tangent computed with the exponential formula above."""
    return (np.exp(2*z) - 1)/(np.exp(2*z) + 1)

# Arrays for plotting the scaled hyperbolic tangent 1.7159*tanh(2z/3) and the identity line.
# Note: Keras' activation='tanh' uses the plain, unscaled tanh(z).
x1 = np.arange(-2, 2.0, 0.1)
y1 = 1.7159*tanh(2/3*x1)
y2 = x1

# Both functions are plotted
plt.figure(figsize=(13,8))

plt.rc('xtick', labelsize=16)
plt.rc('ytick', labelsize=16)
plt.rc('legend', fontsize=16)
plt.ylabel('Y', fontsize=16)
plt.xlabel('Z', fontsize=16)
plt.grid(True)
plt.title('Sigmoid-type = 1.7159*tanh(2/3*z)', size=20)

# Plotting the functions
plt.plot(x1, y1, color='green', lw=4, label='1.7159*tanh(2/3*z)')
plt.plot(x1, y2, color='red', lw=3, label='z (linear)')
plt.legend()

plt.show()

In [None]:
n_x = train_x.shape[1] 
n_y = train_y.shape[1]
    
layers = [n_x, n_y]
net = Network(layers)
net.graph(layers)

<font size=5 color='blue'>
    
Model architecture 


In [None]:
np.random.seed(1)
tf.random.set_seed(1)  # Keras initializers use TensorFlow's random generator, not NumPy's

input_nodes = n_x     # The input layer has n_x nodes
output_nodes = n_y    # The output layer has n_y nodes

sigmoid_model = Sequential()

sigmoid_model.add(Dense(output_nodes,  kernel_initializer='uniform', bias_initializer='zeros', \
                input_dim=input_nodes, activation='tanh'))


<font size=5 color='blue'>
Architecture Summary and Chart

In [None]:
plot_model(sigmoid_model, to_file='sigmoid_model.png', show_shapes=True, rankdir='TB', 
      expand_nested=True, show_layer_names=True, dpi=96)

In [None]:
sigmoid_model.summary()

<font size=5  color='blue'>
    
Compiling the model. Includes the optimizer definition

In [None]:
sgd = optimizers.SGD(learning_rate=0.01)

sigmoid_model.compile(loss='mean_squared_error', optimizer=sgd)


<font size=5 color='blue'>
    
Training the learning system

In [None]:
# 10 % of the training data will be used to validate the training
validation_portion = 0.1
epochs = 600

history = sigmoid_model.fit(train_x, train_y, epochs=epochs, validation_split = validation_portion, verbose=2)

# the "history" object contains information generated during training

<font size=5 color='blue'>

Plots of cost function versus epoch    

In [None]:
plt.figure(figsize=(10, 7))

plt.plot(history.history['loss'], color='red')
plt.plot(history.history['val_loss'], color='green')
plt.title('Cost function')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend(['cost_train', 'cost_validation'])
plt.show()


<font size=5 color='blue'>

Underfitting
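The last neuron of `sigmoid_model` (and of `model2` below) uses tanh, whose output lies strictly between -1 and 1. Since Y spans 25 to 346, some standardized targets are likely to fall outside that interval. For those targets, no choice of weights can make the error zero. The sketch below computes this lower bound on the MSE. It is one plausible contributor to the underfitting observed here, not necessarily the only cause. In the notebook, you could evaluate it as `tanh_output_mse_floor(train_y)`.

```python
import numpy as np

def tanh_output_mse_floor(y_standardized):
    """Lower bound on the MSE of any model whose output is tanh(...).

    Because -1 < tanh(z) < 1, a target y with |y| > 1 leaves an error of at
    least |y| - 1, whatever the weights are.
    """
    y = np.asarray(y_standardized, dtype=float)
    return float(np.mean(np.maximum(np.abs(y) - 1.0, 0.0) ** 2))

print(tanh_output_mse_floor([0.5, 2.0, -3.0]))   # (0 + 1 + 4) / 3 = 1.666...
```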

<font size=5 color='blue'>
    
# <center> Now we will construct new nonlinear models: Artificial Neural Networks </center> 

<font size=5 color='black'>
<center> SEM images of a neuron and a network of neurons. Neuron model and mathematical model of a neuron </center>    


<table>
  <tr>
    <td>Neuron</td>
     <td>Network of neurons</td>
      <td>Neuron model</td>
      <td>Mathematical model of a neuron</td>
         
  </tr>
  <tr>
    <td><img src="images/neuron_SEM.jpg" width=290 height=480></td>
    <td><img src="images/human-neuron.png" width=270 height=480></td>
    <td><img src="images/Neuron_labelled.png" width=200 height=380></td>
    <td><img src="images/neuron-mat-model.png" width=370 height=380></td>
  </tr>
 </table>

<font size=5 color='black'>
<center> Approximation by Superpositions of a Sigmoidal Function </center>    

<font size=4 color='black'>
$\bf Abstract$. In this paper we demonstrate that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube; only mild conditions are imposed on the univariate function. Our results settle an open question about representability in the class of single hidden layer neural networks. In particular, we show that arbitrary decision regions can be arbitrarily well approximated by continuous feedforward neural networks with only a single internal, hidden layer and any continuous sigmoidal nonlinearity. The paper discusses approximation properties of other possible types of nonlinearities that might be implemented by artificial neural networks.
    
[Reference](./Literatura/Approx-superpositions-sigmoids_1989.pdf)
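Cybenko's result concerns sums of the form $G(x) = \sum_{j=1}^{N} \alpha_j\, \sigma(w_j^T x + \theta_j)$. The paper proves that such sums exist for any target accuracy without constructing them. The sketch below only illustrates the idea in one dimension. It builds a smoothed staircase out of logistic sigmoids that follows $f(x) = \sin(2\pi x)$ on $[0, 1]$. The construction and the name `staircase_approximation` are ours, for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic function written via tanh so that large |z| cannot overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def staircase_approximation(f, n_steps, steepness_per_step=10.0):
    """Build G(x) = f(0) + sum_j alpha_j * sigmoid(w * (x - c_j)) on [0, 1].

    Each term is a sigmoid of an affine function of x: alpha_j is the jump of f
    between consecutive grid points and c_j the midpoint between them.
    """
    grid = np.linspace(0.0, 1.0, n_steps + 1)
    alpha = np.diff(f(grid))                 # jumps f(x_j) - f(x_{j-1})
    centers = 0.5 * (grid[:-1] + grid[1:])   # where each step switches on
    w = steepness_per_step * n_steps         # steeper sigmoids for finer grids

    def G(x):
        x = np.asarray(x, dtype=float)[..., None]
        return f(grid[0]) + np.sum(alpha * sigmoid(w * (x - centers)), axis=-1)

    return G

f = lambda x: np.sin(2 * np.pi * x)
x = np.linspace(0.0, 1.0, 2001)
errors = {}
for n in (10, 100):
    G = staircase_approximation(f, n)
    errors[n] = float(np.max(np.abs(G(x) - f(x))))
    print(n, "sigmoids, max error:", errors[n])
```

With more sigmoids, the maximum error shrinks. This mirrors the "finite linear combinations" in the abstract, although the paper's own argument does not build the sum explicitly.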

<font size=5 color='black'>
<center> Approximation Capabilities of Multilayer Feedforward Networks </center>    

<font size=4 color='black'>
$\bf Abstract$. We show that standard multilayer feedforward networks with as few as a single hidden layer and arbitrary bounded and nonconstant activation function are universal approximators with respect to $L^p(\mu)$ performance criteria, for arbitrary finite input environment measures $\mu$, provided only that sufficiently many hidden units are available. If the activation function is continuous, bounded and nonconstant, then continuous mappings can be learned uniformly over compact input sets. We also give very general conditions ensuring that networks with sufficiently smooth activation functions are capable of arbitrarily accurate approximation to a function and its derivatives.
    
[Reference](./Literatura/FF-NN-universal-Approximator_1991.pdf)

<font size=5 color='blue'>
    
Model using three neurons: a fully connected feedforward (FF) network. The activation function of the last neuron is linear.


In [None]:
n_x = train_x.shape[1] 
n_h = 2
n_y = train_y.shape[1]
    
layers = [n_x, n_h, n_y]
net = Network(layers)
net.graph(layers)

<font size=5 color='blue'>
    
Model architecture 


In [None]:
np.random.seed(1)
tf.random.set_seed(1)  # Keras initializers use TensorFlow's random generator, not NumPy's

input_nodes = n_x     # The input layer has n_x nodes
hlayer1_nodes = n_h   # The first hidden layer has n_h nodes
output_nodes = n_y    # The output layer has n_y nodes

model1 = Sequential()

model1.add(Dense(hlayer1_nodes,  kernel_initializer='uniform', bias_initializer='zeros', \
                input_dim=input_nodes, activation='tanh'))

model1.add(Dense(output_nodes, kernel_initializer='uniform', bias_initializer='zeros', activation='linear'))
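The architecture above computes a tanh hidden layer followed by a linear output neuron. Below is a NumPy sketch of that forward pass, with weights drawn like the `'uniform'` initializer and zero biases. The sizes (10 features, 2 hidden neurons) are assumed to match `model1`, and the function name is hypothetical. Replacing the final linear step with `np.tanh` gives the structure of `model2` below.

```python
import numpy as np

def forward_one_hidden_layer(X, W1, b1, W2, b2):
    """Output of Dense(n_h, 'tanh') followed by Dense(1, 'linear').

    X: (m, n_x), W1: (n_x, n_h), b1: (n_h,), W2: (n_h, 1), b2: (1,).
    """
    hidden = np.tanh(X @ W1 + b1)   # hidden-layer activations, shape (m, n_h)
    return hidden @ W2 + b2         # linear output neuron, shape (m, 1)

rng = np.random.default_rng(0)
n_features, n_hidden = 10, 2        # assumed to match model1
X = rng.normal(size=(5, n_features))
W1 = rng.uniform(-0.05, 0.05, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.uniform(-0.05, 0.05, size=(n_hidden, 1))
b2 = np.zeros(1)
print(forward_one_hidden_layer(X, W1, b1, W2, b2).shape)   # (5, 1)
```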


<font size=5 color='blue'>
Architecture Summary and Chart

In [None]:
plot_model(model1, to_file='model1.png', show_shapes=True, rankdir='TB', 
      expand_nested=True, show_layer_names=True, dpi=96)

In [None]:
model1.summary()

<font size=5  color='blue'>
    
Compiling the model. Includes the optimizer definition

In [None]:
sgd = optimizers.SGD(learning_rate=0.01)

model1.compile(loss='mean_squared_error', optimizer=sgd)

<font size=5 color='blue'>
    
Training the learning system

In [None]:
validation_portion = 0.1
epochs = 600

history = model1.fit(train_x, train_y, epochs=epochs, validation_split = validation_portion, verbose=0)

<font size=5 color='blue'>

Plots of cost function versus epoch    

In [None]:
plt.figure(figsize=(10, 7))

plt.plot(history.history['loss'], color='red')
plt.plot(history.history['val_loss'], color='green')
plt.title('Cost function')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend(['cost_train', 'cost_validation'])
plt.show()


<font size=5 color='blue'>

A good model

<font size=5 color='blue'>
    
# <center> ---The following is a new model--- </center> 

<font size=5 color='blue'>
    
Model using three neurons. The activation function of the last neuron is sigmoid-type (tanh).


In [None]:
n_x = train_x.shape[1] 
n_h = 2
n_y = train_y.shape[1]
    
layers = [n_x, n_h, n_y]
net = Network(layers)
net.graph(layers)

<font size=5 color='blue'>
    
Model architecture 


In [None]:
np.random.seed(1)
tf.random.set_seed(1)  # Keras initializers use TensorFlow's random generator, not NumPy's

input_nodes = n_x     # The input layer has n_x nodes
hlayer1_nodes = n_h   # The first hidden layer has n_h nodes
output_nodes = n_y    # The output layer has n_y nodes

model2 = Sequential()

model2.add(Dense(hlayer1_nodes,  kernel_initializer='uniform', bias_initializer='zeros', \
                input_dim=input_nodes, activation='tanh'))

model2.add(Dense(output_nodes, kernel_initializer='uniform', bias_initializer='zeros', activation='tanh'))

<font size=5 color='blue'>
Architecture Summary and Chart

In [None]:
plot_model(model2, to_file='model2.png', show_shapes=True, rankdir='TB', 
      expand_nested=True, show_layer_names=True, dpi=96)

In [None]:
model2.summary()

<font size=5  color='blue'>
    
Compiling the model. Includes the optimizer definition

In [None]:
sgd = optimizers.SGD(learning_rate=0.01)

model2.compile(loss='mean_squared_error', optimizer=sgd)

<font size=5 color='blue'>
    
Training the learning system

In [None]:
validation_portion = 0.1
epochs = 600

history = model2.fit(train_x, train_y, epochs=epochs, validation_split = validation_portion, verbose=0)

<font size=5 color='blue'>

Plots of cost function versus epoch    

In [None]:
plt.figure(figsize=(10, 7))

plt.plot(history.history['loss'], color='red')
plt.plot(history.history['val_loss'], color='green')
plt.title('Cost function')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend(['cost_train', 'cost_validation'])
plt.show()


<font size=5 color='blue'>

Overfitting

<font size=5 color='blue'>
    
# <center> ---The following is a new model--- </center> 

<font size=5 color='blue'>
    
Model using four neurons. The activation function of the last neuron is linear


In [None]:
n_x = train_x.shape[1] 
n_h = 3
n_y = train_y.shape[1]
    
layers = [n_x, n_h, n_y]
net = Network(layers)
net.graph(layers)

<font size=5 color='blue'>
    
Model architecture 


In [None]:
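np.random.seed(1)
tf.random.set_seed(1)  # Keras initializers use TensorFlow's random generator, not NumPy's

model3 = Sequential()

input_nodes = n_x     # The input layer has n_x nodes
hlayer1_nodes = n_h   # The first hidden layer has n_h nodes
output_nodes = n_y    # The output layer has n_y nodes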

model3.add(Dense(hlayer1_nodes,  kernel_initializer='uniform', bias_initializer='zeros', \
                input_dim=input_nodes, activation='tanh'))

model3.add(Dense(output_nodes, kernel_initializer='uniform', bias_initializer='zeros', activation='linear'))

<font size=5 color='blue'>
Architecture Summary and Chart

In [None]:
plot_model(model3, to_file='model3.png', show_shapes=True, rankdir='TB', 
      expand_nested=True, show_layer_names=True, dpi=96)

In [None]:
model3.summary()

<font size=5  color='blue'>
    
Compiling the model. Includes the optimizer definition.

In [None]:
sgd = optimizers.SGD(learning_rate=0.01)

model3.compile(loss='mean_squared_error', optimizer=sgd)

<font size=5 color='blue'>
    
Training the learning system

In [None]:
validation_portion = 0.1
epochs = 600

history = model3.fit(train_x, train_y, epochs=epochs, validation_split = validation_portion, verbose=0)

<font size=5 color='blue'>

Plots of cost function versus epoch    

In [None]:
plt.figure(figsize=(10, 7))

plt.plot(history.history['loss'], color='red')
plt.plot(history.history['val_loss'], color='green')
plt.title('Cost function')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend(['cost_train', 'cost_validation'])
plt.show()


<font size=5 color='blue'>

Overfitting

<font size=5 color='blue'>
    
# <center> ---The following is a new model--- </center> 

<font size=5 color='blue'>
    
Model using five neurons. The activation function of the last neuron is linear


In [None]:
n_x = train_x.shape[1] 
n_h = 4
n_y = train_y.shape[1]
    
layers = [n_x, n_h, n_y]
net = Network(layers)
net.graph(layers)

<font size=5 color='blue'>
    
Model architecture 


In [None]:
np.random.seed(1)
tf.random.set_seed(1)  # Keras initializers use TensorFlow's random generator, not NumPy's

input_nodes = n_x     # The input layer has n_x nodes
hlayer1_nodes = n_h   # The first hidden layer has n_h nodes
output_nodes = n_y    # The output layer has n_y nodes


model4 = Sequential()

model4.add(Dense(hlayer1_nodes,  kernel_initializer='uniform', bias_initializer='zeros', \
                input_dim=input_nodes, activation='tanh'))

model4.add(Dense(output_nodes, kernel_initializer='uniform', bias_initializer='zeros', activation='linear'))

<font size=5 color='blue'>
Architecture Summary and Chart

In [None]:
plot_model(model4, to_file='model4.png', show_shapes=True, rankdir='TB', 
      expand_nested=True, show_layer_names=True, dpi=96)

In [None]:
model4.summary()

<font size=5  color='blue'>
    
Compiling the model. Includes the optimizer definition.

In [None]:
sgd = optimizers.SGD(learning_rate=0.01)

model4.compile(loss='mean_squared_error', optimizer=sgd)

<font size=5 color='blue'>
    
Training the learning system

In [None]:
validation_portion = 0.1
epochs = 600

history = model4.fit(train_x, train_y, epochs=epochs, validation_split = validation_portion,verbose=0)

<font size=5 color='blue'>

Plots of cost function versus epoch    

In [None]:
plt.figure(figsize=(10, 7))

plt.plot(history.history['loss'], color='red')
plt.plot(history.history['val_loss'], color='green')
plt.title('Cost function')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend(['cost_train', 'cost_validation'])
plt.show()


<font size=5 color='blue'>

Overfitting   

<font size=5 color='blue'>
    
# <center> ---The following is a new model--- </center> 

<font size=5 color='blue'>
    
Model using ten neurons. The activation function of the last neuron is linear


In [None]:
n_x = train_x.shape[1] 
n_h1 = 5
n_h2  = 4
n_y = train_y.shape[1]
    
layers = [n_x, n_h1, n_h2, n_y]
net = Network(layers)
net.graph(layers)

<font size=5 color='blue'>
    
Model architecture 


In [None]:
np.random.seed(1)
tf.random.set_seed(1)  # Keras initializers use TensorFlow's random generator, not NumPy's

input_nodes = n_x     # The input layer has n_x nodes
hlayer1_nodes = n_h1  # The first hidden layer has n_h1 nodes
hlayer2_nodes = n_h2  # The second hidden layer has n_h2 nodes
output_nodes = n_y    # The output layer has n_y nodes


model5 = Sequential()

model5.add(Dense(hlayer1_nodes,  kernel_initializer='uniform', bias_initializer='zeros', \
                input_dim=input_nodes, activation='tanh'))

# Only the first layer needs input_dim; later layers infer their input size
model5.add(Dense(hlayer2_nodes,  kernel_initializer='uniform', bias_initializer='zeros', \
                activation='tanh'))


model5.add(Dense(output_nodes, kernel_initializer='uniform', bias_initializer='zeros', activation='linear'))

<font size=5 color='blue'>
Architecture Summary and Chart

In [None]:
plot_model(model5, to_file='model5.png', show_shapes=True, rankdir='TB', 
      expand_nested=True, show_layer_names=True, dpi=96)

In [None]:
model5.summary()

<font size=5  color='blue'>
    
Compiling the model. Includes the optimizer definition.

In [None]:
sgd = optimizers.SGD(learning_rate=0.01)

model5.compile(loss='mean_squared_error', optimizer=sgd)

<font size=5 color='blue'>
    
Training the learning system

In [None]:
validation_portion = 0.1
epochs = 600

history = model5.fit(train_x, train_y, epochs=epochs, validation_split = validation_portion,verbose=0)

<font size=5 color='blue'>

Plots of cost function versus epoch    

In [None]:
plt.figure(figsize=(10, 7))

plt.plot(history.history['loss'], color='red')
plt.plot(history.history['val_loss'], color='green')
plt.title('Cost function')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend(['cost_train', 'cost_validation'])
plt.show()


<font size=5 color='blue'>

Overfitting    