###### **Machine Learning book on Jupyter NBs**

* Giuseppe Longo - LMDS (DSF - UNINA)
* Michele delli Veneri (DIETI - UNINA)

# Chapter 1 - 
## Section 2 - Introduction to CNNs

**Contents:**
* Introduction to CNN
* Representation and compositionality
* Fukushima model
* Building a simple fully connected network


In [1]:
# pydot (plus the Graphviz system binaries, e.g. `conda install graphviz`) is needed only by keras.utils.plot_model
# %pip installs into the environment of the running kernel
%pip install pydot
%pip install graphviz
!pip list


Regular (fully connected) neural networks don't scale well to full images. Let us illustrate this with a small image, say 32x32 pixels x 3 channels. A single fully connected neuron in the first hidden layer of a regular neural network would have 32x32x3 = 3072 weights. Multiply this figure by the number of neurons in the hidden layer (say 64) and you end up with 3072x64 = 196,608 weights feeding the first hidden layer. An image of more respectable size (but still very small by present-day standards), e.g. 300x300x3, would lead to neurons that have 300x300x3 = 270,000 weights each. Clearly, full connectivity is wasteful, and the huge number of parameters would quickly lead to overfitting.
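The arithmetic above can be checked with a few lines of Python. For comparison, the sketch also counts the weights of a convolutional layer with 64 filters of size 3x3 over the 3 input channels; this layer size is an illustrative assumption, not taken from the text. Biases are left out by default, as in the text.

```python
def dense_params(height, width, channels, n_units, bias=False):
    """Weights of a fully connected layer fed with a flattened height x width x channels image."""
    n_inputs = height * width * channels
    return n_inputs * n_units + (n_units if bias else 0)

def conv_params(kernel_h, kernel_w, in_channels, n_filters, bias=False):
    """Weights of a 2D convolutional layer: one small kernel per filter, shared over all
    positions, so the count does not depend on the image size."""
    return kernel_h * kernel_w * in_channels * n_filters + (n_filters if bias else 0)

print(dense_params(32, 32, 3, 1))      # one neuron on a 32x32x3 image: 3072
print(dense_params(32, 32, 3, 64))     # 64 hidden neurons on the same image: 196608
print(dense_params(300, 300, 3, 1))    # one neuron on a 300x300x3 image: 270000
print(conv_params(3, 3, 3, 64))        # 64 shared 3x3x3 filters, for any image size: 1728
```

The fully connected count grows with the number of pixels, while the convolutional one depends only on the kernel size and on the number of filters.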

The way out was found in Convolutional Neural Networks (CNNs).

### 1.2.1 Introduction to CNN

What is the link between Einstein's famous sentence, *the most incomprehensible thing about the universe is that it is comprehensible*, and Deep Learning? To understand this, let us start from a quote by Y. LeCun, the inventor of Convolutional Neural Networks (CNNs) and a 2018 Turing Award laureate.

From <a href="https://www.researchgate.net/publication/277411157_Deep_Learning" target="_blank">LeCun et al. 2015</a> (emphasis is mine):

*ConvNets are designed to process data that come in the form of multiple arrays, for example a colour image composed of three 2D arrays containing pixel intensities in the three colour channels. Many data modalities are in the form of multiple arrays: 1D for signals and sequences, including language; 2D for images or audio spectrograms; and 3D for video or volumetric images. There are four key ideas behind ConvNets that take advantage of the properties of natural signals: **local connections, shared weights, pooling and the use of many layers**. The architecture of a typical ConvNet is structured as a series of stages. The first few stages are composed of two types of layers: convolutional layers and pooling layers. Units in a convolutional layer are organized in **feature maps**, within which each unit is connected to local patches in the feature maps of the previous layer through a set of weights called a filter bank. The result of this local weighted sum is then passed through a non-linearity such as a ReLU. **All units in a feature map share the same filter bank. Different feature maps in a layer use different filter banks**. The reason for this architecture is twofold. First, in array data such as images, local groups of values are often highly correlated, forming distinctive local motifs that are easily detected. Second, the local statistics of images and other signals are invariant to location. In other words, if a motif can appear in one part of the image, it could appear anywhere, hence the idea of units at different locations sharing the same weights and detecting the same pattern in different parts of the array. Mathematically, the filtering operation performed by a feature map is a discrete convolution, hence the name. 
Although the role of the convolutional layer is to detect local conjunctions of features from the previous layer, the role of the pooling layer is to merge semantically similar features into one. Because the relative positions of the features forming a motif can vary somewhat, reliably detecting the motif can be done by coarse-graining the position of each feature. A typical pooling unit computes the maximum of a local patch of units in one feature map (or in a few feature maps). Neighbouring pooling units take input from patches that are shifted by more than one row or column, thereby reducing the dimension of the representation and creating an invariance to small shifts and distortions. Two or three stages of convolution, non-linearity and pooling are stacked, followed by more convolutional and fully-connected layers. Backpropagating gradients through a ConvNet is as simple as through a regular deep network, allowing all the weights in all the filter banks to be trained. 
Deep neural networks exploit the property that many natural signals are compositional hierarchies, in which higher-level features are obtained by composing lower-level ones. In images, local combinations of edges form motifs, motifs assemble into parts, and parts form objects. Similar hierarchies exist in speech and text from sounds to phones, phonemes, syllables, words and sentences. The pooling allows representations to vary very little when elements in the previous layer vary in position and appearance. 
The convolutional and pooling layers in ConvNets are directly **inspired by the classic notions of simple cells and complex cells in visual neuroscience, and the overall architecture is reminiscent of the LGN–V1–V2–V4–IT hierarchy in the visual cortex ventral pathway**. 
When ConvNet models and monkeys are shown the same picture, the activations of high-level units in the ConvNet explains half of the variance of random sets of 160 neurons in the monkey’s inferotemporal cortex. ConvNets have their roots in the <a href="https://en.wikipedia.org/wiki/Neocognitron" target="_blank">neocognitron</a>, the architecture of which was somewhat similar, but did not have an end-to-end supervised-learning algorithm such as backpropagation. A primitive 1D ConvNet (also called a time-delay neural net) was used for the recognition of phonemes and simple words. 
There have been numerous applications of convolutional networks going back to the early 1990s, starting with time-delay neural networks for speech recognition and document reading. The document reading system used a ConvNet trained jointly with a probabilistic model that implemented language constraints. By the late 1990s this system was reading over 10% of all the cheques in the United States. 
A number of ConvNet-based optical character recognition and handwriting recognition systems were later deployed by Microsoft. ConvNets were also experimented with in the early 1990s for object detection in natural images, including faces and hands, and for face recognition.*
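The statement that "the filtering operation performed by a feature map is a discrete convolution" can be illustrated with a minimal NumPy sketch. The filter weights below are hand-picked for illustration (in a real ConvNet they are learned by backpropagation) and, as in most DL libraries, the operation is implemented as a cross-correlation, i.e. without flipping the kernel. The example also shows the property behind weight sharing: shifting the input shifts the feature map by the same amount.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D convolution as computed in DL frameworks (strictly a cross-correlation:
    the kernel is not flipped). The same kernel, i.e. the same shared weights,
    is applied at every location of the image."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # local connection: output unit (r, c) only sees a kh x kw patch of the input
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

# hand-picked (hypothetical) filter responding to dark-to-bright vertical edges
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

# 8x8 image: dark left half, bright right half, i.e. a vertical edge between columns 3 and 4
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# the same edge moved one pixel to the right
shifted = np.zeros((8, 8))
shifted[:, 5:] = 1.0

feature_map = relu(conv2d_valid(image, kernel))
shifted_map = relu(conv2d_valid(shifted, kernel))

print(feature_map[0])   # every row is [0. 0. 3. 3. 0. 0.]
print(shifted_map[0])   # the response moves one column to the right
```

A single 3x3 kernel (9 weights) is reused at all 36 output positions; a locally connected layer without sharing would need 9 separate weights for each of them.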

This short text summarizes all the main aspects of DL. In what follows we shall try to understand it but, before going into the technical details, let us focus on one aspect which, in our opinion, is particularly relevant. 

#### 1.2.1.1 Compositionality and representation

Why does a hierarchical representation of the world work? Because the world we live in (or at least our perception of it) is compositional. This point is alluded to in previous sections. In images, for example, such hierarchical nature can be observed from the fact that local pixels assemble to form simple motifs such as oriented edges. These edges in turn are assembled to form local features such as corners, T-junctions, etc. These local features are then assembled to form motifs that are even more abstract. We can keep building on these hierarchical representations to eventually form the objects we observe in the real world.

<img src="Immagini_ANN/Imma_121.png" alt="drawing" width="50%"/>

This picture is from [Zeiler & Fergus 2013].

For systems with a limited number of degrees of freedom, this can easily be understood as an intrinsic property of the world, but whether compositionality operates at all levels of reality is still an open question (and DL may actually help to answer it). However, reasoning by analogy, at the lowest level of description we have elementary particles, which assemble to form atoms; atoms together form molecules, and this process continues to form materials, parts of objects and eventually full objects in the physical world.
The compositional nature of the world might therefore explain the puzzle behind Einstein’s remark on how humans can understand the world they live in: **The most incomprehensible thing about the universe is that it is comprehensible.**

*Note: the fact that humans understand the world thanks to its compositional nature is interpreted as a conspiracy by Yann LeCun (<a href="https://cfcs.pku.edu.cn/english/docs/2019-10/20191010134544591063.pdf" target="_blank"> link </a>).* 

So, is Deep Learning rooted in the idea that our world is comprehensible and has a compositional nature? Research conducted by Simon Thorpe may support this hypothesis. He measured how quickly the brain reacts when images are flashed in front of the eyes, by asking subjects to identify these images, which they were able to do successfully. This showed that it takes about 100 ms for humans to detect objects. Furthermore, consider the diagram below, which shows parts of the brain annotated with the time it takes for signals to propagate from one area to the next:

<img src="Immagini_ANN/Imma_131.png" alt="drawing" width="50%"/>

Signals pass from the retina to the LGN (which helps with contrast enhancement, gate control, etc.), then to the primary visual cortex V1, then to V2 and V4, and finally to the inferotemporal cortex (PIT), which is the part of the brain where categories are defined. Observations from open-brain surgery showed that if you show a human a film, neurons in the PIT fire only when they detect certain images (such as Sophia Loren, Brad Pitt or a person's grandmother) and nothing else. The neural firings are invariant to things such as position, size, illumination, your grandmother's orientation, what she's wearing, etc. Furthermore, the fast reaction times with which humans were able to categorize these items (barely enough time for a few spikes to get through) suggest that this can be done without additional time spent on complex recurrent computations. Rather, it is essentially a single feed-forward process. 

These insights suggest that we can develop a neural network architecture which is completely feed-forward, yet still able to solve the problem of recognition, in a way that is invariant to irrelevant transformations of the input. 

One further insight from the human brain comes from Gallant & Van Essen, whose model of the human brain illustrates two distinct pathways:


<img src="Immagini_ANN/Imma_141.png" alt="drawing" width="50%"/>
The figure depicts Gallant & Van Essen's model of the dorsal and ventral pathways in the brain. The right side shows the ventral pathway, which tells you what you're looking at, while the left side shows the dorsal pathway, which identifies locations, geometry, and motion. The two pathways appear fairly separate in the human (and primate) visual cortex (with a few interactions between them, of course).  

<img src="Immagini_ANN/Imma_15.png" alt="drawing" width="50%"/>

In their experiments <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1359523/pdf/jphysiol01247-0121.pdf" target="_blank">(Hubel & Wiesel, 1962)</a>, Hubel & Wiesel used electrodes to measure neural firing in cat brains in response to visual stimuli. They discovered that neurons in the V1 region are sensitive only to certain areas of the visual field (called "receptive fields"), and detect oriented edges in those areas. For example, they demonstrated that if you show the cat a vertical bar and start rotating it, the neuron fires at a particular angle; as the bar moves away from that angle, the activation of the neuron diminishes. Hubel & Wiesel named these orientation-selective neurons "simple cells", for their ability to detect local features. They also discovered that if you move the bar out of the receptive field, that particular neuron no longer fires, but another neuron does. There are local feature detectors corresponding to all areas of the visual field, hence the idea that the brain processes visual information as a collection of "convolutions".
Another type of neuron, which they named "complex cells", aggregates the output of multiple simple cells within a certain area (a great book to read about their work is <a href="https://books.google.it/books?id=8YrxWojxUA4C&pg=PA106&redir_esc=y#v=onepage&q&f=false" target="_blank"> Brain and Visual Perception: The Story of a 25-Year Collaboration</a>). 

From a DL point of view, we can think of these natural structures as computing an aggregate of the activations using a function such as the maximum, the sum, the sum of squares, or any other function that does not depend on the order of its inputs. These complex cells detect edges and orientations in a region regardless of where exactly those stimuli lie within the region. In other words, they are invariant to small shifts in the position of the input (recall Fukushima's contributions, 1982).
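The aggregation performed by a "complex cell" can be sketched with max pooling. The helper below (a hypothetical name, not a library function) applies non-overlapping 2x2 max pooling to hand-made feature maps: a stimulus that moves within the same pooling region produces the same pooled output, while a larger shift into a neighbouring region does not, which is why the invariance only holds for *small* shifts.

```python
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """Max pooling: each output unit keeps the maximum of a size x size patch of the input map."""
    h, w = fmap.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = fmap[r * stride:r * stride + size, c * stride:c * stride + size]
            out[r, c] = patch.max()   # the maximum ignores where in the patch the peak lies
    return out

# hypothetical "simple cell" maps with a single active unit
a = np.zeros((4, 4))
a[0, 0] = 1.0      # stimulus at (0, 0)
b = np.zeros((4, 4))
b[1, 1] = 1.0      # shifted by one pixel, still inside the same 2x2 pooling region
c = np.zeros((4, 4))
c[0, 2] = 1.0      # shifted by two pixels, into the neighbouring pooling region

print(max_pool2d(a))   # 1 in the top-left pooled cell only
print(max_pool2d(b))   # identical to the result for a
print(max_pool2d(c))   # the 1 moves to the top-right pooled cell
```

Replacing `patch.max()` with `patch.sum()` or `(patch ** 2).sum()` gives the other order-independent aggregates mentioned in the text.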

#### 1.2.1.2 Fukushima model
<img src="Immagini_ANN/Imma_16.png" alt="drawing" width="30%"/>

Fukushima, in a 1979 paper (in Japanese), was the first to implement the idea of multiple layers of simple and complex cells in computer models, using a dataset of handwritten digits. Some of these feature detectors were hand-crafted or learned, though the learning used unsupervised clustering algorithms trained separately for each layer, since backpropagation was not yet in use. 

Yann LeCun came in several years later (1989, 1998) and implemented the same architecture, but this time trained it in a supervised setting using backpropagation. This is widely regarded as the genesis of modern convolutional neural networks. (Note: Riesenhuber at MIT in 1999 also re-discovered this architecture, though he didn't use backpropagation.)
LeCun used the simple/complex cell hierarchy combined with supervised training and backpropagation to develop the first CNN at the University of Toronto in 1988-89. The experiments used a small dataset of 320 "mouse-written" digits (digits drawn with a computer mouse). The performances of the following architectures were compared:

* Single FC(fully connected) Layer
* Two FC Layers
* Locally Connected Layers w/o shared weights
* Constrained network w/ shared weights and local connections
* Constrained network w/ shared weights and local connections 2 (more feature maps)

The most successful networks (the constrained networks with shared weights) showed the strongest generalization, and form the basis of modern CNNs. A few years later, LeCun himself implemented the first modern CNN, which included a new type of layer: *the pooling layer*.
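Why the constrained architectures generalize better becomes clearer when counting weights. The sizes below (a 16x16 input, 5x5 receptive fields, 4 feature maps, stride 1) are illustrative assumptions, not the exact configurations used by LeCun, and biases are ignored. All three schemes produce the same number of hidden units; only their connectivity differs.

```python
IN_H, IN_W = 16, 16     # hypothetical input size
K = 5                   # receptive field (kernel) size
N_MAPS = 4              # number of feature maps

OUT_H, OUT_W = IN_H - K + 1, IN_W - K + 1   # 'valid' positions with stride 1
n_hidden = OUT_H * OUT_W * N_MAPS           # same hidden layer size for all schemes

fully_connected = IN_H * IN_W * n_hidden    # every hidden unit sees every pixel
locally_connected = n_hidden * K * K        # every unit sees only a KxK patch, with its own weights
shared_weights = N_MAPS * K * K             # one KxK kernel per feature map, reused everywhere

print(n_hidden)            # 576
print(fully_connected)     # 147456
print(locally_connected)   # 14400
print(shared_weights)      # 100
```

Local connectivity alone cuts the count by a factor of about ten here; weight sharing cuts it by a further factor equal to the number of positions per map (144).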


## Let us build a simple neural network with Keras

We shall now build a very simple network using TensorFlow and Keras. First of all, make sure that you are in the TensorFlow environment of your Anaconda Navigator.
We shall use this network to process handwritten digits (as in the original work by LeCun). The dataset is known as the MNIST dataset.

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.utils import to_categorical, plot_model
# from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D,Flatten
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pydot as pydot

##### Setting some parameters

In [None]:
EPOCHS = 200
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 10
N_HIDDEN = 128
VALIDATION_SPLIT = 0.2

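* EPOCHS = number of complete passes over the training set (the maximum number of training iterations)
* BATCH_SIZE = number of samples fed into the network before each weight update
* VALIDATION_SPLIT = fraction of the training set which is reserved for validation
* VERBOSE = verbosity of the training log (in Keras: 0 = silent, 1 = progress bar, 2 = one line per epoch)
* NB_CLASSES = number of output classes (the ten digits 0-9)
* N_HIDDEN = number of neurons in each hidden layer (not used by the first, single-layer model)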
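With these settings, the number of weight updates performed during training follows from simple arithmetic. The sketch assumes, as in the cells that follow, the 60000 MNIST training images; the validation fraction is held out from them, and the remaining samples are split into batches of `BATCH_SIZE`.

```python
import math

N_TRAIN_IMAGES = 60000   # size of the MNIST training set
BATCH_SIZE = 128
EPOCHS = 200
VALIDATION_SPLIT = 0.2

# Keras holds out the last VALIDATION_SPLIT fraction of the training data for validation
n_val = round(N_TRAIN_IMAGES * VALIDATION_SPLIT)
n_train = N_TRAIN_IMAGES - n_val

# one weight update per batch; the last batch of an epoch may be smaller
updates_per_epoch = math.ceil(n_train / BATCH_SIZE)
total_updates = updates_per_epoch * EPOCHS

print(n_train, n_val)       # 48000 12000
print(updates_per_epoch)    # 375
print(total_updates)        # 75000
```

The 48000/12000 split matches the "Train on 48000 samples, validate on 12000 samples" message printed by `model.fit` further down.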

##### Loading and checking the MNIST dataset

In [None]:
# load dataset
mnist=keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape[0],'train samples')
print(x_test.shape[0],'test samples')

So, we have 60000 samples for training and 10000 samples in the test set. Let us look at a sample of data

In [None]:
# sample 25 mnist digits from train dataset
indexes = np.random.randint(0, x_train.shape[0], size=25)
images = x_train[indexes]
labels = y_train[indexes]

# plot the 25 mnist digits
plt.figure(figsize=(5,5))
for i in range(len(indexes)):
    plt.subplot(5, 5, i + 1)
    image = images[i]
    plt.imshow(image, cmap='gray')
    plt.axis('off')

# plt.savefig("mnist-samples.png")
plt.show()
# plt.close('all')

Let us now count how many instances of each class we have in our training and test sets:

In [None]:
# count the number of unique train labels
unique, counts = np.unique(y_train, return_counts=True)
print("Train labels: ", dict(zip(unique, counts)))

# count the number of unique test labels
unique, counts = np.unique(y_test, return_counts=True)
print("Test labels: ", dict(zip(unique, counts)))

This confirms that the classes are almost equally represented in both the training and the test set (i.e. that we have a balanced classification problem).

##### Preparing the data for the network

Images need to be prepared before being fed to the network:

1. We have 60k training images (plus 10k test images) of 28x28 pixels. Each image needs to be converted into a vector of 784 (28x28=784) values.
2. The values need to be floating point numbers in single precision (float32).
3. They need to be normalized between 0 and 1 (raw pixel values range from 0 to 255).
4. Target values need to be one-hot encoded.
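What these four steps do to each image and label can be sketched in plain NumPy on a tiny stand-in batch (random pixels, not MNIST). Here `np.eye(NB_CLASSES)[labels]` is used as an equivalent of Keras' `to_categorical` for integer labels; it is an illustration, not the Keras implementation.

```python
import numpy as np

NB_CLASSES = 10
rng = np.random.default_rng(0)

# stand-in for a batch of 5 MNIST images: 28x28 uint8 pixels in [0, 255]
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3, 7])

# 1. flatten each 28x28 image into a vector of 784 values
x = images.reshape(len(images), -1)
# 2.-3. cast to single precision and rescale to [0, 1]
x = x.astype('float32') / 255.0
# 4. one-hot encode the labels: label k becomes a vector with a 1 in position k
y = np.eye(NB_CLASSES, dtype='float32')[labels]

print(x.shape, x.dtype)   # (5, 784) float32
print(y[0])               # label 3: a 1 in position 3, zeros elsewhere
```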

In [None]:
# reshaping
RESHAPED=784
x_train = x_train.reshape(60000,RESHAPED)
x_test  = x_test.reshape(10000,RESHAPED)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# normalizing
x_train /=255
x_test /=255

# one hot encoding of target values
y_train = tf.keras.utils.to_categorical(y_train, NB_CLASSES)
y_test = tf.keras.utils.to_categorical(y_test, NB_CLASSES)


##### Building and compiling the model

In [None]:
model=tf.keras.models.Sequential()
model.add(keras.layers.Dense(NB_CLASSES,input_shape=(RESHAPED,),name='dense_layer',activation='softmax'))
model.summary()
#plot_model(model,to_file='model_10.png')
#plt.imshow(mpimg.imread('model_10.png'))
#plt.axis('off')

At this point, the network is created. Before running it, however, we have to select:
* a cost (or loss, or objective) function. Tensorflow provides a rich variety of loss functions. Here you have access to the <a href="https://keras.io/api/losses/" target="_blank"> complete list of options</a>.
* optimizer: the algorithm used to update the weights at each iteration in order to find the minimum of the loss function. Tensorflow provides an extensive choice of optimizers (check here for a <a href="https://keras.io/api/optimizers/" target="_blank"> complete list</a>).
* a criterion to evaluate performances (also known as metrics).

In the following example we shall use the Stochastic Gradient Descent as optimizer, the categorical crossentropy as loss (cost) function, and the accuracy as metric. 
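To see what these three choices mean in practice, here is a minimal NumPy sketch (not the Keras implementation) of one training step of the single-layer model built above: a `Dense(10)` layer with softmax activation, the categorical cross-entropy loss and one plain SGD update. It uses a random stand-in batch instead of MNIST and a learning rate of 0.01, the default of Keras' `SGD` optimizer. It also shows where the 784 x 10 + 10 = 7850 trainable parameters of this model come from.

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUTS, NB_CLASSES, BATCH_SIZE = 784, 10, 128
LEARNING_RATE = 0.01   # default learning rate of Keras' SGD optimizer

# parameters of a Dense(NB_CLASSES) layer: one weight per input/class pair, one bias per class
W = rng.normal(scale=0.01, size=(N_INPUTS, NB_CLASSES))
b = np.zeros(NB_CLASSES)
n_params = W.size + b.size
print(n_params)   # 7850

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtracting the row maximum avoids overflow in exp
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # batch average of -sum_k y_k * log(p_k); clipping avoids log(0)
    return -np.mean(np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1))

# random stand-in mini-batch (hypothetical data, not MNIST): pixels in [0, 1], one-hot targets
x = rng.random((BATCH_SIZE, N_INPUTS))
y = np.eye(NB_CLASSES)[rng.integers(0, NB_CLASSES, size=BATCH_SIZE)]

# forward pass and loss
p = softmax(x @ W + b)
loss_before = categorical_crossentropy(y, p)

# for softmax + cross-entropy, the gradient w.r.t. the logits is (p - y) / BATCH_SIZE
grad_logits = (p - y) / BATCH_SIZE
W -= LEARNING_RATE * (x.T @ grad_logits)
b -= LEARNING_RATE * grad_logits.sum(axis=0)

loss_after = categorical_crossentropy(y, softmax(x @ W + b))
print(loss_before, loss_after)   # the loss on this batch decreases after one step
```

`model.fit` repeats this kind of update once per batch, for every batch of every epoch.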

In [None]:
model.compile(optimizer='SGD',loss='categorical_crossentropy',metrics=['accuracy'])

##### training the model 
We need to define other parameters:
* epochs: the number of times that the model is exposed to the entire training set. Within each epoch, the optimization algorithm updates the weights once per batch (not only at the end of the epoch).
* batch_size: number of training instances observed before each weight update

In [None]:
model.fit(x_train,y_train,batch_size=BATCH_SIZE,epochs=EPOCHS, verbose=VERBOSE,validation_split=VALIDATION_SPLIT)

##### Evaluating the results
It is worth noticing that, after some epochs, the accuracy (and the loss) start improving by smaller and smaller amounts. This is the sign that training is approaching *convergence*. 

In [None]:
test_loss,test_accuracy=model.evaluate(x_test,y_test)
print('Loss: ',test_loss)
print('Accuracy:', test_accuracy)

#### A more complex architecture

Let us see if adding two hidden layers (and hence a larger number of neurons and weights) can improve the results.
For convenience, we repeat all the previous steps (imports, parameters and data preparation).

In [2]:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D,Flatten
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pydot as pydot
EPOCHS= 50
BATCH_SIZE=128
VERBOSE=1
NB_CLASSES = 10 
N_HIDDEN=128
VALIDATION_SPLIT =0.2
# load dataset
mnist=keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape[0],'train samples')
print(x_test.shape[0],'test samples')
# reshaping
RESHAPED=784
x_train = x_train.reshape(60000,RESHAPED)
x_test  = x_test.reshape(10000,RESHAPED)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# normalizing
x_train /=255
x_test /=255
# one hot encoding of target values
y_train = tf.keras.utils.to_categorical(y_train, NB_CLASSES)
y_test = tf.keras.utils.to_categorical(y_test, NB_CLASSES)

60000 train samples
10000 test samples


##### building a more complex network


In [3]:
model=tf.keras.models.Sequential()
model.add(keras.layers.Dense(N_HIDDEN, input_shape=(RESHAPED,),name='dense_layer',activation='relu'))
model.add(keras.layers.Dense(N_HIDDEN,name='dense_layer_2',activation='relu'))
model.add(keras.layers.Dense(NB_CLASSES,name='dense_layer_3',activation='softmax'))
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_layer (Dense)          (None, 128)               100480    
_________________________________________________________________
dense_layer_2 (Dense)        (None, 128)               16512     
_________________________________________________________________
dense_layer_3 (Dense)        (None, 10)                1290      
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________
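The `Param #` column can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A short sketch:

```python
def dense_layer_params(n_in, n_out):
    """Trainable parameters of a Dense layer: an n_in x n_out weight matrix plus n_out biases."""
    return n_in * n_out + n_out

# (inputs, units) of the three Dense layers: RESHAPED -> N_HIDDEN -> N_HIDDEN -> NB_CLASSES
shapes = [(784, 128), (128, 128), (128, 10)]
counts = [dense_layer_params(n_in, n_out) for n_in, n_out in shapes]

print(counts)        # [100480, 16512, 1290]
print(sum(counts))   # 118282
```

Most of the parameters sit in the first layer, which connects every one of the 784 pixels to every hidden neuron.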




In [4]:
model.compile(optimizer='SGD',loss='categorical_crossentropy',metrics=['accuracy'])

In [5]:
model.fit(x_train,y_train,batch_size=BATCH_SIZE,epochs=EPOCHS, verbose=VERBOSE,validation_split=VALIDATION_SPLIT)

Train on 48000 samples, validate on 12000 samples

In [10]:
test_loss,test_accuracy=model.evaluate(x_test,y_test,  verbose=0)
print('Loss: ',test_loss)
print('Accuracy:', test_accuracy)

Loss:  0.11789100770354272
Accuracy: 0.9663


##### Introducing Dropout
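Dropout randomly sets a fraction `DROPOUT` of a layer's outputs to zero at each training step, which discourages neurons from relying on specific other neurons and acts as a regularizer against overfitting; at inference time the layer does nothing. Keras' `Dropout` layer scales the surviving outputs by 1/(1 - rate) during training, so that their expected value is unchanged ("inverted" dropout). A minimal NumPy sketch of this behaviour (not the Keras source):

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` and rescale the
    survivors by 1/(1 - rate) during training; identity at inference."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate   # True for units that survive
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(42)
h = np.ones((1000, 128))                          # stand-in hidden activations
h_train = dropout(h, 0.3, training=True, rng=rng)
h_test = dropout(h, 0.3, training=False, rng=rng)

print((h_train == 0).mean())   # close to 0.3: fraction of dropped units
print(h_train.mean())          # close to 1.0: expected activation preserved
print(np.array_equal(h_test, h))
```

Dropout has no trainable weights, so the model summary below reports 0 parameters for the `Dropout` layers.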

In [12]:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D,Flatten
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
#import pydot as pydot
#import graphviz as graphviz
import datetime
%load_ext tensorboard
EPOCHS= 200
BATCH_SIZE=128
VERBOSE=1
NB_CLASSES = 10 
N_HIDDEN=128
VALIDATION_SPLIT =0.2
DROPOUT = 0.3
# load dataset
mnist=keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape[0],'train samples')
print(x_test.shape[0],'test samples')
# reshaping
RESHAPED=784
x_train = x_train.reshape(60000,RESHAPED)
x_test  = x_test.reshape(10000,RESHAPED)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# normalizing
x_train /=255
x_test /=255
# one hot encoding of target values
y_train = tf.keras.utils.to_categorical(y_train, NB_CLASSES)
y_test = tf.keras.utils.to_categorical(y_test, NB_CLASSES)

60000 train samples
10000 test samples


##### Building the model

In [13]:
model=tf.keras.models.Sequential()
model.add(keras.layers.Dense(N_HIDDEN,input_shape=(RESHAPED,),name='dense_layer',activation='relu'))
model.add(keras.layers.Dropout(DROPOUT))
model.add(keras.layers.Dense(N_HIDDEN,name='dense_layer_2',activation='relu'))
model.add(keras.layers.Dropout(DROPOUT))
model.add(keras.layers.Dense(NB_CLASSES,name='dense_layer_3',activation='softmax'))
model.summary()

#plot_model(model,to_file="model_2.png")
#plt.imshow(mpimg.imread('model_2.png'))
#plt.axis('off')

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_layer (Dense)          (None, 128)               100480    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_layer_2 (Dense)        (None, 128)               16512     
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_layer_3 (Dense)        (None, 10)                1290      
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________


##### Compiling and running the model

In [14]:
model.compile(optimizer='SGD',loss='categorical_crossentropy',metrics=['accuracy'])

In [15]:
log_dir = "logs/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

In [16]:
model.fit(x_train,y_train,batch_size=BATCH_SIZE,epochs=EPOCHS, verbose=VERBOSE,validation_split=VALIDATION_SPLIT,callbacks=[tensorboard_callback])

Train on 48000 samples, validate on 12000 samples

In [1]:
test_loss,test_accuracy=model.evaluate(x_test,y_test, verbose=0)
print('Loss: ',test_loss)
print('Accuracy:', test_accuracy)


#### what happens:
<img src="Immagini_ANN/Imma_21.gif" alt="drawing" width="20%"/>

##### Saving a model

Once you have trained a model, you want to save it so that you can apply it to new data, not belonging to the training set, for which you want to predict the target value.

##### Saving the weights of a model

You can save the weights either in the TensorFlow checkpoint format or in the Keras HDF5 (`.h5`) format. The second one is recommended, since HDF5 files can also be read outside TensorFlow. 

In [11]:
# save weights to a tensorflow internal format
model.save_weights('./weights/my_model')

In [None]:
# save weights to keras internal format
model.save_weights('my_model.h5',save_format='h5')

##### Saving the model & the weights

If you wish to save the whole model (architecture, weights and training configuration), use the command:

In [12]:
model.save('my_model.h5')

##### To load a trained model

In [13]:
model=tf.keras.models.load_model('my_model.h5')