# Neural network basics
1. input layer to a neural network
2. how this is connected to an output layer
3. how hidden layers are added in between
4. how the layers are made of nodes
5. what nodes do
6. how layers are connected to each other to form fully connected neural networks

## 1. Input layer
The input layer to a neural network takes numbers. All the input data is converted to numbers. Everything is a number.

Neural networks take numbers as vectors, matrices, or tensors.

Speaking of numbers, you might have heard terms like **normalization** or **standardization**. In **standardization**, the numbers are rescaled so that they are **centered around a mean of zero** and have a **standard deviation of one**.

Standardization is basically a button to push, and it doesn’t even need a lever, so there are no parameters to set.
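As a minimal sketch of what standardization does to one feature column, the following uses only the Python standard library. The function name `standardize` and the sample square-footage values are illustrative assumptions, not part of the housing example itself.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical square-footage column from a housing dataset.
square_feet = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
scaled = standardize(square_feet)
print(scaled)
```

Note that the function takes no tuning arguments, which matches the "no lever" description above.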

Since TensorFlow 2.0, Keras has been built in and is the recommended model API, now referred to as TF.Keras. TF.Keras is based on object-oriented programming, with a collection of classes and their associated methods and properties.

Let’s start simply. Say we have a dataset of housing data. Each row has 14 columns
of data. One column indicates the sale price of a home. We are going to call that the
label. The other 13 columns have information about the house, such as the square
footage and property tax. It’s all numbers. We are going to call those the features. What
we want to do is learn to predict (or estimate) the label from the features.

In [2]:
import tensorflow as tf
from tensorflow import keras
from keras import Input

In [3]:
Input(shape=(13, ))

<KerasTensor: shape=(None, 13) dtype=float32 (created by layer 'input_1')>

The code above first imports the Keras module from TensorFlow and then instantiates an Input class object. For this class object, we define the shape (dimensions) of the input. In our example, the input is a one-dimensional array (a vector) of 13 elements, one for each feature.

This output shows you what Input(shape=(13,)) evaluates to. It produces a tensor object named input_1.

This name will be useful later when you debug your models. The `None` in the shape shows that the input object takes an unbounded number of entries (examples or rows) of 13 elements each.

That is, at runtime it will bind the number of one-dimensional vectors of 13 elements to the actual number of examples (rows) you pass in, referred to as the **(mini) batch size**. The dtype shows the default data type of the elements, which in this case is a 32-bit float (single precision).
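To illustrate how the unbounded first dimension gets bound to a batch size, here is a small sketch in plain Python. The helper `mini_batches` and the all-zero dataset are hypothetical stand-ins; Keras handles batching internally.

```python
def mini_batches(rows, batch_size):
    """Yield consecutive slices of rows, each holding at most batch_size examples."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# Hypothetical dataset: 10 examples with 13 features each.
dataset = [[0.0] * 13 for _ in range(10)]

# Each batch has shape (examples_in_batch, 13); only the first dimension varies.
shapes = [(len(batch), len(batch[0])) for batch in mini_batches(dataset, 4)]
print(shapes)  # [(4, 13), (4, 13), (2, 13)]
```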

## 2. Deep neural networks
A deep neural network (DNN) is a neural network that has one or more hidden layers between the input layer and the output layer.

Fully connected neural network (FCNN): every node in each layer is connected to every node in the next layer.

<img src="img.png">

## 3. Feed-forward networks
Deep neural networks (DNNs) and convolutional neural networks (CNNs) are known as feed-forward neural networks.

Feed-forward means that data moves through the network sequentially, in one direction, from the input layer to the output layer.

This is analogous to a function call in procedural programming: the inputs are passed as parameters (the input layer), the function performs a sequenced set of actions based on the inputs (the hidden layers), and it then outputs a result (the output layer).
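The one-directional flow can be sketched as ordinary function composition. The names `feed_forward`, `double_all`, and `total` are hypothetical; the "layers" here are plain Python functions, not real network layers.

```python
def feed_forward(inputs, layer_functions):
    """Pass inputs through each layer in order; data never flows backward."""
    values = inputs
    for layer in layer_functions:
        values = layer(values)
    return values

# Hypothetical stand-ins for layers: each one is just a function on a list.
def double_all(values):
    return [2 * v for v in values]

def total(values):
    return [sum(values)]

result = feed_forward([1.0, 2.0, 3.0], [double_all, total])
print(result)  # [12.0]
```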


## 4. Sequential API method
The sequential API method is easier to read and follow for beginners, but the tradeoff is
that it is less flexible.

Essentially, you create an empty feed-forward neural network with the Sequential class object, and then “add” one layer at a time, until the output
layer.

In [5]:
from keras import Sequential
from keras import layers
model = Sequential(name="my_sequential")

In [6]:
model.add(layers.Dense(2, activation="relu", name="layer1"))

In [7]:
model.add(layers.Dense(3, activation="relu", name="layer2"))

Alternatively, the layers can be specified in sequential order as a list passed as a parameter when instantiating the Sequential class object:

In [20]:
model2 = Sequential([
    layers.Dense(2, activation="relu", name="layer1"),
    layers.Dense(3, activation="relu", name="layer2"),
])

In [10]:
model

<keras.engine.sequential.Sequential at 0x17d633143d0>

In [21]:
model2

<keras.engine.sequential.Sequential at 0x17d6332c4c0>

If I am writing code for production, I use the sparser list method, where I can visualize and edit the code more easily.

In [24]:
# input_shape includes the batch dimension: (batch_size, features)
model2.build(input_shape=(3, 3))
model2.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer1 (Dense)              (3, 2)                    8         
                                                                 
 layer2 (Dense)              (3, 3)                    9         
                                                                 
Total params: 17
Trainable params: 17
Non-trainable params: 0
_________________________________________________________________


In [25]:
model.build(input_shape=(3,3))
model.summary()

Model: "my_sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer1 (Dense)              (3, 2)                    8         
                                                                 
 layer2 (Dense)              (3, 3)                    9         
                                                                 
Total params: 17
Trainable params: 17
Non-trainable params: 0
_________________________________________________________________


## 5. Functional API method
The functional API method is more advanced, allowing you to construct models that are nonsequential in flow—such as branches, skip links, and multiple inputs and outputs.

You build the layers separately and then tie them together. This latter step gives you the freedom to connect layers in creative ways.


Essentially, for a feed-forward neural network, you create the layers, bind them to another layer or layers, and then pull all the layers together in
a final instantiation of a Model class object.

In [26]:
from keras import Model

In [27]:
inputs = Input((28, 28, 1))  # named `inputs` to avoid shadowing the built-in input()

In [31]:
x = layers.Conv2D(filters=16, kernel_size=3, activation="relu")(inputs)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(16, 3, activation="relu")(x)
encoder_output = layers.GlobalMaxPooling2D()(x)

In [32]:
model = Model(inputs, encoder_output)

In [33]:
model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_4 (Conv2D)           (None, 26, 26, 16)        160       
                                                                 
 conv2d_5 (Conv2D)           (None, 24, 24, 32)        4640      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 8, 8, 32)         0         
 2D)                                                             
                                                                 
 conv2d_6 (Conv2D)           (None, 6, 6, 32)          9248      
                                                                 
 conv2d_7 (Conv2D)           (None, 4, 4, 16)          4624      
                                                           

<img src="img_1.png">
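The output shapes and parameter counts in the summary above can be checked by hand. The sketch below assumes the Keras defaults used in this example: no padding and a stride of 1 for Conv2D, and a pooling stride equal to the pool size for MaxPooling2D. The helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial output size of a convolution or pooling window without padding."""
    return (size - kernel) // stride + 1

def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel x kernel x in_channels per filter) plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size = 28
size = conv_output_size(size, 3)            # Conv2D(16, 3)   -> 26
size = conv_output_size(size, 3)            # Conv2D(32, 3)   -> 24
size = conv_output_size(size, 3, stride=3)  # MaxPooling2D(3) -> 8
size = conv_output_size(size, 3)            # Conv2D(32, 3)   -> 6
size = conv_output_size(size, 3)            # Conv2D(16, 3)   -> 4

params = [
    conv2d_params(3, 1, 16),   # 160
    conv2d_params(3, 16, 32),  # 4640
    conv2d_params(3, 32, 32),  # 9248
    conv2d_params(3, 32, 16),  # 4624
]
print(size, params)
```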

Model = Input + Output

Steps:
1. Construct the input.
2. Construct the hidden layers, **passing the input into the first hidden layer**.
3. Construct the output.
4. Tie them together: `model = Model(inputs, outputs)`

## 6. Input shape vs. input layer
The input shape and input layer can be confusing at first. They are not the same thing.

More specifically, the number of nodes in the input layer does not need to match the shape of the input vector. That’s because every element in the input vector will be passed to every node in the input layer.

<img src="img_2.png">

Each connection between an element in the input vector and a node in the input layer has a weight, and each node in the input layer has a bias.

The weights and biases are what the neural network will “learn” during training. The weights and biases are also referred to as parameters. These values stay with the model after it is trained. This operation will otherwise be invisible to you.
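As a rough illustration of what happens inside such a layer, the sketch below passes a 13-element input vector to 10 nodes. The function `dense_forward` and the random initial weights are assumptions for illustration; a real framework does this with optimized tensor operations.

```python
import random

def dense_forward(inputs, weights, biases):
    """Each node computes a weighted sum of every input element plus its own bias."""
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + b
        for node_weights, b in zip(weights, biases)
    ]

n_inputs, n_nodes = 13, 10
random.seed(0)  # fixed seed so the sketch is repeatable
# One weight per (node, input element) connection, and one bias per node.
weights = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_nodes)]
biases = [0.0] * n_nodes

outputs = dense_forward([1.0] * n_inputs, weights, biases)
print(len(outputs))  # 10
```

The input has 13 elements but the layer produces 10 values, one per node, which is why the input shape and the layer size do not need to match.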

## 7. Dense Layers
In TF.Keras, layers in an FCNN are called dense layers. A dense layer has n nodes and is fully connected to the previous layer.

In this example, we are going to use a neural network as a regressor, which means the neural network will output a single real number:

1. Input layer = 10 nodes
2. Hidden layer = 10 nodes
3. Output layer = 1 node

*7.1 Sequential API*

In [43]:
model = Sequential([
    layers.Dense(10, input_shape=(13,)),
    layers.Dense(10),
    layers.Dense(1)
])
model.summary()

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_9 (Dense)             (None, 10)                140       
                                                                 
 dense_10 (Dense)            (None, 10)                110       
                                                                 
 dense_11 (Dense)            (None, 1)                 11        
                                                                 
Total params: 261
Trainable params: 261
Non-trainable params: 0
_________________________________________________________________


*7.2 Functional API*

In [44]:
input_f = Input((13,))

In [45]:
x = layers.Dense(10)(input_f)
hidden_layers = layers.Dense(10)(x)
outputs = layers.Dense(1)(hidden_layers)
model = Model(input_f, outputs)

In [46]:
model.summary()

Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_5 (InputLayer)        [(None, 13)]              0         
                                                                 
 dense_12 (Dense)            (None, 10)                140       
                                                                 
 dense_13 (Dense)            (None, 10)                110       
                                                                 
 dense_14 (Dense)            (None, 1)                 11        
                                                                 
Total params: 261
Trainable params: 261
Non-trainable params: 0
_________________________________________________________________


<img src="img_3.png">
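The parameter counts in both summaries follow from the layer sizes: each dense layer has one weight per connection to the previous layer and one bias per node. A small sketch, with the hypothetical helper `dense_param_counts`:

```python
def dense_param_counts(layer_sizes):
    """Parameters per dense layer: inputs * nodes weights, plus one bias per node."""
    return [
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]

# 13 input features, then dense layers of 10, 10, and 1 nodes.
counts = dense_param_counts([13, 10, 10, 1])
print(counts, sum(counts))  # [140, 110, 11] 261
```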

## 8. Activation functions
When training or predicting (via inference), each node in a layer will output a value to the nodes in the next layer. We don’t always want to pass the value as-is; sometimes we want to change the value in a particular manner. The function that makes this change is called an activation function.

Think of a function that returns a result, like `return result`. In the case of an activation function, instead of returning `result`, we would return the result of passing the `result` value to another (activation) function, like `return A(result)`, where `A()` is the activation function. Conceptually, you can think of this as follows:

def layer(params):
     """ inside are the nodes """
     result = some_calculations
     return A(result)
def A(result):
     """ modifies the result """
     return some_modified_value_of_result

Activation functions assist neural networks in learning faster and better. By default, when no activation function is specified, the values from one layer are passed as-is (unchanged) to the next layer. The most basic activation function is a step function: if the value is greater than 0, it outputs a 1; otherwise, it outputs a 0. The step function hasn’t been used in a long, long time.

Activation functions assist in finding the nonlinear separations and corresponding clustering of nodes within input sequences, which then learn the (near) linear relationship to the output. Most of the time, you will use three activation functions: the rectified linear unit (ReLU), sigmoid, and softmax.

The ReLU is generally used between layers. While early researchers used different activation functions (such as a hyperbolic tangent) between layers, researchers found that the ReLU produced the best result in training a model.
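For reference, the activation functions mentioned above can be written in a few lines of plain Python. This is a sketch of their standard mathematical definitions operating on single values or lists, not the TF.Keras implementations.

```python
import math

def step(x):
    """Outputs 1 when x is greater than 0, otherwise 0."""
    return 1.0 if x > 0 else 0.0

def relu(x):
    """Passes positive values through unchanged and clips negatives to 0."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes any real value into the interval (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # for negative x, this form avoids overflow
    return e / (1.0 + e)

def softmax(values):
    """Turns a list of scores into probabilities that sum to 1."""
    peak = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

print(step(-2.0), relu(-2.0), relu(3.0), sigmoid(0.0))  # 0.0 0.0 3.0 0.5
print(softmax([1.0, 1.0]))  # [0.5, 0.5]
```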

In [48]:
input_f = Input((28,28,1))
X = layers.Dense(10)(input_f)
X = layers.ReLU()(X)
X = layers.Dense(10)(X)
X = layers.ReLU()(X)
X = layers.Dense(1)(X)
model = Model(input_f, X)
model.summary()

Model: "model_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_7 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 dense_17 (Dense)            (None, 28, 28, 10)        20        
                                                                 
 re_lu_2 (ReLU)              (None, 28, 28, 10)        0         
                                                                 
 dense_18 (Dense)            (None, 28, 28, 10)        110       
                                                                 
 re_lu_3 (ReLU)              (None, 28, 28, 10)        0         
                                                                 
 dense_19 (Dense)            (None, 28, 28, 1)         11        
                                                                 
Total params: 141
Trainable params: 141
Non-trainable params: 0

## 9. Shorthand syntax
TF.Keras provides a shorthand syntax when specifying layers. You don’t actually need
to separately specify activation functions between layers, as we did in the previous
example. Instead, you can specify the activation function as a (keyword) parameter
when instantiating a Dense layer.

In [49]:
model = Sequential([
    layers.Dense(10, input_shape=(28, 28, 1), activation="relu"),
    layers.Dense(10, activation="relu"),
    layers.Dense(1)
])
model.summary()

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_20 (Dense)            (None, 28, 28, 10)        20        
                                                                 
 dense_21 (Dense)            (None, 28, 28, 10)        110       
                                                                 
 dense_22 (Dense)            (None, 28, 28, 1)         11        
                                                                 
Total params: 141
Trainable params: 141
Non-trainable params: 0
_________________________________________________________________


## 10. Improving accuracy with an optimizer

Once you’ve completed building the feed-forward portion of your neural network, you need to add a few things for training the model. This is done with the compile() method. This step adds the backward propagation during training.

Each time we send data (or a batch of data) forward through the neural network, it calculates the errors in the predicted results (known as the loss) from the actual values (called labels) and uses that information to incrementally adjust the weights and biases of the nodes. This, for a model, is the process of learning.

The calculation of the error, as I’ve said, is called a loss. It can be calculated in many ways. Since we designed our example neural network to be a regressor (meaning that the output, house price, is a real value), we want to use a loss function that is best suited for a regressor. Generally, for this type of neural network, we use the mean squared error method of calculating a loss. In Keras, the compile() method takes a (keyword) parameter `loss` used to specify how we want to calculate the loss. We are going to pass it the value `'mse'` (for mean squared error).
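To make the mean squared error concrete, here is a minimal sketch of the calculation in plain Python. The house prices and predictions are made-up values for illustration.

```python
def mean_squared_error(labels, predictions):
    """Average of the squared differences between labels and predictions."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)

# Hypothetical house prices (in thousands) and a model's estimates.
labels = [200.0, 310.0, 150.0]
predictions = [210.0, 300.0, 150.0]
loss = mean_squared_error(labels, predictions)
print(loss)
```

Because the differences are squared, large errors are penalized more heavily than small ones, and the sign of the error does not matter.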

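The compile() method also takes a (keyword) parameter `optimizer`, which specifies how the weights and biases are adjusted during backward propagation. For our regressor neural network, we will use the rmsprop method (root mean square propagation):

model.compile(loss='mse', optimizer='rmsprop')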
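To give a feel for what the rmsprop optimizer does with each gradient, the sketch below applies one common formulation of the update rule to a single weight. The function name `rmsprop_step`, the toy loss, and the hyperparameter values are assumptions for illustration; the exact TF.Keras implementation may differ in details such as where epsilon is added.

```python
import math

def rmsprop_step(weight, grad, avg_sq_grad, learning_rate=0.001, rho=0.9, epsilon=1e-7):
    """One RMSprop update for a single weight; returns (new_weight, new_average)."""
    # Exponentially decaying average of squared gradients.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
    # Divide the step by the root of that average, so large gradients are damped.
    weight = weight - learning_rate * grad / (math.sqrt(avg_sq_grad) + epsilon)
    return weight, avg_sq_grad

# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
w, avg = 0.0, 0.0
for _ in range(2000):
    w, avg = rmsprop_step(w, 2.0 * (w - 3.0), avg, learning_rate=0.01)
print(round(w, 2))
```

The weight moves toward the minimum at 3 in steps of roughly the learning rate, regardless of how large the raw gradient is.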

