In [1]:
from tensorflow import keras
from tensorflow.keras import layers

# What is deep learning?
Some of the most impressive advances in AI have come from deep learning. Natural language translation, image recognition, and game playing are all tasks where deep learning models have been used.

Deep learning is an approach to machine learning characterized by deep stacks of computations. This depth of computation is what has enabled deep learning models to disentangle the kinds of complex and hierarchical patterns found in the most challenging real-world problems.

## The Linear Unit
A diagram of a neuron (or unit) with one input looks like:

![image.png](attachment:image.png)

The input is `x`. Its connection to the neuron has a weight, `w`. Whenever a value flows through a connection, you multiply the value by the connection's weight. For the input `x`, what reaches the neuron is `w*x`. A neural network learns by modifying its weights.

The `b` is a special kind of weight called the <b>bias</b>. The bias doesn't have any input data associated with it; instead, we put a `1` in the diagram so that the value that reaches the neuron is just `b` (since `1*b = b`). The bias enables the neuron to modify the output independently of its inputs.

The `y` is the value the neuron ultimately outputs. To get the output, the neuron sums up all the values it receives through its connections. This neuron's activation is `y = w*x + b`, or as a formula, $y = wx + b$.
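
As a minimal sketch of that formula in plain Python (the function name `linear_unit` and the values of `w`, `b`, and `x` are made up for illustration and are not part of Keras):

```python
def linear_unit(x, w, b):
    """Return the output of a linear unit with a single input: y = w*x + b."""
    return w * x + b

w = 2.5   # hypothetical weight on the connection
b = 90.0  # hypothetical bias
y = linear_unit(5.0, w, b)
print(y)  # 2.5 * 5.0 + 90.0 = 102.5
```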

A linear unit with 3 inputs:

![image.png](attachment:image.png)

The formula for this neuron is $y = w_0x_0 + w_1x_1 + w_2x_2 + b$.
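
The same idea extends to any number of inputs: multiply each input by its own weight, add the products, then add the bias. A small sketch, with invented input and weight values:

```python
def linear_unit(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias: y = w_0*x_0 + w_1*x_1 + ... + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

x = [1.0, 2.0, 3.0]   # hypothetical input features x_0, x_1, x_2
w = [0.5, -1.0, 2.0]  # hypothetical weights w_0, w_1, w_2
b = 4.0               # hypothetical bias

y = linear_unit(x, w, b)
print(y)  # 0.5 - 2.0 + 6.0 + 4.0 = 8.5
```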

In [2]:
# linear unit in Keras
# Create a network with 1 linear unit

model = keras.Sequential([
    layers.Dense(units=1, input_shape=[3])
])

With the first argument, `units`, we define how many outputs we want. In this case we are just predicting 'calories', so we'll use `units=1`.

With the second argument, `input_shape`, we tell Keras the dimensions of the inputs. Setting `input_shape=[3]` ensures the model will accept three features as input ('sugars', 'fiber', and 'protein').

This model is now ready to be fit to training data!

<b>Why is input_shape a Python list?</b>

The data we'll use in this course will be tabular data, like in a Pandas dataframe. We'll have one input for each feature in the dataset. The features are arranged by column, so we'll always have `input_shape=[num_columns]`. The reason Keras uses a list here is to permit use of more complex datasets. Image data, for instance, might need three dimensions: `[height, width, channels]`.

In [3]:
a = model.weights

In [4]:
a

[<tf.Variable 'dense/kernel:0' shape=(3, 1) dtype=float32, numpy=
 array([[0.7680495 ],
        [0.05747128],
        [1.1977309 ]], dtype=float32)>,
 <tf.Variable 'dense/bias:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]

Do you see how there's one weight for each input (and a bias)? Notice though that there doesn't seem to be any pattern to the values the weights have. Before the model is trained, the weights are set to random numbers (and the bias to 0.0). A neural network learns by finding better values for its weights.
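
To make the "random numbers" concrete: by default, a Keras `Dense` layer draws its kernel with the Glorot uniform initializer and sets the bias to zeros. The sketch below imitates that scheme in plain Python for a layer with three inputs and one unit. The helper name `glorot_uniform_init` is hypothetical, and the numbers it draws will not match the output shown above.

```python
import math
import random

def glorot_uniform_init(fan_in, fan_out, rng):
    """Imitate Glorot uniform initialization: weights in [-limit, limit], bias zeros."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    kernel = [[rng.uniform(-limit, limit) for _ in range(fan_out)]
              for _ in range(fan_in)]
    bias = [0.0] * fan_out
    return kernel, bias, limit

rng = random.Random(0)  # fixed seed so the sketch is repeatable
kernel, bias, limit = glorot_uniform_init(fan_in=3, fan_out=1, rng=rng)

print(len(kernel), len(kernel[0]))  # 3 1, the same layout as shape=(3, 1)
print(bias)                         # [0.0]
print(round(limit, 4))              # 1.2247
```

With three inputs and one unit, the limit is about 1.22, which is consistent with the three kernel values printed above.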

# Deep Neural Networks
Here we will see how to build neural networks capable of learning the complex kinds of relationships that deep neural nets can model.

The key idea here is <i>modularity</i>: building up a complex network from simpler functional units. We've seen how a linear unit computes a linear function; now we will see how to combine and modify these single units to model more complex relationships.

## Layers
Neural networks typically organize their neurons into layers. When we collect together linear units having a common set of inputs we get a dense layer.

![Screen%20Shot%202021-08-30%20at%2012.49.28%20AM.png](attachment:Screen%20Shot%202021-08-30%20at%2012.49.28%20AM.png)
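
A dense layer can be sketched in plain Python as several linear units that all read the same inputs. The function name `dense_layer` and all values below are illustrative assumptions, not Keras internals:

```python
def dense_layer(inputs, weights, biases):
    """Apply one linear unit per row of `weights`; every unit sees the same inputs."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + b
        for unit_weights, b in zip(weights, biases)
    ]

x = [1.0, 2.0]            # two input features
weights = [[1.0, 0.0],    # weights of unit 0
           [0.0, 1.0],    # weights of unit 1
           [1.0, 1.0]]    # weights of unit 2
biases = [0.0, 0.5, -1.0]

print(dense_layer(x, weights, biases))  # [1.0, 2.5, 2.0]
```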

You could think of each layer in a neural network as performing some kind of relatively simple transformation. Through a deep stack of layers, a neural network can transform its inputs in more and more complex ways. In a well-trained neural network, each layer is a transformation getting us a little bit closer to a solution.

### Many Kinds of Layers
A <b>layer</b> in Keras is a very general kind of thing. A layer can be, essentially, any kind of data transformation. Many layers, like the convolutional and recurrent layers, transform data through the use of neurons and differ primarily in the pattern of connections they form. Others, though, are used for feature engineering or just simple arithmetic.

## The Activation Function
It turns out, however, that two dense layers with nothing in between are no better than a single dense layer by itself. Dense layers by themselves can never move us out of the world of lines and planes. What we need is something <b>nonlinear</b>: what we need are activation functions.

![Screen%20Shot%202021-08-30%20at%2012.56.32%20AM.png](attachment:Screen%20Shot%202021-08-30%20at%2012.56.32%20AM.png)
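
To see why stacking linear layers adds nothing, here is a small check with single-input units. Feeding one linear unit into another gives $w_2(w_1x + b_1) + b_2 = (w_2w_1)x + (w_2b_1 + b_2)$, which is again a single linear unit. The weights and biases below are made up for illustration:

```python
def linear(x, w, b):
    """A linear unit with one input."""
    return w * x + b

w1, b1 = 2.0, 1.0    # hypothetical first layer
w2, b2 = 3.0, -4.0   # hypothetical second layer

def stacked(x):
    """Two linear units in a row, with no activation in between."""
    return linear(linear(x, w1, b1), w2, b2)

# The equivalent single unit, from expanding w2*(w1*x + b1) + b2.
w_single = w2 * w1        # 6.0
b_single = w2 * b1 + b2   # -1.0

for x in [-2.0, 0.0, 1.5]:
    assert stacked(x) == linear(x, w_single, b_single)
    print(x, stacked(x))
```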

An <b>activation function</b> is simply some function we apply to each of the layer's outputs (its <i>activations</i>). The most common is the <i><b>rectifier function max(0, x)</b></i>.

![Screen%20Shot%202021-08-30%20at%2012.58.49%20AM.png](attachment:Screen%20Shot%202021-08-30%20at%2012.58.49%20AM.png)

The rectifier function has a graph that's a line with the negative part "rectified" to zero. Applying the function to the outputs of a neuron will put a <i>bend</i> in the data, moving us away from simple lines.

When we attach the rectifier to a linear unit, we get a <b>rectified linear unit</b> or <b>ReLU</b>. Applying the ReLU activation to a linear unit means the output becomes `max(0, w*x + b)`, which we might draw in a diagram like:

![Screen%20Shot%202021-08-30%20at%201.02.18%20AM.png](attachment:Screen%20Shot%202021-08-30%20at%201.02.18%20AM.png)
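
The rectifier and a rectified linear unit are short enough to write out directly. This is a plain-Python sketch with made-up weights, not how Keras implements `activation='relu'` internally:

```python
def relu(x):
    """The rectifier: negative values become zero, everything else passes through."""
    return max(0.0, x)

def relu_unit(x, w, b):
    """A linear unit followed by the rectifier: max(0, w*x + b)."""
    return relu(w * x + b)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5]])  # [0.0, 0.0, 0.0, 1.5]
print(relu_unit(1.0, w=2.0, b=-3.0))  # max(0, -1.0) = 0.0
print(relu_unit(2.0, w=2.0, b=-3.0))  # max(0, 1.0) = 1.0
```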

## Stacking Dense Layers
Now that we have some nonlinearity, let's see how we can stack layers to get complex data transformations.

![Screen%20Shot%202021-08-30%20at%201.07.34%20AM.png](attachment:Screen%20Shot%202021-08-30%20at%201.07.34%20AM.png)

The layers before the output are sometimes called <b>hidden</b> since we never see their outputs directly.

Now, notice that the final output layer is a linear unit (meaning, no activation function). That makes this network <b>appropriate to a regression task</b>, where we are trying to predict some arbitrary numeric value. Other tasks (like classification) might require an activation function on the output.

## Building Sequential Models
The `Sequential` model we've been using connects a list of layers in order from first to last: the first layer gets the input, and the last layer produces the output. This creates the model.

In [5]:
model = keras.Sequential([
    # the hidden ReLU layers
    layers.Dense(units=4, activation='relu', input_shape=[2]),
    layers.Dense(units=3, activation='relu'),
    # the linear output layer
    layers.Dense(units=1),
])
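
As a rough sanity check on this architecture, we can count its trainable parameters by hand. Each dense unit has one weight per input plus one bias, so a layer with `units` units and `n_inputs` inputs has `n_inputs * units + units` parameters. The helper name below is hypothetical, and the total should agree with what `model.count_params()` reports for the model above.

```python
def dense_param_count(n_inputs, units):
    """One weight per (input, unit) pair, plus one bias per unit."""
    return n_inputs * units + units

# Input width followed by the number of units in each Dense layer above.
layer_sizes = [2, 4, 3, 1]

counts = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
print(counts)       # [12, 15, 4]
print(sum(counts))  # 31
```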