# Welcome to Deep Learning! #

In this course, you'll learn how to:

- do deep learning for **regression** and **classification**
- design **neural network architectures**
- navigate the **loss landscape**
- master **stochastic gradient descent**
- solve real world problems

You'll be prepared for deep learning if you've taken our *Introduction to Machine Learning* course.

Let's get started!

# The Linear Unit #

A single neuron with one input looks like:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/xxS8rzf.png" width="250" alt="Diagram of a linear unit.">
<figcaption style="textalign: center; font-style: italic"><center>The Linear Unit
</center></figcaption>
</figure>

When reading this diagram, think of the computation as flowing from left to right. The numbers on the connections we call **weights** and the values that flow from input to output we call **activations**. Notice that this neuron has a constant input of 1 attached; its connection has a special weight called the **bias**. This neuron has two weights, `w` and `b`.

The rule is that whenever an activation flows through a connection, you multiply it by the weight, and to get the output of the unit you just sum up all of the inputs. So, this unit computes a function like $y = w x + b$, or in Python `output = w * input + b`.

# Example #

Say the weights on our neuron happened to be `w=3` and `b=2`. What would we get if we plug in `x=-4`?

<figure style="padding: 1em;">
<img src="https://i.imgur.com/.png" width="300" alt="Diagram of neural computation.">
<figcaption style="textalign: center; font-style: italic"><center>Computing with the linear unit.
</center></figcaption>
</figure>

This agrees with our formula: $y = 3(-4) + 2 = -10$.

(By the way, running all of your training data through a network like this is sometimes called doing the *forward pass*.)
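
The computation above can be sketched in a few lines of plain Python. The function name `linear_unit` and the extra input values are illustrative choices, not part of the course material.

```python
def linear_unit(x, w, b):
    """Compute the output of a linear unit with one input: y = w * x + b."""
    return w * x + b

# Weights from the example above
w, b = 3, 2

# A single input
y = linear_unit(-4, w, b)
print(y)  # -10

# A small "forward pass": run several inputs through the same unit
# (the values 0 and 1.5 are made up for illustration)
inputs = [-4, 0, 1.5]
outputs = [linear_unit(x, w, b) for x in inputs]
print(outputs)  # [-10, 2, 6.5]
```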

# A Linear Unit Fits a Line #

Most of the problems we'll work on in this course will be *curve-fitting* problems. Given some data points, we want to draw a curve that runs through the points as closely as possible. (These are also called *regression* problems.)

What kind of curve does a linear unit fit? Does the formula $y=w x + b$ look familiar? It's an equation of a line! It's the slope-intercept equation, where $w$ is the slope and $b$ is the y-intercept. That's why we call it the <em>linear</em> unit.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/.png" width="300" alt="The untrained linear unit (left) and the trained linear unit (right).">
<figcaption style="textalign: center; font-style: italic"><center><strong>Left: </strong>The untrained linear unit. <strong>Right: </strong>The trained linear unit.
</center></figcaption>
</figure>

When first created, a neuron has its weights set randomly. The goal of training is to find values for the weights that fit the curve. For our linear unit, we're trying to find the best slope ($w$) and y-intercept ($b$).
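
To make the idea of "finding the best slope and intercept" concrete, here is a minimal sketch that starts from random weights and nudges them toward a good fit using plain gradient descent on the mean squared error. The data, the learning rate, and the number of steps are all assumptions chosen for illustration; this is not the training procedure Keras uses internally.

```python
import numpy as np

# Made-up data lying exactly on the line y = 2x + 1
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

# Start from random weights, as a freshly created neuron would
rng = np.random.default_rng(0)
w, b = rng.normal(size=2)

learning_rate = 0.1  # illustrative value
for _ in range(2000):
    error = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Move each weight a small step against its gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# The learned values should end up close to the true slope 2 and intercept 1
print(round(float(w), 3), round(float(b), 3))
```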

# Multiple Inputs #

What if we wanted to fit a curve to more than one input? That's easy enough. We can just add more input connections to the neuron. To find the output, multiply each input by its connection weight and then add them all together.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/.png" width="300" alt="Three input connections: x0, x1, and x2, along with the bias.">
<figcaption style="textalign: center; font-style: italic"><center>A linear unit with three inputs.
</center></figcaption>
</figure>

The formula for this neuron would be $y = w_0 x_0 + w_1 x_1 + w_2 x_2 + b$. A linear unit with two inputs will fit a plane, and a unit with more inputs than that will fit a *hyperplane*.
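
With several inputs, the weighted sum is a dot product between the weight vector and the input vector, plus the bias. The sketch below uses NumPy for a unit with three inputs; the weights and input values are hypothetical.

```python
import numpy as np

def linear_unit(x, w, b):
    """Output of a linear unit with several inputs: the dot product of w and x, plus b."""
    return float(np.dot(w, x) + b)

# Hypothetical weights, bias, and inputs for a unit with three inputs
w = np.array([0.5, -1.0, 2.0])
b = 1.0
x = np.array([2.0, 3.0, 0.5])

y = linear_unit(x, w, b)
print(y)  # 0.5*2.0 + (-1.0)*3.0 + 2.0*0.5 + 1.0 = 0.0
```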

# Example - Red Wine Quality #

Now let's see this in action! Our goal in this example will be to predict the perceived quality of a wine (on a scale of 3-8) given its *residual sugar* content, which is the amount of grape sugar remaining after fermentation. High levels of residual sugar make a wine *sweet* while low levels make it *dry*. The data is from the *Red Wine Quality* dataset.

In [None]:
#$HIDE$
import pandas as pd

red_wine = pd.read_csv('../input/dl-course-data/dl-course-data/red-wine.csv')

# Take a random 70% sample of the rows as the training split
df_train = red_wine.sample(frac=0.7, random_state=0)

# Neural networks perform best when the data is on a common scale.
# We will rescale each feature into the interval [0, 1].
max_ = df_train.max(axis=0)
min_ = df_train.min(axis=0)
df_train = (df_train - min_) / (max_ - min_)

# Split feature and target
x_train = df_train['residual sugar']
y_train = df_train['quality']
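
To see what the rescaling step does, here is a small sketch of the same min-max scaling applied to a tiny made-up table. The values are invented for illustration and are not taken from the wine dataset.

```python
import pandas as pd

# A tiny made-up frame standing in for the wine data
df = pd.DataFrame({
    "residual sugar": [1.0, 2.0, 5.0],
    "quality": [3, 5, 8],
})

# Min-max scaling: subtract each column's minimum, then divide by its range
min_ = df.min(axis=0)
max_ = df.max(axis=0)
scaled = (df - min_) / (max_ - min_)

# Every column now runs from 0 (its old minimum) to 1 (its old maximum)
print(scaled)
```

If you later scale a validation split, the usual practice is to reuse the minimum and maximum computed from the training split so that both splits share the same scale.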

In Keras, you can create a model with a single linear unit using what's called a `Dense` layer. Most neural networks are built by stacking layers of neurons that connect in a particular way, which we'll learn about in Lesson 2. (Stacking layers is what `Sequential` does.)

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

# Create a network with 1 linear unit
model = keras.Sequential([
    layers.Dense(units=1)
])

When first created, a model has its weights initialized randomly. We'll need to fit it to the training data before we make predictions, which is what this next hidden cell will do. Take a look now if you like, but we'll go over the training process fully in Lesson 3.

In [None]:
#$HIDE$
# Add the optimizer and loss function
model.compile(
    optimizer='sgd',
    loss='mse',
)

# Fit the network to the training data
history = model.fit(
    x=x_train,
    y=y_train,
    batch_size=256,
    epochs=50,
    verbose=0,
)

<blockquote style="margin-right:auto; margin-left:auto; background-color: #ebf9ff; padding: 1em; margin:24px;">
    <strong>Why Not Just Use Scikit-Learn?</strong><br>
If you've studied machine learning much, you've probably come across linear regression. A single `Dense(units=1)` layer, in fact, creates a linear regression model equivalent to those in scikit-learn. So why go to all the trouble?

It's always good to start your model development with the simplest model possible. First, this can help you check that there aren't any bugs in the rest of your code. Second, it gives you a solid baseline to start from. You know how well linear regression does -- can deep learning do better?
</blockquote>
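
As a point of comparison, a linear regression baseline can also be computed directly with ordinary least squares. The sketch below uses NumPy's `np.linalg.lstsq` on made-up data; it is meant only to illustrate the kind of baseline a single linear unit should be able to match, not to reproduce any particular library's implementation.

```python
import numpy as np

# Made-up, noise-free data on the line y = 2x + 1
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

# Design matrix: one column for the input and a column of ones,
# which plays the role of the constant input attached to the bias
A = np.column_stack([x, np.ones_like(x)])

# Least-squares solution for the slope w and intercept b
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# The recovered values should be close to the true slope 2 and intercept 1
print(round(float(w), 3), round(float(b), 3))
```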

# Conclusion #
