##### Copyright 2018 The TensorFlow Authors.

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# A hand-waving introduction to ANN

PS: Much of this has been inspired by the recent Udacity course: https://classroom.udacity.com/courses/ud187 . Some of the descriptions have also been copied from one of their notebooks.

# Diode Equation

Let's think about how William Shockley might have arrived at the diode equation. All he had were some measurements of $I_D$ for given values of $V_D$.

We already know that the equation he settled on is $I_D = I_O (e^{V_D/V_T} - 1)$, where $I_O$ is the reverse saturation current and $V_T$ is the thermal voltage (about 25 mV at room temperature).

Let's assume that we have readings at voltages such as $V_D$ = (0, 0.2, 0.4, 0.6, 0.7) V; the code below uses a slightly denser set of points.

Let's cheat and compute the values of $I_D$ from the equation. Then assume that we have been handed these two arrays of numbers and asked to play Shockley: find a model that can churn out values of $I_D$ for any value of $V_D$.

In [0]:
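import matplotlib as mp
import matplotlib.pyplot  # importing the submodule makes mp.pyplot available
import numpy as np

I_o = 3.2e-12  # reverse saturation current, in A
V_T = 0.025    # thermal voltage, in V
# Diode voltages, in V
v_d = np.array([0, 0.1, 0.2, 0.22, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8], dtype=float)
# Diode currents from the Shockley equation, in A
i_d = I_o * (np.exp(v_d / V_T) - 1)
print(i_d)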


The first step in any machine learning activity is to check what the data looks like. Don't forget that a lot of our intuition is built on vision. (We will talk more about this later.) This is exactly why methods like principal component analysis, which project data down to a few dimensions that we can plot, are so powerful.

So let's plot it.

In [0]:
import matplotlib.pyplot as plt
plt.plot(v_d, i_d)

So, from the plot, what do you think the equation might be? Maybe it's quadratic? Maybe it's exponential? If it's exponential, it will look like $I_D = x (e^{yV_D} + p)$. To make this trick work we need to find the three **parameters** $x$, $y$ and $p$, and that's why such methods are called, you guessed it, **parametric methods** (we will discuss this more later!).

How would you find these parameters? What are the possible issues and benefits of this method?
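
One possible way to find such parameters, sketched here under a simplifying assumption that is not part of the text above: fix $p = -1$ and use only the readings with $V_D > 0$, where $e^{yV_D} \gg 1$, so that $\ln I_D \approx \ln x + y V_D$ is roughly a straight line. An ordinary least-squares line fit (`np.polyfit`) then gives $y$ as the slope and $\ln x$ as the intercept. The names `x_fit` and `y_fit` are illustrative.

```python
import numpy as np

# Same synthetic readings as in the cell above.
I_o = 3.2e-12
v_d = np.array([0, 0.1, 0.2, 0.22, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8], dtype=float)
i_d = I_o * (np.exp(v_d / 0.025) - 1)

# Assumption: p = -1, and for V_D > 0 the "-1" is small next to the exponential,
# so ln(I_D) is approximately ln(x) + y * V_D.
mask = v_d > 0  # drop V_D = 0, where I_D = 0 and the logarithm is undefined
slope, intercept = np.polyfit(v_d[mask], np.log(i_d[mask]), deg=1)

y_fit = slope              # estimate of y, which plays the role of 1 / V_T
x_fit = np.exp(intercept)  # estimate of x, which plays the role of I_O
print("x is roughly {:.3g} A, y is roughly {:.3g} per volt".format(x_fit, y_fit))
```

This only works because we guessed the right functional form in advance, which is both the main benefit and the main risk of a parametric method.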

Now let's jump to the other type of trick, where we do not impose any "model" on the data. Rather, we expect the data to lead the way!

**In God we trust; all others must bring data!** (attributed to the late W. Edwards Deming, a pioneer of quality management)

Let's build a neural network to model these two arrays of numbers.

## Import dependencies

First, import TensorFlow. Here, we're calling it `tf` for ease of use. We also tell it to only display errors.



In [0]:
from __future__ import absolute_import, division, print_function
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)  # tf.logging was removed in TensorFlow 2.x

import numpy as np

## Set up training data

As we saw before, supervised machine learning is all about figuring out an algorithm given a set of inputs and outputs. Since the task here is to create a model that gives the diode current in amperes when given the diode voltage in volts, we use the two arrays `v_d` and `i_d` created above to train our model.

### Some Machine Learning terminology

 - **Feature**: The input(s) to our model. In this case, a single value, the diode voltage in V.

 - **Labels**: The output our model predicts. In this case, a single value, the diode current in A.
 
 - **Example**: A pair of inputs/outputs used during training. In our case, a pair of values from `v_d` and `i_d` at a specific index.
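
As a small illustration of these terms, the sketch below pairs each feature with its label to form training examples. It rebuilds the data with a shorter set of voltages so that it runs on its own; `examples` is just an illustrative name.

```python
import numpy as np

I_o = 3.2e-12
v_d = np.array([0, 0.2, 0.4, 0.6, 0.7], dtype=float)  # features: diode voltage in V
i_d = I_o * (np.exp(v_d / 0.025) - 1)                  # labels: diode current in A

# Each example is one (feature, label) pair taken at the same index.
examples = list(zip(v_d, i_d))
for voltage, current in examples:
    print("{:.2f} V -> {:.3e} A".format(voltage, current))
```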


## Create the model

Next, create the model. We will use the simplest possible model we can, a Dense network. As a first attempt, this network will have only a single layer, with a single neuron.

### Build a layer

We'll call the layer `l0` and create it by instantiating `tf.keras.layers.Dense` with the following configuration:

*   `input_shape=[1]`: This specifies that the input to this layer is a single value. That is, the shape is a one-dimensional array with one member. Since this is the first (and only) layer, that input shape is the input shape of the entire model. The single value is a floating point number, representing diode voltage.

*   `units=1`: This specifies the number of neurons in the layer. The number of neurons defines how many internal variables the layer has to try to learn how to solve the problem. Since this is the final layer, it is also the size of the model's output: a single float value representing diode current. (In a multi-layered network, the size and shape of the layer would need to match the `input_shape` of the next layer.)
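
To make "internal variables" concrete: a Dense layer with one input and `units=1` (and no activation) holds one weight and one bias and computes `output = weight * input + bias`. The NumPy sketch below mimics that calculation; the function name and the numbers are made up for illustration and are not trained values.

```python
import numpy as np

def dense_forward(x, weight, bias):
    """Mimic a Dense layer with one input, one unit and no activation."""
    x = np.asarray(x, dtype=float)
    return weight * x + bias

# Arbitrary illustrative values, not learned weights.
print(dense_forward([0.0, 0.5, 0.72], weight=2.0, bias=-1.0))
```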


In [0]:
l0 = tf.keras.layers.Dense(units=1, input_shape=[1])  
#l0 = tf.keras.layers.Dense(units=1, activation='elu', input_shape=[1])  

### Assemble layers into the model

Once layers are defined, they need to be assembled into a model. The Sequential model definition takes a list of layers as argument, specifying the calculation order from the input to the output.

This model has just a single layer, `l0`.
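
Conceptually, a sequential model just feeds the output of each layer into the next one, in list order. The plain-Python sketch below illustrates that idea only; it is not how Keras implements `Sequential`, and the two toy "layers" are hypothetical stand-ins.

```python
from functools import reduce

def sequential(layers):
    """Return a function that applies the given layers one after another."""
    def model(x):
        return reduce(lambda value, layer: layer(value), layers, x)
    return model

def scale(x):   # toy stand-in for a first layer
    return 3.0 * x

def shift(x):   # toy stand-in for a second layer
    return x + 1.0

toy_model = sequential([scale, shift])
print(toy_model(2.0))  # scale is applied first, then shift
```

Swapping the order of the list changes the result, which is why the list order matters.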

In [0]:
model = tf.keras.Sequential([l0])

**Note**

You will often see the layers defined inside the model definition, rather than beforehand:

```python
model = tf.keras.Sequential([
  tf.keras.layers.Dense(units=1, input_shape=[1])
])
```

## Compile the model, with loss and optimizer functions

Before training, the model has to be compiled. When compiled for training, the model is given:

- **Loss function**: A way of measuring how far off predictions are from the desired outcome. (The measured difference is called the "loss".)

- **Optimizer function**: A way of adjusting internal values in order to reduce the loss.


In [0]:
model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.Adam(0.1))

These are used during training (`model.fit()`, below) to first calculate the loss at each point, and then improve it. In fact, the act of calculating the current loss of a model and then improving it is precisely what training is.

During training, the optimizer function is used to calculate adjustments to the model's internal variables. The goal is to adjust the internal variables until the model (which is really a math function) mirrors the actual equation relating diode voltage to diode current.

TensorFlow uses numerical analysis to perform this tuning, and all this complexity is hidden from you, so we will not go into the details here. What is useful to know about these parameters is the following.

The loss function ([mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error)) and the optimizer ([Adam](https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/)) used here are standard for simple models like this one, but many others are available. It is not important to know how these specific functions work at this point.
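
For reference, mean squared error is simple enough to compute by hand: average the squared differences between predictions and labels. A minimal NumPy sketch with made-up numbers:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

labels = [1.0, 2.0, 3.0]
predictions = [1.0, 2.5, 2.0]
print(mean_squared_error(labels, predictions))
```

Because the errors are squared, the largest values dominate. In our data the currents near 0.8 V are many orders of magnitude larger than those near 0.1 V, so they are likely to dominate the loss.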

One part of the optimizer you may need to think about when building your own models is the learning rate (`0.1` in the code above). This is the step size taken when adjusting values in the model. If the value is too small, it will take too many iterations to train the model. If it is too large, the updates overshoot and accuracy goes down. Finding a good value often involves some trial and error, but it usually lies between 0.001 (the default) and 0.1.
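
The effect of the step size can be seen on a toy problem. The sketch below minimizes $f(w) = w^2$ with plain gradient descent, which is a simplification: Adam adapts its steps and behaves differently, so treat this only as an illustration of "too small" versus "too large".

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; the gradient is 2*w."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return w

for lr in (0.001, 0.1, 1.1):
    print("learning rate {}: w after 50 steps = {:.3g}".format(lr, gradient_descent(lr)))
```

With the smallest rate the value has barely moved away from its starting point of 1, with 0.1 it ends up close to the minimum at 0, and with 1.1 every step overshoots and the value grows instead of shrinking.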

## Train the model

Train the model by calling the `fit` method. 

During training, the model takes in `v_d` values, performs a calculation using the current internal variables (called "weights") and outputs values which are meant to be the `i_d` values. Since the weights are initially set randomly, the output will not be close to the correct value. The difference between the actual output and the desired output is calculated using the loss function, and the optimizer function directs how the weights should be adjusted.

This cycle of calculate, compare, adjust is controlled by the `fit` method. The first argument is the inputs, the second argument is the desired outputs. The `epochs` argument specifies how many times this cycle should be run, and the `verbose` argument controls how much output the method produces.
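
The calculate, compare, adjust cycle can be written out by hand for a one-neuron linear model. The sketch below uses plain gradient descent on the mean squared error rather than Adam, and starts from zero weights instead of random ones, so it is a simplified stand-in for what `fit` does, not a reproduction of it.

```python
import numpy as np

I_o = 3.2e-12
v_d = np.array([0, 0.1, 0.2, 0.22, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8], dtype=float)
i_d = I_o * (np.exp(v_d / 0.025) - 1)

weight, bias = 0.0, 0.0   # simplification: start at zero instead of random values
learning_rate = 0.1
losses = []

for epoch in range(500):
    predictions = weight * v_d + bias              # calculate
    errors = predictions - i_d                     # compare
    losses.append(np.mean(errors ** 2))            # mean squared error for this epoch
    grad_weight = 2.0 * np.mean(errors * v_d)      # gradient of the loss w.r.t. the weight
    grad_bias = 2.0 * np.mean(errors)              # gradient of the loss w.r.t. the bias
    weight -= learning_rate * grad_weight          # adjust
    bias -= learning_rate * grad_bias

print("first loss: {:.4g}, last loss: {:.4g}".format(losses[0], losses[-1]))
```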

In [0]:
history = model.fit(v_d, i_d, epochs=500, verbose=False)
print("Finished training the model")

We will go into more detail later on what actually happens here and how a Dense layer works internally.

## Display training statistics

The `fit` method returns a history object. We can use this object to plot how the loss of our model goes down after each training epoch. 

We'll use [Matplotlib](https://matplotlib.org/) to visualize this (you could use another tool). Look at whether the loss drops quickly at first and then flattens out, and at what value it settles.



In [0]:
import matplotlib.pyplot as plt
plt.xlabel('Epoch Number')
plt.ylabel("Loss Magnitude")
plt.plot(history.history['loss'])

## Use the model to predict values

Now you have a model that has been trained to learn the relationship between `v_d` and `i_d`. You can use the `predict` method to have it calculate the diode current. Unfortunately, the error it converges to seems pretty bad. Let's try to use it anyway.

So, for example, if the diode voltage is 0.72 V, then we expect a diode current of about 10.3 A. Let's see what we get.
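
The 10.3 A figure comes straight from the diode equation with the constants used earlier ($I_O = 3.2 \times 10^{-12}$ A and $V_T = 0.025$ V). A quick check, with `diode_current` as an illustrative helper name:

```python
import math

I_o = 3.2e-12   # saturation current used earlier, in A
V_T = 0.025     # thermal voltage used earlier, in V

def diode_current(v_d):
    """Shockley diode equation."""
    return I_o * (math.exp(v_d / V_T) - 1)

print("{:.1f} A".format(diode_current(0.72)))
```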

In [0]:
print(model.predict(np.array([0.72])))

## Post-mortem

Is this good? Hmm. Let's run it another time... But the error still seems to be pretty high. What can we do now?

Note that the model we used is linear, and we are trying to learn something that is exponential! Can we not add some nonlinearity to the model? Try adding a nonlinear activation function to the model and see what happens.

The error seems to be going down. Should we increase the number of epochs? Let's try that as well.
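
For reference, here is what two common nonlinear activations compute, written in NumPy. `relu` and `elu` below are hand-written sketches of the standard definitions (with ELU's `alpha` fixed at 1 by default), not the Keras implementations.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return np.maximum(z, 0.0)

def elu(z, alpha=1.0):
    """Exponential linear unit: alpha * (exp(z) - 1) for negative inputs, identity otherwise."""
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0.0)) - 1))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))
print(elu(z))
```

Either one bends the straight line that a purely linear layer would produce, which is what gives a network a chance of following a curved relationship.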






## Looking at the layer weights

Finally, let's print the internal variables of the Dense layer. 

In [0]:
print("These are the layer variables: {}".format(l0.get_weights()))

### Some more experiments

Something is not going well. What can we do? Can we add more layers? 

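Recall the universal approximation theorem: a network with a single hidden layer and a nonlinear activation can approximate any continuous function on a bounded interval to any desired accuracy, provided the hidden layer has enough units. The theorem only promises that suitable weights exist, not that training will find them.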
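
To see why a single hidden layer can, in principle, follow the diode curve, the sketch below builds a ten-unit ReLU network by hand instead of training it. It picks the weights so that the network is the piecewise-linear curve through eleven points of the diode equation. This construction and the helper name `build_relu_interpolant` are illustrative assumptions; training with `fit` is not guaranteed to find weights like these.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def build_relu_interpolant(knots, values):
    """Return a one-hidden-layer ReLU network that passes through (knots, values)."""
    slopes = np.diff(values) / np.diff(knots)    # slope of each line segment
    out_weights = np.diff(slopes, prepend=0.0)   # change of slope at each knot
    hidden_bias = -knots[:-1]                    # unit k switches on at knots[k]
    out_bias = values[0]

    def net(x):
        x = np.asarray(x, dtype=float).reshape(-1, 1)
        hidden = relu(x + hidden_bias)           # hidden layer, input weights all 1
        return hidden @ out_weights + out_bias   # linear output layer

    return net

I_o, V_T = 3.2e-12, 0.025
knots = np.linspace(0.0, 0.8, 11)                # 11 knots give 10 hidden units
values = I_o * (np.exp(knots / V_T) - 1)
net = build_relu_interpolant(knots, values)
print(net([0.72]))  # 0.72 V is one of the knots
```

Between the knots the hand-built network is only a straight-line approximation, so more hidden units (more knots) would be needed for a closer fit.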

In [0]:
# One hidden layer with 10 ReLU units, followed by a linear output layer
l0 = tf.keras.layers.Dense(units=10, activation='relu', input_shape=[1])
l1 = tf.keras.layers.Dense(units=1)
model = tf.keras.Sequential([l0, l1])
model.compile(loss='mean_squared_error', optimizer=tf.keras.optimizers.Adam(0.1))
history = model.fit(v_d, i_d, epochs=800, verbose=False)
print("Finished training the model")
# The diode equation gives about 10.3 A at 0.72 V
print(model.predict(np.array([0.72])))
#print("These are the l0 variables: {}".format(l0.get_weights()))
#print("These are the l1 variables: {}".format(l1.get_weights()))
plt.xlabel('Epoch Number')
plt.ylabel("Loss Magnitude")
plt.plot(history.history['loss'])

## Play-around

We still did not get anything decent. But the theory says it should work. Try increasing the number of nodes in the hidden layer and see what happens. 

So which is better: the parametric approach or the data-centered ANN approach?

What else did you learn from this?