# Core TensorFlow fundamentals

With NumPy alone you can build ML and DL (neural network) models, but building complex DL networks requires a lot of computation and caching of intermediate results (weights, gradients).

Several frameworks (such as TensorFlow and PyTorch) keep track of computations and their gradients, which can speed up machine learning development significantly. These frameworks also often use parallel computing, which can reduce training time enormously.

https://medium.com/@camrongodbout/tensorflow-in-a-nutshell-part-one-basics-3f4403709c9d

### Setup notebook

In [None]:
import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops

%matplotlib inline
np.random.seed(1)

## TensorFlow in a nutshell

https://www.tensorflow.org/programmers_guide/low_level_intro

TensorFlow (by Google) is a framework for computing mathematical models in a __computation graph environment__. 

Think of a computational graph as a network (object) of nodes and edges:  
 - each __node__ is an __operation__ object, containing functions that consume and produce tensors
 - each __edge__ is a __Tensor__ object, representing the values (`numpy ndarrays`) that will flow through the graph.

**Important:** `tf.Tensor` objects do not have values; they are just handles to elements in the computation graph. These elements in the graph are objects that hold the functions, the way a value must be computed (and differentiated), and attributes like shape and type.
When a tensor is evaluated, its value is returned as an n-dimensional (NumPy) array. The shape can be fully or only partially known.

<img src="../data/reg_images/tensor_img.png" width=200px>


When you create/specify operations needed for computation, TensorFlow adds and arranges these __'ops'__ as nodes automagically to the default computation graph.

TensorFlow Core programs consist of __two discrete sections__:
 - __Building__ the computational graph (a tf.Graph)
 - __Running__ the computational graph (using a tf.Session)
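
The split between building and running can be illustrated without TensorFlow. The following is a minimal sketch in plain Python; the names `Node`, `constant`, `multiply` and `run` are hypothetical helpers invented for this illustration. They only mimic the idea of deferred evaluation and are not how TensorFlow is implemented.

```python
# Hypothetical mini "graph" in plain Python; not TensorFlow's implementation.
class Node:
    """A graph node: an operation plus the nodes that feed it."""
    def __init__(self, op, inputs=()):
        self.op = op            # function that computes this node's value
        self.inputs = inputs    # upstream nodes (the incoming edges)

def constant(value):
    # A constant is a node whose operation simply returns the stored value.
    return Node(lambda: value)

def multiply(a, b):
    return Node(lambda x, y: x * y, inputs=(a, b))

def run(node):
    """Evaluate a node by first evaluating everything it depends on."""
    args = [run(n) for n in node.inputs]
    return node.op(*args)

# Phase 1: build. Nothing is computed yet; `product` is only a handle.
product = multiply(constant(3.0), constant(4.0))

# Phase 2: run. The value is produced only now.
result = run(product)
print(result)  # 12.0
```

Until `run` is called, `product` carries no number at all, which is the same reason a `tf.Tensor` prints as a handle rather than as a value.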

Variables and operations that are created are automagically added to the default graph in TensorFlow. The default graph is instantiated when the library is imported. It's recommended to work in the default graph. However, creating a (named) Graph object instead of using the default graph is useful __when creating multiple models in one file that do not depend on each other__.

```python
# ! only needed when using multiple independent graphs within the same file:

new_graph = tf.Graph()

# Set as default and work within the context/environment:
with new_graph.as_default():
    new_g_const = tf.constant([1., 2.])

# Handle to the default graph with
default_g = tf.get_default_graph()
```

An Operation, also referred to as an __op__, can return zero or more tensors, which can be used later on in the graph. Each operation can be handed a constant, an array, or an n-dimensional matrix. Another word for an n-dimensional matrix is a tensor; a 2-dimensional tensor is equivalent to an (m x n) matrix.

<img src="../data/reg_images/tensor_graph_1.png" width=700px>

The graph above shows two constant tensors being multiplied together to produce the result. All inputs needed by the op are run automatically, typically in parallel.

Writing and running programs in TensorFlow has the following steps:

1. Create Tensors (variables) that are not yet executed/evaluated: 'Create Tensor blueprint for computation graph'
2. Create operations on those Tensors:                             'Create Tensor blueprint for computation graph'
3. Initialize your Tensors:                                        'Placeholder for initialization of the Tensors'
4. Create a Session:                                               'Placeholder for the computation graph environment'
5. Run the Session:                                                'Launch the graph in a session = Activates the environment'
6. Run Initializer:                                                'Adds to/initializes the variables in the environment'
7. Run variable:                                                   'Evaluates their values'     
8. Close the Session:                                              'Closes the environment'

When we create a Tensor for the loss, we simply define the loss as a function of other quantities; nothing is evaluated yet.  
To evaluate the loss, we first create `init = tf.global_variables_initializer()`, then start a session, run `init` to initialize the variables, and finally run the loss Tensor to evaluate its value.

## Tensors

A tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.

Most used types of Tensors: 
```python
tf.constant
tf.Variable
tf.placeholder
```

tf.Tensor properties:
 - data type (float32, int32, string, ...)
 - shape

The rank of a tf.Tensor object is its number of dimensions.



These tf.Tensor objects just represent the results of the operations that will be run.

Each operation in a graph is given a unique name. This name is independent of the names the objects are assigned to in Python. 
Tensors are named after the operation that produces them followed by an output index, for example "add:0".
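
To make the naming scheme concrete, here is a small sketch that splits such a name into its two parts. `split_tensor_name` is a hypothetical helper written for this note, not a TensorFlow function.

```python
def split_tensor_name(tensor_name):
    """Split a name like 'add:0' into (operation name, output index)."""
    # rpartition splits at the LAST colon, so scoped names keep their slashes.
    op_name, _, index = tensor_name.rpartition(":")
    return op_name, int(index)

print(split_tensor_name("add:0"))            # ('add', 0)
print(split_tensor_name("outer/inner/c:0"))  # ('outer/inner/c', 0)
```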

#### tf.constant()

Constants hold values that are immutable during runtime. Values are invariant.

In [None]:
a = tf.constant(9.0, dtype=tf.float32)
b = tf.constant([4.0, 1]) # also tf.float32 implicitly
a, b

#### tf.Variable()

Variables hold values or functions that are mutable during runtime. Values are updated by the graph.

In [None]:
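# Pass dtype by keyword: the second positional argument of tf.Variable is `trainable`, not dtype.
mystr = tf.Variable(["Hello"], dtype=tf.string)
cool_numbers = tf.Variable([3.14159, 2.71828], dtype=tf.float32)
first_primes = tf.Variable([2, 3, 5, 7, 11], dtype=tf.int32)
its_very_complicated = tf.Variable([12.3 - 4.85j, 7.5 - 6.23j], dtype=tf.complex64)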
mystr, cool_numbers, first_primes, its_very_complicated

#### tf.placeholder()
Placeholders are tensors that you will 'feed' data to during runtime.
The consumer can pass in values by using a "feed dictionary" (`feed_dict` variable).
Placeholders can have a fully known, partially known, or unknown shape.

In [None]:
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32, name='y')
x, y

### Tensor properties

In [None]:
mtx = tf.random_uniform(shape=(3,2))
mtx, mtx.name, mtx.shape, mtx.dtype

In [None]:
mtx = tf.random_uniform(shape=(3,2))
mtx.shape, len(mtx.shape), mtx.shape[0]

#### Rank

In [None]:
tf.rank(mtx) # dimensions of the ndarray
isess = tf.InteractiveSession()
isess.run(tf.rank(mtx))
isess.close()

#### Shape and reshape

In [None]:
mtx = tf.random_uniform(shape=(9, 2))
mtx.shape
mtx = tf.reshape(mtx, [3, 2, -1])
mtx.shape

#### Slicing

In [None]:
mtx.shape, mtx[:1,:1, :1].shape
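
The shape arithmetic in the reshape and slicing cells can be checked with NumPy, which uses the same `-1` convention and the same slice notation. This is only a NumPy analogue of the cells above, not TensorFlow code.

```python
import numpy as np

mtx = np.random.uniform(size=(9, 2))      # 18 values in total
reshaped = np.reshape(mtx, (3, 2, -1))    # -1 is inferred as 18 / (3 * 2) = 3

print(reshaped.shape)               # (3, 2, 3)
print(reshaped.ndim)                # 3, the rank
print(reshaped[:1, :1, :1].shape)   # (1, 1, 1): slices keep the rank
```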

#### Data types

Casting changes the data type of a Tensor, for example from an integer tensor to a floating-point tensor.

In [None]:
mystr = tf.constant(["Hello"], tf.string)
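# dtype is passed by keyword; the second positional argument of tf.Variable is `trainable`.
cool_numbers = tf.Variable([3.14159, 2.71828], dtype=tf.float32)
first_primes = tf.Variable([2, 3, 5, 7, 11], dtype=tf.int32)
its_very_complicated = tf.Variable([12.3 - 4.85j, 7.5 - 6.23j], dtype=tf.complex64)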
mystr, cool_numbers, first_primes, its_very_complicated

In [None]:
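# Note: tf.cast does not support string -> float; evaluating this tensor is expected to raise an error.
# For numeric strings, tf.string_to_number is the intended op.
tf.cast(mystr, dtype=tf.float32)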

In [None]:
tf.cast(first_primes, dtype=tf.float32)

In [None]:
tf.cast(its_very_complicated, dtype=tf.float32)

In [None]:
tf.cast(cool_numbers, dtype=tf.int16)
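
What these casts do to the values can be previewed with NumPy. The sketch below is a NumPy analogue under the assumption that `tf.cast` behaves comparably for these numeric types: float to int drops the fractional part, and complex to real keeps only the real part.

```python
import numpy as np

first_primes = np.array([2, 3, 5, 7, 11], dtype=np.int32)
cool_numbers = np.array([3.14159, 2.71828], dtype=np.float32)
its_very_complicated = np.array([12.3 - 4.85j, 7.5 - 6.23j], dtype=np.complex64)

as_float = first_primes.astype(np.float32)                 # int -> float keeps the values
as_int = cool_numbers.astype(np.int16)                     # float -> int drops the fraction
real_part = its_very_complicated.real.astype(np.float32)   # complex -> float keeps the real part

print(as_float.dtype, as_float)
print(as_int)       # [3 2]
print(real_part)
```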

### Create special arrays

In [None]:
sess = tf.InteractiveSession()
zeros = tf.zeros([2, 3, 4])
ones = tf.ones([5,])
rand = tf.random_normal([5,])
uni = tf.random_uniform([5,])
normal = tf.truncated_normal([5,])
trunc = tf.truncated_normal([5,], mean=0.0, stddev=1.0)
zeros.eval(), ones.eval(), rand.eval(), uni.eval(), normal.eval(), trunc.eval()
sess.close()

## Session

There are __two kinds of Session objects__ in TensorFlow: __tf.Session()__ and __tf.InteractiveSession()__.

__tf.Session()__
This encapsulates the environment that operations and tensors are executed and evaluated in. Sessions can have their own variables, queues and readers that are allocated. So it’s important to use the close() method when the session is over. 

There are 3 arguments for a Session, all of which are optional.
1. target — The execution engine to connect to.
2. graph — The Graph to be launched.
3. config — A ConfigProto protocol buffer with configuration options for the session

Functions `run()` or `eval()` execute one "step" of the TensorFlow computation graph; all of the necessary dependencies for the graph are executed one time.

__tf.InteractiveSession()__
This is exactly the same as tf.Session(), but it is targeted at IPython and Jupyter notebooks: it installs itself as the default session, so you can use Tensor.eval() and Operation.run() instead of calling Session.run() every time you want something to be computed.

In [None]:
default_graph = tf.get_default_graph()
default_graph

In [None]:
constant = tf.constant([1, 2, 3])
tensor = constant**2

# 1. 
tensor.eval(session=tf.Session())

# 2. 
sess = tf.Session()
sess.run(tensor)
sess.close()

# 3. 
with tf.Session() as sess:
    tensor.eval()

# 4. 
sess = tf.Session()
with sess.as_default():
    tensor.eval()
sess.close()
    
# 5. 
default_graph = tf.get_default_graph()
sess = tf.Session(graph=default_graph)
with sess.as_default():
    tensor.eval()
sess.close()

#### Interactive session

##### Warning: Interactive session must be closed. 
Opening multiple sessions can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).

In [None]:
sess = tf.InteractiveSession()
constant = tf.constant([1, 2, 3])
tensor = constant**2
tensor.eval()
sess.close()

#### Unique operation name 

Each time a Tensor is (re)created, it is assigned a unique operation name. When the name is already taken, an incrementing suffix '_i' is added.

In [None]:
a = tf.constant(9.0, dtype=tf.float32)
b = tf.constant(4.0) # also tf.float32 implicitly
total = a + b + 1
total_tf_name = total.name  # unique operation name

In [None]:
sess = tf.InteractiveSession()
sess.run(total)
sess.run(total_tf_name)
total.name

#### Return as dictionary

In [None]:
sess.run({'total_name':total_tf_name, 'total':total})

In [None]:
sess.close()

#### Initialization - pass variables to session

`tf.global_variables_initializer()` only creates and returns a handle to a TensorFlow operation.  
That op will initialize all the global variables when we run it with tf.Session.run.

###### Note: Variables MUST be initialized - Constants and Placeholders NOT
Once Variables are declared they are part of the global Variable space and must be initialized. 
Call `tf.reset_default_graph()` to clear out old/unused Variables, preventing interference and errors.

##### Warning: Variables that are added after the initializer was created are UNKNOWN to that initializer

#### Reset graph

Clears the default graph stack and resets the global default graph.

In [None]:
tf.reset_default_graph()

#### Initialization sequence

In [None]:
initial_var = tf.Variable(1)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
sess.run(initial_var)

In [None]:
second_var = tf.Variable(1)
changed_var = initial_var.assign(initial_var + initial_var)
# init = tf.global_variables_initializer()
# sess = tf.Session()
# sess.run(init)
for i in range(4):
    sess.run(changed_var)
    try:
        sess.run(second_var)
    except:
        print('UNKNOWN') 
sess.close()

In [None]:
second_var = tf.Variable(1)
changed_var = initial_var.assign(initial_var + initial_var)
# Init again
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(4):
    sess.run(changed_var)
    try:
        sess.run(second_var)
    except:
        print('UNKNOWN') 
sess.close()

In [None]:
counter = tf.Variable(0)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(4):
    sess.run(counter.assign_add(1))
sess.close()

#### Evaluating by interactive session

In [None]:
sess = tf.InteractiveSession()
p = tf.placeholder(tf.float32)
q = 1. #tf.Variable(1.)
t = p + q
# t.eval()  # This will fail, since the placeholder did not get a value.
t.eval(feed_dict={p:2.0})  # This will succeed because we're feeding a value
                           # to the placeholder.
sess.close()

#### Evaluating by session

In [None]:
x = tf.placeholder(tf.float32)
y = 2 * x
z = 2 * y

sess = tf.Session()
sess.run([y, z, y], feed_dict={x:4})
sess.close()

### Scope
To control the complexity of models and make them easier to break down into individual pieces, TensorFlow has scopes. Scopes are very simple and even help break down your model when using TensorBoard (which will be covered in Part 2). Scopes can even be nested inside other scopes.

#### Naming convention

Each time tensors are created with existing names, TF will suffix the names with '_counter' (for example `c`, `c_1`, `c_2`).
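
The suffixing rule can be imitated in a few lines of plain Python. `unique_name` and `used_names` below are hypothetical names for this sketch; TensorFlow's own bookkeeping handles more corner cases (for example, a name such as `c_1` that was requested explicitly).

```python
# Hypothetical helper that imitates the naming behaviour described above.
used_names = {}

def unique_name(name):
    """Return `name` the first time, then `name_1`, `name_2`, ..."""
    count = used_names.get(name, 0)
    used_names[name] = count + 1
    return name if count == 0 else "{}_{}".format(name, count)

print(unique_name("c"))  # c
print(unique_name("c"))  # c_1
print(unique_name("c"))  # c_2
```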

In [None]:
# The names in the comments assume a freshly reset graph
tf.reset_default_graph()

c_0 = tf.constant(0, name="c")  # => tensor named "c:0" (operation "c")
c_0.name

# Already-used names will be "uniquified".
c_1 = tf.constant(2, name="c")  # => tensor named "c_1:0"
c_1.name

# Name scopes add a prefix to all operations created in the same context.
with tf.name_scope("outer"):
    c_2 = tf.constant(2, name="c")  # => tensor named "outer/c:0"
    c_2.name
    
    # Name scopes nest like paths in a hierarchical file system.
    with tf.name_scope("inner"):
        c_3 = tf.constant(3, name="c")  # => tensor named "outer/inner/c:0"
        c_3.name
        
    # Exiting a name scope context will return to the previous prefix.
    c_4 = tf.constant(4, name="c")  # => tensor named "outer/c_1:0"
    c_4.name

    # Already-used name scopes will be "uniquified".
    with tf.name_scope("inner"):
        c_5 = tf.Variable(5, name="c")  # => variable named "outer/inner_1/c:0"
        c_5.name

# https://www.tensorflow.org/versions/r1.0/api_docs/python/tf/GraphKeys

#### Show variables in graph

Commonly, all TRAINABLE_VARIABLES variables will be in MODEL_VARIABLES, and all MODEL_VARIABLES variables will be in GLOBAL_VARIABLES.

In [None]:
tf.global_variables()[-6:]

In [None]:
tf.get_collection(
    tf.GraphKeys.GLOBAL_VARIABLES,
    scope='outer/in'  # starts with ...
)

In [None]:
tf.reset_default_graph()

with tf.name_scope('some_scope1'):
    a = tf.Variable(1, name='a')
    c = tf.constant(3, name='c')

with tf.name_scope('some_scope2'):
    d = tf.Variable(4, name='d')
    f = tf.constant(6, name='f')
    
with tf.name_scope('some_scope2'):
    d = tf.Variable(4, name='d')
    f = tf.constant(6, name='f')

h = tf.Variable(8, name='h')  # name must be a keyword argument; the second positional argument is `trainable`

for i in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='some_'):
    print(i.name)
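
The `scope` argument acts as a prefix filter because the names are matched with `re.match`, which anchors at the start of the string. The sketch below reproduces that filtering on a plain list; the names in the list are illustrative stand-ins, and `filter_by_scope` is a hypothetical helper.

```python
import re

# Illustrative variable names, similar to what the cell above might create.
names = ["some_scope1/a:0", "some_scope2/d:0", "some_scope2_1/d:0", "other/h:0"]

def filter_by_scope(names, scope):
    """Keep the names that match `scope` at the start of the string."""
    return [n for n in names if re.match(scope, n)]

print(filter_by_scope(names, "some_"))
print(filter_by_scope(names, "some_scope2"))
```

Note that `"some_scope2"` also matches the uniquified scope `some_scope2_1`, since both start with the same prefix.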

In [None]:
tfg = tf.get_default_graph()
graph_def = tfg.as_graph_def()

nodes = [n.name for n in graph_def.node]
node_ops = [n.op for n in graph_def.node]
ops = [op.name for op in tfg.get_operations()]
nodes[:5]
node_ops[:5]
ops[:5]

# sess.close()

In [None]:
counter = tf.Variable(2.)
with tf.name_scope("Scope1"):
    with tf.name_scope("Scope_nested"):
        nested_var = counter * counter
        nv_name = nested_var.name
        print(nv_name)
        
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()    
sess.run(init)
sess.run(nv_name)
sess.close()

## TensorFlow in action

In [None]:
# TODO
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()   

a = tf.constant(0.)
b = 4 * a**2
g = tf.gradients(b, [a, b]) # gradients of b w.r.t. a (8a) and w.r.t. b itself (1)
sess.run(g)
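
For $b = 4a^2$ the analytic derivative is $db/da = 8a$, so at $a = 0$ the gradient is 0. A finite-difference check in plain Python is a quick way to sanity-check such a result; `numeric_gradient` is a hypothetical helper for this sketch and is not related to how `tf.gradients` works internally.

```python
def b(a):
    return 4 * a ** 2

def numeric_gradient(f, a, eps=1e-6):
    """Central finite-difference approximation of df/da."""
    return (f(a + eps) - f(a - eps)) / (2 * eps)

for a in (0.0, 1.5):
    # Compare the numeric estimate with the analytic value 8a.
    print(a, numeric_gradient(b, a), 8 * a)
```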

In [None]:
vec = tf.random_uniform(shape=(3,))
out1 = vec + 1
out2 = vec + 2
print(sess.run(vec))
print(sess.run(vec))
print(sess.run((out1, out2)))
# The result shows a different random value on each call to run, 
# and a consistent value during a single run (out1 and out2 receive the same random input):

#### Feeding placeholders

In [None]:
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
z = x + y * 2

In [None]:
print(sess.run(z, feed_dict={x: 3, y: 4.5}))
print(sess.run(z, feed_dict={x: [1, 3], y: [2, 4]}))
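
The two feeds above evaluate the same expression, once on scalars and once elementwise on vectors. As a cross-check, this NumPy sketch performs the same arithmetic outside of any graph (the function name `z` simply mirrors the tensor name).

```python
import numpy as np

def z(x, y):
    # Same arithmetic as the graph: z = x + y * 2
    return np.asarray(x, dtype=np.float32) + np.asarray(y, dtype=np.float32) * 2

print(z(3, 4.5))           # 12.0
print(z([1, 3], [2, 4]))   # [ 5. 11.]
```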

#### Datasets

Placeholders work for simple experiments, but Datasets are the **preferred method of streaming data into a model**.

To get a runnable tf.Tensor from a Dataset you must first convert it to a tf.data.Iterator, and then call the Iterator's get_next method.

The simplest way to create an Iterator is with the make_one_shot_iterator method. For example, in the following code the next_item tensor will return a row from the my_data array on each run call:

In [None]:
my_data =  [[0, 1,],
            [2, 3,],
            [4, 5,],
            [6, 7,]]
slices = tf.data.Dataset.from_tensor_slices(my_data)
next_item = slices.make_one_shot_iterator().get_next()
next_item

In [None]:
while True:
    try:
        print(sess.run(next_item))
    except tf.errors.OutOfRangeError:
        break
sess.close()
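
The loop above follows the same pattern as exhausting an ordinary Python iterator. The sketch below is only an analogy in plain Python: `next(rows)` plays the role of `sess.run(next_item)` and `StopIteration` plays the role of `tf.errors.OutOfRangeError`.

```python
my_data = [[0, 1], [2, 3], [4, 5], [6, 7]]

rows = iter(my_data)   # plays the role of the one-shot iterator
seen = []
while True:
    try:
        seen.append(next(rows))   # one row per call
    except StopIteration:         # raised once the data is exhausted
        break

print(seen)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```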

#### Stateful iterations

If the Dataset depends on stateful operations you may need to initialize the iterator before using it, as shown below:

In [None]:
r = tf.random_normal([10,3])
dataset = tf.data.Dataset.from_tensor_slices(r)
iterator = dataset.make_initializable_iterator()
next_row = iterator.get_next()

sess = tf.Session()
sess.run(iterator.initializer)
while True:
    try:
        print(sess.run(next_row))
    except tf.errors.OutOfRangeError:
        break
sess.close()

In [None]:
x = tf.constant([[37.0, -23.0], [1.0, 4.0]])
w = tf.Variable(tf.random_uniform([2, 2]))
y = tf.matmul(x, w)
output = tf.nn.softmax(y)
init_op = w.initializer

with tf.Session() as sess:
    # Run the initializer on `w`.
    # Constants do not need to be initialized
    sess.run(init_op)

    # Evaluate `output`. `sess.run(output)` will return a ndarray
    print(sess.run(output)) 

    # Evaluate `y` and `output`. 
    # Evaluate `y` => result used both to return `y_val` and as an input to the `tf.nn.softmax(y)` op
    # Both `y_val` and `output_val` will be NumPy arrays.
    y_val, output_val = sess.run([y, output])
    print(y_val, output_val)
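
To see what the `matmul` followed by `softmax` produces numerically, here is a NumPy sketch of the same computation. The `softmax` function below is written for this note, and `w` is a seeded random stand-in for the variable in the cell above, so the printed numbers will differ from the TensorFlow run.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

x = np.array([[37.0, -23.0], [1.0, 4.0]])
w = np.random.RandomState(0).uniform(size=(2, 2))  # stand-in for the random variable `w`
y = x @ w
output = softmax(y)

print(output)
print(output.sum(axis=1))  # each row sums to 1
```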

### Layers

A trainable model must modify the values in the graph to get new outputs with the same input. Layers are the preferred way to add trainable parameters to a graph.

Layers package together both the variables and the operations that act on them. For example a densely-connected layer performs a weighted sum across all inputs for each output and applies an optional activation function. The connection weights and biases are managed by the layer object.

#### Creating Layers

The following code creates a Dense layer that takes a batch of input vectors, and produces a single output value for each. To apply a layer to an input, call the layer as if it were a function. For example:

```python
x = tf.placeholder(tf.float32, shape=[None, 3])
linear_model = tf.layers.Dense(units=1)
y = linear_model(x)
```

The layer inspects its input to determine sizes for its internal variables. So here we must set the shape of the x placeholder so that the layer can build a weight matrix of the correct size.
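
The shape bookkeeping can be sketched in NumPy. This is an illustration of the arithmetic of a dense layer without an activation, not of `tf.layers.Dense` itself; the names `kernel` and `bias` and the random initial values are assumptions for the example.

```python
import numpy as np

rng = np.random.RandomState(0)
x = np.array([[1., 2., 3.], [4., 5., 6.]])     # batch of 2 input vectors with 3 features

units = 1
kernel = rng.normal(size=(x.shape[1], units))  # weight matrix sized from the input: (3, 1)
bias = np.zeros(units)                         # one bias per output unit

y = x @ kernel + bias                          # weighted sum per output, no activation
print(kernel.shape, y.shape)                   # (3, 1) (2, 1)
```

The number of rows of `kernel` comes from the last dimension of the input, which is why that dimension has to be known when the layer is built.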

Now that we have defined the calculation of the output, y, there is one more detail we need to take care of before we run the calculation.

#### Initializing Layers

The layer contains variables that must be initialized before they can be used. While it is possible to initialize variables individually, you can easily initialize all the variables in a TensorFlow graph as follows:

```python
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
```

**Important:**
Calling tf.global_variables_initializer only:
1. creates and returns a handle to a TensorFlow operation. That op will initialize all the global variables when we run it with tf.Session.run.
2. initializes variables that existed in the graph when the initializer was created. So the initializer should be one of the last things added during graph construction.
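
The second point is easy to trip over, so here is a plain-Python sketch of the "snapshot" behaviour. `Var`, `registry` and `make_initializer` are hypothetical stand-ins invented for this illustration; they imitate the described behaviour and are not TensorFlow code.

```python
# Hypothetical stand-ins; they only imitate the snapshot behaviour described above.
registry = []

class Var:
    def __init__(self, initial):
        self.initial = initial
        self.value = None          # None means "not initialized yet"
        registry.append(self)

def make_initializer():
    snapshot = list(registry)      # only the variables known at creation time
    def init():
        for v in snapshot:
            v.value = v.initial
    return init

first = Var(1)
init = make_initializer()
second = Var(2)                    # created after the initializer

init()
print(first.value, second.value)   # 1 None
```

Creating the initializer again after `second` exists would include it, which mirrors the advice to add the initializer last.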

#### Executing Layers

Now that the layer is initialized, we can evaluate the linear_model's output tensor as we would any other tensor. For example, the following code produces one output value per input row:

```python
sess.run(y, {x: [[1, 2, 3],[4, 5, 6]]})
```

#### Compute the loss

To optimize a model, you first need to define the loss. We'll use the mean square error, a standard loss for regression problems.

While you could do this manually with lower level math operations, the tf.losses module provides a set of common loss functions. You can use it to calculate the mean square error as follows:

```python
loss = tf.losses.mean_squared_error(labels=y_true, predictions=y_pred)
print(sess.run(loss))
```
 
See below code:


In [None]:
x = tf.placeholder(tf.float32, shape=[None, 3])
y_true = tf.constant([[1.],[0.]], dtype=tf.float32)
linear_model = tf.layers.Dense(units=1)
y = linear_model(x)
x, y_true

In [None]:
sess = tf.Session()
init = tf.global_variables_initializer() # handle to the initializer
sess.run(init)                           # initializing op

In [None]:
y_pred = sess.run(y, feed_dict={x: [[1, 2, 3],[4, 5, 6]]})
y_pred

In [None]:
loss = tf.losses.mean_squared_error(labels=y_true, predictions=y_pred)
sess.run(loss)

#### Manual loss calculation

In [None]:
loss = tf.Variable((y_true - y_pred)**2)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print(np.mean(sess.run(loss)))  # print explicitly: expressions inside a with-block are not displayed
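
The mean squared error itself is just the average of the squared differences. A NumPy sketch with made-up predictions (the values of `y_pred` are invented for the example, since the real ones depend on the layer's random weights):

```python
import numpy as np

y_true = np.array([[1.], [0.]])
y_pred = np.array([[0.5], [0.5]])   # made-up predictions for illustration

mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.25
```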

***

### 1.1 - Linear function

Let's start this programming exercise by computing the following equation: $Y = WX + b$, where $W$ and $X$ are random matrices and $b$ is a random vector. 

**Exercise**: Compute $WX + b$ where $W, X$, and $b$ are drawn from a random normal distribution. W is of shape (4, 3), X is (3,1) and b is (4,1). As an example, here is how you would define a constant X that has shape (3,1):
```python
X = tf.constant(np.random.randn(3,1), name = "X")

```
You might find the following functions helpful: 
- tf.matmul(..., ...) to do a matrix multiplication
- tf.add(..., ...) to do an addition
- np.random.randn(...) to initialize randomly


In [None]:
def linear_function():
    """
    Implements a linear function: 
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns: 
    result -- runs the session for Y = WX + b 
    """
    
    np.random.seed(1)
    
    X = np.random.randn(3, 1)
    W = np.random.randn(4, 3)
    b = np.random.randn(4, 1)
    Y = tf.add(tf.matmul(W, X), b)
    
    # Create the session using tf.Session() and run it with sess.run(...) on the variable you want to calculate
    sess = tf.Session()
    result = sess.run(Y)
    
    # close the session 
    sess.close()

    return result

In [None]:
print( "result = " + str(linear_function()))

**Expected Output**:

<table> 
<tr> 
<td>
**result**
</td>
<td>
[[-2.15657382]
 [ 2.95891446]
 [-1.08926781]
 [-0.84538042]]
</td>
</tr> 

</table> 

### 1.2 - Computing the sigmoid 
Great! You just implemented a linear function. TensorFlow offers a variety of commonly used neural network functions like `tf.sigmoid` and `tf.softmax`. For this exercise let's compute the sigmoid function of an input. 

You will do this exercise using a placeholder variable `x`. When running the session, you should use the feed dictionary to pass in the input `z`. In this exercise, you will have to (i) create a placeholder `x`, (ii) define the operations needed to compute the sigmoid using `tf.sigmoid`, and then (iii) run the session. 

**Exercise**: Implement the sigmoid function below. You should use the following: 

- `tf.placeholder(tf.float32, name = "...")`
- `tf.sigmoid(...)`
- `sess.run(..., feed_dict = {x: z})`


Note that there are two typical ways to create and use sessions in TensorFlow: 

**Method 1:**
```python
sess = tf.Session()
# Run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
sess.close() # Close the session
```
**Method 2:**
```python
with tf.Session() as sess: 
    # run the variables initialization (if needed), run the operations
    result = sess.run(..., feed_dict = {...})
    # This takes care of closing the session for you :)
```
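
Method 2 works because a session is a context manager: leaving the `with` block calls `close()` even when an exception is raised inside it. The sketch below shows the mechanism with a hypothetical `FakeSession` class; it is a stand-in for illustration, not TensorFlow's session.

```python
class FakeSession:
    """Hypothetical stand-in for a session that holds resources."""
    def __init__(self):
        self.closed = False

    def run(self, value):
        if self.closed:
            raise RuntimeError("session is closed")
        return value

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()      # runs even if the block raised
        return False      # do not swallow exceptions

with FakeSession() as sess:
    result = sess.run(42)

print(result, sess.closed)  # 42 True
```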


In [None]:
def sigmoid(z):
    """
    Computes the sigmoid of z
    
    Arguments:
    z -- input value, scalar or vector
    
    Returns: 
    results -- the sigmoid of z
    """
    
    # Create a placeholder for x.
    x = tf.placeholder(tf.float32, name="x")

    # compute sigmoid(x)
    sigmoid = tf.sigmoid(x)

    # Create a session, and run it. Please use the method 2 explained above. 
    # You should use a feed_dict to pass z's value to x. 
    with tf.Session() as sess: 
        # Run session and call the output "result"
        result = sess.run(sigmoid, feed_dict={x:z})
    
    return result

In [None]:
print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(12) = " + str(sigmoid(12)))

**Expected Output**:

<table> 
<tr> 
<td>
**sigmoid(0)**
</td>
<td>
0.5
</td>
</tr>
<tr> 
<td>
**sigmoid(12)**
</td>
<td>
0.999994
</td>
</tr> 

</table> 
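
The expected values can be cross-checked against the definition $\sigma(z) = 1 / (1 + e^{-z})$ in plain Python. `sigmoid_reference` is a name chosen for this sketch to avoid clashing with the `sigmoid` function of the exercise.

```python
import math

def sigmoid_reference(z):
    """Plain-Python sigmoid, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid_reference(0))    # 0.5
print(sigmoid_reference(12))   # about 0.999994
```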

<font color='blue'>
**To summarize, you now know how to**:
1. Create placeholders
2. Specify the computation graph corresponding to operations you want to compute
3. Create the session
4. Run the session, using a feed dictionary if necessary to specify placeholder variables' values. 
</font>

### 1.3 -  Computing the Cost

You can also use a built-in function to compute the cost of your neural network. So instead of needing to write code to compute this as a function of $a^{[2](i)}$ and $y^{(i)}$ for i=1...m: 
$$ J = - \frac{1}{m}  \sum_{i = 1}^m  \large ( \small y^{(i)} \log a^{ [2] (i)} + (1-y^{(i)})\log (1-a^{ [2] (i)} )\large )\small\tag{2}$$

you can do it in one line of code in TensorFlow!

**Exercise**: Implement the cross entropy loss. The function you will use is: 


- `tf.nn.sigmoid_cross_entropy_with_logits(logits = ...,  labels = ...)`

Your code should input `z`, compute the sigmoid (to get `a`) and then compute the cross entropy cost $J$. All of this can be done using one call to `tf.nn.sigmoid_cross_entropy_with_logits`, which computes, for each example $i$, the term

$$- \left( y^{(i)} \log \sigma(z^{[2](i)}) + (1-y^{(i)})\log (1-\sigma(z^{[2](i)})) \right)$$

Averaging these per-example terms over the $m$ examples (for instance with `tf.reduce_mean`) gives the cost $J$ of formula (2).



In [None]:
def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy
    
    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0) 
    
    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels" 
    in the TensorFlow documentation. So logits will feed into z, and labels into y. 
    
    Returns:
    cost -- runs the session and returns the per-example cross entropy values (the summands of formula (2))
    """
    
    # Create the placeholders for "logits" (z) and "labels" (y)
    z = tf.placeholder(tf.float32, name="z")
    y = tf.placeholder(tf.float32, name="y")
    
    # Use the loss function
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=z, labels=y)
    
    # Create a session
    sess = tf.Session()
    
    # Run the session
    cost = sess.run(cost, feed_dict={z:logits, y:labels})
    
    # Close the session
    sess.close()
    
    return cost

In [None]:
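logits = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))
# Use a separate name for the result so the function `cost` is not overwritten
# and the cell can be run more than once.
cost_value = cost(logits, np.array([0, 0, 1, 1]))
print("cost = " + str(cost_value))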

**Expected Output**:

<table> 
    <tr> 
        <td>
            **cost**
        </td>
        <td>
        [ 1.00538719  1.03664088  0.41385433  0.39956614]
        </td>
    </tr>

</table>
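
The per-example values in the expected output can be reproduced with NumPy. The sketch below uses the overflow-safe form $\max(z, 0) - z\,y + \log(1 + e^{-|z|})$, which is algebraically equal to the cross entropy term; `sigmoid_cross_entropy` is a name chosen for this illustration and is not the TensorFlow function.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Per-example loss in the overflow-safe form max(z, 0) - z*y + log(1 + exp(-|z|))."""
    z = np.asarray(logits, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

z_in = np.array([0.2, 0.4, 0.7, 0.9])
logits = 1.0 / (1.0 + np.exp(-z_in))   # the example above feeds sigmoid outputs in as logits
labels = np.array([0, 0, 1, 1])

per_example = sigmoid_cross_entropy(logits, labels)
print(per_example)          # approx. [1.00538719 1.03664088 0.41385433 0.39956614]
print(per_example.mean())   # averaging gives a scalar cost
```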

### 1.4 - Using One Hot encodings

Many times in deep learning you will have a y vector with numbers ranging from 0 to C-1, where C is the number of classes. If C is for example 4, then you might have the following y vector which you will need to convert as follows:


<img src="../data/reg_images/onehot.png" style="width:600px;height:150px;">

This is called a "one hot" encoding, because in the converted representation exactly one element of each column is "hot" (meaning set to 1). To do this conversion in numpy, you might have to write a few lines of code. In tensorflow, you can use one line of code: 

- tf.one_hot(labels, depth, axis) 

**Exercise:** Implement the function below to take one vector of labels and the total number of classes $C$, and return the one hot encoding. Use `tf.one_hot()` to do this. 

In [None]:
def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the i-th class number and the j-th column
                     corresponds to the j-th training example. So if example j has label i, then entry (i,j) 
                     will be 1. 
                     
    Arguments:
    labels -- vector containing the labels 
    C -- number of classes, the depth of the one hot dimension
    
    Returns: 
    one_hot -- one hot matrix
    """
    
    # Create a tf.constant equal to C (depth)
    C = tf.constant(C, name='C')
    
    # Use tf.one_hot, be careful with the axis
    one_hot_matrix = tf.one_hot(indices=labels, depth=C, axis=0)
    
    # Create the session
    sess = tf.Session()
    
    # Run the session
    one_hot = sess.run(one_hot_matrix)
    
    # Close the session
    sess.close()
    
    return one_hot

In [None]:
labels = np.array([1,2,3,0,2,1])
one_hot = one_hot_matrix(labels, C=4)
print ("one_hot = " + str(one_hot))

In [None]:
def convert_to_one_hot(Y, C):
    """One Hot - numpy version
    Return Identity vector(s) by indexing with the classes(Y)"""
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y

**Expected Output**: 

<table> 
    <tr> 
        <td>
            **one_hot**
        </td>
        <td>
        [[ 0.  0.  0.  1.  0.  0.]
 [ 1.  0.  0.  0.  0.  1.]
 [ 0.  1.  0.  0.  1.  0.]
 [ 0.  0.  1.  0.  0.  0.]]
        </td>
    </tr>

</table>
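
The role of the `axis` argument is easiest to see from the shapes. With `axis=0` the class dimension comes first, giving a (C, m) matrix with one column per example; the default `axis=-1` gives (m, C) with one row per example. A NumPy sketch of both layouts, using the same labels as above:

```python
import numpy as np

labels = np.array([1, 2, 3, 0, 2, 1])
C = 4

rows_are_examples = np.eye(C)[labels]     # shape (6, 4): one row per example
cols_are_examples = rows_are_examples.T   # shape (4, 6): one column per example

print(rows_are_examples.shape, cols_are_examples.shape)
print(cols_are_examples)
```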


### 1.5 - Initialize with zeros and ones

Now you will learn how to initialize a vector of zeros and ones. The function you will be calling is `tf.ones()`. To initialize with zeros you could use tf.zeros() instead. These functions take in a shape and return an array of dimension shape full of zeros and ones respectively. 

**Exercise:** Implement the function below to take in a shape and to return an array of ones with that shape. 

 - tf.ones(shape)


In [None]:
def ones(shape):
    """
    Creates an array of ones of dimension shape
    
    Arguments:
    shape -- shape of the array you want to create
        
    Returns: 
    ones -- array containing only ones
    """
    
    # Create "ones" tensor
    ones = tf.ones(shape)
    
    # Create the session
    sess = tf.Session()
    
    # Run the session to compute 'ones'
    ones = sess.run(ones)
    
    # Close the session
    sess.close()
    
    return ones

In [None]:
print ("ones = " + str(ones([3])))

**Expected Output:**

<table> 
    <tr> 
        <td>
            **ones**
        </td>
        <td>
        [ 1.  1.  1.]
        </td>
    </tr>

</table>

## 2 - Building your first neural network in tensorflow

In this part of the assignment you will build a neural network using TensorFlow. Remember that there are two parts to implementing a TensorFlow model:

- Create the computation graph
- Run the graph

Let's delve into the problem you'd like to solve!

### 2.0 - Problem statement: SIGNS Dataset

One afternoon, with some friends we decided to teach our computers to decipher sign language. We spent a few hours taking pictures in front of a white wall and came up with the following dataset. It's now your job to build an algorithm that would facilitate communications from a speech-impaired person to someone who doesn't understand sign language.

- **Training set**: 1080 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (180 pictures per number).
- **Test set**: 120 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (20 pictures per number).

Note that this is a subset of the SIGNS dataset. The complete dataset contains many more signs.

Here are examples for each number, and an explanation of how we represent the labels. These are the original pictures, before we lowered the image resolution to 64 by 64 pixels.
<img src="../data/reg_images/hands.png" style="width:800px;height:350px;"><caption><center> <u><font color='purple'> **Figure 1**</u><font color='purple'>: SIGNS dataset <br> <font color='black'> </center>


Run the following code to load the dataset.

In [None]:
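def load_dataset():
    """Loads the SIGNS train/test images and labels from HDF5 files; labels are returned with shape (1, m)."""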
    train_dataset = h5py.File('../data/reg_datasets/train_signs.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels

    test_dataset = h5py.File('../data/reg_datasets/test_signs.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes
    
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

In [None]:
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

Change the index below and run the cell to visualize some examples in the dataset.

In [None]:
# Example of a picture
index = 0
_ = plt.imshow(X_train_orig[index])
print("y = " + str(np.squeeze(Y_train_orig[:, index])))

As usual you flatten the image dataset, then normalize it by dividing by 255. On top of that, you will convert each label to a one-hot vector as shown in Figure 1. Run the cell below to do so.

In [None]:
# Flatten the training and test images: (m, 64, 64, 3) -> (12288, m), one image per column
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T

# Normalize image vectors
X_train = X_train_flatten / 255.
X_test = X_test_flatten / 255.

# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

print("number of training examples = " + str(X_train.shape[1]))
print("number of test examples = " + str(X_test.shape[1]))
print("X_train shape: " + str(X_train.shape))
print("Y_train shape: " + str(Y_train.shape))
print("X_test shape: " + str(X_test.shape))
print("Y_test shape: " + str(Y_test.shape))

**Note** that 12288 comes from $64 \times 64 \times 3$. Each image is square, 64 by 64 pixels, and 3 is for the RGB colors. Please make sure all these shapes make sense to you before continuing.
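
The preprocessing described above (flatten to one image per column, scale to [0, 1], one-hot encode the labels) can be checked on a small synthetic batch. This is a minimal NumPy sketch, not the notebook's own helper: the names `flatten_images` and `one_hot_columns` are hypothetical, and `one_hot_columns` only illustrates what a function like `convert_to_one_hot` is assumed to return.

```python
import numpy as np

def flatten_images(images):
    """Reshape images of shape (m, h, w, c) into columns of shape (h*w*c, m)."""
    m = images.shape[0]
    return images.reshape(m, -1).T

def one_hot_columns(labels, num_classes):
    """Turn integer labels of shape (1, m) into a (num_classes, m) one-hot matrix."""
    return np.eye(num_classes)[labels.reshape(-1)].T

# Synthetic stand-in for the SIGNS images: 5 RGB pictures of 64 x 64 pixels
rng = np.random.RandomState(0)
images = rng.randint(0, 256, size=(5, 64, 64, 3)).astype(np.uint8)
labels = np.array([[0, 5, 2, 2, 1]])

X_demo = flatten_images(images) / 255.
Y_demo = one_hot_columns(labels, 6)

print(X_demo.shape)  # (12288, 5)
print(Y_demo.shape)  # (6, 5)
```

Each column of `X_demo` holds exactly one image, so column `i` of the data and column `i` of the labels describe the same example.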

**Your goal** is to build an algorithm capable of recognizing a sign with high accuracy. To do so, you are going to build a tensorflow model that is almost the same as one you have previously built in numpy for cat recognition (but now using a softmax output). It is a great occasion to compare your numpy implementation to the tensorflow one. 

**The model** is *LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX*. The SIGMOID output layer has been converted to a SOFTMAX. A SOFTMAX layer generalizes SIGMOID to when there are more than two classes. 
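
To see in what sense SOFTMAX generalizes SIGMOID, the following NumPy sketch compares the two on a two-class problem. The helper names are illustrative only. With logits `[z, 0]`, the softmax probability of the first class is $e^z / (e^z + 1)$, which is exactly the sigmoid of `z`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Column-wise softmax for logits of shape (num_classes, m)."""
    shifted = z - np.max(z, axis=0, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / np.sum(e, axis=0, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
two_class_logits = np.vstack([z, np.zeros_like(z)])  # shape (2, 3): logits [z, 0] per column
p = softmax(two_class_logits)

print(np.allclose(p[0], sigmoid(z)))  # True
```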

### 2.1 - Create placeholders

Your first task is to create placeholders for `X` and `Y`. This will allow you to later pass your training data in when you run your session. 

**Exercise:** Implement the function below to create the placeholders in tensorflow.

In [None]:
def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.
    
    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px * 3 = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)
    
    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"
    
    Tips:
    - You will use None because it lets us be flexible on the number of examples fed to the placeholders.
      In fact, the number of examples during test/train is different.
    """

    X = tf.placeholder(tf.float32, [n_x, None], name="X")
    Y = tf.placeholder(tf.float32, [n_y, None], name="Y")
    
    return X, Y

In [None]:
X, Y = create_placeholders(12288, 6)
print("X = " + str(X))
print("Y = " + str(Y))

**Expected Output**: 

<table> 
    <tr> 
        <td>
            **X**
        </td>
        <td>
        Tensor("X:0", shape=(12288, ?), dtype=float32) (the name may carry a suffix such as X_1)
        </td>
    </tr>
    <tr> 
        <td>
            **Y**
        </td>
        <td>
        Tensor("Y:0", shape=(6, ?), dtype=float32) (the name may carry a suffix such as Y_1)
        </td>
    </tr>

</table>

### 2.2 - Initializing the parameters

Your second task is to initialize the parameters in tensorflow.

**Exercise:** Implement the function below to initialize the parameters in tensorflow. You are going to use Xavier Initialization for weights and Zero Initialization for biases. The shapes are given below. As an example, to help you, for W1 and b1 you could use: 

```python
W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed=1))
b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())
```
Please use `seed = 1` to make sure your results match ours.

In [None]:
def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]
    
    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """
    from tensorflow.contrib.layers import xavier_initializer
    
    tf.set_random_seed(1)                   # so that your "random" numbers match ours
        
    W1 = tf.get_variable("W1", [25, 12288], initializer=xavier_initializer(seed=1))
    b1 = tf.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
    
    W2 = tf.get_variable("W2", [12, 25], initializer=xavier_initializer(seed=1))
    b2 = tf.get_variable("b2", [12, 1], initializer=tf.zeros_initializer())
    
    W3 = tf.get_variable("W3", [6, 12], initializer=xavier_initializer(seed=1))
    b3 = tf.get_variable("b3", [6, 1], initializer=tf.zeros_initializer())

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    
    return parameters

In [None]:
tf.reset_default_graph()
with tf.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))

**Expected Output**: 

<table> 
    <tr> 
        <td>
            **W1**
        </td>
        <td>
         < tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref >
        </td>
    </tr>
    <tr> 
        <td>
            **b1**
        </td>
        <td>
        < tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref >
        </td>
    </tr>
    <tr> 
        <td>
            **W2**
        </td>
        <td>
        < tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref >
        </td>
    </tr>
    <tr> 
        <td>
            **b2**
        </td>
        <td>
        < tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref >
        </td>
    </tr>

</table>

As expected, the parameters haven't been evaluated yet.
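
For intuition about the Xavier initialization used above, here is a NumPy sketch of the uniform (Glorot) variant, which draws weights from $U(-l, l)$ with $l = \sqrt{6 / (n_{in} + n_{out})}$. It assumes the initializer is used in its uniform mode, the function name `xavier_uniform` is hypothetical, and the sketch will not reproduce TensorFlow's random numbers even with the same seed.

```python
import numpy as np

def xavier_uniform(shape, rng):
    """Sample a (fan_out, fan_in) weight matrix from U(-limit, limit)."""
    fan_out, fan_in = shape
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

rng = np.random.RandomState(1)
W1_demo = xavier_uniform((25, 12288), rng)   # same shape as W1 above
b1_demo = np.zeros((25, 1))                  # zero initialization for the bias

limit = np.sqrt(6.0 / (12288 + 25))
print(W1_demo.shape, np.abs(W1_demo).max() <= limit)
```

The resulting variance is close to $2 / (n_{in} + n_{out})$, which keeps the scale of activations roughly stable from layer to layer.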

### 2.3 - Forward propagation in tensorflow 

You will now implement the forward propagation module in tensorflow. The function will take in a dictionary of parameters and it will complete the forward pass. The functions you will be using are: 

- `tf.add(...,...)` to do an addition
- `tf.matmul(...,...)` to do a matrix multiplication
- `tf.nn.relu(...)` to apply the ReLU activation

**Question:** Implement the forward pass of the neural network. The numpy equivalents are given as comments so that you can compare the tensorflow implementation to numpy. It is important to note that the forward propagation stops at `Z3`. The reason is that in tensorflow the last linear layer output is given as input to the function computing the loss. Therefore, you don't need `A3`!



In [None]:
def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX
    
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    
    # Retrieve the parameters from the dictionary "parameters" 
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
                                                           # Numpy Equivalents:
    Z1 = tf.add(tf.matmul(W1, X), b1)                      # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                                    # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)                     # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                                    # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)                     # Z3 = np.dot(W3, A2) + b3

    
    return Z3

In [None]:
tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))

**Expected Output**: 

<table> 
    <tr> 
        <td>
            **Z3**
        </td>
        <td>
        Tensor("Add_2:0", shape=(6, ?), dtype=float32)
        </td>
    </tr>

</table>

You may have noticed that the forward propagation doesn't output any cache. You will understand why below, when we get to backpropagation.
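
For comparison with the tensorflow graph, the same forward pass can be written directly in NumPy. This is a sketch with randomly generated stand-in parameters (not the Xavier-initialized tensorflow variables); the `_np` names are hypothetical and only there to avoid clashing with the notebook's variables.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward_propagation_np(X, parameters):
    """LINEAR -> RELU -> LINEAR -> RELU -> LINEAR, returning the last linear output Z3."""
    Z1 = np.dot(parameters["W1"], X) + parameters["b1"]
    A1 = relu(Z1)
    Z2 = np.dot(parameters["W2"], A1) + parameters["b2"]
    A2 = relu(Z2)
    Z3 = np.dot(parameters["W3"], A2) + parameters["b3"]
    return Z3

# Stand-in parameters with the shapes listed in initialize_parameters
rng = np.random.RandomState(1)
shapes = {"W1": (25, 12288), "b1": (25, 1),
          "W2": (12, 25), "b2": (12, 1),
          "W3": (6, 12), "b3": (6, 1)}
parameters_np = {name: rng.randn(*shape) * 0.01 for name, shape in shapes.items()}

X_np = rng.rand(12288, 4)                      # 4 fake examples
Z3_np = forward_propagation_np(X_np, parameters_np)
print(Z3_np.shape)  # (6, 4)
```

The number of columns in `X_np` is free, which mirrors the `None` dimension of the placeholders.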

### 2.4 Compute cost

As seen before, it is very easy to compute the cost using:
```python
tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))
```
**Question**: Implement the cost function below. 
- It is important to know that the "`logits`" and "`labels`" inputs of `tf.nn.softmax_cross_entropy_with_logits` are expected to be of shape (number of examples, num_classes). We have thus transposed Z3 and Y for you.
- Besides, `tf.reduce_mean` averages the per-example losses over the examples.

In [None]:
def compute_cost(Z3, Y):
    """
    Computes the cost
    
    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3
    
    Returns:
    cost -- Tensor of the cost function
    """
    
    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels))
    
    return cost

In [None]:
tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))

**Expected Output**: 

<table> 
    <tr> 
        <td>
            **cost**
        </td>
        <td>
        Tensor("Mean:0", shape=(), dtype=float32)
        </td>
    </tr>

</table>
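
To make the cost concrete, here is a NumPy sketch of what the mean softmax cross-entropy computes, including the transposition to (number of examples, num_classes). The function name `softmax_cross_entropy_mean` is hypothetical; the sketch is meant as an illustration of the formula, not as TensorFlow's actual implementation.

```python
import numpy as np

def softmax_cross_entropy_mean(Z3, Y):
    """Mean cross-entropy between softmax(Z3) and one-hot labels Y, both of shape (classes, m)."""
    logits = Z3.T                                   # (m, num_classes)
    labels = Y.T
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -(labels * log_probs).sum(axis=1)         # one loss per example
    return per_example.mean()                               # average over the examples

# With all-zero logits the softmax is uniform over 6 classes, so the cost is ln(6)
Z3_uniform = np.zeros((6, 4))
Y_onehot = np.eye(6)[:, [0, 3, 5, 1]]
cost_uniform = softmax_cross_entropy_mean(Z3_uniform, Y_onehot)
print(np.isclose(cost_uniform, np.log(6)))  # True
```

A freshly initialized network usually starts near this value of $\ln 6 \approx 1.79$, which is a handy sanity check for the first printed cost.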

### Build minibatches for gradient descent

In [None]:
def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """
    Creates a list of random minibatches from (X, Y)
    
    Arguments:
    X -- input data, of shape (input size, number of examples)
    Y -- true "label" matrix, one column per example, of shape (number of classes, number of examples)
    mini_batch_size -- size of the mini-batches, integer
    seed -- integer seed for the shuffle, so that the mini-batches are reproducible
    
    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """
    
    np.random.seed(seed)            # To make your "random" minibatches the same as ours
    m = X.shape[1]                  # number of training examples
    mini_batches = []
        
    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation].reshape((Y.shape[0], m))

    # Step 2: Partition (shuffled_X, shuffled_Y) into consecutive slices.
    # The last mini-batch is smaller when m is not a multiple of mini_batch_size.
    start = list(range(0, m, mini_batch_size))
    end = start[1:] + [m]
    
    for k in range(len(start)):
        mini_batch_X = shuffled_X[:, start[k]:end[k]]
        mini_batch_Y = shuffled_Y[:, start[k]:end[k]]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    
    return mini_batches
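
The two steps of `random_mini_batches` (shuffle columns with one shared permutation, then cut consecutive slices) can be illustrated on small arrays. This is a NumPy sketch under the assumption that the final, smaller slice is kept as its own mini-batch; `partition_indices` is a hypothetical helper name.

```python
import numpy as np

def partition_indices(m, mini_batch_size):
    """Return (start, end) column ranges covering 0..m; the last range may be shorter."""
    starts = list(range(0, m, mini_batch_size))
    ends = starts[1:] + [m]
    return list(zip(starts, ends))

# 1080 training examples in mini-batches of 64: 16 full batches plus one of 56
slices = partition_indices(1080, 64)
sizes = [end - start for start, end in slices]
print(len(sizes), sizes[0], sizes[-1])  # 17 64 56

# Shuffling X and Y with the same permutation keeps every label with its example
rng = np.random.RandomState(0)
X_small = np.arange(20).reshape(2, 10)
Y_small = X_small[:1] * 10                # label tied to the first feature of its column
perm = rng.permutation(10)
X_shuffled, Y_shuffled = X_small[:, perm], Y_small[:, perm]
print(np.array_equal(Y_shuffled, X_shuffled[:1] * 10))  # True
```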

### 2.5 - Backward propagation & parameter updates

This is where you become grateful to programming frameworks. All the backpropagation and the parameter updates are taken care of in one line of code, and it is very easy to incorporate this line in the model.

After you compute the cost function, you will create an "`optimizer`" object. You have to call this object along with the cost when running the `tf.Session`. When called, it will perform an optimization step on the given cost with the chosen method and learning rate.

For instance, for gradient descent the optimizer would be:
```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)
```

To make the optimization you would do:
```python
_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
```

This computes the backpropagation by passing through the tensorflow graph in reverse order, from cost to inputs.

**Note** When coding, we often use `_` as a "throwaway" variable to store values that we won't need to use later. Here, `_` takes on the evaluated value of `optimizer`, which we don't need (and `c` takes the value of the `cost` variable). 
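
To appreciate what that single line does, here is a hand-written gradient descent step in NumPy. It is a deliberately simplified sketch: a single LINEAR -> SOFTMAX layer instead of the three-layer network, plain gradient descent instead of Adam, and random stand-in data. All names are hypothetical. TensorFlow derives the equivalent gradients automatically from the graph.

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - Z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def cost_fn(W, b, X, Y):
    """Mean softmax cross-entropy of a single linear layer."""
    P = softmax(np.dot(W, X) + b)
    return -np.mean(np.sum(Y * np.log(P), axis=0))

def gradient_descent_step(W, b, X, Y, learning_rate):
    """One manual backward pass and parameter update."""
    m = X.shape[1]
    P = softmax(np.dot(W, X) + b)
    dZ = (P - Y) / m                       # gradient of the mean cost w.r.t. the logits
    dW = np.dot(dZ, X.T)
    db = dZ.sum(axis=1, keepdims=True)
    return W - learning_rate * dW, b - learning_rate * db

rng = np.random.RandomState(0)
X_gd = rng.rand(8, 20)                                  # 20 fake examples, 8 features
Y_gd = np.eye(3)[:, rng.randint(0, 3, size=20)]         # one-hot labels, 3 classes
W_gd, b_gd = np.zeros((3, 8)), np.zeros((3, 1))

cost_before = cost_fn(W_gd, b_gd, X_gd, Y_gd)
W_gd, b_gd = gradient_descent_step(W_gd, b_gd, X_gd, Y_gd, learning_rate=0.1)
cost_after = cost_fn(W_gd, b_gd, X_gd, Y_gd)
print(cost_before, cost_after)
```

With a small enough learning rate, the cost after the step is lower than before; repeating the step over many mini-batches is what the training loop does.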

### 2.6 - Building the model

Now, you will bring it all together! 

**Exercise:** Implement the model. You will be calling the functions you had previously implemented.

In [None]:
def model(X_train, Y_train, X_test, Y_test, learning_rate=0.0001,
          num_epochs=1500, minibatch_size=32, print_cost=True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.
    
    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test labels, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    tf.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)                             # to keep consistent results
    seed = 3                                          # to keep consistent results
    (n_x, m) = X_train.shape                          # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0]                            # n_y : output size
    costs = []                                        # To keep track of the cost
    
    # Create placeholders of shape (n_x, None) and (n_y, None)
    X, Y = create_placeholders(n_x, n_y)

    # Initialize parameters
    parameters = initialize_parameters()
    
    # Forward propagation: Build the forward propagation in the tensorflow graph
    Z3 = forward_propagation(X, parameters)
    
    # Cost function: Add cost function to tensorflow graph
    cost = compute_cost(Z3, Y)
    
    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer.
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    
    # Initialize all the variables
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        
        # Run the initialization
        sess.run(init)
        
        # Do the training loop
        for epoch in range(num_epochs):

            epoch_cost = 0.                           # Defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
            seed = seed + 1                           # for each epoch different shuffling of indices
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost", the feedict should contain a minibatch for (X,Y).
                _ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                
                epoch_cost += minibatch_cost / num_minibatches
                
            # Print the cost every 100 epochs and record it every 5 epochs
            if print_cost and epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost and epoch % 5 == 0:
                costs.append(epoch_cost)
                
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs (per fives)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Save the trained parameters as numpy arrays
        parameters = sess.run(parameters)
        print("Parameters have been trained!")

        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Build the accuracy tensor; it is evaluated on the train and test sets below
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
        
        return parameters
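
The accuracy computed at the end of `model` compares, column by column, the index of the largest logit with the index of the 1 in the one-hot label. A NumPy sketch of the same idea, with a hypothetical helper name and hand-picked toy values:

```python
import numpy as np

def accuracy_np(Z3, Y):
    """Fraction of columns where the largest logit matches the one-hot label."""
    correct = np.argmax(Z3, axis=0) == np.argmax(Y, axis=0)
    return np.mean(correct.astype("float"))

# 3 classes, 4 examples: the last example is predicted as class 1 but labelled 2
Z3_toy = np.array([[2.0, 0.1, 0.3, 0.0],
                   [0.5, 1.5, 0.2, 3.0],
                   [0.1, 0.2, 0.9, 1.0]])
Y_toy = np.eye(3)[:, [0, 1, 2, 2]]
print(accuracy_np(Z3_toy, Y_toy))  # 0.75
```

Applying softmax before the `argmax` would not change the result, because softmax preserves the ordering of the logits.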

Run the following cell to train your model! On our machine it takes about 5 minutes. Your "Cost after epoch 100" should be 1.016458. If it's not, don't waste time; interrupt the training by clicking on the square (⬛) in the upper bar of the notebook, and try to correct your code. If it is the correct cost, take a break and come back in 5 minutes!

In [None]:
parameters = model(X_train, Y_train, X_test, Y_test, 0.0001, 1500, 16)

**Expected Output**:

<table> 
    <tr> 
        <td>
            **Train Accuracy**
        </td>
        <td>
        0.999074
        </td>
    </tr>
    <tr> 
        <td>
            **Test Accuracy**
        </td>
        <td>
        0.716667
        </td>
    </tr>

</table>

Amazing, your algorithm can recognize a sign representing a figure between 0 and 5 with 71.7% accuracy.

**Insights**:
- Your model seems big enough to fit the training set well. However, given the difference between train and test accuracy, you could try to add L2 or dropout regularization to reduce overfitting. 
- Think about the session as a block of code to train the model. Each time you run the session on a minibatch, it trains the parameters. In total you have run the session a large number of times (1500 epochs) until you obtained well trained parameters.

### 2.7 - Test with your own image (optional / ungraded exercise)

Congratulations on finishing this assignment. You can now take a picture of your hand and see the output of your model. To do that:
1. Click on "File" in the upper bar of this notebook, then click "Open" to go to your Coursera Hub.
2. Add your image to this Jupyter Notebook's directory, in the "images" folder.
3. Write your image's name in the following code.
4. Run the code and check if the algorithm is right!

In [None]:
from PIL import Image
from skimage.transform import resize

my_image = "thumbs_up.jpg"

# We preprocess your image to fit your algorithm.
fname = "images/" + my_image
image = np.array(Image.open(fname).convert("RGB"))
# skimage's resize returns floats in [0, 1], matching the /255 normalization used for training
my_image = resize(image, (64, 64)).reshape((1, 64 * 64 * 3)).T
# `predict` is not defined in this notebook; it is assumed to come from a helper module
my_image_prediction = predict(my_image, parameters)

plt.imshow(image)
print("Your algorithm predicts: y = " + str(np.squeeze(my_image_prediction)))

You indeed deserved a "thumbs-up", although as you can see the algorithm seems to classify it incorrectly. The reason is that the training set doesn't contain any "thumbs-up", so the model doesn't know how to deal with it! We call that a "mismatched data distribution" and it is one of the various topics of the next course on "Structuring Machine Learning Projects".
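
The `predict` helper called above is not shown in this section. Since `model` returns the trained parameters as numpy arrays, a prediction function could be sketched in plain NumPy as follows. This is an assumption about its behaviour, not the actual helper; `predict_np` and the stand-in parameters are hypothetical.

```python
import numpy as np

def predict_np(X, parameters):
    """Return the predicted class index for each column of X."""
    A1 = np.maximum(0, np.dot(parameters["W1"], X) + parameters["b1"])
    A2 = np.maximum(0, np.dot(parameters["W2"], A1) + parameters["b2"])
    Z3 = np.dot(parameters["W3"], A2) + parameters["b3"]
    return np.argmax(Z3, axis=0)  # softmax preserves ordering, so argmax of Z3 suffices

# Stand-in parameters (NOT trained): zero weights and a bias that favours class 4
demo_parameters = {
    "W1": np.zeros((25, 12288)), "b1": np.zeros((25, 1)),
    "W2": np.zeros((12, 25)), "b2": np.zeros((12, 1)),
    "W3": np.zeros((6, 12)), "b3": np.array([[0.], [0.], [0.], [0.], [1.], [0.]]),
}
demo_image = np.random.RandomState(0).rand(12288, 1)   # one flattened, normalized image
print(predict_np(demo_image, demo_parameters))  # [4]
```

Whatever the input, the output is always one of the six trained classes, which is why a "thumbs-up" can only be mapped to a digit between 0 and 5.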

<font color='blue'>
**What you should remember**:
- Tensorflow is a programming framework used in deep learning
- The two main object classes in tensorflow are Tensors and Operators. 
- When you code in tensorflow you have to take the following steps:
    - Create a graph containing Tensors (Variables, Placeholders ...) and Operations (tf.matmul, tf.add, ...)
    - Create a session
    - Initialize the session
    - Run the session to execute the graph
- You can execute the graph multiple times as you've seen in model()
- Backpropagation and optimization are done automatically when running the session on the "optimizer" object.