# Lab 1: TensorFlow Basics

## Section 0: Setup

In [1]:
# Import essential libraries
import numpy as np
import tensorflow as tf

## Section 1: Tensors
If you want to review any of this material, look over https://docs.scipy.org/doc/numpy-1.15.0/user/quickstart.html and https://www.tensorflow.org/guide/tensors.

### 1.1: Tensor values, shapes, rank, and axes
Make tensor values by hand (e.g. `x = np.array([[1, 2, 3], [4, 5, 6]])`) of the following shapes:
 * a: (2, 2)
 * b: (3,)
 * c: (3, 1)
 * d: (1, 3)
 * e: ()
 * f: (1,)
 * g: (2, 2, 2)
 * h: (2, 3, 1, 2)
 
 For each, put its tensor rank and total number of elements in a comment.
 Yes, this is pretty boring, but it's also short and it's really important to understand what tensors of different shapes look like and how shapes, rank, and axes interact.

In [2]:
# Your code here
a = np.array([[1, 2], [3, 4]])                     # rank = 2; total number of elements = 4
b = np.array([1, 2, 3])                            # rank = 1; total number of elements = 3
c = np.array([[1], [2], [3]])                      # rank = 2; total number of elements = 3
d = np.array([[1, 2, 3]])                          # rank = 2; total number of elements = 3
e = np.array(1)                                    # rank = 0; total number of elements = 1
f = np.array([1])                                  # rank = 1; total number of elements = 1
g = np.array([[[1, 2], [3, 4]], 
              [[5, 6], [7, 8]]])                   # rank = 3; total number of elements = 8
h = np.array([[[[1, 2]], [[3, 4]], [[5, 6]]], 
              [[[7, 8]], [[9, 10]], [[11, 12]]]])  # rank = 4; total number of elements = 12
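
One way to check comments like the ones above is to ask NumPy directly. The sketch below is only an illustration: the `examples` dictionary and its all-zeros arrays are made up here, one array per shape in the exercise list. It relies on `ndim` being the rank and `size` being the product of the shape entries.

```python
import numpy as np

# Hypothetical stand-in arrays, one per shape from the exercise list.
examples = {
    "a": np.zeros((2, 2)),
    "b": np.zeros((3,)),
    "c": np.zeros((3, 1)),
    "d": np.zeros((1, 3)),
    "e": np.zeros(()),
    "f": np.zeros((1,)),
    "g": np.zeros((2, 2, 2)),
    "h": np.zeros((2, 3, 1, 2)),
}

# ndim is the rank (number of axes); size is the total number of elements.
summary = {name: (arr.shape, arr.ndim, arr.size) for name, arr in examples.items()}

for name, (shape, rank, size) in summary.items():
    print(name, shape, "rank =", rank, "elements =", size)
```

Note that `c`, `d`, and `b` all hold three elements but have different ranks and shapes, which is exactly the distinction the exercise is about.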

### 1.2: Slices and reductions
Use slicing or `tf.reduce_mean`, `tf.reduce_sum`, and `tf.reduce_any` on the tensors defined below to print:
 * The element of `a` at position (1, 2, 3), counting from 1 along each axis
 * The first column of `b`
 * The shape-(2, 3, 2) tensor obtained by selecting the second and third elements of the third axis of `a`
 * The sum of all values in `b`
 * The 2-vector containing means of each row of `b` 
 * The (1, 3) tensor containing, for each column in `c`, whether that column contains any `True` values
 
Each statement should take the form 
```
print(sess.run(something[...]))
```
or 
```
print(sess.run(tf.reduce_something(...)))
```
Follow each with a comment stating the shape of the output.
For a rank-2 tensor, the first index specifies row and the second specifies column.
Make sure to pay attention to the `axis` and `keepdims` arguments of the `reduce` functions.
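
To see what `axis` and `keepdims` do to output shapes, here is a small NumPy sketch. It assumes the NumPy reductions (`np.sum`, `np.mean`, `np.any`) treat these two arguments the same way as the corresponding `tf.reduce_*` functions; the arrays mirror `b` and `c` from this problem.

```python
import numpy as np

b = np.array([[1., 2.],
              [3., 4.]])
c = np.array([[True, True, False],
              [False, True, False]])

# Reducing over axis 1 collapses the columns, leaving one value per row.
row_means = np.mean(b, axis=1)                      # shape (2,)

# keepdims=True keeps the reduced axis around with length 1.
row_means_kept = np.mean(b, axis=1, keepdims=True)  # shape (2, 1)

# Reducing over axis 0 collapses the rows, leaving one value per column.
col_any = np.any(c, axis=0, keepdims=True)          # shape (1, 3)

# With no axis given, every axis is reduced and the result is rank-0.
total = np.sum(b)                                   # shape ()

print(row_means.shape, row_means_kept.shape, col_any.shape, np.shape(total))
```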
 
 
For this problem, I'll set up the default session and name scope, but for all future problems you'll need to do that.

In [3]:
a = tf.constant(np.ones((2, 3, 4))) # Tensor of ones with shape (2, 3, 4)
b = tf.constant([[1., 2.], 
                 [3., 4.]]) # Tensor of the matrix [1 2; 3 4] with shape (2, 2)
c = tf.constant([[True, True, False],
                 [False, True, False]]) # Binary tensor with shape (2, 3)

In [48]:
with tf.Session() as sess:
    with tf.name_scope('slices_and_reductions'):
        print(sess.run(a[0, 1, 2])) # shape = ()
        print(sess.run(b[:, 0])) # shape = (2,)
        print(sess.run(a[:, :, 1:3])) # shape = (2, 3, 2)
        print(sess.run(tf.reduce_sum(b))) # shape = ()
        print(sess.run(tf.reduce_mean(b, axis=1))) # shape = (2,)
        print(sess.run(tf.reduce_any(c, axis=0, keepdims=True))) # shape = (1, 3)

1.0
[1. 3.]
[[[1. 1.]
  [1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]
  [1. 1.]]]
10.0
[1.5 3.5]
[[ True  True False]]


### 1.3: Transposition and reshaping
Use `tf.transpose` to print:
 * `b` with its rows and columns swapped
 * `a` with its second and third axes swapped; comment its shape
 
Use `tf.reshape` to print:
 * The values of `b` in a tensor with shape (1, 4)
 * The values of `b` in a tensor with shape (4, 1)
 
Do this all inside the name scope "transposition_and_reshaping".

In [52]:
with tf.Session() as sess:
    with tf.name_scope('transposition_and_reshaping'):
        print(sess.run(tf.transpose(b)))
        print(sess.run(tf.transpose(a, perm = [0, 2, 1]))) # shape = (2, 4, 3)
        print(sess.run(tf.reshape(b,(1,4))))
        print(sess.run(tf.reshape(b,(4,1))))

[[1. 3.]
 [2. 4.]]
[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]
[[1. 2. 3. 4.]]
[[1.]
 [2.]
 [3.]
 [4.]]
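
Because `a` is all ones, the output above cannot show where each element ends up. The NumPy sketch below uses a hypothetical array of distinct values (`np.arange(24)`) to make the difference visible: transposing moves elements to new index positions, while reshaping keeps the row-major order and only changes how it is cut into axes. It assumes `np.transpose` and `np.reshape` behave like their TensorFlow counterparts here.

```python
import numpy as np

# Distinct values so that element movement is visible (hypothetical data).
a = np.arange(24).reshape(2, 3, 4)

# Swap the second and third axes: element [i, j, k] moves to [i, k, j].
swapped = np.transpose(a, axes=(0, 2, 1))   # shape (2, 4, 3)

b = np.array([[1., 2.],
              [3., 4.]])

# Reshaping reads the values in row-major order and regroups them.
row = b.reshape(1, 4)                        # [[1. 2. 3. 4.]]
col = b.reshape(4, 1)

# Transposing first changes the order that reshape then reads.
transposed_then_flat = b.T.reshape(1, 4)     # [[1. 3. 2. 4.]]

print(swapped.shape, row, transposed_then_flat)
```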


## Section 2: Computing with Operations and Graphs 

### 2.1: The dot product (as a sum of scalar products)
Write a function `dot_sum()` that takes in two rank-1 tensors `a` and `b` of equal shape and returns a tensor that holds their dot product, $$\text{result} = a \cdot b = \sum_{i = 1}^{\dim{a}} a_i \cdot b_i $$

The computation should first multiply the elements in $a$ and $b$ into a vector $a \odot b$ (the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) of $a$ and $b$), then sum across the vector to produce a scalar. 
Your implementation should be _vectorized_: it should not explicitly use the shape of an input tensor or do any looping.
The tensor output by your function must be rank-0.

The entire computation should use the name scope "dot_sum" and the tensor you return should have the name "result".

TensorFlow operations to look at:
 * `tf.multiply` (or equivalently, the binary operation *)
 * `tf.reduce_sum`

In [53]:
def dot_sum(a, b):
    '''
    Given rank-1 tensors a and b with equal shapes, return the dot product 
    of a and b as a rank-0 tensor computed via Hadamard product.
    '''
    with tf.name_scope('dot_sum'):
        result = tf.reduce_sum(tf.multiply(a, b), name="result")
        return result
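
For a quick sanity check of the arithmetic, the same "multiply elementwise, then sum" idea can be written in NumPy. This is only a sketch: `dot_sum_np` is a made-up name for illustration, and the input vectors are arbitrary.

```python
import numpy as np

def dot_sum_np(a, b):
    """Dot product as the sum of the elementwise (Hadamard) product."""
    return np.sum(a * b)

a = np.array([1., 2., 3.])
b = np.array([4., 5., 6.])

result = dot_sum_np(a, b)      # 1*4 + 2*5 + 3*6
print(result, np.ndim(result))
```

No loop and no reference to the input length appears anywhere, which is what "vectorized" means in the problem statement.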

### 2.2: The dot product (as matrix multiplication)
Write a function `dot_multiply()` that takes in two rank-1 tensors `a` and `b` of equal shape and returns a tensor that holds their dot product, $$\text{result} = a \cdot b = a^T b $$

The computation should use `tf.matmul` to perform the multiplication, which expects that your input tensors have rank of at least two (they should be matrices, not vectors).
Since your input vectors are rank-1, this means you'll need to use `tf.expand_dims` with `axis=-1` to add a "dummy dimension".
This is a subtle but important point: your vectors start with shape [n], but matrix multiplication is only defined for matrices with shapes [1, n] and [n, 1].
Depending on how you do it, you will probably get a rank-2 tensor with a shape like [1, 1].
You must return a rank-0 tensor, so use `tf.squeeze` to eliminate dummy dimensions.

The entire computation should use the name scope "dot_multiply" and the tensor you return should have the name "result".
This will not collide with the previous "result" tensor because of name scoping.
(If it did, it would be renamed to "result_1" in the graph.)

TensorFlow operations to look at:
 * `tf.matmul`
 * `tf.transpose`
 * `tf.expand_dims`
 * `tf.squeeze`

In [54]:
def dot_multiply(a, b):
    '''
    Given rank-1 tensors a and b with equal shapes, return the dot product 
    of a and b as a rank-0 tensor computed via matrix multiplication.
    '''
    with tf.name_scope('dot_multiply'):
        am = tf.expand_dims(a, axis = -1)  # matrix form of a
        bm = tf.expand_dims(b, axis = -1)  # matrix form of b
        abm = tf.matmul(tf.transpose(am), bm)  # (1, n) x (n, 1) product, a 1-by-1 matrix
        result = tf.squeeze(abm, name="result")
        return result
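
The shape bookkeeping described in the text can be traced step by step in NumPy. This is a sketch that assumes `np.expand_dims`, `np.matmul`, and `np.squeeze` follow the same shape rules as the TensorFlow operations of the same names; the input vectors are arbitrary.

```python
import numpy as np

a = np.array([1., 2., 3.])          # shape (3,)
b = np.array([4., 5., 6.])          # shape (3,)

am = np.expand_dims(a, axis=-1)     # shape (3, 1): a as a column matrix
bm = np.expand_dims(b, axis=-1)     # shape (3, 1): b as a column matrix

abm = np.matmul(am.T, bm)           # (1, 3) times (3, 1) gives shape (1, 1)
result = np.squeeze(abm)            # drop the length-1 axes, leaving shape ()

print(am.shape, abm.shape, result.shape, result)
```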

### 2.3: A single ReLU unit
The "default" activation function for modern neural networks is the [rectified linear unit](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) (or "ReLU"):
$$ \text{relu}(x) = \max(0, x). $$

In a neural network using ReLU activation, a single unit with $n$ inputs has parameters $w$ (an $n$-vector of weights) and $b$ (a scalar).
It computes the function
$$ f(x; w, b) = \text{relu}(w \cdot x + b). $$

Using either `dot_sum` or `dot_multiply`, add these tensors and operations to the default graph:
$$
\begin{align}
&x: \space \text{placeholder} \\
&w = \begin{bmatrix}2 & 0.5 & -1\end{bmatrix} \\
&b = 0.3 \\
&\text{state} = w \cdot x + b \\
&\text{activation} = \max(\text{state}, 0)
\end{align}
$$

`x` should have shape [3] and dtype `tf.float32`, and all tensors should be named, under the name scope "ReLU".
This includes the tensors created through your dot product function, but do not change your implementation to add to the name!

Then, use a default `tf.Session` to evaluate and print `activation` for:
 * $x = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}$
 * $x = \begin{bmatrix} -1 & 2 & 0 \end{bmatrix}$
 * $x = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$
 * $x = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}$

You should only call your dot product function once.
Note that calling the function _adds operations to the graph_: doing it multiple times will create many new operations and tensors.
Instead, you want to create them once and evaluate the same `activation` tensor multiple times.
Recall that a tensor is only a symbolic stand-in for a value and may take different values in different runs.


TensorFlow operations to look at:
 * `tf.constant`
 * `tf.placeholder`
 * `tf.add`
 * `tf.maximum`

In [55]:
with tf.name_scope('ReLU'):
    x = tf.placeholder(tf.float32, (3,), name='x')
    w = tf.constant([2, 0.5, -1], name='w')
    b = tf.constant(0.3, name='b')
    state = tf.add(dot_sum(w, x), b, name='state')
    activation = tf.maximum(state, 0, name='activation')

In [58]:
with tf.Session() as sess:
    print('activation =', sess.run(activation, feed_dict={x: [1, 1, 1]}))
    print('activation =', sess.run(activation, feed_dict={x: [-1, 2, 0]}))
    print('activation =', sess.run(activation, feed_dict={x: [1, 0, 0]}))
    print('activation =', sess.run(activation, feed_dict={x: [0, 0, 0]}))

activation = 1.8
activation = 0.0
activation = 2.3
activation = 0.3
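
The printed activations can be cross-checked by hand or with a few lines of NumPy. The sketch below is independent of the TensorFlow graph; `relu_unit` is a made-up helper name, and the weights and inputs are the ones given in the problem.

```python
import numpy as np

w = np.array([2.0, 0.5, -1.0])
b = 0.3

def relu_unit(x):
    """Compute relu(w . x + b) for a single input vector x."""
    state = np.dot(w, x) + b
    return np.maximum(state, 0.0)

inputs = [[1, 1, 1], [-1, 2, 0], [1, 0, 0], [0, 0, 0]]
activations = [float(relu_unit(np.array(x, dtype=float))) for x in inputs]

for x, act in zip(inputs, activations):
    print(x, "->", round(act, 4))
```

The second input gives a negative state ($-2 + 1 + 0 + 0.3 = -0.7$), so it is the only one that ReLU clips to zero.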


#### Aside on activation functions

One way to derive feedforward neural networks is to begin by saying "I'd like to do a simple (linear) transformation on my input features to make them easier to model, then use a simple model (e.g. linear regression) that instead uses the transformed features."
Doing this means your total model is $y = ABx + b$ where $B$ is the matrix multiplying an input point $x$ into a new representation and $A$ is the matrix parameterizing the linear regression.

But $AB$ is just another matrix, so adding a linear representation has not made your model any more powerful. If instead you "twisted" the input space after applying $B$, the overall map would be nonlinear and the composite model would have greater representational power.
Activation functions perform this "twisting".
Deep neural networks come from the observation that it'd be easier to get a good representation (top layer) if it were based on a lower-level representation (early layers).
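
A tiny numerical illustration of this point, with made-up matrices `A` and `B` and a made-up input `x`: without an activation, the two-layer map collapses to the single matrix $AB$; with a ReLU between the layers, the map is no longer linear (a linear map $g$ would satisfy $g(-x) = -g(x)$).

```python
import numpy as np

# Hypothetical small matrices chosen so the arithmetic is easy to follow.
B = np.array([[1., -1.],
              [0.,  2.]])
A = np.array([[1., 1.]])
x = np.array([1., 2.])

def relu(z):
    return np.maximum(z, 0.0)

# Two linear layers are the same as one layer with matrix A @ B.
two_step = A @ (B @ x)
one_step = (A @ B) @ x

def twisted(v):
    """Two layers with a ReLU applied after B."""
    return A @ relu(B @ v)

print(two_step, one_step)          # equal: the extra layer added nothing
print(twisted(x), twisted(-x))     # not negatives of each other: nonlinear
```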

Here's a great article explaining the geometric interpretation of activation functions: https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/.
The general idea is that a neural network can learn parameters that use these "twists" to deform space until the manifold defined by your input data becomes simple.

### 2.4: Graph optimizations
You don't have to write code for this section, just look over the cell below, execute it and look at the output.

`tf.Print` is an operation that takes a tensor and makes a copy of it that prints itself to the terminal when evaluated.
Unfortunately, it prints to the terminal running the Jupyter notebook and not to the notebook itself, so you'll have to look at the terminal to see the output.

In [62]:
with tf.name_scope('subgraph_execution'):
    a = tf.constant(1, name='a')
    b = tf.constant(2, name='b')

    # 'print_tensor' is a tensor that's a copy of 'a' except it prints '1' when evaluated 
    print_tensor = tf.Print(a, [a], name='print_tensor')
    
    c = tf.add(print_tensor, b, name='c')
    d = tf.multiply(print_tensor, c, name='d')
    e = tf.add(a, b)

with tf.Session() as sess:
    sess.run(d)
    sess.run(e)

Notice that computing `d` uses the value of `print_tensor` twice, but when we evaluate `d`, the value is printed only once.
This indicates that TensorFlow is caching the value of `print_tensor` instead of computing it multiple times.
When computing `e`, the value is not printed at all because it does not depend on the value of `print_tensor`.
These are the simplest two graph optimizations that TensorFlow can make.
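
The two behaviors can be imitated with a toy graph evaluator in plain Python. To be clear, this is a simplified model for intuition and not how TensorFlow is implemented; `Node` and `run` are made-up names. Each run keeps a cache so a node is computed at most once, and only nodes the target depends on are ever visited.

```python
class Node:
    """A toy graph node: a function plus the nodes it takes as inputs."""
    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = tuple(inputs)
        self.calls = 0  # how many times this node has been computed

def run(target):
    """Evaluate `target`, computing each needed node at most once per run."""
    cache = {}

    def evaluate(node):
        if node.name not in cache:
            args = [evaluate(dep) for dep in node.inputs]
            node.calls += 1
            cache[node.name] = node.fn(*args)
        return cache[node.name]

    return evaluate(target)

# Same structure as the 'subgraph_execution' graph in the cell above.
a = Node("a", lambda: 1)
b = Node("b", lambda: 2)
print_tensor = Node("print_tensor", lambda v: v, [a])
c = Node("c", lambda p, q: p + q, [print_tensor, b])
d = Node("d", lambda p, q: p * q, [print_tensor, c])
e = Node("e", lambda p, q: p + q, [a, b])

d_value = run(d)
calls_after_d = print_tensor.calls   # used twice by d, computed once
e_value = run(e)
calls_after_e = print_tensor.calls   # e never touches print_tensor

print(d_value, calls_after_d, e_value, calls_after_e)
```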

### 2.5: Runs vs sessions
Again, just run the code and look at the output.

A `tf.Session` object contains the _context_ of every run that happens within it, but that doesn't mean things stay the same within the session.
Within a single _run_, tensors always have the same value. 
The code below generates a random number tensor, makes a second tensor by adding 1 to the first, and prints both twice within the same session, in separate runs.

In [61]:
with tf.name_scope('random_experiment'):
    random = tf.random_uniform(())
    plus_one = random + 1
    
with tf.Session() as sess:
    print(sess.run([random, plus_one]))
    print(sess.run([random, plus_one]))

[0.7848897, 1.7848897]
[0.90443933, 1.9044393]


Within a single run, the values of `random` and `plus_one` are consistent, so `random`'s value is fixed during the run.
In the second run of the session, `random` and `plus_one` take on different values than they did in the first run.

## Section 3: Optimization

### Minimizing a function with gradient descent
Minimize the scalar function $f(x) = (x-1)(2x-2)(x-3)(x-4)$, plotted below, using gradient descent.
It has a local minimum near $x = 1$ and a global minimum near $x = 3.5$.

![f(x)](./images/plot_f.png)

The steps to build the graph are:
 1. Use `tf.get_variable` to get a variable named `x` that uses a `tf.random_uniform_initializer` on the range [-1, 5] 
 2. Make a tensor `y` that represents f(x) given a value of `x`
 3. Make a `tf.train.GradientDescentOptimizer` named "optimizer" with a learning rate of 0.01
 4. Make the operation that performs gradient updates by using the `minimize` function of the optimizer on `y`. Name it `gradient_step`.
 
Only build the graph once!
The whole subgraph for this problem should go under a name scope of "minimize_f", and operations to compute `y` should have an additional name scope of "compute_f".

Then, the steps to minimize the function once are:
 1. Run `tf.global_variables_initializer()` to initialize `x`
 2. Print the initial values of `x` and `y`
 3. Run `gradient_step` 1000 times
 4. Print the final values of x and y
 
Minimize the function a few times. If you did it right, you'll find that in each run the optimizer finds one of two minima. Running minimization a few times, you should see it find both eventually. What determines which minimum is found? Answer in the markdown box below.

In [2]:
with tf.name_scope('minimize_f'):
    x = tf.get_variable('x', shape=(), dtype=tf.float32,
                        initializer=tf.random_uniform_initializer(-1, 5))
    with tf.name_scope('compute_f'):
        y = (x - 1.0) * (2.0 * x - 2.0) * (x - 3.0) * (x - 4.0)
    # The optimizer and its update op belong to 'minimize_f', not 'compute_f'
    optimizer = tf.train.GradientDescentOptimizer(0.01, name='optimizer')
    gradient_step = optimizer.minimize(y, name='gradient_step')

In [37]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer()) 
    
    print('Initial value of x:', sess.run(x))
    print('Initial value of f(x):', sess.run(y))
    
    for step in range(1000):  # the exercise asks for 1000 gradient steps
        sess.run(gradient_step)
        
    print('Final value of x:', sess.run(x))
    print('Final value of f(x):', sess.run(y))

Initial value of x: 4.492996
Initial value of f(x): 17.960917
Final value of x: 3.5930705
Final value of f(x): -3.245519
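
To experiment with starting points without rebuilding the graph, the same update rule can be sketched in plain Python. This is a sketch under stated assumptions, not the TensorFlow optimizer itself: the derivative is worked out by hand from $f(x) = 2(x-1)^2(x-3)(x-4)$, which gives $f'(x) = 2(x-1)(4x^2 - 23x + 31)$, and the two starting points are arbitrary picks from the range $[-1, 5]$.

```python
def f(x):
    return (x - 1.0) * (2.0 * x - 2.0) * (x - 3.0) * (x - 4.0)

def df(x):
    # Hand-derived: f(x) = 2 (x-1)^2 (x-3)(x-4), so
    # f'(x) = 2 (x-1) (4x^2 - 23x + 31).
    return 2.0 * (x - 1.0) * (4.0 * x * x - 23.0 * x + 31.0)

def descend(x, learning_rate=0.01, steps=1000):
    """Plain gradient descent: repeatedly step against the derivative."""
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x

left = descend(0.0)    # hypothetical start on the left side of the range
right = descend(4.5)   # hypothetical start on the right side of the range

print("start 0.0 ->", round(left, 4), "f =", round(f(left), 4))
print("start 4.5 ->", round(right, 4), "f =", round(f(right), 4))
```

Trying a handful of other starting values is a cheap way to explore the question posed at the end of this section.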


Your answer here.