# Quiz 2

The second quiz will take place this Wednesday:
- 45 minutes
- SVM, decision tree, neural network


# Final Project Schedule
**Proposal**

- Form a team of 1-3 students (a 3-student team is allowed only if the project is significantly more complex than the mid-term project).
- Describe the background of the problem.
- Describe where to get the data.
- Frame the Machine Learning problem: What are the input features? What is the model expected to learn? Is it supervised or unsupervised learning? Is it a classification or regression problem?
- Briefly describe the research plan: Which models will you use? How will you measure performance?

**Each team should submit a project proposal at the beginning of the class on Monday, May 6th.**

**Project Submission**
Each team is expected to provide a Jupyter notebook containing:
- A written description of every step and its results. For example, if you decide to build a decision tree model, you should describe how you set up the model (maximum depth, maximum number of leaves, etc.) and explain how well the model works for the problem.
- Executable code that performs the essential steps of the project, including: data cleaning, data visualization/summarization, model training/fine-tuning/evaluation.
- A conclusion section that summarizes the project, with explicit statements on its outcome.

**Submission Deadline: Wednesday, May 22nd (last day of the exam week)**

**Online data sets**
- [Kaggle.com](https://www.kaggle.com): Gain access to their data sets by entering a competition
- [UCI machine learning repository](http://mlr.cs.umass.edu/ml/) is one of the oldest sources of data sets on the web. These data sets tend to be fairly small, but are usually clean and ready for machine learning to be applied.

In [33]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
import tensorflow.keras as keras
tf.__version__

'1.13.1'

# TensorFlow

TensorFlow is an open source software library for high performance numerical computation.
- It was originally developed by the Google Brain team and is now one of the most popular open source projects on GitHub (check out https://github.com/jtoy/awesome-tensorflow).
- Its basic principle is simple: first define in Python a graph of computations to perform, and then TensorFlow takes that graph and runs it efficiently using optimized C++ code.
![](Data/tf_1.png)
- TensorFlow can break up the graph into several chunks and run them in parallel across multiple CPUs, GPUs, and Tensor Processing Units (TPUs), on platforms ranging from desktops and server clusters to mobile and edge devices.
- It comes with a great visualization tool called TensorBoard that allows you to browse through the computation graph, view learning curves, and more.
- Google also launched a cloud service to run TensorFlow computational graphs (cloud.google.com/ml)

In [6]:
# Create a constant string
hello = tf.constant('hello')  
# Create a TensorFlow session
sess = tf.Session()
# Print the string during a session run
print(sess.run(hello))
# print(hello) # this will not work; must execute in a session

b'hello'


## Dataflow Graph
TensorFlow uses a **dataflow graph** to represent your computation in terms of the dependencies between individual operations. This leads to a three-step programming procedure:
1. Define the dataflow graph
2. Create a TensorFlow **session**
3. Run the graph

A TensorFlow session takes care of placing the operations onto devices such as CPUs and GPUs and running them.
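
The define-then-run idea can be sketched without TensorFlow at all. The toy below is only an illustration in plain Python: the `Node` class and the `as_node` and `run` helpers are hypothetical names invented for this sketch, not TensorFlow APIs. Building the expression only records operations; a value appears only when the graph is run.

```python
# Toy illustration of "define the graph first, run it later".
# Node, as_node, and run are hypothetical helpers, not TensorFlow APIs.

class Node:
    """A graph node that stores an operation and the nodes it depends on."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op            # 'const', 'add', or 'mul'
        self.inputs = inputs    # upstream nodes
        self.value = value      # only used by 'const' nodes

    def __add__(self, other):
        return Node('add', (self, as_node(other)))

    def __mul__(self, other):
        return Node('mul', (self, as_node(other)))


def as_node(x):
    """Wrap plain Python numbers as constant nodes."""
    return x if isinstance(x, Node) else Node('const', value=x)


def run(node):
    """Evaluate a node by recursively evaluating its dependencies."""
    if node.op == 'const':
        return node.value
    left, right = (run(n) for n in node.inputs)
    return left + right if node.op == 'add' else left * right


# Step 1: define the graph (nothing is computed here).
x_node = as_node(3)
y_node = as_node(4)
f_node = x_node * x_node * y_node + y_node + 2

# Steps 2 and 3: run the graph to obtain a value.
result_demo = run(f_node)
print(f_node.op)     # add  (f_node is a node, not a number)
print(result_demo)   # 42
```

A real session does much more than this recursion (device placement, parallel execution, variable state), but the separation between describing a computation and executing it is the same.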

In [8]:
# Reset a dataflow graph
tf.reset_default_graph()

# Define two variables x, y
x = tf.Variable(3, name='x') # equivalent way in tf for x=3
y = tf.Variable(4, name='y') # y=4

# Define f based on x and y
f = x*x*y + y + 2 

# The following print statement only gives the 
# description of variable f, not its value (since
# the value hasn't been computed yet)
print(f)

Tensor("add_1:0", shape=(), dtype=int32)


In [9]:
# Create a TensorFlow session and evaluate f
sess = tf.Session()

# Initialize x and y
sess.run(x.initializer)
sess.run(y.initializer)

# Evaluate f
result = sess.run(f)

print(result)

# close the session
sess.close()

42


In [10]:
# use with statement to set sess as default session
with tf.Session() as sess:
    # equivalent to sess.run(x.initializer):
    x.initializer.run() 
    # equivalent to sess.run(y.initializer):
    y.initializer.run()
    # equivalent to sess.run(f)
    result = f.eval()
print(result)

42


In [11]:
# Use tf.global_variables_initializer() to 
# initialize all variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()
print(result)

42


In [12]:
# EXERCISE: 
# Create a TensorFlow graph to compute
# S = pi * r ** 2
# where pi=3.14, r=1.0



## Feeding data to the graph
We can use tf.placeholder() to create a node whose value is not set when the graph is defined but is supplied at execution time through feed_dict. This is particularly useful when we want to feed data to the graph during execution. The following code creates a placeholder node A, and B = A + 5:

In [13]:
tf.reset_default_graph()

A = tf.placeholder(tf.float32, shape=(None, 3))
# None as a dimension means any size.
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

print(B_val_1)
print(B_val_2)

[[6. 7. 8.]]
[[ 9. 10. 11.]
 [12. 13. 14.]]
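
As a rough mental model of what `shape=(None, 3)` permits, the NumPy sketch below mimics the shape check applied to a fed value. The `feed` helper is a hypothetical name made up for this illustration; it is not part of TensorFlow, and the real check happens inside the session.

```python
import numpy as np

def feed(value, shape=(None, 3)):
    """Hypothetical helper: convert `value` to float32 and check it against
    a placeholder-style shape, where None matches any size."""
    arr = np.asarray(value, dtype=np.float32)
    if arr.ndim != len(shape):
        raise ValueError(f"expected {len(shape)} dimensions, got {arr.ndim}")
    for actual, expected in zip(arr.shape, shape):
        if expected is not None and actual != expected:
            raise ValueError(f"cannot feed shape {arr.shape} into {shape}")
    return arr

out_1 = feed([[1, 2, 3]]) + 5                # one row is accepted
out_2 = feed([[4, 5, 6], [7, 8, 9]]) + 5     # two rows are accepted too

try:
    feed([[1, 2]])                           # wrong number of columns
    rejected = False
except ValueError:
    rejected = True

print(out_1)
print(out_2)
print(rejected)
```

The first dimension is free, so batches of any size fit, while a row with the wrong number of columns is rejected.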


## Example: Gradient Descent using TensorFlow

In [14]:
# Load California housing data
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

# Feature scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

**Calculating Gradient:**
- Cost generated from each instance $(\textbf{x}^{(i)}, y^{(i)})$ is: 

$(\theta\cdot\textbf{x}^{(i)} - y^{(i)})^2 = (\theta_0\cdot 1 + \theta_1x^{(i)}_1 + \theta_2x^{(i)}_2 + \cdots + \theta_nx^{(i)}_n - y^{(i)})^2$.
- Its partial derivative with respect to $\theta_j$ is:

$2x^{(i)}_j(\theta_0\cdot 1 + \theta_1x^{(i)}_1 + \theta_2x^{(i)}_2 + \cdots + \theta_nx^{(i)}_n - y^{(i)})$.
- Average cost is: $\frac{1}{m}\sum_{i=1}^m(\theta\cdot\textbf{x}^{(i)} - y^{(i)})^2$.
- The partial derivative of the average cost with respect to $\theta_j$ is:

$\frac{1}{m}\sum_{i=1}^m2x^{(i)}_j(\theta_0\cdot 1 + \theta_1x^{(i)}_1 + \theta_2x^{(i)}_2 + \cdots + \theta_nx^{(i)}_n - y^{(i)})$.
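
One way to sanity-check the derivative above is to compare it with a finite-difference estimate. The sketch below does this in NumPy on small synthetic data; the data and all `*_demo` names are assumptions made for illustration and are unrelated to the housing set.

```python
import numpy as np

rng = np.random.default_rng(0)      # synthetic data, for illustration only
m_demo, n_demo = 50, 3
X_demo = np.c_[np.ones((m_demo, 1)), rng.normal(size=(m_demo, n_demo))]
y_demo = rng.normal(size=(m_demo, 1))
theta_demo = rng.normal(size=(n_demo + 1, 1))

def mse_demo(theta):
    """Average cost: mean of the squared errors."""
    return np.mean((X_demo @ theta - y_demo) ** 2)

# Analytic gradient in matrix form: (2/m) * X^T (X theta - y)
analytic = 2 / m_demo * X_demo.T @ (X_demo @ theta_demo - y_demo)

# Numerical gradient by central differences, one theta_j at a time
eps = 1e-6
numeric = np.zeros_like(theta_demo)
for j in range(n_demo + 1):
    step = np.zeros_like(theta_demo)
    step[j] = eps
    numeric[j] = (mse_demo(theta_demo + step)
                  - mse_demo(theta_demo - step)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)   # should be very close to zero
```

The matrix expression `2 / m * X.T @ (X @ theta - y)` stacks the partial derivatives for all $\theta_j$ into one column vector.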

**Use tf.gradients(ys, xs) to ask TensorFlow to automatically compute the derivatives of the sum of ys with respect to xs.**

**The update rule of gradient descent:**

$\theta_j = \theta_j - \textit{(learning_rate)}\cdot\textit{partial derivative}$.

Substituting the partial derivative of the average cost gives:

$\theta_j = \theta_j - \textit{learning_rate}\cdot\frac{1}{m}\sum_{i=1}^m2x^{(i)}_j(\theta_0\cdot 1 + \theta_1x^{(i)}_1 + \theta_2x^{(i)}_2 + \cdots + \theta_nx^{(i)}_n - y^{(i)})$.
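
The update rule can be written as a short NumPy loop. This is a minimal sketch on noise-free synthetic data, so the recovered parameters can be compared with known ones; the data, the learning rate of 0.1, and the `*_demo` names are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)     # synthetic data, for illustration only
m_demo = 200
true_theta = np.array([[2.0], [0.5], [-1.0]])
X_demo = np.c_[np.ones((m_demo, 1)), rng.normal(size=(m_demo, 2))]
y_demo = X_demo @ true_theta        # noise-free targets

theta_demo = np.zeros((3, 1))       # start from all zeros
lr_demo = 0.1                       # learning rate (assumed value)

for epoch in range(500):
    # Gradient of the average cost: (2/m) * X^T (X theta - y)
    gradients = 2 / m_demo * X_demo.T @ (X_demo @ theta_demo - y_demo)
    # Update rule: theta = theta - learning_rate * gradient
    theta_demo = theta_demo - lr_demo * gradients

final_mse = float(np.mean((X_demo @ theta_demo - y_demo) ** 2))
print(theta_demo.ravel())
print(final_mse)
```

Every $\theta_j$ is updated at once because the gradient is computed as a single vector.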



In [15]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

gradients = tf.gradients(mse, [theta])[0]

training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Epoch 0 MSE = 2.7544265
Epoch 100 MSE = 0.632222
Epoch 200 MSE = 0.5727804
Epoch 300 MSE = 0.5585008
Epoch 400 MSE = 0.54907006
Epoch 500 MSE = 0.542288
Epoch 600 MSE = 0.5373791
Epoch 700 MSE = 0.533822
Epoch 800 MSE = 0.53124255
Epoch 900 MSE = 0.5293705
Best theta:
[[ 2.06855249e+00]
 [ 7.74078071e-01]
 [ 1.31192386e-01]
 [-1.17845066e-01]
 [ 1.64778143e-01]
 [ 7.44078017e-04]
 [-3.91945094e-02]
 [-8.61356676e-01]
 [-8.23479772e-01]]


## Constructing a Neural Network using plain TensorFlow

In [21]:
# Load MNIST data
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('tmp/data')

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting tmp/data\train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting tmp/data\train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting tmp/data\t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting tmp/data\t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


In [31]:
X_train = mnist.train.images
X_test = mnist.test.images
y_train = mnist.train.labels.astype('int')
y_test = mnist.test.labels.astype('int')

In [22]:
n_inputs = 28 * 28
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

In [23]:
tf.reset_default_graph()
X = tf.placeholder(tf.float32,
                   shape=(None, n_inputs),
                   name='x')
y = tf.placeholder(tf.int64,
                   shape=(None),
                   name='y')

In [24]:
def neuron_layer(X, n_neurons, name, activation):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = 2 / np.sqrt(n_inputs)
        init = tf.truncated_normal(
            (n_inputs, n_neurons),
            stddev=stddev
        )
        W = tf.Variable(init, name='weights')
        b = tf.Variable(tf.zeros([n_neurons]),
                        name='bias')
        Z = tf.matmul(X, W) + b
        return activation(Z)

In [25]:
with tf.name_scope('dnn'):
    hidden1 = neuron_layer(
        X,
        n_hidden1,
        name='hidden1',
        activation=tf.nn.relu
    )
    hidden2 = neuron_layer(
        hidden1,
        n_hidden2,
        name='hidden2',
        activation=tf.nn.relu
    )
    logits = neuron_layer(
        hidden2,
        n_outputs,
        name='outputs',
        activation=tf.identity
    )

In [26]:
with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=y,
        logits=logits
    )
    loss = tf.reduce_mean(xentropy,
                          name='loss')
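
To make the loss concrete, the NumPy sketch below computes a per-example cross-entropy between the softmax of the logits and integer class labels, which is the quantity `sparse_softmax_cross_entropy_with_logits` is documented to return. The function and the toy inputs are illustrative assumptions, not TensorFlow's implementation.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and integer labels."""
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in each row
    return -log_probs[np.arange(len(labels)), labels]

logits_demo = np.array([[2.0, 1.0, 0.1],    # favours class 0
                        [0.0, 0.0, 0.0]])   # no preference among 3 classes
labels_demo = np.array([0, 2])

xentropy_demo = sparse_softmax_cross_entropy(logits_demo, labels_demo)
loss_demo = xentropy_demo.mean()            # same role as tf.reduce_mean
print(xentropy_demo)
print(loss_demo)
```

The second example has uniform logits over three classes, so its loss equals ln 3; the first is lower because the correct class already has the largest logit.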

In [27]:
learning_rate = 0.01

with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(
        learning_rate
    )
    training_op = optimizer.minimize(loss)

In [28]:
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(
        tf.cast(correct, tf.float32)
    )
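
With k = 1, the top-k check amounts to asking whether the largest logit sits at the true label. A NumPy sketch of the same accuracy computation follows; the toy logits and labels are made up for illustration, and ties between logits are ignored here.

```python
import numpy as np

logits_demo = np.array([[0.1, 2.0, 0.3],
                        [1.5, 0.2, 0.1],
                        [0.3, 0.2, 0.9]])
labels_demo = np.array([1, 0, 1])

# True where the predicted class (largest logit) matches the label
correct_demo = logits_demo.argmax(axis=1) == labels_demo

# Casting booleans to floats and averaging gives the fraction correct
accuracy_demo = correct_demo.astype(np.float32).mean()
print(correct_demo)
print(accuracy_demo)
```

Two of the three toy predictions match their labels, so the accuracy is 2/3.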

In [29]:
init = tf.global_variables_initializer()

In [32]:
n_epochs = 40
batch_size = 50
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op,
                     feed_dict={X: X_batch,
                                y: y_batch})
        acc = accuracy.eval(
            feed_dict={X: X_test,
                       y: y_test}
        )
        print(epoch, 'Accuracy:', acc)

0 Accuracy: 0.9109
1 Accuracy: 0.9309
2 Accuracy: 0.9394
3 Accuracy: 0.9476
4 Accuracy: 0.951
5 Accuracy: 0.9544
6 Accuracy: 0.958
7 Accuracy: 0.9602
8 Accuracy: 0.9615
9 Accuracy: 0.9638
10 Accuracy: 0.9649
11 Accuracy: 0.9665
12 Accuracy: 0.9685
13 Accuracy: 0.9685
14 Accuracy: 0.97
15 Accuracy: 0.9698
16 Accuracy: 0.9706
17 Accuracy: 0.9719
18 Accuracy: 0.9719
19 Accuracy: 0.9714
20 Accuracy: 0.9729
21 Accuracy: 0.9731
22 Accuracy: 0.9743
23 Accuracy: 0.9729
24 Accuracy: 0.9747
25 Accuracy: 0.9742
26 Accuracy: 0.9757
27 Accuracy: 0.9751
28 Accuracy: 0.9746
29 Accuracy: 0.975
30 Accuracy: 0.9759
31 Accuracy: 0.9754
32 Accuracy: 0.9769
33 Accuracy: 0.9767
34 Accuracy: 0.976
35 Accuracy: 0.9767
36 Accuracy: 0.9764
37 Accuracy: 0.9769
38 Accuracy: 0.9771
39 Accuracy: 0.9767
