# Quiz 2

The second quiz will take place this Wednesday:
- 45 minutes
- SVM, decision tree, neural network


# Final Project Schedule
**Proposal**

- Form a team of 1-3 students (a 3-student team is allowed only if the project is significantly more complex than the mid-term project).
- Describe the background of the problem.
- Describe where to get the data.
- Frame the machine learning problem: What are the input features? What is the model expected to learn? Is it supervised or unsupervised learning? Is it a classification or regression problem?
- Briefly describe the research plan: Which models will you use? How will you measure performance?

**Each team should submit a project proposal at the beginning of the class on Monday, May 6th.**

**Project Submission**
Each team is expected to provide a Jupyter notebook containing:
- A written description of every step and its results. For example, if you decide to build a decision tree model, you should describe how you set up the model (maximum depth, maximum number of leaves, etc.) and explain how well the model works for the problem.
- Executable code that performs the essential steps of the project, including: data cleaning, data visualization/summarization, model training/fine-tuning/evaluation.
- A conclusion section that summarizes the project, with explicit statements on its outcome.

**Submission Deadline: Wednesday, May 22nd (last day of the exam week)**

**Online data sets**
- [Kaggle.com](https://www.kaggle.com): Gain access to their data sets by entering a competition.
- [UCI machine learning repository](http://mlr.cs.umass.edu/ml/) is one of the oldest sources of data sets on the web. These data sets tend to be fairly small, but are usually clean and ready for machine learning to be applied.

In [45]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
import tensorflow.keras as keras
tf.__version__

'1.13.1'

# TensorFlow

TensorFlow is an open source software library for high performance numerical computation.
- It was originally developed by the Google Brain team, and it is now one of the most popular open source projects on GitHub (check out https://github.com/jtoy/awesome-tensorflow).
- Its basic principle is simple: first define in Python a graph of computations to perform, and then TensorFlow takes that graph and runs it efficiently using optimized C++ code.
![](Data/tf_1.png)
- TensorFlow can break up the graph into several chunks and run them in parallel across multiple CPUs, GPUs, and Tensor Processing Units (TPUs), on anything from desktops to clusters of servers to mobile and edge devices.
- It comes with a great visualization tool called TensorBoard that allows you to browse through the computation graph, view learning curves, and more.
- Google also launched a cloud service to run TensorFlow computational graphs (cloud.google.com/ml)

In [47]:
# Create a constant string
hello = tf.constant('hello')  
# Create a TensorFlow session
sess = tf.Session()
# Print the string during a session run
print(sess.run(hello))
# print(hello) # this will not work; must execute in a session

b'hello'


## Dataflow Graph
TensorFlow uses a **dataflow graph** to represent your computation in terms of the dependencies between individual operations. This leads to a three-step programming procedure:
1. Define the dataflow graph
2. Create a TensorFlow **session**
3. Run the graph

A TensorFlow session takes care of placing the operations onto devices such as CPUs and GPUs and running them.
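
To make the "define first, run later" idea concrete, here is a minimal pure-Python sketch of a deferred computation graph. The `Node` class and the `constant`, `as_node`, and `run` helpers are hypothetical names made up for this illustration; they are not part of the TensorFlow API, and a real session does far more (device placement, parallel execution).

```python
# A toy deferred-computation graph (hypothetical classes, NOT the TensorFlow API).

class Node:
    """A graph node: an operation plus the nodes it depends on."""

    def __init__(self, op, inputs=(), name=""):
        self.op = op          # callable that computes this node's value
        self.inputs = inputs  # upstream nodes whose values become the arguments
        self.name = name

    def __add__(self, other):
        return Node(lambda a, b: a + b, (self, as_node(other)), "add")

    def __mul__(self, other):
        return Node(lambda a, b: a * b, (self, as_node(other)), "mul")

    def __repr__(self):
        return f"Node({self.name!r})"


def constant(value, name="const"):
    """A node with no inputs that always produces the same value."""
    return Node(lambda: value, (), name)


def as_node(x):
    """Wrap plain Python numbers so they can take part in the graph."""
    return x if isinstance(x, Node) else constant(x)


def run(node):
    """Evaluate a node by first evaluating everything it depends on."""
    args = [run(dep) for dep in node.inputs]
    return node.op(*args)


# Step 1: define the graph. Nothing is computed here.
x = constant(3, "x")
y = constant(4, "y")
f = x * x * y + y + 2

print(f)       # Node('add')  (a description of the node, not a value)

# Steps 2 and 3: "run" the graph to obtain a value.
print(run(f))  # 42
```

Printing `f` before running it shows only a description of the node, which mirrors what happens when a TensorFlow tensor is printed outside a session.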

In [48]:
# Reset a dataflow graph
tf.reset_default_graph()

# Define two variables x, y
x = tf.Variable(3, name='x') # equivalent way in tf for x=3
y = tf.Variable(4, name='y') # y=4

# Define f based on x and y
f = x*x*y + y + 2 

# The following print statement only gives the 
# description of variable f, not its value (since
# the value hasn't been computed yet)
print(f)

Tensor("add_1:0", shape=(), dtype=int32)


In [49]:
# Create a TensorFlow session and evaluate f
sess = tf.Session()

# Initialize x and y
sess.run(x.initializer)
sess.run(y.initializer)

# Evaluate f
result = sess.run(f)

print(result)

# close the session
sess.close()

42


In [50]:
# use with statement to set sess as default session
with tf.Session() as sess:
    # equivalent to sess.run(x.initializer):
    x.initializer.run() 
    # equivalent to sess.run(y.initializer):
    y.initializer.run()
    # equivalent to sess.run(f)
    result = f.eval()
print(result)

42


In [51]:
# Use tf.global_variables_initializer() to 
# initialize all variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()
print(result)

42


In [53]:
# EXERCISE: 
# Create a TensorFlow graph to compute
# S = pi * r ** 2
# where pi=3.14, r=2.0
tf.reset_default_graph()
r = tf.Variable(2.0)
pi = tf.Variable(3.14)
S = pi * r * r

init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    print(S.eval())

12.56


## Feeding data to the graph
We can use tf.placeholder() to delay initialization. This is particularly useful when we want to feed data to the graph during execution. The following code creates a placeholder node A, and B = A + 5:

In [56]:
tf.reset_default_graph()

A = tf.placeholder(tf.float32, shape=(None, None))
# None as a dimension means any size.
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3, 4]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], 
                                    [7, 8, 9]]})

print(B_val_1)
print(B_val_2)

[[6. 7. 8. 9.]]
[[ 9. 10. 11.]
 [12. 13. 14.]]
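
The feeding mechanism can be imitated in plain Python to see what "delayed initialization" means. This is only a sketch: the `Placeholder` class and the `add_scalar` helper are hypothetical names invented for this example, and the feed is an ordinary dictionary rather than a TensorFlow `feed_dict`.

```python
# Toy placeholder mechanism (hypothetical helpers, NOT the TensorFlow API).

class Placeholder:
    """A named slot in the graph whose value is supplied at run time."""

    def __init__(self, name):
        self.name = name


def add_scalar(node, scalar):
    """Return a deferred computation: element-wise node + scalar on a 2-D list."""
    def evaluate(feed):
        if node not in feed:
            raise KeyError(f"no value fed for placeholder {node.name!r}")
        matrix = feed[node]
        return [[float(v) + scalar for v in row] for row in matrix]
    return evaluate


A = Placeholder("A")
B = add_scalar(A, 5)   # the graph is defined, but A has no value yet

# The same graph is evaluated twice with inputs of different shapes.
B_val_1 = B({A: [[1, 2, 3, 4]]})
B_val_2 = B({A: [[4, 5, 6], [7, 8, 9]]})

print(B_val_1)  # [[6.0, 7.0, 8.0, 9.0]]
print(B_val_2)  # [[9.0, 10.0, 11.0], [12.0, 13.0, 14.0]]
```

Evaluating `B` with an empty dictionary raises an error in this sketch, much as evaluating a graph without feeding a required placeholder fails in TensorFlow.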


## Example: Gradient Descent using TensorFlow

In [59]:
# Load California housing data
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
housing.keys()

dict_keys(['data', 'target', 'feature_names', 'DESCR'])

In [62]:
housing['feature_names']

['MedInc',
 'HouseAge',
 'AveRooms',
 'AveBedrms',
 'Population',
 'AveOccup',
 'Latitude',
 'Longitude']

In [61]:
housing['data'][:5, :]

array([[ 8.32520000e+00,  4.10000000e+01,  6.98412698e+00,
         1.02380952e+00,  3.22000000e+02,  2.55555556e+00,
         3.78800000e+01, -1.22230000e+02],
       [ 8.30140000e+00,  2.10000000e+01,  6.23813708e+00,
         9.71880492e-01,  2.40100000e+03,  2.10984183e+00,
         3.78600000e+01, -1.22220000e+02],
       [ 7.25740000e+00,  5.20000000e+01,  8.28813559e+00,
         1.07344633e+00,  4.96000000e+02,  2.80225989e+00,
         3.78500000e+01, -1.22240000e+02],
       [ 5.64310000e+00,  5.20000000e+01,  5.81735160e+00,
         1.07305936e+00,  5.58000000e+02,  2.54794521e+00,
         3.78500000e+01, -1.22250000e+02],
       [ 3.84620000e+00,  5.20000000e+01,  6.28185328e+00,
         1.08108108e+00,  5.65000000e+02,  2.18146718e+00,
         3.78500000e+01, -1.22250000e+02]])

In [63]:

m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]
housing_data_plus_bias[:5, :]


array([[ 1.00000000e+00,  8.32520000e+00,  4.10000000e+01,
         6.98412698e+00,  1.02380952e+00,  3.22000000e+02,
         2.55555556e+00,  3.78800000e+01, -1.22230000e+02],
       [ 1.00000000e+00,  8.30140000e+00,  2.10000000e+01,
         6.23813708e+00,  9.71880492e-01,  2.40100000e+03,
         2.10984183e+00,  3.78600000e+01, -1.22220000e+02],
       [ 1.00000000e+00,  7.25740000e+00,  5.20000000e+01,
         8.28813559e+00,  1.07344633e+00,  4.96000000e+02,
         2.80225989e+00,  3.78500000e+01, -1.22240000e+02],
       [ 1.00000000e+00,  5.64310000e+00,  5.20000000e+01,
         5.81735160e+00,  1.07305936e+00,  5.58000000e+02,
         2.54794521e+00,  3.78500000e+01, -1.22250000e+02],
       [ 1.00000000e+00,  3.84620000e+00,  5.20000000e+01,
         6.28185328e+00,  1.08108108e+00,  5.65000000e+02,
         2.18146718e+00,  3.78500000e+01, -1.22250000e+02]])

In [65]:
# Feature scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]
scaled_housing_data_plus_bias[:5, :]

array([[ 1.        ,  2.34476576,  0.98214266,  0.62855945, -0.15375759,
        -0.9744286 , -0.04959654,  1.05254828, -1.32783522],
       [ 1.        ,  2.33223796, -0.60701891,  0.32704136, -0.26333577,
         0.86143887, -0.09251223,  1.04318455, -1.32284391],
       [ 1.        ,  1.7826994 ,  1.85618152,  1.15562047, -0.04901636,
        -0.82077735, -0.02584253,  1.03850269, -1.33282653],
       [ 1.        ,  0.93296751,  1.85618152,  0.15696608, -0.04983292,
        -0.76602806, -0.0503293 ,  1.03850269, -1.33781784],
       [ 1.        , -0.012881  ,  1.85618152,  0.3447108 , -0.03290586,
        -0.75984669, -0.08561576,  1.03850269, -1.33781784]])
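
For reference, the scaling performed above can be sketched without scikit-learn. `StandardScaler` subtracts each column's mean and divides by its population standard deviation (dividing by m, not m - 1). The function name `standardize_columns` and the data are made up for illustration, and the sketch assumes no column is constant.

```python
import statistics


def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # pstdev divides by m (not m - 1), matching StandardScaler's default behavior.
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(value - mu) / sigma for value, mu, sigma in zip(row, means, stds)]
        for row in rows
    ]


# Made-up data: two features on very different scales.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = standardize_columns(data)

for row in scaled:
    print(row)
```

After scaling, both columns hold the same values even though the raw features differed by two orders of magnitude, which is what keeps gradient descent from being dominated by the large-scale feature.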

In [66]:
df = pd.DataFrame(scaled_housing_data_plus_bias,
                  columns=['Dummy'] + housing['feature_names'])
df.head()

| | Dummy | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 2.344766 | 0.982143 | 0.628559 | -0.153758 | -0.974429 | -0.049597 | 1.052548 | -1.327835 |
| 1 | 1.0 | 2.332238 | -0.607019 | 0.327041 | -0.263336 | 0.861439 | -0.092512 | 1.043185 | -1.322844 |
| 2 | 1.0 | 1.782699 | 1.856182 | 1.15562 | -0.049016 | -0.820777 | -0.025843 | 1.038503 | -1.332827 |
| 3 | 1.0 | 0.932968 | 1.856182 | 0.156966 | -0.049833 | -0.766028 | -0.050329 | 1.038503 | -1.337818 |
| 4 | 1.0 | -0.012881 | 1.856182 | 0.344711 | -0.032906 | -0.759847 | -0.085616 | 1.038503 | -1.337818 |


**Calculating Gradient:**
- Cost generated from each instance $(\textbf{x}^{(i)}, y^{(i)})$ is: 

$(\theta\cdot\textbf{x}^{(i)} - y^{(i)})^2 = (\theta_0\cdot 1 + \theta_1x^{(i)}_1 + \theta_2x^{(i)}_2 + \cdots + \theta_nx^{(i)}_n - y^{(i)})^2$.
- Its partial derivative with respect to $\theta_j$ is:

$2x^{(i)}_j(\theta_0\cdot 1 + \theta_1x^{(i)}_1 + \theta_2x^{(i)}_2 + \cdots + \theta_nx^{(i)}_n - y^{(i)})$.
- Average cost is: $\frac{1}{m}\sum_{i=1}^m(\theta\cdot\textbf{x}^{(i)} - y^{(i)})^2$.
- The partial derivative of the average cost with respect to $\theta_j$ is:

$\frac{1}{m}\sum_{i=1}^m2x^{(i)}_j(\theta_0\cdot 1 + \theta_1x^{(i)}_1 + \theta_2x^{(i)}_2 + \cdots + \theta_nx^{(i)}_n - y^{(i)})$.
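
One way to gain confidence in the derivative formula is to compare it with a numerical estimate. The sketch below is plain Python on made-up data; the helper names (`dot`, `mse`, `mse_gradient`, `numeric_gradient`) are chosen for this example only. It uses central finite differences, which is a different technique from the automatic differentiation TensorFlow performs.

```python
def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def mse(theta, X, y):
    """Average cost: (1/m) * sum_i (theta . x_i - y_i)^2."""
    m = len(X)
    return sum((dot(theta, x_i) - y_i) ** 2 for x_i, y_i in zip(X, y)) / m


def mse_gradient(theta, X, y):
    """Analytic gradient: (1/m) * sum_i 2 * x_ij * (theta . x_i - y_i) for each j."""
    m = len(X)
    residuals = [dot(theta, x_i) - y_i for x_i, y_i in zip(X, y)]
    return [
        sum(2 * x_i[j] * r for x_i, r in zip(X, residuals)) / m
        for j in range(len(theta))
    ]


def numeric_gradient(theta, X, y, eps=1e-6):
    """Central finite differences, perturbing one coordinate at a time."""
    grad = []
    for j in range(len(theta)):
        up = theta[:j] + [theta[j] + eps] + theta[j + 1:]
        down = theta[:j] + [theta[j] - eps] + theta[j + 1:]
        grad.append((mse(up, X, y) - mse(down, X, y)) / (2 * eps))
    return grad


# Made-up data; the first column of X is the constant bias feature 1.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [1.0, 0.0, 3.0]
theta = [0.2, -0.3]

analytic = mse_gradient(theta, X, y)
numeric = numeric_gradient(theta, X, y)
print(analytic)
print(numeric)
```

The two printed gradients should agree to several decimal places; a large discrepancy would point to a mistake in the formula or in its implementation.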

**Use `tf.gradients(ys, xs)` to ask TensorFlow to automatically compute the derivatives of the sum of ys with respect to xs.**

**The update rule of gradient descent:**

$\theta_j = \theta_j - \textit{learning_rate}\cdot\textit{partial derivative}$.

Substituting the partial derivative of the average cost gives:

$\theta_j = \theta_j - \textit{learning_rate}\cdot\frac{1}{m}\sum_{i=1}^m2x^{(i)}_j(\theta_0\cdot 1 + \theta_1x^{(i)}_1 + \theta_2x^{(i)}_2 + \cdots + \theta_nx^{(i)}_n - y^{(i)})$.



In [72]:
tf.reset_default_graph()

n_epochs = 5000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

gradients = tf.gradients(mse, [theta])[0]

training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Epoch 0 MSE = 2.7544265
Epoch 100 MSE = 0.632222
Epoch 200 MSE = 0.5727804
Epoch 300 MSE = 0.5585008
Epoch 400 MSE = 0.54907006
Epoch 500 MSE = 0.542288
Epoch 600 MSE = 0.5373791
Epoch 700 MSE = 0.533822
Epoch 800 MSE = 0.53124255
Epoch 900 MSE = 0.5293705
Epoch 1000 MSE = 0.52801067
Epoch 1100 MSE = 0.52702194
Epoch 1200 MSE = 0.5263023
Epoch 1300 MSE = 0.52577746
Epoch 1400 MSE = 0.52539444
Epoch 1500 MSE = 0.5251144
Epoch 1600 MSE = 0.52490914
Epoch 1700 MSE = 0.5247584
Epoch 1800 MSE = 0.5246476
Epoch 1900 MSE = 0.52456564
Epoch 2000 MSE = 0.5245052
Epoch 2100 MSE = 0.5244602
Epoch 2200 MSE = 0.5244267
Epoch 2300 MSE = 0.52440166
Epoch 2400 MSE = 0.5243829
Epoch 2500 MSE = 0.52436876
Epoch 2600 MSE = 0.524358
Epoch 2700 MSE = 0.52434987
Epoch 2800 MSE = 0.5243435
Epoch 2900 MSE = 0.52433884
Epoch 3000 MSE = 0.5243351
Epoch 3100 MSE = 0.5243322
Epoch 3200 MSE = 0.52432984
Epoch 3300 MSE = 0.5243281
Epoch 3400 MSE = 0.5243267
Epoch 3500 MSE = 0.5243256
Epoch 3600 MSE = 0.5243248
Epoc

In [70]:
# Compare the previous result with linear regression from sklearn
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(scaled_housing_data_plus_bias,
          housing.target.reshape([-1, 1]))

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [71]:
print(model.coef_, model.intercept_)

[[ 0.          0.8296193   0.11875165 -0.26552688  0.30569623 -0.004503
  -0.03932627 -0.89988565 -0.870541  ]] [2.06855817]


## Constructing a Neural Network using plain TensorFlow

In [73]:
# Load MNIST data
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('tmp/data')

Extracting tmp/data\train-images-idx3-ubyte.gz
Extracting tmp/data\train-labels-idx1-ubyte.gz
Extracting tmp/data\t10k-images-idx3-ubyte.gz
Extracting tmp/data\t10k-labels-idx1-ubyte.gz


In [78]:
X_train = mnist.train.images
X_test = mnist.test.images
y_train = mnist.train.labels.astype('int')
y_test = mnist.test.labels.astype('int')

In [79]:
n_inputs = 28 * 28
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

In [80]:
tf.reset_default_graph()
X = tf.placeholder(tf.float32,
                   shape=(None, n_inputs),
                   name='x')
y = tf.placeholder(tf.int64,
                   shape=(None,),  # 1-D tensor of labels, any batch size
                   name='y')

In [81]:
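def neuron_layer(X, n_neurons, name, activation):
    # Group this layer's nodes under one name scope
    with tf.name_scope(name):
        # Number of input features = second dimension of X
        n_inputs = int(X.get_shape()[1])
        # Random initial weights, scaled by the number of inputs
        stddev = 2 / np.sqrt(n_inputs)
        init = tf.truncated_normal(
            (n_inputs, n_neurons),
            stddev=stddev
        )
        W = tf.Variable(init, name='weights')
        # One bias per neuron, initialized to zero
        b = tf.Variable(tf.zeros([n_neurons]),
                        name='bias')
        # Weighted sum of the inputs plus bias, then the activation
        Z = tf.matmul(X, W) + b
        return activation(Z)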
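
To see what a single layer computes, here is a small pure-Python sketch of the forward pass activation(XW + b) on made-up numbers. The function names `relu` and `dense_layer` are hypothetical, and the weights are fixed by hand rather than drawn at random as in the cell above.

```python
def relu(z):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, z)


def dense_layer(X, W, b, activation):
    """Compute activation(X @ W + b) for a batch X given as a list of rows."""
    n_neurons = len(b)
    outputs = []
    for row in X:
        # Weighted sum of the inputs plus bias, one value per neuron.
        z = [
            sum(x_k * W[k][j] for k, x_k in enumerate(row)) + b[j]
            for j in range(n_neurons)
        ]
        outputs.append([activation(z_j) for z_j in z])
    return outputs


# Made-up numbers: a batch of 2 instances, 3 inputs, 2 neurons.
X = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 1.0]]
W = [[0.5, -1.0],
     [0.0, 1.0],
     [1.0, -1.0]]   # shape (n_inputs, n_neurons)
b = [0.5, 0.0]

hidden = dense_layer(X, W, b, relu)
print(hidden)  # [[4.0, 0.0], [1.5, 0.0]]
```

The second neuron receives a weighted sum of -2 for both instances, so ReLU outputs 0 there.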

In [82]:
with tf.name_scope('dnn'):
    hidden1 = neuron_layer(
        X,
        n_hidden1,
        name='hidden1',
        activation=tf.nn.relu
    )
    hidden2 = neuron_layer(
        hidden1,
        n_hidden2,
        name='hidden2',
        activation=tf.nn.relu
    )
    logits = neuron_layer(
        hidden2,
        n_outputs,
        name='outputs',
        activation=tf.identity
    )

In [83]:
with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=y,
        logits=logits
    )
    loss = tf.reduce_mean(xentropy,
                          name='loss')
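
The loss above applies softmax to the logits and takes the negative log of the probability assigned to the true class, averaged over the batch. A pure-Python sketch of that quantity on made-up logits follows; the function names are chosen for this example, and it illustrates the formula rather than TensorFlow's implementation.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Return -log(softmax(logits)[label]) for one instance."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]


def mean_loss(batch_logits, labels):
    """Average the per-instance cross entropy over the batch."""
    losses = [
        sparse_softmax_cross_entropy(z, t)
        for z, t in zip(batch_logits, labels)
    ]
    return sum(losses) / len(losses)


# Made-up logits for 2 instances and 3 classes; labels are class indices.
batch_logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
labels = [0, 2]

loss = mean_loss(batch_logits, labels)
print(loss)
```

"Sparse" refers to the labels being integer class indices instead of one-hot vectors. When all logits are equal, as in the second instance, every class gets probability 1/3 and the loss for that instance is log 3.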

In [84]:
learning_rate = 0.01

with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(
        learning_rate
    )
    training_op = optimizer.minimize(loss)

In [85]:
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(
        tf.cast(correct, tf.float32)
    )
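
With k = 1, the evaluation above checks whether the highest logit belongs to the true class and then averages the results. Ignoring ties, this is ordinary accuracy, as the pure-Python sketch below illustrates on made-up logits. The function name `top1_accuracy` is hypothetical.

```python
def top1_accuracy(batch_logits, labels):
    """Fraction of instances whose largest logit is at the true class index."""
    correct = [
        max(range(len(z)), key=z.__getitem__) == t   # argmax of the logits
        for z, t in zip(batch_logits, labels)
    ]
    # True counts as 1 and False as 0, like casting booleans to float.
    return sum(correct) / len(correct)


# Made-up logits for 4 instances and 3 classes.
batch_logits = [[0.1, 2.0, -1.0],
                [3.0, 0.0, 1.0],
                [0.2, 0.1, 0.9],
                [1.0, 5.0, 2.0]]
labels = [1, 0, 1, 1]

print(top1_accuracy(batch_logits, labels))  # 0.75
```

The third instance is the only miss: its largest logit is at index 2, but its label is 1.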

In [86]:
init = tf.global_variables_initializer()

In [87]:
n_epochs = 40
batch_size = 50
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op,
                     feed_dict={X: X_batch,
                                y: y_batch})
        acc = accuracy.eval(
            feed_dict={X: X_test,
                       y: y_test}
        )
        print(epoch, 'Accuracy:', acc)

0 Accuracy: 0.9176
1 Accuracy: 0.9319
2 Accuracy: 0.9412
3 Accuracy: 0.9469
4 Accuracy: 0.9509
5 Accuracy: 0.9554
6 Accuracy: 0.9566
7 Accuracy: 0.9597
8 Accuracy: 0.9599
9 Accuracy: 0.9629
10 Accuracy: 0.9644
11 Accuracy: 0.9663
12 Accuracy: 0.9676
13 Accuracy: 0.9686
14 Accuracy: 0.9688
15 Accuracy: 0.9701
16 Accuracy: 0.9704
17 Accuracy: 0.9696
18 Accuracy: 0.9721
19 Accuracy: 0.9723
20 Accuracy: 0.973
21 Accuracy: 0.9739
22 Accuracy: 0.974
23 Accuracy: 0.974
24 Accuracy: 0.9758
25 Accuracy: 0.9751
26 Accuracy: 0.9752
27 Accuracy: 0.9759
28 Accuracy: 0.9757
29 Accuracy: 0.9758
30 Accuracy: 0.9758
31 Accuracy: 0.9771
32 Accuracy: 0.9762
33 Accuracy: 0.9768
34 Accuracy: 0.9763
35 Accuracy: 0.9771
36 Accuracy: 0.9775
37 Accuracy: 0.9771
38 Accuracy: 0.9776
39 Accuracy: 0.9782
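
The training loop above processes the data in mini-batches, one pass over the training set per epoch. A minimal sketch of mini-batch iteration in plain Python is shown below. The generator `iterate_minibatches` is a hypothetical helper written for this example; it is not how `mnist.train.next_batch` is implemented, and unlike the loop above it also yields a final, smaller batch when the data size is not a multiple of the batch size.

```python
import random


def iterate_minibatches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs covering one shuffled pass over the data."""
    indices = list(range(len(X)))
    rng.shuffle(indices)  # a new random order for every pass
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield [X[i] for i in batch], [y[i] for i in batch]


# Made-up data: 10 instances with one feature each.
X = [[float(i)] for i in range(10)]
y = list(range(10))
rng = random.Random(42)  # fixed seed so the run is repeatable

seen = []
for epoch in range(2):
    for X_batch, y_batch in iterate_minibatches(X, y, batch_size=4, rng=rng):
        # A real training step would go here, for example
        # sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        seen.extend(y_batch)

print(len(seen))  # 20
```

Each epoch visits every instance exactly once, in a different order, which is the usual setup for mini-batch gradient descent.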
