<h1>CSUA ML Workshop</h1>
<p>Welcome to the CSUA ML Workshop. This workshop is intended for beginners with little to no experience in machine learning or artificial intelligence. It aims to teach not only how to implement ML models but also some of the motivation behind them.</p>

<p>TensorFlow is required for this workshop due to its beginner friendliness. If you are looking for a more customizable framework, it is best to use Torch. A rundown of the two is below.</p>

<h2>TensorFlow</h2>
<p>One of the hottest Machine Learning frameworks, TensorFlow is managed by Google and is used internally by them as well. </p>
<h3>Pros</h3>
- Google 
- Well abstracted and high-level
- More built-in models to play with
- Quicker to get off the ground
- Set, rigid models for optimization

<h3>Cons</h3>
- Google 
- Still slower than Torch
- Autodifferentiation is more limited
- Writing custom models is a pain in the ass
- Customization of models is extremely verbose and tedious
- Bloated API

<h2>Torch</h2>
<p> Also one of the hottest Machine Learning frameworks, but worked on by Facebook, Nvidia, Salesforce, Stanford, etc.</p>
<h3>Pros</h3>
- Fast
- Dynamic models allowing for variable inputs
- Better autodifferentiation
- Extremely detailed customization available
- Relatively sleek API

<h3>Cons</h3>
- More difficult to get used to
- Requires a deeper understanding of ML
- Not ideal for building fast
- More headaches while creating simpler models

<h1>Getting Started</h1>
<p>To get started, we will import the libraries that we will use. In this workshop, we will be using TensorFlow, NumPy, and Matplotlib. TensorFlow and NumPy will be used for numerical manipulation, while Matplotlib will be used for visualizations. Two datasets, housing_data and housing_test_data, are also provided. Our first motivating example will be predicting housing prices given square footage.</p>

<h2>Motivating Example: Housing Prices</h2>
<p>You're looking for houses in the Seattle region and notice that housing prices seem to be very closely correlated with square footage of the house. Can you develop a linear model that predicts housing prices based off of only square footage?</p>

<p>The data is provided as a two-dimensional array of [num_examples] pairs. The first value in the pair is the square footage, and the second value is the actual price. <br><br> ```Example: [[1, 300], [2, 600], ...]```</p>
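<p>As a quick illustration of this layout, the sketch below uses a few made-up pairs (not the workshop data) and shows how zip(*pairs) splits them into one sequence of square footages and one of prices, which is the form a scatter plot expects. The names toy_pairs, sqft_column, and price_column are hypothetical.</p>

```python
# Hypothetical pairs in the [square_footage, price] layout described above
toy_pairs = [[1, 300], [2, 600], [3, 900]]

# zip(*...) transposes the list of pairs into two tuples
sqft_column, price_column = zip(*toy_pairs)

print(sqft_column)   # (1, 2, 3)
print(price_column)  # (300, 600, 900)
```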

In [None]:
#Import key libraries
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
#Import data source 
import datasets

#Generate training and test data. Each is a two-dimensional array of shape [number_of_examples, 2]
housing_data = datasets.housing_data_gen()
housing_test_data = datasets.housing_data_gen()

#Prepare data for graphing
graph_data = list(zip(*housing_data))

plt.scatter(graph_data[0], graph_data[1])
plt.xlabel('Square Footage')
plt.ylabel('Price')
plt.show()

<h1>Regression</h1>
<p>Linear regression is one of those ideas you see everywhere. It is critical to our daily lives and allows humans to extract good patterns from data. Here, we will show how linear regression can be implemented within the context of Machine Learning.</p>

<h1>Concept 1: Variables</h1>
<p>Variables are the trainable matrices available in TensorFlow. They represent the parameters your model will ultimately be using to predict housing prices. For example, in the equation y = ax + b, a and b would be the parameters associated with the model. Within the context of ML, the variable names w and b would typically be used instead: y = wx + b. The w stands for weight, which makes more sense in the context of neural networks. b stands for bias.</p>
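<p>To make the roles of w and b concrete, here is a minimal plain-Python sketch of the model y = wx + b. It does not use TensorFlow, and the function name predict_price and the parameter values are assumptions chosen only for illustration.</p>

```python
def predict_price(sq_feet, w, b):
    """Linear model y = w * x + b for a single input."""
    return w * sq_feet + b

# Hypothetical parameter values; training would normally find these
w, b = 300.0, 0.0
print(predict_price(2.0, w, b))  # 600.0
```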

<p>In TensorFlow, you can create a variable using: <br><br>
```tf.get_variable(name, shape=None, dtype=None, initializer=None, regularizer=None, trainable=True, collections=None, caching_device=None, partitioner=None, validate_shape=True, custom_getter=None)``` <br><br>
The only relevant fields for now are name, shape, and dtype. The name is user-defined. The shape is a list of integers describing the dimensions of the Tensor this variable represents. The dtype is the data type of the Tensor. Note that a Variable's shape must be fully specified; setting the first dimension to "None", which allows for batch training, applies to placeholders rather than Variables.</p>

In [None]:
#FILL IN CODE; CREATE WEIGHTS AND BIASES

<h1>Concept 2: Placeholders</h1>
<p>Ironically, TensorFlow doesn't recommend constructing its Tensor type directly. Instead, Placeholders and Variables are used for inputs. Each defines an operation in the computational graph that represents an input. When the graph is actually run, Tensor data flows through these operations.</p>
<p>In TensorFlow, you can create a placeholder using:<br><br>
```tf.placeholder(dtype, shape=None, name=None)```<br><br></p>

In [None]:
#FILL IN CODE; CREATE PLACEHOLDER FOR INPUTS

<h1>Concept 3: Computational Graphs</h1>
<p>A computational graph is what TensorFlow uses to train the model you define. Graphs are very well abstracted, so you can generate one simply by using matrix operations as you normally would. You can think of the computational graph as a function definition: here, we're merely defining the function. The actual usage will come later.</p>
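<p>The "define now, run later" idea can be sketched in plain Python with a closure. This is only an analogy, not how TensorFlow is implemented, and the names make_linear_graph and price_pred_fn are hypothetical.</p>

```python
# Toy stand-in for a graph: building it only records what to compute.
def make_linear_graph(w, b):
    """Return a function that computes w * x + b when it is finally called."""
    def run(x):
        return w * x + b
    return run

# "Definition" time: nothing is computed yet
price_pred_fn = make_linear_graph(w=2.0, b=1.0)

# "Run" time: the input is supplied and the arithmetic actually happens
result = price_pred_fn(3.0)
print(result)  # 7.0
```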

In [None]:
#FILL IN CODE; BEGIN TO CREATE COMPUTATIONAL GRAPH

<h1>Concept 4: Error/Loss Functions</h1>
<p>In order for the Linear Regression model to be trained, there must be a way for the model to understand how well it is doing. Because we know what the inputs and outputs are, we would like to define how close the model's prediction is to the actual price. The typical loss function for this is Mean Squared Error (MSE) loss.</p>
<p>Notice that here, by defining the loss, we're extending the computational graph.</p>
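<p>For reference, MSE is the average of the squared differences between predictions and targets. The sketch below computes it in plain Python on made-up numbers. The function name mean_squared_error is an assumption, and in the notebook the same quantity would be built out of TensorFlow operations instead.</p>

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)

# Made-up values: the errors are 0, 1 and -2, so the squares are 0, 1 and 4
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 1.0, 5.0])
print(mse)  # 5 / 3, about 1.667
```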

In [None]:
#Define a placeholder for the known actual price
#FILL IN CODE; PUT IN PLACEHOLDER FOR ACTUAL PRICE

#CALCULATE THE MEAN SQUARED ERROR LOSS
loss = None #REPLACE LINE

<h1>A Note On Data</h1>
<p>Please normalize your data before running regressions. While this does obfuscate the actual weights and biases and prevents you from using them in other models, they are easily recoverable with a bit of data manipulation.</p>

<p>While this normalization is provided for you, please take a moment to understand what is going on below. A more statistically rigorous way of normalizing data is to divide by the standard deviation, but here I decided to simply subtract the mean and divide by the range to make the code more concise, readable, and clean.</p>
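<p>As a sketch of the "bit of data manipulation" needed to recover the original weights and biases, the code below converts parameters fitted on (value - mean) / range data back to raw units. The function name to_original_space and all numbers are assumptions for illustration; the algebra follows from substituting the normalization into y = wx + b.</p>

```python
def to_original_space(w_norm, b_norm, mean_x, range_x, mean_y, range_y):
    """Convert parameters fitted on (value - mean) / range data back to raw units."""
    w = w_norm * range_y / range_x
    b = range_y * (b_norm - w_norm * mean_x / range_x) + mean_y
    return w, b

# Made-up statistics and normalized parameters, for illustration only
mean_x, range_x = 2000.0, 1000.0
mean_y, range_y = 600000.0, 300000.0
w_norm, b_norm = 1.0, 0.0

w, b = to_original_space(w_norm, b_norm, mean_x, range_x, mean_y, range_y)

# Check: predicting in normalized space and converting back gives the same price
x = 2500.0
x_norm = (x - mean_x) / range_x
via_norm = (w_norm * x_norm + b_norm) * range_y + mean_y
print(w, b)                 # 300.0 0.0
print(via_norm, w * x + b)  # 750000.0 750000.0
```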

In [None]:
housing_mean_sqft = sum(graph_data[0]) / len(graph_data[0])
housing_sqft_range = max(graph_data[0]) - min(graph_data[0])
housing_mean_price = sum(graph_data[1]) / len(graph_data[1])
housing_price_range = max(graph_data[1]) - min(graph_data[1])

#Normalize data to a -0.5 to 0.5 range by taking (value - mean)/range
norm_housing_data = np.array([[(pair[0] - housing_mean_sqft) / housing_sqft_range, 
                     (pair[1] - housing_mean_price) / housing_price_range] for pair in housing_data])
norm_housing_test_data = np.array([[(pair[0] - housing_mean_sqft) / housing_sqft_range, 
                          (pair[1] - housing_mean_price) / housing_price_range] for pair in housing_test_data])

#Randomly shuffle the data to make training more robust
np.random.shuffle(norm_housing_data)
np.random.shuffle(norm_housing_test_data)

#Function to return data back into the i/o space
def return_to_io_space(sqft, price):
    return sqft * housing_sqft_range + housing_mean_sqft, price * housing_price_range + housing_mean_price

<h1>Concept 5: Training Operations</h1>
<p>To actually train the graph, TensorFlow requires a bit of handling. In TensorFlow, you define a training operation on your loss function. TensorFlow's autodifferentiation will then build up a gradient graph and use it during training. In Torch, the process is slightly more involved. There are a few function calls you must make: one for the forward pass, one for computing gradients through backpropagation, and one for applying the parameter update. This, however, allows for some slightly more complex and modular training operations.</p>
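<p>To give a feel for what a gradient descent training operation does, here is a plain-Python sketch that derives the gradients of the squared error by hand and applies them one example at a time. It is not TensorFlow's implementation; the function name sgd_step, the toy data, and the hyperparameters are assumptions.</p>

```python
def sgd_step(w, b, x, y, learning_rate):
    """One gradient descent update for the squared error (w * x + b - y) ** 2."""
    error = w * x + b - y
    grad_w = 2 * error * x   # derivative of the squared error with respect to w
    grad_b = 2 * error       # derivative of the squared error with respect to b
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Made-up data lying exactly on y = 0.5 * x + 0.1
data = [(-0.5, -0.15), (-0.25, -0.025), (0.0, 0.1), (0.25, 0.225), (0.5, 0.35)]

w, b = 0.0, 0.0
for epoch in range(500):
    for x, y in data:
        w, b = sgd_step(w, b, x, y, learning_rate=0.1)

print(round(w, 3), round(b, 3))  # close to 0.5 and 0.1
```

<p>In the notebook, TensorFlow computes these gradients automatically, so you never write grad_w or grad_b yourself.</p>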

In [None]:
train_step = tf.train.GradientDescentOptimizer(learning_rate = 0.1).minimize(loss)

<h1>Concept 6: TensorFlow Runtime</h1>
<p>When it's time to train your model, TensorFlow has a few special prerequisites that you need to add to your code to make it run smoothly. Firstly, TensorFlow relies on an idea of a Session - an object acting as a Singleton for all of the computations and runtime variables for your model. Much like how you start a new Python session when running a script, you need to start a new TensorFlow session to run the computational graph.</p>
<p>To begin a session, it is best practice to do the following:</p>
<p>```with tf.Session() as sess:```</p>
<p>After initializing a session, you must initialize all of the Variables that you had previously defined. While before, you were building up a computational graph, now you are initializing the computational graph so it can begin training.</p>
<p>```sess.run(tf.global_variables_initializer())```</p>
<p>Finally, you can begin training. During training, you can initialize the state of any Variable or Placeholder to some tensor, and the computational graph will automatically use those values in its calculations. You can pass those in using the argument feed_dict, which is a dictionary mapping Placeholders/Variables to arrays/numpy arrays. You can run the computational graph using the following:</p>
<p>```sess.run(fetches, feed_dict=None, options=None, run_metadata=None)```</p>
<p>This call takes the Tensors you ask it to return, resolves the graph to include only what is needed to compute those Tensors, substitutes the values in the feed_dict into the computational graph, and runs it.</p>
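<p>The fetch-and-feed behavior can be imitated with a toy graph in plain Python. This is an analogy under simplifying assumptions, not TensorFlow's actual machinery, and the names toy_graph and run are hypothetical. Note how asking for the prediction never touches the loss node, so no actual price has to be fed.</p>

```python
# Toy graph: each node maps to (function, names of the nodes it depends on).
# Names present in feed_dict act as inputs and need no function.
toy_graph = {
    "price_pred": (lambda sq_feet, w, b: w * sq_feet + b, ["sq_feet", "w", "b"]),
    "loss": (lambda price_pred, actual: (price_pred - actual) ** 2,
             ["price_pred", "actual"]),
}

def run(fetch, feed_dict, visited=None):
    """Evaluate one node, recursing only into the nodes it depends on."""
    if visited is not None:
        visited.append(fetch)
    if fetch in feed_dict:
        return feed_dict[fetch]
    func, dependencies = toy_graph[fetch]
    return func(*(run(dep, feed_dict, visited) for dep in dependencies))

visited = []
pred = run("price_pred", {"sq_feet": 2.0, "w": 3.0, "b": 1.0}, visited)
print(pred)               # 7.0
print("loss" in visited)  # False: the loss node was never needed
```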

In [None]:
#To remember the parameters for evaluation-time
w_val, b_val = None, None

#Spin up a new session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    #Feed in each example one by one, for 10 epochs
    for j in range(10):
        #Separate the sqft and the price from the raw data input
        sq_feet_normed = [[norm_housing_data[i][0]] for i in range(len(housing_data))]
        actual_price_normed = [[norm_housing_data[i][1]] for i in range(len(housing_data))]
        
        for i in range(len(housing_data)):
            #FILL IN TRAINING STEP
            pass #REPLACE LINE
        
        #Randomly shuffle training data
        np.random.shuffle(norm_housing_data)
        
print(w_val)
print(b_val)

<h1>Concept 7: TensorFlow Evaluation</h1>
<p>Finally, we're ready to see how well our model does. Let's run through and see what the error is on some random examples taken from the same distribution. To do this, we start a new session (isolated from the first for safety). Then, we evaluate price_pred, passing our extracted values for the weights and biases through the feed_dict. For each prediction, we calculate the error, and at the end we average over all of the errors.</p>
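<p>The notebook leaves the exact error measure open. One option, shown here purely as an assumption, is the mean relative error, which makes "1 - error" readable as an accuracy. The function name mean_relative_error and the prices are made up.</p>

```python
def mean_relative_error(predictions, true_values):
    """Average of |prediction - truth| / |truth| over all examples."""
    errors = [abs(p - t) / abs(t) for p, t in zip(predictions, true_values)]
    return sum(errors) / len(errors)

# Made-up prices: the predictions are off by 10% and 20% respectively
err_example = mean_relative_error([110.0, 160.0], [100.0, 200.0])
print(err_example)  # about 0.15
```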

In [None]:
#Error and predictions
err = 0
preds = []

#Spin up a new session
with tf.Session() as sess:    
    
    #Iterate through testing data
    for example in norm_housing_test_data:
        pred = sess.run(price_pred, feed_dict = {sq_feet: [[example[0]]], 
                                                                 w: w_val, 
                                                                 b: b_val})
        
        #Return predictions and true p back to i/o space and calculate the error
        _, pred = None, None #REPLACE LINE
        _, true_p = None, None #REPLACE LINE
        preds.append(pred)
        err += None #REPLACE LINE
    
print('Total error: ' + str(err / len(preds)))
print('Total accuracy: ' + str(1 - err / len(preds)))

graph_data = list(zip(*housing_test_data))
preds_x = [return_to_io_space(item, 0)[0] for item in norm_housing_test_data[:,0]]
plt.scatter(graph_data[0], graph_data[1])
plt.scatter(preds_x, preds)
plt.xlabel('Square Footage')
plt.ylabel('Price')
plt.show()

<h1>Special Techniques: Kernel Trick</h1>
<p> A fun thing happens when the data is non-linear: how do you predict it using a linear model? Thankfully, the design of gradient-descent based linear regression lends itself to one of the most useful techniques in data science and machine learning, often called the Kernel Trick. Essentially, you design a function that maps the inputs into a higher-dimensional, potentially nonlinear feature space in which the relationship becomes linear. For regression, this is not very interesting, mainly because there is very little dimensionality reduction that can actually be implemented. However, later on, this will come in handy.</p> 
<p> An example of where this can be used is in quadratic regression. Let's use quadratic regression to analyze the same data from above. </p>
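<p>As a rough sketch of the idea, the plain-Python code below expands each input x into the features [x, x**2] and then fits an ordinary linear model on those features with gradient descent. The data, the function names quadratic_features and predict, and the hyperparameters are assumptions for illustration, not the workshop's solution.</p>

```python
def quadratic_features(x):
    """Map a single input x to the feature vector [x, x**2]."""
    return [x, x * x]

def predict(weights, bias, x):
    """Linear model applied to the expanded features."""
    return sum(w * f for w, f in zip(weights, quadratic_features(x))) + bias

# Made-up data lying exactly on y = 2 * x**2 - x + 0.5
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
data = [(x, 2 * x * x - x + 0.5) for x in xs]

weights, bias = [0.0, 0.0], 0.0
learning_rate = 0.1
for epoch in range(1000):
    for x, y in data:
        error = predict(weights, bias, x) - y
        features = quadratic_features(x)
        # Same update rule as linear regression, with one weight per feature
        weights = [w - learning_rate * 2 * error * f
                   for w, f in zip(weights, features)]
        bias -= learning_rate * 2 * error

print([round(w, 2) for w in weights], round(bias, 2))  # close to [-1.0, 2.0] and 0.5
```

<p>The model is still linear in its weights, which is why the same training procedure applies unchanged.</p>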

In [None]:
#FILL IN CODE FOR COMPUTATION GRAPH AND META GRAPH

In [None]:
#FILL IN CODE FOR TRAINING RUN

In [None]:
#Error and predictions
err = 0
preds = []

#FILL IN CODE FOR EVALUATION
        
print('Total error: ' + str(err / len(preds)))
print('Total accuracy: ' + str(1 - err / len(preds)))

graph_data = list(zip(*housing_test_data))
preds_x = [return_to_io_space(item, 0)[0] for item in norm_housing_test_data[:,0]]
plt.scatter(graph_data[0], graph_data[1])
plt.scatter(preds_x, preds)
plt.show()

<h1>Logistic Regression</h1>
<p>Logistic Regression is often used for classification. Given some input data, it is reasonable to wonder: is this part of class A or class B? For example, given patient data, one can try to answer the question of whether somebody has cancer or not. Let's use our knowledge of computational graphs to generate a model for this.</p>

<h2>Concept 8: Activation Functions</h2>
<p>Before we do so, however, we need to introduce the concept of an activation function. Activation functions are functions that alter their inputs in a nonlinear way and have easily computable derivatives for easy backpropagation. This allows a model that uses them to learn nonlinear relationships. They can also allow for interpretation of the internals of a neural network. For example, a sigmoid function takes in inputs and squashes them into the range between 0 and 1, so its output can be interpreted as the probability of a certain class.</p>
<p>To add a sigmoid function to a computation graph, simply call: <br><br>
```tf.sigmoid(x)```</p>
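<p>To utilize sigmoid loss, simply call the following with named arguments, where logits are the raw model outputs before the sigmoid is applied: <br><br>
```tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)```</p>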
<p>As an exercise now, create a model that takes in 9 inputs and outputs the probability that a certain patient has cancer. Then, create the loss and training steps to train the model.</p>
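<p>To show what these two operations compute, here is a plain-Python sketch of the sigmoid and of a numerically stable sigmoid cross-entropy for a single 0/1 label. The function names are assumptions, and the stable formula max(x, 0) - x * z + log(1 + exp(-|x|)) is one common formulation rather than a copy of TensorFlow's code.</p>

```python
import math

def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, this form avoids overflow in exp
    exp_x = math.exp(x)
    return exp_x / (1.0 + exp_x)

def sigmoid_cross_entropy(label, logit):
    """Cross-entropy between a 0/1 label and sigmoid(logit), in a stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

print(sigmoid(0.0))                     # 0.5
print(sigmoid_cross_entropy(1.0, 0.0))  # log(2), about 0.693
```

<p>Note that the loss takes the raw score (the logit), not the output of the sigmoid.</p>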

In [None]:
#FILL IN CODE FOR COMPUTATIONAL GRAPH AND META GRAPH

<p>Now, let's write the training session and loop.</p>

In [None]:
patient_data = datasets.cancer_data_gen()
patient_data, patient_test_data = patient_data[:550], patient_data[550:]

input_patient_data = [patient_data[i][:len(patient_data[i]) - 1] for i in range(len(patient_data))]
input_patient_diag = [[1 if patient_data[i][-1] == 4 else 0] for i in range(len(patient_data))]

#FILL IN CODE FOR TRAINING RUN

<p>Finally, let's evaluate the model.</p>
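<p>One common way to score a classifier, sketched here as an assumption rather than the workshop's required approach, is to threshold each predicted probability at 0.5 and count how often the result matches the 0/1 diagnosis. The function name accuracy and the numbers are made up.</p>

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the 0/1 label."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for guess, label in zip(predicted, labels) if guess == label)
    return correct / len(labels)

# Made-up model outputs and diagnoses
acc_example = accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0])
print(acc_example)  # 0.75
```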

In [None]:
acc = 0
preds = []

#Separate input data from the diagnosis data
input_test_data = [patient_test_data[i][:len(patient_test_data[i]) - 1] for i in range(len(patient_test_data))]
input_test_diag = [[1 if patient_test_data[i][-1] == 4 else 0] for i in range(len(patient_test_data))]

#FILL IN CODE FOR EVALUATION RUN
#print(str(pred[0]) + ' ?= ' + str(input_test_diag[i][0])) #UNCOMMENT LATER
        
print('Accuracy: ' + str(acc / len(preds)))
        