Lecture 9: Intro to Neural Nets
===============

10/12/2023, CS 4/6120 Natural Language Processing, Muzny



Task 1: Writing a neural net from scratch
-----------------

In [1]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
# seed random number generation so that you can 
# track the same numbers as each other
np.random.seed(42)

In [2]:
def sigmoid(x: float) -> float:
    """
    Apply the sigmoid function (1 / (1 + e^(-x)))
    to the passed in value.
    Parameters:
        x - float value to pass through sigmoid
            (NumPy arrays also work, element-wise)
    Return:
        float in [0, 1]
    """
    return 1/(1 + np.exp(-x))

def sigmoid_deriv(x: float) -> float:
    """
    Apply the derivative of the sigmoid function
    sigmoid(x) * (1 - sigmoid(x))
    to the passed in value.
    Parameters:
        x - float value to pass through sigmoid derivative
            (this is the input to sigmoid, not its output)
    Return:
        float result
    """
    return sigmoid(x) * (1 - sigmoid(x))
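
A quick sanity check on these two functions, as a minimal sketch. The definitions are repeated so the block runs on its own; the sample points `xs` and the step size `eps` are arbitrary choices for illustration. It compares `sigmoid_deriv` against a central finite-difference estimate of the slope of `sigmoid`.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    return sigmoid(x) * (1 - sigmoid(x))

# sample points and step size are arbitrary illustrative choices
xs = np.array([-2.0, 0.0, 2.0])
eps = 1e-5

# central finite-difference estimate of the slope of sigmoid at each point
numeric = (sigmoid(xs + eps) - sigmoid(xs - eps)) / (2 * eps)

print(sigmoid(0.0))        # 0.5
print(sigmoid_deriv(0.0))  # 0.25
print(np.allclose(numeric, sigmoid_deriv(xs)))
```

Both functions apply element-wise when given a NumPy array, which is how the training loop later in this notebook uses them.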

In [3]:
# input dataset
# 3rd "feature" is the bias term
X = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])
    
# labels, transposed so that they match
# easily with our inputs X
# the first label matches the first row in our input data,
# the second label matches the second row in our input data, etc
# .T gets the transpose for us, which makes 
# matrix math easier later
y = np.array([[0,1,1,0]]).T
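
The comment about transposing the labels is easier to see with shapes printed out. The sketch below is an illustration only: `y_flat` and the stand-in predictions `y_hat` are hypothetical values, not part of the notebook. It shows that a column of labels subtracts cleanly from a column of predictions, while a flat label vector silently broadcasts into a 4x4 matrix.

```python
import numpy as np

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y_col = np.array([[0, 1, 1, 0]]).T   # column vector, one row per example
y_flat = np.array([0, 1, 1, 0])      # hypothetical untransposed alternative

# stand-in predictions (hypothetical values), one row per example
y_hat = np.full((4, 1), 0.5)

print(X.shape, y_col.shape)      # (4, 3) (4, 1)
print((y_col - y_hat).shape)     # (4, 1)
print((y_flat - y_hat).shape)    # (4, 4)
```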

1. What logical function (AND, OR, etc.) does this dataset represent? (Remember that this function should apply to two inputs, our two input features, and produce the matching label.)

__YOUR ANSWER HERE__

In [5]:
hidden_units = 4
input_features = X.shape[1]

# initialize weights randomly with mean 0 and range [-1, 1)
# TODO: fill in dimensions here for W and U
# fill these in as a tuple like (rows, columns)
# this corresponds to how shapes are represented for numpy arrays
W_dim = (input_features, hidden_units)

# you'll need to use W_dim and U_dim to produce the
# correct number of random numbers
W = 2 * np.random.random(W_dim) - 1
# note that we are doing binary classification, so the second dimension here is 1 
# (corresponding to one output unit)
U_dim = (hidden_units, 1)
U = 2 * np.random.random(U_dim) - 1
print("W:", W)
print("U:", U)


inputs = X
num_epochs = 1000
for i in range(num_epochs):
    # forward propagation: the hidden layer could use sigmoid, relu, tanh, etc
    # keep the pre-activation values so we can take derivatives at them later
    z1 = np.dot(inputs, W)
    h = sigmoid(z1)

    # output layer: always sigmoid for binary classification
    # note that this gives us the classification for every input
    # example simultaneously
    z2 = np.dot(h, U)
    y_hat = sigmoid(z2)

    # how much did we miss?
    layer2_error = y - y_hat

    # this is telling us how much to move
    # our weights and in what direction
    # use the derivative of the non-linearity used above, evaluated at
    # the pre-activation (sigmoid_deriv applies sigmoid internally)
    layer2_delta = layer2_error * sigmoid_deriv(z2)

    # how much did each L1 value contribute to
    # the L2 error (according to the weights)?
    layer1_error = layer2_delta.dot(U.T)

    # this is telling us how much to move
    # our weights and in what direction
    layer1_delta = layer1_error * sigmoid_deriv(z1)

    U += h.T.dot(layer2_delta)
    W += inputs.T.dot(layer1_delta)

W: [[-0.39151551  0.04951286 -0.13610996 -0.41754172]
 [ 0.22370579 -0.72101228 -0.4157107  -0.26727631]
 [-0.08786003  0.57035192 -0.60065244  0.02846888]]
U: [[ 0.18482914]
 [-0.90709917]
 [ 0.2150897 ]
 [-0.65895175]]
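
One way to read the update `U += h.T.dot(layer2_delta)` is as a gradient descent step with learning rate 1. The sketch below checks this numerically under two assumptions that the notebook does not state explicitly: the loss is half the sum of squared errors, and the sigmoid derivative is evaluated at the pre-activation (written here as `y_hat * (1 - y_hat)`). The `loss` helper, `eps`, and the seed are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(W, U, X, y):
    # assumed loss: half the sum of squared errors
    y_hat = sigmoid(sigmoid(X @ W) @ U)
    return 0.5 * np.sum((y - y_hat) ** 2)

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0, 1, 1, 0]], dtype=float).T
W = 2 * rng.random((3, 4)) - 1
U = 2 * rng.random((4, 1)) - 1

# analytic update for U, following the structure of the training loop
h = sigmoid(X @ W)
y_hat = sigmoid(h @ U)
delta2 = (y - y_hat) * y_hat * (1 - y_hat)  # error times sigmoid'(z2)
update_U = h.T @ delta2

# central finite-difference estimate of dLoss/dU, one entry at a time
eps = 1e-6
numeric_grad = np.zeros_like(U)
for i in range(U.shape[0]):
    U_plus, U_minus = U.copy(), U.copy()
    U_plus[i, 0] += eps
    U_minus[i, 0] -= eps
    numeric_grad[i, 0] = (loss(W, U_plus, X, y) - loss(W, U_minus, X, y)) / (2 * eps)

# the update should equal the negative gradient of the assumed loss
print(np.allclose(update_U, -numeric_grad, atol=1e-6))
```

The same kind of check can be repeated for `W`, perturbing one entry of `W` at a time.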


2. Does the hidden layer have a bias term in this neural net? __YOUR ANSWER HERE__
3. What variables' values are updated as the loop above iterates? __YOUR ANSWER HERE__

In [None]:
print("Output After Training:")
# these are the same as the inputs that we trained this net on
test_inputs = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])
gold_labels = np.array([[0,1,1,0]]).T

# TODO: Write the code to assign labels to the test data
h = FILL ME IN
y_hat = FILL ME IN

# These should match with each other
# y was our gold labels from the beginning
print("Actual labels:", gold_labels.T)
print("Assigned probabilities:", y_hat)
print("Assigned labels:", [1 if y_hat_val > .5 else 0 for y_hat_val in y_hat])

4. How many iterations did you need for the predicted values $\hat y$ to match the actual values? __YOUR ANSWER HERE__
5. Make a graph of how the `layer2_error` changes as epochs progress.

Task 2: Neural Nets from libraries (you'll be doing a similar thing in your sentiment analysis HW!)
----------------

Now we'll look at some common libraries for building neural net classifiers. [`keras`](https://keras.io/) provides a nice API for implementing neural nets. It runs on top of a backend library; older releases supported TensorFlow, CNTK, and Theano. We'll look at an example using [`tensorflow`](https://github.com/tensorflow/tensorflow) as our backend.

Installation of component libraries:

```
pip3 install tensorflow
pip3 install keras
```

If you are working on an Apple silicon Mac (a Mac with an M1 or M2 chip), you'll need macOS 12.0 or later (Monterey (12) or Ventura (13)). Then follow the [instructions on the Apple developers website](https://developer.apple.com/metal/tensorflow-plugin/). We will be using tensorflow/keras going forward, so this is worth doing on your own outside of class!

In the meantime, you can also upload this notebook to [Google Colaboratory](https://colab.research.google.com/) and run this portion in the cloud.
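
Before running the import cell, it can help to check whether the libraries are visible to your Python interpreter at all. The sketch below uses the standard library's `importlib.util.find_spec`, which locates a top-level module without importing it; `is_installed` is a hypothetical helper name introduced here. Note that finding a module does not guarantee that importing it will succeed on an unsupported architecture.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None

# report on the two libraries this section relies on
for name in ("tensorflow", "keras"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```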

In [None]:
# These imports are left uncommented so that the cells below run.
# On Macs with unsupported architecture, these imports will kill
# your kernel, which is highly annoying; comment them out if so.


# Sequential will be the base model we'll use
from keras.models import Sequential
# Dense layers are our base feed forward layers
from keras.layers import Dense

In [None]:
# set up the basis for a feed forward network
model = Sequential()

# same X and y as above
X = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])
y = np.array([[0,1,1,0]]).T

# hidden layer
# you can play around with different activation functions
model.add(Dense(units=4, activation='relu', input_dim=X.shape[1]))

# output layer
# activation function is our classification function
model.add(Dense(units=1, activation='sigmoid'))

# configure the learning process
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])


# 1 epoch = once through the data
model.fit(X, y, epochs=1, verbose=1)
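
To make the `loss='binary_crossentropy'` setting more concrete, here is a NumPy sketch of the quantity it is based on: the mean of `-(y*log(p) + (1-y)*log(1-p))` over the examples. This is an illustration, not Keras's implementation; the function below, the clipping constant `eps`, and the two sets of example probabilities are assumptions made for this sketch.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # clip so log never sees exactly 0 or 1 (eps is an assumed constant)
    p = np.clip(y_prob, eps, 1 - eps)
    per_example = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(per_example))

y_true = np.array([[0, 1, 1, 0]]).T

# hypothetical predictions: maximally unsure vs. mostly right
uncertain = np.full((4, 1), 0.5)
confident = np.array([[0.1, 0.9, 0.9, 0.1]]).T

print(binary_crossentropy(y_true, uncertain))  # ln 2, about 0.693
print(binary_crossentropy(y_true, confident))  # -ln 0.9, about 0.105
```

Lower values mean the assigned probabilities sit closer to the gold labels, which is what training tries to achieve.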

In [None]:
x_test = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])
y_test = np.array([[0,1,1,0]]).T
# predict returns one probability per example, not a hard label
probs = model.predict(x_test)
print("Assigned probabilities:", probs)
print("Assigned labels:", [1 if prob > .5 else 0 for prob in probs])
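
The thresholding step above can also be written without a Python loop. The sketch below uses made-up probabilities shaped like the output of `model.predict` (one row per example, one column); the numbers are hypothetical and not results from this model.

```python
import numpy as np

# hypothetical probabilities with shape (n_examples, 1)
probs = np.array([[0.12], [0.81], [0.67], [0.40]])

# threshold at 0.5, convert booleans to 0/1, and flatten to one dimension
assigned = (probs > 0.5).astype(int).ravel()

gold = np.array([0, 1, 1, 0])
accuracy = float(np.mean(assigned == gold))

print(assigned)   # [0 1 1 0]
print(accuracy)   # 1.0
```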

6. How many epochs did you need for 100% accuracy? __YOUR ANSWER HERE__

Interested in getting deeper into neural nets? 


Here are two places to start from:
- take a look at the data that you can load from [`nltk`](https://www.nltk.org/data.html) and [`scikit-learn`](https://scikit-learn.org/stable/datasets/index.html#dataset-loading-utilities), then work on creating a neural net to do either binary or multinomial classification
- take a look at the tensorflow + keras [word embeddings tutorial](https://www.tensorflow.org/tutorials/text/word_embeddings). Note! This tutorial mentions RNNs, which are a special kind of neural net (they are not the feedforward architecture that we've seen so far). We'll get into RNNs after next week.