# Neural Network Basics

Use **Code** cells to write and run any code you need to answer the question and **Markdown** cells to write out answers in words. After you are finished with the assignment, remember to download it as an **HTML file** and submit it in **ELMS**.

In [None]:
# !pip install tensorflow

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.linear_model import LogisticRegression

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.losses import SparseCategoricalCrossentropy

## Neural Networks 

Artificial Intelligence (AI) tools have seen a huge surge in popularity in recent years. The wide availability of tools like ChatGPT has made AI a very hot topic in the past year or so. But what exactly is AI? How does it work? We'll take a look at some of the mechanisms behind how AI works with an introduction to neural networks.

In this notebook, we will go over how to train neural network models to do supervised machine learning using tensorflow. This is mostly to demonstrate how the code works and how one goes about building these types of models. A more detailed explanation of what exactly is going on behind the scenes and the math behind the implementation is reserved for another class. 

### Machine Learning 

Before we talk about neural networks, however, we should first go over what machine learning is. 

> **One definition of machine learning**: A machine is said to learn when it improves on a task with respect to a performance measure.

Generative AI tools like chatbots fit under the umbrella of **unsupervised learning**, in which there is nothing we are trying to predict or classify. Instead, these types of machine learning produce new information as outputs. Another example of unsupervised learning that we have already used is **topic modeling**. Since we are trying to generate new information (the topics and the assignment of documents to topics), this fits under unsupervised learning.

In **supervised learning**, we are focused on finding the relationship between a **label y** and **features x**.

$$ y = f(x) $$

"Learning" is finding a function f that minimizes future error in recovering y. 

For supervised learning, we must have a y variable that we know. That is, we need to have the y variable in our dataset, so that we can build our model and use that model to predict y for future data.
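
Here is a minimal sketch of that idea. The numbers are made up for illustration and do not come from any dataset in this notebook. We know both x and y for a few observations, find an f that minimizes the error on them (here, a straight line fit with `np.polyfit`), and then use f to predict y for an x we have not seen.

```python
import numpy as np

# Hypothetical training data where we know both the feature x and the label y
x_known = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_known = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# "Learning": find the slope and intercept that minimize squared error
slope, intercept = np.polyfit(x_known, y_known, deg=1)

def f(x):
    """The learned function mapping a feature to a predicted label."""
    return slope * x + intercept

# Performance measure: mean squared error on the data we learned from
mse = np.mean((f(x_known) - y_known) ** 2)

# Use the learned function on a new x where y is unknown
y_new = f(6.0)
print(slope, intercept, mse, y_new)
```

A straight line is only one possible choice of f. The models later in this notebook use more flexible functions, but the pattern of fitting on known y and predicting for new x is the same.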

### Prediction Example

Let's take a look at a quick example of doing some prediction. The `ncbirths` dataset has information on births in North Carolina, including information about the mother, weeks of pregnancy, and whether the baby was a low birthweight baby or not.

In [None]:
ncbirths = pd.read_csv('ncbirths.csv')
ncbirths.head()

In [None]:
# Change the low birthweight indicator variable to a 0/1 variable
ncbirths['low'] = ncbirths.lowbirthweight == 'low'
ncbirths.low = ncbirths.low.astype(int)

We'll try a very simple example of predicting the low birthweight status of the baby using the number of weeks that the pregnancy lasted. If we were to take a look at the relationship with a graph, it might look like the following. Note that `1` refers to low birthweight while `0` refers to not low birthweight. 

In [None]:
ncbirths.plot.scatter(y = 'low', x = 'weeks')

So, how do we use `weeks` to predict the low birthweight status? Well, using a straight line to show the relationship wouldn't make sense. This is because low birthweight can only take one of two values: `0` or `1`. So, instead, we try to create a curved function that gets as close to the points as possible. That's essentially what Logistic Regression is doing -- optimizing by minimizing a **loss function**, or a measure of how far off the predictions are from the actual values. 
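
To make the idea of a curved function and a loss function more concrete, here is a rough sketch using `numpy`. The weeks and labels below are hypothetical (they are not taken from `ncbirths`), and the two candidate curves are hand-picked rather than fitted. The loss shown is the log loss commonly associated with logistic regression; note that `LogisticRegression` in `sklearn` also adds a regularization penalty by default, which is left out here.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squeezes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, p_pred):
    """Average log loss: small when predicted probabilities match the labels."""
    p = np.clip(p_pred, 1e-12, 1 - 1e-12)  # avoid taking log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Hypothetical weeks of pregnancy and low birthweight labels
weeks = np.array([28.0, 32.0, 36.0, 39.0, 40.0, 41.0])
low = np.array([1, 1, 0, 0, 0, 0])

# Two hand-picked candidate curves: p(low) = sigmoid(intercept + slope * weeks)
p_good = sigmoid(34.0 - 1.0 * weeks)   # probability falls as weeks increase
p_bad = sigmoid(-34.0 + 1.0 * weeks)   # probability rises as weeks increase

print(log_loss(low, p_good), log_loss(low, p_bad))
```

The first curve gets a much smaller loss because its probabilities are high for the low birthweight cases and low for the others. Fitting the model amounts to searching for the intercept and slope that make this loss as small as possible.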

Let's take a look at what happens when we fit a logistic regression line.

In [None]:
logit = LogisticRegression()
logit.fit(ncbirths[['weeks']], ncbirths.low)

Once we fit the logistic regression model, we can see the predictions it would make by plotting the predicted probabilities as a line.

In [None]:
preds = logit.predict_proba(ncbirths[['weeks']])[:,1]

In [None]:
pred_by_week = pd.DataFrame({'weeks': ncbirths.weeks, 'preds':preds}).sort_values('weeks')

In [None]:
fig, axes = plt.subplots(figsize=(8,6))
pred_by_week.plot.line('weeks','preds', ax = axes)
ncbirths.plot.scatter(y = 'low', x = 'weeks', ax = axes)

The predictions here are just ok. It looks like it might do well for some, but there are lots of points that it does poorly on. That's because we're just using one feature (variable) to make a prediction. In reality, we might want to use many features. In addition, there might be relationships that aren't quite exactly this type of neat curved line. 

That's where something like neural networks might come in. The above logistic regression is an example of what one **node** in a neural network might look like. Neural networks essentially work by combining lots of these types of simple relationships to create a complex model that makes predictions. 
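
As a small illustration of combining simple relationships, the sketch below builds a tiny network by hand: two nodes that each look like a logistic regression, feeding into one output node. The feature values, weights, and biases are all made up for this example, and the helper `node` is a hypothetical name, not part of any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def node(inputs, weights, bias):
    """One node: a weighted sum of the inputs passed through an activation."""
    return sigmoid(np.dot(inputs, weights) + bias)

# Hypothetical features for one observation (values chosen for illustration)
x = np.array([0.5, -1.0, 2.0])

# Hidden layer: two nodes, each with its own made-up weights and bias
h1 = node(x, np.array([0.4, 0.1, -0.3]), 0.0)
h2 = node(x, np.array([-0.2, 0.5, 0.6]), 0.1)

# Output node: takes the two hidden outputs as its inputs
output = node(np.array([h1, h2]), np.array([1.5, -2.0]), 0.2)
print(h1, h2, output)
```

In a real network the weights and biases are not chosen by hand. They are learned from the data by minimizing a loss function, just as with logistic regression.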

![Neural Network](neural_network.png)

*Source: https://towardsdatascience.com/simple-introduction-to-neural-networks-ac1d7c3d7a2c*

### MNIST Data

Let's look at an example of applying neural network modeling to some image data. The MNIST dataset that comes with the `tensorflow` package contains images of handwritten digits (numbers). Our goal is to train a neural network that is able to accurately determine what number is written based on the data from the image. In other words, we want to build a neural network that is able to recognize numbers that have been handwritten.

The data itself is structured so that it is in a 2-dimensional format for each observation. Each observation is 28 by 28, with the values within each cell representing the intensity of the pixel. These values make up the **features**, or variables that we use to predict/classify the observation as one of the 10 numerical digits. 

The data has been split into a **train** set and a **test** set for us already. The **train** set is what we will use to build our neural network model. The test set is what we will use to evaluate how well our model would perform if we were to get new data.

In [None]:
# Load the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale it so that the values are between 0 and 1
x_train, x_test = x_train / 255.0, x_test / 255.0

To visually see what the data look like, let's graph some of the observations. 

In [None]:
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.xlabel(y_train[i])

plt.show()

We specify the neural network model using `Sequential` and adding the layers in a list. 

Let's take a look at the layers one by one.
- `Flatten(input_shape=(28, 28))`: This flattens the 28 by 28 data into a 1-D format. There isn't anything being done to values at this step -- all that is happening is that the 2-D shape is being changed to a 1-D shape so that all of the same values are in a vector format.
- `Dense(128, activation='relu')` / `Dense(64, activation='relu')` : This is a dense layer, with the first argument specifying how many nodes there are. We have two dense layers in this neural network: one with 128 nodes and one with 64 nodes. You can imagine all of the features (variables) in our data feeding into every single one of the 128 nodes, and the outputs of those 128 nodes feeding into the 64 nodes in the next step. 
- `Dense(10)`: This is an **output layer**. Since we are trying to classify the image as one of ten different categories (that is, the individual digit values from 0 to 9), we need a layer with 10 nodes. Because no activation is specified, each node outputs a raw score for its digit.
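
To see how the shapes change from layer to layer, here is a rough `numpy` sketch of one image passing through layers of the same sizes. The image and the weights are random stand-ins (nothing here is trained, and this is not how Keras stores its weights internally), so only the shapes and the parameter count are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in for one 28 by 28 image with values between 0 and 1
image = rng.random((28, 28))

# Flatten: the same 784 values, rearranged into a single vector
x = image.reshape(-1)

def relu(z):
    """ReLU activation: negative values become 0, positive values pass through."""
    return np.maximum(z, 0.0)

# Random weights and zero biases, only to illustrate shapes (not trained values)
layer_sizes = [784, 128, 64, 10]
pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
weights = [rng.normal(0, 0.05, (n_in, n_out)) for n_in, n_out in pairs]
biases = [np.zeros(n_out) for _, n_out in pairs]

h1 = relu(x @ weights[0] + biases[0])    # 128 values
h2 = relu(h1 @ weights[1] + biases[1])   # 64 values
scores = h2 @ weights[2] + biases[2]     # 10 values, one per digit

# Every input connects to every node, so each layer has n_in * n_out weights
# plus one bias per node
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(x.shape, h1.shape, h2.shape, scores.shape, n_params)
```

Even this small network has over 100,000 numbers to learn, most of them in the first dense layer where all 784 pixel values feed into each of the 128 nodes.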

In [None]:
model = Sequential([
  Flatten(input_shape=(28, 28)),
  Dense(128, activation='relu'),
  Dense(64, activation='relu'),
  Dense(10)
])

Then, we need to compile the model. This is where we specify the optimizer, the loss function, and the metric we will use to evaluate how the model is doing.

In [None]:
model.compile(optimizer='adam',
              loss=SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
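
To give a sense of what this loss measures, here is a simplified `numpy` sketch of sparse categorical cross-entropy computed from raw scores (logits). It is meant to illustrate the idea rather than reproduce the Keras implementation, and the scores and labels are made up.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 for each row."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def sparse_categorical_crossentropy(labels, logits):
    """Mean of -log(probability assigned to the true class)."""
    probs = softmax(logits)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

# Made-up raw scores for two images and three classes
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])   # integer class labels, in the same style as y_train

loss = sparse_categorical_crossentropy(labels, logits)
print(softmax(logits), loss)
```

The "sparse" part means the labels are plain integers such as `0` or `2` rather than one-hot vectors, and `from_logits=True` tells the loss that the model outputs raw scores that still need to be converted to probabilities.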

Finally, we fit the model by giving it our data. Since we are using the training set to build our model, we give it the x and y data from the train set. We also set the batch size to 32 and the number of epochs to 10. The **batch size** refers to how many observations are used to update the model at a time. An **epoch** is one full pass of the training data through the neural network, so the number of epochs is how many times the full data is sent through.

In [None]:
model.fit(x_train, y_train, batch_size = 32, epochs=10)
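
As a quick check on what these settings mean, the arithmetic below works out how many batches make up one epoch and how many updates happen in total. It assumes the standard MNIST training set of 60,000 images.

```python
import math

n_train = 60000      # number of images in the MNIST training set
batch_size = 32
epochs = 10

# Each epoch goes through the training data one batch at a time
steps_per_epoch = math.ceil(n_train / batch_size)

# The model is updated once per batch, in every epoch
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)
```

The steps per epoch should match the step count that Keras shows in the progress output for each epoch.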

### Evaluation

Now, let's take a look at how this would do on new data. We can use the `evaluate` method to apply our trained model to the test set and see how accurate it actually is.

In [None]:
model.evaluate(x_test,  y_test, verbose=2)
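
The accuracy reported here is the share of test images whose highest-scoring class matches the actual label. Here is a small sketch of that calculation using made-up scores for five images and three classes (these are not real model outputs).

```python
import numpy as np

# Made-up scores for five test images and three classes
scores = np.array([[4.0, 1.0, 0.2],
                   [0.3, 2.5, 0.1],
                   [1.2, 0.4, 3.3],
                   [2.0, 2.2, 0.5],
                   [0.1, 0.2, 1.9]])
actual = np.array([0, 1, 2, 0, 2])

predicted = np.argmax(scores, axis=1)          # class with the highest score
accuracy = np.mean(predicted == actual)        # share of correct predictions
wrong = np.flatnonzero(predicted != actual)    # positions of the mistakes
print(predicted, accuracy, wrong)
```

Looking at which observations were misclassified, and not just the overall accuracy, can help show where the model struggles.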

In [None]:
predictions = model.predict(x_test)
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_test[i], cmap=plt.cm.binary)
    if np.argmax(predictions[i]) == y_test[i]:
        color = 'green'
    else:
        color = 'red'
    plt.xlabel(f'Predicted: {np.argmax(predictions[i])}, Actual: {y_test[i]}', color = color)

plt.show()

### Next Steps

This is an example of using neural networks for image classification. There are lots of different applications of neural networks, though. For example, you can also use neural networks for **unsupervised learning** tasks, such as generative AI. Neural networks are also widely used in **text analysis** applications, such as sentiment analysis. 