# Advanced Machine Learning with TensorFlow

In this lecture, we'll begin a relatively long arc in which we will learn to use the TensorFlow package for advanced machine learning, with an emphasis on neural networks.  

## A Few Notes on Google and Ethics

[TensorFlow](https://www.tensorflow.org/) is a Google product. By teaching you TensorFlow, I take a small step toward extending the influence of Google in the field of machine learning and in the STEM community more broadly. By using TensorFlow in and out of this class, you will do the same. It's important that we both go into this with open eyes about some of the ethical questions surrounding Google's recent work in machine learning. I'd like to stress that I am not an expert on ethics in artificial intelligence. There are likely MANY other important ethical concerns about Google's work (and the work of other giants in tech and artificial intelligence, like Facebook and Amazon) which are not on my radar.

### 1. Language Models and Dr. Timnit Gebru

In December 2020, Google [fired](https://www.washingtonpost.com/technology/2020/12/03/timnit-gebru-google-fired/) prominent AI ethicist Dr. Timnit Gebru, an Ethiopian-American woman, over a sequence of events surrounding [one of her papers](https://faculty.washington.edu/ebender/papers/Stochastic_Parrots.pdf). In this paper, Gebru and coauthors raise ethical concerns about language models---like the ones that Google uses for predictive text applications. These concerns include:

- The role of these models in homogenizing online culture. 
- The environmental cost of training these models (in some cases, comparable to the carbon footprint of a transatlantic flight). 
- The frequent instances of language models learning to reproduce biased or hateful text. 

Dr. Gebru and her collaborators ultimately recommend a more thoughtful, "small", and ethically-informed approach to constructing language models. Naturally, this recommendation would require extensive reorganization of one of Google's major research areas. Possibly because they felt threatened by this and related recommendations, Google managers invented an additional, internal review layer for this paper, even after it had already been accepted at a prominent computer science conference. Dr. Gebru's protest of this made-up red tape was one of the events that led to her firing. 

### 2. Google Translate Is Sexist

It is well documented that machine learning algorithms trained on natural text can inherit biases present in those texts. One of the most direct ways to observe such bias is in Google Translate. Some languages, such as Hungarian, do not have gendered pronouns. When Google Translate renders these pronouns into a gendered language like English, it has to make assumptions, as pointed out in [this tweet by Dora Vargha](https://twitter.com/DoraVargha/status/1373211762108076034?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1373211762108076034%7Ctwgr%5E%7Ctwcon%5Es1_&ref_url=https%3A%2F%2Fd-7356743851859968838.ampproject.net%2F2103240330002%2Fframe.html). Let's demonstrate with the following English sentences. 

> **he** cooks.
> **she** is a political leader.
> **she** is an engineer.
> **he** is a cleaner.
> **he** is beautiful. 
> **she** is strong. 

Translate these into Hungarian and back via Google Translate, and here's what you'll get: 

> **she** cooks.
> **he** is a political leader.
> **he** is an engineer.
> **she** is cleaning.
> **she** is beautiful.
> **he** is strong.

Considering that English *has* a gender-neutral pronoun (*they*), this would be easy to fix, yet Google has declined to do so. 

### 3. Historical Racial and Gender Biases

Google Search has a striking history of bias against Black people, especially Black women. This bias was made widely public by UCLA professor Safiya Noble in her book *Algorithms of Oppression*. In one of Dr. Noble's most famous examples, top results for the phrase "black girls" in 2011 consisted of links to porn sites, which was not the case for searches for "white girls" or "black men." As late as 2016, an image search for "gorillas" would surface pictures of Black individuals. You can find a brief synopsis of some of Dr. Noble's findings [here](https://time.com/5209144/google-search-engine-algorithm-bias-racism/) (content warning: highly sexually explicit language). Google has since taken steps to improve these specific examples. 



Keeping these items in mind, let's now begin our study of machine learning via TensorFlow. 

## Tensors

So, uh, what's a tensor? As you may remember from Riemannian geometry, a prerequisite for this class, 

> An $s$-contravariant and $t$-covariant *tensor* $\sigma$ on a vector space $V$ is a multilinear map $\sigma: \left(V^*\right)^s \times V^t \rightarrow \mathbb{R}$, where $V^*$ denotes the space of linear functions on $V$...  

<br> <br>

Just kidding! A tensor is pretty much just a Numpy array. 

<figure class="image" style="width:40%">
  <img src="https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/_images/tensor.jpeg" alt="">
  <figcaption><i></i></figcaption>
</figure>

Here's another one in case that one didn't sink in: 

<figure class="image" style="width:40%">
  <img src="https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/_images/tensor-2.jpeg" alt="">
  <figcaption><i></i></figcaption>
</figure>

Let's take a look. 

```python
import tensorflow as tf
import numpy as np
```

We can create a simple, "constant" tensor using `tf.constant`: 

```python
s = tf.constant([1, 2, 3])
s
```

```
<tf.Tensor: id=0, shape=(3,), dtype=int32, numpy=array([1, 2, 3], dtype=int32)>
```

As you can see from the output, this object has a `shape`, a `dtype`, and even an internal `numpy` representation -- it really is a lot like a `numpy` array! Like `numpy` arrays, we can do various mathematical operations: 

```python
2*s
```

```
<tf.Tensor: id=2, shape=(3,), dtype=int32, numpy=array([2, 4, 6], dtype=int32)>
```

```python
t = tf.constant([3, 2, 1])
s*t
```

```
<tf.Tensor: id=4, shape=(3,), dtype=int32, numpy=array([3, 4, 3], dtype=int32)>
```
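Note that `*` multiplies tensors entry by entry, just as it does for Numpy arrays; it is not a dot product. A minimal sketch of the difference, reusing the same `s` and `t` (casting to `float32` before `tf.tensordot` is my own choice, to stick with the dtype TensorFlow usually expects for floating point math):

```python
import tensorflow as tf

s = tf.constant([1, 2, 3])
t = tf.constant([3, 2, 1])

# `*` multiplies entry by entry: [1*3, 2*2, 3*1]
elementwise = s * t
# summing the entrywise products gives the dot product 1*3 + 2*2 + 3*1 = 10
dot = tf.reduce_sum(s * t)
# tf.tensordot with axes=1 contracts the single shared axis (float32 inputs here)
dot_f = tf.tensordot(tf.cast(s, tf.float32), tf.cast(t, tf.float32), axes=1)

print(elementwise.numpy())  # [3 4 3]
print(dot.numpy())          # 10
print(dot_f.numpy())        # 10.0
```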

There *is* a reason that we use `tf.Tensor` rather than Numpy array objects. One of the primary reasons is that the tensor data type is set up to support *automatic differentiation*, which is important when it comes time to train our models. That said, it's sometimes useful to convert back to literal Numpy arrays, which can usually be done like this: 

```python
s.numpy()
```

```
array([1, 2, 3], dtype=int32)
```

Really, they could have called the whole thing ArrayFlow or RectanglesOfNumbersFlow, but those don't really say "I am a smart, smart person" in quite the same way. 
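To make *automatic differentiation* a bit more concrete, here is a small sketch using `tf.GradientTape`, which records operations involving a `tf.Variable` and can then report derivatives. The function $y = x^2 + 2x$ is just an illustrative choice; its derivative $2x + 2$ equals 8 at $x = 3$. We won't need to use `GradientTape` directly in this lecture, since Keras takes care of the differentiation when we train a model.

```python
import tensorflow as tf

x = tf.Variable(3.0)  # a trainable scalar; Python floats become float32

# operations inside the `with` block are recorded on the tape
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x   # y = x^2 + 2x

# dy/dx = 2x + 2, which equals 8 at x = 3
grad = tape.gradient(y, x)
print(grad.numpy())  # 8.0
```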

### Dtypes

Because computation on 64-bit floating point numbers can be quite expensive at scale, most operations in TensorFlow prefer that you supply floating point numbers with data type `float32`. For example: 

```python
tf.constant([1.2, 3.3, 5.6], dtype = np.float32)
```

```
<tf.Tensor: id=5, shape=(3,), dtype=float32, numpy=array([1.2, 3.3, 5.6], dtype=float32)>
```

Supplying `float64` data types will usually lead to annoying warning messages, and possibly slower performance. 
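One place this comes up in practice: Numpy creates `float64` arrays by default, and `tf.constant` keeps the dtype of the array you hand it, whereas plain Python floats become `float32`. A short sketch of checking and converting dtypes with `tf.cast` (the random array is placeholder data):

```python
import numpy as np
import tensorflow as tf

data = np.random.rand(4, 3)          # Numpy defaults to float64
print(data.dtype)                    # float64

t64 = tf.constant(data)              # dtype is inherited from the array: float64
t32 = tf.cast(t64, tf.float32)       # explicit conversion to float32
t32_direct = tf.constant(data, dtype=tf.float32)  # or convert while creating

# plain Python floats are converted to float32 by default
from_floats = tf.constant([1.2, 3.3, 5.6])

print(t64.dtype, t32.dtype, t32_direct.dtype, from_floats.dtype)
```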

While it's good to know what tensors are and how they work, we usually won't need to construct them explicitly. Rather, we'll be able to feed Numpy arrays to our models, which will handle all the tensor operations internally. 

## Layers

*Layers* are the building blocks of models. You can think of a layer as a function that takes in one tensor and spits out another tensor, possibly of a different shape. Many layers have requirements on the kinds of tensors they admit; for example, they might only work on 2d tensors.

In this course, we'll always work with layers via the high-level Keras API, which allows us to easily create and combine layers. 

```python
from tensorflow.keras import layers
```

```python
data = np.random.rand(10, 3)
print(data)
```

```
[[0.39823625 0.7849845  0.51018681]
 [0.51530702 0.19017106 0.55016498]
 [0.60246758 0.90148036 0.93134562]
 [0.2406677  0.41770436 0.22679922]
 [0.80317438 0.2554333  0.10474707]
 [0.0927559  0.70667639 0.09502461]
 [0.20616771 0.78910279 0.32330852]
 [0.79250252 0.66254713 0.87584261]
 [0.15777652 0.86337802 0.05560337]
 [0.28962723 0.35896415 0.79341453]]
```


```python
data = tf.constant(data, dtype=np.float32)
```

Now we can create a layer. Note that we first have to *make* the layer before we *call* the layer on any data. The `Dense` layer is the simplest and among the most generally useful kinds of layers. 

```python
first_layer = layers.Dense(units = 2)
first_layer(data)
```

```
<tf.Tensor: id=33, shape=(10, 2), dtype=float32, numpy=
array([[0.7440451 , 1.0529554 ],
       [0.45738617, 0.4378069 ],
       [1.085637  , 1.3594254 ],
       [0.36436817, 0.5403206 ],
       [0.17681311, 0.30792803],
       [0.42996937, 0.79082406],
       [0.62507737, 0.97737753],
       [0.9165848 , 1.0792118 ],
       [0.48240364, 0.93907624],
       [0.7195109 , 0.7265251 ]], dtype=float32)>
```

The `units` argument controls the shape of the output. The `Dense` layer interprets an `m` $\times$ `n` tensor as `m` data points, each with `n` features (columns). It outputs a new tensor with `m` rows and `units` columns. So, you can think of `units` as controlling the number of *hidden features* learned by this layer. 

What's the deal with the actual outputs? This layer has parameters (*weights*) which control how these numbers are calculated. Because we haven't done any model training, these weights are pretty much random, so the output is random as well. We'll look at training the model in the next section. 
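To see what those weights are, here is a sketch that recomputes a `Dense` layer's output by hand. With no activation function, a `Dense` layer computes $XW + b$, where $W$ (the *kernel*) has shape `(n, units)` and $b$ (the *bias*) has length `units`. The layer creates its weights the first time it is called, and `get_weights()` returns them as Numpy arrays. The random input is placeholder data.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

data = tf.constant(np.random.rand(10, 3), dtype=tf.float32)

first_layer = layers.Dense(units=2)
out = first_layer(data)          # calling the layer builds its weights

# kernel has shape (n_features, units); bias has shape (units,)
W, b = first_layer.get_weights()
print(W.shape, b.shape)          # (3, 2) (2,)

# with no activation, Dense computes data @ W + b
manual = data.numpy() @ W + b
print(np.allclose(out.numpy(), manual, atol=1e-5))  # True
```

Training a model amounts to adjusting `W` and `b` (for every layer) so that these outputs become useful.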

## Models

A model consists of: 

- A sequence of layers. The final layer is an *output* layer, and plays a major role in determining what "kind" of model we have (e.g. regression vs. classification). 
- Specs for how the model should be fit to data. The most important choice to make here is the *loss function*, which governs how, mathematically, the model will be judged on its performance.

Let's download some real data, create a model, and evaluate its performance. 

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# #BestData
url = "https://philchodrow.github.io/PIC16A/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)

# only use these columns
df = penguins[["Culmen Length (mm)", "Culmen Depth (mm)", "Flipper Length (mm)",  "Species"]]
df = df.dropna()

# categorically encode the "Species" column
le = LabelEncoder()
y = le.fit_transform(df["Species"])

# predictor data
X = df[["Culmen Length (mm)", "Culmen Depth (mm)", "Flipper Length (mm)"]]

# convert to numpy array
X = np.array(X, dtype = np.float32)

# train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
```

```python
X_train[:5] # first five rows of predictor data
```

```
array([[ 49.3,  19.9, 203. ],
       [ 42.5,  20.7, 197. ],
       [ 40.5,  17.9, 187. ],
       [ 39.7,  18.4, 190. ],
       [ 48.5,  17.5, 191. ]], dtype=float32)
```

```python
y_train[:5] # first five rows of target data
```

```
array([1, 0, 0, 0, 1])
```
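`LabelEncoder` assigns integer codes in sorted order of the class names, and `inverse_transform` maps codes back to names. A tiny sketch using made-up short species names (not the exact strings in the data set):

```python
from sklearn.preprocessing import LabelEncoder

# hypothetical short labels, for illustration only
species = ["Gentoo", "Adelie", "Chinstrap", "Adelie"]

le = LabelEncoder()
codes = le.fit_transform(species)

print(le.classes_)                   # ['Adelie' 'Chinstrap' 'Gentoo']
print(codes)                         # [2 0 1 0]
print(le.inverse_transform([0, 2]))  # ['Adelie' 'Gentoo']
```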

The simplest way to make a model is by using the `tf.keras.models.Sequential` API, which allows you to construct a model by simply passing a list of layers. Let's do two "hidden" layers of 500 units each, and then an output layer of 3 units. If you've used the [multilayer perceptron algorithm](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html) from `scikit-learn` before, the model below has the same layer structure as a multilayer perceptron with parameter `hidden_layer_sizes = (500, 500)`. One difference: `scikit-learn` applies a ReLU activation in its hidden layers by default, while these `Dense` layers have no activation function, so as written this model computes a linear function of its inputs. 

```python
model = tf.keras.models.Sequential([
    layers.Dense(500),
    layers.Dense(500),
    layers.Dense(3)
])
```

The reason we've chosen the final layer to output 3 features is that there are 3 species in the data. 
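Without activation functions, stacking `Dense` layers adds no expressive power: two affine maps compose into a single affine map, $(XW_1 + b_1)W_2 + b_2 = X(W_1W_2) + (b_1W_2 + b_2)$. A sketch that checks this numerically and then builds a variant with ReLU activations in the hidden layers, closer to `scikit-learn`'s default. The small layer sizes in the first model and the random input are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

X = np.random.rand(5, 3).astype(np.float32)   # placeholder input

# two Dense layers with no activation...
linear_model = tf.keras.models.Sequential([layers.Dense(4), layers.Dense(3)])
out = linear_model(X).numpy()

# ...equal one affine map: X @ (W1 @ W2) + (b1 @ W2 + b2)
(W1, b1), (W2, b2) = [layer.get_weights() for layer in linear_model.layers]
collapsed = X @ (W1 @ W2) + (b1 @ W2 + b2)
print(np.allclose(out, collapsed, atol=1e-5))  # True

# a ReLU between layers breaks this collapse, letting the model learn nonlinear boundaries
relu_model = tf.keras.models.Sequential([
    layers.Dense(500, activation="relu"),
    layers.Dense(500, activation="relu"),
    layers.Dense(3),
])
print(relu_model(X).shape)  # (5, 3)
```

On a data set as cleanly separated as this one, the linear version can still do well, which is consistent with the test accuracy reported below.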

Let's make some "predictions":

```python
model(X_train[:5])
```

```
<tf.Tensor: id=112, shape=(5, 3), dtype=float32, numpy=
array([[-5.7138624,  9.355782 , 50.443665 ],
       [-6.4953294,  8.979216 , 47.8926   ],
       [-5.9222   ,  8.305211 , 45.621056 ],
       [-6.254411 ,  8.390101 , 46.077606 ],
       [-4.9149933,  8.7550335, 47.93194  ]], dtype=float32)>
```

These are predictions for the first 5 rows of the data set. Each row is a specific penguin, and each column is one of the three species. These raw scores are often called *logits*: the larger the number in a column, the more strongly the model favors that species. To convert them into probabilities, we use the `Softmax` layer: 

```python
softmax = tf.keras.layers.Softmax()
softmax(model(X_train[:5]))
```

```
<tf.Tensor: id=126, shape=(5, 3), dtype=float32, numpy=
array([[4.08409191e-25, 1.43139386e-18, 1.00000000e+00],
       [2.39674225e-24, 1.25930895e-17, 1.00000000e+00],
       [4.12143851e-23, 6.22206051e-17, 1.00000000e+00],
       [1.87281227e-23, 4.29066955e-17, 1.00000000e+00],
       [1.11910096e-23, 9.67580956e-18, 1.00000000e+00]], dtype=float32)>
```

The model hasn't been trained yet, so these predictions don't really mean anything yet. 
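For reference, the softmax of a row of scores $z$ is $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$. Every entry becomes positive, each row sums to 1, and larger scores get larger probabilities. Because the exponential grows so quickly, a score around 50 (like the untrained model's third column above) pushes its probability to essentially 1. A minimal Numpy sketch, checked against `tf.nn.softmax`; the helper name `softmax` and the example logits are illustrative:

```python
import numpy as np
import tensorflow as tf

def softmax(z):
    """Row-wise softmax: exp(z_i) / sum_j exp(z_j)."""
    z = np.asarray(z, dtype=np.float64)
    shifted = z - z.max(axis=1, keepdims=True)  # subtracting the row max avoids overflow
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)

print(probs.sum(axis=1))   # each row sums to 1
print(probs[1])            # [0.33333333 0.33333333 0.33333333]
print(np.allclose(probs, tf.nn.softmax(logits).numpy()))  # True
```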

### Model Training

Training has two stages. First we *compile* the model, by specifying the loss function and optimization algorithm. Then, we perform the actual training. The choice of loss function is highly dependent on your problem domain. For classification problems, categorical cross-entropy is a standard choice. Because our labels are integers (0, 1, 2) rather than one-hot vectors, we use the `SparseCategoricalCrossentropy` version, and `from_logits=True` tells it that the model outputs raw scores rather than probabilities. Finally, the `metrics` argument controls which model performance measures are shown when training or evaluating the model. 
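To see what this loss computes, here is a sketch on two made-up rows of logits. It checks that `SparseCategoricalCrossentropy` with integer labels, `CategoricalCrossentropy` with one-hot labels, and the hand formula (the average over rows of $-\log$ of the softmax probability assigned to the true class) all agree. By default, Keras averages the loss over the batch.

```python
import numpy as np
import tensorflow as tf

# two hypothetical rows of logits and their integer class labels
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]], dtype=np.float32)
labels = np.array([0, 2])

sparse = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
dense = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
one_hot = tf.one_hot(labels, depth=3)     # [[1, 0, 0], [0, 0, 1]]

# by hand: mean of -log(probability of the true class)
probs = tf.nn.softmax(logits).numpy()
manual = -np.mean(np.log(probs[np.arange(len(labels)), labels]))

print(float(sparse(labels, logits)), float(dense(one_hot, logits)), manual)
```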

```python
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer = "adam",
              loss = loss_fn,
              metrics = ["accuracy"])
```

Finally ready for training! 

```python
model.fit(X_train, y_train, epochs = 100, verbose=0)
```

```
<tensorflow.python.keras.callbacks.History at 0x7f94088313c8>
```

Now we can evaluate the model on our test data: 

```python
model.evaluate(X_test, y_test, verbose = 2)
```

```
103/1 - 0s - loss: 0.0650 - accuracy: 0.9612

[0.13009598301452266, 0.9611651]
```

Our model correctly predicts the species of about 96% of the penguins in the test set, based only on their culmen length, culmen depth, and flipper length. Not bad! Further training (choosing more `epochs` in `model.fit`) could potentially improve this, or it could lead to overfitting. 
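One common way to watch for overfitting is the `validation_split` argument of `model.fit`, which holds out the last fraction of the training data and reports `val_loss` and `val_accuracy` after each epoch. A sketch on synthetic stand-in data; the data, layer sizes, and epoch count are arbitrary choices, not part of the penguin example:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# hypothetical stand-in data: 200 points, 3 features, labels 0, 1, 2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)).astype(np.float32)
y = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)

model = tf.keras.models.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(3),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])

# hold out the last 20% of the data and score it after every epoch
history = model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)

# if val_loss starts rising while loss keeps falling, extra epochs are likely overfitting
print(history.history["loss"])
print(history.history["val_loss"])
```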

### Prediction Probabilities

A minor annoyance is that, even after training, our model still doesn't have very interpretable outputs: 

```python
model(X_train[:5])
```

```
<tf.Tensor: id=3559, shape=(5, 3), dtype=float32, numpy=
array([[  1.5114409,   8.988683 ,  -5.6625223],
       [ 20.955286 ,  -1.3249254, -15.60679  ],
       [ 11.239933 ,  -1.9928429,  -5.7652345],
       [ 15.902178 ,  -4.855223 ,  -7.7263427],
       [ -9.42366  ,  12.035456 ,   2.06576  ]], dtype=float32)>
```

Having trained our model, we can create a new, interpretable version by adding a Softmax layer. 

```python
prob_model = tf.keras.models.Sequential([
    model, 
    layers.Softmax()
])
```

```python
prob_model(X_train[:5])
```

```
<tf.Tensor: id=3573, shape=(5, 3), dtype=float32, numpy=
array([[5.6549557e-04, 9.9943405e-01, 4.3332787e-07],
       [1.0000000e+00, 2.1077869e-10, 1.3221840e-16],
       [9.9999821e-01, 1.7909265e-06, 4.1185942e-08],
       [1.0000000e+00, 9.6644137e-10, 5.4734824e-11],
       [4.7907556e-10, 9.9995315e-01, 4.6794590e-05]], dtype=float32)>
```

Each row now reflects the model's level of confidence that the given penguin is of the given species, which is much more interpretable. 

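(*In case you're wondering, we leave the Softmax layer out of the model during training for numerical reasons: with `from_logits=True`, the loss function combines the softmax and the logarithm into a single, more numerically stable computation.*) 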
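To turn each row of probabilities into a single predicted species, take the column with the largest value and map it back through the label encoder. A sketch with a hypothetical encoder fit on made-up names and rounded probabilities copied from the output above; in the notebook itself you would use something like `prob_model(X_test).numpy()` together with the `le` fit earlier.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# hypothetical encoder fit on made-up species names
le = LabelEncoder().fit(["Adelie", "Chinstrap", "Gentoo"])

# rounded probabilities in the shape of prob_model's output (one row per penguin)
probs = np.array([[5.7e-04, 9.99e-01, 4.3e-07],
                  [1.0,     2.1e-10,  1.3e-16],
                  [4.8e-10, 9.99e-01, 4.7e-05]])

pred_codes = np.argmax(probs, axis=1)          # most confident class in each row
pred_species = le.inverse_transform(pred_codes)

print(pred_codes)     # [1 0 1]
print(pred_species)   # ['Chinstrap' 'Adelie' 'Chinstrap']
```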

## What's next? 

We've learned about tensors, layers, and models, and we've used a simple model to make predictions on a pretty small data set. In coming lectures, we'll ask questions like: 

- How can I represent text or images as tensors? 
- How can I perform classification, regression, or clustering tasks? 
- How can I interpret what my model is doing? 
- How can I speed up my model training? 