# Machine Learning (continued)

# The key terms for today

* supervised
* unsupervised
* classification
* clustering
* reinforcement learning

# Let's talk about machine learning

The phrase "machine learning" refers to any method that examines data to approximate a solution to a problem for which we don't have an analytical (algorithmic) solution. The basic taxonomy of machine learning approaches is depicted below:



![ML algorithms](https://blogs.sas.com/content/subconsciousmusings/files/2017/04/machine-learning-cheet-sheet-2.png)

*Image from https://blogs.sas.com/content/subconsciousmusings/*



However, this diagram does not include a third major class of ML algorithm, reinforcement learning, which has been used (among many other applications!) to develop ChatGPT.

When we talk about machine learning, we talk about:
* *fitting* (or *training*) a *prediction function*, or *model*, to
* *training* data, experimenting with various
* *hyperparameters* related to the *model architecture* using held-out
* *development* data, so that the resulting model generalizes well, making good *predictions* on held-out
* *test* (or *evaluation*) data
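
To make the three kinds of data concrete, here is a minimal sketch of splitting one labeled dataset into training, development, and test portions. The function name `split_dataset`, the 80/10/10 proportions, and the toy reviews are all assumptions made for illustration; real projects often use a library helper instead.

```python
import random

def split_dataset(examples, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle examples and split them into train, dev, and test lists."""
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    n_dev = round(len(shuffled) * dev_fraction)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test

# Hypothetical toy data: 100 (text, label) pairs.
data = [(f"review {i}", "positive" if i % 2 == 0 else "negative") for i in range(100)]
train, dev, test = split_dataset(data)
print(len(train), len(dev), len(test))  # 80 10 10
```

The model is fit on `train`, hyperparameters are compared on `dev`, and `test` is touched only once at the end, so the final number reflects data the model never influenced.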

The goal of unsupervised learning is to uncover latent structure or patterns in the data. An example of an application of unsupervised learning is topic modeling. 

The goal of supervised learning is to learn to match the labels (or answers, or ground truth, or dependent variable) in the data. An example of an application of supervised learning is part-of-speech tagging.

If you want to investigate ML further, here is a great Python library for ML:
* scikit-learn (sklearn): https://scikit-learn.org/stable/index.html

Sklearn follows a consistent pattern: each ML algorithm has a *fit* method (for training), a *predict* method (for inference or testing), and a *score* method (for evaluation).
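
To show the shape of that pattern without depending on any library, here is a toy sketch. `MajorityClassifier` is a hypothetical class written for these notes, not part of sklearn (sklearn's `DummyClassifier` is a comparable baseline). It only mirrors the fit / predict / score interface, with score computed as accuracy.

```python
from collections import Counter

class MajorityClassifier:
    """Toy model that always predicts the most frequent training label."""

    def fit(self, X, y):
        # "Training" here just means remembering the most common label.
        self.majority_label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Ignore the inputs and return the remembered label for each one.
        return [self.majority_label_ for _ in X]

    def score(self, X, y):
        # Fraction of predictions that match the true labels (accuracy).
        predictions = self.predict(X)
        correct = sum(p == t for p, t in zip(predictions, y))
        return correct / len(y)

X_train = ["great", "awful", "fine", "loved it"]
y_train = ["positive", "negative", "positive", "positive"]
X_test = ["boring", "wonderful"]
y_test = ["negative", "positive"]

model = MajorityClassifier().fit(X_train, y_train)
print(model.predict(X_test))        # ['positive', 'positive']
print(model.score(X_test, y_test))  # 0.5
```

Because every model exposes the same three methods, swapping one algorithm for another usually means changing only the line that constructs the model.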



We evaluate models / prediction functions using [any number of metrics](https://scikit-learn.org/stable/modules/model_evaluation.html). A commonly used one for supervised machine learning is:
* accuracy - what percent of the data points were classified correctly?

Of course, accuracy is just one number. To get a clearer understanding, we can construct a
* confusion matrix

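which has the classes (the labels) along rows and columns, and in each cell indicates the number of data points classified as *row* that are truly in class *column*. (Conventions vary: scikit-learn's `confusion_matrix` puts the true class on the rows and the predicted class on the columns.)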
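
Here is a small sketch of both metrics in plain Python, following the convention in the text (row = predicted class, column = true class). The helper names and the six toy labels are made up for illustration.

```python
from collections import Counter

def accuracy(true_labels, predicted_labels):
    """Fraction of data points whose prediction matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def confusion_counts(true_labels, predicted_labels):
    """Count (predicted, true) pairs: rows are predictions, columns are truth."""
    return Counter(zip(predicted_labels, true_labels))

true_labels      = ["pos", "pos", "pos", "neg", "neg", "neu"]
predicted_labels = ["pos", "pos", "neg", "neg", "pos", "pos"]

print(accuracy(true_labels, predicted_labels))  # 0.5

counts = confusion_counts(true_labels, predicted_labels)
classes = sorted(set(true_labels) | set(predicted_labels))
print("pred/true", *classes)
for predicted in classes:
    print(predicted, *[counts[(predicted, true)] for true in classes])
# pred/true neg neu pos
# neg 1 0 1
# neu 0 0 0
# pos 1 1 2
```

The accuracy of 0.5 hides what the matrix shows: the empty `neu` row means this toy model never predicts `neu` at all.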

We will look more at confusion matrices later this week.

# Question!

If the transformer is the model architecture, and sentiment analysis is the task:
* What is some training data we could use?
* What are some hyperparameters we could set?
* What is a metric we could use?


# Training Transformers

With transformers, we talk about three types or stages of "training" (only the first two actually involve training, that is, updating the model's weights):




1. Pretraining using self-supervision - the training data is the web (for text generation), or lots of random images (for image generation), and the label is whatever we remove from each training data instance: a word in the middle, a word at the end, some pixels.... We train to get a good representation of text (or images) *in general*. For pretraining, we need millions to billions of data points, but fortunately we don't have to manually label them all!
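
To see how the label can come for free, here is a minimal sketch that builds training pairs from raw text, once by hiding a word in the middle and once by predicting the word at the end. The function names, the `[MASK]` placeholder, and whitespace tokenization are simplifying assumptions; real systems use subword tokenizers and much more text.

```python
def mask_word(tokens, position, mask_token="[MASK]"):
    """Hide one word; the hidden word becomes the label."""
    masked = tokens[:position] + [mask_token] + tokens[position + 1:]
    return masked, tokens[position]

def next_word_examples(tokens):
    """Each prefix of the text is an input; the word that follows is the label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = "the movie was great".split()

print(mask_word(tokens, 2))
# (['the', 'movie', '[MASK]', 'great'], 'was')

for context, label in next_word_examples(tokens):
    print(context, "->", label)
# ['the'] -> movie
# ['the', 'movie'] -> was
# ['the', 'movie', 'was'] -> great
```

No human had to annotate anything: one four-word sentence already yields several (input, label) pairs.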



2. Finetuning using labeled data - the training data is some data for a task, which a human has labeled. For example, it could be text with sentiment labels, or images with object classes. We take a pretrained transformer encoder model (which already knows about text, images, etc. *in general*) and we train just a last layer on top *for this task*. (This can be done with seq2seq / encoder-decoder models too, but usually it's done with encoder models.) For finetuning, we need a few hundred to a few thousand data points per class (per label).
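
The idea of "train just a last layer on top" can be sketched without a real transformer. Below, `frozen_encoder` is a hypothetical stand-in for a pretrained encoder (here just two hand-picked word lists, which is nothing like a real model), and the last layer is a small logistic regression trained with gradient descent. Only the last layer's weights and bias change; the encoder never does.

```python
import math

def frozen_encoder(text):
    """Hypothetical stand-in for a pretrained encoder: text -> feature vector.
    Its 'knowledge' (these word lists) is never updated during finetuning."""
    good = {"great", "good", "wonderful", "loved"}
    bad = {"terrible", "bad", "awful", "boring"}
    words = text.lower().split()
    return [sum(w in good for w in words), sum(w in bad for w in words)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_last_layer(texts, labels, epochs=200, learning_rate=0.5):
    """Fit only the weights and bias of a logistic-regression layer on top."""
    features = [frozen_encoder(t) for t in texts]  # computed once: encoder is frozen
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def predict(text, weights, bias):
    """Return 1 (positive) or 0 (negative) for a new text."""
    x = frozen_encoder(text)
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0

train_texts = ["the movie was great", "the film was terrible",
               "a wonderful story", "boring and bad"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

weights, bias = train_last_layer(train_texts, train_labels)
print(predict("I loved it", weights, bias))     # 1
print(predict("an awful film", weights, bias))  # 0
```

Neither test sentence appears in the training data; the last layer handles them only because the (pretend) encoder already maps "loved" and "awful" to useful features, which is the point of starting from a pretrained model.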



3. Few-shot prompting - we take a small amount of data and prompt the pretrained model with it, ending with an example we want the model to actually classify. For example, we might say: "Complete the list. The movie was great: positive. The film was terrible: negative. The movie was directed by Sofia Coppola: neutral. The movie was bad: " and then let the pretrained model complete the input. The model is just a great big pattern matcher, so it may complete with something correct (like 'negative') or something incorrect (like 'positive') or something spurious (like 'on Tuesday'). Bigger models will work pretty well for lots of tasks, though, if properly prompted with a few examples.
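
Building such a prompt is plain string formatting. The sketch below assembles the example prompt above from a list of labeled examples; `build_few_shot_prompt` is a hypothetical helper, and actually sending the prompt to a model is left out because that depends on which model or API you use.

```python
def build_few_shot_prompt(examples, query, instruction="Complete the list."):
    """Join labeled examples and one unlabeled query into a single prompt string."""
    parts = [instruction]
    parts += [f"{text}: {label}." for text, label in examples]
    parts.append(f"{query}: ")  # left open for the model to complete
    return " ".join(parts)

examples = [
    ("The movie was great", "positive"),
    ("The film was terrible", "negative"),
    ("The movie was directed by Sofia Coppola", "neutral"),
]
prompt = build_few_shot_prompt(examples, "The movie was bad")
print(prompt)
```

Note that no weights are updated here: the "training data" lives entirely inside the prompt.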