# The Nuts & Bolts of Machine Learning

## A Definition

>"A computer program is said to learn from **experience** *E* with respect to some class of **tasks** *T* and **performance measure** *P*, if its performance at tasks in *T*, as measured by *P*, improves with experience *E*."
(Tom Mitchell, *Machine Learning* (1997))

### Unpack the Definition
So computer programs are the things that do the learning. They are programmed to do certain tasks. Let's focus on a single task -- say, the ability to pick out a face in a photograph. As we increase the computer program's exposure to experience, its performance can either stay the same, get worse, or improve. If performance improves, we say that the computer program *learns from experience*. If not, then there's no machine learning to speak of.

The components to watch for are:
- a computer program
- a task that the computer program performs
- experience used by the computer program to perform the task
- measures of how well the computer program performs on the task

Let's start with *experience*.

## What is Experience?

Life provides us with a variety of experiences. We can see, feel, touch, smell, and taste (and perhaps have other abilities to sense and navigate our environment).

For a computer program that learns -- i.e., for machine learning -- there are three things to keep in mind about experience.

- Experience $\neq$ Rules
- Experience = Data
- Data = A Table of Numbers

It's as easy as that! Let's see what this looks like for various kinds of experience.

## Experience in the Form of Images


![Valentino Rossi](../Images/Nuts-and-Bolts/Rossi.png)

To a computer program that learns from experience, an image is a grid of pixels. In this case, we've arbitrarily created a grid of 5x10 pixels. For color images, each pixel is represented by 3 numbers: the values of the Red, Green, and Blue components of the color (RGB color values). Each of these values is an integer between 0 and 255.

So this image is read by the computer program as a row of 50 items, each item containing 3 integers.

![Valentino Rossi Pixelated](../Images/Nuts-and-Bolts/Image-Pixel-Row.png)
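
As a rough sketch of the idea above, the snippet below builds a 5x10 grid of RGB pixels and flattens it into a single row of 50 items. The pixel values are invented for illustration (they are not taken from the photograph), and plain Python lists are used to keep the example self-contained; in practice an array library would typically handle this.

```python
# A hypothetical 5x10 color image: 5 rows, 10 columns, one (R, G, B) triple per pixel.
# The pixel values are made up purely for illustration.
ROWS, COLS = 5, 10

image = [
    [((r * 50) % 256, (c * 25) % 256, 128) for c in range(COLS)]
    for r in range(ROWS)
]

# Flatten the grid row by row into a single row of 50 pixels.
pixel_row = [pixel for row in image for pixel in row]

print(len(pixel_row))   # 50
print(pixel_row[0])     # (0, 0, 128)
```

Each of the 50 items holds 3 integers between 0 and 255, which is the "row of 50 items" described above.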


## Experience in the Form of Images

This is what we see. But what a computer sees is just a table of numbers. So being able to detect a motorcycle in a picture (or a face in a picture) is not as easy as you'd think.

![What Computers See](../Images/Nuts-and-Bolts/motorcylce.png)

Of course, humans find this kind of thing (identifying pictures of motorcycles or helmets on motorcycle rides) trivially easy. Computer programs do get better as they're exposed to more images, so this fits the classic definition of machine learning. Computer vision continues to be a hot area of research in the field.

## Experience in the Form of Text

![Some Text](../Images/Nuts-and-Bolts/Text.png)

To a computer program that learns from experience, a text document is a row of numbers. Each word in the document is represented by a string of numbers. We'll see later what these numbers are.

![Text as Row of Numbers](../Images/Nuts-and-Bolts/Text-Data.png)

## Experience in the Form of Sounds

To a computer program, an audio stream is, not surprisingly, also a row of numbers. For example, these numbers can be [time, amplitude] pairs such as [1, 23.2].

![Audio Streams](../Images/Nuts-and-Bolts/Audio-Streams.png)
![Audio as Numbers](../Images/Nuts-and-Bolts/Audio-Data-Row.png)
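
A minimal sketch of the [time, amplitude] representation, assuming a synthetic sine tone rather than a real recording. The sample count and frequency are arbitrary values chosen for illustration.

```python
import math

SAMPLE_COUNT = 8    # hypothetical number of samples
FREQUENCY = 1.0     # hypothetical frequency: one full cycle over the samples

# Build the audio stream as a row of [time, amplitude] pairs.
audio_row = [
    [t, round(math.sin(2 * math.pi * FREQUENCY * t / SAMPLE_COUNT), 3)]
    for t in range(SAMPLE_COUNT)
]

for pair in audio_row:
    print(pair)
```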

## Experience in the Form of Spreadsheets

We've seen that data that comes to us in the form of images, text, and audio is converted into tables of numbers so that computer programs can make sense of it.

Data that comes to us in spreadsheets is already in the form of an m x n table, where m is the number of rows and n is the number of columns of the table.

![Data Table](../Images/Nuts-and-Bolts/Data-Table.png)

To make this table suitable for use in a computer program, we represent categories like "Male" and "Female" with integers (say, Male = 0, Female = 1). Similarly, Non-Smoker can be represented by 0 and Smoker by 1.
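
The encoding described above can be sketched as follows. The rows and column names here are hypothetical and are not taken from the table in the image; only the Male/Female and Smoker/Non-Smoker codes follow the text.

```python
# Hypothetical spreadsheet rows; the column names and values are made up.
rows = [
    {"age": 34, "sex": "Male", "smoker": "Non-Smoker"},
    {"age": 51, "sex": "Female", "smoker": "Smoker"},
    {"age": 27, "sex": "Female", "smoker": "Non-Smoker"},
]

# Category-to-integer codes, as suggested in the text.
SEX_CODES = {"Male": 0, "Female": 1}
SMOKER_CODES = {"Non-Smoker": 0, "Smoker": 1}

# Convert every row into a list of numbers.
table = [
    [row["age"], SEX_CODES[row["sex"]], SMOKER_CODES[row["smoker"]]]
    for row in rows
]

for numeric_row in table:
    print(numeric_row)
```

After this step every cell in the table is a number, which is the form a learning program expects.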

## Experience is a Table of Numbers

Let's sum up what experience is to a computer program.

- In machine learning, experience $\neq$ rules or recipes.
- Experience is a table of numbers.

The table of numbers has a structure as we see below:

![Features and Output](../Images/Nuts-and-Bolts/Features.png)

- The inputs are called *features*. These are marked f1 through f6 in the diagram above.
- One of the columns is labeled the *output*.
- The inputs and the output are *always* numbers. These numbers can be:
    - positive or negative integers such as 0, 1, -2, and so on
    - positive or negative reals such as 0.5, 26.4, -3.6, and so on
    
And that's what experience is for a computer program.
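
A small sketch of this structure, assuming (purely for illustration) that the output is the last column of the table. The numbers are invented; they are not the values from the diagram.

```python
# A hypothetical table: six feature columns (f1..f6) followed by one output column.
table = [
    [1, 0, 3.5, -2, 26.4, 0, 1],
    [0, 1, 0.5,  4, -3.6, 1, 0],
    [1, 1, 2.0,  0, 12.1, 0, 1],
]

# Separate the inputs (features) from the output for every row.
features = [row[:-1] for row in table]   # everything except the last column
outputs = [row[-1] for row in table]     # the last column

print(features[0])   # the six feature values of the first row
print(outputs)       # one output value per row
```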

## What are Tasks?

In machine learning, tasks fall into 3 categories:
- Predict a number (e.g., something that's on a continuous scale, like temperature)
- Predict a category or class (e.g., smoker or non-smoker?)
- Don't predict anything; instead find patterns in the data set. These patterns fall into 3 groups:
    - Clustering (which rows of the dataset can be grouped together? For example, do purchasers of )
    - Association (are there rules that connect rows of a dataset to other rows of the dataset?)
    - Reduction/Compression (can the dataset be represented by one that has fewer features?)
    
In machine learning, prediction is possible only when the dataset has a clearly demarcated output column. In other words, we need to have a dataset that already says if the values of features f1, f2, f3 are such and such then the value of the output is such and such. If we don't have this kind of dataset, we *cannot* predict anything.

For this reason, prediction is called a *supervised* learning problem. The learning is supervised or controlled by the actual outputs in the dataset.

When the dataset does not have a designated column of outputs, there's no sense in predicting an output. Rather, in these cases, machine learning algorithms are used to better understand the structure of the dataset -- whether and how the elements of the dataset are grouped together, and how these elements are associated with each other. This is called *unsupervised* learning.

There are other types of learning that are variants of each of these main types.

## Task = Predict the Price of a House

Let's take a specific task. Suppose we have some data on house prices in Portland, Oregon. In particular, we have a dataset that lists a number of houses (let's say we have 300 of them). Each row of the dataset represents the features and the output of a single house. The two features are the number of bedrooms a house has and the size of the house in square feet. The output is the price of the house.

![House Price Dataset Excerpt](../Images/Nuts-and-Bolts/House-Features.png)

**Our task, to put it precisely, is to predict the price of a house that is *not* in this dataset. In other words, given the number of bedrooms and the size of the house in square feet, we (or rather the computer program) will have to predict the price of the house.**

Note: We think of prediction as having to do with events in the future (remember Yogi Berra's line that prediction is hard, especially of the future?). But prediction in the context of machine learning has to do with coming up with an output that is not in the dataset, given a set of input/feature values that are not in the dataset. If the feature values are in the dataset, then we can just look up the output in the dataset and we wouldn't need to predict anything.

(There is a subtle point here about machine learning that we'll come to see in a later session when we talk about measuring how good a machine learning model is.)

So prediction in the machine learning context is about the as yet unseen rather than the temporal notion of something that's going to happen in the future.

## Notation

We won't be using a lot of notation in this course, but this is a situation where just a bit of notation goes a long way.

Think of the dataset as being built up of the following values:

![House Price Dataset](../Images/Nuts-and-Bolts/House-Price-Dataset.png)

![Notation](../Images/Nuts-and-Bolts/Notation.png)

## Constructing a Model to Do the Prediction Task

To predict the price of a house that's not in the dataset we're going to pretend that the price can be constructed by adding and multiplying some numbers.

$$(w_{1} * x_{1}) + (w_{2} * x_{2}) = \hat{y}$$

- $x_{1}$ = number of bedrooms (the first feature)
- $x_{2}$ = size in square feet (the second feature)
- $w_{1}, w_{2}$ = parameters
- $\hat{y}$ = predicted house price

For mathematical reasons, this is written with an additional parameter $w_{0}$ like this:

$$(w_{0} * x_{0}) + (w_{1} * x_{1}) + (w_{2} * x_{2}) = \hat{y}$$

$w_{0}$ is called the *intercept value*, and we'll determine this value as part of the machine learning process. $x_{0}$, on the other hand, is a constant that is always equal to 1.
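
The formula above can be sketched in a few lines of code. The parameter values below are hypothetical, chosen only to make the arithmetic easy to follow; they are not learned from the Portland dataset, and the function name `predict_price` is introduced just for this example.

```python
def predict_price(weights, bedrooms, square_feet):
    """Return w0*x0 + w1*x1 + w2*x2, with x0 fixed at 1."""
    features = [1, bedrooms, square_feet]   # x0, x1, x2
    return sum(w * x for w, x in zip(weights, features))

# Hypothetical parameter values (w0, w1, w2), not learned from any data.
weights = [50000, 10000, 100]

# Predict the price of a 3-bedroom, 2000-square-foot house.
price = predict_price(weights, bedrooms=3, square_feet=2000)
print(price)   # 50000*1 + 10000*3 + 100*2000 = 280000
```

Because $x_{0}$ is always 1, the term $w_{0} * x_{0}$ simply adds $w_{0}$ to every prediction, which is why it acts as the intercept.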

## What the Model Has Done to the Dataset

Let's look at what we've done in constructing the model.

![Model Adds Columns](../Images/Nuts-and-Bolts/Dataset-Expand-Columns.png)

