# Inside the Black-box of Machine Learning: Linear Regression

### Introduction

A machine learning algorithm isn't called a black-box because we don't know what's inside; it's the opposite. We actually know exactly what's there: some very hairy and dirty math, so we cover it up nicely in a box.

PROMPT: maybe an image of a black box with a lid slightly open and some equations coming out of it? Could make a good cover image. 

And peeping inside the box of each and every algorithm isn't the right (or fun) way to learn them. So, we'll first set our thinking straight (go around the box, if you will) and learn the most important ideas in machine learning through the most basic algorithm, one that's almost as old as dinosaurs: **linear regression**. 

Linear regression was first published in 1805 by Legendre, but that event was overshadowed when a more famous figure, Gauss, published the method four years later, in 1809:

![P1kH7JtwJgpsAAAAAElFTkSuQmCC.png](attachment:6c2e1509-35cb-41ea-82f4-e55f283c8d56.png)

Caption: This works out nicely for us, since I couldn't find an image of Legendre on Wikipedia.

Even though that was in the early 19th century, the algorithm is still heavily used in the ML world. So, let's dive deeper.

### The attitude of an ML algorithm

Linear regression tries to predict a number given some information, like guessing the calories in a smoothie by looking at its ingredients. 

What would you guess for a smoothie made of a cup of plain yogurt, one secret ingredient, and some random food dye? 

IMAGE: The image of the smoothie (just the smoothie) from the PPTX.

If you are thinking "that's impossible to know", you've got the wrong attitude, at least for the purposes of this example.

The attitude of machine learning is not perfectionism; it is not about finding the exactly right answer. It is about getting started with something, even if it is a random number like 477 calories, and iterating towards the correct answer in small improvements. 

Your first guess will almost always be terrible. If we reveal the secret ingredient, which is 58 grams of sardines, the total calorie count is 265. We guessed almost twice the true amount. 

I understand that was a hard example, so let's look at another smoothie:

Image: the yellow smoothie in a cup from the PPTX

But this time, we are told that this smoothie was prepared 16 times to get the taste right, and here are the calorie counts for the previous 15 mixes:

TEXT: calories amounts or the slide image from PPTX

Please stop if you've started reading the numbers. We've got computers for that, and they calculate that the average calorie count is 236.9 and the middle value (the median) is 240. 
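Letting the computer do the reading takes only a couple of lines with Python's standard `statistics` module. The real 15 values live in the slide image, so the list below is a made-up placeholder, not the article's data; it only shows how the average (mean) and middle value (median) are obtained.

```python
import statistics


def summarize(calories):
    """Return the mean (average) and median (middle value) of a list of calorie counts."""
    return statistics.mean(calories), statistics.median(calories)


# Hypothetical calorie counts for illustration only; NOT the 15 mixes from the slide.
example_calories = [220, 240, 255, 210, 240]
avg, mid = summarize(example_calories)
print(f"mean = {avg:.1f}, median = {mid}")
```

Swapping in the actual 15 values from the slide should reproduce the 236.9 average and the 240 median.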

Now make a guess using this information, and notice how much narrower your search range is now (this will be key later). I will guess 238. Let's look at the answer:

Image: Quiche Smoothie image from the slides

It turns out the secret ingredient was 88 grams of [quiche lorraine](https://en.wikipedia.org/wiki/Quiche), shooting the calorie amount to 445.

So, even our informed guess was off by 207 calories. Let's try to do better.
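Both misses so far are plain arithmetic. This small sketch just recomputes them from the numbers in the text (a guess of 477 against a true 265 for the sardine smoothie, and 238 against a true 445 for the quiche smoothie). The helper name `miss` is hypothetical, and it uses "true count minus guess" so that a positive result means we guessed too low.

```python
def miss(true_calories, guess):
    """How far a guess landed from the true calorie count (positive = guessed too low)."""
    return true_calories - guess


# Sardine smoothie: random first guess of 477 vs. a true count of 265.
print(miss(265, 477))        # -212: we overshot by 212 calories
print(round(477 / 265, 2))   # 1.8: almost twice the true amount

# Quiche smoothie: informed guess of 238 vs. a true count of 445.
print(miss(445, 238))        # 207: we undershot by 207 calories
```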

### Regression lines

Since the main factor in the smoothie's calorie count is the secret ingredient (the amount of yogurt stays constant), we will plot the weights of a bunch of secret ingredients against the resulting calorie counts.

IMAGE: Plain plot without the lines

CAPTION: The x-axis is the weight of the ingredient with a range of 0 to 200 grams and calories are on the y-axis from 0 to 500. 

Let's try to discern any trend or pattern from the plot that would allow us to predict the calories using just the weight. 

And since linear regression is all about putting lines through stuff (hint: linear), we will draw a line. 

IMAGE: The plot with the horizontal line

Does the line look familiar? That horizontal line is the guess we made last time: 238 calories. This means we've been guessing 238 regardless of the secret ingredient's weight. No wonder we were so far off. 

So, our goal is to find a line that better represents the relationship between weight and calories. To do that, we must be able to change the line's tilt (its slope) and its height (its intercept).

Our initial line had an intercept of 238 but a slope of 0. So, we define the recipe for our linear regression model as *intercept + slope \* weight*. We play around with the numbers for slope and intercept to decide where the line goes, and then we pop the weight in and read the calories off the line. 
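The recipe *intercept + slope \* weight* translates almost word for word into a function. In this minimal sketch, the name `predict` is a hypothetical choice. With intercept 238 and slope 0 it reproduces our flat "always guess 238" line, whatever the weight. The second line (intercept 150, slope 2) is an arbitrary, untuned example that only shows how a nonzero slope tilts the line; it is not fitted to any data.

```python
def predict(weight, intercept, slope):
    """Linear regression recipe from the text: intercept + slope * weight."""
    return intercept + slope * weight


# Our initial horizontal line: intercept 238, slope 0, so every weight gets 238.
for grams in (0, 58, 88, 200):
    print(grams, predict(grams, intercept=238, slope=0))

# A hypothetical, untuned tilted line: heavier ingredient, more calories.
print(predict(88, intercept=150, slope=2))  # 150 + 2 * 88 = 326
```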

### Error

Looking at the plot again from a common-sense perspective, our perfect line would be as close as possible to all the points. So, we had better define a notion of error that tells us how far our guessed line is from the points. 

This error could simply be the point minus the line; in other words, the true calorie count minus our guessed calorie count. 
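The "point minus the line" error can be sketched in a few lines of Python. The function names are hypothetical, and the only data used are the two smoothies revealed earlier in the text: 58 grams of sardines (265 calories) and 88 grams of quiche (445 calories). For our flat 238-calorie line, both errors come out positive, which means the line sits below both points.

```python
def predict(weight, intercept, slope):
    """Linear regression recipe: intercept + slope * weight."""
    return intercept + slope * weight


def errors(weights, calories, intercept, slope):
    """Error per point: true calorie count minus the line's guess."""
    return [c - predict(w, intercept, slope) for w, c in zip(weights, calories)]


# The two smoothies revealed so far: 58 g of sardines and 88 g of quiche.
weights = [58, 88]
calories = [265, 445]

print(errors(weights, calories, intercept=238, slope=0))  # [27, 207]
```

A negative error would mean the line passes above that point, that is, the guess was too high.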