<a href="https://colab.research.google.com/github/BreakoutMentors/Data-Science-and-Machine-Learning/blob/main/machine_learning/lesson%201%20-%20linear%20regression/examples/From_Linear_Regression_to_Deep_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learning Objectives


* Understand how regression lines can help us make predictions about data
* Understand how the components of slope and y-intercept determine the output of a regression line
* Understand how to represent a line as a function
* Introduce Deep Learning



### Import Python Libraries

In [1]:
from pprint import pprint
import torch
import torch.optim as optim
import pandas as pd
import numpy as np
import plotly


def trace(data, mode = 'markers', name="data"):
    x_values = list(map(lambda point: point['x'],data))
    y_values = list(map(lambda point: point['y'],data))
    return {'x': x_values, 'y': y_values, 'mode': mode, 'name': name}


def trace_values(x_values, y_values, mode = 'markers', name="data", text_values = []):
    return {'x': x_values, 'y': y_values, 'mode': mode, 'name': name, 'text': text_values}


def m_b_data(m, b, x_values):
    """
    A function that returns a dict with two keys: 
        1) x, which points to a list of x_values, and
        2) y, which points to a list of y_values,
    each y value is the expected output of a regression line for the given
    m, b, and x values. 

    Example. Given m=1.5, b=20, x_values=[0, 50, 100] our function should
    return:
        {'x': [0, 50, 100], 'y': [20.0, 95.0, 170.0]}
    """
    data = {'x': x_values}
    y_values = []
    for xi in x_values:
        # compute yi, the expected y value given the value of m, b, and xi.
        # yi = your code here
        yi = xi*m + b
        # add yi to the y_values list
        y_values.append(yi)
    
    # add the 'y' key pointing to the y_values list to the data dictionary
    data['y'] = y_values

    # return the data dictionary with the x and y keys and values.
    return data


def m_b_trace(m, b, x_values, mode = 'lines', name = 'line function'):
    """
    A function that uses our m_b_data function to return a dictionary that 
    includes keys of 'name' and 'mode' in addition to 'x' and 'y'. The values 
    of 'mode' and 'name' are provided as arguments. When the mode argument is 
    not provided, it has a default value of 'lines' and when name is not 
    provided, it has a default value of 'line function'.

    Example. Given m=1.5, b=20, and x_values=[0,50,100] this function returns:
        {'mode': 'lines', 'name': 'line function', 'x': [0, 50, 100], 'y': [20.0, 95.0, 170.0]}
    """
    data = m_b_data(m, b, x_values)
    return {'mode': mode, 'name': name, 'x': data['x'], 'y': data['y']}


def layout(x_range = None, y_range = None, options = {}):
    layout = {}
    if isinstance(x_range, list): layout.update({'xaxis': {'range': x_range}})
    if isinstance(y_range, list): layout.update({'yaxis': {'range': y_range}})
    layout.update(options)
    return layout


def plot(traces = [], layout = {}):
    if not isinstance(traces, list): raise TypeError('first argument must be a list.  Instead is', traces)
    plotly.offline.iplot({'data': traces, 'layout': layout})

# Single Variable Regression

## Making money: predicting movie revenue from movie budget 
Imagine we are hired as a consultant for a movie executive. The movie executive receives a budget proposal, and wants to know how much money the movie might make. We can help her by building a model of the relationship between the money spent on a movie and money made.



### Representing linear regression graphically
To predict movie revenue based on a budget, let's draw a single straight line that represents the relationship between how much a movie costs and how much it makes:

In [2]:
movie_budgets = [0, 25, 50, 75] # x
movie_revenues = [0, 50, 100, 150] # y
regression_trace = trace_values(movie_budgets, movie_revenues, mode='lines', name='estimated revenue')
movie_layout = layout(options={'title': 'Movie Budget and Revenue (in millions)'})
plot([regression_trace], movie_layout)

By using a line, we can see how much money is earned for any point on this line. All we need to do is look at a given $x$ value, and find the corresponding $y$ value at that point on the line.

*   Spend 20 million, and expect to bring in about 40 million.
*   Spend 30 million, and expect to bring in 60 million.

This approach of modeling a linear relationship (that is, drawing a straight line) between an input and an output is called **linear regression**. We call the input ($x$) our *explanatory variable*, and the output ($y$) the *dependent variable*. So here, we are saying that budget explains our dependent variable, revenue.




## Representing linear regression with functions
Instead of only representing this line visually, we also would like to represent this line with a **function**. That way, instead of having to *see* how an $x$ value points to a $y$ value along our line, we simply could *feed* this input into our function to calculate the proper output.



### First guess
Let's take an initial (wrong) guess at turning this line into a function.
First, we represent the line as a mathematical formula:

$y = x$

Then, we turn this formula into a function:

In [3]:
def f(x):
    """
    A function f(x) = x to compute movie revenue (y) based on movie budget (x).
    """
    return x

print(f"When movie budget is $0, the expected movie revenue is ${f(0)} (in millions).")
print(f"When movie budget is $25, the expected movie revenue is ${f(25)} (in millions).")

When movie budget is $0, the expected movie revenue is $0 (in millions).
When movie budget is $25, the expected movie revenue is $25 (in millions).


This is pretty nice! We just wrote a function $f(x)$ that automatically calculates the expected revenue ($y$) given a certain movie budget ($x$). This function says that for every value of $x$ that we input to the function, we get back an equal value $y$. So according to the function, if the movie has a budget of 25 million, it will earn 25 million.

### A better guess: Matching lines to functions
Take a look at the line that we drew. Does our line say something different? Yes, it does! It says that spending 25 million brings predicted revenue of 50 million. Therefore, we need to change our function so that it matches our line. In fact, we need a consistent way to turn lines into functions, and vice versa. Let's get to it!

**We start** by turning our line into a table below. It shows how our line relates x-values and y-values, or our budgets and revenues in this example.

| X (budget) | Y (revenue) |
|------------|:-----------:|
| 0          |      0      |
| 25 million |  50 million |
| 50 million | 100 million |
| 75 million | 150 million |


**Next**, we need an equation that allows us to match this data:

* input 0 and get back 0
* input 25 million and get back 50 million
* and input 50 million and get back 100 million.

What equation is that? Well it's $y = 2x$. Take a look to see for yourself.

* $0 * 2 = 0$
* $25 * 2 = 50$
* $50 * 2 = 100$

Now, try to see if we can code this function:

In [4]:
def f(x):
    """
    A function f(x) = 2x to compute movie revenue (y) based on movie budget (x).
    """
    # add your code here
    # return x
    pass

print(f"When movie budget is $0, the expected movie revenue is ${f(0)} (in millions).") # Output should be $0
print(f"When movie budget is $25, the expected movie revenue is ${f(25)} (in millions).") # Output should be $50

When movie budget is $0, the expected movie revenue is $None (in millions).
When movie budget is $25, the expected movie revenue is $None (in millions).


Progress! We multiplied each $x$ value by 2 so that the output of our function $f(x)$ corresponds to the $y$ value appearing along our graphed line.



### The Slope Variable
By multiplying $x$ by 2, we just altered the **slope** variable. The slope variable changes the inclination of the line in our graph. Slope generally is represented by $m$ like so:

$y = mx$

We say that a higher value of $m$ means our line is steeper. In our example, this means we expect more money in revenue per dollar spent on movie budget.

Let's make sure we understand what all of our variables stand for. Here they are:

* $y$: the output value returned by the function, also called the **response variable**, as it responds to values of $x$
* $x$: the input variable, also called the **explanatory variable**, as it explains the value of $y$
* $m$: the **slope variable**, which determines how steep the line appears

Let's adapt these terms to our movie example. The $y$ value is the revenue earned from the movie, which we say is in response to our budget. The explanatory variable, $x$, represents our budget, and $m$ corresponds to our value of 2, which describes how much money is earned for each dollar spent. Therefore, with an $m$ of 2, our line says to expect to earn 2 dollars for each dollar spent making the movie. Likewise, an $m$ of 3 suggests we earn 3 dollars for every dollar we spend.
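
To make the effect of the slope concrete, here is a minimal sketch. The helper name `revenue_for_slope` is hypothetical (it is not part of the lesson's code), and it assumes a line through the origin, $y = mx$.

```python
def revenue_for_slope(budget, m):
    """Return the predicted revenue y = m * x for a line through the origin."""
    return m * budget


# Compare two slopes for the same budgets (all values in millions).
for m in (2, 3):
    predictions = [revenue_for_slope(budget, m) for budget in (0, 25, 50)]
    print(f"m = {m}: {predictions}")
```

With $m = 2$ a budget of 25 gives 50, while with $m = 3$ the same budget gives 75: a larger slope means more revenue per dollar spent.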



### The y-intercept
There is one more thing that we need to learn in order to describe every straight line in a two-dimensional (2-D) world. That is the **y-intercept**.

The y-intercept is the $y$ value of the line where it intersects the y-axis.
Or, put another way, the y-intercept is the value of $y$ when $x$ equals zero.

Let's redraw our initial line for the movie table but with a higher y-intercept:

In [5]:
movie_budgets = [0, 25, 50, 75] # our x values
movie_revenues = [50, 100, 150, 200] # our y values
regression_trace_increased = trace_values(movie_budgets, movie_revenues, mode='lines', name='increased estimated revenue')
movie_layout = layout(options={'title': 'Movie Budget and Revenue (in millions)'})
plot([regression_trace_increased, regression_trace], movie_layout)

What is the y-intercept of the original estimated revenue line? Well, it's the value of $y$ when that line crosses the y-axis. That value is 0. Our second line is parallel to the first but is shifted higher so that the y-intercept increases up to 50 million. Here, for every value of $x$, the corresponding value of $y$ is higher by 50 million!


Now, our formula is not $y = 2x$, instead it is:



$$ y = 2x + 50 $$


It is common to represent the y-intercept of a line by $b$. Now we have all of the information needed to describe any straight line using the formula below:

$$y = mx + b $$

Once more, in this formula:

* $m$ is our *slope* of the line, and
* $b$ is our *y-intercept*, the value of $y$ when $x$ equals zero.

So thinking about it visually, increasing $m$ makes the line steeper, and increasing $b$ pushes the line higher.
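
If we know two points on a straight line, we can recover $m$ and $b$ from them. The sketch below assumes the two points have different $x$ values, and the helper name `slope_and_intercept` is made up for this illustration.

```python
def slope_and_intercept(point_1, point_2):
    """Return (m, b) for the straight line through two (x, y) points."""
    x1, y1 = point_1
    x2, y2 = point_2
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b


# Two points read off the shifted line (in millions).
line_m, line_b = slope_and_intercept((0, 50), (25, 100))
print(line_m, line_b)
```

Trying the same helper on two points from the original line, such as `(0, 0)` and `(25, 50)`, gives the same slope but a y-intercept of 0, which is what "parallel but shifted" means.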

In the context of our movies example, we said that the line with values of $m$ = 2 and $b$ = 50 million describes our line, giving us:

$y = 2x + 50 $, (in millions).

Let's see if you can translate this into a function $f(x)$. For any input of $x$ our function returns the value of $y$ along that line:

In [6]:
def f(x):
    """
    A function f(x) = 2x + 50 million that returns expected movie revenue (y) 
    based on movie budget (x).
    """
    # add your code here
    

print(f"When movie budget is $0, the expected movie revenue is ${f(0)} (in millions).") # Output should be $50
print(f"When movie budget is $25, the expected movie revenue is ${f(25)} (in millions).") # Output should be $100

When movie budget is $0, the expected movie revenue is $None (in millions).
When movie budget is $25, the expected movie revenue is $None (in millions).


## How to model a regression line
Let's step back for a moment and think about what we've learned so far.

First, we saw how to estimate the relationship between an input variable $x$ and an output value $y$. We did so by first drawing a straight line on a graph to represent the relationship between a movie's budget and its revenue, and manually determined the output for a given input by looking at the y-value of the line at that input point of $x$.

We then learned how to represent a line as a mathematical formula, and ultimately a function $f(x)$. We first saw that lines can be described with the formula $y = mx + b $, where $m$ represents the slope of the line, and $b$ represents the value of $y$ when $x$ equals zero. The $b$ variable shifts the line up or down while the $m$ variable tilts the line forwards or backwards. We then translated this formula into a function $f(x)$ that returns an expected value of $y$ for an input value of $x$!

So what's missing? Well, we know what the $m$ and $b$ values represent. However, until this point, we've manually computed them, and manually computing numbers is no fun! We prefer to calculate $m$ and $b$ algorithmically (automatically) instead. But how? One way is to use a machine learning algorithm called linear regression to find the "line of best fit". It is not the only way to find the best fit line, but it's pretty good.

### The model concept 
So, how can we mathematically model single variable linear regression? Well, we know the goal is to find the perfect line ("line of best fit"), so we start by defining a **function** (also called a **model**) that describes how predictions will be computed for a line:

$$ f(x,m,b)=mx+b $$

The line of best fit can then be used to predict (guess) $y$ from $x$; in our movies example, that means predicting revenue from budget.

The first step we take is to determine what the "best line" is exactly. To do so, we define a **loss function** (also called a cost function), which measures how bad a particular choice of $m$ and $b$ is. Values of $m$ and $b$ that seem poor (a line that does not fit the data set) should result in a large value of the loss function, whereas good values of $m$ and $b$ (a line that fits the data set well) should result in small values of the loss function. In other words, the loss function should measure how far the predicted line is from each of the data points, and add this value up for all data points. We can write this as:
$$
    L(m, b) = \sum_{i=1}^n (f(x_i, m, b) - y_i)^2 = \sum_{i=1}^n (mx_i + b-y_i)^2
$$

where $n$ is the number of examples in our dataset, $x_i$ is the i'th input example, and $y_i$ is the i'th desired output. So, $(f(x_i, m, b) - y_i)^2$ measures how far the i'th prediction is from the i'th desired output. For example, if the prediction $f(x_i)$ is 7, and the correct output $y_i$ is 10, then we would get $(7 - 10)^2 = 9$. Squaring matters because it makes every term non-negative, so predictions that are too high and predictions that are too low both add to the loss instead of canceling each other out. Finally, we just add up all of these individual losses. Since the smallest possible values for the squared terms indicate that the line fits the data as closely as possible, the line of best fit (determined by the choice of $m$ and $b$) occurs exactly at the smallest value of $L(m, b)$. For this reason, the model is also called [least squares regression](https://en.wikipedia.org/wiki/Least_squares).
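
To see the loss function in action, here is a small sketch in plain Python that scores our two earlier guesses, $y = x$ and $y = 2x$, against the budget and revenue table. The function name `sum_of_squares_loss` is our own choice for this illustration.

```python
def sum_of_squares_loss(m, b, x_values, y_values):
    """Add up the squared gaps between predictions m*x + b and the targets y."""
    total = 0
    for x_i, y_i in zip(x_values, y_values):
        prediction = m * x_i + b
        total += (prediction - y_i) ** 2
    return total


budgets = [0, 25, 50, 75]      # x, in millions
revenues = [0, 50, 100, 150]   # y, in millions

print(sum_of_squares_loss(1, 0, budgets, revenues))  # the first guess, y = x
print(sum_of_squares_loss(2, 0, budgets, revenues))  # the better guess, y = 2x
```

The first guess scores $0 + 625 + 2500 + 5625 = 8750$, while the better guess scores 0 because it passes through every point. A smaller loss means a better fit.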

### Finding the optimal values
Our loss function $L(m,b)$ is smallest exactly when each predicted value $f(x, m, b)$ is as close as possible to the actual data $y$. Our goal, then, is to make the distance between the data points and the predicted line as small as possible to produce the "line of best fit". We achieve this by finding the "optimal" values of $m$ and $b$ that minimize the loss function $L(m,b)$. But what does $L$ actually look like? Well, it's basically a 3D parabola (a bowl shape), and you can picture it as a landscape of hills and valleys:


![Loss landscape](https://pyimagesearch.com/wp-content/uploads/2019/10/train_val_loss_landscape.png)




The "we want to get here" dot marked on the plot of $L$ shows where the desired minimum is. We need an algorithm to find this minimum. The most common algorithm is called **gradient descent**, which uses techniques from calculus to find the minimum.

> For now, we don't worry about the mathematical explanation of how gradient descent works, but later in our machine learning journey we'll get to know it well.

The general idea of gradient descent is intuitive: imagine placing a ball at an arbitrary location on the surface of $L$ (i.e., at our "start here" dot). Naturally, it will roll downhill towards the flat and hopefully low valley of $L$, and thus find the minimum.

> Bonus knowledge: we know the direction of "downhill" at any location since we know the derivatives of $L$ (derivatives are an idea from calculus, so don't worry too much about them for now). The derivatives tell us the direction of greatest upward slope (this is known as the gradient), so the opposite (negative) derivatives point in the most downhill direction. Therefore, if the ball is currently at location $(m, b)$, we can see where it would go by moving it slightly to location $(m', b')$. In case you're curious, here is how we express this mathematically:
$$
 m' = m - \alpha \frac{\partial L}{\partial m} \\\\
 b' = b - \alpha \frac{\partial L}{\partial b} 
$$
where $\alpha$ is a constant called the **learning rate**, which we will talk about more later. If we repeat this process many times then the ball will continue to roll downhill and hopefully into the minimum.
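
For the curious, the update rule above can be sketched in plain Python. This is only an illustration under a few assumptions: the derivatives $\frac{\partial L}{\partial m} = \sum 2(mx_i + b - y_i)x_i$ and $\frac{\partial L}{\partial b} = \sum 2(mx_i + b - y_i)$ are worked out by hand from our sum-of-squares loss, the toy data is made up, and the learning rate of 0.01 and the 2000 steps are arbitrary choices that happen to work for this data.

```python
def gradient_descent_step(m, b, x_values, y_values, learning_rate):
    """Take one downhill step on L(m, b) = sum((m*x_i + b - y_i)**2)."""
    # Partial derivatives of L, worked out by hand with the chain rule.
    dL_dm = sum(2 * (m * x + b - y) * x for x, y in zip(x_values, y_values))
    dL_db = sum(2 * (m * x + b - y) for x, y in zip(x_values, y_values))
    # Move against the gradient, scaled by the learning rate.
    return m - learning_rate * dL_dm, b - learning_rate * dL_db


# Made-up toy data lying exactly on the line y = 2x + 1.
toy_x = [0, 1, 2, 3]
toy_y = [1, 3, 5, 7]

fit_m, fit_b = 0.0, 0.0  # arbitrary starting location for the "ball"
for _ in range(2000):
    fit_m, fit_b = gradient_descent_step(fit_m, fit_b, toy_x, toy_y, learning_rate=0.01)

print(round(fit_m, 3), round(fit_b, 3))
```

After enough steps the ball settles at roughly $m = 2$ and $b = 1$, the line the toy data was built from. If the learning rate is too large the ball can overshoot the valley, which is one reason we come back to the learning rate later.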

If we run the gradient descent algorithm for long enough, it will find (or get very close to) the optimal location for $(m, b)$. Once we have the optimal values of $m$ and $b$, then that's it. We can then use them to predict our movie revenue based on budget, using our **function** (also called **model**):

$$ f(x) = mx + b $$


## Coding linear regression
Let's quickly review what we did when defining the theory of linear regression:

1. Describe the dataset
2. Define the function (model)
3. Define the loss function
4. Run the gradient descent optimization algorithm
5. Use the optimal model to make predictions
6. Profit!

When coding this we will follow the exact same steps!

### Describing the dataset
Remember that we were hired as a consultant for a movie executive to build a model to predict the amount of revenue a movie would make given its budget. When we first took the job, our movie revenue and budget dataset was very small. But now it's much bigger. Let's take a look!


In [7]:
def generate_synthetic_movie_revenue_dataset(n=50, true_m=1.5, true_b=10, random_noise=1.0):
    """
    Generates a synthetic movie revenue vs budget dataset and returns
    a dictionary with four key-value pairs:
        * `x`: a list of x values
        * `y`: a list of y values
        * `true_m`: the true slope of our data
        * `true_b`: the true y-intercept of our data
    """
    # Setting seed to get the same random results
    np.random.seed(10)
    # generate the integers from 0 to n - 1 for x values (budget)
    x = np.arange(n)

    # generate the list of y values (revenue) corresponding to each x value (budget)
    y = []
    for xi in x:
        noise = np.random.uniform(-random_noise, random_noise)
        # the "true" function of our data
        yi = true_m*xi + true_b + noise
        y.append(yi)
    y = np.array(y)

    return {'x': x, 'y': y, 'true_m': true_m, 'true_b': true_b}

pprint(generate_synthetic_movie_revenue_dataset(10, 1.5, 10, 0.5))

{'true_b': 10,
 'true_m': 1.5,
 'x': array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 'y': array([10.27132064, 11.02075195, 13.13364823, 14.74880388, 15.99850701,
       17.22479665, 18.69806286, 20.76053071, 21.66911084, 23.08833981])}


Now let's visualize it!

In [8]:
# generate a larger dataset with 100 movie budget and revenue samples and plot it
movies_ds = generate_synthetic_movie_revenue_dataset(100, 1.5, 10, 10) 
regression_trace = trace_values(movies_ds['x'], movies_ds['y'], mode='markers', name="synthetic movie revenue vs budget data")
movies_layout = layout(options={'title': "A scatter plot of our synthetic movie revenue vs budget data (in millions)"})
plot([regression_trace], movies_layout)

### Define the function (model)
Now we can define our function to model the movies data and predict revenue from budget. For single variable linear regression we used the model $f(x)=mx+b$. Geometrically, this means that the model can only guess lines. Since the movies data is roughly in the shape of a line, our linear model should work well for this problem. But in the real world, very few problems are linear, so soon we'll look at more complex models. One other limitation of the current model is that it only accepts one input variable. If our data set had both budget and year, for example, perhaps we could predict revenue more accurately. Later, we will discuss a more complex model that can handle multiple input variables.

Here is the code to implement our linear model:

In [9]:
# First we define the trainable parameters m and b 
m = torch.randn(1, requires_grad=True) # requires_grad means it is trainable
b = torch.randn(1, requires_grad=True)

# Then we define the prediction model
def model(x):
    return m * x + b

### Define the loss function
We have the model defined, so now we need to define the loss function. Recall that the loss function is how the model is evaluated (smaller loss values are better), and it is also the function that we need to minimize in terms of $m$ and $b$. Previously we said the loss function, for a dataset of $n$ examples, was:
$$
    L(m,b) = \sum_{i=1}^n (f(x_i,m,b) - y_i)^2 = \sum_{i=1}^n (mx_i + b-y_i)^2
$$

However, we normally interpret $f(x)$ and $y$ as vectors (i.e., lists of numbers; for example, \[0, 3, 6\] is a vector of 3 numbers). We can rewrite the loss function as:

$$
    L(m,b) = \text{sum}((f(x) - y)^2)
$$

Note that since $f(x)$ and $y$ are vectors, $(f(x) - y)$ is also a vector that just contains every number stored in $f(x)$ minus every corresponding number in $y$. Likewise, $((f(x) - y)^2)$ is also a vector, with every number individually squared.  Then, the $\mathrm{sum}$ function (which we just made up) adds up every number stored in the vector $((f(x) - y)^2)$. This is the same as the original loss function, but is a vector interpretation of it instead. We can code this directly as a Python function:

In [10]:
def loss(y_predicted, y_target):
    return ((y_predicted - y_target)**2).sum()

The `.sum()` function is an operation which adds up all the numbers stored in a vector. With just these two lines of code we have defined our loss function.
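
To see what happens inside that one-line loss, here is the same computation spelled out step by step on plain Python lists. The numbers are made up for illustration, and the first pair reuses the 7 versus 10 example from earlier.

```python
y_predicted = [7, 4, 1]   # made-up predictions f(x)
y_target = [10, 4, 3]     # made-up desired outputs y

# Step 1: subtract element by element to get the vector f(x) - y.
differences = [p - t for p, t in zip(y_predicted, y_target)]
# Step 2: square each element individually.
squared = [d ** 2 for d in differences]
# Step 3: add everything up to get a single loss value.
total_loss = sum(squared)

print(differences, squared, total_loss)
```

The differences are `[-3, 0, -2]`, their squares are `[9, 0, 4]`, and the total loss is 13. PyTorch performs the same three steps on tensors, without us writing any loops.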

### Minimizing the loss function with gradient descent
With our model and loss function defined, we are now ready to use the gradient descent algorithm to minimize the loss function, and thus find the optimal values for $m$ and $b$. Fortunately, a library called `PyTorch` already has an implemented version of the gradient descent algorithm for us, and we just need to use it. The algorithm acts almost like a ball rolling downhill into the minimum of the function, but it does so in discrete time steps. PyTorch does not run these steps on its own; we are responsible for performing each time step of gradient descent. So, roughly in pseudo-code we want to do this:
```python
for t in range(10000):
    # Tell PyTorch to do 1 time step of gradient descent
    pass  # placeholder: the real step is filled in below
```

We can't do this yet, since we don't yet have a way to tell PyTorch to perform 1 time step of gradient descent. To do so, we create an optimizer with a learning rate $\alpha$ of $0.2$:

```python
optimizer = optim.Adam([m, b], lr=0.2)
```


The `optim.Adam` optimizer knows how to perform the gradient descent algorithm for us (actually a faster version of gradient descent). Note that this *does not yet minimize $L$*. This code only creates an optimizer object, which we will use to minimize $L$. Note that we indicate which variables we want the optimizer to optimize (that is, modify).

Using our new `optimizer` object we are ready to write the optimization loop pseudo-code that we originally wanted. Let's look at the code first, and then break it down:

```python
for t in range(10000):
    optimizer.zero_grad() # 1.
    y_predicted = model(x_dataset) # 2.
    current_loss = loss(y_predicted, y_dataset) # 3.
    current_loss.backward() # 4.
    optimizer.step() # 5.
    print(f"t = {t}, loss = {current_loss}, m = {m}, b = {b}") # 6.
```

Let's walk through each of the 6 steps to understand all that is happening here:

1. Under the hood, PyTorch keeps track of a gradient for each variable and step of the model and loss computation. The first thing we do is set all of these stored gradients to 0, so that we don't reuse any previous, old gradient computations.
2. Using the current values of `m` and `b` we compute the predictions of the model.
3. We then compute the value of our loss function for the predictions we just made.
4. At this point we have the current value of the loss $L$. What we want to do is compute $\frac{\partial L}{\partial m}$ and $\frac{\partial L}{\partial b}$. We ask PyTorch to do this for us using `.backward()`. The name comes from the fact that in order to find the derivatives, PyTorch works "backward", starting with the loss and working back to $m$ and $b$. However, the details of how PyTorch computes it are not that important right now. What matters is that `.backward()` does this desired computation, and stores the results somewhere (you can see the results by doing `m.grad` if you wish).
5. Crucially though, `.backward()` does NOT actually update the values of `m` and `b`. Instead, we ask the `optimizer` to update `m` and `b`, based on the currently computed gradients.
6. Finally we optionally print out some current info so we can observe the training.
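
To make steps 4 and 5 less mysterious, here is a small sketch that compares the gradients PyTorch stores after `.backward()` with the same derivatives computed by hand. The names `demo_m`, `demo_b`, `demo_x`, and `demo_y` and the tiny dataset are made up for this illustration, and we use fixed starting values instead of random ones so the numbers are easy to check.

```python
import torch

# Fixed starting values (instead of random ones) so the result is easy to check by hand.
demo_m = torch.tensor([1.0], requires_grad=True)
demo_b = torch.tensor([0.0], requires_grad=True)
demo_x = torch.tensor([0.0, 1.0, 2.0])
demo_y = torch.tensor([1.0, 3.0, 5.0])  # made-up targets on the line y = 2x + 1

demo_loss = ((demo_m * demo_x + demo_b - demo_y) ** 2).sum()
demo_loss.backward()  # fills in demo_m.grad and demo_b.grad

# The same derivatives worked out by hand from the formula for L.
residuals = (demo_m * demo_x + demo_b - demo_y).detach()
manual_dL_dm = (2 * residuals * demo_x).sum()
manual_dL_db = (2 * residuals).sum()

print(demo_m.grad.item(), manual_dL_dm.item())
print(demo_b.grad.item(), manual_dL_db.item())
print(demo_m.item(), demo_b.item())  # unchanged: backward() only computes gradients
```

Both approaches give a gradient of -16 for $m$ and -12 for $b$, and the parameters themselves are still 1 and 0. Nothing moves until an optimizer's `.step()` is called.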

What we want to see from the print statements is that the gradient descent algorithm **converged**, which means that the algorithm stopped making significant progress because it found the minimum location of the loss function. When the last few print outputs look like:

```
t = 9992, loss = 83.83984375, m = 1.57234649658203, b = 9.991298828125
t = 9993, loss = 83.8359375, m = 1.572356033325195, b = 9.9916040039062
t = 9994, loss = 83.84765625, m = 1.57235984802246, b = 9.9919091796875
t = 9995, loss = 83.828125, m = 1.572371292114258, b = 9.9922143554688
t = 9996, loss = 83.8359375, m = 1.572380828857422, b = 9.99251953125
t = 9997, loss = 83.8359375, m = 1.572382736206055, b = 9.9928247070312
t = 9998, loss = 83.84375, m = 1.572397994995117, b = 9.9931298828125
t = 9999, loss = 83.82421875, m = 1.572401809692383, b = 9.9934350585938
```

then we can tell that we have achieved convergence, and therefore found the best values of $m$ and $b$.
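
Instead of judging convergence by eye, we could sketch a small check in code. The helper `has_converged` below is hypothetical (it is not part of PyTorch), and the window size and tolerance are arbitrary choices that would need tuning for a real problem.

```python
def has_converged(losses, window=5, tolerance=0.05):
    """
    Return True when the last `window` losses all lie within `tolerance`
    of each other, i.e. the loss has stopped changing in any meaningful way.
    """
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) <= tolerance


# The last five loss values from the sample output above.
recent_losses = [83.828125, 83.8359375, 83.8359375, 83.84375, 83.82421875]
print(has_converged(recent_losses))                      # the values barely move
print(has_converged([500.0, 300.0, 150.0, 90.0, 84.0]))  # still dropping fast
```

The first call prints `True` because the five losses differ by less than 0.02, and the second prints `False` because the loss is still falling quickly.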


### Putting it all together
Now let's write the full optimization loop for our model and dataset:

In [11]:
# First we define the trainable parameters m and b 
m = torch.randn(1, requires_grad=True) # requires_grad means it is trainable
b = torch.randn(1, requires_grad=True)

# Then we define the prediction model
def model(x):
    return m * x + b


# Then we define the loss function
def loss(y_predicted, y_target):
    return ((y_predicted - y_target)**2).sum()

# Next setup the optimizer object, so it optimizes m and b.
optimizer = optim.Adam([m, b], lr=0.2)

# Finally convert the dataset to PyTorch Tensors
x = torch.tensor(movies_ds['x'], dtype=torch.float)
y = torch.tensor(movies_ds['y'], dtype=torch.float)


# Main optimization loop
losses = []
m_values = []
b_values = []
for e in range(1, 3001):
    # Set the gradients to 0.
    optimizer.zero_grad()
    # Compute the current predicted y's from x
    y_predicted = model(x) # this is our f(x) function
    # See how far off the prediction is
    current_loss = loss(y_predicted, y)
    # Compute the gradient of the loss with respect to m and b.
    current_loss.backward()
    # Update m and b accordingly.
    optimizer.step()
    
    losses.append(current_loss.item())
    m_values.append(m.item())
    b_values.append(b.item())
    if e % 100 == 0: 
        # print the model's current loss and estimates of m and b
        print(f"e = {e}, loss = {current_loss.item()}, m = {m.item()}, b = {b.item()}")

est_best_fit_line = m_b_trace(m_values[-1], b_values[-1], movies_ds['x'].tolist())
est_best_fit_trace = trace_values(est_best_fit_line['x'], est_best_fit_line['y'], mode='lines', name="estimated line of best fit")
plot([regression_trace, est_best_fit_trace], movies_layout)

e = 100, loss = 3493.1298828125, m = 1.554046630859375, b = 6.008647441864014
e = 200, loss = 3029.975830078125, m = 1.5003267526626587, b = 9.48725700378418
e = 300, loss = 3012.7470703125, m = 1.4891738891601562, b = 10.224417686462402
e = 400, loss = 3012.601806640625, m = 1.4881019592285156, b = 10.295206069946289
e = 500, loss = 3012.601318359375, m = 1.48805832862854, b = 10.298081398010254
e = 600, loss = 3012.601318359375, m = 1.4880578517913818, b = 10.298110961914062
e = 700, loss = 3012.601318359375, m = 1.4880578517913818, b = 10.298110961914062
e = 800, loss = 3012.601318359375, m = 1.4880578517913818, b = 10.298110961914062
e = 900, loss = 3012.601318359375, m = 1.4880578517913818, b = 10.298110961914062
e = 1000, loss = 3012.601318359375, m = 1.4880578517913818, b = 10.298110961914062
e = 1100, loss = 3012.601318359375, m = 1.4880578517913818, b = 10.298110961914062
e = 1200, loss = 3012.601318359375, m = 1.4880578517913818, b = 10.298110961914062
e = 1300, loss = 3012.6

This looks pretty good! Recall that our true $m$ was 1.5 and our true $b$ was 10, and we see that our model learned values of about 1.488 for $m$ and 10.298 for $b$. We can also inspect the plot and see that the line of best fit is pretty close to our data, although it could probably decrease the y-intercept $b$ value a little.
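
As mentioned earlier, gradient descent is not the only way to find the line of best fit. For a single input variable, the least squares line can also be computed directly with a formula. The sketch below uses that standard closed-form solution as a sanity check. It is a different technique from the training loop above, and the helper name `least_squares_line` is our own.

```python
def least_squares_line(x_values, y_values):
    """Return the (m, b) that minimize the sum of squared errors, computed directly."""
    n = len(x_values)
    x_mean = sum(x_values) / n
    y_mean = sum(y_values) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values))
    denominator = sum((x - x_mean) ** 2 for x in x_values)
    m = numerator / denominator
    b = y_mean - m * x_mean
    return m, b


# Check on the small table from earlier, which lies exactly on y = 2x + 50.
print(least_squares_line([0, 25, 50, 75], [50, 100, 150, 200]))
```

On the small table this returns a slope of 2 and a y-intercept of 50. Calling it with `movies_ds['x'].tolist()` and `movies_ds['y'].tolist()` should land close to the values found by the training loop, since both approaches minimize the same loss.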

### Time to make predictions
Once we've trained our model, we can use it to predict movie revenue based on movie budget. Let's try it out!

In [12]:
### Using the trained model to make predictions ###
print(f"Predicts revenue when budget is 1 (in millions): ${round(model(torch.FloatTensor([1])).item(), 2)}")
print(f"Predicts revenue when budget is 60 (in millions): ${round(model(torch.FloatTensor([60])).item(), 2)}")


Predicts revenue when budget is 1 (in millions): $11.79
Predicts revenue when budget is 60 (in millions): $99.58


## Challenge: Single Variable Regression
Time to test our knowledge! 

Imagine that you are the producer for a comedy show at your school. We need you to use your knowledge of linear regression to make predictions as to the success of the show.


### Working through a linear regression
The comedy show is trying to figure out how much money to spend on advertising in the student newspaper. The newspaper tells the show that:

* For every two dollars spent on advertising, three students attend the show.
* If no money is spent on advertising, 20 friends still attend the show.

As the producer of the show, your goal is to write a linear regression function (model) called `attendance` that models the relationship between advertising and attendance expressed by the newspaper.

Here's the function that generates our comedy dataset:


In [13]:
def generate_synthetic_comedy_dataset(n=50, true_m=1.5, true_b=10, random_noise=1.0):
    """
    Generates a synthetic comedy show attendance vs advertising budget dataset
    and returns a dictionary with four key-value pairs:
        * `x`: a list of x values
        * `y`: a list of y values
        * `true_m`: the true slope of our data
        * `true_b`: the true y-intercept of our data
    """
    # generate the integers from 0 to n - 1 for x values (advertising budget)
    x = np.arange(n)

    # generate the list of y values (attendance) corresponding to each x value (advertising budget)
    y = []
    for xi in x:
        noise = np.random.uniform(-random_noise, random_noise)
        # the "true" function of our data
        yi = true_m*xi + true_b + noise
        y.append(yi)
    y = np.array(y)

    return {'x': x, 'y': y, 'true_m': true_m, 'true_b': true_b}

comedy_ds = generate_synthetic_comedy_dataset(100, 1.5, 20, 10)

As a first step, let's plot the comedy dataset (`comedy_ds`) to see if the data looks approximately linear. If it does, linear regression is probably appropriate.

In [14]:
# plot the comedy show attendance vs advertising budget data
regression_trace = trace_values(comedy_ds['x'], comedy_ds['y'], mode='markers', name="school comedy show attendance vs budget")
comedy_layout = layout(options={'title': "School comedy show attendance vs budget"})
plot([regression_trace], comedy_layout)

It looks like show attendance is a function of budget in our comedy show data! Now, try writing the code to model attendance based on budget!

> Hint: it's similar to the linear regression code we wrote earlier. A good place to start might be defining your function (model) `attendance`.

In [15]:
def attendance(x):
    """
    A function (i.e., f(x)) that returns the expected show attendance based on
    money spent advertising in the school newspaper.
    """
    # your code here
    pass


# Expanding to Deep Learning
Now that we have built several linear regression models, it's probably a good time to show you a few examples of what more complex machine learning models can do. 

Check these awesome examples out:

* [Music Generation](https://openai.com/blog/musenet/)
* [Generating images from text](https://openai.com/blog/dall-e/)
* [Text Generation and Summarization](https://openai.com/blog/better-language-models/)
