# Derivatives of straight lines

### Learning objectives

* Understand that derivatives are the instantaneous rate of change of a function
* Understand how to calculate a derivative of a straight line

### Introduction

In the lesson discussing step sizes of our gradient descent algorithm, we filled in some more information on how to find a "best fit" regression line by using gradient descent.  Namely, we learned how to efficiently change the y-intercept of the regression line to minimize the residual sum of squares (RSS).  

We did this by calibrating the size and direction of our change of one regression line parameter -- let's say $b$, our y-intercept -- to the slope of the line tangent to the cost curve at that value of $b$. By tangent line, we mean a line that "just touches" our curve at a given point.  

Below is a curve that shows the RSS of a regression line with different values of $b$.  Our orange, green, and red lines are each tangent to the curve at their respective points. 

![grafik.png](attachment:grafik.png)

With our gradient descent algorithm, the larger the absolute value of the slope, the larger the change in our regression line parameter -- that is, the larger our step size.  We take a much larger step when our slope is -146.17 at $b = 70$ than we do when our slope equals -58.51 at $b = 85$.
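
To make the link between slope and step size concrete, here is a minimal sketch.  The `learning_rate` of 0.01 and the helper name `step_for` are assumptions made for illustration; only the two slopes come from the graph above.

```python
# Hypothetical learning rate, chosen only for illustration.
learning_rate = 0.01

def step_for(slope, learning_rate):
    """Return the change applied to b: opposite the slope, scaled by the learning rate."""
    return -learning_rate * slope

step_at_70 = step_for(-146.17, learning_rate)  # steeper slope, larger step
step_at_85 = step_for(-58.51, learning_rate)   # shallower slope, smaller step

print(step_at_70, step_at_85)
```

Both steps are positive because both slopes are negative, and the step at $b = 70$ is roughly two and a half times the step at $b = 85$.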

Here is what **we know so far:** 
* How to apply gradient descent by using the slope of the cost curve to determine the direction and magnitude of the next step for updating the parameter of a regression line

Here is **what we need to learn:**
* How to find that slope or rate of change of a function at a given point.  

> The instantaneous rate of change at a given point is called the **derivative**.  

Derivatives are important because they tell us how a function is changing at any given point.  Derivatives allow us to see what is coming next.  

A derivative is simply the instantaneous rate of change of a function.  We already learned how to calculate the derivative of a straight line: it's the rise over the run. The rate of change of a line is constant for all points along the line, so the derivative is the same at every point.  We'll focus on calculating the derivatives of straight-line functions, or linear functions, before moving on to calculating the derivatives of curved lines (like our cost curve) in a future lesson.   

### Understanding the rate of change

Let's say that we want a function that  represents a person taking a jog.  We'll represent this by drawing a straight line.

![grafik.png](attachment:grafik.png)

The graph above helps us see how distance changes in relation to time, or in other words speed.  So here, when we ask about rate of change, we're asking how fast is our jogger traveling? 

### Calculating the rate of change

To calculate the miles per hour, we can see where a person is at a given time, then wait an hour and see how far he traveled.  Or we can wait two hours and divide the distance traveled by two.  Generally, our technique is to divide the number of miles traveled by the number of hours passed.  In this specific example, we'll imagine doing the following to calculate the speed at hour 1.

> * Start a stopwatch after one hour and note the distance at that hour
> * Then, let one hour elapse and mark down the distance at that next hour.  
> * Finally, divide the difference in the distances by the elapsed time.  

In the below graph, we begin to calculate the speed at hour number one.

![grafik.png](attachment:grafik.png)

We calculate our jogger's speed by seeing where he starts at hour one and ends at hour two.  Our jogger went from mile three to mile six, as indicated by the orange line, so miles per hour is:

 $$ \frac{miles}{hour} = \frac {end distance - start distance}{end time - start time} = \frac {6 - 3}{2 - 1} = 3$$

Miles per hour is just one example of rate of change. Anytime we come across the word *per*, we know this is a form of rate of change.  All forms of **rate of change** are calculated the same way: the change in y divided by the change in x. 

* Another way of expressing **change in y** is:  
   * $y_2 - y_1$ or $\Delta y$, read delta y 
* Likewise, another way of expressing **change in x** is:  
   * $x_2 - x_1$ or $\Delta x$, read delta x

Generally, we can say that: 

* rate of change $= \frac{rise}{run} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}$

Just like in our example, we saw: 

* miles per hour =  $\frac{distance_2 - distance_1}{time_2 - time_1} = \frac{6 - 3}{2 - 1} = \frac{3}{1} = 3$ mph
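
The same rise-over-run calculation can be sketched in Python.  The function name `rate_of_change` is our own choice for this illustration.

```python
def rate_of_change(x1, y1, x2, y2):
    """Return rise over run between the points (x1, y1) and (x2, y2)."""
    return (y2 - y1) / (x2 - x1)

# The jogger is at mile 3 at hour 1 and at mile 6 at hour 2.
speed = rate_of_change(1, 3, 2, 6)
print(speed)  # 3.0
```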

Derivatives are a specific kind of rate of change -- the rate of change of a function **at a given point**.  For a linear function like the one we work with here, we calculate them through rise over run, or the change in y divided by the change in x, expressed $\frac{\Delta y}{\Delta x}$.  The rest of this lesson will simply introduce more math terms and symbols for expressing this same concept.  

> Stick with us: fully understanding these will pay off when we take the derivative of more complex functions.

### Derivatives with *even more symbols* 

Since our jogger is running at a constant rate, our calculated rate of change of 3 miles per hour is also the derivative.  Of course, we know that in math we express our functions as $f(x)$.  Let's do that here.

![grafik.png](attachment:grafik.png)

If we are given a function $f(x)$, we say the derivative of that function is $f'(x)$ -- read f prime of x. 

We already can express the derivative of a linear function $f(x)$ many different ways: 

* $ f'(x) = \frac{rise}{run} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} =  \frac{f(x_2) - f(x_1)}{x_2 - x_1}$

Take a look at the expression on the far right:
    
$$f'(x) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} $$ 

You see that we replaced $y_2 - y_1$ with $f(x_2) - f(x_1)$.  This makes sense, because really when we say $y_2$ and $y_1$, we mean the function's output at the first x value and the function's output at the second x value.  

We indicate that we are calculating the derivative of $f(x)$ at a specific point, say hour 1, by calling $f'(1)$.  That's the rate of change at hour 1.  Now we can plug in our values to calculate the derivative.  

* $x_1 = 1$ as hour 1 is our starting point
* $x_2 = 2$ as hour 2 is our ending point

giving us: 

$$f'(1) = \frac{f(2) - f(1)}{2 - 1} = \frac{6 - 3}{2 - 1} = 3 $$ 

So $f(x)$ equals the output at a given point.  And $f'(x)$ is the rate of change at a given point.  So then:
* $f(1)$ 
    * means the output at $x = 1$, or in our example, *the distance* at hour one, and 
* $f'(1)$ 
    * means the rate of change at $x = 1$, or in our example, *the speed* at hour one

Because the jogger's speed never changes throughout and since the derivative is the rate of change at a given point, we can conclude that the derivative also never changes.  Let's plot the distance from hours zero through five on the left and the speed from hours zero through five on the right to visualize this steady pace.

![grafik.png](attachment:grafik.png)

> * To the left is a graph of $f(x) = 3x$ for different values of x.  
> * And to the right is a plot of the rate of change of that function, $f'(x)$, for different values of $x$.
> * So while *the distance* changes through time, *the speed*, or rate of change, stays the same.
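
As a rough sketch of the two plots in code, we can tabulate the distance and the speed for hours zero through five.  The function name `distance` is our own; it simply encodes $f(x) = 3x$.

```python
def distance(hours):
    """Distance in miles for the jogger, f(x) = 3x."""
    return 3 * hours

hours = list(range(0, 6))
distances = [distance(hour) for hour in hours]
# Speed over the hour that starts at each point: (f(x + 1) - f(x)) / 1
speeds = [(distance(hour + 1) - distance(hour)) / 1 for hour in hours]

print(distances)  # [0, 3, 6, 9, 12, 15]
print(speeds)     # [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
```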

### Expressing the derivative in terms of change

Now our above formulas for calculating the derivative do the job, but they don't exactly express our technique in the example of our jogger.  Remember that our technique for calculating the jogger's speed is the following: 

> * Start a stopwatch after one hour and see the distance at that hour
> * Then, let one hour elapse and see the distance at that next hour.  
> * Finally, divide the difference in the distances by the elapsed time.  

This is what this looks like in terms of math: 

$f'(x) = \frac{f(x_1 + \Delta x) - f(x_1)}{\Delta x} $

Let's take a second to fully understand this new formula because it's not going away.  

* $f'(x)$ is the rate of change at a given value, or here the speed at a given time


* $f(x)$ is the distance at a given time, and $f(x_1)$ is the distance at the starting time, $x_1$


* The elapsed time is $\Delta x$, the change in x.


* $f(x_1 + \Delta x)$ is the distance at the starting time plus the elapsed time 

This is the definition that we will often see.  It expresses our technique for calculating the derivative.  
* Subtract the output at one input, x, from the output at that initial input plus a change in x.  
* Then divide that difference by the change in x.  
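
The two steps above can be sketched as a small Python function.  The names `difference_quotient` and `distance` are assumptions for this illustration, with `distance` again standing in for $f(x) = 3x$.

```python
def difference_quotient(f, x, delta_x):
    """Return (f(x + delta_x) - f(x)) / delta_x."""
    return (f(x + delta_x) - f(x)) / delta_x

def distance(hours):
    """The jogger's distance, f(x) = 3x."""
    return 3 * hours

# Start at hour 1 and try several elapsed times.
quotients = [difference_quotient(distance, 1, delta_x) for delta_x in [2, 1, 0.5]]
print(quotients)  # [3.0, 3.0, 3.0]
```

For a straight line, the choice of $\Delta x$ does not change the answer, which is why waiting one hour or two hours gave our jogger the same speed.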

In summary, that is the derivative of a line, or the rate of change of a linear function.  The rate of change answers how much our output is changing at a given point.

### Summary 

In this lesson, we saw that the derivative is the change in output per a change in input.  In the case of our jogger, the input was time and the output was distance traveled.  We learned that the derivative is the change in the runner's distance traveled divided by the amount of time passed.

Graphically, we see that the derivative is simply the rise over run, or change in y divided by change in x:

$$ f'(x) = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} $$

Then we saw that we can express the derivative in terms of $f(x)$ instead of $y_1$ and $y_2$ as in the output at second x minus the output at the first x divided by the difference between the two x values.  Or, in an equation:

$$ f'(x) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} $$

And finally we saw how we can express the derivative in terms of $\Delta x$ as in subtract the output at an initial x value from the output at that initial x value plus some change in x, then divide by that change in x:

$$ f'(x) = \frac{f(x_1 + \Delta x) - f(x_1)}{\Delta x} $$


# Derivatives of Linear Functions Lab

### Introduction: Start here

In this lab, we will practice our knowledge of derivatives. Remember that our key formula for derivatives is 
$f'(x) = \frac{\Delta y}{\Delta x} =  \frac{f(x + \Delta x) - f(x)}{\Delta x}$.  To work toward this formula, we will do the following: 

1. Learn how to represent linear and nonlinear functions in code.  
2. Then because our calculation of a derivative relies on seeing the output at an initial value and the output at that value plus delta x, we need an `output_at` function.  
3. Then we will be able to code the $\Delta f$ function that sees the change in output between the initial x and that initial x plus the $\Delta x$ 
4. Finally, we will calculate the derivative at a given x value with a function called `derivative_of`. 

### Learning objectives 

For this first section, you should be able to answer all of the questions with an understanding of our definition of a derivative:

1. Our intuitive explanation that a derivative is the instantaneous rate of change of a function
2. Our mathematical definition is that 

$f'(x) = \frac{\Delta y}{\Delta x} =  \frac{f(x + \Delta x) - f(x)}{\Delta x}$

### Let's begin: Starting with functions

#### 1. Representing Functions

We are about to learn to take the derivative of a function in code.  But before doing so, we need to learn how to express any kind of function in code.  This way when we finally write our functions for calculating the derivative, we can use them with both linear and nonlinear functions.

For example, we want to write the function $f(x) = 4x^2 + 4x - 10$ in a way that allows us to easily determine the exponent of each term.

This is our technique: write the formula as a list of tuples.  

> A tuple is like a list, except that its elements cannot be reassigned.  Everything else, for our purposes, is the same.  
```python
my_tuple = (7, 3)  # avoid naming a variable `tuple`, which would shadow the built-in type
my_tuple[0] # 7
my_tuple[1] # 3
```

> We get a TypeError if we try to reassign the tuple's elements.
```python
my_tuple[0] = 7
# TypeError: 'tuple' object does not support item assignment
```

Take the following function as an example: 

$$f(x) = 4x^2 + 4x - 10 $$

Here it is as a list of tuples:

In [1]:
four_x_squared_plus_four_x_minus_ten = [(4, 2), (4, 1), (-10, 0)]

So each tuple in the list represents a different term in the function.  The first element of the tuple is the term's coefficient and the second element of the tuple is the term's exponent.  Thus $4x^2$ translates to `(4, 2)` and  $-10$ translates to `(-10, 0)` because $-10$ is the same as $-10*x^0$.  
> We'll refer to this list of tuples as "list of terms", or `list_of_terms`.
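
Because each term is a two-element tuple, we can unpack it directly while looping.  The sketch below uses a hypothetical helper, `describe_terms`, that is not part of the lab; it only shows how the two elements of each tuple are read back out.

```python
four_x_squared_plus_four_x_minus_ten = [(4, 2), (4, 1), (-10, 0)]

def describe_terms(list_of_terms):
    """Return one 'coefficient * x^exponent' string per term (hypothetical helper)."""
    return [f"{coefficient} * x^{exponent}" for coefficient, exponent in list_of_terms]

print(describe_terms(four_x_squared_plus_four_x_minus_ten))
# ['4 * x^2', '4 * x^1', '-10 * x^0']
```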

Ok, so give this a shot. Write $ f(x) = 4x^3 + 11x^2 $ as a list of terms.  Assign it to the variable `four_x_cubed_plus_eleven_x_squared`.

In [2]:
four_x_cubed_plus_eleven_x_squared = [(4, 3), (11, 2)]

#### 2. Evaluating a function at a specific point 

Now that we can represent a function in code, let's write a Python function called `term_output` that can evaluate what a single term equals at a value of $x$.  

* For example, when $x = 2$, the term $3x^2 = 3*2^2 = 12 $.  
* So we represent $3x^2$ in code as `(3, 2)`, and: 
* `term_output((3, 2), 2)` should return 12


In [3]:
def term_output(term, input_value):
    return term[0]*(input_value**term[1])

In [4]:
term_output((3, 2), 2) # 12

12

> **Hint:** To raise a number to an exponent in Python, like 3^2, use the double star, as in:
```python
3**2 # 9 
```

Now write a function called `output_at` that, when passed a `list_of_terms` and a value of $x$, calculates the value of the function at that value.  
> * For example, we'll use `output_at` to calculate $f(x) = 3x^2 - 11$.  
> * Then `output_at([(3, 2), (-11, 0)], 2)` should return $f(2) = 3*2^2 - 11 = 1$

In [5]:
def output_at(list_of_terms, x_value):
    # sum the output of every term, so the function works for any number of terms
    return sum(term_output(term, x_value) for term in list_of_terms)

In [6]:
three_x_squared_minus_eleven = [(3, 2), (-11, 0)]
output_at(three_x_squared_minus_eleven, 2) # 1 
output_at(three_x_squared_minus_eleven, 3) # 16

16

Now we can use our `output_at` function to display our function graphically.  We simply declare a list of `x_values` and then calculate `output_at` for each of the `x_values`.

In [7]:
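import plotly.offline

def plot(traces, layout = {}):
    # traces must be a list of trace dictionaries, such as those built by trace_values
    if not isinstance(traces, list): raise TypeError('first argument must be a list.  Instead is ' + str(traces))
    plotly.offline.iplot({'data': traces, 'layout': layout})

def trace_values(x_values, y_values, mode = 'markers', name="data", text = []):
    # build a dictionary describing one trace (a set of points or a line) for plotly
    return {'x': x_values, 'y': y_values, 'mode': mode, 'name': name, 'text': text}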

def build_layout(x_range = None, y_range = None, options = {}):
    layout = {}
    if isinstance(x_range, list): layout.update({'xaxis': {'range': x_range}})
    if isinstance(y_range, list): layout.update({'yaxis': {'range': y_range}})
    layout.update(options)
    return layout

In [8]:
import plotly
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

# from graph import plot, trace_values ---> see above

x_values = list(range(-30, 30, 1))
y_values = list(map(lambda x: output_at(three_x_squared_minus_eleven, x), x_values))

three_x_squared_minus_eleven_trace  = trace_values(x_values, y_values, mode = 'lines+markers')
plot([three_x_squared_minus_eleven_trace], {'title': '3x^2 - 11'})

### Moving to derivatives of linear functions

Let's start with a function, $f(x) = 4x + 15$.  We represent the function as the following:

In [9]:
four_x_plus_fifteen = [(4, 1), (15, 0)]

We can plot the function by calculating outputs at a range of x values.  Note that we use our `output_at` function to calculate the output at each individual x value.

In [10]:
import plotly
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

# from graph import plot, trace_values, build_layout ---> see above

x_values = list(range(0, 6))
# layout = build_layout(y_range = [0, 35])

four_x_plus_fifteen_values = list(map(lambda x: output_at(four_x_plus_fifteen, x), x_values))
four_x_plus_fifteen_trace = trace_values(x_values, four_x_plus_fifteen_values, mode = 'lines+markers')
plot([four_x_plus_fifteen_trace])

Ok, time for what we are here for, derivatives.  Remember that the derivative is the instantaneous rate of change of a function, and is expressed as:

$$ f'(x) = \frac{\Delta f}{\Delta x}  = \frac{f(x + \Delta x) - f(x)}{\Delta x}  $$ 

#### Writing a function for $\Delta f$

We can see from the formula above that  $\Delta f = f(x + \Delta x ) - f(x) $.  Write a function called `delta_f` that, given a `list_of_terms`, an `x_value`, and a value $\Delta x $, returns the change in the output over that period.
> **Hint** Don't forget about the `output_at` function.  The `output_at` function takes a list of terms and an $x$ value and returns the corresponding output.  So really **`output_at` is equivalent to $f(x)$**, provided a function and a value of x.

In [11]:
four_x_plus_fifteen = [(4, 1), (15, 0)]

In [12]:
def delta_f(list_of_terms, x_value, delta_x):
    first_term = output_at(list_of_terms, x_value + delta_x)
    second_term =  output_at(list_of_terms, x_value)
    return first_term - second_term

In [13]:
delta_f(four_x_plus_fifteen, 2, 1) # 4

4

So for $f(x) = 4x + 15$, when x = 2, and $\Delta x = 1$, $\Delta f$ is 4.  
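
To see how $\Delta f$ behaves for a linear function, the sketch below repeats the calculation at several starting points and with several values of $\Delta x$.  It is self-contained, so it redefines `output_at` and `delta_f` with a compact evaluator that sums every term; treat these as stand-ins rather than the lab's own solutions.

```python
def output_at(list_of_terms, x_value):
    # Stand-in evaluator: sum coefficient * x**exponent over all terms.
    return sum(coefficient * x_value**exponent for coefficient, exponent in list_of_terms)

def delta_f(list_of_terms, x_value, delta_x):
    return output_at(list_of_terms, x_value + delta_x) - output_at(list_of_terms, x_value)

four_x_plus_fifteen = [(4, 1), (15, 0)]

# Same delta_x, different starting points: the change in output is always 4.
changes_by_start = [delta_f(four_x_plus_fifteen, x, 1) for x in [0, 2, 10]]
# Same starting point, different delta_x: the change in output is 4 * delta_x.
changes_by_step = [delta_f(four_x_plus_fifteen, 2, dx) for dx in [1, 2, 3]]

print(changes_by_start)  # [4, 4, 4]
print(changes_by_step)   # [4, 8, 12]
```

So for a line, $\Delta f$ depends only on $\Delta x$ and not on where we start.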

#### Plotting our function, delta f, and delta x  

Let's show $\Delta f$ and $\Delta x$ graphically.

In [14]:
def delta_f_trace(list_of_terms, x_value, delta_x):
    initial_f_value = output_at(list_of_terms, x_value)
    delta_f_value = delta_f(list_of_terms, x_value, delta_x)
    # compare against None so that a legitimate value of 0 still produces a trace
    if initial_f_value is not None and delta_f_value is not None:
        # vertical segment at x + delta_x, rising from f(x) to f(x) + delta f
        trace =  trace_values(x_values=[x_value + delta_x, x_value + delta_x], 
                              y_values=[initial_f_value, initial_f_value + delta_f_value], mode = 'lines+markers',
                              name = 'delta f = ' + str(delta_f_value))
        return trace

In [15]:
trace_delta_f_four_x_plus_fifteen = delta_f_trace(four_x_plus_fifteen, 2, 1)

Let's add another function that shows the delta x.

In [16]:
def delta_x_trace(list_of_terms, x_value, delta_x):
    initial_f_value = output_at(list_of_terms, x_value)
    # compare against None so that an output of 0 still produces a trace
    if initial_f_value is not None:
        # horizontal segment at height f(x), running from x to x + delta_x
        trace = trace_values(x_values=[x_value, x_value + delta_x],
                            y_values=[initial_f_value, initial_f_value], mode = 'lines+markers', 
                            name = 'delta x = ' + str(delta_x))
        return trace

In [17]:
# from graph import plot, trace_values

trace_delta_x_four_x_plus_fifteen = delta_x_trace(four_x_plus_fifteen, 2, 1)
if four_x_plus_fifteen_trace and trace_delta_f_four_x_plus_fifteen and trace_delta_x_four_x_plus_fifteen:
    plot([four_x_plus_fifteen_trace, trace_delta_f_four_x_plus_fifteen, trace_delta_x_four_x_plus_fifteen], {'title': '4x + 15'})

#### Calculating the derivative

Write a function, `derivative_of`, that calculates $\frac{\Delta f}{\Delta x}$ when given a `list_of_terms`, an `x_value` for the value of $x$ the derivative is evaluated at, and `delta_x`, which represents $\Delta x$.  

Let's try this for $f(x) = 4x + 15 $.  Round the result to three decimal places.

In [18]:
def derivative_of(list_of_terms, x_value, delta_x):
    numerator = delta_f(list_of_terms, x_value, delta_x)
    denominator = delta_x
    return round(numerator/denominator, 3)

In [19]:
derivative_of(four_x_plus_fifteen, 3, 2) # 4.0

4.0
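
The rounding step matters once $\Delta x$ is a decimal.  The sketch below is self-contained and uses a hypothetical `unrounded_derivative` helper to show the raw quotient next to the rounded one; with floating-point arithmetic the raw value can differ from 4 by a tiny amount.

```python
def output_at(list_of_terms, x_value):
    # Stand-in evaluator: sum coefficient * x**exponent over all terms.
    return sum(coefficient * x_value**exponent for coefficient, exponent in list_of_terms)

def unrounded_derivative(list_of_terms, x_value, delta_x):
    """Return delta f divided by delta x, without rounding (hypothetical helper)."""
    change = output_at(list_of_terms, x_value + delta_x) - output_at(list_of_terms, x_value)
    return change / delta_x

four_x_plus_fifteen = [(4, 1), (15, 0)]
raw = unrounded_derivative(four_x_plus_fifteen, 2, 0.01)

print(raw)            # close to 4, possibly off by a tiny floating-point error
print(round(raw, 3))  # 4.0
```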

### We do: Building more plots

Ok, now that we have written a Python function that allows us to plot our list of terms, we can write a function called `derivative_trace` that shows the rate of change, or slope, for the function between initial x and initial x plus delta x. We'll walk you through this one.  

In [20]:
def derivative_trace(list_of_terms, x_value, line_length = 4, delta_x = .01):
    derivative_at = derivative_of(list_of_terms, x_value, delta_x)
    y = output_at(list_of_terms, x_value)
    # compare against None so that a slope or output of 0 still produces a trace
    if derivative_at is not None and y is not None:
        # extend the tangent line half of line_length to either side of x_value
        x_minus = x_value - line_length/2
        x_plus = x_value + line_length/2
        y_minus = y - derivative_at * line_length/2
        y_plus = y + derivative_at * line_length/2
        return trace_values([x_minus, x_value, x_plus],[y_minus, y, y_plus], name = "f' (x) = " + str(derivative_at), mode = 'lines+markers')

> Our `derivative_trace` function takes as arguments `list_of_terms`, `x_value`, which is where our line should be tangent to our function, `line_length` as the length of our tangent line, and `delta_x` which is our $\Delta x$.

> The return value of `derivative_trace` is a dictionary that represents the tangent line at that value of $x$.  It uses the `derivative_of` function you wrote above to calculate the slope of the tangent line.  Once the slope of the tangent is calculated, we stretch out this tangent line by the `line_length` provided.  The beginning $x$ value is just the midpoint minus `line_length/2`, and the ending $x$ value is the midpoint plus `line_length/2`.  Then we calculate our $y$ endpoints by starting at the $y$ value along the function and ending `line_length/2*slope` away in either direction. 

In [21]:
tangent_line_four_x_plus_fifteen = derivative_trace(four_x_plus_fifteen, 2, line_length = 4, delta_x = .01)
tangent_line_four_x_plus_fifteen

{'x': [0.0, 2, 4.0],
 'y': [15.0, 23, 31.0],
 'mode': 'lines+markers',
 'name': "f' (x) = 4.0",
 'text': []}

Now we provide a function that simply returns all three of these traces.

In [22]:
def delta_traces(list_of_terms, x_value, line_length = 4, delta_x = .01):
    tangent = derivative_trace(list_of_terms, x_value, line_length, delta_x)
    delta_f_line = delta_f_trace(list_of_terms, x_value, delta_x)
    delta_x_line = delta_x_trace(list_of_terms, x_value, delta_x)
    return [tangent, delta_f_line, delta_x_line]

Below, we plot these traces together with our trace of the function.

In [23]:
delta_x = 1

# delta_traces(list_of_terms, x_value, line_length = 4, delta_x = .01)

four_x_plus_fifteen_tangents = delta_traces(four_x_plus_fifteen, 2, line_length= 2*1, delta_x = delta_x)

# only plot the traces that were actually built, skipping any that came back as None
built_traces = [trace for trace in four_x_plus_fifteen_tangents if trace is not None]
if built_traces:
    plot([four_x_plus_fifteen_trace, *built_traces])

So that plot highlights the rate of change at precisely the point x = 2.  Sometimes it is useful to see how the derivative behaves across all x values.  With linear functions, we know that our function is always changing at the same rate, and therefore the rate of change is constant.  Let's write functions that allow us to see the function and its derivative side by side.

In [25]:
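# The course's `graph` module is not available here, so we define stand-ins for its
# `make_subplots` and `plot_figure` helpers (an assumption about what they do).
import plotly.offline
from plotly.subplots import make_subplots as plotly_make_subplots

def make_subplots(left_traces, right_traces):
    # one row, two columns: the function on the left, its derivative on the right
    figure = plotly_make_subplots(rows = 1, cols = 2)
    for trace in left_traces: figure.add_trace(trace, row = 1, col = 1)
    for trace in right_traces: figure.add_trace(trace, row = 1, col = 2)
    return figure

def plot_figure(figure):
    plotly.offline.iplot(figure)

def function_values_trace(list_of_terms, x_values):
    function_values = list(map(lambda x: output_at(list_of_terms, x),x_values))
    return trace_values(x_values, function_values, mode = 'lines+markers')
    
def derivative_values_trace(list_of_terms, x_values, delta_x):
    derivative_values = list(map(lambda x: derivative_of(list_of_terms, x, delta_x), x_values))
    return trace_values(x_values, derivative_values, mode = 'lines+markers')

def function_and_derivative_trace(list_of_terms, x_values, delta_x):
    traced_function = function_values_trace(list_of_terms, x_values)
    traced_derivative = derivative_values_trace(list_of_terms, x_values, delta_x)
    return make_subplots([traced_function], [traced_derivative])

four_x_plus_fifteen_function_and_derivative = function_and_derivative_trace(four_x_plus_fifteen, list(range(0, 7)), 1)

plot_figure(four_x_plus_fifteen_function_and_derivative)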

![newplot.png](attachment:newplot.png)

### Summary

In this section, we coded out our function for calculating and plotting the derivative.  We started by seeing how we can represent different types of functions.  Then we moved on to writing the `output_at` function, which evaluates a provided function at a value of x.  We calculated `delta_f` by subtracting the output at the initial x value from the output at that initial x plus delta x.  After calculating `delta_f`, we moved on to our `derivative_of` function, which simply divides `delta_f` by `delta_x`.  

In the final section, we introduced some new functions, `delta_f_trace` and `delta_x_trace` that plot our deltas on the graph.  Then we introduced the `derivative_trace` function that shows the rate of change, or slope, for the function between initial x and initial x plus delta x.