# Gradient Descent in 3d

### Learning Objectives

* Understand how the process of gradient descent works when altering both the y-intercept and slope variables
* Understand what it means to take a partial derivative 
* Understand the rule for taking partial derivatives

### Introduction

In the last section, we talked about how to think about moving along a 3-d cost curve.

![grafik.png](attachment:grafik.png)

We know that moving along the 3-d cost curve above means changing the $m$ and $b$ variables of a regression line like the one below.  We do so with the purpose of having our line better match our data.

![grafik.png](attachment:grafik.png)

### Review gradient descent in two dimensions

In this lesson, we'll learn about gradient descent in three dimensions, but let's first remember how it worked in two dimensions when we changed just one variable of our regression line.  

In two dimensions, when changing just one variable, $m$ or $b$, gradient descent means stepping forwards or backwards along the cost curve, taking a step of a specific size.  To determine whether to move forwards or backwards, as well as the step size, we imagine standing on this two-dimensional curve (shown below) and feeling the slope of our cost curve to tell us how to move.  A step in a direction means a change in one of our regression variables.

![grafik.png](attachment:grafik.png)
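As a rough sketch of that one-variable procedure, the snippet below walks down a made-up cost curve.  The function `cost(m) = (m - 3)**2`, the starting value of `m`, and the `learning_rate` are all assumptions chosen for illustration; they are not the cost curve pictured above.

```python
# Hypothetical one-variable cost curve with its minimum at m = 3.
def cost(m):
    return (m - 3) ** 2

def slope_of_cost(m, nudge=1e-6):
    # Estimate the slope by nudging m slightly in both directions.
    return (cost(m + nudge) - cost(m - nudge)) / (2 * nudge)

def descend(m, learning_rate=0.1, steps=50):
    for _ in range(steps):
        # A positive slope moves us backwards, a negative slope forwards.
        # A steeper slope produces a larger step.
        m = m - learning_rate * slope_of_cost(m)
    return m

final_m = descend(m=10.0)
print(round(final_m, 3))
```

Starting from `m = 10`, the repeated steps should land very close to the minimum of this hypothetical curve at `m = 3`.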

So that was gradient descent in two dimensions.  What is gradient descent in three dimensions? 

### Gradient Descent in 3 dimensions

In three dimensions, we once again choose an initial regression line, which means that we are choosing a point on the graph below.  Then we begin taking steps towards the minimum.  But of course, we are now able to walk not just forwards and backwards but left and right as well -- as we now can alter two variables.  

![grafik.png](attachment:grafik.png)

To get a sense of how this works, imagine our initial regression line places us at the back-left corner of the graph above, with a slope of 50, and y-intercept of negative 20.  Now imagine that we cannot see the rest of the graph - yet we still want to approach the minimum.  How do we do this?

Once again, we feel out the slope of the graph with our feet.  Only this time, as we shift our feet, we are preparing to walk in two dimensional space.  

![](./traveller-stepping.jpg)

So this is our approach.  We shift horizontally a little bit to determine the change in output in the right-left direction, and then shift forward and back to determine the change in output in that direction.  From there we take the next step in the direction of the steepest descent. 

So this is why our technique of gradient descent is so powerful.  Once we consider that in moving towards our best fit lines, we have a choice of moving anywhere in a two-dimensional space, then using the slope to guide us only becomes more important.    

So how does this approach of shifting back and forth translate mathematically?  It means we determine the slope in one dimension, then the other.  Then, we move in the direction where that slope is steepest downwards.  This moves us towards our minimum.  
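The idea of feeling out the slope in each direction and then stepping downhill can be sketched in a few lines.  This is only an illustration under stated assumptions: the bowl-shaped `cost(m, b)` below is invented (its minimum sits at `m = 2`, `b = 1`), and the `learning_rate` and number of steps are arbitrary.  Only the starting point, a slope of 50 and a y-intercept of -20, comes from the text.

```python
# Hypothetical bowl-shaped cost surface with its minimum at m = 2, b = 1.
def cost(m, b):
    return (m - 2) ** 2 + (b - 1) ** 2

def slopes(m, b, nudge=1e-6):
    # Shift only m to feel the slope in the m direction,
    # then shift only b to feel the slope in the b direction.
    slope_m = (cost(m + nudge, b) - cost(m - nudge, b)) / (2 * nudge)
    slope_b = (cost(m, b + nudge) - cost(m, b - nudge)) / (2 * nudge)
    return slope_m, slope_b

def descend(m, b, learning_rate=0.1, steps=100):
    for _ in range(steps):
        slope_m, slope_b = slopes(m, b)
        # Step against each slope, so we move downhill in both directions at once.
        m -= learning_rate * slope_m
        b -= learning_rate * slope_b
    return m, b

final_m, final_b = descend(50.0, -20.0)
print(round(final_m, 3), round(final_b, 3))
```

The two slopes computed in `slopes` are exactly the quantities that the next section names partial derivatives.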

### Partial Derivatives

To measure the slope in each dimension, one after the other, we'll take the derivative with respect to one variable, and then take the derivative with respect to another variable.  Now let's be very explicit about what it means to take the partial derivative with respect to a variable.

Let's again talk about this procedure in general, and then we'll apply it to the cost curve.  So let's revisit our multivariable function: 

$$f(x, y) = y*x^2 $$

Remember that the function looks like the following: 

![](./parabolayx2.png)

To take a derivative with respect to $x$ means to ask how the output changes as we make a nudge only in the $x$ direction.  To express that we are nudging in the $x$ direction we write $\frac{\partial f}{\partial x}$.  The symbol $\partial$ is a stylized letter d, used to mark a partial derivative.  We read this as taking the partial derivative of $f$ with respect to $x$.  But it just means seeing the change in output as we nudge in the $x$ direction.  

And to express the change in output with respect to $y$, we write $\frac{\partial f}{\partial y}$.  This just means calculating the change in output as we nudge our input over in the $y$ direction.
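To make the "nudge" concrete, here is a minimal sketch that estimates both changes numerically for $f(x, y) = y*x^2$.  The helper names `nudge_in_x` and `nudge_in_y` and the nudge size are assumptions made for this example.

```python
def f(x, y):
    return y * x ** 2

def nudge_in_x(x, y, nudge=1e-6):
    # Change only x, hold y fixed, and measure the change in output per unit of nudge.
    return (f(x + nudge, y) - f(x, y)) / nudge

def nudge_in_y(x, y, nudge=1e-6):
    # Change only y, hold x fixed, and measure the change in output per unit of nudge.
    return (f(x, y + nudge) - f(x, y)) / nudge

change_x = nudge_in_x(3, -1)
change_y = nudge_in_y(3, -1)
print(round(change_x, 3), round(change_y, 3))
```

At the point where $x = 3$ and $y = -1$, the estimates come out at roughly $-6$ in the $x$ direction and $9$ in the $y$ direction.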

### Visualizing the partial derivative

So what does a derivative $\frac{\partial f}{\partial x}$ look like? How do we think of a partial derivative of a multivariable function?

Well, remember how we think of a standard derivative of a one variable function, for example $f(x) = x^2 $. 

![grafik.png](attachment:grafik.png)

So in two dimensions, to take the derivative at a given point, we simply calculate the slope of the function at that x value.

Now the partial derivative of a multivariable function is fairly similar.  But here it's equal to the slope of the tangent line at a specific $x$ value **and** a specific $y$ value.  Let's break this down by using our patented "freeze-frame" method.  The graphs below show lines tangent to the curve in the $x$ direction.  (The tangent lines are a little small, but they and their corresponding slopes are there.) 

#### Graphs for $\frac{\partial f}{\partial x}$

![grafik.png](attachment:grafik.png)

Let's take a close look.  The top left graph shows $\frac{\partial f}{\partial x}$ at different points of $f(x, y)$ where $y = -1$.  So as you can see, $\frac{\partial f}{\partial x}$ at the point $(x, y) = (3, -1)$ equals $-6$, as shown by the green line in the top left.  That's because when you move to that point on the graph, $(3, -1)$, and then nudge a little bit in the $x$ direction, the output changes at a rate of $-6$ per unit of $x$.  That rate is represented by the line tangent to the function at that point in the $x$ direction.  You can go through the other points in these graphs, and work through the same logic. 

So with taking the partial derivative $\frac{\partial f}{\partial x}$, you may think about moving to the slice of the graph for a given value of $y$, then moving to the proper value of $x$, and then finding the tangent line at that point.  

As you can see, $\frac{\partial f}{\partial x}$ means the change in output from a nudge in the $x$ direction, but the derivative is still influenced by the $y$ component of the function.  You can see this because for different values of $y$, our slice of the graph looks different, and thus the tangent lines for those slices look different.

### One more example

This can be a little mind-bending, so let's go through this again for $\frac{\partial f}{\partial y}$ where $f(x,y) = yx^2$.  Once again, the 3-d graph of $f(x,y) = yx^2$ is the following: 

![](./parabolayx2.png)

Now for $\frac{\partial f}{\partial y}$ of a function $f(x, y) $ you can think of sliding through different slices of the function, but this time for different values of $x$.  So again, we have our freeze frame, but this time each frame represents ascending values along the x axis.  

First let's understand our plots below -- they may be surprising.  Starting at the top left quadrant, the graph of the function $f(x,y)$ makes sense, as when $x = -1$ the function is just $f(-1, y) = (-1)^2*y = y$.  And moving down to the bottom left, $f(2, y) = 2^2*y = 4y$.  

So now, to think about taking the derivative, once again we move to a slice of the graph for a value of $x$, and then nudge in the $y$ direction.  So $\frac{\partial f}{\partial y}$ along the slice $f(1, y)$ equals $1$.  We know that the derivative of a line is always just equal to the line's slope.  For $f(1, y)$ that slope, and thus the derivative, is always $1$.  For $f(2, y)$ it's $4$.

#### Graphs for $\frac{\partial f}{\partial y}$

![grafik.png](attachment:grafik.png)
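The slopes discussed above can be reproduced with a small sketch that freezes $x$ and measures the slope of the resulting line in $y$.  The helpers `slice_at_x` and `slope_of_line` are hypothetical names introduced only for this illustration.

```python
def f(x, y):
    return y * x ** 2

def slice_at_x(x_value):
    # Freeze x, leaving a single-variable function of y (one "frame").
    return lambda y: f(x_value, y)

def slope_of_line(line, y_start=0, y_end=1):
    # Each slice is a straight line in y, so rise over run gives its slope.
    return (line(y_end) - line(y_start)) / (y_end - y_start)

slopes_by_x = {x_value: slope_of_line(slice_at_x(x_value)) for x_value in [1, 2, 3]}
print(slopes_by_x)
```

Each slice has a single slope no matter where along $y$ we measure it, and that slope grows as the frozen value of $x$ grows.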

So that is our technique for a partial derivative.  For $\frac{\partial f}{\partial y} $ we move to a slice of the curve at a specific value of $x$, move to the point for $y$, and then calculate the change in output as we nudge in the $y$ direction.  

For $\frac{\partial f}{\partial x}$ (again below), we move to a slice of the curve at a specific value of $y$, move to the correct value of $x$, and then calculate how much the output changes as we nudge in the $x$ direction.  Just think slide, slide, then nudge.  That's a partial derivative.

#### Graphs for $\frac{\partial f}{\partial x}$

![grafik.png](attachment:grafik.png)

### Our rule for partial derivatives

Ok, so now that you understand the slide, slide, nudge, maybe you can understand this little shortcut that we can pull.  For any multivariable function, the variables that you are **not** taking the derivative with respect to, can just be treated as a constant.

For example, with our function $f(x, y) = y*x^2 $, when taking the partial derivative $\frac{\partial f}{\partial y}$, we treat $x$ as a constant.  Let's do it:

$$\frac{\partial f}{\partial y} = \frac{\partial}{\partial y}(y) * x^2 = 1*x^2 = x^2$$

So that's all it means to take a partial derivative of something: look at what you are taking a derivative with respect to, and only take the derivative of that variable.  And guess what, this result lines up with what we saw earlier.

![grafik.png](attachment:grafik.png)

We calculated that $\frac{\partial f}{\partial y} = x^2 $, and that is what the graphs show.  When $x = 2$ our derivative is always 4.  And when $x$ is $3$ the derivative is always 9.  So even though we are taking $\frac{\partial f}{\partial y}$, the $x$ value is influencing the steepness of that line.  But by the time we get to our nudge, that value of $x$ is **constant**: its influence has already been applied, and then we are seeing how the output changes as we nudge in the $y$ direction.

Now let's try our rule one more time, this time taking $\frac{\partial f}{\partial x}$ for our function $f(x, y) = y*x^2 $.

$$\frac{\partial f}{\partial x} = y*\frac{\partial}{\partial x}(x^2) = 2*y*x$$

So this time with $\frac{\partial f}{\partial x}$, we treat $y$ as a constant, as the influence of $y$ is first applied by moving to a slice of our graph for a value of $y$.  Then once there, we are evaluating the change in output as we nudge in the $x$ direction.   

![grafik.png](attachment:grafik.png)
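One way to gain confidence in the rule is to compare the two formulas it produces, $2*y*x$ and $x^2$, against numerical nudges.  The sketch below does that at a few arbitrarily chosen points; the helper `central_difference` and the test points are assumptions made for this example.

```python
def f(x, y):
    return y * x ** 2

def df_dx(x, y):
    # Treat y as a constant: the derivative of y * x^2 with respect to x is 2 * y * x.
    return 2 * y * x

def df_dy(x, y):
    # Treat x as a constant: the derivative of y * x^2 with respect to y is x^2.
    return x ** 2

def central_difference(func, x, y, direction, nudge=1e-5):
    # Nudge in only one direction and measure the change in output per unit of nudge.
    if direction == 'x':
        return (func(x + nudge, y) - func(x - nudge, y)) / (2 * nudge)
    return (func(x, y + nudge) - func(x, y - nudge)) / (2 * nudge)

points = [(1, 3), (2, -1), (3, 2)]
max_error = 0.0
for x, y in points:
    error_x = abs(df_dx(x, y) - central_difference(f, x, y, 'x'))
    error_y = abs(df_dy(x, y) - central_difference(f, x, y, 'y'))
    max_error = max(max_error, error_x, error_y)

print(max_error)
```

The largest disagreement should be tiny, limited only by floating point rounding, which suggests that treating the other variable as a constant gives the same answer as actually nudging.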

### Summary

In this section, we have learned how to think about taking the partial derivative of a function.  For the partial derivative, we say we are taking the derivative with respect to a variable.  So for example, for the function $f(x, y)$ we can take the partial derivative with respect to the variable $x$.  This means we are assessing the change in output after nudging in the $x$ direction, and we can express this as $\frac{\partial f}{\partial x} $.  Our rule for taking the partial derivative is to treat the variables that we are not taking the derivative with respect to as constants.  This makes sense because, at the time that we are taking the derivative by making our "nudge", the only variable that is changing is the variable we are taking the derivative with respect to.

# Partial Derivatives Lab

### Introduction

In this lesson, we'll get some more practice with partial derivatives.

### Breaking down multivariable functions

In our explanation of derivatives, we discussed how taking the derivative of a multivariable function is similar to taking the derivative of a single variable function like $f(x)$.  In the first section we'll work up to taking the partial derivative of the multilinear function $ f(x,y) = 3xy $.  Here's what the function looks like in a 3d graph.

![grafik.png](attachment:grafik.png)

Before we get there, let's first break this function down into its equivalent set of slices, like we have done previously.  We'll do this by stepping through various values of $y$.  So instead of considering the entire function, $f(x, y) = 3xy $, we can think about the function $f(x, y)$ evaluated at various points, where $y = 1$, $y = 3$, $y = 6$, and $y = 9$.

Write out Python functions that return the values $f(x, y)$ for $f(x, 1)$, $f(x, 3)$, $f(x, 6)$, and $f(x, 9)$ for the function $f(x, y) = 3xy $.

In [2]:
def three_x_y_at_one(x):
    return 3*x*1

def three_x_y_at_three(x):
    return 3*x*3

def three_x_y_at_six(x):
    return 3*x*6

def three_x_y_at_nine(x):
    return 3*x*9

In [3]:
three_x_y_at_one(3) # 9
three_x_y_at_three(3)# 27
three_x_y_at_six(1) # 18
three_x_y_at_six(2) # 36

36

Now that we have our functions written, we can write functions that, provided an argument of `x_values`, return the associated `y_values` that our four functions return.  

In [4]:
zero_to_ten = list(range(0, 11))
zero_to_four = list(range(0, 5))

def y_values_for_at_one(x_values):
    return list(map(lambda x: three_x_y_at_one(x), x_values))

def y_values_for_at_three(x_values):
    return list(map(lambda x: three_x_y_at_three(x), x_values))

def y_values_for_at_six(x_values):
    return list(map(lambda x: three_x_y_at_six(x), x_values))

def y_values_for_at_nine(x_values):
    return list(map(lambda x: three_x_y_at_nine(x), x_values))

In [5]:
y_values_for_at_one(zero_to_four) # [0, 3, 6, 9, 12]
y_values_for_at_one(zero_to_ten) # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

y_values_for_at_three(zero_to_four) # [0, 9, 18, 27, 36]
y_values_for_at_three(zero_to_ten) # [0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90]

y_values_for_at_six(zero_to_four) # [0, 18, 36, 54, 72]
y_values_for_at_six(zero_to_ten) # [0, 18, 36, 54, 72, 90, 108, 126, 144, 162, 180]

y_values_for_at_nine(zero_to_four) # [0, 27, 54, 81, 108]
y_values_for_at_nine(zero_to_ten) #[0, 27, 54, 81, 108, 135, 162, 189, 216, 243, 270]

[0, 27, 54, 81, 108, 135, 162, 189, 216, 243, 270]

Now we are ready to plot the functions $f(x, 1) = 3x $, $f(x, 3) = 9x $, $f(x, 6) = 18x $ and $f(x, 9) = 27x $.

In [6]:
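# from graph import trace_values
def trace_values(x_values, y_values, mode='markers', name="data", text=None):
    # Build a plotly-style trace dictionary; `text=None` avoids a shared mutable default list.
    return {'x': x_values, 'y': y_values, 'mode': mode, 'name': name, 'text': text if text is not None else []}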

y_at_one_trace = trace_values(zero_to_ten, y_values_for_at_one(zero_to_ten), mode = 'lines+markers', name = 'f(x, y) at y=1') or {}

y_at_three_trace = trace_values(zero_to_ten, y_values_for_at_three(zero_to_ten),  mode = 'lines+markers',  name = 'f(x, y) at y=3') or {}

y_at_six_trace = trace_values(zero_to_ten, y_values_for_at_six(zero_to_ten),  mode = 'lines+markers', name = 'f(x, y) at y=6') or {}

y_at_nine_trace = trace_values(zero_to_ten, y_values_for_at_nine(zero_to_ten),  mode = 'lines+markers', name = 'f(x, y) at y=9') or {}


In [7]:
import plotly
from plotly.graph_objs import Scatter, Layout
from plotly.offline import init_notebook_mode, iplot
from IPython.display import display, HTML

init_notebook_mode(connected=True)

fig_constants_lin_functions = {
    "data": [y_at_one_trace, y_at_three_trace, y_at_six_trace, y_at_nine_trace],
    "layout": Layout(title="constants with linear functions")
}
plotly.offline.iplot(fig_constants_lin_functions)

So as you can see, plotting our multivariable $f(x, y)$ at different values of $y$ above corresponds conceptually to having one plot step through these values of $y$. 

![](./plot3xy.png)

### Evaluating the partial derivative

So in the above section, we saw how we can think of representing our multivariable function as a function evaluated at different values of $y$.

In [8]:
plotly.offline.iplot(fig_constants_lin_functions)

Now let's think about how to take the partial derivative $ \frac{\partial f}{\partial x}$ of $f(x, y)$ at given values of $y$.  Knowing how to think about partial derivatives of multivariable functions, what is $ \frac{\partial f}{\partial x} $ at each of the following values of $y$?

In [9]:
def df_dx_when_y_equals_one():
    return 3*1

In [10]:
def df_dx_when_y_equals_three():
    return 3*3

In [11]:
def df_dx_when_y_equals_six():
    return 3*6

In [12]:
def df_dx_when_y_equals_nine():
    return 3*9

So notice that there is a pattern here, in taking $ \frac{\partial f}{\partial x}$ for our function $f(x, y) = 3xy$.  Now write a function that calculates $ \frac{\partial f}{\partial x}$ for our function $f(x,y)$ at any provided $x$ and $y$ value. 

In [13]:
def df_dx_3xy(x_value, y_value):
    return 3*y_value

In [14]:
df_dx_3xy(2, 1) # 3

3

In [15]:
df_dx_3xy(2, 2) # 6

6

In [16]:
df_dx_3xy(5, 2) # 6

6

So as you can see, our $y$ value influences the function, and from there it's a calculation of $\frac{\Delta f}{\Delta x}$, which in this case is constant: it depends on $y$ but not on $x$.
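As a quick sanity check on that claim, the sketch below estimates $\frac{\Delta f}{\Delta x}$ numerically for $f(x, y) = 3xy$ at several values of $x$ while holding $y = 2$.  The helper `approximate_df_dx`, the nudge size, and the chosen $x$ values are assumptions for this example.

```python
def three_x_y(x, y):
    return 3 * x * y

def approximate_df_dx(x, y, nudge=1e-6):
    # Nudge only x and measure the change in output per unit of nudge.
    return (three_x_y(x + nudge, y) - three_x_y(x, y)) / nudge

# Hold y at 2 and slide x through several values.
estimates = [round(approximate_df_dx(x, 2), 3) for x in [0, 2, 5, 10]]
print(estimates)
```

Every estimate comes out at about 6, which matches `df_dx_3xy(2, 2)` and `df_dx_3xy(5, 2)` above: the value of $x$ does not matter, only the value of $y$.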

## Using our partial derivative rule

Now let's consider the function $ f(x, y) = 4x^2y + 3x + y$.  Soon we will want to take the derivative of this function with respect to $x$.  To do that, we will need to translate this function into code, and when we do so, we will need to capture the constant of each term as well as its exponents.

Remember that the way we expressed a single variable function, $f(x)$ in Python was to represent the constant, and $x$ exponent for each term.  For example, the function $f(x) = 3x^2 + 2x$ can be represented as the following:

In [17]:
three_x_squared_plus_two_x = [(3, 2), (2, 1)]

Now let's talk about representing our multivariable function $ f(x, y) = 4x^2y + 3x + y$ in Python.  Instead of using a tuple with two elements, we'll use a tuple with three elements, where the third element is the exponent of the $y$ variable.  So our function $ f(x, y) = 4x^2y + 3x + y$ looks like the following:

In [18]:
four_x_squared_y_plus_three_x_plus_y = [(4, 2, 1), (3, 1, 0), (1, 0, 1)]

Let's get started by writing a function `multivariable_output_at` that takes in a multivariable function and returns the value $f(x, y)$ evaluated at a specific value of $x$ and $y$ for the function.

In [19]:
def multivariable_output_at(list_of_terms, x_value, y_value):
    output = []
    for term in list_of_terms:
        constant = term[0]
        x_exponent = term[1]
        y_exponent = term[2]
        term_output = constant * (x_value**x_exponent) * (y_value**y_exponent)
        output.append(term_output)
    return sum(output)

In [20]:
multivariable_output_at(four_x_squared_y_plus_three_x_plus_y, 1, 1) # 8

8

In [21]:
multivariable_output_at(four_x_squared_y_plus_three_x_plus_y, 2, 2) # 40

40

Let's also try this with another function $g(x, y) = 2x^3y + 3yx + x $.

In [22]:
two_x_cubed_y_plus_three_y_x_plus_x = [(2, 3, 1), (3, 1, 1), (1, 1, 0)]

In [23]:
multivariable_output_at(two_x_cubed_y_plus_three_y_x_plus_x, 1, 1) # 6

6

In [24]:
multivariable_output_at(two_x_cubed_y_plus_three_y_x_plus_x, 2, 2) # 46

46

So now we want to write a Python function that calculates $\frac{\partial f}{\partial x}$ of a multivariable function.  Let's start by writing a function that just calculates $\frac{\partial f}{\partial x}$ of a single term.

In [25]:
# f(x) = 4 * x^2 * y^1
# f(x) = term[0] * x^term[1] * y^term[2]

def term_df_dx(term):
    constant = term[0]*term[1]
    exponent = term[1] - 1
    y_constant = term[2]
    return (constant, exponent, y_constant)

In [26]:
four_x_squared_y = (4, 2, 1)
term_df_dx(four_x_squared_y) # (8, 1, 1) 

(8, 1, 1)

> This solution represents $8xy$

In [27]:
y = (1, 0, 1)
term_df_dx(y) # (0, -1, 1)

(0, -1, 1)

> This solution represents $0$, as the first element indicates we are multiplying the term by zero.

Now write a function that finds the derivative of all terms, $\frac{\partial f}{\partial x}$ of a function $f(x, y)$.

In [28]:
def df_dx(list_of_terms):
    all_terms = list(map(lambda term: term_df_dx(term), list_of_terms))
    # Drop terms whose constant is zero; != 0 keeps terms with negative constants.
    return list(filter(lambda each_term: each_term[0] != 0, all_terms))

In [29]:
df_dx(four_x_squared_y_plus_three_x_plus_y) # [(8, 1, 1), (3, 0, 0)]

[(8, 1, 1), (3, 0, 0)]
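Because the derivative is returned in the same list-of-tuples form as the original function, it can be evaluated at a point with the same kind of evaluator.  The block below is a self-contained sketch that restates the lab's functions in compact form; `df_dx_at` is a hypothetical helper added for illustration and is not part of the lab.

```python
def multivariable_output_at(list_of_terms, x_value, y_value):
    # Sum constant * x^x_exponent * y^y_exponent over every term.
    return sum(constant * x_value ** x_exponent * y_value ** y_exponent
               for constant, x_exponent, y_exponent in list_of_terms)

def term_df_dx(term):
    constant, x_exponent, y_exponent = term
    # Power rule in x; the y exponent is carried along unchanged.
    return (constant * x_exponent, x_exponent - 1, y_exponent)

def df_dx(list_of_terms):
    all_terms = [term_df_dx(term) for term in list_of_terms]
    # Drop terms that the derivative reduced to zero.
    return [term for term in all_terms if term[0] != 0]

def df_dx_at(list_of_terms, x_value, y_value):
    # Hypothetical helper: differentiate first, then evaluate at the point.
    return multivariable_output_at(df_dx(list_of_terms), x_value, y_value)

four_x_squared_y_plus_three_x_plus_y = [(4, 2, 1), (3, 1, 0), (1, 0, 1)]
slope_at_two_two = df_dx_at(four_x_squared_y_plus_three_x_plus_y, 2, 2)
print(slope_at_two_two)
```

Dropping the zero terms also matters for evaluation: a term such as `(0, -1, 1)` would raise a `ZeroDivisionError` at `x_value = 0`, because Python cannot raise zero to a negative power.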

Now that we have done this for $\frac{\partial f}{\partial x}$, let's work on taking the derivative $\frac{\partial f}{\partial y}$.  Once again, we can use as an example our function $ f(x, y) = 4x^2y + 3x + y$.  Let's start with writing the function `term_df_dy`, which takes the partial derivative $\frac{\partial f}{\partial y}$ of a single term.

In [30]:
# f(x) = 4 * x^2 * y^1
# f(x) = term[0] * x^term[1] * y^term[2]

def term_df_dy(term):
    constant = term[0]*term[2]
    x_constant = term[1]
    exponent = term[2] - 1
    return (constant, x_constant, exponent)

In [31]:
four_x_squared_y # (4, 2, 1)

(4, 2, 1)

In [32]:
term_df_dy(four_x_squared_y) # (4, 2, 0)

(4, 2, 0)

> This represents that $\frac{\partial}{\partial y}4x^2y = 4x^2$

In [33]:
term_df_dy(y) # (1, 0, 0)

(1, 0, 0)

> This represents that $\frac{\partial}{\partial y}y = 1$

In [34]:
three_x = four_x_squared_y_plus_three_x_plus_y[1]
term_df_dy(three_x) # (0, 1, -1)

(0, 1, -1)

> This represents that $\frac{\partial}{\partial y}3x = 0$

Now let's write a function `df_dy` that takes multiple terms and returns a list of tuples that represent the derivative of our multivariable function.  So here is our function: $ f(x, y) = 4x^2y + 3x + y$.

In [35]:
four_x_squared_y_plus_three_x_plus_y

[(4, 2, 1), (3, 1, 0), (1, 0, 1)]

In [36]:
def df_dy(list_of_terms):
    all_terms = list(map(lambda term: term_df_dy(term), list_of_terms))
    # Drop terms whose constant is zero; != 0 keeps terms with negative constants.
    return list(filter(lambda each_term: each_term[0] != 0, all_terms))

In [37]:
df_dy(four_x_squared_y_plus_three_x_plus_y) # [(4, 2, 0), (1, 0, 0)]

[(4, 2, 0), (1, 0, 0)]

In [38]:
two_x_cubed_y_plus_three_y_x_plus_x = [(2, 3, 1), (3, 1, 1), (1, 1, 0)]
df_dy(two_x_cubed_y_plus_three_y_x_plus_x) # [(2, 3, 0), (3, 1, 0)]

[(2, 3, 0), (3, 1, 0)]
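With both partial derivatives in hand, the two slopes at a point can be collected into a pair, which is the information gradient descent needs in order to choose a direction.  The following is a self-contained sketch under stated assumptions: `nonzero_terms`, `output_at`, and `gradient_at` are hypothetical helper names, and the term functions are compact restatements of the ones written in this lab.

```python
def term_df_dx(term):
    constant, x_exponent, y_exponent = term
    return (constant * x_exponent, x_exponent - 1, y_exponent)

def term_df_dy(term):
    constant, x_exponent, y_exponent = term
    return (constant * y_exponent, x_exponent, y_exponent - 1)

def nonzero_terms(list_of_terms):
    # Remove terms that a derivative reduced to zero.
    return [term for term in list_of_terms if term[0] != 0]

def output_at(list_of_terms, x_value, y_value):
    return sum(constant * x_value ** x_exponent * y_value ** y_exponent
               for constant, x_exponent, y_exponent in list_of_terms)

def gradient_at(list_of_terms, x_value, y_value):
    # Slope in the x direction, then slope in the y direction, at one point.
    df_dx_terms = nonzero_terms([term_df_dx(term) for term in list_of_terms])
    df_dy_terms = nonzero_terms([term_df_dy(term) for term in list_of_terms])
    return (output_at(df_dx_terms, x_value, y_value),
            output_at(df_dy_terms, x_value, y_value))

four_x_squared_y_plus_three_x_plus_y = [(4, 2, 1), (3, 1, 0), (1, 0, 1)]
gradient_at_one_one = gradient_at(four_x_squared_y_plus_three_x_plus_y, 1, 1)
print(gradient_at_one_one)
```

For $f(x, y) = 4x^2y + 3x + y$ the partial derivatives are $8xy + 3$ and $4x^2 + 1$, so at the point $(1, 1)$ the pair is $(11, 5)$.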

Great job! Hopefully, now you understand a little more about multivariable functions and derivatives!