# Here, we will do the derivations. Confidently. ✊🏻

In the upcoming lectures we will get the chance to do the same thing with **structured** steps, but right now I want to do it the old-school way: our way.

**Some things to keep in mind**:
1. The derivative is a value, calculated with respect to a given variable at a given value of that variable.
2. In other words: **how much does B change when A changes slightly, at A = 2?** That **2** matters here. The slope can be different at other values.
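
To make the second point concrete, here is a minimal sketch. The helper `slope_at` and the function `square` are made up for this illustration and are not part of the lecture. It nudges the input a little and measures how the output reacts, at two different points:

```python
def slope_at(f, x, h=1e-6):
    """Central-difference estimate of the slope of f at x (illustrative helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(a):
    return a * a


# Same function, two different points, two different slopes
print(slope_at(square, 2.0))  # close to 4
print(slope_at(square, 5.0))  # close to 10
```

The function is the same in both calls; only the point at which the slope is measured changes.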

In [3]:
import numpy as np

In [4]:
def compute_derivative(function: object, nudge_what: str, values: dict):
    '''
    Estimate the derivative of `function` with respect to `nudge_what`
    at the point given by `values`, using a forward difference.
    Only a single variable is nudged per call.
    '''
    tiny_nudge = 1e-8
    old_result = function(**values)
    print(f"{values=}", end="")
    values = dict(values)  # work on a copy so the caller's dict is not mutated
    values[nudge_what] += tiny_nudge
    print(f" | {values=}")

    new_result = function(**values)
    diff = new_result - old_result

    print(f"{old_result=} | {new_result=}\n")
    return diff / tiny_nudge

# 1️⃣ EXAMPLE: Simple

<img src="./images/computation-graph.png">

## Starting from the back

#### `1.` How does `J` change with `v`?

In [5]:
def j(v):
    return 3 * v

In [6]:
djdv = compute_derivative(j, "v", dict(v=11))
djdv

values={'v': 11} | values={'v': 11.00000001}
old_result=33 | new_result=33.00000003



3.000000248221113

👉🏻 The slope of `J` with respect to `v` is `3`.

#### `2.` How does `J` change with `a`?

In [7]:
def v(a, u):
    return a + u

In [8]:
dvda = compute_derivative(v, "a", dict(a=5, u=6))
dvda

values={'a': 5, 'u': 6} | values={'a': 5.00000001, 'u': 6}
old_result=11 | new_result=11.00000001



1.000000082740371

⛓️‍ But the chain rule...

In [9]:
djda = dvda * djdv
djda

3.0000004964422464

👉🏻 The slope of `J` with respect to `a` is `3`.
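
Written out, the chain rule step used here is the following, with $v = a + u$ and $J = 3v$ as defined in the cells above:

```latex
\frac{dJ}{da} = \frac{dJ}{dv} \cdot \frac{dv}{da} = 3 \cdot 1 = 3
```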

#### `3.` How does `J` change with `u`?

In [10]:
def v(a, u):
    return a + u

In [11]:
dvdu = compute_derivative(v, "u", dict(a=5, u=6))
dvdu

values={'a': 5, 'u': 6} | values={'a': 5, 'u': 6.00000001}
old_result=11 | new_result=11.00000001



1.000000082740371

⛓️‍ But the chain rule...

In [12]:
djdu = dvdu * djdv
djdu

3.0000004964422464

👉🏻 The slope of `J` with respect to `u` is `3`.

#### `4.` How does `J` change with `b`?

In [13]:
def u(b, c):
    return b * c

In [14]:
dudb = compute_derivative(u, "b", dict(b=3, c=2))
dudb

values={'b': 3, 'c': 2} | values={'b': 3.00000001, 'c': 2}
old_result=6 | new_result=6.00000002



1.999999987845058

⛓️‍ But the chain rule...

In [15]:
# full chain: dJ/db = du/db * dv/du * dJ/dv; dv/du is 1 here, so it is left out
djdb = dudb * djdv
djdb

6.0000004599773975

👉🏻 The slope of `J` with respect to `b` is `6`.

#### `5.` How does `J` change with `c`?

In [16]:
def u(b, c):
    return b * c

In [17]:
dudc = compute_derivative(u, "c", dict(b=3, c=2))
dudc

values={'b': 3, 'c': 2} | values={'b': 3, 'c': 2.00000001}
old_result=6 | new_result=6.00000003



2.999999981767587

⛓️‍ But the chain rule...

In [18]:
# full chain: dJ/dc = du/dc * dv/du * dJ/dv; dv/du is 1 here, so it is left out
djdc = dudc * djdv
djdc

9.000000689966095

👉🏻 The slope of `J` with respect to `c` is `9`.
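
As a sanity check on the five slopes, here is a sketch that nudges the inputs of the whole graph at once instead of going node by node. The function `J(a, b, c)` simply composes the three nodes defined above; the helper `nudge` is a name made up for this illustration.

```python
def J(a, b, c):
    # Whole graph in one go: u = b * c, v = a + u, J = 3 * v
    u = b * c
    v = a + u
    return 3 * v


def nudge(f, name, values, h=1e-6):
    """Central-difference slope of f with respect to values[name] (illustrative helper)."""
    up = dict(values, **{name: values[name] + h})
    down = dict(values, **{name: values[name] - h})
    return (f(**up) - f(**down)) / (2 * h)


point = dict(a=5.0, b=3.0, c=2.0)
grads = {name: nudge(J, name, point) for name in point}

# Hand-derived slopes for comparison: dJ/da = 3, dJ/db = 3 * c, dJ/dc = 3 * b
expected = dict(a=3.0, b=3.0 * point["c"], c=3.0 * point["b"])

for name in point:
    print(name, round(grads[name], 6), expected[name])
```

If the chain rule was applied correctly, the end-to-end slopes should agree with the node-by-node products up to floating-point noise.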

# 2️⃣ EXAMPLE: LogReg

<img src="./images/log-reg-comp-graph.png">

Here, we will plug in some example numbers to make sure the values that Andrew has written make sense. The worked example looks like this 👇🏻

<img src="./images/log-reg-comp-graph-annotated.png">

#### `1.` (CODE) How does `L` change with `a`?

In [117]:
def l(a, y):
    '''
    Simple and straightforward, but ignores the dtype and the edge cases (a = 0 or a = 1)
    '''
    return - ((y * np.log(a)) + ((1 - y) * np.log(1 - a)))

In [118]:
def l(y, a, eps=None, dtype=np.float64):
    '''
    Same as above, but takes care of the edge cases
    '''
    # choose epsilon based on dtype to avoid 1 - eps == 1 for float32
    if eps is None:
        eps = np.finfo(dtype).eps  # machine epsilon for dtype
    a = np.asarray(a, dtype=dtype)
    y = np.asarray(y, dtype=dtype)
    a = np.clip(a, eps, 1 - eps)   # keep in (0,1)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

In [119]:
dlda = compute_derivative(l, "a", dict(a=0.999, y=1))
dlda

values={'a': 0.999, 'y': 1} | values={'a': 0.99900001, 'y': 1}
old_result=0.0010005003335835344 | new_result=0.0010004903235735242



-1.0010010010122472

#### `1.` (ANALYTICAL) How does `L` change with `a`?

```
dlda = - (y / a) + ((1-y) / (1 - a))
```

In [120]:
a = 0.999
y = 1
- (y / a) + ((1-y) / (1 - a))

-1.001001001001001
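
For reference, the analytical expression follows from differentiating the loss term by term. This is a short derivation using the same `L`, `a` and `y` as above:

```latex
L(a, y) = -\bigl(y \log a + (1 - y) \log(1 - a)\bigr)

\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}
```

With $a = 0.999$ and $y = 1$ the second term vanishes and the result is $-1/0.999 \approx -1.001001$, which agrees with the nudged estimate.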


#### `2.` (CODE) How does `L` change with `z`?

In [121]:
def sigma(z):
    return  1 / (1 + np.exp(-z))

In [122]:
# how does `a` change with `z`?
dadz = compute_derivative(sigma, "z", dict(z=15))
dadz

values={'z': 15} | values={'z': 15.00000001}
old_result=0.999999694097773 | new_result=0.9999996940977761



3.1086244689504383e-07

#### `2.` (ANALYTICAL) How does `L` change with `z`?

```
dadz = a * ( 1 - a )
```

In [123]:
z = 15
sigma(z) * (1 - sigma(z))

3.0590213341738715e-07
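
The nudged estimate (`3.1086e-07`) and the analytical value (`3.0590e-07`) differ by about 1.6% here. A likely explanation, offered as my reading rather than something from the lecture: at `z = 15` the sigmoid output sits so close to `1` that a nudge of `1e-8` changes it by only a few dozen of the smallest steps a float64 can represent near `1`, so rounding dominates the difference. A sketch of a check with a larger, symmetric nudge (the step `1e-4` is an arbitrary choice):

```python
import numpy as np


def sigma(z):
    return 1 / (1 + np.exp(-z))


z = 15.0
analytical = sigma(z) * (1 - sigma(z))

# Forward difference with the tiny nudge used in this notebook
h_small = 1e-8
forward = (sigma(z + h_small) - sigma(z)) / h_small

# Central difference with a larger nudge, less sensitive to rounding
h_big = 1e-4
central = (sigma(z + h_big) - sigma(z - h_big)) / (2 * h_big)

print(f"{analytical=}")
print(f"{forward=}")
print(f"{central=}")
```

The central estimate should land much closer to `a * (1 - a)` than the forward one does.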

In [124]:
# chain rule 
dldz = dadz * dlda
dldz

-3.111736205190554e-07

--OR--

In [127]:
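# closed form that gives `dldz` directly: dL/dz = a - y (here a = 0.999, y = 1)
# note: `dadz` above was measured at z = 15, where a is about 0.9999997, not 0.999,
# so the chain-rule product above is taken at a different point and does not match this value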
a - y

-0.0010000000000000009
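
As a cross-check of `dL/dz = a - y`, here is a sketch that evaluates every factor at the same point. It picks `z = log(999)` so that `sigma(z)` is `0.999` (up to rounding), matching the `a` used above. That choice of `z` and the helper name `loss` are mine and not from the lecture.

```python
import numpy as np


def sigma(z):
    return 1 / (1 + np.exp(-z))


def loss(a, y):
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))


y = 1.0
z = np.log(999.0)   # chosen so that sigma(z) is 0.999 up to rounding
a = sigma(z)

# Chain rule with both factors evaluated at the same point
dlda = -(y / a) + (1 - y) / (1 - a)
dadz = a * (1 - a)
chain = dlda * dadz

# Closed form
closed = a - y

# Numerical check: nudge z and push it through sigma and the loss
h = 1e-6
numeric = (loss(sigma(z + h), y) - loss(sigma(z - h), y)) / (2 * h)

print(f"{chain=}")
print(f"{closed=}")
print(f"{numeric=}")
```

When all three are taken at the same point, the chain-rule product, the closed form and the nudged estimate should agree up to floating-point noise.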

# Done!
From here, we can traverse the rest of the path back to `W1`, `W2` and `b` in the same way.