In [1]:
import matplotlib.pyplot as plt
import numpy as np

x_dx = {
    'sigmoid': np.linspace(-10, 10, 400),
    'logarithmic': np.linspace(0.1, 10, 400),
    'convex': np.linspace(-10, 10, 400),
    'linear': np.linspace(-10, 10, 400),
    'cuadratic': np.linspace(-10, 10, 400),
    'cube': np.linspace(-10, 10, 400)}

functions = {
    'sigmoid': (x_dx['sigmoid'], 1 / (1 + np.exp(-x_dx['sigmoid'])), 'f(x) = σ(x) = 1/(1+e^(-x))'),
    'logarithmic' : (x_dx['logarithmic'], np.log(x_dx['logarithmic']),'f(x) = log(x)'),
    'convex': (x_dx['convex'], np.exp(x_dx['convex']),'f(x) = e^x'),
    'linear': (x_dx['linear'], 2 * x_dx['linear'], 'f(x) = 2x'),
    'cuadratic': (x_dx['cuadratic'], x_dx['cuadratic'] ** 2, 'f(x) = x^2'),
    'cube': (x_dx['cube'], x_dx['cube'] ** 3, 'f(x) = x^3'),
}

def graph_functions(dx_functions):
    """Plot each (x, y, label) entry and save it as '<name>_function.png'."""
    for function, (x, y, label) in dx_functions.items():
        plt.figure(figsize=(5, 3))
        plt.plot(x, y, label=label)
        plt.xlabel('x')
        plt.ylabel('f(x)')
        # Dashed reference lines marking the x and y axes
        plt.axhline(0, color='black', linewidth=0.8, linestyle='--')
        plt.axvline(0, color='black', linewidth=0.8, linestyle='--')
        plt.legend()
        plt.grid(True)
        plt.savefig(f'{function}_function.png')
        plt.close()

graph_functions(functions)

# Neural Networks and Deep Learning
## Logistic Regression
Given an input $x$, we want $\hat{y}=P(y=1 \mid x)$


$x \in\mathbb{R}^{n_x}$

Parameters: $w\in\mathbb{R}^{n_x}$, $b\in\mathbb{R}$

$z = w^Tx+b$


Output: $\hat{y} = \sigma(z)$


The sigmoid function maps any real number to a value between 0 and 1. It is defined as:

$\sigma(z) = \frac{1}{1+e^{-z}}$

![sigmoid](sigmoid_function.png)

When $z$ is a large positive number, the output approaches 1.

When $z$ is a large negative number, the output approaches 0.
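
The limiting behaviour described above can be checked numerically. This is a minimal sketch using only the standard `math` module; the helper name `sigmoid` and the sample values of $z$ are illustrative choices, not part of the original notebook.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative inputs: large negative, around zero, large positive.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"sigmoid({z:6.1f}) = {sigmoid(z):.5f}")
```

The printed values stay strictly between 0 and 1, equal 0.5 at $z=0$, and move toward 0 or 1 as $z$ becomes large in either direction.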

In summary, $\hat{y} = \sigma(w^Tx + b)$, where $\sigma(z) = \frac{1}{1+e^{-z}}$

This means that, given the training examples $\{(x^1,y^1),...,(x^m,y^m)\}$, we want $\hat{y}^i \approx y^i$

### Logistic Cost Function or Error Function

A loss (error) function measures how well the model performs on a single example by comparing the prediction $\hat{y}$ with the true label $y$.

One common choice is the squared error: $L(\hat{y},y) = \frac{1}{2}(\hat{y}-y)^2$

This loss is useful when the resulting optimization problem is convex: convexity guarantees that any local minimum is also the global minimum, which makes the optimization process easier.

A convex function has this graphical form (the plot shows $f(x) = e^x$):

![convex_fun](convex_function.png)

For logistic regression, however, which is a classification problem rather than a regression problem, the squared error leads to an optimization problem with multiple local minima, so it is better to use the logarithmic loss function:

$L(\hat{y},y) = - (y \cdot \log(\hat{y})+(1-y) \cdot \log(1-\hat{y}))$

The logarithm used in this loss has this graphical form:

![logarithmic_func](logarithmic_function.png)

Summarizing:

The loss should be as small as possible: the larger the loss, the worse the model is performing.

Applying this to the formula:

$L(\hat{y},y) = - (y \cdot \log(\hat{y})+(1-y) \cdot \log(1-\hat{y}))$

If $y=1$: $L(\hat{y},y) = -\log(\hat{y})$. We want $\log(\hat{y})$ to be large, so we want $\hat{y}$ to be large (close to 1).

If $y=0$: $L(\hat{y},y) = -(1-y) \cdot \log(1-\hat{y}) = -\log(1-\hat{y})$. We want $\log(1-\hat{y})$ to be large, so we want $\hat{y}$ to be small (close to 0).
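
The two cases can be illustrated with a few numbers. The sketch below is only an illustration of the formula above; the function name `log_loss` and the sample predictions are assumptions made for this example.

```python
import math

def log_loss(y_hat, y):
    """Logarithmic loss for one example, with y in {0, 1} and 0 < y_hat < 1."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# Illustrative predictions for each possible label.
for y in (1, 0):
    for y_hat in (0.1, 0.5, 0.9):
        print(f"y={y}, y_hat={y_hat}: loss={log_loss(y_hat, y):.4f}")
```

For $y=1$ the loss shrinks as $\hat{y}$ grows toward 1, and for $y=0$ it shrinks as $\hat{y}$ falls toward 0, which matches the reasoning above.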


#### Cost Function

$J(w,b) = \frac{1}{m} \sum_{i=1}^m L(\hat{y}^i,y^i) = - \frac{1}{m} \sum_{i=1}^m[y^i \cdot log(\hat{y}^i) + (1-y^i)\cdot log(1-\hat{y}^i)]$
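
As a rough illustration of the cost formula, the sketch below averages the logarithmic loss over a small set of examples. The function name `cost`, the labels, and the two sets of predictions are made-up values for this example only.

```python
import math

def cost(y_hat, y):
    """Average logarithmic loss over m examples (J in the formula above)."""
    m = len(y)
    total = sum(
        -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
        for pi, yi in zip(y_hat, y)
    )
    return total / m

# Hypothetical labels and two hypothetical sets of predictions.
y_true = [1, 0, 1, 0]
good_predictions = [0.9, 0.1, 0.8, 0.2]  # close to the labels
bad_predictions = [0.4, 0.6, 0.3, 0.7]   # on the wrong side of 0.5

print("cost (good):", cost(good_predictions, y_true))
print("cost (bad): ", cost(bad_predictions, y_true))
```

Predictions that are closer to the labels give a lower cost, which is what gradient descent tries to achieve by adjusting $w$ and $b$.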

## Derivatives

In mathematics, the derivative is a fundamental concept that measures how fast a function's output changes at a given $x$.

The Cartesian plane is usually drawn with two axes, $x$ and $y$, which let us locate a point by its coordinates $(x, y)$.

Derivatives work in a similar setting: we locate a specific point on the curve as $(x, f(x))$, and this powerful technique lets us measure the rate of change of $f$ at that $x$.

- $x$: the value on the x axis
- $f(x)$: how the input is transformed; this is the rule that defines the shape of the graph

### Linear Regression

For a linear function, the rate of change of $f(x)$ is the same at every $x$, because the graph is a straight line.

For a given function: $f(x)=2x$

When $x=2$: $f(2) = 2 \cdot 2 = 4$

When $x=2.002$: $f(2.002) = 2 \cdot 2.002 = 4.004$

The slope at both points is $\frac{0.004}{0.002} = 2$: whenever $x$ increases by some amount, $f(x)$ increases by twice that amount.

![linear_func](linear_function.png)

### More functions
In ML the data can follow many kinds of relationships, so the relation may not be linear but, for example, quadratic.

The same idea applies:

For a given function: $f(x)=x^2$

We have: $x=2$, $f(2) = 2^2 = 4$

When: $x=2.001$, $f(2.001) = 2.001^2 \approx 4.004$

So the slope (derivative) at $x=2$ is 4, because an increase of 0.001 in $x$ produces an increase of about 0.004 in $f(x)$, which is 4 times as large.

$\frac{d}{dx} f(x) = 4$ when $x=2$

We have: $x=5$, $f(5) = 5^2 = 25$


When: $x=5.001$, $f(5.001) = 5.001^2 \approx 25.010$

So the slope (derivative) at $x=5$ is 10, because an increase of 0.001 in $x$ produces an increase of about 0.010 in $f(x)$, which is 10 times as large.

$\frac{d}{dx} f(x) = 10$ when $x=5$

![cuadratic_func](cuadratic_function.png)


The same reasoning works for a cubic function, $f(x)=x^3$:

We have: $x=5$, $f(5) = 5^3 = 125$


When: $x=5.001$, $f(5.001) = 5.001^3 \approx 125.075$

So the slope (derivative) at $x=5$ is 75, because an increase of 0.001 in $x$ produces an increase of about 0.075 in $f(x)$, which is 75 times as large.

$\frac{d}{dx} f(x) = 75$ when $x=5$

![cube_func](cube_function.png)
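
The "nudge $x$ by 0.001 and see how much $f(x)$ moves" procedure used in these examples can be written as a small helper. This is a sketch of a forward finite difference, which only approximates the derivative; the name `slope` and the step size `h` are choices made for this illustration.

```python
def slope(f, x, h=0.001):
    """Approximate the derivative of f at x with a forward finite difference."""
    return (f(x + h) - f(x)) / h

# The same functions and points used in the text.
examples = [
    ("2x at x=2", lambda x: 2 * x, 2.0),
    ("x^2 at x=2", lambda x: x ** 2, 2.0),
    ("x^2 at x=5", lambda x: x ** 2, 5.0),
    ("x^3 at x=5", lambda x: x ** 3, 5.0),
]

for name, f, x in examples:
    print(f"{name}: slope is approximately {slope(f, x):.3f}")
```

The results are close to 2, 4, 10 and 75. The small excess over the exact derivatives comes from using a finite step instead of an infinitely small one.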


### Computation Graph

A computation graph breaks the calculation of the cost function into small steps. During training, the goal is to bring the final value (the cost) as close to 0 as we can.


It starts with forward propagation: simple blocks of equations are computed first, followed by blocks that use the earlier results, until we reach the final result, also known as the cost function. Once we have the final result, the weights and biases of the neural network are adjusted to bring it closer to 0. The derivative of the final result with respect to each weight or bias gives the direction: if increasing a parameter increases the result, that parameter is decreased, and if increasing it decreases the result, it is increased. This is called backward propagation, because it starts from the final result and works backward to adjust the earlier steps.


In a simple example:

$J = 3(a + bc)$

The calculation of this entire equation (the cost function) can be split into several steps:

1. **Computing $u$:**
   
   $u = b \cdot c$

2. **Computing $v$:**
   
   $v = a + u$

3. **Computing $J$:**
   
   $J = 3 \cdot v$
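
The three steps above can be sketched as a forward pass in code. The function name `forward` is a hypothetical helper for this illustration, and the inputs $a=5$, $b=3$, $c=2$ are simply example values.

```python
def forward(a, b, c):
    """Evaluate J = 3(a + bc) one node at a time, left to right."""
    u = b * c   # step 1
    v = a + u   # step 2
    J = 3 * v   # step 3
    return u, v, J

u, v, J = forward(5, 3, 2)
print("u =", u, "v =", v, "J =", J)
```

Keeping the intermediate values `u` and `v` is what later allows the derivatives to be computed step by step in the opposite direction.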

## Derivatives with a Computation Graph

![comp_graph](comp_graph.jpg)
   

Let's break down what it means to take the derivative of one quantity with respect to another.

In practice, all the forward and backward propagation calculations on the computation graph (left to right, then right to left) are done automatically. However, it is important to understand how it works and how the values of the weights and biases are adjusted along the way.

The first step of this process is to assign small, arbitrary values to the weights and biases of the neural network.

For the formula: $J = 3(a + bc)$

We will use the small, arbitrary values:

- $a = 5$
- $b = 3$
- $c = 2$

Solving with those values gives:

 1. $u = b \cdot c = 3 \cdot 2 = 6$
 2. $v = a + u = 5 + 6 = 11$
 3. $J = 3 \cdot v = 3 \cdot 11 = 33$

This is called forward propagation (propagation to the right), because we solve from left to right until the whole equation is evaluated.

Once forward propagation is done, we have the final value. It can represent many things, but for now we treat it as the cost function. Since it is positive, the weights and biases need to be adjusted to get a result closer to 0. This is where the second step, backpropagation, comes in.

In this step we calculate how a change in each variable affects the final result and the intermediate results, in order to know whether to decrease or increase each weight or bias to reduce the final value.

For example, suppose we want to know how changing $a$ affects $J$, that is, the derivative of $J$ with respect to $a$. We calculate:

With the new value $a = 5.001$, we get $v = 11.001$ and $J = 3 \cdot 11.001 = 33.003$. So $\frac{dJ}{da} = 3$, because the change in $J$ (0.003) is 3 times the change in $a$ (0.001).

As another example, perturb $b$ to $3.001$ and observe the resulting change in $J$, represented by $\frac{dJ}{db}$. Notice that this case is chained: we first need to solve for $u$, then for $v$, and finally for $J$.

Given the equation, we must solve $\boxed{u = b \cdot c} \rightarrow \boxed{v = a + u} \rightarrow \boxed{J= 3 \cdot v}$

So we follow the increase or decrease from the variable we changed, through each intermediate step, up to the quantity we are differentiating.

Replacing $b = 3.001$ gives $\boxed{u = 3.001 \cdot 2 = 6.002} \rightarrow \boxed{v = 5 + 6.002 = 11.002} \rightarrow \boxed{J = 3 \cdot 11.002 = 33.006}$

So the change of $0.001$ in $b$ becomes a change of $0.006$ in $J$, which gives $\frac{dJ}{db} = 6$
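
The perturbation experiments above can be compared with the chain rule applied backward through the graph. This is a minimal sketch under the same values $a=5$, $b=3$, $c=2$; the helper names `forward` and `backward` are hypothetical, and the derivative with respect to $c$ is included only to complete the picture.

```python
def forward(a, b, c):
    """Forward pass: J = 3(a + bc)."""
    u = b * c
    v = a + u
    return 3 * v

def backward(b, c):
    """Backward pass: apply the chain rule from J back to a, b and c."""
    dJ_dv = 3.0          # J = 3v
    dJ_da = dJ_dv * 1.0  # v = a + u, so dv/da = 1
    dJ_du = dJ_dv * 1.0  # v = a + u, so dv/du = 1
    dJ_db = dJ_du * c    # u = bc, so du/db = c
    dJ_dc = dJ_du * b    # u = bc, so du/dc = b
    return {"a": dJ_da, "b": dJ_db, "c": dJ_dc}

a, b, c = 5.0, 3.0, 2.0
grads = backward(b, c)

# Numerical check: nudge each input by 0.001, as in the text.
h = 0.001
base = forward(a, b, c)
numeric = {
    "a": (forward(a + h, b, c) - base) / h,
    "b": (forward(a, b + h, c) - base) / h,
    "c": (forward(a, b, c + h) - base) / h,
}

for name in ("a", "b", "c"):
    print(f"dJ/d{name}: chain rule = {grads[name]}, perturbation = {numeric[name]:.4f}")
```

Both approaches agree on $\frac{dJ}{da} = 3$ and $\frac{dJ}{db} = 6$, and the chain rule needs only one backward sweep instead of one extra forward pass per variable.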




## Logistic Regression Gradient Descent

Given

- $x_1$
- $w_1$
- $x_2$
- $w_2$
- $b$

We can calculate $\boxed{z = w_1 \cdot x_1 + w_2 \cdot x_2 + b} \rightarrow \boxed{a = \sigma(z)} \rightarrow \boxed{L(a,y)}$

First, we want to know how a small perturbation in $a$ changes $L$.

We denote this derivative by $da$: $\boxed{da = \frac{dL(a,y)}{da}} \rightarrow \boxed{-\frac{y}{a}+\frac{1-y}{1-a}}$

Similarly, we denote by $dz$ the derivative of $L$ with respect to $z$, obtained with the chain rule using $\frac{da}{dz} = a(1-a)$: $\boxed{dz = \frac{dL(a,y)}{dz}} \rightarrow \boxed{\frac{dL}{da} \cdot \frac{da}{dz}} \rightarrow \boxed{a-y}$
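
The chain of derivatives can be checked on a single example. The sketch below assumes $z = w_1 x_1 + w_2 x_2 + b$ and uses made-up values for the inputs, parameters and label. The last three lines (`dw1`, `dw2`, `db`) go one step beyond the text and follow from applying the chain rule once more to $z$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(a, y):
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

# Hypothetical example: inputs, parameters and label chosen for illustration.
x1, x2 = 1.0, 2.0
w1, w2, b = 0.5, -0.25, 0.1
y = 1.0

# Forward pass
z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

# Backward pass
da = -y / a + (1 - y) / (1 - a)  # dL/da
dz = da * a * (1 - a)            # dL/dz = dL/da * da/dz, which simplifies to a - y

# Numerical check of dL/dz using a small perturbation of z
h = 1e-6
dz_numeric = (loss(sigmoid(z + h), y) - loss(a, y)) / h

# One more chain-rule step gives the parameter gradients (assumed extension)
dw1 = x1 * dz
dw2 = x2 * dz
db = dz

print("dz (chain rule):", dz)
print("a - y:          ", a - y)
print("dz (numerical): ", dz_numeric)
```

The chain-rule value of `dz` matches `a - y`, and the perturbation estimate agrees up to a small numerical error.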


In [11]:
import numpy as np

# Compute the base-2 logarithm of 8 using numpy
number = 8
log_base_2 = np.log(number) / np.log(2)

print("log base 2 of 8 is:", log_base_2)

log base 2 of 8 is: 3.0


In [15]:
np.log(number)

2.0794415416798357

In [18]:
np.log(2)

0.6931471805599453
