# Some notes related to neural networks

Functions like the ones shown avoid counterintuitive jumps and can model continuous values (e.g. a probability):

![image.png](attachment:15d2dee2-3991-49b4-8e61-f32d891c3dd3.png)

Integration:

![image.png](attachment:fecbdaf2-3f36-4adc-b25e-fe8870b4ec23.png)

Fold the bias into the vectors instead of adding + b:
Append a new element 1 to x and a new element b to w. The extra product 1 * b then adds the bias as part of the sum.

In [2]:
def compute_integration(x, w, b):
    """Return the weighted sum of the inputs plus the bias.

    x is the input vector
    w is the weight vector (same length as x)
    b is the bias

    The weighted sum is the dot product of x and w: the sum of the
    element-wise products x[k] * w[k].
    """
    return sum(x[k] * w[k] for k in range(len(x))) + b
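
The bias trick noted before this cell (an extra 1 in x, an extra b in w) can be checked numerically. The sketch below is only an illustration: the function names and the sample values are made up for this example and are not part of the original notes.

```python
def weighted_sum_plus_bias(x, w, b):
    """Dot product of x and w, with the bias added separately."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def weighted_sum_augmented(x, w, b):
    """Same value, with the bias folded into the vectors."""
    x_aug = list(x) + [1]  # extra input that is always 1
    w_aug = list(w) + [b]  # the bias becomes the last weight
    return sum(xi * wi for xi, wi in zip(x_aug, w_aug))


# Hypothetical sample values, chosen so the arithmetic is exact.
x = [0.5, -1.0, 2.0]
w = [0.25, 0.5, 1.0]
b = 0.75

separate = weighted_sum_plus_bias(x, w, b)
folded = weighted_sum_augmented(x, w, b)
print(separate, folded)  # 2.375 2.375
```

Both forms give the same number, which is why the bias is often treated as just one more weight.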

Magnitude of various vectors:

![image.png](attachment:e4451750-a273-47e2-958b-050191805265.png)

In [1]:
def magnitude(x):
    """ x is the input vector

        Returns the Euclidean length of x: the square root of the
        sum of its squared elements.
    """
    return sum(k**2 for k in x)**0.5

Normalize vector: 

1: find the magnitude of the vector

2: divide every element in the vector by that magnitude; the result is the unit vector (a vector with magnitude 1) pointing in the same direction
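
A minimal sketch of normalization, assuming plain Python lists as vectors. The `normalize` helper and the zero-vector check are additions for this example; `magnitude` is repeated so the block runs on its own.

```python
def magnitude(x):
    """Euclidean length of the vector x."""
    return sum(k ** 2 for k in x) ** 0.5


def normalize(x):
    """Scale x so that its magnitude becomes 1 (direction unchanged)."""
    m = magnitude(x)
    if m == 0:
        # The zero vector has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize the zero vector")
    return [k / m for k in x]


unit = normalize([3, 4])
print(unit)  # [0.6, 0.8]
```

The magnitude of `unit` is 1 up to floating-point rounding.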

## Optimization NN
Optimizing weights and biases is a problem in a continuous space (since they can be any real numbers), so derivatives (and especially partial derivatives) play a fundamental role in building performant ANNs

Basic steps:
1. take the derivative of the function
2. set the derivative equal to 0
3. solve for the parameters (inputs) that satisfy the equation (often too complex to solve exactly, so heuristic methods are used instead, for example gradient descent)
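
As a small worked instance of the three steps, take the one-parameter function $f(w) = (w - 3)^2 + 1$. This function is chosen here only for illustration and does not come from these notes.

$$
\begin{aligned}
f(w) &= (w - 3)^2 + 1 \\
f'(w) &= 2(w - 3) \\
2(w - 3) &= 0 \;\Rightarrow\; w = 3
\end{aligned}
$$

Since $f''(w) = 2 > 0$, the point $w = 3$ is a minimum, with $f(3) = 1$.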

![image.png](attachment:ca7f59b7-b1c3-4907-a181-9024fecb0b15.png)

# Sigmoid function:

![image.png](attachment:8ca7f97e-c6bc-45ae-acc9-7b13f456dd0d.png)

v = the weighted sum of the inputs

Often used as an activation function in ANNs

![image.png](attachment:e185917f-e544-4082-823f-0c26b4eea21d.png)   
![image.png](attachment:654e0647-b2a9-43b1-8e27-b1d834a8bf22.png)

## Sigmoid neuron
An artificial neuron which outputs a number between 0 and 1. The bias controls how much input is needed to activate it.
![image.png](attachment:b5953495-6401-4313-bfc1-21caf9705904.png)
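
A sketch of a sigmoid neuron, assuming the standard logistic form 1 / (1 + e^(-v)) with v the weighted sum of the inputs plus the bias. The function names and the two-branch formula (used only to avoid overflow for very negative v) are choices made for this example.

```python
import math


def sigmoid(v):
    """Logistic sigmoid of v; the result lies between 0 and 1."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    # For negative v use an equivalent form so math.exp never overflows.
    e = math.exp(v)
    return e / (1.0 + e)


def sigmoid_neuron(x, w, b):
    """Weighted sum of inputs plus bias, passed through the sigmoid."""
    v = sum(xi * wi for xi, wi in zip(x, w)) + b
    return sigmoid(v)


# Same input and weight, different bias: a lower bias means
# more input is needed before the output rises above 0.5.
low_bias = sigmoid_neuron([1.0], [1.0], -5.0)
high_bias = sigmoid_neuron([1.0], [1.0], 5.0)
print(sigmoid(0))  # 0.5
```

Here `low_bias` comes out below 0.5 and `high_bias` above it, which is the effect of the bias described above.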

The gradient is simply the vector of a function’s partial derivatives, one entry per input variable (written here as a row vector).
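
To make "vector of partial derivatives" concrete, here is a sketch that approximates each partial derivative with a central difference. The helper `numerical_gradient`, the step `h`, and the sample function are assumptions made for this example.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at point, one partial at a time."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h   # nudge only the i-th variable up
        backward[i] -= h  # and down
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def sample_function(p):
    """f(x, y) = x^2 + 3y, whose exact gradient is [2x, 3]."""
    x, y = p
    return x ** 2 + 3 * y


g = numerical_gradient(sample_function, [2.0, 1.0])
print(g)  # close to [4.0, 3.0], up to rounding
```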

Gradient descent is a heuristic method that starts at a random point and iteratively steps in the direction opposite to the gradient. The gradient points in the direction of steepest increase, so stepping against it decreases (hence “descent”) the function that we want to minimize. With enough sufficiently small steps in the decreasing direction, a local minimum can theoretically be reached. Colloquially, think of it as playing a game of “hot” and “cold” until the improvement becomes negligible.

In [5]:
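# learning_rate is also called the step size.
# f (the function to minimize) and gradient (its derivative) must be
# defined before calling; they are not defined in this notebook.
def gradient_descent(point, learning_rate, threshold):
    """Return the value of f at an approximate local minimum."""
    value = f(point)
    while True:  # loop instead of recursion to avoid the recursion limit
        # Step against the gradient, i.e. downhill.
        new_point = point - learning_rate * gradient(point)
        new_value = f(new_point)
        # Stop once a step changes f by less than threshold.
        if abs(new_value - value) < threshold:
            return value
        point, value = new_point, new_value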

#check Artificial Neural Networks: Optimization for Neural Networks
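
The cell above works on a single number. A sketch of the same idea for several variables follows, using plain lists and a gradient given as a list of partial derivatives. The name `gradient_descent_nd`, the `max_steps` cap, the default values, and the bowl-shaped sample function are all assumptions made for this example.

```python
def gradient_descent_nd(f, gradient, point, learning_rate=0.1,
                        threshold=1e-10, max_steps=10_000):
    """Step against the gradient until f stops improving noticeably."""
    value = f(point)
    for _ in range(max_steps):
        grad = gradient(point)
        # Move every coordinate a little against its partial derivative.
        new_point = [p - learning_rate * g for p, g in zip(point, grad)]
        new_value = f(new_point)
        if abs(new_value - value) < threshold:
            return new_point, new_value
        point, value = new_point, new_value
    return point, value  # step cap reached without meeting the threshold


def bowl(p):
    """f(x, y) = (x - 1)^2 + (y + 2)^2, minimal at (1, -2)."""
    x, y = p
    return (x - 1) ** 2 + (y + 2) ** 2


def bowl_gradient(p):
    """Exact partial derivatives of bowl."""
    x, y = p
    return [2 * (x - 1), 2 * (y + 2)]


minimum, lowest = gradient_descent_nd(bowl, bowl_gradient, [5.0, 5.0])
print(minimum, lowest)  # a point near [1, -2] and a value near 0
```

Unlike the cell above, this version returns both the point and the value, since the point (the weights and biases) is usually what is needed afterwards.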