A logistic regression model is a linear function combined with a non-linear function such as the sigmoid function. What neural networks add to this is that we connect multiple such models together to obtain a network.  

When talking about neural networks, we use slightly different terminology. Instead of coefficients we say ***(weights)***.  

The non-linear part of the model, which in the case of logistic regression was the sigmoid function, is called the ***(activation function)***.  

One such model is called a ***(neuron)***. The neurons are connected to each other by letting the output of one neuron be the input of another.  
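
To make the idea of a single neuron concrete, here is a minimal sketch. The function names `sigmoid` and `neuron` and the example values are assumptions chosen for illustration; they do not come from any particular library.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """One neuron: a linear combination of the inputs, then an activation."""
    z = np.dot(inputs, weights) + bias  # linear part, as in regression
    return sigmoid(z)                   # non-linear part (activation function)

# Hypothetical example values, chosen only for illustration
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.0])
output = neuron(x, w, bias=0.0)
print(output)  # the linear part is 0 here, and sigmoid(0) = 0.5
```

Feeding `output` into another call to `neuron` as one of its inputs is what "connecting neurons" means in practice.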

Below is a diagram of a neural network. Let's break it down.  

![Screen Shot 2024-08-30 at 8.31.37 PM.png](attachment:031af805-1992-4f3b-89a2-032a831b1dac.png)  

Our neural network in this case consists of:  
1- Three input nodes $x_1, x_2, x_3$, forming the input layer.  
2- A hidden layer with two nodes $h_1, h_2$, and a bias node (more on this later).  
3- An output layer with one node $o$.  

The input layer and the hidden layer are connected so that each node in the hidden layer behaves much like a logistic regression model whose inputs come from the input nodes.  

Likewise, the nodes in the hidden layer are connected to the output node in very much the same fashion: this time the nodes in the hidden layer provide the inputs to the output node.  
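
The structure described above can be sketched as a short forward pass. This is only an illustration: the weight values are made up, the variable names (`W_hidden`, `w_out`, `b_out`) are hypothetical, and using the sigmoid at both layers is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, for illustration only (training would learn these)
W_hidden = np.array([[ 0.2, -0.4],
                     [ 0.7,  0.1],
                     [-0.5,  0.3]])  # shape: (3 input nodes, 2 hidden nodes)
w_out = np.array([1.5, -2.0])        # weights from h_1 and h_2 to the output node
b_out = 0.5                          # weight on the bias node's link to the output

def forward(x):
    """Compute the hidden activations and the output for one input vector."""
    h = sigmoid(x @ W_hidden)        # hidden nodes h_1, h_2
    o = sigmoid(h @ w_out + b_out)   # output node o
    return h, o

h, o = forward(np.array([1.0, 0.0, -1.0]))
print("hidden:", h, "output:", o)
```

Each column of `W_hidden` holds the weights of one hidden node, so every hidden node sees all three inputs.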

#### Input layer
The input layer is your input data, where one node corresponds to one element in your input. For example, in the cabin price case you could have $x_1$ be the cabin size, $x_2$ the size of the sauna, and so on.  

#### Hidden layer
As we said, the nodes behave very similarly to a logistic regression model. Inside each node we calculate a linear combination of the inputs of the node - recall that in the case of the output node these are provided by the nodes in the hidden layer - and apply something like the sigmoid function to determine the actual output.  

As mentioned above, we call the function applied in the latter stage the ***(activation function).*** This term is borrowed directly from neuroscience, where neurons communicate by sending electrical pulses to other neurons when activated by received stimuli. 

In the hidden layer we also have a ***(Bias node)*** which is NOT connected to the input nodes. The purpose of the bias node is functionally the same as that of the intercept term in linear regression: it can shift the input passed from one layer to the next by some constant value.  
In the network above it shifts the input that the output node receives by a constant value determined by the weight on its connection to the output node. ***(Bias nodes)*** are not a mandatory feature of neural networks but usually help model performance.   
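
Written as a formula, and assuming the bias node always outputs the constant $1$, the output node in the network above would compute something like the following. The symbols $w_1$, $w_2$, $w_b$ and $f$ are names introduced here for illustration: $w_1, w_2$ are the weights from $h_1, h_2$, $w_b$ is the weight from the bias node, and $f$ is the activation function.

$$o = f(w_1 h_1 + w_2 h_2 + w_b \cdot 1)$$

Changing $w_b$ shifts the argument of $f$ by a constant regardless of the values of $h_1$ and $h_2$, which is exactly the role the intercept plays in linear regression.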

A neural network is a box: we insert the inputs and out comes the output.  Viewed as just a function with inputs and outputs, it serves the same purpose as any regression or classification model.   

And since the individual neurons are not more complicated than a logistic regression model, how is this useful at all?    

The magic happens when the neurons inside the network use non-linear activation functions. Indeed, if we only use linear activations, the whole network is just a big fat linear regression model.  
But if we use non-linear activations such as the sigmoid function or something called a ***(Rectified linear unit)***, the model suddenly becomes much more powerful than linear models.💪🏻👊🏻👍🏽  
In fact it becomes so powerful that with enough nodes in the network, we can learn to fit virtually any data perfectly. A technical way to express this is to say that neural networks are "universal function approximators".  

Even a relatively simple model with only a few neurons can learn things that are beyond linear models such as linear regression, logistic regression, or Naive Bayes.  
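
The claim about linear activations can be checked numerically. The sketch below uses small made-up matrices (all values are hypothetical) to show that two layers with identity activations collapse into one linear model, while inserting a ReLU between them breaks that equivalence.

```python
import numpy as np

# Hypothetical weights and inputs, chosen only for illustration
W1 = np.array([[1.0, -1.0], [0.0, 2.0], [1.0, 1.0]])  # 3 inputs -> 2 hidden
b1 = np.array([0.0, 1.0])
W2 = np.array([[2.0], [-1.0]])                          # 2 hidden -> 1 output
b2 = np.array([0.5])
x = np.array([[1.0, 2.0, 3.0], [-2.0, 0.0, 1.0]])      # two input rows

# Two layers with identity (linear) activations
two_layer = (x @ W1 + b1) @ W2 + b2

# The same computation folded into a single linear model
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b
collapses = np.allclose(two_layer, one_layer)

# With a ReLU between the layers the network is no longer a linear model
with_relu = np.maximum(0, x @ W1 + b1) @ W2 + b2
differs = not np.allclose(with_relu, one_layer)

print("linear layers collapse:", collapses)
print("ReLU network differs:  ", differs)
```

The second input row produces a negative value in the hidden layer, which the ReLU clips to zero; that clipping is the non-linearity a single linear model cannot reproduce.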


The power of neural networks comes at a cost. More complex problems require larger networks, larger networks contain more parameters, and more parameters require more data.  
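
To get a feel for how quickly the number of parameters grows, here is a small sketch that counts them. It assumes a fully connected network in which every non-input node has its own bias; the helper name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of nodes per layer, input layer first.
    Assumes one bias per node in every layer except the input layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per receiving node
    return total

small = count_parameters([5, 2, 2, 1])      # 5 inputs, two hidden layers of 2
large = count_parameters([5, 100, 100, 1])  # same inputs, much wider layers
print(small, large)  # 21 and 10801
```

Widening the two hidden layers from 2 to 100 nodes multiplies the parameter count by roughly 500, even though the input and output sizes are unchanged.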

If we try to fit a large network with too little data, the model will overfit and make worse predictions than a simpler network: even a simple logistic regression model can easily beat a large neural network if there is only scarce data.  

Even if we have more data, more parameters also imply more computation, and at some point we run out of computational resources to complete the fitting process in a reasonable amount of time. So there really is no free lunch, even (or especially) in machine learning.  

The reason why optimizing the parameters of a neural network to fit the training data is so hard lies in the very non-linearity that gives the network its power: when the activation functions are non-linear, the optimization "landscape" is also highly irregular and exhibits plenty of local optima where the optimizer can get stuck. A fair amount of research has gone into devising new and better optimization algorithms just for this problem.
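
Getting stuck in a local optimum can be illustrated with a toy example. The sketch below runs plain gradient descent on a one-dimensional function with two minima; the function `f` is a made-up stand-in for a loss landscape, not an actual neural network loss, and the learning rate and step count are arbitrary choices.

```python
def f(x):
    """A toy non-convex 'loss' with two minima (hypothetical example)."""
    return x**4 - 3 * x**2 + x

def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x, learning_rate=0.01, steps=2000):
    """Repeatedly step against the gradient, starting from x."""
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x

left = gradient_descent(-2.0)   # ends near x = -1.30, the better minimum
right = gradient_descent(2.0)   # ends near x = 1.13, a worse local minimum

print(f"start -2.0 -> x = {left:.2f}, f(x) = {f(left):.2f}")
print(f"start  2.0 -> x = {right:.2f}, f(x) = {f(right):.2f}")
```

Both runs use the same algorithm and settings; only the starting point differs, yet the second run settles in the shallower valley and never finds the better solution.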





In [2]:
import numpy as np

# Provided weights and biases
w0 = np.array([[ 1.19627687e+01,  2.60163283e-01],
               [ 4.48832507e-01,  4.00666119e-01],
               [-2.75768443e-01,  3.43724167e-01],
               [ 2.29138536e+01,  3.91783025e-01],
               [-1.22397711e-02, -1.03029800e+00]])

w1 = np.array([[11.5631751 , 11.87043684],
               [-0.85735419,  0.27114237]])

w2 = np.array([[11.04122165],
               [10.44637262]])

b0 = np.array([-4.21310294, -0.52664488])
b1 = np.array([-4.84067881, -4.53335139])
b2 = np.array([-7.52942418])

# Provided feature vector (a single test input with five features)
x_test = [[74, 5, 10, 2, 100]]

# Activation functions
def hidden_activation(z):
    # ReLU activation
    return np.maximum(0, z)

def output_activation(z):
    # Identity (linear) activation
    return z

# Loop through the test input(s)
for item in x_test:
    # Convert input to numpy array
    item = np.array(item)
    
    # Calculate first hidden layer
    h1_in = np.dot(item, w0) + b0  # Add bias term
    h1_out = hidden_activation(h1_in)  # Apply ReLU activation function
    
    # Calculate second hidden layer
    h2_in = np.dot(h1_out, w1) + b1  # Add bias term
    h2_out = hidden_activation(h2_in)  # Apply ReLU activation function
    
    # Calculate output layer
    out_in = np.dot(h2_out, w2) + b2  # Add bias term
    out = output_activation(out_in)  # Apply identity activation function
    
    # Print the output (predicted price)
    print(f"Predicted price for {item.tolist()}: {out[0]:.2f}")


Predicted price for [74, 5, 10, 2, 100]: 232721.36


**Great, goodbye!**