# Building a Neuron: The Role of the Sigmoid Function, An Activation Function

Creating a neuron involves several steps and relies heavily on mathematics. Among these, the sigmoid function plays a crucial role.

The sigmoid function, also known as the logistic function, introduces non-linearity into the neuron. This lets the model learn complex relationships between variables, unlike linear models, where one variable changes in direct proportion to another (e.g., doubling the length of a rectangle doubles its area when the width is fixed).

## Understanding Non-Linearity

In real-world scenarios, relationships between variables are often non-linear. For example, consider a car's speed (variable A) and its fuel consumption (variable B). Doubling the speed (A) wouldn't simply double the fuel consumption (B). This is because aerodynamic drag (another variable, C, affecting B) grows roughly with the square of speed rather than linearly, so the power needed to overcome it grows roughly with the cube of speed. At higher speeds this gives a roughly cubic relationship between speed and the rate of fuel consumption.

Here's an illustration:

- **Linear Assumption (incorrect):** One might assume that doubling a car's speed would simply double its fuel consumption.
- **Reality (non-linear):** Doubling speed increases air resistance significantly, leading to a much higher (cubic) increase in fuel consumption.

Illustrative data points (fuel economy in miles per gallon, where a higher number means less fuel used per mile):

- At 30 mph, a car might get 24 mpg.
- At 60 mph, instead of 12 mpg (linear assumption), it might get 30 mpg (non-linear effect).
- At 90 mph, fuel economy might drop to 18 mpg (air resistance becomes more dominant).

The sigmoid function allows neurons to capture these non-linear relationships, making them powerful tools for tackling complex problems.
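To make the point concrete, the sketch below fits a straight line and a quadratic to the three illustrative data points above. The polynomial fit is only a stand-in for "a model with some non-linearity" and is not how a neuron works internally; the variable names are chosen for this example.

```python
import numpy as np

# Illustrative (speed in mph, fuel economy in mpg) points from the text above
speed = np.array([30.0, 60.0, 90.0])
mpg = np.array([24.0, 30.0, 18.0])

# Best straight line through the points (least squares)
linear_coeffs = np.polyfit(speed, mpg, deg=1)
linear_residuals = mpg - np.polyval(linear_coeffs, speed)

# A quadratic has enough flexibility to pass through all three points
quadratic_coeffs = np.polyfit(speed, mpg, deg=2)
quadratic_residuals = mpg - np.polyval(quadratic_coeffs, speed)

print("linear residuals:   ", np.round(linear_residuals, 3))
print("quadratic residuals:", np.round(quadratic_residuals, 3))
```

The straight line misses the points by several mpg, while the curved model passes through them, which is the kind of gap a non-linear activation is meant to close.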

## Sigmoid Function Formula

$$
\sigma(x) = \frac{1}{1+e^{-x}}
$$
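As a quick check of the formula, the following sketch evaluates the sigmoid at a few points. The function name `sigmoid` and the sample inputs are illustrative choices for this example.

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)); maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative inputs (assumed for this example)
xs = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
ys = sigmoid(xs)
for x, y in zip(xs, ys):
    print(f"sigmoid({x:+.1f}) = {y:.4f}")

# Non-linearity: doubling the input does not double the output
print(sigmoid(2.0), sigmoid(4.0))
```

The outputs stay strictly between 0 and 1, equal 0.5 at zero, and flatten out for large positive or negative inputs.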

## Weights and Biases

We usually initialize the weights and biases randomly. If every neuron in a network started with the same weights, they would all compute the same output and receive the same updates, so they would never learn different aspects of the data.

### What is a weight and a bias?

**Weight:** A weight is a learnable parameter within a neuron that scales an input up or down, which determines how much that input contributes to the output.

**Bias:** A bias is a learnable parameter added to the weighted sum of the inputs. Its purpose is to shift the activation function left or right, which helps in handling inputs that are not centered around zero.
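A minimal sketch of how a weight and a bias enter the computation is shown below. The input values, weights, and bias are made up for illustration.

```python
import numpy as np

# Hypothetical input with three features (values chosen for illustration)
x = np.array([0.5, -1.0, 2.0])

# Weights scale each feature's contribution; the bias shifts the result
w = np.array([0.8, 0.1, -0.4])
b = 0.25

z = np.dot(w, x) + b  # weighted sum that is fed into the activation
print(z)

# Changing only the bias shifts the weighted sum by the same amount
z_shifted = np.dot(w, x) + (b + 1.0)
print(z_shifted - z)
```

In a trained neuron both `w` and `b` would be adjusted by gradient descent rather than set by hand.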

## Evaluating a Predicted Data Point

Evaluating a predicted data point is crucial because it tells you how far the prediction is from the actual value. We cover two ways to do this.

### Binary Cross Entropy Loss

This is commonly used for binary classification tasks. It measures the performance of a classification model whose output is a probability value between 0 and 1.

$$
-\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)\right]
$$

where \( \hat{y}_i \) is the predicted output and \( y_i \) is the actual output.
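The formula can be sketched in NumPy as follows. The function name, the clipping constant `eps`, and the sample labels and predictions are assumptions made for this example.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Illustrative labels and two sets of predicted probabilities
y_true = np.array([1, 0, 1, 0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])

loss_good = binary_cross_entropy(y_true, mostly_right)
loss_bad = binary_cross_entropy(y_true, mostly_wrong)
print(loss_good, loss_bad)
```

Predictions that put high probability on the correct class give a small loss, while confident wrong predictions are penalized heavily.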

### Mean Squared Error

This loss is used in regression tasks. It measures the average squared difference between the predicted output and the actual output.

$$
\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2
$$
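A small worked example of the mean squared error, using made-up target and prediction values:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between predictions and targets."""
    return np.mean((y_pred - y_true) ** 2)

# Illustrative values: the errors are -0.5, 0.5, 0.0 and 1.0
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)
print(mse)  # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
```

Because the errors are squared, the single error of 1.0 contributes four times as much as each error of 0.5.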

## Calculating Gradient

The gradient is a vector of partial derivatives of the loss function with respect to each parameter (weight and bias) in the network. It indicates the direction and magnitude of the steepest increase in the loss function. We use its negative to update parameters, moving towards minimizing the loss.
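The idea of stepping against the gradient can be sketched on a deliberately simple one-parameter loss. The quadratic `loss`, the starting point, and the learning rate below are assumptions chosen only to keep the example short; they are not the neuron's actual loss.

```python
def loss(w):
    # Toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the toy loss with respect to w
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
history = [loss(w)]

for _ in range(25):
    w -= learning_rate * grad(w)  # move against the gradient
    history.append(loss(w))

print(w, history[0], history[-1])
```

Each update moves `w` closer to the minimum, so the recorded loss shrinks from step to step.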

## Steps

1. **Calculate the Error** using either the loss function formula for binary cross-entropy loss or simply by calculating the difference between the predicted point and the actual point. That difference is the derivative of the half squared error \( \frac{1}{2}(\hat{y} - y)^2 \) with respect to the predicted output:
   $$
   \frac{\partial L}{\partial \hat{y}} = \hat{y} - y
   $$

2. **Calculate the Partial Derivatives:** A partial derivative is the derivative with respect to one variable while every other variable is treated as a constant. The derivative of the predicted output with respect to the weighted sum \( z \) (the derivative of the sigmoid) is:
   $$
   \frac{\partial \hat{y}}{\partial z} = \hat{y}(1 - \hat{y})
   $$

   The derivative of the weighted sum with respect to each weight \( w_i \) is:
   $$
   \frac{\partial z}{\partial w_i} = x_i
   $$

3. **Use the Chain Rule** to multiply the three factors together:
   $$
   \frac{\partial L}{\partial w_i} = x_i \cdot \hat{y}(1 - \hat{y}) \cdot (\hat{y} - y)
   $$
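One way to gain confidence in the chain-rule expression is to compare it with a finite-difference estimate. The sketch below assumes the loss is the half squared error, for which the derivative with respect to the prediction is the plain difference between prediction and target. The inputs, weights, and helper names are made up for this check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def half_squared_error(w, b, x, y):
    # Assumed loss for this check: 0.5 * (y_hat - y)^2
    y_hat = sigmoid(np.dot(w, x) + b)
    return 0.5 * (y_hat - y) ** 2

def analytic_gradient(w, b, x, y):
    # Chain rule: x_i * y_hat * (1 - y_hat) * (y_hat - y)
    y_hat = sigmoid(np.dot(w, x) + b)
    return x * y_hat * (1.0 - y_hat) * (y_hat - y)

# Illustrative sample and parameters
x = np.array([0.5, -1.2, 2.0])
y = 1.0
w = np.array([0.1, 0.4, -0.3])
b = 0.2

# Central finite differences, one weight at a time
h = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = h
    numeric[i] = (half_squared_error(w + step, b, x, y)
                  - half_squared_error(w - step, b, x, y)) / (2 * h)

analytic = analytic_gradient(w, b, x, y)
print(analytic)
print(numeric)
```

The two vectors should agree to several decimal places; a larger gap would point to a mistake in one of the derivative factors.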


In [39]:
import numpy as np

# Building blocks of a single sigmoid neuron

def sigmoid(x):
    # Logistic function: squashes any real input into (0, 1)
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Derivative of the sigmoid, written in terms of the sigmoid itself
    sig = sigmoid(x)
    return sig * (1 - sig)

def initialize_parameters(input_size):
    # Random starting values: one weight per input feature, one bias
    weights = np.random.rand(input_size)
    bias = np.random.rand(1)
    return weights, bias

def forward_propagation(inputs, weights, bias):
    # Weighted sum followed by the sigmoid activation
    weighted_sum = np.dot(inputs, weights) + bias
    output = sigmoid(weighted_sum)
    return output, weighted_sum

def binary_cross_entropy_loss(predicted_output, actual_output):
    # Clip predictions so that log() never receives 0
    epsilon = 1e-15
    predicted_output = np.clip(predicted_output, epsilon, 1 - epsilon)
    loss = - (actual_output * np.log(predicted_output) + (1 - actual_output) * np.log(1 - predicted_output))
    return np.mean(loss)

def calculate_gradients(inputs, predicted_output, actual_output, weighted_sum):
    # Chain rule: dL/dw = dL/dy * dy/dz * dz/dw
    error = predicted_output - actual_output  # dL/dy
    d_loss_d_predicted = error
    d_predicted_d_weighted_sum = sigmoid_derivative(weighted_sum)  # dy/dz
    d_weighted_sum_d_weights = inputs  # dz/dw
    d_weighted_sum_d_bias = 1  # dz/db

    gradients = d_loss_d_predicted * d_predicted_d_weighted_sum * d_weighted_sum_d_weights
    bias_gradient = d_loss_d_predicted * d_predicted_d_weighted_sum * d_weighted_sum_d_bias

    return gradients, bias_gradient




In [40]:
import numpy as np


class Neuron:
    '''A single sigmoid neuron trained with per-sample gradient descent.'''

    def __init__(self, x, y, learning_rate=0.1, epochs=1000, type='Regression'):
        '''Store the training data and hyperparameters; initialize parameters randomly.'''
        self.x = x
        self.y = y
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.type = type  # 'Regression' (MSE loss) or 'Classification' (BCE loss)
        self.weights = np.random.rand(x.shape[1])  # one weight per input feature
        self.bias = np.random.rand(1)
        self.loss = []  # average loss per epoch
        self.predictions = []

    def compute_sigmoid(self, x):
        '''Compute the sigmoid function.'''
        return 1 / (1 + np.exp(-x))

    def compute_weighted_sum(self, inputs):
        '''Compute the weighted sum of the inputs plus the bias.'''
        return np.dot(inputs, self.weights) + self.bias

    def sigmoid_derivative(self, x):
        '''Compute the derivative of the sigmoid at x.'''
        sig = self.compute_sigmoid(x)
        return sig * (1 - sig)

    def compute_loss(self, predicted_output, actual_output):
        '''Compute the loss for the configured task type.'''
        if self.type == 'Regression':
            return np.mean((predicted_output - actual_output) ** 2)
        elif self.type == 'Classification':
            epsilon = 1e-15  # keeps log() away from 0
            predicted_output = np.clip(predicted_output, epsilon, 1 - epsilon)
            # Returns a length-1 array because the bias (and so the prediction) is one
            return -(actual_output * np.log(predicted_output)
                     + (1 - actual_output) * np.log(1 - predicted_output))
        raise ValueError(f'Unknown type: {self.type}')

    def forward_propagation(self, inputs):
        '''Return the prediction and the weighted sum for one sample.'''
        weighted_sum = self.compute_weighted_sum(inputs)
        output = self.compute_sigmoid(weighted_sum)
        return output, weighted_sum

    def backward_propagation(self, gradients, bias_gradient):
        '''Update the parameters by stepping against the gradients.'''
        self.weights -= self.learning_rate * gradients
        self.bias -= self.learning_rate * bias_gradient

    def calculate_gradients(self, inputs, predicted_output, actual_output, weighted_sum):
        '''Chain rule for the half squared error; used for both task types.'''
        error = predicted_output - actual_output  # dL/dy
        d_loss_d_predicted = error
        d_predicted_d_weighted_sum = self.sigmoid_derivative(weighted_sum)  # dy/dz
        d_weighted_sum_d_weights = inputs  # dz/dw
        d_weighted_sum_d_bias = 1  # dz/db

        gradients = d_loss_d_predicted * d_predicted_d_weighted_sum * d_weighted_sum_d_weights
        bias_gradient = d_loss_d_predicted * d_predicted_d_weighted_sum * d_weighted_sum_d_bias

        return gradients, bias_gradient

    def train(self):
        '''Run per-sample gradient descent and record the average loss per epoch.'''
        for epoch in range(self.epochs):
            total_loss = 0
            for i in range(len(self.x)):
                y_pred, weighted_sum = self.forward_propagation(self.x[i])
                loss = self.compute_loss(y_pred, self.y[i])
                total_loss += loss
                # Reuse the weighted sum from the forward pass
                gradients, bias_gradient = self.calculate_gradients(self.x[i], y_pred, self.y[i], weighted_sum)
                self.backward_propagation(gradients, bias_gradient)

            average_loss = total_loss / len(self.x)
            self.loss.append(average_loss)
            print(f'Epoch: {epoch}, Loss: {average_loss}')

    def predict(self, x):
        '''Return the sigmoid output for new inputs.'''
        return self.compute_sigmoid(self.compute_weighted_sum(x))
    

In [41]:
import numpy as np
from sklearn.datasets import make_classification, make_regression

# Generate synthetic data for classification
def generate_classification_data(samples=100, features=2):
    x, y = make_classification(n_samples=samples, n_features=features, n_informative=2, n_redundant=0, n_classes=2, random_state=1)
    return x, y

# Generate synthetic data for regression
def generate_regression_data(samples=100, features=2):
    x, y = make_regression(n_samples=samples, n_features=features, noise=0.1, random_state=1)
    return x, y

# Example usage
# Classification data
x_class, y_class = generate_classification_data(samples=100, features=2)

# Regression data
x_reg, y_reg = generate_regression_data(samples=100, features=2)

# Display some examples
print("Classification Data (first 5 samples):")
print("X:\n", x_class[:5])
print("Y:\n", y_class[:5])

print("\nRegression Data (first 5 samples):")
print("X:\n", x_reg[:5])
print("Y:\n", y_reg[:5])


Classification Data (first 5 samples):
X:
 [[ 1.30022717 -0.7856539 ]
 [ 1.44184425 -0.56008554]
 [-0.84792445 -1.36621324]
 [-0.72215015 -1.41129414]
 [-1.27221465  0.25945106]]
Y:
 [1 1 0 0 0]

Regression Data (first 5 samples):
X:
 [[ 0.0465673   0.80186103]
 [-2.02220122  0.31563495]
 [-0.38405435 -0.3224172 ]
 [-1.31228341  0.35054598]
 [-0.88762896 -0.19183555]]
Y:
 [ 70.97621092 -37.8840955  -40.37691982 -12.10161298 -45.07000824]


In [42]:
# Create and train the neuron for classification
neuron_class = Neuron(x_class, y_class, learning_rate=0.1, epochs=5000, type='Classification')
neuron_class.train()

# Predict with new inputs for classification
new_inputs_class = np.array([0.5, 1.5])
predicted_output_class = neuron_class.predict(new_inputs_class)
print(f'Predicted Output for New Classification Inputs: {predicted_output_class}')


Epoch: 0, Loss: [0.47542201]
Epoch: 1, Loss: [0.30619705]
Epoch: 2, Loss: [0.24551082]
Epoch: 3, Loss: [0.21357947]
Epoch: 4, Loss: [0.19298725]
Epoch: 5, Loss: [0.17826019]
Epoch: 6, Loss: [0.1670539]
Epoch: 7, Loss: [0.15816256]
Epoch: 8, Loss: [0.15089044]
Epoch: 9, Loss: [0.14480316]
Epoch: 10, Loss: [0.13961342]
Epoch: 11, Loss: [0.13512256]
Epoch: 12, Loss: [0.13118821]
Epoch: 13, Loss: [0.12770541]
Epoch: 14, Loss: [0.12459479]
Epoch: 15, Loss: [0.12179516]
Epoch: 16, Loss: [0.11925847]
Epoch: 17, Loss: [0.11694638]
Epoch: 18, Loss: [0.11482792]
Epoch: 19, Loss: [0.11287775]
Epoch: 20, Loss: [0.11107491]
Epoch: 21, Loss: [0.1094019]
Epoch: 22, Loss: [0.107844]
Epoch: 23, Loss: [0.10638868]
Epoch: 24, Loss: [0.10502526]
Epoch: 25, Loss: [0.10374453]
Epoch: 26, Loss: [0.10253852]
Epoch: 27, Loss: [0.10140028]
Epoch: 28, Loss: [0.10032375]
Epoch: 29, Loss: [0.09930357]
Epoch: 30, Loss: [0.09833504]
Epoch: 31, Loss: [0.09741395]
Epoch: 32, Loss: [0.09653659]
Epoch: 33, Loss: [0.0956

In [43]:

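# Create and train the neuron for regression
# Note: the sigmoid output lies in (0, 1), while these targets range far outside it,
# so the neuron cannot fit them closely and the loss plateaus at a high value.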
neuron_reg = Neuron(x_reg, y_reg, learning_rate=0.1, epochs=5000, type='Regression')
neuron_reg.train()

# Predict with new inputs for regression
new_inputs_reg = np.array([0.5, 1.5])
predicted_output_reg = neuron_reg.predict(new_inputs_reg)
print(f'Predicted Output for New Regression Inputs: {predicted_output_reg}')


Epoch: 0, Loss: 6091.3088439269095
Epoch: 1, Loss: 6086.9320751429395
Epoch: 2, Loss: 6086.530735379027
Epoch: 3, Loss: 6086.3355618105425
Epoch: 4, Loss: 6086.215697732002
Epoch: 5, Loss: 6086.133238350167
Epoch: 6, Loss: 6086.072381651666
Epoch: 7, Loss: 6086.025258931557
Epoch: 8, Loss: 6085.987474218022
Epoch: 9, Loss: 6085.956362336914
Epoch: 10, Loss: 6085.930205163109
Epoch: 11, Loss: 6085.907840806628
Epoch: 12, Loss: 6085.888452788704
Epoch: 13, Loss: 6085.871449000955
Epoch: 14, Loss: 6085.856388599412
Epoch: 15, Loss: 6085.842935953123
Epoch: 16, Loss: 6085.830830586591
Epoch: 17, Loss: 6085.819866956078
Epoch: 18, Loss: 6085.809880479638
Epoch: 19, Loss: 6085.80073766249
Epoch: 20, Loss: 6085.792328973962
Epoch: 21, Loss: 6085.7845636158145
Epoch: 22, Loss: 6085.777365617071
Epoch: 23, Loss: 6085.770670876431
Epoch: 24, Loss: 6085.764424892642
Epoch: 25, Loss: 6085.758581001795
Epoch: 26, Loss: 6085.7530989933575
Epoch: 27, Loss: 6085.747944012591
Epoch: 28, Loss: 6085.7430