![image.png](img/Picture1.jpg)

In our previous discussion, we explained how to obtain the output of a perceptron. We multiply the inputs by the weights, sum them up, add the bias, and then pass the result through the activation function to obtain the output. We emphasized that the weights and bias are crucial in this process. Once we have determined the weights and bias, we can easily obtain the output by providing the desired inputs.
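Here is a minimal NumPy sketch of this forward computation. The function name `perceptron_output`, the identity activation, and the example numbers are illustrative assumptions, not values from the article.

```python
import numpy as np

def identity(z):
    """Linear (identity) activation: returns its input unchanged."""
    return z

def perceptron_output(inputs, weights, bias, activation=identity):
    """Multiply inputs by weights, sum them, add the bias, then apply the activation."""
    summation = np.dot(inputs, weights) + bias
    return activation(summation)

# Hypothetical example values, chosen only for illustration
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.5, -1.0, 2.0])
bias = 0.5
print(perceptron_output(inputs, weights, bias))  # 0.5 - 2.0 + 6.0 + 0.5 = 5.0
```

Once the weights and bias are fixed, calling the function with new inputs is all it takes to get a new output.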

Let's illustrate this with housing data. Suppose each input describes features of a house (such as square footage, ease of transportation, and age) and each output is the corresponding house price. We start with initial weights (random values, or zeros as in the example below) and compute an output. The difference between this output and the actual house price determines how we update the weights.
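The sketch below shows one such update, assuming a linear activation and the same error-times-input rule that the training code later in this article uses. The house is the first scaled training example from that code; the zero starting weights and the learning rate of 0.01 are also taken from it.

```python
import numpy as np

# First scaled training example: features divided by 100, price divided by 100
x = np.array([1.2, 0.03, 0.15])
y = 3000.0

weights = np.zeros(3)   # starting weights (zeros here; random values also work)
bias = 0.0
learning_rate = 0.01

prediction = np.dot(x, weights) + bias   # 0.0 with zero weights
error = y - prediction                   # actual price minus predicted price
weights += learning_rate * error * x     # nudge each weight in proportion to its input
bias += learning_rate * error

new_prediction = np.dot(x, weights) + bias
print(prediction, new_prediction)
```

After a single update the prediction moves from 0 toward the actual price; repeating this over many examples and epochs is what training does.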

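Now, what happens when the output does not match the actual value? This is where the weights are updated. In neural networks this is done with backpropagation, which computes how much each weight contributed to the error; for a single perceptron like ours, it reduces to one gradient-descent step on the weights and bias.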

![image.png](img/Picture2.jpg)

To make the squared error as small as possible, we adjust the weights W and the bias in small increments or decrements. The function that measures this error over the training data is called the cost function, and our goal is to bring it as close to zero as possible.
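Written out, and assuming a linear hypothesis $H(x) = Wx + b$ evaluated over $m$ training examples (the figure may use slightly different notation), the cost function is:

$$
H(x) = Wx + b, \qquad
\text{cost}(W, b) = \frac{1}{m}\sum_{i=1}^{m}\left(H(x^{(i)}) - y^{(i)}\right)^2
$$

Every term is non-negative, so the cost reaches zero only when every prediction $H(x^{(i)})$ equals its target $y^{(i)}$.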

![image.png](img/Picture3.jpg)

Next, let's look at the variety of activation functions. The relationship between inputs and outputs can differ significantly from problem to problem. For house prices, the price rises roughly linearly with the inputs (such as house size), so the relationship is linear and a linear activation is sufficient. When classifying whether an image shows a cat or a dog, however, the relationship between the inputs (images) and the outputs (classes) is not linear, and other activation functions come into play, for example sigmoid (for two classes) or softmax (for several classes).
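The activation functions mentioned above can be sketched in NumPy as follows. This is a minimal illustration rather than library code; the function names are our own, and `softmax` subtracts the largest score first, a standard trick that avoids overflow without changing the result.

```python
import numpy as np

def linear(z):
    """Identity activation: output grows linearly with the input (e.g. house prices)."""
    return z

def sigmoid(z):
    """Squashes any real number into (0, 1); usable as a probability for two classes."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turns a vector of class scores into probabilities that sum to 1."""
    shifted = z - np.max(z)      # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])
print(linear(scores))
print(sigmoid(0.0))  # 0.5
print(softmax(scores))
```

A larger score always gets a larger softmax probability, which is why the class with the highest score is taken as the prediction.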

Now, let's address why we square the difference between the predicted value $H(x^{(i)})$ and the actual value $y^{(i)}$. If we simply added up the raw differences, positive and negative errors would cancel each other out, and using the absolute difference instead gives a function that is not differentiable where the error is zero. Squaring keeps every error non-negative and produces a smooth, differentiable function. Its minimum lies where the derivative is zero, and that point gives us the best weight and bias values. When we plot the cost function, we can follow its derivative (the slope) downhill toward these optimal values. This procedure is called gradient descent, and backpropagation is how the required derivatives are computed.
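The derivative step can be worked out explicitly, assuming the linear hypothesis $H(x) = Wx + b$ and the mean-squared cost over $m$ examples. The learning rate $\alpha$ is a symbol introduced here for the step size:

$$
\frac{\partial\,\text{cost}}{\partial W} = \frac{2}{m}\sum_{i=1}^{m}\left(H(x^{(i)}) - y^{(i)}\right)x^{(i)},
\qquad
\frac{\partial\,\text{cost}}{\partial b} = \frac{2}{m}\sum_{i=1}^{m}\left(H(x^{(i)}) - y^{(i)}\right)
$$

$$
W \leftarrow W - \alpha\,\frac{\partial\,\text{cost}}{\partial W},
\qquad
b \leftarrow b - \alpha\,\frac{\partial\,\text{cost}}{\partial b}
$$

Updating after every single example ($m = 1$) and absorbing the factor 2 into $\alpha$ gives $W \leftarrow W + \alpha\,(y - H(x))\,x$ and $b \leftarrow b + \alpha\,(y - H(x))$, which matches the update in the training code below.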

![image.png](img/Picture4.jpg)

We have now covered activation functions and backpropagation. For a more detailed treatment of backpropagation and gradient descent, you can refer to my article on Gradient Descent, which explores these concepts in more depth.

Let's now create an example perceptron to predict house prices.

```python
import numpy as np

class Perceptron:
    def __init__(self, num_features, learning_rate=0.01, num_epochs=10):
        self.learning_rate = learning_rate
        self.num_epochs = num_epochs
        # weights[0] is the bias, weights[1:] are the feature weights
        self.weights = np.zeros(num_features + 1)  # +1 for bias

    def predict(self, inputs):
        # Weighted sum of the inputs plus the bias
        summation = np.dot(inputs, self.weights[1:]) + self.weights[0]
        # Linear (identity) activation, so the output is the sum itself
        activation = summation
        return activation

    def train(self, training_data, labels):
        for _ in range(self.num_epochs):
            for inputs, label in zip(training_data, labels):
                # Forward propagation
                prediction = self.predict(inputs)

                # Backpropagation: move each weight in proportion to the error and its input
                error = label - prediction  # real_y - predicted_y
                self.weights[1:] += self.learning_rate * error * inputs
                self.weights[0] += self.learning_rate * error
```

The `Perceptron` constructor initializes the perceptron with the given number of features, learning rate, and number of epochs. It creates `weights` as an array of zeros of size `num_features + 1`; the extra entry, `weights[0]`, holds the bias term.

The `predict` method takes a vector of input features and computes the predicted value from the perceptron's weights using the linear function $\hat{y} = w \cdot x + b$. Because the activation is the identity, the output is just this weighted sum plus the bias.

The `train` method fits the perceptron to the given training data and labels. It iterates over the data for several epochs and, after each prediction, updates the weights based on the prediction error: each weight is increased by the product of the learning rate, the error, and its input feature, and the bias is increased by the learning rate times the error.
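To check that the update in `train` really is a gradient-descent step, the sketch below compares the analytic gradient of the per-example cost $\frac{1}{2}(y - \hat{y})^2$ with a finite-difference estimate. It is not part of the original code; the function names and the parameter values are hypothetical. With the $\frac{1}{2}$ factor, the gradient with respect to the weights is $-(y - \hat{y})\,x$, so subtracting `learning_rate` times the gradient is exactly `weights += learning_rate * error * inputs`.

```python
import numpy as np

def squared_error(w, b, x, y):
    """Per-example cost 0.5 * (y - prediction)^2 for a linear perceptron."""
    prediction = np.dot(x, w) + b
    return 0.5 * (y - prediction) ** 2

def analytic_gradient(w, b, x, y):
    """Gradient of squared_error: -error * x for the weights, -error for the bias."""
    error = y - (np.dot(x, w) + b)
    return -error * x, -error

def numerical_gradient(w, b, x, y, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    grad_w = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad_w[j] = (squared_error(w + step, b, x, y)
                     - squared_error(w - step, b, x, y)) / (2 * eps)
    grad_b = (squared_error(w, b + eps, x, y)
              - squared_error(w, b - eps, x, y)) / (2 * eps)
    return grad_w, grad_b

# Hypothetical parameters and one scaled example
w = np.array([0.5, -1.0, 2.0])
b = 0.1
x = np.array([1.2, 0.03, 0.15])
y = 30.0

gw_a, gb_a = analytic_gradient(w, b, x, y)
gw_n, gb_n = numerical_gradient(w, b, x, y)
print(np.allclose(gw_a, gw_n), np.isclose(gb_a, gb_n))
```

If the two gradients agree, the hand-derived update rule is a correct descent direction for the squared error.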

```python
training_data = np.array([[120, 3, 15], [80, 2, 10], [150, 4, 20], [200, 5, 25]])  # House features (square meters, number of rooms, building age)
labels = np.array([300000, 200000, 350000, 400000])  # House prices

# To ensure easier calculations and avoid large numbers during multiplication and
# summation, we divide every feature and every price by 100.
training_data = np.divide(training_data, 100)
labels = np.divide(labels, 100)

perceptron = Perceptron(num_features=3)
perceptron.train(training_data, labels)

# Test: scale the test features the same way, then multiply by 100 to report prices
test_data = np.array([[100, 2, 12], [160, 3, 18], [180, 4, 22]])
test_data = np.divide(test_data, 100)
for data in test_data:
    prediction = perceptron.predict(data)
    print(f"House Features: {data*100} => Predicted Price: {prediction*100}")
```

```
House Features: [100.   2.  12.] => Predicted Price: 254669.89823791108
House Features: [160.   3.  18.] => Predicted Price: 347633.97454922576
House Features: [180.   4.  22.] => Predicted Price: 379041.6811784502
```
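Dividing every value by 100 keeps the numbers small, but the features still end up on very different scales (after scaling, square meters are around 1 to 2 while the room count is around 0.02 to 0.05), so a single learning rate has to suit all of them. A common alternative, sketched here as an assumption rather than the article's method, is to standardize each feature column to zero mean and unit variance using statistics from the training set only, then apply the same statistics to the test data. The helper names are hypothetical.

```python
import numpy as np

def fit_standardizer(features):
    """Per-column mean and standard deviation computed from the training data."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std

def standardize(features, mean, std):
    """Shift and scale each column using previously computed statistics."""
    return (features - mean) / std

training_data = np.array([[120, 3, 15], [80, 2, 10], [150, 4, 20], [200, 5, 25]], dtype=float)
test_data = np.array([[100, 2, 12], [160, 3, 18], [180, 4, 22]], dtype=float)

mean, std = fit_standardizer(training_data)
train_scaled = standardize(training_data, mean, std)
test_scaled = standardize(test_data, mean, std)  # reuse the training statistics

print(train_scaled.mean(axis=0))  # each column now has mean close to 0
print(train_scaled.std(axis=0))   # each column now has standard deviation close to 1
```

Standardizing only the inputs leaves predictions in label units; if the labels are scaled as well, invert that scaling before reporting prices, just as the example above multiplies by 100.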
