# **Loss Functions**

## **Imports**

In [1]:
import numpy as np

## **Setting Sample Data**

In [2]:
y_predicted = np.array([1,1,0,0,1])
y_true = np.array([0.30,0.7,1,0,0.5])

## **Implement Mean Absolute Error (MAE)**

In [3]:
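def mae(y_predicted, y_true):
    """Mean absolute error computed with an explicit loop."""
    total_error = 0
    for yp, yt in zip(y_predicted, y_true):
        total_error += abs(yp - yt)  # absolute error for one sample
    print("Total error is:", total_error)

    result = total_error / len(y_predicted)  # average over all samples
    print("Mean absolute error is:", result)

    return result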

In [4]:
mae(y_predicted, y_true)

Total error is: 2.5
Mean absolute error is: 0.5


0.5

### **Implementing MAE with NumPy**

$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{\text{predicted},i} - y_{\text{true},i} \right|$

In [6]:
def mae_numpy(y_predicted,y_true):
    return np.mean(np.abs(y_predicted-y_true))

In [7]:
mae_numpy(y_predicted,y_true)

0.5
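As a quick sanity check, the loop-based and vectorized versions can be compared on larger inputs. The sketch below is illustrative only: the names `mae_loop` and `mae_vectorized`, the random data, and the seed are assumptions, not part of the original notebook.

```python
import numpy as np

def mae_loop(y_predicted, y_true):
    """MAE with an explicit Python loop (same logic as above, without printing)."""
    total_error = 0.0
    for yp, yt in zip(y_predicted, y_true):
        total_error += abs(yp - yt)
    return total_error / len(y_predicted)

def mae_vectorized(y_predicted, y_true):
    """MAE with NumPy array operations."""
    return np.mean(np.abs(y_predicted - y_true))

# Hypothetical random data; the seed is arbitrary and only makes the run repeatable.
rng = np.random.default_rng(0)
y_hat = rng.random(1000)
y = rng.random(1000)

loop_result = mae_loop(y_hat, y)
vec_result = mae_vectorized(y_hat, y)

# The two results should agree up to floating-point rounding.
agree = bool(np.isclose(loop_result, vec_result))
print("Loop and NumPy versions agree:", agree)
```

The vectorized form avoids the Python-level loop, which usually makes it both shorter and faster on large arrays.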

## **Mean Squared Error**

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_{\text{predicted},i} - y_{\text{true},i} \right)^2$

In [8]:
def mse(y_predicted, y_true):
    """Mean squared error computed with an explicit loop."""
    total_error = 0
    for yp, yt in zip(y_predicted, y_true):
        total_error += (yp - yt) ** 2  # squared error for one sample
    print("Total error is:", total_error)

    result = total_error / len(y_predicted)  # average over all samples
    print("Mean squared error is:", result)

    return result

In [9]:
mse(y_predicted,y_true)

Total error is: 1.83
Mean squared error is: 0.366


0.366

### **MSE Implementation Using NumPy**
* Because each error is squared, this loss function penalizes large errors much more heavily than small ones.

In [10]:
def mse_numpy(y_predicted, y_true):
    return np.mean((y_predicted-y_true)**2)

In [11]:
mse_numpy(y_predicted, y_true)

0.366

## **Conclusion:** 
* **Mean Squared Error (MSE)**: Squaring emphasizes larger discrepancies between predictions and actual values, which is useful when you want to penalize large mistakes heavily. The same property makes MSE sensitive to outliers, so it may not be ideal when the data contains a few extreme values.

* In contrast, the **Mean Absolute Error (MAE)** uses the absolute value instead of the square, so each error contributes in direct proportion to its size. Therefore, MSE penalizes large deviations more heavily than MAE does.
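The difference can be seen on a small made-up example. The arrays below are hypothetical and chosen so that both sets of predictions have the same MAE: in one, every prediction is off by 1; in the other, a single prediction is off by 5 and the rest are exact.

```python
import numpy as np

def mae_numpy(y_predicted, y_true):
    return np.mean(np.abs(y_predicted - y_true))

def mse_numpy(y_predicted, y_true):
    return np.mean((y_predicted - y_true) ** 2)

# Hypothetical targets (not from the notebook's sample data).
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Case 1: every prediction is off by exactly 1.
y_uniform = y_true + 1.0

# Case 2: four exact predictions and one outlier that is off by 5.
y_outlier = y_true.copy()
y_outlier[-1] += 5.0

mae_uniform = mae_numpy(y_uniform, y_true)
mae_outlier = mae_numpy(y_outlier, y_true)
mse_uniform = mse_numpy(y_uniform, y_true)
mse_outlier = mse_numpy(y_outlier, y_true)

print("MAE:", mae_uniform, mae_outlier)  # 1.0 in both cases
print("MSE:", mse_uniform, mse_outlier)  # 1.0 versus 5.0
```

MAE cannot tell the two cases apart, while MSE is five times larger when the total error is concentrated in one outlier.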

## **Log Loss (Binary Cross-Entropy)**
$\text{Log Loss} = - \frac{1}{n} \sum_{i=1}^{n} \left( y_{\text{true},i} \cdot \log(y_{\text{predicted},i}) + (1 - y_{\text{true},i}) \cdot \log(1 - y_{\text{predicted},i}) \right)$

In [15]:
def log_loss(y_predicted, y_true):
    epsilon = 1e-15  # small constant that keeps predictions away from exactly 0 and 1
    y_predicted = np.clip(y_predicted, epsilon, 1 - epsilon)  # avoids undefined values such as log(0)
    return -np.mean(y_true * np.log(y_predicted) + (1 - y_true) * np.log(1 - y_predicted))

In [16]:
log_loss(y_predicted,y_true)

17.2696280766844
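The value above is large because the sample predictions are hard 0/1 values: after clipping, a prediction that disagrees with its label contributes roughly $-\log(10^{-15}) \approx 34.5$ to the sum. Log loss is more commonly applied to binary labels and predicted probabilities strictly between 0 and 1. The sketch below uses hypothetical data (`labels`, `soft_predictions`, and `hard_predictions` are assumptions, not the notebook's arrays) to contrast the two situations.

```python
import numpy as np

def log_loss(y_predicted, y_true):
    epsilon = 1e-15  # keeps predictions away from exactly 0 and 1
    y_predicted = np.clip(y_predicted, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_predicted)
                    + (1 - y_true) * np.log(1 - y_predicted))

# Hypothetical binary labels.
labels = np.array([0, 1, 1, 0, 1])

# Hypothetical probabilistic predictions that lean toward the correct class.
soft_predictions = np.array([0.1, 0.9, 0.8, 0.2, 0.7])

# Hypothetical hard 0/1 predictions with a single mistake (last sample).
hard_predictions = np.array([0, 1, 1, 0, 0])

soft_loss = log_loss(soft_predictions, labels)
hard_loss = log_loss(hard_predictions, labels)

print("Log loss with probabilistic predictions:", soft_loss)
print("Log loss with hard predictions, one mistake:", hard_loss)
```

A single confident mistake dominates the hard-prediction loss, whereas the probabilistic predictions give a small loss even though none of them is exactly 0 or 1.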

### **Explanation:**
* $y_{\text{true}} \cdot \log(y_{\text{predicted}})$:

    * This part handles the loss when the true label is 1. It calculates the log of the predicted probability for the positive class and multiplies it by the true label.
    * If the true label is 1 and the predicted probability is low, the log term will be a large negative number, increasing the overall loss.

* $(1 - y_{\text{true}}) \cdot \log(1 - y_{\text{predicted}})$:
    * This part handles the loss when the true label is 0. It calculates the log of the predicted probability for the negative class (i.e., 1 minus the predicted probability) and multiplies it by $(1-y_{\text{true}})$.
    * If the true label is 0 and the predicted probability is close to 1, the loss will increase significantly, since $\log(1 - y_{\text{predicted}})$ will become a large negative number.
* **Sum the Two Terms**:

    * These two terms are summed to get the total log loss for each prediction, which measures how well the predicted probabilities match the true binary labels.

* **Take the Mean:**

    * `np.mean()`: averages the log loss over all samples in the dataset, giving a single value that represents the model's overall performance.

* **Negative Sign:**

    * The result is negated (`-np.mean(...)`) because the log of a probability between 0 and 1 is never positive, and we want the final loss to be non-negative.
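The steps described above can be traced on a few samples by computing each term separately. This is a minimal sketch on hypothetical labels and probabilities; the variable names are illustrative, and clipping is omitted because none of the probabilities is exactly 0 or 1.

```python
import numpy as np

# Hypothetical data: two positive and two negative samples,
# each predicted once confidently right and once confidently wrong.
y_true = np.array([1, 1, 0, 0])
y_prob = np.array([0.9, 0.1, 0.1, 0.9])

# Term that is active only when the true label is 1.
positive_term = y_true * np.log(y_prob)

# Term that is active only when the true label is 0.
negative_term = (1 - y_true) * np.log(1 - y_prob)

# Sum the two terms and negate to get a non-negative loss per sample.
per_sample_loss = -(positive_term + negative_term)

# Average over all samples.
loss = per_sample_loss.mean()

print("Per-sample loss:", per_sample_loss)
print("Mean log loss:", loss)
```

For each sample exactly one of the two terms is non-zero, and the confidently wrong predictions (second and fourth samples) produce a much larger per-sample loss than the confidently right ones.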


## **Summary:**
* **Mean Squared Error (MSE)** is designed for regression problems where the output is continuous, not for probabilistic binary classification.
* **Log Loss (Binary Cross-Entropy)** measures how well a model's predicted probabilities align with the true binary labels, making it the standard choice for logistic regression.
* Hence, log loss is used instead of MSE for logistic regression: combined with a sigmoid output it gives a convex objective, and its gradient does not vanish when the model is confidently wrong, whereas the gradient of MSE through a sigmoid shrinks as the sigmoid saturates.
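The point about gradient updates can be illustrated numerically. The sketch below assumes a single sample with true label 1 and a prediction $p = \sigma(z)$, where $z$ is the model's raw score (logit). Under that assumption the derivative of the log loss with respect to $z$ is $p - y$, and the derivative of the squared error $(p - y)^2$ is $2(p - y)\,p\,(1 - p)$. The function names are illustrative and not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_log_loss_wrt_logit(z, y):
    """d/dz of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z)."""
    return sigmoid(z) - y

def grad_mse_wrt_logit(z, y):
    """d/dz of (p - y)**2 with p = sigmoid(z)."""
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

# Hypothetical logits for a sample whose true label is 1:
# very wrong (-10), somewhat wrong (-2), and undecided (0).
z = np.array([-10.0, -2.0, 0.0])
y = 1.0

g_log_loss = grad_log_loss_wrt_logit(z, y)
g_mse = grad_mse_wrt_logit(z, y)

print("Log loss gradients:", g_log_loss)
print("MSE gradients:     ", g_mse)
```

At `z = -10` the model is confidently wrong: the log loss gradient is close to -1, so the update is strong, while the MSE gradient is close to 0 because the sigmoid is saturated and the model barely learns from its mistake.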