***Loss Function vs Cost Function:***<br>
A loss function measures **the error** between the predicted value and the actual value for a single sample, whereas the **cost function is the average of all the losses for a given dataset.** Example: In linear regression, the loss for one sample is the square of the difference between the predicted value and the actual value. The cost function is the average of these losses, i.e., the sum of the squared differences between the actual values and the predicted values divided by the total number of samples.
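The distinction can be sketched in a few lines of NumPy. The target and prediction values below are hypothetical and chosen only for illustration; squared error is used as the loss, as in the linear regression example.

```python
import numpy as np

# Hypothetical targets and predictions, chosen only for illustration
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

# Loss: one value per sample (squared error here)
losses = (y_true - y_pred) ** 2

# Cost: a single number, the mean of the per-sample losses
cost = losses.mean()

print(losses)  # [1. 0. 4.]
print(cost)    # 5/3, roughly 1.67
```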

***Loss Functions:***<br>
1. MSE (Mean Squared Error) and MAE (Mean Absolute Error):<br>
Mean Absolute Error (MAE): The average of the absolute differences between the actual data and the predicted data. Every error counts in proportion to its size, so large errors are not penalized any more heavily than small ones.<br>
Mean Squared Error (MSE): The average of the squared differences between the actual data and the predicted data. Squaring makes large errors count much more than they do in MAE. The disadvantage is that the units are squared as well, so the result cannot be compared directly with the data in its original units.<br>
**MSE is smooth and differentiable everywhere, while MAE is not differentiable at zero error, so Gradient Descent often converges more smoothly with MSE than with MAE.**<br>
Note that MSE is not in the same units as the data, so Root Mean Squared Error (RMSE), the square root of MSE, is often used with linear regression, random forests, etc. Even so, it is worth trying several options to get the best model.<br>
2. Log Loss or Binary Cross Entropy:<br>
<img src='binary_entropy.jpg' width=500><br>
***For Logistic Regression, we use binary cross entropy and not MSE.*** For more information, click on the [link](https://towardsdatascience.com/why-not-mse-as-a-loss-function-for-logistic-regression-589816b5e03c)
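For reference, the standard form of binary cross entropy averaged over a dataset can be written as below. The symbols are notation chosen here, not taken from the figure: $N$ is the number of samples, $y_i$ is the actual label (0 or 1), and $\hat{y}_i$ is the predicted probability for sample $i$.

$$
\text{Log Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \,\right]
$$

When $y_i = 1$ only the first term contributes, and when $y_i = 0$ only the second does, so a confident wrong prediction produces a very large loss.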

**Note that the cost function is the average of all the losses.**
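The MAE and MSE descriptions above can be checked on a small example. This is a minimal sketch with hypothetical values: the first set of predictions is off by 1 everywhere, and the second is exact except for one error of 4.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_pred_even = np.array([11.0, 11.0, 12.0, 12.0])     # four errors of size 1
y_pred_outlier = np.array([10.0, 12.0, 11.0, 17.0])  # a single error of size 4

for name, y_pred in [("even", y_pred_even), ("outlier", y_pred_outlier)]:
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))   # average absolute error
    mse = np.mean(errors ** 2)      # average squared error (squared units)
    rmse = np.sqrt(mse)             # back in the units of the data
    print(name, mae, mse, rmse)
```

Both sets of predictions have an MAE of 1, but the MSE is 1 for the first set and 4 for the second (RMSE 1 and 2), which shows how squaring singles out the large error.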

***Epoch and Forward Passing:***<br>Consider a dataset in which you need to predict whether a person buys insurance or not, based on inputs such as age and affordability.<br> First, you select a loss function, which is used to calculate the loss for each individual sample; the average of these losses is the cost. Passing the inputs through the model to produce predictions is called ***Forward Passing.*** Passing all the samples in the dataset through training once is called ***One Epoch.***
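As a rough sketch of a forward pass over all samples for the insurance example, assuming a single sigmoid unit as the model: the data, the weights `w`, and the bias `b` below are all hypothetical, and the weight update that a full training epoch would also perform is not shown.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Hypothetical samples: [age scaled to 0-1, affordability (0 or 1)]
X = np.array([[0.22, 1], [0.25, 0], [0.47, 1], [0.52, 0], [0.61, 1]])
y_true = np.array([0, 0, 1, 0, 1])  # 1 = bought insurance

# Hypothetical starting weights and bias
w = np.array([1.0, 1.0])
b = 0.0

# Forward pass: compute a prediction for every sample in the dataset
y_pred = sigmoid(X @ w + b)

# Cost for this pass: average log loss over all samples.
# The sigmoid outputs here lie strictly between 0 and 1, so no clipping is needed.
cost = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(cost)
```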

In [1]:
# Implementing Loss functions:

In [2]:
import numpy as np

In [22]:
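# Toy values for illustration. In a real binary classification task,
# y_true would hold 0/1 labels and y_predicted would hold probabilities.
y_predicted=np.array([1,1,0,0,1])
y_true=np.array([0.3,0.7,1,0,0.5])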

In [4]:
# MAE (using for loop):

In [7]:
def MAE(y_true, y_predicted):
    """Return the mean absolute error between two equal-length sequences."""
    total = 0
    # zip lets us iterate over both arrays together in a single loop
    for y_pred, y_tr in zip(y_predicted, y_true):
        total += abs(y_pred - y_tr)
    mae = total / len(y_true)
    return mae

In [8]:
MAE(y_true,y_predicted)

0.5

In [9]:
# MAE using numpy:

In [12]:
mae=np.mean(np.abs(y_true-y_predicted))
mae

0.5

In [13]:
# MSE can be implemented in the same way.
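A minimal sketch of MSE following the same pattern as the MAE cells, using the same toy arrays. The function name `MSE` is chosen here to mirror `MAE`; RMSE is included as the square root of the result.

```python
import numpy as np

y_predicted = np.array([1, 1, 0, 0, 1])
y_true = np.array([0.3, 0.7, 1, 0, 0.5])

def MSE(y_true, y_predicted):
    """Mean squared error using a plain loop, mirroring MAE above."""
    total = 0
    for y_pred, y_tr in zip(y_predicted, y_true):
        total += (y_pred - y_tr) ** 2
    return total / len(y_true)

mse_loop = MSE(y_true, y_predicted)

# The same calculation with numpy
mse_np = np.mean((y_true - y_predicted) ** 2)

# RMSE brings the value back to the units of the data
rmse = np.sqrt(mse_np)

print(mse_loop, mse_np, rmse)  # MSE is roughly 0.366
```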

In [14]:
# Binary Cross Entropy:

In [15]:
# Binary cross entropy takes log(y_pred) and log(1 - y_pred). If a prediction is exactly 0 or 1, one of these
# becomes log(0), which is -infinity, so we replace 0 with a value near zero (1e-15) and 1 with a value near one (1 - 1e-15).

In [17]:
y_predicted

array([1, 1, 0, 0, 1])

In [18]:
# Define a variable with value 10^-15
x=1e-15
x

1e-15

In [23]:
y_predicted=[max(i,x) for i in y_predicted]
y_predicted

[1, 1, 1e-15, 1e-15, 1]

In [24]:
y_predicted=[min(i,1-x) for i in y_predicted]
y_predicted

[0.999999999999999, 0.999999999999999, 1e-15, 1e-15, 0.999999999999999]

In [25]:
# Now that the predicted values are clipped, we can apply the formula to calculate the binary cross entropy loss

In [29]:
y_pred_arr = np.array(y_predicted)  # convert the clipped list back to an array
binary_entropy_loss = -np.mean(y_true * np.log(y_pred_arr) + (1 - y_true) * np.log(1 - y_pred_arr))
binary_entropy_loss

17.2696280766844
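The steps above can be collected into a single reusable function. This is a sketch, not a library API: the name `log_loss` and the `eps` parameter are chosen here, and `np.clip` replaces the two list comprehensions used for clipping.

```python
import numpy as np

def log_loss(y_true, y_predicted, eps=1e-15):
    """Binary cross entropy averaged over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Keep predictions inside [eps, 1 - eps] so that log(0) never occurs
    y_pred = np.clip(np.asarray(y_predicted, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Same toy arrays as above: reproduces the value of about 17.27
toy_loss = log_loss([0.3, 0.7, 1, 0, 0.5], [1, 1, 0, 0, 1])
print(toy_loss)

# A more typical case (hypothetical values): 0/1 labels and predicted probabilities
typical_loss = log_loss([1, 0, 1], [0.9, 0.1, 0.8])
print(typical_loss)
```

The toy arrays give a large loss because several hard 0/1 predictions are far from their targets. Predictions that are close to the labels, as in the second call, give a much smaller value.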

***Use binary cross entropy for binary classification only. For multi-class classification, use categorical cross entropy or sparse categorical cross entropy.***
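To illustrate the difference between the two multi-class variants, here is a minimal NumPy sketch with hypothetical values for three classes. It assumes the usual convention that the categorical form takes one-hot labels and the sparse form takes integer class indices. Both compute the same quantity.

```python
import numpy as np

eps = 1e-15

# Hypothetical predicted probabilities for 3 samples and 3 classes (each row sums to 1)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.2, 0.6]])
y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)

# Categorical cross entropy: labels are one-hot vectors
y_true_onehot = np.array([[1, 0, 0],
                          [0, 1, 0],
                          [0, 0, 1]])
cce = -np.mean(np.sum(y_true_onehot * np.log(y_pred), axis=1))

# Sparse categorical cross entropy: labels are integer class indices
y_true_index = np.array([0, 1, 2])
scce = -np.mean(np.log(y_pred[np.arange(len(y_true_index)), y_true_index]))

print(cce, scce)  # the two values match
```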