# Overview

https://www.section.io/engineering-education/understanding-loss-functions-in-machine-learning/

Loss functions play an important role in any statistical model. They define the objective against which the model's performance is evaluated, and the model's parameters are learned by minimizing the chosen loss function.

Loss functions define what a good prediction is and isn't. In short, the choice of loss function largely determines how good your estimator will be. This article looks at loss functions, the role they play in validating predictions, and the most commonly used ones.

# Prerequisites
The reader is expected to have a basic familiarity with machine learning concepts such as regression and classification, and with the building blocks of a statistical model that produces predictions. Machine Learning Mastery has an excellent compilation of the concepts that would help in understanding this article.


# Table of contents
- Introduction
- Loss functions for regression
- Loss functions for classification
- Conclusion
- Further reading

# 0. Introduction
A loss function turns a theoretical proposition into a practical one. Building a highly accurate predictor requires iterating on the problem: questioning it, modeling it with the chosen approach, and testing the result.

The only criterion by which a statistical model is scrutinized is its performance, that is, how accurate the model's decisions are. This calls for a way to measure how far the predictions of a particular iteration of the model are from the actual values. This is where loss functions come into play.

Loss functions measure how far an estimated value is from its true value. A loss function maps decisions to their associated costs. Loss functions are not fixed; they change depending on the task at hand and the goal to be met.

# 1. Loss functions for regression
Regression involves predicting a specific value that is continuous in nature. Estimating the price of a house or predicting stock prices are examples of regression because one works towards building a model that would predict a real-valued quantity.

Let’s take a look at some loss functions which can be used for regression problems and try to draw comparisons among them.

## 1.1 L1-norm loss, L1 loss, Mean Absolute Error (MAE)

Mean Absolute Error (also called L1 loss) is one of the simplest yet most robust loss functions used for regression models.

Regression problems may have variables that are not strictly Gaussian because of outliers (values that are very different from the rest of the data). Mean Absolute Error is a good option in such cases: each error contributes in proportion to its size rather than its square, so a few unrealistically high or low values do not dominate the loss, and the absolute value ignores the direction of the error.

As the name suggests, MAE is the average of the absolute differences between the actual and the predicted values. For an actual value $x_i$ and its predicted value $y_i$, with $N$ the total number of data points in the dataset, the mean absolute error is defined as:

$$MAE = \frac{1}{N}\sum\limits_{i=1}^{N} |y_i - x_i|$$

In [8]:
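import numpy as np
from sklearn.metrics import mean_absolute_error

def my_loss_l1(x, y):
    return np.mean(np.abs(x - y))  # mean of the element-wise absolute differences

N = 10
x = np.random.randn(N)  # random stand-in for the actual values
y = np.random.randn(N)  # random stand-in for the predictions
print(type(x))
print(x)

my_mae = my_loss_l1(x, y)
sklearn_mae = mean_absolute_error(x, y)  # should agree with the hand-written version
print(my_mae, sklearn_mae)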

<class 'numpy.ndarray'>
[-0.46473032  0.54652299  0.42551151 -1.34267024  1.44770848  0.28243456
  0.9549279  -0.67330887  0.87979148 -0.60954231]
1.1454870444295682 1.1454870444295682
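
The robustness claim can be illustrated with a small comparison. The following is a minimal sketch in plain Python (no NumPy, to keep it dependency-free); the helper names `mae` and `mse` and the toy data are made up for illustration and are not part of the original article. It compares MAE with the squared-error loss covered in the next section when a single target is replaced by an outlier.

```python
def mae(actual, predicted):
    """Mean absolute error over paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error over paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: every prediction is off by exactly 1.
actual = [10.0, 12.0, 11.0, 13.0, 12.0]
predicted = [11.0, 13.0, 12.0, 14.0, 13.0]

# Same data, but the last target is replaced by an outlier (error of 99).
actual_outlier = actual[:-1] + [112.0]

mae_clean, mse_clean = mae(actual, predicted), mse(actual, predicted)
mae_out, mse_out = mae(actual_outlier, predicted), mse(actual_outlier, predicted)

print(f"clean:   MAE={mae_clean:.1f}  MSE={mse_clean:.1f}")
print(f"outlier: MAE={mae_out:.1f}  MSE={mse_out:.1f}")
```

On this toy data the single outlier raises MAE from 1.0 to 20.6, while the squared-error loss jumps from 1.0 to 1961.0, which is the sense in which MAE is less sensitive to outliers.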


## 1.2 L2-norm loss, L2 loss, Mean Squared Error (MSE)

Mean Squared Error (also called L2 loss) is the default choice of many data scientists when it comes to loss functions for regression. One reason is that many variables are well modeled by a Gaussian distribution, and minimizing MSE is equivalent to maximum likelihood estimation when the errors are Gaussian.

Mean Squared Error is the average of the squared differences between the actual and the predicted values. For an actual value $x_i$ and its predicted value $y_i$, where $N$ is the total number of data points in the dataset, the mean squared error is defined as:

$$MSE = \frac{1}{N}\sum\limits_{i=1}^{N} |y_i - x_i|^2$$

In [13]:
import numpy as np
from sklearn.metrics import mean_squared_error

def my_loss_l2(x, y):
    return np.mean(np.square(x - y))  # mean of the element-wise squared differences

N = 10
x = np.random.randn(N)  # random stand-in for the actual values
y = np.random.randn(N)  # random stand-in for the predictions
print(type(x))
print(x)

my_mse = my_loss_l2(x, y)
sklearn_mse = mean_squared_error(x, y)  # should agree with the hand-written version
print(my_mse, sklearn_mse)

<class 'numpy.ndarray'>
[-1.02390269  0.78449111  0.12764909  0.60077756 -1.38303544 -1.78699437
 -0.8337004  -0.19166717 -0.02222612  0.55830641]
1.635810818506421 1.635810818506421
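
The Gaussian motivation for MSE given at the start of this section can be made concrete with a short derivation. This is a sketch under an assumption the article does not state explicitly: each actual value $x_i$ equals its prediction $y_i$ plus independent zero-mean Gaussian noise with a fixed variance $\sigma^2$ (the symbols $\varepsilon_i$ and $\sigma$ are introduced here for illustration).

$$x_i = y_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$$

$$-\log p(x_1, \dots, x_N \mid y_1, \dots, y_N) = \frac{N}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum\limits_{i=1}^{N} (x_i - y_i)^2 = \frac{N}{2}\log(2\pi\sigma^2) + \frac{N}{2\sigma^2}\,MSE$$

Under that assumption the first term does not depend on the predictions, so the predictions that maximize the likelihood are the same ones that minimize MSE.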


## 1.3 Mean Bias Error (MBE)

Mean Bias Error is used to calculate the average bias in the model. Bias, in a nutshell, is the tendency to systematically overestimate or underestimate a quantity. Once the model has been evaluated with MBE, corrective measures can be taken to reduce the bias.

Mean Bias Error takes the actual difference between the target and the predicted value, and not the absolute difference. One has to be cautious as the positive and the negative errors could cancel each other out, which is why it is one of the lesser-used loss functions.

The formula of Mean Bias Error, with $x_i$ the target and $y_i$ the predicted value, is:

$$MBE = \frac{1}{N}\sum\limits_{i=1}^{N} (x_i - y_i)$$

It is hard to call this a (useful) loss function, and scikit-learn does not define it.

In [19]:
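import numpy as np

def my_mbe(x, y):
    return np.mean(x - y)  # signed errors, so positive and negative ones can cancel

N = 10000
x = np.random.randn(N)  # random stand-in for the actual values
y = np.random.randn(N)  # random stand-in for the predictions

mbe = my_mbe(x, y)  # separate name so the function is not overwritten
print(mbe)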

-0.009731471835487447
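
The cancellation caveat is easy to see on a tiny example. The following is a minimal sketch in plain Python that follows the sign convention of the code cell above (actual minus predicted); the helper names and the toy values are hypothetical and chosen only for illustration.

```python
def mbe(actual, predicted):
    """Mean bias error: signed errors are averaged, so they can cancel."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error, shown for comparison."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 7.0, 9.0]

# Predictions that alternate between overshooting and undershooting by 2.
over_and_under = [5.0, 3.0, 9.0, 7.0]

# Predictions that overshoot by 1 every time.
always_over = [4.0, 6.0, 8.0, 10.0]

mbe_mixed, mae_mixed = mbe(actual, over_and_under), mae(actual, over_and_under)
mbe_over, mae_over = mbe(actual, always_over), mae(actual, always_over)

print("mixed errors:   MBE =", mbe_mixed, " MAE =", mae_mixed)
print("always too high: MBE =", mbe_over, " MAE =", mae_over)
```

The first model is off by 2 on every point yet has an MBE of 0.0, while the second, more accurate model has an MBE of -1.0. MBE is therefore better read as a diagnostic of systematic over- or underestimation than as a measure of overall accuracy.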


## 1.4 Mean Squared Logarithmic Error (MSLE)

Sometimes, one may not want to penalize the model too much for predicting unscaled quantities directly. Relaxing the penalty on huge differences can be done with the help of Mean Squared Logarithmic Error.

Mean Squared Logarithmic Error is computed like Mean Squared Error, except that the natural logarithm of both the actual and the predicted values is taken before they are compared. A common convention, used here, adds 1 to each value first so that zeros can be handled:

$$MSLE = \frac{1}{N}\sum\limits_{i=1}^{N} \left(\log(1 + y_i) - \log(1 + x_i)\right)^2$$
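
To see how MSLE relaxes the penalty on large differences, here is a minimal sketch in plain Python. It assumes the $\log(1 + v)$ convention applied to both the actual and the predicted values (the convention followed by scikit-learn's `mean_squared_log_error`) and non-negative inputs; the helper names and the toy values are hypothetical.

```python
import math


def mse(actual, predicted):
    """Mean squared error over paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def msle(actual, predicted):
    """Mean squared logarithmic error, assuming the log(1 + v) convention."""
    return sum(
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ) / len(actual)


# Two hypothetical predictions that are both off by a factor of 2,
# once at a small scale and once at a large scale.
small_actual, small_pred = [10.0], [20.0]
large_actual, large_pred = [1000.0], [2000.0]

mse_small, mse_large = mse(small_actual, small_pred), mse(large_actual, large_pred)
msle_small, msle_large = msle(small_actual, small_pred), msle(large_actual, large_pred)

print("MSE: ", mse_small, mse_large)
print("MSLE:", msle_small, msle_large)
```

MSE grows by a factor of 10,000 between the two cases, whereas MSLE stays at a similar magnitude, because it responds to the ratio between prediction and target rather than to their absolute difference.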