## What is a loss function?

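A loss function (loss) measures the gap between the `predicted value` and the `correct answer`. Training a neural network means searching for a set of parameters that brings the `predicted value` as close as possible to the `correct answer`, that is, minimizing the loss function.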

## Mean squared error (MSE)

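$$
\mathrm{MSE}(y_i, y) = \frac{\sum_{i=1}^{n} (y - y_i)^2}{n}
$$

Here $y_i$ is the correct answer for sample $i$ (written `y_` in the code), $y$ is the corresponding prediction, and the sum runs over all $n$ samples.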

```python
loss_mse = tf.reduce_mean(tf.square(y_-y))
```
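
To make the formula concrete, here is a minimal NumPy sketch that computes the MSE by hand. The arrays `y_true` and `y_pred` are hypothetical values chosen only for illustration; they do not come from the exercise below.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
y_true = np.array([1.0, 2.0, 3.0])  # correct answers (y_ in the code above)
y_pred = np.array([1.5, 2.0, 2.0])  # predictions (y in the code above)

# Square each error, then average over the n samples.
squared_errors = (y_true - y_pred) ** 2  # 0.25, 0.0, 1.0
mse = squared_errors.mean()              # (0.25 + 0.0 + 1.0) / 3

print(mse)
```

`tf.reduce_mean(tf.square(y_ - y))` performs the same two steps on tensors.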

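### Exercise
Goal: predict daily yogurt sales `y`, which are influenced by two factors, `x1` and `x2`.

Data: the daily values of `x1` and `x2`, together with that day's sales `y_` (the correct answer).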

```python
import numpy as np
import tensorflow as tf

SEED = 23455

# Synthetic data: y_ = x1 + x2 plus uniform noise in [-0.05, 0.05)
rdm = np.random.RandomState(seed=SEED)
x = rdm.rand(32, 2)
y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x]
x = tf.cast(x, dtype=tf.float32)

# One weight per input feature, randomly initialized
w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, mean=0, seed=1))

epochs = 15000
lr = 0.02

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y = tf.matmul(x, w1)  # forward pass
        loss_mse = tf.reduce_mean(tf.square(y_ - y))
    grads = tape.gradient(loss_mse, w1)
    w1.assign_sub(lr * grads)  # gradient descent step

    if epoch % 500 == 0:
        print('After {} training steps, w1 = {}'.format(epoch, w1.numpy()))
```

```text
After 0 training steps, w1 = [[-0.18593146]
 [ 0.03002513]]
After 500 training steps, w1 = [[0.97760665]
 [1.0182234 ]]
After 1000 training steps, w1 = [[1.0006421]
 [0.9980455]]
After 1500 training steps, w1 = [[1.0038046]
 [0.9952709]]
After 2000 training steps, w1 = [[1.0042382]
 [0.9948903]]
After 2500 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 3000 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 3500 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 4000 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 4500 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 5000 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 5500 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 6000 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 6500 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 7000 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 7500 training steps, w1 = [[1.004296  ]
 [0.99483895]]
After 8000 training steps, w1 = ...
```

The two elements of `w1` converge toward 1, which is consistent with the data-generating equation $y = x_1 + x_2$.
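
As a sanity check, the weights that minimize the MSE can also be computed directly, without a training loop. The sketch below rebuilds the same data with the same seed and solves the least-squares problem with `np.linalg.lstsq`. The name `w_closed_form` is hypothetical, and the computation runs in float64 rather than float32, so the result should land close to, but not exactly on, the values the training loop settles at.

```python
import numpy as np

SEED = 23455

# Rebuild the exercise data exactly as in the training script.
rdm = np.random.RandomState(seed=SEED)
x = rdm.rand(32, 2)
y_ = np.array([[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x])

# Least-squares solution of x @ w = y_, i.e. the exact minimizer of the MSE.
w_closed_form, *_ = np.linalg.lstsq(x, y_, rcond=None)

print(w_closed_form)
```

Both entries should come out near 1, matching the behavior of gradient descent above.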

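## Custom loss function

Using MSE as the loss to predict yogurt sales implicitly assumes that over-predicting and under-predicting cost the same.

In reality, over-predicting wastes the production cost, while under-predicting forfeits the profit, and `profit` usually does not equal `cost`.

Custom loss function, where $y_i$ is the correct answer and $y$ is the prediction: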

$$
f(y_i, y) = \begin{cases} \mathrm{PROFIT} \cdot (y_i - y) & y < y_i \\ \mathrm{COST} \cdot (y - y_i) & y \ge y_i \end{cases}
$$
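
To make the asymmetry concrete, here is a minimal NumPy sketch of the piecewise formula above. The function name `custom_loss` and the sample values are assumptions made for illustration; `COST` and `PROFIT` use the same values as the training script in this section.

```python
import numpy as np

COST = 2.0     # loss per unit predicted above the true sales
PROFIT = 10.0  # loss per unit predicted below the true sales

def custom_loss(y_true, y_pred):
    """Piecewise loss from the formula above, summed over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_sample = np.where(y_pred < y_true,
                          PROFIT * (y_true - y_pred),  # predicted too little
                          COST * (y_pred - y_true))    # predicted too much
    return per_sample.sum()

under = custom_loss([1.0], [0.8])  # 0.2 too low  -> PROFIT * 0.2
over = custom_loss([1.0], [1.2])   # 0.2 too high -> COST * 0.2
print(under, over)
```

The same absolute error of 0.2 is penalized five times more heavily when the prediction is too low, which is what pushes the model toward over-predicting.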

```python
import numpy as np
import tensorflow as tf

SEED = 23455

# Same synthetic data as in the MSE exercise
rdm = np.random.RandomState(seed=SEED)
x = rdm.rand(32, 2)
y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x]
x = tf.cast(x, dtype=tf.float32)

w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, mean=0, seed=1))

epochs = 15000
lr = 0.02
COST = 2.0     # loss per unit of over-prediction
PROFIT = 10.0  # loss per unit of under-prediction

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y = tf.matmul(x, w1)
        # Piecewise loss: COST when predicting too much, PROFIT when too little
        loss_custom = tf.reduce_sum(
            tf.where(y > y_, COST * (y - y_), PROFIT * (y_ - y)))
    grads = tape.gradient(loss_custom, w1)
    w1.assign_sub(lr * grads)

    if epoch % 500 == 0:
        print('After {} training steps, w1 = {}'.format(epoch, w1.numpy()))
```

```text
After 0 training steps, w1 = [[3.3923972]
 [1.3141794]]
After 500 training steps, w1 = [[2.6693935]
 [2.8167872]]
After 1000 training steps, w1 = [[1.4506109]
 [1.515564 ]]
After 1500 training steps, w1 = [[3.8882358]
 [4.1180105]]
After 2000 training steps, w1 = [[2.6694531]
 [2.8167872]]
After 2500 training steps, w1 = [[1.4506705]
 [1.515564 ]]
After 3000 training steps, w1 = [[3.8882954]
 [4.1180105]]
After 3500 training steps, w1 = [[2.6695127]
 [2.8167872]]
After 4000 training steps, w1 = [[1.4507301]
 [1.515564 ]]
After 4500 training steps, w1 = [[3.888355 ]
 [4.1180105]]
After 5000 training steps, w1 = [[2.6695724]
 [2.8167872]]
After 5500 training steps, w1 = [[1.4507897]
 [1.515564 ]]
After 6000 training steps, w1 = [[3.8884146]
 [4.1180105]]
After 6500 training steps, w1 = [[2.669632 ]
 [2.8167872]]
After 7000 training steps, w1 = [[1.4508493]
 [1.515564 ]]
After 7500 training steps, w1 = [[3.8884742]
 [4.1180105]]
After 8000 training steps, w1 = [[2.6696916]
 [2.8167872]]
...
```

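When profit > cost, the model tends to over-predict. In the run above (PROFIT = 10, COST = 2) the printed weights stay above 1, although they cycle between a few values instead of settling; with a summed, piecewise-linear loss and a fixed learning rate of 0.02, the update steps are likely too large to converge.

When profit < cost, the model tends to under-predict.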
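
The direction of this bias can be checked without any training. The sketch below, under simplifying assumptions, searches for the single constant prediction that minimizes the summed custom loss over a set of hypothetical sales values. The helper `best_constant_prediction`, the evenly spaced `y_true`, and the candidate grid are all illustrative and not part of the original exercise.

```python
import numpy as np

def best_constant_prediction(y_true, cost, profit, candidates):
    """Return the candidate value that minimizes the summed asymmetric loss."""
    losses = [
        np.where(c > y_true, cost * (c - y_true), profit * (y_true - c)).sum()
        for c in candidates
    ]
    return candidates[int(np.argmin(losses))]

# Hypothetical daily sales, evenly spread over [0, 1] (median 0.5).
y_true = np.linspace(0.0, 1.0, 101)
candidates = np.linspace(0.0, 1.0, 1001)

# profit > cost: under-predicting is expensive, so the best guess moves up.
high = best_constant_prediction(y_true, cost=2.0, profit=10.0, candidates=candidates)
# profit < cost: over-predicting is expensive, so the best guess moves down.
low = best_constant_prediction(y_true, cost=10.0, profit=2.0, candidates=candidates)

print(high, low)
```

With these inputs the first result lies well above the median of 0.5 and the second well below it, mirroring the two statements above.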

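## Cross entropy

Cross entropy (CE) measures the distance between two probability distributions. The smaller the cross entropy, the closer the prediction is to the correct answer.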

$$
H(y_i, y) = -\sum y_i \cdot \ln y
$$

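For example, in a binary classification problem the correct answer is $y_i = (1, 0)$ and the two predictions are $y_1 = (0.6, 0.4)$ and $y_2 = (0.8, 0.2)$. Which prediction is closer to the correct answer?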

$$
H_1((1,0), (0.6,0.4)) = -(1 \cdot \ln 0.6 + 0 \cdot \ln 0.4) \approx 0.511
$$

$$
H_2((1,0), (0.8,0.2)) = -(1 \cdot \ln 0.8 + 0 \cdot \ln 0.2) \approx 0.223
$$

$H_1 > H_2$, so $y_2$ is the more accurate prediction.
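
The two values can be reproduced with nothing but the standard library. This is a direct transcription of the formula for the example above; the variable names are chosen here for illustration.

```python
import math

y_true = (1, 0)  # correct answer
y1 = (0.6, 0.4)  # first prediction
y2 = (0.8, 0.2)  # second prediction

# H = -sum(y_true[k] * ln(y_pred[k])) over the classes k.
h1 = -sum(t * math.log(p) for t, p in zip(y_true, y1))
h2 = -sum(t * math.log(p) for t, p in zip(y_true, y2))

print(round(h1, 3), round(h2, 3))  # 0.511 0.223
```

Only the class with a nonzero entry in the correct answer contributes, so each value reduces to the negative log of the probability assigned to the true class.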

```python
tf.losses.categorical_crossentropy(y_, y)
```

```python
import tensorflow as tf

loss_ce1 = tf.losses.categorical_crossentropy([0, 0, 1], [0.51, 0.32, 0.17])
loss_ce2 = tf.losses.categorical_crossentropy([0, 0, 1], [0.22, 0.73, 0.05])
loss_ce3 = tf.losses.categorical_crossentropy([0, 0, 1], [0.15, 0.17, 0.68])
loss_ce4 = tf.losses.categorical_crossentropy([0, 0, 1], [0.04, 0.03, 0.93])
print(loss_ce1)
print(loss_ce2)
print(loss_ce3)
print(loss_ce4)
print(tf.reduce_min([loss_ce1, loss_ce2, loss_ce3, loss_ce4]))
```

```text
tf.Tensor(1.7719568, shape=(), dtype=float32)
tf.Tensor(2.9957323, shape=(), dtype=float32)
tf.Tensor(0.38566247, shape=(), dtype=float32)
tf.Tensor(0.07257068, shape=(), dtype=float32)
tf.Tensor(0.07257068, shape=(), dtype=float32)
```
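
The printed values can be cross-checked against the cross-entropy formula with a small NumPy sketch that processes all four predictions as one batch. The function `categorical_crossentropy_np` is a hypothetical helper written for this check, not a TensorFlow API, and it assumes every prediction row is already a valid probability vector.

```python
import numpy as np

def categorical_crossentropy_np(y_true, y_pred):
    """Row-wise -sum(y_true * ln(y_pred)).

    Assumes each row of y_pred is a valid probability vector with no
    zero entry where y_true is nonzero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = [[0, 0, 1]] * 4
y_pred = [[0.51, 0.32, 0.17],
          [0.22, 0.73, 0.05],
          [0.15, 0.17, 0.68],
          [0.04, 0.03, 0.93]]

losses = categorical_crossentropy_np(y_true, y_pred)
best = int(np.argmin(losses))  # index of the prediction closest to the label

print(losses, best)
```

The smallest loss belongs to the last prediction, which assigns 0.93 to the true class, in agreement with the `tf.reduce_min` result above.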
