## MSE
- learning_rate: the update step size
- n: the number of terms averaged (the batch)

$loss = \frac{1}{n} \sum (y - out)^2$

$L_{2\text{-}norm} = \sqrt{\sum (y - out)^2}$
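
The two formulas are related: squaring the L2 norm and dividing by n gives the MSE, which is why the same loss can be computed in more than one way. A minimal plain-Python sketch, where the values of `y` and `out` are made up for illustration and are not from the notebook:

```python
import math

# Hypothetical targets and predictions, chosen only for illustration.
y = [1.0, 0.0, 0.0, 1.0]
out = [0.5, 0.5, 0.0, 0.0]

n = len(y)
squared_errors = [(yi - oi) ** 2 for yi, oi in zip(y, out)]

mse = sum(squared_errors) / n              # (1/n) * sum((y - out)^2)
l2_norm = math.sqrt(sum(squared_errors))   # sqrt(sum((y - out)^2))
mse_from_norm = l2_norm ** 2 / n           # squared norm divided by n recovers the MSE

print(mse, mse_from_norm)  # equal up to floating-point rounding
```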

In [1]:
import tensorflow as tf

In [2]:
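y = tf.constant([1,2,3,0,2])      # class labels for 5 samples
y = tf.one_hot(y, depth=4)        # one-hot encode into 4 classes, shape [5, 4]
y = tf.cast(y, dtype=tf.float32)  # make sure the dtype matches out (float32)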

In [3]:
out = tf.random.normal([5,4])

In [4]:
# mean of the squared errors over all 5*4 elements
loss1 = tf.reduce_mean(tf.square(y-out))

In [5]:
# squared L2 norm divided by the number of elements gives the same value
loss2 = tf.square(tf.norm(y-out))/(5*4)

In [6]:
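# tf.losses.MSE averages over the last axis (one value per sample); reduce_mean then averages over the batch
loss3 = tf.reduce_mean(tf.losses.MSE(y, out))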

In [7]:
print(loss1, loss2, loss3)

tf.Tensor(1.3709277, shape=(), dtype=float32) tf.Tensor(1.3709276, shape=(), dtype=float32) tf.Tensor(1.3709278, shape=(), dtype=float32)


## Entropy
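Entropy measures *uncertainty* (or surprise). The lower it is, the more information there is and the less balanced (more peaked) the distribution.

$Entropy = -\sum_i P(i) \log P(i)$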

$H(p,q) = -\sum_x p(x) \log q(x) = H(p) + D_{KL}(p \| q)$
- p: the true distribution
- q: the predicted distribution
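
The decomposition of cross-entropy into entropy plus KL divergence can be checked numerically. The sketch below uses plain Python and log base 2 (bits); the helper names and the two distributions are hypothetical, picked so the arithmetic is easy to follow:

```python
import math

def entropy(p):
    """H(p) = -sum p(x) log2 p(x), in bits; terms with p(x) == 0 contribute 0."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p(x) log2 q(x); assumes q(x) > 0 wherever p(x) > 0."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = sum p(x) log2(p(x) / q(x))."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

# Hypothetical distributions; both sum to 1.
p = [0.5, 0.25, 0.25, 0.0]    # "true" distribution
q = [0.25, 0.25, 0.25, 0.25]  # uniform prediction

h_p = entropy(p)              # 1.5 bits
h_pq = cross_entropy(p, q)    # 2.0 bits
kl = kl_divergence(p, q)      # 0.5 bits
print(h_p, h_pq, kl)
```

When p is one-hot, H(p) is 0, so minimizing the cross-entropy is the same as minimizing the KL divergence.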

In [8]:
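# four equal values; note that 4 * 0.35 = 1.4, so this is not a normalized distribution (0.25 each would be)
a = tf.fill([4], 0.35)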
a * tf.math.log(a)/tf.math.log(2.)

<tf.Tensor: id=36, shape=(4,), dtype=float32, numpy=array([-0.5301006, -0.5301006, -0.5301006, -0.5301006], dtype=float32)>

In [9]:
-tf.reduce_sum(a * tf.math.log(a)/tf.math.log(2.)) # all outcomes equally likely: little information, so the entropy is higher

<tf.Tensor: id=44, shape=(), dtype=float32, numpy=2.1204023>

In [10]:
a = tf.constant([0.01, 0.2, 0.01, 0.98])
-tf.reduce_sum(a * tf.math.log(a)/tf.math.log(2.)) # one outcome is far more likely: more information, so the entropy is lower

<tf.Tensor: id=53, shape=(), dtype=float32, numpy=0.6258261>

In [12]:
tf.losses.categorical_crossentropy([0,1,0,0],[0.25, 0.25, 0.25, 0.25])

<tf.Tensor: id=70, shape=(), dtype=float32, numpy=1.3862944>

tf.losses.categorical_crossentropy([0,1,0,0],[0.1, 0.2, 0.9, 0.5])

In [16]:
tf.losses.categorical_crossentropy([0,1,0,0],[0.01, 0.99, 0.01, 0.01])

<tf.Tensor: id=138, shape=(), dtype=float32, numpy=0.02985293>
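
With a one-hot target the sum collapses to a single term, the negative natural log of the probability assigned to the true class. The recorded values above are consistent with the predictions being rescaled to sum to 1 before the log is taken, which to my understanding is what Keras does for non-logit inputs. The sketch below assumes that behavior; `one_hot_cross_entropy` is a hypothetical helper, not the library's implementation:

```python
import math

def one_hot_cross_entropy(true_index, q):
    """Cross-entropy (in nats) against a one-hot target.

    Assumption: predictions are rescaled to sum to 1 first, which is
    consistent with the values recorded in the cells above.
    """
    total = sum(q)
    return -math.log(q[true_index] / total)

uniform = one_hot_cross_entropy(1, [0.25, 0.25, 0.25, 0.25])    # ln(4)
confident = one_hot_cross_entropy(1, [0.01, 0.99, 0.01, 0.01])  # -ln(0.99 / 1.02)
print(uniform, confident)
```

The second input sums to 1.02, which is why the recorded loss is about 0.0299 rather than -ln(0.99), which is about 0.0101.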

## Cross Entropy loss
- binary
- multi-class
- + softmax
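
One common reading of the last item is that softmax and cross-entropy are fused and computed directly from the logits, which avoids taking the log of a probability that has underflowed to 0. A minimal plain-Python sketch under that assumption; the helper names and the logits are hypothetical:

```python
import math

def softmax(logits):
    """Softmax with the max subtracted first to avoid overflow in exp."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(true_index, logits):
    """-log softmax(logits)[true_index], computed via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_index]

logits = [2.0, 1.0, 0.1]  # hypothetical raw model outputs
two_step = -math.log(softmax(logits)[0])      # softmax first, then log
fused = cross_entropy_from_logits(0, logits)  # both steps in one
print(two_step, fused)  # the two agree up to rounding

# With extreme logits the fused form stays finite, whereas the two-step
# form would have to take the log of a probability that underflowed to 0.
extreme = cross_entropy_from_logits(1, [1000.0, 0.0])
print(extreme)
```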


#### MSE

In [18]:
x = tf.random.normal([2,4])
w = tf.random.normal([4,3])
b = tf.zeros([3])
y = tf.constant([2,0])

In [20]:
with tf.GradientTape() as tape:
    tape.watch([w,b]) # not needed if w and b are wrapped in tf.Variable
    prob = tf.nn.softmax(x@w+b, axis=1)
    loss = tf.reduce_mean(tf.losses.MSE(tf.one_hot(y, depth=3), prob))
grads = tape.gradient(loss, [w, b])

In [21]:
grads[0]

<tf.Tensor: id=202, shape=(4, 3), dtype=float32, numpy=
array([[ 7.3738270e-03, -7.8342352e-03,  4.6041107e-04],
       [ 9.6618682e-03, -9.7540161e-03,  9.2150920e-05],
       [ 5.4713968e-02, -5.5656407e-02,  9.4246038e-04],
       [-8.7275863e-02,  8.9574240e-02, -2.2984096e-03]], dtype=float32)>

In [22]:
grads[1] # bias

<tf.Tensor: id=201, shape=(3,), dtype=float32, numpy=array([-0.04697574,  0.04825434, -0.00127862], dtype=float32)>
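
The bias gradient above sums to roughly zero. This is expected: softmax does not change when the same constant is added to every logit, so the gradient components with respect to the logits cancel out for each sample. The sketch below checks this for a single sample in plain Python, comparing a hand-derived gradient against finite differences. The logits, the target, and the helper names are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def mse_softmax_loss(z, y):
    """Mean squared error between softmax(z) and the one-hot target y."""
    p = softmax(z)
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(z)

def analytic_grad(z, y):
    """d loss / d z_j = (2/K) * p_j * ((p_j - y_j) - sum_i (p_i - y_i) * p_i)."""
    p = softmax(z)
    k = len(z)
    dot = sum((pi - yi) * pi for pi, yi in zip(p, y))
    return [2.0 / k * pj * ((pj - yj) - dot) for pj, yj in zip(p, y)]

def numeric_grad(z, y, eps=1e-6):
    """Central finite differences, one logit at a time."""
    grads = []
    for j in range(len(z)):
        up, down = list(z), list(z)
        up[j] += eps
        down[j] -= eps
        grads.append((mse_softmax_loss(up, y) - mse_softmax_loss(down, y)) / (2 * eps))
    return grads

z = [0.3, -1.2, 0.8]  # hypothetical logits for one sample
y = [0.0, 0.0, 1.0]   # one-hot target (class 2)
g_analytic = analytic_grad(z, y)
g_numeric = numeric_grad(z, y)
print(g_analytic, g_numeric, sum(g_analytic))  # the components sum to ~0
```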