# Letting a Neural Network Learn

The conventional machine learning approach:
* Extract feature quantities that capture the essential character of the original images.
* Use machine learning techniques to learn the patterns in those features.

A neural network, by contrast, can learn the features of an image directly from the data. Nobody has to decide by hand which quantities should represent those features.

One of the main advantages of a neural network is that it applies the same procedure to every kind of problem.

Training data is used to train the network; test data is used to evaluate it on inputs that were not used in training. The final goal of training is generalization: the network should also work on data it has never seen.

Overfitting: a state in which a network works extremely well on one specific data set but poorly on other data sets.
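
To make the idea of overfitting concrete, here is a minimal sketch. It does not use a neural network: as a stand-in it fits polynomials with `np.polyfit` to synthetic noisy data, and all names (`make_data`, `mse`, the degrees 1 and 7) are assumptions chosen for illustration. The more flexible model never does worse on the points it was fitted to, and it will typically do worse on fresh points.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n: int):
    """Noisy samples of the line y = 2x + 1 (synthetic, illustration only)."""
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 0.2, n)
    return x, y

def mse(coeffs: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_fit, y_fit = make_data(10)      # small "training" set
x_new, y_new = make_data(100)     # data the models never saw

simple = np.polyfit(x_fit, y_fit, deg=1)     # same form as the true relation
flexible = np.polyfit(x_fit, y_fit, deg=7)   # enough freedom to fit the noise

train_simple = mse(simple, x_fit, y_fit)
train_flexible = mse(flexible, x_fit, y_fit)
test_simple = mse(simple, x_new, y_new)
test_flexible = mse(flexible, x_new, y_new)

print(f"degree 1: train={train_simple:.4f} test={test_simple:.4f}")
print(f"degree 7: train={train_flexible:.4f} test={test_flexible:.4f}")
```

Comparing the error on the fitted points with the error on unseen points is the same check that the training/test split performs for a network.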

## Loss Function
A loss function is a metric of how badly the network performs. More specifically, it measures the extent to which the network fails to predict the training data.

### Mean Squared Error
$$
E = \frac{1}{2}\sum\limits_k(y_k - t_k)^2
$$
* $y_k$ is the output of the network
* $t_k$ is the supervised data (the label from the training data set)
* $k$ is the index over the dimensions of the data

Example:

In [79]:
import numpy as np

y1 = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
y2 = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])

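def mean_squared_error(y: np.ndarray, t: np.ndarray) -> np.float64:
    """Half the sum of squared differences between output y and label t."""
    diff: np.ndarray = y - t
    return np.sum(diff ** 2) / 2

# y1 puts its highest value (0.6) on index 2, the correct label
result: np.float64 = mean_squared_error(y1, t)
print("high accuracy:")
print(type(result))
print(result)

# y2 puts its highest value (0.6) on index 7, a wrong label
result = mean_squared_error(y2, t)
print("low accuracy:")
print(type(result))
print(result)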

high accuracy:
<class 'numpy.float64'>
0.09750000000000003
low accuracy:
<class 'numpy.float64'>
0.5975


### Cross Entropy Error
$$
E = -\sum\limits_k t_k\log y_k
$$

Because $t$ is one-hot, exactly one $t_k$ is 1 and all the others are 0. Therefore $E$ reduces to $-\log y_k$ for the single index $k$ of the correct label.
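
The reduction to a single term can be checked numerically. The sketch below reuses the kind of values shown in the mean squared error example; the names `full_sum`, `single_term`, and `delta` are assumptions introduced here for illustration.

```python
import numpy as np

y = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
delta = 1e-7  # keeps np.log away from log(0), which would be -inf

# full formula: sum over every dimension k
full_sum = -np.sum(t * np.log(y + delta))

# shortcut: only the dimension of the correct label contributes
correct = int(np.argmax(t))
single_term = -np.log(y[correct] + delta)

print(correct, np.isclose(full_sum, single_term))
```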

In [80]:
def cross_entropy_error(y: np.ndarray, t: np.ndarray) -> np.float64:
    # add a small delta so that np.log never receives 0 (log(0) = -inf);
    # y + delta builds a new array, so the caller's y is not modified
    delta = 1e-7
    return -np.sum(t * np.log(y + delta))

result: np.float64 = cross_entropy_error(y1, t)
print("high accuracy:")
print(type(result))
print(result)

result = cross_entropy_error(y2, t)
print("low accuracy:")
print(type(result))
print(result)

high accuracy:
<class 'numpy.float64'>
0.510825457099338
low accuracy:
<class 'numpy.float64'>
2.302584092994546


### Mini-batch Training
We cannot always compute the loss of every sample, add the losses up, and normalize: if the data set is very large, a single pass over it takes a long time.

Instead, training draws a small random subset of the data, called a mini-batch, and the network learns from one mini-batch at a time.
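
Extending the single-sample cross entropy formula to a mini-batch can be written as follows. The symbols $N$ (the number of samples in the mini-batch) and $n$ (the sample index) are notation assumed here, not taken from the text above:

$$
E = -\frac{1}{N}\sum\limits_n\sum\limits_k t_{nk}\log y_{nk}
$$

* $y_{nk}$ is the $k$-th output of the network for the $n$-th sample
* $t_{nk}$ is the $k$-th element of the label for the $n$-th sample

Dividing by $N$ gives the average loss per sample, so the value does not depend on the size of the mini-batch.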

In [81]:
# mnist mini-batch
import numpy as np
from my_mnist import load_mnist

x_train: np.ndarray
t_train: np.ndarray
x_test: np.ndarray
t_test: np.ndarray
(x_train, t_train), (x_test, t_test) = load_mnist(one_hot_label=True)

# draw a mini-batch of size 10
batch_size: int = 10
# mask holds batch_size random integer indices in [0, len(x_train));
# np.random.choice samples with replacement by default
mask: np.ndarray = np.random.choice(len(x_train), batch_size)
# indexing with an integer array selects the rows at those indices
x_batch: np.ndarray = x_train[mask]
t_batch: np.ndarray = t_train[mask]
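
The array returned by `np.random.choice` contains integer indices, not True/False flags, so the selection above is integer-array indexing. The following sketch contrasts it with boolean indexing on a small toy array; the array `data` and the chosen indices are assumptions made up for illustration.

```python
import numpy as np

np.random.seed(0)
data = np.arange(12).reshape(6, 2)   # 6 toy "samples" with 2 features each

# integer-array indexing: one output row per index, repeats allowed
idx = np.random.choice(len(data), 4)          # drawn with replacement
batch = data[idx]

# boolean indexing: keeps the rows where the mask is True
bool_mask = np.zeros(len(data), dtype=bool)
bool_mask[[1, 4]] = True
picked = data[bool_mask]

# replace=False guarantees that no sample appears twice in the batch
unique_idx = np.random.choice(len(data), 4, replace=False)

print(batch.shape, picked.tolist(), len(set(unique_idx.tolist())))
```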

In [83]:
# x_train, t_train, x_test, t_test have already been loaded previously
def mini_batch(size: int):
    mask: np.ndarray = np.random.choice(len(x_train), size)
    x_batch: np.ndarray = x_train[mask]
    t_batch: np.ndarray = t_train[mask]
    return x_batch, t_batch

# if t_train is one-hot encoded, the cross entropy is easy to compute
def batch_cross_entropy_error(y: np.ndarray, t: np.ndarray) -> np.float64:
    batch_size: int = len(y)
    print(f"batch_size = {batch_size}")
    # y and t have the same shape (batch_size, classes), so the
    # element-wise product works without reshaping
    logy: np.ndarray = np.log(y + 1e-7)
    total = np.sum(t * logy)
    return -total / batch_size  # normalize by the batch size

x_batch, t_batch = mini_batch(10)
# The loss expects network outputs of shape (10, 10). Passing x_batch, the raw
# images of shape (10, 784), raises "ValueError: operands could not be
# broadcast together with shapes (10,10) (10,784)". No network is defined
# yet, so a uniform prediction stands in for its output (placeholder only).
y_batch: np.ndarray = np.full(t_batch.shape, 0.1)
result = batch_cross_entropy_error(y_batch, t_batch)

batch_size = 10
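
The comment in the cell above notes that one-hot labels make the cross entropy easy to compute. When the labels are stored as integer class indices instead, one possible approach is to pick the predicted probability of the true class in each row directly. This is only a sketch: the function name `batch_cross_entropy_error_labels` and the toy arrays are hypothetical and not part of the notebook.

```python
import numpy as np

def batch_cross_entropy_error_labels(y: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross entropy when labels are class indices, not one-hot rows."""
    n = y.shape[0]
    # y[np.arange(n), labels] picks, for each row, the output of its true class
    picked = y[np.arange(n), labels]
    return float(-np.sum(np.log(picked + 1e-7)) / n)

# toy network outputs for 2 samples and 3 classes (made-up values)
y = np.array([[0.1, 0.7, 0.2],
              [0.3, 0.3, 0.4]])
labels = np.array([1, 2])
one_hot = np.eye(3)[labels]   # the same labels in one-hot form

loss_labels = batch_cross_entropy_error_labels(y, labels)
loss_one_hot = float(-np.sum(one_hot * np.log(y + 1e-7)) / len(y))
print(np.isclose(loss_labels, loss_one_hot))
```

Both forms give the same value, because the zeros in a one-hot row remove every term except the one for the true class.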