# Neural Network Training

In this chapter, "training" refers to automatically acquiring optimal values for the weight parameters from the training data.
The goal of training is to find the weight parameters that minimize the value of the loss function, the metric that guides how a neural network learns.

## Loss function

In neural network training, the loss function serves as the metric used to search for optimal parameter values.
The loss function quantifies the difference between the model's (that is, the neural network's) predictions and the ground truth, so it both measures how well the model performs and indicates the direction in which to improve it.
Updating the model's parameters in the direction that reduces the loss improves the model's predictive performance.
Any function can in principle serve as a loss function, but the two most commonly used are SSE (sum of squared errors) and CEE (cross-entropy error).
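
The idea of updating parameters in the direction that reduces the loss can be illustrated with a minimal sketch. It assumes a hypothetical toy loss (toy_loss) with a known minimum, a central-difference helper (numerical_grad), and a learning rate of 0.1. None of these names come from the text above, and an actual network would use its own loss and gradient routines.

```python
import numpy as np

# Hypothetical toy loss for illustration: its minimum is at w = [3, -1].
TARGET = np.array([3.0, -1.0])

def toy_loss(w):
    return 0.5 * np.sum((w - TARGET) ** 2)

def numerical_grad(f, w, h=1e-4):
    """Approximate the gradient of f at w with central differences."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        original = w[i]
        w[i] = original + h
        f_plus = f(w)
        w[i] = original - h
        f_minus = f(w)
        grad[i] = (f_plus - f_minus) / (2 * h)
        w[i] = original  # restore the parameter before moving on
    return grad

w = np.zeros(2)          # start far from the minimum
learning_rate = 0.1      # assumed step size
initial_loss = toy_loss(w)

for _ in range(100):
    # Step against the gradient, i.e. in the direction that lowers the loss.
    w -= learning_rate * numerical_grad(toy_loss, w)

final_loss = toy_loss(w)
print(w, initial_loss, final_loss)
```

Each step moves w a little closer to the minimum, so the loss after the loop is far smaller than the initial loss.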

1. SSE

In [None]:
import numpy as np

def sum_square_error(y, t):
    return 0.5 * np.sum((y-t)**2)
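
SSE is one half of the sum, over all output elements, of the squared difference between the prediction y and the target t. The sketch below applies the function to two hand-made predictions for a ten-class problem. The arrays t, y_good and y_bad are illustrative values, not data from the notebook.

```python
import numpy as np

def sum_square_error(y, t):
    return 0.5 * np.sum((y - t) ** 2)

# One-hot target: the correct class is index 2.
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])

# Prediction that puts most of its probability on the correct class.
y_good = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
# Prediction that puts most of its probability on a wrong class (index 7).
y_bad = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

sse_good = sum_square_error(y_good, t)  # approximately 0.0975
sse_bad = sum_square_error(y_bad, t)    # approximately 0.5975
print(sse_good, sse_bad)
```

The prediction that favors the correct class yields the smaller loss, which is the behavior a loss function needs in order to guide training.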

2. CEE

When np.log() receives 0 as input, it returns negative infinity (-inf), which makes further calculation impossible. To prevent this, a very small value, here called delta, is added to y before taking the logarithm.

In [None]:
def cross_entropy_error(y, t):
    delta = 1e-7  # small constant that keeps np.log() from receiving 0
    return -np.sum(t * np.log(y + delta))
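
As a quick check of how this behaves, the sketch below evaluates the function on illustrative predictions (the arrays are made-up values, not notebook data). With a one-hot target, only the predicted probability of the correct class contributes to the result. The last case sets that probability to exactly 0 to show that delta keeps the result finite.

```python
import numpy as np

def cross_entropy_error(y, t):
    delta = 1e-7  # keeps np.log() from receiving 0
    return -np.sum(t * np.log(y + delta))

# One-hot target: the correct class is index 2.
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])

y_good = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
y_bad = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])
# Assigns probability 0 to the correct class; log(0) would be -inf without delta.
y_zero = np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

cee_good = cross_entropy_error(y_good, t)  # approximately 0.51
cee_bad = cross_entropy_error(y_bad, t)    # approximately 2.30
cee_zero = cross_entropy_error(y_zero, t)  # large but finite
print(cee_good, cee_bad, cee_zero)
```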

3. Mini-batch

Computing the loss over every sample in a large training set at each step is expensive.
Instead, a small randomly selected subset of the data, called a "mini-batch", is used to approximate the loss over the whole dataset; training on such subsets is called "mini-batch training".

In [6]:
import sys, os
sys.path.append(os.pardir)
import numpy as np
from dataset.mnist import load_mnist

# one_hot_label=True returns the labels in one-hot encoding
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

print(x_train.shape)
print(t_train.shape)

train_size = x_train.shape[0]
batch_size = 10
batch_mask = np.random.choice(train_size, batch_size)  # random indices, drawn with replacement by default
x_batch = x_train[batch_mask]
t_batch = t_train[batch_mask]

print(x_batch.shape)
print(t_batch.shape)

(60000, 784)
(60000, 10)
(10, 784)
(10, 10)
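
Once a mini-batch has been selected, its loss can be computed in one call. The following is a minimal sketch that assumes the loss is averaged over the batch (a common convention, not something stated above) and that uses synthetic arrays in place of real network outputs and MNIST labels. The name cross_entropy_error_batch and the softmax step are illustrative assumptions.

```python
import numpy as np

def cross_entropy_error_batch(y, t):
    """Cross-entropy error averaged over a mini-batch of one-hot targets."""
    # Promote a single sample to a batch of one so both cases share the same code.
    if y.ndim == 1:
        y = y.reshape(1, -1)
        t = t.reshape(1, -1)
    batch_size = y.shape[0]
    delta = 1e-7  # keeps np.log() from receiving 0
    return -np.sum(t * np.log(y + delta)) / batch_size

rng = np.random.default_rng(0)

# Synthetic stand-in for network outputs: 10 samples, 10 classes, rows sum to 1.
scores = rng.normal(size=(10, 10))
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
y_batch = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Synthetic one-hot labels with the same shape as t_batch above.
labels = rng.integers(0, 10, size=10)
t_batch = np.eye(10)[labels]

batch_loss = cross_entropy_error_batch(y_batch, t_batch)

# Sanity check: uniform predictions give -log(0.1) per sample regardless of the labels.
uniform_loss = cross_entropy_error_batch(np.full((10, 10), 0.1), t_batch)
print(batch_loss, uniform_loss)
```

Dividing by the batch size keeps the loss on the same scale whatever batch size is chosen, which makes values from different batch sizes comparable.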


In [None]:
import sys, os
sys.path.append(os.pardir)
from common.functions import *
from common.gradient import numerical_gradient

class TwoLayerNet:
    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size