##1. Artificial Neural Networks and Loss Functions

###1-1. Artificial neural network

####Basic terms (as labeled in the network diagram)

* x : input

* w : weight

* b : bias

* f : activation function

* u : net input (linear combination)

* z : output




**Network structure: input layer - hidden layer - output layer**

**u=Wx+b, z=f(u)**
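The two formulas above can be sketched for a single layer in plain Python. The sigmoid activation, the helper name `dense_forward`, and the example numbers are assumptions chosen for illustration; they are not taken from the notebook.

```python
import math

def sigmoid(u):
    """Logistic activation f(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def dense_forward(W, x, b, f=sigmoid):
    """Compute u = Wx + b and z = f(u) for one layer.

    W is a list of rows (one row per output unit); x and b are lists.
    """
    u = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(W, b)]
    z = [f(u_i) for u_i in u]
    return u, z

# Hypothetical example: 2 inputs feeding 2 output units.
W = [[1.0, -1.0],
     [0.5, 0.5]]
x = [2.0, 2.0]
b = [0.0, -2.0]

u, z = dense_forward(W, x, b)
print(u)  # [0.0, 0.0]
print(z)  # [0.5, 0.5]
```

Stacking several such layers, each feeding its `z` into the next layer as `x`, gives the input - hidden - output structure described above.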

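###1-2. Loss function

* A loss function measures the difference between the output the network produces and the true target value. The goal of training is to minimize the loss function.

For regression, mean squared error (MSE) is typically used; for classification, cross-entropy is used, with softmax as the output activation function.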
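As a rough illustration of the two losses named above, the following plain-Python sketch computes MSE for a regression target and softmax cross-entropy for a single classification sample. The function names and sample values are assumptions made for this example.

```python
import math

def mean_squared_error(y_pred, y_true):
    """Average of squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class, eps=1e-8):
    """Negative log-probability assigned to the true class."""
    return -math.log(max(probs[true_class], eps))

# Regression: squared errors are 1 and 4, so the mean is 2.5.
mse = mean_squared_error([2.0, 4.0], [1.0, 2.0])
print(mse)  # 2.5

# Classification: equal logits give equal probabilities.
probs = softmax([0.0, 0.0])
print(probs)  # [0.5, 0.5]

# The loss is -log(0.5) = log(2).
ce = cross_entropy(probs, true_class=0)
print(ce)
```

The `eps` floor plays the same role as clipping the predictions before taking the logarithm: it keeps `log(0)` from occurring.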

###Neural network & loss function code

In [0]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.datasets import fashion_mnist
import matplotlib.pyplot as plt

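def net(x, weights, biases):
  # Hidden layer 1: affine transform followed by sigmoid.
  layer_1 = tf.add(tf.matmul(x, weights['w1']), biases['b1'])
  layer_1 = tf.nn.sigmoid(layer_1)

  # Hidden layer 2: affine transform followed by sigmoid.
  layer_2 = tf.add(tf.matmul(layer_1, weights['w2']), biases['b2'])
  layer_2 = tf.nn.sigmoid(layer_2)

  # Output layer: logits converted to class probabilities with softmax.
  out_layer = tf.add(tf.matmul(layer_2, weights['out']), biases['out'])

  return tf.nn.softmax(out_layer)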

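def cross_entropy(y_pred, y_true, num_classes):
  # Convert integer class labels to one-hot vectors.
  y_true = tf.one_hot(y_true, depth=num_classes)

  # Clip predictions to [1e-8, 1.0] to avoid log(0).
  y_pred = tf.clip_by_value(y_pred, 1e-8, 1.)

  # Sum over classes (axis=1) for the per-sample loss, then average over the batch.
  return -tf.reduce_mean(tf.reduce_sum(tf.multiply(y_true, tf.math.log(y_pred)), axis=1))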

def accuracy(y_pred, y_true):
  correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))
  return tf.reduce_mean(tf.cast(correct_prediction, tf.float32), axis=-1)


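def params_optimization(x, y, weights, biases, num_classes, learning_rate):
  # Create the optimizer once and reuse it, so Adagrad keeps its accumulated
  # squared gradients across steps instead of resetting on every call.
  if not hasattr(params_optimization, 'optimizer'):
    params_optimization.optimizer = tf.optimizers.Adagrad(learning_rate)

  # GradientTape records the forward computation so that gradients can be
  # computed by automatic differentiation.
  with tf.GradientTape() as gt:
    pred = net(x, weights, biases)
    loss = cross_entropy(pred, y, num_classes)

  # Variables to update.
  variables_to_update = list(weights.values()) + list(biases.values())

  # Compute the gradients of the loss with respect to each variable.
  grads = gt.gradient(loss, variables_to_update)

  # Apply one update step.
  params_optimization.optimizer.apply_gradients(zip(grads, variables_to_update))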



(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

num_classes = 10

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat','Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

num_features = 784

hidden_layer_1 = 64
hidden_layer_2 = 128

#hyperparameters
learning_rate = 0.05
training_steps = 3000
batch_size = 512
interval = 300

x_train, x_test = np.array(x_train, np.float32), np.array(x_test, np.float32)
#flatten
x_train, x_test = x_train.reshape([-1, num_features]), x_test.reshape([-1, num_features])

# Normalize pixel values from [0, 255] to [0, 1].
x_train, x_test = x_train / 255., x_test / 255.

# Build a repeating, shuffled mini-batch pipeline.
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_data = train_data.repeat().shuffle(10000).batch(batch_size).prefetch(1)


#randomNormal weight initialization.
random_normal = tf.random_normal_initializer()


weights = {
    'w1': tf.Variable(random_normal([num_features, hidden_layer_1])),
    'w2': tf.Variable(random_normal([hidden_layer_1, hidden_layer_2])),
    'out': tf.Variable(random_normal([hidden_layer_2, num_classes]))
}
biases = {
    'b1': tf.Variable(tf.zeros([hidden_layer_1])),
    'b2': tf.Variable(tf.zeros([hidden_layer_2])),
    'out': tf.Variable(tf.zeros([num_classes]))
}

for step, (batch_x, batch_y) in enumerate(train_data.take(training_steps), 1):
  # Run one optimization step on the current mini-batch.
  params_optimization(batch_x, batch_y, weights, biases, num_classes, learning_rate)

  # Report the mini-batch loss and accuracy every `interval` steps.
  if step % interval == 0:
    pred = net(batch_x, weights, biases)
    loss = cross_entropy(pred, batch_y, num_classes)
    acc = accuracy(pred, batch_y)
    print("step: %i, loss: %f, accuracy: %f" % (step, loss, acc))

# Evaluate on the test set after training finishes.
pred = net(x_test, weights, biases)
print("Test Accuracy: %f" % accuracy(pred, y_test))
  


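##2. Learning Algorithm

* Gradient Descent

* learning rate : step size

* θ = θ - learning_rate * gradient

* level set, level curve (contour line): the set of points at which the function takes the same value

* The gradient vector always points in the direction in which the function increases fastest (its components are the partial derivatives with respect to each variable), and it is perpendicular to the level set.

* Gradient Descent repeatedly moves in the direction opposite to the gradient to find the parameter values that minimize the loss function.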
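The update rule above can be tried on a one-dimensional example. This is a minimal sketch, assuming a hypothetical loss f(θ) = (θ - 3)² whose gradient is 2(θ - 3); the function name `gradient_descent` and the chosen learning rate and step count are illustrative only.

```python
def gradient_descent(grad, theta, learning_rate=0.1, steps=100):
    """Repeatedly apply theta = theta - learning_rate * gradient."""
    for _ in range(steps):
        theta = theta - learning_rate * grad(theta)
    return theta

# Hypothetical loss f(theta) = (theta - 3)^2 with its minimum at theta = 3.
def grad_f(theta):
    return 2.0 * (theta - 3.0)

theta = gradient_descent(grad_f, theta=0.0)
print(theta)  # very close to 3.0
```

With this learning rate, each step shrinks the distance to the minimum by a factor of 0.8. A learning rate above 1.0 would make the iterates diverge for this particular loss, which is why the step size matters.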

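##3. Backpropagation

###3-1. Backpropagation

* Computational graph

 * A graph that represents the steps of a computation

 * Expressed with nodes and edges

 * A node represents an operation; an edge represents the direction in which data flows

* Addition node
 * For z = x + y, the upstream gradient is multiplied by the local partial derivatives with respect to x and y. Both are 1, so the gradient passes through unchanged.

* Multiplication node
 * For z = xy, the upstream gradient is multiplied by the local partial derivatives: ∂z/∂x = y and ∂z/∂y = x.

* Backpropagation through the sigmoid function y = 1/(1+exp(-x)): the local derivative is dy/dx = y(1 - y), so the upstream gradient is multiplied by y(1 - y).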
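The addition, multiplication, and sigmoid nodes listed above can be sketched as small classes with a `forward` and a `backward` method. The class names and the example graph `out = sigmoid(w * x + b)` are assumptions made for illustration; the sigmoid backward step uses the standard identity dy/dx = y(1 - y).

```python
import math

class AddNode:
    """z = x + y: both local derivatives are 1."""
    def forward(self, x, y):
        return x + y

    def backward(self, dout):
        return dout, dout

class MulNode:
    """z = x * y: dz/dx = y and dz/dy = x."""
    def forward(self, x, y):
        self.x, self.y = x, y  # cache the inputs for the backward pass
        return x * y

    def backward(self, dout):
        return dout * self.y, dout * self.x

class SigmoidNode:
    """y = 1 / (1 + exp(-x)): dy/dx = y * (1 - y)."""
    def forward(self, x):
        self.y = 1.0 / (1.0 + math.exp(-x))  # cache the output
        return self.y

    def backward(self, dout):
        return dout * self.y * (1.0 - self.y)

# Hypothetical graph: out = sigmoid(w * x + b)
mul, add, sig = MulNode(), AddNode(), SigmoidNode()
w, x, b = 1.0, 2.0, -2.0

# Forward pass: w * x + b = 0.0, so sigmoid gives 0.5.
out = sig.forward(add.forward(mul.forward(w, x), b))

# Backward pass: walk the graph in reverse, starting from d(out)/d(out) = 1.
du = sig.backward(1.0)       # 0.5 * (1 - 0.5) = 0.25
dwx, db = add.backward(du)   # addition passes the gradient through
dw, dx = mul.backward(dwx)   # multiplication swaps the inputs

print(out)         # 0.5
print(dw, dx, db)  # 0.5 0.25 0.25
```

Each node only needs its own local derivative; chaining the `backward` calls in reverse order applies the chain rule across the whole graph.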