# **Lecture 06: Softmax Regression**

> **Multinomial Classification**  
- Builds several binary classification models and uses them to separate the data into multiple classes
- The computation is carried out as a 2-D matrix operation
- e.g. to classify three labels, you would otherwise have to write out $w_{11}x_{1} + w_{12}x_{2} + w_{13}x_{3}, w_{21}x_{1} + w_{22}x_{2} + w_{23}x_{3}, w_{31}x_{1} + w_{32}x_{2} + w_{33}x_{3} \cdots$ one by one, whereas with a matrix operation the same thing can be expressed as $\left[\begin{matrix} w_{11} & w_{12} & w_{13}\\ w_{21} & w_{22} & w_{23}\\ w_{31} & w_{32} & w_{33}\\ \end{matrix}\right] \cdot \left[\begin{matrix} x_{1} \\ x_{2} \\ x_{3} \\ \end{matrix}\right]$
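
The equivalence between the written-out sums and the matrix product can be checked with a small sketch. The weights and inputs below are hypothetical values chosen only for illustration, and plain Python lists are used so the arithmetic stays visible.

```python
# Hypothetical weights and input, chosen only for illustration
W = [[1.0, 2.0, 3.0],
     [0.5, -1.0, 2.0],
     [-2.0, 0.0, 1.0]]
x = [1.0, 2.0, 3.0]

# Written out one class at a time, as in the expanded form
scores_expanded = [
    W[0][0] * x[0] + W[0][1] * x[1] + W[0][2] * x[2],
    W[1][0] * x[0] + W[1][1] * x[1] + W[1][2] * x[2],
    W[2][0] * x[0] + W[2][1] * x[1] + W[2][2] * x[2],
]

def matvec(matrix, vector):
    # One dot product per row: this is the matrix-vector product W . x
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

scores_matrix = matvec(W, x)
print(scores_expanded)
print(scores_matrix)
```

Both lists hold one score per class; the matrix form simply computes all of them in one operation.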

> **Softmax**  
- A function used in place of sigmoid that turns the input scores into a probability for each element
- Each output value lies between 0 and 1
- The outputs sum to 1  
- $S(y_{i}) = \frac{e^{y_{i}}}{\sum_{j} e^{y_{j}}}$  
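
As a minimal sketch of the softmax formula, the function below uses only the standard library. The input scores are hypothetical, and subtracting the maximum score is a common numerical-stability trick that is an addition here, not part of the lecture notes.

```python
import math

def softmax(scores):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores
print(probs)       # every value lies between 0 and 1
print(sum(probs))  # the values sum to 1 (up to floating-point rounding)
```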

> **One-Hot Encoding**  
- The step that sets the largest of the softmax outputs to 1 and all the others to 0  
- In TensorFlow, the argmax function handles this by returning the index of the largest value  
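
A small plain-Python sketch of this step follows. The helper names `argmax` and `one_hot` are hypothetical stand-ins written for illustration; they are not the TensorFlow functions themselves.

```python
def argmax(values):
    # Index of the largest entry
    return max(range(len(values)), key=lambda i: values[i])

def one_hot(index, depth):
    # 1.0 at the chosen index, 0.0 everywhere else
    return [1.0 if i == index else 0.0 for i in range(depth)]

probs = [0.7, 0.2, 0.1]  # hypothetical softmax output
idx = argmax(probs)
encoded = one_hot(idx, len(probs))
print(idx, encoded)
```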


> **Cross Entropy**  
- The cost used in multi-class classification
- $D(S, L) = -\sum\limits_{i} L_{i}\log{S_{i}} = \sum\limits_{i} L_{i} * (-\log{S_{i}})$  
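
To see how this cost behaves, here is a minimal sketch that evaluates it for one confident correct prediction and one confident wrong prediction. The probability vectors are hypothetical.

```python
import math

def cross_entropy(S, L):
    # S: predicted probabilities, L: one-hot label
    return -sum(l * math.log(s) for s, l in zip(S, L))

L = [0.0, 1.0, 0.0]                      # true class is index 1
good = cross_entropy([0.1, 0.8, 0.1], L)  # equals -log(0.8)
bad = cross_entropy([0.8, 0.1, 0.1], L)   # equals -log(0.1)
print(good, bad)
```

Only the term for the true class survives the sum, so the cost is small when that class gets a high probability and large when it does not.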

> **Cost Function**  
- $Loss = \frac{1}{N} \sum\limits_{i} D({S(wx_{i} + b), L_{i}})$  
- Here $S(wx_{i} + b)$ is the predicted probability vector for sample $i$ and $L_{i}$ is its true label (one-hot)  

> **Gradient Descent**  
- This time the gradient is the partial derivative of the cost with respect to each weight vector  
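
For softmax combined with cross entropy, the derivative of the cost with respect to the scores is the well-known $S - L$, and the chain rule then gives the derivative for each weight as $(S_{k} - L_{k})x_{j}$. The sketch below checks the $S - L$ part against a finite-difference estimate; the scores and the step size `eps` are hypothetical choices.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    t = sum(e)
    return [v / t for v in e]

def loss(z, L):
    S = softmax(z)
    return -sum(l * math.log(s) for s, l in zip(S, L))

z = [2.0, 1.0, 0.1]   # hypothetical scores
L = [0.0, 1.0, 0.0]   # one-hot label

# Analytic gradient with respect to the scores
analytic = [s - l for s, l in zip(softmax(z), L)]

# Central finite differences as an independent check
eps = 1e-6
numeric = []
for k in range(len(z)):
    z_plus = list(z)
    z_plus[k] += eps
    z_minus = list(z)
    z_minus[k] -= eps
    numeric.append((loss(z_plus, L) - loss(z_minus, L)) / (2 * eps))

print(analytic)
print(numeric)
```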


# **Lab 06-1: Softmax Classification Eager**

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

tf.random.set_seed(777)  # for reproducibility

> **Store the data as vectors**  
**nb_classes**  
&rarr; nb_classes is the number of classes to classify into; it sets the number of columns of the weight matrix and of the one-hot labels


In [None]:
#Data
x_data = [[1, 2, 1, 1],
          [2, 1, 3, 2],
          [3, 1, 3, 4],
          [4, 1, 5, 5],
          [1, 7, 5, 5],
          [1, 2, 5, 6],
          [1, 6, 6, 6],
          [1, 7, 7, 7]]
y_data = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 1],
          [0, 1, 0],
          [0, 1, 0],
          [0, 1, 0],
          [1, 0, 0],
          [1, 0, 0]]

#convert into numpy and float format
x_data = np.asarray(x_data, dtype=np.float32)
y_data = np.asarray(y_data, dtype=np.float32)

#nb_classes
nb_classes = 3
print(x_data.shape)
print(y_data.shape)

> **Set the weights and bias used in the hypothesis**

In [None]:
W = tf.Variable(tf.random.normal((4, nb_classes)), name='weight')
b = tf.Variable(tf.random.normal((nb_classes,)), name='bias')
variables = [W, b]

print(W,b)

> **Hypothesis**

In [None]:
def hypothesis(X):
    return tf.nn.softmax(tf.matmul(X, W) + b)

print(hypothesis(x_data))

> **Softmax function**

In [None]:
sample_db = [[8,2,1,4]]
sample_db = np.asarray(sample_db, dtype=np.float32)


print(hypothesis(sample_db))

> **Cost function**

In [None]:
def cost_fn(X, Y):
    # hypothesis() already applies softmax, so these are probabilities, not raw logits
    probs = hypothesis(X)
    cost = -tf.reduce_sum(Y * tf.math.log(probs), axis=1)
    cost_mean = tf.reduce_mean(cost)

    return cost_mean

print(cost_fn(x_data, y_data))

> **Gradient Tape**

In [None]:
x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    y = x * x # x^2
dy_dx = g.gradient(y, x) # Will compute to 6.0
print(dy_dx)

> **Gradient**

In [None]:
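def grad_fn(X, Y):
    with tf.GradientTape() as tape:
        loss = cost_fn(X, Y)
    # Ask for the gradients after leaving the tape context so the
    # gradient computation itself is not recorded on the tape
    grads = tape.gradient(loss, variables)
    return grads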

print(grad_fn(x_data, y_data))

>**Model fitting**

In [None]:
def fit(X, Y, epochs=2000, verbose=100):
    optimizer =  tf.keras.optimizers.SGD(learning_rate=0.1)

    for i in range(epochs):
        grads = grad_fn(X, Y)
        optimizer.apply_gradients(zip(grads, variables))
        if i == 0 or (i + 1) % verbose == 0:
            print('Loss at epoch %d: %f' %(i+1, cost_fn(X, Y).numpy()))

fit(x_data, y_data)

> **Checking accuracy with argmax**

In [None]:
sample_data = [[2,1,3,2]] # answer_label [[0,0,1]]
sample_data = np.asarray(sample_data, dtype=np.float32)

a = hypothesis(sample_data)

print(a)
print(tf.argmax(a, 1)) #index: 2

# Use a new name here: assigning to b would overwrite the bias variable
pred_all = hypothesis(x_data)
print(pred_all)
print(tf.argmax(pred_all, 1))
print(tf.argmax(y_data, 1)) # matches with y_data

# **Lab 06-2: Softmax Zoo Classifier-Eager**

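> **tf.one_hot()**  
&rarr; Converts each label into a one-hot vector with as many entries as the number of classes you specify  
&rarr; The result has one more dimension than the input, so a label column of shape (N, 1) comes back as a 3-D tensor of shape (N, 1, nb_classes)  
**tf.reshape()**  
&rarr; Rearranges a tensor into the shape you want  
&rarr; Model fitting needs 2-D labels, so reshape is used to bring the one-hot result back to shape (N, nb_classes)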
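
The shape change can be mimicked in plain Python with nested lists. This is only a sketch of the idea: the `one_hot` helper and the label values are hypothetical, and the list flattening stands in for what a reshape to (-1, nb_classes) does.

```python
def one_hot(index, depth):
    return [1.0 if i == index else 0.0 for i in range(depth)]

nb_classes = 7
labels_column = [[0], [3], [6]]  # hypothetical labels with shape (3, 1)

# Encoding every entry adds a trailing class axis: shape (3, 1, 7)
encoded = [[one_hot(label, nb_classes) for label in row] for row in labels_column]

# Dropping the middle axis gives shape (3, 7), one row per sample
flattened = [vector for row in encoded for vector in row]

print(len(encoded), len(encoded[0]), len(encoded[0][0]))
print(len(flattened), len(flattened[0]))
```

If the labels are already a flat vector of shape (N,), the encoded result is 2-D from the start and the reshape changes nothing.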


In [None]:
xy = np.loadtxt('data-04-zoo.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, -1]

nb_classes = 7  # 0 ~ 6

# Make Y data as onehot shape
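# tf.one_hot adds one dimension for the classes, so labels of shape (N, 1) would come back as 3-D
# The reshape below guarantees a 2-D (N, nb_classes) result either way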
Y_one_hot = tf.one_hot(y_data.astype(np.int32), nb_classes)
Y_one_hot = tf.reshape(Y_one_hot, [-1, nb_classes])
print(x_data.shape, Y_one_hot.shape)

> **Weight and Bias**   
&rarr; Same as before  
**Logit Function, Hypothesis**  
&rarr; Defined separately because both are needed later when computing accuracy-related values  
**Cross Entropy**  
&rarr; Uses tf.keras.losses.categorical_crossentropy()  
**tf.argmax()**  
&rarr; Returns the index of the largest value  
**Prediction**  
&rarr; A function that reports the accuracy  
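
The accuracy computation amounts to comparing the argmax of each prediction with the argmax of its one-hot label and averaging the matches. A plain-Python sketch with hypothetical predictions and labels:

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(probabilities, one_hot_labels):
    # True where the predicted class matches the labelled class
    hits = [argmax(p) == argmax(y) for p, y in zip(probabilities, one_hot_labels)]
    return sum(hits) / len(hits)

# Hypothetical softmax outputs and their true labels
probabilities = [[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.4, 0.3],
                 [0.2, 0.2, 0.6]]
one_hot_labels = [[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 1]]

acc = accuracy(probabilities, one_hot_labels)
print(acc)
```

Three of the four predictions pick the labelled class, so the function reports 0.75.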

In [None]:
#Weight and bias setting
W = tf.Variable(tf.random.normal((16, nb_classes)), name='weight')
b = tf.Variable(tf.random.normal((nb_classes,)), name='bias')
variables = [W, b]

# tf.nn.softmax computes softmax activations
# softmax = exp(logits) / reduce_sum(exp(logits), dim)

# Logit and hypothesis are defined as separate functions
def logit_fn(X):
    return tf.matmul(X, W) + b

def hypothesis(X):
    return tf.nn.softmax(logit_fn(X))

def cost_fn(X, Y):
    logits = logit_fn(X)
    cost_i = tf.keras.losses.categorical_crossentropy(y_true=Y, y_pred=logits,
                                                      from_logits=True)
    cost = tf.reduce_mean(cost_i)
    return cost

# Same as before
def grad_fn(X, Y):
    with tf.GradientTape() as tape:
        loss = cost_fn(X, Y)
    # Compute the gradients after leaving the tape context
    grads = tape.gradient(loss, variables)
    return grads

# New here: a function that reports the accuracy
# tf.argmax returns the index of the largest value along the given axis
def prediction(X, Y):
    pred = tf.argmax(hypothesis(X), 1)
    correct_prediction = tf.equal(pred, tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    return accuracy

In [None]:
def fit(X, Y, epochs=1000, verbose=100):
    optimizer =  tf.keras.optimizers.SGD(learning_rate=0.1)

    for i in range(epochs):
        grads = grad_fn(X, Y)
        optimizer.apply_gradients(zip(grads, variables))
        if i == 0 or (i + 1) % verbose == 0:
#             print('Loss at epoch %d: %f' %(i+1, cost_fn(X, Y).numpy()))
            acc = prediction(X, Y).numpy()
            loss = cost_fn(X, Y).numpy()
            print('Steps: {} Loss: {}, Acc: {}'.format(i+1, loss, acc))

fit(x_data, Y_one_hot)

# **Lab 07-1: Learning Rate and Evaluation**

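> **Learning Rate**  
&rarr; If the learning rate is too large, overshooting occurs: the next weight value jumps past the minimum and training diverges.  
&rarr; If the learning rate is too small, training takes a long time and can stall before reaching the minimum (for example at a local minimum).  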
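
Both failure modes show up even on a toy problem. The sketch below runs gradient descent on $f(w) = w^{2}$; the function, the starting point, and the three learning rates are hypothetical choices made only to illustrate the effect.

```python
def gradient_descent(lr, steps=20, w=1.0):
    # Minimize f(w) = w**2, whose gradient is 2*w and whose minimum is at w = 0
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

too_large = gradient_descent(lr=1.1)    # each step overshoots further: diverges
too_small = gradient_descent(lr=0.001)  # barely moves away from the start
reasonable = gradient_descent(lr=0.1)   # steadily approaches the minimum

print(too_large, too_small, reasonable)
```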


> **Learning Rate Decay**  
&rarr; The process of changing the learning rate every fixed number of epochs  
$\alpha = \frac{\alpha_{0}}{1+kt}$ (inverse time decay)  
$\alpha = \alpha_{0}e^{-kt}$ (exponential decay)  
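
The two decay formulas can be evaluated directly to see how quickly each one shrinks the learning rate. In this sketch the initial rate `alpha0` and the decay constant `k` are hypothetical values.

```python
import math

def inverse_time_decay(alpha0, k, t):
    return alpha0 / (1 + k * t)

def exponential_decay(alpha0, k, t):
    return alpha0 * math.exp(-k * t)

alpha0, k = 0.1, 0.01  # hypothetical initial rate and decay constant
for t in (0, 100, 1000):
    print(t, inverse_time_decay(alpha0, k, t), exponential_decay(alpha0, k, t))
```

Both start at `alpha0`, but the exponential form falls off much faster for large `t`.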


In [None]:
x_train = [[1, 2, 1],
          [1, 3, 2],
          [1, 3, 4],
          [1, 5, 5],
          [1, 7, 5],
          [1, 2, 5],
          [1, 6, 6],
          [1, 7, 7]]

y_train = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 1],
          [0, 1, 0],
          [0, 1, 0],
          [0, 1, 0],
          [1, 0, 0],
          [1, 0, 0]]

# Evaluate our model using this test dataset
x_test = [[2, 1, 1],
          [3, 1, 2],
          [3, 3, 4]]
y_test = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 1]]


x1 = [x[0] for x in x_train]
x2 = [x[1] for x in x_train]
x3 = [x[2] for x in x_train]

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x1, x2, x3, c=y_train, marker='^')

ax.scatter(x_test[0][0], x_test[0][1], x_test[0][2], c="black", marker='^')
ax.scatter(x_test[1][0], x_test[1][1], x_test[1][2], c="black", marker='^')
ax.scatter(x_test[2][0], x_test[2][1], x_test[2][2], c="black", marker='^')


ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')

plt.show()

> **Define the required functions**

In [None]:
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(len(x_train))#.repeat()

W = tf.Variable(tf.random.normal((3, 3)))
b = tf.Variable(tf.random.normal((3,)))

def softmax_fn(features):
    hypothesis = tf.nn.softmax(tf.matmul(features, W) + b)
    return hypothesis

def loss_fn(hypothesis, features, labels):
    cost = tf.reduce_mean(-tf.reduce_sum(labels * tf.math.log(hypothesis), axis=1))
    return cost

> **Define Accuracy**

In [None]:
def accuracy_fn(hypothesis, labels):
    prediction = tf.argmax(hypothesis, 1)
    is_correct = tf.equal(prediction, tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))
    return accuracy

def grad(hypothesis, features, labels):
    with tf.GradientTape() as tape:
        loss_value = loss_fn(softmax_fn(features),features,labels)
    return tape.gradient(loss_value, [W,b])

In [None]:
"""
0: Exponential Decay
1: Inverse Time Decay
2: Cosine Decay
3: Piecewise Decay

"""
def learningRate(command):
    if command ==0:

        learning_rate = tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=0.1,
                                                                    decay_steps=100,
                                                                    decay_rate=0.96,
                                                                    staircase=True)
        optimizer = tf.keras.optimizers.SGD(learning_rate)

    elif command == 1:
        learning_rate = tf.keras.optimizers.schedules.InverseTimeDecay(initial_learning_rate=0.1,
                                                                   decay_steps=100,
                                                                   decay_rate=0.96,
                                                                   staircase=True)
        optimizer = tf.keras.optimizers.SGD(learning_rate)

    elif command == 2:
        learning_rate = tf.keras.optimizers.schedules.CosineDecay(initial_learning_rate=0.1,
                                                    decay_steps=100,
                                                    alpha=0.0)

        optimizer = tf.keras.optimizers.SGD(learning_rate)

    elif command == 3:
        boundaries = [300, 800]
        values = [1.0, 0.5, 0.1]
        learning_rate = tf.keras.optimizers.schedules.PiecewiseConstantDecay(boundaries, values)

        optimizer = tf.keras.optimizers.SGD(learning_rate)

    else:
        optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

    return optimizer





> **Exponential Decay**  
- $\alpha = \alpha_{0}e^{-kt}$  


In [None]:
"""
0: Exponential Decay
1: Inverse Time Decay
2: Cosine Decay
3: Piecewise Decay

"""

EPOCHS = 1001

# Build the optimizer once, outside the loop, so its step counter
# advances and the schedule actually decays the learning rate
optimizer = learningRate(0)

for step in range(EPOCHS):
    for features, labels in iter(dataset):
        features = tf.cast(features, tf.float32)
        labels = tf.cast(labels, tf.float32)
        grads = grad(softmax_fn(features), features, labels)

        optimizer.apply_gradients(grads_and_vars=zip(grads,[W,b]))
        if step % 100 == 0:
            print("Iter: {}, Loss: {:.4f}".format(step, loss_fn(softmax_fn(features),features,labels)))
x_test = tf.cast(x_test, tf.float32)
y_test = tf.cast(y_test, tf.float32)
test_acc = accuracy_fn(softmax_fn(x_test),y_test)
print("Testset Accuracy: {:.4f}".format(test_acc))

> **Inverse Time Decay**  
&rarr; $\alpha = \frac{\alpha_{0}}{1+kt}$

In [None]:
"""
0: Exponential Decay
1: Inverse Time Decay
2: Cosine Decay
3: Piecewise Decay

"""

EPOCHS = 1001

# Build the optimizer once, outside the loop, so its step counter
# advances and the schedule actually decays the learning rate
optimizer = learningRate(1)

for step in range(EPOCHS):
    for features, labels in iter(dataset):
        features = tf.cast(features, tf.float32)
        labels = tf.cast(labels, tf.float32)
        grads = grad(softmax_fn(features), features, labels)

        optimizer.apply_gradients(grads_and_vars=zip(grads,[W,b]))
        if step % 100 == 0:
            print("Iter: {}, Loss: {:.4f}".format(step, loss_fn(softmax_fn(features),features,labels)))
x_test = tf.cast(x_test, tf.float32)
y_test = tf.cast(y_test, tf.float32)
test_acc = accuracy_fn(softmax_fn(x_test),y_test)
print("Testset Accuracy: {:.4f}".format(test_acc))

> **Cosine Annealing**  
-  $\alpha = \alpha_{min}^{i} + \frac{1}{2}(\alpha_{max}^{i} - \alpha_{min}^{i})(1+ \cos(\frac{T_{current}}{T_{i}}\pi))$  
-  $\alpha_{min}, \alpha_{max}$: the minimum and maximum learning rates set before training  
- $T_{current}$: the current epoch  
- $T_{i}$: the period over which cosine annealing runs    
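
A direct transcription of the cosine annealing formula, as a sketch: the rate limits and the period passed in below are hypothetical values, and the function name is made up for this example.

```python
import math

def cosine_annealing(alpha_min, alpha_max, t_current, t_i):
    # Starts at alpha_max when t_current = 0 and reaches alpha_min when t_current = t_i
    cosine = math.cos(math.pi * t_current / t_i)
    return alpha_min + 0.5 * (alpha_max - alpha_min) * (1 + cosine)

# Hypothetical limits and period
for epoch in (0, 25, 50, 75, 100):
    print(epoch, cosine_annealing(0.0, 0.1, epoch, 100))
```

The rate follows half a cosine wave, so it changes slowly near the start and end of the period and fastest in the middle.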

In [None]:
"""
0: Exponential Decay
1: Inverse Time Decay
2: Cosine Decay
3: Piecewise Decay

"""

EPOCHS = 1001

# Build the optimizer once, outside the loop, so its step counter
# advances and the schedule actually decays the learning rate
optimizer = learningRate(2)

for step in range(EPOCHS):
    for features, labels in iter(dataset):
        features = tf.cast(features, tf.float32)
        labels = tf.cast(labels, tf.float32)
        grads = grad(softmax_fn(features), features, labels)

        optimizer.apply_gradients(grads_and_vars=zip(grads,[W,b]))
        if step % 100 == 0:
            print("Iter: {}, Loss: {:.4f}".format(step, loss_fn(softmax_fn(features),features,labels)))
x_test = tf.cast(x_test, tf.float32)
y_test = tf.cast(y_test, tf.float32)
test_acc = accuracy_fn(softmax_fn(x_test),y_test)
print("Testset Accuracy: {:.4f}".format(test_acc))

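> **Piecewise Annealing**  
- When a specified epoch is reached, the learning rate is switched to a specified value  
- **keras.optimizers.schedules.PiecewiseConstantDecay(boundaries, values)**  
- boundaries and values must be ordered sequences (for example lists), and values needs one more element than boundaries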
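
The lookup such a schedule performs can be sketched with the standard library. The function below is a hypothetical stand-in, not the Keras class, and it assumes the convention that a step equal to a boundary still uses the earlier value, which is how I read the Keras documentation.

```python
from bisect import bisect_left

def piecewise_constant(step, boundaries, values):
    # values[0] while step <= boundaries[0], values[1] up to boundaries[1], and so on
    return values[bisect_left(boundaries, step)]

boundaries = [300, 800]
values = [1.0, 0.5, 0.1]  # one more entry than boundaries

for step in (0, 300, 301, 800, 801):
    print(step, piecewise_constant(step, boundaries, values))
```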

In [None]:
"""
0: Exponential Decay
1: Inverse Time Decay
2: Cosine Decay
3: Piecewise Decay

"""

EPOCHS = 1001

# Build the optimizer once, outside the loop, so its step counter
# advances and the schedule actually decays the learning rate
optimizer = learningRate(3)

for step in range(EPOCHS):
    for features, labels in iter(dataset):
        features = tf.cast(features, tf.float32)
        labels = tf.cast(labels, tf.float32)
        grads = grad(softmax_fn(features), features, labels)

        optimizer.apply_gradients(grads_and_vars=zip(grads,[W,b]))
        if step % 100 == 0:
            print("Iter: {}, Loss: {:.4f}".format(step, loss_fn(softmax_fn(features),features,labels)))
x_test = tf.cast(x_test, tf.float32)
y_test = tf.cast(y_test, tf.float32)
test_acc = accuracy_fn(softmax_fn(x_test),y_test)
print("Testset Accuracy: {:.4f}".format(test_acc))

> **Model Fitting**

In [None]:
"""
0: Exponential Decay
1: Inverse Time Decay
2: Cosine Decay
3: Piecewise Decay

"""

EPOCHS = 1001

# Build the optimizer once, outside the loop, so its step counter
# advances and the schedule actually decays the learning rate
optimizer = learningRate(3)

for step in range(EPOCHS):
    for features, labels in iter(dataset):
        features = tf.cast(features, tf.float32)
        labels = tf.cast(labels, tf.float32)
        grads = grad(softmax_fn(features), features, labels)

        optimizer.apply_gradients(grads_and_vars=zip(grads,[W,b]))
        if step % 100 == 0:
            print("Iter: {}, Loss: {:.4f}".format(step, loss_fn(softmax_fn(features),features,labels)))
x_test = tf.cast(x_test, tf.float32)
y_test = tf.cast(y_test, tf.float32)
test_acc = accuracy_fn(softmax_fn(x_test),y_test)
print("Testset Accuracy: {:.4f}".format(test_acc))

# **Lab 07-2: linear regression(without min/max)**

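> **Normalization**  
&rarr; The process of rescaling the data values to lie between 0 and 1  
$x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}}$  
> **Standardization**  
&rarr; The process of rescaling each value by how far it lies from the mean, measured in standard deviations  
$x_{new} = \frac{x - \mu}{\sigma}$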
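
A short worked example makes the difference between the two rescalings concrete: min-max scaling maps values into the range 0 to 1, while z-score scaling gives them mean 0 and standard deviation 1. The helper names and the data are hypothetical, and only the standard library is used.

```python
import statistics

def min_max_scale(xs):
    # (x - min) / (max - min): results lie between 0 and 1
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score_scale(xs):
    # (x - mean) / standard deviation: results have mean 0 and std 1
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values, mean 5 and std 2
scaled = min_max_scale(data)
standardized = z_score_scale(data)
print(scaled)
print(standardized)
```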

In [None]:
def normalization(data):
    numerator = data - np.min(data, 0)
    denominator = np.max(data, 0) - np.min(data, 0)
    return numerator / denominator

def standardization(data):
    # Center each column on its mean and scale by its standard deviation
    numerator = data - np.mean(data, 0)
    denominator = np.std(data, 0)
    return numerator / denominator

# **Lab 07-3: Overfitting**

> **Overfitting**  
&rarr; The model is fit too closely to the training data  


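> **Ways to reduce overfitting**  
- Collect more training data  
- Reduce the number of features (dimensionality reduction)  
- Increase the number of features (more terms in the hypothesis); note that this mainly helps when the model is underfitting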

> **Data Set and Validation**  
&rarr; The data should be split into training data and test data in a well-chosen ratio  
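
One way to build such a split is to shuffle once and slice. This is a minimal sketch: the 70/15/15 ratios, the separate validation part, and the fixed seed are assumptions made for illustration, not values from the lecture.

```python
import random

def split_dataset(samples, train_ratio=0.7, val_ratio=0.15, seed=0):
    # Shuffle a copy so the original order is left untouched
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The three parts never share a sample, so the test part stays unseen until the final evaluation.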


> **Fine tuning**  
&rarr; The process of adjusting part of the model's classification method during training, or adding another technique to an existing model, and then fitting again

# **Lab 07-3-2: MNIST**

> **MNIST data set**  
&rarr; A data set of handwritten digits from 0 to 9

In [None]:
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

>**Model**

In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

>**Optimizer, Cross Entropy**

In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

> **Model fitting**

In [None]:
model.fit(x_train, y_train, epochs=5)

> **Evaluation**

In [None]:
model.evaluate(x_test, y_test)

> **Fashion MNIST**  
&rarr; Data labelled with 10 kinds of clothing

In [None]:
fashion_mnist = tf.keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

In [None]:
plt.figure()
plt.imshow(train_images[3])
plt.colorbar()
plt.grid(False)
train_images = train_images / 255.0
test_images = test_images / 255.0

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])