### 4-7-12. Implementing Softmax Regression with Batch Gradient Descent and Early Stopping

In [76]:
from sklearn import datasets

iris = datasets.load_iris()

In [2]:
iris.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

In [3]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [4]:
print(iris.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

In [5]:
X = iris["data"][:, (2, 3)] # petal length, width
y = iris["target"]

In [6]:
import numpy as np
np.random.seed(42)

In [9]:
X_b = np.c_[np.ones((len(X), 1)), X]

In [10]:
X[0], X_b[0]

(array([1.4, 0.2]), array([1. , 1.4, 0.2]))

In [123]:
N = len(X_b)
rnd_indices = np.random.permutation(N)

train_size = int(N * 0.7)
valid_size = int(N * 0.2)
test_size = int(N * 0.1)

X_train = X_b[rnd_indices[:train_size]]
y_train = y[rnd_indices[:train_size]]

X_valid = X_b[rnd_indices[train_size: train_size+valid_size]]
y_valid = y[rnd_indices[train_size: train_size+valid_size]]

X_test = X_b[rnd_indices[-test_size:]]
y_test = y[rnd_indices[-test_size:]]

In [20]:
X_train.shape, X_valid.shape, X_test.shape

((105, 3), (30, 3), (15, 3))

In [21]:
def one_hot(Y):
    m = len(Y)
    n_classes = len(iris.target_names)
    y_one_hot = np.zeros((m, n_classes))
    y_one_hot[np.arange(m), Y] = 1
    return y_one_hot
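
The same one-hot encoding can be written as row selection from an identity matrix: row `y[i]` of `np.eye(n_classes)` has a single 1 in column `y[i]`. A minimal sketch; the name `one_hot_eye` is illustrative, and unlike `one_hot` above it takes the number of classes as a parameter instead of reading the global `iris`.

```python
import numpy as np

def one_hot_eye(y, n_classes):
    """One-hot encode integer labels by indexing rows of the identity matrix."""
    return np.eye(n_classes)[y]

y = np.array([0, 2, 1, 2])
print(one_hot_eye(y, 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```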

In [124]:
y_train_one_hot = one_hot(y_train)
y_valid_one_hot = one_hot(y_valid)
y_test_one_hot = one_hot(y_test)

In [23]:
y_train_one_hot.shape, y_valid_one_hot.shape, y_test_one_hot.shape

((105, 3), (30, 3), (15, 3))

In [24]:
def softmax_function(logits):
    exps = np.exp(logits)
    total_exps = np.sum(exps, axis=1, keepdims=True)
    return exps / total_exps
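
`softmax_function` works for these small iris logits, but `np.exp` overflows in float64 once a logit exceeds roughly 709, and the ratio then becomes `nan`. A common remedy, sketched below under the name `stable_softmax` (illustrative, not from the notebook), subtracts each row's maximum first. The result is mathematically unchanged because every probability in a row is multiplied by the same factor in numerator and denominator.

```python
import numpy as np

def stable_softmax(logits):
    # Shift each row so its largest logit is 0; exp() then never overflows.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])  # the second row overflows np.exp directly
probs = stable_softmax(logits)
print(np.allclose(probs.sum(axis=1), 1.0))  # True
print(np.allclose(probs[0], probs[1]))      # True: softmax ignores a per-row shift
```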

In [199]:
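def fit_theta(n_iterations, eta, m, train_data, valid_data):
    """Batch gradient descent for softmax regression with early stopping.

    m must equal the number of training instances (it averages the gradient).
    Returns the iteration, validation loss and parameters with the lowest validation loss.
    """
    t_X, t_y = train_data  # features (with bias column) and one-hot targets
    v_X, v_y = valid_data
    theta = np.random.randn(t_X.shape[-1], len(iris.target_names))  # (n_inputs, n_classes)

    best_iteration = n_iterations
    best_loss = float("inf")
    best_theta = None
    log_every = max(1, n_iterations // 10)   # print progress 10 times
    decay_every = max(1, n_iterations // 2)  # divide eta by 10 twice

    for iteration in range(n_iterations):
        # forward pass and mean cross-entropy on the training set
        logits = t_X.dot(theta)
        prob = softmax_function(logits)
        loss = -np.mean(np.sum(t_y * np.log(prob), axis=1))
        # gradient of the mean cross-entropy: (1/m) X^T (P - Y)
        gradients = 1/m * t_X.T.dot(prob - t_y)
        theta = theta - eta * gradients

        # validation loss with the updated parameters
        valid_logits = v_X.dot(theta)
        valid_prob = softmax_function(valid_logits)
        valid_loss = -np.mean(np.sum(v_y * np.log(valid_prob), axis=1))
        if iteration % log_every == 0:
            print(f"iter: {iteration}, loss: {loss}, valid: {valid_loss}")

        # step decay of the learning rate
        if (iteration + 1) % decay_every == 0:
            eta /= 10
            print(f"eta: {eta}")

        # early stopping: stop at the first iteration whose validation loss does not improve
        if valid_loss < best_loss:
            best_loss = valid_loss
            best_iteration = iteration
            best_theta = theta.copy()
        else:
            print("Early Stopped")
            break

    return best_iteration, best_loss, best_theta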
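
The update in `fit_theta` relies on the closed-form gradient of the mean cross-entropy, (1/m) Xᵀ(P − Y). A finite-difference check is a cheap way to confirm that formula. The sketch below uses small random data rather than iris, and `cross_entropy`, `analytic_grad` and `numerical_grad` are illustrative helpers, not part of the notebook.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def cross_entropy(theta, X, Y):
    # Mean over instances of -sum_k y_k log p_k, the same loss fit_theta prints.
    P = softmax(X @ theta)
    return -np.mean(np.sum(Y * np.log(P), axis=1))

def analytic_grad(theta, X, Y):
    # Closed-form gradient used in fit_theta: (1/m) X^T (P - Y)
    return X.T @ (softmax(X @ theta) - Y) / len(X)

def numerical_grad(theta, X, Y, h=1e-6):
    # Central differences, perturbing one parameter at a time.
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        step = np.zeros_like(theta)
        step[idx] = h
        grad[idx] = (cross_entropy(theta + step, X, Y)
                     - cross_entropy(theta - step, X, Y)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X = np.c_[np.ones(8), rng.normal(size=(8, 2))]  # bias column + 2 features, like X_b
Y = np.eye(3)[rng.integers(0, 3, size=8)]       # one-hot targets for 3 classes
theta = rng.normal(size=(3, 3))

diff = np.max(np.abs(analytic_grad(theta, X, Y) - numerical_grad(theta, X, Y)))
print(diff < 1e-6)  # True
```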

In [200]:
eta = 0.1             # initial learning rate
m = len(X_train)      # number of training instances
n_iterations = 10000

best_iteration, best_loss, best_theta = fit_theta(
    n_iterations, eta, m, (X_train, y_train_one_hot), (X_valid, y_valid_one_hot)
)
best_iteration, best_loss

iter: 0, loss: 5.15463399503697, valid: 4.239973802925364
iter: 1000, loss: 0.3005748539065718, valid: 0.29048238914537744
iter: 2000, loss: 0.2328114598805926, valid: 0.22390518678274468
iter: 3000, loss: 0.19705570609315443, valid: 0.19333560741124828
iter: 4000, loss: 0.17432282046961367, valid: 0.17561167704176447
eta: 0.01
iter: 5000, loss: 0.15846638367394053, valid: 0.16397787316126244
iter: 6000, loss: 0.1571380061384765, valid: 0.16302628661268248
iter: 7000, loss: 0.15584801842421148, valid: 0.16210548694723306
iter: 8000, loss: 0.15459473461027975, valid: 0.16121392298652848
iter: 9000, loss: 0.15337656736110875, valid: 0.16035014580877813
eta: 0.001


(9999, 0.1595136251587691)
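
The learning rate in `fit_theta` follows a step schedule: it is divided by 10 after the updates at iterations 4999 and 9999. That is why `eta: 0.01` is printed between the iteration-4000 and iteration-5000 log lines, and why `eta: 0.001` only appears after the last update, where it no longer affects training. The same schedule as a standalone function, as a sketch (the name `step_decay_eta` and its parameters are hypothetical):

```python
def step_decay_eta(iteration, eta0=0.1, n_iterations=10000, drops=2, factor=10):
    """Learning rate used for the update at `iteration` under fit_theta's schedule.

    eta is divided by `factor` after every n_iterations // drops updates.
    """
    period = n_iterations // drops
    return eta0 / factor ** (iteration // period)

# 0.1 for iterations 0-4999, 0.01 for iterations 5000-9999
for it in (0, 4999, 5000, 9999):
    print(it, step_decay_eta(it))
```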

In [192]:
def calc_acc(x_data, y_data, Theta):
    y_pred = np.argmax(softmax_function(x_data.dot(Theta)), axis=1)
    acc = np.mean(y_pred == y_data)
    return acc
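
Within one row, every softmax probability shares the same denominator and `exp` is increasing, so the class with the highest probability is also the class with the highest logit. `calc_acc` could therefore take the argmax of `x_data.dot(Theta)` directly. A small sketch on random data showing the two predictions agree (`predict_from_logits` and `predict_from_probs` are illustrative names):

```python
import numpy as np

def predict_from_logits(X, theta):
    # argmax of the raw scores; no softmax needed for hard class predictions
    return np.argmax(X @ theta, axis=1)

def predict_from_probs(X, theta):
    logits = X @ theta
    exps = np.exp(logits - logits.max(axis=1, keepdims=True))
    return np.argmax(exps / exps.sum(axis=1, keepdims=True), axis=1)

rng = np.random.default_rng(1)
X = np.c_[np.ones(20), rng.normal(size=(20, 2))]  # bias column + 2 features
theta = rng.normal(size=(3, 3))
print(np.array_equal(predict_from_logits(X, theta), predict_from_probs(X, theta)))  # True
```

Probabilities are still needed for the loss; only hard predictions can skip the softmax.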

In [201]:
print(f"train acc: {calc_acc(X_train, y_train, best_theta)}")
print(f"valid acc: {calc_acc(X_valid, y_valid, best_theta)}")
print(f"test acc: {calc_acc(X_test, y_test, best_theta)}")

train acc: 0.9523809523809523
valid acc: 1.0
test acc: 0.9333333333333333


### Retraining on the merged training and validation sets

In [161]:
X_train_valid = X_b[rnd_indices[:train_size+valid_size]]
y_train_valid = y[rnd_indices[:train_size+valid_size]]
y_train_valid_one_hot = one_hot(y_train_valid)

In [202]:
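eta = 0.1
# average the gradient over all 135 merged instances
# (the log below was recorded with m = len(X_train) = 105, i.e. a slightly larger step)
m = len(X_train_valid)
n_iterations = 10000

# The test set now drives early stopping, so the test accuracy reported
# below is no longer an unbiased estimate of generalization.
best_iteration, best_loss, best_theta = fit_theta(
    n_iterations, eta, m, (X_train_valid, y_train_valid_one_hot), (X_test, y_test_one_hot)
)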
best_iteration, best_loss

iter: 0, loss: 4.042312468813351, valid: 4.008925191232171
iter: 1000, loss: 0.2790599386013156, valid: 0.2983153929374564
iter: 2000, loss: 0.21168015135082627, valid: 0.22043429622446103
iter: 3000, loss: 0.1793880245943859, valid: 0.18342536611058546
iter: 4000, loss: 0.1595715338208858, valid: 0.1621639745813902
eta: 0.01
iter: 5000, loss: 0.1459931078178252, valid: 0.14842409810673646
iter: 6000, loss: 0.14486317535755613, valid: 0.14730990717032927
iter: 7000, loss: 0.1437669201639916, valid: 0.14623328713838724
iter: 8000, loss: 0.14270280083212702, valid: 0.14519230359593477
iter: 9000, loss: 0.14166937043819136, valid: 0.14418515170271107
eta: 0.001


(9999, 0.143211105194879)

In [203]:
print(f"train&valid acc: {calc_acc(X_train_valid, y_train_valid, best_theta)}")
print(f"test acc: {calc_acc(X_test, y_test, best_theta)}")

train&valid acc: 0.9629629629629629
test acc: 0.9333333333333333


### Adding L2 regularization

In [204]:
def fit_theta_l2(n_iterations, eta, m, train_data, valid_data, alpha=0.1):
    """Same as fit_theta, plus an L2 penalty alpha * 1/2 * ||theta[1:]||^2.

    The first row of theta holds the bias weights and is not regularized.
    """
    t_X, t_y = train_data
    v_X, v_y = valid_data
    n_classes = len(iris.target_names)
    theta = np.random.randn(t_X.shape[-1], n_classes)

    best_iteration = n_iterations
    best_loss = float("inf")
    best_theta = None
    log_every = max(1, n_iterations // 10)
    decay_every = max(1, n_iterations // 2)

    for iteration in range(n_iterations):
        logits = t_X.dot(theta)
        prob = softmax_function(logits)

        # training loss = mean cross-entropy + L2 penalty on the non-bias weights
        cross_entropy = -np.mean(np.sum(t_y * np.log(prob), axis=1))
        l2_loss = 1/2 * np.sum(np.square(theta[1:]))
        loss = cross_entropy + alpha * l2_loss

        # penalty gradient is alpha * theta, with a zero row for the bias
        gradients = 1/m * t_X.T.dot(prob - t_y) + np.r_[np.zeros([1, n_classes]), alpha * theta[1:]]
        theta = theta - eta * gradients

        # validation loss (with the same penalty) for the updated parameters
        valid_logits = v_X.dot(theta)
        valid_prob = softmax_function(valid_logits)

        valid_cross_entropy = -np.mean(np.sum(v_y * np.log(valid_prob), axis=1))
        valid_l2_loss = 1/2 * np.sum(np.square(theta[1:]))
        valid_loss = valid_cross_entropy + alpha * valid_l2_loss
        if iteration % log_every == 0:
            print(f"iter: {iteration}, loss: {loss}, valid: {valid_loss}")

        if (iteration + 1) % decay_every == 0:
            eta /= 10
            print(f"eta: {eta}")

        # early stopping on the first non-improving iteration
        if valid_loss < best_loss:
            best_loss = valid_loss
            best_iteration = iteration
            best_theta = theta.copy()
        else:
            print("Early Stopped")
            break

    return best_iteration, best_loss, best_theta
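
In `fit_theta_l2` the penalty is alpha · ½ · Σθ² over every row except the first, so the bias row (the weights that multiply the column of ones) is not regularized, and the penalty's gradient is alpha · θ with a zero row on top. A sketch that checks this term by central differences; `l2_penalty`, `l2_grad` and `numerical_grad` are illustrative helpers, not part of the notebook.

```python
import numpy as np

def l2_penalty(theta, alpha):
    # Row 0 holds the bias weights and is excluded, as in fit_theta_l2.
    return alpha * 0.5 * np.sum(np.square(theta[1:]))

def l2_grad(theta, alpha):
    # Zero row for the bias, alpha * theta for the feature weights.
    return np.r_[np.zeros((1, theta.shape[1])), alpha * theta[1:]]

def numerical_grad(f, theta, h=1e-6):
    # Central differences, perturbing one parameter at a time.
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        step = np.zeros_like(theta)
        step[idx] = h
        grad[idx] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 3))  # 1 bias row + 2 feature rows, 3 classes
alpha = 0.1
num = numerical_grad(lambda t: l2_penalty(t, alpha), theta)
print(np.allclose(num, l2_grad(theta, alpha), atol=1e-8))  # True
print(np.all(num[0] == 0))  # True: the bias row is unpenalized
```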

In [210]:
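eta = 0.1
m = len(X_train)
n_iterations = 10000

# Use the regularized version (alpha=0.1 by default). The log below was
# recorded with fit_theta, i.e. without the L2 penalty.
best_iteration, best_loss, best_theta = fit_theta_l2(
    n_iterations, eta, m, (X_train, y_train_one_hot), (X_valid, y_valid_one_hot)
)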
print(f"best iter: {best_iteration}, loss: {best_loss}")

iter: 0, loss: 2.5739340308515044, valid: 1.4555216727724654
iter: 1000, loss: 0.30960747187940374, valid: 0.2901711292548832
iter: 2000, loss: 0.23799746620167672, valid: 0.22404962271163822
iter: 3000, loss: 0.200418246337554, valid: 0.19351294735973182
iter: 4000, loss: 0.17669677556810404, valid: 0.17575601104802205
eta: 0.01
iter: 5000, loss: 0.16024940538769064, valid: 0.16408327349150362
iter: 6000, loss: 0.1588754264400782, valid: 0.16312795708289976
iter: 7000, loss: 0.15754174045914016, valid: 0.16220347611235678
iter: 8000, loss: 0.1562465548901227, valid: 0.16130828163827415
iter: 9000, loss: 0.15498818306575357, valid: 0.16044092657669562
eta: 0.001
best iter: 9999, loss: 0.15960088566552796


In [211]:
print(f"train acc: {calc_acc(X_train, y_train, best_theta)}")
print(f"valid acc: {calc_acc(X_valid, y_valid, best_theta)}")
print(f"test acc: {calc_acc(X_test, y_test, best_theta)}")

train acc: 0.9523809523809523
valid acc: 1.0
test acc: 0.9333333333333333


### Retraining on the merged training and validation sets with L2 regularization

In [207]:
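eta = 0.1
m = len(X_train_valid)  # average over all 135 merged instances
n_iterations = 10000

# Use the regularized version (alpha=0.1 by default). The log below was
# recorded with fit_theta and m = len(X_train), i.e. without the L2 penalty.
# The test set drives early stopping here, so test accuracy is optimistic.
best_iteration, best_loss, best_theta = fit_theta_l2(
    n_iterations, eta, m, (X_train_valid, y_train_valid_one_hot), (X_test, y_test_one_hot)
)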
print(f"best iter: {best_iteration}, loss: {best_loss}")

iter: 0, loss: 1.539837257281275, valid: 1.301851057985813
iter: 1000, loss: 0.2825968364635875, valid: 0.31047397209595096
iter: 2000, loss: 0.21462893504639358, valid: 0.224897167645095
iter: 3000, loss: 0.18119560465857634, valid: 0.1852442579417542
iter: 4000, loss: 0.16071799025523578, valid: 0.16294233213017836
eta: 0.01
iter: 5000, loss: 0.14675765310908595, valid: 0.14872524732362266
iter: 6000, loss: 0.1455991638666448, valid: 0.1475789683945682
iter: 7000, loss: 0.1444757018601901, valid: 0.14647226935772312
iter: 8000, loss: 0.14338564679217902, valid: 0.14540306559662106
iter: 9000, loss: 0.1423274781480785, valid: 0.1443694153769135
eta: 0.001
best iter: 9999, loss: 0.1433704916480228


In [208]:
print(f"train&valid acc: {calc_acc(X_train_valid, y_train_valid, best_theta)}")
print(f"test acc: {calc_acc(X_test, y_test, best_theta)}")

train&valid acc: 0.9629629629629629
test acc: 0.9333333333333333
