1. Batch gradient descent with early stopping for softmax regression

In [5]:
from sklearn.datasets import load_iris
data = load_iris()
data.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [6]:
print(data.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

In [7]:
X = data['data'][:,(2,3)]
y = data['target']

Add the bias term $x_0 = 1$ to every instance.

In [8]:
import numpy as np
X_with_bias = np.c_[np.ones([len(X)]),X]

Set the random seed so that the output is reproducible.

In [9]:
np.random.seed(2042)

In [10]:
# split the data into training, validation, and test sets
test_ratio = 0.2
validation_ratio = 0.2
total_size = len(X_with_bias)

# size of each subset
test_size = int(total_size * test_ratio)
validation_size = int(total_size * validation_ratio)
train_size = total_size - test_size - validation_size
print(test_size)
# shuffle the instance indices
rnd_indices = np.random.permutation(total_size)

# slice the shuffled indices into the three subsets
X_train = X_with_bias[rnd_indices[:train_size]]
y_train = y[rnd_indices[:train_size]]

X_test = X_with_bias[rnd_indices[-test_size:]]
y_test = y[rnd_indices[-test_size:]]

X_valid = X_with_bias[rnd_indices[train_size:-test_size]]
y_valid = y[rnd_indices[train_size:-test_size]]

30
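As a quick sanity check on the slicing above, the following sketch rebuilds the same three index ranges and confirms that they cover all 150 instances exactly once. The seed and the `*_idx` names are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, for illustration only

total_size = 150  # number of instances in the iris dataset
test_size = int(total_size * 0.2)
validation_size = int(total_size * 0.2)
train_size = total_size - test_size - validation_size

indices = rng.permutation(total_size)
train_idx = indices[:train_size]            # first 90 shuffled indices
valid_idx = indices[train_size:-test_size]  # the 30 indices in between
test_idx = indices[-test_size:]             # last 30 shuffled indices

# Together the three subsets should contain every index exactly once.
all_idx = np.concatenate([train_idx, valid_idx, test_idx])
print(len(train_idx), len(valid_idx), len(test_idx))
print(np.array_equal(np.sort(all_idx), np.arange(total_size)))
```

The three slices share no index, so no instance leaks from the validation or test set into the training set.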


The target field currently holds class indices (0, 1, 2), but training the softmax regression model requires target class probabilities. For each instance, the probability is 1 for the target class and 0 for every other class. Convert the vector of class indices into a one-hot matrix.

In [11]:
def to_one_hot(y):
    n_classes = y.max() + 1  # 3 when y contains the labels 0, 1, 2
    m = len(y)  # number of instances
    Y_one_hot = np.zeros((m, n_classes))  # m rows, one column per class
    Y_one_hot[np.arange(m), y] = 1  # set the column given by each label to 1
    return Y_one_hot

In [12]:
# example: the first 10 training labels
y_train[:10]

array([0, 1, 2, 1, 1, 0, 1, 1, 1, 0])

In [13]:
to_one_hot(y_train[:10])

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.]])
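Because `to_one_hot` infers the number of classes from `y.max()`, a subset that happens to lack the highest label would produce a matrix with fewer columns. A minimal sketch of a variant that takes the class count explicitly is shown below. The name `to_one_hot_fixed` and its `n_classes` parameter are hypothetical and not part of the notebook.

```python
import numpy as np

def to_one_hot_fixed(y, n_classes):
    """One-hot encode integer labels using an explicit number of classes."""
    y = np.asarray(y)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(n_classes)[y]

# Labels without class 2 still produce three columns.
encoded = to_one_hot_fixed([0, 1, 1], n_classes=3)
print(encoded)
```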

Create the target class probability matrices for the training set, the validation set, and the test set.


In [14]:
Y_train_one_hot = to_one_hot(y_train)
Y_valid_one_hot = to_one_hot(y_valid)
Y_test_one_hot = to_one_hot(y_test)

Implement the softmax function, which is used for multiclass classification:

$\sigma\left(\mathbf{s}(\mathbf{x})\right)_k = \dfrac{\exp\left(s_k(\mathbf{x})\right)}{\sum\limits_{j=1}^{K}{\exp\left(s_j(\mathbf{x})\right)}}$

In [15]:
# softmax function: turns each row of scores into class probabilities
def softmax(logits):
    exps = np.exp(logits)
    exp_sums = np.sum(exps, axis = 1, keepdims=True)
    return exps /exp_sums
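For large scores, `np.exp` can overflow. A common remedy, sketched here as an assumption rather than as something the notebook uses, is to subtract the row maximum before exponentiating. This leaves the result unchanged because the shift cancels between numerator and denominator. The name `stable_softmax` is hypothetical.

```python
import numpy as np

def stable_softmax(logits):
    """Softmax over each row, shifted by the row maximum to avoid overflow."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

# The second row is the first row shifted by 999, so both rows
# should yield the same probabilities.
logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])
probs = stable_softmax(logits)
print(probs)
```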

In [16]:
# number of inputs and outputs of the model
n_inputs = X_train.shape[1]  # == 3 (2 features plus the bias term)
n_outputs = len(np.unique(y_train))  # == 3 (3 iris classes)

The cost function (cross entropy) is:

$J(\mathbf{\Theta}) =-\dfrac{1}{m}\sum\limits_{i=1}^{m}\sum\limits_{k=1}^{K}{y_k^{(i)}\log\left(\hat{p}_k^{(i)}\right)}$

The gradients are:

$\nabla_{\mathbf{\theta}^{(k)}} \, J(\mathbf{\Theta}) = \dfrac{1}{m} \sum\limits_{i=1}^{m}{ \left ( \hat{p}^{(i)}_k - y_k^{(i)} \right ) \mathbf{x}^{(i)}}$
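One way to gain confidence in the gradient formula is to compare it with a finite-difference approximation of the cost. The sketch below does this on a small synthetic dataset. The data, the seed, and the helper names `cost` and `gradient` are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exps / exps.sum(axis=1, keepdims=True)

def cost(Theta, X, Y):
    # Cross-entropy cost J(Theta), averaged over the m instances.
    return -np.mean(np.sum(Y * np.log(softmax(X.dot(Theta))), axis=1))

def gradient(Theta, X, Y):
    # Column k holds the gradient with respect to theta^(k).
    return X.T.dot(softmax(X.dot(Theta)) - Y) / len(X)

rng = np.random.default_rng(42)                 # hypothetical seed
X = np.c_[np.ones(6), rng.normal(size=(6, 2))]  # bias term plus 2 features
Y = np.eye(3)[rng.integers(0, 3, size=6)]       # random one-hot targets
Theta = rng.normal(size=(3, 3))

# Central finite differences, one parameter at a time.
h = 1e-6
numeric = np.zeros_like(Theta)
for i in range(Theta.shape[0]):
    for j in range(Theta.shape[1]):
        step = np.zeros_like(Theta)
        step[i, j] = h
        numeric[i, j] = (cost(Theta + step, X, Y) - cost(Theta - step, X, Y)) / (2 * h)

max_diff = np.max(np.abs(numeric - gradient(Theta, X, Y)))
print(max_diff)
```

A very small `max_diff` indicates that the analytic gradient agrees with the numerical one.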

Note that $\log\left(\hat{p}_k^{(i)}\right)$ is not finite if $\hat{p}_k^{(i)} = 0$. So we will add a tiny value $\epsilon$ to $\hat{p}_k^{(i)}$ inside the logarithm to avoid getting `nan` values.
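The effect of $\epsilon$ can be seen on a single instance whose predicted probabilities contain exact zeros. The values below are made up for illustration.

```python
import numpy as np

epsilon = 1e-7
Y = np.array([[1.0, 0.0, 0.0]])  # one-hot target
P = np.array([[1.0, 0.0, 0.0]])  # predicted probabilities with exact zeros

# Without epsilon: log(0) is -inf, and 0 * -inf evaluates to nan.
with np.errstate(divide="ignore", invalid="ignore"):
    loss_raw = -np.mean(np.sum(Y * np.log(P), axis=1))

# With epsilon inside the logarithm, every term stays finite.
loss_eps = -np.mean(np.sum(Y * np.log(P + epsilon), axis=1))

print(loss_raw, loss_eps)
```

The shifted loss is off by roughly $\epsilon$ in magnitude, which is negligible compared with the loss values seen during training.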


In [26]:
eta = 0.01  # learning rate
n_iterations = 10001
m = len(X_train)
epsilon = 1e-7

Theta = np.random.randn(n_inputs, n_outputs)  # 3 x 3 matrix, randomly initialized

for iteration in range(n_iterations):
    logits = X_train.dot(Theta)  # class scores for every training instance
    # softmax turns the scores into one probability per class (3 columns)
    Y_proba = softmax(logits)
    if iteration % 500 == 0:
        # cross-entropy cost
        loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba + epsilon), axis=1))
        print(iteration, loss)
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error)
    # gradient descent step: Theta <- Theta - eta * gradients
    Theta = Theta - eta * gradients


0 5.173284880908112
500 0.8258143504756522
1000 0.6740383508681776
1500 0.5891518016822946
2000 0.5353052890403674
2500 0.4975988211901051
3000 0.469220320068328
3500 0.4467104744290491
4000 0.4281482798645294
4500 0.41238656131534807
5000 0.3986986115898958
5500 0.3866012294249264
6000 0.37575990461002273
6500 0.36593504479314265
7000 0.35694997413544044
7500 0.34867108926125434
8000 0.34099511023585066
8500 0.33384063385790874
9000 0.32714238455391575
9500 0.3208472071973719
10000 0.314911214409356


The softmax model is now trained. These are the model parameters:

In [27]:
Theta

array([[ 4.11598028, -0.99264301, -4.22641375],
       [-0.57589211,  1.18291641,  0.65978084],
       [-1.4822713 , -1.04992775,  2.55675949]])
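To illustrate how such a parameter matrix is used, the sketch below copies the `Theta` values printed above and classifies three made-up instances. The petal measurements in `X_new` are hypothetical and chosen only for illustration. Each row consists of the bias term, the petal length, and the petal width.

```python
import numpy as np

# Parameter matrix copied from the output above (rows: bias, petal length, petal width).
Theta = np.array([[ 4.11598028, -0.99264301, -4.22641375],
                  [-0.57589211,  1.18291641,  0.65978084],
                  [-1.4822713 , -1.04992775,  2.55675949]])

# Hypothetical instances: [bias, petal length (cm), petal width (cm)].
X_new = np.array([[1.0, 1.4, 0.2],
                  [1.0, 4.0, 1.3],
                  [1.0, 6.0, 2.5]])

logits = X_new.dot(Theta)                         # one score per class
exps = np.exp(logits)
Y_proba = exps / exps.sum(axis=1, keepdims=True)  # softmax probabilities
y_predict = np.argmax(Y_proba, axis=1)            # most probable class index
print(y_predict)
```

The predicted class is the index of the largest probability in each row, which is also the index of the largest score because softmax preserves the ordering of the scores.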

In [20]:
Theta

array([[ 0.11330361, -0.23452355, -0.20774285],
       [ 0.43433246, -0.66647126, -0.71757054],
       [ 1.0188498 ,  0.41245226, -0.75018439]])