# BATCH GRADIENT DESCENT WITH EARLY STOPPING FOR SOFTMAX REGRESSION

In this notebook we implement Batch Gradient Descent with early stopping for softmax regression (exercise 12 of chapter 4 in "**Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow**" by Aurélien Géron).

We have to implement it from scratch, without using scikit-learn (which we only use to load the iris dataset).
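For reference, these are the softmax regression quantities the cells below compute. This is our summary in the book's notation: $K$ is the number of classes, $m$ the number of training instances, $\boldsymbol{\theta}^{(k)}$ the parameter vector of class $k$, and $y_k^{(i)}$ is 1 if instance $i$ belongs to class $k$ and 0 otherwise.

$$
\begin{aligned}
s_k(\mathbf{x}) &= \mathbf{x}^\top \boldsymbol{\theta}^{(k)} \\
\hat{p}_k &= \frac{\exp\big(s_k(\mathbf{x})\big)}{\sum_{j=1}^{K} \exp\big(s_j(\mathbf{x})\big)} \\
J(\boldsymbol{\Theta}) &= -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log \hat{p}_k^{(i)} \\
\nabla_{\boldsymbol{\theta}^{(k)}} J(\boldsymbol{\Theta}) &= \frac{1}{m} \sum_{i=1}^{m} \big(\hat{p}_k^{(i)} - y_k^{(i)}\big)\, \mathbf{x}^{(i)}
\end{aligned}
$$

Stacking the per-class gradients as columns gives the matrix form $\nabla_{\boldsymbol{\Theta}} J = \frac{1}{m} \mathbf{X}^\top (\hat{\mathbf{P}} - \mathbf{Y})$, where $\mathbf{X}$ holds one instance per row, $\hat{\mathbf{P}}$ the predicted probabilities and $\mathbf{Y}$ the one-hot targets.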

## Settings

We start by importing everything we need.

In [71]:
import numpy as np 
from sklearn import datasets

In [45]:
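def softmax_score(x, theta):
    # Score of every class for one instance: s_k(x) = x^T theta^(k).
    # x has shape (n,), theta has shape (n, K) with one column per class.
    score = np.dot(x.transpose(), theta)
    return score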
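`softmax_score` scores one instance at a time. For gradient descent it is convenient to score the whole training set at once: with the instances stacked as the rows of a matrix of shape `(m, n)` and one parameter column per class in a matrix of shape `(n, K)`, a single matrix product holds every score. A minimal check of that equivalence (the names `X_demo` and `theta_demo` are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 3))      # 5 instances, 3 features
theta_demo = rng.normal(size=(3, 4))  # 3 features, 4 classes

# One row of scores per instance, one column per class.
batch_scores = X_demo @ theta_demo

# The same scores computed instance by instance, as softmax_score does.
row_scores = np.array([np.dot(x.transpose(), theta_demo) for x in X_demo])

print(batch_scores.shape)                      # (5, 4)
print(np.allclose(batch_scores, row_scores))   # True
```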

In [65]:
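def softmax_function(x, theta, k=None):
    # Class scores s_k(x) for every class, computed only once.
    scores = softmax_score(x, theta)
    # Subtracting the max score avoids overflow in np.exp without changing the result.
    exp_scores = np.exp(scores - np.max(scores))
    p = exp_scores / np.sum(exp_scores)
    if k is None:
        return p        # probabilities of all K classes
    return p[k - 1]     # probability of class k (k is 1-based)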
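Computing the exponentials directly overflows as soon as a score exceeds about 709, the largest exponent a float64 can hold. Because softmax is unchanged when the same constant is subtracted from every score, a common remedy is to subtract the maximum score first. A sketch comparing the direct formula with the shifted one on a small batch (the helper names `naive_softmax` and `stable_softmax` are ours):

```python
import numpy as np

def naive_softmax(scores):
    # Direct translation of the formula: exp of every score over the row sum.
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def stable_softmax(scores):
    # Subtract each row's maximum before exponentiating; the ratios are unchanged.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])  # np.exp overflows on the second row

with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(scores))   # second row is all nan (inf / inf)
print(stable_softmax(scores))      # both rows ≈ 0.090, 0.245, 0.665
```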

In [61]:
x = np.array([0.8,0.7,0.2])
theta = np.array([[0.2,0.1,0.5], [0.3,0.2,0.1], [0.3,0.4,0.5]])
k = 3

In [66]:
theta

array([[0.2, 0.1, 0.5],
       [0.3, 0.2, 0.1],
       [0.3, 0.4, 0.5]])
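With the example values above, the three class scores and probabilities can be worked out by hand. The snippet below repeats that arithmetic with NumPy, so the numbers can be compared with what `softmax_function(x, theta)` and `softmax_function(x, theta, k)` return:

```python
import numpy as np

x = np.array([0.8, 0.7, 0.2])
theta = np.array([[0.2, 0.1, 0.5],
                  [0.3, 0.2, 0.1],
                  [0.3, 0.4, 0.5]])

# s_k(x) = x^T theta^(k), e.g. class 1: 0.8*0.2 + 0.7*0.3 + 0.2*0.3 = 0.43
scores = x @ theta
print(scores)        # ≈ 0.43, 0.30, 0.57

# Softmax: exponentiate, then normalize so the probabilities sum to 1.
p = np.exp(scores) / np.exp(scores).sum()
print(p)             # ≈ 0.330, 0.290, 0.380
print(p[3 - 1])      # probability of class k = 3 (1-based), ≈ 0.380
```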

In [68]:
np.random.randn(theta.shape[0], theta.shape[1])

array([[-1.7094662 , -0.74635296,  1.41075455],
       [ 1.47065076,  1.25936043,  0.71852724],
       [ 1.22854529, -0.46854869, -0.47496751]])

In [None]:
def learning_schedule(t):
    t_0, t_1 = 5, 50 
    return t_0 / (t + t_1)
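`learning_schedule` is the decaying step size $\eta(t) = t_0 / (t + t_1)$ with $t_0 = 5$ and $t_1 = 50$: it starts at 0.1 and is halved after 50 steps. Inside a training loop it should therefore be called with the epoch (or step) counter, not with the current learning rate. A quick look at a few values:

```python
def learning_schedule(t):
    t_0, t_1 = 5, 50
    return t_0 / (t + t_1)

for t in (0, 50, 150, 950):
    print(t, learning_schedule(t))
# 0 0.1
# 50 0.05
# 150 0.025
# 950 0.005
```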

In [77]:
def batch_gradient_descent(y, X, learning_rate=0.1, n_epochs=100):
    # y: integer class labels, shape (m,); X: features, shape (m, n).
    m = X.shape[0]
    X_b = np.c_[np.ones(m), X]                # add the bias input x_0 = 1
    n_classes = y.max() + 1
    Y = np.eye(n_classes)[y]                  # one-hot targets, shape (m, K)
    theta = np.random.randn(X_b.shape[1], n_classes)  # one column per class
    for epoch in range(n_epochs):
        # Class probabilities for every instance, shape (m, K).
        P = np.array([softmax_function(x, theta) for x in X_b])
        gradient = 1 / m * X_b.T.dot(P - Y)   # cross-entropy gradient, shape (n + 1, K)
        theta = theta - learning_rate * gradient
        learning_rate = learning_schedule(epoch + 1)  # decay with the epoch count
    return theta

In [74]:
iris = datasets.load_iris()
iris_data = iris["data"]
iris_target = iris["target"]

In [78]:
batch_gradient_descent(iris_target, iris_data)
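The function above always runs a fixed number of epochs, so the early stopping the exercise asks for still has to be added. The following is a minimal sketch under our own assumptions: a random 80/20 train/validation split made with NumPy, a constant learning rate `eta`, and a `patience` window. After each epoch the cross-entropy is measured on the validation set, the best parameters are kept, and training stops once the validation loss has not improved for `patience` epochs. The names `train_with_early_stopping`, `predict`, `patience` and `make_blobs_demo` are hypothetical, and the synthetic three-class data only stands in for the iris arrays so the cell runs on its own.

```python
import numpy as np

def softmax(scores):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def cross_entropy(P, Y, eps=1e-7):
    # Mean cross-entropy between predicted probabilities P and one-hot targets Y.
    return -np.mean(np.sum(Y * np.log(P + eps), axis=1))

def train_with_early_stopping(X, y, eta=0.1, n_epochs=5001, patience=50,
                              val_ratio=0.2, seed=42):
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    X_b = np.c_[np.ones(m), X]              # add bias column x_0 = 1
    K = y.max() + 1
    Y = np.eye(K)[y]                        # one-hot targets

    # Hold out a random validation set (no scikit-learn needed).
    idx = rng.permutation(m)
    n_val = int(m * val_ratio)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    X_train, Y_train = X_b[train_idx], Y[train_idx]
    X_val, Y_val = X_b[val_idx], Y[val_idx]

    theta = rng.normal(size=(X_b.shape[1], K))
    best_loss, best_theta, best_epoch = np.inf, theta.copy(), 0
    for epoch in range(n_epochs):
        # One full-batch gradient step on the training set.
        P = softmax(X_train @ theta)
        gradient = X_train.T @ (P - Y_train) / len(X_train)
        theta = theta - eta * gradient

        # Track the best parameters seen so far on the validation set.
        val_loss = cross_entropy(softmax(X_val @ theta), Y_val)
        if val_loss < best_loss:
            best_loss, best_theta, best_epoch = val_loss, theta.copy(), epoch
        elif epoch - best_epoch >= patience:
            break                           # no improvement for `patience` epochs
    return best_theta, best_loss, best_epoch

def predict(theta, X):
    # Most probable class = highest score (softmax is monotonic).
    X_b = np.c_[np.ones(X.shape[0]), X]
    return np.argmax(X_b @ theta, axis=1)

def make_blobs_demo(n_per_class=50, seed=0):
    # Three well-separated Gaussian clusters in 2-D (illustrative data only).
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
    X = np.vstack([c + rng.normal(scale=0.5, size=(n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(3), n_per_class)
    return X, y

X_demo, y_demo = make_blobs_demo()
theta, val_loss, stop_epoch = train_with_early_stopping(X_demo, y_demo)
accuracy = np.mean(predict(theta, X_demo) == y_demo)
print(f"best epoch {stop_epoch}, validation loss {val_loss:.3f}, accuracy {accuracy:.2f}")
```

On the notebook's own arrays the call would be `train_with_early_stopping(iris_data, iris_target)`; with the four unscaled iris features, standardizing them first or lowering `eta` may be needed for stable training. Returning the best parameters rather than the last ones is what makes this early stopping: training may run past the minimum of the validation loss, but the model rolls back to it.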