<a href="https://colab.research.google.com/github/RP272/Hands-On-ML/blob/main/Batch_Gradient_Descent_for_Softmax_Regression_with_Early_Stopping_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Chapter 4: Training Models

Exercise 12: Implement Batch Gradient Descent with early stopping for Softmax Regression (without using Scikit-Learn).

In the first step lets define functions in regard to Softmax Regression: softmax function, softmax cost function (cross entropy) and as last the cross entropy gradient vector.

In [45]:
import numpy as np

def softmax_score(input_vector, parameter_matrix, k: int):
  parameter_vector = parameter_matrix[k]
  return np.dot(input_vector, parameter_vector)

def softmax(input_vector, parameter_matrix):
  softmax_scores = []
  sum_of_softmax_scores = 0
  for i in range(len(parameter_matrix)):
    score = softmax_score(input_vector, parameter_matrix, i)
    softmax_scores.append(score)
    sum_of_softmax_scores += score
  return softmax_scores / sum_of_softmax_scores

def cross_entropy_cost(parameter_matrix, input_vectors, class_count, target_probabilities):
  cost_sum = 0
  m = len(input_vectors)
  for i in range(m):
    softmax_scores = softmax(input_vectors[i], parameter_matrix)
    for k in range(class_count):
      cost_sum += target_probabilities[i][k] * np.log10(softmax_scores[k])
  return -1/m * cost_sum


def cross_entropy_gradient(parameter_matrix, input_vectors, class_count, target_probabilities):
  m = len(input_vectors)
  class_gradients = []
  softmax_score_cache = {}
  for k in range(class_count):
    tmp = 0
    for i in range(m):
      if i not in softmax_score_cache:
        softmax_score_cache[i] = softmax(input_vectors[i], parameter_matrix)
      softmax_scores = softmax_score_cache[i]
      difference = softmax_scores[k] - target_probabilities[i][k]
      tmp += np.array(input_vectors[i]) * difference
    class_gradients.append(1/m * tmp)
  return class_gradients


We have defined functions regarding softmax function. Let's prepare the dataset.

In [50]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

X = iris["data"][:, (2, 3)] # petal length, petal width
y = iris["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Now lets write down the code for Batch Gradient Descent.

In [55]:
import numpy as np
from sklearn.metrics import accuracy_score

eta = 0.01
n_iterations = 100000
m = len(X_train) # size of training set

# parameter matrix. rows represent individual classes. columns represent features. given cell represents specified feature value for given class. ??? I guess that it can be randomly initialized at start. Batch Gradient will do its work ???
theta = np.random.rand(3, 2)
class_count = len(theta)

# input_vectors matrix. rows represent individual train entity. columns represent features. given cell represents specified feature value for given train entity.
input_vectors = X_train
# NUMBER OF COLUMNS IN theta IS EQUAL TO NUMBER OF COLUMNS IN input_vectors

# parameter matrix. rows represent individual train entity. columns represent class. given cell represents specified class probability for given train entity.
target_probabilities = np.zeros((m, class_count))
for i in range(m):
  target_probabilities[i][y_train[i]] = 1

for iteration in range(n_iterations):
  gradients = np.array(cross_entropy_gradient(theta, input_vectors, class_count, target_probabilities))
  theta = theta - eta * gradients

# Lets test the accuracy on train set
y_pred = np.zeros(y_train.shape)

for i in range(m):
  softmax_scores = softmax(input_vectors[i], theta)
  y_pred[i] = np.argmax(softmax_scores)

print(accuracy_score(y_train, y_pred))



0.6416666666666667


Doesn't seem to learn so well. Will inspect it later