<a href="https://colab.research.google.com/github/JovanBosic/Neural-nets-vs-SUDOKU-project/blob/main/End_to_end_Sudoku_solver.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# End-to-end Sudoku solving using using neural nets 🧩

This notebook builds an end-to-end neural network using only NumPy, in order to solve Sudoku puzzle

Sudoku is a popular number puzzle that requires you to fill blanks in a 9X9 grid with digits so that each column, each row, and each of the nine 3×3 subgrids contains all of the digits from 1 to 9.

## 1. Problem

Find missing numbers and their places in unsolved sudoku.

## 2. Data

The data we are using is from Kaggle:

https://www.kaggle.com/bryanpark/sudoku

There is only one sv file, containing 1 000 000 unsolved and solved sudoku games.

## 3. Evaluation

If we can reach 85% accuracy at predicting numbers, we will pursue the project.

## 4. Features

A Sudoku puzzle is represented as a 9x9 Python numpy array. The blanks were replaced with 0's. 

There are to columns in the data:
* quizzes
* solutions


### Get our workspace ready

* Import NumPy ✅
* Import Pandas✅
* Import Scikit-Learn ✅


In [2]:
# Import necessary tools
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

### Load our Data

In [3]:
# Import data set
path = "drive/MyDrive/Colab/sudoku.csv"
df = pd.read_csv(path)
df.head(), df.shape # Rows, Columns

(                                             quizzes                                          solutions
 0  0043002090050090010700600430060020871900074000...  8643712593258497619712658434361925871986574322...
 1  0401000501070039605200080000000000170009068008...  3461792581875239645296483719658324174729168358...
 2  6001203840084590720000060050002640300700800069...  6951273841384596727248369158512647392739815469...
 3  4972000001004000050000160986203000403009000000...  4972583161864397252537164986293815473759641828...
 4  0059103080094030600275001000300002010008200070...  4659123781894735623275681497386452919548216372...,
 (1000000, 2))

In [4]:
# Are there any missing values?
df.isna().sum()

quizzes      0
solutions    0
dtype: int64

In [5]:
X = df["quizzes"]
y = df["solutions"]

In [6]:
# Let's check the data
print(X[:10])
print(X.shape)
print(X.dtype)

0    0043002090050090010700600430060020871900074000...
1    0401000501070039605200080000000000170009068008...
2    6001203840084590720000060050002640300700800069...
3    4972000001004000050000160986203000403009000000...
4    0059103080094030600275001000300002010008200070...
5    1000050073809000006000004808200010750407600200...
6    0090654300070008006001080200030900025014039608...
7    0000006577024001003500060005000200092103005000...
8    5030701900000067500471906004000380009502003000...
9    0607209080840030017001000659000080000710600000...
Name: quizzes, dtype: object
(1000000,)
object


### Changing data shape and type

Since we need every number separately, we have to:
* Cast objects to numbers ✅
* `X.shape` should be (1000000, 81) ✅

In [7]:
def get_data_in_right_form(X, y): 
    '''
    This function reshapes features and labels into matrix 1 000 000, 81, 
    and scale feature values between -0.5 and 0.5
    '''
    # Lists where wi will store new reshaped features and labels
    X_new = []
    y_new = []

    for i in X:
        # Reshaping our features into shape 1 000 000, 81
        x = np.array([int(j) for j in i])
        X_new.append(x)

    X_new = np.array(X_new)
    # Convert values of quizzes from 0-9 to -0.5-0.5 for better perfomances during training
    X_new = X_new/9
    X_new -= .5    
    
    for i in y:
        # Reshaping our labels into shape 1 000 000, 81
        x = np.array([int(j) for j in i])
        y_new.append(x)   
    
    y_new = np.array(y_new)    
    # Spliting data into train and test sets
    x_train, x_test, y_train, y_test = train_test_split(X_new, y_new, test_size=0.35, random_state=42)
    
    return x_train, x_test, y_train, y_test

In [8]:
X, X_test, y, y_test = get_data_in_right_form(X, y)

In [9]:
print(X.shape)
print(X_test.shape)
print(y.shape)
print(y_test.shape)

(650000, 81)
(350000, 81)
(650000, 81)
(350000, 81)


### Making our labels easy to use

Now we have to change our labels. We have to get 81 numbers for each position in the sudoku game, not just one, and we have a total of 9 classes for each number because a number can fall in a range of 1 to 9.

To comply with this design, our network should output 81x9=729 numbers.

In order to compare our `predicted labels`, and `true labels` we have to change our current labels shape. Since our current true  labels are in shape 1 000 000, 81 we will have to change this somehow to 1 000 000, 729. My idea was to every number in current label represent as 0 and 1 where the lenght will be 9. 

**For example:** 
* 1 == 100000000
* 5 == 000010000...

This wat we can easily compare `predicted labels` and `true labels`.


In [10]:
# Lets create a function from transforming our labels
def label_transformer(y):
    '''
    This function reshapes labes from 1 000 000,81 into 1 000 000, 729 matix 
    by representing every number with 0 and 1
    '''
    zeroes = [0, 0, 0, 0, 0, 0, 0, 0, 0]
    h = []
    y_transform = []
    for i in y:
        for j in i:
            zeroes[j-1] = 1
            h += zeroes
            zeroes = [0, 0, 0, 0, 0, 0, 0, 0, 0]
        y_transform.append(h)
        h = []

    y_transform = np.array(y_transform)
    return y_transform

In [11]:
y_transform = label_transformer(y)

In [12]:
print(y[1])
print(y_transform[1])
print(y_transform.shape)

[1 6 7 3 8 2 4 5 9 4 2 8 5 9 6 7 3 1 3 9 5 4 1 7 8 6 2 5 4 9 7 6 8 1 2 3 7
 1 2 9 5 3 6 4 8 8 3 6 1 2 4 5 9 7 6 7 4 2 3 1 9 8 5 9 8 3 6 7 5 2 1 4 2 5
 1 8 4 9 3 7 6]
[1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 0 0 0 0 

In [13]:
y_test_transform = label_transformer(y_test)

In [14]:
print(y_test_transform[1])

[0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0
 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 0 0 1 0 

Now we have got our data into training and test sets, it is time to build a machine learning model.

We will train it (find the patterns) on the training set.

And we will test it (use the patterns) on the test set.

## 5. Building artificial neural network model

### Modelling

#### Dense Layer

In [57]:
# Dense layer
class Layer_Dense:

    # Layer initialization
    def __init__(self, n_inputs, n_neurons,
                 weight_regularizer_l1=0, weight_regularizer_l2=0,
                 bias_regularizer_l1=0, bias_regularizer_l2=0):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        # Set regularization strength
        self.weight_regularizer_l1 = weight_regularizer_l1
        self.weight_regularizer_l2 = weight_regularizer_l2
        self.bias_regularizer_l1 = bias_regularizer_l1
        self.bias_regularizer_l2 = bias_regularizer_l2

    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases

    # Backward pass
    def backward(self, dvalues):
        # Gradients on parameters
        self.dweights = np.dot(self.inputs.T, dvalues)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)


        # Gradients on regularization
        # L1 on weights
        if self.weight_regularizer_l1 > 0:
            dL1 = np.ones_like(self.weights)
            dL1[self.weights < 0] = -1
            self.dweights += self.weight_regularizer_l1 * dL1
        # L2 on weights
        if self.weight_regularizer_l2 > 0:
            self.dweights += 2 * self.weight_regularizer_l2 * self.weights
        # L1 on biases
        if self.bias_regularizer_l1 > 0:
            dL1 = np.ones_like(self.biases)
            dL1[self.biases < 0] = -1
            self.dbiases += self.bias_regularizer_l1 * dL1
        # L2 on biases
        if self.bias_regularizer_l2 > 0:
            self.dbiases += 2 * self.bias_regularizer_l2 * self.biases

        # Gradient on values
        self.dinputs = np.dot(dvalues, self.weights.T)

#### Dropout Layer

In [58]:
# Dropout
class Layer_Dropout:

    # Init
    def __init__(self, rate):
        # Store rate, we invert it as for example for dropout
        # of 0.1 we need success rate of 0.9
        self.rate = 1 - rate

    # Forward pass
    def forward(self, inputs, training):
        # Save input values
        self.inputs = inputs


        # If not in the training mode - return values
        if not training:
            self.output = inputs.copy()
            return

        # Generate and save scaled mask
        self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate
        # Apply mask to output values
        self.output = inputs * self.binary_mask

    # Backward pass
    def backward(self, dvalues):
        # Gradient on values
        self.dinputs = dvalues * self.binary_mask

#### Input Layer

In [59]:
# Input "layer"
class Layer_Input:

    # Forward pass
    def forward(self, inputs, training):
        self.output = inputs

#### ReLU Activation Function

In [60]:
# ReLU activation
class Activation_ReLU:

    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Calculate output values from inputs
        self.output = np.maximum(0, inputs)

    # Backward pass
    def backward(self, dvalues):
        # Since we need to modify original variable,
        # let's make a copy of values first
        self.dinputs = dvalues.copy()

        # Zero gradient where input values were negative
        self.dinputs[self.inputs <= 0] = 0

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs

#### Softmax Activation Function

In [61]:
# Softmax activation
def calculating_probabilities_for_softmax(inputs):
    '''
    This function iterates through every sample in batch, and sum exp values
    of every 9 numbers in sample, after that it perform normalization where we devide 
    every 9 numbers with the sum of corresponding 9 numbers from group where the numbers 
    belong. Output is in a shape batch_lenght * 729 and goes further to the loss funtion.
    '''
    row, columns = inputs.shape
    probabilities = np.array([])
    counter = 0 

    # Iterate through batch
    while counter < row:
        sample = inputs[counter, :]
        sample = sample.reshape(-1, 9)

        max = np.max(sample, axis=1,keepdims=True) #returns max of each row and keeps same dims
        e_x = np.exp(sample-max) #subtracts each row with its max value
        sum = np.sum(e_x, axis=1, keepdims=True) #returns sum of each row and keeps same dims
        f_x = e_x / sum 
        sample_probabilities = f_x.reshape(1, -1)

        counter += 1
        probabilities = np.append(probabilities, sample_probabilities)
    probabilities = probabilities.reshape(row, columns)  
    return probabilities


class Activation_Softmax:

    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs

        # Get probabilities
        probabilities = calculating_probabilities_for_softmax(inputs) 
        self.output = probabilities

    # Backward pass
    def backward(self, dvalues):

        # Create uninitialized array
        self.dinputs = np.empty_like(dvalues)

        # Enumerate outputs and gradients
        for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
            # Flatten output array
            single_output = single_output.reshape(-1, 1)
            # Calculate Jacobian matrix of the output
            jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
            # Calculate sample-wise gradient
            # and add it to the array of sample gradients
            self.dinputs[index] = np.dot(jacobian_matrix,
                                         single_dvalues)

    # Calculate predictions for outputs
    def predictions(self, outputs):
        '''
        This function returns indexes of highest values for every group of numbers. 
        Output is in shape batch_lenght * 81 (for every 9 outputs one index)
        '''
        row, columns = outputs.shape
        i = 0
        counter = 0
        argmax_array = np.array([])
        sample_argmaxes = np.array([])

        # Iterate through batches
        while counter < row:
            # Iterate through sample 
            for i in range(81):
                argmax_for_groups_of_9 = np.argmax(outputs[counter, i*9:i*9+9]) + (i*9) 
                sample_argmaxes = np.append(sample_argmaxes,argmax_for_groups_of_9)

            counter += 1
            argmax_array = np.append(argmax_array, sample_argmaxes, axis=0)
            sample_argmaxes = np.array([])
        argmax_array = argmax_array.reshape(row,81)

        return argmax_array


#### Adam Optimizer

In [62]:
# Adam optimizer
class Optimizer_Adam:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7,
                 beta_1=0.9, beta_2=0.999):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        self.beta_1 = beta_1
        self.beta_2 = beta_2

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_momentums = np.zeros_like(layer.weights)
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_momentums = np.zeros_like(layer.biases)
            layer.bias_cache = np.zeros_like(layer.biases)


        # Update momentum  with current gradients
        layer.weight_momentums = self.beta_1 * layer.weight_momentums + (1 - self.beta_1) * layer.dweights
        layer.bias_momentums = self.beta_1 * layer.bias_momentums + (1 - self.beta_1) * layer.dbiases
        # Get corrected momentum
        # self.iteration is 0 at first pass
        # and we need to start with 1 here
        weight_momentums_corrected = layer.weight_momentums / (1 - self.beta_1 ** (self.iterations + 1))
        bias_momentums_corrected = layer.bias_momentums / (1 - self.beta_1 ** (self.iterations + 1))
        # Update cache with squared current gradients
        layer.weight_cache = self.beta_2 * layer.weight_cache + (1 - self.beta_2) * layer.dweights**2
        layer.bias_cache = self.beta_2 * layer.bias_cache + (1 - self.beta_2) * layer.dbiases**2
        # Get corrected cache
        weight_cache_corrected = layer.weight_cache / (1 - self.beta_2 ** (self.iterations + 1))
        bias_cache_corrected = layer.bias_cache / (1 - self.beta_2 ** (self.iterations + 1))

        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * weight_momentums_corrected / (np.sqrt(weight_cache_corrected) + self.epsilon)
        layer.biases += -self.current_learning_rate * bias_momentums_corrected / (np.sqrt(bias_cache_corrected) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1

#### Loss

In [None]:
# Common loss class
class Loss:

    # Regularization loss calculation
    def regularization_loss(self):

        # 0 by default
        regularization_loss = 0

        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:

            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))

            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)

            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))

            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)

        return regularization_loss

    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers


    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y, *, include_regularization=False):

        # Calculate sample losses
        sample_losses = self.forward(output, y)

        # Calculate mean loss
        data_loss = np.mean(sample_losses)

        # Add accumulated sum of losses and sample count
        self.accumulated_sum += np.sum(sample_losses)
        self.accumulated_count += len(sample_losses)

        # If just data loss - return it
        if not include_regularization:
            return data_loss

        # Return the data and regularization losses
        return data_loss, self.regularization_loss()

    # Calculates accumulated loss
    def calculate_accumulated(self, *, include_regularization=False):

        # Calculate mean loss
        data_loss = self.accumulated_sum / self.accumulated_count

        # If just data loss - return it
        if not include_regularization:
            return data_loss

        # Return the data and regularization losses
        return data_loss, self.regularization_loss()

    # Reset variables for accumulated loss
    def new_pass(self):
        self.accumulated_sum = 0
        self.accumulated_count = 0


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    def forward(self, y_pred, y_true):
        """
        This function calculates loss for every place in sudoku separately and sum all losses at end
        """
        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        row,columns = y_pred_clipped.shape
        counter=0
        negative_log_likelihoods = np.array([])
        sample_loss_sum = 0

        # Iterate through batch
        while counter < row:
            # Iterate through sample
            for i in range(columns):
                if y_true[counter,i] == 1:
                    temp_loss = -y_true[counter,i] * np.log(y_pred_clipped[counter,i])
                    sample_loss_sum = sample_loss_sum + temp_loss

            counter += 1
            negative_log_likelihoods = np.append(negative_log_likelihoods,sample_loss_sum)
            sample_loss_sum = 0
            #negative_log_likelihoods = negative_log_likelihoods / 81

        return negative_log_likelihoods


    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)
        # Number of labels in every sample
        # We'll use the first sample to count them
        labels = len(dvalues[0])

        # If labels are sparse, turn them into one-hot vector
        if len(y_true.shape) == 1:
            y_true = np.eye(labels)[y_true]

        # Calculate gradient
        self.dinputs = -y_true / dvalues
        # Normalize gradient
        self.dinputs = self.dinputs / samples



# Softmax classifier - combined Softmax activation
# and cross-entropy loss for faster backward step
class Activation_Softmax_Loss_CategoricalCrossentropy():

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)

        # If labels are one-hot encoded,
        # turn them into discrete values
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)

        # Copy so we can safely modify
        self.dinputs = dvalues.copy()
        # Calculate gradient
        self.dinputs[range(samples), y_true] -= 1
        # Normalize gradient
        self.dinputs = self.dinputs / samples


#### Accuracy

In [64]:
# Common accuracy class
class Accuracy:

    # Calculates an accuracy
    # given predictions and ground truth values
    def calculate(self, predictions, y):

        # Get comparison results
        comparisons = self.compare(predictions, y)

        # Calculate an accuracy
        accuracy = np.mean(comparisons)
        # print(accuracy, np.mean(comparisons))
        

        # Add accumulated sum of matching values and sample count
        self.accumulated_sum += np.sum(comparisons)
        self.accumulated_count += len(comparisons)

        # Return accuracy
        return accuracy

    # Calculates accumulated accuracy
    def calculate_accumulated(self):

        # Calculate an accuracy
        accuracy = self.accumulated_sum / self.accumulated_count

        # Return the data and regularization losses
        return accuracy

    # Reset variables for accumulated accuracy
    def new_pass(self):
        self.accumulated_sum = 0
        self.accumulated_count = 0



# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):

    def __init__(self, *, binary=False):
        # Binary mode?
        self.binary = binary

    # No initialization is needed
    def init(self, y):
        pass

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        """
        This function compares predictions and labels. Since our predictions are in shape
        batch size * 81 we need to change our labels. Labels will have size batch size * 81 after
        we take the indexes of ones in labels. 
        """
        if not self.binary and len(y.shape) == 2:
            row, columns = y.shape
            i = 0
            counter = 0
            y_temp = np.array([])
            temp_indexes = np.array([])

            # Iterate through batches
            while counter < row:
                # Iterate through samples
                for i in range(81):
                    max_value_index = np.argmax(y[counter,i*9:i*9+9])+(9*i) 
                    temp_indexes = np.append(temp_indexes,max_value_index)
                counter += 1
                y_temp = np.append(y_temp, temp_indexes, axis=0)
                temp_indexes = np.array([])
            y_temp = y_temp.reshape(row,81)
            
        return y_temp==predictions

#### Model

In [65]:
# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []
        # Softmax classifier's output object
        self.softmax_classifier_output = None

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)


    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy

    # Finalize the model
    def finalize(self):

        # Create and set the input layer
        self.input_layer = Layer_Input()

        # Count all the objects
        layer_count = len(self.layers)

        # Initialize a list containing trainable layers:
        self.trainable_layers = []

        # Iterate the objects
        for i in range(layer_count):

            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]

            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]

            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]


            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])

        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(
            self.trainable_layers
        )

        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = Activation_Softmax_Loss_CategoricalCrossentropy()

    # Train the model
    def train(self, X, y, *, epochs=1, batch_size=None,
              print_every=1, validation_data=None):

        # Initialize accuracy object
        self.accuracy.init(y)

        # Default value if batch size is not being set
        train_steps = 1

        # If there is validation data passed,
        # set default number of steps for validation as well
        if validation_data is not None:
            validation_steps = 1

            # For better readability
            X_val, y_val = validation_data

        # Calculate number of steps
        if batch_size is not None:
            train_steps = len(X) // batch_size
            # Dividing rounds down. If there are some remaining
            # data but not a full batch, this won't include it
            # Add `1` to include this not full batch
            if train_steps * batch_size < len(X):
                train_steps += 1

            if validation_data is not None:
                validation_steps = len(X_val) // batch_size

                # Dividing rounds down. If there are some remaining
                # data but nor full batch, this won't include it
                # Add `1` to include this not full batch
                if validation_steps * batch_size < len(X_val):
                    validation_steps += 1


        # Main training loop
        for epoch in range(1, epochs+1):

            # Print epoch number
            print(f'epoch: {epoch}')

            # Reset accumulated values in loss and accuracy objects
            self.loss.new_pass()
            self.accuracy.new_pass()

            # Iterate over steps
            for step in range(train_steps):

                # If batch size is not set -
                # train using one step and full dataset
                if batch_size is None:
                    batch_X = X
                    batch_y = y

                # Otherwise slice a batch
                else:
                    batch_X = X[step*batch_size:(step+1)*batch_size]
                    batch_y = y[step*batch_size:(step+1)*batch_size]

                # Perform the forward pass
                output = self.forward(batch_X, training=True)

                # Calculate loss
                data_loss, regularization_loss = self.loss.calculate(output, batch_y, include_regularization=True)
                loss = data_loss + regularization_loss

                # Get predictions and calculate an accuracy
                predictions = self.output_layer_activation.predictions(output)
                accuracy = self.accuracy.calculate(predictions,
                                                   batch_y)

                # Perform backward pass
                self.backward(output, batch_y)

                # Optimize (update parameters)
                self.optimizer.pre_update_params()
                for layer in self.trainable_layers:
                    self.optimizer.update_params(layer)
                self.optimizer.post_update_params()


                # Print a summary
                if not step % print_every or step == train_steps - 1:
                    print(f'step: {step}, ' +
                          f'acc: {accuracy:.3f}, ' +
                          f'loss: {loss:.3f} (' +
                          f'data_loss: {data_loss:.3f}, ' +
                          f'reg_loss: {regularization_loss:.3f}), ' +
                          f'lr: {self.optimizer.current_learning_rate}')

            # Get and print epoch loss and accuracy
            epoch_data_loss, epoch_regularization_loss = self.loss.calculate_accumulated(include_regularization=True)
            epoch_loss = epoch_data_loss + epoch_regularization_loss
            epoch_accuracy = self.accuracy.calculate_accumulated()

            print(f'training, ' +
                  f'acc: {epoch_accuracy:.3f}, ' +
                  f'loss: {epoch_loss:.3f} (' +
                  f'data_loss: {epoch_data_loss:.3f}, ' +
                  f'reg_loss: {epoch_regularization_loss:.3f}), ' +
                  f'lr: {self.optimizer.current_learning_rate}')

            # If there is the validation data
            if validation_data is not None:

                # Reset accumulated values in loss
                # and accuracy objects
                self.loss.new_pass()
                self.accuracy.new_pass()

                # Iterate over steps
                for step in range(validation_steps):

                    # If batch size is not set -
                    # train using one step and full dataset
                    if batch_size is None:
                        batch_X = X_val
                        batch_y = y_val


                    # Otherwise slice a batch
                    else:
                        batch_X = X_val[
                            step*batch_size:(step+1)*batch_size
                        ]
                        batch_y = y_val[
                            step*batch_size:(step+1)*batch_size
                        ]

                    # Perform the forward pass
                    output = self.forward(batch_X, training=False)

                    # Calculate the loss
                    self.loss.calculate(output, batch_y)

                    # Get predictions and calculate an accuracy
                    predictions = self.output_layer_activation.predictions(output)
                    self.accuracy.calculate(predictions, batch_y)

                # Get and print validation loss and accuracy
                validation_loss = self.loss.calculate_accumulated()
                validation_accuracy = self.accuracy.calculate_accumulated()

                # Print a summary
                print(f'validation, ' +
                      f'acc: {validation_accuracy:.3f}, ' +
                      f'loss: {validation_loss:.3f}')

    # Performs forward pass
    def forward(self, X, training):

        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)

        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)

        # "layer" is now the last object from the list,
        # return its output
        return layer.output


    # Performs backward pass
    def backward(self, output, y):

        # If softmax classifier
        if self.softmax_classifier_output is not None:
            # First call backward method
            # on the combined activation/loss
            # this will set dinputs property
            self.softmax_classifier_output.backward(output, y)

            # Since we'll not call backward method of the last layer
            # which is Softmax activation
            # as we used combined activation/loss
            # object, let's set dinputs in this object
            self.layers[-1].dinputs = self.softmax_classifier_output.dinputs

            # Call backward method going through
            # all the objects but last
            # in reversed order passing dinputs as a parameter
            for layer in reversed(self.layers[:-1]):
                layer.backward(layer.next.dinputs)

            return

        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)

        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs)

    




### Training

In [None]:
model = Model()


# Add layers
model.add(Layer_Dense(X.shape[1], 30))
model.add(Activation_ReLU())
model.add(Layer_Dense(30, 100))
model.add(Activation_ReLU())
model.add(Layer_Dense(100, 729))
model.add(Activation_Softmax())

# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimizer=Optimizer_Adam(decay=1e-3),
    accuracy=Accuracy_Categorical()
)

# Finalize the model
model.finalize()

# Train the model
model.train(X, y_transform, validation_data=(X_test, y_test_transform),epochs=2, batch_size=64, print_every=10)

epoch: 1
step: 0, acc: 0.111, loss: 177.975 (data_loss: 177.975, reg_loss: 0.000), lr: 0.001
step: 10, acc: 0.114, loss: 177.980 (data_loss: 177.980, reg_loss: 0.000), lr: 0.0009900990099009901
step: 20, acc: 0.112, loss: 178.017 (data_loss: 178.017, reg_loss: 0.000), lr: 0.000980392156862745
step: 30, acc: 0.105, loss: 178.101 (data_loss: 178.101, reg_loss: 0.000), lr: 0.0009708737864077671
step: 40, acc: 0.102, loss: 178.341 (data_loss: 178.341, reg_loss: 0.000), lr: 0.0009615384615384615
step: 50, acc: 0.116, loss: 178.566 (data_loss: 178.566, reg_loss: 0.000), lr: 0.0009523809523809524
step: 60, acc: 0.108, loss: 179.861 (data_loss: 179.861, reg_loss: 0.000), lr: 0.0009433962264150942
step: 70, acc: 0.112, loss: 179.912 (data_loss: 179.912, reg_loss: 0.000), lr: 0.0009345794392523365
step: 80, acc: 0.113, loss: 179.628 (data_loss: 179.628, reg_loss: 0.000), lr: 0.0009259259259259259
step: 90, acc: 0.106, loss: 179.825 (data_loss: 179.825, reg_loss: 0.000), lr: 0.0009174311926605504