The `hid_size` parameter controls the number of neurons in the hidden layer of the neural network.
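
As a rough illustration of what `hid_size` changes, the sketch below builds the two weight matrices of a network with a single hidden layer and pushes a batch through them. The sizes (`n_features`, `n_classes`, the batch of 5 samples) are made-up values chosen only for the example.

```python
import numpy as np

# hypothetical sizes, chosen only for illustration
n_features, hid_size, n_classes = 4, 8, 3
rng = np.random.default_rng(0)

# one weight matrix per pair of adjacent layers, with values in [-0.1, 0.1)
coeffs_in_to_hid = 0.2 * rng.random((n_features, hid_size)) - 0.1
coeffs_hid_to_out = 0.2 * rng.random((hid_size, n_classes)) - 0.1

samples = rng.random((5, n_features))      # a batch of 5 samples
hidden = samples.dot(coeffs_in_to_hid)     # one row of hid_size values per sample
outputs = hidden.dot(coeffs_hid_to_out)    # one row of n_classes values per sample
print(hidden.shape, outputs.shape)
```

A larger `hid_size` widens both matrices along the hidden dimension, so the number of weights grows linearly with it.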

The value of `epochs` is the number of times the neural network iterates over the entire training set to adjust its weights.

The parameter `learning_rate` controls the size of each weight update, and therefore the speed at which the neural network converges: generally, the lower the value of `learning_rate`, the more finely tuned the weights used by the neural network to make predictions become; however, lowering this value might require increasing the number of iterations over the training data to obtain good metrics.
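
The trade-off between step size and number of iterations can be seen on a toy problem. The sketch below is not the network itself: it minimizes the hypothetical function $f(w) = (w - 3)^2$ with plain gradient descent, and `gradient_descent` is a helper written only for this example.

```python
def gradient_descent(learning_rate, steps, start=0.0):
    """Minimize f(w) = (w - 3) ** 2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)        # derivative of f at the current w
        w -= learning_rate * gradient
    return w

fast = gradient_descent(learning_rate=0.1, steps=20)
slow = gradient_descent(learning_rate=0.01, steps=20)
slow_more_steps = gradient_descent(learning_rate=0.01, steps=500)

# distance from the minimum at w = 3 for each configuration
print(abs(fast - 3), abs(slow - 3), abs(slow_more_steps - 3))
```

With the same number of steps the smaller learning rate ends up farther from the minimum, and it needs many more steps to catch up.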

The `batch_size` parameter determines how many samples are processed before each weight update, and therefore how many times the weights in the neural network are updated. \
Let's consider an example: if the training set contains $2000$ samples, the number of epochs is $10$ and no batching is used, the neural network will make predictions for all $2000$ samples in one go and thus update its weights exactly $10$ times, once per epoch. \
If a batch size of $100$ is used instead, during each epoch the neural network will make predictions for the first $100$ samples in the training set and adjust its weights, then repeat for the next $100$ samples, and so on, for a total of $\frac{2000}{100} = 20$ batches per epoch; this means that the neural network will update its weights $20 \cdot 10 = 200$ times instead of just $10$, which generally helps it learn more from the same number of epochs.
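
The arithmetic above can be checked with a few lines of code. `count_weight_updates` is a hypothetical helper, not part of the classes defined later, and it assumes that an incomplete last batch still triggers an update.

```python
import math

def count_weight_updates(n_samples, epochs, batch_size=None):
    """Number of weight updates, assuming one update per batch."""
    if batch_size is None:
        batch_size = n_samples                      # whole training set in one go
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_per_epoch * epochs

print(count_weight_updates(2000, epochs=10))                  # 10
print(count_weight_updates(2000, epochs=10, batch_size=100))  # 200
```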

The parameter `skip_remaining` takes a boolean value; if `True`, the last batch will be skipped when it contains fewer samples than `batch_size`. \
For example, if the number of samples is $1003$ and the value of `batch_size` is $10$, then there will be $101$ batches, with the last one containing just $3$ elements instead of $10$; if `skip_remaining` is set to `True`, this last batch will be ignored.
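
The following sketch reproduces this example by listing the size of every batch. `batch_sizes` is a hypothetical helper that only mimics the slicing pattern; it does no training.

```python
def batch_sizes(n_samples, batch_size, skip_remaining):
    """Sizes of the batches obtained by slicing n_samples in steps of batch_size."""
    sizes = []
    for start in range(0, n_samples, batch_size):
        size = min(batch_size, n_samples - start)
        if size < batch_size and skip_remaining:
            continue                  # incomplete last batch is ignored
        sizes.append(size)
    return sizes

kept = batch_sizes(1003, 10, skip_remaining=False)
skipped = batch_sizes(1003, 10, skip_remaining=True)
print(len(kept), kept[-1])        # 101 3
print(len(skipped), skipped[-1])  # 100 10
```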

The `hid_activation` and `out_activation` parameters are the activation functions applied, respectively, to the values coming out of the hidden layer and of the output layer; each function affects how the network performs and adjusts its weights in a different way. \
The most common activation function is *ReLU*, usually applied to hidden layers, which sets negative values to zero and leaves positive values unchanged.
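
A minimal sketch of ReLU and of its derivative is shown below; the function names `relu` and `relu_derivative` and the sample values are chosen only for this example.

```python
import numpy as np

def relu(x):
    """Replace negative values with 0 and keep positive values as they are."""
    return np.maximum(x, 0)

def relu_derivative(activated):
    """Slope of ReLU, computed from its output: 1 where positive, 0 elsewhere."""
    return (activated > 0).astype(float)

hidden = np.array([[-2.0, 0.5, 3.0]])
activated = relu(hidden)
print(activated)                    # the negative entry becomes 0
print(relu_derivative(activated))   # weights feeding zeroed neurons get no update
```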

The parameter `dropout` accepts a value within the range $(0, 1]$; dropout is a form of regularization that, for each batch, disables a different random subset of neurons in the hidden layer, which forces the neural network to train on a different subset of neurons at every weight update. \
In this implementation the value is the fraction of hidden neurons that is kept, so the default of $1$ turns dropout off: for example, if the number of hidden neurons is $100$ and `dropout` is set to $0.7$, then about $30$ neurons will be disabled for each sample.
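
The sketch below shows one common way to apply dropout, usually called inverted dropout, on a layer of $100$ hidden neurons with about $30\%$ of them disabled. The names `drop_rate` and `keep_prob` are illustrative and the all-ones activations are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
hid_size = 100
drop_rate = 0.3              # fraction of neurons to disable
keep_prob = 1 - drop_rate    # fraction of neurons that stay active

hidden = np.ones((1000, hid_size))            # fake activations for 1000 samples
mask = rng.random(hidden.shape) < keep_prob   # True where a neuron is kept
# rescale the surviving activations so that their expected sum is unchanged
dropped = hidden * mask / keep_prob

print(1 - mask.mean())   # fraction of disabled neurons, close to 0.3
print(dropped.mean())    # close to the original mean of 1.0
```

Rescaling by `1 / keep_prob` during training means that no correction is needed at prediction time, when all neurons are active.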

The `input_as_boolean` parameter takes a boolean value; if `True`, every non-zero value in the input data will be turned into a $1$. \
This might prove useful when working on count data, though not so much when dealing with continuous data.
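
As a small illustration, the sketch below converts a made-up matrix of counts to booleans and multiplies it by a made-up weight column; only the presence of a feature, not its count, contributes to the result.

```python
import numpy as np

counts = np.array([[0, 3, 1],
                   [2, 0, 0]])       # hypothetical count data
as_boolean = counts.astype(bool)     # every non-zero value becomes True
print(as_boolean.astype(int))

weights = np.array([[0.1], [0.2], [0.3]])
# True behaves as 1 and False as 0 in the dot product
print(as_boolean.dot(weights))
```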

If the `is_generator` parameter is set to `True`, the model will yield at the end of each epoch, so that it can be evaluated after every epoch.
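
The pattern can be sketched with a toy class. `ToyModel` and `fit_epochs` are hypothetical names used only here, and the "training" is reduced to counting epochs.

```python
class ToyModel:
    """Hypothetical stand-in for the network: 'training' just counts epochs."""

    def __init__(self, epochs):
        self.epochs = epochs
        self.epochs_done = 0

    def fit_epochs(self):
        for _ in range(self.epochs):
            self.epochs_done += 1   # one pass over the training set would go here
            yield self              # hand control back to the caller after every epoch

history = []
for model in ToyModel(epochs=3).fit_epochs():
    # the caller can evaluate the partially trained model at this point
    history.append(model.epochs_done)
print(history)  # [1, 2, 3]
```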

The parameter `random_seed` allows for reproducible results; if set to `None`, each instance of the neural network will behave differently due to randomness in the initialization of its weights.
If the model is robust enough, the results should not differ significantly between multiple executions.
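
The effect of the seed on weight initialization can be sketched as follows; `init_weights` is a hypothetical helper that draws values between $-0.1$ and $0.1$ with NumPy's `default_rng`.

```python
import numpy as np

def init_weights(shape, random_seed=None):
    """Draw weights uniformly from [-0.1, 0.1) using a seeded generator."""
    rng = np.random.default_rng(seed=random_seed)
    return 0.2 * rng.random(shape) - 0.1

a = init_weights((4, 3), random_seed=42)
b = init_weights((4, 3), random_seed=42)
c = init_weights((4, 3), random_seed=7)

print(np.array_equal(a, b))   # True: same seed, same weights
print(np.array_equal(a, c))   # False: different seed, different weights
```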

In [None]:
import numpy as np
from copy import copy
from functools import wraps
from enum import Enum

In [None]:
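def conditional_generator(func):
    """Run the generator `func` to completion unless `self.is_generator` is set.

    When `self.is_generator` is `True` the generator is returned as is;
    otherwise it is exhausted and its last yielded value is returned.
    """
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        if self.is_generator:
            return func(self, *args, **kwargs)
        values = None  # returned unchanged if the generator yields nothing (e.g. epochs=0)
        for values in func(self, *args, **kwargs):
            pass
        return values
    return wrapper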

In [None]:
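class Activation(Enum):
    # each member is a pair (function, derivative); the derivative is written
    # in terms of the activation's output, not of its input
    RELU = lambda x: (x > 0) * x, \
           lambda x: x > 0

    SIGMOID = lambda x: 1 / (1 + np.exp(-x)), \
              lambda x: x * (1 - x)

    TANH = lambda x: np.tanh(x), \
           lambda x: 1 - (x ** 2)

    # subtracting the row-wise maximum avoids overflow in np.exp without changing the result
    SOFTMAX = lambda x: (ex := np.exp(x - np.max(x, axis=1, keepdims=True))) / np.sum(ex, axis=1, keepdims=True), \
              lambda x: (_ for _ in ()).throw(Exception("activation function softmax only works on output"))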

In [None]:
class BasicNeuralNetwork:
    """Feed-forward network with one hidden layer, trained on dense NumPy arrays."""

    def __init__(self, hid_size, epochs=3, learning_rate=1e-3, batch_size=32, skip_remaining=True,
                 hid_activation=None, out_activation=None, dropout=1,
                 input_as_boolean=False, is_generator=False, random_seed=None):
        self.hid_size = hid_size
        self.epochs = epochs
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        self.skip_remaining = skip_remaining
        self.hid_activation = hid_activation
        self.out_activation = out_activation
        self.dropout = dropout
        self.input_as_boolean = input_as_boolean
        self.is_generator = is_generator
        self.random_seed = random_seed

        if self.hid_activation is not None:
            self.__hid_activation_fun, self.__hid_activation_deriv = self.hid_activation.value
        if self.out_activation is not None:
            self.__out_activation_fun, _ = self.out_activation.value

        self.__rng = np.random.default_rng(seed=self.random_seed)

    @property
    def input_to_hidden_coeffs(self):
        return copy(self._coeffs_in_to_hid)

    @property
    def hidden_to_output_coeffs(self):
        return copy(self._coeffs_hid_to_out)

    @conditional_generator
    def fit(self, train_samples, train_labels):
        assert(len(train_samples) == len(train_labels))
        # initialize coefficients with values between -0.1 and 0.1
        self._coeffs_in_to_hid = 0.2 * self.__rng.random((len(train_samples.T), self.hid_size)) - 0.1
        self._coeffs_hid_to_out = 0.2 * self.__rng.random((self.hid_size, len(train_labels.T))) - 0.1
        if self.input_as_boolean:
            train_samples = np.array(train_samples).astype(bool)
        for _ in range(self.epochs):
            for i in range(0, len(train_samples), self.batch_size):
                samples = train_samples[i:i + self.batch_size]
                labels = train_labels[i:i + self.batch_size]
                batch_size = len(samples)
                if batch_size < self.batch_size and self.skip_remaining:
                    continue

                # keep each hidden neuron with probability `self.dropout` (1 keeps all of them)
                dropout_mask = self.__rng.choice((0, 1), size=(batch_size, self.hid_size),
                                                 p=(1 - self.dropout, self.dropout))

                layers_in = samples
                layers_hid = layers_in.dot(self._coeffs_in_to_hid)
                if self.hid_activation is not None:
                    layers_hid = self.__hid_activation_fun(layers_hid)
                layers_hid *= dropout_mask * (1 / self.dropout)
                layers_out = layers_hid.dot(self._coeffs_hid_to_out)
                if self.out_activation is not None:
                    layers_out = self.__out_activation_fun(layers_out)

                deltas_out = labels - layers_out
                deltas_hid = deltas_out.dot(self._coeffs_hid_to_out.T)
                if self.hid_activation is not None:
                    deltas_hid *= self.__hid_activation_deriv(layers_hid)
                deltas_hid *= dropout_mask

                self._coeffs_hid_to_out += layers_hid.T.dot(deltas_out) * self.learning_rate
                self._coeffs_in_to_hid += layers_in.T.dot(deltas_hid) * self.learning_rate
            yield self

    def predict(self, samples, normalize=False):
        assert(len(samples.T) == len(self._coeffs_in_to_hid))
        if self.input_as_boolean:
            samples = np.array(samples).astype(bool)
        layers_in = samples
        layers_hid = layers_in.dot(self._coeffs_in_to_hid)
        if self.hid_activation is not None:
            layers_hid = self.__hid_activation_fun(layers_hid)
        layers_out = layers_hid.dot(self._coeffs_hid_to_out)
        if self.out_activation is not None:
            layers_out = self.__out_activation_fun(layers_out)
        if normalize and self.out_activation is not Activation.SOFTMAX:
            to_probs, _ = Activation.SOFTMAX.value
            layers_out = to_probs(layers_out)
        return layers_out

    def evaluate(self, samples, labels):
        assert(len(samples) == len(labels))
        assert(len(labels.T) == len(self._coeffs_hid_to_out.T))
        preds = self.predict(samples, normalize=False)
        errors = ((labels - preds) ** 2).sum(axis=0)
        loss = sum(errors) / len(preds)
        n_correct = sum([np.argmax(pred) == np.argmax(label)
                        for pred, label in zip(preds, labels)])
        accuracy = n_correct / len(preds)
        return loss, accuracy

In [None]:
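class BasicSparseNeuralNetwork:
    """Variant of `BasicNeuralNetwork` for sparse inputs.

    Each sample is read as an array of `(index, value)` rows, where `index`
    selects one of the `n_total` input features; the array must have an
    integer dtype, since the indices are used to index the weights.
    """
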
    def __init__(self, n_total, hid_size, epochs=3, learning_rate=1e-3, batch_size=32, skip_remaining=True,
                 hid_activation=None, out_activation=None, dropout=1,
                 input_as_boolean=False, is_generator=False, random_seed=None):

        self.n_total = n_total
        self.hid_size = hid_size
        self.epochs = epochs
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        self.skip_remaining = skip_remaining
        self.hid_activation = hid_activation
        self.out_activation = out_activation
        self.dropout = dropout
        self.input_as_boolean = input_as_boolean
        self.is_generator = is_generator
        self.random_seed = random_seed

        if self.hid_activation is not None:
            self.__hid_activation_fun, self.__hid_activation_deriv = self.hid_activation.value
        if self.out_activation is not None:
            self.__out_activation_fun, _ = self.out_activation.value

        self.__rng = np.random.default_rng(seed=self.random_seed)

    @property
    def input_to_hidden_coeffs(self):
        return copy(self._coeffs_in_to_hid)

    @property
    def hidden_to_output_coeffs(self):
        return copy(self._coeffs_hid_to_out)

    @conditional_generator
    def fit(self, train_samples, train_labels):
        assert(len(train_samples) == len(train_labels))

        # initialize coefficients using values between -0.1 and 0.1
        self._coeffs_in_to_hid = 0.2 * self.__rng.random((self.n_total, self.hid_size)) - 0.1
        self._coeffs_hid_to_out = 0.2 * self.__rng.random((self.hid_size, len(train_labels.T))) - 0.1

        for _ in range(self.epochs):
            for i in range(0, len(train_samples), self.batch_size):
                samples = train_samples[i:i + self.batch_size]
                labels = train_labels[i:i + self.batch_size]
                batch_size = len(samples)
                if batch_size < self.batch_size and self.skip_remaining:
                    continue

                # keep each hidden neuron with probability `self.dropout` (1 keeps all of them)
                dropout_mask = self.__rng.choice((0, 1), size=(batch_size, self.hid_size),
                                                 p=(1 - self.dropout, self.dropout))

                layers_in, layers_hid = [], []
                for j in range(batch_size):
                    layer_in = samples[j]
                    if not self.input_as_boolean:
                        layer_hid = layer_in.T[1].dot(self._coeffs_in_to_hid[layer_in.T[0]])
                    else:
                        layer_hid = self._coeffs_in_to_hid[layer_in.T[0]].sum(axis=0)
                    layers_in.append(layer_in)
                    layers_hid.append(layer_hid)

                layers_hid = np.array(layers_hid)
                if self.hid_activation is not None:
                    layers_hid = self.__hid_activation_fun(layers_hid)
                layers_hid *= dropout_mask * (1 / self.dropout)
                layers_out = layers_hid.dot(self._coeffs_hid_to_out)
                if self.out_activation is not None:
                    layers_out = self.__out_activation_fun(layers_out)

                deltas_out = labels - layers_out
                deltas_hid = deltas_out.dot(self._coeffs_hid_to_out.T)
                if self.hid_activation is not None:
                    deltas_hid *= self.__hid_activation_deriv(layers_hid)
                deltas_hid *= dropout_mask

                self._coeffs_hid_to_out += layers_hid.T.dot(deltas_out) * self.learning_rate
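                for j in range(batch_size):
                    indices = layers_in[j].T[0]
                    update = deltas_hid[j] * self.learning_rate
                    if not self.input_as_boolean:
                        # weight each row's update by its input value, as the dense
                        # `layers_in.T.dot(deltas_hid)` does
                        update = np.outer(layers_in[j].T[1], update)
                    self._coeffs_in_to_hid[indices] += update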
            yield self

    def predict(self, samples, normalize=False):
        layers_hid = []
        for sample in samples:
            layer_in = sample
            if not self.input_as_boolean:
                layer_hid = layer_in.T[1].dot(self._coeffs_in_to_hid[layer_in.T[0]])
            else:
                layer_hid = self._coeffs_in_to_hid[layer_in.T[0]].sum(axis=0)
            if self.hid_activation is not None:
                layer_hid = self.__hid_activation_fun(layer_hid)
            layers_hid.append(layer_hid)

        layers_hid = np.array(layers_hid)
        layers_out = layers_hid.dot(self._coeffs_hid_to_out)
        if self.out_activation is not None:
            layers_out = self.__out_activation_fun(layers_out)
        if normalize and self.out_activation is not Activation.SOFTMAX:
            to_probs, _ = Activation.SOFTMAX.value
            layers_out = to_probs(layers_out)
        return layers_out

    def evaluate(self, samples, labels):
        assert(len(samples) == len(labels))
        assert(len(labels.T) == len(self._coeffs_hid_to_out.T))
        preds = self.predict(samples, normalize=False)
        errors = ((labels - preds) ** 2).sum(axis=0)
        loss = sum(errors) / len(preds)
        n_correct = sum([np.argmax(pred) == np.argmax(label)
                        for pred, label in zip(preds, labels)])
        accuracy = n_correct / len(preds)
        return loss, accuracy