[[Neural Networks from Scratch]]

##### Why Might We Need to Retrieve Model Parameters?
There are situations where we need to inspect a model's parameters, for example to check for dead or exploding neurons. To retrieve the parameters, we iterate over the trainable layers, take their parameters, and put them into a list. The only trainable layer type that we have here is the Dense layer. Let's add a method to the `Layer_Dense` class to retrieve parameters:


In [None]:
# Dense Layer
class Layer_Dense:
	# rest of code...
	def get_parameters(self):
		return self.weights, self.biases


Within the `Model` class, we'll add a `get_parameters` method, which will iterate over the trainable layers of the model, run their `get_parameters` method, and append returned weights and biases to a list:

In [None]:
# Model class
class Model:
	# rest of code...
	def get_parameters(self):
		# initialises an empty list to store parameters
		parameters = []

		# iterate over trainable layers and get their parameters
		for layer in self.trainable_layers:
			parameters.append(layer.get_parameters())

		# return a list
		return parameters


After training a model, we get the parameters by running:

In [None]:
parameters = model.get_parameters()
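
Once the parameters are in a list, they can be scanned for warning signs. The following is a minimal sketch, not part of the `Model` class: `summarise_parameters` and its `large` threshold are hypothetical names chosen for illustration, and the stand-in arrays only mimic the list of `(weights, biases)` tuples that `get_parameters` returns. It looks at magnitudes only; spotting dead ReLU neurons would also require looking at layer outputs.

```python
import numpy as np


def summarise_parameters(parameters, large=10.0):
    """Return per-layer statistics for a list of (weights, biases) tuples."""
    report = []
    for index, (weights, biases) in enumerate(parameters):
        max_abs_weight = float(np.max(np.abs(weights)))
        report.append({
            'layer': index,
            'weights_shape': weights.shape,
            'max_abs_weight': max_abs_weight,
            'max_abs_bias': float(np.max(np.abs(biases))),
            # NaN or inf values are a sign that parameters have exploded
            'all_finite': bool(np.isfinite(weights).all()
                               and np.isfinite(biases).all()),
            # Hypothetical threshold: flag layers with unusually large weights
            'suspiciously_large': max_abs_weight > large,
        })
    return report


# Stand-in parameters shaped like the output of model.get_parameters()
rng = np.random.default_rng(0)
parameters = [
    (0.01 * rng.standard_normal((4, 3)), np.zeros((1, 3))),
    (0.01 * rng.standard_normal((3, 2)), np.zeros((1, 2))),
]

for row in summarise_parameters(parameters):
    print(row)
```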


##### Using `get_parameters` Method in a Practical Example

In [None]:
# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')
# Shuffle the training dataset
keys = np.array(range(X.shape[0]))
np.random.shuffle(keys)
X = X[keys]
y = y[keys]
# Scale and reshape samples
X = (X.reshape(X.shape[0], -1).astype(np.float32) - 127.5) / 127.5
X_test = (X_test.reshape(X_test.shape[0], -1).astype(np.float32) - 127.5) / 127.5
# Instantiate the model
model = Model()
# Add layers
model.add(Layer_Dense(X.shape[1], 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 10))
model.add(Activation_Softmax())
# Set loss, optimiser, and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimiser=Optimiser_Adam(decay=1e-3),
    accuracy=Accuracy_Categorical()
)
# Finalise the model
model.finalise()
# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10, batch_size=128, print_every=100)
# Retrieve and print parameters
parameters = model.get_parameters()
print(parameters)


The output:
[(array([[ 0.03538642, 0.00794717, -0.04143231, ..., 0.04267325,
-0.00935107, 0.01872394],
[ 0.03289384, 0.00691249, -0.03424096, ..., 0.02362755,
-0.00903602, 0.00977725],
[ 0.02189022, -0.01362374, -0.01442819, ..., 0.01320345,
-0.02083327, 0.02499157],
...,
[ 0.0146937 , -0.02869027, -0.02198809, ..., 0.01459295,
-0.02335824, 0.00935643],
[-0.00090149, 0.01082182, -0.06013806, ..., 0.00704454,
-0.0039093 , 0.00311571],
[ 0.03660082, -0.00809607, -0.02737131, ..., 0.02216582,
-0.01710589, 0.01578414]], dtype=float32), array([[-2.24505737e-02,
5.40090213e-03, 2.91307438e-02,
-1.04323691e-02, -9.52822249e-03, -1.48109728e-02,
...,
0.04158591, -0.01614098, -0.0134403 , 0.00708392, 0.0284729 ,
0.00336277, -0.00085383, 0.00163819]], dtype=float32)),
(array([[-0.00196577, -0.00335329, -0.01362851, ..., 0.00397028,
0.00027816, 0.00427755],
[ 0.04438829, -0.09197803, 0.02897452, ..., -0.11920264,
0.03808296, -0.00536136],
[ 0.04146343, -0.03637529, 0.04973305, ..., -0.13564698,
-0.08259197, -0.02467288],
...,
[ 0.03495856, 0.03902597, 0.0028984 , ..., -0.10016892,
-0.11356542, 0.05866433],
[-0.00857899, -0.02612676, -0.01050871, ..., -0.00551328,
-0.01432311, -0.00916382],
[-0.20444085, -0.01483698, -0.09321352, ..., 0.02114356,
-0.0762504 , 0.03600615]], dtype=float32), array([[-0.0103433 ,
-0.00158314, 0.02268587, -0.02352985, -0.02144126,
-0.00777614, 0.00795028, -0.00622872, 0.06918745, -0.00743477]],
dtype=float32))]

##### How Do We Set Parameters in a Model?
We implement a `set_parameters` method in both the `Layer_Dense` and `Model` classes.

###### `Layer_Dense`

In [None]:
# Dense layer
class Layer_Dense:
    ...
    # Set weights and biases in a layer instance
    def set_parameters(self, weights, biases):
        self.weights = weights
        self.biases = biases


###### `Model`

In [None]:
# Model class
class Model:
    ...
    # Updates the model with new parameters
    def set_parameters(self, parameters):
        # Iterate over the parameters and layers
        # and update each layer with each set of the parameters
        for parameter_set, layer in zip(parameters, self.trainable_layers):
            layer.set_parameters(*parameter_set)


Explanation of the code in `Model`:

- `zip()` takes the list of parameters and the list of layers and returns an iterator of tuples: the 0th elements of both lists, then the 1st elements of both lists, and so on.
- With `zip()`, we are iterating over parameters and the layer they belong to at the same time.
- Each parameter set is a tuple of weights and biases, so we **unpack it** with a starred expression `*` so that the `Layer_Dense` method receives them as separate arguments.
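
The pairing and unpacking described above can be seen in isolation with a small sketch. `DemoLayer` is a hypothetical stand-in for `Layer_Dense`, and strings replace the real weight and bias arrays so that the result is easy to read.

```python
# Hypothetical stand-in for Layer_Dense: only the setter matters here
class DemoLayer:
    def set_parameters(self, weights, biases):
        self.weights = weights
        self.biases = biases


layers = [DemoLayer(), DemoLayer()]
# Same shape as the list returned by get_parameters: one tuple per layer
parameters = [('w0', 'b0'), ('w1', 'b1')]

for parameter_set, layer in zip(parameters, layers):
    # Equivalent to layer.set_parameters(parameter_set[0], parameter_set[1])
    layer.set_parameters(*parameter_set)

print(layers[0].weights, layers[0].biases)  # w0 b0
print(layers[1].weights, layers[1].biases)  # w1 b1
```

Note that `zip()` stops at the shorter of its inputs, so a parameter list with too few or too many entries is silently truncated. On Python 3.10 and later, `zip(parameters, layers, strict=True)` raises a `ValueError` instead.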

##### Updating the `finalise` method in the `Model` class so that the list of trainable layers is passed to the loss object only if that loss object exists

In [None]:
# Model class
class Model:
    ...
    # Finalise the model
    def finalise(self):
        ...
        # Update loss object with trainable layers
        if self.loss is not None:
            self.loss.remember_trainable_layers(self.trainable_layers)


##### Changing the `Model` class' `set` method so that we can pass in only the objects we need

In [None]:
# Set loss, optimiser, and accuracy
def set(self, *, loss=None, optimiser=None, accuracy=None):
    if loss is not None:
        self.loss = loss
    if optimiser is not None:
        self.optimiser = optimiser
    if accuracy is not None:
        self.accuracy = accuracy
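
To see what this version of `set` allows, here is a small sketch. `DemoModel` is a hypothetical stand-in for `Model`, and its `__init__` sets the three attributes to `None` as an assumption made for this example; strings replace the real loss, optimiser, and accuracy objects.

```python
class DemoModel:
    def __init__(self):
        # Assumption for this sketch: attributes start out as None
        self.loss = None
        self.optimiser = None
        self.accuracy = None

    # The bare * makes every following argument keyword-only
    def set(self, *, loss=None, optimiser=None, accuracy=None):
        if loss is not None:
            self.loss = loss
        if optimiser is not None:
            self.optimiser = optimiser
        if accuracy is not None:
            self.accuracy = accuracy


demo = DemoModel()
demo.set(loss='loss-object', accuracy='accuracy-object')
# A later call with only one object leaves the others untouched
demo.set(optimiser='optimiser-object')
print(demo.loss, demo.optimiser, demo.accuracy)

try:
    demo.set('loss-object')  # positional arguments are rejected
except TypeError:
    rejected = True
else:
    rejected = False
print(rejected)  # True
```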


##### Training Loop which retrieves the model parameters, creates a new model, and sets its parameters with those retrieved from the previously-trained model

In [None]:
# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')
# Shuffle the training dataset
keys = np.array(range(X.shape[0]))
np.random.shuffle(keys)
X = X[keys]
y = y[keys]
# Scale and reshape samples
X = (X.reshape(X.shape[0], -1).astype(np.float32) - 127.5) / 127.5
X_test = (X_test.reshape(X_test.shape[0], -1).astype(np.float32) - 127.5) / 127.5
# Instantiate the model
model = Model()
# Add layers
model.add(Layer_Dense(X.shape[1], 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 10))
model.add(Activation_Softmax())
# Set loss, optimiser, and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimiser=Optimiser_Adam(decay=1e-4),
    accuracy=Accuracy_Categorical()
)
# Finalise the model
model.finalise()
# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10, batch_size=128, print_every=100)
# Retrieve model parameters
parameters = model.get_parameters()
# New model
# Instantiate the model
model = Model()
# Add layers
model.add(Layer_Dense(X.shape[1], 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 10))
model.add(Activation_Softmax())
# Set loss and accuracy objects
# We do not set optimiser object this time - there's no need to do it
# as we won't train the model
model.set(
    loss=Loss_CategoricalCrossentropy(),
    accuracy=Accuracy_Categorical()
)
# Finalise the model
model.finalise()
# Set model with parameters instead of training it
model.set_parameters(parameters)
# Evaluate the model
model.evaluate(X_test, y_test)

Output:
validation, acc: 0.874, loss: 0.354


##### How Do We Save and Load Model Parameters?
We add a `save_parameters` method to the `Model` class. It uses Python's built-in *pickle* module, which can serialise most Python objects.

###### Importing *pickle*

In [None]:
import pickle


###### Opening a file in the binary-write mode and saving parameters to it using `pickle.dump`

In [None]:
# Model class
class Model:
    ...
    # Saves the parameters to a file
    def save_parameters(self, path):
        # Open a file in the binary-write mode
        # and save parameters to it
        with open(path, 'wb') as f:
            pickle.dump(self.get_parameters(), f)


###### Saving parameters of a trained model

In [None]:
model.save_parameters('fashion_mnist.parms')


##### How Do We Load Parameters from a File?
We open the file in binary-read mode and have `pickle` read from it, deserialising parameters back into a list.

###### Calling the `set_parameters` method that we created earlier with the loaded parameters

In [None]:
# Loads the weights and updates a model instance with them
def load_parameters(self, path):
    # Open file in the binary-read mode,
    # load weights and update trainable layers
    with open(path, 'rb') as f:
        self.set_parameters(pickle.load(f))
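
The save and load steps can be checked together as a round trip. The sketch below does not use the `Model` class: it pickles a small stand-in list of `(weights, biases)` tuples to a temporary file (the file name `demo.parms` is arbitrary) and confirms that the loaded arrays match the originals.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for the list returned by model.get_parameters()
parameters = [
    (np.array([[0.1, -0.2], [0.3, 0.4]], dtype=np.float32),
     np.zeros((1, 2), dtype=np.float32)),
]

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'demo.parms')
    # Binary-write mode, as in save_parameters
    with open(path, 'wb') as f:
        pickle.dump(parameters, f)
    # Binary-read mode, as in load_parameters
    with open(path, 'rb') as f:
        restored = pickle.load(f)

same = all(
    np.array_equal(w_old, w_new) and np.array_equal(b_old, b_new)
    for (w_old, b_old), (w_new, b_new) in zip(parameters, restored)
)
print(same)  # True
```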


###### Setting up a model, loading in the parameters file, and testing the model to check if it works

In [None]:
# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')
# Shuffle the training dataset
keys = np.array(range(X.shape[0]))
np.random.shuffle(keys)
X = X[keys]
y = y[keys]
# Scale and reshape samples
X = (X.reshape(X.shape[0], -1).astype(np.float32) - 127.5) / 127.5
X_test = (X_test.reshape(X_test.shape[0], -1).astype(np.float32) - 127.5) / 127.5
# Instantiate the model
model = Model()
# Add layers
model.add(Layer_Dense(X.shape[1], 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 10))
model.add(Activation_Softmax())
# Set loss and accuracy objects
# We do not set optimiser object this time - there's no need to do it
# as we won't train the model
model.set(
    loss=Loss_CategoricalCrossentropy(),
    accuracy=Accuracy_Categorical()
)
# Finalise the model
model.finalise()
# Set model with parameters instead of training it
model.load_parameters('fashion_mnist.parms')
# Evaluate the model
model.evaluate(X_test, y_test)

Output:
validation, acc: 0.874, loss: 0.354

##### Why Save the Entire Model?
Earlier, we saved just the weights and biases. Saved parameters can be used, for instance, to initialise a model with weights trained on similar data and then train that model further on your specific data; this is called transfer learning. A parameters-only file is also smaller than a full model, so it loads faster and uses less memory.

Saving the entire model will also allow us to load the optimiser's state and model's structure.

###### First we import the `copy` module:

In [None]:
import copy

###### Then we make a `save` method in the `Model` class:

In [None]:
# Saves the model
def save(self, path):
    # Make a deep copy of current model instance
    model = copy.deepcopy(self)


While `copy.copy` is faster, it only copies the first level of the object's properties, so nested objects are still shared with the original. `copy.deepcopy` recursively traverses all objects and creates a full copy.
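
The difference between the two kinds of copy shows up as soon as a nested object is changed. In this small sketch, `Holder` is a hypothetical class with one list attribute.

```python
import copy


class Holder:
    def __init__(self):
        self.values = [1, 2, 3]


original = Holder()
shallow = copy.copy(original)   # new Holder, same underlying list
deep = copy.deepcopy(original)  # new Holder, new list

original.values.append(4)

print(shallow.values)  # [1, 2, 3, 4]
print(deep.values)     # [1, 2, 3]
```

This is why `save` uses a deep copy: attributes can be removed from the copy without touching the layers of the model that is still in use.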

###### Remove accumulated loss and accuracy:

In [None]:
# Reset accumulated values in loss and accuracy objects
model.loss.new_pass()
model.accuracy.new_pass()


###### Remove any data in the input layer, and reset the gradients, if any exist:

In [None]:
# Remove data from input layer
# and gradients from the loss object
model.input_layer.__dict__.pop('output', None)
model.loss.__dict__.pop('dinputs', None)

Both `model.input_layer` and `model.loss` are attributes of the `Model` object, but they are also objects themselves. One of the dunder attributes ("dunder" = double underscore) available on instances of ordinary classes is `__dict__`, a dictionary that holds the names and values of the object's attributes.

We can call the dictionary's `pop` method to remove an attribute from that instance. We set the second parameter of `pop`, the default value that is returned if the key doesn't exist, to `None`. This stops `pop` from raising a `KeyError` when the attribute is missing; we don't intend to use the removed values.
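
The behaviour of `__dict__.pop` can be tried on a small object. `DemoLayer` below is a hypothetical class whose `forward` method stores an `output` attribute, similar to what the real layers do.

```python
class DemoLayer:
    def __init__(self):
        self.weights = [0.5]

    def forward(self, inputs):
        # Stored on the instance, so it appears in __dict__
        self.output = inputs


layer = DemoLayer()
layer.forward([1.0, 2.0])
print(sorted(layer.__dict__))  # ['output', 'weights']

# Remove an attribute that exists
removed = layer.__dict__.pop('output', None)
# Remove an attribute that does not exist: no KeyError, returns the default
missing = layer.__dict__.pop('dinputs', None)

print(hasattr(layer, 'output'))  # False
print(removed)                   # [1.0, 2.0]
print(missing)                   # None
```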

###### Next, we iterate over all the layers to remove their properties:

In [None]:
# For each layer remove inputs, output, dinputs, dweights and dbiases properties
for layer in model.layers:
    for property in ['inputs', 'output', 'dinputs', 'dweights', 'dbiases']:
        layer.__dict__.pop(property, None)


With these things cleaned up, we can save the model object.

###### Opening a file in binary-write mode, and calling `pickle.dump()`

In [None]:
# Open a file in the binary-write mode and save the model
with open(path, 'wb') as f:
    pickle.dump(model, f)


###### The full `save` method:

In [None]:
# Saves the model
def save(self, path):
    # Make a deep copy of current model instance
    model = copy.deepcopy(self)
    # Reset accumulated values in loss and accuracy objects
    model.loss.new_pass()
    model.accuracy.new_pass()
    # Remove data from the input layer
    # and gradients from the loss object
    model.input_layer.__dict__.pop('output', None)
    model.loss.__dict__.pop('dinputs', None)
    # For each layer remove inputs, output, dinputs, dweights and dbiases properties
    for layer in model.layers:
        for property in ['inputs', 'output', 'dinputs', 'dweights', 'dbiases']:
            layer.__dict__.pop(property, None)
    # Open a file in the binary-write mode and save the model
    with open(path, 'wb') as f:
        pickle.dump(model, f)
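
To illustrate why the cleanup is worth doing, the sketch below compares the pickled size of an object before and after its cached arrays are removed from a deep copy. `DemoLayer` is a hypothetical, much-reduced stand-in for `Layer_Dense`, and `pickle.dumps` is used instead of a file so that the size can be measured directly.

```python
import copy
import pickle

import numpy as np


class DemoLayer:
    def __init__(self, n_inputs, n_neurons):
        self.weights = np.zeros((n_inputs, n_neurons), dtype=np.float32)

    def forward(self, inputs):
        # Cached during a pass; not needed in a saved file
        self.inputs = inputs
        self.output = inputs @ self.weights


layer = DemoLayer(8, 4)
layer.forward(np.ones((1000, 8), dtype=np.float32))

size_before = len(pickle.dumps(layer))

# Clean a deep copy so the live object keeps its cached arrays
clean = copy.deepcopy(layer)
for name in ['inputs', 'output']:
    clean.__dict__.pop(name, None)

size_after = len(pickle.dumps(clean))

print(size_after < size_before)                             # True
print(hasattr(layer, 'output'), hasattr(clean, 'output'))   # True False
```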


###### We can train a model, then save it whenever we wish with:

In [None]:
model.save('fashion_mnist.model')


##### How do we load the entire model?
Ideally, we load a model before a model object even exists.

###### Loading a model by calling a method of the `Model` class instead of the object:

In [None]:
model = Model.load('fashion_mnist.model')


###### Using the `@staticmethod` decorator so that `load` can be called on the class itself, where `self` doesn't exist, and return a model object without first needing to instantiate one:

In [None]:
# Loads and returns a model
@staticmethod
def load(path):
    # Open file in the binary-read mode, load a model
    with open(path, 'rb') as f:
        model = pickle.load(f)
    # Return a model
    return model
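
The same save-then-load pattern can be tried end to end on a toy object. In the sketch below, `DemoModel` and `DemoOptimiser` are hypothetical stand-ins, and the `save` method skips the cleanup steps shown earlier. The point is that `load` is called on the class, with no existing instance, and that nested state (here an optimiser's iteration counter) comes back with the object.

```python
import os
import pickle
import tempfile


class DemoOptimiser:
    def __init__(self):
        self.iterations = 0


class DemoModel:
    def __init__(self):
        self.layers = []
        self.optimiser = DemoOptimiser()

    def save(self, path):
        with open(path, 'wb') as f:
            pickle.dump(self, f)

    # No self parameter: called as DemoModel.load(path)
    @staticmethod
    def load(path):
        with open(path, 'rb') as f:
            return pickle.load(f)


trained = DemoModel()
trained.layers.append('dense-1')
trained.optimiser.iterations = 5

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'demo.model')
    trained.save(path)
    loaded = DemoModel.load(path)

print(type(loaded).__name__, loaded.layers, loaded.optimiser.iterations)
```

Two things to keep in mind: pickle stores a reference to each class rather than its code, so the class definitions must be available when the file is loaded, and `pickle.load` should only be used on files from a trusted source.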



###### Creating the data, then loading a model to see if it works:

In [None]:
# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')
# Shuffle the training dataset
keys = np.array(range(X.shape[0]))
np.random.shuffle(keys)
X = X[keys]
y = y[keys]
# Scale and reshape samples
X = (X.reshape(X.shape[0], -1).astype(np.float32) - 127.5) / 127.5
X_test = (X_test.reshape(X_test.shape[0], -1).astype(np.float32) - 127.5) / 127.5
# Load the model
model = Model.load('fashion_mnist.model')
# Evaluate the model
model.evaluate(X_test, y_test)

Output:
validation, acc: 0.874, loss: 0.354

Saving the full trained model is a common way of saving a model. It saves parameters (weights and biases) and instances of all the model's objects and the data they generated. 

Examples of what is saved:
- optimiser state, such as the cache
- learning rate decay
- full model structure
- etc.

##### Next Step
[[Prediction Inference]]