## Deep Convolutional Classifier using the Galaxy Zoo dataset

![](ngc2841.jpg)

In this project, we will build a convolutional neural network (hereafter CNN) to classify the astronomical images contained in the [Galaxy Zoo dataset](https://www.kaggle.com/competitions/galaxy-zoo-the-galaxy-challenge/overview). This set has 61,578 training images and 79,975 test images. Only the training images come with the human-derived probability distributions (`training_solutions_rev1.csv`), so later we split them into our own training and testing sets for the deep convolutional classifier. The idea is to analyze the JPG images of galaxies to find automated metrics that reproduce the probability distributions derived from human classifications: for each galaxy, we want to determine the probability that it belongs to each particular class.

In July 2007, astronomers from Oxford University had in their possession a data set of approximately 1 million galaxies imaged by the Sloan Digital Sky Survey (SDSS). But the galaxies in this data set needed to have their morphologies (shapes) classified in order to be used to better understand galactic processes. With so many galaxies, it would have taken an individual a thousand lifetimes to classify all of them. Instead, [Galaxy Zoo was born](https://www.zooniverse.org/projects/zookeeper/galaxy-zoo/).

Galaxy Zoo took an innovative approach that brought astronomy to the general public (log on and help classify a galaxy), and it grew into a successful citizen science crowdsourcing project.

It was initially assumed that despite outsourcing the work to thousands in the general public, it would still take years for all of the images to be classified. Within the first 24 hours of launch, Galaxy Zoo founders were stunned to be receiving nearly 70 000 classifications an hour. In the end, more than 50 million classifications were received by the project during its first year, contributed by more than 150 000 people.

The Galaxy Zoo project has gone through four iterations. The first was focused on deciding if a galaxy was elliptical, spiral (including direction), or a merger of two galaxies.

*Galaxy Zoo 2* asked for more details on bright SDSS galaxies. These detailed classifications include (amongst others) measurements of the bulge size, the presence of bars, and the structure of spiral arms. This is the aim of this project: to construct a machine learning algorithm able to classify a given galaxy following the classification rules defined in Galaxy Zoo 2. These rules are described in the image below.

<figure>
<img src="Figures/decision_tree_of_classifications.png" style="width:80%">
</figure>

##### Weighting the responses

The values of the morphology categories in the solution file are computed as follows. For the first set of responses (smooth, features/disk, star/artifact), the values in each category are simply the likelihood of the galaxy falling in each category. These values sum to 1.0. For each subsequent question, the probabilities are first computed (these will sum to 1.0) and then multiplied by the value which led to that new set of responses. 

Here is a simplified example: a galaxy had 80% of users identify it as smooth, 15% as having features/disk, and 5% as a star/artifact.

| Class | Probability of being that class |
| :---: | :---: |
| Class1.1 | 0.80 |
| Class1.2 | 0.15 |
| Class1.3 | 0.05 |

The 80% of users who identified the galaxy as "smooth" also recorded responses for the galaxy's relative roundness. These votes were 50% completely round, 25% in-between, and 25% cigar-shaped. The values in the solution file are thus:

| Class | Weighted probability of being that class |
| :---: | :---: |
| Class 7.1 | 0.80 * 0.50 = 0.40 |
| Class 7.2 | 0.80 * 0.25 = 0.20 |
| Class 7.3 | 0.80 * 0.25 = 0.20 |

The reason for this weighting is to emphasize that a good solution must get the high-level, large-scale morphology categories correct. The best solutions, though, will also have high levels of accuracy on the detailed solutions that are further down the decision tree.
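The weighting in the example above can be reproduced with a few lines of plain Python. This is only an illustrative sketch: the dictionaries hold the toy vote fractions of the example, not real data. It shows that the weighted values of a follow-up question sum to the value of the answer that led to it (0.80 here), not to 1.0.

```python
# Raw (unweighted) vote fractions from the example above
q1 = {"Class1.1": 0.80, "Class1.2": 0.15, "Class1.3": 0.05}        # smooth, features/disk, star/artifact
q7_given_smooth = {"Class7.1": 0.50, "Class7.2": 0.25, "Class7.3": 0.25}

# Q7 is only asked to users who answered "smooth" (Class1.1),
# so every Q7 fraction is multiplied by the value of Class1.1
weighted_q7 = {k: round(q1["Class1.1"] * v, 4) for k, v in q7_given_smooth.items()}

print(weighted_q7)                 # {'Class7.1': 0.4, 'Class7.2': 0.2, 'Class7.3': 0.2}
print(sum(weighted_q7.values()))   # close to 0.80, the value of the parent answer
```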

| Question | Number of possible answers |
| --- | --- |
| Q1. Is the object a smooth galaxy, a galaxy with features/disk or a star? | 3 responses |
| Q2. Is it edge-on? | 2 responses |
| Q3. Is there a bar? | 2 responses |
| Q4. Is there a spiral pattern? | 2 responses |
| Q5. How prominent is the central bulge? | 4 responses |
| Q6. Is there anything "odd" about the galaxy? | 2 responses |
| Q7. How round is the smooth galaxy? | 3 responses |
| Q8. What is the odd feature? | 7 responses |
| Q9. What shape is the bulge in the edge-on galaxy? | 3 responses |
| Q10. How tightly wound are the spiral arms? | 3 responses |
| Q11. How many spiral arms are there? | 6 responses |

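As a result, at each node or question, the initial (unweighted) probabilities of the answers sum to 1.0; after weighting, the values of a question sum to the value of the answer that led to it (0.80 in the example above).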
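Adding up the answers in the table gives the length of the target vector the network has to predict. A small sketch, assuming the `ClassQ.A` column naming used in the solution file, builds the list of target columns from the number of answers per question:

```python
# Number of possible answers for Q1..Q11, read from the table above
answers_per_question = [3, 2, 2, 2, 4, 2, 3, 7, 3, 3, 6]

# Build the column names used in the solution file: ClassQ.A
class_columns = [f"Class{q}.{a}"
                 for q, n in enumerate(answers_per_question, start=1)
                 for a in range(1, n + 1)]

print(len(class_columns))                     # 37
print(class_columns[:4], class_columns[-1])
```

The 37 columns are the 37 components of the target vector, which is why the output layer of the model has 37 units.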

We will implement a CNN whose aim is to predict the answers to the questions in the table above (37 weighted probabilities in total). Let us start by importing the packages we will use.

In [1]:
# --- Torch tools
import torch
from torch import nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torchvision.datasets.folder import default_loader
from torchvision.transforms import ToTensor, Lambda, Compose
from torchvision import datasets, transforms
import torch.optim as optim

# --- Data visualization
import matplotlib.pyplot as plt
import matplotlib.font_manager

# --- Data analysis
import numpy as np
import pandas as pd
import seaborn as sns
#import sklearn.metrics as met

import os
from tqdm import tqdm
%matplotlib inline

Before we proceed, we will briefly explain what a CNN is.

As explained in this [IBM article](https://www.ibm.com/topics/convolutional-neural-networks), neural networks are a subset of machine learning, and they are at the heart of deep learning algorithms. They are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node connects to others and has an associated weight and threshold, as shown below.

<figure>
    <img src="Figures/neural_networks-001.png" style="width:80%">
    <figcaption>Source: https://tikz.net/neural_networks/</figcaption>
</figure>

If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network, as is outlined below.

<figure>
    <img src="Figures/activation_function.ppm" style="width:80%">
    <figcaption>Source: Google Images (2019)</figcaption>
</figure>
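A single node can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a step function for the hard threshold described above and ReLU (the activation used in the CNN below) for comparison; the inputs and weights are made-up toy values.

```python
import numpy as np

def node_output(x, w, b, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    z = np.dot(w, x) + b
    return activation(z)

# Step function: the node "fires" (outputs 1) only if z is above the threshold 0
step = lambda z: 1.0 if z > 0 else 0.0
# ReLU: lets positive values through unchanged and blocks negative ones
relu = lambda z: max(0.0, z)

x = np.array([1.0, -2.0, 3.0])    # toy inputs
w = np.array([0.5, 0.25, 0.5])    # toy weights
b = -1.0                          # bias, i.e. minus the threshold

print(node_output(x, w, b, step))   # 1.0, since z = 1.5 - 1.0 = 0.5 > 0
print(node_output(x, w, b, relu))   # 0.5
```

With ReLU the node passes on *how much* it is activated instead of a plain on/off signal, which is what makes gradient-based training practical.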

Convolutional neural networks are distinguished from other neural networks by their superior performance with image, speech, or audio signal inputs. They have three main types of layers, which are:
- Convolutional layer
- Pooling layer
- Fully-connected (FC) layer

The *convolutional layer* is the first layer of a convolutional network. While convolutional layers can be followed by additional convolutional layers or *pooling layers*, the *fully-connected* layer is the final layer. With each layer, the CNN increases in its complexity, identifying greater portions of the image. Earlier layers focus on simple features, such as colors and edges. As the image data progresses through the layers of the CNN, it starts to recognize larger elements or shapes of the object until it finally identifies the intended object.

#### Convolutional layer

The convolutional layer is the core building block of a CNN, and it is where the majority of computation occurs. It requires a few components, which are:

- Input data: Since in this project we are going to work with galaxy images, let’s assume that the input will be a color image. It is made up of a 3D matrix of pixels, which means that the input will have three dimensions: a height, a width, and a depth, the latter corresponding to the three RGB channels of the image.
- Filter: It is the feature detector, also known as a kernel, which will move across the receptive fields of the image, checking if the feature is present. This process is known as a *convolution*. The feature detector is a two-dimensional (2-D) array of weights, which represents part of the image. They can vary in size, which determines the size of the receptive field. The filter is then applied to an area of the image, and a dot product is calculated between the input pixels and the filter. This dot product is then fed into an output array. Afterwards, the filter shifts by a *stride*, repeating the process until the kernel has swept across the entire image.
- Feature map: The final output from the series of dot products from the input and the filter is known as a feature map, activation map, or a convolved feature.
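The three components above can be illustrated with a naive NumPy loop. This is a sketch for a single channel only (for an RGB input, each filter has one kernel per channel and the per-channel results are summed), and, like most deep learning libraries, it computes a cross-correlation, i.e. the kernel is not flipped. The toy image and edge-detecting kernel are illustrative.

```python
import numpy as np

def conv2d_single_channel(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and store a dot product at each position."""
    H, W = image.shape
    K = kernel.shape[0]
    out_h = (H - K) // stride + 1
    out_w = (W - K) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive field currently covered by the kernel
            patch = image[i * stride:i * stride + K, j * stride:j * stride + K]
            # Dot product between the input pixels and the filter weights
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Toy 5x5 image: dark columns 0-1, bright columns 2-4 (a vertical edge)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# 3x3 kernel that responds to dark-to-bright vertical edges
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

print(conv2d_single_channel(image, kernel, stride=1))
print(conv2d_single_channel(image, kernel, stride=2))
```

With stride 1 the feature map is 3x3 and every row reads `[3, 3, 0]`: the kernel responds strongly where the edge sits inside its receptive field and gives 0 over the uniform region. With stride 2 it skips every other position and the map shrinks to 2x2 (`[3, 0]` per row).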

After each convolution operation, a CNN applies an activation function transformation to the feature map, introducing nonlinearity to the model. 

As we mentioned earlier, another convolution layer can follow the initial convolution layer. When this happens, the structure of the CNN can become hierarchical as the later layers can see the pixels within the receptive fields of prior layers.

<a title="Vincent Dumoulin, Francesco Visin, MIT &lt;http://opensource.org/licenses/mit-license.php&gt;, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Convolution_arithmetic_-_Padding_strides.gif"><img width="512" alt="Convolution arithmetic - Padding strides" src="https://upload.wikimedia.org/wikipedia/commons/0/04/Convolution_arithmetic_-_Padding_strides.gif"></a>
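The animation shows how padding and stride change the size of the output. For a square input of width $W$, a kernel of size $K$, padding $P$ and stride $S$, the output width (and height) is given by the relation below, where the floor reflects that PyTorch drops positions in which the kernel would run past the input:

$$
W_{\text{out}} = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1,
\qquad \text{e.g. } W=256,\ K=4,\ P=0,\ S=2 \;\Rightarrow\; \left\lfloor \tfrac{252}{2} \right\rfloor + 1 = 127 .
$$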

#### Pooling layer

Pooling layers, also known as downsampling layers, conduct dimensionality reduction, reducing the number of values in the representation (and hence the number of parameters needed by the following layers). Their purpose is to gradually shrink the spatial dimensions of the representation. Similar to the convolutional layer, the pooling operation sweeps a filter across the entire input, but the difference is that this filter does not have any weights. Instead, the kernel applies an aggregation function to the values within the receptive field, populating the output array. There are two main types of pooling:

- Max pooling: As the filter moves across the input, it selects the pixel with the maximum value to send to the output array. This approach tends to be used more often than average pooling.
- Average pooling: As the filter moves across the input, it calculates the average value within the receptive field to send to the output array.
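Both kinds of pooling can be sketched with the same loop, changing only the aggregation function. This is an illustrative NumPy version (no padding, no learnable weights) with a made-up 4x4 input:

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Max or average pooling over a 2-D array (no padding, no learnable weights)."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    agg = np.max if mode == "max" else np.mean     # aggregation function
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = agg(window)
    return out

x = np.array([[1, 3, 2, 0],
              [4, 6, 1, 1],
              [0, 2, 5, 7],
              [1, 1, 8, 4]], dtype=float)

print(pool2d(x, mode="max"))
print(pool2d(x, mode="average"))
```

For this input, max pooling with a 2x2 window and stride 2 gives `[[6, 2], [2, 8]]`, and average pooling gives `[[3.5, 1], [1, 6]]`: in both cases the 4x4 map shrinks to 2x2.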

<a title="Rafay Qayyum - Introduction To Pooling Layers In CNN" href="https://pub.towardsai.net/introduction-to-pooling-layers-in-cnn-dafe61eabe34"><img width="512" alt="Introduction To Pooling Layers In CNN" src="https://miro.medium.com/v2/resize:fit:828/1*fXxDBsJ96FKEtMOa9vNgjA.gif"></a>

While a lot of information is lost in the pooling layer, pooling also brings a number of benefits to the CNN: it helps to reduce complexity, improve efficiency, and limit the risk of overfitting.

#### Fully-connected layer

In partially connected layers (convolutional and pooling layers), the pixel values of the input image are not directly connected to the output layer. In the fully-connected layer, however, each node in the output layer connects directly to every node in the previous layer.

This kind of layer performs the classification based on the features extracted by the previous layers and their different filters. While convolutional and pooling layers tend to use ReLU functions, for example, fully-connected layers usually leverage a softmax activation function in the output layer to classify inputs appropriately, producing probabilities between 0 and 1. This is what we will use in this project.
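The softmax function can be sketched in NumPy as below (a minimal, numerically stable version; the scores are toy values). Note that it forces *all* its outputs to sum to 1. In this dataset only the Q1 answers sum to 1, while the weighted answers of later questions add more on top (in the weighting example, Class7.1 to Class7.3 add another 0.80), so a softmax over all 37 outputs is a modelling choice worth keeping in mind when reading the results.

```python
import numpy as np

def softmax(z):
    """Turn a vector of real scores into positive values that sum to 1."""
    z = np.asarray(z, dtype=float)
    # Subtracting the maximum avoids overflow in exp() and does not change the result
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())
```

The largest score receives the largest probability, and the ordering of the scores is preserved.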

<figure>
    <img src="Figures/fully_cnn.jpg" style="width:100%">
    <figcaption>Source: https://developersbreach.com/convolution-neural-network-deep-learning/</figcaption>
</figure>

In [2]:
# ------------------------------- Plot features ------------------------------
# Properties to decorate the plots.
plt.rcParams['axes.linewidth'] = 0.5
plt.rcParams['text.usetex'] = False
plt.rcParams['font.family'] = 'serif'   
plt.rcParams['font.serif'] = ['New Century Schoolbook', 'DejaVu Serif'] # 'Times', 'Liberation Serif', 'Times New Roman'
#plt.rcParams['font.serif'] = ['Helvetica']
plt.rcParams['font.size'] = 10
plt.rcParams['legend.frameon'] = False
plt.rcParams['legend.edgecolor'] = 'k'
plt.rcParams['legend.markerscale'] = 7
plt.rcParams['xtick.minor.visible'] = True
plt.rcParams['ytick.minor.visible'] = True
plt.rcParams['xtick.top'] = False
plt.rcParams['ytick.right'] = False
plt.rcParams['xtick.direction'] = 'in'
plt.rcParams['ytick.direction'] = 'in'
plt.rcParams['xtick.major.width']= 0.5
plt.rcParams['xtick.major.size']= 5.0
plt.rcParams['xtick.minor.width']= 0.5
plt.rcParams['xtick.minor.size']= 3.0
plt.rcParams['ytick.major.width']= 0.5
plt.rcParams['ytick.major.size']= 5.0
plt.rcParams['ytick.minor.width']= 0.5
plt.rcParams['ytick.minor.size']= 3.0
# ----------------------------------------------------------------------------

We need to load the images of the dataset, which means keeping the *training* and *testing* images in memory. To do this, we will use the `default_loader` function from torchvision, which opens each file as a PIL image that we then convert into a *tensor*, the essential data unit of Torch. In addition, we need to load the *probability distributions* for each image. Let us do this first.

In [3]:
probabilities = pd.read_csv('training_solutions_rev1.csv', sep=',')


In [4]:
# Let us get some info of the probabilities dataframe
probabilities.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61578 entries, 0 to 61577
Data columns (total 38 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   GalaxyID   61578 non-null  int64  
 1   Class1.1   61578 non-null  float64
 2   Class1.2   61578 non-null  float64
 3   Class1.3   61578 non-null  float64
 4   Class2.1   61578 non-null  float64
 5   Class2.2   61578 non-null  float64
 6   Class3.1   61578 non-null  float64
 7   Class3.2   61578 non-null  float64
 8   Class4.1   61578 non-null  float64
 9   Class4.2   61578 non-null  float64
 10  Class5.1   61578 non-null  float64
 11  Class5.2   61578 non-null  float64
 12  Class5.3   61578 non-null  float64
 13  Class5.4   61578 non-null  float64
 14  Class6.1   61578 non-null  float64
 15  Class6.2   61578 non-null  float64
 16  Class7.1   61578 non-null  float64
 17  Class7.2   61578 non-null  float64
 18  Class7.3   61578 non-null  float64
 19  Class8.1   61578 non-null  float64
 20  Class8

In [None]:
# Create a list to store the image tensors, their IDs and their probabilities
image_list = []
ID_list = []
probabilities_list = []

# CenterCrop object to crop the image
center_crop = transforms.CenterCrop(256)

# Iterate over the images in the folder
for filename in tqdm(os.listdir('images_training_rev1')):
    # Load each image using the default loader from torchvision
    image = default_loader(os.path.join('images_training_rev1', filename))
    
    # Crop the image to the given dimension
    cropped_image = center_crop(image)
    
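    # Convert the image to a float tensor with values in [0, 1] (shape: height x width x channels)
    tensor_image = torch.tensor(np.array(cropped_image, dtype=np.single), dtype=torch.float)/255.0
    
    # Move the channel axis to the front (channels x height x width), as nn.Conv2d expects.
    # A plain reshape(3, 256, 256) would scramble the pixels instead of moving the channels.
    tensor_image = tensor_image.permute(2, 0, 1)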
    
    # Append the tensor to the image list
    image_list.append(tensor_image)
    
    # Append the ID to the ID list
    ID_temp = int(filename[:-4])
    ID_list.append(ID_temp)
    
    # Append the probability to the probabilities list
    probabilities_list.append(probabilities.loc[probabilities['GalaxyID'] == ID_temp].to_numpy(dtype=np.single).reshape(-1)[1:])

# Stack the list of image tensors into a single tensor
image_list = torch.stack(image_list)
# Convert the list of IDs into a Torch tensor of integers
ID_list = torch.tensor(ID_list, dtype=torch.int32)
# Convert the list of probability distributions into a Torch tensor
probabilities_list = torch.tensor(np.array(probabilities_list), dtype=torch.float)
print("Done!")


In [None]:
# Let us check the dimensions of the tensors
print(image_list.shape, "\n")
print(ID_list.shape, "\n")
print(probabilities_list.shape)
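The image tensors should have shape `(N, 3, 256, 256)`, but the shape alone does not prove that the channels are in the right place. `np.array` of an RGB PIL image has layout height x width x channels (HWC), whereas `nn.Conv2d` expects channels x height x width (CHW). The NumPy sketch below, on a tiny made-up 2x2 image, shows that only an axis transpose (NumPy's `transpose(2, 0, 1)`, or `permute(2, 0, 1)` for a Torch tensor) moves the channels; a reshape to the same shape silently mixes pixels and channels.

```python
import numpy as np

# A tiny 2x2 RGB "image" in height x width x channels (HWC) layout
hwc = np.arange(2 * 2 * 3).reshape(2, 2, 3)

# Channel 0 (red) should contain the first value of every pixel
red_expected = hwc[:, :, 0]

chw_transposed = hwc.transpose(2, 0, 1)   # moves the channel axis to the front
chw_reshaped = hwc.reshape(3, 2, 2)       # same shape, but only reinterprets the memory order

print(np.array_equal(chw_transposed[0], red_expected))  # True
print(np.array_equal(chw_reshaped[0], red_expected))    # False
```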

We need to define the training and testing datasets. To do so, we will use 15% of the images of the original dataset as the testing set, while the remaining 85% will form the training set.

In [None]:
prop = 0.15
testing_size = int(len(ID_list)*prop)
training_size = len(ID_list) - testing_size

# Check the sizes
print("Testing size: ", testing_size, "\n")
print("Training size: ", training_size)

Let us randomly assign the indexes of the images (and therefore of their IDs and probabilities) to the testing and training sets.

In [None]:
# Shuffle all the indexes once and cut the permutation, so that no image
# is repeated and no image ends up in both sets
shuffled_indexes = np.random.permutation(len(ID_list))
testing_indexes = shuffled_indexes[:testing_size]
training_indexes = shuffled_indexes[testing_size:]

# Training IDs and images
training_images = image_list[training_indexes]
ID_training = ID_list[training_indexes]
training_probabilities = probabilities_list[training_indexes]
# Testing IDs and images
testing_images = image_list[testing_indexes]
ID_testing = ID_list[testing_indexes]
testing_probabilities = probabilities_list[testing_indexes]
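It is worth checking that the split is a proper partition. Drawing indexes with `np.random.randint` samples *with replacement*, so the same image can appear twice in the testing set, and `np.delete` then removes it only once from the training set. The helper below is a hypothetical sketch (its name and signature are ours); it could be called as `check_split(testing_indexes, training_indexes, len(ID_list))`.

```python
import numpy as np

def check_split(test_idx, train_idx, n):
    """Return True if the test and train index arrays form a partition of 0..n-1:
    no repeated index, no index in both sets, and every index used."""
    test_idx, train_idx = np.asarray(test_idx), np.asarray(train_idx)
    no_repeats = (np.unique(test_idx).size == test_idx.size
                  and np.unique(train_idx).size == train_idx.size)
    disjoint = np.intersect1d(test_idx, train_idx).size == 0
    complete = test_idx.size + train_idx.size == n
    return bool(no_repeats and disjoint and complete)

# A split drawn with replacement: index 1 appears twice in the test set
bad_test = np.array([0, 1, 1])
bad_train = np.delete(np.arange(5), bad_test)    # [2, 3, 4]

# A split obtained by cutting one random permutation
perm = np.random.default_rng(0).permutation(5)
good_test, good_train = perm[:2], perm[2:]

print(check_split(bad_test, bad_train, 5))    # False
print(check_split(good_test, good_train, 5))  # True
```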

In [None]:
# Let us check one example to see if it was loaded correctly
fig, axes = plt.subplots(1, 2, figsize=(10,10))

axes[0].imshow(training_images[4].permute(1, 2, 0))   # back to height x width x channels for imshow
axes[0].set_title("Training")
axes[0].axis('off')

axes[1].imshow(testing_images[4].permute(1, 2, 0))
axes[1].set_title("Testing")
axes[1].axis('off')
plt.show()

In [None]:
# Let us check the dimension of the arrays
print(training_images.shape, "\n")
print(testing_images.shape)

Once the data needed to train the model is loaded, let us define a custom dataset class that stores the images and their probabilities. This way, we will be able to manipulate the dataset more easily.

In [None]:
# We create a custom Dataset class to work with the images
class CustomImageDataset(Dataset):
    def __init__(self, images, ID, probabilities):
        """ The super() builtin returns a proxy object (temporary object of the superclass)
        that allows us to access methods of the base class."""
        super().__init__()
        self.images = images                      # Torch tensor of images (N x 3 x 256 x 256)
        self.ID = ID                              # Torch tensor of galaxy IDs
        self.probabilities = probabilities        # Torch tensor of target probabilities (N x 37)
        
    # We override the __len__() method
    def __len__(self):
        return len(self.images)
    
    # We override the __getitem__() method
    def __getitem__(self, i):
        image = self.images[i]
        probabilities = self.probabilities[i]
        return image, probabilities

We have to instantiate two different objects, one for the training images and the other for the testing images.

In [None]:
# Training
train_data = CustomImageDataset(training_images, ID_training, training_probabilities)

# Testing
test_data = CustomImageDataset(testing_images, ID_testing, testing_probabilities)

Once we have created the CustomImageDataset objects, it is time to instantiate the DataLoader objects in order to feed the CNN with proper inputs. DataLoader objects also let us train the CNN in mini-batches.

In [None]:
# Size of the batch
batch_size = 50
#batch_size = 1000

# Training DataLoader object
train_dl = DataLoader(train_data, batch_size=batch_size, shuffle=True)
# Testing DataLoader object
test_dl = DataLoader(test_data, batch_size=batch_size, shuffle=False)

We can check whether CUDA is available for training. Using CUDA speeds up the training process by running the computations on an NVIDIA GPU, if our computer has one.

In [None]:
# Get cpu or gpu device for training
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

Now that we have loaded the datasets and wrapped them in DataLoader objects, it is time to define the model we will train to classify the input galaxies. As said before, the model is a CNN, and its architecture is explained below. First, let us compute the sizes of the layers and define the *hyperparameters* of the model.

In [None]:
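# We define a function to compute the width and height of the output of a convolution or pooling layer
def output_size(W, K, P, S):
    # W is the input width and height
    # K is the kernel size
    # P is the padding
    # S is the stride
    # Floor division, as in PyTorch: positions where the kernel would overrun the input are dropped
    return (W - K + 2*P) // S + 1

# Trace the spatial size through the layers of the model defined below
w1 = output_size(256, 4, 0, 2)   # Conv2d:    256 -> 127
w2 = output_size(w1, 4, 0, 1)    # MaxPool2d: 127 -> 124
w3 = output_size(w2, 3, 1, 3)    # Conv2d:    124 -> 42
print(w3, 7 * w3 * w3)           # 42 12348 (7 output channels of 42x42)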

In [None]:
# Model parameters
n_inputs = 12348                  # 7 channels x 42 x 42 pixels after the last convolution
n_hidden_1 = 6174                 # 12348/2 = 6174
n_hidden_2 = 3087                 # 6174/2 = 3087
n_outputs = 37                    # Number of the components of the target vector
in_channels = 3
out_channels_1 = 5
out_channels_2 = 7
kernel_size_1 = 4
kernel_size_2 = 3
p_dropout = 0.1                   # Dropout probability
lr = 1e-3                         # Learning rate
n_epochs = 100                    # Number of epochs
#n_epochs = 300                   # Number of epochs

In [None]:
# Model definition
class Model(nn.Module):
    # Define model elements
    def __init__(self):
        super().__init__()
        # Sequence of transformations implemented by the layers of the network
        self.cnn = nn.Sequential(
            # Convolution layer applied to the input image. Stride = 2, no padding: 3x256x256 -> 5x127x127
            nn.Conv2d(in_channels, out_channels_1, kernel_size_1, stride=2),
            # Activation function applied to the convolved map
            nn.ReLU(),
            # Pooling layer. Max pooling of the ReLU-activated map. Stride = 1, no padding: 5x127x127 -> 5x124x124
            nn.MaxPool2d(kernel_size_1, stride=1),
            # Convolution layer applied to the max-pooled map. Stride = 3, padding = 1: 5x124x124 -> 7x42x42
            nn.Conv2d(out_channels_1, out_channels_2, kernel_size_2, stride=3, padding=1),
            # Activation function applied to the convolved map
            nn.ReLU(),
            # Flattens the 7x42x42 feature maps into a vector of n_inputs = 12348 components
            nn.Flatten(),
            # Linear transformation of the flattened layer
            nn.Linear(n_inputs, n_hidden_1),
            # Activation function applied to the linear layer
            nn.ReLU(),
            # Linear transformation applied to the ReLU-transformed layer
            nn.Linear(n_hidden_1, n_hidden_2),
            # Randomly zeroes some of the elements of the input tensor with probability p_dropout (training mode only)
            nn.Dropout(p_dropout),
            # Activation function applied to the dropped-out layer
            nn.ReLU(),
            # Linear transformation applied to the ReLU-transformed layer 
            nn.Linear(n_hidden_2, n_outputs),
            # Softmax over the 37 output components (dim=1, since dim=0 is the batch dimension)
            nn.Softmax(dim=1)
        )
        
    # Method to transform inputs into outputs considering the internal structure of the network
    def forward(self, X):
        output = self.cnn(X)
        return output
    
# Now we can create a model and send it at once to the device
model = Model().to(device)
# We can check the architecture this way
# (printing model.state_dict() instead would dump every weight tensor)
print(model)

<figure>
    <img src="cnn_architecture.jpg" style="width:100%">
</figure>

The architecture of the network is based on partially connected layers (convolutional and pooling layers) followed by a fully-connected part. In the convolutional part, a *filter* composed of various *kernels* moves across the input while computing the convolution (to learn more about convolutions see [here](https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1#:~:text=Each%20filter%20in%20a%20convolution,a%20processed%20version%20of%20each.)). There is one kernel per input channel and one filter per output channel. After each convolution, a ReLU function is applied to every pixel of every channel; the first ReLU is followed by a max pooling layer. After the last convolution, the 7 channels of 42x42 pixels are flattened (7 × 42 × 42 = 12348 values) to enter the fully-connected part of the network. Then, after a series of linear, ReLU and dropout transformations, the last layer of 37 components is evaluated using a *softmax* function to get the desired probability distributions.
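It is instructive to count the trainable parameters of this architecture by hand. The sketch below copies the layer sizes from the hyperparameters cell (pooling, ReLU, Dropout and Flatten have no parameters); in PyTorch the same total can be obtained with `sum(p.numel() for p in model.parameters())`.

```python
# Layer sizes copied from the hyperparameters cell above
in_channels, out_channels_1, out_channels_2 = 3, 5, 7
kernel_size_1, kernel_size_2 = 4, 3
n_inputs, n_hidden_1, n_hidden_2, n_outputs = 12348, 6174, 3087, 37

def conv_params(c_in, c_out, k):
    # One k x k kernel per input channel for each output channel, plus one bias per output channel
    return c_out * (c_in * k * k + 1)

def linear_params(n_in, n_out):
    # Full weight matrix plus one bias per output unit
    return n_out * (n_in + 1)

layers = {
    "conv1": conv_params(in_channels, out_channels_1, kernel_size_1),
    "conv2": conv_params(out_channels_1, out_channels_2, kernel_size_2),
    "fc1": linear_params(n_inputs, n_hidden_1),
    "fc2": linear_params(n_hidden_1, n_hidden_2),
    "fc3": linear_params(n_hidden_2, n_outputs),
}
total = sum(layers.values())
for name, count in layers.items():
    print(f"{name}: {count:,} ({100 * count / total:.2f}%)")
print(f"total: {total:,}")   # total: 95,419,774
```

The two convolutions hold only a few hundred parameters, while the first fully-connected layer alone holds about 80% of the roughly 95 million parameters of the model.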

Now, we have to define the functions that will perform the training and testing of the CNN.

In [None]:
# We define the training function
def train_loop(dataloader, model, loss_fn, optimizer):
    n_batches = len(dataloader)     # Number of batches per epoch
    tmp = []

    # Enable training behaviour (Dropout active)
    model.train()
    # We iterate over batches
    for batch, (X, y) in enumerate(dataloader):
        # Move the batch to the same device as the model
        X, y = X.to(device), y.to(device)
        # We calculate the model's prediction
        pred = model(X)
        # With the model's prediction we calculate the loss function
        loss = loss_fn(pred, y)

        # We apply the backpropagation method
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Training progress
        loss_value = loss.item()
        tmp.append(loss_value)
        print(f"Batch {batch + 1:>4d}/{n_batches} | Loss = {loss_value:>7f}")
    
    # Average loss over the batches of this epoch
    loss_avg = np.mean(tmp)
    return loss_avg

# We define the test function
def test_loop(dataloader, model, loss_fn):
    test_loss = 0
    j = 0
    
    # Disable training-only behaviour (Dropout)
    model.eval()
    # To test, we need to deactivate the calculation of the gradients
    with torch.no_grad():
        # We iterate over batches
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            # Model's prediction
            pred = model(X)
            # Corresponding errors, which we accumulate in a total value
            test_loss += loss_fn(pred, y).item()
            j += 1
            
    # We calculate the average loss over the batches and print it
    test_loss /= j
    print(f"Test Error: Avg loss = {test_loss:>8f} \n")
    return test_loss

In order to train the model, we need to instantiate an optimizer object and a loss function object. Let us do this.

In [None]:
# Loss function object. It is the Mean Squared Error.
loss_fn = nn.MSELoss()

# We instantiate an optimizer. In this case we choose an Adam optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=lr, eps=1e-08, weight_decay=0, amsgrad=False)
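With its default `reduction='mean'`, `nn.MSELoss` averages the squared differences over *every* element of the batch, i.e. over all galaxies and all 37 components. The NumPy sketch below (toy numbers, 3 components instead of 37) reproduces that computation. Its square root, the root mean squared error (RMSE), is on the same scale as the probabilities and is, to our knowledge, the metric used to rank the Kaggle challenge, so it can be handy when reporting results.

```python
import numpy as np

def mse(pred, target):
    """Mean of the squared differences over every element,
    i.e. what nn.MSELoss computes with reduction='mean'."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return np.mean((pred - target) ** 2)

# Two toy "galaxies" with 3 output components each (illustrative numbers)
pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
target = np.array([[0.8, 0.15, 0.05], [0.0, 1.0, 0.0]])

loss = mse(pred, target)
rmse = np.sqrt(loss)      # root mean squared error
print(loss, rmse)         # loss is 0.075 / 6 = 0.0125 (up to rounding)
```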

In [None]:
# Print model's state_dict size to gain some perspective about the model
print("Model's state_dict size:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

We will plot the loss function against the epochs, so we need to store its value at the end of each training epoch.

In [None]:
# We define a loss array to plot the training loss function and the testing loss function
loss_to_plot = []
loss_to_plot_test = []

We are ready to train the model. Let us train it for $n_{epochs}$ epochs, as defined above.

In [None]:
# We train the model iterating over the different epochs
for t in tqdm(range(n_epochs)):
    print(f"Epoch {t+1}\n=============================================")
    loss_to_plot.append(train_loop(train_dl, model, loss_fn, optimizer))
    loss_to_plot_test.append(test_loop(test_dl, model, loss_fn))
print("Done!")

Now that the model is trained, we would like to convert the probability distributions predicted by the network into words. Each of the 11 questions shown at the beginning, which define the decision tree, has its own set of answers, so the predicted probabilities can be translated into those answers. As a first step, let us save the loss values and compare the predicted probabilities of one test image with its target.

In [None]:
# We save both loss functions
np.savetxt('loss_to_plot.txt', loss_to_plot)
np.savetxt('loss_to_plot_test.txt', loss_to_plot_test)

# We choose an image and calculate the corresponding prediction generated by the model
model.eval()
with torch.no_grad():
    for (X, y) in test_dl:
        # Run the model on its device and bring the prediction back to the CPU
        pred_cpu = model(X.to(device)).cpu()
        image_cpu = X[7]
        target = y
        break

pred_cpu = pred_cpu[7].numpy()
target = target[7].numpy()

# We plot the image to be predicted
fig, ax = plt.subplots(1, 1, dpi=280)
fig.set_size_inches(4.0, 4.0)
ax.axis("off")
#plt.title(labels_map[np.argmax(pred_cpu)])
ax.imshow(image_cpu.permute(1, 2, 0))   # channels x height x width -> height x width x channels

print("Prediction", "\t", "Target", "\n", "_________________________", "\n")
for (pred, tar) in zip(pred_cpu, target):
    print(pred, "\t", tar)
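A first step towards turning the 37 numbers into words is to pick, for each question, the answer with the largest value. Since every answer of a question is multiplied by the same parent value, the weighting does not change which answer wins within a question. The helper below is a hypothetical sketch (name and output format are ours, and it assumes the `ClassQ.A` column order of the solution file); it could be called as `most_likely_answers(pred_cpu)`. For questions that were barely reached (all values close to 0) the "winner" carries little meaning, so one may want to ignore them.

```python
import numpy as np

# Number of possible answers of Q1..Q11 (see the table of questions above)
ANSWERS_PER_QUESTION = [3, 2, 2, 2, 4, 2, 3, 7, 3, 3, 6]

def most_likely_answers(prob_vector):
    """Hypothetical helper: for each question, return the column name (ClassQ.A)
    with the largest predicted value, together with that value."""
    prob_vector = np.asarray(prob_vector, dtype=float)
    if prob_vector.shape != (sum(ANSWERS_PER_QUESTION),):
        raise ValueError("expected a vector with 37 components")
    answers = {}
    start = 0
    for q, n_answers in enumerate(ANSWERS_PER_QUESTION, start=1):
        block = prob_vector[start:start + n_answers]
        best = int(np.argmax(block))              # position of the winning answer
        answers[f"Q{q}"] = (f"Class{q}.{best + 1}", float(block[best]))
        start += n_answers
    return answers

# Toy vector: the galaxy of the weighting example (only Q1 and Q7 filled in)
example = np.zeros(37)
example[0:3] = [0.80, 0.15, 0.05]      # Class1.1 .. Class1.3
example[15:18] = [0.40, 0.20, 0.20]    # Class7.1 .. Class7.3

result = most_likely_answers(example)
print(result["Q1"], result["Q7"])      # ('Class1.1', 0.8) ('Class7.1', 0.4)
```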

Let us plot the loss function as a function of the epochs.

In [None]:
# Load both loss functions
lp = np.loadtxt('loss_to_plot.txt')
lp_test = np.loadtxt('loss_to_plot_test.txt')

# Let us plot both loss functions
fig, ax = plt.subplots(1, 1, figsize=(7,7), dpi=200)
ax.plot([i for i in range(3, n_epochs+1)], lp[2:], color='darkblue', lw=1.5, label='Training error')
ax.plot([i for i in range(3, n_epochs+1)], lp_test[2:], ls=':', color='darkblue', lw=1.5, label='Test error')
ax.set_xlabel('Epoch')
ax.set_ylabel('Average loss function')
#ax.set_xticks([0, 75, 150, 225, 300])
#ax.set_xticklabels(['0', '75', '150', '225', '300'])
#y_ticks = [0.02, 0.025, 0.03, 0.035, 0.04, 0.045]
#ax.set_yticks(y_ticks)
#ax.set_yticklabels([str(y_ticks[i]) for i in range(len(y_ticks))])
plt.legend()
plt.show()
#plt.savefig('loss.jpg', bbox_inches='tight')