## Chapter 11 : Training a Classification model on our data
1. Using the PyTorch dataloader to load the data.
2. Using a classification model to classify the data.
3. Setting up a basic skeleton for our project app.
4. Calculating, logging and displaying various metrics.

We are going to do two things in this chapter: set up a basic training loop and build a basic model that classifies our data. We will use the `Ct` and `LunaDataset` classes that we defined in chapter 10 to feed a dataloader, which in turn feeds the data into our model through the training and validation pipelines.

Let's start with the next step after cleaning our data, i.e. the classification part. We will first build a classifier that labels the candidates as nodules or non-nodules. Determining the malignancy of the tumor comes later.

The basic structure that we are going to implement is as follows
1. Initialize our model and data loading.
2. Loop over the epochs
    1. Loop over each batch of the data.
    2. Calculate the error between the prediction and ground truth of the batch.
    3. The background worker processes load the next batch of the data.
    4. Now backpropagate to get the gradients.
    5. Update the weights.
    6. Record the metrics of the training into a separate data structure.
    7. Load the validation data, classify it and get the error between the prediction and the ground truth on the validation data.
    8. The background worker processes load the validation data in the background.
    9. Record the metrics of the validation step in a data structure.
    10. Print out the progress and performance information about this epoch.

The approach to this part of the project will be more structured than the training loops we implemented in part 1. Our main training application will be made of more well-contained functions, and things like dataset loading will live in separate Python modules.\
\
When working on your own projects, make sure the amount of structure matches the complexity of the project. With too little structure you cannot weed out errors and run training effectively, while too much structure takes your mind off devising a solution to the problem and keeps it focused on setting up infrastructure, which can also be a great procrastination tactic.\
\
We will also focus on correctly collecting and logging the metrics relevant to the task at hand as training progresses, so that we can determine the effect of changes on training. We will lay the infrastructure for those metrics in this chapter and use them in the next chapter.

#### The main entry point of our application
1. First and foremost, we will set up our application so that the script can be run from the command line and can parse its command-line arguments. This makes it easy to run in a wide variety of environments.
2. Our application functionality will be implemented via a class so that we can instantiate it and pass it around for testing, debugging and running. We can invoke the application without spinning up a second OS-level process (we won't do unit testing, but the structure we create will be useful if unit testing is required).
3. One way to take advantage of being able to invoke our training by either a function call or an OS-level process (i.e. running the script from the CLI) is to wrap the function invocations in a Jupyter notebook, so that the training can be started from either the command line or a browser.

#### Workings of the application
1. Let's get some semi-standard boilerplate code out of the way. The first part is the `if __name__ == '__main__':` statement, which creates the application object and invokes its `main` method.
2. Next we create the skeleton of `LunaTrainingApp` with the `__init__` and `main` methods.
3. As we need to parse arguments from the CLI, we use the standard `argparse` library in the app's `__init__` method. We can also pass custom arguments to the `__init__` method should we wish to do so.
4. The `main` method will be the primary entry point to our application.
5. Parsing arguments in the `__init__` method allows us to configure the application separately from invoking it.
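
The points above can be sketched with a small stand-in class. This is a minimal sketch, not the chapter's full application: `DemoTrainingApp` is a hypothetical name, and the `--epochs` default is an assumption chosen only for illustration. It shows how accepting an optional argument list lets the same class be driven from the command line or from a plain function call.

```python
import argparse
import sys


class DemoTrainingApp:
    """Hypothetical stand-in for LunaTrainingApp; only the CLI plumbing is shown."""

    def __init__(self, sys_argv=None):
        if sys_argv is None:
            # Skip sys.argv[0], which is the script name rather than an argument.
            sys_argv = sys.argv[1:]

        parser = argparse.ArgumentParser()
        parser.add_argument('--num-workers', type=int, default=8,
                            help='number of worker processes for background data loading')
        parser.add_argument('--epochs', type=int, default=1,
                            help='number of epochs to train for (default is an assumption)')
        self.cli_args = parser.parse_args(sys_argv)

    def main(self):
        # A real application would build the data loaders and run the epochs here.
        return f"workers={self.cli_args.num_workers}, epochs={self.cli_args.epochs}"


# Invoked as a function call, with an explicit argument list.
app = DemoTrainingApp(['--num-workers', '2', '--epochs', '3'])
print(app.main())

# In a script, the usual entry point would be:
# if __name__ == '__main__':
#     DemoTrainingApp().main()
```

Passing an explicit list is also the convenient route inside a notebook, where `sys.argv` holds the kernel's own arguments rather than ours.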

    ##### Pretraining setup and initialization
    Before we can begin training our model some of the initialization needs to happen.
    1. First we need to initialize our `Dataset` and `DataLoader` instances. The `LunaDataset` class provides the randomized samples that make up one epoch, and the `DataLoader` does the work of loading the data and providing it to our application.
    2. This is implemented by the `initTrainDl` and `initValDl` methods in our code.

    ##### Initializing the model and the optimizer
    For this chapter we will take the inner workings of the model as a blackbox. We will improve on the model architecture in the next chapter.

    ##### Care and feeding of the dataloaders.
    1. The `LunaDataset` class that we built earlier transforms the real-world data into the tensors expected by PyTorch.
    2. For example, `torch.nn.Conv3d` expects an input of shape `N x C x D x H x W` (batch, channels, depth, height, width), which is quite different from the native 3D array that a CT scan provides.
    3. CT scans have only a single channel, but other types of data have more (color images have 3 channels, and astronomical data can have more, up to 12).
    4. We do not have to batch the data ourselves. The PyTorch `DataLoader` handles the batching for us.
    5. In addition to batching, dataloaders can also load data in parallel. Just pass the number of worker processes as `num_workers=...` and the parallelization is handled behind the scenes.
    6. Using the data loading routines of PyTorch can help speed up many projects, because we can overlap data loading and processing with GPU calculations.
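
To make the batching concrete, here is a minimal sketch that uses random tensors in place of real CT data. The `TensorDataset` of fake samples and the batch size of 4 are assumptions for illustration; in the chapter the dataset would be `LunaDataset`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake single-channel "CT chunks" of shape C x D x H x W = 1 x 32 x 48 x 48,
# each with a made-up 0/1 label. Real samples would come from LunaDataset.
samples = torch.randn(10, 1, 32, 48, 48)
labels = torch.randint(0, 2, (10,))
fake_ds = TensorDataset(samples, labels)

# num_workers=0 loads in the main process; a larger value (e.g. num_workers=4)
# hands the loading to background worker processes.
loader = DataLoader(fake_ds, batch_size=4, num_workers=0)

# Each batch gains a leading N dimension, giving N x C x D x H x W.
batch_shapes = [tuple(inputs.shape) for inputs, _ in loader]
print(batch_shapes)
```

The last batch is smaller because ten samples do not divide evenly into batches of four.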

    ##### Our first pass Neural Network Model design
    1. We can use a 3D CNN to recognize our tumors.
    2. We will base our network design on the model that we used in Chapter 8. There we used a 2D model; here we modify it to work in 3D.
    3. The model consists of:
        1. A batch normalization layer. [Tail]
        2. Four repeated blocks of convolution operations. [Backbone]
        3. A linear layer followed by a softmax layer. [Head]

    
    4. ##### The Core Convolutions
        1. CNNs typically have some form of three core parts:
            1. A tail, which is responsible for consuming the input and converting it into the form expected by the backbone. Here we use a simple batch normalization layer, though tails often also contain convolutional layers that downsample the image. We do not need them here because our input is already very small.
            2. A backbone, which performs the convolution operations and contains the bulk of the layers. These are usually arranged in a series of blocks. Each block has the same or a similar set of layers, though the blocks often differ in expected input size and number of filters. We will use a block that contains two 3x3x3 convolutions, each followed by an activation, with a max-pooling layer at the end.
            3. A head, which here is a linear layer followed by a softmax layer.

    ##### The convolution Block
    1. We define a convolution block called `LunaBlock` which, stacked 4 times, makes up the backbone of our model.
    2. This convolution block consists of:
        1. `Convolution Layer 1`
        2. `ReLU Non-linearity`
        3. `Convolution Layer 2`
        4. `ReLU Non-linearity`
        5. `Max Pooling Layer`
    3. This architecture stacks two convolution layers one after another to increase the receptive field, so that the pair sees as much of the input as a single larger kernel (5x5) while having fewer parameters and performing fewer calculations than a 5x5 kernel. Suppose we have a 6x6 image: after the first unpadded 3x3 convolution it shrinks to 4x4, and after the second to 2x2 (ReLU does not change the image size). Each of those output pixels depends on a 5x5 region of the input, so the two convolutions combined have a receptive field of 5x5. A single 5x5 convolution produces the same 2x2 output with the same receptive field, but it needs 25 weights per channel pair instead of 2 x 9 = 18. (The example is 2D for simplicity; our model uses the 3D equivalent.)
    4. Although the example above shows the image shrinking, we actually use padded convolutions, so the image size is unchanged by each convolution.
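
The arithmetic behind the stacked-convolution argument can be checked with a few lines of plain Python. This is a sketch with hypothetical helper names (`conv_out_size`, `stacked_receptive_field`); it assumes stride-1 convolutions and counts weights per input/output channel pair, ignoring biases.

```python
def conv_out_size(size, kernel, padding=0):
    """Output length along one axis for a stride-1 convolution."""
    return size + 2 * padding - kernel + 1


def stacked_receptive_field(kernels):
    """Receptive field along one axis of stride-1 convolutions applied in sequence."""
    field = 1
    for k in kernels:
        field += k - 1
    return field


# Unpadded 6-pixel axis: two 3-wide convolutions versus one 5-wide convolution.
after_two_3 = conv_out_size(conv_out_size(6, 3), 3)
after_one_5 = conv_out_size(6, 5)
rf_two_3 = stacked_receptive_field([3, 3])
rf_one_5 = stacked_receptive_field([5])

# Weights per (input channel, output channel) pair, ignoring biases.
weights_2d = {"two 3x3": 2 * 3 ** 2, "one 5x5": 5 ** 2}
weights_3d = {"two 3x3x3": 2 * 3 ** 3, "one 5x5x5": 5 ** 3}

# With padding=1, a 3-wide convolution keeps the size unchanged.
padded = conv_out_size(6, 3, padding=1)

print(after_two_3, after_one_5, rf_two_3, rf_one_5)
print(weights_2d, weights_3d, padded)
```

The saving is larger in 3D, which is the case that matters for this model: 54 weights against 125.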
    
    ##### The full model
    1. We stack 4 of the convolution blocks, along with a batch norm tail and a linear + softmax head, to form the full model.
    2. Because of the max-pooling layer at the end of each block, every block halves the image in each dimension. By the time the image reaches the end of the backbone, each dimension has been reduced by a factor of 2^4 = 16, so a 32x48x48 image is reduced to 2x3x3.
    3. Finally, our last layer is an `nn.Softmax` layer. This is useful for single-label classification because it squashes the outputs into the range 0 to 1 (summing to 1), and it is relatively insensitive to the absolute range of the inputs (only the relative values of the inputs matter). It also allows our model to express the degree of uncertainty it has in its answer.
    4. After the convolution backbone, each sample has shape `64 x 2 x 3 x 3` (channels x depth x height x width), but our linear layer expects a batch of 1D vectors. So we need to flatten each sample to a vector of 64 * 2 * 3 * 3 = 1152 values in the forward pass.
    5. The head will look similar across a wide variety of problems that use convolutions and produce classification, regression or other non-image output.
    6. The forward function returns both the logits (the raw values produced by the model before they are fed into softmax) and the softmax outputs (the probabilities). We use the logits when we calculate `nn.CrossEntropyLoss` during training, and we use the probabilities when we want to actually classify the samples. This kind of slight difference between what is used for training and what is used in production is fairly common, especially when the difference between the two outputs is a simple, stateless function like softmax. This approach can also provide numerical benefits: propagating gradients through an exponential function using 32-bit floats can be problematic.
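
The numerical point about logits can be illustrated without PyTorch. The sketch below is plain Python with hypothetical helper names and made-up logit values; it is not how `nn.CrossEntropyLoss` is implemented internally, only a demonstration of why computing the loss from logits (via log-sum-exp) is safer than taking the log of a softmax probability.

```python
import math


def softmax(logits):
    """Softmax with the usual max-subtraction trick for numerical stability."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(logits, target_ndx):
    """-log(softmax(logits)[target_ndx]), computed via log-sum-exp."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum_exp - logits[target_ndx]


logits = [2.0, -1.0]          # hypothetical raw outputs for (non-nodule, nodule)
probs = softmax(logits)

# Only the differences between logits matter: shifting them all changes nothing.
shifted = softmax([x + 100.0 for x in logits])

loss = cross_entropy_from_logits(logits, target_ndx=1)

# With extreme logits the probability underflows to 0.0, so -log(prob) would fail,
# while the logit-based loss stays finite.
extreme = [1000.0, -1000.0]
extreme_prob = softmax(extreme)[1]
extreme_loss = cross_entropy_from_logits(extreme, target_ndx=1)

print(probs, loss, extreme_prob, extreme_loss)
```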

    ##### Initialization
    1. Let's talk about initializing the weights. Imagine a degenerate case where the weights are all initialized to values greater than 1: repeated multiplication by those weights makes the layer outputs larger and larger as the data flows from the initial layers to the later ones. Conversely, if all the weights are less than 1, the layer outputs get smaller and smaller the further the activations travel.
    2. Many normalization techniques can alleviate this problem, but the simplest is to initialize the weights so that intermediate values are neither very large nor very small.
    3. PyTorch's default initialization does not do this for us as well as we would like, so we initialize the weights ourselves using the functions in `torch.nn.init`.
    4. We can treat the `_init_weights` function as boilerplate without needing to understand it in detail.
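
For readers who do want a feel for the numbers, the scale that Kaiming-normal initialization picks can be worked out by hand. This is a sketch in plain Python with hypothetical helper names; it assumes `mode='fan_out'` and the ReLU gain of sqrt(2), matching the arguments passed to `nn.init.kaiming_normal_` in this notebook's `_init_weights`, and uses the first convolution (1 input channel, 8 output channels, 3x3x3 kernel) as the example.

```python
import math


def conv3d_fans(in_channels, out_channels, kernel_size):
    """fan_in and fan_out for a Conv3d weight of shape (out, in, k, k, k)."""
    receptive_field = kernel_size ** 3
    return in_channels * receptive_field, out_channels * receptive_field


def kaiming_normal_std(fan, gain=math.sqrt(2.0)):
    """Standard deviation used by Kaiming-normal init; sqrt(2) is the ReLU gain."""
    return gain / math.sqrt(fan)


# First convolution of the first block: 1 input channel, 8 output channels.
fan_in, fan_out = conv3d_fans(1, 8, 3)
weight_std = kaiming_normal_std(fan_out)

# The bias initialization in _init_weights derives its `bound` from fan_out as well.
bias_bound = 1 / math.sqrt(fan_out)

print(fan_in, fan_out, round(weight_std, 4), round(bias_bound, 4))
```

Wider layers have a larger fan and so receive proportionally smaller initial weights, which is what keeps the layer outputs from growing or vanishing.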

    ##### Training and Validating the model
    1. Now it's time to take the various pieces we have been working on and assemble them into something we can actually execute.
    2. In the training loop (the `doTraining` method, called from `main`):
        1. The `trnMetrics_g` tensor collects detailed per-class metrics during training. This type of insight is very useful in larger projects.
        2. We do not iterate directly over the `train_dl` dataloader; we wrap it in `enumerateWithEstimate`, which logs an estimated time of completion. This is a stylistic choice.
        3. The actual loss computation is done by the `computeBatchLoss` method. This is not necessary, but it is good practice to split the work up into functions.
    3. The `trnMetrics_g` tensor transports information about how the model is behaving on a per-sample basis from the `computeBatchLoss` function to the `logMetrics` function.

        ##### The `computeBatchLoss` function:
        1. This function is responsible for computing the actual loss and is called by both the training and validation loops.
        2. It computes the loss and also records the per-sample output that our model is producing. This lets us compute the percentage of correct answers per class.
        3. In this function we do not use the standard loss averaged over the batch. Instead, we get a tensor of loss values, one per sample, which we can aggregate as our problem requires. In projects where you do not need the individual loss values per sample, it is perfectly fine to use the loss averaged over the batch.
        4. If we returned the batch loss and did nothing else, that would be enough for the backpropagation and weight update steps. But we also need to record per-sample stats for posterity and later analysis. We use the `metrics_g` parameter passed in to accomplish this.
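
The difference between the per-sample loss and the batch-averaged loss can be seen on a tiny batch. This is a minimal sketch with made-up logits and labels; the 3 x 3 `metrics_g` tensor here is a small stand-in for the real metrics tensor.

```python
import torch
from torch import nn

# Hypothetical logits for a batch of three samples, and their class indices.
logits = torch.tensor([[2.0, -1.0], [0.5, 0.5], [-1.0, 3.0]])
labels = torch.tensor([0, 1, 1])

# reduction='none' keeps one loss value per sample.
per_sample_loss = nn.CrossEntropyLoss(reduction='none')(logits, labels)

# The default reduction='mean' averages over the batch.
mean_loss = nn.CrossEntropyLoss()(logits, labels)

# The per-sample values can be stored for later analysis; detach() drops the
# link to the autograd graph, since the stored copy needs no gradients.
metrics_g = torch.zeros(3, 3)
metrics_g[2, 0:3] = per_sample_loss.detach()

print(per_sample_loss, mean_loss)
```

Averaging the per-sample tensor recovers the usual scalar loss, so nothing is lost by asking for the finer-grained values first.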

        ##### The validation loop
        1. The validation loop is similar to the training loop with respect to the metrics calculated on a per-sample basis. The difference is that we do not calculate gradients or update the weights.

    ##### Outputting the performance metrics
    
    1. The last thing we do each epoch is log our performance metrics for that epoch. Once the metrics are logged, we move on to the next epoch. Logging results and progress as we go is important because if training is not converging we can notice it early and stop training the model.
    2. Earlier we collected our information in `trnMetrics_g` and `valMetrics_g` for per-epoch logging. From each of these tensors we now need to compute the percentage correct and the average loss per class for the training and validation runs.
    3. Logging metrics every epoch is a common choice, but the logs get reasonably big if we train for many epochs. In the future we could instead log only after a set number of epochs.
    4. ##### The logMetrics function
        1. The function takes the `epochs_ndx` argument, which identifies the epoch the metrics belong to.
        2. The `mode_str` argument tells us whether the metrics are from training or validation.
        3. The metrics tensor (`trnMetrics_t` or `valMetrics_t`) passed into the function holds the metrics of the training or validation run. It was filled in by the `computeBatchLoss` function on the training device (the GPU, when available) and then transferred to the CPU. Both tensors have 3 rows and as many columns as we have samples.
    5. ##### Tensor masking and Boolean Indexing
        1. Masks are a common usage pattern with arrays and tensors.
        2. We use masks to count the nodule and non-nodule samples in the data: the total number of samples per class, as well as how many of them we classified correctly.
        3. First we calculate and track the overall loss. Since this is the metric being optimized, we always keep track of it.
        4. Then we limit the loss averaging to only those samples with negative labels using the `negLabel_mask`. We do the same for the positive class.
        5. Calculating the loss per class is useful if we have a class that is consistently harder to classify, since that can help drive investigation and improvements.
        6. We close out the calculations by determining the fraction of samples we classified correctly overall, as well as the fraction correct for each label. Since we display these numbers as percentages, we also multiply the values by 100.
        7. After the calculations we log our results with three calls to `log.info`.
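
The masking steps above can be tried on a toy metrics tensor. This is a sketch with made-up values for five samples; the row order (label, prediction, loss) follows the `METRICS_*_NDX` constants used in this notebook, and the 0.5 threshold is an assumption.

```python
import torch

# Rows: label, predicted probability of "nodule", per-sample loss (made-up values).
metrics_t = torch.tensor([
    [0.0, 0.0, 0.0, 1.0, 1.0],   # labels
    [0.1, 0.7, 0.2, 0.9, 0.4],   # predictions
    [0.1, 1.2, 0.2, 0.1, 0.9],   # losses
])
threshold = 0.5

# Boolean masks: one True/False entry per sample.
neg_label_mask = metrics_t[0] <= threshold
neg_pred_mask = metrics_t[1] <= threshold
pos_label_mask = ~neg_label_mask
pos_pred_mask = ~neg_pred_mask

# Summing a mask counts its True entries; & combines two conditions.
neg_count = int(neg_label_mask.sum())
pos_count = int(pos_label_mask.sum())
neg_correct = int((neg_label_mask & neg_pred_mask).sum())
pos_correct = int((pos_label_mask & pos_pred_mask).sum())

# Indexing with a mask keeps only the selected columns.
neg_loss = metrics_t[2, neg_label_mask].mean()
pos_loss = metrics_t[2, pos_label_mask].mean()
pct_correct = 100 * (neg_correct + pos_correct) / metrics_t.shape[1]

print(neg_count, pos_count, neg_correct, pos_correct, pct_correct)
```

Here three of the five samples are classified correctly, and the per-class breakdown shows the errors split across both classes (2 of 3 negatives and 1 of 2 positives correct), which a single overall percentage would hide.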
        
    ##### Running the training script
    1. Now that we have completed the training script, we need to run it. This will initialize and train our model while printing stats on how well the model is performing.
        
    

In [2]:
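import sys
import math
import time
import argparse
import logging
import datetime
import numpy as np
import torch
from torch import nn, optim
from dataset import LunaDataset
from torch.utils.data import DataLoader

# Module-level logger: the code below calls log.info() and log.warning().
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)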

##### Defining the CNN model

In [3]:
class LunaBlock(nn.Module):
    def __init__(self, in_channels, conv_channels):
        super().__init__()

        self.conv1 = nn.Conv3d(in_channels, conv_channels, kernel_size=3, padding=1, bias = True)

        self.relu1 = nn.ReLU(inplace=True)

        # conv2 consumes the output of conv1, so its input has conv_channels channels
        self.conv2 = nn.Conv3d(conv_channels, conv_channels, kernel_size = 3, padding = 1, bias = True)

        self.relu2 = nn.ReLU(inplace = True)

        self.max_pool = nn.MaxPool3d(2,2)

    def forward(self, input_batch):
        block_out = self.conv1(input_batch)
        block_out = self.relu1(block_out)     # this could be implemented as calls to functional API instead
        block_out = self.conv2(block_out)
        block_out = self.relu2(block_out)     # this could be implemented as calls to functional API instead
        block_out = self.max_pool(block_out)  # this could be implemented as calls to functional API instead

        return block_out


In [4]:
class LunaModel(nn.Module):
    def __init__(self, in_channels = 1 , conv_channels = 8):
        super().__init__()
        
        self.tail_batchnorm = nn.BatchNorm3d(1)

        self.block1 = LunaBlock(in_channels, conv_channels)
        self.block2 = LunaBlock(conv_channels, conv_channels*2)
        self.block3 = LunaBlock(conv_channels*2, conv_channels*4)
        self.block4 = LunaBlock(conv_channels*4, conv_channels*8)

        self.head_linear = nn.Linear(1152, 2)
        self.head_softmax = nn.Softmax(dim=1)

        self._init_weights()   # apply our own weight initialization

    def _init_weights(self):
        for m in self.modules():
            if type(m) in {nn.Linear, nn.Conv3d}:
                nn.init.kaiming_normal_(m.weight.data, a = 0, mode = 'fan_out', nonlinearity='relu')
                # The bias is only touched for the layer types matched above
                if m.bias is not None:
                    fan_in, fan_out = nn.init._calculate_fan_in_and_fan_out(m.weight.data)
                    bound = 1 / math.sqrt(fan_out)
                    nn.init.normal_(m.bias, -bound, bound)

    def forward(self, input_batch):

        bn_output = self.tail_batchnorm(input_batch)

        block_1_out = self.block1(bn_output)
        block_2_out = self.block2(block_1_out)
        block_3_out = self.block3(block_2_out)
        block_4_out = self.block4(block_3_out)

        conv_flat = block_4_out.view(block_4_out.size(0), -1)  # Flattening to (batch_size, -1)

        linear_output = self.head_linear(conv_flat)
        softmax_output = self.head_softmax(linear_output)

        return linear_output, softmax_output
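

The `1152` passed to `nn.Linear` above follows from the input size and the number of blocks. The sketch below redoes that arithmetic in plain Python, assuming a 32 x 48 x 48 input chunk, padded convolutions that keep the size, and the default `conv_channels = 8`.

```python
input_size = (32, 48, 48)   # depth, height, width of one candidate chunk
conv_channels = 8           # LunaModel default
num_blocks = 4

# Each block ends in MaxPool3d(2, 2), which halves every spatial dimension.
spatial = input_size
for _ in range(num_blocks):
    spatial = tuple(s // 2 for s in spatial)

# Channels double from block to block: 8, 16, 32, 64.
out_channels = conv_channels * 2 ** (num_blocks - 1)

# Flattening one sample gives channels * depth * height * width values.
flat_features = out_channels * spatial[0] * spatial[1] * spatial[2]
print(spatial, out_channels, flat_features)
```

If the input chunk size or the number of blocks changes, the first argument of `nn.Linear` has to change with it.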


In [5]:
# Named row indices into the metrics tensors, declared at the module level.
METRICS_LABEL_NDX = 0
METRICS_PRED_NDX = 1
METRICS_LOSS_NDX = 2
METRICS_SIZE = 3         # number of rows in a metrics tensor

def enumerateWithEstimate(
        iter,
        desc_str,
        start_ndx=0,
        print_ndx=4,
        backoff=None,
        iter_len=None,
):
    """
    In terms of behavior, `enumerateWithEstimate` is almost identical
    to the standard `enumerate` (the differences are things like how
    our function returns a generator, while `enumerate` returns a
    specialized `<enumerate object at 0x...>`).
    However, the side effects (logging, specifically) are what make the
    function interesting.
    :param iter: `iter` is the iterable that will be passed into
        `enumerate`. Required.
    :param desc_str: This is a human-readable string that describes
        what the loop is doing. The value is arbitrary, but should be
        kept reasonably short. Things like `"epoch 4 training"` or
        `"deleting temp files"` or similar would all make sense.
    :param start_ndx: This parameter defines how many iterations of the
        loop should be skipped before timing actually starts. Skipping
        a few iterations can be useful if there are startup costs like
        caching that are only paid early on, resulting in a skewed
        average when those early iterations dominate the average time
        per iteration.
        NOTE: Using `start_ndx` to skip some iterations makes the time
        spent performing those iterations not be included in the
        displayed duration. Please account for this if you use the
        displayed duration for anything formal.
        This parameter defaults to `0`.
    :param print_ndx: determines which loop iteration the timing
        logging will start on. The intent is that we don't start
        logging until we've given the loop a few iterations to let the
        average time-per-iteration a chance to stabilize a bit. We
        require that `print_ndx` not be less than `start_ndx` times
        `backoff`, since `start_ndx` greater than `0` implies that the
        early N iterations are unstable from a timing perspective.
        `print_ndx` defaults to `4`.
    :param backoff: This is used to determine how many iterations to skip
        before logging again. Frequent logging is less interesting later on,
        so by default we multiply the gap between logging messages by
        `backoff` each time after the first.
        `backoff` defaults to `2`, and is doubled for as long as
        `backoff ** 7` is less than `iter_len` (so it becomes `4` once
        `iter_len` exceeds 128).
    :param iter_len: Since we need to know the number of items to
        estimate when the loop will finish, that can be provided by
        passing in a value for `iter_len`. If a value isn't provided,
        then it will be set by using the value of `len(iter)`.
    :return:
    """
    if iter_len is None:
        iter_len = len(iter)

    if backoff is None:
        backoff = 2
        while backoff ** 7 < iter_len:
            backoff *= 2

    assert backoff >= 2
    while print_ndx < start_ndx * backoff:
        print_ndx *= backoff

    log.warning("{} ----/{}, starting".format(
        desc_str,
        iter_len,
    ))
    start_ts = time.time()
    for (current_ndx, item) in enumerate(iter):
        yield (current_ndx, item)
        if current_ndx == print_ndx:
            # ... <1>
            duration_sec = ((time.time() - start_ts)
                            / (current_ndx - start_ndx + 1)
                            * (iter_len-start_ndx)
                            )

            done_dt = datetime.datetime.fromtimestamp(start_ts + duration_sec)
            done_td = datetime.timedelta(seconds=duration_sec)

            log.info("{} {:-4}/{}, done at {}, {}".format(
                desc_str,
                current_ndx,
                iter_len,
                str(done_dt).rsplit('.', 1)[0],
                str(done_td).rsplit('.', 1)[0],
            ))

            print_ndx *= backoff

        if current_ndx + 1 == start_ndx:
            start_ts = time.time()

    log.warning("{} ----/{}, done at {}".format(
        desc_str,
        iter_len,
        str(datetime.datetime.now()).rsplit('.', 1)[0],
    ))

class LunaTrainingApp():

    def __init__(self, sys_argv = None):
        # If no argument list is passed in, fall back to the command-line arguments.
        if sys_argv is None:
            sys_argv = sys.argv[1:]    # skip sys.argv[0], which is the script name

        # Then we instantiate the argument parser object.
        parser = argparse.ArgumentParser()

        # --num-workers lets us specify how many background workers are used for data loading.
        parser.add_argument('--num-workers',
                                help = 'number of worker processes for background data loading',
                                type = int,
                                default = 8)

        # --batch-size and --epochs are read later by the data loaders and by main().
        # Their default values here are assumptions; adjust them to your hardware.
        parser.add_argument('--batch-size',
                                help = 'batch size to use for training',
                                type = int,
                                default = 32)
        parser.add_argument('--epochs',
                                help = 'number of epochs to train for',
                                type = int,
                                default = 1)

        # Then we parse the arguments and assign them to the cli_args attribute.
        self.cli_args = parser.parse_args(sys_argv)

        # We store the start time as a formatted string in the time_str attribute.
        self.time_str = datetime.datetime.now().strftime("%Y-%m-%d_%H:%M:%S")

        # Running count of training samples seen, updated in doTraining.
        self.totalTrainingSamples_count = 0

        # Now we will initialize the model and the optimizer

        # First we check whether a GPU is available
        self.use_cuda = torch.cuda.is_available()
        # If a GPU is available then use it, else use the CPU
        self.device = torch.device('cuda' if self.use_cuda else 'cpu')

        # Initialize the model
        self.model = self.initModel()

        # Initialize the optimizer
        self.optimizer = self.initOptimizer()

    def initModel(self):
        """ This function initializes the model and transfers the model and its parameters to the GPU.
            If multiple GPUs are available, the model computations are run in parallel and the results are synced and returned. """

        # initialize the model
        model = LunaModel()

        
        if self.use_cuda:
            log.info(f"Using CUDA : {torch.cuda.device_count()} devices")
            # If multiple GPU's are available then execute the computations in parallel
            if torch.cuda.device_count()>1:
                model = nn.DataParallel(model)
            # move the model to the device
            model = model.to(self.device)
        return model
    
    def initOptimizer(self):
        return optim.SGD(self.model.parameters(), lr= 0.001,momentum = 0.99)

    # Now lets put the training data into the dataloader.
    def initTrainDl(self):
        train_ds = LunaDataset(val_stride= 10, is_val_set_bool = False)


        batch_size = self.cli_args.batch_size
        if self.use_cuda:
            batch_size *= torch.cuda.device_count()   # scale the local batch size by the number of GPUs

        train_dl = DataLoader(train_ds,
                                batch_size = batch_size,
                                num_workers = self.cli_args.num_workers,
                                pin_memory = self.use_cuda)
        return train_dl

    def initValDl(self):
        val_ds = LunaDataset(val_stride = 10, is_val_set_bool=True)
        batch_size = self.cli_args.batch_size

        if self.use_cuda:
            batch_size *= torch.cuda.device_count()   # scale the local batch size by the number of GPUs
        
        val_dl = DataLoader(val_ds,
                            batch_size = batch_size,
                            num_workers= self.cli_args.num_workers,
                            pin_memory = self.use_cuda)

        return val_dl
    
    def logMetrics(self, epochs_ndx, mode_str, metrics_t, classification_threshold = 0.5):
        negLabel_mask = metrics_t[METRICS_LABEL_NDX] <= classification_threshold   # samples labeled non-nodule
        negPred_mask = metrics_t[METRICS_PRED_NDX] <= classification_threshold     # samples predicted non-nodule


        posLabel_mask = ~negLabel_mask   # samples labeled nodule
        posPred_mask = ~negPred_mask     # samples predicted nodule


        # Next we use the masks to calculate some per-label statistics and store them in the dictionary metrics_dict
        metrics_dict = {}
        neg_count = int(negLabel_mask.sum())
        pos_count = int(posLabel_mask.sum())

        neg_correct = int((negLabel_mask & negPred_mask).sum())
        pos_correct = int((posLabel_mask & posPred_mask).sum())

        metrics_dict['loss/all'] = metrics_t[METRICS_LOSS_NDX].mean()
        metrics_dict['loss/neg'] = metrics_t[METRICS_LOSS_NDX, negLabel_mask].mean()
        metrics_dict['loss/pos'] = metrics_t[METRICS_LOSS_NDX, posLabel_mask].mean()
        metrics_dict['correct/all'] = (pos_correct + neg_correct)/ np.float32(metrics_t.shape[1]) * 100
        metrics_dict['correct/neg'] = neg_correct/ np.float32(neg_count) * 100
        metrics_dict['correct/pos'] = pos_correct/ np.float32(pos_count) * 100


        log.info(f"E{epochs_ndx}, {mode_str[:8]}, {metrics_dict['loss/all']:.4f} LOSS, {metrics_dict['correct/all']:-5.1f}% CORRECT")
        log.info(f"E{epochs_ndx}, {mode_str[:8]}, {metrics_dict['loss/neg']:.4f} LOSS, {metrics_dict['correct/neg']:-5.1f}% CORRECT {neg_correct} of {neg_count}")
        log.info(f"E{epochs_ndx}, {mode_str[:8]}, {metrics_dict['loss/pos']:.4f} LOSS, {metrics_dict['correct/pos']:-5.1f}% CORRECT {pos_correct} of {pos_count}")

        





    

    def computeBatchLoss(self, batch_ndx, batch_tup, batch_size,  metrics_g):
        input_t, label_t, _series_list, _center_list = batch_tup

        input_g = input_t.to(self.device, non_blocking = True)
        label_g = label_t.to(self.device, non_blocking = True)


        logits_g, probability_g = self.model(input_g)
        loss_fn = nn.CrossEntropyLoss(reduction = 'none')  # reduction = 'none' gives the loss per sample

        loss_g = loss_fn(logits_g, label_g[:,1])

        start_ndx = batch_ndx * batch_size
        end_ndx = start_ndx + label_t.size(0)

        metrics_g[METRICS_LABEL_NDX, start_ndx:end_ndx] = label_g[:,1].detach()            # We use detach since none of these need to hold gradients.
        metrics_g[METRICS_PRED_NDX, start_ndx:end_ndx] = probability_g[:, 1].detach()
        metrics_g[METRICS_LOSS_NDX, start_ndx:end_ndx] = loss_g.detach()                   # loss_g is 1D: one loss value per sample


        return loss_g.mean()  # recombines the loss per sample to a single value averaged over the entire batch 


    def doTraining(self, epochs_ndx, train_dl):
            self.model.train()
            trnMetrics_g = torch.zeros(METRICS_SIZE, len(train_dl.dataset), device = self.device)  # Zero-filled tensor for the per-sample metrics

            # Sets up batch looping with a time estimate; timing starts once the workers have warmed up
            batch_iter = enumerateWithEstimate(train_dl, f"E{epochs_ndx} Training", start_ndx = train_dl.num_workers)

            for batch_ndx, batch_tup in batch_iter:
                self.optimizer.zero_grad()   # Clears gradients left over from the previous batch

                loss_var = self.computeBatchLoss(batch_ndx, batch_tup, train_dl.batch_size, trnMetrics_g)

                loss_var.backward()     # Backpropagates

                self.optimizer.step()   # Updates the model weights

            self.totalTrainingSamples_count += len(train_dl.dataset)

            return trnMetrics_g.to('cpu')
            

    def doValidation(self, epochs_ndx, val_dl):
            with torch.no_grad():
                self.model.eval()

                valMetrics_g = torch.zeros(METRICS_SIZE, len(val_dl.dataset), device = self.device)

                batch_iter = enumerateWithEstimate(val_dl, f"E{epochs_ndx} Validation", start_ndx = val_dl.num_workers)
                
                for batch_ndx, batch_tup in batch_iter:
                    self.computeBatchLoss(batch_ndx, batch_tup, val_dl.batch_size, valMetrics_g)

            return valMetrics_g.to('cpu')   # We would need to measure the validation statistics for each sample


    def main(self):
        
        log.info(f"Starting {type(self).__name__}, {self.cli_args}")
        train_dl = self.initTrainDl()
        val_dl = self.initValDl()

        for epochs_ndx in range(1, self.cli_args.epochs + 1):
            trnMetrics_t = self.doTraining(epochs_ndx, train_dl)
            self.logMetrics(epochs_ndx, 'trn', trnMetrics_t)

            valMetrics_t = self.doValidation(epochs_ndx, val_dl)
            self.logMetrics(epochs_ndx, 'val', valMetrics_t)

if __name__ == '__main__':
    LunaTrainingApp().main()

usage: ipykernel_launcher.py [-h] [--num-workers NUM_WORKERS]
ipykernel_launcher.py: error: unrecognized arguments: c:\Users\Anant\anaconda3\lib\site-packages\ipykernel_launcher.py --ip=127.0.0.1 --stdin=9003 --control=9001 --hb=9000 --Session.signature_scheme="hmac-sha256" --Session.key=b"f938c452-538d-451e-acb1-1280e1d17eed" --shell=9002 --transport="tcp" --iopub=9004 --f=c:\Users\Anant\AppData\Roaming\jupyter\runtime\kernel-v2-18608OGI2p1xdQlZW.json


SystemExit: 2

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)


In [1]:
import sys
print(sys.argv)

['c:\\Users\\Anant\\anaconda3\\lib\\site-packages\\ipykernel_launcher.py', '--ip=127.0.0.1', '--stdin=9003', '--control=9001', '--hb=9000', '--Session.signature_scheme="hmac-sha256"', '--Session.key=b"cef03c5f-6114-44be-b09b-9f9ff47f28fd"', '--shell=9002', '--transport="tcp"', '--iopub=9004', '--f=c:\\Users\\Anant\\AppData\\Roaming\\jupyter\\runtime\\kernel-v2-16508q7E1hkay3Uch.json']


In [2]:
import logging

logger = logging.getLogger()
print(logger)

