This is a high-level roadmap
1. Data loading of the <strong>.mhd</strong> and <strong>.raw</strong> file. 
2. Segmentation (ch13)
3. Grouping (ch14) 
4. Nodule classification (0/1)
5. Nodule analysis and diagnosis (Malignant/Benign)


As a reminder, we will classify candidates as nodules or non-nodules (we’ll build another classifier to attempt to tell malignant nodules from benign ones in chapter 14). That means we’re going to assign a single, specific label to each sample that we present to the model. In this case, those labels are “nodule” and “non-nodule,” since each sample represents a single candidate.

In [1]:
import torch

In [None]:
import datetime

import training
from util.util import importstr
from util.logconf import logging

log = logging.getLogger('nb') #<1>

def run(app, *argv):
    argv = list(argv)
    argv.insert(0, '--num-workers=1') #<2>
    #log.info("Running: {}({!r}).main()".format(app, argv))
    x = app.rsplit('.', 1)
    app_cls = importstr(*x)
   # print(argv)
    app_cls(argv).main() 
   # log.info("Finished: {}.{!r}).main()".format(app, argv)) #<3>

#if __name__ == "__main__": #<4>
run('prepcache.LunaPrepCacheApp')
run('training.LunaTrainingApp', '--epochs=1')

2022-10-17 14:49:59,461 INFO     pid:24901 training:081:initModel Using Apple's M1 chip as a GPU device.
2022-10-17 14:49:59,470 INFO     pid:24901 training:129:main Starting LunaTrainingApp, Namespace(num_workers=1, batch_size=32, epochs=1, tb_prefix='p2ch11', comment='dwlpt')


Successfully transfered model to device


2022-10-17 14:50:01,677 INFO     pid:24901 dsets:182:__init__ <dsets.LunaDataset object at 0x7f90922cbf70>: 149400 training samples
2022-10-17 14:50:01,684 INFO     pid:24901 dsets:182:__init__ <dsets.LunaDataset object at 0x7f9081b814f0>: 16601 validation samples
2022-10-17 14:50:01,685 INFO     pid:24901 training:136:main Epoch 1 of 1, 4669/519 batches of size 32*1


Successfully initialized models
Started batch iteration


In [None]:
run('training.LunaTrainingApp', '--epochs=1')

1. Logging is the process of writing information into log files. Log files contain information about various events that happened in operating system, software, or in communication. (https://docs.python.org/3/howto/logging.html)


2. We assume you have a four-core, eight- thread CPU. Change the 4 if needed.
3. This is a slightly cleaner call to \_\_import\_\_
4. As we executed script.py directly <strong>\_\_name\_\_</strong> variable will be <strong>\_\_main\_\_</strong>.  Thus, you can test whether your script is being run directly or being imported by something else by testing <strong>\_\_name\_\_</strong>. If script is getting imported by some other module at that time <strong>\_\_name\_\_</strong> will be module name. if __name__ == “main”: is used to execute some code only if the file was run directly, and not imported.


One way to take advantage of being able to invoke our training by either function call or OS-level process is to wrap the function invocations into a Jupyter Notebook so the code can easily be called from either the native CLI or the browser.

### logging
(https://docs.python.org/3/howto/logging.html)<br>


The logging module in Python is a ready-to-use and powerful module that is designed to meet the needs of beginners as well as enterprise teams. By default, there are 5 standard levels indicating the severity of events. Each has a corresponding method that can be used to log events at that level of severity.
- DEBUG
- INFO
- WARNING
- ERROR
- CRITICAL


The output shows the severity level before each message along with root, which is the name the logging module gives to its default logger. This format, which shows the level, name, and message separated by a colon (:), is the default output format that can be configured to include things like timestamp, line number, and other details.

## "training.py" file

In [5]:
# Used for computeBatchLoss and logMetrics to index into metrics_t/metrics_a

METRICS_LABEL_NDX=0
METRICS_PRED_NDX=1
METRICS_LOSS_NDX=2
METRICS_SIZE = 3


class LunaTrainingApp:
    def __init__(self, sys_argv=None):
        if sys_argv is None: #<2>
            sys_argv = sys.argv[1:]
        parser = argparse.ArgumentParser()
        parser.add_argument('--num-workers',
            help='Number of worker processes for background data loading',
            default=8,
            type=int,
        )
        parser.add_argument('--batch-size',
            help='Batch size to use for training',
            default=32,
            type=int,
        )
        parser.add_argument('--epochs',
            help='Number of epochs to train for',
            default=1,
            type=int,
        )
        self.cli_args = parser.parse_args(sys_argv)
        self.time_str = datetime.datetime.now().strftime('%Y-%m-%d_%H.%M.%S') #<3>
        self.use_mps1 = torch.backends.mps.is_available()
        self.use_mps2 = torch.backends.mps.is_built()
        self.device = torch.device("mps" if self.use_mps1 and self.usemps2 else "cpu") #<6>
        self.model = self.initModel()
        self.optimizer = self.initOptimizer()
        
        
    def initModel(self):
        model = LunaModel()
        if self.use_mps1 and self.use_mps2:
            log.info("Using Apple's M1 chip as a GPU device.")
            model = model.to(self.device) #<4>
        return model

    def initOptimizer(self):#<5>
        return SGD(self.model.parameters(), lr=0.001, momentum=0.99)
    
    def initTrainDl(self):
        train_ds = LunaDataset(
            val_stride=10,
            isValSet_bool=False,
        )

        batch_size = self.cli_args.batch_size
     #   if self.use_cuda:
     #       batch_size *= torch.cuda.device_count()

        train_dl = DataLoader(
            train_ds,
            batch_size=batch_size,
            num_workers=self.cli_args.num_workers,
            pin_memory=(self.use_mps1 and self.use_mps2) ,
        )

        return train_dl

    def initValDl(self):
        val_ds = LunaDataset(
            val_stride=10,
            isValSet_bool=True,
        )

        batch_size = self.cli_args.batch_size
      #  if self.use_cuda:
      #      batch_size *= torch.cuda.device_count()

        val_dl = DataLoader(
            val_ds,
            batch_size=batch_size,
            num_workers=self.cli_args.num_workers,
            pin_memory= (self.use_mps1 and self.use_mps2),
        )

        return val_dl


    def main(self):
        log.info("Starting {}, {}".format(type(self).__name__, self.cli_args))
        train_dl = self.initTrainDl()
        val_dl = self.initValDl()
        
        print("Successfully initialized models")
        for epoch_ndx in range(1, self.cli_args.epochs + 1):

            log.info("Epoch {} of {}, {}/{} batches of size {}*{}".format(
                epoch_ndx,
                self.cli_args.epochs,
                len(train_dl),
                len(val_dl),
                self.cli_args.batch_size,
                1,
            ))
            
            trnMetrics_t = self.doTraining(epoch_ndx, train_dl)
            self.logMtrics(epoch_ndx, 'trn', trnMetrics_t)

            valMetrics_t = self.doValidation(epoch_ndx, val_dl)
            self.logMetrics(epoch_ndx, 'val', valMetrics_t)

        if hasattr(self, 'trn_writer'):
            self.trn_writer.close()
            self.val_writer.close()
        
        
        
        #....
    def doTraining(self, epoch_ndx, train_dl):
        self.model.train()
        trnMetrics_g = torch.zeros(
            METRICS_SIZE,
            len(train_dl.dataset),
            device=self.device,
        )

        batch_iter = enumerateWithEstimate(
            train_dl,
            "E{} Training".format(epoch_ndx),
            start_ndx=train_dl.num_workers,
        )
        print("Started batch iteration")
        for batch_ndx, batch_tup in batch_iter:
            self.optimizer.zero_grad()

            loss_var = self.computeBatchLoss(
                batch_ndx,
                batch_tup,
                train_dl.batch_size,
                trnMetrics_g
            )
            loss_var.backward()
            self.optimizer.step()

            # # This is for adding the model graph to TensorBoard.
            # if epoch_ndx == 1 and batch_ndx == 0:
            #     with torch.no_grad():
            #         model = LunaModel()
            #         self.trn_writer.add_graph(model, batch_tup[0], verbose=True)
            #         self.trn_writer.close()
        print("Finished batch iteration")
        self.totalTrainingSamples_count += len(train_dl.dataset)

        return trnMetrics_g.to('cpu')

     def doValidation(self, epoch_ndx, val_dl): #<11>
        with torch.no_grad():
            self.model.eval()
            valMetrics_g = torch.zeros(
                METRICS_SIZE,
                len(val_dl.dataset),
                device=self.device,
            )

            batch_iter = enumerateWithEstimate(
                val_dl,
                "E{} Validation ".format(epoch_ndx),
                start_ndx=val_dl.num_workers,
            )
            for batch_ndx, batch_tup in batch_iter:
                self.computeBatchLoss(
                    batch_ndx, batch_tup, val_dl.batch_size, valMetrics_g)

        return valMetrics_g.to('cpu')

    def computeBatchLoss(self, batch_ndx, batch_tup, batch_size, metrics_g): #<7>
        input_t, label_t, _series_list, _center_list = batch_tup

        input_g = input_t.to(self.device, non_blocking=True) #<8>
        label_g = label_t.to(self.device, non_blocking=True) #<8>

        logits_g, probability_g = self.model(input_g)

        loss_func = nn.CrossEntropyLoss(reduction='none') #<9>
        loss_g = loss_func(
            logits_g,
            label_g[:,1],
         )
        start_ndx = batch_ndx * batch_size
        end_ndx = start_ndx + label_t.size(0)

        metrics_g[METRICS_LABEL_NDX, start_ndx:end_ndx] = \
            label_g[:,1].detach() #<10>
        metrics_g[METRICS_PRED_NDX, start_ndx:end_ndx] = \
            probability_g[:,1].detach() #<10>
        metrics_g[METRICS_LOSS_NDX, start_ndx:end_ndx] = \
            loss_g.detach() #<10>

        return loss_g.mean()

        
        
        
if __name__ == '__main__': #<1>
    LunaTrainingApp().main() 

NameError: name 'sys' is not defined

1. This instantiates the application object and invokes the <strong>main</strong> method. 
2. If the caller doesn't provide arguments, we get them from the command line.
3. The timestamp is used to help identify training runs. The .now method is used of the datetime library. 
4. Sends model parameters to the GPU.  It’s important to do so beforeconstructing the optimizer, since, otherwise, the optimizer would be left looking atthe CPU-based parameter objects rather than those copied to the GPU.
5.  For our optimizer, we’ll use basic stochastic gradient descent (SGD;https://pytorch.org/docs/stable/optim.html#torch.optim.SGD) with momentum. A learning rate of 0.001 and a momentum of 0.9 are pretty safe choices. Empirically, SGD with those values has worked reasonably well for a wide range of projects.
6. <strong>self.use_mps1</strong>  ensures that the current MacOS version is at least 12.3+, and <strong>self.use_mps2</strong> ensures that the current current PyTorch installation was built with MPS activated. If both conditions are met, then Apple's M1 chip can be used as a GPU device to train the model on.
7. The <strong>computeBatchLoss</strong> function is being used by both training and validation loops. It computes the loss over a batch of samples. In addition, the func- tion also computes and records per-sample information about the output the model is producing.
8. Moving the tensors to GPU.
9. Setting the reduction argument of the <strong>nn.CrossEntropyLoss</strong> as "none" gives you the loss per sample. Therefore the return of the <strong>computeBatchLoss</strong> function is the averaged loss per sample, which is a single value. Getting the loss just per sample allows us to aggregate the individual losses depending on our project and goals. 
10. Here the per-sample stats are recorded for posterity, by storing the metrics within the metrics_g variable. Detach is used because non of the metrics need to hold on to gradients.
11. The  <strong>doValidation</strong> and <strong>doTraining</strong> are different in that validation is read-only, therefore the loss value returned is not used, and the weights are not updated. Using with torch.no_grad() context, we are explicitly informing PyTorch that no gradients need to be computed. After that with statement, self.model.eval() turns off training-time behaviour. Main differences between validation and training functions are:
        - No need to update the weights during validation
        - No need to use he lass from the computeBatchLoss
        - No need to reference the optimizer
        - All that's left inside the validation loop is to call computeBatchLoss, and collect the metrics in valMetrics_g variable, which is a side effect of the call.  


The application class <strong>LunaTrainingApp</strong> has two functions by mandate; the <strong>\_\_init\_\_</strong> and <strong>main</strong>. We are parsing arguments in <strong>\_\_init\_\_</strong>, and that allows us to configure the application separately from invoking it.

## Workflow 
Before we can begin iterating over each batch in our epoch, some initialization work needs to happen, which includes instantiating the model.  

<strong></strong>
&emsp;<strong>i.</strong> Initialize our model and optimizer. The model is initialized with random weights. <br>
&emsp;<strong>ii.</strong> Initialize our <strong>Dataset</strong> and <strong>DataLoader</strong> instances. <br>
&emsp;<strong>iii.</strong> Start Training loop. This is when the batch tuple is loaded, the batch is classified, the loss is calculated, the metrics are recorded, and the weights are updated. <br>
&emsp;<strong>iv.</strong> In parallel, the validation loop is initiated where the validation set is loaded as a batch tuple, the batches are classified, the loss is calculated, and the metrics are recorded. <br>
&emsp;<strong>v.</strong> This process, excluding the i. step is looped over a predefined number of epochs until the model is fully trained.<br>


<strong>LunaDataset</strong> will define the randomizedset of samples that will make up our training epoch, and our <strong>DataLoader</strong> instance
will perform the work of loading the data out of our dataset and providing it to
our application.

### Batch
The batch's dimensions are  (N, C, D, H, W) which are the number of samples, channels per sample, depth, height, and width. Note that since CT scans are single-intensity, our channel dimension
is only size 1. The bridging from the CT Scans to PyTorch tensors is done with the <strong>LunaDataset</strong> class. Below is what includes the batch tuple that is fed to the model:
- 5D array
- List of boolean array that classifies wether it's a tumor or not (T/F)
- List of strings that contain series_uid
- The candidate location expressed in IRC coordinates. 

## "model.py" file
Classification models often have a structure that consists of a tail, a backbone (or body), and a head. Next, the backbone of the network typically contains the bulk of the layers, which
are usually arranged in series of blocks, which in this case is the <strong>LunaBlock</strong>. We will use a block that consists of two 3 × 3 convolutions, each followed by an activation, with a max-pooling operation at the end of the
block. Twenty-seven voxels are fed in, and one comes out.


### LunaBlock
Stacking convolutional layers allows the final output voxel (or pixel) to be influenced by an input further away than the size of the convolutional kernel suggests. If that output voxel is fed into another 3 × 3 × 3 kernel as one of the edge voxels, then some of the inputs to the first layer will be outside of the 3 × 3 × 3 area of input to the second. The final output of those two stacked layers has an effective receptive field of 5 × 5 × 5. That means that when taken together, the stacked layers act as similar to a single convolu- tional layer with a larger size.  Two stacked 3 × 3 × 3 layers uses fewer parameters than a full 5 × 5 × 5 convolution would (and so is also faster to compute).

Below is the implementation of the block:


In [None]:
class LunaBlock(nn.Module):
    def __init__(self, in_channels, conv_channels):
        super().__init__()

        self.conv1 = nn.Conv3d(
            in_channels, conv_channels, kernel_size=3, padding=1, bias=True,
        )
        self.relu1 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv3d(
            conv_channels, conv_channels, kernel_size=3, padding=1, bias=True,
        )
        self.relu2 = nn.ReLU(inplace=True)

        self.maxpool = nn.MaxPool3d(2, 2)

    def forward(self, input_batch):
        block_out = self.conv1(input_batch)
        block_out = self.relu1(block_out)
        block_out = self.conv2(block_out)
        block_out = self.relu2(block_out)
        return self.maxpool(block_out)

(https://proceedings.neurips.cc/paper/2016/file/c8067ad1937f728f51288b3eb986afaa-Paper.pdf)

### LunaModel (The full model)

In [None]:
class LunaModel(nn.Module):
    def __init__(self, in_channels=1, conv_channels=8):
        super().__init__()

        self.tail_batchnorm = nn.BatchNorm3d(1) #<1>

        self.block1 = LunaBlock(in_channels, conv_channels)
        self.block2 = LunaBlock(conv_channels, conv_channels * 2)
        self.block3 = LunaBlock(conv_channels * 2, conv_channels * 4)
        self.block4 = LunaBlock(conv_channels * 4, conv_channels * 8)

        self.head_linear = nn.Linear(1152, 2) #<2>
        self.head_softmax = nn.Softmax(dim=1)#<2>
        self._init_weights()
        
     # see also https://github.com/pytorch/pytorch/issues/18182
    def _init_weights(self): #<4>
        for m in self.modules():
            if type(m) in {
                nn.Linear,
                nn.Conv3d,
                nn.Conv2d,
                nn.ConvTranspose2d,
                nn.ConvTranspose3d,
            }:
                nn.init.kaiming_normal_(
                    m.weight.data, a=0, mode='fan_out', nonlinearity='relu',
                )
                if m.bias is not None:
                    fan_in, fan_out = \
                        nn.init._calculate_fan_in_and_fan_out(m.weight.data)
                    bound = 1 / math.sqrt(fan_out)
                    nn.init.normal_(m.bias, -bound, bound)
                    
    def forward(self, input_batch):
        bn_output = self.tail_batchnorm(input_batch)

        block_out = self.block1(bn_output)
        block_out = self.block2(block_out)
        block_out = self.block3(block_out)
        block_out = self.block4(block_out)

        conv_flat = block_out.view(
            block_out.size(0), #<3>
            -1,
        )
        linear_output = self.head_linear(conv_flat)

        return linear_output, self.head_softmax(linear_output)

1. This is the tail. The input is shifted and scaled so that it has a mean of 0 and a standard deviation of 1. This is a normalization technique. 
2. Our tail is a fully connected layer followed by a call to <strong>nn.Softmax</strong>. Softmax is a useful function for single-label classification tasks and has a few nice proper- ties: it bounds the output between 0 and 1, it’s relatively insensitive to the absolute range of the inputs (only the relative values of the inputs matter), and it allows our model to express the degree of certainty it has in an answer.
(https://machinelearningmastery.com/softmax-activation-function-with-python/)
3. The view method flattens the data into a batch of 1D vectors, which is what a fully connected layers expect as input. 
4. The point of creating the <strong> _init_weights</strong> is to make sure the network’s weights are initialized such that intermediate values and gradients become neither unreasonably small nor unreasonably large. This method can be trated as boilerplate, as the exact details aren’t particularly important.


Our backbone is four repeated blocks, with the block implementation pulled out into the separate nn.Module subclass. Since each block ends with a 2 × 2 × 2 max-pool operation, after 4 layers we will have decreased the resolution of the image 16 times in each dimension. Our data's dimensions are 32 × 48 × 48 therefore dividing by 16, we'll end up with 2 x 3 x 3 by the end of the backbone.

When looking at the <strong>forward</strong> method, its return value is both the raw logits and the softmax-produced probabilities. The reason why we return both, is because the logits are being used to calculate the <strong>nn.CrossEntropyLoss</strong> during trainig, and the probabilities for classification of the samples. 





### Outputting performance metrics


Logging results and progress as we go is important, because if the model doesn not converge or goes off the rails, we notice. Noticing it will enable us to stop spending time training that model. The results per epoch are stored in the <strong>trnMetrics_g</strong> and <strong>valMetrics_g</strong> variable. 

In [None]:
class LunaTrainingApp:
#...
METRICS_LABEL_NDX=0 #<1>
METRICS_PRED_NDX=1
METRICS_LOSS_NDX=2
METRICS_SIZE = 3

    def logMetrics(
            self,
            epoch_ndx,
            mode_str,
            metrics_t,
            classificationThreshold=0.5,
    ):#<2>
        self.initTensorboardWriters()
        log.info("E{} {}".format(
            epoch_ndx,
            type(self).__name__,
        ))

        negLabel_mask = metrics_t[METRICS_LABEL_NDX] <= classificationThreshold #<3>
        negPred_mask = metrics_t[METRICS_PRED_NDX] <= classificationThreshold #<3>

        posLabel_mask = ~negLabel_mask #<3>
        posPred_mask = ~negPred_mask #<3>

        neg_count = int(negLabel_mask.sum())
        pos_count = int(posLabel_mask.sum())

        neg_correct = int((negLabel_mask & negPred_mask).sum())
        pos_correct = int((posLabel_mask & posPred_mask).sum())

        metrics_dict = {}
        metrics_dict['loss/all'] = \
            metrics_t[METRICS_LOSS_NDX].mean()
        metrics_dict['loss/neg'] = \
            metrics_t[METRICS_LOSS_NDX, negLabel_mask].mean()
        metrics_dict['loss/pos'] = \
            metrics_t[METRICS_LOSS_NDX, posLabel_mask].mean()

        metrics_dict['correct/all'] = (pos_correct + neg_correct) \
            / np.float32(metrics_t.shape[1]) * 100
        metrics_dict['correct/neg'] = neg_correct / np.float32(neg_count) * 100
        metrics_dict['correct/pos'] = pos_correct / np.float32(pos_count) * 100

        log.info(
            ("E{} {:8} {loss/all:.4f} loss, "
                 + "{correct/all:-5.1f}% correct, "
            ).format(
                epoch_ndx,
                mode_str,
                **metrics_dict,
            )
        )
        log.info(
            ("E{} {:8} {loss/neg:.4f} loss, "
                 + "{correct/neg:-5.1f}% correct ({neg_correct:} of {neg_count:})"
            ).format(
                epoch_ndx,
                mode_str + '_neg',
                neg_correct=neg_correct,
                neg_count=neg_count,
                **metrics_dict,
            )
        )
        log.info(
            ("E{} {:8} {loss/pos:.4f} loss, "
                 + "{correct/pos:-5.1f}% correct ({pos_correct:} of {pos_count:})"
            ).format(
                epoch_ndx,
                mode_str + '_pos',
                pos_correct=pos_correct,
                pos_count=pos_count,
                **metrics_dict,
            )
        )

        writer = getattr(self, mode_str + '_writer')

        for key, value in metrics_dict.items():
            writer.add_scalar(key, value, self.totalTrainingSamples_count)

        writer.add_pr_curve(
            'pr',
            metrics_t[METRICS_LABEL_NDX],
            metrics_t[METRICS_PRED_NDX],
            self.totalTrainingSamples_count,
        )

        bins = [x/50.0 for x in range(51)]

        negHist_mask = negLabel_mask & (metrics_t[METRICS_PRED_NDX] > 0.01) #<2>
        posHist_mask = posLabel_mask & (metrics_t[METRICS_PRED_NDX] < 0.99) #<2>

        if negHist_mask.any():
            writer.add_histogram(
                'is_neg',
                metrics_t[METRICS_PRED_NDX, negHist_mask],
                self.totalTrainingSamples_count,
                bins=bins,
            )
        if posHist_mask.any():
            writer.add_histogram(
                'is_pos',
                metrics_t[METRICS_PRED_NDX, posHist_mask],
                self.totalTrainingSamples_count,
                bins=bins,
            )


1. These named array indexes are declared at module-level scope. This is to gain access of the label,prediction, and loss for each and every training and validation. 
2. The epoch_ndx is used for display when logging the results. The mode_str holds information about wether the metrics are for training or validation. 
3. In the main function we consume either <strong>trnMetrics_t</strong> or <strong>valMetrics_t</strong>, and is passed as the <strong>metrics_t</strong> parameter. Here we construct masks that will let us limit our metrics to only the nod- ule or non-nodule samples. Our nodule status labels are simply True or False, therefore here we get an array of binary values where a True value corresponds to a non-nodule (aka negative) label for the sample in question. The positive masks are simply the inverse of the negative masks.

First we compute the average loss over the entire epoch. Since the loss is the single metric that is being minimized during training, we always want to be able to keep track of it. Then we limit the loss averaging to only those samples with a negative label using the negLabel_mask we just made. We do the same with the positive loss. Com- puting a per-class loss like this can be useful if one class is persistently harder to classify than another, since that knowledge can help drive investigation and improvements.
We’ll close out the calculations with determining the fraction of samples we classi- fied correctly, as well as the fraction correct from each label. Since we will display these numbers as percentages in a moment, we also multiply the values by 100. Similar to the loss, we can use these numbers to help guide our efforts when making improve- ments. After the calculations, we then log our results with three calls to log.info.


