In [None]:
import datetime

from util.util import importstr
from util.logconf import logging
log = logging.getLogger('nb')

In [None]:
def run(app, *argv):
    argv = list(argv)
    argv.insert(0, '--num-workers=4')
    log.info("Running: {}({!r}).main()".format(app, argv))
    
    app_cls = importstr(*app.rsplit('.', 1))
    app_cls(argv).main()
    
    log.info("Finished: {}({!r}).main()".format(app, argv))
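
`importstr` comes from the project's `util.util` module and is not shown in this notebook. As a rough illustration of the dotted-path lookup that `run` relies on, the sketch below resolves a class with the standard library only. `load_class` is a hypothetical stand-in, not the project's helper, and its signature differs from `importstr`.

```python
import importlib


def load_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object (hypothetical helper)."""
    module_name, class_name = dotted_path.rsplit('.', 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A standard-library class is used here so the example runs anywhere.
ordered_dict_cls = load_class('collections.OrderedDict')
print(ordered_dict_cls.__name__)
```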

In [None]:
import os
import shutil

# clean up any old data that might be around.
# We don't call this by default because it's destructive, 
# and would waste a lot of time if it ran when nothing 
# on the application side had changed.
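def cleanCache():
    # ignore_errors avoids a failure when the cache directory does not exist yet.
    shutil.rmtree('data-unversioned/cache', ignore_errors=True)
    os.makedirs('data-unversioned/cache', exist_ok=True)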

# cleanCache()

In [None]:
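# Full-length settings.
training_epochs = 20
experiment_epochs = 10
final_epochs = 50

# Quick-run overrides: these take effect because they are assigned last.
# Comment them out to use the full-length settings above.
training_epochs = 2
experiment_epochs = 2
final_epochs = 5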
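
The cell above assigns each epoch count twice, so the second, smaller set is what actually runs. One way to make that choice explicit is a single switch, sketched below. `QUICK_RUN`, `FULL_EPOCHS`, and `QUICK_EPOCHS` are hypothetical names that do not appear elsewhere in this notebook.

```python
# Hypothetical switch; the notebook itself simply reassigns the variables.
QUICK_RUN = True

FULL_EPOCHS = {'training': 20, 'experiment': 10, 'final': 50}
QUICK_EPOCHS = {'training': 2, 'experiment': 2, 'final': 5}

epochs = QUICK_EPOCHS if QUICK_RUN else FULL_EPOCHS
training_epochs = epochs['training']
experiment_epochs = epochs['experiment']
final_epochs = epochs['final']
print(training_epochs, experiment_epochs, final_epochs)
```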

## Chapter 11

In [None]:
run('p2ch11.prepcache.LunaPrepCacheApp')

In [None]:
run('p2ch11.training.LunaTrainingApp', '--epochs=1')

In [None]:
run('p2ch11.training.LunaTrainingApp', f'--epochs={experiment_epochs}')

## Chapter 12

In [None]:
run('p2ch12.prepcache.LunaPrepCacheApp')

In [None]:
run('p2ch12.training.LunaTrainingApp', '--epochs=1', 'unbalanced')

In [None]:
run('p2ch12.training.LunaTrainingApp', f'--epochs={training_epochs}', '--balanced', 'balanced')

In [None]:
run('p2ch12.training.LunaTrainingApp', f'--epochs={experiment_epochs}', '--balanced', '--augment-flip', 'flip')

In [None]:
run('p2ch12.training.LunaTrainingApp', f'--epochs={experiment_epochs}', '--balanced', '--augment-offset', 'offset')

In [None]:
run('p2ch12.training.LunaTrainingApp', f'--epochs={experiment_epochs}', '--balanced', '--augment-scale', 'scale')

In [None]:
run('p2ch12.training.LunaTrainingApp', f'--epochs={experiment_epochs}', '--balanced', '--augment-rotate', 'rotate')

In [None]:
run('p2ch12.training.LunaTrainingApp', f'--epochs={experiment_epochs}', '--balanced', '--augment-noise', 'noise')

In [None]:
run('p2ch12.training.LunaTrainingApp', f'--epochs={training_epochs}', '--balanced', '--augmented', 'fully-augmented')

## Chapter 13

You should run LunaPrepCacheApp once before training, unless:
  - You are 100% sure that the disk cache already exists from previous runs.
  - You're okay with longer initial load times and repeated I/O.

Running LunaPrepCacheApp ensures:
  - Faster training.
  - Fewer I/O bottlenecks.
  - All CT and mask data required by TrainingLuna2dSegmentationDataset are ready to go.

Note:

- LunaPrepCacheApp only affects data loading.
- You don't need to run LunaPrepCacheApp again after a restart, as long as the cache files from an earlier run are still on disk.
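
A quick way to decide whether the prep step can be skipped is to check the cache directory first. The sketch below assumes the cache lives in `data-unversioned/cache`, the path used by `cleanCache` earlier in this notebook. `cache_is_populated` is a hypothetical helper, and a non-empty directory does not guarantee that the cache is complete.

```python
from pathlib import Path


def cache_is_populated(cache_dir='data-unversioned/cache'):
    """Return True if the cache directory exists and holds at least one entry."""
    path = Path(cache_dir)
    return path.is_dir() and any(path.iterdir())


if cache_is_populated():
    print("Cache found; LunaPrepCacheApp can probably be skipped.")
else:
    print("Cache missing or empty; run LunaPrepCacheApp first.")
```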

In [None]:
# Preload and cache expensive CT scan computations (like loading image volumes and masks) before training.
run('p2ch13.prepcache.LunaPrepCacheApp')

The final data augmentation configuration used is:
flip = True, offset = 0.03, scale = 0.2, rotate = True, and noise = 25.0.

When the offset value is increased from 0.03 to 0.1, while keeping all other parameters the same, the model's recall monotonically decreases starting from epoch 1.
This behavior is undesirable and indicates that the model is highly sensitive to data augmentation settings.

Epoch 3 is the best checkpoint for this segmentation model because it has the highest recall of all epochs.

Recall measures the fraction of true positive cases that the model identifies. In nodule detection, high recall is critical: a potential nodule that is not detected cannot be classified or analyzed further.

Maximizing recall is therefore the top priority for this segmentation task, since nodules missed at the segmentation stage are lost to the entire downstream diagnostic process.



In [None]:

seg_epochs = 12
run('p2ch13.training.SegmentationTrainingApp', f'--epochs={seg_epochs}', '--augmented', 'seg')

## Chapter 14

In [None]:
run('p2ch14.prepcache.LunaPrepCacheApp')

## 📊 Best Epoch Selection Analysis (Epochs 5–30)

This section presents an evaluation of model performance over epochs 5 to 30, based on validation metrics including F1 score, AUC, loss, precision, recall, and accuracy.

---

### 🔍 Summary Table of Validation Metrics

| Epoch | F1 | AUC | Loss | Precision | Recall | Correct (%) |
|:-----:|:------:|:-------:|:--------:|:---------:|:------:|:------------:|
|   5   | 0.1296 |  0.9794 |  0.1315  |  0.0700   | 0.8831 |     96.5     |
|  10   | 0.1049 |  0.9858 |  0.1556  |  0.0555   | **0.9610** |     95.1     |
|  15   | 0.1552 |  0.9893 |  0.1116  |  0.0845   | 0.9481 |     96.9     |
|  20   | 0.2367 | **0.9930** |  0.0700  |  0.1354   | 0.9416 |     98.2     |
|  25   | **0.2689** |  0.9913 | **0.0572** | **0.1574** | 0.9221 | **98.5** |
|  30   | 0.2067 |  0.9893 |  0.0796  |  0.1161   | 0.9416 |     97.8     |

---

### ✅ Selected Best Epoch: **Epoch 25**

#### Justification:

- **F1 Score**: `0.2689`, the highest across all epochs, indicating the best balance between precision and recall.
- **Validation Loss**: `0.0572`, the lowest of the evaluated epochs.
- **Precision**: `0.1574`, the highest among all epochs, meaning the fewest false positives relative to predicted positives.
- **Recall**: `0.9221`, still high, although lower than at epochs 10, 15, 20, and 30.
- **Correct**: `98.5%`, the highest validation accuracy in the evaluated range.
- **AUC**: `0.9913`, second only to epoch 20 (`0.9930`).

#### Comparative Notes:

- **Epoch 20** has the highest AUC (`0.9930`) but lower F1 and precision than Epoch 25.
- **Epochs 5–15** show underdeveloped performance with significantly lower precision and F1.
- **Epoch 30** shows a **decline** in F1 score, suggesting performance peaked earlier.

---

### 📌 Conclusion

> **Epoch 25** yields the best validation F1, loss, precision, and accuracy; epoch 20 has a slightly higher AUC and epoch 10 the highest recall.  
> It is recommended to use the model checkpoint saved at **epoch 25** for deployment or further fine-tuning.
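
The per-metric winners can be checked directly from the summary table. The sketch below copies the table values by hand; `FIELDS`, `rows`, and `best_epoch` are illustrative names, not part of the training code.

```python
# Validation metrics copied from the summary table above.
FIELDS = ('f1', 'auc', 'loss', 'precision', 'recall', 'correct')
rows = {
    5:  (0.1296, 0.9794, 0.1315, 0.0700, 0.8831, 96.5),
    10: (0.1049, 0.9858, 0.1556, 0.0555, 0.9610, 95.1),
    15: (0.1552, 0.9893, 0.1116, 0.0845, 0.9481, 96.9),
    20: (0.2367, 0.9930, 0.0700, 0.1354, 0.9416, 98.2),
    25: (0.2689, 0.9913, 0.0572, 0.1574, 0.9221, 98.5),
    30: (0.2067, 0.9893, 0.0796, 0.1161, 0.9416, 97.8),
}


def best_epoch(metric, lower_is_better=False):
    """Return the epoch with the best value for one metric."""
    idx = FIELDS.index(metric)
    pick = min if lower_is_better else max
    return pick(rows, key=lambda epoch: rows[epoch][idx])


for metric in FIELDS:
    print(metric, best_epoch(metric, lower_is_better=(metric == 'loss')))
```

With these values, epoch 25 leads on F1, loss, precision, and accuracy, while AUC peaks at epoch 20 and recall at epoch 10.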

In [None]:
cls_epochs = 20
run('p2ch14.training.ClassificationTrainingApp', f'--epochs={cls_epochs}', 'cls')

In [None]:
cls_epochs = 50
run(
  'p2ch14.training.ClassificationTrainingApp',
  f'--epochs={cls_epochs}',
  r'--resume-from=data-unversioned\part2\models\p2ch14\cls_2025-05-23_13.40.10_e20_1000000_best.state',
  'cls'
)
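
The raw string above uses Windows backslashes, so the path as written will not resolve on Linux or macOS. A minimal sketch of building the same argument portably with `pathlib`, assuming the checkpoint sits at the same relative location:

```python
from pathlib import Path

# Same relative location and file name as in the cell above.
checkpoint = Path(
    'data-unversioned', 'part2', 'models', 'p2ch14',
    'cls_2025-05-23_13.40.10_e20_1000000_best.state',
)
resume_arg = f'--resume-from={checkpoint}'
print(resume_arg)
```

The resulting string can be passed to `run` in place of the raw-string argument.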

### 📊 **Malignant Nodule Analysis Model Trained by Fine-tuning the Last Block**


| Epoch |  Loss  | Accuracy (%) | Precision | Recall | F1 Score |  AUC   |
|-------|--------|---------------|-----------|--------|----------|--------|
| 1     | 0.6486 | 72.1          | 0.5789    | 0.6346 | 0.6055   | 0.7192 |
| 2     | 0.6755 | 72.1          | 0.5738    | 0.6731 | 0.6195   | 0.7283 |
| 3     | 0.6694 | 71.4          | 0.5645    | 0.6731 | 0.6140   | 0.7380 |
| 4     | 0.6625 | 72.7          | 0.5833    | 0.6731 | 0.6250   | 0.7483 |
| 5     | 0.6472 | 74.0          | 0.5968    | 0.7115 | 0.6491   | 0.7559 |
| 6     | 0.6327 | 74.7          | 0.6066    | 0.7115 | 0.6549   | 0.7628 |
| 7     | 0.6248 | 74.7          | 0.6066    | 0.7115 | 0.6549   | 0.7685 |
| 8     | 0.6152 | 74.7          | 0.6066    | 0.7115 | 0.6549   | 0.7722 |
| 9     | 0.6074 | 74.7          | 0.6066    | 0.7115 | 0.6549   | 0.7758 |
| 10    | 0.5934 | 74.7          | 0.6102    | 0.6923 | 0.6486   | 0.7810 |
| 11    | 0.5836 | 75.3          | 0.6207    | 0.6923 | 0.6545   | 0.7836 |
| 12    | 0.5860 | 75.3          | 0.6167    | 0.7115 | 0.6607   | 0.7871 |
| 13    | 0.5801 | 76.0          | 0.6230    | 0.7308 | 0.6726   | 0.7886 |
| 14    | 0.5779 | 76.0          | 0.6230    | 0.7308 | 0.6726   | 0.7913 |
| 15    | 0.5662 | 76.0          | 0.6271    | 0.7115 | 0.6667   | 0.7950 |
| 16    | 0.5577 | 77.3          | 0.6491    | 0.7115 | 0.6789   | 0.7973 |
| 17    | 0.5542 | 76.6          | 0.6379    | 0.7115 | 0.6727   | 0.7996 |
| 18    | 0.5526 | 76.0          | 0.6271    | 0.7115 | 0.6667   | 0.8023 |
| 19    | 0.5477 | 76.6          | 0.6379    | 0.7115 | 0.6727   | 0.8032 |
| 20    | 0.5475 | 76.6          | 0.6379    | 0.7115 | 0.6727   | 0.8043 |
| 21    | 0.5406 | 76.0          | 0.6316    | 0.6923 | 0.6606   | 0.8064 |
| 22    | 0.5344 | 76.0          | 0.6316    | 0.6923 | 0.6606   | 0.8086 |
| 23    | 0.5382 | 76.0          | 0.6316    | 0.6923 | 0.6606   | 0.8092 |
| 24    | 0.5286 | 76.6          | 0.6429    | 0.6923 | 0.6667   | 0.8112 |
| 25    | 0.5289 | 77.9          | 0.6667    | 0.6923 | 0.6792   | 0.8122 |


AUC is the top priority in evaluating the performance of this model. In this run it increases at every epoch, from 0.7192 at epoch 1 to 0.8122 at epoch 25.
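
The AUC trend can be verified from the table itself. The sketch below copies the AUC column by hand and checks that each epoch improves on the previous one; the variable names are illustrative.

```python
# AUC column from the table above, epochs 1 through 25.
auc = [
    0.7192, 0.7283, 0.7380, 0.7483, 0.7559,
    0.7628, 0.7685, 0.7722, 0.7758, 0.7810,
    0.7836, 0.7871, 0.7886, 0.7913, 0.7950,
    0.7973, 0.7996, 0.8023, 0.8032, 0.8043,
    0.8064, 0.8086, 0.8092, 0.8112, 0.8122,
]

# Compare each epoch with the one before it.
strictly_increasing = all(later > earlier for earlier, later in zip(auc, auc[1:]))
gains = [round(later - earlier, 4) for earlier, later in zip(auc, auc[1:])]

print("strictly increasing:", strictly_increasing)
print("first five gains:", gains[:5])
print("last five gains:", gains[-5:])
```

The per-epoch gains near the end are smaller than the early ones, so the curve is still rising but flattening.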

In [None]:
cls_epochs = 25
run(
  'p2ch14.training.ClassificationTrainingApp',
  f'--epochs={cls_epochs}',
  '--dataset=MalignantLunaDataset',
  '--malignant',
  r'--pretrained-model-path=data-unversioned\part2\models\p2ch14\cls_2025-05-23_16.12.19_e25_1250000_best.state',
  '--finetune-depth=1',
  'ft_depth1'
)

### 📊 **Malignant Nodule Analysis Model Trained by Fine-tuning the Last Two Blocks**

The validation results show **Epoch 11** to be the most balanced and highest-performing checkpoint.


| Epoch | Loss       | Correct % | Precision  | Recall | F1 Score   | AUC        |
| ----- | ---------- | --------- | ---------- | ------ | ---------- | ---------- |
| 1     | 0.4551     | 81.2     | 0.6825     | 0.8269 | 0.7478     | 0.8780     |
| 2     | 0.4343     | 82.5     | 0.6984     | 0.8462 | 0.7652     | 0.8857     |
| 3     | 0.6083     | 74.0     | 0.5750     | 0.8846 | 0.6970     | 0.8864     |
| 4     | 0.3588     | **86.4** | 0.8039     | 0.7885 | **0.7961** | 0.9122     |
| 5     | 0.3685     | 84.4     | **0.8043** | 0.7115 | 0.7551     | 0.8994     |
| 6     | 0.3848     | 83.8     | 0.7647     | 0.7500 | 0.7573     | 0.9014     |
| 7     | 0.4695     | 78.6     | 0.6377     | 0.8462 | 0.7273     | 0.8999     |
| 8     | 0.3906     | 84.4     | 0.7414     | 0.8269 | 0.7818     | 0.9098     |
| 9     | 0.3737     | 85.1     | 0.7843     | 0.7692 | 0.7767     | 0.9082     |
| 10    | 0.4129     | 83.1     | 0.7097     | 0.8462 | 0.7719     | 0.9128     |
| 11    | **0.3572** | **86.4** | 0.8039     | 0.7885 | **0.7961** | **0.9154** |
| 12    | 0.4025     | 83.8     | 0.7288     | 0.8269 | 0.7748     | 0.9033     |
| 13    | 0.4346     | 80.5     | 0.6667     | 0.8462 | 0.7458     | 0.9070     |
| 14    | 0.4744     | 81.8     | 0.6765     | 0.8846 | 0.7667     | 0.9088     |
| 15    | 0.3868     | 84.4     | 0.7500     | 0.8077 | 0.7778     | 0.9134     |
| 16    | 0.3885     | 84.4     | 0.7500     | 0.8077 | 0.7778     | 0.9088     |

---

### Recommended Epoch: **Epoch 11**

### Justification:

* **Lowest validation loss**: Epoch 11 has the lowest loss (0.3572), the best fit to the validation data of any epoch.
* **High accuracy**: Tied with Epoch 4 for the highest correct percentage (86.4%).
* **Near-best precision**: Precision (0.8039) equals Epoch 4 and is just below the highest value, 0.8043 at Epoch 5.
* **Best F1-score, moderate recall**: F1-score (0.7961) is the highest, tied with Epoch 4. Recall (0.7885) is lower than in 11 of the other 15 epochs, so the strong F1 comes mainly from precision.
* **Highest AUC**: Best ROC-AUC value (0.9154), reflecting robust overall predictive capability.

Epoch 11 leads or ties for the lead on loss, accuracy, F1, and AUC, which makes it the strongest candidate for final model deployment, provided its moderate recall is acceptable.
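
To see where a single checkpoint stands on every metric, each epoch can be ranked per column. The sketch below copies the table values by hand; `rank_of` is an illustrative helper in which rank 1 is best and equal values share a rank.

```python
# (loss, correct %, precision, recall, f1, auc) per epoch, copied from the table above.
FIELDS = ('loss', 'correct', 'precision', 'recall', 'f1', 'auc')
LOWER_IS_BETTER = {'loss'}
rows = {
    1: (0.4551, 81.2, 0.6825, 0.8269, 0.7478, 0.8780),
    2: (0.4343, 82.5, 0.6984, 0.8462, 0.7652, 0.8857),
    3: (0.6083, 74.0, 0.5750, 0.8846, 0.6970, 0.8864),
    4: (0.3588, 86.4, 0.8039, 0.7885, 0.7961, 0.9122),
    5: (0.3685, 84.4, 0.8043, 0.7115, 0.7551, 0.8994),
    6: (0.3848, 83.8, 0.7647, 0.7500, 0.7573, 0.9014),
    7: (0.4695, 78.6, 0.6377, 0.8462, 0.7273, 0.8999),
    8: (0.3906, 84.4, 0.7414, 0.8269, 0.7818, 0.9098),
    9: (0.3737, 85.1, 0.7843, 0.7692, 0.7767, 0.9082),
    10: (0.4129, 83.1, 0.7097, 0.8462, 0.7719, 0.9128),
    11: (0.3572, 86.4, 0.8039, 0.7885, 0.7961, 0.9154),
    12: (0.4025, 83.8, 0.7288, 0.8269, 0.7748, 0.9033),
    13: (0.4346, 80.5, 0.6667, 0.8462, 0.7458, 0.9070),
    14: (0.4744, 81.8, 0.6765, 0.8846, 0.7667, 0.9088),
    15: (0.3868, 84.4, 0.7500, 0.8077, 0.7778, 0.9134),
    16: (0.3885, 84.4, 0.7500, 0.8077, 0.7778, 0.9088),
}


def rank_of(epoch, metric):
    """Rank of one epoch on one metric: 1 plus the number of strictly better epochs."""
    idx = FIELDS.index(metric)
    own = rows[epoch][idx]
    if metric in LOWER_IS_BETTER:
        better = sum(1 for row in rows.values() if row[idx] < own)
    else:
        better = sum(1 for row in rows.values() if row[idx] > own)
    return better + 1


for metric in FIELDS:
    print(f"{metric}: epoch 11 ranks {rank_of(11, metric)} of {len(rows)}")
```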

In [None]:
cls_epochs = 40
run(
  'p2ch14.training.ClassificationTrainingApp',
  f'--epochs={cls_epochs}',
  '--dataset=MalignantLunaDataset',
  '--malignant',
  r'--pretrained-model-path=data-unversioned\part2\models\p2ch14\cls_2025-05-23_16.12.19_e25_1250000_best.state',
  '--finetune-depth=2',
  'ft_depth2'
)

# Final Diagnostic Results for Nodule Analysis

## Understanding the Confusion Matrix

**Rows (Ground Truth Labels):**

* **Row 0:** Non-nodule (no annotation)
* **Row 1:** Benign nodule
* **Row 2:** Malignant nodule

**Columns (Detection Outcomes):**

* **Column 0:** Not detected (Complete Miss)
* **Column 1:** Detected by segmentation but filtered out by the nodule classifier
* **Column 2:** Detected and classified as benign
* **Column 3:** Detected and classified as malignant

### Confusion Matrix

| Ground Truth \ Detection Result | Complete Miss | Filtered Out | Pred. Benign | Pred. Malignant |
| ------------------------------- | ------------- | ------------ | ------------ | --------------- |
| **Non-Nodules**                 | —             | 160,952      | 1,718        | 470             |
| **Benign Nodules**              | 15            | 3            | 74           | 10              |
| **Malignant Nodules**           | 2             | 6            | 8            | 36              |

---

## Interpretation of Results

### Why is the "Complete Miss" Cell Blank for Non-Nodules?

The "Complete Miss" cell for **Non-Nodules** is intentionally left blank because the concept of "missing" doesn't apply.

* **Complete Miss** means:

  * A real nodule (benign or malignant) existed.
  * The segmentation model failed entirely to detect any candidate region at that location.

Since **Non-Nodules** don't contain any actual nodules, it's impossible for the segmenter to "miss" something that doesn't exist. Therefore, this cell is not applicable.

### Meaning of "Filtered Out"

**"Filtered Out"** means:

* The segmentation model (e.g., U-Net) detected a candidate region.
* The nodule classifier then scored the candidate below its nodule-probability threshold and discarded it.
* Thus, these candidates never reached the malignancy classification stage.

### Why Is the "Filtered Out" Number for Non-Nodules Large?

The large number in the **Non-Nodules** "Filtered Out" cell indicates:

* The segmenter identified many regions that were **false alarms** (regions not actually nodules).
* The segmenter intentionally has high recall, meaning it detects many candidate regions to avoid missing true nodules.
* Most false alarms are removed by the nodule classifier before malignancy classification, significantly reducing false positives.
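
To put a number on how many false alarms are removed, the share of non-nodule candidates that end up in the Filtered Out column can be computed from the Non-Nodules row of the confusion matrix. The variable names below are illustrative.

```python
# Non-Nodules row of the confusion matrix above (Complete Miss is not applicable).
filtered_out, pred_benign, pred_malignant = 160_952, 1_718, 470

total_candidates = filtered_out + pred_benign + pred_malignant
filtered_fraction = filtered_out / total_candidates

print(f"{total_candidates} non-nodule candidates, {filtered_fraction:.1%} filtered out")
```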

---

## Performance Metrics and Calculations

### Nodule Counts

* Number of benign nodules:
  $15 + 3 + 74 + 10 = 102$

* Number of malignant nodules:
  $2 + 6 + 8 + 36 = 52$

* Total true nodules:
  $102 + 52 = 154$

### Detection Metrics

* Benign nodules that passed both segmentation and the nodule classifier:
  $74 + 10 = 84$

* Malignant nodules that passed both segmentation and the nodule classifier:
  $8 + 36 = 44$

* Total True Positives (TP) for nodule detection:
  $84 + 44 = 128$

### Detection Recall

$\text{Recall} = \frac{\text{TP}}{\text{Total True Nodules}} = \frac{128}{154} \approx 0.83 \ (83\%)$

This counts a nodule as detected only if the nodule classifier also kept it. The segmenter alone found $154 - 17 = 137$ nodules (everything outside the Complete Miss column), a recall of about $0.89$.

### Malignant Detection Accuracy

* Correctly classified malignant nodules:
  $36 \text{ (Predicted Malignant Correctly)}$

$\text{Malignant Detection Accuracy} = \frac{36}{52} \approx 0.69 \ (69\%)$

This is the recall (sensitivity) of the full pipeline for the malignant class.

### Detection Precision

* False Positives (FP), non-nodule candidates that passed the nodule classifier:
  $1{,}718 + 470 = 2{,}188$

$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} = \frac{128}{128 + 2{,}188} \approx 0.055 \ (5.5\%)$

The low precision (about 6%) is an accepted trade-off: the primary goal of the detection stages is to avoid missing any true nodules. To achieve high recall, the segmenter deliberately proposes many candidate regions, including numerous false positives. The nodule classifier removes most of them (the 160,952 in the Filtered Out column), but the 2,188 that remain still far outnumber the 128 true nodules.

### False Negative Rate (Malignancy Model)

* False Negatives (FN) for malignant nodules:
  $52 - 36 = 16$

$\text{False Negative Rate} = \frac{\text{FN}}{\text{Total Malignant Nodules}} = \frac{16}{52} \approx 0.308 \ (30.8\%)$
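
The figures in this section can be reproduced directly from the confusion matrix. A short sketch with the table values copied by hand; the variable names are illustrative and not taken from `NoduleAnalysisApp`.

```python
# Columns: complete miss, filtered out, pred. benign, pred. malignant.
benign = [15, 3, 74, 10]
malignant = [2, 6, 8, 36]
# Non-nodule candidates that were predicted benign or malignant.
false_positives = 1_718 + 470

total_nodules = sum(benign) + sum(malignant)
# Nodules in the last two columns were detected and kept.
true_positives = sum(benign[2:]) + sum(malignant[2:])

recall = true_positives / total_nodules
precision = true_positives / (true_positives + false_positives)
malignant_recall = malignant[3] / sum(malignant)
malignant_fnr = 1 - malignant_recall

print(f"nodules: {total_nodules}, kept: {true_positives}")
print(f"recall: {recall:.3f}, precision: {precision:.3f}")
print(f"malignant recall: {malignant_recall:.3f}, malignant FNR: {malignant_fnr:.3f}")
```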

---

## Overall Assessment

The current performance demonstrates reasonable detection recall (83%) and malignant detection accuracy (69%).

While these results are insufficient for commercial medical AI deployment, they serve effectively as a foundation for learning, initial research, and model refinement.

In [None]:
run(
  'p2ch14.nodule_analysis.NoduleAnalysisApp',
  '--run-validation',
  r'--seg-path=data-unversioned\part2\models\p2ch13\seg_2025-05-24_15.23.31_e03_900000_best.state',
  r'--cls-path=data-unversioned\part2\models\p2ch14\cls_2025-05-23_16.12.19_e25_1250000_best.state',
  r'--mal-path=data-unversioned\part2\models\p2ch14\mal-finetune-depth-2_2025-05-23_18.19.25_e11_1100000_best.state',
)