# ML modeling baseline

## RetinaNet based model

### Import packages

In [2]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline


import pickle
import numpy as np

from SlideRunner.dataAccess.database import Database
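# These wildcard imports provide optimize_threshold and calculate_F1 used below
from lib.calculate_F1 import *
from lib.extractDetections import *
# Per-model results, keyed by model identifier
optimal_threshold, F1_values = {}, {}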

### Baseline model trained on MEL dataset

The manuscript describes the baseline as a 'dual-stage' transfer learning model trained on the manually expert labeled (MEL) dataset. Unlike the ODAEL and CODAEL datasets, the MEL dataset was not augmented by object detection or clustering.
* Stage 1: RetinaNet
* Stage 2: ResNet-18

The model architecture is listed in the file RetinaNet-CMC-MEL-architecture.txt in this folder.
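
As a rough illustration of how the two stages listed above might fit together, the sketch below passes stage-1 candidates to a stage-2 classifier that re-scores each one. It is a minimal sketch, not the manuscript's implementation: `dual_stage`, `detect`, `classify`, the `(x, y, score)` tuple layout and the `stage1_thres` default are all hypothetical stand-ins for the RetinaNet detector and the ResNet-18 patch classifier.

```python
from typing import Callable, List, Tuple

# Hypothetical detection layout: (x, y, score)
Detection = Tuple[float, float, float]


def dual_stage(
    image,
    detect: Callable[[object], List[Detection]],
    classify: Callable[[object, float, float], float],
    stage1_thres: float = 0.3,  # assumed cutoff, for illustration only
) -> List[Detection]:
    """Re-score stage-1 candidates with a stage-2 classifier."""
    rescored = []
    for x, y, score in detect(image):
        if score < stage1_thres:
            continue  # drop weak candidates before the second stage
        # The second stage replaces the detector score with its own score
        rescored.append((x, y, classify(image, x, y)))
    return rescored


# Toy stand-ins for the two networks
def toy_detect(image):
    return [(10, 10, 0.9), (20, 20, 0.2), (30, 30, 0.5)]


def toy_classify(image, x, y):
    return 0.8 if x == 10 else 0.1


result = dual_stage(None, toy_detect, toy_classify)
print(result)
```

In this toy run the candidate at (20, 20) never reaches the second stage, and the remaining two carry the classifier's score rather than the detector's.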

In [3]:
databasefile = '../../databases/MITOS_WSI_CMC_MEL.sqlite'

# Result file used to select the detection threshold
resfile = '../../results/trainval_RetinaNet-CMC-MEL-512sh-b1.pth-MEL-val-inference_results_boxes.p.bz2'
ident = 'MEL_2nd'

# Search thresholds from 0.3 upward and keep the one with the best F1
optimal_threshold[ident], F1scores, thrs = optimize_threshold(databasefile=databasefile, minthres=0.3, resfile=resfile)

# Result file scored at the selected threshold
resfile = '../../results/trainval_2ndstage_RetinaNet-CMC-MEL-512sh-b1.pth-MELshort-val-inference_results_boxes.p.bz2'

F1_values[ident], individ = calculate_F1(databasefile=databasefile, resfile=resfile, det_thres=optimal_threshold[ident])

Optimizing threshold for validation set of 14 files:  a8773be388e12df89edd.svs,460906c0b1fe17ea5354.svs,d0423ef9a648bb66a763.svs,50cf88e9a33df0c0c8f9.svs,da18e7b9846e9d38034c.svs,d7a8af121d7d4f3fbf01.svs,2191a7aa287ce1d5dbc0.svs,c4b95da36e32993289cb.svs,fa4959e484beec77543b.svs,72c93e042d0171a61012.svs,3d3d04eca056556b0b26.svs,084383c18b9060880e82.svs,d37ab62158945f22deed.svs,deb768e5efb9d1dcbc13.svs
Best threshold: F1= 0.7012792324605237 Threshold= 0.6200000000000003
Calculating F1 for test set of 14 files
Overall: 
TP: 7383 FP: 9425 FN:  765 F1: 0.5916813591921782
Number of mitotic figures: 8148
Precision: 0.439 
Recall: 0.906


### Baseline model performance

Validation set results
* F1 score = 0.70
* Optimal threshold = 0.62
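
The optimal threshold reported above comes from a search over detection score cutoffs. The sketch below shows the general idea under simplifying assumptions; it is not the code of `optimize_threshold` in `lib`. The function name `sweep_threshold` and its inputs are hypothetical, and it assumes each detection has already been marked as matching a ground-truth object or not, with each object matched at most once.

```python
def sweep_threshold(scores, is_match, n_ground_truth, min_thres=0.3, step=0.01):
    """Return (best_threshold, best_f1) over a grid of score cutoffs.

    scores         : confidence of each detection
    is_match       : True where the detection matches a ground-truth object
    n_ground_truth : total number of annotated objects
    """
    best_thres, best_f1 = min_thres, 0.0
    n_steps = int(round((1.0 - min_thres) / step))
    for i in range(n_steps):
        thres = min_thres + i * step
        kept = [(s, m) for s, m in zip(scores, is_match) if s >= thres]
        tp = sum(1 for _, m in kept if m)
        fp = len(kept) - tp
        fn = n_ground_truth - tp  # objects missed or filtered out
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # keep the lowest threshold that reaches the best F1
            best_thres, best_f1 = thres, f1
    return best_thres, best_f1


# Toy data: three true detections with high scores, two false ones with low scores
scores = [0.90, 0.80, 0.75, 0.45, 0.35]
is_match = [True, True, True, False, False]
best_thres, best_f1 = sweep_threshold(scores, is_match, n_ground_truth=3)
print(best_thres, best_f1)
```

Raising the cutoff removes false positives but eventually discards true detections as well, so F1 peaks at an intermediate value.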

Test set results
* Precision = 0.44
* Recall = 0.91
* F1 score = 0.59
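
The test set metrics above follow directly from the TP, FP and FN counts in the cell output. The short check below recomputes them with the standard definitions; the helper name `detection_metrics` is introduced here for illustration and is not part of `lib`.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


# Counts taken from the cell output above
precision, recall, f1 = detection_metrics(tp=7383, fp=9425, fn=765)
print(f"Precision: {precision:.3f}  Recall: {recall:.3f}  F1: {f1:.4f}")
# Precision: 0.439  Recall: 0.906  F1: 0.5917
```

The low F1 is driven by precision: false positives (9425) outnumber true positives (7383), while recall stays above 0.9.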

Comments
* The validation set results match those reported in the manuscript. However, the test set results differ from Table 2 (MEL dual-stage) in the manuscript.
* The other models in Evaluator.ipynb are repeatable and agree with the manuscript.
* This likely points to an issue with how the test set data is loaded or handled in our local setup rather than a reproducibility problem with the model itself. One thing to check is the result file passed to `calculate_F1`, whose name contains `val` rather than `test`.
* This will be addressed in further model iterations, and the 'baseline' will be updated accordingly.