colab_zirc_dims model library:

Here you can see data on pre-trained, Detectron2-based instance segmentation models that are currently available for use in colab_zirc_dims. This page is inspired by the Detectron2 model zoo page. Models are continually being improved through adjustment of training hyperparameters, so this page (and the models available here) may be subject to change.

Current 'best' models:

Current as of colab_zirc_dims v1.0.10

These models were trained on the czd_large dataset. Models deployed for application by colab_zirc_dims users were chosen using a modified 'early stopping' process: models were trained for a set period of >=12,000 iterations, but only the checkpoints that performed best within the colab_zirc_dims processing algorithm (i.e., those best able to reproduce the manual measurement results of Leary et al. (2022)) are provided to users. See below for more information on training and checkpoint selection. You can train your own models using our workflow by following the directions here.

Training:

Because training from scratch proved unsuccessful with the 'czd_orig' dataset (see 'Legacy models' below), all models were fine-tuned after initialization with MS-COCO-pre-trained weights found in their respective repositories. You can find links to these weights in the config file for each model.

We used different, empirically chosen learning rates depending on model architecture and optimizer: 0.000015 for M-R101-C (Adam optimizer), 0.00025 for M-R50-C (SGD optimizer), 0.00005 for M-ST-C (AdamW optimizer), and 0.0005 for C-V-C (SGD optimizer). A warmup period of 1000 iterations was used for all models, followed by stepped learning rate drawdowns by a factor of 0.5 (i.e., 'gamma') starting at 1500 iterations (~2 epochs) and repeated at every further 1500-iteration increment thereafter. This learning rate schedule is somewhat generalized, but (in combination with the architecture- and optimizer-dependent base learning rates) it seems to yield consistent decreases in validation loss (see 'metrics' data linked in the 'Model info' table) where it would otherwise begin to plateau in the absence of learning rate reduction.
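The schedule above can be sketched as a small helper function. This is a hypothetical illustration, not colab_zirc_dims code; in an actual Detectron2 config the equivalent values would be set via `cfg.SOLVER.BASE_LR`, `cfg.SOLVER.WARMUP_ITERS`, `cfg.SOLVER.GAMMA`, and `cfg.SOLVER.STEPS`.

```python
def lr_at_iter(it, base_lr, warmup_iters=1000, gamma=0.5,
               first_step=1500, step_every=1500):
    """Learning rate at training iteration `it` under the schedule described
    above: linear warmup, then a factor-of-`gamma` drop at `first_step`
    iterations and at every `step_every` iterations thereafter.
    (Hypothetical helper for illustration only.)"""
    if it < warmup_iters:
        return base_lr * it / warmup_iters  # linear warmup
    if it < first_step:
        return base_lr
    n_drops = 1 + (it - first_step) // step_every
    return base_lr * gamma ** n_drops

# e.g., for M-R50-C (SGD, base LR 0.00025):
#   lr_at_iter(1200, 0.00025) -> 0.00025 (post-warmup, pre-drawdown)
#   lr_at_iter(1500, 0.00025) -> 0.000125 (first drawdown)
```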

We adopted a fairly aggressive training image augmentation strategy to mitigate overfitting, as shown in the figure below:

Random augmentations applied to training images via Detectron2 dataloader. All augmentations besides "defocus" were implemented using default Detectron2 augmentations and transformations. The random (in extent and magnitude) 'defocus blur' augmentation, which is based on a modification of code from the imagecorruptions library, approximates a relatively common tiling-related artefact that appears in LA-ICP-MS mosaic images.
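An augmentation list in this spirit can be assembled from Detectron2's built-in transforms. The fragment below is a hypothetical approximation, not the exact colab_zirc_dims training configuration, and it omits the custom imagecorruptions-based 'defocus blur' augmentation.

```python
from detectron2.data import transforms as T

# Hypothetical augmentation list approximating the strategy described above;
# ranges are illustrative, and the custom 'defocus blur' augmentation is omitted.
train_augs = [
    T.RandomFlip(prob=0.5, horizontal=True, vertical=False),
    T.RandomFlip(prob=0.5, horizontal=False, vertical=True),
    T.RandomRotation(angle=[-90.0, 90.0]),  # random rotation within a range
    T.RandomBrightness(0.8, 1.2),
    T.RandomContrast(0.8, 1.2),
]
```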

All models were trained for at least 12,000 total iterations with a batch size of 2. Training loss stabilized by ~2,000 iterations for all models (see the plot of Mask-RCNN-style 'mask loss', a loss component for all trained models, below), and mAP metrics stabilized by ~4,000 iterations, with largely stochastic variation observed thereafter. Validation loss (metrics vary between model architectures, so 1:1 comparisons are not plottable) did continue to decrease until ~8,000 iterations or more for all models, though apparently not at a rate resolvable in mAP metrics.

Loss and evaluation curves during training: mask loss (average per-ROI binary cross-entropy loss, per He et al. (2018)), MS-COCO bounding box and mask mAP metrics, and approximate grain extent overestimate rates from rapid colab_zirc_dims evaluation of a serialized version of the Leary et al. (2022) grain image-measurement dataset. MS-COCO mAP metrics were evaluated at 200-iteration intervals during training. Evaluations on the serialized dataset were only run where model checkpoints were saved (at ~1,000-iteration intervals).
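For concreteness, the 'mask loss' plotted above is an average binary cross-entropy over the pixels of each predicted ROI mask. The toy sketch below illustrates the computation on a flattened mask; it is not Detectron2's vectorized implementation.

```python
import math

def mask_bce(pred_probs, gt_labels, eps=1e-7):
    """Average binary cross-entropy between a flattened predicted mask
    (probabilities in [0, 1]) and its binary ground-truth labels.
    Toy illustration, not Detectron2 code."""
    total = 0.0
    for p, y in zip(pred_probs, gt_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(pred_probs)
```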

Selection of checkpoints for deployment:

Model checkpoints for deployment were selected based on performance in reproducing manual per-grain long and short axis length measurements from Leary et al. (2022) using a fast, streamlined version of the colab_zirc_dims grain measurement algorithm and a serialized version of the Leary et al. (2022) dataset. We narrowed our selection window to checkpoints at >= 4,000 training iterations based on the observation that mAP metrics appear to increase up until this point (see curves above). We then selected checkpoints with minimal proportions of long and/or short axis measurement results that overestimate manual (Leary et al., 2022) measurements by > 20%.
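The selection criterion above can be expressed as a small function. This is a hypothetical helper for illustration, not part of the colab_zirc_dims API.

```python
def overestimate_rate(auto_axes, manual_axes, thresh=0.20):
    """Fraction of automated axis measurements that exceed the corresponding
    manual measurement by more than `thresh` (here, 20%).
    Hypothetical helper illustrating the checkpoint-selection criterion."""
    n_over = sum(1 for auto, manual in zip(auto_axes, manual_axes)
                 if (auto - manual) / manual > thresh)
    return n_over / len(auto_axes)
```

Checkpoints at >= 4,000 training iterations with the lowest such rates for long and/or short axis measurements were the ones deployed.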

Performance on the serialized dataset approximates, but differs slightly from, results obtainable using conventional colab_zirc_dims processing, apparently due to lossy saving of the per-shot image data when serializing the dataset. Evaluations for the selected model checkpoints were consequently re-run using the conventional colab_zirc_dims process; these results are presented in the 'Evaluation results...' table below.

Summary tables:

Model info:

| Model | Architecture | Backbone | Train/val dataset | Training iterations | bbox AP | mask AP | Links |
|---|---|---|---|---|---|---|---|
| M-ST-C | Mask-RCNN (Detectron2) | Swin-T | czd_large | 7.0k | 75.14 | 75.61 | config \| model \| training metrics |
| M-R101-C | Mask-RCNN (Detectron2) | ResNet-101-FPN | czd_large | 7.0k | 73.87 | 75.92 | config \| model \| training metrics |
| C-V-C | Centermask2 | VovNetv2-99 | czd_large | 11.0k | 72.1 | 72.15 | config \| model \| training metrics |
| M-R50-C | Mask-RCNN (Detectron2) | ResNet-50-FPN | czd_large | 6.0k | 72.21 | 74.31 | config \| model \| training metrics |

Evaluation results on Leary et al. (2022) dataset:

| Model | Training iterations | n total | n successful<sup>a</sup> | failure rate (%) | avg. abs. long axis error (μm) | avg. abs. short axis error (μm) | avg. abs. long axis % error | avg. abs. short axis % error | avg. spot segmentation time (s)<sup>b</sup> | Link |
|---|---|---|---|---|---|---|---|---|---|---|
| M-ST-C | 7.0k | 5004 | 5003 | 0.02 | 5.66 | 4.31 | 7.28 | 8.57 | 0.1142 | data file |
| M-R101-C | 7.0k | 5004 | 4994 | 0.1998 | 5.76 | 4.31 | 7.39 | 8.59 | 0.1205 | data file |
| C-V-C | 11.0k | 5004 | 5000 | 0.0799 | 5.73 | 4.34 | 7.35 | 8.63 | 0.1642 | data file |
| M-R50-C | 6.0k | 5004 | 4993 | 0.2198 | 5.7 | 4.29 | 7.32 | 8.54 | 0.0931 | data file |

<sup>a</sup>Segmentation/measurement of a spot is considered to have 'failed' if no grain mask can be obtained in the immediate vicinity of the spot target location.

<sup>b</sup>Please note that this represents only the time taken to obtain a central grain mask from a single spot within colab_zirc_dims processing. Actual per-spot processing time also encompasses measurement of the resulting mask and saving of verification data, and will be substantially longer.

Discussion:

We recommend model M-ST-C in most cases. This model produces consistently good segmentation results and seems to be robust to image artefacts.

The relatively low bounding box mAP metric for C-V-C belies its accuracy to some degree: our train-validation dataset contains numerous very small grain annotations, which (as noted by Lee and Park (2020)) Centermask struggles with. Though it is thus contraindicated for application to images with many small grains, C-V-C is quite accurate when applied to images with large (relative to image size) grains. We recommend that users try this model if they find that M-ST-C fails to identify grains or produces inaccurate masks when applied to their data.

The aforementioned models rely on code in non-Detectron2 repositories. If users encounter problems related to these dependencies (downloading these repositories and managing their paths is handled automatically within colab_zirc_dims processing notebooks), we recommend that they try the Detectron2 Mask-RCNN models M-R101-C and M-R50-C. These will work with only a basic Detectron2 installation.

Legacy models:

These models were trained on the relatively small 'czd_orig' dataset. Newer models generally have lower segmentation error rates, and we recommend that you use them instead. Please see our pre-print manuscript for details on model training and checkpoint selection.

Summary tables:

Model info:

| Model | Architecture | Backbone | Pretraining | Train/val dataset | Training images randomly augmented? | Training iterations | bbox AP | mask AP | Links |
|---|---|---|---|---|---|---|---|---|---|
| 101_model_COCO_base | Mask-RCNN (Detectron2) | ResNet-101-FPN | COCO | czd_orig | Yes | 6.0k | 72.57 | 67.63 | config \| model \| training metrics |
| centermask2 | Centermask2 | VovNetv2-99 | COCO | czd_orig | Yes | 4.0k | 74.37 | 67.57 | config \| model \| training metrics |
| 50_model_COCO_base | Mask-RCNN (Detectron2) | ResNet-50-FPN | COCO | czd_orig | Yes | 6.0k | 71.2 | 66.21 | config \| model \| training metrics |
| 101_from_scratch | Mask-RCNN (Detectron2) | ResNet-101-FPN | None | czd_orig | Yes | 8.0k | 65.84 | 63.35 | config \| model \| training metrics |
| 50_from_scratch | Mask-RCNN (Detectron2) | ResNet-50-FPN | None | czd_orig | Yes | 4.0k | 63.4 | 61.45 | config \| model \| training metrics |
| 50_from_scratch_no_augs | Mask-RCNN (Detectron2) | ResNet-50-FPN | None | czd_orig | No | 4.0k | 35.99 | 35.82 | config \| model \| training metrics |
| mask_rcnn_swint | Mask-RCNN (Detectron2) | Swin-T | COCO | czd_orig | Yes | 7.0k | 72.42 | 67.69 | config \| model \| training metrics |

Evaluation results on Leary et al. (2022) dataset:

| Model | Training iterations | n total | n successful<sup>a</sup> | failure rate (%) | avg. abs. long axis error (μm) | avg. abs. short axis error (μm) | avg. abs. long axis % error | avg. abs. short axis % error | avg. spot segmentation time (s)<sup>b</sup> | Link |
|---|---|---|---|---|---|---|---|---|---|---|
| 101_model_COCO_base | 6.0k | 5004 | 5003 | 0.02 | 6.11 | 4.39 | 7.96 | 8.83 | 0.1187 | data file |
| centermask2 | 4.0k | 5004 | 4998 | 0.1199 | 6.02 | 4.44 | 7.85 | 9.0 | 0.1136 | data file |
| 50_model_COCO_base | 6.0k | 5004 | 4992 | 0.2398 | 6.2 | 4.48 | 8.04 | 8.98 | 0.0722 | data file |
| 101_from_scratch | 8.0k | 5004 | 4931 | 1.4588 | 7.73 | 5.65 | 9.75 | 11.45 | 0.1073 | data file |
| 50_from_scratch | 4.0k | 5004 | 4988 | 0.3197 | 7.65 | 5.45 | 9.46 | 10.91 | 0.0868 | data file |
| 50_from_scratch_no_augs | 4.0k | 5004 | 4749 | 5.0959 | 13.09 | 8.56 | 17.18 | 18.16 | 0.1084 | data file |
| mask_rcnn_swint | 7.0k | 5004 | 4993 | 0.2198 | 6.02 | 4.53 | 7.71 | 8.96 | 0.1295 | data file |

<sup>a</sup>Segmentation/measurement of a spot is considered to have 'failed' if no grain mask can be obtained in the immediate vicinity of the spot target location.

<sup>b</sup>Please note that this represents only the time taken to obtain a central grain mask from a single spot within colab_zirc_dims processing. Actual per-spot processing time also encompasses measurement of the resulting mask and saving of verification data, and will be substantially longer.

Example code: loading and applying models outside of provided notebooks:

The colab_zirc_dims package streamlines the process of loading a Detectron2 DefaultPredictor instance in a Colab virtual machine or local Jupyter runtime using the non_std_cfgs.smart_load_predictor function. If necessary, this function will download additional repositories (i.e., Swint_detectron2 or CenterMask2) to the current working directory prior to loading the predictor. See below for an example:

```python
from colab_zirc_dims import non_std_cfgs

mypredictor = non_std_cfgs.smart_load_predictor('PATH_TO_CFG_YAML',
                                                'PATH_TO_WEIGHTS_PTH_FILE',
                                                use_cpu=False)
```

This predictor instance can then be applied directly to an image to get full Detectron2 instance segmentation results:

```python
from skimage import io as skio

# load the image
img = skio.imread('PATH_TO_REFLECTED_LIGHT_IMAGE')

# apply the predictor to the image with channel order reversed to BGR
predictions = mypredictor(img[:, :, ::-1])
```

Alternatively, if the image is centered on a mineral grain, the predictor can be applied via the colab_zirc_dims.segment.segment_given_imgs() function to try to extract a 'central' mask:

```python
from colab_zirc_dims import segment

central_mask_found_bool, central_mask = segment.segment_given_imgs([img], mypredictor)
```

The above functions work well for dynamic model loading and application in virtual or local environments with Python and Anaconda installed and exposed. If you want to adapt a model for use in a compiled executable application with minimal Python dependencies, or to use a model with other languages (e.g., C++), you may want to look into tracing your chosen model.

References:

He, K., Gkioxari, G., Dollár, P., and Girshick, R.: Mask R-CNN, arXiv:1703.06870 [cs], 2018.

Leary, R. J., Smith, M. E., and Umhoefer, P.: Grain-Size Control on Detrital Zircon Cycloprovenance in the Late Paleozoic Paradox and Eagle Basins, USA, J. Geophys. Res. Solid Earth, 125, e2019JB019226, https://doi.org/10.1029/2019JB019226, 2020.

Leary, R. J., Smith, M. E., and Umhoefer, P.: Mixed eolian–longshore sediment transport in the late Paleozoic Arizona shelf and Pedregosa basin, U.S.A.: A case study in grain-size analysis of detrital-zircon datasets, Journal of Sedimentary Research, 92, 676–694, https://doi.org/10.2110/jsr.2021.101, 2022.

Lee, Y. and Park, J.: CenterMask : Real-Time Anchor-Free Instance Segmentation, arXiv:1911.06667 [cs], 2020.

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B.: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, https://doi.org/10.48550/ARXIV.2103.14030, 2021.

Michaelis, C., Mitzkus, B., Geirhos, R., Rusak, E., Bringmann, O., Ecker, A. S., Bethge, M., and Brendel, W.: Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming, https://doi.org/10.48550/arXiv.1907.07484, 31 March 2020.

Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y., and Girshick, R.: Detectron2, 2019.

Ye, H., Yang, Y., and L3str4nge: SwinT_detectron2: v1.2, https://doi.org/10.5281/ZENODO.6468976, 2021.