
TFCAIDM

TensorFlow CAIDM

Deep learning library for medical imaging

Site | Slides | License


Introduction

TFCAIDM is a unified framework, built on top of TensorFlow and JarvisMD, for building and training medical imaging deep learning models at scale. The library supports interfacing custom datasets with jarvis and developing models with tensorflow, and provides built-in reproducibility, traceability, and performance logging for all experiments.

For all project updates, check out CHANGELOG.md.

Available Features
  • Reusable state-of-the-art deep learning model blocks
  • Support for training multiple models in parallel
  • High-level interface for customizing datasets, models, loss functions, training routines, etc.
  • Reproducibility, performance logging, model checkpointing, and hyperparameter tracking
Upcoming Features
  • AutoML / efficient hyperparameter search
  • Distributed data and model training
  • Vision transformer models
  • Better documentation
More Information
  • YAML configuration files
  • Hyperparameter tuning
  • Supported models
  • Customizability
  • Viewing results
  • Benchmarks (available only on the caidm cluster)

Disclaimer: The library is primarily built for users with access to the caidm clusters, though general users are also supported.


Installation

The library currently supports Python 3.7 and TensorFlow 2.5+. The installation instructions below assume that your system is already equipped with CUDA and nvcc.

Local Installation

Install inside a conda virtual environment, where user in the prompts below is your account username.

user $ conda create --name tfcaidm python=3.7
user $ conda activate tfcaidm
user (tfcaidm) $ pip install tensorflow
user (tfcaidm) $ pip install jarvis-md
user (tfcaidm) $ pip install tfcaidm
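
As an optional sanity check (not part of the official instructions), you can verify that nvcc is visible and that TensorFlow detects a GPU:

user (tfcaidm) $ nvcc --version
user (tfcaidm) $ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"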

Example

Training a set of models requires two separate Python scripts: a training submission script and a training routine script.

Training Submission
from jarvis.utils.general import gpus
from tfcaidm import Jobs

# --- Define paths
YML_CONFIG = "pipeline.yml"
TRAIN_ROUTINE_PATH = "main.py"

# --- Submit a training job
Jobs(path=YML_CONFIG).setup(
    producer=__file__,
    consumer=TRAIN_ROUTINE_PATH,
).train_cluster()
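
The schema of pipeline.yml is described under YAML configuration files above. Purely as an illustration (every key below is hypothetical, not the library's actual schema), a config typically declares the dataset, the model, and the hyperparameter values to sweep:

# Hypothetical pipeline.yml sketch -- all key names are illustrative
# assumptions; consult the YAML configuration files docs for the real schema.
env:
  path: "/path/to/client.yml"
model:
  depth: [3, 4]        # multiple values imply one training job per value
  width: [32, 64]
train:
  epochs: 100
  batch_size: 8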
Automated Training Routine
from jarvis.train import params
from jarvis.utils.general import gpus
from tfcaidm import Trainer

# --- Autoselect GPU (use only on caidm cluster)
gpus.autoselect()

# --- Get hyperparameters (args passed by environment variables)
hyperparams = params.load()

# --- Train model (dataset and model created within trainer)
trainer = Trainer(hyperparams)
results = trainer.cross_validation(save=True)
trainer.save_results(results)
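
Because each job receives its hyperparameters through environment variables set by the submission script, it can be helpful to log what params.load() resolved to before training. A minimal sketch, assuming the returned object is dict-like (an assumption; adapt to the actual return type):

# Log the hyperparameters resolved for this job (assumes a dict-like
# return value from params.load(); adjust if the type differs).
for key, value in hyperparams.items():
    print(f"{key} = {value}")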
Custom Training Routine
from jarvis.train import params
from jarvis.utils.general import gpus, overload
from tfcaidm import JClient
from tfcaidm import Model
from tfcaidm import Trainer

# --- Autoselect GPU (use only on caidm cluster)
gpus.autoselect()

# --- Get hyperparameters (args passed by environment variables)
hyperparams = params.load()

# --- Setup custom dataset generator (more details in notebooks)
@overload(JClient)
def train_generator(self, gen_data, **kwargs):
    for xs, ys in gen_data:

        # --- User defined code
        xs = DataAugment(xs)

        yield xs, ys

# --- Setup custom model (more details in notebooks)
@overload(Model)
def create(self):

    # --- User defined code
    model = ViT(...)
    model.compile(...)

    return model

# --- Train model (dataset and model created within trainer)
trainer = Trainer(hyperparams)
results = trainer.cross_validation(save=True)
trainer.save_results(results)

# See the notebooks for a breakdown of customizability
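
DataAugment above is user-defined code. As a purely illustrative stand-in (the function name, the dict-of-arrays assumption about xs, and the flip axis are all assumptions, not tfcaidm API), an augmentation might apply a random left-right flip:

import numpy as np

def DataAugment(xs, p=0.5):
    """Hypothetical augmentation: randomly flip each input left-right.

    Assumes xs maps input names to numpy arrays shaped (..., H, W, C);
    adapt to whatever your generator actually yields.
    """
    if np.random.rand() < p:
        xs = {key: np.flip(arr, axis=-2) for key, arr in xs.items()}
    return xs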

For an example project, see examples/projects. For a more detailed walkthrough of the library, see notebooks.


Sister Repositories