ClasSeg is a pipeline that can handle multiple tasks and data types out of the box, and a solid foundation for writing any type of deep learning project: it is set up so you only need one codebase for all of your DL projects and research. With the capacity to handle multiple datasets with vastly different objectives, this pipeline is all you need for efficient training with organized logging, results, and data management.
Still in an early stage of development, but feel free to fork and submit pull requests.
Windows is not currently compatible because of path handling. This will be fixed eventually, but Linux > Windows.
- Natural images (PNG, JPG, ...), grayscale and RGB
- Medical images (DICOM, NIfTI), 2D and 3D
- Easily extended with a new reader/writer class (and perhaps a change or two to the datapoint object)
To get started by training an MNIST classifier, follow these steps:
- Clone this repository
git clone https://github.com/aheschl1/ClasSegPipeline
- Navigate to the root directory
cd ClasSegPipeline
- Install the package
pip install -e .
- Create the environment variables RAW_ROOT, PREPROCESSED_ROOT, and RESULTS_ROOT. These should point to folders where:
- Raw data will be stored - this is where you will set up your dataset structure. Since it is never read during training, it can live on a slower drive.
- Preprocessed data will be stored - this is read during training, so it should be on a faster drive.
- Results will be stored - this is written to during training.
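For example, assuming a bash shell (the paths below are placeholders; choose your own), you can add lines like these to ~/.bashrc:
export RAW_ROOT=/data/slow_drive/classeg_raw
export PREPROCESSED_ROOT=/data/fast_drive/classeg_preprocessed
export RESULTS_ROOT=/data/fast_drive/classeg_results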
- Preprocess the dataset (we have written an extension to manage this simple use-case)
classegPreprocess -d <any integer id number <= 3 digits> -dd <text description> -f <number of folds> -ext mnist_class
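For example (the id, description, and fold count here are arbitrary choices):
classegPreprocess -d 0 -dd mnist -f 5 -ext mnist_class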
- Train the dataset
classegTrain -d <dataset id> -f <fold to train> -ext mnist_class -m efficientnetb0_one_channel -n <experiment name>
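For example, continuing with the dataset id from above (the fold and experiment name are arbitrary):
classegTrain -d 0 -f 0 -ext mnist_class -m efficientnetb0_one_channel -n mnist_baseline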
What is created on the file system?
- When you ran the first command, ClasSeg created a configuration file at
~/.classegrc
- When you preprocessed: "/MNIST" was temporarily created, and then deleted by the preprocessor. This is where the data was downloaded to.
- When you preprocessed:
RAW_ROOT/Dataset_<desc>_<id>
and
PREPROCESSED_ROOT/Dataset_<desc>_<id>
were created. Take a look at what information is available, and at the default config files.
- When you trained:
RESULTS_ROOT/Dataset_<desc>_<id>/fold_<fold>/<experiment name>
was created. This is where you can find logs, tensorboard logs, weights, and some backups.
A note on the environment variables
These are used to keep the codebase clean and to allow for easy switching between datasets. If you have trouble setting them up, you can modify classeg.utils.utils.constants.py
- Multiple mode training with no coding needed. Mode is determined by file system structure.
- Model customization and design with no coding (though coding can be desirable for more complex systems) thanks to https://github.com/aheschl1/JsonTorchModels
- This is a JSON-based model definition system. It is very powerful for quick iterations and modifications during model experimentation.
- May not be desirable all the time - create an extension and override get_model() to use a custom model (see the trainer sketch in the extensions section below).
- WandB and Tensorboard logging
- Switch between them by modifying ~/.classegrc
- Extension system for adding functionality without modifying the core codebase.
- One codebase for multiple tasks and flows
- DDP Training (even with custom trainers)
- K-Fold training and validation
- In Development
- React UI for dataset exploration, setup, and training
- Main development under the ui branch
This is the most involved step for a custom dataset. There are three dataset structures. Case numbers can be any number of digits >= 5. Most common file extensions are supported. Data can be images or volumes.
Classification Structure
RAW_ROOT
|
---Dataset_<desc>_<id>
. |
. ---- <label_0>
. . | case_00000.png
. . | case_000101.png
. ---- <label_1>
. . | case_xxxxx.png
. . | case_xxxxx.png
. ---- <label_n>
---Dataset_<desc2>_<id2>
|
SSL Structure
RAW_ROOT
|
---Dataset_<desc>_<id>
. | case_00000.png
. | case_000101.png
. | case_xxxxx.png
. | case_xxxxx.png
---Dataset_<desc2>_<id2>
|
Segmentation Structure
RAW_ROOT
|
---Dataset_<desc>_<id>
. |
. ---- imagesTr
. . | case_00000.png
. . | case_000101.png
. ---- labelsTr
. . | case_00000.png
. . | case_000101.png
---Dataset_<desc2>_<id2>
|
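As an illustration, here is a minimal Python sketch (not part of ClasSeg; the dataset name, labels, and source paths are all placeholders) that lays out the classification structure described above:

    import os
    import shutil

    raw_root = os.environ["RAW_ROOT"]
    # Dataset_<desc>_<id>: the description and id here are placeholders
    dataset_dir = os.path.join(raw_root, "Dataset_catsdogs_001")

    # One folder per class label; case numbers need at least five digits.
    images = {
        "cat": ["/tmp/cat_0.png", "/tmp/cat_1.png"],
        "dog": ["/tmp/dog_0.png"],
    }

    case = 0
    for label, paths in images.items():
        label_dir = os.path.join(dataset_dir, label)
        os.makedirs(label_dir, exist_ok=True)
        for src in paths:
            shutil.copy(src, os.path.join(label_dir, f"case_{case:05d}.png"))
            case += 1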
Extensions currently allow you to easily create a custom trainer, inferer, and preprocessor. More extensibility is coming soon! (You can always modify the core codebase.)
- Run classegCreateExtension
- Follow the prompts
- It is now created at
<repo_root>/extensions/<extension_name>
To use your new extension, pass the name you chose to the -extension/-ext argument for training, preprocessing, and inference.
Change Trainer/Preprocessor/Inferer Class Names
There are default names for the template classes, of course. To rename them, modify <extension_root>/__init__.py: change TRAINER_CLASS_NAME, PREPROCESSOR_CLASS_NAME, and INFERER_CLASS_NAME as needed.
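For example (the class names here are illustrative), the relevant lines in <extension_root>/__init__.py might look like:

    TRAINER_CLASS_NAME = "MNISTTrainer"
    PREPROCESSOR_CLASS_NAME = "MNISTPreprocessor"
    INFERER_CLASS_NAME = "MNISTInferer"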
Taking in custom arguments from the command line
All trainers/preprocessors/inferers take kwargs. At the entrypoint, any argument of the form <arg>=<value> is passed through as you would expect.
In your extension classes, you can unpack these from the kwargs in __init__.
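For instance, you might pass a custom flag on the command line (my_extension and my_arg are hypothetical names):

classegTrain -d 0 -f 0 -ext my_extension -m efficientnetb0_one_channel -n run1 my_arg=5

and then read it in your trainer. A minimal sketch, assuming the shape of the generated template (the real template defines the base class and full constructor signature):

    class MyTrainer:  # in practice, the trainer class generated in your extension
        def __init__(self, *args, **kwargs):
            # my_arg arrives from the command line as the string "5"; cast as needed
            self.my_arg = int(kwargs.pop("my_arg", 0))
            # ...the template's remaining setup continues here...

As noted in the features list, this is also where you would override get_model() to return a custom model.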
To run training, you need to run the following command:
classegTrain -d <dataset id> -f <fold to train> -ext <extension name> -m <model name> -n <experiment name>
To get more info, run classegTrain --help
To switch logging mode
Modify the ~/.classegrc file. Change the logger to wandb or tensorboard.
The rc file gets created the first time you run any command (so run a preprocess before looking for it).
WandB Details
We allow anonymous authentication. You have a few options to deal with this:
- Run
wandb login
before running any commands to connect your account.
- Train and collect the WandB anonymous link from the logs. This will allow you to view the logs and results on the WandB website, and connect them to a permanent account.
Use the WANDB_ENTITY environment variable to specify the team to log to, if you do not want to use the default. Optionally, you can specify wandb_entity in the rc file; this sets a default value for all runs, and is overridden by the environment variable if provided.
Use the WANDB_API_KEY environment variable to specify the API key to use, or log in manually with wandb login, or follow the prompts in your first run. Optionally, you can specify wandb_api_key in the rc file; this sets a default value for all runs, and is overridden by the environment variable if provided.
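For example (the entity and key below are placeholders):
export WANDB_ENTITY=my_team
export WANDB_API_KEY=<your api key>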
To run preprocessing, you need to run the following command:
classegPreprocess -d <dataset id> -dd <dataset description> -f <number of folds> -ext <extension name>
Run classegPreprocess --help for more info.
This portion is going to face a lot of changes in the future. For now, you can run inference with the following command:
classegInfer -d <dataset id> -f <fold to infer> -ext <extension name> -i <input_folder>
What happens if you are doing SSL and have no input? Too bad; put something arbitrary for -i.
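For example (the id, fold, extension, and input folder are placeholders):
classegInfer -d 0 -f 0 -ext mnist_class -i ~/images_to_infer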
Run classegInfer --help for more details and up-to-date information.
The UI is under development. To run it, use the following command:
classegRunUI
The first time you use this, run with the --install flag to install the npm project.
classegRunUI --install
To get the most recent UI, check out the ui branch.
git checkout ui; git pull