ML Pipeline for tabular data

Installation

    python3 -m venv env
    source env/bin/activate
    pip install poetry
    poetry install

Functionalities

This pipeline for tabular data offers the following functionalities:

- automatic clean-up
- splitting of the dataset for any number of desired seeds or bootstraps
- imputation of missing data and data normalisation
- oversampling (if desired)
- ability to run multiple feature selection strategies, each configurable step by step
- verification of these strategies using one or more models
- explainability
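
As a rough illustration of what the imputation and normalisation step does to a single feature column, here is a minimal sketch using only the Python standard library. It fills missing values with the column median and then standardises the column. The function names `impute_median` and `standardise` and the example data are hypothetical and not taken from the pipeline; the method the pipeline actually uses is whatever is set under `impute.method` in the config.

```python
import statistics


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in column]


def standardise(column):
    """Scale a column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(v - mean) / std for v in column]


# Hypothetical feature column with one missing value
ages = [50, None, 70, 60]
filled = impute_median(ages)   # missing entry replaced by the median
scaled = standardise(filled)   # zero mean, unit standard deviation
```

In a real experiment the median, mean and standard deviation would normally be estimated on the training split only and then applied to the test split, so that no test information leaks into training.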

Configuration

Configure everything needed for your experiments in the config.yaml file.
The most important entries are target_label, input_file and label_as_index (if available).
Other noteworthy entries in the config file:

meta:
- workers: set according to your machine
impute:
- method: method to use for imputation of missing values
data_split:
- n_seeds: number of data split seeds to run
- test_frac: fraction of dataset to use for testing
selection:
- scoring: the metric to use for training during selection and verification
- jobs: each list defines a job of desired feature selection steps and normalisation
verification:
- models: models to train and test
- param_grids: parameter grids for GridSearchCV
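
To make the interplay of `n_seeds` and `test_frac` more concrete, the sketch below produces one reproducible train/test split per seed. It is an illustration under assumptions, not the pipeline's own code: the helper `split_indices` is hypothetical, and using the seeds 0 to `n_seeds - 1` is an assumption about how seeds are enumerated.

```python
import random


def split_indices(n_samples, test_frac, seed):
    """Return (train, test) index lists for one seeded random split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed-local RNG keeps the split reproducible
    n_test = round(n_samples * test_frac)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Hypothetical values standing in for data_split.n_seeds and data_split.test_frac
n_seeds = 3
test_frac = 0.2

# One independent split per seed (assumed here to be 0 .. n_seeds - 1)
splits = {seed: split_indices(10, test_frac, seed) for seed in range(n_seeds)}
for seed, (train, test) in splits.items():
    print(f"seed {seed}: {len(train)} train / {len(test)} test samples")
```

Running the same seed again yields the same split, which is what allows results from different feature selection jobs to be compared on identical data.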

Run

After the config file is set up properly, you can run the pipeline using:

    python3 main.py

Computation progress is saved after each seed/bootstrap and will not be recomputed unless the meta.overwrite flag is set to True.
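
The resume behaviour described above can be pictured with the following sketch. It is only an assumption about how such a check might look: the function `run_seed`, the `seed_<n>.json` file layout and the placeholder result are hypothetical, and the `overwrite` argument merely stands in for the `meta.overwrite` flag.

```python
import json
import tempfile
from pathlib import Path


def run_seed(seed, out_dir, overwrite=False):
    """Compute and store results for one seed, or reuse a stored result."""
    result_file = Path(out_dir) / f"seed_{seed}.json"  # hypothetical file layout
    if result_file.exists() and not overwrite:
        return json.loads(result_file.read_text()), False  # stored result reused
    result = {"seed": seed, "score": None}  # placeholder for the real computation
    result_file.write_text(json.dumps(result))
    return result, True  # freshly computed


with tempfile.TemporaryDirectory() as out_dir:
    _, computed_first = run_seed(0, out_dir)                   # nothing stored yet
    _, computed_again = run_seed(0, out_dir)                   # stored result found
    _, computed_forced = run_seed(0, out_dir, overwrite=True)  # forced recomputation
    print(computed_first, computed_again, computed_forced)  # True False True
```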
