
Project Structure to Run Hundreds of ML Pipelines

In this repository, we propose an organized structure for your ML projects that simplifies training and tuning large numbers of ML pipelines. This repository is based on https://medium.com/@a.larian/how-to-structure-your-machine-learning-project-to-run-hundreds-of-pipelines-960fcc4c877a


Table of contents

  • Prerequisites
  • Installation
  • Usage
  • Project structure
  • Data flow between directories

Prerequisites

(Back to top)

Knowing the following will give you a good idea of what this repository is about (a minimal example follows the list):

  • Basic machine learning concepts.
  • Basic Python (version 3 or higher).
  • A little experience with ML libraries such as scikit-learn, XGBoost, Dask, or pandas.
  • Some familiarity with libraries like imbalanced-learn and category_encoders also helps.
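To make the idea concrete, here is a minimal, self-contained sketch of the kind of pipeline this structure is built to mass-produce. The synthetic dataset and the chosen hyperparameters are illustrative only, not taken from the repository:

```python
# Illustrative only: a tiny imbalanced-learn + scikit-learn pipeline of the
# kind this project runs by the hundreds. Data and parameters are made up.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# A synthetic, imbalanced binary classification problem.
X, y = make_classification(n_samples=500, weights=[0.9], random_state=0)

# Scale, oversample the minority class, then fit a classifier.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("oversampler", SMOTE(random_state=0)),
    ("model", RandomForestClassifier(random_state=0)),
])

print(cross_val_score(pipe, X, y, cv=5, scoring="f1").mean())
```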

Installation

(Back to top)

To use this project, first clone the repo to your device using the command below:

git clone https://github.com/IDAS-Labratory/Project-Structure-to-Run-Hundreds-of-ML-Pipelines.git

Requirements

Then install the dependencies:

pip install -r requirements.txt
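If you prefer an isolated environment, you can create a virtual environment first (standard Python tooling, not a project requirement):

python -m venv .venv
source .venv/bin/activate    # on Windows: .venv\Scripts\activate
pip install -r requirements.txt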

Usage

(Back to top)

To run this sample project, open the main_run.ipynb notebook and run all cells.
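If you want a quick look at the sample data outside the notebook, you can load it directly; the path below is taken from the project tree in the next section:

```python
# Quick inspection of the sample dataset that ships with the repo.
import pandas as pd

df = pd.read_csv("data/raw/H1N1_Flu_Vaccines.csv")
print(df.shape)
print(df.head())
```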

Project structure

(Back to top)

.
├── project                       <- project's name
│   ├── configs                   <- YAML config files corresponding to ML models,
│   │   │                            dimensionality-reduction algorithms, oversamplers,
│   │   ├── encoder_hp.yaml          and any other pipeline components you want to
│   │   ├── ml_model_hp.yaml         tune (a sketch follows this tree).
│   │   ├── imputer_hp.yaml
│   │   └── oversampler_hp.yaml
│   │
│   ├── models                    <- Each main component of your pipeline is
│   │   │                            separated into its own .py file.
│   │   ├── __init__.py
│   │   ├── _encoders.py
│   │   ├── _fetch_hyperparameter.py
│   │   ├── _imputers.py
│   │   ├── _ml_algorithms.py
│   │   └── _oversamplers.py
│   │
│   ├── pipelines                 <- Your pipelines (if you have several) go here.
│   │   │                            Each pipeline is written as a function.
│   │   ├── __init__.py
│   │   └── _make_pipeline.py
│   │
│   ├── results
│   │   │
│   │   ├── final-models          <- Results of training and testing are saved here.
│   │   └── outputs               <- Serialized pipelines are stored here.
│   │
│   └── utils
│       ├── __init__.py
│       ├── _handle_results.py
│       ├── _evaluation_metrics.py
│       └── _load_data.py
│
├── analytics                     <- Your analytical notebooks go here.
│   │
│   ├── hypotesting
│   └── visualization
│
├── requirements.txt
│
├── main_run.ipynb
│
├── api
│
├── data
│   │
│   ├── intermediate              <- Intermediate data that has been transformed.
│   ├── processed                 <- The final, canonical data sets for modeling.
│   └── raw                       <- The original, immutable data dump.
│       └── H1N1_Flu_Vaccines.csv
│
├── docker                        <- Dockerfile for your project.
├── experiments                   <- For trying out new models, libraries, etc.
├── lib                           <- Libraries that need to be added manually.
│   └── impute
│       ├── __init__.py
│       └── _knn.py
├── papers                        <- Papers related to your project.
└── scripts                       <- Auxiliary scripts, e.g. for scraping (if needed).
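To illustrate how the configs directory can drive pipeline construction, here is a minimal sketch of a config-driven pipeline factory. The YAML keys, the inline config, and the iter_pipelines helper are illustrative assumptions that mirror the layout above; they are not the repository's actual code:

```python
# A sketch of how YAML configs can drive pipeline construction. The YAML
# contents and the factory below are illustrative assumptions, not the
# repository's actual API.
import itertools
import yaml  # PyYAML
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical contents of project/configs/ml_model_hp.yaml, inlined here
# so the example is runnable on its own.
CONFIG = yaml.safe_load("""
random_forest:
  n_estimators: [100, 300]
  max_depth: [5, 10]
logistic_regression:
  C: [0.1, 1.0]
""")

MODELS = {
    "random_forest": RandomForestClassifier,
    "logistic_regression": LogisticRegression,
}

def iter_pipelines(config):
    """Yield (name, pipeline) for every hyperparameter combination."""
    for model_name, grid in config.items():
        keys, values = zip(*grid.items())
        for combo in itertools.product(*values):
            params = dict(zip(keys, combo))
            model = MODELS[model_name](**params)
            yield f"{model_name} {params}", Pipeline([("model", model)])

for name, pipe in iter_pipelines(CONFIG):
    print(name)  # 2*2 + 2 = 6 candidate pipelines from one small config
```

Each candidate pipeline yielded this way can then be cross-validated in a loop (as in the Usage section), which is how a handful of small YAML files fan out into hundreds of runs.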

Data flow between directories

(Back to top)

(Diagram: how data flows between the directories listed above.)