
Commit

Minor edits
ahmed-shariff committed Jul 23, 2019
1 parent ea1e81f commit dc834c6
Showing 3 changed files with 26 additions and 19 deletions.
25 changes: 14 additions & 11 deletions README.rst
@@ -1,31 +1,34 @@
mlpipeline
==========
-This is a simple frawork to organize you machine learning workflow. It automates most of the basic functionalities such as logging, a framework for testing models and gluing together different steps at different stages. This project came about as a result of me abstracting the boilerplate code and automating different parts of the process.
+This is a simple framework to organize you machine learning workflow. It automates most of the basic functionalities such as logging, a framework for testing models and gluing together different steps at different stages. This project came about as a result of me abstracting the boilerplate code and automating different parts of the process.

-The aim of this simple framework is to consolidate the different sub-problems (such as loading data, model configurations, training process, evalutaion process, exporting trained models, etc.) when working/researching with machine learning models. This allows the user to define how the different sub-problems are to be solved using their choice of tools and mlpipeline would handle piecing them together.
+The aim of this simple framework is to consolidate the different sub-problems (such as loading data, model configurations, training process, evaluation process, exporting trained models, etc.) when working/researching with machine learning models. This allows the user to define how the different sub-problems are to be solved using their choice of tools and mlpipeline would handle piecing them together.

Core operations
---------------
This framework chains the different operations (sub-problems) depending on the mode it is executed in. mlpipeline currently has 3 modes:
-- TEST mode: When in TEST mode, it doesn't perform any logging or tracking. It creates a temporory empty directory for the experiment to store the artifacts of an experiment in. When developing and testing the different operations, this mode can be used.
-- RUN mode: In this mode, logging and tracking is performed. In addition, for each experiment run (refered to as a experiment version in mlpipeline) a directory is created for artifacts to be stored.
-- EXPORT mode: In this mode, the exporting related operations will be executed instead of the training/evaluation related operations.

-In addition to providing different modes, the pipeline also supports logging and recording various details. Currently mlpipeline records all logs, metrics and artifacts using a bacis log files as well using `mlflow <https://github.com/databricks/mlflow>`_.
+* TEST mode: When in TEST mode, it doesn't perform any logging or tracking. It creates a temporary empty directory for the experiment to store the artifacts of an experiment in. When developing and testing the different operations, this mode can be used.
+* RUN mode: In this mode, logging and tracking is performed. In addition, for each experiment run (referred to as a experiment version in mlpipeline) a directory is created for artifacts to be stored.
+* EXPORT mode: In this mode, the exporting related operations will be executed instead of the training/evaluation related operations.

+In addition to providing different modes, the pipeline also supports logging and recording various details. Currently mlpipeline records all logs, metrics and artifacts using a basic log files as well using `mlflow <https://github.com/databricks/mlflow>`_.

The following information is recorded:
-- The scripts that were executed/impoerted in relation to an experiment.
-- The any output results
-- The metrics and parameters

+* The scripts that were executed/imported in relation to an experiment.
+* The any output results
+* The metrics and parameters

Documentation
-------------
The documentation is hosted at `ReadTheDocs <https://mlpipeline.readthedocs.io/>`_.

Installing
----------
-Can be installed directly using the Python Package Index using pip:
-pip install mlpipeline
+Can be installed directly using the Python Package Index using pip::
+   pip install mlpipeline

Usage
-----
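For context, here is a minimal sketch (not mlpipeline's internal implementation) of the kind of mlflow calls the README above refers to when it says that logs, metrics and artifacts are recorded for each experiment version; the tracking URI, run name, parameter and metric values are illustrative assumptions::

    import mlflow

    # Illustrative bookkeeping only; mlpipeline handles the equivalent of this itself.
    mlflow.set_tracking_uri("file:./mlruns")             # assumed local tracking store
    with mlflow.start_run(run_name="version5"):          # one run per experiment version
        mlflow.log_param("learning_rate", 0.01)          # parameters
        mlflow.log_metric("accuracy", 0.92)              # metrics
        mlflow.log_artifact("experiments/sample_experiment.py")  # scripts executed for the experiment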
9 changes: 9 additions & 0 deletions examples/sample-project/experiments/sample_load_experiment.py
@@ -0,0 +1,9 @@
+from mlpipeline import get_experiment
+
+
+def main():
+    print(get_experiment('sample_experiment.py', '', 'version5'))
+
+
+if __name__ == '__main__':
+    main()
11 changes: 3 additions & 8 deletions examples/sample-project/sample_pipeline_execution.py
@@ -40,14 +40,9 @@ def train_pipeline_with_blacklist():


def load_experiment():
-    exp, exp_dir, tracking_uri, run_id = get_experiment("experiments/sample_experiment.py",
-                                                         "experiments",
-                                                         "version5")
-    import os
-    import mlflow
-    mlflow.start_run(run_id=run_id)
-    print(os.listdir(exp_dir), run_id, mlflow.get_artifact_uri(), mlflow.get_tracking_uri())
-    print(mlflow.get_artifact_uri('sample_experiment.py'))
+    print(get_experiment("experiments/sample_experiment.py",
+                         "experiments",
+                         "version5"))


if __name__ == "__main__":
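The commit above simplifies load_experiment to just print get_experiment's return value. For a fuller picture, here is a hypothetical sketch that restores the pre-commit behaviour, assuming get_experiment still returns the (exp, exp_dir, tracking_uri, run_id) tuple unpacked in the removed lines::

    import os

    import mlflow
    from mlpipeline import get_experiment

    # Assumed to run from the examples/sample-project directory, as in the sample script.
    exp, exp_dir, tracking_uri, run_id = get_experiment("experiments/sample_experiment.py",
                                                        "experiments",
                                                        "version5")
    mlflow.set_tracking_uri(tracking_uri)   # point mlflow at the run's tracking store
    mlflow.start_run(run_id=run_id)         # attach to the run recorded for this version
    print(os.listdir(exp_dir))              # artifacts stored for this experiment version
    print(mlflow.get_artifact_uri())        # where mlflow keeps the run's artifacts
    mlflow.end_run()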
