Data Pipelines and Workflow Orchestration with Prefect

This is an example multi-class classification machine learning project to showcase tools and best practices in the areas of

  • data science: scikit-learn
  • machine learning operations (MLOps): MLflow, Prefect
  • software development: Sphinx

In the end, this repository will contain and showcase the following aspects of an end-to-end machine learning project:

  • there will be a pipeline to
    • generate data: cluster IDs and Cartesian coordinates
      • 2D
      • 3D
    • transform the data
      • 2D Cartesian -> polar coordinates
      • 3D Cartesian -> spherical coordinates
    • train an ML model: classify the coordinates to the cluster IDs
    • evaluate the model
      • performance metrics
      • global feature importance
        • permutation importance
        • mean Shapley values
      • local feature importance
        • Shapley values
    • advanced features
      • add derived features and run automatic feature selection
        • scikit-learn
        • tsfresh
      • hyperparameter optimisation by means of cross validation
      • probability calibration: scikit-learn CalibratedClassifierCV and calibration_curve
      • multiple (calibrated) classifiers combined with a scikit-learn VotingClassifier
      • configurable alternative: best-model selection from list of specified algorithms
      • try out counterfactuals: mlxtend.evaluate.create_counterfactual
      • probabilistic / conformal predictions
  • the pipeline will be implemented with Prefect
  • experiments will be tracked with MLflow
    • using the SQLite backend store rather than the local filesystem, in order to support model serving
  • best practices
    • use test-driven development
    • add Black and other linting and code formatting tools
    • automatically check test coverage
    • source code documentation built automatically with Sphinx
    • development environments and installation requirements should be handled in a clean and consistent way
    • the code will be built into a package using Poetry
    • everything should run locally, but also in a Docker container
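The coordinate transformations listed above can be sketched in a few lines of pure Python (a minimal illustration using the standard math module; the actual pipeline may well operate on NumPy arrays or Pandas dataframes instead):

```python
import math


def cartesian_to_polar(x: float, y: float) -> tuple[float, float]:
    """2D Cartesian (x, y) -> polar (r, phi)."""
    r = math.hypot(x, y)
    phi = math.atan2(y, x)  # azimuthal angle in (-pi, pi]
    return r, phi


def cartesian_to_spherical(x: float, y: float, z: float) -> tuple[float, float, float]:
    """3D Cartesian (x, y, z) -> spherical (r, theta, phi)."""
    r = math.sqrt(x**2 + y**2 + z**2)
    theta = math.acos(z / r) if r > 0 else 0.0  # polar angle from the z-axis
    phi = math.atan2(y, x)                      # azimuthal angle
    return r, theta, phi
```

Since the classifier is trained on the transformed coordinates, keeping these transformations deterministic and side-effect free makes them easy to wrap as individual pipeline tasks.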

Resources

Questions

  • How to run Prefect-integrated code without Prefect (e.g. in an environment where this is not supported)?
  • Does Prefect have a concept of configuration files to pass parameters to the pipeline or to override default parameters of individual tasks?
  • How best to generate visualisations and dataframe printouts during intermediate steps of the pipeline and transport them outside?
    • I'm not sure all of this diagnostic information should be logged to MLflow
    • Perhaps that's what the artifact mechanism is for

Thoughts and Notes

  • Adding 3D coordinates gives an opportunity to use t-SNE for creating a 2D visualisation
  • New features can be defined in preprocessing steps in the pipeline using Pandas or as part of a feature engineering step in the model itself using scikit-learn. My personal thoughts on this are the following:
    • If all of the feature engineering can be done in scikit-learn, then this is preferable because simply exporting the model (as a scikit-learn pipeline) will include the additional features
    • If there are elements that have to be implemented outside of the scikit-learn model pipeline, then the outer pipeline (i.e. the part that is implemented in Prefect in this demo) has to be deployed anyway. In that case it is preferable to make the pipeline as clean and consistent as possible - which may mean limiting the amount of feature engineering done with scikit-learn.
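The first option - feature engineering inside the scikit-learn model itself - can be done with a FunctionTransformer, so that exporting the fitted pipeline carries the derived features along. A minimal sketch with made-up toy data (the function and variable names are illustrative, not the project's actual code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def add_radius(X):
    # append r = sqrt(x^2 + y^2) as a derived feature column
    r = np.sqrt(X[:, 0] ** 2 + X[:, 1] ** 2)
    return np.column_stack([X, r])


model = Pipeline([
    ("features", FunctionTransformer(add_radius)),
    ("clf", LogisticRegression()),
])

# toy data: two clusters of points at different radii
rng = np.random.default_rng(42)
angles = rng.uniform(0, 2 * np.pi, size=200)
radii = np.where(rng.random(200) < 0.5, 1.0, 3.0)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = (radii > 2.0).astype(int)

model.fit(X, y)  # exporting `model` now includes the feature engineering
```

The design advantage is that serving the exported pipeline requires no separate preprocessing code, which is exactly the trade-off discussed above.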

Getting Started

Prepare the Development Environment

  • In this package, the runtime/deployment dependencies are listed in the requirements.txt file, whereas additional development dependencies are collected in the playground-prefect.yml file.
  • The requirements.txt file, however, is included in the playground-prefect.yml file
  • Therefore, to create the development environment, it is sufficient to run
    $ conda env create -f playground-prefect.yml
    
    or, alternatively, using the faster mamba package manager
    $ mamba env create -f playground-prefect.yml
    
    which will also install the packages listed in requirements.txt into the same environment.
  • The development environment can then be activated with
    $ conda activate playground-prefect
    
  • The advantage of this structure is that a requirements.txt file is provided for packaging, while avoiding the need to maintain two partially overlapping dependency lists.
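The inclusion mechanism described above corresponds to a conda environment file that pulls in requirements.txt via its pip subsection. The exact contents of playground-prefect.yml are an assumption here; only the `-r requirements.txt` inclusion is the point:

```yaml
# playground-prefect.yml (sketch - the actual dependency lists may differ)
name: playground-prefect
channels:
  - conda-forge
dependencies:
  - python=3.10          # assumed version
  - jupytext             # development-only tooling
  - pip
  - pip:
      - -r requirements.txt   # runtime/deployment dependencies
```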

Use Jupytext to Pair Python Scripts With Equivalent Jupyter Notebooks

  • Use jupytext to convert the Python scripts in the root folder (such as 1-main.py) to Jupyter notebooks:

    $ jupytext --set-formats ipynb,py:percent 1-main.py
    
  • After modifying a notebook, sync the .ipynb and the .py files with

    $ jupytext --sync 1-main.ipynb
    

Start Up Prefect

  • Spin up a Prefect server:

    $ prefect server start
    

    This will by default start the web UI at http://127.0.0.1:4200

Prefect Cheat Sheet

Settings

  • Specify or change a setting:
    $ prefect config set PREFECT_TASKS_REFRESH_CACHE='True'
    
  • Reset to the default value:
    $ prefect config unset PREFECT_TASKS_REFRESH_CACHE
    
  • View the currently active settings:
    $ prefect config view
    

Profiles

  • List all available profiles:
    $ prefect profile ls
    
  • View the settings associated with the currently active profile:
    $ prefect profile inspect
    
