This lab covers data and model versioning, as well as experiment tracking. These are core MLOps concepts that foster reproducibility, reliability, and auditability of ML processes.
Learning plan
- Data versioning
  - Data Version Control (DVC)
  - configuring remote data storage
  - versioning datasets
- Experiment tracking
  - MLflow introduction, MLflow Tracking
  - autologging, custom logging
  - analyzing & comparing experiments
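
As a rough illustration of the idea behind data versioning, the sketch below fingerprints a file by hashing its contents, so that any change to the data yields a different identifier and restoring the old data yields the old identifier again. It is a conceptual sketch only, not DVC's actual implementation; the file name `houses.csv` and the helpers `content_hash` and `demo` are hypothetical names made up for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """Return the MD5 hex digest of a file's bytes, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo() -> tuple[str, str, str]:
    """Hash a small CSV, modify it, then restore it, returning all three hashes."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "houses.csv"  # hypothetical file name
        data.write_text("id,price\n1,200000\n")
        v1 = content_hash(data)

        # Appending a row changes the content, so the fingerprint changes.
        data.write_text("id,price\n1,200000\n2,150000\n")
        v2 = content_hash(data)

        # Restoring the original content restores the original fingerprint.
        data.write_text("id,price\n1,200000\n")
        v1_again = content_hash(data)
    return v1, v2, v1_again


if __name__ == "__main__":
    v1, v2, v1_again = demo()
    print("changed data gives a new hash:", v1 != v2)
    print("restored data gives the old hash:", v1 == v1_again)
```

Keeping only such a small fingerprint in Git, while the data itself lives in separate storage, is what lets a repository pin an exact dataset version without storing large files directly.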
Necessary software
Note that you should also activate the uv project and install dependencies with uv sync.
Lab
There are separate instructions for DVC (part 1) and MLflow (parts 2-3). The DVC part uses Markdown instructions in the first lab instruction file. The MLflow parts use a Jupyter Notebook in the second lab instruction file.
There is no homework this time, only the lab :)
Data
We will be using the Ames housing dataset, which describes house sale prices in Ames, Iowa, from 2006 to 2010.