MichaelKarpe/mylearn

mylearn: my Machine Learning framework


mylearn is a Machine Learning framework based on Airflow and MLflow for designing machine learning systems in a production perspective.

Work in progress... Stay tuned!

Index

  1. Prerequisites
    1. Recommended software
    2. Install environment
    3. Set up PostgreSQL database for Airflow
    4. Set up Airflow
    5. Set up MLflow
  2. Usage (#FIXME)

Prerequisites

mylearn uses poetry and poethepoet to keep installation and setup simple. We recommend installing and using mylearn in a Linux environment and following the instructions in this section exactly to avoid installation problems.

Recommended software

  • Git
  • PostgreSQL
  • pgAdmin (optional)
  • pyenv
    # Install binary dependencies and build tools
    sudo apt update
    sudo apt install build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev curl libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
    
    # Install pyenv
    curl https://pyenv.run | bash
    echo 'export PATH="$HOME/.pyenv/bin:$PATH"' >> ~/.bashrc
    echo 'eval "$(pyenv init -)"' >> ~/.bashrc
    echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bashrc
    source ~/.bashrc
    
    # Install a Python version and set it as default:
    pyenv install 3.11.6
    pyenv global 3.11.6
    
  • poetry
    curl -sSL https://install.python-poetry.org | python3 -
    echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
    

Once poetry is installed, close and reopen your terminal. We recommend configuring poetry to create the virtualenv at the project root, although this is not required.

poetry config virtualenvs.in-project true

Install environment

Install the project dependencies with:

poetry install

To install from the requirements.txt file instead of the poetry.lock file:

pyenv shell 3.11.6
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Set up PostgreSQL database for Airflow

Open the PostgreSQL command line:

sudo -i -u postgres
psql

Create the airflow database, owned by an airflow user whose password is airflow:

CREATE DATABASE airflow;
CREATE USER airflow WITH PASSWORD 'airflow';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
ALTER DATABASE airflow OWNER TO airflow;
ALTER ROLE airflow WITH CREATEDB;
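
To confirm that the database and user were created as intended, a quick connectivity check can help. The sketch below is a hypothetical helper, not part of mylearn; it assumes PostgreSQL listens on localhost at its default port 5432 and accepts password logins over TCP.

```shell
# Hypothetical helper: run a trivial query as the airflow user over TCP.
# Assumes PostgreSQL listens on localhost:5432 (its default) and allows password logins there.
check_airflow_db() {
  PGPASSWORD=airflow psql -h localhost -p 5432 -U airflow -d airflow -c 'SELECT 1;'
}
```

After leaving psql and the postgres session, run check_airflow_db from your normal shell. A one-row result suggests that the database and credentials work.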

Set up pgAdmin (optional)

Open pgAdmin, right-click on "Servers" at the top-left and click on "Register > Server".

Then enter a "Name" of your choice in the "General" tab and fill in the "Connection" tab so that:

  • "Port" matches the value in the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN variable in pyproject.toml
  • "Maintenance database", "Username" and "Password" match the names defined in the previous subsection
  • "Save password" is activated

[Image: pgadmin.png]
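
For orientation, AIRFLOW__DATABASE__SQL_ALCHEMY_CONN holds a SQLAlchemy connection URL. The sketch below assembles one from the database, user and password created in the previous subsection. The build_conn_url helper is hypothetical, and the host localhost, the port 5432 (PostgreSQL's default) and the psycopg2 driver are assumptions; the value actually used by mylearn is the one in pyproject.toml.

```shell
# Hypothetical helper: assemble a SQLAlchemy-style PostgreSQL URL.
# Arguments: user password host port database
build_conn_url() {
  printf 'postgresql+psycopg2://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Assumed values: localhost and 5432 are PostgreSQL defaults, not read from pyproject.toml.
build_conn_url airflow airflow localhost 5432 airflow
```

The port in the resulting URL is the one to enter in the pgAdmin "Port" field.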

Set up Airflow

Airflow is initialized with a single poe command:

poe airflow-init

The Airflow Scheduler and Webserver can be started with:

poe airflow-scheduler
poe airflow-webserver

The Airflow UI can be opened at localhost:8080, and you can log in with admin as both username and password.
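
If the page does not load right away, the webserver may still be starting. As a minimal sketch, assuming an Airflow 2.x webserver (which serves a /health endpoint) and that curl is installed, the hypothetical helper below reports whether the UI is reachable:

```shell
# Hypothetical helper: succeed once the webserver answers on its health endpoint.
# Assumes an Airflow 2.x webserver, which serves /health; the port defaults to 8080.
airflow_ui_ready() {
  curl -fsS "http://localhost:${1:-8080}/health" > /dev/null
}
```

For example, airflow_ui_ready && echo "Airflow UI is up" prints the message only once the webserver responds.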

To clean your Airflow setup before rerunning poe airflow-init, stop the Airflow Scheduler and Webserver, then run:

poe airflow-clean

Set up MLflow (#FIXME)

MLflow UI can be opened at localhost:5000 after execution of the following command:

poe mlflow-ui

Usage (#FIXME)

MLflow Pipelines Regression Template

The mlflow-template pipeline, based on the MLflow Pipelines Regression Template, can be run on its own with:

poe mlflow-run

or as an Airflow Directed Acyclic Graph (DAG), by triggering the mlflow-template DAG from the Airflow UI or with:

TO BE COMPLETED

Other examples

Work in progress... Stay tuned!