mylearn is a machine learning framework built on Airflow and MLflow for designing machine learning systems with production in mind.
Work in progress... Stay tuned!
mylearn leverages poetry and poethepoet to make its installation and setup surprisingly simple. We recommend installing and using mylearn under a Linux environment and strictly following the instructions in this section to avoid installation problems.
- Git
- PostgreSQL
- pgAdmin (optional)
- pyenv
# Install binary dependencies and build tools
sudo apt update
sudo apt install build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev curl libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
# Install pyenv
curl https://pyenv.run | bash
echo 'export PATH="$HOME/.pyenv/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bashrc
source ~/.bashrc
# Install a Python version and set it as default:
pyenv install 3.11.6
pyenv global 3.11.6
- poetry
curl -sSL https://install.python-poetry.org | python3 -
# Use $HOME rather than ~ here: a tilde inside double quotes is not expanded by the shell
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
Once poetry is installed, close and reopen your terminal. We recommend configuring poetry to install requirements within a virtualenv located at the project root level, although this is not required.
poetry config virtualenvs.in-project true
Installation is run with:
poetry install
To install from the requirements.txt file instead of the poetry.lock file:
pyenv shell 3.11.6
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
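After either installation route, it can help to confirm that the active interpreter is the one you expect. The following is a minimal sketch, assuming the Python 3.11 series installed above and a standard venv-style environment. The helper names `in_virtualenv` and `version_matches` are illustrative and are not part of mylearn.

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return prefix != base_prefix


def version_matches(expected=(3, 11), actual=sys.version_info) -> bool:
    """Return True when the major and minor version equal `expected`."""
    return tuple(actual[:2]) == tuple(expected)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("Python 3.11:", version_matches())
```

Run it with the environment activated (or through `poetry run python`); both lines should report `True` if the setup matches the steps above.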
Open the PostgreSQL command line:
sudo -i -u postgres
psql
Create the airflow database, with user airflow and password airflow:
CREATE DATABASE airflow;
CREATE USER airflow WITH PASSWORD 'airflow';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
ALTER DATABASE airflow OWNER TO airflow;
ALTER ROLE airflow WITH CREATEDB;
Open pgAdmin, right-click on "Servers" at the top-left and click on "Register > Server".
Then, provide your desired "Name" in the "General" tab, and the following information in the "Connection" tab where:
- "Port" matches the value in the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN variable in pyproject.toml
- "Maintenance database", "Username" and "Password" match the names defined in the previous subsection
- "Save password" is activated
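The pgAdmin fields can be read directly off the connection string. The sketch below assumes the variable holds a standard SQLAlchemy-style URL. The value of `EXAMPLE_CONN` is hypothetical (it uses the PostgreSQL default port 5432 and the credentials created above), so copy the real value from pyproject.toml. The helper name `pgadmin_fields` is illustrative only.

```python
from urllib.parse import urlsplit

# Hypothetical value: copy the real one from pyproject.toml.
EXAMPLE_CONN = "postgresql+psycopg2://airflow:airflow@localhost:5432/airflow"


def pgadmin_fields(conn: str) -> dict:
    """Map a SQLAlchemy-style connection URL to pgAdmin "Connection" tab fields."""
    parts = urlsplit(conn)
    return {
        "Host name/address": parts.hostname,
        "Port": parts.port,  # None when the URL omits the port
        "Maintenance database": parts.path.lstrip("/"),
        "Username": parts.username,
        "Password": parts.password,
    }


if __name__ == "__main__":
    for field, value in pgadmin_fields(EXAMPLE_CONN).items():
        print(f"{field}: {value}")
```

Credentials containing special characters are percent-encoded in such URLs, and this sketch does not decode them.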
Airflow is initialized with a single poe command:
poe airflow-init
The Airflow Scheduler and Webserver can be run with:
poe airflow-scheduler
poe airflow-webserver
The Airflow UI can be opened at localhost:8080, and you can log in with admin as both username and password.
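To confirm from a script that both the Webserver and the Scheduler are running, you can query the webserver's health report. This is a sketch under the assumption that mylearn runs an Airflow 2.x webserver, which exposes a `/health` endpoint returning a JSON status for `metadatabase` and `scheduler`. The helper names are illustrative and not part of mylearn.

```python
import json
from urllib.request import urlopen

# Assumed endpoint: Airflow 2.x webservers expose /health on the UI port.
HEALTH_URL = "http://localhost:8080/health"


def is_healthy(payload: dict) -> bool:
    """Return True when both the metadata database and the scheduler report healthy."""
    return all(
        payload.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """Download and decode the health report.

    Raises urllib.error.URLError (an OSError) if the webserver is unreachable.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Once both services are up, `is_healthy(fetch_health())` should return `True`. A `False` result with the UI reachable usually points at the Scheduler not running.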
If you want to clean your Airflow setup before rerunning poe airflow-init, you need to stop the Airflow Scheduler and Webserver and run:
poe airflow-clean
MLflow UI can be opened at localhost:5000 after execution of the following command:
poe mlflow-ui
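If a UI does not load, a quick first check is whether anything is listening on its port. The following minimal sketch uses only the standard library and the two ports mentioned in this README (8080 for Airflow, 5000 for MLflow). The helper name `port_open` is illustrative.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True when a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from this README: 8080 for Airflow, 5000 for MLflow.
    for name, port in (("Airflow UI", 8080), ("MLflow UI", 5000)):
        state = "up" if port_open("localhost", port) else "down"
        print(f"{name} on localhost:{port}: {state}")
```

An open port only shows that some process is listening there. It does not prove that the process is the expected service.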
The mlflow-template pipeline, based on the MLflow Pipelines Regression Template, can be run independently with
poe mlflow-run
or via an Airflow Directed Acyclic Graph (DAG) by triggering the mlflow-template DAG via Airflow UI or with
TO BE COMPLETED