
MLOps for Landcoverpy

Overview

This repository hosts the essential resources for deploying the landcoverpy model within our specially developed MLOps infrastructure. It is designed to work seamlessly with a Kubernetes cluster, assuming that the MLOps infrastructure is already deployed. For detailed guidance on deploying each component, refer to our mlops-infra repository.

Infrastructure Components

The project relies on the following infrastructure components:

  • MLflow: Manages the model registry.
  • PostgreSQL: Provides the database backend required by MLflow for metadata storage.
  • Seldon Core: Handles model deployment.
  • Prefect: Manages workflow and pipeline orchestration.
  • MinIO: Offers object storage solutions.
  • Prometheus: Enables runtime monitoring and alerting.

Repository Structure

The repository is organized into the following directories:

training

Contains all files necessary to deploy the training pipeline using Prefect. This pipeline manages the (re)training of models with satellite imagery and validated locations, and stores the resulting model versions in MLflow.
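
For orientation, the snippet below is a minimal, hypothetical sketch of a Prefect flow that retrains a model and registers it in MLflow. The flow structure, the model name "landcoverpy" and the toy training step are assumptions for illustration only; the actual pipeline lives in pipeline_retrain.py.

    # Hypothetical sketch only: the toy data and classifier are illustrative,
    # not the repository's actual pipeline_retrain.py.
    import mlflow
    import mlflow.sklearn
    import numpy as np
    from prefect import flow, task
    from sklearn.ensemble import RandomForestClassifier

    @task
    def train_model(data_uri: str):
        # landcoverpy would load satellite composites and validated locations here
        X, y = np.random.rand(20, 4), np.random.randint(0, 3, 20)
        return RandomForestClassifier().fit(X, y)

    @flow
    def retraining_flow(data_uri: str = "s3://<BUCKET>/new-data"):
        model = train_model(data_uri)
        mlflow.set_experiment("landcoverpy")
        with mlflow.start_run():
            # every run registers a new version of the model in the registry
            mlflow.sklearn.log_model(model, "model", registered_model_name="landcoverpy")

    if __name__ == "__main__":
        retraining_flow()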

deployment

Includes resources for deploying trained models from MLflow. It facilitates the deployment of models to a production environment.
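
As a hedged illustration of what deploying a trained model from MLflow involves, the snippet below lists the registered versions of a model so a specific one can be chosen for deployment. The tracking URI placeholder and the model name "landcoverpy" are assumptions, not values taken from this repository.

    # Illustrative only: list registered versions of an MLflow model.
    from mlflow.tracking import MlflowClient

    client = MlflowClient(tracking_uri="http://<MLFLOW-IP>:<MLFLOW-PORT>")
    for mv in client.search_model_versions("name='landcoverpy'"):
        print(mv.version, mv.current_stage, mv.source)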

prediction

Provides a test script to verify the functionality of deployed models. The script generates randomized, valid polygons and sends them to the deployed model instances for predictions.
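
To make the idea concrete, here is a minimal sketch of generating a random but valid polygon as GeoJSON, assuming shapely is available; the coordinate ranges and buffer radius are arbitrary, and the repository's test script may build its polygons differently.

    # Minimal sketch: buffering a random point always yields a valid polygon.
    import json
    import random

    from shapely.geometry import Point, mapping

    def random_polygon(lon_range=(-5.0, -4.0), lat_range=(36.5, 37.0)):
        centre = Point(random.uniform(*lon_range), random.uniform(*lat_range))
        return centre.buffer(0.01)

    geojson = {"type": "Feature", "properties": {}, "geometry": mapping(random_polygon())}
    print(json.dumps(geojson))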

gui

Provides a web-based user interface for interacting with the deployed models. It allows users to interact with a map and draw polygons to generate, visualise and download predictions.

.github/workflows

Contains CI workflows that automatically update the Docker images in Google Container Registry (GCR) on new commits.

Before you begin

Before starting to integrate the models in the infrastructure, please perform the following steps:

  1. Clone this repository:
  git clone https://github.com/KhaosResearch/mlops-landcoverpy.git
  cd mlops-landcoverpy
  2. Modify the config.conf file with the infrastructure service addresses, ports, etc.

  3. Execute the setup.sh script to replace the placeholders in all the files with the values in config.conf (a purely illustrative sketch of this substitution follows the commands below):

  chmod +x setup.sh
  ./setup.sh
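
Purely for illustration, the sketch below shows the kind of substitution setup.sh performs, assuming config.conf holds KEY=VALUE pairs and placeholders look like <KEY>; the real script is a shell script and may work differently.

    # Illustration only: the actual setup.sh is a shell script, and the
    # KEY=VALUE config format and <KEY> placeholder style are assumptions.
    from pathlib import Path

    config = dict(
        line.strip().split("=", 1)
        for line in Path("config.conf").read_text().splitlines()
        if "=" in line and not line.lstrip().startswith("#")
    )

    for path in Path(".").rglob("*.yaml"):
        text = path.read_text()
        for key, value in config.items():
            text = text.replace(f"<{key}>", value)
        path.write_text(text)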

How to integrate landcoverpy models in the infrastructure

The following sections describe how to integrate landcoverpy models in the infrastructure.

Deploy the base model in MLflow

  1. Move to the training directory:
  cd training
  2. Train one instance of the model using the landcoverpy repository. This will create several files such as the model instance model.joblib, the model's metadata metadata.json, and a confusion matrix confusion_matrix.png (also confusion_matrix.csv). Move these files, along with the training and test data used, to a base_model directory.
  3. Execute the upload_base_model.py script to upload the first version of the model to the model registry (a hedged sketch of this step follows the list). Once the base model is registered, all subsequent versions will be trained and uploaded automatically by Prefect.
     python upload_base_model.py
  4. Deploy the base version of the model to the production environment using the deployment/deploy_landcover.yaml file. You will need to use the same environment variables used for training the base model.
     cd ../deployment
     kubectl apply -f deploy_landcover.yaml
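
As referenced in step 3 above, here is a hedged outline of what uploading the base model can involve; the tracking URI placeholder, the experiment name and the model name are assumptions, and the actual upload_base_model.py may differ.

    # Hedged outline only; the actual upload_base_model.py may differ.
    import joblib
    import mlflow
    import mlflow.sklearn

    mlflow.set_tracking_uri("http://<MLFLOW-IP>:<MLFLOW-PORT>")
    mlflow.set_experiment("landcoverpy")

    model = joblib.load("base_model/model.joblib")
    with mlflow.start_run(run_name="base-model"):
        # keep the metadata and evaluation artifacts next to the model
        mlflow.log_artifact("base_model/metadata.json")
        mlflow.log_artifact("base_model/confusion_matrix.png")
        mlflow.log_artifact("base_model/confusion_matrix.csv")
        mlflow.sklearn.log_model(model, "model", registered_model_name="landcoverpy")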

Deploy the training pipeline

  1. Move to the training directory:
  cd training
  2. To let Prefect upload deployment files to the storage block (S3), the FSSPEC_S3_ENDPOINT_URL environment variable has to be set:
  export FSSPEC_S3_ENDPOINT_URL=http://<S3-IP>:<S3-PORT>
  3. Create a deployment for the training pipeline, targeting the Kubernetes infrastructure:
  prefect deployment build -n retraining-flow-deployment-k8s \
  -p k8s-pool -ib kubernetes-job/k8s-infra -sb s3/khaos-minio \
  -o retraining_flow_deployment.yaml \
  pipeline_retrain.py:retraining_flow
  4. Apply the generated deployment so that Prefect registers the training pipeline linked to the infrastructure:
  prefect deployment apply retraining_flow_deployment.yaml
  5. Deploy to the Kubernetes cluster the Prefect agents that will run the retraining pipeline:
  kubectl apply -f prefect_agent_deployment.yaml

Now it is possible to retrain the model automatically from the Prefect UI or via the CLI, indicating the location of the new data in the object storage.
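
Besides the UI and CLI, a run can also be triggered programmatically. The sketch below uses Prefect's run_deployment helper; the deployment name matches the build step above, while the flow name and the parameter name are assumptions.

    # Hedged sketch: the parameter name "data_uri" is an assumption.
    from prefect.deployments import run_deployment

    run_deployment(
        name="retraining-flow/retraining-flow-deployment-k8s",
        parameters={"data_uri": "s3://<BUCKET>/new-validated-locations"},
    )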

Making predictions with the deployed models

Once models are deployed, a REST API and a gRPC API are available through the Istio gateway. The easiest way to query the deployed models is through the GUI, but predictions can also be requested via the CLI or the Python SDK.

Example Python code to make predictions using the Python SDK:

    # The endpoint placeholder and the example polygon are illustrative; replace
    # them with your Istio ingress address and the polygon of interest.
    import json
    from seldon_core.seldon_client import SeldonClient

    gateway_endpoint = "<ISTIO-INGRESS-IP>:<ISTIO-INGRESS-PORT>"
    geojson = {"type": "Polygon",
               "coordinates": [[[-4.5, 36.6], [-4.4, 36.6], [-4.4, 36.7], [-4.5, 36.6]]]}

    sc = SeldonClient(
        deployment_name="landcover-seldon",
        namespace="mlops-seldon"
    )
    res = sc.predict(
        transport="grpc",
        gateway="istio",
        gateway_endpoint=gateway_endpoint,
        raw_data={"strData": json.dumps(geojson)}
    )
    print(res.response["jsonData"]["result"])

Using the GUI

To run the GUI locally, install its dependencies from the gui directory:

    cd gui
    pip install -r requirements.txt

Then, you can run the GUI using the following command:

    streamlit run streamlit_app.py

The command starts a local server and opens a browser window with the GUI, where you can draw polygons on a map to generate, visualise and download predictions.
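
For reference, below is a toy sketch of the kind of map-and-draw interaction the GUI provides. It assumes folium and streamlit-folium are among the installed requirements and is not the repository's actual streamlit_app.py; it only echoes the drawn polygon.

    # Toy sketch, not the repository's streamlit_app.py.
    import folium
    import streamlit as st
    from folium.plugins import Draw
    from streamlit_folium import st_folium

    st.title("Land cover prediction demo")
    m = folium.Map(location=[36.7, -4.4], zoom_start=9)
    Draw(export=False).add_to(m)                  # enables polygon drawing
    output = st_folium(m, width=700, height=450)

    drawing = output.get("last_active_drawing") if output else None
    if drawing:
        st.json(drawing)                          # the drawn polygon as GeoJSON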
