MLOps Marathon 2023 - Sample solution

This repository is the sample solution for MLOps Marathon 2023.

Quickstart

Prepare environment

# Install python 3.9
# Install docker version 20.10.17
# Install docker-compose version v2.6.1
pip install -r requirements.txt
make mlflow_up

Train model

Download the data. The ./data/raw_data directory should look like this:

data/raw_data
├── .gitkeep
└── phase-1
    └── prob-1
        ├── features_config.json
        └── raw_train.parquet
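
Before processing, it can help to confirm that the raw data landed where the tree above expects it. The following is a minimal sketch and not part of the repository: the helper name `missing_raw_files` is hypothetical, and the file list is copied from the tree above.

```python
from pathlib import Path

# File names copied from the directory tree above.
EXPECTED_RAW_FILES = ("features_config.json", "raw_train.parquet")


def missing_raw_files(root, phase_id, prob_id, expected=EXPECTED_RAW_FILES):
    """Return the expected raw files that are absent (hypothetical helper)."""
    prob_dir = Path(root) / phase_id / prob_id
    return [name for name in expected if not (prob_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files("data/raw_data", "phase-1", "prob-1")
    print("missing files:", missing if missing else "none")
```

An empty list means both files are in place and the processing step can be run.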

Process data

python src/raw_data_processor.py --phase-id phase-1 --prob-id prob-1

After processing the data, the ./data/train_data directory should look like this:

data/train_data
├── .gitkeep
└── phase-1
    └── prob-1
        ├── category_index.pickle
        ├── test_x.parquet
        ├── test_y.parquet
        ├── train_x.parquet
        └── train_y.parquet

Train model

export MLFLOW_TRACKING_URI=http://localhost:5000
python src/model_trainer.py --phase-id phase-1 --prob-id prob-1

Register the model: go to the MLflow UI at http://localhost:5000 and register a new model named phase-1_prob-1_model-1.

Deploy model predictor

Create the model config at data/model_config/phase-1/prob-1/model-1.yaml with the following content:

phase_id: "phase-1"
prob_id: "prob-1"
model_name: "phase-1_prob-1_model-1"
model_version: "1"
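
If the config needs to be generated for several problems, the four keys above can be rendered from a small script. This is a sketch under assumptions: `render_model_config` is a hypothetical helper, it uses plain string formatting rather than a YAML library, and it assumes the values contain no quotes or other characters that would need escaping.

```python
def render_model_config(phase_id, prob_id, model_name, model_version):
    """Render the four config keys shown above as YAML text (hypothetical helper)."""
    fields = {
        "phase_id": phase_id,
        "prob_id": prob_id,
        "model_name": model_name,
        "model_version": model_version,
    }
    # Every value is written as a double-quoted string, matching the example above.
    return "".join(f'{key}: "{value}"\n' for key, value in fields.items())


if __name__ == "__main__":
    # Prints the text; redirect it to the config path to create the file.
    print(render_model_config("phase-1", "prob-1", "phase-1_prob-1_model-1", "1"), end="")
```

Printing rather than writing keeps the sketch free of side effects; the output can be redirected to data/model_config/phase-1/prob-1/model-1.yaml.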

Test model predictor

# run model predictor
export MLFLOW_TRACKING_URI=http://localhost:5000
python src/model_predictor.py --config-path data/model_config/phase-1/prob-1/model-1.yaml --port 8000

# curl in another terminal
curl -X POST http://localhost:8000/phase-1/prob-1/predict -H "Content-Type: application/json" -d @data/curl/phase-1/prob-1/payload-1.json

# stop the predictor above
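
The same request can be sent from Python using only the standard library. This is a sketch, not code from the repository: `build_predict_request` and `send_predict_request` are hypothetical names, the payload is treated as opaque JSON because its schema is not described here, and the host and port are the ones used in the commands above.

```python
import json
import urllib.request


def build_predict_request(payload, phase_id="phase-1", prob_id="prob-1",
                          base_url="http://localhost:8000"):
    """Build a POST request for the predict endpoint (hypothetical helper)."""
    url = f"{base_url}/{phase_id}/{prob_id}/predict"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_predict_request(payload_path, timeout=10):
    """Load a JSON payload file, send it, and return (status, response text)."""
    with open(payload_path, encoding="utf-8") as f:
        payload = json.load(f)
    with urllib.request.urlopen(build_predict_request(payload), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# Example call (requires the predictor to be running):
# send_predict_request("data/curl/phase-1/prob-1/payload-1.json")
```

Unlike `curl -d @file`, this re-serializes the JSON before sending, so the bytes on the wire may differ in whitespace while the content stays the same.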

Deploy model predictor
```
make predictor_up
make predictor_curl
```

After running make predictor_curl to send requests to the server, the ./data/captured_data directory should look like this:

data/captured_data
├── .gitkeep
└── phase-1
    └── prob-1
        ├── 123.parquet
        └── 456.parquet
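
To check that requests were actually captured, the parquet files in the problem directory can be listed. This is a minimal sketch with a hypothetical helper name, `captured_files`; it assumes the captured files sit directly under the problem directory, as in the tree above, and that the file names shown there are only examples.

```python
from pathlib import Path


def captured_files(root, phase_id, prob_id):
    """List parquet files directly under the problem directory (hypothetical helper)."""
    prob_dir = Path(root) / phase_id / prob_id
    # glob("*.parquet") is non-recursive, so files in subdirectories are ignored.
    return sorted(path.name for path in prob_dir.glob("*.parquet"))


if __name__ == "__main__":
    names = captured_files("data/captured_data", "phase-1", "prob-1")
    print(f"{len(names)} captured file(s):", names)
```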

Improve model

The technique for improving the model using the prediction data is described in improve_model.md.

Label the captured data (this takes around 3 minutes)

python src/label_captured_data.py --phase-id phase-1 --prob-id prob-1

After labeling the captured data, the ./data/captured_data directory should look like this:

data/captured_data
├── .gitkeep
└── phase-1
    └── prob-1
        ├── 123.parquet
        ├── 456.parquet
        └── processed
            ├── captured_x.parquet
            └── uncertain_y.parquet

Improve model with updated data

export MLFLOW_TRACKING_URI=http://localhost:5000
python src/model_trainer.py --phase-id phase-1 --prob-id prob-1 --add-captured-data true

Register the model: go to the MLflow UI at http://localhost:5000 and register the model under the existing name phase-1_prob-1_model-1. The latest model version should now be 2.

Update the model config at data/model_config/phase-1/prob-1/model-1.yaml to:

phase_id: "phase-1"
prob_id: "prob-1"
model_name: "phase-1_prob-1_model-1"
model_version: "2"
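
Only the `model_version` line changes between the two configs, so the update can be scripted. The sketch below is an assumption-laden illustration: `set_model_version` is a hypothetical helper that edits the text with a regular expression and expects exactly one double-quoted `model_version` line, as in the config above.

```python
import re


def set_model_version(config_text, new_version):
    """Return config_text with the model_version value replaced (hypothetical helper)."""
    # Match the whole model_version line; [ \t]* avoids consuming the newline.
    pattern = r'^(model_version:[ \t]*)"[^"]*"[ \t]*$'
    updated, count = re.subn(
        pattern,
        lambda match: f'{match.group(1)}"{new_version}"',
        config_text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected one model_version line, found {count}")
    return updated


if __name__ == "__main__":
    old = 'model_name: "phase-1_prob-1_model-1"\nmodel_version: "1"\n'
    print(set_model_version(old, "2"), end="")
```

Raising on zero or multiple matches guards against silently leaving the predictor on the old version.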

Deploy new model version

make predictor_restart
make predictor_curl

Teardown
```
make teardown
```

MLOpsVN/mlops-mara-sample-public

Folders and files

Latest commit

History

Repository files navigation

MLOps Marathon 2023 - Sample solution

Quickstart

About

Topics

Resources

Stars

Watchers

Forks

Languages