
Deployment Instructions #9

K0nkere opened this issue Sep 11, 2022 · 2 comments

K0nkere commented Sep 11, 2022

  1. Go to your /home/your-user folder and clone the repo from git

git clone https://github.com/K0nkere/kkr-mlops-project.git

It will create a kkr-mlops-project folder that contains my code - in the following I will call it the project folder.

Create your own bucket named <your_bucket_name> in the cloud service UI, or with a CLI command if you don't have one already:

aws --endpoint-url=https://storage.yandexcloud.net/ s3 mb s3://<your_bucket_name>

(Yandex Cloud Object Storage example; make sure that you configure the AWS CLI via aws configure with your key_id and secret_key before using this command)
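If you prefer to create the bucket from Python instead of the CLI, a minimal sketch with boto3 (assuming boto3 is installed and the same credentials/endpoint as above):

```python
# Hedged sketch: create the bucket with boto3 instead of the AWS CLI.
# Assumes credentials are already configured via `aws configure` or
# environment variables; the endpoint is the Yandex example from above.
import boto3

s3 = boto3.client("s3", endpoint_url="https://storage.yandexcloud.net")
s3.create_bucket(Bucket="<your_bucket_name>")  # replace with your bucket name
```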

  2. Edit the my.env file in the project folder with your specific parameters:
PUBLIC_SERVER_IP=<your_public_ip>                               #insert
MLFLOW_S3_ENDPOINT_URL=<endpoint_url>                           #like https://storage.yandexcloud.net
AWS_DEFAULT_REGION=<default_region>                             #like ru-central1
AWS_ACCESS_KEY_ID=<your_key_id>                                 #insert yours
AWS_SECRET_ACCESS_KEY=<your_secret_key>                         #insert yours
BACKEND_URI=sqlite:////mlflow/database/mlops-project.db         #leave it as it is
ARTIFACT_ROOT=s3://<your_bucket_name>/mlflow-artifacts/         #insert your bucket_name

!!! MLFLOW_S3_ENDPOINT_URL is needed for S3-compatible analogs of the AWS S3 bucket.
So if you are using original AWS, it looks like you can delete this row
(I am not sure of it, because I had no chance to test the deployment on AWS).
If so, you need to go to the project folder and edit docker-compose.yml - comment out the rows with MLFLOW_S3_ENDPOINT_URL in the environment blocks for all services.
In the case of original AWS, the rows in docker-compose.yml with MLFLOW_S3_ENDPOINT_URL should look like this:
# MLFLOW_S3_ENDPOINT_URL: "${MLFLOW_S3_ENDPOINT_URL}"
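An optional smoke test of these values before starting the services - a sketch assuming python-dotenv and boto3 are available:

```python
# Hedged sketch: verify that the credentials, endpoint and bucket from my.env
# actually work. Assumes python-dotenv and boto3 are installed.
import os
from dotenv import load_dotenv
import boto3

load_dotenv("my.env")  # loads the variables into os.environ

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["MLFLOW_S3_ENDPOINT_URL"],
    region_name=os.environ["AWS_DEFAULT_REGION"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
bucket = os.environ["ARTIFACT_ROOT"].split("/")[2]  # s3://<bucket>/... -> <bucket>
s3.head_bucket(Bucket=bucket)  # raises an error if the bucket is unreachable
print("Bucket", bucket, "is reachable")
```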

Finally, run from the project folder under your default base environment:

bash run-venv.sh

It will create virtual environments for the project services.
You will find yourself in the **orchestration_manager** venv, which will be used in the following steps and can be activated from a new terminal by running pipenv shell from the **orchestration_manager** folder.

  3. Open a new terminal and, from the project folder under the base env, run
bash run-tests.sh

It will launch the unit and integration tests.

  4. You can use the terminal from stage 3 - check that the ports needed by docker-compose are free (a small check script is sketched below this list):
    docker ps
    if needed: docker kill container_id
The services use the following ports:
5001 for MLflow
4200 for Prefect Orion
3000 for Grafana
9696 for the prediction service
9898 for the manager service
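A minimal port check using only the Python standard library (a port that accepts a connection is already busy):

```python
# Hedged sketch: report which of the project's ports are already in use.
import socket

PORTS = {5001: "MLflow", 4200: "Prefect Orion", 3000: "Grafana",
         9696: "prediction service", 9898: "manager service"}

for port, service in PORTS.items():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        busy = sock.connect_ex(("127.0.0.1", port)) == 0
        print(f"{port} ({service}): {'BUSY' if busy else 'free'}")
```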
  5. From the project folder, under the orchestration_manager env activated previously (same terminal), run
bash run-services.sh

It will install and launch all services in docker-compose and will start the Prefect Orion 2 server.
This is the most important step, and it's better to check that everything works fine.
Now you can open the following UIs in your browser:
Prefect UI
MLflow
Grafana
The tunneled ports can vary depending on your IDE;
format: localhost (or public_ip):tunneled_port
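To confirm the services are actually up, a sketch with the requests library (these are the standard health endpoints of each tool; adjust host/ports if yours are tunneled differently):

```python
# Hedged sketch: ping the standard health endpoints of the three UI services.
# Assumes the requests library and the default, non-tunneled ports.
import requests

CHECKS = {
    "MLflow": "http://localhost:5001/health",
    "Prefect Orion": "http://localhost:4200/api/health",
    "Grafana": "http://localhost:3000/api/health",
}

for name, url in CHECKS.items():
    try:
        status = requests.get(url, timeout=3).status_code
        print(f"{name}: HTTP {status}")
    except requests.ConnectionError:
        print(f"{name}: not reachable at {url}")
```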

  6. Open a new terminal and, under the orchestration_manager env, run from the project folder
bash run-manager.sh

It will create the Prefect deployments and a Prefect queue, and will launch a Prefect agent.
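For reference, registering a deployment on a work queue with the Prefect 2 API of that era looks roughly like this. This is a sketch, not the actual contents of run-manager.sh (which may use the prefect CLI instead), and the queue name is an illustrative guess:

```python
# Hedged sketch of what run-manager.sh roughly automates: register a
# Prefect 2 deployment bound to a work queue that an agent will serve.
from prefect import flow
from prefect.deployments import Deployment

@flow
def retrain_request():
    ...  # the project's real flow lives in orchestration_manager

deployment = Deployment.build_from_flow(
    flow=retrain_request,
    name="retrain-model",
    work_queue_name="mlops-queue",  # hypothetical queue name
)
deployment.apply()  # registers the deployment with the Orion server
# An agent then picks up scheduled runs:  prefect agent start -q mlops-queue
```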

  7. Open a new terminal, go from the project folder to the orchestration_manager folder and run
pipenv shell

to activate the venv,
then return to the project folder with cd ..

All services are started and ready to work!

  8. You need to train a starting model - it's better to do it from the Prefect UI.
    Open 127.0.0.1:your-tunneled-port in the browser (something like 127.0.0.1:4200 or 4201) - the Prefect UI

From Deployments > **retrain_request** press **retrain-model** > and RUN it with the button in the right corner
(screenshot: Prefect_1)

You can watch the training process in the log of the Prefect agent terminal.

When the first model is created, it will automatically be promoted to the Production stage, and
you can imitate sending data to the prediction service.
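For reference, the stage promotion the project performs automatically can also be done by hand with the MLflow client. A sketch - the registered model name here is a hypothetical placeholder:

```python
# Hedged sketch: promote the newest registered model version to Production,
# as the project does after the first training run.
# "auction-price-model" is a hypothetical name - use your registered model's.
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://localhost:5001")
latest = client.get_latest_versions("auction-price-model", stages=["None"])[0]
client.transition_model_version_stage(
    name="auction-price-model",
    version=latest.version,
    stage="Production",
)
```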

  9. From a terminal in the project folder under the orchestration env
    (cd orchestration_manager > pipenv shell > cd ..)

run python send_data.py with two parameters: a date in yyyy-mm-dd format and the number of records to send

(the dataset for every month consists of a few thousand records, so it's better to use just a few dozen or hundred for review), like

python send_data.py 2015-05-30 200

(screenshot: Send_data)
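Conceptually, a sender like send_data.py does something along these lines. This is a guessed sketch: the /predict path, the dataset layout and the file path are assumptions, and the actual script may differ:

```python
# Hedged sketch of what send_data.py conceptually does: read N records for
# the given month and POST them one by one to the prediction service.
import sys
import pandas as pd
import requests

date, n_records = sys.argv[1], int(sys.argv[2])
month = date[:7]  # "2015-05-30" -> "2015-05"

data = pd.read_csv(f"datasets/{month}.csv").head(n_records)  # hypothetical path
for record in data.to_dict(orient="records"):
    response = requests.post("http://localhost:9696/predict", json=record, timeout=10)
    print(response.json())
```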

When all the rows are sent and logged in project folder/targets, a monthly report can be created. No need to wait till the end of the month - let's RUN it manually from the Prefect UI:

Deployments > **batch_analyze** press **monitoring report** > RUN

(screenshot: Report_drift_located)

You can watch the result of the report in the logs of the Prefect agent terminal.
The report will be created and saved in project folder/reports; you can download it with a left-click in VSCode
(I don't save it in the bucket :( )

When the report is created and model drift is taking place, it is possible to run a retrain (manually, for review)
(screenshot: Training__switch)

By default, at the end of the month the Prefect agent will start the deployment that creates a report on the data for the latest month.
It saves the Evidently report to the project/reports folder and estimates whether there is model drift or not.
After that, the retrain service asks the manager service whether the model needs to be retrained on the latest data, and if there was a drift the manager will return True.
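The drift estimate itself can be produced with Evidently roughly like this. A sketch using a recent Evidently Report API; the project's actual monitoring code may use an older API, and the column names and file paths here are illustrative:

```python
# Hedged sketch: build an Evidently data-drift report and extract the
# dataset-level drift flag, as the monitoring deployment conceptually does.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

reference = pd.read_csv("reports/reference.csv")    # hypothetical paths
current = pd.read_csv("reports/current_month.csv")

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("reports/drift_report.html")

# DatasetDriftMetric (part of the preset) exposes a boolean drift flag:
drift = any(
    m["result"].get("dataset_drift")
    for m in report.as_dict()["metrics"]
    if m["metric"] == "DatasetDriftMetric"
)
print("Model drift detected:", drift)
```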

  10. You can play with the manager service and run data from different months, from 2015-1 to 2015-7.
    The logic of the manager service is the following (a code sketch of this gating follows below):
report creation and retraining can be launched freely at the start
new data was sent > a report can be created - else it waits for new data
report created and drift detected > it is possible to retrain - else it waits for a new report

If a report has been created, the manager will not allow creating another one on the same data - you need to load new data
(screenshot: Report_no_new_data)

(just run report creation manually via the Prefect UI twice in a row and watch the logs)

If the model was retrained, the manager will not allow retraining again - you need to send new data and create a new report
(screenshot: Training_no_request)

(just run retrain-model manually via the Prefect UI twice in a row and watch the logs)

I use this logic so that the manager service doesn't give a signal for report creation or for retraining of the model on old data. You need to send a new batch.
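The gating described above boils down to a small state machine. A minimal sketch of the idea (names are illustrative, not the manager service's actual code):

```python
# Hedged sketch of the manager's gating logic: each action is allowed once
# per new batch of data, in the order data -> report -> retrain.
class ManagerState:
    def __init__(self):
        self.new_data = False      # fresh batch arrived since last report
        self.drift_report = False  # report created and drift detected

    def on_data_sent(self):
        self.new_data = True

    def may_create_report(self) -> bool:
        if not self.new_data:
            return False           # "waiting for new data"
        self.new_data = False      # the same data can't be reported twice
        self.drift_report = True   # assume drift was found, for illustration
        return True

    def may_retrain(self) -> bool:
        if not self.drift_report:
            return False           # "no request for training"
        self.drift_report = False  # the same report can't trigger two retrains
        return True
```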

!!! But you can always launch the retraining process without any restrictions via the Prefect UI by running Deployments > main > initial-train


K0nkere commented Sep 12, 2022

If everything works fine in normal conditions, you can test the following:

  1. Break the prediction service by manually promoting the current Production model to the None stage in the MLflow UI
  2. Try to send data - it fails, since there is no prediction model
  3. Try to create a report via the Prefect UI manually - it works, but answers that it is waiting for new data
  4. Try to launch the retraining process via the Prefect UI, retrain_request > retrain-model, manually - it will answer that there is no request for training because the report was not created
  5. Launch retraining via the Prefect UI, main > initial-train - it will train a new model and will promote it to the Production stage
  6. Wait a few seconds so that Flask picks up the new model for the prediction service (a guess at how that looks in code is sketched after this list)
  7. Send data again... It works!!!
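Loading whatever model is currently in the Production stage is presumably what the Flask service does on reload. A sketch using MLflow's standard stage-based model URI; the registered model name is a hypothetical placeholder:

```python
# Hedged sketch: fetch the model currently in the Production stage.
# The "models:/<name>/<stage>" URI scheme is standard MLflow; the model
# name is a hypothetical placeholder for the project's registered model.
import mlflow

mlflow.set_tracking_uri("http://localhost:5001")
model = mlflow.pyfunc.load_model("models:/auction-price-model/Production")
# prediction = model.predict(features_dataframe)
```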


K0nkere commented Sep 14, 2022

An important addition to the Fast run section of the Readme - you need to train the initial model via Prefect UI > Deployments > retrain_request, press retrain-model > RUN
