
Deployment Instructions #9

K0nkere opened this issue Sep 11, 2022 · 2 comments

K0nkere commented Sep 11, 2022

  1. Go to your /home/your-user folder and clone the repo from git

git clone https://github.com/K0nkere/kkr-mlops-project.git

It will create a kkr-mlops-project folder that contains my code - in the following I will call it the project folder.

Create your own bucket named <your_bucket_name> in the cloud service UI, or with a CLI command if you don't have one already:

aws --endpoint-url=https://storage.yandexcloud.net/ s3 mb s3://<your_bucket_name>

(Yandex Cloud Object Storage example; make sure that you configure the AWS CLI via aws configure with your key_id and secret_key before using this command)
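If you prefer to create the bucket from Python instead of the CLI, a minimal sketch with boto3 (assuming boto3 is installed and the same credentials/endpoint as above):

```python
# Hedged sketch: create the bucket with boto3 instead of the AWS CLI.
# Assumes credentials are already configured via `aws configure` or
# environment variables; the endpoint is the Yandex example from above.
import boto3

s3 = boto3.client("s3", endpoint_url="https://storage.yandexcloud.net")
s3.create_bucket(Bucket="<your_bucket_name>")  # replace with your bucket name
```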

  2. Edit the my.env file in the project folder with your specific parameters:
PUBLIC_SERVER_IP=<your_public_ip>                               #insert
MLFLOW_S3_ENDPOINT_URL=<endpoint_url>                           #like https://storage.yandexcloud.net
AWS_DEFAULT_REGION=<default_region>                             #like ru-central1
AWS_ACCESS_KEY_ID=<your_key_id>                                 #insert yours
AWS_SECRET_ACCESS_KEY=<your_secret_key>                         #insert yours
BACKEND_URI=sqlite:////mlflow/database/mlops-project.db         #leave it as it is
ARTIFACT_ROOT=s3://<your_bucket_name>/mlflow-artifacts/         #insert your bucket_name

!!! MLFLOW_S3_ENDPOINT_URL is needed for S3-compatible analogs of the AWS S3 bucket.
So if you are using original AWS, it looks like you can delete this row
(I am not sure of it, because I had no chance to test the deployment on AWS).
If so, you need to go to the project folder and edit docker-compose.yml - comment out the rows with MLFLOW_S3_ENDPOINT_URL in the environment blocks for all services.
In the case of original AWS, the rows in docker-compose.yml with MLFLOW_S3_ENDPOINT_URL should look like this:
# MLFLOW_S3_ENDPOINT_URL: "${MLFLOW_S3_ENDPOINT_URL}"
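An optional smoke test of these values before starting the services - a sketch assuming python-dotenv and boto3 are available:

```python
# Hedged sketch: verify that the credentials, endpoint and bucket from my.env
# actually work. Assumes python-dotenv and boto3 are installed.
import os
from dotenv import load_dotenv
import boto3

load_dotenv("my.env")  # loads the variables into os.environ

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["MLFLOW_S3_ENDPOINT_URL"],
    region_name=os.environ["AWS_DEFAULT_REGION"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
bucket = os.environ["ARTIFACT_ROOT"].split("/")[2]  # s3://<bucket>/... -> <bucket>
s3.head_bucket(Bucket=bucket)  # raises an error if the bucket is unreachable
print("Bucket", bucket, "is reachable")
```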

Finally, run from the project folder under your default base environment:

bash run-venv.sh

It will create virtual environments for the project services.
You will find yourself in the **orchestration_manager** venv, which will be used in the following steps and can be activated from a new terminal by running pipenv shell from the **orchestration_manager** folder.

  3. Open a new terminal and, from the project folder under the base env, run
bash run-tests.sh

It will launch the unit and integration tests.

  4. You can use the terminal from stage 3 - check that the ports needed by docker-compose are free (a small check script is sketched below this list):
    docker ps
    if needed: docker kill container_id
The services use the following ports:
5001 for MLflow
4200 for Prefect Orion
3000 for Grafana
9696 for the prediction service
9898 for the manager service
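A minimal port check using only the Python standard library (a port that accepts a connection is already busy):

```python
# Hedged sketch: report which of the project's ports are already in use.
import socket

PORTS = {5001: "MLflow", 4200: "Prefect Orion", 3000: "Grafana",
         9696: "prediction service", 9898: "manager service"}

for port, service in PORTS.items():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        busy = sock.connect_ex(("127.0.0.1", port)) == 0
        print(f"{port} ({service}): {'BUSY' if busy else 'free'}")
```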
  5. From the project folder, under the orchestration_manager env activated previously (same terminal), run
bash run-services.sh

It will install and launch all services in docker-compose and will start the Prefect Orion 2 server.
This is the most important step, and it's better to check that everything works fine.
Now you can open the following UIs in your browser:
Prefect UI
MLflow
Grafana
The tunneled ports can vary depending on your IDE;
format: localhost (or public_ip):tunneled_port
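To confirm the services are actually up, a sketch with the requests library (these are the standard health endpoints of each tool; adjust host/ports if yours are tunneled differently):

```python
# Hedged sketch: ping the standard health endpoints of the three UI services.
# Assumes the requests library and the default, non-tunneled ports.
import requests

CHECKS = {
    "MLflow": "http://localhost:5001/health",
    "Prefect Orion": "http://localhost:4200/api/health",
    "Grafana": "http://localhost:3000/api/health",
}

for name, url in CHECKS.items():
    try:
        status = requests.get(url, timeout=3).status_code
        print(f"{name}: HTTP {status}")
    except requests.ConnectionError:
        print(f"{name}: not reachable at {url}")
```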

  6. Open a new terminal and, under the orchestration_manager env, run from the project folder
bash run-manager.sh

It will create the Prefect deployments and a Prefect queue, and will launch a Prefect agent.
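For reference, registering a deployment on a work queue with the Prefect 2 API of that era looks roughly like this. This is a sketch, not the actual contents of run-manager.sh (which may use the prefect CLI instead), and the queue name is an illustrative guess:

```python
# Hedged sketch of what run-manager.sh roughly automates: register a
# Prefect 2 deployment bound to a work queue that an agent will serve.
from prefect import flow
from prefect.deployments import Deployment

@flow
def retrain_request():
    ...  # the project's real flow lives in orchestration_manager

deployment = Deployment.build_from_flow(
    flow=retrain_request,
    name="retrain-model",
    work_queue_name="mlops-queue",  # hypothetical queue name
)
deployment.apply()  # registers the deployment with the Orion server
# An agent then picks up scheduled runs:  prefect agent start -q mlops-queue
```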

  7. Open a new terminal, go from the project folder to the orchestration_manager folder and run
pipenv shell

to activate the venv,
then return to the project folder with cd ..

All services are started and ready to work!

  8. You need to train a starting model - it's better to do it from the Prefect UI.
    Open 127.0.0.1:your-tunneled-port in the browser (something like 127.0.0.1:4200 or 4201) - the Prefect UI

From Deployments > **retrain_request** press **retrain-model** > and RUN it with the button in the right corner
(screenshot: Prefect_1)

You can watch the training process in the log of the Prefect agent terminal.

When the first model is created, it will automatically be promoted to the Production stage, and
you can imitate sending data to the prediction service.
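For reference, the stage promotion the project performs automatically can also be done by hand with the MLflow client. A sketch - the registered model name here is a hypothetical placeholder:

```python
# Hedged sketch: promote the newest registered model version to Production,
# as the project does after the first training run.
# "auction-price-model" is a hypothetical name - use your registered model's.
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://localhost:5001")
latest = client.get_latest_versions("auction-price-model", stages=["None"])[0]
client.transition_model_version_stage(
    name="auction-price-model",
    version=latest.version,
    stage="Production",
)
```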

  9. From a terminal in the project folder under the orchestration env
    (cd orchestration_manager > pipenv shell > cd ..)

run python send_data.py with two parameters: a date in yyyy-mm-dd format and the number of records to send

(the dataset for every month consists of a few thousand records, so it's better to use just a few dozen or hundred for review), like

python send_data.py 2015-05-30 200

(screenshot: Send_data)
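Conceptually, a sender like send_data.py does something along these lines. This is a guessed sketch: the /predict path, the dataset layout and the file path are assumptions, and the actual script may differ:

```python
# Hedged sketch of what send_data.py conceptually does: read N records for
# the given month and POST them one by one to the prediction service.
import sys
import pandas as pd
import requests

date, n_records = sys.argv[1], int(sys.argv[2])
month = date[:7]  # "2015-05-30" -> "2015-05"

data = pd.read_csv(f"datasets/{month}.csv").head(n_records)  # hypothetical path
for record in data.to_dict(orient="records"):
    response = requests.post("http://localhost:9696/predict", json=record, timeout=10)
    print(response.json())
```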

When all the rows are sent and logged in project folder/targets, a monthly report can be created. No need to wait till the end of the month - let's RUN it manually from the Prefect UI:

Deployments > **batch_analyze** press **monitoring report** > RUN

(screenshot: Report_drift_located)

You can watch the result of the report in the logs of the Prefect agent terminal.
The report will be created and saved in project folder/reports; you can download it with a left-click in VSCode
(I don't save it in the bucket :( )

When the report is created and model drift is taking place, it is possible to run a retrain (manually, for review)
(screenshot: Training__switch)

By default, at the end of the month the Prefect agent will start the deployment that creates a report on the data for the latest month.
It saves the Evidently report to the project/reports folder and estimates whether there is model drift or not.
After that, the retrain service asks the manager service whether the model needs to be retrained on the latest data, and if there was a drift the manager will return True.
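The drift estimate itself can be produced with Evidently roughly like this. A sketch using a recent Evidently Report API; the project's actual monitoring code may use an older API, and the column names and file paths here are illustrative:

```python
# Hedged sketch: build an Evidently data-drift report and extract the
# dataset-level drift flag, as the monitoring deployment conceptually does.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

reference = pd.read_csv("reports/reference.csv")    # hypothetical paths
current = pd.read_csv("reports/current_month.csv")

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("reports/drift_report.html")

# DatasetDriftMetric (part of the preset) exposes a boolean drift flag:
drift = any(
    m["result"].get("dataset_drift")
    for m in report.as_dict()["metrics"]
    if m["metric"] == "DatasetDriftMetric"
)
print("Model drift detected:", drift)
```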

  10. You can play with the manager service and run data from different months, from 2015-1 to 2015-7.
    The logic of the manager service is the following (a code sketch of this gating follows below):
report creation and retraining can be launched freely at the start
new data was sent > a report can be created - else it waits for new data
report created and drift detected > it is possible to retrain - else it waits for a new report

If a report has been created, the manager will not allow creating another one on the same data - you need to load new data
(screenshot: Report_no_new_data)

(just run report creation manually via the Prefect UI twice in a row and watch the logs)

If the model was retrained, the manager will not allow retraining again - you need to send new data and create a new report
(screenshot: Training_no_request)

(just run retrain-model manually via the Prefect UI twice in a row and watch the logs)

I use this logic so that the manager service doesn't give a signal for report creation or for retraining of the model on old data. You need to send a new batch.
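The gating described above boils down to a small state machine. A minimal sketch of the idea (names are illustrative, not the manager service's actual code):

```python
# Hedged sketch of the manager's gating logic: each action is allowed once
# per new batch of data, in the order data -> report -> retrain.
class ManagerState:
    def __init__(self):
        self.new_data = False      # fresh batch arrived since last report
        self.drift_report = False  # report created and drift detected

    def on_data_sent(self):
        self.new_data = True

    def may_create_report(self) -> bool:
        if not self.new_data:
            return False           # "waiting for new data"
        self.new_data = False      # the same data can't be reported twice
        self.drift_report = True   # assume drift was found, for illustration
        return True

    def may_retrain(self) -> bool:
        if not self.drift_report:
            return False           # "no request for training"
        self.drift_report = False  # the same report can't trigger two retrains
        return True
```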

!!! But you can always launch the retraining process without any restrictions via the Prefect UI by running Deployments > main > initial-train


K0nkere commented Sep 12, 2022

If everything works fine in normal conditions, you can test the following:

  1. Break the prediction service by manually promoting the current Production model to the None stage in the MLflow UI
  2. Try to send data - it fails, since there is no prediction model
  3. Try to create a report via the Prefect UI manually - it works, but answers that it is waiting for new data
  4. Try to launch the retraining process via the Prefect UI, retrain_request > retrain-model, manually - it will answer that there is no request for training because the report was not created
  5. Launch retraining via the Prefect UI, main > initial-train - it will train a new model and will promote it to the Production stage
  6. Wait a few seconds so that Flask picks up the new model for the prediction service (a guess at how that looks in code is sketched after this list)
  7. Send data again... It works!!!
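Loading whatever model is currently in the Production stage is presumably what the Flask service does on reload. A sketch using MLflow's standard stage-based model URI; the registered model name is a hypothetical placeholder:

```python
# Hedged sketch: fetch the model currently in the Production stage.
# The "models:/<name>/<stage>" URI scheme is standard MLflow; the model
# name is a hypothetical placeholder for the project's registered model.
import mlflow

mlflow.set_tracking_uri("http://localhost:5001")
model = mlflow.pyfunc.load_model("models:/auction-price-model/Production")
# prediction = model.predict(features_dataframe)
```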


K0nkere commented Sep 14, 2022

An important addition to the Fast run section of the Readme - you need to train the initial model via Prefect UI > Deployments > retrain_request, press retrain-model > RUN
