1 Experiment tracking #2

K0nkere · 2022-08-12T12:31:14Z

Creating an MLflow server with a custom S3 bucket as a Docker container

Dockerfile

FROM python:3.9.7-slim

WORKDIR /mlflow/

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
  rm requirements.txt

EXPOSE 5001

ENV MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
ENV AWS_DEFAULT_REGION=ru-central1
ENV AWS_ACCESS_KEY_ID=<key_id>
ENV AWS_SECRET_ACCESS_KEY=<key>

ENV BACKEND_URI=sqlite:////mlflow/mlops-project.db
ENV ARTIFACT_ROOT=s3://kkr-mlops-zoomcamp/mlflow-artifacts/

# ENTRYPOINT ["bash"]
CMD mlflow server --backend-store-uri ${BACKEND_URI} --default-artifact-root ${ARTIFACT_ROOT} --host 0.0.0.0 --port 5001

Build the image from the /project folder:
docker build -t project-mlflow-server ./1-experiment-tracking/
Then run it:
docker run -it -v /mlflow-database:/mlflow/ -p 5001:5001 project-mlflow-server:latest

After that, the server is accessible at 127.0.0.1:5001 or <public_server_ip>:5001
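To confirm that the container is actually answering on the published port, a small check like the one below can help. It is only a sketch: it assumes the MLflow server in your version exposes a `/health` endpoint that returns HTTP 200, and the helper names `tracking_uri` and `server_is_up` are made up for this example.

```python
import urllib.error
import urllib.request


def tracking_uri(host: str, port: int = 5001) -> str:
    """Build the HTTP address of the tracking server (5001 is the port published above)."""
    return f"http://{host}:{port}"


def server_is_up(uri: str, timeout: float = 3.0) -> bool:
    """Return True if GET <uri>/health answers with HTTP 200 (assumed endpoint)."""
    try:
        with urllib.request.urlopen(f"{uri}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx answer
        return False


if __name__ == "__main__":
    uri = tracking_uri("127.0.0.1")
    print(uri, "is up" if server_is_up(uri) else "is not reachable")
```

The same address is what a client would pass to `mlflow.set_tracking_uri(...)`.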

K0nkere · 2022-08-20T00:21:48Z

To make MLflow save artifacts in a custom bucket

A few environment variables need to be added:
sudo nano /etc/environment
and add

MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
AWS_DEFAULT_REGION=ru-central1

The key_id and key also need to be added to ~/.aws/credentials:
[default]
aws_access_key_id = <key_id>
aws_secret_access_key = <key>

Then restart the server.
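Before restarting, it can be worth checking that nothing from the steps above is missing. The sketch below is an assumption-laden helper, not part of MLflow or boto3: `missing_settings` is a hypothetical function that looks for the two environment variables and the two keys of the `[default]` profile.

```python
import configparser

REQUIRED_ENV = ("MLFLOW_S3_ENDPOINT_URL", "AWS_DEFAULT_REGION")
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_settings(env, credentials_text, profile="default"):
    """Return the names of settings that are absent or empty."""
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    # interpolation=None so that a '%' inside a secret is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    for key in REQUIRED_KEYS:
        if not parser.get(profile, key, fallback=""):
            missing.append(f"[{profile}] {key}")
    return missing


if __name__ == "__main__":
    import os
    from pathlib import Path

    path = Path.home() / ".aws" / "credentials"
    text = path.read_text() if path.exists() else ""
    print(missing_settings(os.environ, text) or "all settings found")
```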

Running MLFlow service

mlflow ui
mlflow ui --backend-store-uri sqlite:///../mlops-project.db --default-artifact-root s3://kkr-mlops-zoomcamp/mlflow-artifacts
or as a server
mlflow server --backend-store-uri sqlite:///../mlops-project.db --default-artifact-root s3://kkr-mlops-zoomcamp/mlflow-artifacts/ --host 0.0.0.0 --port 5001

Alternatively, create a .env file in the folder that contains the Pipfile:

export MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
export AWS_DEFAULT_REGION=ru-central1
export AWS_ACCESS_KEY_ID=<key_id>
export AWS_SECRET_ACCESS_KEY=<key>

The variables will then be set automatically when the pipenv environment starts.
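If a script is started outside of pipenv, the same file can be read by hand. The following is a minimal sketch of that idea; `parse_env_lines` is a hypothetical helper and only handles simple `export KEY=value` or `KEY=value` lines, not the full dotenv syntax.

```python
def parse_env_lines(text):
    """Turn 'export KEY=value' / 'KEY=value' lines into a dict, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if sep:
            # Drop optional surrounding quotes around the value
            values[key.strip()] = value.strip().strip("'\"")
    return values


if __name__ == "__main__":
    import os
    from pathlib import Path

    env_file = Path(".env")
    if env_file.exists():
        os.environ.update(parse_env_lines(env_file.read_text()))
```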

K0nkere · 2022-08-21T15:53:42Z

Downloading a model from registry

import pickle
import mlflow

stage = 'Production'
# Registry URIs have the form models:/<name>/<stage>, with no artifact sub-path
staging_model = mlflow.pyfunc.load_model(model_uri=f'models:/{model_name}/{stage}')

# If the model needs to be saved locally
with open('models/rf-best-model-production.bin', 'wb') as f_out:
    pickle.dump(staging_model, f_out)

# Best model predictions
staging_model.predict(X_test)

K0nkere · 2022-08-21T16:34:59Z

Ways to get the RUN_ID of a logged model

From the list of runs of an experiment

from mlflow.entities import ViewType

# mlflow_client is an MlflowClient instance
experiment = mlflow.set_experiment('Auction-car-prices-best-models')
best_model_run = mlflow_client.search_runs(
    experiment_ids=[experiment.experiment_id],
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=1,
    order_by=["metrics.rmse_test ASC"]
)
RUN_ID = best_model_run[0].info.run_id
model_uri = "runs:/{}/full-pipeline".format(RUN_ID)

While adding the model to the registry with model_uri

registered_model = mlflow.register_model(
    model_uri=model_uri,
    name=model_name
)
> registered_model.run_id and registered_model.current_stage

While promoting the model

promoted_model = mlflow_client.transition_model_version_stage(
    name=model_name,
    version=registered_model_version,
    stage=to_stage,
    archive_existing_versions=False
)
> promoted_model.run_id and promoted_model.current_stage

From the list of registered models

versions = mlflow_client.get_latest_versions(
    model_name,
    stages=['Production']
)
> versions[num].version versions[num].current_stage versions[num].run_id versions[num].name==model_name
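When several stages are requested at once, it may be handy to collect the run ids per stage. The sketch below only relies on the attribute names listed above (`name`, `version`, `current_stage`, `run_id`); the `SimpleNamespace` objects are stand-ins with made-up values so the snippet runs without a registry, and `run_ids_by_stage` is a hypothetical helper.

```python
from types import SimpleNamespace


def run_ids_by_stage(versions):
    """Map each stage to the run_id of the model version currently in it."""
    return {v.current_stage: v.run_id for v in versions}


# Stand-in objects with invented values, mimicking registry entries
example_versions = [
    SimpleNamespace(name="my-model", version="3", current_stage="Production", run_id="abc123"),
    SimpleNamespace(name="my-model", version="4", current_stage="Staging", run_id="def456"),
]

print(run_ids_by_stage(example_versions))
```

With real data, the list returned by `get_latest_versions` would be passed in place of `example_versions`.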

K0nkere · 2022-08-22T00:33:28Z

Filtering the list of experiment runs

Since I register the best models based on metrics.rmse_test, and the test dataset can differ from period to period, an additional restriction is needed when filtering.
For example:

query = f'parameters.test_dataset = "{test_dataset_period}"'
best_model_run = mlflow_client.search_runs(
    experiment_ids=[experiment.experiment_id],
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=1,
    filter_string=query,
    order_by=["metrics.rmse_test ASC"]
)
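If more restrictions are needed later, the filter string can be assembled from separate clauses joined with `and`. This is a sketch under assumptions: `build_filter` and its `max_rmse` threshold are hypothetical, and the clauses use the `params.` / `metrics.` prefixes from the MLflow search syntax.

```python
def build_filter(test_dataset_period, max_rmse=None):
    """Compose a filter string for search_runs from optional clauses."""
    clauses = [f'params.test_dataset = "{test_dataset_period}"']
    if max_rmse is not None:
        # Hypothetical extra restriction on the logged metric
        clauses.append(f"metrics.rmse_test < {max_rmse}")
    return " and ".join(clauses)


if __name__ == "__main__":
    print(build_filter("2022-08"))
    print(build_filter("2022-08", max_rmse=1500))
```

The result would be passed as `filter_string` in the `search_runs` call.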

K0nkere changed the title ~~1 - Experiment tracking~~ 1 Experiment tracking Sep 9, 2022

K0nkere changed the title ~~1 Experiment tracking~~ 1 Virtual environment & Experiment tracking Sep 19, 2022

K0nkere changed the title ~~1 Virtual environment & Experiment tracking~~ 1 Experiment tracking Sep 19, 2022
