1 Experiment tracking #2

Open
K0nkere opened this issue Aug 12, 2022 · 4 comments

Comments


K0nkere commented Aug 12, 2022

Creating an MLflow server with a custom S3 bucket as a Docker container

Dockerfile

FROM python:3.9.7-slim

WORKDIR /mlflow/

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
  rm requirements.txt

EXPOSE 5001

ENV MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
ENV AWS_DEFAULT_REGION=ru-central1
ENV AWS_ACCESS_KEY_ID=<key_id>
ENV AWS_SECRET_ACCESS_KEY=<key>

ENV BACKEND_URI=sqlite:////mlflow/mlops-project.db
ENV ARTIFACT_ROOT=s3://kkr-mlops-zoomcamp/mlflow-artifacts/

# ENTRYPOINT ["bash"]
CMD mlflow server --backend-store-uri ${BACKEND_URI} --default-artifact-root ${ARTIFACT_ROOT} --host 0.0.0.0 --port 5001

Build the image from the /project folder:
docker build -t project-mlflow-server ./1-experiment-tracking/
and run it:
docker run -it -v /mlflow-database:/mlflow/ -p 5001:5001 project-mlflow-server:latest

After that the server is accessible at 127.0.0.1:5001 or <public_server_ip>:5001
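
To confirm the container is reachable before pointing clients at it, a quick check can be run from Python. This is a minimal sketch using only the standard library. It assumes the server answers on MLflow's `/health` endpoint, and the helper names `tracking_uri` and `server_is_up` are made up for illustration.

```python
import urllib.error
import urllib.request


def tracking_uri(host: str = "127.0.0.1", port: int = 5001) -> str:
    """Build the HTTP address of the MLflow server published by the container."""
    return f"http://{host}:{port}"


def server_is_up(uri: str, timeout: float = 3.0) -> bool:
    """Return True if GET <uri>/health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{uri}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a malformed address
        return False


if __name__ == "__main__":
    uri = tracking_uri()
    print(uri, "is up" if server_is_up(uri) else "is not reachable")
```

Once the check passes, the same address can be handed to client code with `mlflow.set_tracking_uri(uri)`.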


K0nkere commented Aug 20, 2022

For MLflow to save artifacts in a custom bucket

A few environment variables need to be added. Open the file with
sudo nano /etc/environment
and add

MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
AWS_DEFAULT_REGION=ru-central1

The key_id and key also need to be added to ~/.aws/credentials:
[default]
aws_access_key_id = <key_id>
aws_secret_access_key = <key>

Then restart the server.
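
A typo in the credentials file usually only shows up later as a failed artifact upload, so it can help to check the file up front. The sketch below uses the standard library `configparser` and is only an illustration; `missing_credential_keys` is a hypothetical helper, not part of MLflow or boto3.

```python
import configparser
from pathlib import Path

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(path=None, profile="default"):
    """Return the required keys that are absent or empty in the given profile."""
    path = Path(path) if path is not None else Path.home() / ".aws" / "credentials"
    # Interpolation is disabled so that a '%' inside a secret is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently ignored by configparser
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


if __name__ == "__main__":
    missing = missing_credential_keys()
    print("credentials look complete" if not missing else f"missing: {missing}")
```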

Running the MLflow service

mlflow ui
mlflow ui --backend-store-uri sqlite:///../mlops-project.db --default-artifact-root s3://kkr-mlops-zoomcamp/mlflow-artifacts
or as a server (host and port are separate options)
mlflow server --backend-store-uri sqlite:///../mlops-project.db --default-artifact-root s3://kkr-mlops-zoomcamp/mlflow-artifacts/ --host 0.0.0.0 --port 5001

Alternatively, create a .env file in the folder that contains the Pipfile:

export MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
export AWS_DEFAULT_REGION=ru-central1
export AWS_ACCESS_KEY_ID=<key_id>
export AWS_SECRET_ACCESS_KEY=<key>

The variables are then set automatically when the pipenv environment starts.
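
pipenv reads the .env file by itself, but scripts started outside pipenv (cron jobs, for example) do not get these variables. As a rough sketch of how the same file could be loaded by hand, assuming the simple `export KEY=value` format shown here, the standard library is enough. `parse_env_file` and `load_env_file` are illustrative names, not pipenv functions.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse KEY=value lines (optionally prefixed with 'export ') into a dict."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment, skip it
        # Drop surrounding whitespace and one layer of simple quoting
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path=".env"):
    """Copy parsed values into os.environ without overwriting existing ones."""
    for key, value in parse_env_file(path).items():
        os.environ.setdefault(key, value)
```

Calling `load_env_file()` before importing `mlflow` in such a script makes the endpoint and credentials visible to the S3 client. Values already present in the environment are kept, so a deliberately exported variable is not overwritten.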


K0nkere commented Aug 21, 2022

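Downloading a model from the registry

import pickle

import mlflow

# model_name is the name under which the model was registered
stage = 'Production'
# The registry URI has the form models:/<name>/<stage>, without an artifact sub-path
staging_model = mlflow.pyfunc.load_model(model_uri=f'models:/{model_name}/{stage}')

# If the model needs to be saved locally
with open('models/rf-best-model-production.bin', 'wb') as f_out:
    pickle.dump(staging_model, f_out)

# Predictions of the best model
staging_model.predict(X_test)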
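
Stage names in a registry URI are case-sensitive, so a small helper that builds the `models:/<name>/<stage>` string and rejects unknown stages can catch mistakes before a call to the server is made. This is a sketch only: `registry_model_uri` is a hypothetical helper, and the stage list is the standard MLflow set (None, Staging, Production, Archived).

```python
VALID_STAGES = ("None", "Staging", "Production", "Archived")


def registry_model_uri(model_name: str, stage: str) -> str:
    """Build a models:/<name>/<stage> URI, validating the stage name first."""
    if not model_name:
        raise ValueError("model_name must not be empty")
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {VALID_STAGES}")
    return f"models:/{model_name}/{stage}"
```

The result can be passed as `model_uri` to `mlflow.pyfunc.load_model`.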


K0nkere commented Aug 21, 2022

Ways to get the RUN_ID of a logged model

import mlflow
from mlflow.entities import ViewType
from mlflow.tracking import MlflowClient

mlflow_client = MlflowClient()

  1. Take it from the list of runs of an experiment
experiment = mlflow.set_experiment('Auction-car-prices-best-models')
best_model_run = mlflow_client.search_runs(
    experiment_ids=[experiment.experiment_id],
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=1,
    order_by=["metrics.rmse_test ASC"]
)
RUN_ID = best_model_run[0].info.run_id
model_uri = "runs:/{:}/full-pipeline".format(RUN_ID)
  2. Take it while adding the model to the registry with model_uri
registered_model = mlflow.register_model(
    model_uri=model_uri,
    name=model_name
)
> registered_model.run_id and registered_model.current_stage
  3. Take it while promoting the model
promoted_model = mlflow_client.transition_model_version_stage(
    name=model_name,
    version=registered_model_version,
    stage=to_stage,
    archive_existing_versions=False
)
> promoted_model.run_id and promoted_model.current_stage
  4. Take it from the list of registered model versions
versions = mlflow_client.get_latest_versions(
    model_name,
    stages=['Production']
)
> versions[num].version, versions[num].current_stage, versions[num].run_id, versions[num].name == model_name


K0nkere commented Aug 22, 2022

Filtering the runs of an experiment

Since I register the best models based on metrics.rmse_test, and test_dataset can differ from period to period, an additional restriction is needed when filtering.
For example:

query = f'parameters.test_dataset = "{test_dataset_period}"'
best_model_run = mlflow_client.search_runs(
    experiment_ids=[experiment.experiment_id],
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=1,
    filter_string=query,
    order_by=["metrics.rmse_test ASC"]
)
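
To make the intent of this query easier to see, here is a local stand-in that works on plain dictionaries instead of MLflow run objects: keep only the runs logged for the requested test dataset, then take the one with the lowest rmse_test. It is a sketch for illustration, not the MLflow implementation. `build_dataset_filter`, `pick_best_run`, and the dictionary layout are assumptions.

```python
def build_dataset_filter(test_dataset_period: str) -> str:
    """Build the filter string used above, rejecting values that would break the quoting."""
    if '"' in test_dataset_period:
        raise ValueError("test_dataset_period must not contain double quotes")
    return f'parameters.test_dataset = "{test_dataset_period}"'


def pick_best_run(runs, test_dataset_period, metric="rmse_test"):
    """Mimic filter_string + order_by ASC + max_results=1 on a list of dicts.

    Each run is assumed to look like
    {"run_id": ..., "params": {...}, "metrics": {...}} (a hypothetical layout).
    Returns the matching run with the smallest metric, or None if nothing matches.
    """
    matching = [
        run for run in runs
        if run["params"].get("test_dataset") == test_dataset_period
        and metric in run["metrics"]
    ]
    if not matching:
        return None
    return min(matching, key=lambda run: run["metrics"][metric])
```

Without the dataset restriction, a run evaluated on an easier period could win the comparison even though its metric is not comparable with the others.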

@K0nkere K0nkere changed the title 1 - Experiment tracking 1 Experiment tracking Sep 9, 2022
@K0nkere K0nkere changed the title 1 Experiment tracking 1 Virtual environment & Experiment tracking Sep 19, 2022
@K0nkere K0nkere changed the title 1 Virtual environment & Experiment tracking 1 Experiment tracking Sep 19, 2022