docs: add mlflow logging and loading #1641
Conversation
Hey @thinkall 👋!

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report

```
@@            Coverage Diff             @@
##           master    #1641      +/-   ##
==========================================
+ Coverage   85.74%   85.79%   +0.04%
==========================================
  Files         272      272
  Lines       14230    14230
  Branches      739      739
==========================================
+ Hits        12202    12208       +6
+ Misses       2028     2022       -6
```
```python
# load model back
# mlflow wraps the original model in a PipelineModel, so here we extract the original ALSModel from the stages.
loaded_model = mlflow.spark.load_model(model_uri, dfs_tmpdir="Files/spark").stages[-1]
```
I think Serena might have fixed this behavior in MLflow so that it just loads and saves the original model; can you check with her to see?
Hihi, I didn't change this part lol, and we can't: MLflow needs a consistent API to load the model back, which only PipelineModel provides, so we have to fetch the stages from inside it. I used similar logic in our dotnet code, because you only know the real type after you load the model back, and sometimes we might just save a plain PipelineModel, in which case you don't need to fetch the stages at all.
Thanks Mark. Just checked with Serena. MLflow still wraps the original model in a PipelineModel. In most cases we only use model.transform, which also works on a PipelineModel. Only in this case, ALSModel has methods like recommendForAllUsers, which are needed in the recommendation scenario. Maybe it's acceptable to extract the original model from the stages in special cases like this one.
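The wrapping behavior discussed above can be illustrated without Spark at all. The sketch below uses hypothetical stub classes (`PipelineModelStub`, `ALSModelStub` are stand-ins invented here, not real pyspark classes) to show why `transform` works on the wrapper directly, while recommendation-specific methods require unwrapping the last stage with `.stages[-1]`:

```python
# Minimal stand-ins mimicking pyspark.ml.PipelineModel and ALSModel.
# These are illustrative stubs only; the real classes live in pyspark.ml.

class ALSModelStub:
    """Stands in for ALSModel: has an extra method beyond transform()."""

    def transform(self, df):
        return f"transformed({df})"

    def recommend_for_all_users(self, n):
        return f"top-{n} recommendations"


class PipelineModelStub:
    """Stands in for PipelineModel: a uniform load/transform API over stages."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, df):
        # Chain every stage, so callers never need to know the inner types.
        out = df
        for stage in self.stages:
            out = stage.transform(out)
        return out


# What a load call hands back: the pipeline wrapper around the real model.
loaded = PipelineModelStub(stages=[ALSModelStub()])

# transform() works on the wrapper directly...
print(loaded.transform("ratings"))  # → transformed(ratings)

# ...but recommendation-specific methods need the unwrapped last stage.
als = loaded.stages[-1]
print(als.recommend_for_all_users(10))  # → top-10 recommendations
```

This is exactly the trade-off described in the thread: the wrapper gives a consistent API for loading, at the cost of one unwrapping step when model-specific methods are needed.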
Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
```python
with mlflow.start_run() as run:
    print("log model:")
    mlflow.spark.log_model(
        model,
        f"{EXPERIMENT_NAME}-alsmodel",
        registered_model_name=f"{EXPERIMENT_NAME}-alsmodel",
        dfs_tmpdir="Files/spark",
    )

    print("log metrics:")
    mlflow.log_metrics({"RMSE": rmse, "MAE": mae, "R2": r2, "Explained variance": var})

    print("log parameters:")
    mlflow.log_params(
        {
            "num_epochs": num_epochs,
            "rank_size_list": rank_size_list,
            "reg_param_list": reg_param_list,
            "model_tuning_method": model_tuning_method,
            "DATA_FOLDER": DATA_FOLDER,
        }
    )

    model_uri = f"runs:/{run.info.run_id}/{EXPERIMENT_NAME}-alsmodel"
```
Can this be simplified with @serena-ruan's autologging?
Currently `mlflow.pyspark.ml.autolog` doesn't work, since the default `dfs_tmpdir` is `/tmp/mlflow`, which doesn't work on our platform. We need to wait for another release of mlflow that will include Serena's PR for setting `DFS_TMP` as an environment variable. Moreover, the ALS model is not supported by autolog. I see Serena has two PRs for supporting modifying `logModelAllowlistFile` in a user-friendly way; I guess those will be released in the next version of mlflow too.
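For reference, here is a hedged sketch of what the simplified flow might look like once the releases mentioned above land. The environment-variable name `DFS_TMP` is taken from the comment above and should be verified against the actual mlflow release notes; the commented-out mlflow calls are assumptions about the future autologging path, not working code today:

```python
import os

# Point MLflow's DFS temp dir at a platform-writable location instead of
# the default /tmp/mlflow. The variable name DFS_TMP comes from the PR
# discussed above; confirm the exact name once the mlflow release ships.
os.environ["DFS_TMP"] = "Files/spark"

# The rest would run inside a Spark session with mlflow installed
# (left commented because it depends on unreleased mlflow behavior):
#
# import mlflow
# import mlflow.pyspark.ml
#
# mlflow.pyspark.ml.autolog()      # params/metrics/model logged automatically
# with mlflow.start_run():
#     model = tuner.fit(train_df)  # ALS also needs logModelAllowlistFile support
```

If this works as hoped, the explicit `log_model`/`log_metrics`/`log_params` block above collapses into a single `autolog()` call.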
It looks good. I would just check with Serena whether we can further simplify with her autologging; then the story will be complete and very nice.
Related Issues/PRs
None
What changes are proposed in this pull request?
Updated aisample notebooks.
How is this patch tested?
Does this PR change any dependencies?
Does this PR add a new feature? If so, have you added samples on website?
- Add the samples to the `website/docs/documentation` folder.
- Make sure you choose the correct class (`estimators/transformers`) and namespace.
- Make sure the `DocTable` points to the correct API link.
- Run `yarn run start` to make sure the website renders correctly.
- Add `<!--pytest-codeblocks:cont-->` before each python code block to enable auto-tests for python samples.
- Make sure the `WebsiteSamplesTests` job passes in the pipeline.

AB#1958014