fix: Fix validation data creation for useSingleDataset mode #1527

svotaw · 2022-06-12T16:37:20Z

Summary

Fix validation Dataset creation in useSingleDataset mode. Because the code path is shared with the regular training Dataset, every partition tries to merge its data into the executor's "single" Dataset. For validation data there is only one array, so each merge adds another copy of it. This causes two problems:

1. Extra memory pressure, which increases the risk of OOM errors.
2. If executors do not all have the same number of partitions, the validation Dataset has a different length on each executor, which causes errors.
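
The duplication described above can be illustrated with a small simulation. This is only a sketch of the failure mode, not the SynapseML code: the function name, the executor labels, and the "add the array once per executor" remedy are hypothetical stand-ins chosen for illustration, and the actual fix in this PR may differ.

```python
from typing import Dict, List


def validation_lengths(
    partitions_per_executor: Dict[str, int],
    validation_rows: List[int],
    merge_per_partition: bool,
) -> Dict[str, int]:
    """Return the validation Dataset length each executor would end up with.

    Hypothetical model: each executor owns one shared list that stands in
    for its "single" Dataset.
    """
    lengths = {}
    for executor, num_partitions in partitions_per_executor.items():
        shared: List[int] = []  # stand-in for the executor's single Dataset
        if merge_per_partition:
            # Modeled bug: every partition merges the same validation array.
            for _ in range(num_partitions):
                shared.extend(validation_rows)
        else:
            # Assumed remedy: the array is added exactly once per executor.
            shared.extend(validation_rows)
        lengths[executor] = len(shared)
    return lengths


# Two executors with an uneven number of partitions (made-up values).
partitions = {"exec-0": 2, "exec-1": 3}
validation = list(range(100))

buggy = validation_lengths(partitions, validation, merge_per_partition=True)
fixed = validation_lengths(partitions, validation, merge_per_partition=False)
print(buggy)  # lengths differ per executor and exceed the real row count
print(fixed)  # every executor holds the same number of rows
```

In this model the per-partition merge both multiplies memory use and yields mismatched lengths whenever partition counts differ, which matches the two problems listed above.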

Tests

The existing validation Dataset tests still pass.

AB#1828018

imatiach-msft

LGTM!

svotaw · 2022-06-13T15:09:03Z

/azp run

azure-pipelines · 2022-06-13T15:09:18Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov-commenter · 2022-06-13T15:46:03Z

Codecov Report

Merging #1527 (21b0107) into master (0a6a728) will increase coverage by 1.42%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1527      +/-   ##
==========================================
+ Coverage   82.84%   84.27%   +1.42%     
==========================================
  Files         297      290       -7     
  Lines       14942    14819     -123     
  Branches      728      719       -9     
==========================================
+ Hits        12379    12489     +110     
+ Misses       2563     2330     -233
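
The figures in the table above can be cross-checked with a few lines of arithmetic. The sketch below assumes that coverage is hits divided by lines and that percentages are truncated, not rounded, to two decimals; the truncation is an assumption that happens to reproduce the reported values, not documented Codecov behavior.

```python
import math


def truncate(value: float, digits: int = 2) -> float:
    """Truncate (not round) a non-negative value to the given decimals."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


# Numbers copied from the coverage diff table.
master = {"lines": 14942, "hits": 12379, "misses": 2563}
pr = {"lines": 14819, "hits": 12489, "misses": 2330}

# Hits and misses should add up to the line count on both sides.
assert master["hits"] + master["misses"] == master["lines"]
assert pr["hits"] + pr["misses"] == pr["lines"]

master_pct = 100 * master["hits"] / master["lines"]
pr_pct = 100 * pr["hits"] / pr["lines"]

print(truncate(master_pct))           # coverage on master
print(truncate(pr_pct))               # coverage with the PR merged
print(truncate(pr_pct - master_pct))  # coverage change
```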

Impacted Files	Coverage Δ
...rosoft/azure/synapse/ml/lightgbm/SharedState.scala	`91.17% <100.00%> (+1.89%)`	⬆️
...ynapse/ml/lightgbm/dataset/DatasetAggregator.scala	`96.18% <100.00%> (ø)`
...org/apache/spark/ml/param/JsonEncodableParam.scala	`57.89% <0.00%> (-26.32%)`	⬇️
...g/apache/spark/ml/param/PythonWrappableParam.scala	`66.66% <0.00%> (-8.34%)`	⬇️
...re/src/main/python/synapse/ml/core/schema/Utils.py	`67.10% <0.00%> (-5.27%)`	⬇️
...soft/azure/synapse/ml/cognitive/TextToSpeech.scala	`84.84% <0.00%> (-3.04%)`	⬇️
...oft/azure/synapse/ml/cognitive/TextAnalytics.scala	`86.59% <0.00%> (-2.69%)`	⬇️
.../azure/synapse/ml/cognitive/TextAnalyticsSDK.scala	`86.01% <0.00%> (-1.40%)`	⬇️
...ft/azure/synapse/ml/cognitive/FormRecognizer.scala	`73.40% <0.00%> (-1.07%)`	⬇️
...t/azure/synapse/ml/cognitive/BingImageSearch.scala	`89.28% <0.00%> (-0.90%)`	⬇️
... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0a6a728...21b0107.

svotaw · 2022-06-14T05:28:43Z

/azp run

azure-pipelines · 2022-06-14T05:28:56Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 · 2022-06-14T21:51:49Z

/azp run

azure-pipelines · 2022-06-14T21:52:02Z

Azure Pipelines successfully started running 1 pipeline(s).

Fix validation data creation for useSingleDataset mode

772f6e2

svotaw requested review from imatiach-msft and mhamilton723 June 12, 2022 16:37

svotaw added 2 commits June 12, 2022 09:44

cleanup

2f24b84

cleanup

20fea3e

imatiach-msft previously approved these changes Jun 13, 2022

Merge branch 'master' into fix-single-validation

b74e86b

mhamilton723 reviewed Jun 13, 2022

lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/SharedState.scala (outdated, resolved)

mhamilton723 reviewed Jun 13, 2022

lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/DatasetAggregator.scala (outdated, resolved)

responded to comments

eda35fd

svotaw dismissed imatiach-msft’s stale review via eda35fd June 14, 2022 05:28

svotaw requested review from mhamilton723 and imatiach-msft June 14, 2022 14:15

svotaw enabled auto-merge (squash) June 14, 2022 14:16

imatiach-msft reviewed Jun 14, 2022

lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/dataset/DatasetAggregator.scala (resolved)

imatiach-msft approved these changes Jun 14, 2022

Merge branch 'master' into fix-single-validation

21b0107

mhamilton723 approved these changes Jun 14, 2022

svotaw merged commit d0bc785 into microsoft:master Jun 14, 2022
