Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Refactor of LightGBM wrapper to prepare for streaming #1557

Merged
merged 5 commits into from Jul 13, 2022

Conversation

svotaw
Copy link
Collaborator

@svotaw svotaw commented Jul 8, 2022

Summary

In preparation for adding a "streaming" mode to LightGBM functionality (mostly to improve memory usage), this PR refactors the existing wrapper code. Followup PRs will actually add the streaming components, but this initial PR makes the following major interrelated changes to improve the organization of LightGBM code:

  1. Splits the driver and worker methods into distinct classes. LightGBMBase is now driver-only methods, and all worker code is in BasePartitionTask and derived classes (only BulkPartitionTask for now).
  2. Splits Networking-specific methods into a new NetworkManager, a class that encapsulates all Networking functionality, both driver/worker and Spark/LightGBM.
  3. Creates TrainingContext object to help pass around configuration and initialization information easily, without passing lots of method parameters. This applies at the driver level for all of training, but there are also similar new PartitionTaskContext and PartitionTaskTrainingState objects that encapsulate read-only and variable properties respectively for individual tasks.
  4. Adds InstrumentationMeasures, a class that holds timing perf for various stages, both for all of training and individual tasks. This class is returned along with the Booster, so is even exposed to the user.

I also cleaned up all warnings about file/method length, use of "while" loops, excessive code branching, etc. There's more official tail recursion and less use of vars.

Notable new functionality:

  1. The networking phase now gathers information about the partition size and distribution across executors, and passes that to all Tasks (previously it was just limited info, and just "main" workers).
  2. Timing perf can now be reported to the user and can also help us with benchmarking and diagnostics

NOTE: The actual streaming components are not huge, but were omitted for now to help shorten the long PR, and also since this PR can be checked in without any new native LightGBM library components.

Tests

Since this is a refactor, the existing LightGBM test suite should be sufficient, and for now runs in "bulk" mode only. The next PR will add "streaming" mode and reconcile the tests.

Dependency changes

If you needed to make any changes to dependencies of this project, please describe them here.

AB#1866292

@svotaw svotaw changed the title Refactor of LightGBM wrapper to prepare for streaming fix: Refactor of LightGBM wrapper to prepare for streaming Jul 8, 2022
@svotaw
Copy link
Collaborator Author

svotaw commented Jul 9, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter
Copy link

codecov-commenter commented Jul 9, 2022

Codecov Report

Merging #1557 (5a9c70c) into master (1604bc0) will decrease coverage by 0.20%.
The diff coverage is 82.17%.

@@            Coverage Diff             @@
##           master    #1557      +/-   ##
==========================================
- Coverage   83.76%   83.55%   -0.21%     
==========================================
  Files         301      305       +4     
  Lines       15376    15776     +400     
  Branches      698      732      +34     
==========================================
+ Hits        12879    13181     +302     
- Misses       2497     2595      +98     
Impacted Files Coverage Δ
...soft/azure/synapse/ml/lightgbm/LightGBMUtils.scala 81.81% <ø> (-9.26%) ⬇️
.../synapse/ml/lightgbm/booster/LightGBMBooster.scala 92.75% <ø> (ø)
...e/synapse/ml/lightgbm/params/BaseTrainParams.scala 100.00% <ø> (ø)
...ure/synapse/ml/lightgbm/dataset/DatasetUtils.scala 82.75% <37.50%> (+21.64%) ⬆️
...soft/azure/synapse/ml/core/utils/ClusterUtil.scala 65.75% <50.00%> (-1.86%) ⬇️
...zure/synapse/ml/lightgbm/LightGBMPerformance.scala 53.78% <53.78%> (ø)
...rosoft/azure/synapse/ml/lightgbm/SharedState.scala 88.23% <66.66%> (-2.95%) ⬇️
...ft/azure/synapse/ml/lightgbm/TrainingContext.scala 68.42% <68.42%> (ø)
...osoft/azure/synapse/ml/lightgbm/LightGBMBase.scala 88.33% <81.48%> (-7.45%) ⬇️
...crosoft/azure/synapse/ml/lightgbm/TrainUtils.scala 84.16% <85.71%> (-2.00%) ⬇️
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1604bc0...5a9c70c. Read the comment docs.

@svotaw
Copy link
Collaborator Author

svotaw commented Jul 10, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Copy link
Collaborator Author

svotaw commented Jul 11, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Collaborator

@mhamilton723 mhamilton723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few nits and comments, awesome work :)

@svotaw
Copy link
Collaborator Author

svotaw commented Jul 12, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723
mhamilton723 previously approved these changes Jul 13, 2022
Copy link
Collaborator

@mhamilton723 mhamilton723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved to the best of my knowledge, @imatiach-msft @eisber please feel free to take a look and provide any comments!

@svotaw
Copy link
Collaborator Author

svotaw commented Jul 13, 2022

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 merged commit 2304b52 into microsoft:master Jul 13, 2022
Copy link
Contributor

@eisber eisber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed VW changes. Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants