fix: chunksize parameter incorrectly specified during data copy #1490

Merged
merged 1 commit into from Apr 25, 2022

@imatiach-msft imatiach-msft commented Apr 25, 2022

Summary

This PR fixes the chunk size being specified incorrectly during data copy in single dataset mode. It should resolve user issue #1478.

Specifically, the chunk size of the features array is num_cols * default_chunksize, so using only the default chunk size during the data copy operation is incorrect. The PR now uses the chunk size retrieved directly from the array.

Also, this PR removes the unused method getNumRowsForChunksArray.
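
The chunk size mismatch described above can be illustrated with a small sketch. This is Python written only for illustration, not the Scala/SWIG code touched by this PR: `ChunkedArray`, `copy_out`, and the sizes used are hypothetical stand-ins. The idea is that a flattened features buffer is addressed as `chunks[i // chunk_size][i % chunk_size]`, so the copy must use the chunk size the array was created with.

```python
from typing import List


class ChunkedArray:
    """Hypothetical stand-in for a chunked native buffer (not the SynapseML class)."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.chunks: List[List[float]] = [[]]

    def add(self, value: float) -> None:
        # Start a new chunk once the current one is full.
        if len(self.chunks[-1]) == self.chunk_size:
            self.chunks.append([])
        self.chunks[-1].append(value)

    def count(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)


def copy_out(array: ChunkedArray, chunk_size: int) -> List[float]:
    """Flatten the array, addressing element i as chunks[i // chunk_size][i % chunk_size]."""
    out: List[float] = []
    for i in range(array.count()):
        chunk, offset = divmod(i, chunk_size)
        out.append(array.chunks[chunk][offset])
    return out


# Assumed sizes for the illustration: 3 feature columns, default chunk size of 2 rows.
num_cols, default_chunk_size = 3, 2

# The features array stores flattened rows, so its chunk size is num_cols * default_chunk_size.
features = ChunkedArray(num_cols * default_chunk_size)
for v in range(12):
    features.add(float(v))

# Fixed behavior: use the chunk size retrieved from the array itself.
correct = copy_out(features, features.chunk_size)

# Buggy addressing: with the default chunk size, flat index 2 maps to chunk 1, offset 0,
# which holds the 7th value instead of the 3rd.
chunk, offset = divmod(2, default_chunk_size)
misread = features.chunks[chunk][offset]
```

In this sketch, copying with `default_chunk_size` would read the wrong positions and eventually index past the last chunk; in native code that kind of mismatch would be an out-of-bounds read rather than a Python exception.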

Tests

A test was added to validate that specifying different chunk size values has no impact on the model metrics. The chunk size is used only for copying data and should not affect the model's accuracy.
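
The invariant behind this test can be sketched at the data level. The snippet below is a simplified Python illustration, not the test added in this PR (which compares model metrics): `copy_in_chunks` is a hypothetical helper, and the check is only that the copied data is identical for every chunk size.

```python
from typing import List, Sequence


def copy_in_chunks(values: Sequence[float], chunk_size: int) -> List[float]:
    """Copy values chunk by chunk; the chunk size controls only how the copy is batched."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out: List[float] = []
    for start in range(0, len(values), chunk_size):
        out.extend(values[start:start + chunk_size])
    return out


data = [float(i) for i in range(10)]

# Chunk sizes smaller than, equal to, and larger than the data length.
results = {size: copy_in_chunks(data, size) for size in (1, 3, 10, 1000)}

# Every chunk size must reproduce the input exactly.
all_equal = all(result == data for result in results.values())
```

If the copied data is the same regardless of chunk size, any model trained on it should see the same inputs, which is why the metrics are expected to be unaffected.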

Dependency changes

There are no dependency changes in this PR.

AB#1761413

@imatiach-msft

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

codecov-commenter commented Apr 25, 2022

Codecov Report

Merging #1490 (8c9749b) into master (edafb8b) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1490      +/-   ##
==========================================
+ Coverage   84.44%   84.48%   +0.03%     
==========================================
  Files         295      295              
  Lines       14793    14786       -7     
  Branches      702      701       -1     
==========================================
  Hits        12492    12492              
+ Misses       2301     2294       -7     
| Impacted Files | Coverage Δ |
|---|---|
| ...osoft/azure/synapse/ml/lightgbm/LightGBMBase.scala | 95.78% <ø> (ø) |
| ...soft/azure/synapse/ml/lightgbm/LightGBMUtils.scala | 91.07% <ø> (+12.60%) ⬆️ |
| ...ynapse/ml/lightgbm/dataset/DatasetAggregator.scala | 95.73% <100.00%> (-0.03%) ⬇️ |
| ...oft/azure/synapse/ml/lightgbm/swig/SwigUtils.scala | 89.58% <100.00%> (+0.69%) ⬆️ |
| ...ala/org/apache/spark/ml/param/DataFrameParam.scala | 70.83% <0.00%> (-16.67%) ⬇️ |
| ...zure/synapse/ml/stages/PartitionConsolidator.scala | 93.61% <0.00%> (-2.13%) ⬇️ |
| ...re/synapse/ml/cognitive/CognitiveServiceBase.scala | 75.96% <0.00%> (-0.78%) ⬇️ |
| ...ft/azure/synapse/ml/cognitive/ComputerVision.scala | 73.47% <0.00%> (-0.44%) ⬇️ |
| ...re/synapse/ml/lightgbm/params/LightGBMParams.scala | 78.70% <0.00%> (+0.26%) ⬆️ |
| ...se/ml/cognitive/MultivariateAnomalyDetection.scala | 87.77% <0.00%> (+0.74%) ⬆️ |
| ... and 1 more | |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update edafb8b...8c9749b.
