
fix: correct the objective param when using fobj #1292

Merged

Conversation

Contributor

@sinnfashen sinnfashen commented Dec 8, 2021

Currently, no objective param is passed when a fobj is provided.

That causes LightGBM to mistakenly treat the task as regression during initial setup, since regression is the [default value](https://github.com/microsoft/LightGBM/blob/master/include/LightGBM/config.h#L139) for the objective.

This may seem harmless at first glance, since we supply grad and hess on every iteration. But it actually makes the output incorrect: is_constant_hessian stays true the whole time (which only holds for unweighted regression), so the calculated sum_hessian at each split is wrong, which eventually leads to wrong model output.
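To see why a constant-hessian assumption is wrong here: for binary cross entropy the hessian is p * (1 - p), which depends on the raw score. This standalone sketch (not part of the PR, just an illustration) shows the hessian varying across scores:

```scala
import scala.math.exp

object HessianDemo {
  // Sigmoid of a raw score.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + exp(-z))

  // Hessian of the binary cross-entropy loss w.r.t. the raw score: p * (1 - p).
  def hessian(z: Double): Double = {
    val p = sigmoid(z)
    p * (1.0 - p)
  }

  def main(args: Array[String]): Unit = {
    val scores = Seq(-2.0, 0.0, 2.0)
    val hs = scores.map(hessian)
    // The hessian peaks at z = 0 (value 0.25) and shrinks toward the tails,
    // so it is clearly not constant across rows.
    hs.foreach(h => println(f"$h%.4f"))
    assert(hs(1) == 0.25)
    assert(hs(0) < hs(1) && hs(2) < hs(1))
  }
}
```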

The fix I propose here is simply to pass objective=custom in the params, rather than no objective at all, when a fobj is provided.
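The names below are hypothetical, not the actual TrainParams code; this just sketches the rule the fix implements: emit objective=custom whenever a custom fobj is supplied, instead of dropping the objective entirely (which lets LightGBM fall back to its regression default):

```scala
object ObjectiveParamSketch {
  // Hypothetical helper mirroring the idea of the fix; the real change
  // lives in com.microsoft.azure.synapse.ml.lightgbm.params.TrainParams.
  def objectiveParam(configuredObjective: String, hasCustomFObj: Boolean): String =
    if (hasCustomFObj) "objective=custom"   // with a fobj, declare the objective as custom
    else s"objective=$configuredObjective"  // otherwise pass the configured objective through

  def main(args: Array[String]): Unit = {
    println(objectiveParam("binary", hasCustomFObj = false)) // objective=binary
    println(objectiveParam("binary", hasCustomFObj = true))  // objective=custom
  }
}
```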

I'm pretty new to Scala and Java, and this is the first Scala project I've worked with. I've tested things in the sbt console, but I'm not sure how to set up the environment properly, since something fails when I run sbt test on master without any changes. I would really appreciate it if anyone could point me to an up-to-date doc I can reference (the runme link in Contribute.md leads to a 404).

I could also include code similar to the snippet below (which I used to debug the mismatch) as a test case if needed. You can also use the same code to verify the bug.

import com.microsoft.azure.synapse.ml.lightgbm.dataset.LightGBMDataset
import com.microsoft.azure.synapse.ml.lightgbm.params.FObjTrait
import com.microsoft.azure.synapse.ml.lightgbm._
import org.apache.spark.ml.feature.VectorAssembler
import scala.math.exp
import scala.util.Random

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().config("spark.master", "local[*]").getOrCreate()
import spark.implicits._

val rng = new Random(1206)
val x: Seq[Double] = for (i <- 0 to 99) yield rng.nextDouble()
val y = for (i <- 0 to 99) yield (if (rng.nextDouble() > x(i)) 1 else 0)
val df = x.zip(y).map { case (x, y) => (x, y, 0) }.toDF("feature", "label", "init_score")
val converted_df = new VectorAssembler().setInputCols(Array("feature")).setOutputCol("features").transform(df)
converted_df.show()
converted_df.createOrReplaceTempView("temp_table")

def baseModel: LightGBMClassifier = {
  new LightGBMClassifier()
    .setFeaturesCol("features")
    .setNumLeaves(2)
    .setLambdaL2(0)
    .setLearningRate(0.1)
    .setMinGainToSplit(0)
    .setLabelCol("label")
    .setNumTasks(1)
    .setMaxDepth(1)
    .setVerbosity(1)
    .setNumThreads(1)
    .setMinDataInLeaf(20)
    .setInitScoreCol("init_score")
}
class CrossEntropy extends FObjTrait {
  override def getGradient(predictions: Array[Array[Double]],
                           trainingData: LightGBMDataset): (Array[Float], Array[Float]) = {
    // Get the labels
    val labels = trainingData.getLabel()
    // Compute gradient and hessian
    val probabilities = predictions.map(rowPrediction =>
      rowPrediction.map(prediction => 1.0 / (1.0 + exp(-prediction))))
    val grad = probabilities.zip(labels).map {
      case (prob: Array[Double], label: Float) => (prob(0) - label).toFloat
    }
    val hess = probabilities.map(probabilityArray =>
      (probabilityArray(0) * (1 - probabilityArray(0))).toFloat)
    (grad, hess)
  }
}

baseModel.setNumIterations(1).setObjective("binary").fit(converted_df).transform(converted_df).show(10, false)

baseModel.setNumIterations(1).setFObj(new CrossEntropy()).fit(converted_df).transform(converted_df).show(10, false)

val native_model = baseModel.setNumIterations(3).setObjective("binary").fit(converted_df)
val native_res = native_model.transform(converted_df)
native_res.show(10, false)

// For binary classification, cross entropy is equivalent to the binary objective.
val fobj_model = baseModel.setNumIterations(3).setFObj(new CrossEntropy()).fit(converted_df)
val fobj_res = fobj_model.transform(converted_df)
fobj_res.show(10, false)

native_res.select("rawPrediction").except(fobj_res.select("rawPrediction")).count == 0
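One caveat with the except-based check above: it requires bit-exact rawPrediction values. A tolerance-based comparison is more robust to floating-point noise; here is a standalone sketch (no Spark required, names are illustrative) of comparing two prediction sequences element-wise:

```scala
object ApproxCompare {
  // Compare two prediction sequences element-wise within an absolute tolerance.
  def allClose(a: Seq[Double], b: Seq[Double], tol: Double = 1e-9): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => math.abs(x - y) <= tol }

  def main(args: Array[String]): Unit = {
    val native = Seq(0.123456789, -1.5)
    val custom = Seq(0.123456789 + 1e-12, -1.5)
    println(allClose(native, custom))         // true: differences are below tolerance
    println(allClose(native, Seq(0.2, -1.5))) // false: first element differs too much
  }
}
```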

@mhamilton723
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723
Collaborator

I put some instructions in the issue you raised, but I will also put the local build instructions here:

https://microsoft.github.io/SynapseML/docs/reference/developer-readme/

@codecov-commenter

codecov-commenter commented Dec 9, 2021

Codecov Report

Merging #1292 (084ce7c) into master (a261480) will increase coverage by 7.73%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1292      +/-   ##
==========================================
+ Coverage   75.62%   83.36%   +7.73%     
==========================================
  Files         273      273              
  Lines       13453    13453              
  Branches      689      689              
==========================================
+ Hits        10174    11215    +1041     
+ Misses       3279     2238    -1041     
Impacted Files Coverage Δ
...azure/synapse/ml/lightgbm/params/TrainParams.scala 100.00% <100.00%> (+12.50%) ⬆️
.../execution/streaming/continuous/HTTPSourceV2.scala 92.08% <0.00%> (-0.72%) ⬇️
...ynapse/ml/lightgbm/dataset/DatasetAggregator.scala 95.52% <0.00%> (+2.48%) ⬆️
...ure/synapse/ml/lightgbm/dataset/DatasetUtils.scala 61.11% <0.00%> (+2.77%) ⬆️
...icrosoft/azure/synapse/ml/io/http/HTTPSchema.scala 88.65% <0.00%> (+2.83%) ⬆️
...soft/azure/synapse/ml/lightgbm/LightGBMUtils.scala 74.50% <0.00%> (+3.92%) ⬆️
...ure/synapse/ml/io/http/SimpleHTTPTransformer.scala 92.18% <0.00%> (+6.25%) ⬆️
...osoft/azure/synapse/ml/core/contracts/Params.scala 95.65% <0.00%> (+6.52%) ⬆️
...re/synapse/ml/cognitive/CognitiveServiceBase.scala 75.96% <0.00%> (+6.97%) ⬆️
...re/synapse/ml/lightgbm/params/LightGBMParams.scala 84.46% <0.00%> (+7.57%) ⬆️
... and 26 more


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a261480...084ce7c.


@imatiach-msft imatiach-msft left a comment


nice find and fix!

@imatiach-msft
Contributor

Interesting, I don't see a "custom" value for objective mentioned in the LightGBM docs here:
https://lightgbm.readthedocs.io/en/latest/Parameters.html#objective

but I do see it for metric:
https://lightgbm.readthedocs.io/en/latest/Parameters.html#metric
which has "None", "na", and "null" as aliases

@mhamilton723
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


4 participants