Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

What changes were proposed in this pull request?

use newly impled softmax function in NB

Why are the changes needed?

to simplify impl

Does this PR introduce any user-facing change?

No

How was this patch tested?

existing testsuite

@github-actions github-actions bot added the ML label Jun 16, 2021
@SparkQA
Copy link

SparkQA commented Jun 16, 2021

Test build #139863 has finished for PR 32927 at commit cbcb584.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-35678][ML][FOLLOWUP] use softmax in NB [SPARK-35678][ML][FOLLOWUP] Use softmax in NB Jun 16, 2021
@SparkQA
Copy link

SparkQA commented Jun 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44392/

@SparkQA
Copy link

SparkQA commented Jun 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44392/

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but are there other places it can be used? let's get as many as we can in one go in one PR / JIRA if possible. They're logically related

@zhengruifeng
Copy link
Contributor Author

@srowen Sorry for missing this one.
There are other softmax computations in ML (ANN), but seems need to make softmax support offset & step, I will have a try.

var localLossSum = 0.0
var localWeightSum = 0.0
var i = 0
val probs = Array.ofDim[Double](numClasses)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when softmax support offset & step, this tmp array is no longer needed

var localWeightSum = 0.0
var i = 0
val probs = Array.ofDim[Double](numClasses)
val multiplierSum = Array.ofDim[Double](numClasses)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiplierSum is only needed when fitIntercept is true

private val marginOffset = if (fitWithMean) {
val offset = intercept.copy
@transient private lazy val marginOffset = if (fitWithMean) {
val offset = new DenseVector(coefficientsArray.takeRight(numClasses)) // intercept
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not call intercept to avoid trigger the lazy initialization of intercept

val weights = new BDV[Double](0)

override def eval(data: BDM[Double], output: BDM[Double]): Unit = {
require(!data.isTranspose && !output.isTranspose)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two BDM are expected to be not transposed here

@github-actions github-actions bot added the MLLIB label Jun 16, 2021
@SparkQA
Copy link

SparkQA commented Jun 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44409/

@SparkQA
Copy link

SparkQA commented Jun 16, 2021

Test build #139879 has finished for PR 32927 at commit 6ee9054.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44409/

@zhengruifeng zhengruifeng changed the title [SPARK-35678][ML][FOLLOWUP] Use softmax in NB [SPARK-35678][ML][FOLLOWUP] softmax support offset and step Jun 17, 2021
@zhengruifeng
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jun 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44435/

@SparkQA
Copy link

SparkQA commented Jun 17, 2021

Test build #139905 has finished for PR 32927 at commit 6ee9054.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 17, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44435/

@huaxingao huaxingao closed this in fdf86fd Jun 18, 2021
@zhengruifeng
Copy link
Contributor Author

Thanks @huaxingao @srowen for reviewing!

@zhengruifeng zhengruifeng deleted the softmax__followup branch June 18, 2021 05:50
@huaxingao
Copy link
Contributor

Merged to master, thanks!

@HyukjinKwon
Copy link
Member

Hm, the tests didn't pass:

2021-06-16T16:16:45.6226175Z ======================================================================
2021-06-16T16:16:45.6227313Z ERROR [7.047s]: test_raw_and_probability_prediction (pyspark.ml.tests.test_algorithms.MultilayerPerceptronClassifierTest)
2021-06-16T16:16:45.6228919Z ----------------------------------------------------------------------
2021-06-16T16:16:45.6229472Z Traceback (most recent call last):
2021-06-16T16:16:45.6270603Z   File "/__w/spark/spark/python/pyspark/ml/tests/test_algorithms.py", line 89, in test_raw_and_probability_prediction
2021-06-16T16:16:45.6271739Z     self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, rtol=0.1))
2021-06-16T16:16:45.6272750Z AssertionError: False is not true
2021-06-16T16:16:45.6273240Z 
2021-06-16T16:16:45.6274058Z ----------------------------------------------------------------------

and seems like it fails. I am reverting this for now.

@huaxingao
Copy link
Contributor

Sorry, I will be more careful next time.

@zhengruifeng
Copy link
Contributor Author

@HyukjinKwon It is OK to revert this PR, I will send a new one which fix this failed case.
But It is somewhat surprising to me that it showed This patch passes all tests. with a failed pyspark test.

@HyukjinKwon
Copy link
Member

Ah, Jenkins and GA have a a bit different env some times. GA is more correct in general fwiw

@zhengruifeng
Copy link
Contributor Author

OK, we will to be more careful @HyukjinKwon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants