Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3130][MLLIB] detect negative values in naive Bayes #2038

Closed
wants to merge 2 commits into from

Conversation

mengxr
Copy link
Contributor

@mengxr mengxr commented Aug 19, 2014

because NB treats feature values as term frequencies. @jkbradley

@SparkQA
Copy link

SparkQA commented Aug 19, 2014

QA tests have started for PR 2038 at commit 65f892d.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 19, 2014

QA tests have finished for PR 2038 at commit 65f892d.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case dv: DenseVector =>
dv.values
}
if (!values.forall(x => x >= 0.0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could shorten:
if (!values.forall(_ >= 0.0)) {

@jkbradley
Copy link
Member

LGTM. The function requireNonnegativeValues() actually also identifies NaN values, so you could add a test in the Suite for that. (Same as the test("detect negative values") but with -1.0 replaced with 0.0/0.0)

@@ -17,7 +17,8 @@ Bayes](http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Multinomial_naive_Bay
which is typically used for [document
classification](http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html).
Within that context, each observation is a document and each
feature represents a term whose value is the frequency of the term.
feature represents a term whose value is the frequency of the term.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is really an unnormalized frequency? Should that be clarified by saying "unnormalized frequency" or "frequency or count" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be normalized, then user needs to adjust alpha.

@SparkQA
Copy link

SparkQA commented Aug 19, 2014

QA tests have started for PR 2038 at commit 52c37c3.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 19, 2014

QA tests have finished for PR 2038 at commit 52c37c3.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor Author

mengxr commented Aug 19, 2014

Jenkins, retest this please.

1 similar comment
@mengxr
Copy link
Contributor Author

mengxr commented Aug 19, 2014

Jenkins, retest this please.

@jkbradley
Copy link
Member

LGTM

@mengxr
Copy link
Contributor Author

mengxr commented Aug 19, 2014

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 19, 2014

QA tests have started for PR 2038 at commit 52c37c3.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 19, 2014

QA tests have finished for PR 2038 at commit 52c37c3.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor Author

mengxr commented Aug 20, 2014

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 20, 2014

QA tests have started for PR 2038 at commit 52c37c3.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 20, 2014

QA tests have finished for PR 2038 at commit 52c37c3.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Aug 20, 2014
because NB treats feature values as term frequencies. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #2038 from mengxr/nb-neg and squashes the following commits:

52c37c3 [Xiangrui Meng] address comments
65f892d [Xiangrui Meng] detect negative values in nb

(cherry picked from commit 068b6fe)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
@asfgit asfgit closed this in 068b6fe Aug 20, 2014
@mengxr
Copy link
Contributor Author

mengxr commented Aug 20, 2014

Merged into master and branch-1.1.

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
because NB treats feature values as term frequencies. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes apache#2038 from mengxr/nb-neg and squashes the following commits:

52c37c3 [Xiangrui Meng] address comments
65f892d [Xiangrui Meng] detect negative values in nb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants