Skip to content

[SPARK-17718] [Update MLib Classification Documentation]#15293

Closed
jagadeesanas2 wants to merge 1 commit intoapache:masterfrom
ibmsoe:SPARK-17718
Closed

[SPARK-17718] [Update MLib Classification Documentation]#15293
jagadeesanas2 wants to merge 1 commit intoapache:masterfrom
ibmsoe:SPARK-17718

Conversation

@jagadeesanas2
Copy link
Contributor

What changes were proposed in this pull request?

The loss function here for logistic regression is confusing. It seems to imply that spark uses only -1 and 1 class labels. However it uses 0,1. Added detailed documentation to avoid confusion.

@SparkQA
Copy link

SparkQA commented Sep 29, 2016

Test build #66091 has finished for PR 15293 at commit 210dc85.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to implement the second alternative in the JIRA: Better yet, the loss function should be replaced with that for 0, 1 despite mathematical inconvenience, since that is what is actually implemented.

@srowen
Copy link
Member

srowen commented Sep 29, 2016

No, these expressions aren't correct then for y=0,1

@SparkQA
Copy link

SparkQA commented Sep 29, 2016

Test build #66097 has finished for PR 15293 at commit 1fa016f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jagadeesanas2
Copy link
Contributor Author

jagadeesanas2 commented Sep 29, 2016

app1
The hinge loss provides a relatively tight, convex upper bound on the 0–1 indicator function. Specifically, the hinge loss equals the 0–1 indicator function.
Source: https://en.wikipedia.org/wiki/Loss_functions_for_classification#Hinge_loss

Else whether we can use previous note docs.

@srowen any suggestion..?

@srowen
Copy link
Member

srowen commented Sep 29, 2016

Yes that's the definition, but for y = +/- 1. The expression can't be correct for y = 0/1; when y = 0 the loss is always 1 for example.

Well, I'm looking at what the equivalent is like for 0,1 and it's more complicated really in all cases. It wouldn't match the comments in the source code either. Maybe it is actually better to just move the note, yeah.

@dbtsai what do you think?

@dbtsai
Copy link
Member

dbtsai commented Sep 29, 2016

+1 on just having the note. For y = 0/1, just more confusing to have complicated formulation in the doc.

@SparkQA
Copy link

SparkQA commented Sep 30, 2016

Test build #66151 has finished for PR 15293 at commit 9c50522.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah back to the start, sorry. Looks good, though you can also remove the statement below that this duplicates now. Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the JIRA, i simply added detailed documentation to avoid future confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates an existing statement below. The idea was to move it u here rather than copy it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed duplicates

@SparkQA
Copy link

SparkQA commented Oct 2, 2016

Test build #66241 has finished for PR 15293 at commit 56821a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Oct 2, 2016

This still duplicates the message. The point is to move it to a more prominent place; that's all. I can open a PR directly if this is just unclear.

@jagadeesanas2
Copy link
Contributor Author

Thanks @srowen 👍 please go ahead

@srowen
Copy link
Member

srowen commented Oct 3, 2016

This can be closed; see #15330

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants