[SPARK-6129][MLLIB][DOCS] Added user guide for evaluation metrics #7655
Conversation
Can one of the admins verify this patch?
docs/mllib-guide.md
Evaluation Metrics? "Metrics" alone makes me think of app runtime performance metrics.
Changed to evaluation metrics.
I really like a good write-up on evaluation metrics. My broad suggestions are to consider less math and more intuitive explanation, without getting into a whole lot of writing. People can read a textbook or Wikipedia for the math; that's not quite the MLlib docs audience. Also, I think you can refactor out some of the duplicated comments.
Sean, thanks for your feedback. I agree that more intuitive descriptions will be helpful, so I will work on getting those into the document. One thing I'd like to point out is that I think having clean mathematical formulas for each algorithm is necessary, because some of these algorithms are implemented differently in MLlib than they are defined in, say, Wikipedia. I think having a description, or, as you said, some links alongside the mathematical definitions is best. If you disagree, let me know; otherwise I'll move forward with that thought process in mind.
Yes, that sounds important where the definition or math may not be standard or obvious. I think most people reading the doc won't have that background, so a few words or a link to the idea behind each metric can help.
Sean, I added a bit of background on things like TP, FP, precision, recall, ROC, etc. to the guide. I tried to explain the base concepts for classification, since the different flavors of classification metrics basically just extend the basic ideas of precision, recall, etc. I also added an explanation of each of the ranking metrics, since those are not as well defined or as easy to find on the internet. Additionally, I added hyperlinks to further reading on these topics via Wikipedia. I left all the math definitions in; I wasn't clear on whether you were suggesting we keep only some of them or just supplement them with explanations, and I didn't think it made much sense to define some of them but not all. Also, I find it useful to see mathematical representations of what the algorithms take as parameters. Let me know if you'd like to remove more of the math (the notation can get a bit heavy) or if you think it's too wordy. Thanks!
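For anyone skimming this thread, here is a minimal sketch of the kind of example the guide covers, using MLlib's `BinaryClassificationMetrics` from the RDD-based API. It assumes an existing `SparkContext` named `sc`, and the score/label pairs are made up purely for illustration:

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Toy (score, label) pairs; the label is 1.0 for the positive class, 0.0 otherwise.
val scoreAndLabels = sc.parallelize(Seq(
  (0.9, 1.0), (0.8, 1.0), (0.6, 0.0), (0.4, 1.0), (0.2, 0.0)
))

val metrics = new BinaryClassificationMetrics(scoreAndLabels)

// Precision and recall at each score threshold (the basic TP/FP ideas discussed above)
metrics.precisionByThreshold().collect().foreach { case (t, p) =>
  println(s"Threshold: $t, Precision: $p")
}
metrics.recallByThreshold().collect().foreach { case (t, r) =>
  println(s"Threshold: $t, Recall: $r")
}

// Summary measures of the ROC and precision-recall curves
println(s"Area under ROC = ${metrics.areaUnderROC()}")
println(s"Area under PR = ${metrics.areaUnderPR()}")
```

The multiclass, multilabel, and ranking cases follow the same pattern with `MulticlassMetrics`, `MultilabelMetrics`, and `RankingMetrics`; the guide walks through each in turn.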
docs/mllib-evaluation-metrics.md
I think the text is fine as-is, but I would take a little issue with stating that the output is usually a probability. It isn't for several common models like the SVM or decision trees, and it also isn't necessary that it be a probability for the discussion here -- just a number that's higher when the model thinks the positive class is more likely. But I wouldn't edit it just for this.
I agree and I've reworded with your points in mind.
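To illustrate the point above that the score only needs to rank examples rather than be a probability, here is a rough sketch using a linear SVM, whose raw margin is not a probability. The `SparkContext` `sc` and the data path are assumed for illustration only:

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.util.MLUtils

// Assumes an existing SparkContext `sc`; the file path is illustrative.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_binary_classification_data.txt")

val model = SVMWithSGD.train(data, 100)

// Clear the default threshold so predict() returns the raw margin: not a
// probability, just a number that is larger when the model favors the positive class.
model.clearThreshold()

val scoreAndLabels = data.map { point =>
  (model.predict(point.features), point.label)
}

val metrics = new BinaryClassificationMetrics(scoreAndLabels)
println(s"Area under ROC = ${metrics.areaUnderROC()}")
```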
Looking good. I left minor comments which I don't feel strongly about. I did not try running the code examples. We'll need to double-check that the docs build and look right on merge.
LGTM. Let's leave it for a bit to collect other comments.
docs/mllib-evaluation-metrics.md
Maybe we can remove the section title. I was wondering how "Algorithm Metrics" differs from "Evaluation Metrics" when I first saw it.
@sethah This is great! I went over the generated doc. Just some minor comments.
Btw, you should consider splitting the PR into smaller ones next time. It is easier to review 4-5 small PRs than one big PR combining all of them. :)
@mengxr I updated the guide with your suggestions. I corrected the capitalization scheme and the other things you listed. Let me know if you find anything else. Thanks for the tip on splitting PRs; I will definitely keep that in mind for future PRs. It would have made this one a bit more manageable as well!
@sethah Ah, just realized one minor issue.