[SPARK-6129][MLLIB][DOCS] Added user guide for evaluation metrics #7655
Conversation
Can one of the admins verify this patch?
docs/mllib-guide.md
Evaluation Metrics? "Metrics" alone makes me think of app runtime performance metrics.
Changed to evaluation metrics.
I really like a good write-up on evaluation metrics. My broad suggestions are to consider less math and more intuitive explanation, without getting into a whole lot of writing. People can read a textbook or Wikipedia for the math; that's not quite the MLlib docs audience. Also, I think you can refactor out some of the duplicated comments.
Sean, thanks for your feedback. I agree that more intuitive descriptions will be helpful, so I will work on getting those into the document. One thing I'd like to point out is that I think having clean mathematical formulas for each algorithm is necessary, because some of these algorithms are implemented differently in MLlib than they are defined in, say, Wikipedia. I think having a description, or, as you said, some links alongside the mathematical definitions is best. If you disagree, let me know; otherwise I'll move forward with that thought process in mind.
Yes, that sounds important where the definition or math may not be standard or obvious. I think most people reading the doc won't have that background, so a few words or a link to the idea behind each metric can help.
Sean, I added a bit of background on things like TP, FP, precision, recall, ROC, etc. to the guide. I tried to explain the base concepts for classification, since the different flavors of classification metrics basically just extend the basic ideas of precision, recall, etc. I also added an explanation of each of the ranking metrics, since those are not as well defined or as easy to find on the internet. Additionally, I added hyperlinks to further reading on these topics via Wikipedia. I left all the math definitions in; I wasn't clear on whether you were suggesting we keep only some of them or just supplement them with explanations, and I didn't think it made much sense to define some of them but not all. Also, I find it useful to see mathematical representations of what the algorithms take as parameters. Let me know if you'd like to remove more of the math (the notation can get a bit heavy) or if you think it's too wordy. Thanks!
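For anyone skimming this thread, here is a minimal sketch of the kind of example the guide covers, using MLlib's `BinaryClassificationMetrics` from the RDD-based API. It assumes an existing `SparkContext` named `sc`, and the score/label pairs are made up purely for illustration:

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Toy (score, label) pairs; the label is 1.0 for the positive class, 0.0 otherwise.
val scoreAndLabels = sc.parallelize(Seq(
  (0.9, 1.0), (0.8, 1.0), (0.6, 0.0), (0.4, 1.0), (0.2, 0.0)
))

val metrics = new BinaryClassificationMetrics(scoreAndLabels)

// Precision and recall at each score threshold (the basic TP/FP ideas discussed above)
metrics.precisionByThreshold().collect().foreach { case (t, p) =>
  println(s"Threshold: $t, Precision: $p")
}
metrics.recallByThreshold().collect().foreach { case (t, r) =>
  println(s"Threshold: $t, Recall: $r")
}

// Summary measures of the ROC and precision-recall curves
println(s"Area under ROC = ${metrics.areaUnderROC()}")
println(s"Area under PR = ${metrics.areaUnderPR()}")
```

The multiclass, multilabel, and ranking cases follow the same pattern with `MulticlassMetrics`, `MultilabelMetrics`, and `RankingMetrics`; the guide walks through each in turn.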
docs/mllib-evaluation-metrics.md
I think the text is fine as-is, but I would take a little issue with stating that the output is usually a probability. It isn't for several common models like the SVM or decision trees, and it also isn't necessary that it be a probability for the discussion here -- just a number that's higher when the model thinks the positive class is more likely. But I wouldn't edit it just for this.
I agree and I've reworded with your points in mind.
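To illustrate the point above that the score only needs to rank examples rather than be a probability, here is a rough sketch using a linear SVM, whose raw margin is not a probability. The `SparkContext` `sc` and the data path are assumed for illustration only:

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.util.MLUtils

// Assumes an existing SparkContext `sc`; the file path is illustrative.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_binary_classification_data.txt")

val model = SVMWithSGD.train(data, 100)

// Clear the default threshold so predict() returns the raw margin: not a
// probability, just a number that is larger when the model favors the positive class.
model.clearThreshold()

val scoreAndLabels = data.map { point =>
  (model.predict(point.features), point.label)
}

val metrics = new BinaryClassificationMetrics(scoreAndLabels)
println(s"Area under ROC = ${metrics.areaUnderROC()}")
```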
Looking good. I left minor comments which I don't feel strongly about. I did not try running the code examples. We'll need to double-check that the docs build and look right on merge.
LGTM. Let's leave it for a bit to collect other comments.
docs/mllib-evaluation-metrics.md
Maybe we can remove the section title. I was wondering how "Algorithm Metrics" differs from "Evaluation Metrics" when I first saw it.
@sethah This is great! I went over the generated doc. Just some minor comments.
Btw, you should consider splitting the PR into smaller ones next time. It is easier to review 4-5 small PRs than one big PR combining all of them. :)
@mengxr I updated the guide with your suggestions. I corrected the capitalization scheme and the other things you listed. Let me know if you find anything else. Thanks for the tip on splitting PRs; I will definitely keep that in mind for future PRs. It would have made this one a bit more manageable as well!
@sethah Ah, just realized one minor issue.