
Conversation

@sethah (Contributor, Author) commented Jul 25, 2015

No description provided.

@AmplabJenkins

Can one of the admins verify this patch?

Member commented:

Evaluation Metrics? "Metrics" makes me think of app runtime performance metrics.

Contributor Author replied:

Changed to evaluation metrics.

@srowen (Member) commented Jul 25, 2015

I really like a good write-up on evaluation metrics. My broad suggestions are to consider less math and more intuitive explanation, without getting into a whole lot of writing. People can read a textbook or Wikipedia for the math; that's not quite the MLlib docs audience. Also, I think you can refactor out some of the duplicated comments.

@sethah (Contributor, Author) commented Jul 25, 2015

Sean,

Thanks for your feedback. I agree that more intuitive descriptions will be helpful, so I will work on getting those into the document. One thing I'd like to point out: I think having clean mathematical formulas for each algorithm is necessary, because some of these algorithms are implemented differently in MLlib than they are defined on, say, Wikipedia. I think having a description or, as you said, some links alongside the mathematical definitions is best. If you disagree, let me know; otherwise I'll move forward with that approach in mind.

@srowen (Member) commented Jul 25, 2015

Yes, that sounds important where the definition or math may not be standard or obvious. I think most people reading the doc won't have that background, so a few words or a link explaining the idea behind each metric can help.

@sethah (Contributor, Author) commented Jul 27, 2015

Sean,

I added a bit of background on things like TP, FP, precision, recall, ROC, etc. to the guide. I tried to explain the base concepts for classification, since the different flavors of classification metrics basically just extend the basic ideas of precision and recall. I also added an explanation of each of the ranking metrics, since those are not as well defined or as easy to find online. Additionally, I added hyperlinks to further reading on these topics via Wikipedia.
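
To keep the thread self-contained, here is a minimal sketch (not taken from the PR itself) of how the precision/recall/ROC ideas above surface through the Python API, assuming pyspark.mllib.evaluation.BinaryClassificationMetrics; the (score, label) pairs are toy values made up for illustration.

```python
from pyspark import SparkContext
from pyspark.mllib.evaluation import BinaryClassificationMetrics

sc = SparkContext(appName="EvalMetricsSketch")

# (score, label) pairs: the score only needs to rank positive examples
# above negative ones for the ROC and precision-recall curves to make sense.
score_and_labels = sc.parallelize([
    (0.9, 1.0), (0.8, 1.0), (0.6, 0.0), (0.4, 1.0), (0.2, 0.0),
])

metrics = BinaryClassificationMetrics(score_and_labels)
print("Area under ROC = %s" % metrics.areaUnderROC)
print("Area under PR  = %s" % metrics.areaUnderPR)
```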

I left all the math definitions in; I wasn't sure whether you were suggesting that we keep only some of them or just supplement them with explanations. I didn't think it made a ton of sense to define some of them but not all. Also, I find it useful to see mathematical representations of what the algorithms take as parameters. Let me know if you'd like to remove more of the math (the notation can get a bit heavy) or if you think it's too wordy.
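
As a concrete illustration of the inputs the ranking metrics take, here is another hedged sketch (again not the PR's own example): pyspark.mllib.evaluation.RankingMetrics is assumed, and the document IDs below are invented. Each row pairs a predicted ranking with the ground-truth set of relevant items for one query.

```python
from pyspark import SparkContext
from pyspark.mllib.evaluation import RankingMetrics

sc = SparkContext(appName="RankingMetricsSketch")

# (predicted ranking, relevant documents) per query; IDs are made up.
prediction_and_labels = sc.parallelize([
    ([1, 6, 2, 7, 8], [1, 2, 3, 5]),
    ([4, 1, 5, 6, 2], [1, 2, 3]),
])

metrics = RankingMetrics(prediction_and_labels)
print("Precision@3 = %s" % metrics.precisionAt(3))
print("Mean average precision = %s" % metrics.meanAveragePrecision)
print("NDCG@3 = %s" % metrics.ndcgAt(3))
```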

Thanks!

Member commented:

I think the text is fine as-is, but I would take a little issue with stating that the output is usually a probability. It isn't for several common models like the SVM or decision trees, and it also isn't necessary that it be a probability for the discussion here -- just a number that is higher when the model thinks the positive class is more likely. But I wouldn't edit it just for this.
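
A short sketch of that point (my own illustration, not text from the PR; SVMWithSGD, clearThreshold(), and the toy LabeledPoints are assumptions for the example): any score that ranks positives higher works, for instance the raw SVM margin rather than a probability.

```python
from pyspark import SparkContext
from pyspark.mllib.classification import SVMWithSGD
from pyspark.mllib.evaluation import BinaryClassificationMetrics
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="ScoreNotProbability")

# Tiny made-up training set.
data = sc.parallelize([
    LabeledPoint(1.0, [1.0, 2.0]),
    LabeledPoint(0.0, [-1.0, -1.5]),
    LabeledPoint(1.0, [2.0, 1.0]),
    LabeledPoint(0.0, [-2.0, -0.5]),
])

model = SVMWithSGD.train(data, iterations=50)
model.clearThreshold()  # predict() now returns the raw margin, not a 0/1 label

# Feed (score, label) pairs to the evaluator; the score is not a probability.
score_and_labels = data.map(lambda p: (float(model.predict(p.features)), p.label))
metrics = BinaryClassificationMetrics(score_and_labels)
print("Area under ROC = %s" % metrics.areaUnderROC)
```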

Contributor Author replied:

I agree, and I've reworded with your points in mind.

@srowen (Member) commented Jul 28, 2015

Looking good. I left minor comments which I don't feel strongly about. I did not try running the code examples. We'll need to double-check that the docs build and look right on merge.

@srowen (Member) commented Jul 28, 2015

LGTM. Let's leave it for a bit to collect other comments.

Contributor commented:
Maybe we can remove the section title. I was wondering how "Algorithm Metrics" differs from "Evaluation Metrics" when I first saw it.

@mengxr (Contributor) commented Jul 29, 2015

@sethah This is great! I went over the generated doc. Just some minor comments:

  1. Could you update the capitalization of the section titles? This is to be consistent with the other MLlib guides. For example, we write "Evaluation Metrics" only in the display title, then "Evaluation metrics" in section titles and some other places.
  2. Could you remove the number formatting? It is useful, but it might distract readers. Also, for evaluation metrics, having only 2 or 3 digits is sometimes not sufficient.

@mengxr (Contributor) commented Jul 29, 2015

By the way, you should consider splitting the PR into smaller ones next time. It is easier to review 4-5 small PRs than one big PR combining all of them. :)

@sethah (Contributor, Author) commented Jul 30, 2015

@mengxr I updated the guide with your suggestions. I corrected the capitalization scheme and other things you listed. Let me know if you find anything else.

Thanks for the tip on splitting PRs; I will definitely keep that in mind for future ones. It would have made this one a bit more manageable as well!

@asfgit closed this in 2a9fe4a on Jul 30, 2015
@mengxr (Contributor) commented Jul 30, 2015

Merged into master. Thanks @sethah for writing this up and @srowen for reviewing!

@mengxr (Contributor) commented Jul 30, 2015

@sethah Ah, just realized one minor issue: `print abc` doesn't work in Python 3. We have to use `print(abc)`. Could you create a new JIRA for this? Thanks!
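
For anyone reading along, the portable form looks like this (a trivial illustration, not the actual follow-up patch; the variable name and value are made up):

```python
# print as a statement is Python 2 only; the function form works in both.
from __future__ import print_function  # makes print() behave the same under Python 2

abc = 0.85  # e.g. some metric value
print("Test set precision = %s" % abc)  # valid in both Python 2 and Python 3
```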

@sethah (Contributor, Author) commented Jul 30, 2015

@mengxr I have created the JIRA. I may get a chance to fix that sometime next week.
