update generate_predictions to include probabilities for all classes if requested #430

Closed
aoifecahill opened this issue Aug 13, 2018 · 10 comments

@aoifecahill
Collaborator

Currently the assumption is that we only want to see the probability for one class (defaults to the first class). A common use case would be to see the probabilities for all classes.

@Lguyogiro Lguyogiro self-assigned this Sep 7, 2018
@Lguyogiro
Contributor

I am working on this now. One question, though. Right now, as I understand it, generate_predictions simply prints the predictions to standard output, one prediction per line. For the use case mentioned above, we also want to preserve label information. I can think of two good ways to do this while remaining consistent with the current behavior:

  1. print the labels, tab-separated, on the first line, followed by the tab-separated probabilities of each sample on each consecutive line:
0    1
0.014063027888963298    0.9859369721110367
0.9316368910436832    0.06836310895631677
0.8895249780645522    0.11047502193544781
...
  2. print dicts/JSON objects that map each label to its probability, one per row:
{0: 0.014063027888963298, 1: 0.9859369721110367}
{0: 0.9316368910436832, 1: 0.06836310895631677}
{0: 0.8895249780645522, 1: 0.11047502193544781}

Is there a preference here?
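
For comparison, here is a minimal sketch of how the two options above could be produced. It does not use SKLL itself: the helper names `format_as_tsv` and `format_as_json` are hypothetical, and the labels and probabilities are simply the sample values from the comment. One assumption worth noting is that option 2 is emitted as strict JSON, so the label keys become strings.

```python
import json


def format_as_tsv(labels, rows):
    """Option 1: a header line of labels, then one tab-separated line per sample."""
    lines = ["\t".join(str(label) for label in labels)]
    for row in rows:
        lines.append("\t".join(str(prob) for prob in row))
    return lines


def format_as_json(labels, rows):
    """Option 2: one JSON object per sample, mapping each label to its probability."""
    # JSON object keys must be strings, so each label is converted with str().
    return [
        json.dumps({str(label): prob for label, prob in zip(labels, row)})
        for row in rows
    ]


# Sample values taken from the comment above.
labels = [0, 1]
rows = [
    [0.014063027888963298, 0.9859369721110367],
    [0.9316368910436832, 0.06836310895631677],
    [0.8895249780645522, 0.11047502193544781],
]

for line in format_as_tsv(labels, rows):
    print(line)

for line in format_as_json(labels, rows):
    print(line)
```

The TSV form keeps the label information in a single header line, while the JSON form repeats the labels on every line but lets each line be parsed on its own.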

@desilinguist
Member

Do we need to print the labels out at all? Why not just the tab separated probabilities? We don’t print out the label now do we?

@Lguyogiro
Contributor

We don't now, but that's because we only output the positive label's probability. In this case, if the labels are strings, I think you will need the mapping stored in the saved learner.

@desilinguist
Member

Hmm, how about just outputting the string labels as a commented tab-separated row at the top?

@aoifecahill
Collaborator Author

Why commented? If it weren't commented, we could redirect everything to a .tsv file and it would get well-formed column identifiers. Otherwise, the user still has to parse the row. Or does this break some other expected behaviour from generate_predictions?
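
To illustrate the point about an uncommented header, here is a small sketch of how a consumer could read such output with Python's standard `csv` module. The function name `read_probabilities` and the sample text are hypothetical, and the exact output format of generate_predictions is an assumption here, not something confirmed in this thread.

```python
import csv
import io


def read_probabilities(stream):
    """Read tab-separated probabilities whose first line is a plain header row."""
    # DictReader uses the first line as column names, so no manual header parsing is needed.
    reader = csv.DictReader(stream, delimiter="\t")
    return [
        {label: float(value) for label, value in row.items()}
        for row in reader
    ]


# Hypothetical output with an uncommented header row of string labels.
sample_output = "cat\tdog\n0.25\t0.75\n0.5\t0.5\n"

records = read_probabilities(io.StringIO(sample_output))
for record in records:
    print(record)
```

With a commented header (for example a line starting with `#`), the same reader would treat the comment marker as part of the first column name, so the user would have to strip it by hand first.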

@desilinguist
Member

desilinguist commented Oct 17, 2018

Actually, I'm a little confused. It looks like learner.predict() already prints out the probabilities for all classes with the proper column headers AND as a proper tsv file. So, is this issue just about bringing generate_predictions in line with what learner.predict already does? If so, then you can just use the same file format?

@aoifecahill
Collaborator Author

That would work. But generate_predictions doesn't output a header row now, so we should probably be consistent about it if we make that change for this case.

@desilinguist
Member

Yes, my vote would be to make generate_predictions entirely consistent with learner.predict().

@aoifecahill
Collaborator Author

👍

@desilinguist
Member

Addressed by #433.
