update `generate_predictions` to include probabilities for all classes if requested #430

aoifecahill · 2018-08-13T15:13:14Z

Currently the assumption is that we only want to see the probability for one class (defaults to the first class). A common use case would be to see the probabilities for all classes.

Lguyogiro · 2018-10-16T23:59:05Z

I am working on this now. One question, though. Right now, as I understand it, generate_predictions simply prints the predictions to standard out, one prediction per line. For the use case mentioned above, we also want to preserve label information. I can think of two good possible ways to do this while remaining consistent with the current behavior:

1. Print the labels, tab-separated, on the first line, followed by the tab-separated probabilities for each sample on each subsequent line:

0    1
0.014063027888963298    0.9859369721110367
0.9316368910436832    0.06836310895631677
0.8895249780645522    0.11047502193544781
...

2. Print dicts/JSON objects that map each label to its probability, one object per row:

{0: 0.014063027888963298, 1: 0.9859369721110367}
{0: 0.9316368910436832, 1: 0.06836310895631677}
{0: 0.8895249780645522, 1: 0.11047502193544781}

Is there a preference here?
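For comparison, here is a minimal sketch of the two layouts in plain Python. It does not use any SKLL code: the `format_tsv` and `format_json` helpers are hypothetical names made up for illustration, and the numbers are the sample values from above. One thing it makes visible is that strict JSON would need string keys, so a label like `0` would come out as `"0"`.

```python
import json


def format_tsv(labels, rows):
    """Option 1: a header row of labels, then one tab-separated row per sample."""
    lines = ["\t".join(str(label) for label in labels)]
    lines.extend("\t".join(str(p) for p in row) for row in rows)
    return lines


def format_json(labels, rows):
    """Option 2: one JSON object per sample, mapping label to probability."""
    # JSON object keys must be strings, so integer labels such as 0 become "0".
    return [
        json.dumps({str(label): p for label, p in zip(labels, row)})
        for row in rows
    ]


# Hypothetical inputs: class labels and per-sample class probabilities.
labels = [0, 1]
rows = [
    [0.014063027888963298, 0.9859369721110367],
    [0.9316368910436832, 0.06836310895631677],
    [0.8895249780645522, 0.11047502193544781],
]

print("\n".join(format_tsv(labels, rows)))
print("\n".join(format_json(labels, rows)))
```

The first layout keeps one value per column, which is easier to load as a table; the second repeats the labels on every line but each line can be read on its own.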

desilinguist · 2018-10-17T00:27:14Z

Do we need to print the labels out at all? Why not just the tab-separated probabilities? We don’t print out the label now, do we?

Lguyogiro · 2018-10-17T01:07:39Z

We don't now, but that's because we only output the positive-label probability. In this case, if the labels are strings, I think you will need the label mapping stored in the saved learner.

desilinguist · 2018-10-17T01:46:57Z

Hmm, how about just outputting the string labels as a commented tab-separated row at the top?

aoifecahill · 2018-10-17T13:00:20Z

Why commented? If it weren't commented, we could redirect everything to a .tsv file and it would get well-formed column identifiers. Otherwise, the user still has to parse that row. Or does this break some other expected behaviour of generate_predictions?
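To illustrate the point about column identifiers, here is a rough sketch that uses only the Python standard library. The `tsv_text` string is a stand-in for output redirected to a .tsv file, and it assumes the first line is an uncommented, tab-separated row of labels.

```python
import csv
import io

# Stand-in for redirected output: a label header row, then one row per sample.
tsv_text = (
    "0\t1\n"
    "0.014063027888963298\t0.9859369721110367\n"
    "0.9316368910436832\t0.06836310895631677\n"
)

# DictReader takes the column names from the first row, so no extra parsing is needed.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
parsed = [{label: float(p) for label, p in row.items()} for row in reader]

print(reader.fieldnames)
print(parsed[0]["1"])
```

With a commented header, the same reader would treat the comment marker as part of the first column name, so the user would have to strip it first.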

desilinguist · 2018-10-17T13:21:46Z

Actually, I'm a little confused. It looks like learner.predict() already prints out the probabilities for all classes with the proper column headers AND as a proper tsv file. So, is this issue just about bringing generate_predictions in line with what learner.predict already does? If so, then you can just use the same file format?

aoifecahill · 2018-10-17T14:10:13Z

That would work. But generate_predictions doesn't output a header row now, so we should probably be consistent about it if we make that change for this case.

desilinguist · 2018-10-17T14:12:31Z

Yes, my vote would be to make generate_predictions entirely consistent with learner.predict().

aoifecahill · 2018-10-17T14:13:26Z

👍

desilinguist · 2018-12-03T13:50:28Z

Addressed by #433.

Lguyogiro self-assigned this Sep 7, 2018

desilinguist closed this as completed Dec 3, 2018

desilinguist added this to Done in SKLL Release v1.5.3 Dec 3, 2018
