
Issue 430/all probabilities get predictions #433

Merged: 14 commits merged from issue-430/all-probabilities-get-predictions into master on Dec 3, 2018

Conversation

@Lguyogiro (Contributor) commented Oct 18, 2018

This PR implements the feature suggested in issue #430. Namely, it adds an option in generate_predictions to output the probability for all labels, instead of just the indicated positive label. It also updates generate_predictions to behave more like the Learner.predict method in that it prints out the predictions in TSV format, optionally to a file.

NOTE: This will break backwards compatibility, since we are now adding a header even in the single-label case.
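For illustration, here is a minimal sketch of the kind of all-labels TSV output described above (function and variable names here are hypothetical, not SKLL's actual implementation): a header row naming each label, then one tab-separated probability per label per example.

import sys

def write_all_label_probs(ids, probabilities, labels, outputfh=sys.stdout):
    # Header row: the example ID column followed by one column per label.
    print("id\t{}".format("\t".join(labels)), file=outputfh)
    # One row per example, with a probability for every label.
    for id_, probs in zip(ids, probabilities):
        probs_str = "\t".join(str(p) for p in probs)
        print("{}\t{}".format(id_, probs_str), file=outputfh)

# Hypothetical data: two examples scored over three labels.
write_all_label_probs(["ex1", "ex2"],
                      [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]],
                      ["cat", "dog", "bird"])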

@coveralls commented Oct 18, 2018

Coverage Status: coverage increased (+36.4%) to 92.288% when pulling 67587d4 on issue-430/all-probabilities-get-predictions into 9f7e962 on master.

@desilinguist (Member) left a comment

The actual algorithm for generating the probabilities is fine, but the naming of some variables and arguments can be improved.

Also, can you please note in the PR description that this will break backwards compatibility, since we are now adding a header even in the single-label case?

Also, we need to figure out why the coverage dropped.

Review threads (outdated, resolved):
skll/utilities/generate_predictions.py (5 comments)
tests/test_utilities.py (4 comments)
@desilinguist (Member) commented

@Lguyogiro what’s happening with this PR? Do you need help with the coverage issue?

@Lguyogiro (Contributor, Author) commented

@desilinguist Sorry, I got a little sidetracked with something else. I will add tests and hopefully have the coverage issues resolved by the end of the week.

Robert Pugh and others added 3 commits on November 28, 2018. Commit note: "We need to do this since otherwise the Travis builds time out."
@Lguyogiro (Contributor, Author) commented

This is now passing and the coverage drop has been fixed. Please feel free to review 👍

@mulhod (Contributor) left a comment

Looks good to me!

@@ -14,6 +14,7 @@
 import argparse
 import logging
 import os
+import sys
A contributor commented:

Nitpick: It seems like this import is never used?

@desilinguist (Member) commented Nov 30, 2018

@Lguyogiro actually you still need to update the PR description per my original comment to say that this will break backwards compatibility.

@jbiggsets (Collaborator) left a comment

Two minor comments.

all_labels: bool
A flag indicating whether to return the probabilities for all
labels in each row instead of just returning the probability of
`positive_label`.
@jbiggsets (Collaborator):

This is a nitpick, but can we add "Defaults to False" here?
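For reference, the amended parameter documentation would read along these lines (a sketch of the suggested change, not the final merged text):

all_labels: bool
    A flag indicating whether to return the probabilities for all
    labels in each row instead of just returning the probability of
    `positive_label`. Defaults to False.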

    probs_str = "\t".join([str(p) for p in probabilities])
    print("{}\t{}".format(id_, probs_str), file=outputfh)
else:
    for i, pred in enumerate(preds):
@jbiggsets (Collaborator):

Can we call this j or something else, since we're using i in the outer loop? Same comment below.
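A minimal sketch of the suggested fix (the data and surrounding loop structure here are hypothetical): give the inner loop its own index so it does not shadow the outer loop's `i`.

preds_per_example = [[0.2, 0.8], [0.6, 0.4]]  # hypothetical predictions
for i, preds in enumerate(preds_per_example):
    # Use `j` for the inner index instead of reusing `i`.
    for j, pred in enumerate(preds):
        print("example {}, label {}: {}".format(i, j, pred))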

@Lguyogiro (Contributor, Author) commented

I've addressed @jbiggsets' comments, and also added a unit test for the case of having multiple input files.

@desilinguist (Member) commented

I'll take a look at the other test you added on Monday.

@desilinguist merged commit 075971a into master on Dec 3, 2018
@desilinguist deleted the issue-430/all-probabilities-get-predictions branch on December 3, 2018