Properly handle unlabeled data in multiple places in SKLL #453
- Add `--no_labels` option to `skll_convert`
- Make `--no_labels` and `--label_col` mutually exclusive and add test for this.
- Remove `.ndj` from conversion test since it's identical to `.jsonlines` and just adds to the test time.
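As a rough illustration of the mutual exclusion described in this commit, here is a minimal `argparse` sketch. It is not the actual `skll_convert` parser: the program name `convert_sketch` and the helper names `build_parser` and `resolve_label_col` are hypothetical, and only the two options under discussion are modeled.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the two options discussed here are modeled.
    parser = argparse.ArgumentParser(prog="convert_sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--label_col", default="y",
                       help="name of the column that holds the labels")
    group.add_argument("--no_labels", action="store_true",
                       help="treat the input data as unlabeled")
    return parser


def resolve_label_col(argv):
    # Assumption: "unlabeled" is represented downstream as label_col=None.
    args = build_parser().parse_args(argv)
    return None if args.no_labels else args.label_col
```

Passing both options makes `argparse` exit with a usage error, which is the behavior a test for this case would check.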
We do not need to add/remove `label_col` from the fieldnames if it's None to begin with.
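A small sketch of the idea, assuming the writer builds its header from a collection of feature names. The function `header_fields` is hypothetical and is not the `ARFFWriter` code itself:

```python
def header_fields(feature_names, label_col=None):
    """Build the list of output field names (hypothetical helper)."""
    fields = sorted(feature_names)
    # Only touch the field list when there actually is a label column.
    if label_col is not None:
        fields.append(label_col)
    return fields
```

With `label_col=None` the field list is left alone, so nothing has to be removed again afterwards.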
- Remove unnecessary test and associated files.
# Conflicts:
#	skll/data/featureset.py
@Lguyogiro @mulhod @jbiggsets any chance of reviewing this soon? I have another branch ready :)
I will take a look today.
Looks good.
I had a question about the case where you have unlabelled data being converted. You would now use the `--no_labels` flag, but what happens if you don't use it and there is no label column (`y` if unspecified)? In the CSV to ARFF case, this still works, but it's probably not doing what we want. I would expect that if you don't pass in `--no_labels` and there is no label column, it would fail.
This is because we allow the CSV reader to ignore non-existent columns and set the label to |
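To make the stricter behavior the reviewer expects concrete, here is a sketch of a CSV reader that refuses a missing label column unless the caller explicitly says the data is unlabeled. This is not SKLL's reader: `read_rows` and its `no_labels` parameter are hypothetical, and only the default column name `y` comes from the discussion above.

```python
import csv
import io


def read_rows(csv_text, label_col="y", no_labels=False):
    """Return (features, labels) from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    # Fail loudly instead of silently treating the data as unlabeled.
    if not no_labels and label_col not in fieldnames:
        raise ValueError(
            f"label column {label_col!r} not found; "
            "pass no_labels=True for unlabeled data"
        )
    features, labels = [], []
    for row in reader:
        # Unlabeled rows get None; labeled rows have the label split off.
        labels.append(None if no_labels else row.pop(label_col))
        features.append(row)
    return features, labels
```

Whether failing here is preferable to the current lenient behavior is the design question raised in this thread; the sketch only shows what the strict variant could look like.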
Looks good to me!
- Allow `skll_convert` to handle unlabeled input data (`skll_convert` does not handle conversion of unlabeled data correctly in all cases #452):
  - Add `--no_labels` option to `skll_convert`.
  - Make `--no_labels` and `--label_col` mutually exclusive and add test for this.
- Remove the `.ndj` format from the various conversion tests since it's identical to the `.jsonlines` format and just adds unnecessarily to the test run time.
- Fix `FeatureSet.has_labels` to recognize a list of `None` objects, which is what happens when you read in an unlabeled data set and pass `label_col=None` (Reader/Writers not totally compatible for unlabelled feature sets #426).
- Fix bug in `ARFFWriter` that adds/removes `label_col` from the field names even if it's `None` to begin with.
- Update `test_convert_featureset()` in `test_featuresets.py` to also test for unlabeled data.
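The `has_labels` fix can be pictured with a small stand-alone check. This is only a sketch of the idea and not the `FeatureSet` implementation; the free function `has_labels` below is a hypothetical stand-in that takes the labels directly.

```python
def has_labels(labels):
    """Return True if at least one real label is present (hypothetical)."""
    if labels is None:
        return False
    # A list made up entirely of None counts as unlabeled. Compare against
    # None explicitly so that falsy labels such as 0 still count as labels.
    return any(label is not None for label in labels)
```

The explicit `is not None` comparison matters because a truthiness test would wrongly treat a data set whose labels are all `0` as unlabeled.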