Need support for reading instance weights from feature files #182

Closed
doxav opened this issue Oct 21, 2014 · 13 comments

Comments

@doxav

doxav commented Oct 21, 2014

I noticed that instance weights are not supported by SKLL's readers and writers. They are a key feature for dealing with unbalanced data (or for cost-sensitive learning, in addition to a cost matrix). ARFF supports them (http://weka.wikispaces.com/ARFF+(stable+version)#Instance weights in ARFF files ), and so do some scikit-learn classifiers.

@doxav doxav changed the title handling of class/label/instance "weights" of ML formats handling of class/label/instance "weights" Oct 21, 2014
@dan-blanchard
Contributor

It is true that we don't support that in our ARFF reader, but there are only 4 classifiers in scikit-learn supported by SKLL that support the class_weight parameter: SVC, LogisticRegression, LinearSVC and SGDClassifier.

The way we've been handling unbalanced data is by specifying the class_weight option in fixed_parameters, as explained here in our docs.

You'll have to search for "class_weight" because sphinx didn't generate an anchor for that particular note. We should really have better documentation for this.
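
For readers unfamiliar with the option, here is a minimal sketch of what class_weight does at the scikit-learn level. It uses plain scikit-learn rather than SKLL's configuration syntax, and the toy data and the 19:1 weighting are invented for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy, heavily unbalanced data (an assumption for illustration):
# 95 examples of class 0 and 5 examples of class 1.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, (95, 2)),
               rng.normal(1.5, 1.0, (5, 2))])
y = np.array([0] * 95 + [1] * 5)

# Without class weights, every example contributes equally to the loss.
plain = LogisticRegression().fit(X, y)

# With class weights, errors on the rare class cost more.
# The 19:1 ratio simply mirrors the 95:5 class ratio above.
weighted = LogisticRegression(class_weight={0: 1.0, 1: 19.0}).fit(X, y)

# Count how many training examples each model assigns to the rare class.
n_plain = int(plain.predict(X).sum())
n_weighted = int(weighted.predict(X).sum())
print(n_plain, n_weighted)
```

The weighted model should be more willing to predict the rare class, which is the effect the class_weight setting is meant to produce.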

@doxav
Author

doxav commented Oct 22, 2014

So there's no chance this will change? Your ml data converters could be useful for a broader use (more sklearn estimators, or any ML library), and SKLL could then be a more general-purpose library. I currently use SKLL only for file conversion, and this is already a limiting factor. I also use sklearn, but I extend it to wrap other ML libraries (Vowpal, MOA) as estimators, and those libraries use weights. By the same principle, SKLL could easily support many more estimators.

@dan-blanchard
Contributor

Your ml data converters could be useful for a broader use...

But the target format would also have to support instance weights, which I don't believe any of the others do.

Granted, we could just make it so ARFFReader adds multiple copies of any instances that have a weight greater than 1 to the resulting FeatureSet, but if the weights can be floats that obviously wouldn't work. It's not clear from the format description whether weights have to be integers or not.
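
The duplication idea above can be sketched as follows. This is only an illustration using NumPy arrays; `expand_by_weight` is a hypothetical helper, not part of SKLL or its ARFFReader, and it rejects non-integer weights for exactly the reason given above.

```python
import numpy as np

def expand_by_weight(X, y, weights):
    """Repeat each instance according to its integer weight.

    Hypothetical helper for illustration; not a SKLL function.
    """
    weights = np.asarray(weights, dtype=float)
    # Fractional weights cannot be expressed as a whole number of copies.
    if not np.all(weights == np.floor(weights)):
        raise ValueError("only integer weights can be expanded into copies")
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    counts = weights.astype(int)
    # np.repeat with a per-row count duplicates row i counts[i] times.
    return np.repeat(np.asarray(X), counts, axis=0), np.repeat(np.asarray(y), counts)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0, 1])
X_big, y_big = expand_by_weight(X, y, [1, 3])
print(X_big.shape, y_big)
```

A weight of 0 drops the instance entirely, and a float weight such as 1.5 raises an error instead of being silently rounded.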

As for supporting other ML libs, we've actually got a PR open (#183) to add support for this sort of thing. We just need to add some examples and tests for it.

@dan-blanchard
Contributor

Hmm... apparently scikit-learn supports instance weights and not just class weights. If we were going to support reading the instance weights from files, we'd need a way to do it across file formats. That would be pretty straightforward for all of the supported file types except for MegaM. Although, I guess we could add it to the comment that we're currently using for IDs, much like we do for LibSVM files.
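
As a rough sketch of the scikit-learn side of this: many estimators accept a per-instance `sample_weight` array in `fit`. The example below uses plain scikit-learn with invented toy data and weights; it says nothing about how SKLL would read those weights from a feature file.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data and per-instance weights, invented for illustration.
X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]])
y = np.array([0, 0, 1, 1])
instance_weights = np.array([1.0, 0.5, 2.5, 1.0])  # floats are allowed here

# sample_weight scales each instance's contribution to the loss,
# so no duplication of rows is needed and weights need not be integers.
model = LogisticRegression()
model.fit(X, y, sample_weight=instance_weights)

predictions = model.predict(X)
print(predictions)
```

Because the weights are applied inside the loss, this also covers the float-weight case that duplicating instances cannot handle.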

Adding support for this would be a somewhat major change, so I'm going to slate this for the 1.1 release.

@dan-blanchard dan-blanchard changed the title handling of class/label/instance "weights" Support for reading instance weights from feature files Oct 22, 2014
@dan-blanchard dan-blanchard added this to the 1.1 milestone Oct 22, 2014
@dan-blanchard dan-blanchard changed the title Support for reading instance weights from feature files Need support for reading instance weights from feature files Nov 23, 2014
@desilinguist desilinguist modified the milestones: 1.1, 1.2 Jul 18, 2015
@desilinguist
Member

I think this still squarely belongs in the help-wanted category since we haven't seen a lot of demand for this internally.

@desilinguist desilinguist removed this from the 1.2 milestone Feb 15, 2016
@stale

stale bot commented Dec 19, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 19, 2017
@aoifecahill aoifecahill assigned ghost Dec 20, 2017
@stale stale bot removed the stale label Dec 20, 2017
@stale

stale bot commented Mar 20, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 20, 2018
@stale

stale bot commented Jun 18, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 18, 2018
@stale stale bot closed this as completed Jun 25, 2018
@ghost

ghost commented Jun 25, 2018

Please keep it open.

@ghost ghost reopened this Jun 25, 2018
@stale stale bot removed the stale label Jun 25, 2018
@stale

stale bot commented Sep 23, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 23, 2018
@aoifecahill
Collaborator

Keep it open, please.

@stale stale bot removed the stale label Sep 23, 2018
@desilinguist desilinguist added this to the 2.0 milestone Feb 6, 2019
@stale

stale bot commented May 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label May 7, 2019
@stale stale bot closed this as completed May 14, 2019
@desilinguist desilinguist added this to the v2.0 milestone Sep 12, 2019
@ghost ghost reopened this May 27, 2021
@ghost ghost removed this from the v2.0 milestone May 27, 2021
@desilinguist
Member

I am going to close this since no one really seems to want this.
