Semi-Supervised Learning #75

Open
bubbazz opened this issue May 5, 2022 · 4 comments

@bubbazz

bubbazz commented May 5, 2022

Hi,

In the tutorial, the command shown under the semi-supervised learning paragraph references an unlabeled .arff file.
[image]

I wonder what a line in such an ARFF file looks like. The only thing I found is "?" as an attribute value.

For example:
@relation unlabeded
@attribute X1 NUMERIC
@attribute X2 NUMERIC
@attribute y0 {0, 1}
@attribute y1 {0, 1}

@data
{ 0 42.42, 1 42.42, 2 ?, 3 ? }

Is the dataset above considered unlabeled? If not, what does such a dataset look like?

@fracpete
Member

fracpete commented May 6, 2022

The unlabeled dataset requires exactly the same structure as the training set (i.e. the same attribute order and the same nominal label order), and its class attribute columns must contain only missing values (i.e. ?).

If you need to introduce missing values, have a look at the missing-values-imputation Weka package.
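
As an illustration of the structure described above, here is a minimal sketch in plain Java that does not use Weka or Meka at all. It copies the header of a labeled ARFF file unchanged and replaces the label values in each data row with ?. The class and method names (UnlabeledArff, blankLabels) are hypothetical. The sketch assumes dense data rows, no quoted values containing commas, and that the labels are the last columns, as in the example from the first post.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Weka or Meka. */
final class UnlabeledArff {

    /**
     * Returns a copy of the ARFF lines in which the last numLabels values
     * of every dense data row are replaced by "?". Header lines, blank
     * lines and comment lines are copied unchanged.
     */
    static List<String> blankLabels(List<String> arffLines, int numLabels) {
        List<String> out = new ArrayList<>();
        boolean inData = false;
        for (String line : arffLines) {
            String trimmed = line.trim();
            if (!inData) {
                // Header section: keep as-is so the structure matches the training set.
                out.add(line);
                if (trimmed.equalsIgnoreCase("@data")) {
                    inData = true;
                }
                continue;
            }
            if (trimmed.isEmpty() || trimmed.startsWith("%")) {
                out.add(line); // blank line or ARFF comment
                continue;
            }
            // Assumption: dense row, comma-separated, no quoted commas.
            String[] cells = trimmed.split(",", -1);
            for (int i = Math.max(0, cells.length - numLabels); i < cells.length; i++) {
                cells[i] = "?";
            }
            out.add(String.join(",", cells));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> labeled = List.of(
                "@relation example",
                "@attribute X1 NUMERIC",
                "@attribute X2 NUMERIC",
                "@attribute y0 {0, 1}",
                "@attribute y1 {0, 1}",
                "",
                "@data",
                "42.42,42.42,0,1");
        blankLabels(labeled, 2).forEach(System.out::println);
    }
}
```

Sparse rows (the { index value, ... } form) are not handled here. In sparse ARFF an omitted value means 0 rather than missing, so each label index would need an explicit "index ?" entry.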

I've added a note to Tutorial.tex to make it clearer. Thanks for pointing it out!

@bubbazz
Author

bubbazz commented May 6, 2022

Thanks for clearing it up. It helped me a lot.

bubbazz closed this as completed May 6, 2022
@bubbazz
Author

bubbazz commented May 13, 2022

Dear Meka-Team,

  1. Is it possible to combine semi-supervised learning with hyperparameter tuning?
  • In Tutorial.pdf, semi-supervised learning with EM/CM uses two commands (see the first post), and I can't figure out how to build a pipeline with hyperparameter tuning (e.g. meka.*.MultiSearch).
  2. After training and testing (the two separate commands), how do you predict unseen data?

Thank you very much indeed.

With kind regards

bubbazz reopened this May 13, 2022
@fracpete
Member

From a quick look at the code:

  1. MultiSearch isn't a semi-supervised algorithm itself (and therefore won't get the unlabeled dataset for training), so it can't be used to optimize a semi-supervised classifier.
  2. On the command line, I'm not sure. In code: meka.core.MLEvalUtils calculates the threshold/thresholds with the meka.core.ThresholdUtils class, using the collected prediction arrays (obtained from the classifier's distributionForInstance method for each row in the weka.core.Instances object).
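
To make the thresholding step in point 2 more concrete, here is a minimal sketch in plain Java. It does not call Meka. It only shows the idea of turning the per-label confidences that a distributionForInstance call returns for one row into 0/1 label predictions. The class and method names (ThresholdSketch, toLabels), the confidence values and the threshold of 0.5 are all made up for illustration. Using >= for the comparison is an assumption of this sketch, and a real threshold would come from calibration on the training data.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Meka. */
final class ThresholdSketch {

    /**
     * Converts per-label confidences into binary predictions:
     * label j is predicted as 1 when confidences[j] >= threshold
     * (the >= comparison is an assumption of this sketch).
     */
    static int[] toLabels(double[] confidences, double threshold) {
        int[] labels = new int[confidences.length];
        for (int j = 0; j < confidences.length; j++) {
            labels[j] = confidences[j] >= threshold ? 1 : 0;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Made-up confidences standing in for one row's prediction array.
        double[] confidences = {0.91, 0.12, 0.50};
        System.out.println(Arrays.toString(toLabels(confidences, 0.5)));
        // prints [1, 0, 1]
    }
}
```

For unseen data, the same conversion would be applied to each row's prediction array, using the threshold obtained during training.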

Please note that I don't use Meka, so these are only vague pointers.
