
Run iterative Semi-Supervised labeling experiment #11

Closed

bdzyubak opened this issue Mar 21, 2024 · 1 comment
Assignees: bdzyubak
Labels: Experiment (Run an evaluation and compare multiple approaches on the same data)

Comments


bdzyubak commented Mar 21, 2024

Use or simulate a dataset with only 30% of the data labeled. Run iterative semi-supervised labeling by the model to illustrate (expected) gain in performance vs training on only the original 30% of the labels.

bdzyubak added the Experiment label Mar 21, 2024
bdzyubak self-assigned this Mar 21, 2024
bdzyubak added this to the April Beta Release milestone Mar 21, 2024

bdzyubak commented Mar 21, 2024

Because the dataset is too easy (>96% accuracy), the amount of labeled data had to be reduced to 10%. Otherwise, differences between settings were within the noise and varied substantially between retraining runs.

Implemented the experiment using an SVM. A 0.9 probability threshold gives the highest accuracy, with two iterations needed to reach the accuracy plateau. These hyperparameters may not generalize to other datasets; ideally, this would be run on a much more difficult dataset.
torch-control\projects\MachineLearning\semi_supervised_breast_cancer_classification\semi_supervised_svm.py

[Attached figure: accuracy]

Semi-supervised learning with a DecisionTree/Random Forest/XGBoost would be interesting. It requires experimenting with tree depth, as arbitrary depth yields 100% prediction confidences; see the sketch below. Pushing this to a separate backlogged issue.
