Code/strategy to add active learning classifications to training set #211

Closed
Tracked by #54
bfhealy opened this issue Jan 13, 2023 · 0 comments · Fixed by #370

bfhealy commented Jan 13, 2023

Scope code currently downloads and sums votes for each classification in a group of sources, but we do not yet have the code or strategy to grow the existing training set from that information. One potential strategy is:

  • Add all sources marked as 'Labelled' to the existing training set
  • Remove a classification if its votes sum to < 0
  • Keep a classification if its votes sum to >= 0
  • In cases of duplicate classifications, assign the probability from the one most recently posted

One problem with this strategy is that the reliability of a classification with a net vote sum of 0 is ambiguous. Perhaps no labeler was sure about that classification, or equal numbers of labelers voted for and against it. Should it be kept or not?
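As a starting point for discussion, the strategy could be sketched roughly as follows. This is only a minimal illustration, not existing Scope code: the `Classification` dataclass, the `grow_training_set` function, the field names, and the example labels are all hypothetical. It assumes the vote sums have already been computed per classification, that the vote rule is applied before duplicates are resolved, and it exposes the zero-sum question as a `keep_ties` flag rather than deciding it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Classification:
    # Hypothetical record; field names are assumptions, not the Scope schema.
    source_id: str
    label: str
    probability: float
    vote_sum: int          # net votes, already summed for this classification
    posted_at: datetime


def grow_training_set(training_set, classifications, labelled_ids, keep_ties=True):
    """Return a copy of training_set extended with vetted classifications.

    training_set maps source_id -> {label: probability}.
    labelled_ids is the set of source ids marked as 'Labelled'.
    keep_ties controls whether a net vote sum of exactly 0 is kept.
    """
    latest = {}
    for c in classifications:
        if c.source_id not in labelled_ids:
            continue  # only sources marked as 'Labelled' are added
        if c.vote_sum < 0 or (c.vote_sum == 0 and not keep_ties):
            continue  # rejected by the vote rule
        key = (c.source_id, c.label)
        # Among duplicates, the most recently posted classification wins.
        if key not in latest or c.posted_at > latest[key].posted_at:
            latest[key] = c

    # Copy the existing set so the input is left unmodified.
    grown = {sid: dict(labels) for sid, labels in training_set.items()}
    for (sid, label), c in latest.items():
        grown.setdefault(sid, {})[label] = c.probability
    return grown


example = [
    Classification("src1", "variable", 0.9, 2, datetime(2023, 1, 10)),
    Classification("src1", "variable", 0.7, 1, datetime(2023, 1, 12)),
    Classification("src1", "periodic", 1.0, -1, datetime(2023, 1, 11)),
    Classification("src2", "variable", 0.5, 0, datetime(2023, 1, 11)),
]
result = grow_training_set({}, example, labelled_ids={"src1", "src2"}, keep_ties=False)
print(result)  # {'src1': {'variable': 0.7}}
```

With `keep_ties=True` the zero-sum classification on `src2` would be retained as well, so the flag makes it easy to compare both choices before settling on one.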

Perhaps each portion of the training set newly added through active learning should be uploaded to its own group on Fritz. This would serve as a kind of 'version control' for this part of the project.
