Code/strategy to add active learning classifications to training set #211

bfhealy · 2023-01-13T20:51:01Z

Scope code currently downloads and sums votes for each classification in a group of sources, but we do not yet have the code or strategy to grow the existing training set from that information. One potential strategy is:

- Add all sources marked as 'Labelled' to the existing training set
- Remove a classification if its votes sum to < 0
- Keep a classification if its votes sum to >= 0
- In cases of duplicate classifications, assign the probability from the one most recently posted
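The strategy above could be sketched roughly as follows. This is only an illustration, not the Scope implementation: the function name `grow_training_set`, the record keys (`source_id`, `classification`, `probability`, `vote_sum`, `posted_at`) and the dict-based training set are all hypothetical. It assumes that vote sums have already been computed per source and classification, that `posted_at` is an ISO 8601 string, and that "remove" means the classification is simply not added (existing training set entries are left alone).

```python
def grow_training_set(training_set, labelled_rows):
    """Return a new training set extended with labelled classifications.

    training_set: dict mapping (source_id, classification) -> probability.
    labelled_rows: iterable of dicts for sources marked as 'Labelled'
        (hypothetical keys, see the note above).
    """
    # Resolve duplicates first: keep only the most recently posted row
    # for each (source, classification) pair. ISO 8601 strings in the
    # same format compare correctly as plain strings.
    latest = {}
    for row in labelled_rows:
        key = (row["source_id"], row["classification"])
        if key not in latest or row["posted_at"] > latest[key]["posted_at"]:
            latest[key] = row

    # Copy so the existing training set is not modified in place.
    merged = dict(training_set)
    for key, row in latest.items():
        # Keep classifications with a net vote sum >= 0; skip the rest.
        if row["vote_sum"] >= 0:
            merged[key] = row["probability"]
    return merged
```

Resolving duplicates before applying the vote threshold keeps the two rules independent, so changing the threshold later does not affect which duplicate wins.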

One problem with the above strategy is that there is ambiguity about the reliability of classifications with a net vote sum of 0. Perhaps no labeler was sure about that classification, or an even number disagreed - should it be kept or not?
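One way to make the zero-sum case less ambiguous would be to keep the individual votes rather than only their sum, so that "nobody voted" can be told apart from "votes cancelled out". The sketch below is hypothetical: the function name `triage_votes` and the label strings are made up for illustration, and it assumes each vote is +1 or -1 and that an unsure labeler casts no vote at all.

```python
def triage_votes(votes):
    """Classify a list of +1/-1 votes for a single classification.

    Returns one of "keep", "remove", "unvoted" or "tied".
    """
    total = sum(votes)
    if total > 0:
        return "keep"
    if total < 0:
        return "remove"
    # Net sum is zero: either there were no votes, or they cancelled out.
    return "unvoted" if not votes else "tied"
```

Classifications flagged as "tied" or "unvoted" could then be held back for another round of labeling instead of being kept or removed automatically.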

Perhaps each portion newly added to the training set from active learning should be uploaded to a unique group on Fritz. This would serve as a kind of 'version control' for this part of the project.

bfhealy mentioned this issue Jan 13, 2023

EPIC Scope catalogue #54

Open

48 tasks

bfhealy linked a pull request May 8, 2023 that will close this issue

Add ability to download active learning classifications, merge with training set #370

Merged

bfhealy closed this as completed in #370 May 9, 2023
