use classy-classification for active learning #13

Open
davidberenstein1957 opened this issue Dec 20, 2022 · 6 comments
Labels: active-learning, help wanted

Comments

@davidberenstein1957
Member

davidberenstein1957 commented Dec 20, 2022

Ideally, we would be able to host active learners easily through a more abstract and intuitive process.

MVP

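from argilla_plugins.active_learning import classy_classification_learner

# Proposed API: validation_threshold, min_n_samples and max_n_samples are ints supplied by the caller.
learner = classy_classification_learner(name="dataset", model="bert", validation_threshold=validation_threshold, min_n_samples=min_n_samples, max_n_samples=max_n_samples)
learner.start()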

Stretch

Filtering variables such as a query could be added to limit which records are synced. A threshold could be added to pre-annotate and validate certain records.
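To make the threshold idea a bit more concrete, here is a minimal sketch of how a prediction could be routed to validation, pre-annotation, or neither. Everything in it is an assumption for illustration: `decide_action`, `prediction_threshold`, and the use of float scores in [0, 1] are hypothetical and not part of argilla-plugins.

```python
from typing import Dict, Optional, Tuple


def decide_action(
    scores: Dict[str, float],
    validation_threshold: float,
    prediction_threshold: float,
) -> Tuple[str, Optional[str]]:
    """Route the top-scoring label to an action (hypothetical helper).

    - score >= validation_threshold -> ("validate", label)
    - score >= prediction_threshold -> ("pre-annotate", label)
    - otherwise                     -> ("skip", None)
    """
    if not scores:
        return ("skip", None)
    # Pick the label with the highest score.
    label, score = max(scores.items(), key=lambda item: item[1])
    if score >= validation_threshold:
        return ("validate", label)
    if score >= prediction_threshold:
        return ("pre-annotate", label)
    return ("skip", None)
```

A query filter could then be applied before this step, so that only the records matching the query are scored at all.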

@davidberenstein1957
Member Author

Only update predictions that were predicted_by classy-classification.

davidberenstein1957 self-assigned this Jan 19, 2023
@dvsrepo
Member

dvsrepo commented Jan 19, 2023

Really excited to see this happening!

Regarding the maximum number of examples, I've been thinking about some scenarios related to continuous training and monitoring:

When we reach this limit, I understand that we stop training but keep updating new records with the model's predictions, right? This is the scenario where a user can send more data to the dataset and we use the model in the loop to label the new data.

In the above scenario, if the limit has already been reached and users annotate more data, will we retrain the model with the newest annotations? I think you mentioned this acting as a LIFO queue? In my mind that makes total sense: we shift the few-shot training set towards more recent examples.
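As a rough illustration of the LIFO idea, the capped training set could be selected like this. This is only a sketch under my own assumptions: `select_training_samples` and the `strategy` argument are hypothetical names, and the annotations are assumed to be ordered oldest first.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_training_samples(
    annotated: Sequence[T], max_n_samples: int, strategy: str = "lifo"
) -> List[T]:
    """Cap the training set at max_n_samples (hypothetical helper).

    `annotated` is assumed to be ordered oldest first.
    "lifo" keeps the most recent annotations, "fifo" keeps the oldest ones.
    """
    if max_n_samples <= 0:
        return []
    if strategy == "lifo":
        return list(annotated[-max_n_samples:])
    if strategy == "fifo":
        return list(annotated[:max_n_samples])
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With "lifo", every new annotation beyond the limit pushes the oldest one out, so the training set keeps drifting towards recent examples.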

@dvsrepo
Member

dvsrepo commented Jan 19, 2023

Not to overcomplicate things of course, just some quick thoughts about how powerful this could get!

@davidberenstein1957
Member Author

davidberenstein1957 commented Jan 31, 2023

@dvsrepo
The plugin currently works by fetching all annotated records, selecting the annotations in FIFO or LIFO order, and creating a training dataset for classy-classification. This dataset, with index i, is then applied every interval t to a batch of x records that have no annotation and that are queried where metadata.idx!=i. A record is updated only if the prediction score is certain enough and the previous prediction is allowed to be overwritten.

This approach ensures the plugin keeps updating predictions in the background whenever new data is annotated, without any single pass taking too long to infer the new knowledge.
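The loop described above can be sketched on plain in-memory dicts. This is not the plugin's actual code: `update_batch`, the record layout, `min_score`, and `overwrite` are hypothetical stand-ins, and `predict` represents whatever model was trained on dataset i.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def update_batch(
    records: List[Record],
    predict: Callable[[str], Tuple[str, float]],
    dataset_idx: int,
    batch_size: int,
    min_score: float,
    overwrite: bool = True,
) -> int:
    """Run one interval of the background loop; return the number of updated records."""
    # Query: unannotated records not yet processed with this training dataset.
    pending = [
        r for r in records
        if r.get("annotation") is None and r["metadata"].get("idx") != dataset_idx
    ][:batch_size]
    updated = 0
    for record in pending:
        label, score = predict(record["text"])
        # Mark the record as seen so the next interval skips it.
        record["metadata"]["idx"] = dataset_idx
        can_write = overwrite or record.get("prediction") is None
        if score >= min_score and can_write:
            record["prediction"] = (label, score)
            updated += 1
    return updated
```

Calling this once per interval t processes at most x = `batch_size` records each time, and a new training dataset (a new index) makes all unannotated records eligible again.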

@dvsrepo
Member

dvsrepo commented Jan 31, 2023

Looks awesome, looking forward to trying it out

@davidberenstein1957
Member Author

Yes, me too. I need to write tests for edge cases, but I want to do these formal, structural things after reviewing the entire concept based on the PyData Bordeaux input.

2 participants