Suggest which data to annotate #2380

Closed
nirhutnik opened this issue Mar 7, 2023 · 0 comments · Fixed by #2505
Labels
ds Tasks suited for Data Scientists feature Feature update or code change to the package linear nlp Affects deepchecks.nlp package

Comments

@nirhutnik
Contributor

Give the user a list of their samples, ranked by annotation priority, so that the samples sent for annotation first are the ones most likely to improve their model.

Possible ways to do this:

  • Find dense clusters of unlabeled data
  • When comparing train to test, find data that is "new" in test (meaning that no similar data exists in train)
  • Find clusters of data with high loss in train, which could benefit from more labels
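
As a rough illustration of the second option, here is a minimal sketch that ranks test samples by how far each one is from its nearest train sample. It assumes every sample already has an embedding vector, and it uses cosine distance as the similarity measure; both are assumptions for the example, not part of this proposal. The names `cosine_distance` and `rank_by_novelty` are hypothetical and are not deepchecks APIs.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_novelty(train_embeddings, test_embeddings):
    """Rank test sample indices, most novel first (hypothetical helper).

    Novelty is the cosine distance from a test sample to its nearest
    train sample: a large value means nothing similar exists in train.
    Assumes train_embeddings is non-empty.
    """
    scores = []
    for idx, test_vec in enumerate(test_embeddings):
        nearest = min(cosine_distance(test_vec, tr) for tr in train_embeddings)
        scores.append((idx, nearest))
    # Sort by descending novelty score.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-D embeddings, made up for illustration only.
train = [[1.0, 0.0], [0.9, 0.1]]
test = [[1.0, 0.05], [0.0, 1.0], [0.7, 0.7]]
ranking = rank_by_novelty(train, test)
print([idx for idx, _ in ranking])
```

This brute-force loop is quadratic in the number of samples, so a real implementation would likely need an approximate nearest-neighbor index. The other two options could feed a similar ranking by replacing the novelty score with a cluster density or a per-sample loss.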

Requires design: should this be a regular check, or some other kind of feature?

@nirhutnik nirhutnik added feature Feature update or code change to the package ds Tasks suited for Data Scientists nlp Affects deepchecks.nlp package labels Mar 7, 2023
@github-actions github-actions bot added needs triage Issue needs to be labeled and prioritized linear labels Mar 7, 2023
@nirhutnik nirhutnik removed the needs triage Issue needs to be labeled and prioritized label Mar 7, 2023