feat: 230 feature add sentence transformers support for the to argilla method #262

Member

@davidberenstein1957 davidberenstein1957 commented Jan 17, 2024

I added sentence-transformers support to the to_argilla methods.

Things to note:

  • I added an add_vectors_to_argilla_dataset method to the Task base class.
  • I added a vector_strategy argument to the to_argilla methods. It can be either a bool or a SentenceTransformersExtractor, and defaults to True. If True, the default extractor is used; if False, no vectors are added; if a SentenceTransformersExtractor is passed, that custom-initialized extractor is used.
  • I added sentence-transformers to the argilla extras, to avoid splitting the extras further and to align with vectors being added by default.
  • I bumped the argilla version to 1.22.0 because it is required for the SentenceTransformersExtractor.
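
A minimal sketch of how the three vector_strategy cases listed above could be dispatched. This is an illustration only, not the code in this PR: StubExtractor is a hypothetical stand-in for argilla's SentenceTransformersExtractor, and resolve_vector_strategy and the placeholder model name are invented here.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class StubExtractor:
    """Hypothetical stand-in for argilla's SentenceTransformersExtractor."""

    model: str = "stub-default-model"  # placeholder, not a real model name


def resolve_vector_strategy(
    vector_strategy: Union[bool, StubExtractor] = True,
) -> Optional[StubExtractor]:
    """Return the extractor to use, or None when vectors are disabled."""
    if vector_strategy is True:
        # True: fall back to an extractor built with its defaults.
        return StubExtractor()
    if vector_strategy is False:
        # False: skip vector creation entirely.
        return None
    if isinstance(vector_strategy, StubExtractor):
        # Custom-initialized extractor: use it unchanged.
        return vector_strategy
    raise TypeError("vector_strategy must be a bool or an extractor instance")
```

Checking for the two bool values by identity before the isinstance test keeps the three cases explicit and rejects anything else early.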

WIP:

  • docs and examples

Contributor

@plaguss plaguss left a comment

Some small comments, this will be so useful! Some pictures of the UI would be nice once we add the docs, just to keep it in mind

src/distilabel/tasks/base.py (outdated, resolved)
src/distilabel/tasks/text_generation/self_instruct.py (outdated, resolved)
src/distilabel/dataset.py (outdated, resolved)
Comment on lines 27 to 30
from distilabel.utils.imports import _ARGILLA_AVAILABLE

if _ARGILLA_AVAILABLE:
    pass
Contributor

Suggested change
from distilabel.utils.imports import _ARGILLA_AVAILABLE
if _ARGILLA_AVAILABLE:
    pass

Contributor

This block currently has no effect.

Contributor

@plaguss plaguss left a comment

Some small notes, looks nice, will be really helpful 😄

tests/test_dataset.py (outdated, resolved)
@davidberenstein1957
Member Author

@plaguss I have now added a dataset_columns option, which lets you select either datasets columns or FeedbackDataset fields. What do you think of the naming?
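
To illustrate the kind of selection a dataset_columns option implies, here is a hedged sketch. The helper texts_to_embed is hypothetical and not part of this PR; using every column when nothing is selected, and raising on unknown names, are assumptions made for the example.

```python
from typing import Dict, List, Optional


def texts_to_embed(
    row: Dict[str, str], dataset_columns: Optional[List[str]] = None
) -> Dict[str, str]:
    """Pick the values that would be turned into vectors for one row."""
    if dataset_columns is None:
        # Assumption: no selection means every column is used.
        return dict(row)
    missing = [name for name in dataset_columns if name not in row]
    if missing:
        # Assumption: unknown names are an error rather than silently skipped.
        raise KeyError(f"unknown columns: {missing}")
    # Keep only the requested columns, in the requested order.
    return {name: row[name] for name in dataset_columns}
```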

Contributor

@plaguss plaguss left a comment

The naming looks fine to me 😄

@davidberenstein1957 davidberenstein1957 merged commit 698b556 into main Jan 24, 2024
4 checks passed
@davidberenstein1957 davidberenstein1957 deleted the feat/230-feature-add-sentence-transformers-support-for-the-to_argilla-method branch January 24, 2024 11:17
Development

Successfully merging this pull request may close these issues.

[FEATURE] add sentence-transformers support for the to_argilla method
2 participants