SPARKNLP 836 - Introducing "Instructor Embeddings" for sentence embeddings like Instructor-XL model #13849

prabod (Contributor) commented Jun 8, 2023

This PR adds Instructor embeddings to Spark NLP.

According to the MTEB leaderboard, Instructor-XL was the top-ranked model when this PR was opened: https://huggingface.co/spaces/mteb/leaderboard

The Instructor authors describe the model as follows: "We introduce Instructor👨‍🏫, an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation, etc.) and domain (e.g., science, finance, etc.) by simply providing the task instruction, without any finetuning. Instructor👨‍🏫 achieves state-of-the-art (SOTA) results on 70 diverse embedding tasks! The model is easy to use with our customized sentence-transformer library. For more details, check out our paper and project page!"
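To make "simply providing the task instruction" concrete, here is a minimal sketch in Python. It assumes the instruction template from the Instructor README ("Represent the [domain] [text type] for [task objective]:", with domain and task objective optional) and the `InstructorEmbeddings` annotator as exposed in the Spark NLP Python API. The helper names `build_instruction` and `build_pipeline` are hypothetical, and the `instructor_base` model name is an assumption that may not match the models actually published.

```python
def build_instruction(text_type, task=None, domain=None):
    """Compose an instruction such as 'Represent the Science title:'.

    Hypothetical helper; it only follows the template described in the
    Instructor README, where domain and task objective are optional.
    """
    parts = ["Represent the"]
    if domain:
        parts.append(domain)
    parts.append(text_type)
    if task:
        parts.append(f"for {task}")
    return " ".join(parts) + ":"


def build_pipeline(instruction, model_name="instructor_base"):
    """Build a Spark NLP pipeline that embeds the 'text' column.

    Assumption: model_name refers to a published pretrained model.
    Spark NLP is imported lazily so build_instruction works without it.
    """
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import InstructorEmbeddings

    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    embeddings = (
        InstructorEmbeddings.pretrained(model_name, "en")
        .setInstruction(instruction)  # the task instruction applied to each input
        .setInputCols(["document"])
        .setOutputCol("instructor")
    )
    return Pipeline(stages=[document, embeddings])
```

The same text can then be embedded for different purposes by changing only the instruction, for example `build_pipeline(build_instruction("document", task="retrieval", domain="Wikipedia"))`, with no finetuning involved.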

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@prabod added the "DON'T MERGE" label (Do not merge this PR) on Jun 8, 2023
@prabod requested a review from @maziyarpanahi on June 8, 2023 09:23
@maziyarpanahi changed the title from "Sparknlp 836 implement t5 encoder model for sentence embeddings" to "SPARKNLP 836 - Introducing "Instructor Embeddings" for sentence embeddings like Instructor-XL model" on Jun 8, 2023
@maziyarpanahi added the "new-feature" (Introducing a new feature) and "new model" labels on Jun 8, 2023
@maziyarpanahi changed the base branch from master to release/500-release-candidate on July 1, 2023 13:07
@maziyarpanahi merged commit ae688ab into release/500-release-candidate on Jul 1, 2023
6 of 7 checks passed
Labels: DON'T MERGE, new model, new-feature
2 participants