@moltrus moltrus commented Mar 25, 2025

Description of Changes

Upgraded the default embedding model from jina-embeddings-v2-base-en to jina-embeddings-v3, incorporating new API parameters. The updated version provides better performance, multilingual support (89 languages), and MRL (Matryoshka Representation Learning) embeddings.

Key Changes:

  • Upgraded the default Jina model from jina-embeddings-v2-base-en to jina-embeddings-v3.
  • Added new parameters to enhance flexibility:
    • task: Allows selection of LoRA adapters for specific downstream tasks.
    • late_chunking: Supports token embedding, chunking, and pooling for improved context awareness.
    • dimensions: Enables setting a custom dimensionality to optimize storage and performance.
    • embedding_type: Supports output as float, binary (for faster retrieval), or base64 (for efficient transmission).
  • Updated docstrings to reflect the new parameters and functionality.
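
For illustration, the parameters listed above could be assembled into a request body roughly as follows. This is a minimal sketch, not the code from this PR: `build_jina_payload` is a hypothetical helper, the `"model"` and `"input"` keys and the `"retrieval.query"` task value are assumptions about the Jina API, and the allowed `embedding_type` values are taken only from the description above.

```python
from typing import Any, Dict, List, Optional

# Output types named in the PR description; the real API may accept others.
ALLOWED_EMBEDDING_TYPES = {"float", "binary", "base64"}


def build_jina_payload(
    texts: List[str],
    model_name: str = "jina-embeddings-v3",
    task: Optional[str] = None,
    late_chunking: Optional[bool] = None,
    dimensions: Optional[int] = None,
    embedding_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a request body, omitting unset options."""
    if embedding_type is not None and embedding_type not in ALLOWED_EMBEDDING_TYPES:
        raise ValueError(f"unsupported embedding_type: {embedding_type!r}")
    if dimensions is not None and dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")

    # Assumed key names for the required fields.
    payload: Dict[str, Any] = {"model": model_name, "input": texts}

    # Only send the optional parameters the caller actually set.
    optional = {
        "task": task,
        "late_chunking": late_chunking,
        "dimensions": dimensions,
        "embedding_type": embedding_type,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


# Illustrative call; the task name is an assumption.
example = build_jina_payload(["hello world"], task="retrieval.query", dimensions=256)
```

Leaving unset options out of the payload keeps existing callers on the service defaults, which is one way to limit the backward-compatibility impact of adding parameters.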

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@moltrus moltrus changed the title from "Update jina ef" to "[ENH] Update Jina Embedding Function to support v3 with new parameters" Mar 25, 2025
Contributor

@jairad26 jairad26 left a comment

Hi, thank you for the PR! With updates to the embedding functions, it's also necessary to update the schemas to support cross language compatibility, as well as the JS version of the jina embedding function. The schema file can be found in schemas/embedding_functions/jina.json, and the jina embedding function in js can be found at clients/js/packages/chromadb-core/src/embeddings/JinaEmbeddingFunction.ts

@moltrus
Author

moltrus commented Apr 1, 2025

Hi, I was held up with my internship. Thanks for the feedback, I will fix the needed parts.
