🎉 New Destination: Vectara (Vector Database) #31958
airbyte-integrations/connectors/destination-vectara/destination_vectara/indexer.py
@topefolorunso Is this PR ready to review? I took a basic look and the following things seem off:
- There is still leftover code from the vector db cdk usage
- When building the docker image, the spec command is failing with ModuleNotFoundError: No module named 'langchain' (most likely related to the above)
- There is no basic documentation on how to set up a destination this way
- I'm not sure whether OAuth is the right approach here - isn't this more of a use case for API keys? I will reach out to the Vectara devs with this question
@topefolorunso Just chatted with the Vectara devs, and going with OAuth only for now makes sense. Please let me know once I can take a look, really excited for this connector :)
@flash1293 No, it is not; I'm still working on it. I will take note of your comments and update once complete. Please bear with me. Thanks.
…ctara-destination
Please review now @flash1293
Besides some smaller notes, this PR doesn't implement this part specified in the issue:
To map a record to a document, the user should configure text fields and metadata fields as part of the connector configuration.
For each text field, create a corresponding section of the document, and copy the text of the field into that section.
Set metadata on the section, e.g. source_type with the value being the name of the text field.
Ideally, the corpus needs to be created with source_type specified as a filterable column. This will allow customers to run searches over some columns, and exclude others.
Fields specified as metadata field will be attached to the document as metadata.
It shouldn't just drop the whole record as-is into the first section; there should be a user-configurable text vs. metadata field configuration like in the vector db destinations (just two fields, each an array of strings).
Then, each text field should become the text field of one separate section in the document sent to Vectara, and all the metadata fields should be added to the metadatajson (not just the stream name).
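A minimal sketch of that mapping, assuming the connector config exposes two lists of field names (called `text_fields` and `metadata_fields` here) and that a document is a dict with a `section` list and a `metadataJson` string. The function name, the `_ab_stream` key, and the exact document shape are illustrative assumptions, not the connector's actual code:

```python
import json
from typing import Any, Dict, List


def record_to_document(
    record: Dict[str, Any],
    text_fields: List[str],
    metadata_fields: List[str],
    stream: str,
) -> Dict[str, Any]:
    """Build a document dict with one section per configured text field."""
    sections = []
    for field in text_fields:
        value = record.get(field)
        if value is None:
            continue  # skip text fields that are missing from this record
        sections.append(
            {
                "text": str(value),
                # source_type records which field the section came from,
                # so searches can include or exclude individual fields
                "metadataJson": json.dumps({"source_type": field}),
            }
        )

    # Only the configured metadata fields are attached to the document.
    metadata = {name: record[name] for name in metadata_fields if name in record}
    metadata["_ab_stream"] = stream  # hypothetical key for the stream name

    return {
        "section": sections,
        "metadataJson": json.dumps(metadata, default=str),
    }
```

Fields that are in neither list are dropped, which is the opposite of copying the whole record into a single section.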
Also, if a stream has a configured primary key, it should be used as the documentId, concatenating all of these:
- Namespace
- Stream name
- all parts of the primary key
This needs to be done to make sure if the record gets updated and loaded again, it will replace the old version instead of creating a new document.
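The ID scheme above could be sketched roughly as follows. This assumes the primary key arrives as a list of field paths (each path a list of keys, to allow nested fields); the helper names and the `_` separator are hypothetical choices for illustration:

```python
from typing import Any, Dict, List, Optional


def _get_path(record: Dict[str, Any], path: List[str]) -> Any:
    """Follow a list of keys into a nested record; None if any key is absent."""
    value: Any = record
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def build_document_id(
    record: Dict[str, Any],
    stream: str,
    namespace: Optional[str],
    primary_key: List[List[str]],
    separator: str = "_",
) -> Optional[str]:
    """Concatenate namespace, stream name, and all primary key parts."""
    if not primary_key:
        return None  # no primary key configured; the caller picks a fallback ID

    parts: List[str] = []
    if namespace:
        parts.append(namespace)
    parts.append(stream)
    for path in primary_key:
        # A missing key part is rendered as the string "None" in this sketch.
        parts.append(str(_get_path(record, path)))
    return separator.join(parts)
```

Because the ID depends only on the namespace, stream name, and key values, loading an updated version of the same record yields the same ID, so the destination can replace the old document rather than add a second one.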
airbyte-integrations/connectors/destination-vectara/destination_vectara/destination.py
from destination_vectara.destination import DestinationVectara


class TestDestinationVectara(unittest.TestCase):
These tests need to be updated?
airbyte-integrations/connectors/destination-vectara/destination_vectara/writer.py
This needs to be an svg
airbyte-integrations/connectors/destination-vectara/destination_vectara/client.py
@topefolorunso are you still planning to work on this?
…ctara-destination
Yes, I will work on it @marcosmarxm.
@topefolorunso as this sat idle for a while, @ofermend from Vectara took over to bring it over the finish line.
Oh alright. That's fine I guess. Well done @ofermend.
Thanks @topefolorunso - will pull your latest and try to get it to the finish line. Appreciate your work so far and I'll reach out if I need any further help here.
Closing in favor of #33616
What
Adds a new vector destination, Vectara. Resolves #29012
How
Generated the Python destination boilerplate code and wrote the logic for connecting and writing to the database
Recommended reading order
destination.py
config.py
indexer.py
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user?
For connector PRs, use this section to explain which type of semantic versioning bump occurs as a result of the changes. Refer to our Semantic Versioning for Connectors guidelines for more information. Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.
If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Actions
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
- ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Dockerfile has version 0.0.1
- README.md
- bootstrap.md. See description and examples
- docs/integrations/<source or destination>/<name>.md including changelog with an entry for the initial version. See changelog example
- docs/integrations/README.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.