Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 New Destination: Vectara (Vector Database) #31958

Closed

Conversation

topefolorunso
Copy link
Collaborator

What

Adding a new vector destination - Vectara resolves #29012

How

Generated the Python destination boilerplate codes and wrote the logic for connecting and writing to the database

Recommended reading order

  1. destination.py
  2. config.py
  3. indexer.py

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user?

For connector PRs, use this section to explain which type of semantic versioning bump occurs as a result of the changes. Refer to our Semantic Versioning for Connectors guidelines for more information. Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.

If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Actions

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Connector version is set to 0.0.1
    • Dockerfile has version 0.0.1
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog with an entry for the initial version. See changelog example
    • docs/integrations/README.md

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.

@vercel
Copy link

vercel bot commented Oct 30, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Dec 13, 2023 7:10pm

@github-actions
Copy link
Contributor

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@topefolorunso Is this PR ready to review? I took a basic look and the following things seem off:

  • There is still stuff around from the vector db cdk usage
  • When building the docker image, the spec command is failing with ModuleNotFoundError: No module named 'langchain' (most likely related to the above)
  • Basic documentation on how to set up a destination this way
  • I'm not sure whether OAuth is the right approach here - isn't this more of a use case for API keys? I will reach out to the vectara devs for this question

@flash1293
Copy link
Contributor

@topefolorunso Just chatted with vectara devs and going with oauth only for now makes sense. Please let me know once I can take a look, really excited for this connector :)

@topefolorunso
Copy link
Collaborator Author

@topefolorunso Is this PR ready to review?

@flash1293 No it is not, still working on it. I will take note of your comments and update once complete. Please bear with me. Thanks.

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Nov 10, 2023
@topefolorunso
Copy link
Collaborator Author

Please review now @flash1293

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides some smaller notes, this PR doesn't implement this part specified in the issue:

To map a record to a document, the user should configure text fields and metadata fields as part of the connector configuration.

For each text field, create a corresponding section of the document, and copy the text of the field into that section.
Set metadata on the section, e.g. source_type with the value being the name of the text field.

Ideally, the corpus needs to be created with source_type specified as a filterable column. This will allow customers to run searches over some columns, and exclude others.

Fields specified as metadata field will be attached to the document as metadata.

It shouldn't just drop everything the record as-is into the first section, there should be a user-configurable text vs. metadata field configuration like in the vector db destinations (just two fields with array of strings).

Then, each text field should become the text field of one seaprate section in the document sent to vectara and all the metadata fields should be added to the metadatajson (not just the stream name).

Also, if a stream has a configured primary key, it should be used as the documentId, concatenating all of these:

  • Namespace
  • Stream name
  • all parts of the primary key

This needs to be done to make sure if the record gets updated and loaded again, it will replace the old version instead of creating a new document.

from destination_vectara.destination import DestinationVectara


class TestDestinationVectara(unittest.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests need to be updated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be an svg

@marcosmarxm
Copy link
Member

@topefolorunso are you still planning to work on this?

@topefolorunso
Copy link
Collaborator Author

@topefolorunso are you still planning to work on this?

Yes I will work on it @marcosmarxm.

@flash1293
Copy link
Contributor

@topefolorunso as it laid bare a while, @ofermend from Vectara took over to bring it I’ve Ruhe finish line.

@topefolorunso
Copy link
Collaborator Author

@topefolorunso as it laid bare a while, @ofermend from Vectara took over to bring it I’ve Ruhe finish line.

Oh alright. That's fine I guess. Well done @ofermend.
Sorry for the delay on my part. Though I just pushed some changes just in case.

@ofermend
Copy link
Contributor

Thanks @topefolorunso - will pull your latest and try to get it to the finale. Appreciate your work so far and I'll reach out if I need any further help here.

@marcosmarxm
Copy link
Member

Close in favor of #33616

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

✨ Add Destination Vectara
5 participants