🚨 Weaviate destination: Add embedding capabilities, overwrite and dedup support, API key auth mode and available on Airbyte Cloud #30151

Merged
flash1293 merged 61 commits into master from flash1293/weaviate-rewrite on Sep 28, 2023

Conversation

Contributor

@flash1293 flash1293 commented Sep 5, 2023

What

Fixes #29663 and brings the Weaviate destination to Airbyte Cloud by aligning it with the other vector db destinations.

How

Restructure the destination based on Pinecone / Milvus:

  • Use vector db CDK
  • Implement custom indexer

Reading order:

  • config.py
  • destination.py
  • indexer.py
  • integration_test.py

Special cases:

  • Add "no embedding" as an embedding option for cases where embedding is handled in Weaviate itself
  • Normalize complex fields by serializing them as JSON
  • Rename fields so that names always start with a lowercase letter (Weaviate requires this)
  • Internally retry failed loads within the batch - I kept this from the previous code base, but I'm not sure how relevant it is in practice
  • Do not allow non-https connections on cloud specifically
  • Removed the old integration tests, which mostly focused on the different schema handling - just rely on a properly created schema in the destination or on auto-creation being enabled
  • The internal fields _ab_stream and _ab_record_id are queried to delete existing data in the class, but this only works if the fields already exist. As they are only created once data has been loaded for the first time, the connector checks whether they exist and skips the deletion step if they don't.
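
Some of the special cases above can be sketched in plain Python. This is a minimal illustration, not the connector's actual code: every function name here is hypothetical, and treating any dict or list as a "complex field" is an assumption. The sketch covers the JSON normalization, the lowercase renaming, the https-only rule on cloud, and the check that guards the deletion step.

```python
import json
from typing import Any, Dict, Iterable
from urllib.parse import urlparse

# All names below are hypothetical helpers for illustration only;
# they are not taken from the connector's code base.

INTERNAL_FIELDS = ("_ab_stream", "_ab_record_id")


def normalize_field_name(name: str) -> str:
    """Lowercase only the first character, e.g. 'UserName' -> 'userName'."""
    return name[:1].lower() + name[1:]


def normalize_value(value: Any) -> Any:
    """Serialize nested objects and lists to a JSON string; keep scalars as-is.

    Assumption: "complex" means dict or list.
    """
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value


def normalize_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Apply both the renaming and the JSON serialization to one record."""
    return {normalize_field_name(k): normalize_value(v) for k, v in record.items()}


def is_url_allowed(url: str, is_cloud: bool) -> bool:
    """On cloud, accept only https URLs; elsewhere accept any scheme."""
    return (not is_cloud) or urlparse(url).scheme == "https"


def can_delete_existing(class_properties: Iterable[str]) -> bool:
    """Deleting by the internal fields only works once they exist in the class."""
    existing = set(class_properties)
    return all(field in existing for field in INTERNAL_FIELDS)
```

For example, `normalize_record({"Tags": ["a", "b"], "age": 3})` yields a record with the keys `tags` and `age`, where `tags` holds a JSON string. A caller would run the deletion query only when `can_delete_existing(...)` is true for the property names currently defined on the class.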

🚨 User Impact 🚨

Several breaking changes are made in this PR - check out weaviate_migrations.md for the details: https://github.com/airbytehq/airbyte/pull/30151/files#diff-6ce12b19886f87d02ff8a5b2141ca99fbfd0284ee8a857a833c6d5bb15e85e14

@flash1293 flash1293 changed the title from "Weaviate destination: Add embedding capabilities and API key auth mode" to "Weaviate destination: Add embedding capabilities, overwrite and dedup support and API key auth mode" Sep 5, 2023
@octavia-squidington-iii octavia-squidington-iii added the labels area/connectors (Connector related issues), CDK (Connector Development Kit), and connectors/destination/weaviate Sep 5, 2023
@github-actions
Contributor

github-actions bot commented Sep 5, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@octavia-squidington-iii octavia-squidington-iii added the area/documentation (Improvements or additions to documentation) label Sep 5, 2023
@flash1293 flash1293 changed the title from "Weaviate destination: Add embedding capabilities, overwrite and dedup support and API key auth mode" to "🚨 Weaviate destination: Add embedding capabilities, overwrite and dedup support and API key auth mode" Sep 5, 2023
@airbyte-oss-build-runner
Collaborator

destination-weaviate test report (commit c7497eb06a) - ✅

⏲️ Total pipeline duration: 02mn48s

Step Result
Connector package install
Build destination-weaviate docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-weaviate/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-weaviate test

@airbyte-oss-build-runner
Collaborator

destination-weaviate test report (commit 3a6b0e2544) - ❌

⏲️ Total pipeline duration: 01mn55s

Step Result
Connector package install
Build destination-weaviate docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/destination-weaviate/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@airbyte-oss-build-runner
Collaborator

destination-weaviate test report (commit 47c682dffe) - ✅

⏲️ Total pipeline duration: 01mn54s

Step Result
Connector package install
Build destination-weaviate docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-weaviate/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@airbyte-oss-build-runner
Collaborator

destination-weaviate test report (commit 828855dd05) - ✅

⏲️ Total pipeline duration: 02mn42s

Step Result
Connector package install
Build destination-weaviate docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-weaviate/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@airbyte-oss-build-runner
Collaborator

destination-weaviate test report (commit e7de8516e9) - ✅

⏲️ Total pipeline duration: 01mn39s

Step Result
Connector package install
Build destination-weaviate docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-weaviate/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

Collaborator

@aaronsteers aaronsteers left a comment

Nice work on this - it's a significant rewrite.

I did a full readthrough of the code and I don't see any major issues. I added some notes/suggestions inline for your review. 👍

Collaborator

Should this logic be moved to a post-build shell script? I heard Dockerfiles are on their way out.

(Just a question for consideration; I wouldn't block on this.)

Contributor Author

To my knowledge this new Dockerfile is closer to the "default logic", so I expect it to not cause any problems.

docs/integrations/destinations/weaviate.md (outdated, resolved)
docs/integrations/destinations/weaviate-migrations.md (outdated, resolved)
docs/integrations/destinations/weaviate-migrations.md (outdated, resolved)
Joe Reuter and others added 3 commits September 27, 2023 16:55
Co-authored-by: Aaron ("AJ") Steers <aj@airbyte.io>
Co-authored-by: Aaron ("AJ") Steers <aj@airbyte.io>
Co-authored-by: Aaron ("AJ") Steers <aj@airbyte.io>
@airbyte-oss-build-runner
Collaborator

destination-weaviate test report (commit f1c33a36ed) - ✅

⏲️ Total pipeline duration: 03mn43s

Step Result
Connector package install
Build destination-weaviate docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-weaviate/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@airbyte-oss-build-runner
Collaborator

destination-weaviate test report (commit 359cd06f68) - ✅

⏲️ Total pipeline duration: 03mn42s

Step Result
Connector package install
Build destination-weaviate docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-weaviate/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@airbyte-oss-build-runner
Collaborator

destination-weaviate test report (commit 8c6c583397) - ❌

⏲️ Total pipeline duration: 01mn55s

Step Result
Connector package install
Build destination-weaviate docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-weaviate/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@airbyte-oss-build-runner
Collaborator

destination-weaviate test report (commit f8d9237765) - ✅

⏲️ Total pipeline duration: 01mn55s

Step Result
Connector package install
Build destination-weaviate docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-weaviate/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

@flash1293 flash1293 merged commit f161e2e into master Sep 28, 2023
25 checks passed
@flash1293 flash1293 deleted the flash1293/weaviate-rewrite branch September 28, 2023 09:09
girarda pushed a commit that referenced this pull request Oct 10, 2023
…up support, API key auth mode and available on Airbyte Cloud (#30151)

Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
Co-authored-by: Aaron ("AJ") Steers <aj@airbyte.io>
Labels
area/connectors (Connector related issues), area/documentation (Improvements or additions to documentation), connectors/destination/weaviate
Development

Successfully merging this pull request may close these issues.

Weaviate destination: Refactor to align with other vector db destinations
5 participants