Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Milvus / Qdrant / Pinecone: Update CDK #30689

Merged
merged 23 commits into from Sep 26, 2023
Merged

Conversation

flash1293
Copy link
Contributor

@flash1293 flash1293 commented Sep 22, 2023

What

Based off #30718

This PR upgrades the CDK for Milvus and Qdrant desinations to allow processing of large records as well as adding support for Azure OpenAI embeddings and adding more text splitting options.

It also sets allowed hosts correctly.

@vercel
Copy link

vercel bot commented Sep 22, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 26, 2023 9:46am

@airbyte-oss-build-runner
Copy link
Collaborator

destination-qdrant test report (commit feef5222a6) - ✅

⏲️ Total pipeline duration: 03mn22s

Step Result
Connector package install
Build destination-qdrant docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/destination-qdrant/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-qdrant test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-milvus test report (commit feef5222a6) - ❌

⏲️ Total pipeline duration: 03mn32s

Step Result
Connector package install
Build destination-milvus docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-milvus/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-milvus test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-qdrant test report (commit 25fc12e857) - ✅

⏲️ Total pipeline duration: 01mn21s

Step Result
Connector package install
Build destination-qdrant docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/destination-qdrant/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-qdrant test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-pinecone test report (commit 25fc12e857) - ✅

⏲️ Total pipeline duration: 02mn24s

Step Result
Connector package install
Build destination-pinecone docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-pinecone/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-pinecone test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-milvus test report (commit 25fc12e857) - ❌

⏲️ Total pipeline duration: 02mn40s

Step Result
Connector package install
Build destination-milvus docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/destination-milvus/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-milvus test

@flash1293 flash1293 changed the base branch from master to flash1293/pinecone-fix September 26, 2023 08:18
@airbyte-oss-build-runner
Copy link
Collaborator

destination-milvus test report (commit 97fdd834b7) - ❌

⏲️ Total pipeline duration: 03mn56s

Step Result
Connector package install
Build destination-milvus docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/destination-milvus/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-milvus test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-pinecone test report (commit 97fdd834b7) - ✅

⏲️ Total pipeline duration: 04mn09s

Step Result
Connector package install
Build destination-pinecone docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-pinecone/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-pinecone test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-qdrant test report (commit 97fdd834b7) - ✅

⏲️ Total pipeline duration: 04mn07s

Step Result
Connector package install
Build destination-qdrant docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/destination-qdrant/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-qdrant test

Copy link
Collaborator

@aaronsteers aaronsteers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I performed a full code readthrough. A couple questions but no blockers. 👍

Comment on lines +9 to +12
- "${indexing.url}"
- api.openai.com
- api.cohere.ai
- "${embedding.api_base}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my own understanding, I'm curious how these dynamic refs are parsed. I see embedding.api_base matches the config setting name but I'm not sure about indexing.url.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 41 to 46
def _init_indexer(self, config: ConfigModel):
self.embedder = embedder_map[config.embedding.mode](config.embedding)
if config.embedding.mode == "azure_openai" or config.embedding.mode == "openai":
self.embedder = embedder_map[config.embedding.mode](config.embedding, config.processing.chunk_size)
else:
self.embedder = embedder_map[config.embedding.mode](config.embedding)
self.indexer = MilvusIndexer(config.indexing, self.embedder.embedding_dimensions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this and embedder_map be moved into the CDK?

Not a blocker for this round, but it seems it would be good to have this logic centrally managed. (Also appears for Milvus.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that would make sense. I'll create an issue for that

Comment on lines +101 to +106
for key, value in metadata.items():
normalized_key = key
# the primary key can't be set directly with auto_id, so we prefix it with an underscore
if key == self._primary_key:
normalized_key = f"_{key}"
result[normalized_key] = value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I 100% follow the issue or the mitigation. I understand that we can't modify the primary key when it is of type auto_id - so are we setting a different key using the underscore? Is it in the case of the 'natural' primary key of the stream colliding with a built-in column name?

Does the problem become less if (in the future) we generate our own unique keys instead of using auto_id?

Seems like this is not a big issue either way (as long as we stop the failure scenario) so I'm not asking to change anything - just would be good for me to understand longer-term.

Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Milvus collection has to be set to auto_id, but if that's the case Milvus will complain if an entity contains a value for the primary key (as it will pick a value itself). So let's say you create a collection with primary key id set to auto_id, but a record coming from some source contains an id field as well, it will error.

This field changes the logic so the id field from the Airbyte record is stored as _id in the collection. This is necessary because one record might be split into multiple entities, so _id won't be guaranteed to be unique.

Base automatically changed from flash1293/pinecone-fix to master September 26, 2023 09:05
@airbyte-oss-build-runner
Copy link
Collaborator

destination-qdrant test report (commit 15fde9abfa) - ✅

⏲️ Total pipeline duration: 01mn29s

Step Result
Connector package install
Build destination-qdrant docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/destination-qdrant/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-qdrant test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-milvus test report (commit 15fde9abfa) - ❌

⏲️ Total pipeline duration: 01mn41s

Step Result
Connector package install
Build destination-milvus docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/destination-milvus/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-milvus test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-pinecone test report (commit 15fde9abfa) - ✅

⏲️ Total pipeline duration: 02mn25s

Step Result
Connector package install
Build destination-pinecone docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-pinecone/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-pinecone test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-qdrant test report (commit fcdc9a2db4) - ✅

⏲️ Total pipeline duration: 01mn51s

Step Result
Connector package install
Build destination-qdrant docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/destination-qdrant/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-qdrant test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-pinecone test report (commit fcdc9a2db4) - ✅

⏲️ Total pipeline duration: 01mn59s

Step Result
Connector package install
Build destination-pinecone docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-pinecone/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-pinecone test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-milvus test report (commit fcdc9a2db4) - ✅

⏲️ Total pipeline duration: 02mn43s

Step Result
Connector package install
Build destination-milvus docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-milvus/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-milvus test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-qdrant test report (commit c7c7ef3ecf) - ✅

⏲️ Total pipeline duration: 01mn23s

Step Result
Connector package install
Build destination-qdrant docker image for platform linux/x86_64
Unit tests
Code format checks
Validate airbyte-integrations/connectors/destination-qdrant/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-qdrant test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-pinecone test report (commit c7c7ef3ecf) - ✅

⏲️ Total pipeline duration: 02mn16s

Step Result
Connector package install
Build destination-pinecone docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-pinecone/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-pinecone test

@airbyte-oss-build-runner
Copy link
Collaborator

destination-milvus test report (commit c7c7ef3ecf) - ✅

⏲️ Total pipeline duration: 02mn21s

Step Result
Connector package install
Build destination-milvus docker image for platform linux/x86_64
Unit tests
Integration tests
Acceptance tests
Code format checks
Validate airbyte-integrations/connectors/destination-milvus/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=destination-milvus test

@flash1293 flash1293 merged commit e100c27 into master Sep 26, 2023
27 checks passed
@flash1293 flash1293 deleted the flash1293/milvus-update branch September 26, 2023 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants