🚨 Weaviate destination: Add embedding capabilities, overwrite and dedup support, API key auth mode and available on Airbyte Cloud #30151
Merged
Changes from 54 commits
Commits
61 commits
83c97bb
better error message for misconfigured text fields
de0cf30
improve message
5a88145
improve error message
6c74055
refactor vector db cdk helpers a bit
6b7a3b0
make most of the things work
2ff7617
Automated Commit - Formatting Changes
flash1293 a414d88
adjust component interaction
3cc529a
Merge branch 'flash1293/ai-cdk-improvements' into flash1293/weaviate-…
b47eadf
Merge branch 'flash1293/weaviate-rewrite' of github.com:airbytehq/air…
139c0ab
make no_embedding mode work
51ca514
Automated Commit - Formatting Changes
flash1293 347f333
format fixes
5d76df2
Merge branch 'flash1293/ai-cdk-improvements' into flash1293/weaviate-…
52c2095
Merge branch 'flash1293/weaviate-rewrite' of github.com:airbytehq/air…
17a5c30
docs, auto-create classes
a5c5ff8
review comments
716abc8
Merge remote-tracking branch 'origin/master' into flash1293/ai-cdk-im…
bd0f3c2
revert formatting
3ce6334
fix unit tests
db2f04e
Merge branch 'flash1293/ai-cdk-improvements' into flash1293/weaviate-…
aa8461a
work on tests
4084ddb
work on tests
3c2f06c
Merge remote-tracking branch 'origin/master' into flash1293/weaviate-…
70fe870
make most things works
49d0963
Merge remote-tracking branch 'origin/master' into flash1293/weaviate-…
006d277
documentation
6738efd
Merge remote-tracking branch 'origin/master' into flash1293/weaviate-…
80f480d
fix metadata and dockerfile
89b9d50
fix small things in weaviate destination
6e6331c
try more retries
d238c52
try to fix integration tests
8de45e2
Merge remote-tracking branch 'origin/master' into flash1293/weaviate-…
2ccd705
review comments
2a61557
respect reserved property names
7986eaa
Merge branch 'master' into flash1293/weaviate-rewrite
cb4ef90
Merge remote-tracking branch 'upstream/master' into flash1293/weaviat…
e26aaaf
adjust based on feedback
cd0acb2
fix integration tests
5c3e877
bump changelog
5bff5ed
enable on cloud
037803d
fix breaking change message
e2c3cc0
set index correctly for _ab_record_id field
a626f7d
format
5c82f1a
Merge remote-tracking branch 'origin/master' into flash1293/weaviate-…
60d8144
adjust metadata
53e80d4
Automated Commit - Formatting Changes
flash1293 3eeb341
update cdk
c9eaf17
Merge branch 'flash1293/weaviate-rewrite' of github.com:airbytehq/air…
c7497eb
fix
3a6b0e2
disallow no auth on cloud
47c682d
fix test
ab9dcdd
Merge remote-tracking branch 'upstream/master' into flash1293/weaviat…
828855d
set to certified
e7de851
Merge branch 'master' into flash1293/weaviate-rewrite
c7a0b8f
Update docs/integrations/destinations/weaviate.md
751025e
Update docs/integrations/destinations/weaviate-migrations.md
f1c33a3
Update docs/integrations/destinations/weaviate-migrations.md
68f7a94
update cdk
359cd06
Merge branch 'flash1293/weaviate-rewrite' of github.com:airbytehq/air…
8c6c583
chunk as configured
f8d9237
fix format
airbyte-integrations/connectors/destination-weaviate/acceptance-test-config.yml (7 changes: 7 additions & 0 deletions)
@@ -0,0 +1,7 @@
acceptance_tests:
  spec:
    tests:
      - spec_path: integration_tests/spec.json
        backward_compatibility_tests_config:
          disable_for_version: "0.2.0"
connector_image: airbyte/destination-weaviate:dev
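`disable_for_version` appears to switch off the spec backward-compatibility test for exactly one connector version, presumably because this 🚨 release deliberately breaks the old spec. A minimal sketch of that gate, assuming an exact string match on the version and working on an already-parsed copy of the file (YAML loading is left out so only the standard library is needed). `backward_compat_enabled` is a hypothetical helper, not part of the acceptance-test framework.

```python
from typing import Any, Dict

# Hypothetical parsed form of the acceptance-test-config.yml shown above.
CONFIG: Dict[str, Any] = {
    "acceptance_tests": {
        "spec": {
            "tests": [
                {
                    "spec_path": "integration_tests/spec.json",
                    "backward_compatibility_tests_config": {"disable_for_version": "0.2.0"},
                }
            ]
        }
    },
    "connector_image": "airbyte/destination-weaviate:dev",
}


def backward_compat_enabled(test: Dict[str, Any], connector_version: str) -> bool:
    """Return False when the test disables backward-compatibility checks for this exact version.

    Assumption: the version is compared as a plain string, with no range semantics.
    """
    bc_config = test.get("backward_compatibility_tests_config", {})
    return bc_config.get("disable_for_version") != connector_version


spec_test = CONFIG["acceptance_tests"]["spec"]["tests"][0]
print(backward_compat_enabled(spec_test, "0.2.0"))  # False
print(backward_compat_enabled(spec_test, "0.2.1"))  # True
```

Under this reading, the skip applies only to the breaking release itself; later versions are checked against the new spec again.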
airbyte-integrations/connectors/destination-weaviate/acceptance-test-docker.sh (2 changes: 2 additions & 0 deletions)
@@ -0,0 +1,2 @@
#!/usr/bin/env sh
source "$(git rev-parse --show-toplevel)/airbyte-integrations/bases/connector-acceptance-test/acceptance-test-docker.sh"
airbyte-integrations/connectors/destination-weaviate/destination_weaviate/client.py (141 changes: 0 additions & 141 deletions)
This file was deleted.
airbyte-integrations/connectors/destination-weaviate/destination_weaviate/config.py (141 changes: 141 additions & 0 deletions)
@@ -0,0 +1,141 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from typing import List, Literal, Union

import dpath.util
from airbyte_cdk.destinations.vector_db_based.config import (
    AzureOpenAIEmbeddingConfigModel,
    CohereEmbeddingConfigModel,
    FakeEmbeddingConfigModel,
    FromFieldEmbeddingConfigModel,
    OpenAIEmbeddingConfigModel,
    ProcessingConfigModel,
)
from airbyte_cdk.utils.spec_schema_transformations import resolve_refs
from pydantic import BaseModel, Field


class UsernamePasswordAuth(BaseModel):
    mode: Literal["username_password"] = Field("username_password", const=True)
    username: str = Field(..., title="Username", description="Username for the Weaviate cluster", order=1)
    password: str = Field(..., title="Password", description="Password for the Weaviate cluster", airbyte_secret=True, order=2)

    class Config:
        title = "Username/Password"
        schema_extra = {"description": "Authenticate using username and password (suitable for self-managed Weaviate clusters)"}


class NoAuth(BaseModel):
    mode: Literal["no_auth"] = Field("no_auth", const=True)

    class Config:
        title = "No Authentication"
        schema_extra = {
            "description": "Do not authenticate (suitable for locally running test clusters, do not use for clusters with public IP addresses)"
        }


class TokenAuth(BaseModel):
    mode: Literal["token"] = Field("token", const=True)
    token: str = Field(..., title="API Token", description="API Token for the Weaviate instance", airbyte_secret=True)

    class Config:
        title = "API Token"
        schema_extra = {"description": "Authenticate using an API token (suitable for Weaviate Cloud)"}


class Header(BaseModel):
    header_key: str = Field(..., title="Header Key")
    value: str = Field(..., title="Header Value", airbyte_secret=True)


class WeaviateIndexingConfigModel(BaseModel):
    host: str = Field(
        ...,
        title="Public Endpoint",
        order=1,
        description="The public endpoint of the Weaviate cluster.",
        examples=["https://my-cluster.weaviate.network"],
    )
    auth: Union[TokenAuth, UsernamePasswordAuth, NoAuth] = Field(
        ..., title="Authentication", description="Authentication method", discriminator="mode", type="object", order=2
    )
    batch_size: int = Field(title="Batch Size", description="The number of records to send to Weaviate in each batch", default=128)
    text_field: str = Field(title="Text Field", description="The field in the object that contains the embedded text", default="text")
    default_vectorizer: str = Field(
        title="Default Vectorizer",
        description="The vectorizer to use if new classes need to be created",
        default="none",
        enum=[
            "none",
            "text2vec-cohere",
            "text2vec-huggingface",
            "text2vec-openai",
            "text2vec-palm",
            "text2vec-contextionary",
            "text2vec-transformers",
            "text2vec-gpt4all",
        ],
    )
    additional_headers: List[Header] = Field(
        title="Additional headers",
        description="Additional HTTP headers to send with every request.",
        default=[],
        examples=[{"header_key": "X-OpenAI-Api-Key", "value": "my-openai-api-key"}],
    )

    class Config:
        title = "Indexing"
        schema_extra = {
            "group": "indexing",
            "description": "Indexing configuration",
        }


class NoEmbeddingConfigModel(BaseModel):
    mode: Literal["no_embedding"] = Field("no_embedding", const=True)

    class Config:
        title = "No external embedding"
        schema_extra = {
            "description": "Do not calculate and pass embeddings to Weaviate. Suitable for clusters with configured vectorizers to calculate embeddings within Weaviate or for classes that should only support regular text search."
        }


class ConfigModel(BaseModel):
    processing: ProcessingConfigModel
    embedding: Union[
        NoEmbeddingConfigModel,
        AzureOpenAIEmbeddingConfigModel,
        OpenAIEmbeddingConfigModel,
        CohereEmbeddingConfigModel,
        FromFieldEmbeddingConfigModel,
        FakeEmbeddingConfigModel,
    ] = Field(..., title="Embedding", description="Embedding configuration", discriminator="mode", group="embedding", type="object")
    indexing: WeaviateIndexingConfigModel

    class Config:
        title = "Weaviate Destination Config"
        schema_extra = {
            "groups": [
                {"id": "processing", "title": "Processing"},
                {"id": "embedding", "title": "Embedding"},
                {"id": "indexing", "title": "Indexing"},
            ]
        }

    @staticmethod
    def remove_discriminator(schema: dict) -> None:
        """pydantic adds "discriminator" to the schema for oneOfs, which is not treated right by the platform as we inline all references"""
        dpath.util.delete(schema, "properties/*/discriminator")
        dpath.util.delete(schema, "properties/**/discriminator")

    @classmethod
    def schema(cls):
        """we're overriding the schema classmethod to enable some post-processing"""
        schema = super().schema()
        schema = resolve_refs(schema)
        cls.remove_discriminator(schema)
        return schema
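The `auth` field is a discriminated union: the value of `mode` decides which of `TokenAuth`, `UsernamePasswordAuth` or `NoAuth` validates the rest of the object. The sketch below mimics that dispatch with plain dataclasses instead of pydantic. `parse_auth`, `AUTH_MODELS` and the `is_cloud` flag are hypothetical; the Cloud check is only loosely modeled on the "disallow no auth on cloud" commit, since this diff does not show how the connector enforces it.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type, Union


# Plain-dataclass stand-ins for the pydantic auth models above (assumption:
# the real connector relies on pydantic's discriminated-union parsing).
@dataclass
class TokenAuth:
    token: str
    mode: str = "token"


@dataclass
class UsernamePasswordAuth:
    username: str
    password: str
    mode: str = "username_password"


@dataclass
class NoAuth:
    mode: str = "no_auth"


AuthConfig = Union[TokenAuth, UsernamePasswordAuth, NoAuth]

# The "mode" discriminator value selects exactly one model.
AUTH_MODELS: Dict[str, Type[Any]] = {
    "token": TokenAuth,
    "username_password": UsernamePasswordAuth,
    "no_auth": NoAuth,
}


def parse_auth(raw: Dict[str, Any], is_cloud: bool = False) -> AuthConfig:
    """Pick the auth model named by raw["mode"] and build it from the remaining keys."""
    mode = raw.get("mode")
    if mode not in AUTH_MODELS:
        raise ValueError(f"Unknown auth mode: {mode!r}")
    if is_cloud and mode == "no_auth":
        # Hypothetical guard, inspired by the "disallow no auth on cloud" commit.
        raise ValueError("No authentication is not allowed on Airbyte Cloud")
    fields = {key: value for key, value in raw.items() if key != "mode"}
    return AUTH_MODELS[mode](**fields)


print(parse_auth({"mode": "token", "token": "secret"}))  # TokenAuth(token='secret', mode='token')
```

Keying the lookup on a single literal field is what lets the spec render as a `oneOf` with one selectable option per authentication method.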
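The indexing options map onto two routine jobs for the writer: turning `additional_headers` into a header mapping and sending records in groups of `batch_size` (default 128). How the connector actually does this is not part of this diff; `headers_to_dict` and `batched` below are hypothetical standard-library sketches.

```python
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def headers_to_dict(headers: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten [{"header_key": ..., "value": ...}, ...] into a plain header mapping."""
    return {header["header_key"]: header["value"] for header in headers}


def batched(records: Iterable[T], batch_size: int = 128) -> Iterator[List[T]]:
    """Yield lists of at most batch_size records; 128 mirrors the config default."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


headers = headers_to_dict([{"header_key": "X-OpenAI-Api-Key", "value": "my-openai-api-key"}])
sizes = [len(batch) for batch in batched(range(300), batch_size=128)]
print(headers)  # {'X-OpenAI-Api-Key': 'my-openai-api-key'}
print(sizes)  # [128, 128, 44]
```

The list-of-objects shape for headers keeps each value individually markable as `airbyte_secret`, which a free-form mapping in the spec could not do.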
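`ConfigModel.schema()` post-processes pydantic's output in two steps: `resolve_refs` inlines every `$ref`, then `remove_discriminator` deletes the `discriminator` keys pydantic emits next to each `oneOf`, since the platform mishandles them once references are inlined. The stand-ins below reproduce that pipeline with the standard library only. They are hypothetical (`inline_refs`, `strip_discriminators`, `post_process`), assume pydantic v1's `#/definitions/<Name>` reference layout and no recursive references, and drop the `definitions` block after inlining, which may differ from what `resolve_refs` does.

```python
from typing import Any, Dict

REF_PREFIX = "#/definitions/"


def inline_refs(node: Any, definitions: Dict[str, Any]) -> Any:
    """Return a copy of node with every {"$ref": "#/definitions/Name"} replaced by that definition."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(REF_PREFIX):
            # Assumes no cycles; a self-referencing definition would recurse forever.
            return inline_refs(definitions[ref[len(REF_PREFIX):]], definitions)
        return {key: inline_refs(value, definitions) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, definitions) for item in node]
    return node


def strip_discriminators(node: Any, inside_property: bool = False) -> None:
    """Delete "discriminator" keys at any depth below a "properties" entry, in place."""
    if isinstance(node, dict):
        if inside_property:
            node.pop("discriminator", None)
        for key, value in node.items():
            if key == "properties" and isinstance(value, dict):
                # Each value under "properties" is a property schema; strip within it.
                for property_schema in value.values():
                    strip_discriminators(property_schema, True)
            else:
                strip_discriminators(value, inside_property)
    elif isinstance(node, list):
        for item in node:
            strip_discriminators(item, inside_property)


def post_process(schema: Dict[str, Any]) -> Dict[str, Any]:
    definitions = schema.get("definitions", {})
    body = {key: value for key, value in schema.items() if key != "definitions"}
    resolved = inline_refs(body, definitions)
    strip_discriminators(resolved)
    return resolved


raw = {
    "title": "Weaviate Destination Config",
    "type": "object",
    "properties": {
        "auth": {
            "title": "Authentication",
            "discriminator": {"propertyName": "mode"},
            "oneOf": [{"$ref": "#/definitions/TokenAuth"}],
        }
    },
    "definitions": {
        "TokenAuth": {
            "title": "API Token",
            "type": "object",
            "properties": {"mode": {"const": "token"}, "token": {"type": "string"}},
        }
    },
}
processed = post_process(raw)
print(processed["properties"]["auth"])
```

After processing, `auth` carries an inlined `oneOf` entry titled "API Token" and no `discriminator` key, which is the shape the platform can render.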
Should this logic be moved to a post-build shell script? I heard Dockerfiles are on their way out.
(Just a question for consideration; I wouldn't block on this.)
To my knowledge, this new Dockerfile is closer to the "default logic", so I don't expect it to cause any problems.