S3, Azure Blob Storage, GCS, Pinecone, Weaviate, Milvus, Chroma, Qdrant: Update CDK to improve spec generation #32357

Merged: 28 commits into master from flash1293/fix-specs, Nov 14, 2023

Commits (28)
c7c4340  fix specs (Nov 9, 2023)
9650313  Merge remote-tracking branch 'origin/master' into flash1293/fix-specs (Nov 9, 2023)
22b204f  fix specs (Nov 9, 2023)
e94dcdd  fix markdown formatting (Nov 9, 2023)
69649aa  Automated Commit - Formatting Changes (flash1293, Nov 9, 2023)
d76b3ac  fix tests (Nov 10, 2023)
d4c5901  Merge remote-tracking branch 'origin/master' into flash1293/fix-specs (Nov 10, 2023)
8c75ff6  Merge branch 'flash1293/fix-specs' of github.com:airbytehq/airbyte in… (Nov 10, 2023)
700bef9  Automated Commit - Formatting Changes (flash1293, Nov 10, 2023)
e840fdb  fix test (Nov 10, 2023)
99d80fa  fix dockerfile (Nov 10, 2023)
cde80bb  Merge remote-tracking branch 'origin/master' into flash1293/fix-specs (Nov 10, 2023)
376e3bc  revert google drive (Nov 10, 2023)
6da80dd  switch to shared base image for gcs (Nov 10, 2023)
b4a3f59  fix (Nov 10, 2023)
96c77ef  Automated Commit - Formatting Changes (flash1293, Nov 10, 2023)
6305b3a  Merge branch 'master' into flash1293/fix-specs (Nov 13, 2023)
bc88fbe  fix build (Nov 13, 2023)
5bf0aef  Merge branch 'master' into flash1293/fix-specs (Nov 13, 2023)
f3b10e0  bump (Nov 13, 2023)
d13cae4  Merge branch 'flash1293/fix-specs' of github.com:airbytehq/airbyte in… (Nov 13, 2023)
742feda  fix (Nov 13, 2023)
cbb5625  Revert "fix" (Nov 13, 2023)
82c6c7f  Merge branch 'master' into flash1293/fix-specs (Nov 14, 2023)
3699618  Merge remote-tracking branch 'origin/master' into flash1293/fix-specs (Nov 14, 2023)
6ee2908  fix build (Nov 14, 2023)
01adf04  Merge branch 'flash1293/fix-specs' of github.com:airbytehq/airbyte in… (Nov 14, 2023)
5385300  fix specs (Nov 14, 2023)
Files changed
destination-chroma: Dockerfile

@@ -41,5 +41,5 @@ COPY destination_chroma ./destination_chroma
 ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
 ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

-LABEL io.airbyte.version=0.0.5
+LABEL io.airbyte.version=0.0.6
 LABEL io.airbyte.name=airbyte/destination-chroma
destination-chroma: metadata.yaml

@@ -7,7 +7,7 @@ data:
   connectorSubtype: vectorstore
   connectorType: destination
   definitionId: 0b75218b-f702-4a28-85ac-34d3d84c0fc2
-  dockerImageTag: 0.0.5
+  dockerImageTag: 0.0.6
   dockerRepository: airbyte/destination-chroma
   githubIssueLabel: destination-chroma
   icon: chroma.svg
destination-chroma: setup.py

@@ -6,7 +6,7 @@
 from setuptools import find_packages, setup

 MAIN_REQUIREMENTS = [
-    "airbyte-cdk[vector-db-based]==0.51.41",
+    "airbyte-cdk[vector-db-based]==0.53.3",
     "chromadb",
 ]

destination-milvus: config model (Python)

@@ -14,6 +14,7 @@
     OpenAIEmbeddingConfigModel,
     ProcessingConfigModel,
 )
+from airbyte_cdk.utils.oneof_option_config import OneOfOptionConfig
 from airbyte_cdk.utils.spec_schema_transformations import resolve_refs
 from pydantic import BaseModel, Field

@@ -23,28 +24,29 @@ class UsernamePasswordAuth(BaseModel):
     username: str = Field(..., title="Username", description="Username for the Milvus instance", order=1)
     password: str = Field(..., title="Password", description="Password for the Milvus instance", airbyte_secret=True, order=2)

-    class Config:
+    class Config(OneOfOptionConfig):
         title = "Username/Password"
-        schema_extra = {"description": "Authenticate using username and password (suitable for self-managed Milvus clusters)"}
+        description = "Authenticate using username and password (suitable for self-managed Milvus clusters)"
+        discriminator = "mode"


 class NoAuth(BaseModel):
     mode: Literal["no_auth"] = Field("no_auth", const=True)

-    class Config:
+    class Config(OneOfOptionConfig):
         title = "No auth"
-        schema_extra = {
-            "description": "Do not authenticate (suitable for locally running test clusters, do not use for clusters with public IP addresses)"
-        }
+        description = "Do not authenticate (suitable for locally running test clusters, do not use for clusters with public IP addresses)"
+        discriminator = "mode"


 class TokenAuth(BaseModel):
     mode: Literal["token"] = Field("token", const=True)
     token: str = Field(..., title="API Token", description="API Token for the Milvus instance", airbyte_secret=True)

-    class Config:
+    class Config(OneOfOptionConfig):
         title = "API Token"
-        schema_extra = {"description": "Authenticate using an API token (suitable for Zilliz Cloud)"}
+        description = "Authenticate using an API token (suitable for Zilliz Cloud)"
+        discriminator = "mode"


 class MilvusIndexingConfigModel(BaseModel):
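The switch from hand-rolled `schema_extra` dicts to the shared `OneOfOptionConfig` base is what drives most of the spec changes below: the CDK helper copies each option's description into the generated JSON schema and forces the discriminator field (`mode`) into the option's `required` list. A minimal sketch of the idea, not the CDK's exact implementation:

```python
from typing import Any, Dict


class OneOfOptionConfig:
    """Sketch of a shared pydantic Config base for models used as oneOf options.

    Subclasses declare `title`, `description` and (optionally) `discriminator`;
    pydantic invokes `schema_extra` while generating the JSON schema.
    """

    @staticmethod
    def schema_extra(schema: Dict[str, Any], model: Any) -> None:
        # Surface the option's description in the generated schema.
        if hasattr(model.Config, "description"):
            schema["description"] = model.Config.description
        # Require the discriminator (e.g. "mode") so consumers can tell
        # the oneOf options apart without a vendor-specific keyword.
        if hasattr(model.Config, "discriminator"):
            required = schema.setdefault("required", [])
            if model.Config.discriminator not in required:
                required.append(model.Config.discriminator)
```

With a base like that, an option such as `TokenAuth` above only declares its title, description and discriminator, and the `"required": [..., "mode"]` entries in the specs fall out automatically.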
destination-milvus: connector spec (JSON)

@@ -11,6 +11,7 @@
     "chunk_size": {
       "title": "Chunk size",
       "description": "Size of chunks in tokens to store in vector store (make sure it is not too big for the context if your LLM)",
+      "minimum": 1,
       "maximum": 8191,
       "type": "integer"
     },
@@ -91,6 +92,7 @@
           "type": "boolean"
         }
       },
+      "required": ["mode"],
       "description": "Split the text by the list of separators until the chunk size is reached, using the earlier mentioned separators where possible. This is useful for splitting text fields by paragraphs, sentences, words, etc."
     },
     {
@@ -113,6 +115,7 @@
           "type": "integer"
         }
       },
+      "required": ["mode"],
       "description": "Split the text by Markdown headers down to the specified header level. If the chunk size fits multiple sections, they will be combined into a single chunk."
     },
     {
@@ -150,7 +153,7 @@
           "type": "string"
         }
       },
-      "required": ["language"],
+      "required": ["language", "mode"],
       "description": "Split the text by suitable delimiters based on the programming language. This is useful for splitting code into chunks."
     }
   ]
@@ -182,7 +185,7 @@
           "type": "string"
         }
       },
-      "required": ["openai_key"],
+      "required": ["openai_key", "mode"],
       "description": "Use the OpenAI API to embed text. This option is using the text-embedding-ada-002 model with 1536 embedding dimensions."
     },
     {
@@ -202,7 +205,7 @@
           "type": "string"
         }
       },
-      "required": ["cohere_key"],
+      "required": ["cohere_key", "mode"],
      "description": "Use the Cohere API to embed text."
     },
     {
@@ -217,6 +220,7 @@
           "type": "string"
         }
       },
+      "required": ["mode"],
       "description": "Use a fake embedding made out of random vectors with 1536 embedding dimensions. This is useful for testing the data pipeline without incurring any costs."
     },
     {
@@ -243,7 +247,7 @@
           "type": "integer"
         }
       },
-      "required": ["field_name", "dimensions"],
+      "required": ["field_name", "dimensions", "mode"],
       "description": "Use a field in the record as the embedding. This is useful if you already have an embedding for your data and want to store it in the vector store."
     },
     {
@@ -276,7 +280,7 @@
           "type": "string"
         }
       },
-      "required": ["openai_key", "api_base", "deployment"],
+      "required": ["openai_key", "api_base", "deployment", "mode"],
       "description": "Use the Azure-hosted OpenAI API to embed text. This option is using the text-embedding-ada-002 model with 1536 embedding dimensions."
     },
     {
@@ -316,7 +320,7 @@
           "type": "integer"
         }
       },
-      "required": ["base_url", "dimensions"],
+      "required": ["base_url", "dimensions", "mode"],
       "description": "Use a service that's compatible with the OpenAI API to embed text."
     }
   ]
@@ -372,7 +376,7 @@
           "type": "string"
         }
       },
-      "required": ["token"],
+      "required": ["token", "mode"],
       "description": "Authenticate using an API token (suitable for Zilliz Cloud)"
     },
     {
@@ -400,7 +404,7 @@
           "type": "string"
         }
       },
-      "required": ["username", "password"],
+      "required": ["username", "password", "mode"],
       "description": "Authenticate using username and password (suitable for self-managed Milvus clusters)"
     },
     {
@@ -415,7 +419,8 @@
           "type": "string"
         }
       },
-      "description": "Do not authenticate (suitable for locally running test clusters, do not use for clusters with public IP addresses)"
+      "description": "Do not authenticate (suitable for locally running test clusters, do not use for clusters with public IP addresses)",
+      "required": ["mode"]
     }
   ]
 },
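Because the platform inlines all `$ref`s and strips pydantic's non-standard `discriminator` object (see `remove_discriminator` below), plain `oneOf` semantics are all that is left to tell the options apart, and that only works if every option requires its `mode` tag. A hedged illustration with the `jsonschema` library, using a fragment abridged from the auth options above:

```python
from jsonschema import ValidationError, validate

# Abridged two-option fragment; the real spec carries titles, descriptions, etc.
auth_schema = {
    "oneOf": [
        {
            "properties": {"mode": {"const": "token"}, "token": {"type": "string"}},
            "required": ["token", "mode"],
        },
        {
            "properties": {"mode": {"const": "no_auth"}},
            "required": ["mode"],
        },
    ]
}

# Exactly one branch matches once "mode" is required everywhere:
validate({"mode": "token", "token": "abc"}, auth_schema)

# Without "mode", no branch is satisfied and validation fails:
try:
    validate({"token": "abc"}, auth_schema)
except ValidationError:
    print("rejected: 'mode' is required to disambiguate the options")
```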
destination-milvus: metadata.yaml

@@ -22,7 +22,7 @@ data:
   connectorSubtype: vectorstore
   connectorType: destination
   definitionId: 65de8962-48c9-11ee-be56-0242ac120002
-  dockerImageTag: 0.0.8
+  dockerImageTag: 0.0.9
   dockerRepository: airbyte/destination-milvus
   githubIssueLabel: destination-milvus
   icon: milvus.svg
destination-milvus: setup.py

@@ -5,7 +5,7 @@

 from setuptools import find_packages, setup

-MAIN_REQUIREMENTS = ["airbyte-cdk[vector-db-based]==0.51.41", "pymilvus==2.3.0"]
+MAIN_REQUIREMENTS = ["airbyte-cdk[vector-db-based]==0.53.3", "pymilvus==2.3.0"]

 TEST_REQUIREMENTS = ["pytest~=6.2"]

destination-pinecone: config model (Python)

@@ -62,7 +62,7 @@ class Config:
     @staticmethod
     def remove_discriminator(schema: dict) -> None:
         """pydantic adds "discriminator" to the schema for oneOfs, which is not treated right by the platform as we inline all references"""
-        dpath.util.delete(schema, "properties/*/discriminator")
+        dpath.util.delete(schema, "properties/**/discriminator")

     @classmethod
     def schema(cls):
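The glob change matters because in `dpath` a single `*` matches exactly one path segment, while `**` matches any depth, and pydantic also emits `discriminator` objects on nested oneOf fields (such as auth options nested inside an indexing section). A self-contained illustration, with a sample schema invented for the example:

```python
import dpath.util

schema = {
    "properties": {
        "embedding": {"discriminator": {"propertyName": "mode"}},
        "indexing": {
            "properties": {
                "auth": {"discriminator": {"propertyName": "mode"}},
            },
        },
    },
}

# "properties/*/discriminator" would only hit the top-level embedding entry;
# "properties/**/discriminator" also reaches the nested auth entry.
dpath.util.delete(schema, "properties/**/discriminator")

assert "discriminator" not in schema["properties"]["embedding"]
assert "discriminator" not in schema["properties"]["indexing"]["properties"]["auth"]
```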
destination-pinecone: connector spec (JSON)

@@ -53,7 +53,7 @@
           "type": "string"
         }
       },
-      "required": ["openai_key"],
+      "required": ["openai_key", "mode"],
       "description": "Use the OpenAI API to embed text. This option is using the text-embedding-ada-002 model with 1536 embedding dimensions."
     },
     {
@@ -73,7 +73,7 @@
           "type": "string"
         }
       },
-      "required": ["cohere_key"],
+      "required": ["cohere_key", "mode"],
       "description": "Use the Cohere API to embed text."
     },
     {
@@ -88,6 +88,7 @@
           "type": "string"
         }
       },
+      "required": ["mode"],
       "description": "Use a fake embedding made out of random vectors with 1536 embedding dimensions. This is useful for testing the data pipeline without incurring any costs."
     },
     {
@@ -120,7 +121,7 @@
           "type": "string"
         }
       },
-      "required": ["openai_key", "api_base", "deployment"],
+      "required": ["openai_key", "api_base", "deployment", "mode"],
       "description": "Use the Azure-hosted OpenAI API to embed text. This option is using the text-embedding-ada-002 model with 1536 embedding dimensions."
     },
     {
@@ -160,7 +161,7 @@
           "type": "integer"
         }
       },
-      "required": ["base_url", "dimensions"],
+      "required": ["base_url", "dimensions", "mode"],
       "description": "Use a service that's compatible with the OpenAI API to embed text."
     }
   ]
@@ -172,6 +173,7 @@
     "chunk_size": {
       "title": "Chunk size",
       "description": "Size of chunks in tokens to store in vector store (make sure it is not too big for the context if your LLM)",
+      "minimum": 1,
       "maximum": 8191,
       "type": "integer"
     },
@@ -226,14 +228,6 @@
     "title": "Text splitter",
     "description": "Split text fields into chunks based on the specified method.",
     "type": "object",
-    "discriminator": {
-      "propertyName": "mode",
-      "mapping": {
-        "separator": "#/definitions/SeparatorSplitterConfigModel",
-        "markdown": "#/definitions/MarkdownHeaderSplitterConfigModel",
-        "code": "#/definitions/CodeSplitterConfigModel"
-      }
-    },
     "oneOf": [
       {
         "title": "By Separator",
@@ -260,6 +254,7 @@
           "type": "boolean"
         }
       },
+      "required": ["mode"],
       "description": "Split the text by the list of separators until the chunk size is reached, using the earlier mentioned separators where possible. This is useful for splitting text fields by paragraphs, sentences, words, etc."
     },
     {
@@ -282,6 +277,7 @@
           "type": "integer"
         }
       },
+      "required": ["mode"],
       "description": "Split the text by Markdown headers down to the specified header level. If the chunk size fits multiple sections, they will be combined into a single chunk."
     },
     {
@@ -319,7 +315,7 @@
           "type": "string"
         }
       },
-      "required": ["language"],
+      "required": ["language", "mode"],
       "description": "Split the text by suitable delimiters based on the programming language. This is useful for splitting code into chunks."
     }
   ]
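The new `"minimum": 1` bound on `chunk_size` (here and in the Milvus spec above) comes from the CDK's shared processing model rather than from the connectors themselves. In pydantic v1 such bounds are declared with `Field(ge=..., le=...)` and rendered as `minimum`/`maximum` in the JSON schema; a hedged sketch of what such a declaration looks like (the CDK's actual field carries more metadata):

```python
from pydantic import BaseModel, Field


class ProcessingConfigModel(BaseModel):
    # ge/le are emitted as "minimum"/"maximum" in the generated JSON schema.
    chunk_size: int = Field(
        ...,
        title="Chunk size",
        description="Size of chunks in tokens to store in vector store",
        ge=1,
        le=8191,
    )


print(ProcessingConfigModel.schema()["properties"]["chunk_size"])
# {'title': 'Chunk size', ..., 'minimum': 1, 'maximum': 8191, 'type': 'integer'}
```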
destination-pinecone: metadata.yaml

@@ -13,7 +13,7 @@ data:
   connectorSubtype: vectorstore
   connectorType: destination
   definitionId: 3d2b6f84-7f0d-4e3f-a5e5-7c7d4b50eabd
-  dockerImageTag: 0.0.19
+  dockerImageTag: 0.0.20
   dockerRepository: airbyte/destination-pinecone
   documentationUrl: https://docs.airbyte.com/integrations/destinations/pinecone
   githubIssueLabel: destination-pinecone
destination-pinecone: setup.py

@@ -6,7 +6,7 @@
 from setuptools import find_packages, setup

 MAIN_REQUIREMENTS = [
-    "airbyte-cdk[vector-db-based]==0.51.41",
+    "airbyte-cdk[vector-db-based]==0.53.3",
     "pinecone-client[grpc]",
 ]

destination-qdrant: Dockerfile

@@ -41,5 +41,5 @@ COPY destination_qdrant ./destination_qdrant
 ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
 ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

-LABEL io.airbyte.version=0.0.6
+LABEL io.airbyte.version=0.0.7
 LABEL io.airbyte.name=airbyte/destination-qdrant
destination-qdrant: metadata.yaml

@@ -20,7 +20,7 @@ data:
   connectorSubtype: vectorstore
   connectorType: destination
   definitionId: 6eb1198a-6d38-43e5-aaaa-dccd8f71db2b
-  dockerImageTag: 0.0.6
+  dockerImageTag: 0.0.7
   dockerRepository: airbyte/destination-qdrant
   githubIssueLabel: destination-qdrant
   icon: qdrant.svg
destination-qdrant: setup.py

@@ -5,7 +5,7 @@

 from setuptools import find_packages, setup

-MAIN_REQUIREMENTS = ["airbyte-cdk[vector-db-based]==0.51.41", "qdrant-client", "fastembed"]
+MAIN_REQUIREMENTS = ["airbyte-cdk[vector-db-based]==0.53.3", "qdrant-client", "fastembed"]

 TEST_REQUIREMENTS = ["pytest~=6.2"]
