
[agents] Add support for AstraDB Collections (Astra Vector DB, using Stargate) #731

Merged: 15 commits, merged on Nov 23, 2023


eolivelli (Member) commented on Nov 20, 2023

Summary:

  • add support for Astra Collections: assets, queries (similarity search, regular queries and execute-statement) and the vector-db-writer sink
  • add examples

With this patch there is a new vector database type: "astra-vector-db".

If you use "service=astra", you get the classic CQL-based APIs; if you use "service=astra-vector-db", you get the JSON API, which is based on Stargate (https://github.com/stargate).

The sample application and the integration tests contain examples of how to use it.
We will follow up in the documentation repository with a detailed reference for all the commands.

Quick overview

configuration.yaml:

configuration:
  resources:
  - type: "datasource"
    name: "AstraDatasource"
    configuration:
      service: "astra-vector-db"
      token: "${secrets.astra-vector-db.token}"
      endpoint: "${secrets.astra-vector-db.endpoint}"
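The `${secrets.<id>.<key>}` placeholders keep the token and endpoint out of the pipeline file; they are filled in from the application's secrets at deploy time. The Python sketch below shows one way such placeholders can be resolved against a secrets map. It is an illustration only, not LangStream's resolver: the `SECRETS` contents, the `resolve_secrets` helper and all sample values are hypothetical.

```python
import re

# Hypothetical stand-in for the application's secrets: secret id -> key -> value.
# The values are placeholders, not real credentials or endpoints.
SECRETS = {
    "astra-vector-db": {
        "token": "example-token",
        "endpoint": "https://example-endpoint.invalid",
    }
}

# Matches ${secrets.<secret-id>.<key>}; the secret id may contain dashes.
PLACEHOLDER = re.compile(r"\$\{secrets\.([A-Za-z0-9_-]+)\.([A-Za-z0-9_-]+)\}")


def resolve_secrets(value: str, secrets: dict) -> str:
    """Replace every ${secrets.id.key} placeholder in `value` with its secret."""
    def lookup(match: re.Match) -> str:
        secret_id, key = match.group(1), match.group(2)
        try:
            return secrets[secret_id][key]
        except KeyError:
            raise KeyError(f"unknown secret: {secret_id}.{key}") from None
    return PLACEHOLDER.sub(lookup, value)


# The datasource configuration from the example above, before resolution.
datasource = {
    "service": "astra-vector-db",
    "token": "${secrets.astra-vector-db.token}",
    "endpoint": "${secrets.astra-vector-db.endpoint}",
}
resolved = {key: resolve_secrets(value, SECRETS) for key, value in datasource.items()}
print(resolved["endpoint"])  # https://example-endpoint.invalid
```

Values without placeholders, such as `service`, pass through unchanged, and an unknown secret fails loudly instead of leaving a literal `${...}` in the configuration.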

Similarity search:

  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "AstraDatasource"
      query: |
          {
              "collection-name": "documents",
              "limit": 20,
              "vector": ?
          }
      fields:
        - "value.question_embeddings"
      output-field: "value.related_documents"
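In the query template, each `?` is a positional parameter filled from the entries listed under `fields`. Here that entry is the embedding vector taken from `value.question_embeddings`. Below is a minimal Python sketch of that binding step. It assumes a simple nested-dict record and uses hypothetical helpers (`get_field`, `bind_query`) rather than the agent's actual code.

```python
import json

# The query template from the agent configuration above; `?` is a parameter.
QUERY_TEMPLATE = """
{
    "collection-name": "documents",
    "limit": 20,
    "vector": ?
}
"""


def get_field(record: dict, path: str):
    """Resolve a dotted path such as 'value.question_embeddings' against a nested dict."""
    current = record
    for part in path.split("."):
        current = current[part]
    return current


def bind_query(template: str, record: dict, fields: list) -> dict:
    """Substitute each '?' in order with the JSON encoding of the matching field."""
    values = [get_field(record, path) for path in fields]
    pieces = template.split("?")  # naive: assumes no '?' inside string literals
    if len(pieces) - 1 != len(values):
        raise ValueError(f"template has {len(pieces) - 1} parameters, got {len(values)} fields")
    bound = pieces[0]
    for value, piece in zip(values, pieces[1:]):
        bound += json.dumps(value) + piece
    return json.loads(bound)


record = {"value": {"question": "Which documents mention Stargate", "question_embeddings": [0.1, 0.2, 0.3]}}
query = bind_query(QUERY_TEMPLATE, record, ["value.question_embeddings"])
print(query)  # {'collection-name': 'documents', 'limit': 20, 'vector': [0.1, 0.2, 0.3]}
```

Splitting on `?` is deliberately simple. A `?` inside a JSON string literal in the template would be mistaken for a parameter, so a robust implementation needs a real tokenizer.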

Vector DB Sink:

  - name: "Write to Astra"
    type: "vector-db-sink"
    input: "chunks-topic"
    configuration:
      datasource: "AstraDatasource"
      collection-name: "documents"
      fields:
        - name: "id"
          expression: "fn:concat(value.filename, '-', value.chunk_id)"
        - name: "vector"
          expression: "value.embeddings_vector"
        - name: "text"
          expression: "value.text"
        - name: "filename"
          expression: "value.filename"
        - name: "chunk_id"
          expression: "value.chunk_id"
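Each entry under `fields` names a property of the document written to the collection. It also gives an expression that computes that property from the incoming record. The Python sketch below evaluates only the two expression forms used above (dotted paths and `fn:concat`) with a hypothetical mini-evaluator. It is not LangStream's expression language, and the record shape and sample values are assumptions.

```python
# Hypothetical mini-evaluator for the two expression forms used in the sink config.

def get_field(record: dict, path: str):
    """Resolve a dotted path such as 'value.filename' against a nested dict."""
    current = record
    for part in path.split("."):
        current = current[part]
    return current


def evaluate(expression: str, record: dict):
    """Evaluate a dotted path, a single-quoted literal, or fn:concat(...)."""
    expr = expression.strip()
    if expr.startswith("fn:concat(") and expr.endswith(")"):
        # Naive argument split: assumes no commas inside quoted literals.
        args = expr[len("fn:concat("):-1].split(",")
        return "".join(str(evaluate(arg, record)) for arg in args)
    if len(expr) >= 2 and expr[0] == "'" and expr[-1] == "'":
        return expr[1:-1]  # string literal
    return get_field(record, expr)


# Mirrors the `fields` section of the "Write to Astra" sink above.
FIELDS = [
    ("id", "fn:concat(value.filename, '-', value.chunk_id)"),
    ("vector", "value.embeddings_vector"),
    ("text", "value.text"),
    ("filename", "value.filename"),
    ("chunk_id", "value.chunk_id"),
]


def build_document(record: dict) -> dict:
    """Compute every configured field for one incoming record."""
    return {name: evaluate(expr, record) for name, expr in FIELDS}


record = {
    "value": {
        "filename": "doc.pdf",
        "chunk_id": 3,
        "text": "Sample chunk text.",
        "embeddings_vector": [0.5, 0.25],
    }
}
document = build_document(record)
print(document["id"])  # doc.pdf-3
```

Building the `id` from the filename and the chunk number makes it deterministic. Re-processing the same chunk therefore targets the same document id rather than creating an unrelated new one.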

eolivelli changed the title from "[agents] Add support for AstraDB Document API" to "[agents] Add support for AstraDB Collections" on Nov 21, 2023
eolivelli marked this pull request as ready for review on November 23, 2023 at 08:29
eolivelli changed the title from "[agents] Add support for AstraDB Collections" to "[agents] Add support for AstraDB Collections (Astra Vector DB, using Stargate)" on Nov 23, 2023
eolivelli merged commit 23f9b6d into main on Nov 23, 2023 (10 checks passed)
eolivelli deleted the impl/astra-db-json-api branch on November 23, 2023 at 14:30
benfrank241 pushed a commit to vectorize-io/langstream that referenced this pull request May 2, 2024