DA 1153 Autovec Unstructured Data #57

giriraj-singh-couchbase · 2025-09-26T19:37:49Z

This pull request introduces a new tutorial for using Couchbase Capella's AI Services auto-vectorization feature with LangChain, focusing on unstructured data workflows—especially data stored in S3 buckets. The changes add comprehensive documentation and a runnable Jupyter notebook that walks users through deploying models, configuring workflows, importing unstructured data, and performing semantic vector search with LangChain.

The most important changes are:

Documentation and Tutorial Content:

Added a detailed README.md explaining prerequisites, installation steps, and a quick start guide for the auto-vectorization tutorial.
Added frontmatter.md to provide metadata and summary information for the tutorial, including title, description, tags, and estimated duration.

Jupyter Notebook Tutorial:

Introduced autovec_unstructured.ipynb, a step-by-step notebook covering:
- Capella cluster and model deployment.
- Setting up access control and API keys for AI models.
- Creating and configuring auto-vectorization workflows to import and process unstructured data from S3.
- Performing semantic search using vector embeddings and LangChain integration, with code samples and explanations.

github-actions · 2025-09-26T19:37:59Z

Caution

Notebooks or Frontmatter Files Have Been Modified

Please ensure that a frontmatter.md file is accompanying the notebook file, and that the frontmatter is up to date.
These changes will be published to the developer portal tutorials only if frontmatter.md is included.
Proofread all changes before merging, as changes to notebook and frontmatter content will update the developer tutorial.

1 Notebook Files Modified:

| Notebook File | Frontmatter Included? |
| --- | --- |
| `autovec_unstructured/autovec_unstructured.ipynb` | ✅ |

1 Frontmatter Files Modified:

Frontmatter File
`autovec_unstructured/frontmatter.md`
Note: frontmatter will be checked and tested in the Test Frontmatter workflow.

gemini-code-assist · 2025-09-26T19:38:00Z

Summary of Changes

Hello @giriraj-singh-couchbase, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a new, comprehensive tutorial designed to educate users on leveraging Couchbase Capella's AI Services for auto-vectorization of unstructured data sourced from S3 buckets. The tutorial provides a practical, step-by-step guide, including a Jupyter notebook, to set up and execute workflows that transform raw data into vector embeddings, enabling advanced semantic search capabilities through LangChain integration. The primary goal is to empower users to efficiently manage and query unstructured data within the Capella ecosystem.

Highlights

New Tutorial Introduction: This pull request introduces a new tutorial demonstrating Couchbase Capella's AI Services auto-vectorization feature for unstructured data, specifically focusing on data stored in S3 buckets.
LangChain Integration: The tutorial showcases how to integrate with LangChain for performing semantic vector search on the auto-vectorized data.
Comprehensive Documentation: A detailed README.md and frontmatter.md have been added to provide prerequisites, installation steps, and metadata for the tutorial.
Interactive Jupyter Notebook: A runnable Jupyter notebook (autovec_unstructured.ipynb) is included, guiding users through Capella cluster and model deployment, access control setup, configuring auto-vectorization workflows for S3 data, and executing semantic search.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a new tutorial for using Couchbase Capella's AI Services auto-vectorization feature. While the tutorial is comprehensive, there are several areas that need improvement. Critically, the Jupyter notebook references images that are not included in the pull request, which will prevent users from following the visual steps. There are also significant structural issues, such as incorrect section numbering and confusing instructions that reference incorrect data sources. Additionally, there are opportunities to improve code quality by removing unused imports, using environment variables for credentials to promote security best practices, and fixing minor typos and grammatical errors. Addressing these points will greatly improve the quality and usability of the tutorial.
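One of the review points above, reading credentials from environment variables rather than hard-coding them in the notebook, could look roughly like the sketch below. The variable names (`CB_CONNECTION_STRING`, `CB_USERNAME`, `CB_PASSWORD`) and the `load_capella_settings` helper are hypothetical and are not taken from the notebook.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CapellaSettings:
    """Connection settings for the tutorial (field names are illustrative)."""
    connection_string: str
    username: str
    password: str


def load_capella_settings(env=None):
    """Read settings from environment variables instead of notebook literals.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    The variable names below are assumptions, not the notebook's actual names.
    """
    env = os.environ if env is None else env
    required = ("CB_CONNECTION_STRING", "CB_USERNAME", "CB_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message rather than connecting with blanks.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return CapellaSettings(
        connection_string=env["CB_CONNECTION_STRING"],
        username=env["CB_USERNAME"],
        password=env["CB_PASSWORD"],
    )
```

In a notebook, the values would typically be exported in the shell or loaded from a local `.env` file before the kernel starts, so that no secret is committed with the `.ipynb` file.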

autovec_unstructured/autovec_unstructured.ipynb

autovec_unstructured/README.md

autovec_unstructured/autovec_unstructured.ipynb

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

nithishr · 2025-10-02T14:07:33Z

autovec_unstructured/frontmatter.md

+title: Auto-Vectorization with Couchbase Capella AI Services and LangChain
+short_title: Auto-Vectorization with Couchbase and LangChain
+description:
+  - Learn how to use Couchbase Capella's AI Services auto-vectorization feature to automatically convert your data into vector embeddings.


Suggested wording: "convert your unstructured data into vector embeddings"

nithishr · 2025-10-02T14:14:38Z

autovec_unstructured/autovec_unstructured.ipynb

Looks good for the most part.
A couple of questions/suggestions:

We should not show the example with TLS disabled. That is insecure, and most end users will never need it, because production clusters do not require it (the SDK bundles the certificates for production clusters).
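As a rough illustration of this point, the notebook could guard against an insecure connection string before connecting. The sketch below assumes that TLS verification was switched off through a `tls_verify=none` connection-string parameter, which is a guess about the notebook rather than something stated in this thread; the `check_secure_connection_string` helper and the example hostname are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def check_secure_connection_string(conn_str):
    """Return conn_str unchanged if it keeps TLS on; raise ValueError otherwise.

    Assumption: verification is disabled via a `tls_verify=none` query parameter.
    """
    parts = urlsplit(conn_str)
    # `couchbases://` is the TLS scheme; plain `couchbase://` is unencrypted.
    if parts.scheme != "couchbases":
        raise ValueError("use the TLS scheme 'couchbases://'")
    params = parse_qs(parts.query)
    if params.get("tls_verify", [""])[0].lower() == "none":
        raise ValueError("remove 'tls_verify=none'; certificate checks must stay on")
    return conn_str


# Hypothetical endpoint, for illustration only.
SAFE = check_secure_connection_string("couchbases://cb.example.cloud.couchbase.com")
```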

Can you try using the OpenAI LangChain package instead of the NVIDIA one, since that is what we recommend to end users? You would need to set a few parameters, but it should work, unless there is documentation recommending NVIDIA over OpenAI that I have missed. You can find examples that use the OpenAI package in the Capella AI notebooks.

Can you also use a better search term? The current example looks more like full-text search (FTS) than semantic search. We want to show the power of semantic search.
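The difference between the two kinds of search can be sketched with toy data. In the example below, the documents, the query, and the three-dimensional "embeddings" are all invented for illustration; a real run would use vectors produced by the deployed embedding model. The query shares no words with the relevant document, so a keyword match finds nothing, while ranking by cosine similarity still puts that document first.

```python
import math

# Toy corpus: text plus a hand-made vector standing in for a real embedding.
DOCS = {
    "doc1": ("Steps to reset a forgotten password", [0.9, 0.1, 0.0]),
    "doc2": ("Quarterly revenue summary", [0.0, 0.2, 0.9]),
}
QUERY_TEXT = "cannot sign in"
QUERY_VECTOR = [0.8, 0.3, 0.1]  # made-up embedding of the query


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def keyword_hits(query, docs):
    """Return ids of documents sharing at least one whole word with the query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, (text, _) in docs.items()
            if terms & set(text.lower().split())]


def semantic_ranking(query_vector, docs):
    """Return document ids ordered from most to least similar to the query."""
    return sorted(docs,
                  key=lambda doc_id: cosine_similarity(query_vector, docs[doc_id][1]),
                  reverse=True)


print(keyword_hits(QUERY_TEXT, DOCS))        # no shared words, so no hits
print(semantic_ranking(QUERY_VECTOR, DOCS))  # doc1 ranks ahead of doc2
```

A search term chosen this way, phrased differently from the stored text but close in meaning, is the kind of query that distinguishes a semantic search demo from a full-text one.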

giriraj-singh-couchbase added 4 commits September 26, 2025 05:07

document updated

7088c3e

updated the document

eec1a72

fixed some issues

fa4de94

fixed text formatting

84832ae

gemini-code-assist bot reviewed Sep 26, 2025


giriraj-singh-couchbase and others added 2 commits September 27, 2025 01:20

Apply suggestions from code review

c5d565f

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

fixed index name

055b78f

nithishr reviewed Oct 2, 2025


nithishr mentioned this pull request Oct 2, 2025

Autovectorization Tutorial #54

Open
