@nithishr nithishr commented Oct 22, 2025

Sync all the GSI Vector Search examples from the forked repo.

prajwal-pai77 and others added 27 commits August 26, 2025 14:39
* update: make dirs for fts and gsi

* update: fts tutorial's dependencies

* updated: intro and frontmatter to explicitly mention fts or gsi

* updated: frontmatter for gsi

* added: notebook for gsi tutorial for huggingface

* update: execution results in the fts and gsi tutorials

* added: env sample files

* update: spelling mistake in frontmatter and path

* updated: intro links to point to dev portal instead

* update: gsi frontmatter path

* update: based on comments by Prajwal

- restructure hugging_face notebook with improved section headings
- detailed explanations for GSI vector search and embedding processes
- performance comparison
- changed score to distance

* update: according to Nithish's comments

- Added more explanation for Composite Index
- Updated performance comparison
- corrected version info in gsi tutorial

---------

Co-authored-by: Viraj Agarwal <virajagarwal15@gmail.com>
* update: made dirs for fts and gsi

* update: dependencies of fts tutorial

* update: frontmatter for fts

* update: intro of fts tutorial

* add: tutorial and frontmatter for gsi

* update: execution results for fts and gsi tutorials

* update: frontmatter paths for gsi and fts

* update: link to gsi tutorial in fts to devportal link

* update: comments by Prajwal

- link to fts tutorial in gsi to devportal link
- removed INDEX_NAME env var
- removed Query Service check since it isn't confirmed to be working as expected
- removed setup task of creating a primary index
- removed vector dimension config since it is figured out by langchain integration
- added comment to make Composite index instead
- removed some false advantages mentioned in markdowns of Bhive

* update: gsi tutorial

- Improved section headings for better navigation and understanding.
- Added performance comparison of crew ai agent with and without gsi index.
- Added detailed explanations for setup, prerequisites, and GSI vector search.
- Streamlined content for readability and coherence.

* update: gsi tutorial

- performance testing to be simpler and without using crewai, just pure rag
- mentioned composite index much more explicitly

---------

Co-authored-by: Viraj Agarwal <virajagarwal15@gmail.com>
* update: made dirs for fts and gsi

* update: frontmatter of gsi and fts tutorial

* update: fts tutorial link and execution results

* added: env sample file and gsi tutorial

* update: gsi tutorial

- added perf comparison before and after gsi and cache
- fixed sections order and organization
- score -> distance

* update: GSI tutorial

- simplified performance testing
- added explicit composite index sub-heading

* update: based on Prajwal's comments

- Added cache setup
- removed comment for creating primary index
- Revised user queries and responses for better rag performance

---------

Co-authored-by: Viraj Agarwal <virajagarwal15@gmail.com>
* update: made dirs for fts and gsi

* update: dependencies and intro of fts tutorial

* update: frontmatter of fts tutorial

* fix: spelling mistake in fts frontmatter

* add: frontmatter and tutorial for gsi

* update: execution results in tutorial of fts and gsi

* update: frontmatter paths for gsi and fts

* update: standard changes in fts and gsi tutorials

- link to fts tutorial in gsi to devportal link
- added comment to make Composite index instead

* update: gsi tutorial

- Refined markdown sections for clarity and organization
- Updated score -> distance
- Comparison of performance before and after bhive

* update: enhance gsi tutorial

- simplify performance testing
- explicit composite index creation sub-heading

---------

Co-authored-by: Viraj Agarwal <virajagarwal15@gmail.com>
@gemini-code-assist

Summary of Changes

Hello @nithishr, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the vector search cookbook by integrating comprehensive examples for Couchbase's Global Secondary Index (GSI) vector search. The changes involve a systematic reorganization of existing Full Text Search (FTS) examples into dedicated subdirectories and the creation of new GSI-specific examples across various AI providers. This initiative aims to provide users with a broader spectrum of options for implementing high-performance semantic search, leveraging the distinct advantages of GSI for vector-first workloads. The updated documentation clarifies the different approaches and streamlines the setup process, ensuring a more intuitive and efficient user experience.

Highlights

  • Introduction of GSI Vector Search Examples: This pull request significantly expands the vector search cookbook by adding new examples that demonstrate Couchbase's Global Secondary Index (GSI) vector search capabilities across various AI providers. This provides users with an alternative, often more performant, approach compared to Full Text Search (FTS) for vector-first workloads.
  • Restructuring of Existing Examples: All existing Full Text Search (FTS) examples have been reorganized and moved into dedicated fts/ subdirectories within their respective AI provider folders (e.g., awsbedrock/fts/, azure/fts/). This clear separation helps users easily distinguish between FTS and GSI implementations.
  • New GSI-Specific Examples and Documentation: New gsi/ subdirectories have been created for each AI provider (AWS Bedrock, Azure OpenAI, Anthropic Claude, Cohere, Hugging Face, Jina AI, OpenRouter Deepseek), containing .env.sample, frontmatter.md, and .ipynb files tailored for GSI vector search. These new examples guide users through setting up and utilizing GSI for vector similarity searches.
  • Updated README.md for Clarity: The main README.md file has been updated to clearly explain the two distinct vector search approaches (FTS and GSI), providing guidance on when to use each and detailing the setup process for both. This improves the overall clarity and usability of the cookbook.
  • Dependency and LLM Model Updates: Several Jupyter notebooks have received updates to their pip install commands, ensuring compatibility with newer package versions. Additionally, specific LLM models, such as Claude and Cohere, have been updated to their latest versions within their respective FTS examples.
  • Removal of CrewAI Short-Term Memory Example: The crewai-short-term-memory directory and its associated files (README.md, couchbase_storage.py, .env.sample, crew_index.json, frontmatter.md) have been removed from the repository. The crewai example has been restructured to include FTS and GSI subdirectories.
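
As a rough illustration of the layout described in the highlights above, the sketch below checks that a provider folder has `fts/` and `gsi/` subdirectories and that `gsi/` holds the three kinds of files listed. The function name `missing_layout_items` is hypothetical and not part of the repository; only `.env.sample`, `frontmatter.md`, and the `.ipynb` extension come from the summary, and the notebook file name itself is not assumed.

```python
from pathlib import Path


def missing_layout_items(provider_dir):
    """Return the expected layout items that are absent from provider_dir.

    Hypothetical helper: it encodes only what the PR summary states,
    namely fts/ and gsi/ subdirectories, with .env.sample, frontmatter.md,
    and a notebook inside gsi/.
    """
    provider = Path(provider_dir)
    missing = []

    # Both search approaches should have their own subdirectory.
    for sub in ("fts", "gsi"):
        if not (provider / sub).is_dir():
            missing.append(f"{sub}/")

    gsi = provider / "gsi"
    for name in (".env.sample", "frontmatter.md"):
        if not (gsi / name).is_file():
            missing.append(f"gsi/{name}")

    # The notebook name is not given in the summary, so match by extension.
    notebooks = list(gsi.glob("*.ipynb")) if gsi.is_dir() else []
    if not notebooks:
        missing.append("gsi/*.ipynb")

    return missing
```

Calling `missing_layout_items("awsbedrock")` from the repository root would return an empty list if that folder follows the described structure.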
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds examples for GSI vector search to the repository. The changes include modifications to the README.md file, renaming several files to include 'fts' to indicate Full Text Search, and adding new files and directories for GSI examples. The review comments address potential issues related to code clarity and correctness.

Comment on lines 32 to +40
```

### 2. Set up the Couchbase Vector Search Index:
### 2. Choose Your Approach:

#### For FTS (Full Text Search) Examples:
Use the provided `{model}_index.json` index definition file in each model's `fts/` directory to create a new vector search index in your Couchbase cluster.

Use the provided `{model}_index.json` index definition file in each model's directory to create a new index in your Couchbase cluster.
The index supports separate properties for each embedding model.
#### For GSI (Global Secondary Index) Examples:
No additional setup required. GSI index will be created in each model's example.

medium

It might be useful to provide a brief explanation of what FTS and GSI are, and why someone might choose one over the other. This will help users understand the context of the examples better.

For example, you could mention that FTS is good for full-text search capabilities, while GSI is better for more structured queries and aggregations. Also, it would be helpful to mention that GSI requires Couchbase 8.0+.

Also, consider rephrasing "Choose Your Approach" to something more descriptive like "Select Search Index Type".
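
To make the version requirement mentioned in this comment concrete, here is a minimal sketch of a pre-flight check. The 8.0 minimum is taken from the comment above; the version string format (for example `8.0.0-1234-enterprise`) and the function name `supports_gsi_vector_search` are assumptions for illustration, and how the version string is obtained from a cluster is left out.

```python
import re

# Minimum server version for GSI vector search, as stated in the review comment.
MIN_GSI_VECTOR_VERSION = (8, 0)


def supports_gsi_vector_search(version_string):
    """Return True if the major.minor version meets the stated minimum.

    Hypothetical helper: assumes the string starts with "<major>.<minor>".
    """
    match = re.match(r"\s*(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles cases such as 10.1 versus 8.0 correctly.
    return (major, minor) >= MIN_GSI_VECTOR_VERSION
```

A notebook could run such a check before creating a GSI vector index and point users to the FTS examples when it returns `False`.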

Comment on lines +799 to 800
" llm = ChatAnthropic(temperature=0.1, anthropic_api_key=ANTHROPIC_API_KEY, model_name='claude-sonnet-4-20250514') \n",
" logging.info(\"Successfully created ChatAnthropic\")\n",

medium

The model name claude-sonnet-4-20250514 includes a date. It's generally better to use a model name without a date if possible, to avoid the code breaking in the future when the model is updated. If there is not a non-dated model name, then this is fine.
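
One way to limit the impact of a dated model ID going stale, sketched here under assumptions, is to read the model name from configuration and fall back to the dated ID used in the notebook. The environment variable name `ANTHROPIC_MODEL` and the helper `resolve_model_name` are hypothetical and do not appear in the PR.

```python
import os

# The dated model ID currently hard-coded in the notebook.
DEFAULT_MODEL = "claude-sonnet-4-20250514"


def resolve_model_name(env=None, var="ANTHROPIC_MODEL", default=DEFAULT_MODEL):
    """Return the model name from the environment, or the default.

    Hypothetical helper: `var` is an assumed variable name. An unset,
    empty, or whitespace-only value falls back to `default`.
    """
    env = os.environ if env is None else env
    value = env.get(var, "").strip()
    return value or default
```

The result could then be passed as `model_name=resolve_model_name()` in place of the string literal, so a model change needs only an environment update rather than a notebook edit.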

Comment on lines +785 to 786
" model=\"command-a-03-2025\",\n",
" temperature=0\n",

medium

The model name command-a-03-2025 includes a date. It's generally better to use a model name without a date if possible, to avoid the code breaking in the future when the model is updated. If there is not a non-dated model name, then this is fine.

Comment on lines +116 to +121
"couchbase_cluster_url = os.getenv('CB_CLUSTER_URL') or input(\"Couchbase Cluster URL:\")\n",
"couchbase_username = os.getenv('CB_USERNAME') or input(\"Couchbase Username:\")\n",
"couchbase_password = os.getenv('CB_PASSWORD') or getpass.getpass(\"Couchbase password:\")\n",
"couchbase_bucket = os.getenv('CB_BUCKET') or input(\"Couchbase Bucket:\")\n",
"couchbase_scope = os.getenv('CB_SCOPE') or input(\"Couchbase Scope:\")\n",
"couchbase_collection = os.getenv('CB_COLLECTION') or input(\"Couchbase Collection:\")"

medium

Consider loading the environment variables before taking user input. This way, if the user has already configured the environment variables, they won't be prompted to enter them again.

   import getpass
   import os

   from dotenv import load_dotenv

   # Load environment variables from the local .env file first
   load_dotenv("./.env")
   
   # Configuration: use the environment value when set, otherwise prompt
   couchbase_cluster_url = os.getenv('CB_CLUSTER_URL') or input("Couchbase Cluster URL:")
   couchbase_username = os.getenv('CB_USERNAME') or input("Couchbase Username:")
   couchbase_password = os.getenv('CB_PASSWORD') or getpass.getpass("Couchbase password:")
   couchbase_bucket = os.getenv('CB_BUCKET') or input("Couchbase Bucket:")
   couchbase_scope = os.getenv('CB_SCOPE') or input("Couchbase Scope:")
   couchbase_collection = os.getenv('CB_COLLECTION') or input("Couchbase Collection:")
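
The repeated `os.getenv(...) or input(...)` pattern above could also be folded into one small helper. The following is a sketch only: `get_setting` and its `env` and `ask` parameters are hypothetical names introduced here so the fallback logic can be exercised without real prompts.

```python
import getpass
import os


def get_setting(name, prompt, secret=False, env=None, ask=None):
    """Return env[name] if it is set and non-empty, otherwise prompt for it.

    Hypothetical helper mirroring `os.getenv(name) or input(prompt)`.
    `env` and `ask` can be injected for testing; by default the process
    environment and input()/getpass.getpass() are used.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    if ask is None:
        # Hide typed characters for secrets such as passwords.
        ask = getpass.getpass if secret else input
    return ask(prompt)
```

With this in place, a line such as `couchbase_password = get_setting('CB_PASSWORD', "Couchbase password:", secret=True)` behaves like the original, provided the `.env` file has been loaded beforehand.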

@nithishr

The "detect notebook changes" workflow failed because the code is not in the same repo. This is something to address later.
Merging this PR to test out the publishing workflow on the staging environment.
All the examples should have been reviewed in the forked PR.

@nithishr nithishr merged commit 076d277 into couchbase-examples:main Oct 22, 2025
1 of 3 checks passed
3 participants