Add GSI vector search examples #58
Add support for gsi aws bedrock example
Feature/add claudeai
Feature/add azure open ai
Feature/add cohere
Feature/add deepseek openrouter
* update: make dirs for fts and gsi
* update: fts tutorial's dependencies
* updated: intro and frontmatter to explicitly mention fts or gsi
* updated: frontmatter for gsi
* added: notebook for gsi tutorial for huggingface
* update: execution results in the fts and gsi tutorials
* added: env sample files
* update: spelling mistake in frontmatter and path
* updated: intro links to point to dev portal instead
* update: gsi frontmatter path
* update: based on comments by Prajwal
  - restructure hugging_face notebook with improved section headings
  - detailed explanations for GSI vector search and embedding processes
  - performance comparison
  - changed score to distance
* update: according to Nithish's comments
  - Added more explanation for Composite Index
  - Updated performance comparison
  - corrected version info in gsi tutorial

---------

Co-authored-by: Viraj Agarwal <virajagarwal15@gmail.com>
* update: made dirs for fts and gsi
* update: dependencies of fts tutorial
* update: frontmatter for fts
* update: intro of fts tutorial
* add: tutorial and frontmatter for gsi
* update: execution results for fts and gsi tutorials
* update: frontmatter paths for gsi and fts
* update: link to gsi tutorial in fts to devportal link
* update: comments by Prajwal
  - link to fts tutorial in gsi to devportal link
  - removed INDEX_NAME env var
  - removed Query Service check, since it isn't confirmed to be working as expected
  - removed setup task of creating a primary index
  - removed vector dimension config since it is figured out by langchain integration
  - added comment to make Composite index instead
  - removed some false advantages mentioned in markdowns of Bhive
* update: gsi tutorial
  - Improved section headings for better navigation and understanding.
  - Added performance comparison of crew ai agent with and without gsi index.
  - Added detailed explanations for setup, prerequisites, and GSI vector search.
  - Streamlined content for readability and coherence.
* update: gsi tutorial
  - performance testing to be simpler and without using crewai, just pure rag
  - mentioned composite index much more explicitly

---------

Co-authored-by: Viraj Agarwal <virajagarwal15@gmail.com>
* update: made dirs for fts and gsi
* update: frontmatter of gsi and fts tutorial
* update: fts tutorial link and execution results
* added: env sample file and gsi tutorial
* update: gsi tutorial
  - added perf comparison before and after gsi and cache
  - fixed sections order and organization
  - score -> distance
* update: GSI tutorial
  - simplified performance testing
  - added explicit composite index sub-heading
* update: based on Prajwal's comments
  - Added cache setup
  - removed comment for creating primary index
  - Revised user queries and responses for better rag performance

---------

Co-authored-by: Viraj Agarwal <virajagarwal15@gmail.com>
* update: made dirs for fts and gsi
* update: dependencies and intro of fts tutorial
* update: frontmatter of fts tutorial
* fix: spelling mistake in fts frontmatter
* add: frontmatter and tutorial for gsi
* update: execution results in tutorial of fts and gsi
* update: frontmatter paths for gsi and fts
* update: standard changes in fts and gsi tutorials
  - link to fts tutorial in gsi to devportal link
  - added comment to make Composite index instead
* update: gsi tutorial
  - Refined markdown sections for clarity and organization
  - Updated score -> distance
  - Comparison of performance before and after bhive
* update: enhance gsi tutorial
  - simplify performance testing
  - explicit composite index creation sub-heading

---------

Co-authored-by: Viraj Agarwal <virajagarwal15@gmail.com>
Summary of Changes
Hello @nithishr, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant enhancement to the vector search cookbook by integrating comprehensive examples for Couchbase's Global Secondary Index (GSI) vector search. The changes involve a systematic reorganization of existing Full Text Search (FTS) examples into dedicated subdirectories and the creation of new GSI-specific examples across various AI providers. This initiative aims to provide users with a broader spectrum of options for implementing high-performance semantic search, leveraging the distinct advantages of GSI for vector-first workloads. The updated documentation clarifies the different approaches and streamlines the setup process, ensuring a more intuitive and efficient user experience.
Code Review
This pull request adds examples for GSI vector search to the repository. The changes include modifications to the README.md file, renaming several files to include 'fts' to indicate Full Text Search, and adding new files and directories for GSI examples. The review comments address potential issues related to code clarity and correctness.
### 2. Set up the Couchbase Vector Search Index:
### 2. Choose Your Approach:
#### For FTS (Full Text Search) Examples:
Use the provided `{model}_index.json` index definition file in each model's `fts/` directory to create a new vector search index in your Couchbase cluster.
Use the provided `{model}_index.json` index definition file in each model's directory to create a new index in your Couchbase cluster.
The index supports separate properties for each embedding model.
#### For GSI (Global Secondary Index) Examples:
No additional setup required. GSI index will be created in each model's example.
It might be useful to briefly explain what FTS and GSI are and why someone might choose one over the other. This will help users understand the context of the examples.
For example, you could mention that FTS is good for full-text search capabilities, while GSI is better for more structured queries and aggregations. It would also help to mention that GSI requires Couchbase 8.0+.
Finally, consider rephrasing "Choose Your Approach" to something more descriptive, such as "Select Search Index Type".
" llm = ChatAnthropic(temperature=0.1, anthropic_api_key=ANTHROPIC_API_KEY, model_name='claude-sonnet-4-20250514') \n",
" logging.info(\"Successfully created ChatAnthropic\")\n",
" model=\"command-a-03-2025\",\n",
" temperature=0\n",
"couchbase_cluster_url = os.getenv('CB_CLUSTER_URL') or input(\"Couchbase Cluster URL:\")\n",
"couchbase_username = os.getenv('CB_USERNAME') or input(\"Couchbase Username:\")\n",
"couchbase_password = os.getenv('CB_PASSWORD') or getpass.getpass(\"Couchbase password:\")\n",
"couchbase_bucket = os.getenv('CB_BUCKET') or input(\"Couchbase Bucket:\")\n",
"couchbase_scope = os.getenv('CB_SCOPE') or input(\"Couchbase Scope:\")\n",
"couchbase_collection = os.getenv('CB_COLLECTION') or input(\"Couchbase Collection:\")"
Consider loading the environment variables before taking user input. This way, if the user has already configured the environment variables, they won't be prompted to enter them again.
# Load environment variables
load_dotenv("./.env")
# Configuration
couchbase_cluster_url = os.getenv('CB_CLUSTER_URL') or input("Couchbase Cluster URL:")
couchbase_username = os.getenv('CB_USERNAME') or input("Couchbase Username:")
couchbase_password = os.getenv('CB_PASSWORD') or getpass.getpass("Couchbase password:")
couchbase_bucket = os.getenv('CB_BUCKET') or input("Couchbase Bucket:")
couchbase_scope = os.getenv('CB_SCOPE') or input("Couchbase Scope:")
couchbase_collection = os.getenv('CB_COLLECTION') or input("Couchbase Collection:")
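For reference, the same "environment first, prompt as fallback" pattern can be sketched as a small helper. This is only an illustration, not code from the PR: the names `read_setting` and `load_couchbase_config` are hypothetical, and the sketch assumes that `load_dotenv("./.env")` (from python-dotenv) has already been called so the `.env` values are visible in `os.environ`.

```python
import getpass
import os


def read_setting(name, prompt, secret=False, env=None, ask=None):
    """Return env[name] if it is set and non-empty, otherwise prompt for it."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    if ask is None:
        # Hide typed input for secrets such as the password.
        ask = getpass.getpass if secret else input
    return ask(prompt)


def load_couchbase_config(env=None, ask=None):
    """Collect the six Couchbase settings used in the notebook cell above."""
    fields = [
        ("CB_CLUSTER_URL", "Couchbase Cluster URL:", False),
        ("CB_USERNAME", "Couchbase Username:", False),
        ("CB_PASSWORD", "Couchbase password:", True),
        ("CB_BUCKET", "Couchbase Bucket:", False),
        ("CB_SCOPE", "Couchbase Scope:", False),
        ("CB_COLLECTION", "Couchbase Collection:", False),
    ]
    return {
        name: read_setting(name, prompt, secret, env, ask)
        for name, prompt, secret in fields
    }
```

Calling `load_couchbase_config()` with no arguments reads from `os.environ` and prompts only for the values that are missing. The `env` and `ask` parameters exist so the lookup can be exercised without a real environment or terminal.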
The detect notebook changes workflow failed because the code is not in the same repo. Something to address later.
Sync all the GSI Vector Search examples from forked repo