GitHub - couchbaselabs/query-vector-search-demo

Movie Search using Couchbase Query Service

This is a demo app that performs hybrid search using the Vector Search capabilities of the Couchbase Query and Index Services. Users can search for movies by their synopsis (the movie overview) through the LangChain Query Vector Store integration for Couchbase.

Note that you need Couchbase Server 8.0 or higher for Vector Search.

How does it work?

You can perform semantic searches for movies based on their plot synopsis and filter the results by release year and IMDB rating. Optionally, you can also search for a keyword in the movie title.
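To make "hybrid" concrete, here is a minimal in-memory sketch of the idea: apply the scalar filters (year, rating, title keyword) and rank the remaining movies by cosine similarity between the query embedding and each movie's embedding. This is an illustration only, not how Couchbase executes the search; the Movie record, its field names, and the hybrid_search helper are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Movie:
    # Hypothetical record shape; the demo's documents may use other field names.
    title: str
    year: int
    rating: float
    embedding: Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(
    movies: Sequence[Movie],
    query_embedding: Sequence[float],
    min_year: Optional[int] = None,
    min_rating: Optional[float] = None,
    title_keyword: Optional[str] = None,
    k: int = 5,
) -> list[tuple[str, float]]:
    """Filter on scalar fields, then rank the survivors by vector similarity."""
    candidates = []
    for m in movies:
        if min_year is not None and m.year < min_year:
            continue
        if min_rating is not None and m.rating < min_rating:
            continue
        # Case-insensitive substring match on the title.
        if title_keyword and title_keyword.lower() not in m.title.lower():
            continue
        candidates.append((m.title, cosine_similarity(query_embedding, m.embedding)))
    # Most similar first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]
```

With toy two-dimensional embeddings, hybrid_search(movies, [1.0, 0.0], min_year=1990) drops everything released before 1990 and orders the rest by how closely their vectors point in the query's direction. Real embeddings from text-embedding-3-small have many more dimensions, but the ranking logic is the same.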

The hybrid search can be performed with either the Couchbase Python SDK or the LangChain Vector Store integration for Couchbase. This demo uses the LangChain integration, with OpenAI generating the embeddings.
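For readers taking the Python SDK route, the sketch below builds the kind of SQL++ statement a hybrid query might use: scalar predicates in the WHERE clause plus ordering by vector distance. It is a hedged illustration under several assumptions: the document fields (title, year, rating, embedding) are hypothetical, and it assumes the APPROX_VECTOR_DISTANCE function available with vector indexes in Couchbase Server 8.0+ and a "COSINE" metric name. Check the statement against your index definition before relying on it.

```python
from typing import Any, Optional, Sequence


def build_hybrid_query(
    keyspace: str,
    query_embedding: Sequence[float],
    min_year: Optional[int] = None,
    min_rating: Optional[float] = None,
    title_keyword: Optional[str] = None,
    limit: int = 5,
) -> tuple[str, dict[str, Any]]:
    """Return a SQL++ statement and its named parameters.

    Assumptions: hypothetical fields title/year/rating/embedding, and
    APPROX_VECTOR_DISTANCE with a "COSINE" metric (Couchbase Server 8.0+).
    """
    params: dict[str, Any] = {"qvec": list(query_embedding)}
    filters = []
    if min_year is not None:
        filters.append("m.year >= $min_year")
        params["min_year"] = min_year
    if min_rating is not None:
        filters.append("m.rating >= $min_rating")
        params["min_rating"] = min_rating
    if title_keyword:
        # Case-insensitive keyword match on the title.
        filters.append("CONTAINS(LOWER(m.title), LOWER($kw))")
        params["kw"] = title_keyword
    where = f"WHERE {' AND '.join(filters)} " if filters else ""
    statement = (
        "SELECT m.title, m.year, m.rating, "
        'APPROX_VECTOR_DISTANCE(m.embedding, $qvec, "COSINE") AS distance '
        f"FROM {keyspace} AS m "
        f"{where}"
        # Smallest distance = most similar; LIMIT is interpolated as an int.
        f"ORDER BY distance LIMIT {int(limit)}"
    )
    return statement, params
```

The statement and parameters could then be passed to the SDK as cluster.query(statement, QueryOptions(named_parameters=params)), using QueryOptions from couchbase.options. Named parameters keep user input (such as the title keyword) out of the statement text.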

How to Run

Install dependencies

pip install -r requirements.txt

Set the environment secrets

Copy the secrets.example.toml file to .streamlit/secrets.toml (the location Streamlit reads secrets from) and replace the placeholders with the actual values for your environment.

The ingestion script runs outside the Streamlit environment, so it reads the same settings from environment variables instead. Create a .env file from .env.example and set the same values there. The variables are:

OPENAI_API_KEY = "<open_ai_api_key>"
DB_CONN_STR = "<connection_string_for_couchbase_cluster>"
DB_USERNAME = "<username_for_couchbase_cluster>"
DB_PASSWORD = "<password_for_couchbase_cluster>"
DB_BUCKET = "<name_of_bucket_to_store_documents>"
DB_SCOPE = "<name_of_scope_to_store_documents>"
DB_COLLECTION = "<name_of_collection_to_store_documents>"
INDEX_NAME = "<name_of_search_index_with_vector_support>"
EMBEDDING_MODEL = "text-embedding-3-small" # OpenAI embedding model to use to encode the documents
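Because ingest.py depends on these values being present in the environment, it helps to fail fast when one is missing. The sketch below is hypothetical (parse_dotenv, load_settings, and REQUIRED_KEYS are not taken from ingest.py): it reads simple KEY = "value" lines like the listing above, tolerating comment lines and trailing # comments, and checks that every key is set. In practice the python-dotenv package (load_dotenv) is the usual way to load a .env file.

```python
import os
from typing import Mapping

# Keys taken from the variable listing in this README.
REQUIRED_KEYS = (
    "OPENAI_API_KEY", "DB_CONN_STR", "DB_USERNAME", "DB_PASSWORD",
    "DB_BUCKET", "DB_SCOPE", "DB_COLLECTION", "INDEX_NAME", "EMBEDDING_MODEL",
)


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY = "value" lines; a minimal stand-in for python-dotenv."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if value[:1] in ('"', "'"):
            # Quoted value: keep what sits between the quotes.
            quote = value[0]
            end = value.find(quote, 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop a trailing " # comment".
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, raising if any are missing or empty."""
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")
    return {k: env[k] for k in REQUIRED_KEYS}
```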

Ingest the Documents

This demo uses the IMDB dataset from Kaggle. You can download the CSV file, imdb_top_1000.csv, to the source folder or use the copy provided in the repo.
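Before embedding, the CSV rows need to be turned into clean records. The sketch below assumes the column names Series_Title, Released_Year, IMDB_Rating, and Overview from the Kaggle file (verify them against your copy); read_movies and the output field names are hypothetical. It converts year and rating to numbers, keeping None where a value is not numeric, and skips rows with no overview to embed.

```python
import csv
from typing import Iterator, Optional, TextIO


def _to_int(value: Optional[str]) -> Optional[int]:
    try:
        return int(value)
    except (TypeError, ValueError):
        return None  # missing or non-numeric


def _to_float(value: Optional[str]) -> Optional[float]:
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def read_movies(fh: TextIO) -> Iterator[dict]:
    """Yield cleaned movie records from the IMDB CSV (column names assumed)."""
    for row in csv.DictReader(fh):
        overview = (row.get("Overview") or "").strip()
        if not overview:
            continue  # nothing to embed for this row
        yield {
            "title": (row.get("Series_Title") or "").strip(),
            "year": _to_int(row.get("Released_Year")),
            "rating": _to_float(row.get("IMDB_Rating")),
            "overview": overview,
        }
```

To use it on the dataset, open the file with open("imdb_top_1000.csv", newline="", encoding="utf-8") and pass the file handle to read_movies.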

To ingest the documents and generate the embeddings for the Overview field, run the ingest.py script:

python ingest.py
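Embedding around a thousand overviews one request at a time is slow, so ingestion typically sends texts in batches. The sketch below shows one way to structure that step; it is not necessarily what ingest.py does. embed_batch is a hypothetical callable that takes a list of texts and returns one vector per text; for example, LangChain's OpenAIEmbeddings(model=...).embed_documents has that shape. The batch size of 100 is an arbitrary assumption. Note that a LangChain vector store may already embed texts itself when they are added.

```python
from typing import Callable, Iterator, Sequence

# Hypothetical signature: a list of texts in, one vector per text out.
EmbedBatch = Callable[[list[str]], Sequence[Sequence[float]]]


def batched(items: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_overviews(
    records: Sequence[dict], embed_batch: EmbedBatch, batch_size: int = 100
) -> list[dict]:
    """Return copies of `records` with an `embedding` of each overview added."""
    enriched = []
    for batch in batched(records, batch_size):
        vectors = embed_batch([r["overview"] for r in batch])
        if len(vectors) != len(batch):
            raise ValueError("embedding count does not match batch size")
        for record, vector in zip(batch, vectors):
            # Copy the record so the caller's input is left unchanged.
            enriched.append({**record, "embedding": list(vector)})
    return enriched
```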

Run the application

streamlit run movies_search.py
