# LangChain ReAct Agent with Couchbase via Model Context Protocol (MCP) - A Tutorial

This notebook demonstrates how to build a ReAct (Reasoning and Acting) agent using [LangChain](https://www.langchain.com/) and [LangGraph](https://www.langchain.com/langgraph) that can interact with a Couchbase database. The agent reaches Couchbase through the Model Context Protocol (MCP), which exposes database operations as tools the agent can call. Read more about LangChain's ReAct agent [here](https://python.langchain.com/docs/how_to/agent_executor/).

## What is the Model Context Protocol (MCP)?

The [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) is an open standard designed to standardize how AI assistants and applications connect to and interact with external data sources, tools, and systems. Think of MCP as a universal adapter that allows AI models to seamlessly access the context they need to produce more relevant, accurate, and actionable responses.

**Key Goals and Features of MCP:**

*   **Standardized Communication:** MCP provides a common language and structure for AI models to communicate with diverse backend systems, replacing the need for numerous custom integrations.
*   **Enhanced Context Management:** It helps manage the limited context windows of LLMs efficiently, enabling them to maintain longer, more coherent interactions and leverage historical data.
*   **Secure Data Access:** MCP emphasizes secure connections, allowing developers to expose data through MCP servers while maintaining control over their infrastructure.
*   **Tool Use and Actionability:** It enables LLMs to not just retrieve information but also to use external tools and trigger actions in other systems.
*   **Interoperability:** Fosters an ecosystem where different AI tools, models, and data sources can work together more cohesively.

MCP aims to break down data silos, making it easier for AI to integrate with real-world applications and enterprise systems, leading to more powerful and context-aware AI solutions.

**MCP Typically Follows a Client-Server Architecture:**
*   **MCP Hosts/Clients:** Applications (like AI assistants, IDEs, or other AI-powered tools) that want to access data or capabilities. In this demo, the notebook acts as the MCP client, with LangChain consuming the tools it loads.
*   **MCP Servers:** Lightweight programs that expose specific data sources or tools (e.g., a database, an API) through the standardized MCP. The `mcp-server-couchbase` project fulfills this role for Couchbase.
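
To make the client-server exchange concrete, here is a minimal sketch of the messages a client sends. MCP frames its messages as JSON-RPC 2.0, and `tools/list` and `tools/call` are the methods for discovering and invoking tools. The helper `jsonrpc_request`, the tool name `example_query_tool`, and its arguments are hypothetical placeholders, not part of any real server. In this notebook the `mcp` package builds and sends these messages for you.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique within a session


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper for illustration)."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name. The name and arguments below are placeholders.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "example_query_tool", "arguments": {"query": "SELECT 1"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```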


# Before you start

## Get Credentials for OpenAI

Please follow the [instructions](https://platform.openai.com/docs/quickstart) to generate the OpenAI credentials.

## Create and Deploy Your Free Tier Operational Cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) with read and write access to the bucket used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
* Your Capella free-tier account includes a travel-sample bucket, with sample documents used for booking and travel purposes. You can find more information [here](https://docs.couchbase.com/cloud/get-started/run-first-queries.html).

## Setup Instructions

Before running this notebook, ensure you have the following prerequisites met:

*   **Set Environment Variables:** This notebook loads the OpenAI API key and other environment variables from the `.env` file. Include the following:

    ```
    OPENAI_API_KEY=your_openai_api_key_here
    CB_CONNECTION_STRING=your_couchbase_connection_string
    CB_USERNAME=your_couchbase_username
    CB_PASSWORD=your_couchbase_password
    CB_BUCKET_NAME=your_target_bucket # e.g., travel-sample
    ```

    We have already included a `.env.sample` file. Change the file name to `.env` and fill in the environment variables.
*   **Set up uv:** uv is a fast Python package and project manager. We will use it to run the MCP server. Install uv from [here](https://docs.astral.sh/uv/getting-started/installation/#installing-uv).
*   **Python Libraries:** Install the necessary libraries by running the code cell below.

In [1]:
%pip install -q 'langchain>=1.2.10' 'langgraph>=1.0.9' 'langchain-openai>=1.1.10' 'langchain-mcp-adapters>=0.2.1' 'python-dotenv>=1.2.1'

Note: you may need to restart the kernel to use updated packages.


## Importing Necessary Libraries

This cell imports the essential Python tools for our project:

*   **`dotenv` & `os`:** For loading and using secret API keys and other settings from a `.env` file.
*   **`mcp` (ClientSession, StdioServerParameters, stdio_client):** For connecting this notebook (as a client) to the MCP server, which in turn talks to Couchbase.
*   **`langchain_mcp_adapters.tools` (`load_mcp_tools`):** To make the Couchbase tools (exposed via MCP) usable by our LangChain AI agent.
*   **`langchain.agents` (`create_agent`):** To easily build a "ReAct" AI agent that can think and use tools.
*   **`langgraph.checkpoint.memory` (`InMemorySaver`):** To help the agent remember past parts of the conversation.
*   **`langchain_openai` (`ChatOpenAI`):** To connect to and use OpenAI's language models (like GPT-5).

Running this cell makes all these components ready to use.

In [2]:
from dotenv import load_dotenv
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

from langchain_mcp_adapters.tools import load_mcp_tools
from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver
from langchain_openai import ChatOpenAI

load_dotenv()

True

## Defining the Question-Answering Function

This cell defines an asynchronous function `qna(agent)` that we'll use to interact with our ReAct agent.

*   It takes the created `agent` as an argument.
*   `config = {"configurable": {"thread_id": "1"}}`: This configuration is important for LangGraph agents. It uses a `thread_id` to maintain conversation state. Using the same `thread_id` across multiple calls to the agent allows it to remember previous interactions in that "thread."
*   The function then asks a series of example questions (`message`). For each one, the agent queries Couchbase through the MCP server to get travel-related data, formats it, and presents it to the user.
*   This function allows us to easily test the agent with multiple queries in sequence.

In [3]:
async def qna(agent):
    config = {"configurable": {"thread_id": "1"}}

    message = "Tell me about the database that you are connected to."
    print(f"\n\n**Running:** {message}\n")
    result = await agent.ainvoke({"messages": message}, config)
    print(result["messages"][-1].content)
    print('-'*50)

    message = "List out the top 5 hotels by the highest aggregate rating?"
    print(f"\n\n**Running**: {message}\n")
    result = await agent.ainvoke({"messages": message}, config)
    print(result["messages"][-1].content)
    print('-'*50)

    message = "Recommend me a flight and hotel from New York to San Francisco"
    print(f"\n\n**Running**: {message}\n")
    result = await agent.ainvoke({"messages": message}, config)
    print(result["messages"][-1].content)
    print('-'*50)

    message = "I'm going to the UK for 1 week. Recommend some great spots to visit for sightseeing. Also mention the respective prices of those places for adults and kids."
    print(f"\n\n**Running**: {message}\n")
    result = await agent.ainvoke({"messages": message}, config)
    print(result["messages"][-1].content)
    print('-'*50)

    message = "My budget is around 30 pounds a night. What will be the best hotel to stay in?"
    print(f"\n\n**Running**: {message}\n")
    result = await agent.ainvoke({"messages": message}, config)
    print(result["messages"][-1].content)
    print('-'*50)
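
The repeated blocks above could also be written as a loop. The following is a minimal sketch, assuming only that the agent exposes the same `ainvoke` call used in `qna`. The name `ask_all` and its parameters are hypothetical. Passing a different `thread_id` would start a separate conversation with no memory of this one.

```python
async def ask_all(agent, questions, thread_id="1"):
    """Ask each question in order on one conversation thread and return the answers."""
    # Reusing one thread_id lets the checkpointer carry context between questions.
    config = {"configurable": {"thread_id": thread_id}}
    answers = []
    for question in questions:
        print(f"\n\n**Running:** {question}\n")
        result = await agent.ainvoke({"messages": question}, config)
        answer = result["messages"][-1].content  # last message is the agent's reply
        print(answer)
        print("-" * 50)
        answers.append(answer)
    return answers
```

Inside `main`, this would be called as `await ask_all(agent, ["Tell me about the database that you are connected to."])`.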

## Defining the System Prompt

The system prompt is a crucial piece of instruction given to the Large Language Model (LLM) that powers our agent. It sets the context, defines the agent's persona, capabilities, and constraints.

In this system prompt:
*   We explain the **Couchbase data hierarchy** (Cluster, Bucket, Scope, Collection, Document) to help the LLM understand how the data is organized.
*   We specifically instruct the agent that **"The data is inside `inventory` scope, so use only that scope."** This focuses the agent on the relevant part of the `travel-sample` database.
*   We provide **SQL++ query generation guidelines**:
    *   Use only the collection name in the `FROM` clause (for example, ``FROM `hotel` ``).
    *   Put collection names and top-level field names in backticks, and use plain dot notation for nested fields.
    *   Use `UNNEST` to flatten arrays such as `reviews` before aggregating over them.
*   The overall goal is to guide the LLM to use the provided MCP tools (which will be Couchbase operations) effectively and to formulate correct SQL++ queries for the `inventory` scope.

A well-crafted system prompt significantly improves the agent's performance and reliability.

In [4]:
system_prompt = """Couchbase organizes data with the following hierarchy (from top to bottom):
1. Cluster: The overall container of all Couchbase data and services.
2. Bucket: A bucket is similar to a database in traditional systems. Each bucket contains multiple scopes. Example: "users", "analytics", "products"
3. Scope: A scope is a namespace within a bucket that groups collections. Scopes help isolate data for different microservices or tenants. Default scope name: _default
4. Collection: The equivalent of a table in relational databases. Collections store JSON documents. Default collection name: _default
5. Document: The atomic data unit (usually JSON) stored in a collection. Each document has a unique key within its collection.

IMPORTANT SQL++ Query Rules:

- Use the tools to read the database and answer questions based on this database
- The data is inside `inventory` scope, so use only that scope
- Use only the collection name in the FROM clause (e.g., FROM `hotel`)
- Collection names and top-level field names should be in backticks
- For nested fields, use dot notation WITHOUT backticks around each part

  CORRECT: `hotel`.reviews[0].ratings.Overall
  WRONG: `hotel`.`reviews`.`ratings`.`Overall`

- When accessing nested objects or arrays, use bracket notation or dot notation directly

Examples:
- hotel.reviews[0].author
- hotel.address.city (note: address is a single object, not an array)

Hotel Document Structure:
- address: Object with fields like {city, country, address, state, county}
- reviews: Array of review objects with ratings and content
- To filter by city: WHERE address.city = "San Francisco"
- Do NOT use "addresses" (plural) - the field is "address" (singular)

ARRAY Operations in SQL++:
- To aggregate data from arrays (like reviews), use UNNEST to flatten the array first
- CORRECT way to sum array values:

  SELECT h.name, SUM(r.ratings.Overall) as total_rating
  FROM `hotel` h
  UNNEST h.reviews r
  GROUP BY h.name
  ORDER BY total_rating DESC

- WRONG ways (these will cause parser errors):
  x SELECT name, SUM(ARRAY_SUM(ARRAY reviews[*].ratings.Overall FOR reviews IN...))
  x SELECT name, ARRAY reviews[*].ratings.Overall FOR reviews...
  x WHERE ANY a IN addresses SATISFIES... (wrong field name)

- Use UNNEST whenever you need to work with individual array elements in aggregations"""
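
A prompt rule is only a request, so it can help to check what the model actually generated. Below is a rough sketch of a check for the `FROM` clause rule. The functions `from_targets` and `uses_bare_collections` are hypothetical helpers, and the regular expression is a simplification that does not parse SQL++ (it ignores subqueries and string literals, for instance).

```python
import re

# A FROM target: backticked or plain identifiers joined by dots.
_FROM_TARGET = re.compile(
    r"\bFROM\s+((?:`[^`]+`|[A-Za-z_]\w*)(?:\s*\.\s*(?:`[^`]+`|[A-Za-z_]\w*))*)",
    re.IGNORECASE,
)
_PART = re.compile(r"`[^`]+`|[A-Za-z_]\w*")


def from_targets(query):
    """Return each FROM target as a list of its dot-separated parts."""
    return [_PART.findall(match.group(1)) for match in _FROM_TARGET.finditer(query)]


def uses_bare_collections(query):
    """True if the query has at least one FROM target and none is a dotted path."""
    targets = from_targets(query)
    return bool(targets) and all(len(parts) == 1 for parts in targets)


good = "SELECT h.name FROM `hotel` h UNNEST h.reviews r"
bad = "SELECT * FROM `travel-sample`.`inventory`.`hotel`"
print(uses_bare_collections(good), uses_bare_collections(bad))
```

A check like this could run before a query is sent, with a failure returned to the agent as an error message so it can retry.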

## Configuring the Language Model and MCP Server

This cell sets up two key components:

1.  **`model = ChatOpenAI(model="gpt-5.2")`**: This line initializes the LLM we'll be using. We're choosing `gpt-5.2` from OpenAI.
2.  **`server_params = StdioServerParameters(...)`**:
    *   This configures how our Python script will start and communicate with the `mcp-server-couchbase` application.
    *   `command="uvx"` and `args=["couchbase-mcp-server"]`: This uses `uvx` (from the `uv` tool) to automatically fetch and run the published `couchbase-mcp-server` package. No need to clone the repository — the package is installed and executed on the fly.
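    *   `env={...}`: This dictionary defines environment variables that will be passed to the MCP server process when it starts. The server needs them to connect to your Couchbase instance: `CB_CONNECTION_STRING`, `CB_USERNAME`, `CB_PASSWORD`, and `CB_BUCKET_NAME`, each read from the `.env` file loaded earlier.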

In [5]:
model = ChatOpenAI(model="gpt-5.2")

server_params = StdioServerParameters(
    command="uvx",
    args=["couchbase-mcp-server"],
    env={
        "CB_CONNECTION_STRING": os.getenv("CB_CONNECTION_STRING"),
        "CB_USERNAME": os.getenv("CB_USERNAME"),
        "CB_PASSWORD": os.getenv("CB_PASSWORD"),
        "CB_BUCKET_NAME": os.getenv("CB_BUCKET_NAME")
    }
)
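
`os.getenv` returns `None` for any variable that is not set, so a missing entry in `.env` would only surface later, when the server fails to start or connect. A small guard can report the problem up front. This is a sketch under the assumption that the server parameters expect string values. `REQUIRED_VARS` and `require_env` are hypothetical names introduced here.

```python
import os

REQUIRED_VARS = ["CB_CONNECTION_STRING", "CB_USERNAME", "CB_PASSWORD", "CB_BUCKET_NAME"]


def require_env(names, environ=os.environ):
    """Return {name: value} for every name, or raise an error listing the unset ones."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

With this in place, the cell above could pass `env=require_env(REQUIRED_VARS)` instead of building the dictionary by hand.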

## Defining the Main Execution Logic

The `main` function ties everything together to set up and run our agent:

1.  **Start & Connect to MCP Server:** It first starts the `mcp-server-couchbase` process using `stdio_client` and establishes a communication `ClientSession` with it.
2.  **Initialize Session & Load Tools:** The MCP `session` is initialized. Then, `load_mcp_tools` queries the MCP server to get the available Couchbase tools and prepares them for LangChain.
3.  **Set Up Agent Memory:** `InMemorySaver` is created to allow the agent to remember conversation history.
4.  **Create ReAct Agent:** The `create_agent` function builds our AI agent, providing it with the language `model`, the Couchbase `tools`, our `system_prompt`, and the `checkpoint` for memory.
5.  **Run Q&A:** Finally, it calls the `qna` function, passing the created `agent` to start the question-and-answer process with the database.

In [6]:
async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the connection
            print("Initializing connection...")
            await session.initialize()

            # Get tools
            print("Loading tools...")
            tools = await load_mcp_tools(session)

            # Create and run the agent
            print("Creating agent...")
            checkpoint = InMemorySaver()

            agent = create_agent(
                model, 
                tools,
                system_prompt=system_prompt,
                checkpointer=checkpoint
            )

            print("-"*25, "Starting Run", "-"*25)
            await qna(agent)
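
Before running the agent, it can be useful to see which tools the server actually exposed. The sketch below assumes each loaded tool has `name` and `description` attributes, as LangChain tools do. `summarize_tools` is a hypothetical helper.

```python
def summarize_tools(tools):
    """Return one 'name: first line of description' string per tool."""
    lines = []
    for tool in tools:
        description = (getattr(tool, "description", "") or "").strip()
        first_line = description.splitlines()[0] if description else "(no description)"
        lines.append(f"{tool.name}: {first_line}")
    return lines
```

Inside `main`, printing `"\n".join(summarize_tools(tools))` right after `load_mcp_tools` would list them.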

## Running the Agent

This final cell runs the agent by awaiting `main()`. Jupyter supports top-level `await`, so no explicit event loop setup is needed.

When you run this cell:
1.  The `mcp-server-couchbase` process will be started in the background.
2.  The Python script will connect to it as an MCP client.
3.  The LangChain ReAct agent will be initialized with the Couchbase tools exposed via MCP.
4.  The agent will then attempt to answer the series of questions defined in the `qna` function by:
    *   Reasoning about the question.
    *   Deciding if a Couchbase tool is needed.
    *   Formulating a SQL++ query (if appropriate, based on the system prompt).
    *   Executing the tool (which sends the query to the MCP server, which then runs it on Couchbase).
    *   Using the tool's output to generate a natural language response.

You will see the questions and the agent's answers printed below. This demonstrates the end-to-end flow of a natural language query being translated into database actions and then into a user-friendly response, all orchestrated by the LangChain agent using MCP.
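
The reasoning and tool-use steps listed above are recorded in the message list the agent returns, not only in the final answer. The following sketch summarizes that list. It assumes LangChain-style messages with a `type` attribute (`"human"`, `"ai"`, or `"tool"`), a `content` attribute, and, on AI messages, a `tool_calls` list of dictionaries with `name` and `args` keys. `trace` is a hypothetical helper.

```python
def trace(messages):
    """Summarize a message list as (step, detail) pairs, in order."""
    steps = []
    for message in messages:
        tool_calls = getattr(message, "tool_calls", None) or []
        if tool_calls:
            # The model decided to act: record each requested tool and its arguments.
            for call in tool_calls:
                steps.append(("tool_call", f"{call['name']}({call['args']})"))
        elif message.type == "tool":
            # The tool's result, fed back to the model as an observation.
            steps.append(("tool_result", str(message.content)[:200]))
        else:
            # Human input or a plain model reply, truncated for display.
            steps.append((message.type, str(message.content)[:200]))
    return steps
```

Calling `trace(result["messages"])` inside `qna` would show which queries the agent ran. Because the checkpointer keeps history per thread, the list is expected to include earlier turns as well.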

In [7]:
await main()

Initializing connection...
Loading tools...
Creating agent...
------------------------- Starting Run -------------------------


**Running:** Tell me about the database that you are connected to.

You’re connected to a **Couchbase** cluster with the **travel-sample** bucket available and connected.

## What I can see in this database

### Cluster / services
The cluster is reachable and healthy, with these Couchbase services responding (latency shown in microseconds by the ping tool):
- **KV (Data)**: ok (~696 µs)
- **Query (SQL++)**: ok (~1468 µs)
- **Analytics**: ok (~2155 µs)
- **Search (FTS)**: ok (~1847 µs)
- **Views**: ok (~2178 µs)
- **Management**: ok (~2217 µs)

### Bucket
- **Bucket name:** `travel-sample`
- This is Couchbase’s sample dataset commonly used for demos/tutorials (travel-related data: hotels, airports, routes, etc.).

### Scopes and collections in `travel-sample`
Within the bucket, these scopes/collections exist:

- **Scope:** `inventory`  
  Collections:
  - `air