In [1]:
import os
from pymongo import MongoClient

# provide your own MONGODB_URI and OPENAI_API_KEY
MONGODB_URI = "write_your_own_mongodb_uri"
OPENAI_API_KEY = "write_your_own_openai_api_key"
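
If you would rather not paste secrets into the notebook, one option is to read them from environment variables. The following is a minimal sketch, not part of the original notebook: `read_setting` and `EXAMPLE_SETTING` are hypothetical names, and the placeholder check simply mirrors the placeholder strings used above.

```python
import os


def read_setting(name: str, placeholder: str) -> str:
    """Return the environment variable `name`, failing loudly if it is missing."""
    value = os.environ.get(name, placeholder)
    # Treat an unset variable, an empty value, or the untouched placeholder as an error.
    if not value or value == placeholder:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value


# Demo with a hypothetical variable, set here only so the example runs on its own.
os.environ["EXAMPLE_SETTING"] = "example-value"
print(read_setting("EXAMPLE_SETTING", "write_your_own_value"))
```

In the notebook itself, the same helper could be called as `read_setting("MONGODB_URI", "write_your_own_mongodb_uri")`.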


# Build the agent's toolbox

In this section, we will set up the tools that an AI agent can use to autonomously execute all the steps in a text-to-query workflow.

Instead of writing our own Python functions to use as tools for the agent, we will use the MongoDB database toolkit (`MongoDBDatabaseToolkit`), which is part of MongoDB's LangChain integration (`langchain-mongodb`). The toolkit consists of pre-built tools that let AI agents interact with a MongoDB database.

The tools available in the toolkit are as follows:
* `mongodb_list_collections`: Lists the collections present in the database. Helps the text-to-query agent identify which collection to query to obtain information for a particular user query.
* `mongodb_schema`: Given a list of collections, outputs the schema and sample documents from those collections. Helps the agent identify what field names to use when generating the MongoDB queries.
* `mongodb_query_checker`: Validates the MongoDB query generated by the LLM.
* `mongodb_query`: Executes the MongoDB query.
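
The four tools listed above suggest a natural order of calls: list collections, inspect the schema, check the query, then run it. The sketch below walks through that order under a few assumptions: each tool exposes an `invoke` method (as LangChain tools do), the aggregation pipeline is supplied by hand rather than generated by an LLM, and `run_text_to_query_steps` and `StubTool` are hypothetical names used only for illustration.

```python
import json
from typing import Any


def run_text_to_query_steps(tool_map: dict, collection: str, pipeline: list) -> dict:
    """Call the four toolkit tools in the order a text-to-query agent might use them."""
    # 1. Discover which collections exist and confirm the target is one of them.
    collections = tool_map["mongodb_list_collections"].invoke("")
    names = [name.strip() for name in collections.split(",")]
    if collection not in names:
        raise ValueError(f"Unknown collection: {collection}")

    # 2. Fetch the schema so field names can be verified before querying.
    schema = tool_map["mongodb_schema"].invoke(collection)

    # 3. Ask the checker to review the pipeline before it is executed.
    checked = tool_map["mongodb_query_checker"].invoke(json.dumps(pipeline))

    # 4. Execute the query as an aggregate command on the chosen collection.
    command = f"db.{collection}.aggregate({json.dumps(pipeline)})"
    result = tool_map["mongodb_query"].invoke(command)
    return {"schema": schema, "checked": checked, "command": command, "result": result}


class StubTool:
    """Stand-in for a toolkit tool so the sketch runs without a database."""

    def __init__(self, output: Any):
        self.output = output

    def invoke(self, tool_input: Any) -> Any:
        return self.output


stub_map = {
    "mongodb_list_collections": StubTool("movies, theaters"),
    "mongodb_schema": StubTool("theaterId: Number"),
    "mongodb_query_checker": StubTool("ok"),
    "mongodb_query": StubTool("[]"),
}
steps = run_text_to_query_steps(stub_map, "theaters", [{"$limit": 1}])
print(steps["command"])
```

With the real toolkit, `stub_map` would be replaced by a dictionary that maps each tool's name to the tool object.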

In [2]:
# !pip install --quiet langchain-openai==0.3.28 pymongo==4.13.2 langchain-mongodb==0.6.2

In addition to the database toolkit, the `langchain-mongodb` integration also provides a `MongoDBDatabase` class for easily accessing a MongoDB database.

Let's access the `sample_mflix` database using this class and use the resulting database object to initialize the MongoDB database toolkit.

We will use the class's `from_connection_string` method, which takes your connection string and connects to your MongoDB cluster.

**Create a LangChain-compatible MongoDB database instance using your `MONGODB_URI` and specifying the `"sample_mflix"` database.**

In [3]:
from langchain_mongodb.agent_toolkit.database import MongoDBDatabase

# Access the sample_mflix database using the MongoDBDatabase class
db = MongoDBDatabase.from_connection_string(connection_string=MONGODB_URI, database="sample_mflix")

**Initialize the MongoDB database toolkit using `db` and `llm`, then view the tools it contains.**

In [7]:
from langchain_mongodb.agent_toolkit.toolkit import MongoDBDatabaseToolkit
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    api_key=OPENAI_API_KEY
)

# Initialize the MongoDB database toolkit
toolkit = MongoDBDatabaseToolkit(db=db, llm=llm)

# Extract the tools from the toolkit
tools = toolkit.get_tools()

# See what each tool does
for tool in tools:
    print(tool.name, ":", tool.description, "\n")

mongodb_query : Input to this tool is a detailed and correct MongoDB query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use mongodb_schema to query the correct collections fields. 

mongodb_schema : Input to this tool is a comma-separated list of collections, output is the schema and sample rows for those collections. Be sure that the collectionss actually exist by calling mongodb_list_collections first! Example Input: collection1, collection2, collection3 

mongodb_list_collections : Input is an empty string, output is a comma-separated list of collections in the database. 

mongodb_query_checker : Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with mongodb_query! 



Pay attention to what the inputs to each tool should be. You will need this to test the tools next.

### Test the tools

Next, let's test out each of the tools in the MongoDB database toolkit. This will help us understand what the inputs and outputs of each tool look like.

In our text-to-query agent, the inputs for the tool calls are generated by an LLM, but for now, let's test the tools manually.

In [9]:
# Map tool names to tool objects
tool_map = {tool.name: tool for tool in tools}

In [11]:
# Test the mongodb_list_collections tool
print(tool_map["mongodb_list_collections"].invoke({}))

comments, embedded_movies, movies, sessions, theaters, users


In [12]:
# Test the mongodb_schema tool
print(tool_map["mongodb_schema"].invoke("theaters"))

Database name: sample_mflix
Collection name: theaters
Schema from a sample of documents from the collection:
_id: ObjectId
theaterId: Number
location.address.street1: String
location.address.city: String
location.address.state: String
location.address.zipcode: String
location.geo.type: String
location.geo.coordinates: Array<Number>

/*
3 documents from theaters collection:
[
  {
    "_id": {
      "$oid": "59a47286cfa9a3a73e51e72c"
    },
    "theaterId": 1000,
    "location": {
      "address": {
        "street1": "340 W Market",
        "city": "Bloomington",
        "state": "MN",
        "zipcode": "55425"
      },
      "geo": {
        "type": "Point",
        "coordinates": [
          -93.24565,
          44.85466
        ]
      }
    }
  },
  {
    "_id": {
      "$oid": "59a47286cfa9a3a73e51e72d"
    },
    "theaterId": 1003,
    "location": {
      "address": {
        "street1": "45235 Worth Ave.",
        "city": "California",
        "state": "MD",
        "zipcode": "2

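The schema output above reports nested fields as dotted paths such as `location.address.city`. To make the relationship between a nested document and those paths concrete, here is a minimal sketch that flattens a document in the same spirit. It is not the toolkit's implementation: `flatten_schema` is a hypothetical helper, and it reports Python type names (`int`, `str`, `list`) rather than the labels the toolkit prints.

```python
from typing import Any


def flatten_schema(doc: dict, prefix: str = "") -> dict:
    """Map each leaf field of a nested document to its dotted path and Python type name."""
    fields = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the dotted path.
            fields.update(flatten_schema(value, path))
        else:
            fields[path] = type(value).__name__
    return fields


# First sample document from the output above, without the `_id` field.
sample = {
    "theaterId": 1000,
    "location": {
        "address": {"street1": "340 W Market", "city": "Bloomington", "state": "MN", "zipcode": "55425"},
        "geo": {"type": "Point", "coordinates": [-93.24565, 44.85466]},
    },
}
for path, type_name in flatten_schema(sample).items():
    print(f"{path}: {type_name}")
```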
In [13]:
import json

# Test the mongodb_query_checker tool
# text query: Top 5 directors by average IMDB rating, who have made at least 20 movies
query = [
    {"$unwind": "$directors"},
    {"$group": {"_id": "$directors", "filmCount": {"$sum": 1}, "avgRating": {"$avg": "$imdb.rating"}}},
    {"$match": {"filmCount": {"$gte": 20}}},
    {"$sort": {"avgRating": -1}},
    {"$limit": 5},
]
print(tool_map["mongodb_query_checker"].invoke(json.dumps(query)))

content='```json\n[{"$unwind": "$directors"}, {"$group": {"_id": "$directors", "filmCount": {"$sum": 1}, "avgRating": {"$avg": "$imdb.rating"}}}, {"$match": {"filmCount": {"$gte": 20}}}, {"$sort": {"avgRating": -1}}, {"$limit": 5}]\n```' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 85, 'prompt_tokens': 178, 'total_tokens': 263, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_51db84afab', 'id': 'chatcmpl-CLEufzzV1OExuhy2jk6yIxn4w39fy', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None} id='run--14fe61d9-8427-4af9-8454-1a076e7df659-0' usage_metadata={'input_tokens': 178, 'output_tokens': 85, 'total_tokens': 263, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, '
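
The output above suggests that the checker returns a chat message whose `content` holds the reviewed query wrapped in a Markdown `json` code fence. If you want the pipeline back as a Python object, something like the following sketch could work. `extract_pipeline` is a hypothetical helper, and it assumes the content is either fenced JSON or plain JSON.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this example readable


def extract_pipeline(content: str) -> list:
    """Parse the JSON pipeline from checker output, with or without a Markdown fence."""
    match = re.search(r"`{3}(?:json)?\s*(.*?)\s*`{3}", content, re.DOTALL)
    # Fall back to the raw text when no fence is present.
    payload = match.group(1) if match else content
    return json.loads(payload)


# A shortened reply in the same shape as the checker output shown above.
reply = FENCE + 'json\n[{"$unwind": "$directors"}, {"$limit": 5}]\n' + FENCE
print(extract_pipeline(reply))
```

With the real tool, the string to pass in would be the `content` attribute of the returned message.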

In [14]:
# Test the mongodb_query tool
# text query: Top 5 directors by average IMDB rating, who have made at least 20 movies
print(tool_map["mongodb_query"].invoke(f"db.movies.aggregate({json.dumps(query)})"), end="")

[
  {
    "_id": "William Wyler",
    "filmCount": 21,
    "avgRating": 7.676190476190476
  },
  {
    "_id": "Martin Scorsese",
    "filmCount": 32,
    "avgRating": 7.640625
  },
  {
    "_id": "Alfred Hitchcock",
    "filmCount": 24,
    "avgRating": 7.5874999999999995
  },
  {
    "_id": "Steven Spielberg",
    "filmCount": 29,
    "avgRating": 7.479310344827587
  },
  {
    "_id": "Woody Allen",
    "filmCount": 40,
    "avgRating": 7.215000000000001
  }
]
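
The printed result looks like a JSON array, so it can be turned back into Python objects for further processing. The sketch below assumes the tool returns a JSON-formatted string, as the output above suggests. `summarize_directors` is a hypothetical helper, and `raw` reuses the first two rows shown above.

```python
import json


def summarize_directors(raw: str) -> list:
    """Turn the JSON result string into one readable line per director."""
    rows = json.loads(raw)
    return [
        f"{row['_id']}: {row['avgRating']:.2f} over {row['filmCount']} films"
        for row in rows
    ]


raw = (
    '[{"_id": "William Wyler", "filmCount": 21, "avgRating": 7.676190476190476}, '
    '{"_id": "Martin Scorsese", "filmCount": 32, "avgRating": 7.640625}]'
)
for line in summarize_directors(raw):
    print(line)
```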