# MongoDB Query Engine Tutorial

This notebook demonstrates how to use the `MongoDBQueryEngine` for retrieval-augmented question answering over documents stored in MongoDB. It shows how to set up the engine with MongoDB and a simple LlamaIndex text parser, index parsed Markdown files, and run natural language queries against the indexed data.

The `MongoDBQueryEngine` integrates MongoDB vector storage, either cloud-hosted MongoDB Atlas or a local MongoDB instance, with LlamaIndex for efficient document retrieval.

In [None]:
%pip install llama-index==0.12.16
%pip install llama-index-llms-langchain

Before calling the `MongoDBQueryEngine`, we import and create an embedding function. You can replace it with any built-in embedding function from LangChain or LlamaIndex, or with your own custom implementation.

In [None]:
import os

from chromadb.utils import embedding_functions
from llama_index.llms.openai import OpenAI

os.environ["OPENAI_API_KEY"] = ""  # set your OpenAI API key here
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="",  # the same OpenAI API key
    model_name="text-embedding-ada-002",
)

llm = OpenAI(model="gpt-4o")
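For a custom embedding function, the sketch below follows the Chroma-style calling convention (a callable that takes a list of strings and returns one vector per string). `HashingEmbeddingFunction` is a hypothetical name, and the hashing scheme is a toy meant only to show the interface. Whether `MongoDBQueryEngine` accepts such an object unchanged is an assumption, and a real embedding model should be used for meaningful retrieval.

```python
import hashlib
import math


class HashingEmbeddingFunction:
    """Toy embedding (hypothetical): hashes each token into one of `dim` buckets."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def __call__(self, input: list[str]) -> list[list[float]]:
        vectors = []
        for text in input:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # md5 is used only as a stable hash, not for security.
                digest = hashlib.md5(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % self.dim
                vec[bucket] += 1.0
            # L2-normalize so non-empty texts map to unit-length vectors.
            norm = math.sqrt(sum(v * v for v in vec))
            if norm > 0:
                vec = [v / norm for v in vec]
            vectors.append(vec)
        return vectors


toy_ef = HashingEmbeddingFunction(dim=64)
embeddings = toy_ef(["Toast trading symbol", "Nvidia research spending"])
print(len(embeddings), len(embeddings[0]))  # 2 64
```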

Next, we define the folder that contains the documents we want to index. This example uses the experimental documents from the ag2 test folder.

In [None]:
input_dir = "/ag2/test/agents/experimental/document_agent/pdf_parsed/"
# Change this path to point to the folder that contains your own documents
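Since the folder path depends on your environment, it can help to confirm that it exists and see which Markdown files it holds before indexing. The helper below is a minimal sketch using only the standard library; `list_markdown_files` is a hypothetical name and not part of the query engine.

```python
from pathlib import Path


def list_markdown_files(folder: str) -> list[str]:
    """Return the sorted paths of the .md files directly inside `folder`."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Input directory not found: {folder}")
    return sorted(str(path) for path in root.glob("*.md"))


# Example usage with the folder defined above:
# doc_paths = list_markdown_files(input_dir)
```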

Initialize the `MongoDBQueryEngine` with:
- A MongoDB Atlas connection string (a localhost MongoDB connection string also works)
- The pre-defined embedding function
- A database name
- The pre-defined LLM (optional)

Besides these, you can also customize `collection_name` and `index_name`.

In [None]:
from autogen.agentchat.contrib.rag.mongodb_query_engine import MongoDBQueryEngine

query_engine = MongoDBQueryEngine(
    connection_string="",  # your MongoDB Atlas or localhost connection string
    embedding_function=openai_ef,
    database_name="vector_db",
    llm=llm,  # optional
)
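Rather than pasting the connection string into the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named `MONGODB_URI` (a hypothetical name chosen for this example) and falls back to a default local URI, which is also an assumption about your setup.

```python
import os


def get_connection_string(default: str = "mongodb://localhost:27017") -> str:
    """Read the connection string from MONGODB_URI, else fall back to `default`."""
    value = os.environ.get("MONGODB_URI", "").strip()
    return value or default


connection_string = get_connection_string()
# Avoid printing the full string: it may contain credentials.
print("Using default local URI:", connection_string == "mongodb://localhost:27017")
```

The result can then be passed as `connection_string=connection_string` when constructing the engine.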

Because this is the first run and no database exists yet, we call `init_db` with a list of input document paths or a folder path.

You can also use `init_db` to overwrite and re-create an existing database by passing `overwrite=True`.

In [None]:
# Index the Toast financial report first; nvidia_10k_2024.md is added later with add_records
query_engine.init_db(new_doc_paths=[input_dir + "Toast_financial_report.md"])

Use `query` to answer a user's question.

In [None]:
question = "What is the trading symbol for Toast?"
answer = query_engine.query(question)
print(answer)
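To ask several questions in one go, a small loop over `query` is enough. The sketch below relies only on the `query(question)` call shown above; `ask_all` and `EchoEngine` are hypothetical names, and the stub engine exists only so the example runs without a live database.

```python
def ask_all(engine, questions: list[str]) -> dict[str, str]:
    """Run each question through engine.query and collect the answers as text."""
    return {question: str(engine.query(question)) for question in questions}


class EchoEngine:
    """Stand-in for a real query engine; it only echoes the question back."""

    def query(self, question: str) -> str:
        return f"(stub answer to: {question})"


answers = ask_all(
    EchoEngine(),
    [
        "What is the trading symbol for Toast?",
        "In which year was the report filed?",
    ],
)
for question, answer in answers.items():
    print(question, "->", answer)
```

With the real engine, `ask_all(query_engine, [...])` would replace the stub.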

To add new documents, use `add_records`, which accepts new document paths or a document directory as input.

In [None]:
query_engine.add_records(new_doc_paths_or_urls=[input_dir + "nvidia_10k_2024.md"])

In [None]:
print(query_engine.query("How much money did Nvidia spend on research and development?"))

If you already have a MongoDB vector database and just want to connect to it without initializing it again, call `connect_db`. You can also overwrite and re-create the connected vector database by passing `overwrite=True`.

In [None]:
query_engine.connect_db()
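One way to combine the two entry points is to try `connect_db` first and fall back to `init_db` only when connecting fails. The sketch below assumes that `connect_db` signals failure either by returning a falsy value or by raising an exception; that behavior is an assumption, not something this tutorial confirms. `connect_or_init` and `StubEngine` are hypothetical names used for illustration.

```python
def connect_or_init(engine, doc_paths: list[str]) -> str:
    """Connect to an existing database, or initialize one from `doc_paths`."""
    try:
        # Assumption: a truthy return value means the connection succeeded.
        if engine.connect_db():
            return "connected"
    except Exception as error:
        print(f"connect_db failed: {error}")
    engine.init_db(new_doc_paths=doc_paths)
    return "initialized"


class StubEngine:
    """Stand-in used only to exercise the helper without a live MongoDB."""

    def __init__(self, can_connect: bool) -> None:
        self.can_connect = can_connect
        self.initialized_with = None

    def connect_db(self) -> bool:
        return self.can_connect

    def init_db(self, new_doc_paths: list[str]) -> bool:
        self.initialized_with = list(new_doc_paths)
        return True


print(connect_or_init(StubEngine(True), ["doc.md"]))   # connected
print(connect_or_init(StubEngine(False), ["doc.md"]))  # initialized
```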

In [None]:
print(query_engine.query("How much money did Nvidia spend on research and development?"))