# Microsoft Azure OpenAI On Your Data with CosmosDB

In this notebook we'll use CosmosDB indices to provide grounding data for queries to Azure OpenAI models using the Azure OpenAI On Your Data service.

The Azure OpenAI On Your Data service supports the following search scenarios for retrieving the documents that are sent to the LLM for processing:

1) vector search using embeddings generated with an Azure OpenAI embedding model (`text-embedding-3-small`).
2) vector search over embeddings of your own PDF files.

Each of these examples will be covered in the following sections.

## Requirements

For this example, you will need:
* Python 3.11 or later
* An Azure OpenAI Resource
    * One multimodal model (gpt-4o-mini) deployed on your resource, to enable chatting about your data and to allow image and audio input.
    * For vector search, this notebook uses the Azure OpenAI `text-embedding-3-small` model. The examples below assume you are using `text-embedding-3-small`, but they can be updated to suit your needs.
* The [OpenAI Python Client](https://platform.openai.com/docs/api-reference/introduction?lang=python)

### Create and Configure CosmosDB

If you don't have a CosmosDB cluster, you can read how to get started in the official [quickstart guide](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/quickstart-portal).


### Configure Azure OpenAI Resource

If you don't have an Azure OpenAI resource, detailed information about how to obtain one can be found in the [official documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/use-your-data-quickstart?tabs=command-line&pivots=programming-language-python) for the Azure OpenAI On Your Data service.



In [None]:
!pip install python-dotenv pymongo openai pymupdf langchain

In [1]:
from dotenv import dotenv_values
config = dotenv_values(".env")

from cosmosdb_rag_attachment import Document, CosmosDBRAG, ChatClient, PDFChunk
import uuid

In [2]:
# Read parameters from .env file
CONNECTION_STRING = config['AZURE_COSMOSDB_MONGO_VCORE_CONNECTION_STRING']
DATABASE_NAME = config['AZURE_COSMOSDB_MONGO_VCORE_DATABASE']
COLLECTION_NAME = config['AZURE_COSMOSDB_MONGO_VCORE_CONTAINER']
AZURE_OPENAI_ENDPOINT = config['AZURE_OPENAI_ENDPOINT']
AZURE_OPENAI_KEY = config['AZURE_OPENAI_KEY']
AZURE_OPENAI_PREVIEW_API_VERSION = config['AZURE_OPENAI_PREVIEW_API_VERSION']
AZURE_OPENAI_MODEL = config['AZURE_OPENAI_MODEL']
AZURE_OPENAI_EMBEDDING_NAME = config['AZURE_OPENAI_EMBEDDING_NAME']
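A missing entry in `.env` only surfaces as a `KeyError` on the first lookup that needs it. The sketch below shows one way to report every missing setting at once. The helper name `missing_settings` is hypothetical and not part of this notebook; the key names are the ones read above.

```python
# Keys read from the .env file in the cell above.
REQUIRED_KEYS = [
    "AZURE_COSMOSDB_MONGO_VCORE_CONNECTION_STRING",
    "AZURE_COSMOSDB_MONGO_VCORE_DATABASE",
    "AZURE_COSMOSDB_MONGO_VCORE_CONTAINER",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_PREVIEW_API_VERSION",
    "AZURE_OPENAI_MODEL",
    "AZURE_OPENAI_EMBEDDING_NAME",
]


def missing_settings(config):
    """Return the required keys that are absent or empty in `config`.

    `config` is any mapping, such as the dict returned by dotenv_values().
    """
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Illustration with placeholder values instead of a real .env file.
example_config = {key: "placeholder" for key in REQUIRED_KEYS}
example_config["AZURE_OPENAI_KEY"] = ""
print(missing_settings(example_config))
```

Calling `missing_settings(config)` right after `dotenv_values(".env")` and stopping when the list is non-empty gives a clearer error than a failure deep inside a client constructor.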

In [3]:
# Define parameters for the MongoDB vector index
index_name = "vector_search_index"
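The index itself is created later through the `CosmosDBRAG` helper, whose implementation is not shown in this notebook. As a rough sketch of what such a helper might send, the function below builds a `createIndexes` command for an IVF vector index in Azure Cosmos DB for MongoDB vCore. The function name, `num_lists=1`, and the 1536 dimensions (the default output size of `text-embedding-3-small`) are assumptions, not values taken from the helper.

```python
def build_ivf_index_command(collection_name, index_name, vector_field,
                            dimensions=1536, num_lists=1):
    """Build a createIndexes command for an IVF vector index.

    Hypothetical helper: the real CosmosDBRAG class may use different options.
    """
    return {
        "createIndexes": collection_name,
        "indexes": [
            {
                "name": index_name,
                # The vector field is indexed with the special "cosmosSearch" type.
                "key": {vector_field: "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": "vector-ivf",      # inverted-file index
                    "numLists": num_lists,     # number of clusters (assumed value)
                    "similarity": "COS",       # cosine similarity
                    "dimensions": dimensions,  # must match the embedding size
                },
            }
        ],
    }


command = build_ivf_index_command(
    "my_collection", "vector_search_index", "vector_content"
)
# With pymongo, the command would be sent as: database.command(command)
```

Building the command as a plain dictionary keeps it easy to inspect before it is sent to the database.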

# PDF Upload Instructions for RAG Application

Welcome to the **Retrieval-Augmented Generation (RAG)** application!

In this notebook, we are demonstrating how to upload and process a PDF file using the RAG approach. Specifically, this notebook was tested using the book **_Speech and Language Processing_** by **Daniel Jurafsky** and **James H. Martin**. Follow the instructions below to upload your own PDF file and get started.

### Instructions:

1. **Download the Book**:
   - If you don't have the book already, please download [Speech and Language Processing](https://web.stanford.edu/~jurafsky/slp3/ed3bookaug20_2024.pdf) by Jurafsky and Martin. This is the book that this notebook was tested with.
   
2. **Upload the PDF**:
   - Specify your file location in the following cell to upload the PDF file of the book (or any other PDF you'd like to process).
   
3. **Processing the PDF**:
   - Once the PDF is uploaded, the notebook will process the file and allow you to interact with it using the RAG methodology.


### Important Notes:
- This notebook was **tested with the book "Speech and Language Processing"** by **Jurafsky and Martin**, which is an excellent resource for learning about natural language processing (NLP).
- If you'd like to use this notebook with other PDFs, it should work just as well with any text-based PDF. However, if your document is not text-based (i.e., it’s scanned or image-based), it may not be supported yet.

---

### Example PDF File:
- **Book Title**: _Speech and Language Processing_
- **Authors**: Daniel Jurafsky and James H. Martin
- **Tested Version**: 3rd edition


Happy exploring!

In [4]:
user_pdf_file = "/Users/pabloracana/Downloads/ed3bookaug20_2024.pdf"

In [5]:
# Initialize clients
db_client = CosmosDBRAG(CONNECTION_STRING, DATABASE_NAME, COLLECTION_NAME)
ai_client = ChatClient(AZURE_OPENAI_ENDPOINT, 
                       AZURE_OPENAI_KEY, 
                       AZURE_OPENAI_PREVIEW_API_VERSION, 
                       AZURE_OPENAI_MODEL, 
                       AZURE_OPENAI_EMBEDDING_NAME)
pdf_chunker = PDFChunk(user_pdf_file)



In [6]:
from tqdm import tqdm  # progress bar for the ingestion loop


def rag_application(db_client, ai_client, pdf_chunker):
    print("First, we create the Index to optimize the search")
    db_client.create_or_update_vector_index(index_name, 'vector_content')

    # NOTE: this check assumes `verify_file_exists` is a property. If it is a
    # method, it must be called (`verify_file_exists()`); otherwise the
    # condition is always true and the PDF is never ingested.
    if db_client.verify_file_exists:
        print("PDF File already exist in the Database, you can continue with the Chat")
    else:
        print("Reading PDF file and generating chunks")
        documents = pdf_chunker.extract_and_chunk_pdf(chunk_size=520, chunk_overlap=20)
        print("Generating vector representations and storing in the DB")
        for doc in tqdm(documents):
            # Embed each chunk, assign a unique id, and store it
            doc.vector_content = ai_client.generate_embeddings(doc.content)
            doc._id = f"doc:{uuid.uuid4()}"
            db_client.create_document(doc)

    print("All set!")

In [7]:
rag_application(db_client, ai_client, pdf_chunker)

First, we create the Index to optimize the search
Index already exists
PDF File already exist in the Database, you can continue with the Chat
All set!
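The ingestion step above splits the PDF text with `chunk_size=520` and `chunk_overlap=20`. The implementation of `PDFChunk` is not shown, so the following is only a minimal sketch of overlapping chunking. It assumes sizes are measured in characters, which may not match what `extract_and_chunk_pdf` does.

```python
def chunk_text(text, chunk_size=520, chunk_overlap=20):
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `chunk_overlap` characters, so a sentence cut at
    a boundary still appears whole in at least one neighbouring region.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(1200))
chunks = chunk_text(sample)
print([len(chunk) for chunk in chunks])
```

The overlap trades a small amount of duplicated storage for better recall on passages that straddle a chunk boundary.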


In [78]:
def chat_completion_rag(user_question):
    user_embedding = ai_client.generate_embeddings(user_question)
    retrieved_docs = db_client.similarity_search(user_embedding, k=3)
    docs_content = [doc['content'] for doc in retrieved_docs]
    response = ai_client.chat_completion(user_prompt=user_question,
                                         documents=docs_content)
    
    return response.choices[0].message.content
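The `similarity_search` call above is provided by the `CosmosDBRAG` helper, whose code is not shown. In Azure Cosmos DB for MongoDB vCore, a vector query is typically expressed as an aggregation pipeline with a `$search` stage and a `cosmosSearch` operator. The sketch below builds such a pipeline. The function name and the shape of the `$project` stage are assumptions, and the real helper evidently returns documents in a different shape, since the cell above reads `doc['content']` directly.

```python
def build_vector_search_pipeline(query_vector, vector_field="vector_content", k=3):
    """Build an aggregation pipeline that returns the k nearest documents.

    Hypothetical helper: the real CosmosDBRAG.similarity_search may differ.
    """
    return [
        {
            "$search": {
                "cosmosSearch": {
                    "vector": query_vector,  # embedding of the user question
                    "path": vector_field,    # field that holds stored embeddings
                    "k": k,                  # number of neighbours to return
                },
                "returnStoredSource": True,
            }
        },
        {
            "$project": {
                "similarityScore": {"$meta": "searchScore"},
                "document": "$$ROOT",
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], k=3)
# With pymongo, the query would run as: collection.aggregate(pipeline)
```

Exposing `similarityScore` alongside each document makes it possible to inspect how close the retrieved chunks are to the question before they are passed to the chat model.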

In [79]:
user_question = "Why is it important to implement retrieval-augmented generation when creating question-answering applications?"

In [80]:
llm_response = chat_completion_rag(user_question)


In [81]:
print(llm_response)

Implementing retrieval-augmented generation (RAG) in question-answering applications is important because it addresses several limitations of simple question-answering methods. While large language models can generate answers based on their pretraining, they often struggle with issues like hallucination, lack of supporting textual evidence, and inability to answer questions based on specific proprietary data. 

RAG improves the QA process by first retrieving relevant text passages and then conditioning the language model's output on these passages, providing more accurate and contextually grounded responses. Essentially, it allows the model to generate answers with real textual evidence, making the application more reliable and effective in delivering accurate information. This approach also overcomes the limitations of generating answers solely from pre-trained knowledge. 

You can find more about this on pages 309-311.


# Next Steps for RAG Application

## 1. **Modify Embedding Ingestion Method**
   - **Current Issue:** Unable to process in batches due to rate limits on the free tier.
   - **Next Steps:**
     - Implement batch ingestion by segmenting data into smaller chunks.
     - Add rate limiting logic to handle free tier constraints effectively.
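
The two next steps above could be sketched as follows: split the chunks into small batches and retry each batch with exponential backoff when the service rejects a request. All names here are hypothetical. `embed_batch` stands for any function that takes a list of texts and returns one vector per text, and in practice `retry_on` would likely be set to the client's rate limit exception (for the OpenAI Python client, `openai.RateLimitError`).

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def call_with_backoff(func, *args, max_retries=5, base_delay=1.0,
                      sleep=time.sleep, retry_on=(Exception,)):
    """Call func(*args), retrying on `retry_on` with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return func(*args)
        except retry_on:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...


def embed_in_batches(texts, embed_batch, batch_size=16, **backoff_options):
    """Embed `texts` batch by batch and return the vectors in input order."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(call_with_backoff(embed_batch, batch, **backoff_options))
    return vectors
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without actually waiting.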

## 2. **Update Vector Search Method**
   - **Current Issue:** Unable to use HNSW (Hierarchical Navigable Small World) due to free tier cluster size limitations.
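
For reference, on a cluster tier that supports it, an HNSW index is requested through the `cosmosSearchOptions` of a `createIndexes` command. The sketch below only builds that options dictionary. The function name is hypothetical, and the values for `m` and `efConstruction` are common defaults used for illustration, not tuned settings from this notebook.

```python
def build_hnsw_options(dimensions=1536, m=16, ef_construction=64):
    """Return cosmosSearchOptions for an HNSW vector index (assumed values)."""
    return {
        "kind": "vector-hnsw",
        "m": m,                             # max connections per graph node
        "efConstruction": ef_construction,  # candidate list size at build time
        "similarity": "COS",                # cosine similarity
        "dimensions": dimensions,           # must match the embedding size
    }


hnsw_options = build_hnsw_options()
```

Larger `m` and `efConstruction` values generally improve recall at the cost of memory and index build time.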

## 3. **Validate Outputs**
   - **Next Steps:**
     - Validate results against expected outcomes for various queries.
     - Tune parameters, such as:
       - Number of retrieved documents per query.
       - Embedding dimensionality.
       - Similarity thresholds.
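
One way to make this validation concrete is to label a few questions with the chunk ids that should be retrieved, then measure how often retrieval finds at least one of them. The sketch below is a minimal illustration under that assumption. The function names, the `score` key, and the sample ids are hypothetical and do not come from this notebook.

```python
def filter_by_score(results, min_score):
    """Keep only results whose similarity score reaches `min_score`."""
    return [result for result in results if result["score"] >= min_score]


def hit_rate_at_k(retrieved_ids_per_query, relevant_ids_per_query, k):
    """Fraction of queries with at least one relevant id in the top k results."""
    if not relevant_ids_per_query:
        raise ValueError("at least one labelled query is required")
    hits = 0
    for retrieved, relevant in zip(retrieved_ids_per_query, relevant_ids_per_query):
        if set(retrieved[:k]) & set(relevant):
            hits += 1
    return hits / len(relevant_ids_per_query)


# Two labelled queries with made-up chunk ids.
retrieved = [["a", "b", "c"], ["d", "e", "f"]]
relevant = [["c"], ["x"]]
print(hit_rate_at_k(retrieved, relevant, k=3))
```

Running the same labelled set while varying `k` or the similarity threshold shows how each parameter changes retrieval quality.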