<a href="https://colab.research.google.com/github/Indranil-R/Silver-Badge-Assignments/blob/main/1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Copilot Documentation - Question Answering RAG

This project is part of the silver certification.
- Implement a question answering system using RAG, word embeddings, a vector database, LangChain, an LLM, and any other tools as needed.

# 1. Project Setup

## Install Dependencies

In [11]:
# Installing libraries
!pip install -q google google-genai langchain loguru pypdf langchain-community langchain-chroma langchain-google-genai


In [59]:
import os
from google.colab import userdata
from loguru import logger
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_google_genai import ChatGoogleGenerativeAI
from IPython.display import display
from IPython.display import Markdown

## Environment Variables

In [9]:
# Google API key
google_api_key = userdata.get('GOOGLE_API_KEY')
if google_api_key:
  os.environ["GOOGLE_API_KEY"] = google_api_key  # set only when present; os.environ rejects None
  logger.info("Google API key loaded")
else:
  logger.error("Google API key not loaded")


2025-05-17 02:11:10.337 | INFO     | __main__:<cell line: 0>:5 - Google API key loaded


# 2. Data Ingestion

In [14]:
# Mounting drive
from google.colab import drive
drive.mount('/content/drive')



Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [35]:
# Reading the PDF file
file_path = "/content/drive/MyDrive/Colab Documents/copilot.pdf"

loader = PyPDFLoader(file_path, mode="page")  # one Document per PDF page
pages = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)  # sizes are in characters
docs = text_splitter.split_documents(pages)
logger.info(f"Number of documents: {len(docs)}")
logger.info("Data ingestion step is completed.. Moving on to embedding")

2025-05-17 02:58:42.654 | INFO     | __main__:<cell line: 0>:8 - Number of documents: 56
2025-05-17 02:58:42.655 | INFO     | __main__:<cell line: 0>:9 - Data ingestion step is completed.. Moving on to embedding
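
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified, pure-Python sketch. This is not how `RecursiveCharacterTextSplitter` works internally (it prefers to break on separators such as paragraphs, lines, and spaces); the hypothetical `split_with_overlap` helper below only cuts fixed-size character windows, to show how consecutive chunks share their boundary text.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into character windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Toy values (assumed for illustration; the notebook uses 1500 and 200)
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.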


# 3. Embedding Generation

In [36]:
embedding_fn = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

persist_directory = '/content/drive/MyDrive/Colab Documents/db'

if not os.path.exists(persist_directory):
    logger.info(f'Creating a new directory at {persist_directory}')
    os.makedirs(persist_directory, exist_ok=True)

# Create the Chroma vector store and persist it to persist_directory on Drive
vectordb = Chroma.from_documents(documents=docs, embedding=embedding_fn, persist_directory=persist_directory)

logger.info("Embedding generation step is completed.. Moving on to retriever")

2025-05-17 02:59:16.556 | INFO     | __main__:<cell line: 0>:6 - Creating a new directory at /content/drive/MyDrive/Colab Documents/db
2025-05-17 02:59:20.108 | INFO     | __main__:<cell line: 0>:12 - Embedding generation step is completed.. Moving on to retriever


# 4. Retriever Setup

In [40]:
retriever = vectordb.as_retriever(search_type="similarity", search_kwargs={"k": 7})
logger.info("Retriever setup is completed.. Moving on to RAG chain")

2025-05-17 03:01:44.428 | INFO     | __main__:<cell line: 0>:2 - Retriever setup is completed.. Moving on to RAG chain
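
Conceptually, a similarity retriever with `k` results ranks stored chunks by how close their embedding is to the query embedding and returns the top `k`. The sketch below is a toy illustration under stated assumptions: the three-dimensional vectors are made up, the helper names `cosine_similarity` and `top_k` are hypothetical, and cosine similarity is assumed as the metric (the distance Chroma actually uses depends on how the collection is configured).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the texts of the k entries whose vectors are most similar to query_vec."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up embeddings standing in for real model output
toy_index = [
    ("enable Copilot Chat in Edge", [0.9, 0.1, 0.0]),
    ("billing for agents", [0.1, 0.9, 0.0]),
    ("image generation", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(results)  # ['enable Copilot Chat in Edge', 'billing for agents']
```

A real vector database avoids scoring every entry by using an approximate nearest-neighbour index, but the ranking idea is the same.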


# 5. RAG Chain Construction

In [65]:
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0.1, max_tokens=5000)
logger.info(f"Initialized {llm.model}")

2025-05-17 03:29:10.961 | INFO     | __main__:<cell line: 0>:2 - Initialized models/gemini-2.0-flash


In [66]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, say that you don't know."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [67]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
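
The "stuff" strategy used by the question-answer chain can be pictured as plain string formatting: the text of every retrieved chunk is joined and substituted into the `{context}` placeholder of the system prompt. The following is a minimal sketch of that idea, not LangChain's implementation; the `stuff_prompt` helper, the shortened template, and the blank-line separator are assumptions made for illustration.

```python
def stuff_prompt(system_template: str, documents: list[str], question: str) -> list[tuple[str, str]]:
    """Join all chunk texts into one context string and fill the prompt template."""
    context = "\n\n".join(documents)  # assumed separator between chunks
    return [
        ("system", system_template.format(context=context)),
        ("human", question),
    ]


# Shortened stand-in for the notebook's system prompt
template = "Use the following context to answer the question.\n\n{context}"
messages = stuff_prompt(template, ["Chunk A", "Chunk B"], "How to enable agents?")
for role, content in messages:
    print(f"[{role}] {content}")
```

Because every retrieved chunk lands in a single prompt, the retriever's `k` and the splitter's `chunk_size` together determine how much context the model receives per question.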

# 6. Testing the RAG Chain

In [61]:
response = rag_chain.invoke({"input": "How to Enable Copilot Chat in Edge"})
display(Markdown(response["answer"]))

To use Copilot Chat in Edge, follow these steps:

1.  Sign in to Microsoft Edge with your Microsoft Entra account (work or school account).
2.  Access Copilot Chat by clicking the Copilot icon in the upper right of the Edge browser (Ctrl+Shift+.).

A shield icon at the top of the Copilot Chat experience in the sidebar confirms that Copilot Chat in Edge offers enterprise data protection.

In [60]:
response = rag_chain.invoke({"input": "What features are available in Copilot Chat?"})
display(Markdown(response["answer"]))

Copilot Chat has many features, including:
*   Copilot Pages: Takes Copilot Chat-generated content and puts it in a dynamic, persistent canvas where users can edit it, add to it, share it, and work on it with others in real time.
*   File upload: Lets users upload files like Word docs, Excel files, and PDFs to prompt Copilot Chat to reason over it as part of its response.
*   Image generation: Create an AI-generated image by describing it.
*   Previous chats: Lets users access previous chats for reference or to continue the chat.
*   Agents: Agents use AI to automate and execute business processes, working alongside or on behalf of a person, team, or organization.
*   Contextual prompt suggestions (Copilot Chat in Edge): References open pages within Microsoft Edge to help users create more relevant prompts.
*   Page summarization (Copilot Chat in Edge): Lets users summarize an open webpage or PDF in Edge.
*   Code interpreter: Copilot Chat uses the Python programming language to help users perform complex data analysis such as coding, visualization, and math.
*   Image upload: Lets users take photos, upload images, or copy and paste images into Copilot Chat to use in a prompt.

In [64]:
response = rag_chain.invoke({"input": "How to Enable agents?"})
display(Markdown(response["answer"]))

To enable agents that are billed based on metered consumption for users in Copilot Chat, admins need to set up or use an existing Copilot Studio subscription. Admins can set up billing through either the Microsoft 365 admin center (MAC) or the Power Platform admin center (PPAC).

In [68]:
response = rag_chain.invoke({"input": "Can you tell me about ESOP?"})
display(Markdown(response["answer"]))

I'm sorry, but the provided context does not contain information about ESOP (Employee Stock Ownership Plan). Therefore, I cannot answer your question.
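
Besides `"answer"`, the dictionary returned by a chain built with `create_retrieval_chain` also carries the retrieved documents under `"context"`, which helps when checking why an answer was (or was not) grounded. The helper below is a hypothetical sketch: `summarize_sources` is not part of LangChain, the `source` and `page` metadata keys are assumed to be what the PDF loader attached, and stand-in objects replace real documents so the snippet runs without any API call.

```python
from types import SimpleNamespace


def summarize_sources(response: dict) -> list[str]:
    """List the origin of each retrieved chunk, tolerating missing metadata."""
    lines = []
    for doc in response.get("context", []):
        source = doc.metadata.get("source", "unknown source")  # assumed metadata key
        page = doc.metadata.get("page", "?")                   # assumed metadata key
        lines.append(f"{source} (page {page})")
    return lines


# Stand-in for a real chain response, shaped like {"input", "context", "answer"}
fake_response = {
    "input": "How to Enable agents?",
    "answer": "...",
    "context": [
        SimpleNamespace(metadata={"source": "copilot.pdf", "page": 3}, page_content="..."),
        SimpleNamespace(metadata={}, page_content="..."),
    ],
}
sources = summarize_sources(fake_response)
print(sources)  # ['copilot.pdf (page 3)', 'unknown source (page ?)']
```

With the real chain, the same function could be called as `summarize_sources(response)` after `rag_chain.invoke(...)`.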