
# 1. Implement a question answering system with RAG, word embeddings, a vector database, LangChain, an LLM, and any other tools

In [1]:
# !pip install -q -r requirements.txt

In [2]:
import os
from os import getenv
from langchain_google_genai import GoogleGenerativeAI
import getpass
from langchain_core.prompts import PromptTemplate
from google import genai
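
The imports above include `getpass`, and later cells read the key with `getenv("GOOGLE_API_KEY")`. A minimal sketch of one way to make sure the key is present before those cells run; the helper name `ensure_api_key` is hypothetical and not part of the original notebook or of any library.

```python
import getpass
import os


def ensure_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, prompting for it if missing.

    `ensure_api_key` is a hypothetical helper written for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed value so the key does not end up in cell output.
        key = getpass.getpass(f"Enter {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` once near the top of the notebook should leave the variable set for every later `getenv("GOOGLE_API_KEY")` call in the same session.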

## Creating embeddings with the Google API, storing them in a vector DB, and using them to retrieve data

In [3]:
from langchain_community.document_loaders import WebBaseLoader
import bs4

USER_AGENT environment variable not set, consider setting it to identify your requests.
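
The warning above asks for a `USER_AGENT` environment variable. A small sketch of setting one; the value is a placeholder assumption, and as far as I can tell the check runs when the loader module is imported, so this would need to run before the `WebBaseLoader` import.

```python
import os

# Placeholder value (an assumption); replace it with something that identifies your project.
# setdefault keeps any USER_AGENT that is already configured in the environment.
os.environ.setdefault("USER_AGENT", "rag-qa-notebook/0.1")
```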


### Loading the data from Webpage

In [4]:
# Fetching the data from the SAP FAQ page; keeping the data small because of API limitations

loader = WebBaseLoader(
    web_paths=("https://www.sap.com/india/about/company/faq.html",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("Grid__col-12--cMZPy Grid__col-sm-12--JrL2M Grid__col-md-12--tG5nF Grid__col-lg-12--wWsJK Grid__col-xl-12--v6LlR ")
        )
    ),
)
docs = loader.load()

In [5]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_text(docs[0].page_content)

print(f"The data from the webpage is now split into {len(texts)} parts")

The data from the webpage is now split into 64 parts
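
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is only an illustration of the overlap idea, not the algorithm `RecursiveCharacterTextSplitter` uses (that one also tries to break on separators such as paragraphs and sentences). The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows where neighbours share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters
demo = sliding_window_chunks(sample, chunk_size=12, chunk_overlap=4)
print(len(demo), demo)
```

The overlap means a sentence that straddles a chunk boundary is still likely to appear whole in at least one chunk, at the cost of embedding some text twice.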


In [6]:
from langchain_chroma import Chroma

### Creating the embedding

### Adding the embedding to Chroma VectorDB

In [7]:
import chromadb
import chromadb.utils.embedding_functions as embedding_functions

# Create an in-memory Chroma client
client = chromadb.Client()

# Embedding function backed by the Google Generative AI API
google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key=getenv("GOOGLE_API_KEY"))

# Chroma applies this embedding function to documents in .add and to query texts in .query
collection = client.get_or_create_collection(name="SAP", embedding_function=google_ef)

In [8]:
# Embed the chunks and add them to the collection
collection.add(ids=[f"doc-{i}" for i in range(len(texts))], documents=texts)

In [9]:
results = collection.query(
    query_texts=["How does SAP ensure data protection and privacy"],
    n_results=1
)

print(results['documents'][0][0])


How does SAP ensure data protection and privacy?We have implemented safeguards to help protect the fundamental rights of everyone whose data is processed by SAP, whether they are our customers, prospects, employees, or partners. In addition, we work toward compliance with all relevant legal requirements for data protection. You can find more information in the chapter "Security, Data Protection and Privacy" of the SAP Integrated Report, on SAP Trust Center, in the SAP Privacy Statement.
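
The query above returns the stored chunk whose embedding is closest to the embedding of the question. A toy sketch of that nearest-neighbour idea with hand-made 3-dimensional vectors; cosine similarity is used purely for illustration (the metric Chroma actually applies depends on how the collection is configured), and all names and vectors here are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> str:
    """Return the id of the document vector most similar to the query vector."""
    return max(doc_vecs, key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]))


# Made-up embeddings; real ones have hundreds of dimensions.
toy_vectors = {
    "doc-0": [1.0, 0.0, 0.0],
    "doc-1": [0.0, 1.0, 0.0],
    "doc-2": [0.7, 0.7, 0.0],
}
best = nearest([0.9, 0.1, 0.0], toy_vectors)
print(best)
```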


### Retrieve the data

In [10]:
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [12]:
# Final part of this RAG application

# Fetch the single best-matching chunk for a question
def fetch_results(user_question):
    results = collection.query(query_texts=[user_question],n_results=1)
    return results['documents'][0][0]
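
`fetch_results` returns only the top chunk, so an answer that spans a chunk boundary may come back cut off. A hedged variant that joins the top few chunks instead; `fetch_context` and `FakeCollection` are hypothetical names, and the fake collection exists only so the sketch runs without an API key. It mimics the `results['documents'][0]` shape used above.

```python
def fetch_context(collection, user_question: str, n_results: int = 3) -> str:
    """Join the top `n_results` chunks for one question into a single context string."""
    results = collection.query(query_texts=[user_question], n_results=n_results)
    # results["documents"] holds one list of chunks per query text; we sent one query.
    return "\n\n".join(results["documents"][0])


class FakeCollection:
    """Stand-in for a Chroma collection (an assumption for this demo only)."""

    def __init__(self, documents: list[str]):
        self._documents = documents

    def query(self, query_texts: list[str], n_results: int) -> dict:
        # Ignores the query text and returns the first n_results documents.
        return {"documents": [self._documents[:n_results] for _ in query_texts]}


demo_context = fetch_context(
    FakeCollection(["chunk A", "chunk B", "chunk C"]), "any question", n_results=2
)
print(demo_context)
```

With the real `collection` from the earlier cells, the call would be `fetch_context(collection, question)`; more chunks give the model more context but also lengthen the prompt.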

In [13]:
from google import genai
from google.genai import types

question = input("What is your question about SAP? ")

# Use a separate name so the Chroma `client` created earlier is not overwritten
genai_client = genai.Client(api_key=getenv("GOOGLE_API_KEY"))

# Send the question and the retrieved answer to Gemini for reformatting
response = genai_client.models.generate_content(
    model="gemini-2.0-flash",
    contents=f"I will provide you with a question and an answer. Your task is to present the answer in a more professional format. The user has asked the question {question} and the answer is {fetch_results(question)}. Strictly follow the question and answer pattern only. Do not add any other info.",
)


What is your question about SAP? Does SAP incorporate non-financial measures in its executive compensation?


In [14]:
print(response.text)

**Question:** Does SAP incorporate non-financial measures in its executive compensation?

**Answer:** Since 2020, SAP’s executive short-term incentive (STI) compensation includes sustainability targets (sustainability KPIs) on top of our financial targets with a total weight of 20%. The sustainability KPIs are: Customer Net Promoter Score, which measures SAP’s customer loyalty; Employee Engagement Index, which measures SAP’s employee commitment, pride, and loyalty; and Carbon Impact, which measures SAP’s greenhouse gas.
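
The prompt sent to the model is assembled inline inside the `generate_content` call. A small sketch of pulling it into a function so it can be inspected and tested without calling the API; `build_prompt` is a hypothetical helper, its wording is a paraphrase of the notebook's prompt, and the sample strings are placeholders.

```python
def build_prompt(question: str, context: str) -> str:
    """Assemble the reformatting prompt from a question and the retrieved text."""
    return (
        "I will provide you with a question and an answer. "
        "Your task is to present the answer in a more professional format.\n"
        f"Question: {question}\n"
        f"Answer: {context}\n"
        "Strictly follow the question and answer pattern only. "
        "Do not add any other info."
    )


# Placeholder inputs, not real retrieved data.
demo_prompt = build_prompt("Example question?", "Example retrieved answer.")
print(demo_prompt)
```

Keeping the question and the retrieved text on separate labelled lines also makes it easier to see where one ends and the other begins when debugging an unexpected response.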

