# Chapter 10: Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) extends an LLM with the ability to fetch external information at query time. Instead of relying only on the data it was trained on, the model pulls in relevant external data as it generates a response, which lets it give more accurate, up-to-date, and context-aware answers. This makes it more effective in areas like conversational AI, decision support, and automated research. You may have noticed that when an LLM uses tools, it forms its final answer based on the output of the last tool it called, often pulling data from a JSON response. In RAG, the focus shifts to giving the LLM the ability to answer questions using information from documents.

In [1]:
import numpy as np
import pandas as pd
from language_models.proxy_client import ProxyClient
from language_models.agent import Agent, OutputType, PromptingStrategy, Workflow, WorkflowLLMStep
from language_models.models.llm import ChatMessage, ChatMessageRole
from language_models.tools.tool import Tool
from language_models.models.llm import OpenAILanguageModel
from language_models.models.embedding import SentenceTransformerEmbeddingModel
from language_models.vector_stores import FAISSVectorStore, DistanceMetric
from language_models.settings import settings
from langchain_core.documents import Document
from pydantic import BaseModel, Field
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [2]:
proxy_client = ProxyClient(
    client_id=settings.CLIENT_ID,
    client_secret=settings.CLIENT_SECRET,
    auth_url=settings.AUTH_URL,
    api_base=settings.API_BASE,
)

## Idea

Integrating RAG into AI systems can significantly enhance LLM responses. RAG allows the LLM to reference a knowledge base outside of its training data before generating a response. This is useful when dealing with documents or content specific to your business that the LLM may not be familiar with. Grounding the LLM in this external information lets it answer questions about those topics. However, answer quality depends on retrieving the right documents, so careful implementation is essential.

![rag](./assets/images/rag.png)

In [3]:
system_prompt = "You are an expert in job postings. Respond with the most accurate information about the job."

llm = OpenAILanguageModel(
    proxy_client=proxy_client,
    model="gpt-4",
    max_tokens=250,
    temperature=0.2,
)

agent = Agent.create(
    llm=llm,
    system_prompt=system_prompt,
    prompt="{question}",
    prompt_variables=["question"],
    output_type=OutputType.STRING,
    prompting_strategy=PromptingStrategy.SINGLE_COMPLETION,
)

In this example, we ask about the salary range for a job position, specifically an accountant. Since the LLM lacks specific details about our company, it falls back on the general knowledge in its training data to generate an answer.

In [4]:
response = agent.invoke({"question": "What is the salary range of an accountant."})

[1m[38;2;50;164;103mFinal Answer[0m[1m[0m: The salary range for an accountant can vary widely depending on the location, industry, and level of experience. On average, in the United States, an entry-level accountant might expect to earn around $45,000 to $55,000 per year. A mid-career accountant might earn between $55,000 and $70,000, while a senior accountant or manager could earn $70,000 to $90,000 or more. However, these figures are just averages and actual salaries can be higher or lower.


In [5]:
print(response.final_answer)

The salary range for an accountant can vary widely depending on the location, industry, and level of experience. On average, in the United States, an entry-level accountant might expect to earn around $45,000 to $55,000 per year. A mid-career accountant might earn between $55,000 and $70,000, while a senior accountant or manager could earn $70,000 to $90,000 or more. However, these figures are just averages and actual salaries can be higher or lower.


Here, we provide the relevant job details from our document, allowing the LLM to use this specific information to provide an accurate response to the user.

In [6]:
question = """What is the salary range of an accountant.

Context:
ACCOUNTANT

Class Code:       1513
Open Date:  06-22-18
(Exam Open to All, including Current City Employees)

ANNUAL SALARY

$49,903 to $72,996 and $55,019 to $80,472"""

response = agent.invoke({"question": question})

[1m[38;2;50;164;103mFinal Answer[0m[1m[0m: The salary range for the Accountant position in this specific job posting is $49,903 to $80,472 annually.


In [7]:
print(response.final_answer)

The salary range for the Accountant position in this specific job posting is $49,903 to $80,472 annually.
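The manual step above, pasting document text beneath the question under a `Context:` header, can be wrapped in a small helper. This is a minimal sketch in plain Python; the name `build_grounded_question` is hypothetical and not part of the `language_models` package.

```python
def build_grounded_question(question: str, context_chunks: list[str]) -> str:
    """Append retrieved text beneath the question under a 'Context:' header."""
    # Drop empty chunks so the prompt does not carry blank context entries.
    chunks = [chunk.strip() for chunk in context_chunks if chunk.strip()]
    if not chunks:
        # Nothing was retrieved: fall back to the bare question.
        return question
    return f"{question}\n\nContext:\n" + "\n\n".join(chunks)


grounded = build_grounded_question(
    "What is the salary range of an accountant.",
    ["ACCOUNTANT\n\nANNUAL SALARY\n\n$49,903 to $72,996 and $55,019 to $80,472"],
)
print(grounded)
```

The resulting string could be passed as the `question` variable, in the same way as the hand-written prompt in the cell above.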


## Embeddings

To compare documents and automatically find relevant ones, we create embeddings that capture the semantic meaning of both the documents and the questions posed to the LLM. An embedding model converts text into numerical representations, or vectors. Often, an embedding model from the same provider as the LLM is used for this purpose. However, due to limited access to OpenAI's embedding model, we will use sentence transformers to showcase this process. By comparing these vectors, we can assess the similarity between documents and queries and identify the most relevant information efficiently.

![embedding](./assets/images/embedding.png)

In [8]:
embedding_model = SentenceTransformerEmbeddingModel(model="all-MiniLM-L6-v2")

First, we convert the user query into a vector representation.

In [9]:
query = "What is the salary range of an accountant."
embedding1 = embedding_model.embed_query(query)

In [10]:
print(embedding1)

[0.02522313967347145, 0.02307744510471821, -0.05601518973708153, 0.014181708917021751, -0.12336049228906631, 0.012008094228804111, -0.030247898772358894, 0.018137196078896523, 0.019031377509236336, -0.01606547087430954, 0.005754050333052874, -0.12696686387062073, -0.0764838308095932, -0.03204340487718582, 0.007648212369531393, 0.013705767691135406, -0.004476392641663551, -0.030027681961655617, 0.09027383476495743, -0.028235040605068207, 0.036534909158945084, 0.021080350503325462, -0.0282881036400795, -0.08542317152023315, 0.10774991661310196, -0.07495275139808655, -0.008742023259401321, -0.03884077072143555, -0.01728292554616928, 0.049688030034303665, -0.030432291328907013, -0.024790428578853607, 0.02125607803463936, -0.028942178934812546, 0.05682627111673355, -0.037460342049598694, -0.01860189065337181, 0.11144133657217026, 0.045013900846242905, 0.026884878054261208, -0.024479597806930542, 0.061160311102867126, -0.03742044419050217, -0.09473290294408798, -0.057644959539175034, 0.01196

Next, we convert our documents into vectors, allowing us to compare them with the user's question. By embedding both the user's query and the documents, we project the query into the same vector space as the documents. This enables us to identify the closest matches, such as the 5 most similar documents.

![vector-space](./assets/images/vector-space.png)
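To make the nearest-neighbor idea concrete, here is a minimal sketch of top-k retrieval with NumPy. The two-dimensional vectors are hand-picked toy values, not real embeddings, and `top_k_cosine` is a hypothetical helper rather than part of any library used in this chapter.

```python
import numpy as np


def top_k_cosine(query_vec, doc_matrix, k=5):
    """Return (row index, cosine similarity) for the k most similar rows."""
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_matrix, dtype=float)
    # Normalize to unit length so the dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ query
    # Sort by descending score and keep the first k indices.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy "document embeddings" (assumption: made-up values for illustration).
toy_docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
toy_query = np.array([1.0, 0.1])

results = top_k_cosine(toy_query, toy_docs, k=2)
print(results)  # rows 0 and 2 rank highest for this toy query
```

A vector store performs the same kind of ranking, usually with index structures that avoid scoring every document.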

In [11]:
document = """ACCOUNTANT

Class Code:       1513
Open Date:  06-22-18
(Exam Open to All, including Current City Employees)

ANNUAL SALARY

$49,903 to $72,996 and $55,019 to $80,472"""

embedding2 = embedding_model.embed_query(document)

To compute the similarities between vectors, we can use mathematical formulas such as cosine similarity, euclidean distance, or inner product:
- **Euclidean distance:** Measures the straight-line distance between two points. Smaller values indicate greater similarity.
- **Cosine similarity:** Measures the cosine of the angle between two vectors, ranging from -1 to 1. Values closer to 1 indicate greater similarity.
- **Inner product:** Measures the dot product of two vectors. Higher values typically indicate greater similarity.
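The three measures listed above can be computed directly with NumPy. The sketch below uses two small hand-picked vectors (an assumption for illustration, not real embeddings) and also checks a standard identity: for unit-length vectors, squared Euclidean distance equals `2 - 2 * cosine`, so both measures produce the same ranking.

```python
import numpy as np

# Toy vectors chosen so the results are easy to verify by hand.
a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 1.0, 2.0])

toy_euclidean = np.linalg.norm(a - b)  # sqrt(2), about 1.4142
toy_inner = np.dot(a, b)               # 2 + 2 + 4 = 8.0
toy_cosine = toy_inner / (np.linalg.norm(a) * np.linalg.norm(b))  # 8 / 9

print(f"Euclidean distance: {toy_euclidean:.4f}")
print(f"Inner product:      {toy_inner:.4f}")
print(f"Cosine similarity:  {toy_cosine:.4f}")

# After normalizing to unit length, squared distance and cosine are linked.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
squared_distance = np.linalg.norm(a_unit - b_unit) ** 2
print(np.isclose(squared_distance, 2 - 2 * toy_cosine))
```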

In [12]:
cosine_similarity = np.dot(embedding1, embedding2) / (np.linalg.norm(embedding1) * np.linalg.norm(embedding2))
print(f"Cosine similarity: {cosine_similarity:.4f}")

Cosine similarity: 0.6591


## Asking Questions Related to the Hiring Process

As a company, part of our operations involves hiring individuals for a variety of positions. To facilitate the hiring process and improve efficiency, we've incorporated the use of an LLM to assist us. Our hiring process entails the collaboration of various entities, including recruiters, hiring managers, engineers, applicants, departments, and the specific job roles themselves. For this demonstration, we'll use the LLM to respond to inquiries related to our hiring procedures and overall business operations.

We'll focus on using documents containing job descriptions and applicant resumes, but we'll also consider that we have access to other documents related to various entities. Although these additional documents are virtual, we'll assume they are available for reference.

In [13]:
df_jobs = pd.read_csv("./assets/datasets/jobs.csv")
df_jobs.job_title = df_jobs.job_title.str.lower()
df_jobs = df_jobs.sort_values("job_title")
df_jobs = df_jobs.head(250)
df_jobs.head()

| Unnamed: 0 | job_title | job_class_no | job_duties | open_date | salary | deadline | application_form | where_to_apply | text |
|---|---|---|---|---|---|---|---|---|---|
| 637 | 311 director | 9206 | A 311 Director is responsible for the successf... | 18-04-2014 | `[{'description': 'Annual Salary', 'min_salary'...` | 01-05-2014 | online | `http://agency.governmentjobs.com/lacity/defaul...` | `311 DIRECTOR\nClass Code: 9206\nOpen Dat...` |
| 198 | accountant | 1513 | An Accountant does professional accounting wor... | 22-06-2018 | `[{'description': 'Lower pay grade', 'min_salar...` | 11-07-2019 | online | `https://www.governmentjobs.com/careers/lacity` | `ACCOUNTANT\n\nClass Code: 1513\nOpen Dat...` |
| 326 | accounting clerk | 1223 | An Accounting Clerk performs difficult and res... | 13-07-2018 | `[{'description': 'Annual salary', 'min_salary'...` | 20-06-2019 | online | `https://www.governmentjobs.com/careers/lacity` | `ACCOUNTING CLERK\n\nClass Code: 1223\nOp...` |
| 190 | accounting records supervisor | 1119 | An Accounting Records Supervisor assigns, revi... | 27-07-2018 | `[{'description': 'Salary range 1', 'min_salary...` | 09-08-2018 | online | `https://www.governmentjobs.com/careers/lacity/...` | `ACCOUNTING RECORDS SUPERVISOR\n\nClass Code: ...` |
| 316 | administrative analyst | 1590 | An Administrative Analyst performs professiona... | 06-01-2018 | `[{'description': 'Lower pay grade position', '...` | 14-06-2018 | online | `https://www.governmentjobs.com/careers/lacity/...` | `ADMINISTRATIVE ANALYST\n\nClass Code: 15...` |


In [14]:
df_resumes = pd.read_csv("./assets/datasets/resumes.csv")
df_resumes = df_resumes[df_resumes.Resume_str.str.contains("accountant", case=False, na=False)]
df_resumes = df_resumes.head(250)
df_resumes.head()

| Unnamed: 0 | ID | Resume_str | Resume_html | Category |
|---|---|---|---|---|
| 103 | 20417897 | EXECUTIVE ASSISTANT HR Summary ... | `<div class="fontsize fontface vmargins hmargin...` | HR |
| 274 | 18176523 | SENIOR INFORMATION TECHNOLOGY MANAGER... | `<div class="fontsize fontface vmargins hmargin...` | INFORMATION-TECHNOLOGY |
| 383 | 16270906 | TEACHER Accomplishments ... | `<div class="fontsize fontface vmargins hmargin...` | TEACHER |
| 468 | 20324037 | FULFILLMENT ADVOCATE Summary ... | `<div class="fontsize fontface vmargins hmargin...` | ADVOCATE |
| 634 | 12377803 | BUSINESS DEVELOPMENT MANAGER ... | `<div class="fontsize fontface vmargins hmargin...` | BUSINESS-DEVELOPMENT |


### Naive Implementation

For the simplest implementation, we chunk our available documents into smaller pieces if necessary, embed the chunks, and store them in a vector database. For this demonstration, we'll use FAISS.

In [15]:
documents = [Document(page_content=document) for document in df_jobs.text.tolist() + df_resumes.Resume_str.tolist()]
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n", " ", ""], chunk_size=1000, chunk_overlap=100)
documents = text_splitter.split_documents(documents)
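To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified fixed-window chunker in plain Python. It is only a sketch of the sliding-window idea: `RecursiveCharacterTextSplitter` additionally tries to break on the given separators, which this hypothetical `chunk_text` helper does not attempt.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    # Each window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence cut at a chunk boundary still appears partly in the next chunk, which helps retrieval when the relevant text straddles two chunks.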

In [16]:
try:
    vector_store = FAISSVectorStore.load_local("./assets/datasets/", "embeddings")
except Exception:  # no saved index could be loaded, so build and save one
    vector_store = FAISSVectorStore.from_documents(
        documents=documents,
        embedding_model=embedding_model,
        distance_metric=DistanceMetric.COSINE_SIMILARITY,
    )
    vector_store.save_local("./assets/datasets/", "embeddings")

Adding document context to every user question increases costs and can worsen answer quality when the LLM doesn't need external knowledge. Instead, we'll convert the RAG functionality into a tool. This way, the LLM can autonomously decide when to search the documents.

In our basic retriever, we've implemented everything discussed so far. When a user asks a question, the LLM uses the search tool to find relevant documents and then provides an answer based on the information from those documents.

In [17]:
class BasicRetriever(BaseModel):
    """Class that implements naive RAG."""

    vector_store: FAISSVectorStore

    def get_relevant_documents(self, user_text: str, fetch_k: int = 5) -> str:
        """Gets relevant documents."""
        documents = self.vector_store.similarity_search(user_text, fetch_k)
        documents = [document for document, _ in documents]
        return "\n\n".join(document.page_content for document in documents)

class SearchDocuments(BaseModel):
    user_text: str = Field(description="The user query to search for")

basic_retriever = Tool(
    function=BasicRetriever(vector_store=vector_store).get_relevant_documents,
    name="Get Relevant Information",
    description="Use this tool to search for job positions",
    args_schema=SearchDocuments,
)

In [18]:
system_prompt = "Use Get Relevant Information to answer user questions."

llm = OpenAILanguageModel(
    proxy_client=proxy_client,
    model="gpt-4-32k",
    max_tokens=1000,
    temperature=0.2,
)

basic_retriever_agent = Agent.create(
    llm=llm,
    system_prompt=system_prompt,
    prompt="{question}",
    prompt_variables=["question"],
    output_type=OutputType.STRING,
    tools=[basic_retriever],
    prompting_strategy=PromptingStrategy.CHAIN_OF_THOUGHT,
)

In [19]:
output = basic_retriever_agent.invoke({"question": "We have an opening for an accountant. Give me the details of the job."})

[1m[38;2;45;114;210mThought[0m[1m[0m: The user is asking for details about an accountant job opening. I will use the Get Relevant Information tool to search for this job position.
[1m[38;2;236;154;60mTool[0m[1m[0m: Get Relevant Information
[1m[38;2;236;154;60mTool Input[0m[1m[0m: {'user_text': 'accountant job position'}
[1m[38;2;236;154;60mTool Output[0m[1m[0m: ACCOUNTANT           Summary     Bookkeeper with strong technical proficiency and commitment to accuracy in financial data entry and financial record keeping. Desires an accounting position in a positive working environment   that encourages and supports continuing professional growth.       Highlights        RELATED SKILLS ● Over 20 years experience in payroll processing and preparing quarterly federal and state employer tax returns for numerous CPA firm clients, A/P and A/R processing for clients, bank reconciliations, maintaining general ledger, journal entries, making adjusting entries, preparation of mon

In [20]:
print(output.final_answer)

The accountant position requires strong technical proficiency and commitment to accuracy in financial data entry and financial record keeping. The role involves over 20 years of experience in payroll processing, preparing quarterly federal and state employer tax returns, A/P and A/R processing, bank reconciliations, maintaining general ledger, journal entries, making adjusting entries, preparation of monthly financial statements, and providing year-end information for income tax returns. The position also requires office management skills, including processing payroll and associated employment tax deposits, completing construction draws for lenders, processing A/R and A/P, maintaining subcontractor files and 1099. The ideal candidate should be self-motivated, honest, reliable, hard-working, thorough in completing projects, and committed to excellent customer service. They should also have a strong proficiency in identifying and responding to opportunities that improve profitability.


Another popular implementation is contextual compression, where two LLMs are used. The first LLM evaluates the documents retrieved from the vector database and filters out irrelevant ones. Each document is presented to this LLM in turn to determine its relevance to the user's question. After filtering, the remaining documents are provided to the second LLM, which interacts directly with the user and provides the final answer.

In [21]:
INSTRUCTIONS = """Given the following question and context, respond with YES if the context is relevant to the question and NO if it isn't.

Question:
{question}

Context:
{context}"""

class ContextualCompressionRetriever(BaseModel):
    """Class that implements a contextual compression retriever."""

    llm: OpenAILanguageModel
    vector_store: FAISSVectorStore

    def _parse_output(self, output: str) -> bool:
        """Parses LLM output."""
        cleaned_upper_text = output.strip().upper()
        if "YES" in cleaned_upper_text and "NO" in cleaned_upper_text:
            raise ValueError(f"Ambiguous response. Both 'YES' and 'NO' received: {output}.")
        elif "YES" in cleaned_upper_text:
            return True
        elif "NO" in cleaned_upper_text:
            return False
        else:
            raise ValueError(f"Expected output value to include either 'YES' or 'NO'. Received {output}.")

    def _compress_documents(self, user_text: str, documents: list[Document]) -> list[Document]:
        """Filters relevant documents."""
        compressed_documents = []
        for document in documents:
            prompt = INSTRUCTIONS.format(question=user_text, context=document.page_content)
            output = self.llm.get_completion([ChatMessage(role=ChatMessageRole.USER, content=prompt)])
            try:
                include_doc = self._parse_output(output)
            except ValueError:
                include_doc = False
            if include_doc:
                compressed_documents.append(document)
        return compressed_documents

    def get_relevant_documents(self, user_text: str, fetch_k: int = 5) -> str:
        """Gets relevant documents."""
        documents = self.vector_store.similarity_search(user_text, fetch_k)
        documents = [document for document, _ in documents]
        compressed_documents = self._compress_documents(user_text, documents)
        return "\n\n".join(document.page_content for document in compressed_documents)

llm = OpenAILanguageModel(
    proxy_client=proxy_client,
    model="gpt-4",
    max_tokens=16,
    temperature=0.2,
)

class SearchDocuments(BaseModel):
    user_text: str = Field(description="The user query to search for")

contextual_compression_retriever = Tool(
    function=ContextualCompressionRetriever(llm=llm, vector_store=vector_store).get_relevant_documents,
    name="Get Relevant Information",
    description="Use this tool to search for job positions",
    args_schema=SearchDocuments,
)
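One caveat with the `_parse_output` method above: it uses substring checks, so a reply such as "Yes, it is not off-topic" contains "NO" inside "NOT" and would be treated as ambiguous. A possible alternative, sketched here as a standalone function with the hypothetical name `parse_yes_no`, matches whole words instead.

```python
import re


def parse_yes_no(output: str) -> bool:
    """Return True for YES and False for NO, matching whole words only."""
    # Split into alphabetic tokens so 'NOT' or 'KNOW' never count as 'NO'.
    tokens = set(re.findall(r"[A-Z]+", output.upper()))
    has_yes = "YES" in tokens
    has_no = "NO" in tokens
    if has_yes and has_no:
        raise ValueError(f"Ambiguous response. Both 'YES' and 'NO' received: {output}.")
    if not has_yes and not has_no:
        raise ValueError(f"Expected 'YES' or 'NO'. Received: {output}.")
    return has_yes


print(parse_yes_no("Yes."))                      # True
print(parse_yes_no("NO"))                        # False
print(parse_yes_no("Yes, it is not off-topic"))  # True
```

Whether this stricter matching is worth it depends on how closely the filtering model follows the YES/NO instruction; with `max_tokens=16` the replies are short, so the difference should rarely matter.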

In [22]:
system_prompt = "Use Get Relevant Information to answer user questions."

llm = OpenAILanguageModel(
    proxy_client=proxy_client,
    model="gpt-4",
    max_tokens=1000,
    temperature=0.2,
)

contextual_compression_retriever_agent = Agent.create(
    llm=llm,
    system_prompt=system_prompt,
    prompt="{question}",
    prompt_variables=["question"],
    output_type=OutputType.STRING,
    tools=[contextual_compression_retriever],
    prompting_strategy=PromptingStrategy.CHAIN_OF_THOUGHT,
)

In [23]:
output = contextual_compression_retriever_agent.invoke({"question": "We have an opening for an accountant. Give me the details of the job."})

[1m[38;2;45;114;210mThought[0m[1m[0m: The user is asking for details about the job opening for an accountant. I will use the Get Relevant Information tool to search for the job details.
[1m[38;2;236;154;60mTool[0m[1m[0m: Get Relevant Information
[1m[38;2;236;154;60mTool Input[0m[1m[0m: {'user_text': 'accountant job details'}
[1m[38;2;236;154;60mTool Output[0m[1m[0m: ACCOUNTANT           Summary     Bookkeeper with strong technical proficiency and commitment to accuracy in financial data entry and financial record keeping. Desires an accounting position in a positive working environment   that encourages and supports continuing professional growth.       Highlights        RELATED SKILLS ● Over 20 years experience in payroll processing and preparing quarterly federal and state employer tax returns for numerous CPA firm clients, A/P and A/R processing for clients, bank reconciliations, maintaining general ledger, journal entries, making adjusting entries, preparation o

In [24]:
print(output.final_answer)

The accountant position requires strong technical proficiency and commitment to accuracy in financial data entry and record keeping. The responsibilities include preparing financial statements and bank reconciliations, managing financial departments with responsibility for Accounts Payable and Fixed Assets, managing vendor accounts, reconciling Asset accounts, and preparing documents and reports using advanced software proficiencies. The accountant will also be responsible for maintaining the integrity of the general ledger, including the chart of accounts, and partnering with auditors to prepare yearly audits. Experience in payroll processing, preparing quarterly federal and state employer tax returns, and maintaining general ledger is also required. The position encourages and supports continuing professional growth.


Every naive implementation of RAG, including those demonstrated previously and others (RAG + keyword search, RAG + re-ranking, etc.), encounters a common challenge.

Naive RAG is adequate for handling single-object scenarios. For example, if the vector space consists solely of documents related to earthquakes, cars, or jobs, a naive RAG approach can be sufficient. However, this simplistic method struggles to ensure that the retrieved documents provide the necessary context to effectively address a query. When projecting an embedding into the vector space and retrieving the five closest neighbors or most similar documents, there's no guarantee that these documents will pertain exclusively to the specific object of interest, leading to potential inaccuracies in the results.

![naive-rag-vector-space](./assets/images/naive-rag-vector-space.png)

In our basic demonstration, we've confined our vector space to contain only documents about job postings and applicant resumes. However, the LLM's responses remain subpar. After reviewing the logs of the LLM's Chain-of-Thought process, we can see that all the information retrieved is from applicant resumes where individuals have previously worked as accountants. While this information might be relevant in some contexts, it’s not always useful for answering the query accurately. We can improve the process by filtering out such irrelevant documents from the beginning.

When using contextual compression, where another LLM reviews and filters documents to remove irrelevant content, we observed that it successfully eliminated irrelevant documents in this instance. However, this outcome is not guaranteed in every case, so we should not overly depend on the LLM's judgment alone. Additionally, this method has a drawback: by filtering out documents deemed irrelevant, the LLM reduces the context available. This reduced information might limit the LLM's ability to provide a high-quality response.

Envision a scenario where our document repository extends beyond job postings and applicant resumes to include a range of materials integral to our hiring process. Applicants provide documents such as cover letters; recruiters, engineers, and managers maintain notes from interviews; and departments hold documents outlining team compositions and project details for prospective hires. If all these documents were incorporated into the vector space, it could worsen the issue, leading to further deterioration in the LLM's responses.

This challenge arises from the ambiguity in document usage and accessibility. For instance, when querying about an accountant position, applicant resumes may mention prior experience in the role, recruiters/engineers may record interactions related to airport engineering roles, and departments may outline their need for such positions. This lack of control over document utilization and access complicates the task and contributes to the degradation of the LLM's responses. For improved outcomes, we can use an ontology/knowledge graph.

Technically, we could add metadata, such as job titles, to achieve similar results to the ontology/knowledge graph implementation. However, as mentioned earlier, we assume there are documents related to other entities as well. This complicates the situation because documents tied to different business objects may contain different metadata. While searching using metadata is still possible, it would require filtering through numerous documents, which is slow. In addition, managing documents with different metadata in the same database is tedious, and sometimes databases don't support this level of complexity. This approach would also make it more difficult for users to understand.

One might consider standardizing all documents to include all metadata, but this would lead to many documents being filled with irrelevant metadata and dummy values, resulting in significant inefficiency in space usage. Another issue is the potential for metadata fields to have the same name across different business objects, which would prevent exclusive searches for relevant documents. Additionally, some metadata, like document creation dates or filenames, might not be known to users, who could simply look up the documents if they had this information. Therefore, in our showcase, we perform a simple search without metadata filtering.

### Ontology/Knowledge Graph Implementation

Implementing ontology/knowledge graph-based RAG presents a pragmatic solution. When considering knowledge graphs, we often envision structures similar to the figure below.

![knowledge-graph](./assets/images/knowledge-graph.png)

Nevertheless, we need to recalibrate our perception of knowledge graphs within the framework of LLMs. Ideally, we aim to create a digital twin of our business processes, wherein nodes symbolize objects and edges denote their links. Since the use case was introduced earlier, we're already familiar with the entities engaged in the hiring process. This includes applicants, who provide documents such as cover letters and resumes, along with additional metadata like their name, birthday, degrees, and responses to application questions. Additionally, individuals involved in the process - such as recruiters, engineers, and hiring managers - may maintain notes about applicants from interviews, alongside metadata such as name, birthday, and department affiliation. Furthermore, departments play a role, providing both metadata and documents, and the job postings themselves contain metadata such as job title, salary range, department, and required technical skills. The figure below provides a basic outline of the process. While it may not depict every detail accurately, it conveys the overall concept.

![knowledge-graph-vector-space](./assets/images/knowledge-graph-vector-space.png)

Now that we've established the ontology/knowledge graph, we can initially locate the relevant object or entity related to the question. Moreover, we can leverage the available metadata to refine our search process, effectively reducing the pool of documents to be searched. Essentially, this allows us to exclude entirely irrelevant documents and focus solely on those that are highly relevant. This structure also provides enhanced control and security. We can decide how the ontology/knowledge graph is traversed and specify which data, including metadata, the LLM is permitted to access.
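The effect of narrowing the document pool before ranking can be shown with a small sketch. The embeddings below are hand-picked two-dimensional toy values (an assumption, not model output), and `Chunk` and `search_chunks` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Chunk:
    entity_type: str       # which business object the chunk belongs to
    text: str
    embedding: np.ndarray  # toy vector standing in for a real embedding


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def search_chunks(query_embedding, chunks, entity_type: Optional[str] = None, k: int = 2):
    """Optionally restrict the pool to one entity type, then rank by cosine."""
    pool = [c for c in chunks if entity_type is None or c.entity_type == entity_type]
    ranked = sorted(pool, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return ranked[:k]


chunks = [
    Chunk("resume", "Resume: worked as an accountant", np.array([0.9, 0.1])),
    Chunk("job", "Job posting: ACCOUNTANT", np.array([0.8, 0.3])),
    Chunk("job", "Job posting: ACCOUNTING CLERK", np.array([0.5, 0.5])),
]
query_embedding = np.array([1.0, 0.0])

unfiltered = search_chunks(query_embedding, chunks, k=1)
filtered = search_chunks(query_embedding, chunks, entity_type="job", k=1)
print(unfiltered[0].text)  # the resume chunk wins without a filter
print(filtered[0].text)    # the accountant job posting wins with the filter
```

In this toy setup the resume is the closest vector overall, so only the entity filter keeps it out of the result, mirroring the behavior observed with the naive retriever.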

Another idea is to use an ontology/knowledge graph with subgraphs for more granular data representation.

In [25]:
resumes = [Document(page_content=document) for document in df_resumes.Resume_str.tolist()]
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n", " ", ""], chunk_size=2000, chunk_overlap=100)
resumes = text_splitter.split_documents(resumes)
df_resumes = pd.DataFrame({"resume": resume.page_content} for resume in resumes)
df_resumes["embedding"] = [embedding_model.embed_query(resume) for resume in df_resumes.resume]

In [26]:
jobs = [Document(page_content=document["text"], metadata={"job_title": document["job_title"]}) for document in df_jobs.to_dict(orient="records")]
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n", " ", ""], chunk_size=2000, chunk_overlap=100)
jobs = text_splitter.split_documents(jobs)
df_jobs = pd.DataFrame({"job": job.page_content, "job_title": job.metadata["job_title"]} for job in jobs)
df_jobs["embedding"] = [embedding_model.embed_query(job) for job in df_jobs.job]

In a typical scenario, you would first locate the relevant object and then filter based on properties. However, we'll simplify this by showing the LLM all available job postings from the start using a tool, allowing it to focus on these and ignore applicant resumes. This approach reduces the document pool considered. In practice, an additional step to identify the object of interest would be necessary before this filtering process.

In [27]:
data = {"resumes": df_resumes, "jobs": df_jobs}

In [28]:
def get_jobs() -> list[str]:
    return data["jobs"].job_title.unique().tolist()

get_job_titles = Tool(
    function=get_jobs,
    name="Get Available Job Titles",
    description="Use this tool to get the job titles",
)

def search(user_text: str, job_title: str, fetch_k: int = 5) -> str:

    def calculate_cosine_similarity(user_text_embedding, embedding):
        cosine_similarity = np.dot(user_text_embedding, embedding) / (np.linalg.norm(user_text_embedding) * np.linalg.norm(embedding))
        return cosine_similarity

    user_text_embedding = embedding_model.embed_query(user_text)
    df = data["jobs"]
    df = df.loc[df["job_title"] == job_title.lower()].copy()
    df["cosine_similarity"] = df.embedding.apply(lambda embedding: calculate_cosine_similarity(user_text_embedding, embedding))
    df = df.sort_values(by="cosine_similarity", ascending=False)
    df = df.iloc[:fetch_k]
    return "\n\n".join(df.job.tolist())

class SearchDocuments(BaseModel):
    user_text: str = Field(description="The user query to search for")
    job_title: str = Field(description="The job title to filter for")

search = Tool(
    function=search,
    name="Get Relevant Information",
    description="Use this tool to search for job positions",
    args_schema=SearchDocuments,
)
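The `search` function above filters with an exact, lowercased title match, so a slightly different title from the LLM (for example a plural) would return no documents. One possible safeguard, sketched here with the standard library's `difflib`, is to resolve the requested title against the known titles first. The helper name `resolve_job_title` and the `cutoff` value are assumptions for illustration.

```python
from difflib import get_close_matches
from typing import Optional


def resolve_job_title(requested: str, available_titles: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Map a requested title to the closest known title, or None if nothing is close."""
    normalized = requested.strip().lower()
    # Compare in lowercase but return the title as it is stored.
    lookup = {title.lower(): title for title in available_titles}
    if normalized in lookup:
        return lookup[normalized]
    matches = get_close_matches(normalized, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


titles = ["311 director", "accountant", "accounting clerk"]
print(resolve_job_title("Accountant", titles))   # accountant
print(resolve_job_title("accountants", titles))  # accountant
print(resolve_job_title("zzz", titles))          # None
```

Returning `None` instead of a guess lets the tool report that no matching job exists, rather than silently searching the wrong posting.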

In [29]:
system_prompt = """Use Get Relevant Information to answer user questions.

Check for available job titles using Get Available Job Titles first."""

llm = OpenAILanguageModel(
    proxy_client=proxy_client,
    model="gpt-4",
    max_tokens=1000,
    temperature=0.2,
)

agent = Agent.create(
    llm=llm,
    system_prompt=system_prompt,
    prompt="{question}",
    prompt_variables=["question"],
    output_type=OutputType.STRING,
    tools=[get_job_titles, search],
    prompting_strategy=PromptingStrategy.CHAIN_OF_THOUGHT,
)

In [30]:
output = agent.invoke({"question": "We have an opening for an accountant. Give me the details of the job."})

[1m[38;2;45;114;210mThought[0m[1m[0m: The user wants to know the details of the job opening for an accountant. I will use the Get Relevant Information tool to find the details of the job.
[1m[38;2;236;154;60mTool[0m[1m[0m: Get Relevant Information
[1m[38;2;236;154;60mTool Input[0m[1m[0m: {'user_text': 'accountant job details', 'job_title': 'Accountant'}
[1m[38;2;236;154;60mTool Output[0m[1m[0m: ACCOUNTANT

Class Code:       1513
Open Date:  06-22-18
(Exam Open to All, including Current City Employees)

ANNUAL SALARY

$49,903 to $72,996 and $55,019 to $80,472

NOTES:

1. Candidates from the eligible list are normally appointed to vacancies in the lower pay grade positions.
2. For information regarding reciprocity between the City of Los Angeles departments and LADWP, go to http://per.lacity.org/Reciprocity_CityDepts_and_DWP.pdf.
3. Annual salary is at the start of the pay range. The current salary range is subject to change. Please confirm the starting salary with th

In [31]:
print(output.final_answer)

The accountant position has an annual salary range of $49,903 to $72,996 and $55,019 to $80,472. The job involves professional accounting work in the analysis, preparation, maintenance, control, and reconciliation of financial records and reports. The minimum qualification is graduation from an accredited four-year college or university and at least 24 semester or 36 quarter units in accounting. The selection process involves a qualifying multiple-choice test and an interview. The first qualifying written test will be administered on SATURDAY, AUGUST 25, 2018, in Los Angeles. Applications will only be accepted online. The application deadlines are from 8:00 am Friday, June 22, 2018, to 11:59 pm, Thursday, July 5, 2018, and from 8:00 am Friday, June 28, 2019, to 11:59 pm, Thursday, July 11, 2019.


As evident from the improved response of the LLM, the ontology/knowledge graph approach ensures that the model exclusively accesses documents about an accountant, thereby enhancing its performance. Scaling the solution to encompass multiple entities and numerous documents allows us to effectively filter out irrelevant content, concentrating solely on significant documents. Consequently, this boosts the likelihood of the LLM receiving high-quality documents to address the task. 

In conclusion, we should separate our business objects before optimizing the information retrieval and generation process to ensure the best results. If RAG still doesn't improve the LLM's answer quality, consider an additional step: fine-tuning a model specifically for the domain and optionally integrating RAG for further enhancement.