In [5]:
!ollama list

NAME                               	ID          	SIZE  	MODIFIED       
knoopx/hermes-2-pro-mistral:7b-q8_0	db41f4a9e570	7.7 GB	52 minutes ago	
nomic-embed-text:latest            	0a109f422b47	274 MB	23 hours ago  	
llama2:latest                      	78e26419b446	3.8 GB	3 days ago    	


**INGESTING DOC**

In [7]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import UnstructuredWordDocumentLoader
from langchain_community.document_loaders import OnlinePDFLoader

In [8]:
local_path = "anyname.docx"  # replace with the path to the file to be loaded

# Local Word file upload
if local_path:
  loader = UnstructuredWordDocumentLoader(file_path=local_path)
  data = loader.load()
else:
  print("Upload a Word file")
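The cell above imports three loaders but only uses the Word loader, and `if local_path:` only checks that the string is non-empty, not that the file exists. A small, hypothetical helper such as `pick_loader_name` (not part of LangChain) could choose the loader from the path and fail early on a missing local file. It returns only the class name, so this sketch runs without LangChain installed; the extension mapping is an assumption.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the loader classes imported above.
LOADERS_BY_SUFFIX = {
    ".pdf": "UnstructuredPDFLoader",
    ".docx": "UnstructuredWordDocumentLoader",
}


def pick_loader_name(path: str) -> str:
    """Return the name of the loader class to use for `path` (hypothetical helper)."""
    if path.startswith(("http://", "https://")):
        # Remote PDFs would go through OnlinePDFLoader; no local existence check.
        return "OnlinePDFLoader"
    suffix = Path(path).suffix.lower()
    if suffix not in LOADERS_BY_SUFFIX:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    if not Path(path).is_file():
        # Fail early instead of letting the loader error out later.
        raise FileNotFoundError(path)
    return LOADERS_BY_SUFFIX[suffix]
```

The returned name could then be used to pick between `UnstructuredPDFLoader(file_path=...)`, `UnstructuredWordDocumentLoader(file_path=...)` and `OnlinePDFLoader(...)`.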

In [9]:
# Preview the first loaded Document (by default the loader returns the whole file as one Document)
data[0].page_content

'Solution Delivery Analyst - India\n\n\tSolution Delivery Analyst, Risk Advisory\n\nManipal Institute of Technology, B.Tech in IT St. Xavier’s Collegiate School, ISC\n\n\t+91 8017100109\n\n\tribhattacharya@deloitte.com \t\tEnglish, Hindi, Bengali\n\nProfile\n\nSolution Delivery Analyst\n\nProfessional/client experience (anonymised)\n\nWorked on automating security infrastructure across various AWS services using Lamba functions.\n\nDeveloped custom RQLs and YAML codes for run and build time policies on Prisma Cloud environment.\n\nWorked on various integrations with Prisma Cloud Security platform like Jenkins and GitHub workflows.\n\nDelivered presentations on specific work domains as part of the project.\n\nSSDL setup in GCP environment.\n\nEngaged in Network Project for delivery of Terraform automation scripts for multi cloud resource deployment.\n\n\n\nSolution Delivery Analyst - India\n\nAdditional education and certifications\n\nCommunication proficiency\n\nPrinciples of ESG and S

**VECTOR EMBEDDINGS**

In [10]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [11]:
# Split the document into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=30)
chunks = text_splitter.split_documents(data)
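With `chunk_size=200` and `chunk_overlap=30`, each chunk holds at most 200 characters and neighbouring chunks share up to 30 characters. `RecursiveCharacterTextSplitter` also tries to break on separators (paragraphs, lines, spaces) first, so its chunks are usually shorter than the limit. The sketch below deliberately ignores separators and slides a fixed window, a simpler technique than LangChain's, only to make the size/overlap arithmetic concrete; `window_chunks` is a hypothetical helper.

```python
from typing import List


def window_chunks(text: str, chunk_size: int = 200, chunk_overlap: int = 30) -> List[str]:
    """Fixed-size sliding-window chunking (not LangChain's recursive algorithm)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 170 with the notebook's settings
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks
```

For a 500-character string this yields windows starting at offsets 0, 170 and 340, with lengths 200, 200 and 160.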

In [12]:
# Embed the chunks and add them to the vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag-2"
)
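`Chroma.from_documents` embeds every chunk with `nomic-embed-text` and stores the vectors; at query time the question is embedded too and the nearest chunks are returned. The stdlib sketch below mimics that flow with a hypothetical bag-of-words `toy_embed` standing in for the real embedding model and cosine similarity as the score. Chroma's own distance metric and index are configurable and are not reproduced here.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared terms divided by the product of the vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """In-memory store: keeps (text, vector) pairs and ranks them by cosine similarity."""

    def __init__(self, texts):
        self.items = [(t, toy_embed(t)) for t in texts]

    def similarity_search(self, query: str, k: int = 4):
        q = toy_embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```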

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:01<00:00,  4.54it/s]


**RETRIEVAL**

In [14]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [15]:
# LLM from Ollama
local_model = "knoopx/hermes-2-pro-mistral:7b-q8_0"
llm = ChatOllama(model=local_model)

In [16]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to rephrase the user question in three different ways in order to retrieve the most relevant passages from the loaded document. Provide these alternative questions separated by newlines, with no numbering or extra commentary.
    Original question: {question}""",
)

In [17]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)
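`MultiQueryRetriever` sends `QUERY_PROMPT` to the LLM, treats each line of the reply as a separate search query, runs every query against the base retriever and returns the unique union of the hits. This is consistent with the second `chain.invoke` below printing three `OllamaEmbeddings` progress bars, one per embedded query. The stdlib sketch below reproduces that parse-and-merge step with a fake retriever; `parse_queries` and `multi_query_retrieve` are hypothetical names, not LangChain internals.

```python
from typing import Callable, List


def parse_queries(llm_output: str) -> List[str]:
    # One query per non-empty line, matching the "separated by newlines" convention.
    return [line.strip() for line in llm_output.splitlines() if line.strip()]


def multi_query_retrieve(llm_output: str, retrieve: Callable[[str], List[str]]) -> List[str]:
    """Run every generated query and return the unique union of hits, in first-seen order."""
    seen = set()
    results = []
    for query in parse_queries(llm_output):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                results.append(doc)
    return results
```

Because every line becomes a query, any extra commentary the LLM adds to its reply is also sent to the vector store, which is why the query prompt should ask for bare questions only.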

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [18]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
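In this chain the retriever returns a list of `Document` objects, and the prompt template renders that list as its Python repr, so the model sees `Document(page_content=..., metadata=...)` text rather than clean passages. A common alternative is to join the chunk texts first. The sketch below uses a hypothetical `Doc` dataclass in place of LangChain's `Document` and plain `str.format` in place of `ChatPromptTemplate`, to show what the filled-in RAG prompt would look like.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Same RAG template as in the notebook.
TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_docs(docs: List[Doc]) -> str:
    # Join only the chunk texts, separated by blank lines; metadata is dropped.
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs: List[Doc], question: str) -> str:
    # Fill the template the way the chain would after formatting the context.
    return TEMPLATE.format(context=format_docs(docs), question=question)
```

In the notebook this would be wired as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`; LCEL wraps a plain function in a runnable when it is piped after another runnable.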

In [19]:
chain.invoke("What is this document about?")

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.40s/it]


'Based on the given context, this document appears to be a CV or resume belonging to an individual named Rishav, as it contains details of his education, certifications, work experience, and professional activities. The person seems to have expertise in areas like ESG and Sustainability for Business, SSDL setup in GCP environment, Terraform automation scripts for multi-cloud resource deployment, Prisma Cloud Security platform integrations, and AWS Solutions Architect. They have also participated in various courses, conferences, and other professional activities.'

In [21]:
chain.invoke("Where has Rishav pursued his undergrad from and what are his relevant areas?")

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.62s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 20.21it/s]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.43it/s]


'Rishav has pursued his undergraduate degree in IT from Manipal Institute of Technology. The relevant areas from the provided context include Solution Delivery Analyst, Risk Advisory, Communication proficiency, Principles of ESG and Sustainability for Business (Issued by Arizona State University), and participation in Deloitte Impact Day as a volunteer as well as Impact Everyday as an in-person and virtual session participant.'