<a href="https://colab.research.google.com/github/Nanashi-bot/eduquery/blob/main/Advanced_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Chat with an eBook
### Installation

In [None]:
!pip install langchain sentence-transformers chromadb pypdf unstructured pdf2image

In [None]:
!pip install "unstructured[pdf]"

In [None]:
from langchain.document_loaders import UnstructuredPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain.vectorstores import Chroma

from langchain.llms import HuggingFaceHub
from langchain.chains import RetrievalQA
from langchain.storage import InMemoryStore
from langchain.retrievers import ParentDocumentRetriever

In [None]:
import os
from getpass import getpass

HF_token = getpass()

In [None]:
os.environ['HUGGINGFACEHUB_API_TOKEN'] = HF_token

In [None]:
file_path = "TarunJain_Resume.pdf"

In [None]:
data = UnstructuredPDFLoader(file_path)

In [None]:
content = data.load()

In [None]:
print(content)

[Document(page_content='Tarun R Jain\n\nlinkedin.com/in/jaintarun75/ | tarunjain.netlify.app/ | +919986197355 | jain.tarun7501@gmail.com\n\nWORK EXPERIENCE AI Planet\n\nBelgium\n\nDeveloper Relations and Community Manager\n\nApril 2023- Present\n\nIn this startup, I wear multiple hats by being part of the Data Science team and handling the community. I have worked on Fine Tuning LLMs, building Consultant POC to migrate the enterprise and business into AI, and deploying 6+ state-of-the-art models on AI Planet’s AI Marketplace.\n\nI have organized 20+ live sessions with experts from Google, Weights & Biases, Intel, and more. ● Furthermore, I am the lead curriculum contributor to the LLM Bootcamp, where I reached out to 11 speakers and led a group of 8 AI Ambassadors for the AI Changemaker program. I also feel proud that LLM Bootcamp had 2300+ registrations including Students and working professionals.\n\nI built Panda Coder 13B, a state-of-the-art LLM, a fine-tuned model, specifically de

In [None]:
len(content[0].page_content)

2808

## Chunking: Text Splitter

In [None]:
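# Parent chunks (up to 512 characters) are what the retriever hands to the LLM;
# child chunks (up to 100 characters) are what gets embedded and searched.
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)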
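To make the parent/child idea concrete, here is a simplified, pure-Python sketch of recursive character splitting with no overlap. It is not LangChain's implementation: `split_text` and `parent_child_chunks` are hypothetical helpers that skip details such as whitespace stripping and `chunk_overlap`. Each piece is split on the coarsest separator first (`"\n\n"`, then `"\n"`, then `" "`, then single characters) and only falls back to a finer separator when a piece is still longer than `chunk_size`; every child remembers the index of the parent it came from.

```python
def split_text(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitter with no overlap.

    Splits on the coarsest separator first; any piece still longer than
    chunk_size is split again with the next, finer separator ("" = per character).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    sep, finer = separators[0], separators[1:]
    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Flush the chunk being built, then recurse on the oversized piece
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_text(piece, chunk_size, finer))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


def parent_child_chunks(text, parent_size=512, child_size=100):
    """Split into parents, then split each parent into (parent_index, child) pairs."""
    parents = split_text(text, parent_size)
    children = [
        (i, child)
        for i, parent in enumerate(parents)
        for child in split_text(parent, child_size)
    ]
    return parents, children


sample = ("Tarun works at AI Planet. " * 40 + "\n\n") * 3
parents, children = parent_child_chunks(sample)
print(len(parents), len(children))
```

Because every child is a contiguous slice of exactly one parent, a hit on a small child can always be traced back to the larger parent that carries the surrounding context.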

## Embedding Model

In [None]:
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=HF_token, model_name="thenlper/gte-large"
)
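The embedding model turns each child chunk (and, later, each query) into a dense vector, and the vector store returns the chunks whose vectors lie closest to the query vector. A minimal sketch of that ranking step using cosine similarity, assuming tiny hand-made vectors in place of real gte-large embeddings; Chroma's actual distance metric is configurable, so cosine here is an illustrative choice rather than a description of this notebook's setup.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Indices of the k document vectors most similar to the query."""
    scored = sorted(
        ((cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)),
        reverse=True,
    )
    return [i for _, i in scored[:k]]


# Toy 3-d "embeddings"; real gte-large vectors are much longer
docs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
print(top_k([1.0, 0.0, 1.0], docs))  # → [0, 2]
```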

In [None]:
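# Chroma holds the embeddings of the small child chunks
vectorstore = Chroma(embedding_function=embeddings)
# InMemoryStore holds the full parent chunks, keyed by document id
store = InMemoryStore()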

In [None]:
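# Zephyr-7B-alpha served through the Hugging Face Hub inference API;
# max_new_tokens caps the length of each generated answer
model = HuggingFaceHub(repo_id="HuggingFaceH4/zephyr-7b-alpha",
                       model_kwargs={"temperature": 0.5, "max_new_tokens": 512})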
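`temperature` rescales the model's next-token scores (logits) before sampling: values below 1, such as the 0.5 used here, make the distribution peakier, so answers become more deterministic. A small sketch of temperature-scaled softmax, using made-up logits for three candidate tokens:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.5))
```

With the lower temperature, the top token's probability rises and the others shrink.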

In [None]:
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
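At query time, the retriever searches the small child chunks but hands back their larger parents, without duplicates. The sketch below roughly mirrors that flow under stated assumptions: `retrieve_parents` and `word_overlap` are hypothetical helpers, word overlap stands in for embedding similarity, and `k` is the assumed number of child hits considered.

```python
def word_overlap(query, text):
    """Toy relevance score: fraction of query words that appear in text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / (len(q) or 1)


def retrieve_parents(query, children, parents, score=word_overlap, k=4):
    """Rank child chunks, then return their parent chunks without duplicates.

    children: list of (parent_index, child_text) pairs.
    """
    # sorted() is stable, so equal scores keep their original order
    ranked = sorted(children, key=lambda c: score(query, c[1]), reverse=True)
    seen, results = set(), []
    for parent_idx, _ in ranked[:k]:
        if parent_idx not in seen:  # several children may share one parent
            seen.add(parent_idx)
            results.append(parents[parent_idx])
    return results


parents = [
    "Tarun works at AI Planet as Developer Relations and Community Manager.",
    "He is a Google Developer Expert in Machine Learning.",
]
children = [
    (0, "Tarun works at AI Planet"),
    (0, "as Developer Relations and Community Manager."),
    (1, "He is a Google Developer Expert"),
    (1, "in Machine Learning."),
]
print(retrieve_parents("What does Tarun do at AI Planet", children, parents, k=2))
```

Deduplicating by parent is one plausible reason a query can come back with a single `Document` even though several child chunks matched.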

In [None]:
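# Split into parent/child chunks, embed the children into Chroma and keep the
# parents in `store`; ids=None lets the retriever generate the parent ids
retriever.add_documents(content, ids=None)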

In [None]:
query = "What is Tarun's contribution at AI Planet?"

In [None]:
relevant_context = retriever.get_relevant_documents(query)

In [None]:
relevant_context

[Document(page_content='Tarun R Jain\n\nlinkedin.com/in/jaintarun75/ | tarunjain.netlify.app/ | +919986197355 | jain.tarun7501@gmail.com\n\nWORK EXPERIENCE AI Planet\n\nBelgium\n\nDeveloper Relations and Community Manager\n\nApril 2023- Present\n\nIn this startup, I wear multiple hats by being part of the Data Science team and handling the community. I have worked on Fine Tuning LLMs, building Consultant POC to migrate the enterprise and business into AI, and deploying 6+ state-of-the-art models on AI Planet’s AI Marketplace.', metadata={'source': 'TarunJain_Resume.pdf'})]

In [None]:
query = "Who is Rahul?"

In [None]:
relevant_context = retriever.get_relevant_documents(query)

In [None]:
relevant_context

[Document(page_content='Google Developer Expert in Machine Learning ● Entrepreneurship Training at CHOSS- Cambridge House of Student Startup. ● An active participant in the HuggingFace Keras working group. ● Deep Learning AI Event Ambassador in Bangalore Region.\n\n[Certifications] [Oct 2023-] [May 2022- Mar 2023] [Sept 2022- Jan 2023] [Oct 2022- present]', metadata={'source': 'TarunJain_Resume.pdf'})]
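The "Who is Rahul?" query still returns a chunk because nearest-neighbour search always returns its top matches, however weak they are. One way to handle out-of-scope questions is to drop hits below a similarity threshold and answer "I don't know" without calling the LLM at all. A minimal sketch, in which `answer_or_abstain`, `fake_llm`, the `(text, score)` input format, the scores and the 0.5 threshold are all illustrative assumptions:

```python
def answer_or_abstain(query, scored_hits, generate, min_score=0.5):
    """Call generate() only if some hit clears the similarity threshold.

    scored_hits: list of (text, similarity) pairs, higher = more similar.
    min_score: illustrative cut-off; tune it for your embedding model.
    """
    kept = [text for text, score in scored_hits if score >= min_score]
    if not kept:
        return "I don't know"
    return generate(query, kept)


def fake_llm(query, docs):
    # Hypothetical stand-in for the real model call
    return f"Answer to {query} from {len(docs)} chunk(s)"


# Made-up scores, only to show both branches
print(answer_or_abstain("Who is Rahul?", [("Google Developer Expert ...", 0.21)], fake_llm))
print(answer_or_abstain("What is Tarun's role?", [("Developer Relations ...", 0.83)], fake_llm))
```

For a plain vector-store retriever, LangChain offers a comparable built-in filter through `as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": ...})`; whether `ParentDocumentRetriever` supports that option depends on the installed version.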

In [None]:
# !pip install cohere

## Step 1: Retrieval

In [None]:
# os.environ["COHERE_API_KEY"] = getpass()  # prompt for the key instead of hard-coding it

In [None]:
# from langchain.retrievers import ContextualCompressionRetriever
# from langchain.retrievers.document_compressors import CohereRerank

In [None]:
# from cohere import Client

In [None]:
# co = Client(api_key=os.environ["COHERE_API_KEY"])

In [None]:
# from pydantic import BaseModel

# Workaround: let pydantic accept the arbitrary cohere.Client type passed as `client`
# class CustomCohereRerank(CohereRerank):
#     class Config(BaseModel.Config):
#         arbitrary_types_allowed = True

# CustomCohereRerank.update_forward_refs()

In [None]:
# compressor = CustomCohereRerank(client=co)

In [None]:
# compression_retriever = ContextualCompressionRetriever(
#     base_compressor=compressor, base_retriever=retriever
# )
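The commented-out cells above wrap the base retriever with Cohere's reranker: the base retriever fetches candidates, a reranking model re-scores each (query, document) pair, and only the best `top_n` are kept (`CohereRerank` exposes a `top_n` parameter for this). A pure-Python sketch of that two-stage idea, in which the hypothetical `overlap_score` function stands in for the Cohere model:

```python
def overlap_score(query, doc):
    """Toy stand-in for a reranking model: count shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def rerank(query, candidates, score=overlap_score, top_n=3):
    """Re-score the base retriever's candidates and keep the best top_n."""
    # Sort by score (descending); ties keep the base retriever's order
    ranked = sorted(candidates, key=lambda d: score(query, d), reverse=True)
    return ranked[:top_n]


candidates = [
    "Google Developer Expert in Machine Learning",
    "Certifications",
    "Developer Relations and Community Manager at AI Planet",
]
print(rerank("role at AI Planet", candidates, top_n=1))
```

The design choice is to let a cheap vector search cast a wide net and reserve the more expensive pairwise scorer for a handful of candidates.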

## Step 2: Augment

In [None]:
from langchain_core.prompts import ChatPromptTemplate

In [None]:
template = """
<|system|>
You are an AI Assistant that follows instructions extremely well.
Please be truthful and give direct answers. If the answer to the user's query is not in the CONTEXT, say 'I don't know'.

CONTEXT: {context}
</s>
<|user|>
{query}
</s>
<|assistant|>
"""

In [None]:
prompt = ChatPromptTemplate.from_template(template)
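`ChatPromptTemplate.from_template` uses f-string style `{placeholders}`, so filling `{context}` and `{query}` behaves like Python's `str.format`. The sketch below renders a template with the same Zephyr-style structure, using a hand-written context string (an illustrative assumption), to show roughly what the model receives. When a chat prompt is passed to a plain LLM such as `HuggingFaceHub`, LangChain converts it to a string and may prepend a role prefix such as `Human: `; `PromptTemplate.from_template` avoids that.

```python
# Zephyr-style template mirroring the structure of the one above
TEMPLATE = """
<|system|>
You are an AI Assistant that follows instructions extremely well.
Please be truthful and give direct answers. If the answer to the user's query is not in the CONTEXT, say 'I don't know'.

CONTEXT: {context}
</s>
<|user|>
{query}
</s>
<|assistant|>
"""

# Illustrative context string; in the notebook this comes from the retriever
context = "WORK EXPERIENCE AI Planet\n\nDeveloper Relations and Community Manager"
filled = TEMPLATE.format(context=context, query="What is Tarun's role at AI Planet?")
print(filled)
```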

## Step 3: Generation

In [None]:
from langchain_core.output_parsers import StrOutputParser

In [None]:
from langchain_core.runnables import RunnablePassthrough

In [None]:
output_parser = StrOutputParser()

In [None]:
chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt
    | model
    | output_parser
)
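The `|` operator composes runnables left to right, and the leading dict runs both branches on the same input: the retriever receives the query to produce `context`, while `RunnablePassthrough()` forwards the query unchanged. Note that `{context}` therefore receives the retriever's raw list of `Document` objects, which ends up in the prompt as that list's Python repr, metadata included. The pure-Python sketch below mimics the chain with hypothetical stub functions and adds a `format_docs` step that joins only the chunk text:

```python
def format_docs(docs):
    """Join retrieved chunk texts with blank lines instead of using a list repr."""
    return "\n\n".join(docs)


def make_chain(retrieve, render_prompt, llm, parse):
    """Plain-function analogue of {context, query} | prompt | model | parser."""
    def invoke(query):
        # The leading dict: both keys are computed from the same input query
        inputs = {"context": format_docs(retrieve(query)), "query": query}
        # Then each stage's output becomes the next stage's input
        return parse(llm(render_prompt(inputs)))
    return invoke


# Hypothetical stubs standing in for the retriever, prompt, model and parser
toy_chain = make_chain(
    retrieve=lambda q: ["Developer Relations and Community Manager", "AI Planet"],
    render_prompt=lambda d: f"CONTEXT: {d['context']}\nQUESTION: {d['query']}",
    llm=lambda text: "  " + text.splitlines()[0] + "  ",  # echoes the first line
    parse=str.strip,
)
print(toy_chain("What is Tarun's role?"))
```

In LCEL the same idea is commonly written as `{"context": retriever | format_docs, "query": RunnablePassthrough()}`, with `format_docs` joining `doc.page_content` for each retrieved document. The outputs printed below come from the original chain, not from this variant.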

In [None]:
query = "Who is Rahul?"

In [None]:
response = chain.invoke(query)

In [None]:
print(response)

I do not have information about a specific person named rahul. please provide more context or information about rahul to help me identify who you are referring to.


In [None]:
print(chain.invoke("what is Tarun's role at AI Planet?"))

Tarun's role at AI Planet is "Developer Relations and Community Manager." (from the provided context)
