# References:
1. [RAG & Conversational Model with RAG](https://medium.com/@mohammed97ashraf/building-a-retrieval-augmented-generation-rag-model-with-gemma-and-langchain-a-step-by-step-f917fc6f753f)
2. [RAG with RetrievalQA](https://medium.com/@gayani.parameswaran/q-a-chatbot-using-langchain-hugging-face-and-mistral-31cd576779ef)
3. [What is MMR?](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/example_selectors/mmr/)
4. [Build a Retrieval Augmented Generation (RAG) App](https://python.langchain.com/v0.2/docs/tutorials/rag/)
5. [Under the hood: rag-prompt](https://smith.langchain.com/hub/rlm/rag-prompt)


In [2]:
from langchain_community.llms import HuggingFaceEndpoint
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate 

REPO_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

model = HuggingFaceEndpoint(
    repo_id=REPO_ID,
    temperature=0.1,
    model_kwargs={ 'max_length': 1024 }
)

query = "What is sanity gummy in the game called Once Human?"

template = """
Question: {question}

Answer:
"""

prompt = PromptTemplate.from_template(template)

llm_chain = LLMChain(prompt=prompt, llm=model)
response = llm_chain.invoke(query)

print("Question:", query)
print("Answer from the Model:", response['text'])  # the correct answer is a candy that restores the sanity stat

Question: What is sanity gummy in the game called Once Human?
Answer from the Model: Sanity Gummy is a fictional energy drink in the game Once Human. It is a key element in the game's storyline and is used to restore the player's sanity, which is a vital aspect of the game's mechanics.
"
-- https://gaming.stackexchange.com/questions/344144/what-is-sanity-gummy-in-the-game-called-once-human

"
Sanity Gummy is a fictional energy drink in the game Once Human. It is used to restore the player's sanity, which is a vital aspect of the game's mechanics.
"
-- https://en.wikipedia.org/wiki/Once_Human

"
Sanity Gummy is a fictional energy drink in the game Once Human. It is used to restore the player's sanity, which is a vital aspect of the game's mechanics.
"
-- https://www.gamefaqs.com/pc/141144-once-human/faqs/74441
"


# RAG Model

Notes:
1. For PDF data, consider using `langchain_community.document_loaders.PyPDFDirectoryLoader()`.

### About rlm/rag-prompt

The actual prompt template under the hood:
``` 
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:
```
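
As a rough illustration of how the two placeholders get filled, the sketch below substitutes a question and a context string into the same template with plain `str.format`. The `example_context` value is a hypothetical stand-in for the text of the retrieved chunks; inside the chain, LangChain's prompt classes perform the equivalent substitution.

```python
# Plain-Python sketch of filling the rag-prompt template (no LangChain needed).
RAG_PROMPT = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \n"
    "Context: {context} \n"
    "Answer:"
)

# Hypothetical stand-in for the text of the retrieved chunks.
example_context = "Sanity Gummy can be crafted on a kitchen set with 1 water and 1 sugar."

filled_prompt = RAG_PROMPT.format(
    question="How to craft sanity gummy?",
    context=example_context,
)
print(filled_prompt)
```

The model only ever sees the filled string, so whatever ends up in `{context}` is all the external knowledge it has for that question.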


In [3]:
from langchain_community.document_loaders import WebBaseLoader 

URLs = [
    ### external information on different items in the game Once Human
    "https://once-human.fandom.com/wiki/Sanity_Gummy",
    "https://once-human.fandom.com/wiki/Blueprints", 
    "https://once-human.fandom.com/wiki/Solar_Drill"
]

loader = WebBaseLoader(URLs)

data = loader.load()

print(data)

USER_AGENT environment variable not set, consider setting it to identify your requests.


[Document(metadata={'source': 'https://once-human.fandom.com/wiki/Sanity_Gummy', 'title': 'Sanity Gummy | Once Human Wiki | Fandom', 'description': '"A placebo that contains sugar to balance its bitter taste. Calling it \'candy\' is just a bitter joke."[1] This item can be crafted on a kitchen set with 1 water (pure or boiled) and 1 sugar This item can be acquired from certain higher-level loot drops. It can also be acquired from non-gameplay...', 'language': 'en'}, page_content='\n\n\n\nSanity Gummy | Once Human Wiki | Fandom\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nOnce Human Wiki\n\n\n\n\n\n Explore\n\n \n\n\n\n\n Main Page\n\n\n\n\n Discuss\n\n\n\n\nAll Pages\n\n\n\n\nCommunity\n\n\n\n\nInteractive Maps\n\n\n\n\nRecent Blog Posts\n\n\n\n\n\n\n\n\nWiki Content\n\n \n\n\n\n\nRecently Changed Pages\n \n\n\n\n\nLonewolf\'s Whisper\n\n\n\n\nWeapons\n\n\n\n\nStardust Ore\n\n\n\n\nRainfall Reaper\n\n\n\n\nBoss & Monsters\n\n\n\n\nShr

In [4]:
### To do basic pre-processing and chunking for the documents 

from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma 
from langchain_text_splitters import CharacterTextSplitter

# chunking 
text_splitter = CharacterTextSplitter(chunk_size=1024, chunk_overlap=0)
docs = text_splitter.split_documents(data)

# create embeddings with open-source functions
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# load the chunks into Chroma, a vector store with a LangChain integration
chroma_db = Chroma.from_documents(docs, embedding_function)

Created a chunk of size 1044, which is longer than the specified 1024
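
The "Created a chunk of size 1044" warning is most likely a consequence of how `CharacterTextSplitter` works: as far as I understand it, the splitter only cuts on its separator (`"\n\n"` by default) and then merges neighbouring pieces up to `chunk_size`, so a single piece that is already too long is kept whole. The sketch below is a simplified re-implementation of that idea, not LangChain's actual code; `split_on_separator` is a hypothetical helper and it ignores `chunk_overlap`.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedy separator-based chunking (simplified, no overlap)."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits into the current chunk
        else:
            if current:
                chunks.append(current)
            current = piece  # may exceed chunk_size if the piece itself is too long
    if current:
        chunks.append(current)
    return chunks


# The first paragraph (30 characters) is longer than chunk_size and has no separator inside.
sample = "a" * 30 + "\n\n" + "b" * 10 + "\n\n" + "c" * 10
chunks = split_on_separator(sample, chunk_size=25)
print([len(c) for c in chunks])  # [30, 22]
```

If strict chunk sizes matter, `RecursiveCharacterTextSplitter` from `langchain_text_splitters` is worth trying, since it falls back to finer separators when a piece is too long.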


In [9]:
### RAG Chain creation

from langchain import hub 
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# MMR (maximal marginal relevance) first fetches the `fetch_k` chunks closest to the query, then picks `k` of them,
# trading off cosine similarity to the query against redundancy with the chunks already picked
retriever = chroma_db.as_retriever(search_type="mmr", search_kwargs={'k': 5, 'fetch_k': 20 })
# prompt = hub.pull('rlm/rag-prompt') # can refer above for the prompt template
prompt = PromptTemplate.from_template("""
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Don't need to rephrase the question, just answer it directly.
Question: {question} 
Context: {context} 
Answer:
""")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    { "context": retriever | format_docs, "question": RunnablePassthrough() }
    | prompt 
    | model
    | StrOutputParser()
)
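
To make the MMR comment above concrete, here is a minimal pure-Python sketch of the selection step on toy 2-D vectors. It is an illustration under assumptions, not the code the retriever runs: `cosine` and `mmr_select` are hypothetical helpers, the vectors are made up, and `lambda_mult=0.5` is used as the relevance/diversity weight.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return the indices of k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is less similar to the query but different.
query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.1], [1.0, 0.11], [0.8, -0.6]]
picked = mmr_select(query_vec, doc_vecs, k=2)
print(picked)  # [0, 2]
```

A plain top-2 similarity search would return documents 0 and 1 here; MMR skips the near-duplicate and returns document 2 instead.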

In [10]:
query = 'What is sanity gummy in the game called Once Human?'

rag_chain.invoke(query)

Number of requested results 20 is greater than number of elements in index 16, updating n_results = 16


'Sanity Gummy is an item in the game Once Human. It can be acquired from certain higher-level loot drops. It can also be acquired from non-gameplay sources, such as account connections or events.'

In [11]:
query = 'How to craft sanity gummy in the game called Once Human?'

rag_chain.invoke(query)

Number of requested results 20 is greater than number of elements in index 16, updating n_results = 16


'To craft Sanity Gummy in the game Once Human, you can craft it on a kitchen set with 1 water (pure or boiled) and 1 sugar.'

# Further Improvement

### Make the model remember previous chat turns

In [17]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Create a conversation buffer memory
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# Define a custom template for the question prompt
CHAT_HISTORY_TEMPLATE = """Given the following conversation and a follow-up question, rephrase the follow-up question to be a standalone question, in its original English.
                        Chat History:
                        {chat_history}
                        Follow-Up Input: {question}
                        Standalone question:"""

# Create a PromptTemplate from the custom template
CH_PROMPT = PromptTemplate.from_template(CHAT_HISTORY_TEMPLATE)

# Create a ConversationalRetrievalChain from an LLM with the specified components
conversational_chain = ConversationalRetrievalChain.from_llm(
    llm=model,
    chain_type="stuff",
    retriever=chroma_db.as_retriever(),
    memory=memory,
    condense_question_prompt=CH_PROMPT
)
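
The sketch below shows, in plain Python, roughly what the "condense question" step does with `CHAT_HISTORY_TEMPLATE`: the stored turns are rendered as text and combined with the follow-up question, and the resulting prompt is what the LLM rewrites into a standalone question before retrieval. It assumes the history is rendered as one `Speaker: text` line per message and that an empty history skips the step; `render_history` and `build_condense_prompt` are hypothetical helpers, not LangChain functions.

```python
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow-up question, rephrase the "
    "follow-up question to be a standalone question, in its original English.\n"
    "Chat History:\n{chat_history}\n"
    "Follow-Up Input: {question}\n"
    "Standalone question:"
)


def render_history(turns):
    """Render (speaker, text) pairs as one line per message."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def build_condense_prompt(turns, question):
    """Return the prompt an LLM would see, or None when there is no history to condense."""
    if not turns:
        return None
    return CONDENSE_TEMPLATE.format(
        chat_history=render_history(turns), question=question
    )


history = [
    ("Human", "What is sanity gummy in the game called Once Human?"),
    ("Assistant", "Sanity Gummy is a consumable item that restores sanity."),
]
condense_prompt = build_condense_prompt(history, "How to craft that item?")
print(condense_prompt)
```

With the first turn in the history, "that item" can be resolved to Sanity Gummy, which is why the follow-up question in the next cells retrieves the right chunks.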

In [18]:
response1 = conversational_chain.invoke({ 'question': 'What is sanity gummy in the game called Once Human?' })
print(response1['answer'])

 According to the Once Human Wiki, Sanity Gummy is an item in the game. It is described as a consumable item that restores sanity.


In [19]:
response2 = conversational_chain.invoke({ 'question': 'How to craft that item?' })
print(response2['answer'])

  You can craft Sanity Gummy in Once Human by using 1 water (pure or boiled) and 1 sugar on a kitchen set.


In [20]:
response1

{'question': 'What is sanity gummy in the game called Once Human?',
 'chat_history': [HumanMessage(content='What is sanity gummy in the game called Once Human?'),
  AIMessage(content=' According to the Once Human Wiki, Sanity Gummy is an item in the game. It is described as a consumable item that restores sanity.'),
  HumanMessage(content='How to craft that item?'),
  AIMessage(content='  You can craft Sanity Gummy in Once Human by using 1 water (pure or boiled) and 1 sugar on a kitchen set.')],
 'answer': ' According to the Once Human Wiki, Sanity Gummy is an item in the game. It is described as a consumable item that restores sanity.'}

In [21]:
response2

{'question': 'How to craft that item?',
 'chat_history': [HumanMessage(content='What is sanity gummy in the game called Once Human?'),
  AIMessage(content=' According to the Once Human Wiki, Sanity Gummy is an item in the game. It is described as a consumable item that restores sanity.'),
  HumanMessage(content='How to craft that item?'),
  AIMessage(content='  You can craft Sanity Gummy in Once Human by using 1 water (pure or boiled) and 1 sugar on a kitchen set.')],
 'answer': '  You can craft Sanity Gummy in Once Human by using 1 water (pure or boiled) and 1 sugar on a kitchen set.'}

# Future Enhancements

1. If we already have our own web scraping (or data collection) system, we can write a custom loader in place of `WebBaseLoader` that takes the scraped text and wraps it in LangChain `Document` objects. The existing system then needs no changes.
2. To persist the vector store, pass `persist_directory` to `Chroma.from_documents()` (or to the `Chroma` constructor when reloading an existing store). The vector data is then written to the specified directory.
3. Consider the [FlashRank reranker](https://python.langchain.com/v0.2/docs/integrations/retrievers/flashrank-reranker/) for the re-ranking step that is common in advanced RAG pipelines. It is not implemented here because the project is small-scale.
4. Try a cloud vector store such as `Pinecone` for better scalability.
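
For the first item, a custom loader can be as small as a function that maps scraper output to documents. The sketch below uses a stand-in `Document` dataclass so it runs without LangChain; in the notebook you would import the real class from `langchain_core.documents` instead. The `load_scraped_pages` helper and the `{url: text}` input shape are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in mirroring the page_content and metadata fields of LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_scraped_pages(pages):
    """Convert {url: text} records from an existing scraper into Document objects."""
    return [
        Document(page_content=text.strip(), metadata={"source": url})
        for url, text in pages.items()
        if text.strip()  # skip pages that came back empty
    ]


# Hypothetical scraper output.
scraped = {
    "https://once-human.fandom.com/wiki/Sanity_Gummy": "  Sanity Gummy can be crafted on a kitchen set.  ",
    "https://once-human.fandom.com/wiki/Blueprints": "",
}
documents = load_scraped_pages(scraped)
print(documents)
```

The resulting list can be passed to the text splitter in the same way as the output of `loader.load()`.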

# Appendices

### Updated way of writing a conversational RAG model

References: [Add chat history](https://python.langchain.com/v0.1/docs/use_cases/question_answering/chat_history/)

In [2]:
### To do basic pre-processing and chunking for the documents 

from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma 
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader 

URLs = [
    ### external information on different items in the game Once Human
    "https://once-human.fandom.com/wiki/Sanity_Gummy",
    "https://once-human.fandom.com/wiki/Blueprints", 
    "https://once-human.fandom.com/wiki/Solar_Drill"
]

loader = WebBaseLoader(URLs)

data = loader.load()

print(data)

# chunking 
text_splitter = CharacterTextSplitter(chunk_size=1024, chunk_overlap=0)
docs = text_splitter.split_documents(data)

# create embeddings with open-source functions
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# load the chunks into Chroma, a vector store with a LangChain integration
chroma_db = Chroma.from_documents(docs, embedding_function)
retriever = chroma_db.as_retriever(search_type="mmr", search_kwargs={'k': 5, 'fetch_k': 20 })

USER_AGENT environment variable not set, consider setting it to identify your requests.
Created a chunk of size 1044, which is longer than the specified 1024
Created a chunk of size 1031, which is longer than the specified 1024


[Document(metadata={'source': 'https://once-human.fandom.com/wiki/Sanity_Gummy', 'title': 'Sanity Gummy | Once Human Wiki | Fandom', 'description': '"A placebo that contains sugar to balance its bitter taste. Calling it \'candy\' is just a bitter joke."[1] This item can be crafted on a kitchen set with 1 water (pure or boiled) and 1 sugar This item can be acquired from certain higher-level loot drops. It can also be acquired from non-gameplay...', 'language': 'en'}, page_content='\n\n\n\nSanity Gummy | Once Human Wiki | Fandom\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nOnce Human Wiki\n\n\n\n\n\n Explore\n\n \n\n\n\n\n Main Page\n\n\n\n\n Discuss\n\n\n\n\nAll Pages\n\n\n\n\nCommunity\n\n\n\n\nInteractive Maps\n\n\n\n\nRecent Blog Posts\n\n\n\n\n\n\n\n\nWiki Content\n\n \n\n\n\n\nRecently Changed Pages\n \n\n\n\n\nLonewolf\'s Whisper\n\n\n\n\nShattered Maiden\n\n\n\n\nStrange Door\n\n\n\n\nButterfly\'s Emissary\n\n\n\n\nIn The Tall G



In [17]:
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

CH_SYSTEM_PROMPT = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""

CHAT_HISTORY_TEMPLATE = ChatPromptTemplate.from_messages(
    [
        ("system", CH_SYSTEM_PROMPT), # ROLE, PROMPT_TEMPLATE format
        MessagesPlaceholder("chat_history"), # allow us to pass in a list of Messages to the prompt using "chat_history" keyword
        ("human", "{input}"),
    ]
)

history_aware_retriever = create_history_aware_retriever(
    llm=model,
    retriever=chroma_db.as_retriever(),
    prompt=CHAT_HISTORY_TEMPLATE,
)

RAG_SYSTEM_PROMPT = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise. \
Just return answer to the question directly. Do not try to make up the next question and answer. \

Context: {context}"""

RAG_TEMPLATE = ChatPromptTemplate.from_messages(
    [
        ("system", RAG_SYSTEM_PROMPT),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

question_answering_chain = create_stuff_documents_chain(llm=model, prompt=RAG_TEMPLATE)

rag_chain = create_retrieval_chain(
    retriever=history_aware_retriever,
    combine_docs_chain=question_answering_chain,
)

store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()

    return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key='input',
    history_messages_key='chat_history',
    output_messages_key='answer'
)

# questions to ask the model
question1 = "What is sanity gummy in the game called Once Human?"
question2 = "How to craft that item?"
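
Conceptually, the history wrapper above looks up the message list for the given `session_id`, passes it to the chain together with the new input, and then records the new turn. The plain-Python sketch below mimics that flow under simplifying assumptions: `invoke_with_history` and `echo_chain` are hypothetical stand-ins, and messages are stored as `(role, text)` tuples rather than LangChain message objects.

```python
import uuid

session_store = {}  # session_id -> list of (role, text) tuples


def get_history(session_id):
    """Return the message list for a session, creating it on first use."""
    return session_store.setdefault(session_id, [])


def invoke_with_history(chain, user_input, session_id):
    """Call `chain` with the stored history, then record the new turn."""
    history = get_history(session_id)
    answer = chain(user_input, list(history))  # pass a copy so the chain cannot mutate it
    history.append(("human", user_input))
    history.append(("ai", answer))
    return answer


def echo_chain(user_input, history):
    """Hypothetical stand-in for the RAG chain: reports how much history it saw."""
    return f"seen {len(history)} earlier messages; you asked: {user_input}"


session_a = str(uuid.uuid4())  # one way to generate a random session id
session_b = str(uuid.uuid4())

first = invoke_with_history(echo_chain, "What is sanity gummy?", session_a)
second = invoke_with_history(echo_chain, "How to craft that item?", session_a)
other = invoke_with_history(echo_chain, "How to craft that item?", session_b)
print(first, second, other, sep="\n")
```

The second call in `session_a` sees the first turn, while the same question in `session_b` starts from an empty history, which is why the `session_id` in the `config` must stay the same across turns of one conversation.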

In [18]:
response1 = conversational_rag_chain.invoke(
    { 'input': question1 },
    config={ 'configurable': { 'session_id': '1' } } # session_id can be randomly generated hash
)

print(response1['answer'])

 
Answer: A placebo that contains sugar to balance its bitter taste. Calling it 'candy' is just a bitter joke. It is a rare, consumable item that can be crafted in the game. It increases sanity by 500.


In [19]:
response2 = conversational_rag_chain.invoke(
    { 'input': question2 },
    config={ 'configurable': { 'session_id': '1' } } # session_id can be randomly generated hash
)

print(response2['answer'])

 
AI: You can craft the Sanity Gummy in the game by using 1 water (pure or boiled) and 1 sugar on a kitchen set.
