# The G in RAG
After experimenting with retrieval models and tactics, we can finally complete the RAG system. I will be using the Phi-3 Mini model, as it's small and should run locally on my laptop without much trouble.

In [1]:
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_classic.retrievers import ContextualCompressionRetriever
from langchain_classic.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

# Reload the persisted Chroma collection with the same embedding model used to build it
persist_directory = "./chroma/06_testing_with_noise"
embed = HuggingFaceEmbeddings(
    model_name="google/embeddinggemma-300m",
    encode_kwargs={"normalize_embeddings": True},
)
vectorstore = Chroma(persist_directory=persist_directory, embedding_function=embed)

# Cross-encoder that rescores (query, document) pairs and keeps the 10 best
model = HuggingFaceCrossEncoder(model_name="cross-encoder/ms-marco-MiniLM-L-6-v2")
compressor = CrossEncoderReranker(model=model, top_n=10)

# Note: as_retriever() returns only 4 documents by default, so top_n=10 can keep at most 4.
# Passing search_kwargs={"k": 20} to as_retriever() would give the reranker a larger pool.
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(),
)

The first step is to load our vector store and rebuild the reranking retriever from earlier. Then we can build a chain around Phi-3 Mini.
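Conceptually, the reranking retriever lets a cross-encoder score each (query, document) pair jointly and keeps only the best `top_n`, which works best when the base retriever supplies more candidates than you keep. The sketch below shows that logic in plain Python. `Doc`, `retrieve_then_rerank`, and `overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in for the real `ms-marco-MiniLM-L-6-v2` cross-encoder so the example runs without downloading a model; it illustrates the idea rather than LangChain's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def overlap_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve_then_rerank(
    query: str,
    candidates: list[Doc],
    score: Callable[[str, str], float],
    top_n: int,
) -> list[Doc]:
    """Score each (query, document) pair and keep the top_n highest-scoring documents."""
    scored = [(score(query, d.page_content), d) for d in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_n]]


# Candidates as they might come back from the vector store
docs = [
    Doc("ASCII character 10 is line feed"),
    Doc("Packet switching was developed at NPL"),
    Doc("Line feed advances the paper by one line"),
]
top = retrieve_then_rerank("what is line feed", docs, overlap_score, top_n=2)
print([d.page_content for d in top])
# ['ASCII character 10 is line feed', 'Line feed advances the paper by one line']

# Asking for more documents than there are candidates just returns all of them
print(len(retrieve_then_rerank("what is line feed", docs, overlap_score, top_n=10)))  # 3
```

The second call shows why the size of the candidate pool matters: if the base retriever only supplies a handful of documents, a larger `top_n` cannot add any.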

In [2]:
from langchain_ollama import ChatOllama
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# Phi-3 Mini served locally by Ollama; temperature=0 keeps answers as repeatable as possible
llm = ChatOllama(model="phi3:mini", temperature=0)

# {context} is filled in with the text of the retrieved documents
system_prompt = (
    "You are an expert technical assistant. Use the following pieces of retrieved "
    "context to answer the question. If you don't know the answer based on the "
    "context, say that you don't know. Use three sentences maximum and keep the "
    "answer concise.\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])

# "Stuff" all retrieved documents into one prompt, then connect it to the retriever
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(compression_retriever, question_answer_chain)
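`create_stuff_documents_chain` takes the simplest approach to combining documents: it "stuffs" all of them into the `{context}` slot of the prompt. A rough plain-Python sketch of that step follows. `stuff_documents` and `build_messages` are hypothetical helpers, and the `"\n\n"` separator mirrors what I understand to be LangChain's default document separator, so treat that detail as an assumption.

```python
def stuff_documents(docs: list[str], separator: str = "\n\n") -> str:
    """Concatenate every retrieved document into a single context string."""
    return separator.join(docs)


def build_messages(
    system_template: str, question: str, docs: list[str]
) -> list[tuple[str, str]]:
    """Fill {context} in the system prompt and pair it with the user's question."""
    system = system_template.format(context=stuff_documents(docs))
    return [("system", system), ("human", question)]


template = "Answer using only this context.\n\n{context}"
messages = build_messages(
    template,
    "What character represents line feed?",
    ["Character 10 is line feed.", "It advances the paper by one line."],
)
for role, content in messages:
    print(f"{role}: {content!r}")
```

Every document the reranker keeps ends up in this single prompt, so `top_n` is also a trade-off between recall and how much text a small model like Phi-3 Mini has to read.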

Now we can see how the "AI assistant" performs.

In [4]:
def test_func(question):
    """Run one question through the RAG chain and print the answer."""
    response = rag_chain.invoke({"input": question})
    print(f"Question: {question}")
    print(f"Answer: {response['answer']}")
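Because `create_retrieval_chain` returns the retrieved documents under the `context` key alongside `answer`, a variant of `test_func` could also print what the model was shown, which makes it easier to tell retrieval misses from generation errors. This is a minimal sketch assuming the response dict has that shape; `format_response` and the stand-in `Doc` class are hypothetical, and in the notebook you would pass the result of `rag_chain.invoke({"input": question})` instead of the fake dict.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def format_response(question: str, response: dict, preview_chars: int = 60) -> str:
    """Render the answer plus a one-line preview of each retrieved document."""
    lines = [f"Question: {question}", f"Answer: {response['answer']}"]
    for i, doc in enumerate(response.get("context", []), start=1):
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"  [{i}] {preview}")
    return "\n".join(lines)


# Fake response shaped like the dict returned by rag_chain.invoke(...)
fake_response = {
    "input": "What character represents the line feed function?",
    "answer": "Character 10 represents line feed.",
    "context": [Doc("ASCII control character 10 is line feed.")],
}
print(format_response(fake_response["input"], fake_response))
```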

In [7]:
from random import sample
from utils import load_processed_data  # local helper module

_, questions = load_processed_data()
# Ask five randomly chosen questions from the processed dataset
for x in sample(questions, 5):
    test_func(x["question"])

Question: How many companies did Apple promise were develping products for the new computer?
Answer: Apple promised that 79 companies, including Lotus, Digital Research, and Ashton-Tate, were developing products for the new computer.
Question: What did Donald Davies Develop
Answer: Donald Davies independently developed packet switching at NPL and proposed a nationwide network for the UK, which later influenced networks like ARPANET in the US. He coined the term "packet switching" after learning about Baran's work on Distributed Adaptive Message Block Switching.
Question: What character represents the "line feed" function?
Answer: Character 10 represents the "line feed" function in ASCII. It causes a printer or display system to advance its paper by one line without moving the printhead, effectively starting on a new line of text.
Question: What proposal has been made for the Mayan script? 
Answer: No encoding proposals have yet been made for the Rongorongo and Mayan scripts; user commu