# Document Search and Retrieval for Complex Documents Using RAG


<img src="arch.png" width=400px>

### Process document and create vectorstore

In [1]:
from rag_101.retriever import (
    load_pdf,
    split_text,
    load_embedding_model,
    load_reranker_model,
    generate_embeddings,
    rerank_docs,
)

Load documents and split into chunks

In [None]:
# Load the example PDF (a paper from arXiv).
# To use your own document, update the path below.
files = ["example_data/runescapepdf.pdf"]
loaders = load_pdf(files=files)

# Split the loaded documents into chunks
documents = split_text(loaders=loaders, chunk_size=1000)
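
The implementation of `split_text` is not shown in this notebook, so the following is only a minimal sketch of what chunking with a `chunk_size` can look like. It assumes fixed-size character windows; the function name `split_into_chunks` and the `overlap` parameter are hypothetical and are not part of `rag_101`.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, so a sentence cut at a
    boundary still appears whole in at least one chunk. (Assumed behaviour,
    not necessarily what rag_101's split_text does.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "a" * 2500
    print([len(c) for c in split_into_chunks(sample, chunk_size=1000, overlap=100)])
```

Smaller chunks tend to give more precise matches at the cost of less context per retrieved passage, which is why `chunk_size` is worth tuning for your own documents.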

Generate embeddings and store in a vector database

In [None]:
# initialize models
embedding_model = load_embedding_model(model_name="BAAI/bge-large-en-v1.5")
reranker_model = load_reranker_model(reranker_model_name="BAAI/bge-reranker-large")

# generate embeddings and store in vector database
print("Generating embeddings... This might take some time.")
vectorstore = generate_embeddings(documents, embedding_model=embedding_model)
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

Generating embeddings... This might take some time.
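
To make the role of `search_kwargs={"k": 10}` concrete, here is a minimal sketch of top-k retrieval over embedding vectors. It assumes cosine similarity and a brute-force scan; the notebook does not say which metric or index the vector store uses, and the helper names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 10) -> list[tuple[int, float]]:
    """Return (document index, similarity) for the k most similar vectors."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real models produce much longer vectors.
    doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(top_k([1.0, 0.0], doc_vecs, k=2))
```

Retrieving more candidates than you finally need (here, 10) gives the reranker in the next step a wider pool to choose from.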


### Query the input document

In [None]:
query = "What is the grand exchange in runescape?"

retrieved_documents = retriever.get_relevant_documents(query)
reranked_documents = rerank_docs(reranker_model, query, retrieved_documents)

print("\nUser query:", query)
print("--" * 50)
print("Retrieved content:")
print(reranked_documents[0][0].page_content)
print("--" * 50)
print("metadata:", reranked_documents[0][0].metadata)


User query: What is the grand exchange in runescape?
----------------------------------------------------------------------------------------------------
Retrieved content:
3.2 The Grand Exchange The Grand Exchange (“GE”) is a marketplace that connects buyers and sellers of all OSRS tradeable items, similar to a real-world secu- rities exchange. The GE works by clearing offers between buyers and sellers in the order it receives them. For example, suppose a buyer places an order for an item at 𝑃𝐵 and there exists a sell offer

Figure 1: The Grand Exchange (GE) is a mechanism within RuneScape to connect buy and sell orders for thousands of items.
----------------------------------------------------------------------------------------------------
metadata: {'source': 'example_data/runescapepdf.pdf'}
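
The indexing `reranked_documents[0][0]` suggests that `rerank_docs` returns (document, score) pairs sorted from most to least relevant. That structure is inferred from how the result is used, not from the `rag_101` source. The sketch below shows the general pattern under that assumption: score every retrieved candidate against the query, then sort. The functions `rerank` and `word_overlap_score` are hypothetical, and the word-overlap scorer is only a toy stand-in for a cross-encoder reranker model such as the one loaded above.

```python
from typing import Callable


def rerank(
    query: str,
    docs: list[str],
    score_fn: Callable[[str, str], float],
) -> list[tuple[str, float]]:
    """Score each candidate against the query and sort best-first."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def word_overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words that occur in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


if __name__ == "__main__":
    candidates = [
        "fishing guide",
        "grand prize",
        "the grand exchange is a marketplace",
    ]
    ranked = rerank("grand exchange", candidates, word_overlap_score)
    best_doc, best_score = ranked[0]
    print(best_doc, best_score)
```

Unlike the embedding step, which encodes the query and each chunk separately, a reranker looks at the query and a candidate together, which is slower but usually more accurate. That is why it is applied only to the small set of retrieved candidates.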


### Run through some sample queries and observe the results

In [None]:
query1 = "How does social behaviour affect trading in online games?"
query2 = "How does the runescape economy work?"
query3 = "What happens with item sinks long term?"

queries = [query1, query2, query3]

for i, query in enumerate(queries, start=1):
    print(f"Example {i}: Query->", query)
    print(".." * 50)
    print("Retrieved document:")

    retrieved_documents = retriever.get_relevant_documents(query)
    reranked_documents = rerank_docs(reranker_model, query, retrieved_documents)

    print("--" * 50)
    print(reranked_documents[0][0].page_content)
    print("--" * 50)
    print("metadata:", reranked_documents[0][0].metadata)
    print("==" * 50, "\n\n")

Example 1: Query-> How does social behaviour affect trading in online games?
....................................................................................................
Retrieved document:
----------------------------------------------------------------------------------------------------


2.2 Social Behavior in MMORPGs Many MMORPGs have a high degree of player interaction, such as through text chat or trading mechanisms. These interactions have important effects on player retention and economic activity. For example, Jeong et al. find that friendship and trading interaction networks have high degrees of overlap compared to other types of social interactions [29]. In Aion, a popular MMORPG, Chun et al. observe that interactions such as message exchange are highly correlated with trade [16]. Kang et al. show that people tend to join guilds, which are in-game groupings of players, when they expect benefits, such as money or valuable items [32]. Furthermore, users often exhibit 