# Basic RAG Pipeline Modularised

This notebook is a modularised version of the codecamp tutorial code, wrapped in a single callable function that starts the model.

In [1]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_ollama import OllamaLLM
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS
from langchain.prompts import PromptTemplate
import os
import numpy as np

In [2]:
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

In [4]:
MODEL_NAME = "llama3.2"

I have created a function to start the model. It will be updated to include our vector store of embedded data when the model is started.

In [5]:
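def load_docs():
    """Collect the paths of all PDF files under the current directory."""
    document_loader = []

    for root, dirs, files in os.walk("."):
        # Skip the saved FAISS index and git folders
        if "faiss" in root or "git" in root:
            continue
        for file in files:
            if file.endswith(".pdf"):
                # Keep the directory so PDFs in subfolders can still be opened
                document_loader.append(os.path.normpath(os.path.join(root, file)))

    return document_loader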

In [6]:
document_loader = load_docs()
document_loader

['ENSC3016_Course_Notes_Part_1_Electromagnetism_Transformers.pdf',
 'ENSC3016_Course_Notes_Part_2_Electric_Machines.pdf',
 'Electric Machinery Fundamentals Textbook -- Chapman.pdf',
 'ENSC3016 Study Guide 1-Review of Circuit Fundamentals.pdf',
 'Three Phase Power System Fundamentals.pdf']

In [7]:
embedding_model = "sentence-transformers/all-MiniLM-L6-v2"  # sentence embedding model

def embed_splitting(document_loader, embedding_model):
    # normalize_embeddings=True returns unit-length vectors
    embeddings = HuggingFaceEmbeddings(model_name=embedding_model, encode_kwargs={'normalize_embeddings': True})

    doc_store = []
    for file in document_loader:
        loader = PyPDFLoader(file)
        doc = loader.load()
        doc_store += doc

    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size = 400,
        chunk_overlap = 64
        )
    
    #Make splits
    splits = text_splitter.split_documents(doc_store)

    return embeddings, splits


In [8]:
embeddings, splits = embed_splitting(document_loader, embedding_model)

In [9]:
embeddings

HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={'normalize_embeddings': True}, query_encode_kwargs={}, multi_process=False, show_progress=False)

In [10]:
len(splits)

402
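
The effect of `chunk_size = 400` and `chunk_overlap = 64` can be illustrated with a plain sliding window over a token list. This is a simplified sketch, not how `RecursiveCharacterTextSplitter` works internally: the real splitter breaks text on separators and only measures size in tokens, so its chunks are rarely full fixed-size windows. The function name `sliding_window_chunks` and the dummy tokens are assumptions made for illustration.

```python
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Dummy tokens standing in for a tokenised document
tokens = [f"tok{i}" for i in range(1000)]
chunks = sliding_window_chunks(tokens, chunk_size=400, chunk_overlap=64)
print([len(chunk) for chunk in chunks])
# The tail of one chunk is repeated at the head of the next
print(chunks[0][-64:] == chunks[1][:64])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.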

In [11]:
def cosine_similarity(query):
    """Rank every chunk in `splits` by cosine similarity to the query."""
    # Note: this re-embeds all chunks on every call, which is slow for large corpora
    query_vec = np.asarray(embeddings.embed_query(query))
    texts = [doc.page_content for doc in splits]
    vectors = np.asarray(embeddings.embed_documents(texts))

    # cos(q, d) = (q . d) / (|q| |d|), computed for all chunks at once
    dot_products = vectors @ query_vec
    magnitudes = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vec)
    cos_sim = dot_products / magnitudes

    # Pair each score with its chunk index and sort from most to least similar
    cossim_sort = list(enumerate(cos_sim))
    cossim_sort.sort(key=lambda x: x[1], reverse=True)

    return cossim_sort


In [12]:
cossim_sort = cosine_similarity("Explain transformers")
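
Because the embeddings are created with `normalize_embeddings=True`, every vector should already have unit length, in which case dividing by the magnitudes in `cosine_similarity` changes nothing and the dot product alone is the cosine. A small sketch with made-up three-dimensional vectors (the numbers and helper names are assumptions, not real embeddings):

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Full cosine formula: dot product divided by both magnitudes."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for a query embedding and a chunk embedding
query = l2_normalize([0.2, 0.9, 0.4])
chunk = l2_normalize([0.1, 0.8, 0.6])

# For unit vectors the two values agree up to floating-point error
print(cosine(query, chunk), dot(query, chunk))
```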

In [13]:
for i in range(3):
    print(f"The number {i+1} document is the {cossim_sort[i][0]} chunk, and reads the following: \n\n{splits[cossim_sort[i][0]].page_content}\n")

The number 1 document is the 106 chunk, and reads the following: 

Transformer 52 
 
 
 
   Figure 6-3 Shell-type transformers. 
 
 
 
Figure 6-4 Flux plot: shell-type transformer 
 
 
Toroidal transformers exploit the remarkable properties of toroidal coils described in section 3.6. 
Although they are more expensive than shell-type transformers, the performance is better. They are used 
in high -quality electronic equipment and for instrument transformers (see section 6.3) where 
measurement accuracy is important. Typical toroidal transformers are shown in figure 6-5. 
 
Figure 6-5 Toroidal transformers.
 
 
 
6.2 Transformer Principle: 
The action of a transformer is most easily understood if the two coils are wound on opposite sides of a 
magnetic core, as shown in the model of figure 6 -6. This form is used for some low -cost transformers, 
but the magnetic coupling is not as good as with the shell-type construction. 
 
 
Figure 6-6  Core-type transformer 
 
 
 
Figure 6 -7 is a s

In [14]:
# Reuse the saved index if one exists; otherwise build it from the splits and save it

if os.path.exists("faiss_index"):
    print("Loading FAISS index from disk...")
    vector_store = FAISS.load_local("faiss_index", embeddings=embeddings, allow_dangerous_deserialization=True)
else:
    print("Building FAISS index from scratch...")
    dim = len(embeddings.embed_query("test sentence"))
    index = faiss.IndexFlatL2(dim)
    vector_store = FAISS(
        embedding_function=embeddings,
        index=index,
        docstore=InMemoryDocstore(),
        index_to_docstore_id={},
    )
    vector_store.add_documents(splits)
    vector_store.save_local("faiss_index")

Loading FAISS index from disk...
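
The index above uses L2 distance (`IndexFlatL2`) while the earlier manual search used cosine similarity. For unit-length vectors the two give the same ranking, since ||q - d||² = 2 - 2(q · d). The sketch below checks this on toy vectors; the values and helper names are assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy stand-ins for a query embedding and three chunk embeddings
query = l2_normalize([1.0, 2.0, 0.5])
chunks = [l2_normalize(v) for v in ([1.0, 1.9, 0.4], [-1.0, 0.3, 2.0], [0.5, 2.0, 2.0])]

# Smallest distance first versus largest cosine first
by_distance = sorted(range(len(chunks)), key=lambda i: squared_l2(query, chunks[i]))
by_cosine = sorted(range(len(chunks)), key=lambda i: dot(query, chunks[i]), reverse=True)
print(by_distance == by_cosine)
```

If the embeddings were not normalised, the two orderings could differ, and an inner-product index would be the closer match to cosine similarity.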


In [None]:
# create the retriever object once
semantic_retriever = vector_store.as_retriever(search_kwargs={'k': 4})

# define your function to query it
def semantic_search(retriever_obj, input_context: str):
    return retriever_obj.invoke(input_context)

# call the function with retriever and query string
results = semantic_search(semantic_retriever, "Explain transformers")


In [16]:
semantic_retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x312e6b310>, search_kwargs={'k': 4})

In [17]:
results

[Document(id='065ed451-14c6-4500-8de0-d852bce2b40a', metadata={'producer': 'Microsoft® Word 2013', 'creator': 'Microsoft® Word 2013', 'creationdate': '2019-07-27T15:04:48+08:00', 'author': 'Ali Kharrazi', 'moddate': '2019-07-27T15:04:48+08:00', 'source': 'ENSC3016_Course_Notes_Part_1_Electromagnetism_Transformers.pdf', 'total_pages': 76, 'page': 51, 'page_label': '52'}, page_content='Transformer 52 \n \n \n \n   Figure 6-3 Shell-type transformers. \n \n \n \nFigure 6-4 Flux plot: shell-type transformer \n \n \nToroidal transformers exploit the remarkable properties of toroidal coils described in section 3.6. \nAlthough they are more expensive than shell-type transformers, the performance is better. They are used \nin high -quality electronic equipment and for instrument transformers (see section 6.3) where \nmeasurement accuracy is important. Typical toroidal transformers are shown in figure 6-5. \n \nFigure 6-5 Toroidal transformers.\uf020\n \n \n \n6.2 Transformer Principle: \nThe ac

In [18]:
for i, doc in enumerate(results):
    print(i+1, "\n")
    print(doc.page_content, "\n")

1 

Transformer 52 
 
 
 
   Figure 6-3 Shell-type transformers. 
 
 
 
Figure 6-4 Flux plot: shell-type transformer 
 
 
Toroidal transformers exploit the remarkable properties of toroidal coils described in section 3.6. 
Although they are more expensive than shell-type transformers, the performance is better. They are used 
in high -quality electronic equipment and for instrument transformers (see section 6.3) where 
measurement accuracy is important. Typical toroidal transformers are shown in figure 6-5. 
 
Figure 6-5 Toroidal transformers.
 
 
 
6.2 Transformer Principle: 
The action of a transformer is most easily understood if the two coils are wound on opposite sides of a 
magnetic core, as shown in the model of figure 6 -6. This form is used for some low -cost transformers, 
but the magnetic coupling is not as good as with the shell-type construction. 
 
 
Figure 6-6  Core-type transformer 
 
 
 
Figure 6 -7 is a schematic representation of the transformer. It will be assumed 

In [19]:
bm25_retriever = BM25Retriever.from_documents(splits)
bm25_retriever.k = 4

def bm25_keyword_search_lc(query):
    return bm25_retriever.invoke(query)
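
BM25 is a keyword method: it scores a chunk only on the query terms that literally occur in it. The sketch below uses a textbook Okapi BM25 formula with whitespace tokenisation. It is not necessarily the exact formula or tokeniser behind `BM25Retriever`, and the parameter defaults `k1=1.5`, `b=0.75` and the toy documents are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with a textbook Okapi BM25 formula."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Number of documents containing each term
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.split():
            if term not in term_freq:
                continue  # a term absent from the document adds nothing
            n_t = doc_freq[term]
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            tf = term_freq[term]
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "current transformers measure high currents",
    "induction motors convert electrical energy",
    "transformers change voltage levels",
]
# A misspelled token matches no document, so only "transformers" contributes
print(bm25_scores("Explan transformers", docs))
print(bm25_scores("transformers", docs))
```

Unlike the embedding search, a misspelled or paraphrased query term is simply ignored here, which is one reason to combine the two retrievers.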

In [20]:
keyword_results = bm25_keyword_search_lc("Explan transformers")
for i, doc in enumerate(keyword_results):
    print(f"Document {i+1}:\n{doc.page_content}\n")

Document 1:
65 Electrical Machines and Systems                                                                                                            
6.8 Current Transformers 
Instrument transformers are special transformers for extending the range of measur ing instruments. 
There are two basic types: voltage transformers for measuring high voltages, and current transformers 
for measuring high currents. Using transformers for voltage measurement is similar in principle to the 
ordinary use of transformers to ch ange voltage levels, so it will not be considered further. Current 
transformers, on the other hand, need special consideration. These are usually toroidal transformers with 
high-quality core material. 
Figure 6-25 shows a load connected to a source. The primary of a current transformer is in series with 
the load, and the secondary is connected to a meter 
  
Figure 6-25 Use of a current transformer 
 
 
Equation 6-19 gives: 
𝐼𝑀 ≈ 𝑁1
𝑁2
𝐼𝐿     𝑜𝑟  𝐼𝐿 ≈ 𝑁1
𝑁2
𝐼𝑀         

In [21]:
keyword_results

[Document(metadata={'producer': 'Microsoft® Word 2013', 'creator': 'Microsoft® Word 2013', 'creationdate': '2019-07-27T15:04:48+08:00', 'author': 'Ali Kharrazi', 'moddate': '2019-07-27T15:04:48+08:00', 'source': 'ENSC3016_Course_Notes_Part_1_Electromagnetism_Transformers.pdf', 'total_pages': 76, 'page': 64, 'page_label': '65'}, page_content='65 Electrical Machines and Systems                                                                                                            \n6.8 Current Transformers \nInstrument transformers are special transformers for extending the range of measur ing instruments. \nThere are two basic types: voltage transformers for measuring high voltages, and current transformers \nfor measuring high currents. Using transformers for voltage measurement is similar in principle to the \nordinary use of transformers to ch ange voltage levels, so it will not be considered further. Current \ntransformers, on the other hand, need special consideration. These are

In [22]:
# EnsembleRetriever takes no search_kwargs; the number of results is set by each retriever's own k
ensemble_retriever = EnsembleRetriever(retrievers=[semantic_retriever, bm25_retriever], weights=[0.67, 0.33])

def hybrid_search(retriever_obj, input_context: str):
    return retriever_obj.invoke(input_context)

hybrid_results = hybrid_search(ensemble_retriever, "Explain transformers")

In [23]:
len(hybrid_results)

7
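
A result count of 7 is consistent with the two retrievers each returning 4 chunks and sharing one of them. As far as I understand, `EnsembleRetriever` merges the lists with weighted Reciprocal Rank Fusion, and the sketch below assumes that scheme with a constant `c = 60`. The document ids `A` to `G` and the single overlap are hypothetical.

```python
def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes weight / (rank + c) to the document's score
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rank + c)
    # Highest fused score first; duplicates are merged by the dict keys
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical top-4 results from each retriever, sharing document "A"
semantic = ["A", "B", "C", "D"]
keyword = ["E", "A", "F", "G"]

fused = weighted_rrf([semantic, keyword], weights=[0.67, 0.33])
print(len(fused), fused)
```

A chunk found by both retrievers accumulates two contributions and rises to the top, while the merged list can hold up to 8 chunks, which is why the pipeline later slices it to the first 4.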

In [24]:
for i, doc in enumerate(hybrid_results):
    print(f"Document {i+1}:\n{doc.page_content}\n")

Document 1:
Transformer 52 
 
 
 
   Figure 6-3 Shell-type transformers. 
 
 
 
Figure 6-4 Flux plot: shell-type transformer 
 
 
Toroidal transformers exploit the remarkable properties of toroidal coils described in section 3.6. 
Although they are more expensive than shell-type transformers, the performance is better. They are used 
in high -quality electronic equipment and for instrument transformers (see section 6.3) where 
measurement accuracy is important. Typical toroidal transformers are shown in figure 6-5. 
 
Figure 6-5 Toroidal transformers.
 
 
 
6.2 Transformer Principle: 
The action of a transformer is most easily understood if the two coils are wound on opposite sides of a 
magnetic core, as shown in the model of figure 6 -6. This form is used for some low -cost transformers, 
but the magnetic coupling is not as good as with the shell-type construction. 
 
 
Figure 6-6  Core-type transformer 
 
 
 
Figure 6 -7 is a schematic representation of the transformer. It will be 

In [25]:
#We need to create functions that create embeddings, load documents and split text

In [26]:
def pipeline_combined(model_name = MODEL_NAME):

    llm = OllamaLLM(model=model_name)

    template = """You are an expert assistant answering based only on the provided context.

    Here are 4 relevant document chunks retrieved:

    Chunk 1:
    {chunk1}

    Chunk 2:
    {chunk2}

    Chunk 3:
    {chunk3}
    
    Chunk 4:
    {chunk4}
    
    Use all relevant information above to answer the question below. If the answer isn't found in the chunks, say:
    "I cannot answer this question because the necessary information was not found in the provided documents."

    When answering, cite the **source file name** and **slide/page number** if available.

    Question: {question}
    """

    prompt = PromptTemplate.from_template(template)
    chain = prompt | llm
    print(f"\n Model {model_name} has been initiated. Please feel free to ask any questions or type 'exit' to end this session")
    
    while True:
        user_input = input("You:")
        if user_input.lower() in ['exit', 'quit']:
            print("Have a good day.")
            break

        context_docs = hybrid_search(ensemble_retriever, user_input)[:4]

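        # Pass context and question into the chain
        chunks = [
            f"Source: {doc.metadata.get('source', 'unknown')}, Page: {doc.metadata.get('page', 'unknown')}\n{doc.page_content}"
            for doc in context_docs
        ]
        # Pad so all four template slots are filled even if fewer chunks come back
        chunks += ["(no further context retrieved)"] * (4 - len(chunks))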

        response = chain.invoke({
            "chunk1": chunks[0],
            "chunk2": chunks[1],
            "chunk3": chunks[2],
            "chunk4": chunks[3],
            "question": user_input
        })

        print(f"LLM: {response}\n")
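
The template above has one slot per chunk, so it only works with exactly four chunks. One possible alternative is to render all retrieved chunks into a single string and use one placeholder. The sketch below assumes plain dictionaries in place of LangChain `Document` objects, and `format_context` is a hypothetical helper. It also prefers `page_label` over `page`, since in the metadata printed earlier `page` is 51 for the page labelled '52'.

```python
def format_context(docs, max_chunks=4):
    """Render up to max_chunks retrieved chunks as one numbered context string."""
    blocks = []
    for number, doc in enumerate(docs[:max_chunks], start=1):
        meta = doc["metadata"]
        # Prefer the printed page label; fall back to the raw page index
        page = meta.get("page_label", meta.get("page", "unknown"))
        source = meta.get("source", "unknown")
        blocks.append(f"Chunk {number} (Source: {source}, Page: {page}):\n{doc['page_content']}")
    return "\n\n".join(blocks)


# Hypothetical retrieved chunks, shaped like the metadata shown earlier
docs = [
    {"metadata": {"source": "notes.pdf", "page": 51, "page_label": "52"},
     "page_content": "6.2 Transformer Principle"},
    {"metadata": {"source": "notes.pdf", "page": 64},
     "page_content": "6.8 Current Transformers"},
]
print(format_context(docs))
```

With this approach the prompt would need only a `{context}` and a `{question}` placeholder, and a query that returns fewer than four chunks would not need special handling.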

In [27]:
pipeline_combined()


 Model llama3.2 has been initiated. Please feel free to ask any questions or type 'exit' to end this session
LLM: Transformers are a practical application of magnetically coupled coils. They are used to transfer energy from one coil to another, typically to provide electrical isolation between the source and the load or to change the voltage and current levels (Chunk 2, Source: ENSC3016_Course_Notes_Part_1_Electromagnetism_Transformers.pdf, Page: 50).

Transformers work by using two coils, one connected to the source (primary) and the other connected to the load (secondary), which are wound on opposite sides of a magnetic core. The magnetic flux passes through each turn of each coil, and the ratio of the turns determines the transformation of voltage and current (Chunk 3, Source: ENSC3016_Course_Notes_Part_1_Electromagnetism_Transformers.pdf, Page: 52).

The core type transformer is the most common form used for transformers, where the coils are wound on opposite sides of a magnetic c