# RAG-Powered Assistant for Custom Cash Register Manual

This project demonstrates a simple Retrieval-Augmented Generation (RAG) pipeline using a custom PDF manual I created for a point-of-sale (POS) cash register system. The goal is to allow users to ask natural language questions (e.g., *"How do I print a daily report?"*) and get accurate, concise answers based on the manual's content. It's a practical showcase of how LLMs can enhance user guidance for specialized tools, built entirely with free and open-source tools.

Chrysovalantis K.

# Pips and Libs

In [None]:
!pip install -U sentence-transformers faiss-cpu PyMuPDF
!pip install transformers


In [None]:
import fitz  # PyMuPDF
from typing import List
from sentence_transformers import SentenceTransformer
import numpy as np
import faiss
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

In [None]:
!pip install rouge-score




In [None]:
from google.colab import drive
drive.mount('/content/drive')
pdf_path = '/content/drive/MyDrive/Custom Manual.pdf'


Mounted at /content/drive



# PDF Extraction

Extracts all text from a PDF file using PyMuPDF.  
Opens the PDF, iterates through each page, and collects the text.  
Returns the full document text as a single string.

In [None]:

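def extract_text_from_pdf(pdf_path):
    # Open the PDF and close it automatically once every page has been read
    with fitz.open(pdf_path) as doc:
        # Concatenate the plain text of each page, in page order
        return "".join(page.get_text() for page in doc)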

pdf_text = extract_text_from_pdf(pdf_path)


# Chunking the Text

Splits the extracted text into overlapping word chunks to ensure context continuity.  
Each chunk holds at most 100 words and shares 25 words with the previous chunk; I chose these values to suit the layout of my PDF.  
This helps improve retrieval accuracy when searching for relevant information later.

In [None]:
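def chunk_text(text: str, max_length: int = 100, overlap: int = 25) -> List[str]:
    # The window must advance on every step, otherwise the loop would never end
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be non-negative and smaller than max_length")
    words = text.split()
    step = max_length - overlap
    # Slide a max_length-word window over the text, moving `step` words at a time
    return [" ".join(words[i:i + max_length]) for i in range(0, len(words), step)]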

chunks = chunk_text(pdf_text)
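
To make the overlap concrete, here is a small sketch of the same sliding-window logic on a ten-word string. The helper name `chunk_words` and the toy values (`max_length=4`, `overlap=1`) are illustrative only and are not the settings used for the manual.

```python
from typing import List


def chunk_words(text: str, max_length: int, overlap: int) -> List[str]:
    """Illustrative copy of the sliding-window chunking logic."""
    words = text.split()
    step = max_length - overlap  # how far the window moves each time
    return [" ".join(words[i:i + max_length]) for i in range(0, len(words), step)]


sample = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9"
demo_chunks = chunk_words(sample, max_length=4, overlap=1)
for chunk in demo_chunks:
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
# w9
```

Each chunk starts with the last word of the previous one. The final chunk can be very short and fully covered by the chunk before it, which is harmless for retrieval but worth knowing when inspecting results.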


# Embedding and Indexing

Embeds each chunk into a dense vector using the `all-mpnet-base-v2` model.  
This model is chosen for its strong performance in English sentence similarity tasks.  
FAISS is then used to index these embeddings for fast similarity-based retrieval.


In [None]:
model = SentenceTransformer('all-mpnet-base-v2')

embeddings = model.encode(chunks, convert_to_numpy=True)

dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)
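
As a rough mental model of what a flat L2 index does, the sketch below runs an exhaustive nearest-neighbour search in plain Python over toy 2-D vectors. The function names and vectors are hypothetical; the real index works on the 768-dimensional embeddings and, as far as I know, also reports squared L2 distances.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(
    vectors: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[float, int]]:
    """Compare the query with every stored vector and keep the k closest."""
    scored = [(squared_l2(v, query), i) for i, v in enumerate(vectors)]
    return sorted(scored)[:k]


# Toy "embeddings"; real ones come from the sentence-transformer model
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
nearest = flat_l2_search(toy_vectors, query=[2.0, 0.0], k=2)
print(nearest)  # [(1.0, 1), (4.0, 0)]
```

A flat index trades speed for exactness: every query touches every vector, which is fine for a manual that yields a few hundred chunks.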



# Retrieving Relevant Chunks

Encodes the user question into the same vector space as the document chunks.  
Retrieves the top-k nearest chunks from the FAISS index, ranked by L2 distance between embeddings.  
Returns them as context for answering the question.


In [None]:
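def retrieve_answer(question, top_k=3):
    # Embed the question with the same model used for the chunks
    question_embedding = model.encode([question], convert_to_numpy=True)
    # Never ask for more neighbours than the index holds (FAISS pads with -1)
    top_k = min(top_k, index.ntotal)
    _, indices = index.search(question_embedding, top_k)
    retrieved_chunks = [chunks[i] for i in indices[0]]
    return "\n---\n".join(retrieved_chunks)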
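
The search always returns the k nearest chunks, even when none of them is actually relevant. One possible refinement, sketched below with plain Python lists, is to drop hits whose distance exceeds a cut-off. The helper `select_chunks` and the `max_distance` value are assumptions for illustration; a usable threshold would have to be tuned on the real embeddings.

```python
from typing import List, Sequence, Tuple


def select_chunks(
    distances: Sequence[float],
    indices: Sequence[int],
    chunks: Sequence[str],
    max_distance: float,
) -> List[Tuple[float, str]]:
    """Keep only hits that exist and are close enough to the query."""
    kept = []
    for dist, idx in zip(distances, indices):
        if idx < 0:  # FAISS returns -1 when it has fewer than k results
            continue
        if dist > max_distance:  # hypothetical cut-off, tune per corpus
            continue
        kept.append((dist, chunks[idx]))
    return kept


# Toy values shaped like one row of the (distances, indices) search result
toy_chunks = ["add item", "daily report", "connect eft-pos"]
hits = select_chunks([0.42, 1.37, 3.4e38], [2, 0, -1], toy_chunks, max_distance=1.0)
print(hits)  # [(0.42, 'connect eft-pos')]
```

An empty result then signals that the manual probably does not cover the question, which the assistant could report instead of guessing.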

Checking how it works.



In [None]:
sample_question = "How to add an item?"
print(retrieve_answer(sample_question))

that have been stored. From there the user can add items by selecting the + at the bottom of the screen, edit existing items by selecting the yellow pencil to the right of the stored item or delete them by selecting the red X to the right of the stored item. How to add an item? In the screen "ADD ITEM" the name and description of the item is filled in. Then select the department to which the article will fall, the gross sales price and the net profit. The net sales price is automatically filled in based on the
---
the desired quantity. Then select the X ,enter the price of the product and select the corresponding section. To import an item, you select "ITEMS" to display the items stored in the warehouse (See STORAGE section). After selecting the item, you press check. To enter a discount after the products have been entered, you select "DISCOUNT" and enter the discount value. To enter a comment, you select "2nd level" and then comment. You enter the comment and press check. The "AC" bu

# Question Answering with FLAN-T5

Loads the `Flan-T5 Large` model for text-to-text generation using Hugging Face's pipeline.  
This model generates answers based on retrieved context and the user's question.  
It's a strong general-purpose LLM fine-tuned for instruction following.

Pros: Free to use, high-quality outputs, good for many tasks.  
Cons: Limited input size (~512 tokens), can sometimes truncate or miss detail.


In [None]:
model_id = "google/flan-t5-large"
t5_tokenizer = AutoTokenizer.from_pretrained(model_id)
t5_model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

qa_pipeline = pipeline("text2text-generation", model=t5_model, tokenizer=t5_tokenizer)



Device set to use cpu


# Generating the Final Answer

Generates a well-structured answer using the retrieved context and question.  
Truncates the retrieved context to 1,000 characters so the prompt stays within the model's input limit, then sends an instruction-style prompt to the model.  
Uses sampling (`temperature=0.7`, `top_p=0.9`) for more natural, varied responses.
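
To show what these two settings do, here is a simplified sketch of temperature scaling followed by nucleus (top-p) filtering over a toy next-token distribution. The tokens, logits, and the `nucleus` helper are made up for illustration; the library applies the same idea over the model's full vocabulary at every generation step.

```python
import math
import random
from typing import Dict


def nucleus(logits: Dict[str, float], temperature: float, top_p: float) -> Dict[str, float]:
    """Return the renormalised distribution left after temperature and top-p."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Keep the smallest set of most probable tokens whose mass reaches top_p
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break

    mass = sum(kept.values())
    return {tok: p / mass for tok, p in kept.items()}


toy_logits = {"press": 2.0, "select": 1.5, "enter": 0.5, "banana": -3.0}
dist = nucleus(toy_logits, temperature=0.7, top_p=0.9)
print(sorted(dist))  # ['press', 'select']

# Sampling then picks one of the surviving tokens at random
rng = random.Random(0)
next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
```

Because generation is sampled, the same question can produce slightly different answers from run to run, which also affects any score computed on a single run.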


In [None]:
def generate_answer(question, retrieved_text, max_input_chars=1000):
    # Truncate retrieved text to avoid overloading model
    retrieved_text = retrieved_text[:max_input_chars]

    prompt = f"Give a concise, professional, complete and clear answer this question using the following data:\n{retrieved_text}\n\nQuestion: {question}"
    print("----- Prompt Sent to Model -----")
    print(prompt)
    print("--------------------------------")

    result = qa_pipeline(prompt, max_length=256, temperature=0.7, top_p=0.9, do_sample=True)


    if result and 'generated_text' in result[0]:
        return result[0]['generated_text']
    else:
        return "[No answer returned from the model]"
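
Cutting at a fixed character count can end the context mid-sentence. A possible alternative, sketched below, keeps whole chunks in ranked order until a budget is used up. The helper `fit_context` is hypothetical, and it counts words as a rough stand-in for tokens; an exact budget would need the model's tokenizer.

```python
from typing import List


def fit_context(chunks: List[str], max_words: int, separator: str = "\n---\n") -> str:
    """Join ranked chunks, stopping before the word budget is exceeded."""
    kept = []
    used = 0
    for chunk in chunks:
        n_words = len(chunk.split())
        if used + n_words > max_words:
            break  # the next chunk would not fit, so stop here
        kept.append(chunk)
        used += n_words
    return separator.join(kept)


# Toy chunks in retrieval order (best match first)
ranked = ["one two three", "four five", "six seven eight nine"]
context = fit_context(ranked, max_words=5)
print(repr(context))  # 'one two three\n---\nfour five'
```

This keeps every retained chunk intact, at the cost of sometimes using less of the available input than a hard character cut would.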

# Evaluation

Five questions were put to the model for evaluation. ROUGE-L was also computed for one of them against a reference answer.

In [None]:
# Example usage
question = "How to add an item?"
retrieved = retrieve_answer(question)
answer = generate_answer(question, retrieved)

print("----- Generated Answer -----")
print(answer)


----- Prompt Sent to Model -----
Give a concise, professional, complete and clear answer this question using the following data:
that have been stored. From there the user can add items by selecting the + at the bottom of the screen, edit existing items by selecting the yellow pencil to the right of the stored item or delete them by selecting the red X to the right of the stored item. How to add an item? In the screen "ADD ITEM" the name and description of the item is filled in. Then select the department to which the article will fall, the gross sales price and the net profit. The net sales price is automatically filled in based on the
---
the desired quantity. Then select the X ,enter the price of the product and select the corresponding section. To import an item, you select "ITEMS" to display the items stored in the warehouse (See STORAGE section). After selecting the item, you press check. To enter a discount after the products have been entered, you select "DISCOUNT" and enter th

In [None]:
# Example usage
question = "Can I connect EFT-POS?"
retrieved = retrieve_answer(question)
answer = generate_answer(question, retrieved)

print("----- Generated Answer -----")
print(answer)


----- Prompt Sent to Model -----
Give a concise, professional, complete and clear answer this question using the following data:
within the network or save it to PDF. By clicking on the copies option the user can manage the print settings. He then selects the print icon to proceed to print the report. In the "DAILY REPORT" option the user selects the "CREATE" button to print the report giving data by category and totals. OPTIONS How to manage general parameters and connect the eft-pos? Through the "General Parameters" field, the declaration is made with the bank's application for card payments. In the menu you have to select the bank, select the check "1155", if it is the specific type of
---
bank's application for card payments. In the menu you have to select the bank, select the check "1155", if it is the specific type of connection, and fill in the TID that you will find in the bank's application. When it is a Worldline application, the user goes to the option "Bank POS Management" 

In [None]:
# Example usage
question = "How to connect EFT-POS?"
retrieved = retrieve_answer(question)
answer = generate_answer(question, retrieved)

print("----- Generated Answer -----")
print(answer)


----- Prompt Sent to Model -----
Give a concise, professional, complete and clear answer this question using the following data:
within the network or save it to PDF. By clicking on the copies option the user can manage the print settings. He then selects the print icon to proceed to print the report. In the "DAILY REPORT" option the user selects the "CREATE" button to print the report giving data by category and totals. OPTIONS How to manage general parameters and connect the eft-pos? Through the "General Parameters" field, the declaration is made with the bank's application for card payments. In the menu you have to select the bank, select the check "1155", if it is the specific type of
---
bank's application for card payments. In the menu you have to select the bank, select the check "1155", if it is the specific type of connection, and fill in the TID that you will find in the bank's application. When it is a Worldline application, the user goes to the option "Bank POS Management" 

In [None]:
# Example usage
question = "what is the topic of this manual?"
retrieved = retrieve_answer(question)
answer = generate_answer(question, retrieved)

print("----- Generated Answer -----")
print(answer)

----- Prompt Sent to Model -----
Give a concise, professional, complete and clear answer this question using the following data:
interface. Through the document selection field, the document that will be the default when starting the application for invoice issuance is selected. In the field "Code In the field "Prefix. Message" field, the user fills in the code for sending a message during the issuance of a B2G document. When the user completes the selections then presses the green check to save. How to search with VAT number? In this option, the user fills in the name and password that the accountant will give him/her to find the customer details with the VAT number. When the user completes the options
---
the name and password that the accountant will give him/her to find the customer details with the VAT number. When the user completes the options then he/she clicks on the green check to save. How to connect to a Provider? In this option, the user declares the data that is linked (A

In [None]:
# Example usage
question = "How to enter a comment?"
retrieved = retrieve_answer(question)
answer = generate_answer(question, retrieved)

print("----- Generated Answer -----")
print(answer)

----- Prompt Sent to Model -----
Give a concise, professional, complete and clear answer this question using the following data:
the desired quantity. Then select the X ,enter the price of the product and select the corresponding section. To import an item, you select "ITEMS" to display the items stored in the warehouse (See STORAGE section). After selecting the item, you press check. To enter a discount after the products have been entered, you select "DISCOUNT" and enter the discount value. To enter a comment, you select "2nd level" and then comment. You enter the comment and press check. The "AC" button is used to delete all the products that have been entered and the "C" button is used to
---
to be created, then click next to proceed to the next page. How to change contract details? You will then go to the contract details tab. If there is no registered contract then the mandatory fields (those with an asterisk) will need to be entered and the "Budget" option will be selected. Clic

In [None]:
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
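
For intuition about the metric, the sketch below computes a simplified ROUGE-L F-measure from the longest common subsequence (LCS) of two token lists. It is not the `rouge_score` implementation: it only lowercases and splits on whitespace, with no stemming or punctuation handling, and the example sentences are invented.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Harmonic mean of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the candidate that matches
    recall = lcs / len(ref)      # share of the reference that is covered
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "select the bank and fill in the TID",
    "select the bank then fill the TID",
)
print(round(score, 4))  # 0.8
```

Here the LCS is six tokens long, precision is 6/7 and recall is 6/8, giving an F-measure of 0.8. The metric rewards matching word order but says nothing about whether a differently worded answer is still correct.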

In [None]:
evaluation_set = [
    {
        "question": "How to connect EFT-POS?",
        "expected_answer": """Through the General Parameters field, the declaration is made with the bank's application for card payments. In the menu you have to select the bank, select the check 1155, if it is the specific type of connection, and fill in the TID that you will find in the bank's application. When it is a Worldline application, the user goes to the option "Bank POS Management" then to "Connect to Worldline" and fills in the details. The other fields in this menu are not related to the interface."""
    }
]


In [None]:
for item in evaluation_set:
    retrieved = retrieve_answer(item["question"])
    generated = generate_answer(item["question"], retrieved)
    scores = scorer.score(item["expected_answer"], generated)
    print(f"Question: {item['question']}")
    print(f"ROUGE-L Score: {scores['rougeL'].fmeasure:.4f}")
    print()



----- Prompt Sent to Model -----
Give a concise, professional, complete and clear answer this question using the following data:
within the network or save it to PDF. By clicking on the copies option the user can manage the print settings. He then selects the print icon to proceed to print the report. In the "DAILY REPORT" option the user selects the "CREATE" button to print the report giving data by category and totals. OPTIONS How to manage general parameters and connect the eft-pos? Through the "General Parameters" field, the declaration is made with the bank's application for card payments. In the menu you have to select the bank, select the check "1155", if it is the specific type of
---
bank's application for card payments. In the menu you have to select the bank, select the check "1155", if it is the specific type of connection, and fill in the TID that you will find in the bank's application. When it is a Worldline application, the user goes to the option "Bank POS Management" 

# Conclusion

This mini RAG system answered five questions about the custom POS manual, showing how language models can support practical, domain-specific tasks. On the one question scored against a reference answer, it reached a ROUGE-L F-measure of **0.7376**, indicating substantial overlap with the expected text. It's a lightweight demo of how to build assistants for specialized content from free, open-source components.
