## RAG Application for Quick Reference of Appliance User Manuals

The most reliable source of maintenance tips and troubleshooting information for any appliance is the user manual that accompanies it. As appliances become larger and more complex, their manuals grow longer and harder to navigate. This application aims to help homemakers find troubleshooting and maintenance tips quickly, without having to search through many pages of a PDF document.


### Importing required libraries

This tool uses open-source tools such as LangChain and Ollama, together with Llama models, to build the RAG pipeline.

In [108]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama.embeddings import OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_ollama import ChatOllama

### Extracting text from PDF manual

In [109]:
from pypdf import PdfReader
import re

def extract_utf8_text(pdf_path):
    reader = PdfReader(pdf_path)
    full_text = ""

    for page in reader.pages:
        raw_text = page.extract_text()
        if raw_text:
            # Round-trip through UTF-8, dropping any code points that cannot be encoded
            clean_text = raw_text.encode("utf-8", errors="ignore").decode("utf-8")
            full_text += clean_text + "\n"

    return full_text

# Example usage
text = extract_utf8_text("washing_machine_user_guide.pdf")    


def strip_non_text(text):
    # Keep only printable ASCII (0x20-0x7E) and newlines; everything else is removed
    return re.sub(r"[^\x20-\x7E\n]", "", text)

safe_text = strip_non_text(text) 
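
To see what the ASCII filter above keeps and drops, here is a small check on a made-up string (the sample text is hypothetical, not taken from the manual). It suggests that tabs and symbols such as the degree sign are removed along with bullet characters, which may matter for temperature values in a manual.

```python
import re

def strip_non_text(text):
    # Same pattern as above: keep printable ASCII (0x20-0x7E) and newlines only
    return re.sub(r"[^\x20-\x7E\n]", "", text)

# Hypothetical sample: a degree sign, a bullet character and a tab
sample = "Wash at 40\u00b0C\n\u2022 Rinse\ttwice"
cleaned = strip_non_text(sample)
print(repr(cleaned))
```

If such symbols are important, one option would be to widen the character class rather than restrict the text to ASCII.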



### Collecting the extracted text in file

In [139]:
# The LangChain loader used here reads from a file, so write the cleaned text to disk first
with open("user_guide.txt", "w", encoding="utf-8") as file:
    file.write(safe_text)

# Load the text file back as LangChain documents
documents = TextLoader("user_guide.txt", encoding="utf-8").load()

### Creating chunks of text
The full text is split into smaller, slightly overlapping chunks so that each one can be embedded and stored in the vector database.

In [114]:
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=20)
chunks = splitter.split_documents(documents)
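
As a rough illustration of what `chunk_size` and `chunk_overlap` mean, the sketch below splits a string into fixed-size windows that share a few characters with their neighbours. This is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break at separators such as paragraph breaks and newlines); `sliding_chunks` is a hypothetical helper written only for this example.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

Each piece begins with the last three characters of the previous one, so a sentence cut at a chunk boundary still appears partly in both chunks.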

### Creating embeddings and storing in Chroma database

The text chunks are converted into embeddings using the `nomic-embed-text` model served by a local Ollama instance, then stored in a Chroma vector store.

In [72]:
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="nomic-embed-text")

In [115]:
db = Chroma.from_documents(chunks, embedding=oembed)

### Checking the database

In [116]:
# Sanity check: fetch the chunks most similar to a phrase expected in the manual
query = "appliance must be properly grounded"
dc = db.similarity_search(query)
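
Conceptually, a similarity search ranks stored vectors by how close they are to the query vector and returns the top few. The sketch below shows the idea with made-up 3-dimensional vectors and cosine similarity. Both the vectors and the choice of cosine similarity are assumptions for illustration; the metric Chroma actually applies depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up "embeddings" keyed by a label for the chunk they represent
toy_store = {
    "grounding instructions": [0.9, 0.1, 0.0],
    "detergent dosage": [0.1, 0.9, 0.2],
    "error codes": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the query text

# Rank chunk labels from most to least similar and keep the best two
ranked = sorted(
    toy_store,
    key=lambda name: cosine_similarity(query_vec, toy_store[name]),
    reverse=True,
)
top_k = ranked[:2]
print(top_k)
```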

### Creating chat template with Langchain

In [120]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [121]:
template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
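
To make the template concrete, the sketch below fills the same text with plain `str.format`. This is only a stand-in for what happens at run time: `ChatPromptTemplate` also wraps the result in a chat message, and the context and question values here are hypothetical.

```python
template = """Answer the question based only on the following context:

{context}

Question: {question}
"""

# Hypothetical values, for illustration only
filled = template.format(
    context="<retrieved manual excerpts go here>",
    question="How do I clean the filter?",
)
print(filled)
```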

### Using Llama 3.1 model

The `llama3.1:8b` model generates the answer from the context retrieved from the Chroma database. Setting `temperature=0` keeps the output as deterministic as possible.

In [128]:
model = ChatOllama(
    model="llama3.1:8b",
    temperature=0
)

### Creating Retriever
The retriever takes the question and returns the chunks of text whose embeddings are most similar to it.

In [123]:
retriever = db.as_retriever()

### Function to combine the similarity search results into one chunk of text

In [124]:
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])
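
A quick way to check the joining behaviour without a vector store is to pass in stand-in objects. `FakeDoc` below is a hypothetical placeholder that mimics only the `page_content` attribute of a LangChain document.

```python
from dataclasses import dataclass

@dataclass
class FakeDoc:
    # Stand-in for a LangChain Document; only page_content is needed here
    page_content: str

def format_docs(docs):
    # Join the chunk texts with a blank line between them
    return "\n\n".join(d.page_content for d in docs)

docs = [FakeDoc("First excerpt."), FakeDoc("Second excerpt.")]
combined = format_docs(docs)
print(combined)
```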

### Creating the pipeline
The `question` is taken as input and passed to the `retriever`, which returns the relevant chunks; `format_docs` then combines them into a single context string. The context and the question fill the `prompt` template, which is passed to the llama3.1 `model`. Finally, `StrOutputParser` converts the model output into a plain string.

In [145]:
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
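
The data flow through the chain can be traced with plain functions. In the sketch below every stage is a hypothetical stub (`fake_retriever`, `build_prompt`, `fake_model`, `run_chain`), so it shows only the order in which values are passed along, not the behaviour of LangChain's `|` composition or of the real model.

```python
def fake_retriever(question):
    # Stand-in for the vector-store retriever: returns canned excerpts
    return ["Excerpt A.", "Excerpt B."]

def format_docs(docs):
    # Here the "documents" are plain strings, so join them directly
    return "\n\n".join(docs)

def build_prompt(inputs):
    # Stand-in for the prompt template: fill in context and question
    return (
        "Answer the question based only on the following context:\n\n"
        f"{inputs['context']}\n\nQuestion: {inputs['question']}\n"
    )

def fake_model(prompt_text):
    # Stand-in for the LLM call: report the prompt size instead of generating text
    return f"(model received {len(prompt_text)} characters)"

def run_chain(question):
    # The question goes to the retriever and is also passed through unchanged
    inputs = {
        "context": format_docs(fake_retriever(question)),
        "question": question,
    }
    return fake_model(build_prompt(inputs))

answer = run_chain("How do I clean the filter?")
print(answer)
```

The dictionary in `run_chain` plays the role of the first step of the chain: one branch retrieves and formats context, the other forwards the question as-is.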

### Testing the Application

In [146]:
print("Hi, how can I help you?")
q = input()
chain.invoke(q)

Hi, how can I help you?


 What is 3C


'Check the motor for operation and try restarting the cycle.'