# Building a RAG System for Your Project

**Goal:** Create a virtual consultant that answers customer questions, advises on jewelry care, describes materials, and helps select jewelry based on customer preferences.

An intelligent assistant for a jewelry store should do more than give general answers: it should also advise on specifics such as materials, jewelry care, pricing, and selection recommendations. **Retrieval-Augmented Generation (RAG)** fits this task. RAG combines information retrieval with text generation: relevant passages are first retrieved from external data sources (e.g., databases) and then passed to the model, so answers draw on both the model's own knowledge and the retrieved data.
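
The retrieve-then-generate flow can be illustrated without any model at all. The sketch below is a toy, not the pipeline built later in this guide: the passages are invented, word counts stand in for real embeddings, and the function names (`retrieve`, `build_prompt`) are hypothetical. It only shows how a retrieved passage ends up in the prompt that a generator would receive.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; a real system indexes many more passages.
PASSAGES = [
    "Store silver jewelry in a dry place away from direct sunlight.",
    "Remove gold rings before using household chemicals.",
    "Pearls should be wiped with a soft cloth after each wear.",
]

def bag_of_words(text):
    """Lowercase the text and count its word tokens (a stand-in for an embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(passages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Combine the retrieved passages and the question into one prompt for the generator."""
    context = "\n".join(retrieve(question, passages))
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How should I store silver jewelry?", PASSAGES)
print(prompt)
```

In a full system, the word-count similarity would be replaced by an embedding model and a vector store, and the prompt would be sent to the LLM.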

In this brief guide, we will build a simple RAG system around **IlyaGusev/saiga_llama3_8b**, a fine-tune of Meta AI's **Llama 3** for the Russian language, available on the [Hugging Face Hub](https://huggingface.co/IlyaGusev/saiga_llama3_8b). To work with unstructured data, we will use the **Unstructured Serverless API**. We will use **LangChain** to build the chain. We will also look at two vector stores:
- **FAISS**
- **ChromaDB**

In [1]:
from google.colab import drive
drive.mount('/content/drive')

import os
import json

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS, Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter
from langchain.document_loaders import UnstructuredAPIFileLoader
from langchain.schema import Document
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.globals import set_llm_cache
from langchain.cache import InMemoryCache, GPTCache

# Set the API key for Unstructured.
# The key is required for parsing the different file formats (.docx, .pdf) through the API.
os.environ["UNSTRUCTURED_API_KEY"] = ""

In [2]:
directory_path = "/content/drive/MyDrive/data"
os.makedirs(directory_path, exist_ok=True)

## FAISS

### Creating a FAISS vector store

**Description:** FAISS is a library for efficient similarity search and clustering of dense vectors. It is optimized for fast nearest-neighbor search (NNS) over large vector collections.
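
To make "nearest-neighbor search" concrete, here is a minimal brute-force sketch in plain Python. It is not FAISS code: the vectors are invented 2-D values and the function names are hypothetical. It shows the exhaustive distance computation that a vector index is built to speed up or approximate.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, vectors, k=2):
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Toy 2-D "embeddings"; real sentence embeddings have hundreds of dimensions.
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
hits = nearest_neighbors([1.0, 1.0], vectors, k=2)
print(hits)
```

The cost of this loop grows linearly with the number of stored vectors, which is why dedicated index structures matter once a collection becomes large.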

In [3]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [4]:
path = "/content/drive/MyDrive/data/"

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

def load_and_split_documents(path, file_name, text_splitter):
    file_path = os.path.join(path, file_name)
    try:
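        # Pass the API key explicitly rather than relying on the environment variable being picked up
        loader = UnstructuredAPIFileLoader(file_path, api_key=os.environ.get("UNSTRUCTURED_API_KEY", ""))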
        data = loader.load()
        docs = text_splitter.split_documents(data)
        return docs
    except Exception as e:
        print(f"Error loading or processing {file_name}: {e}")
        return []


def load_and_decode(file_path):
    # Read a text file, trying UTF-8 first and then a legacy fallback encoding.
    # cp1251 is an assumed fallback (common for older Russian-language text files).
    for encoding in ("utf-8", "cp1251"):
        try:
            with open(file_path, 'r', encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
        except Exception as e:
            print(f"Error reading file {file_path}: {e}")
            return None
    print(f"Error decoding file {file_path}: unsupported encoding")
    return None

# Build Document objects from the product catalog (used for the second database)
def create_product_documents(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        products = json.load(f)
    
    product_docs = []
    for product in products:
        product_string = (
            f"Name: {product['name']}. "
            f"Description: {product['description']}. "
            f"Purpose: {product['purpose']}. "
            f"Material: {product['material']}. "
            f"Price: {product['price']} rubles. "
            f"Link: {product['url']}"
        )
        # Converting elements into Document objects
        doc = Document(
            page_content=product_string,
            metadata={
                "name": product['name'],
                "description": product['description'],
                "price": product['price'],
                "url": product['url']
            }
        )
        product_docs.append(doc)
    return product_docs
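
The `chunk_size` and `chunk_overlap` parameters passed to the splitter above control how long each chunk is and how many characters neighboring chunks share. The sketch below illustrates the idea with a simplified fixed-window splitter. It is not how `CharacterTextSplitter` works internally (that class splits on a separator first and then merges pieces), and `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of chunk_size characters; neighbors share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks

sample = "abcdefghij"
print(split_with_overlap(sample, chunk_size=4))                   # no shared characters
print(split_with_overlap(sample, chunk_size=4, chunk_overlap=2))  # neighbors share 2 characters
```

With an overlap of 0, as used in this guide, a sentence that straddles a chunk boundary is cut in two; a small overlap keeps some shared context at the cost of more chunks to embed.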

### Parsing Input Documents

In [5]:
from langchain.schema import Document

# Load the .docx and .pdf files through the Unstructured API and split them into chunks
api_file_names = [
    "уход.docx",
    "носка.docx",
    "на_что_обратить_внимание_при_выборе_ювелирных_изделий.pdf"
]

all_docs = []
for file_name in api_file_names:
    all_docs.extend(load_and_split_documents(path, file_name, text_splitter))

# Load and decode the text file separately
text_content = load_and_decode(os.path.join(path, "уход.txt"))
if text_content:
    text_docs = text_splitter.split_text(text_content)
    for t in text_docs:
        all_docs.append(Document(page_content=t, metadata={}))

In [6]:
print(f"Number of documents loaded: {len(all_docs)}")

Number of documents loaded: 26


In [7]:
db_faiss = FAISS.from_documents(all_docs, embeddings)

### Creating a second database with product information

In [8]:
# Example of catalog data
catalog_data = [
    {
        "name": "Diamond Ring",
        "description": "Elegant ring with a high-quality 0.5-carat diamond.",
        "purpose": "Perfect for special occasions, such as engagements or weddings.",
        "material": "White gold 585",
        "price": 120000,
        "url": "https://example.com/ring1"
    },
    {
        "name": "Silver Bracelet",
        "description": "Classic silver bracelet, suitable for daily wear.",
        "purpose": "Daily wear, adding a stylish accent.",
        "material": "Silver 925",
        "price": 5000,
        "url": "https://example.com/bracelet1"
    },
    {
        "name": "Gold Earrings with Sapphires",
        "description": "Refined gold earrings with natural sapphires.",
        "purpose": "Gift for a loved one, evening events.",
        "material": "Yellow gold 750, sapphires",
        "price": 45000,
        "url": "https://example.com/earrings1"
    }
]

# Saving data to a JSON file
with open(os.path.join(path, "catalog.json"), "w", encoding="utf-8") as f:
    json.dump(catalog_data, f, ensure_ascii=False, indent=4)

# Example of loading the catalog
product_docs = create_product_documents(os.path.join(path, "catalog.json"))

db_faiss_products = FAISS.from_documents(product_docs, embeddings)

### Saving and loading DB

In [9]:
# Specify the path to save the database
db_faiss.save_local(os.path.join(path, "faiss_index_care"))
db_faiss_products.save_local(os.path.join(path, "faiss_index_products"))
print("FAISS database saved successfully.")

FAISS database saved successfully.


In [10]:
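# Note: newer LangChain releases additionally require allow_dangerous_deserialization=True in load_local
loaded_db_care = FAISS.load_local(os.path.join(path, "faiss_index_care"), embeddings)
loaded_db_products = FAISS.load_local(os.path.join(path, "faiss_index_products"), embeddings)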
print("FAISS database loaded successfully.")

FAISS database loaded successfully.


### Loading the Model

In [11]:
model_id = "IlyaGusev/saiga_llama3_8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             torch_dtype=torch.bfloat16,
                                             device_map="auto")

Loading checkpoint shards: 100%


In [12]:
class PostGenerator:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    # Function for text generation
    def generate_text(self, system_prompt, user_prompt):
        # Build the chat messages and render them with the model's chat template
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
        prompt = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

        # Text generation
        inputs = self.tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(self.model.device)
        outputs = self.model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=True,
            temperature=0.7,
            top_p=0.95,
            eos_token_id=self.tokenizer.eos_token_id
        )
        # Keep only the newly generated tokens: generate() returns the prompt tokens followed by the completion
        prompt_length = inputs["input_ids"].shape[1]
        generated_tokens = outputs[0][prompt_length:]

        # Decode the generated tokens into text
        response = self.tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()
        return response


# Set the model to evaluation mode
model.eval()
generator = PostGenerator(model, tokenizer)

### Let's look at generation without RAG

In [13]:
system_prompt = "You are a smart assistant that gives advice on jewelry care. Your answer should be friendly, informative, and concise."
user_prompt = "Hello! Tell me about how to care for silver jewelry"

print(user_prompt)

# Generate a response directly from the model, without retrieval
output = generator.generate_text(system_prompt, user_prompt)
print(output)

Hello! Tell me about how to care for silver jewelry
Hello! Silver is a beautiful and noble metal, but it can tarnish over time. To keep your silver jewelry looking like new, follow these simple care rules:

1. **Storage:** Store silver separately from other jewelry, in a dry, dark place, preferably in special pouches or jewelry boxes. Avoid direct sunlight.
2. **Contact with chemicals:** Remove silver jewelry before contact with household chemicals, cosmetics, perfume, or chlorinated water (swimming pool).
3. **Cleaning:** Clean silver regularly with a soft cloth. For deeper cleaning, use special silver cleaners or home remedies, such as a weak solution of baking soda or ammonia. It is important not to use abrasive products, which can scratch the surface.
4. **Wearing:** Wear silver more often! Contact with the skin's natural oils can prevent tarnishing. However, if you play sports or perfo

## ChromaDB

**ChromaDB** is an open-source vector database commonly used in RAG (Retrieval-Augmented Generation) systems. It stores embeddings together with the source documents and their metadata, so vector search and metadata management live in one place.
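
Keeping metadata next to the vectors makes it possible to filter candidates before ranking them by similarity. The sketch below illustrates that idea in plain Python. It does not use the ChromaDB API: the records, the 2-D embeddings, and the `search` function are all invented for illustration.

```python
import math

# Hypothetical records: each keeps an embedding, the source text, and metadata together.
RECORDS = [
    {"embedding": [0.9, 0.1], "text": "Diamond Ring", "metadata": {"price": 120000}},
    {"embedding": [0.2, 0.8], "text": "Silver Bracelet", "metadata": {"price": 5000}},
    {"embedding": [0.8, 0.3], "text": "Gold Earrings with Sapphires", "metadata": {"price": 45000}},
]

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query_embedding, records, max_price, k=1):
    """Filter by metadata first, then rank the remaining records by cosine similarity."""
    candidates = [r for r in records if r["metadata"]["price"] <= max_price]
    candidates.sort(key=lambda r: cosine(query_embedding, r["embedding"]), reverse=True)
    return [r["text"] for r in candidates[:k]]

print(search([1.0, 0.0], RECORDS, max_price=50000))
```

Without the price filter the closest record would be the diamond ring; with a budget of 50000 it is excluded before ranking, so the earrings are returned instead.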

### Creating a ChromaDB Vector Store

Let's create two collections. The first stores jewelry care instructions. The second holds the product catalog.

In [14]:
db_chroma_care = Chroma.from_documents(documents=all_docs, 
                                      embedding=embeddings, 
                                      collection_name="jewelry_care_collection", 
                                      persist_directory=os.path.join(path, "chroma_db_care"))

db_chroma_products = Chroma.from_documents(documents=product_docs, 
                                          embedding=embeddings, 
                                          collection_name="jewelry_catalog_collection",
                                          persist_directory=os.path.join(path, "chroma_db_products"))
print("Documents successfully added to the collection.")

Documents successfully added to the collection.


### Database Reuse

In [15]:
# Reopen the persisted database by pointing at the same directory and collection name
client_path = os.path.join(path, "chroma_db_products")
collection_name = "jewelry_catalog_collection"

try:
    # Load the collection from disk
    loaded_db_chroma_products = Chroma(persist_directory=client_path,
                                       embedding_function=embeddings,
                                       collection_name=collection_name)

    # Check that the collection actually contains documents
    if loaded_db_chroma_products._collection.count() > 0:
        print(f"Collection '{collection_name}' successfully loaded.")
    else:
        print(f"Collection '{collection_name}' is empty or was not found.")

except Exception as e:
    print(f"Error loading ChromaDB: {e}")

Collection 'jewelry_catalog_collection' successfully loaded.


### Reading Data from DB

In [16]:
# Assuming the collection is successfully loaded
results = loaded_db_chroma_products._collection.get(include=['documents', 'metadatas'])

# This is a list of all texts/documents in the collection
all_texts = results['documents'] 
print("All documents from the collection:", all_texts)

All documents from the collection: ['Name: Diamond Ring. Description: Elegant ring with a high-quality 0.5-carat diamond. Purpose: Perfect for special occasions, such as engagements or weddings. Material: White gold 585. Price: 120000 rubles. Link: https://example.com/ring1', 'Name: Silver Bracelet. Description: Classic silver bracelet, suitable for daily wear. Purpose: Daily wear, adding a stylish accent. Material: Silver 925. Price: 5000 rubles. Link: https://example.com/bracelet1', 'Name: Gold Earrings with Sapphires. Description: Refined gold earrings with natural sapphires. Purpose: Gift for a loved one, evening events. Material: Yellow gold 750, sapphires. Price: 45000 rubles. Link: https://example.com/earrings1']


## Creating LLM and RAG Chain

In [17]:
# Creating the LLM via HuggingFacePipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    eos_token_id=tokenizer.eos_token_id
)

llm = HuggingFacePipeline(pipeline=pipe)

# Function to search both stores for text relevant to the query
def extract(user_query):
    care_hits = loaded_db_care.similarity_search(user_query, k=3)
    product_hits = loaded_db_products.similarity_search(user_query, k=3)

    care_context = "\n---\n".join([doc.page_content for doc in care_hits])

    # page_content already holds the full product card (name, description, purpose, material, price, link)
    product_context_str = "\n---\n".join([doc.page_content for doc in product_hits])

    return product_context_str, care_context

# Template for query formation
template = (
    "You are a smart assistant specializing in jewelry. Your main tasks are: "
    "1. Answer questions about jewelry, their characteristics, and prices based on the following context {context} "
    "2. Help clients choose suitable products based on the following context {products} "
    "Your goal is to provide helpful, clear, and friendly answers. If you don't know the answer, just say: «I don't know». Do not invent information. When suggesting products, try to be specific and describe how the product can help. If more information is needed for an answer, ask clarifying questions. "
    "Question: {question}"
)

prompt = PromptTemplate(template=template, input_variables=["context", "products", "question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)

# Initializing the semantic cache.
# LangChain's GPTCache wrapper takes an init function rather than threshold or size arguments;
# similarity and eviction settings are configured on the gptcache side.
# Requires the gptcache package.
import hashlib
from gptcache.adapter.api import init_similar_cache

def init_gptcache(cache_obj, llm_string):
    # Create one similarity-based cache per LLM configuration.
    # The directory naming scheme is an assumption; any unique, stable name works.
    hashed_llm = hashlib.sha256(llm_string.encode()).hexdigest()
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")

set_llm_cache(GPTCache(init_gptcache))
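
The idea behind a semantic cache is to reuse a stored answer when a new question is close enough to one already answered, with a similarity threshold, a size limit, and an eviction policy such as FIFO. The sketch below is a toy illustration of that idea only. It is not GPTCache: `ToySemanticCache` is a hypothetical class, and word counts stand in for a real embedding model.

```python
import math
import re
from collections import Counter, OrderedDict

def embed(text):
    """Toy embedding: word counts. A real cache would use a sentence-embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToySemanticCache:
    """Hypothetical cache that returns a stored answer for sufficiently similar questions."""

    def __init__(self, similarity_threshold=0.8, max_size=1000):
        self.similarity_threshold = similarity_threshold
        self.max_size = max_size
        self.entries = OrderedDict()  # question -> answer, oldest first

    def get(self, question):
        """Return the answer of the most similar cached question, or None below the threshold."""
        query = embed(question)
        best_answer, best_score = None, 0.0
        for cached_question, answer in self.entries.items():
            score = cosine(query, embed(cached_question))
            if score > best_score:
                best_answer, best_score = answer, score
        return best_answer if best_score >= self.similarity_threshold else None

    def put(self, question, answer):
        """Store an answer, evicting the oldest entry first when the cache is full (FIFO)."""
        if question not in self.entries and len(self.entries) >= self.max_size:
            self.entries.popitem(last=False)
        self.entries[question] = answer

cache = ToySemanticCache(similarity_threshold=0.8, max_size=2)
cache.put("How to choose a diamond ring?", "Check the 4 Cs.")
print(cache.get("How to choose a diamond ring"))  # near-identical wording: cache hit
print(cache.get("How do I clean silver?"))        # unrelated question: cache miss
```

A higher threshold returns cached answers only for near-duplicates, while a lower one saves more LLM calls but risks answering a different question than the one asked.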

In [18]:
user_query = "How to choose a diamond ring?"
print(user_query)

products, context = extract(user_query)

# Creating a dictionary to pass to the chain
inputs = {
    "context": context, # Passing the context
    "products": products, # Passing the list of products
    "question": user_query # Passing the query itself
}

# Running the chain
response = llm_chain.run(inputs)

# Outputting the response
print(response)

How to choose a diamond ring?
When choosing a diamond ring, it is important to pay attention to several key points, which I will gladly walk you through:

1.  **The "4 Cs":** This is the international standard for evaluating diamonds:
    * **Carat:** The weight of the stone. The more carats, the higher the price.
    * **Cut:** The quality of the cut affects how the stone refracts light and "sparkles". An ideal cut is the key to brilliance.
    * **Clarity:** The presence of internal and external flaws (inclusions). The cleaner the stone, the more expensive it is.
    * **Color:** Diamonds are graded on a scale from D (completely colorless, the most expensive) to Z (has a noticeable yellowish tint).

2.  **Setting metal:** Choose a metal that suits you in color and durability. White gold, yellow gold, and platinum are the classics. White gold and platinum best highlight a diamond's colorlessness.

3.  **Style:** Choose a style that the future owner will like: **solitaire** (one large stone), **pavé