## 🧠💡 Retrieval-Augmented Generation (RAG): adapting an LLM to a specific use case (mining documents from the Government of Ontario, Canada)

### 1 (Base) Pre-set up 🛠️
#### 1.1 Set up environment 🌿

In [None]:
#conda create --name RAG_LLAMA python=3.11.9    # Python version must be lower than 3.12
#conda remove -n ENV_NAME --all
#conda activate RAG_LLAMA
#conda install -n RAG_LLAMA ipykernel --update-deps --force-reinstall


#### 1.2 Install libraries 📚

In [None]:
!pip install bitsandbytes
!pip install accelerate
!pip install sentence-transformers
!pip install Huggingface_hub
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install git+https://github.com/huggingface/transformers.git
!pip install langchain
!pip install langchain-community
!pip install chromadb
!pip install ipywidgets
!pip install PyPDF2 PyCryptodome

#### 1.3 Check GPU and Torch 🖥️🔥

In [None]:
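import torch
print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA GPU is visible to PyTorch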

### 3. Load LLM 🤖📥

We use the open-source `Meta-Llama-3-8B` model, running locally. To use a larger hosted model such as `GPT-4` instead, adapt the code to call its API, and keep your API key out of the notebook and out of version control.

#### 3.1 Login to Hugging Face Hub 🤗🔑

In [None]:
from huggingface_hub import notebook_login

notebook_login()

#### 3.2 Import `llama-3-8B` 🦙📥

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

LLM_model_id = "meta-llama/Meta-Llama-3-8B"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

LLM_model = AutoModelForCausalLM.from_pretrained(
    LLM_model_id,  # Llama 3 8B base model from the Hugging Face Hub
    quantization_config=bnb_config,  # 4-bit NF4 quantization defined above
    device_map="auto",  # place layers on the available GPU(s) automatically
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(LLM_model_id, add_bos_token=True)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
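
As a rough sanity check on why 4-bit loading matters, the sketch below estimates the memory taken by the weights alone. The round figure of 8 billion parameters and the helper name `weight_memory_gb` are assumptions for illustration; the estimate ignores quantization constants, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 8e9  # assumed round figure for an 8B-parameter model

print(f"bf16 : {weight_memory_gb(n_params, 16):.1f} GB")
print(f"4-bit: {weight_memory_gb(n_params, 4):.1f} GB")
```

Under these assumptions the quantized weights need about a quarter of the bf16 footprint, which is what lets an 8B model fit on a single consumer GPU.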


#### 3.3 Test the model & output 🧪🖨️

In [9]:
query = "What's the date that Ontario has made changes to the Mining Act and created a new regulation, Ontario Regulation 35/24 which replaced Ontario Regulation 240/00?"
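# Move the tokenized query to the same device as the model
inputs = tokenizer(query, return_tensors="pt").to(LLM_model.device)

# do_sample=False (greedy decoding) is a generate() argument, not a decode() argument
outputs = LLM_model.generate(**inputs, max_length=1024, do_sample=False)
response_text = tokenizer.decode(outputs[0], skip_special_tokens=True)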

print(response_text)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


What's the date that Ontario has made changes to the Mining Act and created a new regulation, Ontario Regulation 35/24 which replaced Ontario Regulation 240/00? Is it the date of publication in the Ontario Gazette or the date of coming into force? If it is the date of publication, then how can I find out the date of publication? Thank you.
The date of publication is the date that the regulation was published in the Ontario Gazette. The date of coming into force is the date that the regulation came into force. For the Mining Act, the date of publication is the same as the date of coming into force. The date of publication is usually found on the first page of the regulation. The date of coming into force is usually found on the last page of the regulation.


### 4 Prepare the file & embeddings for Retrieval 📄🔍

#### 4.1 Initialize embedding function 🧮🔢
Set up the embedding function, which turns text into numerical vectors so that chunks can be retrieved by similarity. Here we use the open-source `all-MiniLM-L6-v2` model.

In [None]:
import chromadb
from chromadb.utils import embedding_functions
from sentence_transformers import SentenceTransformer

embed_model = SentenceTransformer('all-MiniLM-L6-v2')
embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(model_name='all-MiniLM-L6-v2')
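
To make "retrieval by similarity" concrete, here is a minimal sketch of cosine similarity on toy vectors. The three-dimensional vectors and their names are made up for illustration; the real model produces 384-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (invented values, not model output)
mining_act = [0.9, 0.1, 0.0]
mining_regulation = [0.8, 0.2, 0.1]
cooking_recipe = [0.0, 0.2, 0.9]

print(cosine_similarity(mining_act, mining_regulation))  # close to 1: related texts
print(cosine_similarity(mining_act, cooking_recipe))     # close to 0: unrelated texts
```

Texts about similar topics end up with vectors pointing in similar directions, which is the property the retrieval step relies on.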

#### 4.2 Create Chromadb collection 🗃️💾

In [None]:
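client = chromadb.PersistentClient()
# get_or_create_collection avoids an error when the notebook is re-run;
# passing embedding_function makes query texts use the same model as the stored chunks
collection = client.get_or_create_collection("history_collection", embedding_function=embedding_function)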

#### 4.3 Read files 📂📖

In [None]:
import os
import PyPDF2

def read_pdf(file_path):
    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ''
        for page in reader.pages:
            text += page.extract_text()
    return text

def read_txt(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

def read_documents(directory):
    documents = []
    for filename in os.listdir(directory):
        file_path = os.path.join(directory, filename)
        if filename.endswith('.pdf'):
            documents.append(read_pdf(file_path))
        elif filename.endswith('.txt'):
            documents.append(read_txt(file_path))
    return documents


# Read documents from a directory
directory = "documents"
documents = read_documents(directory)

#### 4.3 Chunk text ✂️📄

We use LangChain's `RecursiveCharacterTextSplitter` to split each document into chunks of at most 256 characters, with up to 24 characters of overlap between neighboring chunks.

The splitter tries paragraph breaks first, then line breaks, then spaces, and only then individual characters, so chunks tend to end at natural boundaries. The overlap carries a little context across chunk edges.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_documents(documents, chunk_size=256, chunk_overlap=24):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len
    )
    chunks = []
    for doc in documents:
        chunks.extend(text_splitter.split_text(doc))
    return chunks


# Chunk the documents
chunks = chunk_documents(documents)
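
To see what `chunk_size` and `chunk_overlap` mean, the sketch below uses a simplified fixed-window splitter. It is not the recursive algorithm LangChain implements (it ignores separators entirely), and the function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=256, chunk_overlap=24):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"
for piece in sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3):
    print(piece)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last three characters of one piece reappear at the start of the next.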

#### 4.4 Embed the chunks and add them to the collection

In [None]:
# Embed each chunk and store it with a unique id
for i, chunk in enumerate(chunks):
    embedding = embed_model.encode(chunk).tolist()
    collection.add(
        documents=[chunk],
        embeddings=[embedding],
        ids=[f"chunk_{i}"]
    )
print(f"Added {len(chunks)} chunks")
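
Adding one chunk per call works but can be slow for large document sets. A possible alternative is to add chunks in batches; the sketch below shows only the batching logic, with the helper name `batched`, the stand-in data, and the batch size of 3 all chosen for illustration.

```python
def batched(items, batch_size):
    """Yield (start_index, batch) pairs covering items in order."""
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]

# Stand-in data; in the notebook this would be the `chunks` list
demo_chunks = [f"text {n}" for n in range(7)]

for start, batch in batched(demo_chunks, batch_size=3):
    ids = [f"chunk_{start + offset}" for offset in range(len(batch))]
    # In the notebook, a single call per batch would go here, e.g.:
    # collection.add(documents=batch, embeddings=embed_model.encode(batch).tolist(), ids=ids)
    print(ids)
```

Carrying the start index along keeps the ids identical to the one-at-a-time version.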

### 5 Retrieve-Augmented-Generate 🔍🧠💡

#### 5.1 Retrieve  🔎

To find the chunks most relevant to a question, the question is embedded and compared against the stored chunk embeddings.

The collection's `query` method does this for us: it returns the `n_results` chunks closest to the question (here the top 5), ranked by embedding distance.

In [None]:
def retrieve(query, top_k=5):
    results = collection.query(
        query_texts=[query],
        n_results=top_k
    )
    return results['documents'][0]
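
Conceptually, the ranking amounts to a nearest-neighbor search. The brute-force sketch below illustrates the idea on made-up two-dimensional vectors; squared L2 distance is an assumption here, and ChromaDB itself uses an approximate index rather than scanning every vector.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query_vec, stored, k=5):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = [(squared_l2(query_vec, vec), text) for text, vec in stored]
    return sorted(scored)[:k]

# Made-up 2-dimensional vectors standing in for real chunk embeddings
stored = [
    ("chunk about claims", [1.0, 0.0]),
    ("chunk about permits", [0.8, 0.3]),
    ("chunk about recipes", [0.0, 1.0]),
]
for distance, text in top_k([0.9, 0.1], stored, k=2):
    print(f"{distance:.2f}  {text}")
```

A smaller distance means a closer match, so the unrelated chunk is the one left out.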

#### 5.2 Augment and generate

The retrieved chunks are inserted into the prompt as context for the Large Language Model (LLM).

In [None]:
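def generate(query, context):
    prompt = f"Respond considering the following context for reference if related:\n\n {context} \n\n\n Instructions: Answer the question below and stop. Do not generate additional text or questions. \n\n\n Question: {query} \n\n\n  Answer:"
    # Move the tokenized prompt to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(LLM_model.device)
    # do_sample=False (greedy decoding) is a generate() argument, not a decode() argument
    outputs = LLM_model.generate(**inputs, max_length=1024, do_sample=False)
    # Decode only the newly generated tokens so the prompt is not echoed back
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    response_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    # Optional custom stopping logic: cut at the first follow-up question the model emits
    #stop_sequence = "Question:"
    #response_text = response_text.split(stop_sequence)[0].strip()
    return response_text.strip()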

def rag_answer(query):
    relevant_texts = retrieve(query)
    context = " ".join(relevant_texts)
    answer = generate(query, context)
    return answer
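
Joining chunks with spaces blurs where one chunk ends and the next begins, and a long context can crowd out the answer. One possible variation, sketched below, numbers the chunks and stops adding them once a character budget is reached. The function name `build_prompt`, the budget of 2000 characters, and the prompt wording are assumptions, not part of the pipeline above.

```python
def build_prompt(query, chunks, max_context_chars=2000):
    """Number the chunks and keep only as many as fit in the character budget."""
    kept, used = [], 0
    for n, chunk in enumerate(chunks, start=1):
        entry = f"[{n}] {chunk.strip()}"
        if used + len(entry) > max_context_chars:
            break  # budget exhausted: drop this chunk and all later ones
        kept.append(entry)
        used += len(entry)
    context = "\n".join(kept)
    return (
        f"Context:\n{context}\n\n"
        "Instructions: Answer the question using the context. "
        "If the context does not contain the answer, say so.\n\n"
        f"Question: {query}\n\nAnswer:"
    )

print(build_prompt("When did the regulation take effect?",
                   ["First retrieved chunk.", "Second retrieved chunk."]))
```

Because retrieval returns chunks in order of relevance, truncating from the end drops the least relevant material first.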


Next, we can explore the augmented generation capabilities of the LLM. 🚀💬

In [None]:
# Example usage
query = "What's the date that Ontario has made changes to the Mining Act and created a new regulation, Ontario Regulation 35/24 which replaced Ontario Regulation 240/00?"
answer = rag_answer(query)
print(answer)

You can tailor the prompt style as per your requirements.  🎨✏️

After generating an answer, you can format it and pass it back to the LLM for evaluation. 🔄🧐

Alternatively, assess the relevance of the retrieved chunks to decide whether a further online search is needed. 🌐🔍

Both ideas are first steps toward an **LLM agent** that decides what to explore next, and `LangChain`, introduced earlier for text splitting, is particularly useful for building one. 🕵️‍♂️🔗
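
As a starting point for the relevance check, the sketch below decides whether to fall back to an online search based on retrieval distances, which ChromaDB's `query` returns under `results['distances']`. The function name `needs_web_search` and both thresholds are assumptions that would need tuning on real data.

```python
def needs_web_search(distances, max_distance=1.0, min_hits=2):
    """True when fewer than min_hits retrieved chunks are within max_distance."""
    close_hits = sum(1 for d in distances if d <= max_distance)
    return close_hits < min_hits

# Invented distance lists for illustration
print(needs_web_search([0.35, 0.52, 1.40]))  # two close chunks: local context suffices
print(needs_web_search([1.25, 1.40, 1.62]))  # nothing close: search further
```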

----
We welcome contributions to this project! Feel free to fork the repository and submit a pull request. 🤝✨

Feel free to reach out if you have any questions or suggestions. Let's make this project even more awesome together! 🚀😊

Happy coding! 💻🎉