# RAG Tutorial

## 1. Sparse Retrieval

In this first part we will implement a simple search engine using BM25 weighting and the <A HREF="https://whoosh.readthedocs.io/en/latest/intro.html">Whoosh!</A> library. Install Whoosh! using the command `pip install whoosh` if necessary.


In [1]:
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
from whoosh.analysis import StemmingAnalyzer
from whoosh import index
import os

#analyzer to remove stopwords and do stemming
analyzer = StemmingAnalyzer()

# Define schema
schema = Schema(
    id=ID(stored=True),
    title=TEXT(stored=True, analyzer=analyzer),
    date=TEXT(stored=True),
    content=TEXT(stored=True, analyzer=analyzer)
)

# Create index directory
index_dir = "indexdir"
os.makedirs(index_dir, exist_ok=True)

# Create an index
ix = create_in(index_dir, schema)

# Path to the text files
data_directory = 'data'

# Function to index files
def index_files(data_directory, index):
    writer = index.writer()
    for filename in os.listdir(data_directory):
        if filename.endswith('.txt'):
            file_path = os.path.join(data_directory, filename)
            with open(file_path, 'r', encoding='utf-8') as file:
                lines = file.readlines()
                title = lines[0].strip()
                date = lines[1].strip()
                content = ''.join(lines[2:]).strip()
                writer.add_document(id=filename, title=title, date=date, content=content)
    writer.commit()

# Index the text files
index_files(data_directory, ix)

Now we can search the index with textual queries:

In [2]:
from whoosh.qparser import QueryParser

# Open the index
ix = index.open_dir(index_dir)

# Function to search the index
def search_index(query_str):
    with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse(query_str)
        results = searcher.search(query)
        for result in results:
            print(f"ID: {result['id']}")
            print(f"Title: {result['title']}")
            print(f"Date: {result['date']}")
            print(f"Content: {result['content']}\n")

search_index("LLMs attacks")

ID: data008.txt
Title: SOS! Soft Prompt Attack Against Open-Source Large Language Models
Date: 2024-07-03T14:35:16Z
Content: Open-source large language models (LLMs) have become increasingly popular
among both the general public and industry, as they can be customized,
fine-tuned, and freely used. However, some open-source LLMs require approval
before usage, which has led to third parties publishing their own easily
accessible versions. Similarly, third parties have been publishing fine-tuned
or quantized variants of these LLMs. These versions are particularly appealing
to users because of their ease of access and reduced computational resource
demands. This trend has increased the risk of training time attacks,
compromising the integrity and security of LLMs. In this work, we present a new
training time attack, SOS, which is designed to be low in computational demand
and does not require clean data or modification of the model weights, thereby
maintaining the model's utility intact. T

## 2. Dense Passage Retrieval

In this example, we will use <A HREF="https://sbert.net/">S-Bert</A> to encode query and documents.
Be sure to install the required library: `pip install sentence-transformers`

In [36]:
import os
import numpy as np
from sentence_transformers import SentenceTransformer

# Initialize the model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Path to the text files
data_directory = 'data'

# Function to read the text files and encode them
def encode_documents(data_directory, model):
    documents = []
    ids = []
    for filename in os.listdir(data_directory):
        if filename.endswith('.txt'):
            file_path = os.path.join(data_directory, filename)
            with open(file_path, 'r', encoding='utf-8') as file:
                lines = file.readlines()
                title = lines[0].strip()
                date = lines[1].strip()
                content = ''.join(lines[2:]).strip()
                document = f"{title}\n{date}\n{content}"
                documents.append(document)
                ids.append(filename)
    embeddings = model.encode(documents, convert_to_tensor=True)
    return ids, embeddings.cpu()

# Encode the documents
ids, embeddings = encode_documents(data_directory, model)

# Save the embeddings and IDs
np.save('embeddings.npy', embeddings)
np.save('ids.npy', np.array(ids))


Now, we'll load the embeddings and use them to perform similarity searches with new queries.

In [37]:
import numpy as np
from sentence_transformers import util

# Load the embeddings and IDs
embeddings = np.load('embeddings.npy')
ids = np.load('ids.npy')

# Function to perform a search
def search(query, model, embeddings, ids, top_k=5):
    # Encode the query
    query_embedding = model.encode(query, convert_to_tensor=True)
    
    # Compute cosine similarities between the query and all document embeddings
    cos_scores = util.pytorch_cos_sim(query_embedding.cpu(), embeddings)[0]
    
    # Get the top-k highest scores
    top_results = np.argsort(-cos_scores)[:top_k]
    
    for idx in top_results:
        print(f"ID: {ids[idx]}")
        print(f"Score: {cos_scores[idx]:.4f}")
        with open(os.path.join(data_directory, ids[idx]), 'r', encoding='utf-8') as file:
            print(file.read())
            print("\n" + "="*50 + "\n")

# Example query
query = "LLM attacks"
search(query, model, embeddings, ids)


ID: data008.txt
Score: 0.5462
SOS! Soft Prompt Attack Against Open-Source Large Language Models
2024-07-03T14:35:16Z
  Open-source large language models (LLMs) have become increasingly popular
among both the general public and industry, as they can be customized,
fine-tuned, and freely used. However, some open-source LLMs require approval
before usage, which has led to third parties publishing their own easily
accessible versions. Similarly, third parties have been publishing fine-tuned
or quantized variants of these LLMs. These versions are particularly appealing
to users because of their ease of access and reduced computational resource
demands. This trend has increased the risk of training time attacks,
compromising the integrity and security of LLMs. In this work, we present a new
training time attack, SOS, which is designed to be low in computational demand
and does not require clean data or modification of the model weights, thereby
maintaining the model's utility intact. The att

## 3. Using LLMs to explain results

Now we will retrieve and then ask a LLM to validate the result:

The LLM will tell if the result is good and why

First of all, we need to install LLamaIndex and <A HREF="https://ollama.com/">Ollama</A> to use a LLM locally

Ollama is a tool to help you get set up with LLMs locally (currently supported on OSX and Linux).

After installing Ollama, to download the Llama3 model just do `ollama pull llama3`.

Check the full list of models <A HREF="https://ollama.com/library">here</A>.

Note that for a model with 7B of parameters, you need a machine with 16GB of RAM (or GPU RAM), and larger models require more RAM.

In [None]:
!pip install llama-index

In [None]:
!pip install llama-index-llms-ollama

First of all, be sure that Ollama is running (type `ollama serve` in the console) and then let's load the llm:

In [52]:
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3", request_timeout=360.0)

And we can query the llm like this:

In [47]:
response = llm.complete("What is a LLM?")
print(response)

 An LLM, or Master of Laws, is an advanced, postgraduate academic degree pursued by those who have already earned a first law degree (such as a Bachelor of Laws or Juris Doctor). It provides specialization in various areas of law and is often pursued by students looking to gain expertise in a specific field, enhance their career prospects, or engage in academic research. The duration of an LLM program varies depending on the institution and country, but it typically takes one to two years to complete.


Now we can implement the explanations for the output of our retrieval system

In [43]:
# Function to perform a search with llm_validation
def search_val(query, model, embeddings, ids, top_k=3):
    # Encode the query
    query_embedding = model.encode(query, convert_to_tensor=True)
    
    # Compute cosine similarities between the query and all document embeddings
    cos_scores = util.pytorch_cos_sim(query_embedding.cpu(), embeddings)[0]
    
    # Get the top-k highest scores
    top_results = np.argsort(-cos_scores)[:top_k]
    
    for idx in top_results:
        print(f"ID: {ids[idx]}")
        print(f"Score: {cos_scores[idx]:.4f}")
        with open(os.path.join(data_directory, ids[idx]), 'r', encoding='utf-8') as file:
            print(file.read())
            print("LLM validation:")
            response = llm.complete('Is the following document: "'+file.read()+'" relevant to the query "'+query+'" and why?')
            print(response)
            print("\n" + "="*50 + "\n")

In [53]:
# Example query
query = "documents about attacks to extract information from Large Language Models"
search_val(query, model, embeddings, ids)

ID: data008.txt
Score: 0.4911
SOS! Soft Prompt Attack Against Open-Source Large Language Models
2024-07-03T14:35:16Z
  Open-source large language models (LLMs) have become increasingly popular
among both the general public and industry, as they can be customized,
fine-tuned, and freely used. However, some open-source LLMs require approval
before usage, which has led to third parties publishing their own easily
accessible versions. Similarly, third parties have been publishing fine-tuned
or quantized variants of these LLMs. These versions are particularly appealing
to users because of their ease of access and reduced computational resource
demands. This trend has increased the risk of training time attacks,
compromising the integrity and security of LLMs. In this work, we present a new
training time attack, SOS, which is designed to be low in computational demand
and does not require clean data or modification of the model weights, thereby
maintaining the model's utility intact. The att

The provided document is not directly related to the query "documents about attacks to extract information from Large Language Models".

Here's why:

1. The document appears to be a technical report on an attack against a specific type of neural network, namely, Generative Adversarial Networks (GANs).
2. While GANs are a type of deep learning model that can process and generate human-like text, they are not typically referred to as "Large Language Models".
3. The query is specifically focused on attacks against Large Language Models, which suggests that the document does not directly address this topic.

However, it's possible that some of the techniques or ideas presented in the report could be relevant or applicable to attacks on Large Language Models.


ID: data007.txt
Score: 0.3378
Investigating Decoder-only Large Language Models for Speech-to-text
  Translation
2024-07-03T14:42:49Z
Large language models (LLMs), known for their exceptional reasoning
capabilities, generalizability, 

## 4. LlamaIndex

In this part of the tutorial we will use LLamaIndex to perform RAG on a small set of documents.

LLamaIndex uses its own embeddings, so we will need to import them.

To import `llama_index.embeddings.huggingface`, you should run `pip install llama-index-embeddings-huggingface`.

In [None]:
!pip install llama-index-embeddings-huggingface

Now we can start creating the Vector Store from the data contained in the `data` directory. This directory contains a selection of abstracts from ArXiv on ML and AI topics in `csv` format.

In [73]:
!export OPENAI_API_KEY=

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [74]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.llms.openai import OpenAI

documents = SimpleDirectoryReader("data").load_data()

# bge-base embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# ollama
#Settings.llm = Ollama(model="mistral", request_timeout=360.0)

#chatGPT
Settings.llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.chunk_size = 512

index = VectorStoreIndex.from_documents(
    documents,
)

In [75]:
query_engine = index.as_query_engine()


In [76]:
response = query_engine.query("How Does Quantization Affect Multilingual LLMs? What article should I read about this topic?")

Retrying llama_index.llms.openai.base.OpenAI._chat in 0.07482558801524175 seconds as it raised APIConnectionError: Connection error..
Retrying llama_index.llms.openai.base.OpenAI._chat in 0.43401264399519013 seconds as it raised APIConnectionError: Connection error..

KeyboardInterrupt



In [65]:
print(response)

 To find an article regarding how quantization affects multilingual Large Language Models (LLMs), it would be beneficial to search for research papers that discuss the impact of quantization on language models specifically, and preferably in a multilingual context. The provided context does not directly address this specific topic, but you could look for articles related to "Quantization and Multilingual Large Language Models" or "Impact of Quantization on Multilingual LLMs". You may find relevant studies by exploring academic databases like Google Scholar, IEEE Xplore, ACM Digital Library, or arXiv.org.
