# Environment Setup

### Install necessary libraries

(Optional) arxiv, for searching and loading documents from arXiv

In [1]:
!pip install -q -U arxiv




RAGAS for RAG Evaluation

In [2]:
!pip install -q -U ragas




(Optional) TQDM for progress indicator

In [None]:
!pip install -q -U tqdm

GPT4ALL for Local LLM and Embedding

In [None]:
!pip install gpt4all

In [11]:
!pip install --upgrade --quiet huggingface_hub
!pip install --upgrade --quiet langchain_huggingface




### Get Environment Parameters

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

True
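The later cells read several settings through `os.getenv`: `ARVIX_DOC`, `DOCUMENT`, `ARXIVSTORE_GPT4ALL`, `OPENAI_API_KEY`, and `HUGGINGFACEHUB_API_TOKEN`. A minimal sketch of a check that reports which of them are missing, so that a missing value is caught early rather than surfacing later as a `None` path. The helper name `missing_env_vars` is hypothetical and not part of the notebook:

```python
import os

# Variable names taken from the os.getenv calls in this notebook.
REQUIRED_VARS = [
    "ARVIX_DOC",
    "DOCUMENT",
    "ARXIVSTORE_GPT4ALL",
    "OPENAI_API_KEY",
    "HUGGINGFACEHUB_API_TOKEN",
]

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Only the variables needed for the chosen options (for example, OpenAI versus GPT4All) have to be set, so the list can be trimmed accordingly.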

# Pipeline 1 - Embedding

This pipeline covers the embedding flow: loading, parsing, chunking, vectorizing, and storing.

### Step 1. Loading

In this step, we load data from various sources and make it ready to ingest.

#### Load data from Arxiv

In [3]:
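import arxiv
client = arxiv.Client()
# Note: sorting by SubmittedDate returns the most recent submissions first, not the most relevant ones
search = arxiv.Search(
  query = "ReAct for Large Language Model",
  max_results = 10,
  sort_by = arxiv.SortCriterion.SubmittedDate
)

# Materialize the result generator once so it can be reused in later cells
all_results = list(client.results(search))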

In [26]:
for r in all_results:
    print(f"{r.title} {r.entry_id}")

AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning http://arxiv.org/abs/2407.07094v1
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation http://arxiv.org/abs/2407.07093v1
V-VIPE: Variational View Invariant Pose Embedding http://arxiv.org/abs/2407.07092v1
General Relativistic effects and the NIR variability of Sgr A* II: A systematic approach to temporal asymmetry http://arxiv.org/abs/2407.07091v1
3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes http://arxiv.org/abs/2407.07090v1
Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic http://arxiv.org/abs/2407.07089v1
Safe and Reliable Training of Learning-Based Aerospace Controllers http://arxiv.org/abs/2407.07088v1
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation http://arxiv.org/abs/2407.07087v1
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language

In [15]:
print([r.title for r in all_results])

['AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning', 'FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation', 'V-VIPE: Variational View Invariant Pose Embedding', 'General Relativistic effects and the NIR variability of Sgr A* II: A systematic approach to temporal asymmetry', '3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes', 'Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic', 'Safe and Reliable Training of Learning-Based Aerospace Controllers', 'CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation', 'Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models', 'On some conjectural determinants of Sun involving residues']


In [None]:
#from langchain.document_loaders import ArxivLoader
#base_docs = ArxivLoader(query="ReAct LLM", load_max_docs=5).load()

In [32]:
ARVIX_DOC = os.getenv("ARVIX_DOC") 
for r in all_results:
    r.download_pdf(dirpath=ARVIX_DOC)

### Step 2. Parsing

##### Type 1. Text document

In [None]:
from langchain.document_loaders import TextLoader
DOCUMENT = os.getenv("DOCUMENT")
txt_path = DOCUMENT+"rag.txt"
txt_loader = TextLoader(txt_path)
text_documents = txt_loader.load()
#text_documents

##### Type 2. PDF document

We use PyMuPDFLoader in this experiment.

In [None]:
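import glob
from langchain.document_loaders import PyMuPDFLoader
# PyMuPDFLoader takes a single file path, not a glob pattern, so expand the pattern first
pdf_documents = []
for pdf_path in glob.glob(os.path.join(DOCUMENT, "*.pdf")):
    pdf_documents.extend(PyMuPDFLoader(pdf_path).load())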

In [4]:
from langchain.document_loaders import PyMuPDFLoader
pdf_documents = []
for file in os.listdir(os.getenv("ARVIX_DOC")):
    if file.endswith('.pdf'):
        pdf_path = os.path.join(os.getenv("ARVIX_DOC"), file)
        loader = PyMuPDFLoader(pdf_path)
        pdf_documents.extend(loader.load())

##### Type 3. Batch Loading Directly from source

In [34]:
from langchain.document_loaders import ArxivLoader
batch_docs = ArxivLoader(query="ReAct for Large Language Model",  load_max_docs=10).load()

In [None]:
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders.pdf import PyMuPDFLoader
from langchain.document_loaders.xml import UnstructuredXMLLoader
from langchain.document_loaders.csv_loader import CSVLoader

# Define a dictionary to map file extensions to their respective loaders
loaders = {
    '.pdf': PyMuPDFLoader,
    '.xml': UnstructuredXMLLoader,
    '.csv': CSVLoader,
}

# Define a function to create a DirectoryLoader for a specific file type
def create_directory_loader(file_type, directory_path):
    return DirectoryLoader(
        path=directory_path,
        glob=f"**/*{file_type}",
        loader_cls=loaders[file_type],
    )

# Create DirectoryLoader instances for each file type
pdf_loader = create_directory_loader('.pdf', os.getenv("ARVIX_DOC"))

# Load the files
pdf_documents = pdf_loader.load()

### Step 3. Chunking

In [18]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
text_chunks = text_splitter.split_documents(text_documents)
#documents[:3]

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
pdf_chunks = text_splitter.split_documents(pdf_documents)
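To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraphs and lines); the function `sliding_chunks` is a hypothetical illustration of how consecutive chunks share `chunk_overlap` characters:

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=2)
print(pieces)
```

With the settings used for the PDF chunks (`chunk_size=500`, `chunk_overlap=20`), adjacent chunks would share at most 20 characters.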

In [6]:
chunks = pdf_chunks

### Step 4. Vectorizing

Option 1: Using the OpenAI embedding API

In [7]:
from langchain_openai.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

Option 2: Using GPT4All embeddings

In [57]:
from langchain_community.embeddings import GPT4AllEmbeddings
model_name = "all-MiniLM-L6-v2.gguf2.f16.gguf"
gpt4all_kwargs = {'allow_download': True}
embeddings = GPT4AllEmbeddings(
    model_name=model_name,
    gpt4all_kwargs=gpt4all_kwargs
)

Downloading: 100%|██████████| 45.9M/45.9M [00:06<00:00, 7.66MiB/s]
Verifying: 100%|██████████| 45.9M/45.9M [00:00<00:00, 855MiB/s]


### Step 5. Storing

#### In-memory vectordb

In [None]:
#from langchain_community.vectorstores import DocArrayInMemorySearch
#vectorstore = DocArrayInMemorySearch.from_documents(chunks, embeddings)

#### Persist the vectordb with Chroma

In [58]:
from langchain.vectorstores import Chroma
persist_directory = os.getenv("ARXIVSTORE_GPT4ALL")

# Create the vector database with the local GPT4All embedding method.
# Note: different embedding methods produce vectors of different dimensions, which cannot be stored together.
# The same embedding method must be used in the retrieval pipeline.
vectordb = Chroma.from_documents(documents=chunks,  embedding=embeddings, persist_directory=persist_directory)
vectordb.persist()
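The note about mixing embedding methods can be enforced with a small guard before writing to the store. A minimal sketch, assuming embeddings are plain lists of floats such as those returned by an `embed_documents`-style call; the helper `check_same_dimension` is hypothetical and not part of LangChain or Chroma:

```python
def check_same_dimension(vectors, expected_dim=None):
    """Return the shared dimension of the vectors, raising if any vector differs."""
    dim = expected_dim
    for index, vector in enumerate(vectors):
        if dim is None:
            dim = len(vector)  # the first vector defines the dimension
        elif len(vector) != dim:
            raise ValueError(
                f"vector {index} has dimension {len(vector)}, expected {dim}"
            )
    return dim

# Toy vectors standing in for real embeddings.
print(check_same_dimension([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))
```

Passing `expected_dim` lets the same check compare new vectors against the dimension already used by an existing store.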



# Pipeline 2 - Retrieving & Generating

### Create an Agent

In [1]:
# Define the agent here
class RAGAgent: 
    
    def __init__(self,
                 llm, embeddings, vectordb) -> None:
        from langchain_core.runnables import RunnableParallel, RunnablePassthrough
        from langchain.prompts import ChatPromptTemplate
        from langchain_core.output_parsers import StrOutputParser

        self.llm = llm
        self.embeddings = embeddings
        self.vectordb = vectordb
        self.retriever = vectordb.as_retriever()
        
        setup = RunnableParallel(context=self.retriever, question=RunnablePassthrough())

        template = """
        Answer the question based on the context below. 
        If you can't answer the question, reply "I don't know".

        Context: {context}

        Question: {question}
        """

        prompt = ChatPromptTemplate.from_template(template)
        
        parser = StrOutputParser()

        self.chain = setup | prompt | llm | parser
        
    def invoke(self,question):
        # Return the chain output so callers receive the generated answer
        return self.chain.invoke(question)


In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

True

### Step 1. Query

In [3]:
user_query = "What is retrieval augmented generation"
#user_query = "Describe the RAG-Sequence Model?"

### Step 2. Search

Load the vector store from the persisted store if one exists. Here the persisted Chroma store is loaded; the in-memory vector store is left commented out.
There is an opportunity to improve search efficiency as the knowledge base grows larger and more varied in its source types.
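Conceptually, the search step embeds the query and returns the stored chunks whose vectors are closest to it. A minimal sketch with cosine similarity over toy vectors follows. The function names and the brute-force scan are illustrative assumptions, not how Chroma implements search (vector stores typically use approximate indexes, and the distance metric is configurable):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, stored, k=4):
    """Return the k (text, score) pairs most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy two-dimensional vectors standing in for real embeddings.
stored_chunks = [
    ("chunk about retrieval", [1.0, 0.0]),
    ("chunk about astronomy", [0.0, 1.0]),
    ("chunk about generation", [0.8, 0.6]),
]
hits = top_k([1.0, 0.1], stored_chunks, k=2)
print([text for text, _ in hits])
```

A brute-force scan like this is linear in the number of chunks, which is one reason search efficiency becomes a concern as the knowledge base grows.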

In [4]:
from langchain_community.embeddings import GPT4AllEmbeddings
model_name = "all-MiniLM-L6-v2.gguf2.f16.gguf"
gpt4all_kwargs = {'allow_download': True}
embeddings = GPT4AllEmbeddings(
    model_name=model_name,
    gpt4all_kwargs=gpt4all_kwargs
)

In [5]:
#retriever = vectorstore.as_retriever()

#Load vectordb from persisted store
from langchain.vectorstores import Chroma
persist_directory = os.getenv("ARXIVSTORE_GPT4ALL")
newvectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
retriever = newvectordb.as_retriever()

In [6]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
setup = RunnableParallel(context=retriever, question=RunnablePassthrough())

In [7]:
retriever.invoke(user_query)

[Document(metadata={'author': '', 'creationDate': 'D:20240710005619Z', 'creator': 'LaTeX with hyperref', 'file_path': 'arvix_document\\2407.07087v1.CopyBench__Measuring_Literal_and_Non_Literal_Reproduction_of_Copyright_Protected_Text_in_Language_Model_Generation.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20240710005619Z', 'page': 5, 'producer': 'pdfTeX-1.40.25', 'source': 'arvix_document\\2407.07087v1.CopyBench__Measuring_Literal_and_Non_Literal_Reproduction_of_Copyright_Protected_Text_in_Language_Model_Generation.pdf', 'subject': '', 'title': '', 'total_pages': 23, 'trapped': ''}, page_content='the prompt. In the fact recall task, the prompt in-\nstructs the model to generate a short answer. To\nfacilitate a fair comparison between base models\nand instruction-tuned models, we incorporate an\ninstruction and in-context learning demonstrations\ninto our prompts. Refer to Section A.2 for more\ndetails.\n3.4\nHuman Analysis of Automatic Event\nCopying Evaluation\nTo verify 

### Step 3. Augmented Prompt

In [8]:
from langchain.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. 
If you can't answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
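To see what the model receives, the template can be filled in by hand. A minimal sketch using plain `str.format` rather than `ChatPromptTemplate`; joining the retrieved chunks with blank lines is an assumption made for readability (in the notebook's chain, the retriever output is passed into `{context}` as-is), and `build_prompt` is a hypothetical helper:

```python
TEMPLATE = """
Answer the question based on the context below.
If you can't answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

def build_prompt(context_chunks, question):
    """Fill the template with retrieved chunk texts and the user question."""
    context = "\n\n".join(context_chunks)
    return TEMPLATE.format(context=context, question=question)

# Toy context standing in for retrieved chunks.
example = build_prompt(
    ["RAG combines a retriever with a generator."],
    "What is retrieval augmented generation",
)
print(example)
```

Printing the filled prompt is a quick way to check whether the retrieved context actually contains material relevant to the question.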

### Step 4. Response Generating

In [9]:
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()

Option 1: Using cloud-hosted OpenAI

In [10]:
from langchain_openai.chat_models import ChatOpenAI
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")

Option 2: Using a local LLM with GPT4All

In [74]:
from langchain_community.llms import GPT4All
from langchain_core.callbacks import StreamingStdOutCallbackHandler

In [72]:
local_path = ("C:\\Users\\derek\\Meta-Llama-3-8B-Instruct.Q4_0.gguf")

In [76]:
# Callbacks support token-wise streaming; they are defined here but not passed to the model below
callbacks = [StreamingStdOutCallbackHandler()]

# Load the local model without streaming or verbose logging
model = GPT4All(model=local_path, verbose=False)
parser = StrOutputParser()
# To use a custom model, add the backend parameter
# Check https://docs.gpt4all.io/gpt4all_python.html for supported backends
#model = GPT4All(model=local_path, backend="gptj", callbacks=callbacks, verbose=True)

In [11]:
chain = setup | prompt | model | parser

In [12]:
response = chain.invoke(user_query)
response

"I don't know."

# RAG Evaluation

### Generate Synthetic Test Dataset

##### Using RAGAS

In [13]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from ragas.run_config import RunConfig
from ragas.embeddings.base import BaseRagasEmbeddings, LangchainEmbeddingsWrapper
from ragas.llms import BaseRagasLLM, LangchainLLMWrapper



In [14]:
import os
from dotenv import load_dotenv
load_dotenv()
HUGGINGFACEHUB_API_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")

In [15]:
import nest_asyncio
nest_asyncio.apply()

In [37]:
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
embeddings = LangchainEmbeddingsWrapper(embeddings)

In [13]:
from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms import HuggingFaceHub
from langchain_core.language_models.chat_models import BaseChatModel

def chat_factory() -> BaseChatModel:

    llm = HuggingFaceHub(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",
        task="text-generation",
        model_kwargs={
            "max_new_tokens": 512,
            "top_k": 30,
            "temperature": 0.1,
            "repetition_penalty": 1.03,
        },
        huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,
    )
    chat = ChatHuggingFace(llm=llm)
    return chat


In [87]:
# Add custom llms 
generator_llm = chat_factory()
critic_llm = chat_factory()



In [16]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# documents = load your documents

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-4-1106-preview", temperature=0) 
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

In [17]:

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
 #   run_config= RunConfig(max_wait=60)
)

# Change resulting question type distribution
distributions = {
    simple: 0.2,
    multi_context: 0.4,
    reasoning: 0.4
}
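With a test size of 10 (the value used when the test set is generated in this notebook), the distribution above corresponds to roughly 2 simple, 4 multi-context, and 4 reasoning questions. A small sketch of that arithmetic, using string keys instead of the RAGAS evolution objects; the rounding scheme is an assumption for illustration and may differ from what RAGAS does internally:

```python
def expected_counts(distribution, test_size):
    """Convert fractional shares into whole-number question counts."""
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("distribution shares must sum to 1")
    return {name: round(share * test_size) for name, share in distribution.items()}

shares = {"simple": 0.2, "multi_context": 0.4, "reasoning": 0.4}
print(expected_counts(shares, test_size=10))
```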


In [7]:
from langchain.document_loaders import PyMuPDFLoader
pdf_documents = []
for file in os.listdir(os.getenv("ARVIX_DOC")):
    if file.endswith('.pdf'):
        pdf_path = os.path.join(os.getenv("ARVIX_DOC"), file)
        loader = PyMuPDFLoader(pdf_path)
        pdf_documents.extend(loader.load())

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
chunks = text_splitter.split_documents(pdf_documents)

In [92]:

try:
    testset = generator.generate_with_langchain_docs(pdf_documents, test_size=10, distributions=distributions)
except Exception as e:
    print(e)

Exception in thread Thread-109:                                  
Traceback (most recent call last):
  File "c:\Users\derek\OneDrive\1 - Technology\Workspace\rag_win\Lib\site-packages\huggingface_hub\utils\_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "c:\Users\derek\OneDrive\1 - Technology\Workspace\rag_win\Lib\site-packages\requests\models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\derek\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "c:\Users\derek\OneDrive\1 - Technology\Workspace\rag_win\Lib\site-packages\ragas\executor.py", line 87, in run
    results = self.loop.run_
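The `429 Too Many Requests` error in the traceback indicates that the hosted inference endpoint is rate limiting the calls. One common mitigation is to retry with exponential backoff. A minimal, library-agnostic sketch follows; `call_with_backoff` and its parameters are hypothetical and not part of RAGAS or `huggingface_hub` (the commented-out `RunConfig(max_wait=60)` in the generator cell points at RAGAS's own run configuration):

```python
import time

def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))

# Toy function that fails twice before succeeding.
state = {"calls": 0}

def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

delays = []
result = call_with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

In real use, the default `sleep=time.sleep` applies the delays; the injectable `sleep` argument exists only to keep the example fast.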



##### Create Test Dataset by Prompt

In [67]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

question_schema = ResponseSchema(
    name="question",
    description="a question about the context."
)

question_response_schemas = [
    question_schema,
]

In [68]:
question_output_parser = StructuredOutputParser.from_response_schemas(question_response_schemas)
#format_instructions = question_output_parser.get_format_instructions()

In [14]:
from langchain_huggingface import HuggingFaceEndpoint 
#repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
#repo_id = "HuggingFaceH4/zephyr-7b-beta"
repo_id = "meta-llama/Meta-Llama-3-70B-Instruct"

customer_llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_length=128,
    temperature=0.5,
    huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,
)

                    max_length was transferred to model_kwargs.
                    Please make sure that max_length is what you intended.


The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to C:\Users\derek\.cache\huggingface\token
Login successful


In [93]:
from langchain.prompts import ChatPromptTemplate

question_generation_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")

'2\nRITUPARNA CHALIHA AND GAUTAM KALITA\nusing quadratic Gauss sums. Following these, Vsemirnov [14, 15] used a sophisticated matrix decompo-\nsition to conﬁrm a challenging conjecture of Chapman [4] on the determinant\n\x0c\n\x0c\n\x0c\n\x0c\n\x12j −i\np\n\x13\x0c\n\x0c\n\x0c\n\x0c\n1≤i,j≤p+1\n2\n.\nIn [11], Sun concentrated on determinants of the form\n\x0c\n\x0c\n\x0c\n\x0c\n\x12f(i, j)\np\n\x13\x0c\n\x0c\n\x0c\n\x0c\n1≤i,j≤p−1\n2\n,\nwhere f(x, y) is a quadratic form, and investigated their quadratic residue properties. In particular, for\np ∤d, Sun [11] studied the determinant\nS(d, p) =\n\x0c\n\x0c\n\x0c\n\x0c\n\x12i2 + dj2\np\n\x13\x0c\n\x0c\n\x0c\n\x0c\n1≤i,j≤p−1\n2\n,\nand proved that\n\x12S(d, p)\np\n\x13\n=\n\uf8f1\n\uf8f4\n\uf8f2\n\uf8f4\n\uf8f3\n\x10\n−1\np\n\x11\n,\nif\n\x10\nd\np\n\x11\n= 1;\n0,\nif\n\x10\nd\np\n\x11\n= −1.\nIn addition, Sun [11] also posed a number of conjectures related to the determinant S(d, p). In recent\nyears, some of these conjectures and their g

In [121]:
from langchain.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

qa_template = """\
You are a University Professor creating a test for advanced students. For each context, create a question that is specific to the context. Avoid creating generic or general questions.

question: a question about the context.

Format the output as JSON with the following keys:
question

context: {context}
"""

prompt_template = ChatPromptTemplate.from_template(template=qa_template)

setup = RunnableParallel(context=RunnablePassthrough())

question_generation_chain = setup | prompt_template | question_generation_llm | question_output_parser

response = question_generation_chain.invoke(pdf_documents[0].page_content)


In [123]:
from tqdm import tqdm

question_ans_context = []

for text in tqdm(pdf_documents):
  try:
    response = question_generation_chain.invoke(text.page_content)
  except Exception as e:
    continue
  response["context"] = text.page_content
  question_ans_context.append(response)

100%|██████████| 169/169 [04:37<00:00,  1.64s/it]
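The loop above skips any page for which question generation raises, so failures are silent (the later answer-generation loop processes 165 records, down from 169 pages). A minimal sketch of the same loop shape that also records which inputs failed; `generate_all` and the toy generator are hypothetical stand-ins for the chain call:

```python
def generate_all(items, generate):
    """Apply generate to each item, collecting results and (index, error) pairs."""
    results, failures = [], []
    for index, item in enumerate(items):
        try:
            record = generate(item)
        except Exception as error:
            failures.append((index, repr(error)))
            continue
        record["context"] = item
        results.append(record)
    return results, failures

def toy_generate(text):
    # Stand-in for question_generation_chain.invoke: fails on empty input.
    if not text:
        raise ValueError("empty context")
    return {"question": f"What does '{text}' describe?"}

records, failed = generate_all(["page one", "", "page three"], toy_generate)
print(len(records), "succeeded;", len(failed), "failed")
```

Keeping the failure list makes it possible to inspect or retry the skipped pages instead of losing them.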


In [131]:
from operator import itemgetter

answer_generation_llm = ChatOpenAI(model="gpt-4-1106-preview", temperature=0)

answer_schema = ResponseSchema(
    name="answer",
    description="an answer to the question"
)

answer_response_schemas = [
    answer_schema,
]

answer_output_parser = StructuredOutputParser.from_response_schemas(answer_response_schemas)

qa_template = """\
You are a University Professor creating a test for advanced students. For each question and context, create an answer.

answer: an answer about the context.

Format the output as JSON with the following keys:
answer

question: {question}
context: {context}
"""

#setup = RunnableParallel(question = RunnablePassthrough(), context=RunnablePassthrough())

prompt_template = ChatPromptTemplate.from_template(template=qa_template)

answer_generation_chain = (
    {"question": itemgetter("question"), "context": itemgetter("context") }
    | prompt_template 
    | answer_generation_llm 
    | answer_output_parser
)
response = answer_generation_chain.invoke({"question":question_ans_context[0]["question"],"context":question_ans_context[0]["context"]})

In [133]:
for record in tqdm(question_ans_context):
  try:
    response = answer_generation_chain.invoke({"question":record["question"],"context":record["context"]})
  except Exception as e:
    continue
  record["answer"] = response["answer"]

100%|██████████| 165/165 [18:04<00:00,  6.57s/it]


In [134]:
question_ans_context[10]

{'question': 'What is the value of Sn,k(d, p)/p according to Lemma 2.6 in the given context?',
 'context': 'ON SOME CONJECTURAL DETERMINANTS OF SUN INVOLVING RESIDUES\n11\nProof of Theorem 1.3. From Lemma 2.6, we have\n\x12Sn,k(d, p)\np\n\x13\n=\n\x12an,k(d, p)\np\n\x132 \x12bn,k(d, p)\np\n\x13\n.\nSince n is odd and p ≡1 (mod 2k), Lemma 2.6 provides\n\x12bn,k(d, p)\np\n\x13\n=\n\x12d\np\n\x13 p−1\n2k\n\x12χk(d)\np\n\x13 n−1\n2\n\x12−1\np\n\x13 p−1\n2k\n=\n\x12−d\np\n\x13 p−1\n2k\n,\nand hence\n\x12Sn,k(d, p)\np\n\x13\n=\n\x12an,k(d, p)\np\n\x132 \x12−d\np\n\x13 p−1\n2k\n.\n(9)\n(a) Let k be even. Since\nχk(d) ≡d\np−1\nk\n≡1 (mod p),\nwe have\nd\np−1\n2\n= (d\np−1\nk )\nk\n2 ≡1 (mod p).\nAs a result,\n\x12\n−d\np\n\x13 p−1\n2k\n= 1,\nand hence (9) yields\n\x12Sn,k(d, p)\np\n\x13\n=\n\x12an,k(d, p)\np\n\x132\n=\n\uf8f1\n\uf8f4\n\uf8f2\n\uf8f4\n\uf8f3\n1,\nif an,k(d, p) ̸= 0;\n0,\nif an,k(d, p) = 0.\nThus we obtain the desired result.\n(b) If k is odd, then it is easy to see that\n\x12\n

In [None]:
import pandas as pd
from datasets import Dataset

question_ans_context = pd.DataFrame(question_ans_context)
question_ans_context = question_ans_context.rename(columns={"answer" : "ground_truth"})

In [152]:
question_ans_context.to_csv("eval_dataset_arvix.csv")

### Evaluation Functions with RAGAS

In [18]:
from datasets import Dataset
eval_dataset = Dataset.from_csv("eval_dataset_arvix.csv")

In [19]:
def create_ragas_dataset(rag_pipeline, retriever, eval_dataset):
  rag_dataset = []
  for row in tqdm(eval_dataset):
    question = row["question"]
    answer = rag_pipeline.invoke(question)
    rag_dataset.append(
        {"question" : question,
         "answer" : answer,
         "contexts" : [doc.page_content for doc in retriever.invoke(question)],  # invoke replaces the deprecated get_relevant_documents
         "ground_truth" : row["ground_truth"]
         }
    )
  rag_df = pd.DataFrame(rag_dataset)
  rag_eval_dataset = Dataset.from_pandas(rag_df)
  return rag_eval_dataset

In [20]:
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
)

def evaluate_ragas_dataset(ragas_dataset,generator_llm):
  result = evaluate(
    ragas_dataset,
    metrics=[
        faithfulness,
    ],
    llm=generator_llm,
    run_config=RunConfig(timeout=300,thread_timeout=300)
  )
  return result
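As commonly described, faithfulness is the share of claims in the answer that are supported by the retrieved contexts. RAGAS relies on the LLM both to extract the claims and to judge them; the sketch below replaces both with simple stand-ins (sentence splitting and substring matching), so it only illustrates the ratio and is not RAGAS's implementation. The function `toy_faithfulness` and the example strings are hypothetical:

```python
def toy_faithfulness(answer, contexts):
    """Fraction of answer sentences that appear verbatim in some context."""
    claims = [s.strip() for s in answer.split(".") if s.strip()]
    if not claims:
        return 0.0
    supported = sum(
        1 for claim in claims
        if any(claim.lower() in ctx.lower() for ctx in contexts)
    )
    return supported / len(claims)

# Toy data: the first sentence is supported by the context, the second is not.
toy_contexts = ["The retriever returns four chunks. The prompt includes the context."]
toy_answer = "The retriever returns four chunks. The model was trained in 2020."
print(toy_faithfulness(toy_answer, toy_contexts))
```

A score near 1 means most of the answer is grounded in the retrieved text; a low score suggests the model is adding unsupported statements.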

### Evaluate RAG 1

In [15]:

from langchain_community.embeddings import GPT4AllEmbeddings
model_name = "all-MiniLM-L6-v2.gguf2.f16.gguf"
gpt4all_kwargs = {'allow_download': True}
embeddings = GPT4AllEmbeddings(
    model_name=model_name,
    gpt4all_kwargs=gpt4all_kwargs
)

from langchain.vectorstores import Chroma
persist_directory = os.getenv("ARXIVSTORE_GPT4ALL")
newvectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

myAgent = RAGAgent(model,embeddings,newvectordb)

In [21]:
import pandas as pd
from tqdm import tqdm
rag_eval_dataset = create_ragas_dataset(chain, retriever, eval_dataset)
ans_result_pd = rag_eval_dataset.to_pandas()
pd.set_option("display.max_colwidth", 700)
ans_result_pd[["question", "contexts", "answer", "ground_truth"]]

100%|██████████| 165/165 [02:45<00:00,  1.01s/it]


Unnamed: 0,question,contexts,answer,ground_truth
0,What determinant does the paper study involving residues?,"[with aij ∈R, we denote the determinant by |M| or |[aij]1≤i,j≤n|. Let p be an odd prime and χℓdenotes\na multiplicative character of order ℓmodulo p. For example, χ2(·) =\n\n·\np\n\nis the usual Legendre symbol.\nIn this paper, we study some conjectural determinants involving residues. Determinants with Legendre\nsymbol entries were ﬁrst considered by Lehmer [9], where he used a general method to determine the, [11] Z.-W. Sun, On some determinants with Legendre symbol entries, Finite Fields Appl. 56 (2019), 285–307.\n[12] Z.-W. Sun, Some determinants involving quadratic residues modulo primes, arXiv:2401.14301 (2024).\n[13] Z.-W. Sun, Quadratic residues and related permutations and ide...",The paper studies some conjectural determinants involving residues.,"The paper studies the determinant Sm,k(d, p), which is defined for an odd prime p and integers d, k, m with gcd(p, d) = 1 and 2 ≤ k ≤ (p−1)/2. The determinant is constructed using distinct k-th power residues modulo p, denoted by αi, and is given by Sm,k(d, p) = |(αi − αj)m|1≤i,j≤(p−1)/k. The paper deduces residue properties for this determinant as a generalization of certain results of Sun and proves some of Sun's related conjectures. Specifically, it addresses the conjectures involving the determinants S(1+p−1)/2,2(−1, p)/p and S(3+p−1)/2,2(−1, p)/p, as well as the number of primes p such that p divides Sm+(p−1)/k,k(−1, p), confirming another conjecture related to Sm+(p−1)/2,2(−1, p)."
1,What determinant did Sun study for p ∤d?,"[generalize some of the results of Ren and Sun [10] to the determinant Sn,k(d, p)., ON SOME CONJECTURAL DETERMINANTS OF SUN INVOLVING RESIDUES\n13\nUsing this in (11), we obtain\n\n\nq\nS p−1\nk\n+1,k(−1, p)\np\n\n=\n \n( p−1\nk\n+ 1)( p−1\nk\n+ 2)\np\n! � p−1\nk\n\n!!\np\n!\n\n\n\n\n\nY\n1≤i<j≤p−1\nk\n(αi −αj)\np\n\n\n\n\n.\nHence the result follows.\nCase II: Let p ≡2k + 1 (mod 4k). In this case, we have from (12) that\n\n\n\n\n\n\n\n\np−1\n2k −2\nY\nl=0\n p−1\nk\n+ 1\n2 + l\n\np\n\n\n\n\n\n\n\n\n=\n \n( p−1\nk\n−1)!!\np\n!\n.\nUsing this in (11), we obtain\n\n\nq\nS p−1\nk\n+1,k(−1, p)\np\n\n=\n \n( p−1\nk\n+ 2)\np\n! \n( p−1, arXiv:2407.07085v1 ...","The determinant that Sun studied for p ∤ d is Sn,2(d, p).","Sun studied the determinant S(d, p) = |(i^2 + dj^2)_p| for 1≤i,j≤(p−1)/2, where p is an odd prime not dividing d."
2,What is Conjecture 1.5 in the context of Sun's conjectures?,"[and p ≡1 (mod 4). In addition, Sun [12] also posed a number of conjectures related to the determinant\nSm+ p−1\n2\n,2(−1, p).\nConjecture 1.5. [12, Conjecture 6.3] For any prime p ≡1 (mod 4), we have\n\n\nq\nS1+ p−1\n2\n,2(−1, p)\np\n\n= (−1)|{0<k< p\n4 :( k\np)=−1}| p\n3\n\n.\nConjecture 1.6. [12, Conjecture 6.4] For any prime p ≡1 (mod 4), we have\n\n\nq\nS3+ p−1\n2\n,2(−1, p)\np\n\n= (−1)|{0<k< p\n4 :( k\np)=−1}|\n \np\n4 + (−1)\np−1\n4\n!\n.\nConjecture 1.7. [12, Conjecture 6.5] For any positive odd integer m, the set, some conjectures of Sun related to\n\n\n\nq\nS1+ p−1\n2\n,2(−1, p)\np\n\n\nand\n\n\n\nq\nS3+ p−1\n2\n,2(−1, p)\np\n\n\n.\nIn addition, we invest...","Conjecture 1.5 in the context of Sun's conjectures is: \nFor any prime p ≡1 (mod 4), we have the equation involving determinants.","Conjecture 1.5 states that for any prime p congruent to 1 modulo 4, the Legendre symbol of S1+(p-1)/2,2(-1, p) over p is equal to (-1) raised to the power of the cardinality of the set of all k less than p/4 for which the Legendre symbol (k/p) is -1, times the Legendre symbol of 3 over p."
3,"What is the definition of the e-factorial, denoted by a!(e)?","[)\np\n\n,\notherwise.\nFor k = 2, Theorem 1.9 yields\n\n\nq\nS1+ p−1\n2\n,2(−1, p)\np\n\n=\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n3\np\n  p−1\n2\n!!\np\n \nT( p−1\n2 )\np\n\n,\nif p ≡1 (mod 8);\n\nT( p−1\n2 )\np\n\n,\nif p = 5;\n\n6\np\n  p−3\n2\n!!\np\n \nT( p−1\n2 )\np\n\n,\nif p ≡5 (mod 8) and p ̸= 5.\n(1)\nFrom Lemma 2.3, we note that\n2\np\n\n=\n p−1\n2 !\np\n!\n=\n p−1\n2 !! · p−3\n2 !!\np\n!\n,\nand hence\n p−1\n2 !!\np\n!\n=\n p−3\n2 !!\np\n! 2\np\n\n.\n(2)\nMoreover, Lemma 2.5 and Lemma 2.3 together provide\n \nT\n� p−1\n2\n\np\n!\n=\n2\np\n\n.\n(3), p\n!\n=\n2\np\n\n.\n(3)\nPlugging (2) and (3) in (1), and then using Lemma 2.4 and Theorem 2...",I don't know.,"The e-factorial of a number 'a', denoted by a!(e), is defined as the product of the sequence of numbers from 'a' down to 1, where each term is 'e' less than the previous term, until the term is less than or equal to 'e'. If 'a' is greater than 'e', then a!(e) = a * (a - e)!(e). If 'a' is between 1 and 'e' inclusive, then a!(e) = a. This definition is a generalization of the factorial function, with the standard factorial being the special case where e = 1."
4,What is the value of q when p ≡1 (mod 4k) in Theorem 1.11?,"[k\n+ 1, p−1\nk\n+ 2, · · · , 2(p−1)\nk\n−1} is odd and d ∈Z with χk(d) = 1.\n(a) If k is even, then\nSn,k(d, p)\np\n\n̸= −1.\n(b) If k is odd, then\nSn,k(d, p)\np\n\n̸=\n\n\n\n\n\n−1,\nif p ≡1 (mod 4k);\n\nd\np\n\n,\nif p ≡2k + 1 (mod 4k).\nRemark 1.4. For k = 2, Theorem 1.3 readily provides [10, Theorem 1.2].\nIn [12], Sun showed that\nSm+ p−1\n2\n,2(−1, p) ≡0 (mod p)\nwhen m is even and p ≡3 (mod 4), and Sm+ p−1\n2\n,2(−1, p) is an integer square modulo p when m is odd, −d\np\n p−1\n2k\n= 1,\nand hence (9) yields\nSn,k(d, p)\np\n\n=\nan,k(d, p)\np\n2\n=\n\n\n\n\n\n1,\nif an,k(d, p) ̸= 0;\n0,\nif an,k(d, p) = 0.\nThus we obtain the desired result.\n(b) If k is odd,...",The value of q when p ≡1 (mod 4k) in Theorem 1.11 is \uf8eb\uf8edq.,The value of q in Theorem 1.11 when p ≡1 (mod 4k) is given by the expression q = (k(3k - 1)(4k - 1) / (6k^3 + (3k - 1)(2k - 1)(k - 1))) * ((p-1)/k - 1)!! / T((p-1)/k).
...,...,...,...,...
160,What are the limitations of traditional general models and domain-specific models that AnyTaskTune addresses?,"[compared the performance of AnyTaskTune against various models including closed-source large\nlanguage models (LLMs), open-source LLMs, and domain-specific models. Critically, our experiments\nmaintained a strict separation between training and testing datasets to ensure unbiased evaluation and\nreproducibility of results.\n3.1\nExperimental Setup\nOur experiments were structured as follows:\n• Model Base: We utilized Qwen2-7B [1] as the base model for AnyTaskTune training. This, outputs need to be diverse and comprehensive, businesses require standardized and controllable\nsolutions. For most enterprises and organizations, their needs are highly specific and contextualized,\nwhich cann...","AnyTaskTune addresses the limitations of traditional general models and domain-specific models by emphasizing precision and specificity, which are critical for real-world applications.","The limitations of traditional general models and domain-specific models that AnyTaskTune addresses include a lack of precision and specificity for real-world applications. Traditional general models are often not tailored to the particular characteristics and requirements of specific operational domains, which can lead to suboptimal performance when applied to specialized tasks. Domain-specific models, while more focused, may still not be fine-tuned with the necessary accuracy and efficiency required for complex and nuanced tasks. AnyTaskTune overcomes these limitations by using Explicit Data Sets to create clear and directive input-output pairs, allowing for precise model tailoring. Th..."
161,"Who provided the main ideas, data construction, model training, and paper writing for DataTager?","[Authorship, Credit Attribution, and Acknowledgements\nPlease cite this work as “DataTager(2024)”.\nAuthorship and Contributions\nPaper Writing\n• Jiaxi Cui - Founder, Provided the main ideas, data construction, model training, paper writing.\n• Wentao Zhang - Organized ideas and outline, and contributed to paper writing.\nEngineering\n• Xudong Tong - Software engineering at DataTager.\n• Zhenwei Zhang - Tencent - Software engineering at DataTager.\nOther Contributions, ∗Please cite this work as “DataTager(2024)"". Full authorship contribution statements appear at the end of\nthe document. Correspondence regarding this technical report can be sent to report@datatager.com\nPreprint. Under ...","Jiaxi Cui provided the main ideas, data construction, model training, and paper writing for DataTager.","Jiaxi Cui provided the main ideas, data construction, model training, and paper writing for DataTager."
162,What is the title of the technical report published in 2023 by J. Bai et al.?,"[∗Please cite this work as “DataTager(2024)"". Full authorship contribution statements appear at the end of\nthe document. Correspondence regarding this technical report can be sent to report@datatager.com\nPreprint. Under review.\narXiv:2407.07094v1 [cs.CL] 9 Jul 2024, (11)\nArticle number, page 4 of 13, References\n[1] J. Bai, S. Bai, Y. Chu, Z. Cui, K. Dang, X. Deng, Y. Fan, W. Ge, Y. Han, F. Huang, B. Hui, L. Ji,\nM. Li, J. Lin, R. Lin, D. Liu, G. Liu, C. Lu, K. Lu, J. Ma, R. Men, X. Ren, X. Ren, C. Tan, S. Tan,\nJ. Tu, P. Wang, S. Wang, W. Wang, S. Wu, B. Xu, J. Xu, A. Yang, H. Yang, J. Yang, S. Yang,\nY. Yao, B. Yu, H. Yuan, Z. Yuan, J. Zhang, X. Zhang, Y. Zhang, Z. Zhang, C. Zhou...",I don't know.,Qwen technical report
163,What is the name of the large language model focused on the medical domain?,"[[23] W. Zhu and X. Wang.\nChatmed:\nA chinese medical large language model.\nhttps://github.com/michael-wzhu/ChatMed, 2023.\n11, [2] Z. Bao, W. Chen, S. Xiao, K. Ren, J. Wu, C. Zhong, J. Peng, X. Huang, and Z. Wei. Disc-medllm:\nBridging general large language models and real-world medical consultation, 2023.\n[3] Z. Cai, M. Cao, H. Chen, K. Chen, K. Chen, X. Chen, X. Chen, Z. Chen, Z. Chen, P. Chu,\nX. Dong, H. Duan, Q. Fan, Z. Fei, Y. Gao, J. Ge, C. Gu, Y. Gu, T. Gui, A. Guo, Q. Guo, C. He,\nY. Hu, T. Huang, T. Jiang, P. Jiao, Z. Jin, Z. Lei, J. Li, J. Li, L. Li, S. Li, W. Li, Y. Li, H. Liu, J. Liu,, M. Wiethoff, D. Willner, C. Winter, S. Wolrich, H. Wong, L. Workman, S. Wu, J. Wu, M....",The name of the large language model focused on the medical domain is Chatmed.,The large language model focused on the medical domain mentioned in the context is IvyGPT.


In [54]:
rag_eval_dataset.to_csv("rag1_eval_ds.csv")

Creating CSV from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 64.00ba/s]


428597

In [84]:
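rag_eval_dataset = Dataset.from_csv("rag1_eval_ds.csv")
# Assigning to rag_eval_dataset[i]["contexts"] only changes a temporary copy of the row,
# so the column has to be rewritten with map() for the change to persist.
rag_eval_dataset = rag_eval_dataset.map(
    lambda row: {"contexts": row["contexts"].strip('"')}
)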
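Stripping the quotes still leaves `contexts` as a single string after the CSV round trip, while RAGAS expects a list of strings per row. The sketch below shows one way to keep the list intact: encode it as JSON before writing and decode it after reading. It uses only the standard library, and the sample row is made up for illustration; it is not the serialization that `to_csv` produced for `rag1_eval_ds.csv`.

```python
import csv
import io
import json

# Hypothetical sample row; the real dataset has more columns.
rows = [
    {"question": "Q1", "contexts": ["first chunk, with a comma", 'second "quoted" chunk']},
]

# Write: encode the list as a JSON string so it survives as a single CSV field.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["question", "contexts"])
writer.writeheader()
for row in rows:
    writer.writerow({"question": row["question"], "contexts": json.dumps(row["contexts"])})

# Read: every CSV field comes back as a plain string.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw_contexts = loaded[0]["contexts"]

# Decode the JSON string to get the original list of strings back.
restored_contexts = json.loads(raw_contexts)
print(type(raw_contexts).__name__, type(restored_contexts).__name__)
```

The same idea should carry over to the notebook's dataset by applying `json.dumps` to the column before saving and `json.loads` inside the `map()` call after loading, assuming the file is rewritten with that encoding.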

In [22]:
result_pd = evaluate_ragas_dataset(rag_eval_dataset, generator_llm)
result_pd = result_pd.to_pandas()
pd.set_option("display.max_colwidth", 700)
result_pd

Evaluating:  16%|█▌        | 26/165 [01:17<10:05,  4.36s/it]No statements were generated from the answer.
Evaluating:  19%|█▉        | 31/165 [01:37<09:15,  4.14s/it]No statements were generated from the answer.
Evaluating:  24%|██▎       | 39/165 [02:14<07:15,  3.46s/it]
Exception in thread Thread-13:
Traceback (most recent call last):
  File "C:\Users\derek\AppData\Local\Programs\Python\Python312\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "c:\Users\derek\OneDrive\1 - Technology\Workspace\rag_win\Lib\site-packages\ragas\executor.py", line 87, in run
    results = self.loop.run_until_complete(self._aresults())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\derek\OneDrive\1 - Technology\Workspace\rag_win\Lib\site-packages\nest_asyncio.py", line 98, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "C:\Users\derek\AppData\Local\Programs\Python\Python312\Lib\asyncio\futures.py", line 203, in result
    rais

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass `raise_exceptions=False` incase you want to show only a warning message instead.
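The run above stopped at 39 of 165 jobs, and a single exception discarded everything computed so far. Besides the `raise_exceptions=False` option named in the message, one possible mitigation is to evaluate in small batches and record failures instead of aborting. The sketch below is only an illustration of that pattern: `evaluate_in_batches`, `evaluate_fn`, and `fake_scorer` are hypothetical names, and `evaluate_fn` stands in for a wrapper around the notebook's `evaluate_ragas_dataset` helper, whose exact signature is not shown here.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def evaluate_in_batches(
    rows: Sequence[Dict[str, Any]],
    evaluate_fn: Callable[[Sequence[Dict[str, Any]]], List[Dict[str, Any]]],
    batch_size: int = 10,
) -> Tuple[List[Dict[str, Any]], List[Tuple[int, str]]]:
    """Evaluate rows batch by batch, keeping results from batches that succeed.

    evaluate_fn is a hypothetical stand-in for whatever scores one batch.
    """
    scored: List[Dict[str, Any]] = []
    failures: List[Tuple[int, str]] = []  # (batch start index, error message)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            scored.extend(evaluate_fn(batch))
        except Exception as exc:
            # Record the failure and continue with the next batch.
            failures.append((start, repr(exc)))
    return scored, failures


# Demo with a fake scorer that fails on one batch.
def fake_scorer(batch):
    if any(row["question"] == "bad" for row in batch):
        raise RuntimeError("simulated evaluator failure")
    return [{"question": row["question"], "score": 1.0} for row in batch]


demo_rows = [{"question": q} for q in ["a", "b", "bad", "c", "d"]]
demo_scored, demo_failures = evaluate_in_batches(demo_rows, fake_scorer, batch_size=2)
print(len(demo_scored), demo_failures)
```

Failed batches can then be retried or inspected on their own, which also narrows down which rows trigger the exception.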