# GraphRAG

## Import packages

In [28]:
! pip install --quiet nltk numpy pandas unidecode scikit-learn tqdm llm-blender rouge-score xmltodict arxiv biopython rank_bm25
! pip install --quiet langchain langchain-core langchain-community langchain_experimental langchain-openai langchain-chroma langchain_mistralai langgraph langchainhub

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [29]:
import os
import re
import nltk
import string
import numpy as np
import pandas as pd
from unidecode import unidecode
from sklearn.metrics.pairwise import cosine_similarity
from tqdm import tqdm
from pathlib import Path
import pickle
from rouge_score import rouge_scorer
import json
import llm_blender
from operator import itemgetter
import operator
from dotenv import load_dotenv
from getpass import getpass
from typing import List, Annotated
from typing_extensions import TypedDict
from pydantic import BaseModel, Field
from Bio import Entrez, SeqIO
import torch

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.retrievers import BaseRetriever
from langchain.schema import Document
from langchain_community.document_loaders import PDFMinerLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain_mistralai.chat_models import ChatMistralAI
from langchain_openai import ChatOpenAI
from langchain.embeddings.cache import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_community.llms import Ollama
from langgraph.graph import START, END, StateGraph
from langchain_core.output_parsers import PydanticOutputParser
from langchain.output_parsers import RetryOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnableParallel
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.retrievers import PubMedRetriever, ArxivRetriever, BM25Retriever
from langchain_community.tools.tavily_search import TavilySearchResults

## Disable warnings

In [30]:
import warnings
warnings.filterwarnings('ignore')

## Setup environment variables

You have to define the following environment variables in the `.env` file, terminal environment, or input field within this Jupyter notebook:
1. MISTRAL_API_KEY
2. OPENAI_API_KEY
3. OPENAI_PROXY
4. TAVILY_API_KEY
5. ENTREZ_EMAIL

## Import packages

In [31]:
env_variables = [
  'MISTRAL_API_KEY',
  'OPENAI_API_KEY',
  'OPENAI_PROXY',
  'TAVILY_API_KEY',
  'ENTREZ_EMAIL',
]

load_dotenv()

for key in env_variables:
  value = os.getenv(key)

  if value is None:
    value = getpass(key)

  os.environ[key] = value

## Setup metrics

### Download NLTK dictionaries

These dictionaries are needed for further text preprocessing.

In [32]:
dict_ids = [
  'punkt_tab',
  'punkt',
  'stopwords',
  'wordnet',
]

for dict_id in dict_ids:
  nltk.download(dict_id, quiet=True)

### Text preprocessing

Define a function for text preprocessing, which is an important step before calculating any metrics. This preprocessing function will help in cleaning the text data, making it ready for further analysis. The preprocessing involves several steps:
1. Lowercasing
2. Stopwords removal
3. Lemmatization
4. Remove accents from characters

In [33]:
lemmatizer = nltk.stem.WordNetLemmatizer()

def preprocess(corpus: str) -> str:
  corpus = corpus.lower()
  stopset = nltk.corpus.stopwords.words('english') + nltk.corpus.stopwords.words('russian') + list(string.punctuation)
  tokens = nltk.word_tokenize(corpus)
  tokens = [t for t in tokens if t not in stopset]
  tokens = [lemmatizer.lemmatize(t) for t in tokens]
  corpus = ' '.join(tokens)
  corpus = unidecode(corpus)
  return corpus

### Embedding Initialization

Here we are initializing the Llama 3 embeddings model. The `OllamaEmbeddings` class is a component of the Ollama library, a set of pre-trained language models. This model is capable of embedding corpora of any length into a 4096-dimensional vector.

The use of `OllamaEmbeddings` requires the installation of a local Ollama server, which can be found at https://ollama.com.

In [34]:
embeddings = OllamaEmbeddings(model='llama3.1')
store = LocalFileStore("./.embeddings_cache")

cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
  embeddings,
  store,
  namespace=embeddings.model,
)

### Average embeddings cosine similarity metric

This function calculates the average cosine similarity between expected answers and LLM predicted answers using their respective embeddings. Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them:

$$
K(a, b) = \frac{\sum \limits_{i=1}^n a_i b_i}{\sqrt{\sum \limits_{i=1}^n a_i^2} \cdot \sqrt{\sum \limits_{i=1}^n b_i^2}}
$$

In [35]:
def embeddings_cosine_sim_metric(expected_answers: list[str], predicted_answers: list[str]) -> float:
  results = []

  for expected_answer, predicted_answer in zip(expected_answers, predicted_answers):
    expected_answer = preprocess(expected_answer)
    predicted_answer = preprocess(predicted_answer)

    expected_embedding = np.array(cached_embeddings.embed_query(expected_answer))
    predicted_embedding = np.array(cached_embeddings.embed_query(predicted_answer))

    sim = cosine_similarity(
      expected_embedding.reshape(1, -1),
      predicted_embedding.reshape(1, -1),
    )[0][0]

    results.append(sim)

  return np.mean(results)

In [36]:
smoothie_f = nltk.translate.bleu_score.SmoothingFunction().method4

def bleu_metric(expected_answers, predicted_answers):
  scores = []

  for expected_answer, predicted_answer in zip(expected_answers, predicted_answers):
    expected_answer = preprocess(expected_answer)
    predicted_answer = preprocess(predicted_answer)

    predicted_tokens = nltk.word_tokenize(predicted_answer)
    expected_tokens = [nltk.word_tokenize(expected_answer)]

    score = nltk.translate.bleu_score.sentence_bleu(
      expected_tokens,
      predicted_tokens,
      smoothing_function=smoothie_f,
    )

    scores.append(score)

  return np.mean(scores)

In [37]:
rogue_1_scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)

def rogue_1_metric(expected_answers, predicted_answers):
  scores = []

  for expected_answer, predicted_answer in zip(expected_answers, predicted_answers):
    expected_answer = preprocess(expected_answer)
    predicted_answer = preprocess(predicted_answer)

    result = rogue_1_scorer.score(expected_answer, predicted_answer)

    scores.append(result['rouge1'])

  return np.mean(scores)

In [38]:
rogue_l_scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)

def rogue_l_metric(expected_answers, predicted_answers):
  scores = []

  for expected_answer, predicted_answer in zip(expected_answers, predicted_answers):
    expected_answer = preprocess(expected_answer)
    predicted_answer = preprocess(predicted_answer)

    result = rogue_l_scorer.score(expected_answer, predicted_answer)

    scores.append(result['rougeL'])

  return np.mean(scores)

## Load documents

In [39]:
docs_dir = Path('./docs')
docs_cache_dir = Path('./.docs_cache')
raw_docs_pkl_path = docs_cache_dir / 'parsed_docs_cache.pkl'

if os.path.exists(raw_docs_pkl_path):
  with open(raw_docs_pkl_path, 'rb') as f:
    docs = pickle.load(f)
else:
  docs = []

  for file in docs_dir.iterdir():
    file_docs = PDFMinerLoader(file, concatenate_pages=False).load()
    for doc in file_docs:
      doc.page_content = unidecode(doc.page_content)
      page = doc.metadata['page']
      doc.metadata['source'] = f'{file.stem} ({page})'
    docs.extend(file_docs)

  with open(raw_docs_pkl_path, 'wb') as f:
    pickle.dump(docs, f)

len(docs)

4438

## Split documents

In [40]:
splitted_docs_pkl_path = docs_cache_dir / 'splitted_docs_cache.pkl'

if os.path.exists(splitted_docs_pkl_path):
  with open(splitted_docs_pkl_path, 'rb') as f:
    splitted_docs = pickle.load(f)
else:
  text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=750,
    chunk_overlap=250,
    length_function=len,
    is_separator_regex=False,
    separators=[
      '.',
      '\uff0e', # Fullwidth full stop
      '\u3002', # Ideographic full stop
      '\n\n',
    ],
  )
  splitted_docs = text_splitter.create_documents([doc.page_content for doc in docs])

  with open(splitted_docs_pkl_path, 'wb') as f:
    pickle.dump(splitted_docs, f)

len(splitted_docs)

35663

## Setup vector store

In [41]:
vector_store = Chroma(
  collection_name='neurorag',
  embedding_function=cached_embeddings,
  persist_directory='./chroma_db'
)
retriever = vector_store.as_retriever()

## Define JSON extractor

In [42]:
def extract_json(response):
  json_pattern = r'\{.*?\}'
  match = re.search(json_pattern, response, re.DOTALL)

  if match:
    return match.group().strip().replace('\\\\', '\\')

  return response

## Build LLM

In [43]:
llm = Ollama(model='llama3.1', temperature=0)

## Build chains

### Route chain

In [44]:
class RouteQuery(BaseModel):
  sources: List[str] = Field(
    description='Given a user question select the retrieval methods you consider the most appropriate for addressing this question. You may also return an empty array if no methods are required.',
  )

route_parser = PydanticOutputParser(pydantic_object=RouteQuery)
route_retry_parser = RetryOutputParser.from_llm(
  parser=route_parser,
  llm=llm,
  max_retries=3,
)

route_template = """
You are an expert at selecting retrieval methods.
Given a user question select the retrieval methods you consider the most appropriate for addressing user question.
You may also return an empty array if no methods are required.

Possible retrieval methods:
1. The "vectorstore" retriever contains documents related to neurobiology and medicine. Use the vectorstore for questions on these topics.
2. The "pubmed" retriever contains biomedical literature and research articles. It is particularly useful for answering detailed questions about medical research, clinical studies, and scientific discoveries.
3. The "arxiv" retriever contains preprints of research papers across various scientific fields, including physics, mathematics, computer science, and biology. Use the arxiv for questions on recent scientific research and theoretical studies in these areas.
4. The "ncbi_protein" retriever contains protein sequence and functional information. Use the NCBI protein DB for questions related to protein sequences, structures, and functions.
5. The "ncbi_gene" retriever contains gene sequence and functional information. Use the NCBI gene DB for questions related to gene sequences, structures, and functions.

{format_instructions}

User question:
{question}
"""
route_prompt = PromptTemplate(
  template=route_template,
  input_variables=['question'],
  partial_variables={'format_instructions': route_parser.get_format_instructions()},
)

question_router = RunnableParallel(
  completion=route_prompt | llm | extract_json, prompt_value=route_prompt
) | RunnableLambda(lambda x: route_retry_parser.parse_with_prompt(**x))
print(question_router.invoke({'question': 'Who will the Bears draft first in the NFL draft?'}))
print(question_router.invoke({'question': 'What are the functions of the oculomotor nerve?'}))

sources=[]
sources=['vectorstore', 'ncbi_gene']


### Grade documents chain

In [45]:
class GradeDocuments(BaseModel):
  binary_score: str = Field(description="Documents are relevant to the question, 'yes' or 'no'")

docs_grader_parser = PydanticOutputParser(pydantic_object=GradeDocuments)
docs_grader_retry_parser = RetryOutputParser.from_llm(
  parser=docs_grader_parser,
  llm=llm,
  max_retries=3,
)

docs_grader_template = """
You are a grader assessing relevance of a retrieved document to a user question.
If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant.
Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question.

{format_instructions}

User question:
{question}

Retrieved document:
{document}
"""
docs_grader_prompt = PromptTemplate(
  template=docs_grader_template,
  input_variables=['document', 'question'],
  partial_variables={'format_instructions': docs_grader_parser.get_format_instructions()},
)

docs_grader_chain = RunnableParallel(
  completion=docs_grader_prompt | llm | extract_json, prompt_value=docs_grader_prompt
) | RunnableLambda(lambda x: docs_grader_retry_parser.parse_with_prompt(**x))
docs_grader_chain.invoke({'question': 'What is the color of the sky?', 'document': 'The color of the sky is blue'})

GradeDocuments(binary_score='yes')

### Hallucinations chain

In [46]:
class GradeHallucinations(BaseModel):
  binary_score: str = Field(description="Answer is grounded in the facts, 'yes' or 'no'")

hallucination_parser = PydanticOutputParser(pydantic_object=GradeHallucinations)
hallucination_retry_parser = RetryOutputParser.from_llm(
  parser=hallucination_parser,
  llm=llm,
  max_retries=3,
)

hallucination_template = """
You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n
Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."

{format_instructions}

Set of facts:
{documents}

LLM generation:
{generation}
"""
hallucination_prompt = PromptTemplate(
  template=hallucination_template,
  input_variables=['question', 'generation'],
  partial_variables={'format_instructions': hallucination_parser.get_format_instructions()},
)

hallucination_grader = RunnableParallel(
  completion=hallucination_prompt | llm | extract_json, prompt_value=hallucination_prompt
) | RunnableLambda(lambda x: hallucination_retry_parser.parse_with_prompt(**x))
print(hallucination_grader.invoke({'documents': ['Sky is blue'], 'generation': 'The color of the sky is blue'}))

binary_score='yes'


### Answer grade chain

In [47]:
class GradeAnswer(BaseModel):
  binary_score: str = Field(description="Answer addresses the question, 'yes' or 'no'")

grade_parser = PydanticOutputParser(pydantic_object=GradeAnswer)
grade_retry_parser = RetryOutputParser.from_llm(
  parser=grade_parser,
  llm=llm,
  max_retries=3,
)

grade_template = """
You are a grader assessing whether an answer addresses / resolves a question. \n
Give a binary score 'yes' or 'no'. 'yes' means that the answer resolves the question.

User question:
{question}

LLM generation:
{generation}

{format_instructions}
"""
grade_prompt = PromptTemplate(
  template=grade_template,
  input_variables=['question', 'generation'],
  partial_variables={'format_instructions': grade_parser.get_format_instructions()},
)

answer_grader = RunnableParallel(
  completion=grade_prompt | llm | extract_json, prompt_value=grade_prompt
) | RunnableLambda(lambda x: grade_retry_parser.parse_with_prompt(**x))
print(answer_grader.invoke({"question": "What is the order of the cranial nerves?", 'generation': 'I do not know.'}))

binary_score='no'


### HyDE chain

In [48]:
hyde_template = """
Please write a scientific paper passage to answer the question

Question: {question}

Passage:
"""
hyde_prompt = ChatPromptTemplate.from_template(hyde_template)
hyde_chain = hyde_prompt | llm | StrOutputParser()

hyde_chain.invoke({"question": 'What is the order of the cranial nerves ?'})

"Here's a scientific paper-style passage answering the question:\n\n**Title:** The Cranial Nerve Order: A Review and Classification\n\n**Abstract:**\n\nThe cranial nerves are a complex group of 12 pairs of nerves that arise directly from the brain, playing crucial roles in various physiological functions. Despite their importance, the order of these nerves has been a subject of interest for centuries. This review aims to provide an overview of the classification and nomenclature of the cranial nerves, highlighting their distinct characteristics and functions.\n\n**Introduction:**\n\nThe cranial nerves are a unique group of nerves that emerge from the brain, serving as the primary means of communication between the central nervous system and various peripheral structures. The order of these nerves has been a topic of debate among neuroscientists, with different classification systems proposed over the years. In this review, we will present an updated classification scheme for the crania

### Step-back

In [49]:
class StepBackAnswer(BaseModel):
  step_back: str = Field(description="Given the original query, generate a step-back query that is more general and can help retrieve relevant background information.")


step_back_parser = PydanticOutputParser(pydantic_object=StepBackAnswer)
step_back_retry_parser = RetryOutputParser.from_llm(
  parser=step_back_parser,
  llm=llm,
  max_retries=3,
)

step_back_template = """
You are an AI assistant tasked with generating broader, more general queries to improve context retrieval in a RAG system.
Given the original query, generate a step-back query that is more general and can help retrieve relevant background information.

{format_instructions}

Original query: {question}

Step-back query:
"""
step_back_prompt = PromptTemplate(
  template=step_back_template,
  input_variables=['question'],
  partial_variables={'format_instructions': step_back_parser.get_format_instructions()},
)

step_back_chain = RunnableParallel(
  completion=step_back_prompt | llm | extract_json, prompt_value=step_back_prompt
) | RunnableLambda(lambda x: step_back_retry_parser.parse_with_prompt(**x))
print(step_back_chain.invoke({"question": "What is Benedict’s syndrome?"}))

step_back='What are rare genetic disorders that affect the nervous system?'


### Query Rewriting

In [50]:
class RewriteQueryAnswer(BaseModel):
  rewritten_query: str = Field(description="Given the original query, rewrite it to be more specific, detailed, and likely to retrieve relevant information.")


rewrite_query_parser = PydanticOutputParser(pydantic_object=RewriteQueryAnswer)
rewrite_query_retry_parser = RetryOutputParser.from_llm(
  parser=rewrite_query_parser,
  llm=llm,
  max_retries=3,
)

rewrite_query_template = """
You are an AI assistant tasked with reformulating user queries to improve retrieval in a RAG system.
Given the original query, rewrite it to be more specific, detailed, and likely to retrieve relevant information.

{format_instructions}

Original query: {question}

Rewritten query:
"""
rewrite_query_prompt = PromptTemplate(
  template=rewrite_query_template,
  input_variables=['question'],
  partial_variables={'format_instructions': rewrite_query_parser.get_format_instructions()},
)

rewrite_query_chain = RunnableParallel(
  completion=rewrite_query_prompt | llm | extract_json, prompt_value=rewrite_query_prompt
) | RunnableLambda(lambda x: rewrite_query_retry_parser.parse_with_prompt(**x))
print(rewrite_query_chain.invoke({'question': 'What is the order of the cranial nerves?'}))

rewritten_query='What are the 12 cranial nerves listed in anatomical order, including their functions and any notable characteristics?'


### Decomposition

In [51]:
class DecompositionAnswer(BaseModel):
  subqueries: List[str] = Field(description="Given the original query, decompose it into 2-4 simpler sub-queries as json array of strings")

decomposition_parser = PydanticOutputParser(pydantic_object=DecompositionAnswer)
decomposition_retry_parser = RetryOutputParser.from_llm(
  parser=decomposition_parser,
  llm=llm,
  max_retries=3,
)

decomposition_template = """
You are an AI assistant tasked with breaking down complex queries into simpler sub-queries for a RAG system.
Given the original query, decompose it into 2-4 simpler sub-queries that, when answered together, would provide a comprehensive response to the original query.

Original query: {question}

example: What are the impacts of climate change on the environment?

Sub-queries:
1. What are the impacts of climate change on biodiversity?
2. How does climate change affect the oceans?
3. What are the effects of climate change on agriculture?
4. What are the impacts of climate change on human health?

{format_instructions}
"""
decomposition_prompt = PromptTemplate(
  template=decomposition_template,
  input_variables=['question'],
  partial_variables={'format_instructions': decomposition_parser.get_format_instructions()},
)

decomposition_chain = RunnableParallel(
  completion=decomposition_prompt | llm | extract_json, prompt_value=decomposition_prompt
) | RunnableLambda(lambda x: decomposition_retry_parser.parse_with_prompt(**x))
print(decomposition_chain.invoke({"question": "What is Benedict’s syndrome?"}))

subqueries=["What are the symptoms of Benedict's syndrome?", "What are the causes of Benedict's syndrome?", "How is Benedict's syndrome diagnosed?", "What are the treatment options for Benedict's syndrome?"]


### RAG chain

In [52]:
device = (
  'cuda'
  if torch.cuda.is_available()
  else 'mps'
  if torch.backends.mps.is_available()
  else 'cpu'
)
blender = llm_blender.Blender()
blender.loadranker('llm-blender/PairRM', device=device)
blender.loadfuser('llm-blender/gen_fuser_3b', device=device)



Successfully loaded ranker from  /home/super-pc2/.cache/huggingface/hub/llm-blender/PairRM


In [53]:
rag_template = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

Question: {question}

Context:

{context}

Answer:
"""
rag_prompt = PromptTemplate.from_template(rag_template)

gpt_llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)
openbio_llm = Ollama(model='taozhiyuai/openbiollm-llama-3:70b_q2_k', temperature=0)
biomistral_llm = Ollama(model='cniongolo/biomistral', temperature=0)

gpt_chain = rag_prompt | gpt_llm | StrOutputParser()
openbio_chain = rag_prompt | openbio_llm | StrOutputParser()
biomistral_chain = rag_prompt | biomistral_llm | StrOutputParser()

def fuse_generations(dict):
  question = dict['question']

  gpt_res = dict['gpt_res']
  openbio_res = dict['openbio_res']
  biomistral_res = dict['biomistral_res']
  answers = [gpt_res, openbio_res, biomistral_res]

  fuse_generations, ranks = blender.rank_and_fuse(
    [question],
    [answers],
    instructions=['keep the similar length of the output as the candidates.'],
    return_scores=False,
    batch_size=1,
    top_k=5,
  )
  return fuse_generations[0]

rag_chain = (
  {
    'gpt_res': gpt_chain,
    'openbio_res': openbio_chain,
    'biomistral_res': biomistral_chain,
    'question': itemgetter('question')
  }
  | RunnableLambda(fuse_generations)
)

final_res = rag_chain.invoke({"context": '', "question": 'What is subunit composition of NMDA receptors and role of each subunit?'})
print(final_res)

Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 11.82it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.22s/it]

The NMDA receptor is composed of two subunits, GluN1 and GluN2. The GluN1 subunit is essential for the receptor's function and binds glycine, while the GluN2 subunit binds glutamate and determines the receptor's pharmacological properties and kinetics. Each subunit plays a crucial role in the receptor's function.





### Web Search Chain

In [54]:
web_search_tool = TavilySearchResults(k=5)

### PubMed Retriever

In [55]:
pub_med_retriever = PubMedRetriever()
pub_med_retriever.invoke('What is the order of the cranial nerves?')

[]

### Arxiv Retriever

In [56]:
arxiv_retriever = ArxivRetriever(load_max_docs=3, get_ful_documents=True)
arxiv_retriever.invoke('What is the order of the cranial nerves?')

[Document(metadata={'Entry ID': 'http://arxiv.org/abs/1912.10601v2', 'Published': datetime.date(2021, 3, 13), 'Title': 'Optimized Cranial Bandeau Remodeling', 'Authors': 'James Drake, Marina Drygala, Ricardo Fukasawa, Jochen Koenemann, Andre Linhares, Thomas Looi, John Phillips, David Qian, Nikoo Saber, Justin Toth, Chris Woodbeck, Jessie Yeung'}, page_content="Craniosynostosis, a condition affecting 1 in 2000 infants, is caused by\npremature fusing of cranial vault sutures, and manifests itself in abnormal\nskull growth patterns. Left untreated, the condition may lead to severe\ndevelopmental impairment. Standard practice is to apply corrective cranial\nbandeau remodeling surgery in the first year of the infant's life. The most\nfrequent type of surgery involves the removal of the so-called fronto-orbital\nbar from the patient's forehead and the cutting of well-placed incisions to\nreshape the skull in order to obtain the desired result. In this paper, we\npropose a precise optimizati

### NCBI Protein DB retriever

In [57]:
db_params = {
  'gene': {
    'rettype': 'xml',
    'retmode': 'xml',
  },
  'protein': {
    'rettype': 'gb',
    'retmode': 'text',
  },
}

class NCBIRetriever(BaseRetriever):
  db: str
  k: int

  def __init__(self, db: str, k: int):
    super().__init__(db=db, k=k)

    self.db = db
    self.k = k

    entrez_email = os.getenv('ENTREZ_EMAIL')
    if entrez_email == None:
      raise ValueError('ENTREZ_EMAIL is not defined')
    Entrez.email = entrez_email

  def _search(self, term):
    handle = Entrez.esearch(db=self.db, term=term, retmax=self.k)
    record = Entrez.read(handle)
    handle.close()
    return record['IdList']

  def _fetch(self, ids):
    rettype = db_params[self.db]["rettype"]
    retmode = db_params[self.db]["retmode"]

    handle = Entrez.efetch(db=self.db, id=ids, rettype=rettype, retmode=retmode)
    if self.db == 'gene':
      records = Entrez.read(handle)
    else:
      records = [SeqIO.read(handle, rettype)]
    handle.close()
    return records

  def _get_gene_document(self, record):
    gene_id = record['Entrezgene_track-info']['Gene-track']['Gene-track_geneid']
    gene_symbol = record['Entrezgene_gene']['Gene-ref']['Gene-ref_locus']
    gene_description = record.get('Entrezgene_summary', 'N/A')
    organism_name = record['Entrezgene_source']['BioSource']['BioSource_org']['Org-ref']['Org-ref_taxname']
    page_content = (
      f'Gene ID: {gene_id}\n'
      f'Gene Symbol: {gene_symbol}\n'
      f'Organism: {organism_name}\n'
      f'Description: {gene_description}'
    )
    source = f'https://www.ncbi.nlm.nih.gov/gene/{gene_id}'
    document = Document(page_content=page_content, metadata={'source': source})
    return document

  def _get_protein_document(self, record):
    molecule_type = record.annotations.get("molecule_type", "N/A")
    organism = record.annotations.get("organism", "N/A")
    comment = record.annotations.get("comment", "N/A")
    page_content = (
      f'Protein ID: {record.id}\n'
      f'Type: {molecule_type}\n'
      f'Name: {record.name}\n'
      f'Organism: {organism}\n'
      f'Description: {record.description}\n'
      f'Comment: {comment}\n'
      f'Sequence: {record.seq}'
    )
    source = f'https://www.ncbi.nlm.nih.gov/protein/{record.id}'
    document = Document(page_content=page_content, metadata={'source': source})
    return document

  def _get_relevant_documents(self, query: str, *, run_manager: CallbackManagerForRetrieverRun) -> List[Document]:
    ids = self._search(query)
    records = self._fetch(ids)

    docs = []

    for record in records:
      if self.db == 'gene':
        docs.append(self._get_gene_document(record))
      elif self.db == 'protein':
        docs.append(self._get_protein_document(record))

    return docs

In [58]:
ncbi_protein_retriever = NCBIRetriever(db='protein', k=3)
ncbi_protein_retriever.invoke('ABW05875')

[Document(metadata={'source': 'https://www.ncbi.nlm.nih.gov/protein/ABW05875.1'}, page_content='Protein ID: ABW05875.1\nType: protein\nName: ABW05875\nOrganism: Bromelia pinguin\nDescription: PsbN (chloroplast) [Bromelia pinguin]\nComment: Method: conceptual translation.\nSequence: METATLVAISISGLLVSFTGYALYTAFGQPSQQLRDPFEEHGD')]

In [59]:
ncbi_gene_retriever = NCBIRetriever(db='gene', k=3)
ncbi_gene_retriever.invoke('peng')

[Document(metadata={'source': 'https://www.ncbi.nlm.nih.gov/gene/139103289'}, page_content='Gene ID: 139103289\nGene Symbol: Peng\nOrganism: Cardiocondyla obscurior\nDescription: N/A'),
 Document(metadata={'source': 'https://www.ncbi.nlm.nih.gov/gene/139085911'}, page_content='Gene ID: 139085911\nGene Symbol: peng\nOrganism: Chironomus tepperi\nDescription: N/A'),
 Document(metadata={'source': 'https://www.ncbi.nlm.nih.gov/gene/139048079'}, page_content='Gene ID: 139048079\nGene Symbol: peng\nOrganism: Dermacentor albipictus\nDescription: N/A')]

### NCBI Protein DB chain

In [60]:
class NCBIProteinDBAnswer(BaseModel):
  query: str = Field(description='Given the original query, please find a protein locus for the NCBI protein database.')

ncbi_protein_db_parser = PydanticOutputParser(pydantic_object=NCBIProteinDBAnswer)
ncbi_protein_db_retry_parser = RetryOutputParser.from_llm(
  parser=ncbi_protein_db_parser,
  llm=llm,
  max_retries=3,
)

ncbi_protein_db_template = """
As an expert in bioinformatics and user query optimization for biological databases, your task is to transform user questions into precise and effective queries suitable for the NCBI protein database.
Create a query with only locus of a protein for search within the NCBI protein database.

Original query: {question}

{format_instructions}
"""
ncbi_protein_db_prompt = PromptTemplate(
  template=ncbi_protein_db_template,
  input_variables=['question'],
  partial_variables={'format_instructions': ncbi_protein_db_parser.get_format_instructions()},
)

query_extractor = lambda res: res.query

ncbi_protein_db_chain = RunnableParallel(
  completion=ncbi_protein_db_prompt | llm | extract_json, prompt_value=ncbi_protein_db_prompt
) | RunnableLambda(lambda x: ncbi_protein_db_retry_parser.parse_with_prompt(**x)) | query_extractor | ncbi_protein_retriever
print(ncbi_protein_db_chain.invoke({"question": "Calculate the frequency of each amino acid in the ABW05875 protein sequence"}))

[Document(metadata={'source': 'https://www.ncbi.nlm.nih.gov/protein/ABW05875.1'}, page_content='Protein ID: ABW05875.1\nType: protein\nName: ABW05875\nOrganism: Bromelia pinguin\nDescription: PsbN (chloroplast) [Bromelia pinguin]\nComment: Method: conceptual translation.\nSequence: METATLVAISISGLLVSFTGYALYTAFGQPSQQLRDPFEEHGD')]


### NCBI Gene DB chain

In [61]:
class NCBIGeneDBAnswer(BaseModel):
  query: str = Field(description='Given the original query, please find a gene locus for the NCBI gene database.')

ncbi_gene_db_parser = PydanticOutputParser(pydantic_object=NCBIGeneDBAnswer)
ncbi_gene_db_retry_parser = RetryOutputParser.from_llm(
  parser=ncbi_gene_db_parser,
  llm=llm,
  max_retries=3,
)

ncbi_gene_db_template = """
As an expert in bioinformatics and user query optimization for biological databases, your task is to transform user questions into precise and effective queries suitable for the NCBI gene database.
Create a query with only locus of a gene for search within the NCBI gene database.

Original query: {question}

{format_instructions}
"""
ncbi_gene_db_prompt = PromptTemplate(
  template=ncbi_gene_db_template,
  input_variables=['question'],
  partial_variables={'format_instructions': ncbi_gene_db_parser.get_format_instructions()},
)

query_extractor = lambda res: res.query

ncbi_gene_db_chain = RunnableParallel(
  completion=ncbi_gene_db_prompt | llm | extract_json, prompt_value=ncbi_gene_db_prompt
) | RunnableLambda(lambda x: ncbi_gene_db_retry_parser.parse_with_prompt(**x)) | query_extractor | ncbi_gene_retriever
print(ncbi_gene_db_chain.invoke({"question": "Calculate the frequency of each amino acid in the peng gene sequence"}))

[Document(metadata={'source': 'https://www.ncbi.nlm.nih.gov/gene/139103289'}, page_content='Gene ID: 139103289\nGene Symbol: Peng\nOrganism: Cardiocondyla obscurior\nDescription: N/A'), Document(metadata={'source': 'https://www.ncbi.nlm.nih.gov/gene/139085911'}, page_content='Gene ID: 139085911\nGene Symbol: peng\nOrganism: Chironomus tepperi\nDescription: N/A'), Document(metadata={'source': 'https://www.ncbi.nlm.nih.gov/gene/139048079'}, page_content='Gene ID: 139048079\nGene Symbol: peng\nOrganism: Dermacentor albipictus\nDescription: N/A')]


## Build graph app

In [83]:
class GraphState(TypedDict):
  question: str

  specialized_srcs: List[str]

  step_back_query: str
  rewritten_query: str
  subqueries: List[str]

  generated_docs: List[str]

  documents: Annotated[list, operator.add]

  web_search: str

  generation: str
  generations_num: int

def determine_specialized_srcs(state):
  print('---DETERMINE SPECIALIZED SOURCES---')

  question = state['question']

  try:
    res = question_router.invoke({'question': question})
    srcs = [src.strip().lower() for src in res.sources]
  except:
    srcs = []

  return {'specialized_srcs': srcs}

def route_question(state):
  print('---ROUTE QUESTION---')

  sources = state['specialized_srcs']

  if len(sources) == 0:
    print('---ROUTE QUESTION TO WEB SEARCH---')
    return 'websearch'
  else:
    print(f'---ROUTE QUESTION TO SPECIALIZED SOURCES: {", ".join([source.upper() for source in sources])}---')
    return 'specialized_srcs'

def generate_step_back_query(state):
  print('---GENERATE STEP-BACK QUERY---')

  question = state['question']

  try:
    response = step_back_chain.invoke({'question': question})
    step_back_query = response.step_back
  except:
    step_back_query = question

  return {'step_back_query': step_back_query}

def generate_rewritten_query(state):
  print('---GENERATE REWRITTEN QUERY---')

  question = state['question']

  try:
    response = rewrite_query_chain.invoke({'question': question})
    rewritten_query = response.rewritten_query
  except:
    rewritten_query = question

  return {'rewritten_query': rewritten_query}

def generate_subqueries(state):
  print('---GENERATE SUBQUERIES---')

  question = state['question']

  try:
    decomposition_answer = decomposition_chain.invoke({'question': question})
    subqueries = decomposition_answer.subqueries
    # Limit to a maximum of four subqueries
    subqueries = subqueries[:4]
  except:
    subqueries = []

  print(f'---FINAL SUBQUERIES NUMBER: {len(subqueries)}---')

  return {'subqueries': subqueries}

def generate_hyde_docs(state):
  print('---GENERATE HYDE DOCUMENTS---')

  question = state['question']
  step_back_query = state['step_back_query']
  rewritten_query = state['rewritten_query']
  subqueries = state['subqueries']

  queries = [question, step_back_query, rewritten_query, *subqueries]
  generated_docs = []

  for query in queries:
    generated_doc = hyde_chain.invoke({'question': query})
    generated_docs.append(generated_doc)

  return {'question': question, 'generated_docs': generated_docs}

def vector_store_retriever_node(state):
  generated_docs = state['generated_docs']
  specialized_srcs = state['specialized_srcs']

  if 'vectorstore' not in specialized_srcs:
    return {'documents': []}

  print('---RETRIEVE FROM VECTOR STORE---')

  documents = []

  for generated_doc in generated_docs:
    documents.extend(retriever.invoke(generated_doc))

  return {'documents': documents}

def pub_med_retriever_node(state):
  specialized_srcs = state['specialized_srcs']

  if 'pubmed' not in specialized_srcs:
    return {'documents': []}

  print('---RETRIEVE FROM PUBMED---')

  question = state['question']
  step_back_query = state['step_back_query']
  rewritten_query = state['rewritten_query']
  subqueries = state['subqueries']

  queries = [question, step_back_query, rewritten_query, *subqueries]
  documents = []

  for query in queries:
    try:
      documents.extend(pub_med_retriever.invoke(query))
    except:
      pass

  return {'documents': documents}

def arxiv_retriever_node(state):
  specialized_srcs = state['specialized_srcs']

  if 'arxiv' not in specialized_srcs:
    return {'documents': []}

  print('---RETRIEVE FROM ARXIV---')

  question = state['question']
  step_back_query = state['step_back_query']
  rewritten_query = state['rewritten_query']
  subqueries = state['subqueries']

  queries = [question, step_back_query, rewritten_query, *subqueries]
  documents = []

  for query in queries:
    try:
      documents.extend(arxiv_retriever.invoke(query))
    except:
      pass

  return {'documents': documents}

def ncbi_protein_db_retriever_node(state):
  specialized_srcs = state['specialized_srcs']

  if 'ncbi_protein' not in specialized_srcs:
    return {'documents': []}

  print('---RETRIEVE FROM NCBI PROTEIN DB---')

  question = state['question']
  step_back_query = state['step_back_query']
  rewritten_query = state['rewritten_query']
  subqueries = state['subqueries']

  queries = [question, step_back_query, rewritten_query, *subqueries]
  documents = []

  for query in queries:
    try:
      documents.extend(ncbi_protein_db_chain.invoke(query))
    except:
      pass

  return {'documents': documents}

def ncbi_gene_db_retriever_node(state):
  specialized_srcs = state['specialized_srcs']

  if 'ncbi_gene' not in specialized_srcs:
    return {'documents': []}

  print('---RETRIEVE FROM NCBI GENE DB---')

  question = state['question']
  step_back_query = state['step_back_query']
  rewritten_query = state['rewritten_query']
  subqueries = state['subqueries']

  queries = [question, step_back_query, rewritten_query, *subqueries]
  documents = []

  for query in queries:
    try:
      documents.extend(ncbi_gene_db_chain.invoke(query))
    except:
      pass

  return {'documents': documents}

def grade_documents(state):
  print('---CHECK DOCUMENT RELEVANCE TO QUESTION---')

  rewritten_query = state['rewritten_query']
  documents = state['documents']

  print(f'---INITIAL DOCUMENTS NUMBER: {len(documents)}---')

  if len(documents) == 0:
    return {'documents': [], 'web_search': True}

  unique_documents = list({doc.page_content: doc for doc in documents}.values())
  retriever = BM25Retriever.from_documents(unique_documents)
  retrieved_documents = retriever.invoke(rewritten_query)
  filtered_documents = []
  web_search = False

  for index, document in enumerate(retrieved_documents):
    print(f'---GRADE DOCUMENT ({index + 1}/{len(retrieved_documents)})---')

    try:
      score = docs_grader_chain.invoke({
        'question': rewritten_query,
        'document': document.page_content,
      })
      grade = score.binary_score
    except:
      grade = 'No'

    if grade.lower() == 'yes':
      print('---GRADE: DOCUMENT RELEVANT---')
      filtered_documents.append(document)
    else:
      print('---GRADE: DOCUMENT NOT RELEVANT---')
      web_search = True

  print(f'---FINAL DOCUMENTS NUMBER: {len(filtered_documents)}---')

  state['documents'].clear()
  return {
    'documents': filtered_documents,
    'web_search': web_search or len(filtered_documents) == 0,
  }

def decide_to_generate(state):
  print('---ASSESS GRADED DOCUMENTS---')

  web_search = state['web_search']

  if web_search:
    print('---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---')
    return 'websearch'
  else:
    print('---DECISION: GENERATE---')
    return 'generate'

def web_search(state):
  print('---WEB SEARCH---')

  question = state['question']

  web_results = web_search_tool.invoke({'query': question})
  docs = [Document(page_content=result['content'], metadata={'source': result['url']}) for result in web_results]

  return {'documents': docs}

def generate(state):
  print('---GENERATE---')

  question = state['question']
  documents = state['documents']
  generations_num = state.get('generations_num', 0)

  context = '\n\n'.join(map(lambda doc: doc.page_content, documents))
  generation = rag_chain.invoke({'context': context, 'question': question})

  return {'generation': generation, 'generations_num': generations_num + 1}

def grade_generation(state):
  print('---CHECK HALLUCINATIONS---')

  question = state['question']
  documents = state['documents']
  generation = state['generation']
  generations_num = state['generations_num']

  if generations_num >= 2:
    return 'useful'

  try:
    context = '\n\n'.join(map(lambda doc: doc.page_content, documents))
    score = hallucination_grader.invoke({
      'documents': context,
      'generation': generation,
    })
    grade = score.binary_score
  except:
    grade = 'no'

  if grade == 'yes':
    print('---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---')
    print('---GRADE GENERATION vs QUESTION---')

    try:
      score = answer_grader.invoke({'question': question,'generation': generation})
      grade = score.binary_score
    except:
      grade = 'no'

    if grade == 'yes':
      print('---DECISION: GENERATION ADDRESSES QUESTION---')
      return 'useful'
    else:
      print('---DECISION: GENERATION DOES NOT ADDRESS QUESTION---')
      return 'not useful'
  else:
    print('---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---')
    return 'not supported'

In [84]:
workflow = StateGraph(GraphState)

In [85]:
workflow.add_node('determine_specialized_srcs', determine_specialized_srcs)

workflow.add_node('generate_step_back_query', generate_step_back_query)
workflow.add_node('generate_rewritten_query', generate_rewritten_query)
workflow.add_node('generate_subqueries', generate_subqueries)

workflow.add_node('generate_hyde_docs', generate_hyde_docs)

workflow.add_node('vector_store_retriever', vector_store_retriever_node)
workflow.add_node('pub_med_retriever', pub_med_retriever_node)
workflow.add_node('arxiv_retriever', arxiv_retriever_node)
workflow.add_node('ncbi_protein_db_retriever', ncbi_protein_db_retriever_node)
workflow.add_node('ncbi_gene_db_retriever_node', ncbi_gene_db_retriever_node)

workflow.add_node('websearch', web_search)
workflow.add_node('generate', generate)
workflow.add_node('grade_documents', grade_documents)

<langgraph.graph.state.StateGraph at 0x73fdea1bfd00>

In [86]:
workflow.add_edge(START, 'determine_specialized_srcs')
workflow.add_conditional_edges(
  'determine_specialized_srcs',
  route_question,
  {
    'websearch': 'websearch',
    'specialized_srcs': 'generate_step_back_query',
  },
)

workflow.add_edge('generate_step_back_query', 'generate_rewritten_query')
workflow.add_edge('generate_rewritten_query', 'generate_subqueries')
workflow.add_edge('generate_subqueries', 'generate_hyde_docs')

workflow.add_edge('generate_hyde_docs', 'vector_store_retriever')
workflow.add_edge('generate_hyde_docs', 'pub_med_retriever')
workflow.add_edge('generate_hyde_docs', 'arxiv_retriever')
workflow.add_edge('generate_hyde_docs', 'ncbi_protein_db_retriever')
workflow.add_edge('generate_hyde_docs', 'ncbi_gene_db_retriever_node')

workflow.add_edge('vector_store_retriever', 'grade_documents')
workflow.add_edge('pub_med_retriever', 'grade_documents')
workflow.add_edge('arxiv_retriever', 'grade_documents')
workflow.add_edge('ncbi_protein_db_retriever', 'grade_documents')
workflow.add_edge('ncbi_gene_db_retriever_node', 'grade_documents')

workflow.add_conditional_edges(
  'grade_documents',
  decide_to_generate,
  {
    'websearch': 'websearch',
    'generate': 'generate',
  },
)
workflow.add_edge('websearch', 'generate')
workflow.add_conditional_edges(
  'generate',
  grade_generation,
  {
    'not supported': 'generate',
    'useful': END,
    'not useful': 'websearch',
  },
)

<langgraph.graph.state.StateGraph at 0x73fdea1bfd00>

In [87]:
app = workflow.compile()

In [79]:
app.invoke({'question': 'Count each amino acid in the ABW05875 sequence'})

---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: NCBI_PROTEIN---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM NCBI PROTEIN DB---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 3---
---GRADE DOCUMENT (1/1)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.02it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.69it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.21it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.78it/s]

---CHECK HALLUCINATIONS---





{'question': 'Count each amino acid in the ABW05875 sequence',
 'specialized_srcs': ['ncbi_protein'],
 'step_back_query': 'What are the general characteristics of protein sequences like ABW05875?',
 'rewritten_query': 'Count each amino acid in the ABW05875 sequence',
 'subqueries': ['What are the counts of each amino acid type (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y) in the ABW05875 sequence?',
  'How many occurrences of each amino acid are there in the ABW05875 sequence?',
  'What is the frequency distribution of each amino acid type in the ABW05875 sequence?'],
 'generated_docs': ['Here is a scientific paper-style passage answering the question:\n\n**Title:** Amino Acid Composition Analysis of the ABW05875 Sequence\n\n**Abstract:**\nThe ABW05875 protein sequence, a member of the hypothetical protein family, has been analyzed to determine its amino acid composition. This study aimed to count and quantify each amino acid present in the ABW05875 sequence.\n\n**Result

## Evaluate RAG

### Load QA dataset

In [88]:
qa_df = pd.read_csv('brainscape.csv')[:500]
qa_df

Unnamed: 0,question,answer
0,What are the afferent cranial nerve nuclei?,Trigeminal sensory nucleus- fibres carry gener...
1,What is the order of the cranial nerves ?,1-olfactory\n2-optic\n3-oculomotor\n4-trochlea...
2,What are the efferent cranial nerve nuclei?,Edinger-westphal nucleus\nOculomotor nucleus\n...
3,Which nuclei share the embryo logical origin -...,Oculomotor nucleus Trochlear nucleus Abducens ...
4,Which nuclei share the embryo logical origin- ...,Trigeminal motor nucleus Facial motor nucleus ...
...,...,...
495,What are the functions each of these structure...,Cortex is involved in perception Hypothalamus ...
496,What are the stimuli for olfaction ?,Airborne molecules \nAlcohols \nEsters \nAroma...
497,What percentage of human genome are involved w...,3%
498,What is the difference between threshold value...,Lipid soluble have low thresholds while water ...


### Load cached RAGs responses

In [89]:
cache_path = Path('cache.json')

if not os.path.exists(cache_path):
  data = {}
  with open(cache_path, 'w') as file:
    json.dump(data, file)

with open(cache_path, 'r') as f:
  cache = json.load(f)

len(cache.keys())

282

In [93]:
questions = list(qa_df['question'].tolist())
expected_answers = list(qa_df['answer'].tolist())
predicted_answers = []

for index, question in tqdm(enumerate(questions)):
  if not question in cache:
    cache[question] = app.invoke({'question': question})['generation']

  predicted_answers.append(cache[question])

  with open(cache_path, 'w') as f:
    json.dump(cache, f)

cos_score = embeddings_cosine_sim_metric(expected_answers, predicted_answers)
bleu_score = bleu_metric(expected_answers, predicted_answers)
rogue_1_score = rogue_1_metric(expected_answers, predicted_answers)
rogue_l_score = rogue_l_metric(expected_answers, predicted_answers)

cos_score, bleu_score, rogue_1_score, rogue_l_score

361it [00:00, 1787.28it/s]

---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 28---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 11.06it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.00it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


470it [01:31,  3.56it/s]  

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE, PUBMED---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE------RETRIEVE FROM PUBMED---

---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 43---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT RELEVANT---
---FINAL DOCUMENTS NUMBER: 1---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.48it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  2.53it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


471it [03:14,  1.36it/s]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: PUBMED, VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---RETRIEVE FROM PUBMED---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 46---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 1---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  9.08it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  2.22it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


472it [05:42,  1.65s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 28---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.81it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  2.17it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


473it [07:17,  2.48s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 28---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.81it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.17it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.80it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.23it/s]
474it [09:59,  4.47s/it]

---CHECK HALLUCINATIONS---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 24---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.70it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.26s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


475it [11:42,  6.18s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 28---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  7.98it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.22it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


476it [13:26,  8.57s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 28---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 12.07it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.73it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---


477it [15:23, 12.23s/it]

---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 24---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  8.92it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  2.29it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


478it [16:53, 15.79s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 28---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  8.63it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.50s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 12.28it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.49s/it]
479it [19:35, 24.78s/it]

---CHECK HALLUCINATIONS---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE, NCBI_GENE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---RETRIEVE FROM NCBI GENE DB---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 45---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.33it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.22it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


480it [22:00, 34.45s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 24---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  8.73it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.26it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


481it [23:26, 39.85s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 28---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  8.77it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.01it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


482it [25:13, 48.42s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 24---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  9.58it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.09s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


483it [27:07, 58.64s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE, PUBMED---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE------RETRIEVE FROM PUBMED---

---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 39---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 1---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  9.01it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.21it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


484it [29:22, 72.52s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE, PUBMED---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE------RETRIEVE FROM PUBMED---

---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 34---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 1---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.69it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.46it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


485it [31:26, 83.14s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 24---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.43it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  2.16it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


486it [33:15, 89.12s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE, NCBI_GENE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---RETRIEVE FROM NCBI GENE DB---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 36---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 2---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.51it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.90it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 11.57it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.88it/s]
487it [39:07, 153.44s/it]

---CHECK HALLUCINATIONS---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 11.45it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  2.67it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---


488it [39:59, 127.27s/it]

---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: NCBI_GENE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM NCBI GENE DB---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 12---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.75it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.52s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


489it [42:19, 130.67s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: NCBI_GENE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM NCBI GENE DB---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 15---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.73it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.04s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


490it [44:46, 135.11s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: NCBI_GENE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM NCBI GENE DB---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 15---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 1---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  8.62it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.54s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


491it [47:53, 149.98s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE, PUBMED, NCBI_GENE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---RETRIEVE FROM PUBMED---
---RETRIEVE FROM NCBI GENE DB---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 45---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 2---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  8.40it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.39s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


492it [51:19, 166.16s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 28---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.38it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.40s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


493it [53:32, 156.43s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: NCBI_GENE, VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---RETRIEVE FROM NCBI GENE DB---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 30---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 1---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  9.63it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  2.46it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


494it [55:40, 148.19s/it]

---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 28---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 1---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 11.94it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.47it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 12.80it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.49it/s]
495it [58:25, 152.89s/it]

---CHECK HALLUCINATIONS---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 24---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 11.98it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.40s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 12.20it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.16s/it]
496it [1:01:15, 158.07s/it]

---CHECK HALLUCINATIONS---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE, PUBMED---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---RETRIEVE FROM PUBMED---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 45---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT RELEVANT---
---FINAL DOCUMENTS NUMBER: 1---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 12.52it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.17it/s]


---CHECK HALLUCINATIONS---


497it [1:03:06, 144.03s/it]

---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: NCBI_GENE, VECTORSTORE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---RETRIEVE FROM NCBI GENE DB---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 46---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00,  8.87it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.16it/s]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 12.71it/s]
Fusing candidates: 100%|██████████| 1/1 [00:00<00:00,  1.15it/s]
498it [1:06:19, 158.69s/it]

---CHECK HALLUCINATIONS---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: VECTORSTORE, PUBMED---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 4---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM VECTOR STORE---
---RETRIEVE FROM PUBMED---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 28---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 0---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.65it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.50s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 11.36it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.52s/it]
499it [1:09:45, 172.76s/it]

---CHECK HALLUCINATIONS---
---DETERMINE SPECIALIZED SOURCES---
---ROUTE QUESTION---
---ROUTE QUESTION TO SPECIALIZED SOURCES: NCBI_GENE---
---GENERATE STEP-BACK QUERY---
---GENERATE REWRITTEN QUERY---
---GENERATE SUBQUERIES---
---FINAL SUBQUERIES NUMBER: 3---
---GENERATE HYDE DOCUMENTS---
---RETRIEVE FROM NCBI GENE DB---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---INITIAL DOCUMENTS NUMBER: 11---
---GRADE DOCUMENT (1/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (2/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (3/4)---
---GRADE: DOCUMENT RELEVANT---
---GRADE DOCUMENT (4/4)---
---GRADE: DOCUMENT NOT RELEVANT---
---FINAL DOCUMENTS NUMBER: 3---
---ASSESS GRADED DOCUMENTS---
---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
---WEB SEARCH---
---GENERATE---


Ranking candidates: 100%|██████████| 1/1 [00:00<00:00, 10.69it/s]
Fusing candidates: 100%|██████████| 1/1 [00:01<00:00,  1.43s/it]


---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---


500it [1:13:17,  8.79s/it] 

---DECISION: GENERATION ADDRESSES QUESTION---





(0.6251679069211469,
 0.025192447599560224,
 0.2227855995692924,
 0.19491690991905436)