In [19]:
import torch
import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.huggingface import HuggingFaceInferenceAPI  
from llama_index.core import Settings
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings.langchain import LangchainEmbedding
from llama_index.llms.llama_cpp.llama_utils import messages_to_prompt, completion_to_prompt
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
from llama_index.core import PromptTemplate

In [20]:
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your-hf-token>"  # placeholder: supply your own token and never commit a real one

In [21]:
documents = SimpleDirectoryReader("./pdfs/").load_data()

In [22]:
query_str = "I'm providing you with a research paper; your job is to summarize the information within it."

query_wrapper_prompt = PromptTemplate(
    "Your job is to summarize different sections of the document given to you. "
    "Write a response that appropriately completes the request given to you.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)

In [23]:
llm = HuggingFaceInferenceAPI(model_name="HuggingFaceH4/zephyr-7b-alpha")


In [24]:
embed_model = LangchainEmbedding(
  HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
)

In [25]:
Settings.llm = llm
# Assign the parser itself, not the node list returned by get_nodes_from_documents().
# Settings.text_splitter is an alias of Settings.node_parser, so a separate SentenceSplitter
# assignment here would replace the sentence-window parser.
Settings.node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=5,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
Settings.embed_model = embed_model
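
The sentence-window setup above pairs with the `MetadataReplacementPostProcessor` used at query time. As a rough, library-free sketch of the idea (not LlamaIndex's actual implementation): each sentence becomes its own node for embedding, and its neighbouring sentences are stored under a metadata key that is swapped in before the text reaches the LLM. The helper names `build_sentence_windows` and `replace_with_window` and the regex-based sentence splitter are assumptions made for illustration only.

```python
import re


def build_sentence_windows(text, window_size=5):
    """Split text into sentences and attach a window of neighbours to each one."""
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        # Take up to window_size sentences on each side of the current one.
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append({
            "text": sentence,  # what would be embedded and matched
            "metadata": {
                "original_text": sentence,
                "window": " ".join(sentences[start:end]),
            },
        })
    return nodes


def replace_with_window(node, key="window"):
    """Return the window text if the key is present, otherwise the node's own text."""
    return node["metadata"].get(key, node["text"])


nodes = build_sentence_windows("A one. B two. C three. D four.", window_size=1)
print(nodes[1]["text"])               # the single sentence used for matching
print(replace_with_window(nodes[1]))  # the wider context handed to the LLM
```

Matching on single sentences keeps retrieval precise, while the replacement step restores enough surrounding context for the summary to be coherent.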

In [26]:
index = VectorStoreIndex.from_documents(documents)

In [29]:
query_engine = index.as_query_engine(
    similarity_top_k=20,
    verbose=True,
    response_mode="tree_summarize",
    # The keyword is plural: node_postprocessors.
    node_postprocessors=[MetadataReplacementPostProcessor(target_metadata_key="window")],
)
response = query_engine.query("Generate a summary about the abstract")
print(F"Response: \n {response}")

Response: 
 

The abstract discusses the use of next-generation sequencing (NGS) for detecting mutations in cancer patients. The article highlights the advantages of NGS over traditional PCR-based methods, including the ability to detect rare and previously uncharacterized alterations in the sequenced gene. The article also mentions the use of NGS for detecting circulating tumor DNA (ctDNA) and its potential for predicting mutational burden. The article provides examples of NGS being used in clinical trials for lung cancer and mentions the involvement of various pharmaceutical companies in the development of NGS technology. The article also discusses the need for error-proofing techniques and algorithms to improve the specificity of NGS. Overall, the article suggests that NGS is a valuable tool for detecting mutations in cancer patients and has the potential to improve patient outcomes.
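
The query above uses `response_mode="tree_summarize"`. A minimal sketch of the general idea follows, under stated assumptions: retrieved chunks are summarized in groups, and the group summaries are summarized again until one answer remains. The fixed `fan_in` grouping and the `toy_summarize` stand-in are hypothetical simplifications; LlamaIndex groups text by how much fits in the prompt rather than by a fixed count, and calls the configured LLM instead of a toy function.

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Summarize groups of `fan_in` texts level by level until one summary remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so that each level shrinks")
    if not chunks:
        return ""
    level = list(chunks)
    while True:
        # One summarize call per group at the current level of the tree.
        level = [
            summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            return level[0]


def toy_summarize(texts):
    """Stand-in for an LLM call: wrap the inputs so the tree shape stays visible."""
    return "(" + " + ".join(texts) + ")"


print(tree_summarize(["a", "b", "c"], toy_summarize))
```

With `similarity_top_k=20`, the number of retrieved chunks is large enough that several levels of summarization may be needed, which is one reason a hierarchical mode suits whole-section summaries.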


In [30]:
response = query_engine.query("Generate a summary about the Methodology")
print(F"Response: \n {response}")
print("\n")

Response: 
 

The liquid biopsy report should be thorough and complete yet easy to interpret. Standards of molecular diagnostics reporting have been published by laboratory accrediting organizations including the CAP and the European Society of Pathology Task Force on Quality Assurance in Molecular Pathology and the Royal College of Pathologists. Certain minimum elements are required in all reports for CAP-accredited laboratories. The liquid biopsy report should include the platform used and all the findings of the molecular analysis. Depending on the size of the panel analyzed, a single NGS report can provide information on dozens of targetable genetic abnormalities simultaneously, thus giving added value by obtaining further useful information from the same specimen. To address this, leading commercial and academic laboratories have adopted sophisticated error-proofing techniques and algorithms to dramatically improve the specificity of the sequencing, in some cases greatly diminishin

In [31]:
response = query_engine.query("Generate a summary about the Results and conclusion")
print(F"Response: \n {response}")

Response: 
 

The review article discusses the current state of liquid biopsy in clinical practice and highlights areas in need of further investigation. The article emphasizes the importance of accurate, concise, and clear reporting of molecular alterations investigated in ctDNA. The liquid biopsy report should include the platform used and all the findings of the molecular analysis. Standards of molecular diagnostics reporting have been published by laboratory accrediting organizations, including the CAP and the European Society of Pathology Task Force on Quality Assurance in Molecular Pathology and the Royal College of Pathologists. The article also discusses the need for established tier classification systems to provide guidance for reporting of clinical significance of genetic alterations. The VAF of a given mutation should be reported, and will likely be informative in longitudinal analyses. The article suggests that NGS and NGS-commercial panels may prove more expedient than PC