<a href="https://colab.research.google.com/github/YuHsin-Chen1221/Large-Language-Model-Agents/blob/main/Research_Paper_Context_Batch_Extraction_with_RAG_%26_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [6]:
! pip install -U langchain-community langchain_google_genai pymupdf

Collecting pymupdf
  Downloading pymupdf-1.26.3-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (3.4 kB)
Downloading pymupdf-1.26.3-cp39-abi3-manylinux_2_28_x86_64.whl (24.1 MB)
Installing collected packages: pymupdf
Successfully installed pymupdf-1.26.3


In [7]:
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.document_loaders import PyMuPDFLoader

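# Google Drive folder that holds the research papers
pdf_folder = '/content/drive/MyDrive/PNPL'
docs = []

# Load every PDF in the folder (sorted so the order is reproducible)
for filename in sorted(os.listdir(pdf_folder)):
    if filename.lower().endswith('.pdf'):
        loader = PyMuPDFLoader(os.path.join(pdf_folder, filename))
        docs.extend(loader.load())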
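
Before chunking, it can help to confirm that every paper was actually picked up. The sketch below counts loaded documents per file. It assumes each loaded document exposes a `metadata` dict with the file path under the `"source"` key, which is how the loader is expected to tag its output here; `pages_per_file` is a hypothetical helper name, not part of LangChain.

```python
import os
from collections import Counter


def pages_per_file(docs):
    """Count loaded Document objects per source file name."""
    counts = Counter()
    for doc in docs:
        # Assumption: the loader stored the file path under metadata["source"].
        source = doc.metadata.get("source", "unknown")
        counts[os.path.basename(source)] += 1
    return dict(counts)
```

In the notebook this would be called as `pages_per_file(docs)`. A file that is missing from the result, or that shows an unexpectedly low count, is worth checking before building the index.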


In [13]:
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0.post1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.0 kB)
Downloading faiss_cpu-1.11.0.post1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (31.3 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.11.0.post1


In [14]:
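from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

API_KEY = "YOUR_API_KEY"  # placeholder: replace with your own Google AI API key

# Chunk the docs into pieces of up to 1000 characters with a 150-character overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(docs)

# Embed each chunk with a sentence-transformers model
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Build the FAISS index and save it to the local "faiss_index" folder
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("faiss_index")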
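
To make the `chunk_size=1000, chunk_overlap=150` settings concrete, here is a deliberately simplified sketch of overlapping chunking. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on paragraph, line, and word boundaries); `chunk_text` is a hypothetical helper that only illustrates how size and overlap interact on a plain string.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=150):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With these defaults each window starts 850 characters after the previous one, so the last 150 characters of one chunk reappear at the start of the next. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.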

In [17]:
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# 1. Define the system prompt template
system_template = """You are an expert research assistant specialized in extracting, summarizing, and synthesizing information from scientific research papers.
Use the provided context retrieved from the documents to answer accurately and concisely.
If the information is not available in the context, politely state that the answer cannot be found.
Always respond in a clear, professional, and informative manner suitable for academic or technical audiences.

Context:
{context}

"""

# 2. Create system and human message prompt templates
system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)
human_message_prompt = HumanMessagePromptTemplate.from_template("{question}")

# 3. Combine into a ChatPromptTemplate
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

# 4. Initialize the LLM (the prompt is attached to the chain, not to the model)
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    google_api_key=API_KEY,
)

# 5. Create RetrievalQA chain with the retriever and the custom prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": chat_prompt},
    return_source_documents=True
)

# 6. Run a query
query = "Summarize the key findings across all PDFs."
result = qa_chain.invoke({"query": query})

print(result['result'])


Based on the provided text, the key findings related to the studies reviewed are:

*   **Diverse Applications & Comparisons:** Artificial Neural Networks (ANNs) have been applied to MEG research in a wide and diverse range of ways. Many studies explicitly compare ANN-based methods to more traditional MEG analysis techniques, often to justify the adoption of neural networks.
*   **Critical Limitations:** Despite their potential, ANNs in MEG analysis currently face several critical limitations, notably issues with interpretability and data scarcity.
*   **Need for Advancement:** To fully leverage ANNs in MEG, further advancements are necessary, including more robust foundation models that can better handle limited datasets, enhance generalizability, and improve model transparency.
*   **Importance of Validation & Consistency:** Validation is a key element for determining the reliability and applicability of proposed networks. However, the breadth of tasks and data types makes direct comp
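
Because the chain is created with `return_source_documents=True`, the result also carries the retrieved chunks under `result['source_documents']`. The sketch below lists which file and page each answer drew on. It assumes each chunk's `metadata` holds `"source"` and `"page"` keys (the page value is treated as whatever index the loader recorded); `summarize_sources` is a hypothetical helper name.

```python
import os


def summarize_sources(result):
    """Return a de-duplicated list of 'file, page N' labels for an answer."""
    labels = []
    for doc in result.get("source_documents", []):
        meta = getattr(doc, "metadata", None) or {}
        # Assumption: the loader recorded the path and page index in metadata.
        name = os.path.basename(str(meta.get("source", "unknown")))
        page = meta.get("page")
        label = f"{name}, page {page}" if page is not None else name
        if label not in labels:
            labels.append(label)
    return labels
```

In the notebook, `print("\n".join(summarize_sources(result)))` would show the provenance of the summary above, which helps when checking whether an answer really spans all PDFs or only one of them.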

In [18]:
# Query list for research investigation
research_queries = [
    "Summarize the main research objective or hypothesis of the papers.",
    "List the key findings or results presented.",
    "What methods or experimental techniques were used?",
    "Identify and explain the main datasets or materials utilized.",
    "Summarize the conclusions and implications of the studies.",
    "Extract definitions of important technical terms or concepts.",
    "What are the limitations or challenges mentioned?",
    "Describe any future work or open questions proposed.",
    "What are the applications or real-world impacts discussed?",
    "List the key authors and their affiliations.",
    "Compare and contrast methodologies between different papers.",
    "Identify any hypotheses that were disproved or unsupported.",
    "What related work or prior research is cited as important?",
    "Extract any statistical results or quantitative measurements.",
    "What are the ethical considerations, if any, discussed?",
    "Summarize any algorithms or models introduced.",
]
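
One way to run the questions above as a batch is sketched here. It assumes the chain object exposes `invoke({"query": ...})` and returns a dict with a `"result"` key, as `RetrievalQA` does; `run_batch` is a hypothetical helper name, and the error handling is a simple choice so that one failed request (for example a rate limit) does not abort the whole run.

```python
def run_batch(chain, queries):
    """Run each query through chain.invoke and collect answers keyed by query."""
    answers = {}
    for query in queries:
        try:
            result = chain.invoke({"query": query})
            answers[query] = result["result"]
        except Exception as exc:
            # Record the failure and continue with the remaining queries.
            answers[query] = f"ERROR: {exc}"
    return answers
```

For example, `run_batch(qa_chain, ["List the key findings or results presented.", "What are the limitations or challenges mentioned?"])` would return a dict mapping each question to its answer, which can then be printed or written to a file.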