The Imports

In [1]:
import dotenv
import os
from pydantic import BaseModel, Field
# Note: newer LangChain releases expose this loader from langchain_community.document_loaders
from langchain.document_loaders import PyMuPDFLoader
from config import OUTPUT_DIR, DEFAULT_MODEL

dotenv.load_dotenv()

True

In [9]:
%load_ext autoreload
%autoreload 2

from utils import query_document, process_figure_answers, expand_figure_answers, write_analysis_to_file, query_and_expand



In [13]:
# ensure output directory exists
os.makedirs(OUTPUT_DIR, exist_ok=True)


## Extracting Details of the Paper

Pydantic Models for Structured Output

In [23]:
# Define the models used for structured output
class FiguresCount(BaseModel):
    total_figures: int = Field(description="Total number of figures in the paper")

class PaperDetails(BaseModel):
    title: str = Field(description="Title of the paper")
    abstract: str = Field(description="Abstract of the paper")
    authors: str = Field(description="Authors of the paper")


Extract the details from the paper: 
- Title
- Abstract
- Authors
- Number of figures

In [4]:
loader = PyMuPDFLoader("papers/p2xa_paper.pdf")
document = loader.load()

In [41]:
from templates import FIGURE_COUNT_TEMPLATE

figure_count_response = query_document(document, 
                                       prompt_template=FIGURE_COUNT_TEMPLATE, 
                                       model_name=DEFAULT_MODEL, 
                                       pydantic_model=FiguresCount)

In [35]:
from templates import EXTRACT_DETAILS_TEMPLATE

details_response = query_document(document, 
                                  prompt_template=EXTRACT_DETAILS_TEMPLATE, 
                                  model_name=DEFAULT_MODEL, 
                                  pydantic_model=PaperDetails)
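
`query_document` comes from the local `utils` module, which is not shown in this notebook. The sketch below is only a guess at the general pattern: join the page texts, fill a prompt template, call a model, and parse the reply into a typed object. Every name in it (`PaperDetailsSketch`, `SKETCH_TEMPLATE`, `query_document_sketch`, `fake_llm`) is hypothetical, and the model call is a stub that returns canned JSON.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PaperDetailsSketch:
    """Hypothetical stand-in for the PaperDetails model above."""
    title: str
    abstract: str
    authors: str


# Hypothetical prompt; the real templates live in templates.py (not shown).
SKETCH_TEMPLATE = (
    "Return a JSON object with the keys title, abstract and authors "
    "for the following paper:\n\n{text}"
)


def query_document_sketch(
    pages: List[str],
    prompt_template: str,
    llm: Callable[[str], str],
) -> PaperDetailsSketch:
    """Fill the template with the page text, call the model, parse the JSON reply."""
    text = "\n\n".join(pages)
    prompt = prompt_template.format(text=text)
    raw_reply = llm(prompt)
    data = json.loads(raw_reply)
    # Raises TypeError if the reply has missing or unexpected keys.
    return PaperDetailsSketch(**data)


def fake_llm(prompt: str) -> str:
    """Stub model: always returns the same canned JSON string."""
    return json.dumps(
        {
            "title": "Example title",
            "abstract": "Example abstract",
            "authors": "A. Author, B. Author",
        }
    )


details = query_document_sketch(["page one text", "page two text"], SKETCH_TEMPLATE, fake_llm)
print(details.title)
```

Passing the model as a callable keeps the sketch runnable without an API key; the real helper presumably wires in `DEFAULT_MODEL` and the Pydantic model instead.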

In [43]:
print(f"Title: {details_response.title}\n")
print(f"Abstract: {details_response.abstract}\n")
print(f"Authors: {details_response.authors}\n")
print(f"Number of figures: {figure_count_response.total_figures}\n")


Title: Increased surface P2X4 receptor regulates anxiety and memory in P2X4 internalization-defective knock-in mice

Abstract: ATP signaling and surface P2X4 receptors are upregulated selectively in neurons and/or glia in various CNS disorders including anxiety, chronic pain, epilepsy, ischemia, and neurodegenerative diseases. However, the cell-specific functions of P2X4 in pathological contexts remain elusive. To elucidate P2X4 functions, we created a conditional transgenic knock-in P2X4 mouse line (Floxed P2X4mCherryIN) allowing the Cre activity-dependent genetic swapping of the internalization motif of P2X4 by the fluorescent mCherry protein to prevent constitutive endocytosis of P2X4. By combining molecular, cellular, electrophysiological, and behavioral approaches, we characterized two distinct knock-in mouse lines expressing noninternalized P2X4mCherryIN either exclusively in excitatory forebrain neurons or in all cells natively expressing P2X4. The genetic substitution of wild-t

### Extracting the information about each figure from the paper, then expanding the answers

In [8]:
# Extracting information about each figure from the paper 
print(f"Extracting information about each figure from the paper {details_response.title}...")
answers = process_figure_answers(document, figure_count_response.total_figures)
print(f"Expanding the answers for {details_response.title}...")
expanded_answers = expand_figure_answers(document, answers)

Processing Figure 1...
Processing Figure 2...
Processing Figure 3...
Processing Figure 4...
Processing Figure 5...
Expanding Answers for Figure 1...
Expanding Answers for Figure 2...
Expanding Answers for Figure 3...
Expanding Answers for Figure 4...
Expanding Answers for Figure 5...
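
The log above suggests two passes over the figures: one query per figure, then one expansion per answer. A minimal sketch of that loop follows, assuming a dictionary keyed by figure number. The function names, prompts, and the `echo_model` stub are hypothetical and are not taken from `utils`.

```python
from typing import Callable, Dict


def process_figures_sketch(total_figures: int, ask: Callable[[str], str]) -> Dict[int, str]:
    """First pass: ask one question per figure, numbered from 1."""
    answers = {}
    for n in range(1, total_figures + 1):
        print(f"Processing Figure {n}...")
        answers[n] = ask(f"Describe Figure {n} of the paper.")
    return answers


def expand_figures_sketch(answers: Dict[int, str], ask: Callable[[str], str]) -> Dict[int, str]:
    """Second pass: feed each short answer back to the model for more detail."""
    expanded = {}
    for n, answer in answers.items():
        print(f"Expanding Answers for Figure {n}...")
        expanded[n] = ask(f"Expand on this summary of Figure {n}: {answer}")
    return expanded


def echo_model(prompt: str) -> str:
    """Stub model that echoes the prompt, so the sketch runs offline."""
    return f"[model reply to: {prompt}]"


answers_sketch = process_figures_sketch(2, echo_model)
expanded_sketch = expand_figures_sketch(answers_sketch, echo_model)
```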


Let's look at our answers

In [14]:
output_file = f"{OUTPUT_DIR}/figure_analysis_results.txt"
print(f"Writing the analysis to {output_file}...")
write_analysis_to_file(answers, expanded_answers, output_file)

Analysis written successfully to output/figure_analysis_results.txt
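
For reference, one plausible shape for a writer like `write_analysis_to_file` is sketched here. The file layout (a "Figure n" header followed by the short and expanded answers) is an assumption, since the real output format is defined in `utils` and not shown.

```python
import os
import tempfile
from typing import Dict


def write_analysis_sketch(answers: Dict[int, str], expanded: Dict[int, str], path: str) -> None:
    """Write each figure's short and expanded answer as one plain-text block."""
    with open(path, "w", encoding="utf-8") as f:
        for n in sorted(answers):
            f.write(f"Figure {n}\n")
            f.write(f"Answer: {answers[n]}\n")
            # Fall back to an empty string if a figure was never expanded.
            f.write(f"Expanded: {expanded.get(n, '')}\n\n")
    print(f"Analysis written successfully to {path}")


# Write into a temporary directory so the sketch does not touch OUTPUT_DIR.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "figure_analysis_sketch.txt")
write_analysis_sketch({1: "short answer"}, {1: "longer answer"}, out_path)

with open(out_path, encoding="utf-8") as f:
    written = f.read()
```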


## Extracting background information

Looking for prerequisites to understand the paper

In [5]:
from templates import BACKGROUND_TEMPLATE

background_response = query_document(document, 
                                    prompt_template=BACKGROUND_TEMPLATE, 
                                    model_name=DEFAULT_MODEL)

print(background_response)

content='The paper titled "Increased surface P2X4 receptor regulates anxiety and memory in P2X4 internalization-defective knock-in mice" by Eléonore Bertin and colleagues discusses the role of P2X4 receptors in the central nervous system (CNS), particularly in relation to anxiety and memory functions. The background information presented in the paper outlines several critical concepts and topics that are essential for understanding the research findings. Below is a detailed explanation of these concepts:\n\n### Background Information:\n\n1. **P2X4 Receptors**: \n   - P2X4 receptors are part of a family of receptors known as purinergic receptors, specifically activated by ATP (adenosine triphosphate). These receptors are ion channels that facilitate the flow of cations (such as Na⁺ and Ca²⁺) across cell membranes when activated, which can lead to various cellular responses.\n   - P2X4 receptors display high calcium permeability and are found in both neurons and glial cells (supporting c

In [7]:
print(background_response.content)

The paper titled "Increased surface P2X4 receptor regulates anxiety and memory in P2X4 internalization-defective knock-in mice" by Eléonore Bertin and colleagues discusses the role of P2X4 receptors in the central nervous system (CNS), particularly in relation to anxiety and memory functions. The background information presented in the paper outlines several critical concepts and topics that are essential for understanding the research findings. Below is a detailed explanation of these concepts:

### Background Information:

1. **P2X4 Receptors**: 
   - P2X4 receptors are part of a family of receptors known as purinergic receptors, specifically activated by ATP (adenosine triphosphate). These receptors are ion channels that facilitate the flow of cations (such as Na⁺ and Ca²⁺) across cell membranes when activated, which can lead to various cellular responses.
   - P2X4 receptors display high calcium permeability and are found in both neurons and glial cells (supporting cells in the CNS
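
The two cells above print the same reply in different ways: printing the response object shows its fields with newlines escaped, while printing `.content` shows the rendered text. The toy class below illustrates that difference, assuming the response is a message-like object with a `content` attribute. `MessageSketch` is hypothetical, and its printed form will not match the real object's exactly.

```python
from dataclasses import dataclass


@dataclass
class MessageSketch:
    """Hypothetical message-like object with a single text field."""
    content: str


reply = MessageSketch(content="### Background\n\n1. First point")

as_object = str(reply)   # field representation, newlines shown as \n escapes
as_text = reply.content  # the raw string, newlines rendered as line breaks

print(as_object)
print(as_text)
```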

In [14]:
# expand the answer
expanded_background_response = query_and_expand(
    document,
    prompt_template=BACKGROUND_TEMPLATE,
    model_name=DEFAULT_MODEL,
    text=document,
)

print(expanded_background_response.content)

The paper titled "Increased surface P2X4 receptor regulates anxiety and memory in P2X4 internalization-defective knock-in mice" by Eléonore Bertin et al. presents a comprehensive examination of the P2X4 receptor's role in the central nervous system (CNS) and its implications in neuropsychiatric disorders. The study focuses on how the increased surface expression of the P2X4 receptor affects anxiety and memory functions, particularly in the context of genetically modified mouse models that exhibit altered receptor internalization.

### Expanded Background Information

1. **P2X4 Receptors**:
   - P2X4 receptors belong to the family of purinergic receptors and are specifically activated by ATP (adenosine triphosphate). They serve as cation channels that, upon activation, allow the influx of sodium (Na⁺) and calcium (Ca²⁺) ions into cells, which is critical for various physiological functions, including synaptic transmission and modulation.
   - These receptors are predominantly expressed 