In [33]:
from langchain.document_loaders import PyPDFLoader

In [36]:
file_path = "/media/luiz/storage2/taufferconsulting/client_catalystneuro/project_llms/papers/science.1125572.pdf"
loader = PyPDFLoader(file_path)
pages = loader.load()  # one Document per PDF page, without overlapping text chunks

In [40]:
del pages[0]

In [None]:
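# Join the pages and drop everything before the article title
full_text = " ".join(p.page_content for p in pages)
split_text = "Conjunctive Representation of"
# keep everything from the first occurrence of the title onward (raises ValueError if it is absent)
full_text = full_text[full_text.index(split_text):]
full_text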
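The extracted `full_text` may still contain PDF line breaks and words hyphenated across lines. A small clean-up pass, sketched below with the standard library only, shortens the prompt and makes it easier for the model to read. `clean_pdf_text` is a hypothetical helper, and its first rule also removes the hyphen from genuine compounds that happen to break at a line end.

```python
import re

def clean_pdf_text(text: str) -> str:
    """Rejoin words hyphenated across line breaks and collapse runs of whitespace."""
    # "ento-\nrhinal" -> "entorhinal" (also drops the hyphen of a compound split at a line end)
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # newlines, tabs and repeated spaces -> a single space
    return re.sub(r"\s+", " ", text).strip()

print(clean_pdf_text("grid cells in the medial ento-\nrhinal  cortex\nof rats"))
```

In the notebook it would be applied as `full_text = clean_pdf_text(full_text)` before the prompts are built.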

In [55]:
import openai
import os

In [135]:
def get_llm_chat_answer(prompt: str, system_prompt: str = None, model: str = "gpt-3.5-turbo", task: str = "step-by-step"):
    """Send `prompt` to an OpenAI chat model; if `system_prompt` is None, pick a built-in one by `task`."""
    # Uses the legacy (pre-1.0) openai SDK interface: openai.api_key and openai.ChatCompletion
    openai.api_key = os.getenv("OPENAI_API_KEY")
    if system_prompt is None:
        if task == "step-by-step":
            system_prompt = """
You are a helpful neuroscience research assistant. Based on the raw text, your goal is to provide a step-by-step guide to reproduce the results of the paper.
You should always follow these rules:
1. consider that you already have the data used in the paper
2. break the bigger task into a list of numbered smaller tasks
3. for each numbered task, give a one-sentence instruction on data analysis only, as if it were a step in a Python script
4. for each numbered task, give a longer context description explaining why this step is important, and explain it thoroughly

Example:
1. One sentence describing the goal of this first step.
A longer context explaining the rationale of this first step, why it is important, and the details one should take into consideration here.

2. One sentence describing the goal of this second step.
A longer context explaining the rationale of this second step, why it is important, and the details one should take into consideration here.
...
N. One sentence describing the goal of this Nth step.
A longer context explaining the rationale of this Nth step, why it is important, and the details one should take into consideration here.
"""
        elif task == "explain":
            system_prompt = """
You are a helpful neuroscience research assistant. Based on the raw text, your goal is to provide an objective and informative explanation of the paper.
You should highlight the important aspects of the experiment and concisely explain how they are relevant to the study's conclusions.
Try to describe relevant aspects such as:
- experimental methods utilized
- data analysis methods utilized
- theory and assumptions utilized
Explain to the best of your knowledge; do not make up information that is not present in the text.
"""
        elif task == "hypotheses":
            system_prompt = """Based on the raw text, your goal is to provide a list of the hypotheses tested in the article.
You should answer with a numbered list, with each item describing a hypothesis being tested in the article.
For each hypothesis you find, make sure to include:
- the hypothesis title
- the hypothesis description: what it means
- why the hypothesis is relevant for the study
- how the study tests whether the hypothesis is true or false
- think of one way to falsify this hypothesis, that is, one type of test that could prove it wrong

Example:
1. hypothesis XXX, explanation of this hypothesis: what it means, why it is relevant, how the article tests it
2. hypothesis YYY, explanation of this hypothesis: what it means, why it is relevant, how the article tests it"""
        else:
            raise ValueError(f"Unknown task {task!r}; expected 'step-by-step', 'explain' or 'hypotheses'")
    completion = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
    )
    return completion.choices[0].message["content"]

In [96]:
prompt = f"Paper raw text: {full_text}\nStep-by-step guide: "
r = get_llm_chat_answer(prompt=prompt, model="gpt-3.5-turbo-16k")
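Each call sends the whole paper in a single request, so the text and the answer together have to fit in the model's context window (roughly 16k tokens for `gpt-3.5-turbo-16k`). A crude pre-flight check is sketched below, assuming about 4 characters per English token (a rule of thumb; an exact count needs the model's tokenizer, e.g. `tiktoken`). The helper names, the 16,000-token budget and the 2,000 tokens reserved for the reply are illustrative assumptions.

```python
import math

def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate from character count (assumed ~4 characters per English token)."""
    return math.ceil(len(text) / chars_per_token)

def fits_context(prompt: str, system_prompt: str, context_tokens: int = 16_000, reserved_for_answer: int = 2_000) -> bool:
    """True if the estimated input leaves at least `reserved_for_answer` tokens of the window for the reply."""
    used = rough_token_count(prompt) + rough_token_count(system_prompt)
    return used + reserved_for_answer <= context_tokens

print(fits_context("a" * 4000, "b" * 400))  # a short prompt easily fits
```

In the notebook, `fits_context(prompt, system_prompt="")` could be evaluated before the API call; the built-in system prompts add only a few hundred tokens on top of the paper text.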

In [97]:
print(r)

1. Determine the location of grid cells in the medial entorhinal cortex (MEC) by analyzing the firing patterns of recorded cells in rats that explored two-dimensional environments.
 - Analyze recordings from each principal cell layer of MEC.
 - Look for grid cells in layer II that coincide with the vertices of a periodic triangular grid spanning the complete surface of the environment.
 - Use autocorrelation analysis to visualize the grid structure.

2. Explore the prevalence of grid cells in different layers of the MEC.
 - Estimate the periodicity of the firing patterns of grid cells in each layer by computing a 2D autocorrelation matrix and calculating the correlation between rotated maps.
 - Determine the degree of "gridness" by comparing the correlations at the expected peaks and troughs of the function.
 - Analyze the distribution of gridness values in each layer.

3. Investigate the geometric structure of grids in different layers of the MEC.
 - Define grid cells as the subset of
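Steps 2 and 3 of this answer hinge on a gridness score: autocorrelate the rate map, correlate the autocorrelogram with rotated copies of itself, and compare the expected peaks (60° and 120°) with the expected troughs (30°, 90° and 150°). The NumPy sketch below is a simplified illustration, not the paper's code. It assumes a rate map already binned into a 2D array, uses an FFT autocorrelogram without per-lag normalization and nearest-neighbour rotation, and the ring radii and the names `autocorrelogram`, `rotate_nearest` and `gridness` are illustrative choices.

```python
import numpy as np

def autocorrelogram(rate_map: np.ndarray) -> np.ndarray:
    """Spatial autocorrelation of a 2D rate map (FFT-based, zero-padded, normalized to the zero-lag peak)."""
    m = rate_map - rate_map.mean()
    n_rows, n_cols = m.shape
    shape = (2 * n_rows - 1, 2 * n_cols - 1)  # room for every lag from -(n-1) to n-1
    spectrum = np.fft.fft2(m, s=shape)
    ac = np.fft.fftshift(np.fft.ifft2(spectrum * np.conj(spectrum)).real)  # zero lag moves to the center
    return ac / ac.max()

def rotate_nearest(a: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate `a` about its center with nearest-neighbour sampling; pixels with no source become NaN."""
    n_rows, n_cols = a.shape
    cy, cx = (n_rows - 1) / 2, (n_cols - 1) / 2
    y, x = np.mgrid[0:n_rows, 0:n_cols]
    t = np.deg2rad(degrees)
    # inverse mapping: where each output pixel comes from in the input
    src_x = np.cos(t) * (x - cx) + np.sin(t) * (y - cy) + cx
    src_y = -np.sin(t) * (x - cx) + np.cos(t) * (y - cy) + cy
    xi, yi = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    valid = (xi >= 0) & (xi < n_cols) & (yi >= 0) & (yi < n_rows)
    out = np.full(a.shape, np.nan)
    out[valid] = a[yi[valid], xi[valid]]
    return out

def gridness(acorr: np.ndarray, inner_radius: float, outer_radius: float) -> float:
    """min(r60, r120) - max(r30, r90, r150), correlating rotated copies inside a ring around the center."""
    n_rows, n_cols = acorr.shape
    y, x = np.mgrid[0:n_rows, 0:n_cols]
    dist = np.hypot(y - (n_rows - 1) / 2, x - (n_cols - 1) / 2)
    ring = (dist >= inner_radius) & (dist <= outer_radius)  # leaves out the central peak
    r = {}
    for angle in (30, 60, 90, 120, 150):
        rotated = rotate_nearest(acorr, angle)
        mask = ring & np.isfinite(rotated)
        r[angle] = np.corrcoef(acorr[mask], rotated[mask])[0, 1]
    return float(min(r[60], r[120]) - max(r[30], r[90], r[150]))

# Synthetic "grid cell": three plane waves 60 degrees apart on a 60 x 60 bin map (assumed spacing of 12 bins)
yy, xx = np.mgrid[0:60, 0:60]
k = 2 * np.pi / 12
grid_map = sum(np.cos(k * (np.cos(a) * xx + np.sin(a) * yy)) for a in np.deg2rad([0, 60, 120]))
print(round(gridness(autocorrelogram(grid_map), inner_radius=5, outer_radius=25), 2))
```

On real data the inner radius should exclude the central peak of the autocorrelogram and the outer radius should cover the first ring of six peaks; a library rotation such as `scipy.ndimage.rotate` could replace the nearest-neighbour helper.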

In [105]:
prompt = f"Paper raw text: {full_text}\nExplanation: "
r = get_llm_chat_answer(prompt=prompt, model="gpt-3.5-turbo-16k", task="explain")

In [106]:
print(r)

This paper investigates how information about location, direction, and distance is integrated in the grid-cell network in the medial entorhinal cortex (MEC) of rats. The experiment involved recording from each principal cell layer of MEC in rats that explored two-dimensional environments. The study found that grid cells, which are part of an environment-independent spatial coordinate system, were present in all principal cell layers of MEC. Grid cells in deeper layers were colocalized with head-direction cells and conjunctive grid/head-direction cells. All cell types were modulated by running speed. 

The findings suggest that the conjunction of positional, directional, and translational information in a single MEC cell type allows for grid coordinates to be updated during self-motion-based navigation. This suggests that grid cells in MEC may play a role in path integration during navigation.

The experimental methods utilized involved recording neural activity from rats as they explor
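The summary mentions head-direction cells and conjunctive grid/head-direction cells. A common way to quantify directional tuning is the mean vector length of a cell's firing rate binned by head direction; the standard-library sketch below computes it. The bin layout and the function name are assumptions for illustration, and no classification threshold from the paper is implied.

```python
import math
from typing import Sequence

def mean_vector_length(rates_by_direction: Sequence[float]) -> float:
    """Mean resultant length of a directional rate histogram: 0 = no preferred direction, 1 = one direction only."""
    total = sum(rates_by_direction)
    if total <= 0:
        raise ValueError("need a positive total firing rate")
    n = len(rates_by_direction)
    # assumption: n equal-width bins covering 0-360 degrees, bin i centred at 360 * i / n degrees
    angles = [2 * math.pi * i / n for i in range(n)]
    x = sum(r * math.cos(a) for r, a in zip(rates_by_direction, angles))
    y = sum(r * math.sin(a) for r, a in zip(rates_by_direction, angles))
    return math.hypot(x, y) / total

# 60 bins of 6 degrees each, rate falling off with angular distance from a preferred direction
tuned = [max(0.0, math.cos(2 * math.pi * i / 60)) for i in range(60)]
print(round(mean_vector_length(tuned), 2))
```

Since the length is rotation-invariant, where exactly the bins start does not change the result, only their number and equal width.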

In [136]:
prompt = f"Paper raw text: {full_text}\nHypotheses: "
r = get_llm_chat_answer(prompt=prompt, model="gpt-3.5-turbo-16k", task="hypotheses")
print(r)

1. Hypothesis: Grid cells in the medial entorhinal cortex (MEC) are part of an environment-independent spatial coordinate system.
   - Explanation: The hypothesis states that grid cells in the MEC contribute to a spatial coordinate system that is not dependent on the specific environment or external sensory cues.
   - Relevance: This hypothesis is relevant because it suggests that the MEC plays a role in spatial navigation and memory.
   - Testing the hypothesis: The study recorded from each principal cell layer of the MEC in rats that explored two-dimensional environments to determine the presence of grid cells and their relationship to other cell types.
   - Falsification: One way to falsify this hypothesis would be to find that grid cells in the MEC are only activated by specific environmental cues, rather than being able to create an environment-independent spatial coordinate system.

2. Hypothesis: The conjunction of positional, directional, and translational information in a sing
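Because the hypotheses prompt asks for a fixed set of sub-points per item, the printed answer can be turned into structured data, for example to compare runs or papers. A minimal sketch, assuming the model keeps the `N. ...` / `- Label: text` layout seen above (LLM output formatting is not guaranteed); `parse_numbered_items` is a hypothetical helper, and continuation lines without a `- Label:` prefix are ignored.

```python
import re

def parse_numbered_items(text: str) -> list:
    """Group 'N. title' lines and the '- Label: text' bullets under them into dictionaries."""
    items = []
    for line in text.splitlines():
        stripped = line.strip()
        header = re.match(r"(\d+)\.\s+(.*)", stripped)
        if header:
            items.append({"number": int(header.group(1)), "title": header.group(2), "fields": {}})
            continue
        bullet = re.match(r"-\s*([^:]+):\s*(.*)", stripped)
        if bullet and items:  # attach the labelled bullet to the most recent numbered item
            items[-1]["fields"][bullet.group(1).strip()] = bullet.group(2).strip()
    return items

sample = "1. Hypothesis: A\n   - Explanation: x\n   - Relevance: y\n\n2. Hypothesis: B\n   - Falsification: z"
for item in parse_numbered_items(sample):
    print(item["number"], item["title"], sorted(item["fields"]))
```

In the notebook this would be `hypotheses = parse_numbered_items(r)` after the last cell.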