In [1]:
import pymupdf
from ollama import pull, generate
import math
import glob

In [2]:
with pymupdf.open("../../sample_inputs/transformers_paper.pdf") as pdf:
    text = [page.get_text() for page in pdf]

text

['Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrenc

---

# Single Document w/ Chunking -- Llama 3.2

In [4]:
pull("llama3.2:3b")

ProgressResponse(status='success', completed=None, total=None, digest=None)

In [40]:
chunked_pages = []

for i in range(math.ceil(len(text) / 5)):  # true division, so a trailing partial chunk is kept
    start_idx, end_idx = i*5, (i+1)*5
    chunk = text[start_idx:end_idx]
    chunked_pages.append("\n".join(chunk))

chunked_pages[0]

'Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence
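
The chunking step above can also be written as a small reusable helper. The following is a minimal sketch, not the notebook's own code: the name `chunk_pages` and the `pages_per_chunk` parameter are assumptions introduced for illustration. Stepping `range` by the chunk size avoids computing the number of chunks up front, so a final group of fewer than five pages is kept rather than dropped.

```python
def chunk_pages(pages, pages_per_chunk=5):
    """Join consecutive pages into chunks of at most `pages_per_chunk` pages.

    `chunk_pages` and `pages_per_chunk` are hypothetical names for this sketch.
    """
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    # Step through the page list in strides; slicing past the end is safe,
    # so the last chunk simply contains whatever pages remain.
    return [
        "\n".join(pages[start:start + pages_per_chunk])
        for start in range(0, len(pages), pages_per_chunk)
    ]


# A 12-page document yields two full chunks and one partial chunk of 2 pages.
demo_pages = [f"page {n}" for n in range(1, 13)]
demo_chunks = chunk_pages(demo_pages)
print(len(demo_chunks))
print(demo_chunks[-1])
```

In the notebook, `chunk_pages(text)` would take the place of the explicit loop, with `text` being the list of page strings returned by PyMuPDF.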

In [41]:
running_summary = ""

for chunk in chunked_pages:
    per_page_summary_prompt = f"You are an assistant that is tasked with summarizing a set of documents that are given to you. The documents will be given in chunks, and you will be given the current summary. Do not rewrite the summary; just build on it. Do not use bullet points or formatting. Do not add any other text besides the summary. The text is as follows: {chunk}. The current summary is as follows: {running_summary}"

    response = generate(
        model="llama3.2:3b",
        prompt=per_page_summary_prompt,
        options={
            "num_ctx": 8192
        }
    )

    running_summary += response["response"]
    print(running_summary)

The Transformer proposes a new simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. It consists of stacked self-attention and point-wise, fully connected layers for both the encoder and decoder. The model achieves state-of-the-art results in machine translation tasks while being more parallelizable and requiring significantly less time to train than existing models.
The Transformer proposes a new simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. It consists of stacked self-attention and point-wise, fully connected layers for both the encoder and decoder. The model achieves state-of-the-art results in machine translation tasks while being more parallelizable and requiring significantly less time to train than existing models.The Transformer is a novel sequence transduction model that relies exclusively on attention mechanisms, eliminating recurrent lay

In [43]:
print(running_summary)

The Transformer proposes a new simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. It consists of stacked self-attention and point-wise, fully connected layers for both the encoder and decoder. The model achieves state-of-the-art results in machine translation tasks while being more parallelizable and requiring significantly less time to train than existing models.The Transformer is a novel sequence transduction model that relies exclusively on attention mechanisms, eliminating recurrent layers and convolutional operations from traditional architectures. By doing so, it offers significant advantages over its predecessors in terms of training speed, computational efficiency, and scalability. The proposed model architecture consists of two stacked sub-layers: self-attention and point-wise fully connected (ffn) layer for both the encoder and decoder.

Self-attention allows all positions in a sequence to interact with each
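
The running-summary loop can be separated from the model call so it can be exercised without Ollama. This is a rough sketch under stated assumptions: `refine_summary`, `generate_fn`, and `instructions` are hypothetical names, and the stub below stands in for a real model. Unlike the cell above, which appends each response to `running_summary`, this variant replaces the summary with the model's latest output on every step, so the prompt does not grow with each chunk.

```python
def refine_summary(chunks, generate_fn, instructions):
    """Carry a summary across chunks, asking the model to update it each time.

    `generate_fn` is a hypothetical callable taking a prompt string and
    returning the model's text; it is an assumption of this sketch.
    """
    summary = ""
    for chunk in chunks:
        prompt = (
            f"{instructions} The text is as follows: {chunk}. "
            f"The current summary is as follows: {summary}"
        )
        # Replace (rather than append to) the summary with the updated version.
        summary = generate_fn(prompt).strip()
    return summary


# Stub model that records every prompt it receives, for illustration only.
seen_prompts = []

def stub_generate(prompt):
    seen_prompts.append(prompt)
    return f"summary v{len(seen_prompts)}"


final = refine_summary(["chunk A", "chunk B"], stub_generate, "Summarize.")
print(final)
```

With Ollama, `generate_fn` could be a thin wrapper such as `lambda p: generate(model="llama3.2:3b", prompt=p, options={"num_ctx": 8192})["response"]`, mirroring the call used in the cell above.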

---

# Single Document w/ Chunking -- Gemma 3

In [44]:
pull("gemma3:4b")

ProgressResponse(status='success', completed=None, total=None, digest=None)

In [45]:
chunked_pages = []

for i in range(math.ceil(len(text) / 5)):  # true division, so a trailing partial chunk is kept
    start_idx, end_idx = i*5, (i+1)*5
    chunk = text[start_idx:end_idx]
    chunked_pages.append("\n".join(chunk))

chunked_pages[0]

'Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence

In [63]:
response = {"response": ""}  # empty starting summary, so the first prompt does not reuse a stale response

for chunk in chunked_pages:
    per_page_summary_prompt = f"You are an assistant that is tasked with summarizing a set of documents that are given to you. The documents will be given in chunks, and you will be given the current summary. Write a summary using the information provided. Do not use bullet points or any other formatting. The summary should be at least 500 words long. The text is as follows: {chunk}. The current summary is as follows: {response['response']}"

    response = generate(
        model="gemma3:4b",
        prompt=per_page_summary_prompt,
        options={
            "num_ctx": 8192
        }
    )

    print(response["response"])

The Transformer, presented by Vaswani et al. (2017), represents a paradigm shift in sequence transduction, decisively abandoning the established recurrent and convolutional neural network paradigms. This meticulously detailed paper outlines the architecture and implementation of the Transformer, specifically engineered for machine translation, demonstrating a substantial leap in both translation quality and computational efficiency compared to prior state-of-the-art models. The core innovation lies in the utilization of scaled dot-product attention, multi-head attention, and positional encoding within a stacked encoder and decoder structure. The authors convincingly argue that this architecture elegantly overcomes the inherent limitations of traditional sequence transduction models, particularly concerning long-range dependencies and the challenges of parallelization. The Transformer achieves a new state-of-the-art BLEU score of 28.4 on the WMT 2014 English-to-German translation task, 

In [65]:
print(response["response"])

The Transformer architecture, as meticulously detailed in Vaswani et al.’s (2017) seminal paper, represents a revolutionary approach to sequence transduction, decisively moving away from the limitations of recurrent and convolutional neural networks. This thoroughly crafted research meticulously outlines the design and implementation of the Transformer, specifically engineered for machine translation, achieving unprecedented translation quality and computational efficiency—significantly surpassing existing state-of-the-art models. The core innovation rests on the utilization of scaled dot-product attention, multi-head attention, and positional encoding within a stacked encoder and decoder structure. The authors convincingly demonstrate that this architecture elegantly addresses the inherent challenges of long-range dependencies and facilitates parallel computation, a crucial factor in dramatically reducing training time. The Transformer achieves a new state-of-the-art BLEU score of 28.
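
Each prompt in this section packs five pages of text plus the previous summary into a `num_ctx` of 8192, and a prompt that exceeds the context window may be cut off. A quick sanity check before calling the model can flag chunks that are likely too large. The sketch below is only a heuristic: the four-characters-per-token ratio is an assumed rule of thumb for English text, not the tokenizer's real count, and `estimate_tokens`, `fits_context`, and `reserved_for_output` are hypothetical names.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; chars_per_token is an assumed rule of thumb."""
    return int(len(text) / chars_per_token) + 1


def fits_context(prompt, num_ctx, reserved_for_output=1024):
    """Return True if the estimated prompt size leaves room for the reply.

    `reserved_for_output` is a hypothetical budget for the generated summary.
    """
    return estimate_tokens(prompt) + reserved_for_output <= num_ctx


short_prompt = "word " * 1000    # 5,000 characters
long_prompt = "word " * 10000    # 50,000 characters
print(fits_context(short_prompt, num_ctx=8192))
print(fits_context(long_prompt, num_ctx=8192))
```

If a chunk fails the check, reducing the number of pages per chunk or raising `num_ctx` are the two obvious levers.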

---

# Multiple Documents

In [3]:
all_files_text = []

for fpath in glob.glob("../../sample_inputs/*.pdf"):
    with pymupdf.open(fpath) as file:
        file_text = [page.get_text() for page in file]
        all_files_text.append(file_text)

all_files_text

[['Deep Reinforcement Learning that Matters\nPeter Henderson1∗, Riashat Islam1,2∗, Philip Bachman2\nJoelle Pineau1, Doina Precup1, David Meger1\n1 McGill University, Montreal, Canada\n2 Microsoft Maluuba, Montreal, Canada\n{peter.henderson,riashat.islam}@mail.mcgill.ca, phbachma@microsoft.com\n{jpineau,dprecup}@cs.mcgill.ca, dmeger@cim.mcgill.ca\nAbstract\nIn recent years, signiﬁcant progress has been made in solving\nchallenging problems across various domains using deep re-\ninforcement learning (RL). Reproducing existing work and\naccurately judging the improvements offered by novel meth-\nods is vital to sustaining this progress. Unfortunately, repro-\nducing results for state-of-the-art deep RL methods is seldom\nstraightforward. In particular, non-determinism in standard\nbenchmark environments, combined with variance intrinsic\nto the methods, can make reported results tough to interpret.\nWithout signiﬁcance metrics and tighter standardization of\nexperimental reporting, it is 

In [4]:
pull("gemma3:4b")

ProgressResponse(status='success', completed=None, total=None, digest=None)

In [11]:
file_summaries = []

for idx, doc in enumerate(all_files_text):
    chunked_pages = []
    response = {"response": ""}

    for i in range(math.ceil(len(doc) / 5)):  # true division, so a trailing partial chunk is kept
        start_idx, end_idx = i*5, (i+1)*5
        chunk = doc[start_idx:end_idx]
        chunked_pages.append("\n".join(chunk))

    for chunk_idx, chunk in enumerate(chunked_pages):
        print(f"Now processing chunk {chunk_idx+1}/{len(chunked_pages)} of file {idx+1}/{len(all_files_text)}.")

        per_page_summary_prompt = f"You are an assistant that is tasked with summarizing a set of documents that are given to you. The documents will be given in chunks, and you will be given the current summary. Write a summary using the information provided. Do not reference the summary itself in your response. Do not use bullet points or any other formatting. The summary should be at least 500 words long. The text is as follows: {chunk}. The current summary is as follows: {response['response']}"

        response = generate(
            model="gemma3:4b",
            prompt=per_page_summary_prompt,
            options={
                "num_ctx": 16384
            }
        )

    file_summaries.append(response["response"])

Now processing chunk 1/5 of file 1/2.
Now processing chunk 2/5 of file 1/2.
Now processing chunk 3/5 of file 1/2.
Now processing chunk 4/5 of file 1/2.
Now processing chunk 5/5 of file 1/2.
Now processing chunk 1/3 of file 2/2.
Now processing chunk 2/3 of file 2/2.
Now processing chunk 3/3 of file 2/2.


In [27]:
print(file_summaries[0])

Deep reinforcement learning has seen tremendous advancements recently, yet a persistent challenge lies in the often-disparate and inconsistent results reported across different experiments and implementations. Henderson et al.’s insightful paper, “Deep Reinforcement Learning that Matters – A Summary,” delivers a critical examination of this issue, arguing that a lack of standardized methodologies and meticulous reporting has significantly inflated performance claims within the DRL community. The core of the paper’s argument rests on the systematic analysis of benchmark environments – HalfCheetah-v1 and Hopper-v1 – commonly used in continuous control tasks, highlighting the sensitivity of algorithms like TRPO, DDPG, and PPO to subtle variations in network architecture and activation functions. 

The paper’s strength lies in its deliberate control over experimental parameters. Henderson et al. recognized that even seemingly minor adjustments, such as the number of layers, neurons, or act

In [40]:
all_file_summaries = '\n'.join(file_summaries)
final_summary_prompt = f"You are an assistant that is trying to summarize {len(file_summaries)} texts. Combine these texts into one overall summary. Do not use bullet points or any other formatting. The summary should be at least 500 words long. Here are the summaries: {all_file_summaries}"

final_summary_generation = generate(
    model="gemma3:4b",
    prompt=final_summary_prompt,
    options={
        "num_ctx": 8192
    }
)
final_summary = final_summary_generation["response"]

In [41]:
print(final_summary)

The burgeoning field of deep reinforcement learning (DRL) is currently grappling with a significant challenge: the inconsistent and often inflated reporting of performance results. Recent advancements in DRL, while impressive, are frequently hampered by a lack of standardized methodologies and meticulous reporting, a problem highlighted in Henderson et al.’s insightful paper, “Deep Reinforcement Learning that Matters – A Summary.” This paper meticulously analyzes benchmark environments – specifically HalfCheetah-v1 and Hopper-v1 – commonly used in continuous control tasks, demonstrating the remarkable sensitivity of algorithms like TRPO, DDPG, and PPO to even subtle variations in network architecture, activation functions (such as tanh, ReLU, and leaky ReLU), and experimental parameters. The authors’ strategic control over these variables, coupled with the consistent reporting of standard deviations alongside average returns, represents a critical shift toward a more realistic assessme
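
When the per-file summaries are joined with a bare newline, the model has no explicit marker for where one document ends and the next begins. One possible refinement, sketched here under assumptions, is to label each summary with its file name before combining. The function name `build_combined_prompt`, the `min_words` parameter, and the demo paths are hypothetical; the instruction wording is taken from the prompt used above.

```python
from pathlib import Path


def build_combined_prompt(summaries_by_path, min_words=500):
    """Build the final prompt with each summary labeled by its file name.

    `summaries_by_path` maps a file path to that file's summary; the name
    and structure are assumptions made for this sketch.
    """
    sections = [
        f"Summary of {Path(path).name}:\n{summary.strip()}"
        for path, summary in summaries_by_path.items()
    ]
    joined = "\n\n".join(sections)
    return (
        f"You are an assistant that is trying to summarize {len(sections)} texts. "
        "Combine these texts into one overall summary. "
        "Do not use bullet points or any other formatting. "
        f"The summary should be at least {min_words} words long. "
        f"Here are the summaries:\n\n{joined}"
    )


# Hypothetical paths and summaries, for illustration only.
demo = {
    "docs/first.pdf": "First summary.",
    "docs/second.pdf": "Second summary. ",
}
demo_prompt = build_combined_prompt(demo)
print(demo_prompt)
```

In the notebook, the dictionary could be built by pairing the paths returned by `glob.glob` with the entries of `file_summaries`, since both are produced in the same order.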