# Testing Notebook for the Different Parts of the App

## 1. PDF Extractor and Test

In this part we created the function that hosts the PDF extraction step. We tested it with `unittest` and, once it passed, moved the full function into `PdfExtractor.py` and added the test to `test.py`.

In [1]:
from langchain_community.document_loaders import PyPDFLoader

In [2]:
extractor = PyPDFLoader(file_path="2501.00663v1.pdf")
# lazy_load() returns a lazy iterator of Document objects; materialize it into a list
docs_list = list(extractor.lazy_load())

In [3]:
for doc in docs_list[:2]:
    print(type(doc))


<class 'langchain_core.documents.base.Document'>
<class 'langchain_core.documents.base.Document'>


In [4]:
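from typing import List, Type
from langchain_core.documents.base import Document


def pdf_extractor(path_pdf: str, extractor: Type[PyPDFLoader]) -> List[Document]:
    """Load a PDF with the given loader class and return its Document objects."""
    # `extractor` is the loader class itself (not an instance), so it is instantiated here
    loader = extractor(file_path=path_pdf)
    return loader.load()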


In [5]:
docs = pdf_extractor(path_pdf="2501.00663v1.pdf", extractor=PyPDFLoader)
for doc in docs[:1]:
    print(doc)


page_content='Titans: Learning to Memorize at Test Time
Ali Behrouz
†
, Peilin Zhong
†
, and Vahab Mirrokni
†
†
Google Research
{alibehrouz, peilinz, mirrokni}@google.com
Abstract
Over more than a decade there has been an extensive research effort of how effectively utilize recurrent models and
attentions. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows
attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling
of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new
neural long-term memory module that learns to memorize historical context and helps an attention to attend to the
current context while utilizing long past information. We show that this neural memory has the advantage of a fast
parallelizable training while maintaining a fast inference. From a memory perspective, we argue that attention due 

In [6]:
import unittest
from unittest.mock import MagicMock

class TestPDFExtractor(unittest.TestCase):
    def test_pdf_extractor(self):
        # Arrange
        mock_path = "sample.pdf"
        mock_content = "This is a test document."
        
        # Create a mock PyPDFLoader
        mock_loader = MagicMock(spec=PyPDFLoader)
        mock_loader_instance = MagicMock()
        mock_loader.return_value = mock_loader_instance
        
        # Mock the loader's load method
        mock_doc = Document(page_content=mock_content)
        mock_loader_instance.load.return_value = [mock_doc]
        
        # Act
        result = pdf_extractor(path_pdf=mock_path, extractor=mock_loader)
        
        # Assert
        self.assertIsInstance(result, list, "Result should be a list.")
        self.assertGreater(len(result), 0, "Result list should not be empty.")
        self.assertIsInstance(result[0], Document, "Result should contain Document objects.")
        self.assertEqual(result[0].page_content, mock_content, "Document content should match expected content.")

In [7]:
unittest.TextTestRunner().run(unittest.TestLoader().loadTestsFromTestCase(TestPDFExtractor))

.
----------------------------------------------------------------------
Ran 1 test in 0.003s

OK


<unittest.runner.TextTestResult run=1 errors=0 failures=0>
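
The test above covers the happy path. As a complement, the sketch below checks two more behaviours: that the loader class is constructed with the given path, and that an error raised by the loader reaches the caller. It is a minimal sketch, not part of `test.py`: `pdf_extractor` is repeated so the cell runs on its own, a bare `MagicMock` stands in for `PyPDFLoader`, and the test class name and the choice of `FileNotFoundError` are assumptions made for illustration.

```python
import unittest
from unittest.mock import MagicMock


def pdf_extractor(path_pdf, extractor):
    # Same body as the notebook function, repeated so this sketch is self-contained.
    loader = extractor(file_path=path_pdf)
    return loader.load()


class TestPDFExtractorEdgeCases(unittest.TestCase):
    def test_loader_receives_path(self):
        # Hypothetical stand-in for the PyPDFLoader class
        loader_cls = MagicMock()
        loader_cls.return_value.load.return_value = []

        result = pdf_extractor(path_pdf="sample.pdf", extractor=loader_cls)

        # The class is instantiated once with the path, and load() is called once
        loader_cls.assert_called_once_with(file_path="sample.pdf")
        loader_cls.return_value.load.assert_called_once_with()
        self.assertEqual(result, [])

    def test_loader_error_propagates(self):
        loader_cls = MagicMock()
        # Assumed failure mode: the loader raises when the file is missing
        loader_cls.return_value.load.side_effect = FileNotFoundError("missing.pdf")

        with self.assertRaises(FileNotFoundError):
            pdf_extractor(path_pdf="missing.pdf", extractor=loader_cls)


suite = unittest.TestLoader().loadTestsFromTestCase(TestPDFExtractorEdgeCases)
outcome = unittest.TextTestRunner().run(suite)
```

Because `pdf_extractor` does not catch exceptions, the second test documents that callers are expected to handle loader failures themselves.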

## 2. LLM Model and Test

In [2]:
import getpass
import os


def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")


_set_env("OPENAI_API_KEY")

In [3]:
from src.LlmModel import State, process_pdf, extract_information
from src.PydanticSchema import BigQueryEntry

In [4]:
pdf_path = "2501.00663v1.pdf"

In [5]:
initial_state = State(
    pdf_text="",
    extracted_info=None,
    error=None,
)

In [6]:
text_state = process_pdf(state=initial_state, pdf_path=pdf_path)

In [7]:
extracted = extract_information(state=text_state)

In [8]:
extracted["extracted_info"]

{'document_id': 'arXiv_2501.00663v1',
 'title': 'Titans: Learning to Memorize at Test Time',
 'publication_date': '2024-12-31',
 'authors': ['Ali Behrouz', 'Peilin Zhong', 'Vahab Mirrokni'],
 'Key_words': ['Titans',
  'neural memory',
  'long-term memory',
  'attention',
  'language modeling',
  'sequence modeling',
  'test-time learning'],
 'key_points': ['Introduces a new neural long-term memory module that learns to memorize historical context and supports attention mechanisms.',
  'Proposes Titans, a family of architectures combining short-term attention and long-term neural memory.',
  "Demonstrates Titans' scalability to context windows larger than 2 million tokens.",
  'Experimental results show Titans outperform Transformers and modern linear recurrent models in language modeling, reasoning, genomics, and time series tasks.',
  'Highlights the importance of interconnected memory modules inspired by human memory systems.'],
 'summary': 'The paper presents Titans, a novel family 
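
One way to turn the manual inspection above into an automated check is to assert that the extracted dictionary contains the keys seen in the output, with the expected types. The sketch below uses only the standard library and does not reproduce `BigQueryEntry`. The helper name `check_extracted_info` and the `OBSERVED_KEYS` table are hypothetical, and only the keys visible in the truncated output are checked, since any further fields are unknown.

```python
from typing import Any, Dict, List

# Keys visible in the (truncated) output above; any further fields are unknown.
OBSERVED_KEYS = {
    "document_id": str,
    "title": str,
    "publication_date": str,
    "authors": list,
    "Key_words": list,
    "key_points": list,
    "summary": str,
}


def check_extracted_info(info: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every observed key looks right."""
    problems = []
    for key, expected_type in OBSERVED_KEYS.items():
        if key not in info:
            problems.append(f"missing key: {key}")
        elif not isinstance(info[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems


# Sample values copied from the output above, shortened to a few entries.
sample = {
    "document_id": "arXiv_2501.00663v1",
    "title": "Titans: Learning to Memorize at Test Time",
    "publication_date": "2024-12-31",
    "authors": ["Ali Behrouz", "Peilin Zhong", "Vahab Mirrokni"],
    "Key_words": ["Titans", "neural memory"],
    "key_points": [
        "Proposes Titans, a family of architectures combining short-term "
        "attention and long-term neural memory."
    ],
    "summary": "...",  # placeholder: the summary is truncated in the output above
}
print(check_extracted_info(sample))
```

In the notebook, the same helper could be called on `extracted["extracted_info"]` inside a `unittest` test case, assuming that value is a plain dictionary as the output suggests.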

## 3. Graph Model and Test

In [1]:
import getpass
import os


def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")


_set_env("OPENAI_API_KEY")

OPENAI_API_KEY:  ········


In [2]:
from src.GraphModel import workflow_run, create_extraction_pdf_graph
from src.LlmModel import process_pdf, extract_information, State

In [3]:
pdf_path = "2501.00663v1.pdf"

In [4]:
workflow_run(pdf_path=pdf_path)

[-1:checkpoint] State at the end of step -1:
{}
[0:tasks] Starting 1 task for step 0:
- process_pdf -> {'pdf_path': '2501.00663v1.pdf',
 'state': {'error': None, 'extracted_info': None, 'pdf_text': ''}}
[0:writes] Finished step 0 with writes to 1 channel:
- process_pdf -> {'state': {'error': None,
           'extracted_info': None,
           'pdf_text': 'Titans: Learning to Memorize at Test Time\n'
                       'Ali Behrouz\n'
                       '†\n'
                       ', Peilin Zhong\n'
                       '†\n'
                       ', and Vahab Mirrokni\n'
                       '†\n'
                       '†\n'
                       'Google Research\n'
                       '{alibehrouz, peilinz, mirrokni}@google.com\n'
                       'Abstract\n'
                       'Over more than a decade there has been an extensive '
                

{'status': 'success',
 'extracted_info': {'document_id': 'arXiv_2501.00663v1',
  'title': 'Titans: Learning to Memorize at Test Time',
  'publication_date': '2024-12-31',
  'authors': ['Ali Behrouz', 'Peilin Zhong', 'Vahab Mirrokni'],
  'Key_words': ['Titans',
   'neural memory',
   'long-term memory',
   'attention',
   'recurrent models',
   'language modeling',
   'test-time learning'],
  'key_points': ['Introduces a new neural long-term memory module that learns to memorize historical context.',
   'Proposes Titans, a family of architectures combining short-term attention and long-term memory.',
   "Demonstrates Titans' scalability to context windows larger than 2M tokens.",
   'Shows Titans outperform Transformers and modern linear recurrent models in various tasks.',
   'Presents three variants of Titans: Memory as a Context (MAC), Memory as Gating (MAG), and Memory as a Layer (MAL).'],
  'summary': 'The paper introduces Titans, a novel family of architectures that integrate a ne
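
The log and result above suggest a two-step flow: `process_pdf` fills `pdf_text`, then an extraction step fills `extracted_info`, and the run returns a dictionary with a `status`. The sketch below mirrors that flow in plain Python so it can be tested without an API key. It is not the implementation in `src.GraphModel`. The function `run_pipeline`, its `load_text` and `extract` parameters, and the `"error"` result shape are assumptions; only the three state fields and the `'status': 'success'` shape come from the output.

```python
from typing import Any, Callable, Dict, Optional, TypedDict


class State(TypedDict):
    # Field names taken from the State(...) call and the debug log above;
    # that the real State is a TypedDict is an assumption.
    pdf_text: str
    extracted_info: Optional[Dict[str, Any]]
    error: Optional[str]


def run_pipeline(
    pdf_path: str,
    load_text: Callable[[str], str],
    extract: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical stand-in for workflow_run: load the text, then extract fields."""
    state: State = {"pdf_text": "", "extracted_info": None, "error": None}
    try:
        state["pdf_text"] = load_text(pdf_path)
        state["extracted_info"] = extract(state["pdf_text"])
    except Exception as exc:  # assumed error handling; not shown in the log
        state["error"] = str(exc)
        return {"status": "error", "error": state["error"]}
    return {"status": "success", "extracted_info": state["extracted_info"]}


# Stub callables replace the PDF loader and the LLM call.
result = run_pipeline(
    "2501.00663v1.pdf",
    load_text=lambda path: "Titans: Learning to Memorize at Test Time",
    extract=lambda text: {"title": text},
)
print(result["status"])
```

Passing the two steps in as callables keeps the control flow testable on its own; a test for the real graph could take the same approach by mocking the loader and the model call.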