This tutorial shows how to use the `grobidmonkey` package to read papers that have been processed by GROBID.

In [24]:
# import grobid reader
from grobidmonkey import reader

# select one of the three methods: monkey, lxml, x2d
monkeyReader = reader.MonkeyReader('monkey') # or 'lxml' or 'x2d'

# read paper outline
outline = monkeyReader.readOutline('resources/2308.13067.pdf.tei.xml', True)

Article
├── 1 Introduction
├── 2 Informal Summary of the Main Idea of the Paper
├── 3 Formalizing "Correlations of Causal Facts"
├── 4 Testing for Causal Knowledge in Large Language Models
│   ├── 4.1 Methods and Results for "Common Sense" Inference Tasks
│   ├── 4.2 Methods and Results for Causal Discovery on Ground Truth
│   └── 4.3 Method and Results for Knowledge Base Fact Embeddings
├── 5 Related Work
└── 6 Conclusive Discussion


Passing `True` as the second argument prints the outline while reading. You can also omit it and print the outline yourself:

In [4]:
outline = monkeyReader.readOutline('resources/2308.13067.pdf.tei.xml')

# outline is an anytree object; to print it, iterate over its rows
for pre, fill, node in outline:
    print(f"{pre}{node.name}")

Article
├── 1 Introduction
├── 2 Informal Summary of the Main Idea of the Paper
├── 3 Formalizing "Correlations of Causal Facts"
├── 4 Testing for Causal Knowledge in Large Language Models
│   ├── 4.1 Methods and Results for "Common Sense" Inference Tasks
│   ├── 4.2 Methods and Results for Causal Discovery on Ground Truth
│   └── 4.3 Method and Results for Knowledge Base Fact Embeddings
├── 5 Related Work
└── 6 Conclusive Discussion

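The rows printed above each consist of a prefix (`pre`) followed by a node name. The sketch below is a standard-library illustration of how such prefixes can be composed for a nested outline. It is not how `grobidmonkey` or anytree implement rendering; the `Section` class and `render` function are hypothetical names introduced only for this example, and the section titles are an abbreviated copy of the outline above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """Hypothetical stand-in for an outline node (not an anytree class)."""
    name: str
    children: List["Section"] = field(default_factory=list)


def render(node, prefix="", is_last=True, is_root=True):
    """Return one display row per node, using box-drawing prefixes."""
    rows = []
    if is_root:
        rows.append(node.name)
        child_prefix = ""
    else:
        # The last child gets a corner; earlier siblings get a tee.
        rows.append(prefix + ("└── " if is_last else "├── ") + node.name)
        # Descendants of a non-last child keep a vertical bar in their prefix.
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(node.children):
        last = i == len(node.children) - 1
        rows.extend(render(child, child_prefix, last, False))
    return rows


# Abbreviated, hand-typed copy of the outline shown above (assumption).
outline = Section("Article", [
    Section("1 Introduction"),
    Section("4 Testing for Causal Knowledge in Large Language Models", [
        Section("4.2 Methods and Results for Causal Discovery on Ground Truth"),
        Section("4.3 Method and Results for Knowledge Base Fact Embeddings"),
    ]),
    Section("6 Conclusive Discussion"),
])

for row in render(outline):
    print(row)
```

Returning the rows as a list, rather than printing inside the recursion, keeps the rendering separate from the output so the rows can be filtered or written to a file.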

The grobidmonkey reader can also read the entire essay as a dictionary: each key is a section title, and each value is the list of that section's paragraphs.

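To make the shape of that dictionary concrete, here is a minimal sketch that builds a title-to-paragraphs mapping from a small TEI fragment using only the standard library. It is not `grobidmonkey`'s implementation. `SAMPLE_TEI` is a hand-written fragment in the style of GROBID output, and `sections_to_dict` is a hypothetical helper. Real GROBID files contain far more markup than this.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by GROBID output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written fragment for illustration only (assumption, not real data).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head n="1">Introduction</head>
        <p>First paragraph.</p>
        <p>Second paragraph.</p>
      </div>
      <div><head n="2">Related Work</head>
        <p>Only paragraph.</p>
      </div>
    </body>
  </text>
</TEI>"""


def sections_to_dict(tei_xml):
    """Map each section title to the list of its paragraph texts."""
    root = ET.fromstring(tei_xml)
    essay = {}
    for div in root.iterfind(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        title = head.text.strip() if head is not None and head.text else "Untitled"
        # itertext() also collects text nested in inline elements such as <ref>.
        paragraphs = ["".join(p.itertext()).strip()
                      for p in div.iterfind("tei:p", TEI_NS)]
        essay.setdefault(title, []).extend(paragraphs)
    return essay


essay_sketch = sections_to_dict(SAMPLE_TEI)
for title, paragraphs in essay_sketch.items():
    print(title, len(paragraphs))
```

The resulting dictionary has the same shape as the one described above, so the printing loop in the next cell would work on it unchanged.
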
In [26]:
essay = monkeyReader.readEssay('resources/2308.13067.pdf.tei.xml')

for key, value in essay.items():
    print(key)
    for paragraph in value:
        print(' * ' + paragraph[:20] + '...')
    print('-----')

Abstract
 * Some argue scale is ...
-----
Introduction
 * Speaking of causalit...
 * The following block ...
 * With the rise of lar...
 * It is clear how reso...
 * Therefore, in this p...
 * We identify the key ...
 * For reproduction pur...
-----
Informal Summary of the Main Idea of the Paper
 * LLMs are transformer...
 * illustrate this idea...
 * While the philosophi...
-----
Formalizing "Correlations of Causal Facts"
 * "Correlation does no...
 * Or does it? In this ...
 * Namely, what happens...
 * We start by providin...
 * Definition 1. A simp...
 * where I, J are disjo...
 * The graph of a simpl...
 * Example 1 ('Classica...
 * 4 Or rather, it is a...
 * 5 SCMs that allow fo...
 * 7 For examples later...
 * 8 See Def.F.1 of (Bo...
 * X, Y, Z, we can appr...
 * M 1 : =({X, Y, Z}, 3...
 * The structural equat...
 * 2 : =({X, Y }, 3, R ...
 * This second SCM M 2 ...
 * Example 1 serves to ...
 * Insight 1. Let M be ...
 * Returning to Example...
 * f ′ 1 , f ′ 2 , f ′ ...
 * all