run_pageindex.py doesn't return the text of the content and has wrong indices

I followed the README and ran:

```sh
$ python3 run_pageindex.py --pdf_path /path/to/your/document.pdf
```

On my document, however, `run_pageindex.py` returns only summaries, not the text of the content:
```
{
'doc_name': 'referee_guidelines_arm.pdf',
 'structure': [{'title': 'Referee Guidelines – Football Tournament',
   'start_index': 1,
   'end_index': 1,
   'nodes': [{'title': 'Match Duration',
     'start_index': 1,
     'end_index': 1,
     'node_id': '0001',
     'summary': 'The partial document outlines referee guidelines for the Football Tournament. It covers match duration (20 minutes with a running clock), procedures for starting the match (coin flip and kick-off), restarts after goals, and rules for throw-ins/outs (played from the ground, no direct goals). Discipline rules include no yellow cards, temporary time-outs for unsporting behavior, and expulsion for serious misconduct. In case of a draw, a penalty shootout is conducted. The document emphasizes fair play, safety, respect, and the finality of referee decisions.'},]
#...
}
```
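To make the missing field easy to check, here is a minimal sketch that walks the returned tree and prints the keys of each node. The `result` dict is a trimmed copy of the output above (summary shortened), and `text` is only a hypothetical name for the field I expected; `iter_nodes` is my own helper, not part of the repo.

```python
from typing import Any, Dict, Iterator, List


def iter_nodes(structure: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every node in the tree depth-first, parents before children."""
    for node in structure:
        yield node
        yield from iter_nodes(node.get("nodes", []))


# Trimmed copy of the output above (summary shortened for brevity).
result = {
    "doc_name": "referee_guidelines_arm.pdf",
    "structure": [{
        "title": "Referee Guidelines – Football Tournament",
        "start_index": 1,
        "end_index": 1,
        "nodes": [{
            "title": "Match Duration",
            "start_index": 1,
            "end_index": 1,
            "node_id": "0001",
            "summary": "The partial document outlines referee guidelines ...",
        }],
    }],
}

for node in iter_nodes(result["structure"]):
    # "text" is a hypothetical key name; the output above has no such field.
    print(node["title"], "| has text:", "text" in node, "| keys:", sorted(node))
```

On the output I got, no node carries anything beyond `title`, the two indices, `node_id`, `summary`, and `nodes`.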

Also, `start_index` and `end_index` both being 1 looks wrong, doesn't it?
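For reference, this is roughly what I expected to be able to do with the indices. It is only a sketch under an assumption I have not confirmed: that `start_index` and `end_index` are 1-based, inclusive page numbers. `attach_text` and the `pages` list are hypothetical; in practice the page texts would come from whatever PDF text extractor you use.

```python
from typing import Any, Dict, List


def attach_text(nodes: List[Dict[str, Any]], pages: List[str]) -> None:
    """Add a "text" key to every node in place, built from its page range.

    Assumes start_index/end_index are 1-based, inclusive page numbers.
    """
    for node in nodes:
        start, end = node["start_index"], node["end_index"]
        # pages is 0-based, so page N lives at pages[N - 1].
        node["text"] = "\n".join(pages[start - 1:end])
        attach_text(node.get("nodes", []), pages)


# Hypothetical page texts for a one-page document.
pages = ["Referee Guidelines\nMatch Duration: 20 minutes, running clock."]
structure = [{
    "title": "Referee Guidelines – Football Tournament",
    "start_index": 1,
    "end_index": 1,
    "nodes": [{"title": "Match Duration", "start_index": 1, "end_index": 1}],
}]

attach_text(structure, pages)
print(structure[0]["nodes"][0]["text"])
```

If that assumption holds, every node of a one-page PDF would get `start_index == end_index == 1`, and all nodes on the same page would receive the same page text, so the indices alone cannot isolate a section's own text.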


run_pageindex.py doesn't return the text of the content and has wrong indices #42
