I followed the README and ran:
$ python3 run_pageindex.py --pdf_path /path/to/your/document.pdf
on my document. However, run_pageindex.py doesn't return the text of the content, only summaries:
{
'doc_name': 'referee_guidelines_arm.pdf',
'structure': [{'title': 'Referee Guidelines – Football Tournament',
'start_index': 1,
'end_index': 1,
'nodes': [{'title': 'Match Duration',
'start_index': 1,
'end_index': 1,
'node_id': '0001',
'summary': 'The partial document outlines referee guidelines for the Football Tournament. It covers match duration (20 minutes with a running clock), procedures for starting the match (coin flip and kick-off), restarts after goals, and rules for throw-ins/outs (played from the ground, no direct goals). Discipline rules include no yellow cards, temporary time-outs for unsporting behavior, and expulsion for serious misconduct. In case of a draw, a penalty shootout is conducted. The document emphasizes fair play, safety, respect, and the finality of referee decisions.'},]
#...
}
Also, start_index and end_index are both 1 for these nodes, which seems wrong, doesn't it?
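To make the expected result concrete, here is a minimal sketch of attaching page text to each node of the structure above. It is not part of run_pageindex.py. It assumes that start_index and end_index are 1-based, inclusive page numbers (the output suggests this, but it is unconfirmed), and that the per-page text has already been extracted with some PDF library. The function name attach_page_text, the 'text' key, and the sample page string are all hypothetical.

```python
from typing import Any


def attach_page_text(nodes: list[dict[str, Any]], pages: list[str]) -> None:
    """Add a 'text' field to every node, in place, by joining its page range.

    Assumption: start_index/end_index are 1-based, inclusive page numbers.
    """
    for node in nodes:
        start = node.get("start_index")
        end = node.get("end_index")
        # Only attach text when the node carries a valid page range.
        if isinstance(start, int) and isinstance(end, int) and 1 <= start <= end:
            # Convert the 1-based inclusive range to a Python slice.
            node["text"] = "\n".join(pages[start - 1:end])
        # Recurse into child nodes, if there are any.
        attach_page_text(node.get("nodes", []), pages)


# Hypothetical stand-in for the text of each PDF page, in page order.
pages = ["Page one text about match duration."]

# Trimmed copy of the structure shown in the output above.
structure = [{
    "title": "Referee Guidelines – Football Tournament",
    "start_index": 1,
    "end_index": 1,
    "nodes": [{
        "title": "Match Duration",
        "start_index": 1,
        "end_index": 1,
        "node_id": "0001",
    }],
}]

attach_page_text(structure, pages)
print(structure[0]["nodes"][0]["text"])
```

With only page-level indices, every node that falls on the same page gets the same text, so this only narrows content down to pages, not to individual sections.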