PDF text extraction is missing pages #164

1jamesthompson1 · 2024-05-23T07:52:33Z

Problem

Currently the PDFParser parses the PDFs into text. The problem is that some pages are missing from the extracted text.

This affects section extraction for #146, for two reasons:

Some of the sections are missed out because they are not in the txt file.
The section before the missing page will capture text up to the next higher-level section, because it can't find the end of its own section (it finds the end of its section by looking for the start of the next section).
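To make both failure modes concrete, here is a minimal sketch of that kind of "end at the next heading" lookup. It is not the actual PDFParser or section extraction code: the function names, the numbered-heading format, and the sample pages are all assumptions made for illustration.

```python
import re
from typing import List, Optional


def next_section_numbers(section: str) -> List[str]:
    """Hypothetical helper: headings that could end `section`, nearest first.

    For "1.1" this returns ["1.2", "2"]: the next sibling, then the next
    higher-level section.
    """
    parts = [int(p) for p in section.split(".")]
    candidates = []
    while parts:
        parts[-1] += 1
        candidates.append(".".join(str(p) for p in parts))
        parts.pop()
    return candidates


def extract_section(text: str, section: str) -> Optional[str]:
    """Return the text of `section`, or None if its heading is not in `text`."""
    start = re.search(rf"^{re.escape(section)}\s", text, flags=re.MULTILINE)
    if start is None:
        return None  # failure 1: the heading was on a page that was dropped
    rest = text[start.end():]
    for candidate in next_section_numbers(section):
        end = re.search(rf"^{re.escape(candidate)}\s", rest, flags=re.MULTILINE)
        if end:
            return text[start.start():start.end() + end.start()].strip()
    return text[start.start():].strip()


# Assumed sample document: three pages, the middle one holds the 1.2 heading.
pages = [
    "1 Intro\nintro text\n1.1 Background\nbg text\n",
    "1.2 Method\nmethod text\n",
    "method continued\n2 Analysis\nanalysis text\n",
]
full_text = "".join(pages)
text_with_gap = pages[0] + pages[2]  # simulate the middle page being missed

complete = extract_section(full_text, "1.1")
overshoot = extract_section(text_with_gap, "1.1")
missing = extract_section(text_with_gap, "1.2")
```

With the full text, section 1.1 stops at the 1.2 heading. With the page dropped, section 1.2 is not found at all, and section 1.1 runs on to the "2 Analysis" heading, picking up the tail of 1.2 along the way.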

1jamesthompson1 added the bug and Engine labels May 23, 2024

1jamesthompson1 changed the title ~~PDF text extraction is missing pagesf~~ PDF text extraction is missing pages May 23, 2024