Add API for search functionality #231

rvanlaak · 2013-10-29T09:07:53Z

The output that pdf2htmlEX produces is great, and is well suited to replacing Acrobat Reader. One feature that still makes Acrobat preferable to the browser output is the ability to search in the document.

Feature request: add a search API to the library, so that text searches can be performed in the document.

Features of the API could be:

search (iterate through results / search direction)
search & replace
case (in)sensitive search
regular expression search
mark a selection
search in PDF bookmarks
add bookmarks to search results
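
To make the first few items concrete, here is a rough sketch of what such an API could look like. It is not part of pdf2htmlEX: every name in it (`SearchOptions`, `SearchHit`, `search`, `nextHit`) is hypothetical, and it assumes the text of each page is already available as a plain string.

```typescript
// Hypothetical API sketch: none of these names exist in pdf2htmlEX.
interface SearchOptions {
  caseSensitive?: boolean; // default: case-insensitive
  regex?: boolean;         // treat the query as a regular expression
}

interface SearchHit {
  page: number;  // zero-based page index
  start: number; // offset of the match within the page text
  end: number;   // offset just past the match
}

// Escape regex metacharacters so a plain query is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Collect every match of the query across all pages, in document order.
function search(pages: string[], query: string, opts: SearchOptions = {}): SearchHit[] {
  const source = opts.regex ? query : escapeRegExp(query);
  const flags = opts.caseSensitive ? "g" : "gi";
  const hits: SearchHit[] = [];
  pages.forEach((text, page) => {
    const re = new RegExp(source, flags);
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      if (m[0].length === 0) { re.lastIndex++; continue; } // skip empty matches
      hits.push({ page, start: m.index, end: m.index + m[0].length });
    }
  });
  return hits;
}

// Step to the next or previous hit, wrapping around at either end.
// Pass current = -1 when no hit is selected yet.
function nextHit(hits: SearchHit[], current: number, direction: 1 | -1): number {
  if (hits.length === 0) return -1;
  if (current < 0) return direction === 1 ? 0 : hits.length - 1;
  return (current + direction + hits.length) % hits.length;
}
```

A GUI could call `search` once per query and then drive "find next" and "find previous" with `nextHit`. Marking a selection would additionally need a mapping from text offsets back to DOM nodes, which this sketch leaves out.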

When this API works, a next step could be to implement a GUI that makes use of it. I will open another issue for that.

coolwanglu · 2013-10-29T09:28:31Z

Replace is not possible, at least for now. I don't think it's even supported by PDF readers. I also have doubts about adding bookmarks.

Searching in PDF bookmarks also sounds like a rare use case to me.

I'm not sure if innerText or :contains is enough for these features: see http://stackoverflow.com/questions/12445020/javascript-window-find-doesnt-work-absolutely

But indeed there is a problem when lazy loading is enabled: pages are not loaded until viewed, so we need to load them before searching for any text.

iapain · 2013-12-26T23:51:26Z

A possible solution would be either to search the text nodes in the DOM and highlight the matches, or to generate an inverted index to search against (using https://github.com/fagbokforlaget/pdfiijs, or pdftotext with its output fed into an indexing system).
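
For the inverted-index option, the idea can be sketched as follows. This is only an illustration and not how pdfiijs works internally. The names `buildIndex` and `lookup` are made up, and the per-page text is assumed to come from something like pdftotext.

```typescript
// Illustrative sketch: maps each lowercase token to the pages containing it.
type InvertedIndex = { [token: string]: number[] };

function tokenize(text: string): string[] {
  // ASCII-only tokenizer for brevity; real documents need Unicode-aware splitting.
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

function buildIndex(pages: string[]): InvertedIndex {
  // No prototype, so tokens such as "constructor" cannot collide with built-ins.
  const index: InvertedIndex = Object.create(null);
  pages.forEach((text, page) => {
    for (const token of tokenize(text)) {
      const postings = index[token] || (index[token] = []);
      // Pages are visited in order, so checking the last entry avoids duplicates.
      if (postings[postings.length - 1] !== page) postings.push(page);
    }
  });
  return index;
}

// Returns the pages that contain every token of the query (AND semantics).
function lookup(index: InvertedIndex, query: string): number[] {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  let result = index[tokens[0]] || [];
  for (const token of tokens.slice(1)) {
    const postings = index[token] || [];
    result = result.filter((p) => postings.indexOf(p) !== -1);
  }
  return result;
}
```

An index like this only tells you which pages to look at. Highlighting the match would still require a pass over the text nodes of those pages.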

rvanlaak · 2013-12-29T13:20:55Z

@iapain the library you're proposing sounds great, especially since I've got both a PDF file and its pdftotext output. Does snowball-js support the following use case?

My use-case is that I've got fragments from the pdftotext, that I would like to show/mark in the original PDF with its original markup. It would be awesome if I can use pdf2htmlEX in order to preserve the markup from the PDF.
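
To illustrate what I mean, here is a minimal sketch of locating a pdftotext fragment in the text of a page while ignoring whitespace differences. `findFragment` is a hypothetical helper, not something pdf2htmlEX provides, and it assumes the page text can be read back from the HTML as a plain string with the same characters as the pdftotext output, which may not hold for every document.

```typescript
// Escape regex metacharacters so each word is matched literally.
function escapeForRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: finds the first occurrence of `fragment` in `pageText`,
// treating any run of whitespace in either string as equivalent.
function findFragment(
  pageText: string,
  fragment: string
): { start: number; end: number } | null {
  const words = fragment.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return null;
  // Words must appear in order, separated by at least one whitespace character.
  const re = new RegExp(words.map((w) => escapeForRegExp(w)).join("\\s+"));
  const m = re.exec(pageText);
  return m ? { start: m.index, end: m.index + m[0].length } : null;
}
```

The returned offsets refer to the page text, so marking the fragment in the rendered page would still need those offsets mapped back to the elements that hold the text.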

rvanlaak · 2014-11-25T14:15:13Z

I've been digging through the changelog, release notes and blogspot posts, and found that it is possible to search the output and to compare the HTML, diff-style.

Could you elaborate on those features? I could not find any documentation about them.

rvanlaak mentioned this issue Oct 29, 2013

Add GUI to use search API #232

Closed
