This repository has been archived by the owner on Dec 14, 2021. It is now read-only.

Pdf text extraction #11

Closed
hasantayyar opened this issue Jun 10, 2014 · 1 comment

Comments

@hasantayyar

Extract the full text of a PDF for the search engine.

@hmert
Member

hmert commented Jun 11, 2014

I'd try these tools:
http://sourceforge.net/projects/pdf2xml/
http://www.foolabs.com/xpdf/home.html
http://sourceforge.net/projects/pdfreflow/

Most of them are CLI tools, so do we need a worker for submitting the full contents to the search engine?
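
A rough sketch of what such a worker could look like, assuming the `pdftotext` binary that ships with Xpdf is on the PATH. The function names and the `submit` callback are hypothetical placeholders for whatever search engine client the project ends up using; nothing here is taken from the actual codebase.

```python
import subprocess
from typing import Callable, List


def pdftotext_command(pdf_path: str) -> List[str]:
    # Passing "-" as the output file makes pdftotext write the text to stdout.
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path: str, run=subprocess.run) -> str:
    # `run` is injectable so the worker can be exercised without the binary.
    # check=True raises CalledProcessError if pdftotext exits with an error.
    result = run(pdftotext_command(pdf_path), capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def process_job(
    pdf_path: str,
    submit: Callable[[str, str], None],
    run=subprocess.run,
) -> None:
    # One unit of work: extract the text, then hand it to the indexer.
    # `submit(doc_id, text)` is a hypothetical hook for the search engine.
    text = extract_text(pdf_path, run=run)
    submit(pdf_path, text)
```

Something like `process_job("article.pdf", submit=my_indexer)` would then be called from a queue consumer or a cron job, so the web request never has to wait for the extraction.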

hmert added a commit that referenced this issue Jul 11, 2014
@hmert hmert added this to the 1.0 milestone Aug 23, 2014
hmert added a commit that referenced this issue Nov 10, 2014
@hmert hmert modified the milestones: 2.0, 1.0 Nov 16, 2014
@hmert hmert closed this as completed Jan 13, 2015