Upload and index text from PDF documents. Rails 3 app example.

README.rdoc

Search for text in PDF documents. Example app.

This application is a simple example of how to implement text search over PDF documents. It uses Sphinx for indexing and the pdf-reader gem for text extraction.
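
As a rough illustration of the extraction step, the sketch below joins the text of every page into a single string that could then be stored and indexed. The method name extract_text is hypothetical and not taken from this app; it only assumes an object that responds to pages, with each page responding to text, which is the interface PDF::Reader provides.

```ruby
# Hypothetical helper, not part of this app: joins the text of every
# page into one string. `reader` must respond to #pages, and each page
# must respond to #text (PDF::Reader objects do).
def extract_text(reader)
  reader.pages.map(&:text).join("\n")
end

# With the pdf-reader gem installed, usage would look like:
#   require "pdf-reader"
#   text = extract_text(PDF::Reader.new("document.pdf"))
```

Passing the reader in, rather than a file path, keeps the helper easy to try out without a real PDF.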

Getting Started

  1. Install Sphinx.

  2. Configure the database connection in the app (Sphinx works only with MySQL or PostgreSQL).

  3. Run rails s and upload some PDFs.

  4. Run rake ts:index and rake ts:start.

  5. Run whenever --update-crontab pdf_index to install the cron job that reindexes the records.

  6. The cron schedule can also be changed in the Rails app; for testing purposes it currently runs every minute.
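
For reference, a schedule that reindexes every minute could look like the fragment below. This is a sketch of what config/schedule.rb might contain, assuming the whenever gem's standard DSL; the actual file in this app may differ.

```ruby
# config/schedule.rb (assumed contents, read by the whenever gem)
# Rebuild the Sphinx index every minute. This interval is only
# sensible for testing; use a longer one in production.
every 1.minute do
  rake "ts:index"
end
```

After editing the schedule, run whenever --update-crontab pdf_index again so the crontab picks up the change.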

Limitations

The app stores the extracted text in the database. MySQL's largest text type (LONGTEXT) holds at most 4294967295 bytes, so the text of very large PDFs may be truncated when stored. The database server may also time out while saving large texts.
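
A minimal sketch of a size check that could run before saving is shown below. The constant and method names are hypothetical and not part of this app. It compares byte size rather than character count, because the column limit is expressed in bytes and multibyte UTF-8 characters take more than one byte each.

```ruby
# Hypothetical names, not part of this app.
# Largest payload a MySQL LONGTEXT column can hold, in bytes.
LONGTEXT_MAX_BYTES = 4_294_967_295

# Returns true when the text fits into a column of max_bytes bytes.
def fits_in_column?(text, max_bytes = LONGTEXT_MAX_BYTES)
  text.bytesize <= max_bytes
end

# Example: reject an oversized text instead of letting it be cut off.
#   raise ArgumentError, "text too large" unless fits_in_column?(text)
```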

Todo

  • Validate text size (perhaps too expensive)

  • Write some tests

  • Refactor the code
