CV-Gate/search_for_text_into_pdfs

<img src="https://codeclimate.com/github/CV-Gate/search_for_text_into_pdfs.png" />

Search for text in PDF documents. Example app.

This application is a simple example of how to run a text search over PDF documents. It uses Sphinx for indexing and the pdf-reader gem for text extraction.
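As a rough illustration of the extraction half, the sketch below pulls the text out of a PDF with the pdf-reader gem and joins the pages into one string that could be stored and indexed. The method names extract_pdf_text and join_pages are hypothetical and are not taken from this app's code; only PDF::Reader.new, pages and text come from the gem.

```ruby
# Join per-page strings into one blob of searchable text.
# Blank pages are skipped and surrounding whitespace is trimmed.
def join_pages(page_texts)
  page_texts.map { |t| t.to_s.strip }.reject(&:empty?).join("\n\n")
end

# Extract the text of every page of the PDF at `path`.
# Hypothetical helper: the app's own extraction code may differ.
def extract_pdf_text(path)
  require 'pdf-reader' # loaded lazily, so join_pages works without the gem
  reader = PDF::Reader.new(path)
  join_pages(reader.pages.map(&:text))
end
```

Calling extract_pdf_text("some.pdf") would return a single string; in an app like this one, that string is what ends up in the database column that Sphinx indexes.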

Getting Started

  1. Install Sphinx.

  2. Configure the database connection in the app (Sphinx works only with MySQL or PostgreSQL).

  3. Run rails s and upload some PDFs.

  4. Run rake ts:index and rake ts:start.

  5. Run whenever --update-crontab pdf_index to install the cron job that reindexes the records.

  6. You can also change the cron schedule in the Rails app; it currently runs every minute for testing purposes.
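
For steps 5 and 6, the whenever gem reads its schedule from config/schedule.rb. The fragment below is only a guess at what a one-minute reindex schedule might look like; the repository's actual schedule file is not reproduced here, and the choice of the ts:index task is an assumption based on step 4.

```ruby
# config/schedule.rb (hypothetical contents, read by the whenever gem)
# Reindex every minute; a longer interval is likely more sensible
# outside of testing.
every 1.minute do
  rake "ts:index"
end
```

After editing the schedule, running whenever --update-crontab pdf_index again should rewrite the crontab entry.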

Limitations

The app stores the extracted text in the database. The upper limit for a MySQL text column is 4,294,967,295 bytes (LONGTEXT), so the text of very large PDFs may be truncated when stored. The database server may also time out while storing it.
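
One possible way to guard against this limit is to check the byte size of the extracted text before saving it. This is a sketch, not code from the app: the constant and both method names are hypothetical, and the limit assumes a LONGTEXT column.

```ruby
# Maximum size of a MySQL LONGTEXT value in bytes (2**32 - 1).
# Assumes the text column is LONGTEXT; smaller column types hold less.
LONGTEXT_MAX_BYTES = 4_294_967_295

# True when `text` fits in a column that holds `limit` bytes.
def fits_in_column?(text, limit = LONGTEXT_MAX_BYTES)
  text.bytesize <= limit
end

# Cut `text` down to at most `limit` bytes without leaving a
# half-written multi-byte character at the end.
def truncate_to_bytes(text, limit = LONGTEXT_MAX_BYTES)
  return text if text.bytesize <= limit
  text.byteslice(0, limit).scrub("")
end
```

The limit is counted in bytes rather than characters, so multi-byte UTF-8 text reaches it sooner than its character count suggests. Reading bytesize does not scan the string, so the check itself should be cheap compared with extracting the text.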

Todo

  • Validate text size before storing (perhaps too expensive)

  • Write some tests

  • Do some refactoring

About

Upload and index text from PDF documents. Rails 3 app example.
