GitHub - CV-Gate/search_for_text_into_pdfs: Upload and index text from PDF documents. Rails 3 app example.

Search for text in PDF documents. App example.

This application is a simple example of how to run a text search over PDF documents. It uses Sphinx (through Thinking Sphinx) and the pdf-reader gem.
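A minimal sketch of the extraction step, assuming the app joins the text of all pages into a single string before saving it (the repo's actual model code may differ). extract_pdf_text is a hypothetical helper. It relies only on PDF::Reader#pages and Page#text from the pdf-reader gem, so it can be exercised with stand-in objects when the gem is not installed.

```ruby
# Hypothetical helper: concatenates the text of every page of a
# PDF::Reader-like object, one page per line block.
# It only needs #pages returning objects that respond to #text.
def extract_pdf_text(reader)
  reader.pages.map(&:text).join("\n")
end

# Usage with the real gem (not required by this sketch):
#   require "pdf-reader"
#   text = extract_pdf_text(PDF::Reader.new("example.pdf"))
```

Taking the reader as an argument instead of a file path keeps the helper independent of where the upload is stored.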

Getting Started

1. Install Sphinx.
2. Configure the database connection in the app (with this setup, Sphinx works only with MySQL or PostgreSQL).
3. Run rails s and upload some PDFs.
4. Run rake ts:index and rake ts:start.
5. Run whenever --update-crontab pdf_index to install the cron job that reindexes the records.

You can also change the cron schedule in the app (whenever reads it from config/schedule.rb by default). It currently runs every minute for testing purposes.
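A sketch of what the whenever schedule could look like, assuming the cron job simply reruns the Thinking Sphinx indexer. The log path is a hypothetical choice, and the repo's own config/schedule.rb may differ. Installing it with whenever --update-crontab pdf_index tags the generated crontab entries with the pdf_index identifier, so later updates replace them instead of adding duplicates.

```ruby
# config/schedule.rb (evaluated by the whenever gem, not run directly)
# Sketch only: the job and the log path are assumptions.
set :output, "log/cron.log"

# Every minute is meant for testing; a longer interval such as 1.hour
# is more reasonable once the app holds many documents.
every 1.minute do
  rake "ts:index"
end
```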

Limitations

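The app stores the extracted text in the database. A MySQL LONGTEXT column holds at most 4,294,967,295 bytes (fewer characters when the text contains multibyte characters), so text from the largest PDFs will be truncated when stored, or rejected, depending on the MySQL SQL mode. Very large inserts may also make the database server time out.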
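A minimal sketch of truncating the text in Ruby before saving it, so the value fits the column regardless of how MySQL handles overlong data. truncate_for_longtext is a hypothetical helper, and it needs Ruby 2.1 or later for String#scrub. It cuts on bytes rather than characters because the column limit is effectively a byte limit.

```ruby
# Maximum size of a MySQL LONGTEXT value: 2**32 - 1 bytes.
LONGTEXT_MAX_BYTES = 4_294_967_295

# Hypothetical helper: cuts text to at most max_bytes bytes without
# leaving half of a multibyte character at the end.
def truncate_for_longtext(text, max_bytes = LONGTEXT_MAX_BYTES)
  return text if text.bytesize <= max_bytes

  # byteslice may split a character; scrub("") removes invalid byte
  # sequences, including a character cut in half at the boundary.
  text.byteslice(0, max_bytes).scrub("")
end
```

String#bytesize is a constant-time check, so testing the size before storing should add little overhead; the server's max_allowed_packet setting may still reject large values well before this limit is reached.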

Todo

* Validate text size before storing (possibly too expensive)
* Write some tests
* Some refactoring

