how can i do multiple pdf extraction processes concurrently? #53

Closed
quyen opened this issue Sep 13, 2012 · 5 comments

Comments

@quyen

quyen commented Sep 13, 2012

I'd like to extract PDFs concurrently, but this does not seem to be possible with the docsplit gem.
When I try to convert 2 ppt files to PDF at the same time, the gem fails to process them.
The code is below; please replace path_to_docsplit.rb, path_to_test_file1.ppt, and path_to_test_file2.ppt.

I'm looking forward to your answer.
Thank you,
Quyen

#!/usr/bin/ruby

require 'path_to_docsplit.rb'

# Convert a single document to PDF with Docsplit.
def extraction(path_to_file)
  Docsplit.extract_pdf(path_to_file)
end

puts('start extraction')
# Start both conversions in separate threads, then wait for each to finish.
t1 = Thread.new { extraction('path_to_test_file1.ppt') }
t2 = Thread.new { extraction('path_to_test_file2.ppt') }
t1.join
t2.join
puts('end extraction')

@Natim

Natim commented Oct 23, 2012

We are using a Redis queue with Circus to launch X workers, and it works fine.

http://redis.io/
http://circus.readthedocs.org/
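
For readers who want to try the "several worker processes" idea before setting up Redis and Circus, here is a minimal sketch in plain Ruby. It is not the setup described above: it simply forks one child process per file, at most a few at a time, so that each conversion runs in its own process rather than in a thread. The helper name extract_in_processes and its max_workers parameter are made up for illustration, and the sketch assumes a platform where fork is available (not Windows or JRuby).

```ruby
# Hypothetical helper (not part of Docsplit): runs the given block once per
# path, each call in its own child process, at most max_workers at a time.
# Returns a hash mapping each path to true (child exited 0) or false.
def extract_in_processes(paths, max_workers: 2, &job)
  results = {}
  paths.each_slice(max_workers) do |batch|
    # Start one child process per file in this batch.
    children = batch.map do |path|
      pid = fork do
        begin
          job.call(path)
          exit!(0)
        rescue StandardError => e
          warn "#{path}: #{e.message}"
          exit!(1)
        end
      end
      [pid, path]
    end
    # Wait for every child in the batch before starting the next batch.
    children.each do |pid, path|
      _, status = Process.wait2(pid)
      results[path] = status.success?
    end
  end
  results
end

# Possible usage, assuming the docsplit gem is installed:
#   require 'docsplit'
#   files = ['path_to_test_file1.ppt', 'path_to_test_file2.ppt']
#   extract_in_processes(files) { |path| Docsplit.extract_pdf(path) }
```

Batching with each_slice keeps the sketch short, but a slow file holds up its whole batch; a real queue with long-running workers, as described above, avoids that.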

@knowtheory
Member

Hey @quyen, can you be a little more specific about what errors you're encountering?

DocumentCloud uses Docsplit in a manner similar to what @Natim outlines.

@avlakin

avlakin commented Nov 12, 2012

@knowtheory & @Natim - I'm trying to do the same thing as Quyen, but I'm having some trouble figuring out Circus.

Would you guys happen to know of any tutorial covering the set-up for using Circus to run multiple processes?

Thanks in advance.

@Natim

Natim commented Nov 12, 2012

@knowtheory
Member

For some additional detail: DocumentCloud uses CloudCrowd for distributed queuing of jobs that use Docsplit. You can check out the actions we've written, and in particular note the document_import action.
