Dockerized pipeline #53

Merged
merged 1 commit into from Oct 13, 2021

Conversation

@rodneykinney
Member

rodneykinney commented Oct 7, 2021

Pattern for wrapping services in lighter-weight containers.

  • Replace full sub-projects in the services directory with a Dockerfile plus a xxx_api.py file for each service.
  • Add a docker-compose file that builds the services plus a Python container for running a REPL or scripts
  • Add pipeline.py sample end-to-end pipeline script using the services
  • Add run-pipeline.sh file to run pipeline
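
As a rough illustration of the last two bullets, an end-to-end script that chains containerized services might look like the sketch below. This is not the pipeline.py from this PR: the function names `call_service` and `run_pipeline`, the stage names, and the JSON-over-HTTP transport are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict
from urllib import request


def call_service(url: str, payload: bytes, content_type: str = "application/pdf") -> dict:
    """POST raw bytes to a service and decode its JSON reply.

    Assumption: each service exposes an HTTP endpoint that returns JSON.
    """
    req = request.Request(
        url,
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_pipeline(
    pdf_bytes: bytes,
    parse: Callable[[bytes], dict],
    predictors: Dict[str, Callable[[dict], object]],
) -> dict:
    """Parse a PDF, then attach each predictor's output under its own key."""
    doc = parse(pdf_bytes)
    for name, predict in predictors.items():
        # Each predictor sees the document as enriched by earlier stages.
        doc[name] = predict(doc)
    return doc
```

In a docker-compose setup, each stage passed to `run_pipeline` would typically wrap `call_service` with the URL of one service container, for example a hypothetical `http://symbolscraper:8080/parse`. The host name, port, and path are placeholders, not values taken from the repo. Keeping the transport out of `run_pipeline` also lets the stages be swapped for in-process functions when experimenting from the REPL container.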

@rauthur

context: ..
dockerfile: docker/parsers/symbolscraper/Dockerfile

# layoutparser:
Member Author

Haven't added these yet. If we agree on this direction, I'll complete it.

@rodneykinney
Member Author

If we do this, then we would move the current services directory code into allenai/spp. Currently, that repo checks out this repo as part of its build.

@yoganandc

extras_require={"dev": ["pytest"]},
install_requires=["intervaltree", "tqdm", "pdf2image", "pdfplumber"],
extras_require={
"dev": ["pytest"],
Member Author

Split up the dependencies for the different predictors so you can install only what's needed.
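
A minimal sketch of what splitting dependencies into extras can look like in a setup.py. Only the core list and the "dev" extra come from the diff excerpt above; the other group names and the packages under them are hypothetical placeholders, not the ones used in this PR.

```python
# Core requirements, as listed in the diff excerpt.
CORE = ["intervaltree", "tqdm", "pdf2image", "pdfplumber"]

# One optional group per predictor, so users install only what they need.
EXTRAS = {
    "dev": ["pytest"],                # from the diff excerpt
    "predictor_a": ["package-a"],     # hypothetical group and package
    "predictor_b": ["package-b"],     # hypothetical group and package
}

# Convenience group that pulls in every optional dependency.
EXTRAS["all"] = sorted({pkg for pkgs in EXTRAS.values() for pkg in pkgs})

# In setup.py these would be passed to setuptools:
#   setup(..., install_requires=CORE, extras_require=EXTRAS)
```

With a layout like this, `pip install ".[predictor_a]"` from a checkout would install the core requirements plus only that group's packages.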

@kyleclo
Collaborator

kyleclo commented Oct 8, 2021

@rodneykinney as part of this, do you mind adding a crude README that walks someone through usage of what you've set up here? i.e. assuming I've cloned the repo, what are the cmds I should execute until I can see a PDF processed?

@rodneykinney
Member Author

Added pipeline/README.md with basic instructions.

@rodneykinney
Member Author

Ready to merge from my point of view. Any other comments?

Collaborator

@kyleclo kyleclo left a comment

thx for readme; lgtm!

@rodneykinney rodneykinney merged commit e41e585 into main Oct 13, 2021