adds Dockerfile #46
Conversation
This looks great, @ShawnMilo. Since I'm not as familiar with Docker, I've got a knucklehead question, so please bear with me: can we reuse the provisioning scripts for the Vagrant VM instead of having to maintain another set of installation scripts? Specifically I'm looking at the […]. With the increasing number of command-line tools (which I think of as a really good thing!), another approach might be to start to build and release a PPA that can be installed on any Ubuntu system with […]. Thoughts?
Thanks, I'm glad you like it. Yes, it's completely possible to use the existing scripts. However, there is a trade-off. The run.sh script does two things:
Currently I have it set up so that as much as possible is done in the Dockerfile, so it needs to be done only once. If it's preferable to spend the time and bandwidth installing the dependencies during "provision" each time, then we can move the apt-get and pip installs into the run_tests.sh script and they'll happen on "docker run" instead of "docker build."
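As a rough sketch of the build-time approach (the base image, package list, and file paths below are illustrative, not the actual files in this PR):

```dockerfile
# Hypothetical Dockerfile sketch: dependencies are installed at
# build time, so "docker build" pays the cost once and every
# subsequent "docker run" starts almost immediately.
FROM ubuntu:12.04

# System-level dependencies (illustrative package list)
RUN apt-get update && apt-get install -y python-pip tesseract-ocr antiword

# Python dependencies: ADD copies the requirements file into the
# image so pip can run during the build rather than at run time.
ADD requirements/python /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
```

Moving the install steps into run_tests.sh instead would shift them to "docker run" time, trading a fast container start for always-fresh dependencies.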
I poked at it a bit, and it looks like it's possible to have the best of both worlds using the ADD Dockerfile directive. However, it will require moving files around, because during the build the Dockerfile can only ADD things under the current working directory. Is there any problem with moving the files in /requirements into /provision (or vice versa)? Also, the Dockerfile and accompanying shell scripts will have to be moved into the same folder.
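The restriction mentioned above is that ADD can only reference paths inside the build context, i.e. the directory passed to docker build. A hypothetical layout (directory and tag names are illustrative):

```shell
# Everything the Dockerfile ADDs must live under the build context.
cd provision/               # Dockerfile, run_tests.sh, requirements all here
docker build -t textract .  # "." is the build context
# An "ADD ../requirements/python ..." line would fail: outside the context
```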
Perfect! There isn't a problem moving everything into the same directory as far as I'm concerned.
Excellent. I'll push it up when it's working. I can see from the comment in python.sh that this stuff has to work from Travis as well, so I'll try not to mess that up. I may ask for help if my initial push breaks the build.
Sounds good. I'm busy the rest of the day but will definitely get back to you ASAP.
* Consolidates requirements, provision, and docker directories.
FYI: Ignore Travis errors for now. I'm aware of them and working on trying to fix the requirements scripts so they work in both Travis and the Docker container. |
An attempt to make it work both with Travis and Docker.
Okay, it works in Travis and Docker now. The diff is going to look big because files were moved, but no big changes were made. Incidentally, I've temporarily changed the Dockerfile to build an Ubuntu 12.04 image instead of an Ubuntu 14.04 image, because in 12.04 the tests all pass. So it's definitely a version issue with the third-party dependencies (or maybe just tesseract-ocr).
Otherwise it piles up multiple containers you probably don't need.
Update: Now removes the container (but not the image) after running the tests. Also, on my machine it takes over seven minutes to do the initial image build, but under four seconds to run subsequently. That means you can run the tests in about four seconds every time you change code, without setting up the external and Python dependencies on your development machine (or with different versions of those dependencies on your machine). You won't have to do the "seven minute build" again unless the requirements files change or the version of Ubuntu is changed in the Dockerfile.
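One way to get this remove-after-run behavior is Docker's --rm flag, which deletes the container (but not the cached image) when the process exits, so repeated runs don't pile up stopped containers. A sketch of such a wrapper (the script and image tag are hypothetical, not the actual files in this PR):

```shell
#!/bin/sh
set -e
IMAGE=textract-tests          # illustrative image tag
docker build -t "$IMAGE" .    # slow the first time, cached afterwards
docker run --rm "$IMAGE"      # run the tests; container removed on exit
```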
Thanks for putting this together; this will give people yet another way to develop and run the textract tests. If you don't mind, here are a few minor things to fix and we can get this all merged in:
Okay. I'll get on those other comments and let you know when I think it's ready to look at. Ignore pushes before I tell you -- I may push something to make sure it passes Travis. |
Sounds good. I'm headed out of town on vacation for a long weekend so I […]
Brief explanation of Docker images vs. containers (this is background for the section below on combining […]):

A Docker image is like a template. It is based on a specific Linux distribution and may have packages installed, configurations set, etc. It is never "run" itself; it is used as a base for containers.

A Docker container is a running instance of an LXC container (the project upon which Docker is based). It must be based on an image. That can be (for example) a vanilla Ubuntu image provided by Canonical, or a modified Ubuntu image based on the original, modified by a member of the community and contributed back.

Two ways of acquiring an image are to get the image directly from Docker (which houses and serves them in a repository), or to build one automatically using a Dockerfile (which is similar in concept to a Makefile). I think using a Dockerfile with a vanilla image is the way to go, because it dynamically builds the image, thus getting the current versions of dependencies. Also, adding or removing dependencies only requires modifying the dependency files. If we built a custom image and stored it in Docker's repository, we'd have to replace that image when […].

A note on combining […]: the mix of the requirements and provisioning isn't strictly necessary if we move the Dockerfile to the root of the project. Here's why:
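The two ways of acquiring an image look like this on the command line (image and tag names are illustrative):

```shell
# Option 1: pull a prebuilt image from Docker's repository.
docker pull ubuntu:12.04

# Option 2: build one locally from a Dockerfile, picking up the
# current versions of dependencies; changing a dependency only
# means editing the requirements files and rebuilding.
docker build -t textract .
```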
@ShawnMilo this is terrific! Thanks for all the effort here and for explaining all of this stuff. I'm looking forward to playing around with Docker a bit, and this looks like a great place to start for me. I'm merging this in now and it will be a part of the 1.0.0 release. So if I understand your last point about the location of the Dockerfile, it sounds like we can put that in the root of the repository and keep the split between […]. I think I'm also going to move the documentation from […].
You're welcome! Yes, you're correct about being able to move the Dockerfile and put back the directories. I was thinking about it more last night, and it's also possible to have the Dockerfile in a subdirectory and have the bash script copy it to the repo root, run it from there, then delete it. It's a hack, but if you value the cleanliness of the root it's an option. Since you said you're interested in learning more about Docker, here's something I put together yesterday to teach myself. It's a very simple primer on having client and server apps in separate containers:
I just finished moving the Dockerfile into the project root and moving a few other things around in 5985e10. When you get a chance, do you want to try pulling the master branch and making sure everything still works with Docker? In particular, I moved the docker test scripts into the ./tests directory (and renamed […]).
The commit breaks all the Docker stuff -- as expected because paths have to be changed. That's not a big deal. However, other problems were introduced:
It looks like this branch was somehow polluted with a different branch. Did you have merge conflicts? The documentation not only lost some of my stuff, but also had different (Vagrant-related) things added, so it's not a simple overwrite with an earlier version.
Oof...sorry this breaks the Docker stuff :( Bad form on my part! Would you mind throwing together a pull request that fixes the paths in Docker? Regarding the other problems:
I reshuffled the documentation. I want the […].
Sorry about that. The Vagrant provisioning scripts need to use absolute paths in lots of places, not relative paths. Things weren't provisioning properly in Vagrant. Maybe we can work together to find a good resolution on this? I'd be up for a Google Hangout to pair program if that would be helpful.
No problem about the absolute paths for Vagrant -- I can fix them to take care of that. How can I test the Vagrant setup, or are you able to do that if I push a fix? No problem about fixing all the Docker paths -- that was never a big deal. As for the missing docs, the Docker stuff seems to have been dropped from this file: |
I can test the Vagrant setup if you send a PR. Again, sorry for the hassle. I moved the docs into the primary documentation here, which is in the […].
This adds a Dockerfile so an environment can be built and tests run with one command. This should make troubleshooting easier, because users and developers can have a shared, known environment for comparing test failures.
Using a Dockerfile means that, with documentation and supporting scripts, only about a kilobyte is added to the repository for all this goodness.
Running tests with the Dockerfile means that textract is completely rebuilt fresh directly from the repository each time. It does not require that any dependencies (such as tesseract, pip, antiword, etc.) be installed, nor any special Python modules.
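The one-command workflow described above might look like this (the image tag is illustrative, not the actual script name in this PR):

```shell
# Build the image (cached after the first run) and execute the
# test suite in a throwaway container; nothing needs to be
# installed on the host beyond Docker itself.
docker build -t textract-tests . && docker run --rm textract-tests
```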