This repository has been archived by the owner on Feb 4, 2018. It is now read-only.

Remove Dockerfile option #15

Closed
freeman-lab opened this issue Aug 28, 2015 · 12 comments

@freeman-lab
Member

As discussed with @rgbkrk in a Jupyter dev meeting, for security reasons we should probably remove the ability to specify a binder with a custom Dockerfile, which we currently support so long as the Dockerfile builds on top of our base image. Although it provides an incredibly flexible deployment model, there are too many potential pitfalls with the freedom it provides. The question is, can we satisfy our various use cases without it?

We currently support requirements.txt and conda environment.yml, which together should cover all Python-related builds. We can also add support for other kernels (e.g. R and Julia), and we can add the appropriate package dependency lists for those languages. For Julia, there appears to be a convention of specifying dependencies in a REQUIRE file. Less clear what the appropriate convention should be for R. Comments from R or Julia devs would be welcome on this point!
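As a rough illustration of the spec-file approach described above, the sketch below checks a repository root for the three files named in this comment. The function name, the mapping, and the idea of returning an ecosystem label are assumptions for illustration, not Binder's actual build logic.

```python
from pathlib import Path

# Spec files mentioned in this thread, mapped to the ecosystem they describe.
# This mapping and its labels are hypothetical, not taken from Binder's code.
SPEC_FILES = {
    "requirements.txt": "pip",
    "environment.yml": "conda",
    "REQUIRE": "julia",
}


def detect_dependency_specs(repo_dir):
    """Return {filename: ecosystem} for each known spec file at the repo root."""
    root = Path(repo_dir)
    return {
        name: ecosystem
        for name, ecosystem in SPEC_FILES.items()
        if (root / name).is_file()
    }
```

A repository with both a requirements.txt and a REQUIRE file would yield two entries, which a builder could then install in turn.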

Is this enough, or can folks suggest use cases for Binder where the Dockerfile is a must have?

Also, to be clear, we will still be using Docker under the hood to build the underlying images! This is just a question of how we expose the configuration options to users.

cc @andrewosh @arokem

@arokem

arokem commented Aug 28, 2015

OK - let me think through one realistic use-case to see if it does. For example, what if I have a notebook doing MRI analysis, that has a few cells of bash calling out to some other software to process the data? Say mrtrix (https://github.com/MRtrix3/mrtrix3/wiki), a C++ library with quite a few dependencies (I think)? This is easily handled through docker (https://hub.docker.com/r/arokem/mrtrix/~/dockerfile/). Would it be similarly handled through conda (asking out of ignorance)? I am not saying that it should be covered here (maybe it's not a must-have?), but it is a realistic use-case (and there are many similar ones, I'd think), so if it is covered, that's fantastic.

@freeman-lab
Member Author

Thanks for the example @arokem ! I think that at least for now an environment.yml only supports conda dependencies and optionally pip dependencies, so not clear how this would be supported, unless of course the original authors made the library conda or pip installable =)

It's worth noting that in this example you only used RUN commands. At least some of the security-related issues from running people's arbitrary Dockerfiles involve USER calls. So some clever way of sanitizing or screening Dockerfiles to support this might not be out of the question.
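To make the screening idea concrete, here is a minimal sketch that flags any USER instruction and checks that the file builds on a given base image. The function name and the `base_image` parameter are hypothetical, the parsing is deliberately naive (it only joins backslash continuations and strips a `:tag` suffix), and passing this check would not by itself make a Dockerfile safe to build.

```python
def screen_dockerfile(text, base_image):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    # Join backslash line continuations so each instruction sits on one line.
    joined = text.replace("\\\n", " ")

    instructions = []
    for raw in joined.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        args = parts[1].strip() if len(parts) > 1 else ""
        instructions.append((keyword, args))

    # Require exactly one FROM, naming the expected base image (tag ignored).
    images = [args.split(":")[0] for kw, args in instructions if kw == "FROM"]
    if images != [base_image]:
        problems.append("must have exactly one FROM, building on " + base_image)

    # Flag every USER instruction.
    for kw, args in instructions:
        if kw == "USER":
            problems.append("USER instruction not allowed: " + args)

    return problems
```

For example, a file starting with `FROM ubuntu:14.04` followed by `USER root` would produce two problems under this check, while a file that only adds RUN lines on top of the expected base would produce none.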

@rgbkrk
Member

rgbkrk commented Aug 29, 2015

Ah, but those RUN calls require root - they're apt-get installing packages.

You could include apt-get, but then you'll want a whitelist since apt can do very bad things and not every package is reliably secure in how it doles out permissions or handles security in general. If you went this route, I'd use the Travis whitelist so you don't have to be the ones playing whack-a-mole and their stuff is hopefully being audited.
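A whitelist check along these lines could be sketched as follows. The helper names are hypothetical, the whitelist is whatever set of package names the caller supplies, and the command parsing is an assumption that only handles simple `&&` chains, so it is an illustration rather than a robust shell parser.

```python
import shlex


def requested_apt_packages(run_command):
    """Extract package names from `apt-get ... install` calls in one shell command."""
    packages = []
    # Split on '&&' so chained commands are examined one at a time.
    for part in run_command.split("&&"):
        tokens = shlex.split(part)
        if tokens and tokens[0] == "apt-get" and "install" in tokens:
            after_install = tokens[tokens.index("install") + 1:]
            # Skip option flags such as -y or --no-install-recommends.
            packages.extend(t for t in after_install if not t.startswith("-"))
    return packages


def disallowed_packages(run_command, whitelist):
    """Return the requested packages that are not on the whitelist, sorted."""
    return sorted(set(requested_apt_packages(run_command)) - set(whitelist))
```

A build could then be rejected whenever `disallowed_packages` returns a non-empty list for any RUN line.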

As for easily handling it out of conda, it's a matter of creating your own conda build package or kindly asking @ContinuumIO to put a package together for it. I'm of the opinion that conda is a solid solution for non-root binary installs, I just wish there was more transparency into how they're built as well as the automation behind them.

If we sanitize the Dockerfile, are we really creating something reproducible after it's launched on binder? What if binder is gone? How does someone rectify the difference between what's in the Dockerfile and what binder actually puts together?

Would it be crazy for us to use some subset of the Travis configuration (for the builds that run on containers)? For example before_install, script, before_script, addons.apt.packages, etc.
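One way to picture that subset: take a Travis config that has already been parsed into a dictionary and keep only the keys listed above. The function name, the `apt_packages` output key, and the choice to silently drop everything else are assumptions for illustration.

```python
# Top-level keys named in the comment above. Treating exactly these as the
# supported subset is an assumption, not an agreed specification.
ALLOWED_TOP_LEVEL = ("before_install", "before_script", "script")


def select_build_config(travis_config):
    """Keep only the supported keys from an already-parsed Travis config dict."""
    selected = {
        key: travis_config[key]
        for key in ALLOWED_TOP_LEVEL
        if key in travis_config
    }
    # addons.apt.packages is nested, so pull it out explicitly.
    addons = travis_config.get("addons") or {}
    apt = addons.get("apt") or {}
    if "packages" in apt:
        selected["apt_packages"] = list(apt["packages"])
    return selected
```

Keys such as `deploy` or `language` would simply be ignored here, and the extracted apt packages could still be checked against a whitelist before anything is installed.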

@freeman-lab
Member Author

Just to follow up after a chat with @rgbkrk and @andrewosh: our working plan is to continue to maintain support for Dockerfiles for now, but lock down inter-container communication issues (e.g. #14), and keep exploring vulnerabilities. At the same time, we'll try to add support for straight-up travis.yml files. The advantages of this as an option:

  • it's an existing, well-documented schema that already handles custom dependencies separate from but integrated with Docker
  • many people have one in their repo already!
  • if they don't, it might encourage them to use continuous testing (which is good!)

A guiding philosophy is that whenever possible, we want to let people use a spec file already in their repo, rather than force them to write a new one.

@freeman-lab
Member Author

Small update to this plan: in various places we've been discussing a binder.yml that would be similar in spirit to travis.yml but not exactly the same. Full proposed specification coming soon, but in the meantime this issue can serve as a place to keep discussing this option.

@rdhyee

rdhyee commented Nov 4, 2015

I certainly understand and sympathize with moving towards binder.yml (in the spirit of travis.yml). Writing as a travis.yml newbie, I sometimes find getting travis.yml to work a challenge. One issue is that, as far as I know, the only run environment for travis.yml is travis-ci.org itself. If I could run my travis.yml in a friendlier environment instead of having to tickle travis-ci.org to run it, it would help me a lot. (I have often gotten a good build environment working first in Docker and then translated it to travis.yml.)

@rgbkrk
Member

rgbkrk commented Nov 5, 2015

@rdhyee Something we've talked about, and that @andrewosh has hacked on, is making https://github.com/binder-project/binder-build work both for local builds and for use on the web.

@rgbkrk
Member

rgbkrk commented Nov 5, 2015

D'oh! Now I see this was discussed on gitter as well.

@parente
Contributor

parente commented Nov 18, 2015

Now that Docker has user namespace mapping in its experimental branch, and assuming it lands in a stable release in the near-ish future (2.0?), does that change whether or not binder should support Dockerfiles? (Vulnerabilities in the implementation aside.)

http://integratedcode.us/2015/10/13/user-namespaces-have-arrived-in-docker/

@davidrpugh

I am developing a Scala/Java 8 application for simulating financial markets that integrates with Jupyter notebooks and Python for data analysis. Currently I am unable to use the binder-base image because debian-jessie does not support Java 8.

Is it really not possible to build a binder using a custom Dockerfile that doesn't inherit from binder-base image? I have hacked together such a Dockerfile. The container builds successfully, but then doesn't load when I try to launch the binder. Perhaps the docker image is checked at runtime to see if the image was built using binder-base?

@cdeil

cdeil commented Oct 13, 2016

Is this option to remove Dockerfile still on the table or can this issue be closed?

What if I want to apt-get install something that's not available as a package for pip or conda? Or download or simulate an example dataset? Or ...?

@freeman-lab
Member Author

This can probably be closed; for the foreseeable future we'll definitely keep the Dockerfile option. Thanks!
