Remove Dockerfile option #15

freeman-lab · 2015-08-28T21:52:58Z

As discussed with @rgbkrk in a Jupyter dev meeting, for security reasons we should probably remove the ability to specify a binder with a custom Dockerfile, which we currently support so long as the Dockerfile builds on top of our base image. Although it provides an incredibly flexible deployment model, there are too many potential pitfalls with the freedom it provides. The question is, can we satisfy our various use cases without it?

We currently support requirements.txt and conda environment.yml, which together should cover all Python-related builds. We can also add support for other kernels (e.g. R and Julia), along with the appropriate package dependency lists for those languages. For Julia, there appears to be a convention of specifying dependencies in a REQUIRE file. It's less clear what the appropriate convention should be for R. Comments from R or Julia devs would be welcome on this point!

Is this enough, or can folks suggest use cases for Binder where the Dockerfile is a must have?

Also, to be clear, we will still be using Docker under the hood to build the underlying images! This is just a question of how we expose the configuration options to users.

cc @andrewosh @arokem


arokem · 2015-08-28T22:15:39Z

OK, let me think through one realistic use case to see whether it does. For example, what if I have a notebook doing MRI analysis that has a few cells of bash calling out to some other software to process the data? Say mrtrix (https://github.com/MRtrix3/mrtrix3/wiki), a C++ library with (I think) quite a few dependencies. This is easily handled through Docker (https://hub.docker.com/r/arokem/mrtrix/~/dockerfile/). Would it be similarly handled through conda (asking out of ignorance)? I am not saying that it should be covered here (maybe it's not a must-have?), but it is a realistic use case (and there are many similar ones, I'd think), so if it is covered, that's fantastic.

freeman-lab · 2015-08-29T00:27:00Z

Thanks for the example, @arokem! I think that, at least for now, an environment.yml only supports conda dependencies and optionally pip dependencies, so it's not clear how this would be supported, unless of course the original authors made the library conda or pip installable =)

It's worth noting that in this example you only used RUN commands. At least some of the security-related issues from running people's arbitrary Dockerfiles involve USER calls. So some clever way of sanitizing or screening Dockerfiles to support this might not be out of the question.
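For illustration, a minimal Python sketch of the kind of screening floated here: check that the Dockerfile builds on the base image and flag any USER instruction. The helper name `screen_dockerfile`, the image name `binder-base`, and the rule that only USER is rejected are all assumptions, not Binder's actual policy. Note that a Dockerfile passing this check can still have RUN steps that execute as root.

```python
import re

# Assumed policy for illustration only: the base image name and the
# single disallowed instruction are not Binder's actual rules.
BASE_IMAGE = "binder-base"
DISALLOWED = {"USER"}


def screen_dockerfile(text):
    """Return a list of problems found in a Dockerfile; empty means it passed."""
    # Join backslash-continued lines so each instruction is one logical line.
    logical = re.sub(r"\\[ \t]*\n", " ", text)
    instructions = []
    for line in logical.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split(None, 1)
        instructions.append((parts[0].upper(), parts[1] if len(parts) > 1 else ""))

    problems = []
    if not instructions or instructions[0][0] != "FROM":
        problems.append("first instruction must be FROM")
    else:
        # Compare the image name, ignoring any registry/user prefix and tag.
        image = instructions[0][1].split()[0] if instructions[0][1] else ""
        name = image.rsplit("/", 1)[-1].split(":")[0]
        if name != BASE_IMAGE:
            problems.append(f"must build on {BASE_IMAGE}")
    for keyword, args in instructions:
        if keyword in DISALLOWED:
            problems.append(f"disallowed instruction: {keyword} {args}".rstrip())
    return problems
```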

rgbkrk · 2015-08-29T22:00:45Z

Ah, but those RUN calls require root - they're apt-get installing packages.

You could include apt-get, but then you'll want a whitelist since apt can do very bad things and not every package is reliably secure in how it doles out permissions or handles security in general. If you went this route, I'd use the Travis whitelist so you don't have to be the ones playing whack-a-mole and their stuff is hopefully being audited.
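A minimal Python sketch of the whitelist idea: split the requested apt packages into approved and rejected before anything is installed. The helper `check_apt_packages` and the package names in the allowlist are hypothetical placeholders, not the contents of the Travis whitelist.

```python
# Hypothetical allowlist; in practice it might be loaded from a maintained,
# audited list (such as the one Travis uses) rather than hard-coded here.
ALLOWED_APT_PACKAGES = frozenset({"libfftw3-dev", "libeigen3-dev", "zlib1g-dev"})


def check_apt_packages(requested, allowed=ALLOWED_APT_PACKAGES):
    """Split requested package names into (approved, rejected) lists."""
    approved, rejected = [], []
    for name in requested:
        # Strip any version pin like "zlib1g-dev=1:1.2.8" before comparing.
        base = name.split("=", 1)[0].strip()
        (approved if base in allowed else rejected).append(base)
    return approved, rejected
```

A build would then refuse to proceed whenever the rejected list is non-empty, so the whack-a-mole burden shifts to whoever maintains the list.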

As for handling it easily with conda, it's a matter of creating your own conda build package or kindly asking @ContinuumIO to put a package together for it. I'm of the opinion that conda is a solid solution for non-root binary installs; I just wish there was more transparency into how the packages are built, as well as into the automation behind them.

If we sanitize the Dockerfile, are we really creating something reproducible after it's launched on binder? What if binder is gone? How does someone rectify the difference between what's in the Dockerfile and what binder actually puts together?

Would it be crazy for us to use some subselection of Travis configuration (for the ones that run on containers)? before_install, script, before_script, addons.apt.packages, etc.
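As a rough illustration, a Python sketch that keeps only the keys listed above from an already-parsed .travis.yml mapping (for example, the dict returned by PyYAML's `yaml.safe_load`). The helper name `select_travis_subset` and the choice to treat these keys as the only permitted ones are assumptions for this sketch.

```python
# Keys listed in the discussion; treating them as the only permitted
# top-level keys is an assumption for this sketch.
ALLOWED_KEYS = ("before_install", "before_script", "script")


def select_travis_subset(config):
    """Return only the whitelisted parts of a parsed .travis.yml mapping."""
    subset = {key: config[key] for key in ALLOWED_KEYS if key in config}
    # Keep addons.apt.packages but drop every other addon (and apt sources).
    addons = config.get("addons") or {}
    packages = (addons.get("apt") or {}).get("packages")
    if packages is not None:
        subset["addons"] = {"apt": {"packages": list(packages)}}
    return subset
```

Anything outside the subset (language, sudo, services, apt sources) is silently dropped here; a stricter variant could reject the file instead so users know why their config was not honored.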

freeman-lab · 2015-08-30T18:33:33Z

Just to follow up after a chat with @rgbkrk and @andrewosh: our working plan is to continue to maintain support for Dockerfiles for now, but lock down inter-container communication issues (e.g. #14), and keep exploring vulnerabilities. At the same time, we'll try to add support for straight-up travis.yml files. The advantages of this as an option:

- It's an existing, well-documented schema that already handles custom dependencies separate from but integrated with Docker.
- Many people have one in their repo already!
- If they don't, it might encourage them to use continuous testing (which is good!).

A guiding philosophy is that whenever possible, we want to let people use a spec file already in their repo, rather than force them to write a new one.

freeman-lab · 2015-11-04T17:09:23Z

Small update to this plan: in various places we've been discussing a binder.yml that would be similar in spirit to travis.yml but not exactly the same. Full proposed specification coming soon, but in the meantime this issue can serve as a place to keep discussing this option.

rdhyee · 2015-11-04T17:21:34Z

I certainly understand and sympathize with moving towards binder.yml (in the spirit of travis.yml). Writing as a travis.yml newbie, I sometimes find getting travis.yml to work a challenge. One issue is that, as far as I know, the only run environment for travis.yml is travis-ci.org itself. If I could run my travis.yml in a friendlier environment instead of having to tickle travis-ci.org to run it, it would help me a lot. (I have often gotten a good build environment working first in Docker and then translated it to travis.yml.)

rgbkrk · 2015-11-05T02:32:43Z

@rdhyee Something we've talked about and that @andrewosh has hacked on is making https://github.com/binder-project/binder-build for both local builds and for use on the web.

rgbkrk · 2015-11-05T02:35:56Z

D'oh! Now I see this was discussed on gitter as well.

parente · 2015-11-18T17:51:21Z

Now that Docker has user namespace mapping in its experimental branch, and assuming it lands in a stable release in the near-ish future (2.0?), does that change whether or not binder should support Dockerfiles? (Vulnerabilities in the implementation aside.)

http://integratedcode.us/2015/10/13/user-namespaces-have-arrived-in-docker/

davidrpugh · 2016-03-03T12:37:27Z

I am developing a Scala/Java 8 application for simulating financial markets that integrates with Jupyter notebooks and Python for data analysis. Currently I am unable to use the binder-base image because debian-jessie does not support Java 8.

Is it really not possible to build a binder using a custom Dockerfile that doesn't inherit from the binder-base image? I have hacked together such a Dockerfile. The container builds successfully, but then doesn't load when I try to launch the binder. Perhaps the Docker image is checked at runtime to see whether it was built using binder-base?

cdeil · 2016-10-13T15:57:37Z

Is this option to remove Dockerfile still on the table or can this issue be closed?

What if I want to apt-get install something that's not available as a package for pip or conda? Or download or simulate an example dataset? Or ...?

freeman-lab · 2016-10-13T16:52:05Z

This can probably be closed; for the foreseeable future we'll definitely keep the Dockerfile option. Thanks!

freeman-lab added the discussion label Aug 28, 2015

andrewosh mentioned this issue Sep 24, 2015 in Jupyer terminal? #26 (Closed)

olgabot mentioned this issue Nov 24, 2015 in Building from environment.yml fails - possibly outdated conda #40 (Open)

freeman-lab closed this as completed Oct 13, 2016
