This repository has been archived by the owner on Jan 24, 2018. It is now read-only.

Write a user guide for creating Docker images compatible with tmpnb #125

Open
rgbkrk opened this issue Jan 26, 2015 · 15 comments

Comments

@rgbkrk
Member

rgbkrk commented Jan 26, 2015

We have decent reference images that tmpnb.org uses but it would be really nice to help communities across the globe launch their own setup that has a sane security posture.

@rgbkrk
Member Author

rgbkrk commented Feb 8, 2015

It would be good to point out @godber's additions of desertpy material to the demo images, since we're more than happy to host community content in the available images.

/cc @willingc

@rothnic
Contributor

rothnic commented Feb 13, 2015

I haven't yet tried it with tmpnb, but I did just get my docker image working that I plan to use with tmpnb: rothnic/anaconda-notebook. The intent was to provide a similar setup as the demo image, except with anaconda. I didn't know there was a terminal available in IPython 3 until recently, so I wanted to still install things as needed via conda/pip. I'm not sure what this does from a security standpoint though.

Other things that need to be considered would be whether to have the kernels be in virtual environments.

Have you thought at all about integrating an open source web IDE, like codebox, so you could factor out larger functions into .py files? There is a Docker image that sets up codebox, so it would just be a matter of launching codebox, or another IDE, from the IPython tree view.

@rgbkrk
Member Author

rgbkrk commented Feb 13, 2015

> I did just get my docker image working that I plan to use with tmpnb: rothnic/anaconda-notebook... I'm not sure what this does from a security standpoint though.

Awesome! It looks like you use a non-root user, so you've done the first step in mitigating potential issues. We also recommend tweaking the networking settings for Docker. You probably want --icc=false in your /etc/default/docker file.

For the deployment on tmpnb.org, we also set --ip-forward=false to prevent all networking. If you want people to be able to install whatever they want via conda, grab data sets, etc. I'd just ignore that one.
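
To make the two settings above concrete, here is a minimal sketch that builds the daemon options line for /etc/default/docker. It assumes the file uses the DOCKER_OPTS variable (not stated in this thread), and the helper name docker_opts is hypothetical. The flags themselves are the two mentioned above.

```python
def docker_opts(allow_outbound):
    """Build a DOCKER_OPTS line for /etc/default/docker.

    --icc=false (no inter-container communication) is always set.
    --ip-forward=false is added only when containers should get no
    networking at all, as described for tmpnb.org above.
    """
    flags = ["--icc=false"]
    if not allow_outbound:
        flags.append("--ip-forward=false")
    # DOCKER_OPTS is an assumption about how the file is laid out.
    return 'DOCKER_OPTS="{}"'.format(" ".join(flags))


# Users may install packages and fetch data sets:
print(docker_opts(allow_outbound=True))
# Locked down, similar to what is described for tmpnb.org:
print(docker_opts(allow_outbound=False))
```

The Docker daemon has to be restarted for a change to this file to take effect.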

> Other things that need to be considered would be whether to have the kernels be in virtual environments.

I'll probably install miniconda for Python 3, set up Python 2 as a conda environment, and install the Python 2 kernel as part of that conda environment in the next iteration of the demo image.

> Have you thought at all about integrating an open source web IDE, like codebox, so you could factor out larger functions into .py files? There is a Docker image that sets up codebox, so it would just be a matter of launching codebox, or another IDE, from the IPython tree view.

Other than the current editor that is built into IPython 3, no. codebox looks pretty cool though!

@rothnic
Contributor

rothnic commented Feb 13, 2015

The first thing I tried with tmpnb was installing another package, which is when I ran into the user not being root. I initially used miniconda, especially since it was quicker for building the images, to easily provide a user-owned Python environment. I may set up a branch that offers miniconda as an option. I do plan to set it up exactly how you mention, with the Python 2 kernel in an environment, so I'll see if I can get that implemented.

I was unsure of the base image. I wanted to avoid the existing python images, since I was trying to install python for the specific user. I just saw this morning that the one I'm using isn't trusted, so it would be good to find a better option.

The main issue I ran into with building a custom docker image was that it seems difficult to run something from a regular user's permission levels in docker, so it made things more difficult to install anaconda for the user. If you were actually logged into the user in linux, you can just run the script without interaction and no sudo, and you'd get what I was after. Instead I had to install anaconda to a specific directory, then chown it for the user. Even still, I had to manually chown the ipython security folder for some reason, otherwise ipython would fail to start.

Thanks for the other tips, I'll check them out. If there is a place you'd prefer to capture some of this information, let me know and I'll contribute towards it.

@rothnic
Contributor

rothnic commented Feb 14, 2015

I was able to get tmpnb working with my image with this command:

docker run --net=host -d -e CONFIGPROXY_AUTH_TOKEN=$TOKEN -v /var/run/docker.sock:/docker.sock --name=tmpnb jupyter/tmpnb python orchestrate.py --image='rothnic/anaconda-notebook' --command='/home/condauser/anaconda/bin/ipython notebook --NotebookApp.base_url=/{base_path} --ip=0.0.0.0 --port {port}'
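
The {base_path} and {port} placeholders in the --command value are filled in separately for each notebook container. A minimal sketch of that substitution, assuming plain Python str.format semantics: the render_command helper and the example values are hypothetical, and this is not tmpnb's own code.

```python
# Command template exactly as passed via --command above.
COMMAND_TEMPLATE = (
    "/home/condauser/anaconda/bin/ipython notebook "
    "--NotebookApp.base_url=/{base_path} --ip=0.0.0.0 --port {port}"
)


def render_command(template, base_path, port):
    """Fill the {base_path} and {port} placeholders for one container."""
    return template.format(base_path=base_path, port=port)


# Hypothetical values; the real ones are chosen per container.
print(render_command(COMMAND_TEMPLATE, base_path="user/abc123", port=8888))
```

Any custom image needs a command that accepts both values, since the proxy routes each user to a different base URL.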

@willingc
Member

@rothnic That's great! Thanks for sharing the details. This is a good place to keep sharing tips for the user guide.

@rothnic
Contributor

rothnic commented Feb 17, 2015

I have updated anaconda-notebook with a base install of python 3. There is also now a python 2 kernel installed in a conda environment, although it is barebones compared to python 3 which still has all of anaconda's components. I also added a conditional check for a local version of anaconda, so that you can build locally without downloading it over and over.

dependencies
I did have some troubles with dependencies on qt with matplotlib, even when trying to set matplotlib to inline mode in the profile. I ended up fixing it by installing some x server linux dependencies, but I need to track down why I still need to input %matplotlib inline within the notebook, instead of it being applied by default.

python 2 vs. python 3 for base environment
I'm not sure at this point about having the base in python 2 versus python 3, then the other in the environment. Many of the notebooks I was grabbing to test the setup out with assume python 2. I'd like to focus on python 3, but at the same time, if this is to support beginners messing around in a sandbox environment, it may not be the best setup. I'll likely set up a branch for the alternate configuration.

reducing image layers
One other thing I think I need to do is consolidate some of the Dockerfile RUN instructions so there aren't so many large intermediate layers. I think the main culprit may be having the chown of the user's anaconda folder in a separate step from the main install, which I think means the majority of the image gets downloaded over again.
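
Each RUN instruction produces its own layer, and a chown in a later layer stores a second copy of every file whose ownership changes. A minimal sketch of folding the install and the chown into a single RUN instruction follows. The installer file name is hypothetical, the user and path are taken from the tmpnb command earlier in this thread, and single_run is an illustrative helper rather than part of any tool.

```python
def single_run(commands):
    """Join shell commands into one Dockerfile RUN instruction."""
    return "RUN " + " && \\\n    ".join(commands)


# Hypothetical installer name; -b (batch) and -p (prefix) are the
# installer's non-interactive options.
print(single_run([
    "bash anaconda-installer.sh -b -p /home/condauser/anaconda",
    "chown -R condauser /home/condauser/anaconda",
]))
```

Because both commands run in the same layer, the files are only stored once, with their final ownership.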

@rgbkrk
Member Author

rgbkrk commented Feb 17, 2015

That's great, thank you. I'd like to use this for demo images, though I want to understand what's in the phusion baseimage. They make excellent technical arguments, but I'd prefer to see that stuff go towards upstream and for the images to be signed.

@rothnic
Contributor

rothnic commented Feb 17, 2015

Yeah, that's right. I'm going to switch over to a different base image. Looks like people have been asking for it to be trusted for some time.

@rgbkrk
Member Author

rgbkrk commented Feb 17, 2015

If you're wondering what to pick, I'm leaning towards debian:jessie for our base. It's a bit smaller than ubuntu but still gives you apt goodness.

@rothnic
Contributor

rothnic commented Feb 19, 2015

I have it working with debian:jessie, but I'm not sure what I'm doing wrong on handling matplotlib. I'm having to install way too many Qt dependencies, which greatly increases the size. Even then, I still have to run %matplotlib inline before I can use matplotlib, even with a notebook config that contains c.IPKernelApp.matplotlib = 'inline'. (I realized while typing this that I may need to check whether the config file is executable.)

I was hoping that I could just tell ipython that I don't want to use matplotlib with qt at all.

I'll go ahead and commit this version. It works, just is a bit big for now and requires manually using %matplotlib inline.

@rothnic
Contributor

rothnic commented Feb 22, 2015

Getting closer to getting this working. It seems that c.IPKernelApp.matplotlib = 'inline' doesn't work for me in IPython 3.

The only automated way I've found to use inline by default is with this matplotlib import hook, which I'm still working some issues out with. I found multiple issues related to this topic.

Anaconda's matplotlib is too old, so needs to be updated: matplotlib/matplotlib#3464
Mentioned inline should be default in this project, but was closed: #33, pointing to jupyter/docker-notebook#7, pointing to ipython/ipython#6424

I think I've narrowed down that you can switch the backend to nbagg, then after doing that, set matplotlib to use inline, then import pyplot. This makes sure that matplotlib doesn't try to utilize the qt libraries, even though I'm really not using nbagg. Nbagg works, but seems buggy at this point, at least with the version in conda. I have this working from within the notebook, but am trying to get it working in the import hook.
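
The order of operations described above can be sketched as a small script, for example one placed in an IPython profile's startup directory. This is an assumption-laden illustration of the steps, not the script referred to in this thread, and the function name prefer_inline_backend is hypothetical. It does nothing when matplotlib or IPython is missing, or when it is not running inside an IPython session.

```python
def prefer_inline_backend():
    """Select nbagg first, then switch to inline, before pyplot is imported.

    Returns True if inline plotting was enabled, False otherwise.
    """
    try:
        import matplotlib
        from IPython import get_ipython
    except ImportError:
        return False

    shell = get_ipython()
    if shell is None:
        # Not inside an IPython session, so there is no magic to run.
        return False

    # Step 1: choose a non-Qt backend so the Qt libraries are never loaded.
    matplotlib.use("nbagg")
    # Step 2: equivalent to typing "%matplotlib inline" in a notebook cell.
    shell.run_line_magic("matplotlib", "inline")
    return True


prefer_inline_backend()
```

After this has run, importing matplotlib.pyplot in a notebook should use the inline backend without a manual %matplotlib inline.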

@rothnic
Contributor

rothnic commented Feb 23, 2015

I was able to ignore qt with matplotlib by using this startup script.

I wasn't able to get the import hook working because it never sees the matplotlib import, just the import of the matplotlib dependencies.

@rgbkrk
Member Author

rgbkrk commented Feb 23, 2015

Nice startup script! I think that's worthy.

@gpillemer

Hi, I tried this configuration.
When I run the Docker image on its own locally, it works fine.
Running it with the tmpnb command above gives me the following error:

  File "orchestrate.py", line 277, in <module>
    main()
  File "orchestrate.py", line 257, in main
    ioloop.run_sync(pool.heartbeat)
  File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 418, in run_sync
    return future_cell[0].result()
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 109, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 631, in run
    yielded = self.gen.throw(_sys.exc_info())
  File "/srv/tmpnb/spawnpool.py", line 189, in heartbeat
    yield tasks
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 628, in run
    value = future.result()
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 109, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 464, in callback
    result_list = [i.result() for i in children]
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 109, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 631, in run
    yielded = self.gen.throw(_sys.exc_info())
  File "/srv/tmpnb/spawnpool.py", line 227, in _launch_container
    container_config=self.container_config)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 628, in run
    value = future.result()
  File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 109, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 633, in run
    yielded = self.gen.send(value)
  File "/srv/tmpnb/dockworker.py", line 116, in create_notebook_server
    host_port = container_network[0]['HostPort']
TypeError: 'NoneType' object has no attribute '__getitem__'
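
The last frame fails while indexing container_network, which means that value was None when the host port was read. A minimal sketch of a guard that turns this into a clearer error is below. The helper first_host_port and the sample mapping are hypothetical, this is not tmpnb's code, and it does not identify why the lookup came back empty.

```python
def first_host_port(container_network):
    """Return the HostPort of the first port mapping, or raise a clear error."""
    if not container_network:
        # None or an empty list: the container reported no port mapping.
        raise RuntimeError("no host port mapping reported for the container")
    return container_network[0]["HostPort"]


# Hypothetical mapping in the shape the failing line expects.
print(first_host_port([{"HostIp": "127.0.0.1", "HostPort": "49153"}]))

try:
    first_host_port(None)
except RuntimeError as err:
    print("lookup failed:", err)
```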


5 participants