Unexpected output? #5

jbmaxwell · 2017-09-04T22:32:03Z

A few strange things: 1) it pauses/hangs for a very long time before starting training, 2) the output images (continuous variable) are 280x140 (I expected square 280x280), and 3) the reproduction is certainly not approaching parity with the training data (this is image 50).

Maybe 50 epochs (the default) is too few? But still, the shape seems odd... It seems like there's still a bug somewhere.

[EDIT: Sorry, I just noticed that the images on your git page are actually 5x10 digits. So the output is probably correct. I guess it's just a question of whether 50 is too few epochs.]

anibali · 2017-09-04T22:53:17Z

The only time I've observed a long hang was when loading cudnn on the GTX 1080. Can you please run the following command and post the output:

time nvidia-docker run -it --rm infogan-torch th -e 'require "cudnn"'

[EDIT: For reference, I see 3.956 total on my Maxwell Titan X]

50 epochs is plenty to get recognisable results. For example, I just ran the training code and got the following after only 10 epochs (a couple of minutes):

So I think there's probably something suspect going on in your particular setup. What graphics card are you using? Which version of CUDA is installed?

jbmaxwell · 2017-09-04T23:39:22Z

Your docker run command (from above) is still just running. I'm just on a GTX 1060.
I wouldn't be at all surprised if it's a CUDA thing. I'm new to Python, so the v2/v3 dance, environments, and all the rest have been an "adventure", for sure.

For reference:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

anibali · 2017-09-04T23:46:58Z

I think your best bet would be to try using a CUDA 8.0 base image for the Docker container, since it seems that the new Pascal cards do not play well with CUDA 7.5. Unfortunately this does mean rebuilding the Docker image for a third time.

Edit the first line of the Dockerfile like so:

- FROM nvidia/cuda:7.5-cudnn5-devel
+ FROM nvidia/cuda:8.0-cudnn5-devel

Then rebuild the image:

nvidia-docker build -t infogan-torch .
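
If you'd rather script the one-line Dockerfile change than edit it by hand, something like the following should work. This is only a minimal sketch: `bump_cuda_base` is a hypothetical helper name (it is not part of this repository), and it assumes the FROM line matches the 7.5 tag exactly as shown above.

```shell
# Hypothetical helper (not part of the repository): switch the Dockerfile's
# base image from the CUDA 7.5 tag to the CUDA 8.0 tag.
bump_cuda_base() {
    # Default to ./Dockerfile when no path is given.
    dockerfile="${1:-Dockerfile}"
    # Edit in place and keep the original as <file>.bak.
    # Assumes the FROM line uses exactly the 7.5-cudnn5-devel tag.
    sed -i.bak \
        's|^FROM nvidia/cuda:7\.5-cudnn5-devel|FROM nvidia/cuda:8.0-cudnn5-devel|' \
        "$dockerfile"
}
```

After running `bump_cuda_base Dockerfile`, rebuild with the `nvidia-docker build` command above. The untouched original stays next to it as `Dockerfile.bak` in case you need to go back.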

jbmaxwell · 2017-09-05T02:20:17Z

Okay, all good now with FROM nvidia/cuda:8.0-cudnn5-devel
Thanks for your time.

anibali · 2017-09-05T02:33:15Z

Fantastic! I've added a note to the readme (7b51a16) so that future users don't run into this problem.

anibali self-assigned this Sep 4, 2017

anibali closed this as completed Sep 5, 2017
