Unexpected output? #5

Closed
jbmaxwell opened this issue Sep 4, 2017 · 5 comments

jbmaxwell commented Sep 4, 2017

A couple of strange things: 1) it pauses/hangs for a very long time before starting training, 2) output images (continuous variable) are 280x140 (I expected square 280x280), and the reproduction is certainly not approaching parity with the training data (this is image 50).

[Image: varying_c2_0050]

Maybe 50 epochs (the default) is too few? But still, the shape seems odd... It seems like there's still a bug somewhere.

[EDIT: Sorry, I just noticed that the images on your git page are actually 5x10 digits. So the output is probably correct. I guess it's just a question of whether 50 is too few epochs.]
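The 280x140 shape is what a grid of 5 rows by 10 columns produces if each digit tile is 28x28 pixels. The tile size is an assumption here (the thread does not state it), chosen because it is the only size consistent with both dimensions. A minimal sketch of the arithmetic, using a hypothetical helper named `grid_size`:

```shell
# grid_size COLS ROWS TILE
# Prints WIDTHxHEIGHT in pixels for a grid of square tiles.
# TILE=28 below is an assumption, not something stated in the thread.
grid_size() {
  cols="$1"
  rows="$2"
  tile="$3"
  echo "$((cols * tile))x$((rows * tile))"
}

grid_size 10 5 28   # prints 280x140
```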

@anibali
Owner

anibali commented Sep 4, 2017

  1. The only time I've observed a long hang was when loading cudnn on the GTX 1080. Can you please run the following command and post the output:
time nvidia-docker run -it --rm infogan-torch th -e 'require "cudnn"'

[EDIT: For reference, I see 3.956 total on my Maxwell Titan X]

  2. 50 epochs is plenty to get recognisable results. For example, I just ran the training code and got the following after only 10 epochs (a couple of minutes):

[Image: varying_c1_0010]

So I think there's probably something suspect going on in your particular setup. What graphics card are you using? Which version of CUDA is installed?

@anibali anibali self-assigned this Sep 4, 2017
@jbmaxwell
Author

Your docker run command (from above) is just running. I'm just on a GTX 1060.
I wouldn't be at all surprised if it's a CUDA thing. I'm new to Python, so the v2/v3 dance, environments, and all the rest have been an "adventure", for sure.

For reference:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
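Note that `nvcc --version` on the host reports the host's toolkit, whereas code inside the container uses whatever CUDA version the Docker base image provides. A minimal sketch for pulling the release number out of that output so the two can be compared; `cuda_release` is a hypothetical helper name, and the in-container check in the comment assumes the image ships `nvcc`:

```shell
# cuda_release: read `nvcc --version` output on stdin and print the
# "release X.Y" number (for example 8.0). Prints nothing if no match.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Host toolkit, only if nvcc is actually installed on the host.
if command -v nvcc >/dev/null 2>&1; then
  echo "Host CUDA release: $(nvcc --version | cuda_release)"
fi

# Inside the container (assumes the image includes nvcc):
#   nvidia-docker run --rm infogan-torch nvcc --version | cuda_release
```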

@anibali
Owner

anibali commented Sep 4, 2017

I think your best bet would be to try using a CUDA 8.0 base image for the Docker container, since it seems that the new Pascal cards do not play well with CUDA 7.5. Unfortunately this does mean rebuilding the Docker image for a third time.

Edit the first line of the Dockerfile like so:

- FROM nvidia/cuda:7.5-cudnn5-devel
+ FROM nvidia/cuda:8.0-cudnn5-devel

Then rebuild the image:

nvidia-docker build -t infogan-torch .
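If you would rather script the Dockerfile change than edit it by hand, something like the following should work. This is a sketch, not part of the repository: `switch_base_image` is a hypothetical helper, and it assumes the `FROM` line matches the 7.5 tag shown above exactly.

```shell
# switch_base_image PATH_TO_DOCKERFILE
# Rewrites the CUDA 7.5 base image line to the CUDA 8.0 one in place,
# keeping the original file as PATH_TO_DOCKERFILE.bak.
switch_base_image() {
  dockerfile="$1"
  sed -i.bak \
    's|^FROM nvidia/cuda:7\.5-cudnn5-devel|FROM nvidia/cuda:8.0-cudnn5-devel|' \
    "$dockerfile"
}

# Usage, from the repository root:
#   switch_base_image Dockerfile
#   nvidia-docker build -t infogan-torch .
```

Lines other than the matching `FROM` line are left untouched, and a Dockerfile that already uses the 8.0 image is not modified.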

@jbmaxwell
Author

Okay, all good now with FROM nvidia/cuda:8.0-cudnn5-devel
Thanks for your time.

@anibali
Owner

anibali commented Sep 5, 2017

Fantastic! I've added a note to the readme (7b51a16) so that future users don't run into this problem.

@anibali anibali closed this as completed Sep 5, 2017