Resuming from pretrained network checkpoints on CPU fails: unknown Torch class <torch.CudaTensor> #3

JCBrouwer · 2017-10-12T13:10:37Z

Trying to resume training from one of the models on CPU returns an error regarding an unknown Torch class.

DATA_ROOT=myimages dataset=folder gpu=0 netD=checkpoints/landscapes_776_net_D.t7 netG=checkpoints/landscapes_776_net_G.t7 th main-128.lua
{
ntrain : inf
netD : "checkpoints/landscapes_776_net_D.t7"
nThreads : 4
niter : 100
batchSize : 64
netG : "checkpoints/landscapes_776_net_G.t7"
ndf : 40
fineSize : 128
nz : 100
loadSize : 129
gpu : 0
ngf : 160
dataset : "folder"
lr : 0.0002
noise : "normal"
name : "experiment1"
beta1 : 0.5
display_id : 10
display : 1
}
Random Seed: 8411
Starting donkey with id: 2 seed: 8413
table: 0x0a0c0bc8
Starting donkey with id: 1 seed: 8412
table: 0x0a0e2528
Starting donkey with id: 4 seed: 8415
table: 0x0a100ae0
Starting donkey with id: 3 seed: 8414
table: 0x0a122460
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Loading train metadata from cache
Dataset: folder Size: 5209
Initializing generator network from checkpoints/landscapes_776_net_G.t7
/Users/hans/torch/install/bin/luajit: /Users/hans/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
[C]: in function 'error'
/Users/hans/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
/Users/hans/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/Users/hans/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/Users/hans/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/Users/hans/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
main-128.lua:72: in main chunk
[C]: in function 'dofile'
...hans/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x010912cd60

This seems to be because the models were trained using a GPU and thus require CUNN to load. According to this comment, however, this can be remedied simply by converting the models to float before saving them. I would test it out myself and open a pull request (seeing as this might be as simple as adding 2 lines), but I don't have an NVIDIA graphics card.

I have also found this script, which seems to be able to convert checkpoints after the fact. It also requires CUNN, though, so it would be nice if the checkpoints could be converted for us CPU users!

robbiebarrat · 2017-10-12T18:46:49Z

Hey! Thanks for bringing this to my attention;

I actually had no idea that CPU users couldn't use the pre-trained models I put up; so sorry about that.

I'll make a commit once I get home from work tonight with the converted models.

Caselles · 2017-10-13T14:53:17Z

Indeed, it would be fantastic to obtain the CPU-compatible models!

robbiebarrat · 2017-10-13T20:37:07Z

@Caselles Agreed - CPU-compatible models are a must;

On my work computer, the conversion script in Karpathy's repo doesn't run (gives a very strange error)...

I think I'm going to try and add the "two-line change" that @JCBrouwer mentioned and see if it runs on a computer with CUDA_VISIBLE_DEVICES=""
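A CPU-only machine can be approximated on a CUDA box by hiding the GPUs from a single command. A minimal sketch, assuming the CUDA runtime treats an empty CUDA_VISIBLE_DEVICES as "no devices"; run_without_gpu is a hypothetical wrapper name, not something from this repo:

```shell
# Hypothetical wrapper: run one command with every CUDA device hidden.
# The variable is set only for that command, not for the calling shell.
run_without_gpu() {
    env CUDA_VISIBLE_DEVICES= "$@"
}

# Example (script name and gpu option taken from the original report):
# run_without_gpu env gpu=0 th main-128.lua
```

Hiding the devices is not the same as uninstalling cunn, so this is only an approximation of a real CPU-only setup.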

Caselles · 2017-10-17T14:01:57Z

@robbiebarrat Did you have the time to progress on the issue? I really want to try out the pretrained models but it is currently not possible for me.

robbiebarrat · 2017-10-17T18:51:01Z

Sorry - I have been working on this.

I keep getting errors when trying to run a generation with the converted model, and recently a very discouraging one:

In 1 module of nn.Sequential: ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:32: Only Cuda supported duh!

Here; I've uploaded one of the models for you to try. See if it works by running `net=landscapes_776_net_G_cpu.t7 th generate.lua`, and maybe comment out some of the lines that require cuda/cudnn/cunn with `--`...

Get the model here and please let me know if you find anything out. In the meantime; I'll keep trying to solve this.

https://drive.google.com/open?id=0B-_m9VM1w1bKRnJMZmkzWEVtSDA

Also; the script I used for conversion is as follows:

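require 'nn'
require 'optim'
require 'cunn'   -- needed to read the torch.CudaTensor objects in the GPU checkpoint
require 'cudnn'  -- needed to read any cudnn.* modules in the GPU checkpoint

local modelName = 'landscapes_776_net_G.t7'

-- Load the GPU checkpoint, cast its tensors to FloatTensor, and save a copy
local model = torch.load(modelName)
model = model:float()
-- clearState() drops cached outputs/gradients so the saved file is smaller
torch.save(modelName .. '_cpu', model:clearState())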

Caselles · 2017-10-17T21:20:08Z

Thanks for your work. I tried, and got the same error as you, related to cudnn : unknown Torch class <cudnn.SpatialFullConvolution>
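For anyone stuck at the same point: both errors in this thread (torch.CudaTensor, then cudnn.SpatialFullConvolution) name a class that is recorded inside the .t7 file itself. A rough way to see which CUDA-only classes a checkpoint still references, without installing cunn or cudnn, is to scan the file for those names. This is only a sketch: it assumes the serialized file keeps class names as plain ASCII text, and list_cuda_classes is a made-up helper name, not part of Torch.

```shell
# Hypothetical helper: list CUDA-only class names found in a Torch .t7 file.
# Assumption: class names are stored as plain ASCII strings inside the file.
list_cuda_classes() {
    # -a: read the binary file as text; -o: print only the matched names.
    # LC_ALL=C keeps grep from tripping over bytes that are not valid UTF-8.
    LC_ALL=C grep -a -o -E 'torch\.Cuda[A-Za-z]*|cudnn\.[A-Za-z]+' "$1" | sort -u
}

# Example:
# list_cuda_classes landscapes_776_net_G_cpu.t7
```

An empty result would suggest the checkpoint no longer references torch.Cuda* or cudnn.* classes; any name that is printed is a class a CPU-only install may be unable to load.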

It might be too much work, but isn't there a way to convert the code to Python? The problem seems related to Torch; I guess with Keras or TensorFlow these problems do not exist. I know it is a lot of work, but I assure you that a lot of people would be grateful, since it is VERY HARD to find pre-trained art GAN models. I think this repo would be used much more if it were in Python.

I'll try to seek answers for the cudnn problem though. Keep us in touch please :)

robbiebarrat · 2017-10-17T21:31:13Z

It might not be too much work, honestly; python is the language that im most comfortable with, and keras is my favorite library. I just finished converting an old project i wrote in pybrain, of all things, to keras and had great success doing that.

I'll keep you updated on the conversion process - I'm a little busy right now since I'm starting to apply to college, so it might take like, two or so weeks, but I feel like it'd definitely be worth it.

Caselles · 2017-10-17T21:39:33Z

Two or so weeks would be great ! I would really appreciate you doing that. Thanks

robbiebarrat · 2017-10-17T21:45:43Z

no problem - It'll help me a lot with some art projects I'm doing, too, so it's a win-win

robbiebarrat · 2017-10-24T21:17:14Z

@Caselles @JCBrouwer
Alright - so I've finished the data-loading part in python, hopefully this week I'll be able to finish the actual GAN part (not that hard once you already know the architecture); i very well may double the resolution, too, and have it be 256x256...

Anyways; yeah expect a python rewrite sometime this week(end?)

Caselles · 2017-10-25T07:12:09Z

Thanks a lot for the update. Do you mean we will have pre-trained models in 256x256 resolution, loadable in python ? This would be really great !

robbiebarrat · 2017-10-25T08:26:57Z

Yeah - I plan on making that the case. At my workplace, we have these insane GPU clusters that I'll set the models to train on; so training won't take forever, and once I finish defining the model in keras (i'm making some slight improvements in the architecture), it'll be relatively easy for me to train.

Let me know if you have any ideas for a pre-trained GAN you'd like to see included; I'm definitely going to do landscapes and nude portraits, but if you think of anything else you'd like just let me know.

JCBrouwer · 2017-10-25T09:56:52Z

@robbiebarrat Thanks a ton for working on this!

Some more training sets I can think of are: cartoon characters, pixel art, graffiti, space, and psychedelic art.

Feel like all of those should have more than enough examples to train on. They could also all be interesting checkpoints to resume from and train with new data.

Caselles · 2017-10-25T14:53:02Z

Maybe try the flowers dataset too : http://www.robots.ox.ac.uk/~vgg/data/flowers/

I would looove to see the results of a pre trained GAN on flowers that you fine tune on abstract art ! :)

robbiebarrat · 2017-10-25T18:00:51Z

@JCBrouwer @Caselles The things that jump out at me as really cool ideas are the space GAN and the flowers + abstract fine-tuning GAN. I've been meaning to have a network that can sort of "show off" the whole micro-training-on-a-different-dataset thing I've come up with, and flowers sound really good for that...

Thanks so much for the suggestions; I'll keep you updated on the progress of the rewrite in the coming week.

Caselles · 2017-11-03T16:43:03Z

@robbiebarrat Any progress on the rewrite ? :)

No hurry, just want to know if you had the time to work on it.

robbiebarrat · 2017-11-03T17:34:49Z

@Caselles yeah - I've finished pretty much everything except for the network implementation itself (like defining the model in keras, but that shouldn't take long at all).

I'll put it up as soon as it's like, presentable with results and stuff, which I think is going to be ~a week from now. By the end of next weekend for sure I'll have something ready to put up.

Caselles · 2017-11-04T09:34:21Z

Thanks a lot! Looking forward to trying out these models!

robbiebarrat · 2017-11-12T05:18:55Z

Hey guys - I'm actually running into a lot of trouble with the Keras model; it's insanely hard to train at 256x256 resolution so I'm messing around with architectural changes... Really sorry, but this might take longer to do than I initially thought it would.

Caselles · 2017-11-12T12:07:25Z

Hey @robbiebarrat, no worries and thanks for the update! Take your time, as long as you keep us updated about your work it is perfectly ok :) Good luck with the architectural changes !

JCBrouwer · 2017-11-13T15:34:42Z

@robbiebarrat training for 256x256 is quite difficult! Most implementations have a lot of issues with mode collapse above 128x128 resolution.

Perhaps it's an idea to get a working implementation and trained datasets for 128x128 up first and then expand to larger resolutions later.

Otherwise some tips that might help with larger resolution training can be found here and maybe something to look at implementing eventually would be progressive growing of resolution.

Caselles · 2017-12-04T14:49:03Z

Even if you only have results in 128x128, I am very much interested in being able to get the pretrained models and code in Python. Such repositories do not really exist at the moment, so even if it does not seem perfect enough to publish, I think you should consider it :)

Caselles · 2017-12-27T13:18:54Z

?

robbiebarrat · 2017-12-27T20:23:23Z

@Caselles Hey - sorry about this; but I've run into a lot of problems with the python networks (mostly with regard to training stability). I've tried a bunch of things from ganhacks, different architectures, loss functions, etc. but to no avail. I thought about using the progressive growing of gans paper, but that takes multiple months to train. I'm putting this project on hold right now, since it's not really working out, and also I'm pretty swamped with applying to colleges right now...

I might come back to it in the next few months if I come across something that'll help me out, or if some cool new GAN paper comes out (like if there's one that does higher resolution generations easily) - but I really don't know if that's very likely.

Caselles · 2017-12-28T18:36:21Z

Ok, really disappointed. Could you at least provide the code for the 128x128 Gan in Keras ?

robbiebarrat · 2017-12-28T20:13:39Z

Unfortunately I don't have that working; otherwise i'd absolutely provide it.

I'm going to try something using tf-gan in the near future, which will just be straight up tensorflow as opposed to keras, but I'll update this repo with some equivalent networks in tf - can't promise 256x256 though.

JCBrouwer · 2018-01-22T15:18:34Z

Hello @Caselles @robbiebarrat. I've finally gotten myself a CUDA-enabled GPU. I wrote a simple script that converts GPU checkpoints to CPU checkpoints and converted all the included pretrained networks with it. See #6

Landscape GAN: Generator, Discriminator

Nude-Portrait GAN: Generator, Discriminator

Portrait GAN: Generator, Discriminator

vdaita · 2018-01-23T17:03:41Z

Thank you so much! I will try this out at once!

vdaita · 2018-01-23T21:09:37Z

This worked! Thank you soooooo much! @JCBrouwer

Eastkap · 2018-10-26T21:03:02Z

I'm French and I feel you robbie

karl-schulz · 2018-10-29T17:01:44Z
