
Training a model fails #4

Open
randomrandom opened this issue Jul 15, 2016 · 9 comments
@randomrandom

Hi, I tried to run the command from the tutorial for model training, but it failed with the following error:

 CUDA_VISIBLE_DEVICES=0 th feedforward_neural_doodle.lua -model_name skip_noise_4 -masks_hdf5 data/starry/gen_doodles.hdf5 -batch_size 4 -num_mask_noise_times 0 -num_noise_channels 0 -learning_rate 1e-1 -half false
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/hdf5/group.lua:312: HDF5Group:read() - no such child 'style_img' for [HDF5Group 33554432 /]
stack traceback:
    [C]: in function 'error'
    /root/torch/install/share/lua/5.1/hdf5/group.lua:312: in function 'read'
    feedforward_neural_doodle.lua:49: in main chunk
    [C]: in function 'dofile'
    /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

any ideas why hdf5 might fail with such error?
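The error says the Lua reader cannot find a `style_img` child at the top level of the hdf5 file. A minimal sketch (assuming the h5py Python package, which is not mentioned in the thread) for inspecting what the generated file actually contains:

```python
import h5py  # assumes the h5py package is installed

def hdf5_children(path):
    """Return the names of the top-level groups/datasets in an hdf5 file."""
    with h5py.File(path, 'r') as f:
        return list(f.keys())

# e.g. hdf5_children('data/starry/gen_doodles.hdf5') should include a
# 'style_img' entry once generate.py has run to completion; an
# interrupted run can leave the file without it.
```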

@DmitryUlyanov
Owner

Did you generate the hdf5 file first?

@randomrandom
Author

Yes. Initially I thought something went wrong with the generation, since this script never completed:

python generate.py --n_jobs 30 --n_colors 4 --style_image data/starry/style.png --style_mask data/starry/style_mask.png --out_hdf5 data/starry/gen_doodles.hdf5
even though a new hdf5 file was generated

So I decided to try the sample command you put in the README, assuming it would use a sample hdf5 file from the repo; unfortunately it made no difference.

Is it possible that both fail due to a bad hdf5 setup?

@DmitryUlyanov
Owner

There's no sample hdf5 file in the repo, since it is too large. You should let the generation script run until it finishes.

@randomrandom
Author

randomrandom commented Jul 16, 2016

Thanks, I'll try that! How much time does it take on your setup?

Do you advise increasing the number of jobs? I'm using a Tesla K10 setup.

@randomrandom
Author

randomrandom commented Jul 16, 2016

I managed to get it working; unfortunately it looks like the VRAM (3.5GB) is not enough. What's the best way to reduce the memory footprint?

p.s.: I'm familiar with Johnson's implementation and know what I can do there, but I still haven't read your blogpost and the code documentation :(

Edit 1: At first glance, it looks like reducing batch_size and n_colors might do the trick? I had increased them to 8; maybe that's why it fails.

Edit 2: Is it even possible to squeeze the training into 3.5GB? I started going through the code and noticed that you are already doing a lot of memory optimizations (e.g. using cudnn and the Adam optimizer).

@DmitryUlyanov
Owner

Try batch_size = 1 and do not change n_colors; you can also downsize the image, to 256x256 for example.
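A minimal sketch of the downsizing step (assuming NumPy; nearest-neighbor resampling is a deliberate choice so the discrete color labels in the style mask don't get blended into new values, which the thread does not spell out):

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, C) array.

    Nearest-neighbor keeps the mask's color labels discrete; any
    interpolating resize would invent blended colors that n_colors
    quantization never produced.
    """
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

# Example: shrink a 512x512 RGB image to 256x256 before generating the hdf5.
img = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
small = resize_nearest(img, 256, 256)
print(small.shape)  # (256, 256, 3)
```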

@randomrandom
Author

Looks like batch_size=1 did the trick; I previously tried 2 and 3 with no success. Does this affect the quality or just the speed of the training?

@DmitryUlyanov
Owner

The quality will be ok; I used batch_size = 1. But at test time you need to experiment with model:evaluate() or model:training().
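For context on why the mode matters (a hedged NumPy illustration, not the repo's code): batch-normalization layers normalize with the current batch's statistics in training mode but with stored running statistics in evaluation mode, and with batch_size = 1 the two can diverge noticeably:

```python
import numpy as np

def bn_train(x):
    # training mode: normalize by this input's own mean/variance
    return (x - x.mean()) / np.sqrt(x.var() + 1e-5)

def bn_eval(x, running_mean, running_var):
    # evaluation mode: normalize by statistics accumulated during training
    return (x - running_mean) / np.sqrt(running_var + 1e-5)

x = np.random.randn(1, 64, 64) * 3.0 + 2.0  # one test image's feature map
y_train = bn_train(x)
y_eval = bn_eval(x, running_mean=0.0, running_var=1.0)
# The two outputs differ, which is why the generated texture can look
# different depending on which mode the network is left in at test time.
```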

@randomrandom
Author

BTW, do you recommend this repo for artistic neural style transfer? To do it well, there should probably be some semantic analysis that determines the masks. Is there any other approach you can recommend?
