
How to change KP_detector and dense_motion parameters to train on Higher resolution? #81

Open
stark-akib opened this issue Apr 5, 2020 · 69 comments


@stark-akib

Hello @AliaksandrSiarohin . First of all, congratulations on the great work and thank you for sharing the repository.

I'm planning to train the model to generate higher resolution output (such as 512x512, 1024x1024). I would really appreciate your insight on my approach.

You mentioned here #14

Currently keypoint detector and dense-motion net operate on 64x64 images

Do I need to change this behavior for better motion transfer performance (while training on higher resolution)? How would you suggest doing it?

Looking forward to hearing from you. :)

@AliaksandrSiarohin
Owner

Hi @stark-akib,
I don't have a recipe here; you should try and see for yourself.
I would first keep the 64x64 resolution for the keypoint detector and dense motion (use scale_factor = 0.125 for 512 input and 0.0625 for 1024). If you see that the keypoints are not accurate, increase the resolution of the keypoint detector; if the deformations need to be more precise, increase the resolution of dense motion.
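For reference, a minimal sketch of the config keys involved (key names follow vox-256.yaml; the 512 values below are assumptions to be tuned, not a tested config):

dataset_params:
  frame_shape: [512, 512, 3]      # input resolution
model_params:
  kp_detector_params:
    scale_factor: 0.125           # 512 x 0.125 = 64; use 0.0625 for 1024 input
  generator_params:
    dense_motion_params:
      scale_factor: 0.125         # same reasoning for dense motion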

@stark-akib
Author

stark-akib commented Apr 6, 2020

Thank you @AliaksandrSiarohin for the direction. How can I increase the resolution of the keypoint detector and dense motion model?

Say I want to increase the keypoint detector's resolution to 256x256 for 512x512 input: do I only change scale_factor to 0.5, or do I need to change any other parameters, functions or files?

@AliaksandrSiarohin
Owner

Yes just change scale_factor
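For the specific case in the question, a sketch of the only change (512 x 0.5 = 256, so the keypoint detector would then run at 256x256):

model_params:
  kp_detector_params:
    scale_factor: 0.5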

@stark-akib
Author

Great. Thank you.

Another quick question: I want to preprocess both the VoxCeleb1 and VoxCeleb2 datasets. As you mentioned on the video preprocessing page,

Note .png format takes approximately 300GB.

Does VoxCeleb1 require approximately 300GB of space for preprocessing? And how much space will VoxCeleb2 require (as it has more data than VoxCeleb1)?

@AliaksandrSiarohin
Owner

No idea, I never downloaded it entirely.

@stark-akib
Author

Okay. Thank you again.

@stark-akib
Author

Hello @AliaksandrSiarohin

I'm going to start training on VoxCeleb1 at 512x512. As you mentioned here,
I'm expecting a similar training time since I have 4 NVIDIA Tesla V100 GPUs.

  1. Can you help me specify how much storage will be needed to complete the training process?
  2. Will 1-2TB storage suffice(considering the intermediate files generated while training)?

Also, when should the training terminate? Are 1000 epochs enough (as stated in the YAML file)?

@stark-akib reopened this Apr 13, 2020
@AliaksandrSiarohin
Owner

VoxCeleb in png format is 300GB, and 300GB x 4 is 1200GB. Intermediate files consume no more than a few GB. 1000 epochs? I guess it should be 100.

@stark-akib
Author

Great. I'll change the parameters accordingly. Thank you.

@newExplore-hash

@AliaksandrSiarohin
Hi, for the VoxCeleb dataset, if I want to replace your KP_detector with an existing keypoint detector such as dlib, what should I do? I have no idea how to handle the jacobian_map.

@stark-akib
Author

@AliaksandrSiarohin

Hello, just giving an update on the VoxCeleb1 preprocessing. The 512x512 preprocessing of VoxCeleb1 took around 870GB of space in .png format. So the required storage for training would be 900GB x 4 = 3.6 TB.

@AliaksandrSiarohin
Owner

Why? Training doesn't need additional space. The x4 was an estimate of the 512x512 space occupancy, because a 512 image is roughly 4 times larger.

@stark-akib
Author

Sorry, I mistook it for a parallel multiplier. The preprocessing I performed contains 18,671 folders in "train" and 510 folders in "test". The rest of the videos either showed a broken link or a skipped message in the console. I guess no additional space is needed for training then.
Thank you.

@AliaksandrSiarohin
Owner

I guess you may need to filter out low-resolution videos. To create vox-metadata.csv I used only the videos where the size of the bbox was greater than 256. You can infer the size of the bbox from the bbox parameter in vox-metadata.csv.
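For illustration, a rough pandas sketch of that filtering (the bbox column and its "left-top-right-bottom" string format are assumptions about vox-metadata.csv; check against the actual file):

import pandas as pd

df = pd.read_csv('vox-metadata.csv')

def bbox_size(bbox):
    # assumed format: "left-top-right-bottom"
    left, top, right, bottom = map(int, str(bbox).split('-'))
    return min(right - left, bottom - top)

df[df['bbox'].apply(bbox_size) >= 512].to_csv('vox-metadata-512.csv', index=False)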

@stark-akib
Author

Thank you for the tip. I'll have a look.

@stark-akib
Author

@AliaksandrSiarohin
Just a quick question: what's the difference between "vox-adv-256.yaml" and "vox-256.yaml"?
For example, what do the parameters
use_kp: True
and
sn: True
do?

Also, what's the use of epoch_milestones: [60, 90]?

@AliaksandrSiarohin
Owner

  1. vox-adv is with adversarial loss.
  2. use_kp adds key-point heatmaps to the discriminator.
  3. sn is spectral normalization.
  4. epoch_milestones are the epochs at which the learning rate is dropped.
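Roughly where these keys live in the config (the placement below is an assumption based on the public vox YAMLs; check the shipped files):

model_params:
  discriminator_params:
    sn: True          # spectral normalization
    use_kp: True      # feed key-point heatmaps to the discriminator
train_params:
  epoch_milestones: [60, 90]   # learning rate is dropped at these epochs
  loss_weights:
    generator_gan: 1           # non-zero only in the adversarial (vox-adv) config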

@stark-akib
Author

stark-akib commented Apr 16, 2020

Thank you. Which one would you suggest using as the config file, "vox-adv-256.yaml" or "vox-256.yaml"? (Considering that I will only change the frame_shape and scale factors for 512x512.)

@AliaksandrSiarohin
Owner

Without the adversarial loss it is more stable.

@stark-akib
Author

Thank you for your insight.

@stark-akib
Author

stark-akib commented Apr 16, 2020

Hello @AliaksandrSiarohin ,

I've started the training using the following command.
CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py --config config/vox-adv-512.yaml --device_ids 0,1,2,3

After about 15 seconds, this error occurs. Can you help me find the problem?
I'm using batch size 40, but still getting an OOM error.

run.py:40: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
Use predefined train-test split.
Training...
0%| | 0/150 [00:00<?, ?it/s]/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/functional.py:2423: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "run.py", line 81, in <module>
train(config, generator, discriminator, kp_detector, opt.checkpoint, log_dir, dataset, opt.device_ids)
File "/home/ubuntu/Downloads/first-order-model/train.py", line 51, in train
losses_generator, generated = generator_full(x)
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
raise output
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
output = module(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/Downloads/first-order-model/modules/model.py", line 166, in forward
x_vgg = self.vgg(pyramide_generated['prediction_' + str(scale)])
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/Downloads/first-order-model/modules/model.py", line 45, in forward
h_relu2 = self.slice2(h_relu1)
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/alethea/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 15.78 GiB total capacity; 14.45 GiB already allocated; 21.88 MiB free; 148.08 MiB cached)

@AliaksandrSiarohin
Owner

40 is too large. The batch size should be approximately 4 times smaller than for 256, e.g. 16 or 12.
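The reasoning: activation memory scales roughly with pixel count, and 512x512 has 4x the pixels of 256x256, so the per-GPU batch has to shrink by about the same factor. As a sketch, in the 512 config's train_params (value untested):

train_params:
  batch_size: 12    # roughly 4x smaller than the 256x256 setting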

@stark-akib
Author

Thank you. Lowering the batch size to 16 solved the issue; it's just showing those UserWarnings now.
What's the expected output in the command prompt? Is it supposed to stay like this? There is no output in log.txt.
[screenshot: console output stuck at 0%]

@stark-akib
Author

stark-akib commented Apr 16, 2020

@AliaksandrSiarohin Checked again after leaving it for an hour; it's still stuck here. When I close it with Ctrl + C, the console output looks like this.

[screenshot: console output after interrupting with Ctrl + C]

Also, I've set num_epochs: 100, num_repeats: 50 and brought the batch size down to 12, but the training is still stuck here. Are there any changes needed in the loss values or in model.py?

@AliaksandrSiarohin
Owner

Probably it is just slow. You can try changing num_repeats to 1 to see. Also, you may want to start from the pretrained 256 checkpoint to accelerate convergence.
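For example (a sketch: vox-cpk.pth.tar is the released 256 checkpoint, and --checkpoint is the flag run.py already exposes; the 512 config name is a placeholder):

CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py --config config/vox-512.yaml --checkpoint vox-cpk.pth.tar --device_ids 0,1,2,3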

@stark-akib
Author

@AliaksandrSiarohin Thank you. Changing num_repeats to 1 seems to work; the log file is showing losses. As your YAML files suggest,
for VoxCeleb, 256x256: num_epochs: 100, num_repeats: 75
for VoxCeleb adv, 256x256: num_epochs: 150, num_repeats: 75
Then what should num_epochs and num_repeats be for 512x512?

@AliaksandrSiarohin
Owner

Depends on how much training you can afford. The more the better.

@stark-akib
Author

stark-akib commented Apr 17, 2020

I can train for up to 5 days on my setup; what num_epochs and num_repeats would you suggest? (Considering the 4 NVIDIA Tesla V100 GPUs I mentioned earlier; I can add another 4, so 8 V100 GPUs in total.)

@MitalPattani

@stark-akib can you please share the checkpoints or the config file? thanks

@alessiapacca

Hey @AliaksandrSiarohin

I re-trained the net for 512. The script https://github.com/AliaksandrSiarohin/video-preprocessing/blob/master/crop_vox.py was giving many errors in my case and not working, so I just took https://github.com/AliaksandrSiarohin/video-preprocessing/blob/master/vox-metadata.csv and selected the videos >= 512x512. This left me with 5827 mp4 videos in the train folder and 166 mp4 videos in the test folder.

I performed 100 epochs with num_repeats = 20, and the result is not extremely good:
[result clips: biden, biden2]

In training, I increased the resolution of the KP_detector and dense motion to 256 (by using scale_factor 0.5).
Do you think the cause of the flickering and of the artifacts is:

  • too few num_repeats
  • too small a dataset
  • an mp4 dataset (instead of png)?

the losses at the last epoch are:
00000099) perceptual - 95.23073; equivariance_value - 0.12323; equivariance_jacobian - 0.33881

@AliaksandrSiarohin
Owner

I guess the problem is the high resolution for the KP_detector and dense motion. Have you tried scale_factor: 0.125, and maybe even taking a pretrained dense motion and KP_detector?

@alessiapacca

@AliaksandrSiarohin Oh, I used 0.5 because I read in this issue that you were suggesting to increase the resolution of the keypoint detector and dense motion.
So you think 0.125 would help more?
What do you mean by taking a pretrained dense motion and KP detector?

@AliaksandrSiarohin
Owner

I don't know, you should try.
Initialize them with weights from my checkpoint.

@alessiapacca

alessiapacca commented Oct 30, 2020

If I try to start the training from your weights, it gives me the error:

RuntimeError: Error(s) in loading state_dict for OcclusionAwareGenerator:
        size mismatch for dense_motion_network.down.weight: copying a param with shape torch.Size([3, 1, 13, 13]) from checkpoint, the shape in current model is torch.Size([3, 1, 29, 29]).

probably because I am using scale_factor = 0.125 whereas you used 0.25 (you trained at resolution 256, while I am training at resolution 512)?
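The mismatch comes from the anti-aliasing downsampling kernel, whose size is derived from scale_factor. A hedged workaround sketch is to load only the tensors whose shapes still match and let the rest keep their fresh initialization (checkpoint key names assumed from the released checkpoints; generator and kp_detector are the already-constructed modules):

import torch

checkpoint = torch.load('vox-cpk.pth.tar', map_location='cpu')

def load_matching(module, state_dict):
    own = module.state_dict()
    # keep only tensors with matching shapes; the 'down' Gaussian kernels are
    # deterministic buffers rebuilt from scale_factor, so skipping them is harmless
    filtered = {k: v for k, v in state_dict.items() if k in own and v.shape == own[k].shape}
    module.load_state_dict(filtered, strict=False)

load_matching(generator, checkpoint['generator'])
load_matching(kp_detector, checkpoint['kp_detector'])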

@alessiapacca

100 epochs, 20 num_repeats, scale_factor 0.125 for both dense motion and kp_detector, and this is the result.

Do you think the dataset is too small? @AliaksandrSiarohin
Or does training with mp4 give worse results?
Or am I doing something else wrong?
It doesn't even move the mouth or close the eyes.

[image: result]

@AliaksandrSiarohin
Owner

Well, hard to say based on a single photo. Hard-set sigma in AntialiasingInterpolation and try with the pretrained checkpoint.
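A sketch of what hard-setting sigma could look like (this edits the anti-aliasing interpolation module in modules/util.py; the "original" formula in the comment is my reading of the code, so treat the exact values as assumptions):

# modules/util.py, anti-aliasing interpolation __init__ (sketch of the change)
# original (roughly): sigma = (1 / scale - 1) / 2   -> 3.5 for scale 0.125, giving a 29x29 kernel
sigma = 1.5                               # value implied by scale 0.25, i.e. the 256 checkpoint
kernel_size = 2 * round(sigma * 4) + 1    # stays 13, so the pretrained 'down' kernels still load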

@alessiapacca

If I use the pretrained model with hard-set sigma at 512x512, it works, but very badly, and that is why I was trying to retrain.
So your suggestion is to re-train with scale_factor = 0.125 but with hard-set sigma? Because my previous 2 experiments used the original sigma.
@AliaksandrSiarohin

@alessiapacca

I mean, I did essentially the same things that @stark-akib did:

  • changed the scale_factor for dense motion and the kp detector to 0.5, for 100 epochs and 20 num_repeats, and it looked very bad (I think this is how @stark-akib trained his model, as I read in this issue)
  • changed the scale_factor for dense motion and the kp detector to 0.125, for 100 epochs and 20 num_repeats, and again it couldn't animate the output.

So there are three possibilities:

  • I am using too small a dataset (approximately 6000 videos, taken from your metadata file, where I selected only the ones with a bbox bigger than 512x512)
  • I am training for too short a time
  • I should use png format to train

@AliaksandrSiarohin
Owner

I thought you were using png format; using mp4 is why it is so slow for you.
Yes, you should use .png format.

@alessiapacca

Hey @AliaksandrSiarohin, I downloaded the data in png format.
Now I have the test and train folders. They contain other folders inside, named after the corresponding videos.
Inside every folder are all the frames in png format.
However, when I try to start the training, I get this error:

TypeError: Traceback (most recent call last):
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/first-order-model/frames_dataset.py", line 155, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "/first-order-model/frames_dataset.py", line 115, in __getitem__
    video_array = [img_as_float32(io.imread(os.path.join(path, frames[idx]))) for idx in frame_idx]
  File "/first-order-model/frames_dataset.py", line 115, in <listcomp>
    video_array = [img_as_float32(io.imread(os.path.join(path, frames[idx]))) for idx in frame_idx]
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/genericpath.py", line 155, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components

what could be the reason for this?

@AliaksandrSiarohin
Owner

This I don't know.
I guess you can fix it by replacing frames[idx] with frames[idx].decode("utf-8")
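A slightly more defensive version of that line for frames_dataset.py (a sketch; it just normalizes bytes to str before joining):

def to_str(name):
    # frame names read back from a numpy array can be bytes; normalize to str
    return name.decode('utf-8') if isinstance(name, bytes) else name

video_array = [img_as_float32(io.imread(os.path.join(path, to_str(frames[idx]))))
               for idx in frame_idx]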

@alessiapacca

@AliaksandrSiarohin tried that. Now it gives this one:

ValueError: Traceback (most recent call last):
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/first-order-model/frames_dataset.py", line 155, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "/first-order-model/frames_dataset.py", line 115, in __getitem__
    video_array = [img_as_float32(io.imread(os.path.join(path, (frames[idx]).decode("utf-8")))) for idx in frame_idx]
  File "/first-order-model/frames_dataset.py", line 115, in <listcomp>
    video_array = [img_as_float32(io.imread(os.path.join(path, (frames[idx]).decode("utf-8")))) for idx in frame_idx]
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_io.py", line 48, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_plugins/imageio_plugin.py", line 10, in imread
    return np.asarray(imageio_imread(*args, **kwargs))
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 265, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 182, in get_reader
    "Could not find a format to read the specified file in %s mode" % modename
ValueError: Could not find a format to read the specified file in single-image mode

Maybe it's a problem with the names of the folders? They are saved with the name of the video including the mp4 extension. For example, the name of one folder is
id10001#7w0IBEWc9Qw#000993#001143.mp4

@AliaksandrSiarohin
Owner

No, this should not be a problem if you are on Linux. Check what is inside the folder id10001#7w0IBEWc9Qw#000993#001143.mp4 and send the filenames and a few of the files from there.

@alessiapacca

alessiapacca commented Nov 6, 2020

Yes, I am on Linux. Inside that folder there are 150 png frames going from 0000000.png to 0000149.png.
[screenshot: folder listing]
An example frame is this one:
[image: example frame]

There are other folders with more frames inside, but they are always named either starting from 0000000.png or continuing the previous frame numbering (if they come from the same video as another folder).

@alessiapacca

now I substituted that line with
video_array = [img_as_float32(io.imread(path + '/' + frames[idx].decode('utf-8'))) for idx in frame_idx]

the training starts but after a while I get

Traceback (most recent call last):
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/first-order-model/frames_dataset.py", line 155, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "/first-order-model/frames_dataset.py", line 115, in __getitem__
    video_array = [img_as_float32(io.imread(path + '/' + frames[idx].decode('utf-8')) )for idx in frame_idx]
  File "/first-oder-model/frames_dataset.py", line 115, in <listcomp>
    video_array = [img_as_float32(io.imread(path + '/' + frames[idx].decode('utf-8')) )for idx in frame_idx]
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_io.py", line 48, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_plugins/imageio_plugin.py", line 10, in imread
    return np.asarray(imageio_imread(*args, **kwargs))
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 265, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 186, in get_reader
    return format.get_reader(request)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/format.py", line 170, in get_reader
    return self.Reader(self, request)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/format.py", line 221, in __init__
    self._open(**self.request.kwargs.copy())
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 298, in _open
    return PillowFormat.Reader._open(self, pilmode=pilmode, as_gray=as_gray)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 135, in _open
    pil_try_read(self._im)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/plugins/pillow.py", line 680, in pil_try_read
    raise ValueError(error_message)
ValueError: Could not load "" 
Reason: "image file is truncated"
Please see documentation at: http://pillow.readthedocs.io/en/latest/installation.html#external-libraries

It seems like there is a problem with the images, like it's not reading them.

@AliaksandrSiarohin
Owner

Guess you are right; try to print the names of the images that cause the error and inspect them manually.

@alessiapacca

alessiapacca commented Nov 7, 2020

@AliaksandrSiarohin I did that.

It was printing them as bytes, so something like
b'0000000.png'

So I changed that by converting the paths to strings, and the names were correct; it could identify the correct number of frames and also the correct names for the frames. However, after the training had been running for a bit, I got this:

ValueError: Traceback (most recent call last):
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/first-order-model/frames_dataset.py", line 161, in __getitem__
    return self.dataset[idx % self.dataset.__len__()]
  File "/first-order-model/frames_dataset.py", line 121, in __getitem__
    video_array = [img_as_float32(io.imread(os.path.join(path, frames[idx]))) for idx in frame_idx]
  File "/first-order-model/frames_dataset.py", line 121, in <listcomp>
    video_array = [img_as_float32(io.imread(os.path.join(path, frames[idx]))) for idx in frame_idx]
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_io.py", line 48, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/manage_plugins.py", line 210, in call_plugin
    return func(*args, **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/skimage/io/_plugins/imageio_plugin.py", line 10, in imread
    return np.asarray(imageio_imread(*args, **kwargs))
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 265, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/local/home/user/anaconda3/envs/first-order-model/lib/python3.7/site-packages/imageio/core/functions.py", line 182, in get_reader
    "Could not find a format to read the specified file in %s mode" % modename
ValueError: Could not find a format to read the specified file in single-image mode

It is strange because I never had problems with the mp4 format. Maybe it's the "animation" format in the config file that should be .png?

If I print num_frames, frames and path in frames_dataset.py, I get the correct names:
path: /media/user/hdd/vox/train/id10068#5M2EGef.0f4#001945#002065.mp4
num frames: 140
frames: ['0000067.png', '0000000.png', '0000128.png', '0000042.png', '0000024.png', '0000080.png', '0000081.png', '0000087.png', '0000122.png', '0000139.png', '0000025.png', '0000079.png', '0000015.png', '0000021.png', '0000012.png', '0000010.png', '0000044.png', '0000022.png', '0000055.png', '0000125.png', '0000070.png', '0000033.png', '0000065.png', '0000101.png', '0000132.png', '0000103.png', '0000026.png', '0000085.png', '0000074.png', '0000089.png', '0000083.png', '0000001.png', '0000061.png', '0000088.png', '0000041.png', '0000131.png', '0000105.png', '0000097.png', '0000073.png', '0000077.png', '0000110.png', '0000082.png', '0000071.png', '0000109.png', '0000095.png', '0000058.png', '0000098.png', '0000049.png', '0000027.png', '0000048.png', '0000078.png', '0000059.png', '0000066.png', '0000126.png', '0000134.png', '0000005.png', '0000069.png', '0000037.png', '0000057.png', '0000115.png', '0000002.png', '0000031.png', '0000052.png', '0000060.png', '0000117.png', '0000034.png', '0000113.png', '0000006.png', '0000090.png', '0000068.png', '0000133.png', '0000072.png', '0000091.png', '0000019.png', '0000118.png', '0000028.png', '0000045.png', '0000040.png', '0000102.png', '0000023.png', '0000018.png', '0000130.png', '0000029.png', '0000137.png', '0000011.png', '0000035.png', '0000093.png', '0000111.png', '0000106.png', '0000036.png', '0000084.png', '0000053.png', '0000016.png', '0000032.png', '0000136.png', '0000124.png', '0000050.png', '0000020.png', '0000051.png', '0000064.png', '0000100.png', '0000123.png', '0000094.png', '0000039.png', '0000054.png', '0000116.png', '0000121.png', '0000008.png', '0000017.png', '0000099.png', '0000092.png', '0000076.png', '0000063.png', '0000104.png', '0000047.png', '0000138.png', '0000003.png', '0000043.png', '0000129.png', '0000127.png', '0000046.png', '0000108.png', '0000004.png', '0000014.png', '0000096.png', '0000007.png', '0000056.png', '0000114.png', '0000086.png', '0000120.png', '0000038.png', '0000075.png', '0000013.png', '0000135.png', '0000112.png', '0000107.png', '0000030.png', '0000119.png', '0000062.png', '0000009.png']

I even printed the os.path.join output for some of them, and it looks correct:
joining : /media/user/hdd/vox/train/id10098#8f2ReesQMrs#001291#001410.mp4/0000080.png
joining : /media/user/hdd/vox/train/id10098#8f2ReesQMrs#001291#001410.mp4/0000050.png
joining : /media/user/hdd/vox/train/id10748#XaQk7W-ySMo#005166#005379.mp4/0000033.png
joining : /media/user/hdd/vox/train/id10748#XaQk7W-ySMo#005166#005379.mp4/0000148.png
joining : /media/user/hdd/vox/train/id10909#M3rfGq1-lXg#008731#009013.mp4/0000212.png
joining : /media/user/hdd/vox/train/id10909#M3rfGq1-lXg#008731#009013.mp4/0000052.png

@AliaksandrSiarohin
Owner

Yes, yes, this I get. Can you check specifically which image is producing the error and verify whether it is a valid image?

@alessiapacca

Ok, I made it work. It was a problem with some corrupted files.
It will take a long time to train, but I will see whether the results are better this way than with the mp4 version.
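For anyone hitting the same "image file is truncated" error, a small standalone scan like this (not part of the repo; the path is an example) can locate bad frames before training:

import os
from skimage import io

root = '/media/user/hdd/vox/train'   # adjust to your dataset location
for folder in sorted(os.listdir(root)):
    for frame in sorted(os.listdir(os.path.join(root, folder))):
        full_path = os.path.join(root, folder, frame)
        try:
            io.imread(full_path)
        except Exception as err:
            print('corrupted:', full_path, err)   # re-extract or delete these frames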

@Aaron2286

@alessiapacca Hi, I also want to get higher resolution output; I just want to know how your results turned out. Can you tell me? Thank you.

@alessiapacca

@Aaron2286 The training is extremely slow, so I still don't know whether the results will be good or not. I am training it, though.

@Aaron2286

@alessiapacca yes I know, thank you very much. Actually, I am not very good at this, but I think this is very important to my grandmother, so I am studying hard. If there are results, can you provide some information? Thank you.

@sicilyliu

Thank you. I'll give that a try.

@stark-akib Can you share your checkpoints/model weights? Thanks.

@chloejihye

@stark-akib @alessiapacca Hello, can you share the results of your training? :) I'm really curious about the video quality after training at 512, since I'm trying to do the same. Your answer would be much appreciated. Thank you!

@celikmustafa89

TO SUM UP THE WHOLE ISSUE:

There is no model for higher resolution (e.g., 512x512). Am I right? If you have one, can you share it?
