How to change KP_detector and dense_motion parameters to train on Higher resolution? #81
Hi @stark-akib,
Thank you @AliaksandrSiarohin for the direction. How can I increase the resolution of the keypoint detector & dense motion model? Say, if I want to increase the keypoint detector's resolution to 256x256, do I only change the scale_factor to 0.5 for 512x512 input? Or do I need to change any other parameters, functions, or files?
Yes, just change scale_factor.
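For reference, a minimal sketch of the config change being discussed, assuming the key layout of the public vox-256.yaml (with dense_motion_params nested under generator_params); verify the paths against your local copy before training:

```python
# Hedged sketch: derive a 512x512 config from the stock vox-256.yaml.
import yaml

with open("config/vox-256.yaml") as f:
    cfg = yaml.safe_load(f)

# Train on 512x512 frames instead of 256x256.
cfg["dataset_params"]["frame_shape"] = [512, 512, 3]

# scale_factor 0.5 runs the keypoint detector and dense motion network at
# 512 * 0.5 = 256 internally; later in this thread 0.125 is suggested instead,
# which keeps the internal resolution at the original 64.
cfg["model_params"]["kp_detector_params"]["scale_factor"] = 0.5
cfg["model_params"]["generator_params"]["dense_motion_params"]["scale_factor"] = 0.5

with open("config/vox-512.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```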
Great. Thank you. Another quick question: I want to preprocess both the VoxCeleb1 and VoxCeleb2 datasets. As you have mentioned on the Video_preprocessing page
Does VoxCeleb1 require approximately 300GB of space for preprocessing? Then how much space will VoxCeleb2 require (as it has more data than VoxCeleb1)?
No idea, I never downloaded it entirely.
Okay. Thank you again.
Hello @AliaksandrSiarohin, I'm going to start the training on VoxCeleb1 at 512x512. As you mentioned here
Also, when should the training terminate? Is 1000 epochs enough (as stated in the YAML file)?
VoxCeleb in png format is 300GB; 300GB x 4 is 1200GB. Intermediate files consume less than several GB. 1000 epochs? I guess it should be 100.
Great. I'll change the parameters accordingly. Thank you.
@AliaksandrSiarohin
Hello, just giving an update on the VoxCeleb1 preprocessing. The 512x512 preprocessing of VoxCeleb1 took around 870GB of space in .png format. So the required storage for training would be 900GB x 4 = 3.6 TB.
Why? Training doesn't need additional space. The x4 was an estimate of the 512x512 space occupancy, because a 512 image is roughly 4 times larger.
Sorry, I mistook it for a parallel multiplier. The preprocessing I performed contains 18,671 folders in the "train" folder and 510 folders in the "test" folder. The rest of the videos either show a broken link or a skipped message in the console. I guess no additional space is needed to train, then.
I guess you may need to filter out low-resolution videos. To create vox-metadata.csv I used all the videos where the size of the bbox was greater than 256. You can infer the size of the bbox from the bbox parameter in vox-metadata.csv.
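A rough sketch of that filtering step, assuming the metadata keeps one row per chunk with the bounding box stored in a bbox column; the exact column names and bbox encoding ("left-top-right-bottom") are guesses, so inspect the CSV header first:

```python
# Hedged sketch: keep only the chunks whose face bbox is at least 256 pixels.
import pandas as pd

meta = pd.read_csv("vox-metadata.csv")

def bbox_side(bbox):
    # Assumed encoding: "left-top-right-bottom"; adjust to the real format.
    left, top, right, bottom = map(int, str(bbox).split("-"))
    return min(right - left, bottom - top)

filtered = meta[meta["bbox"].apply(bbox_side) >= 256]
filtered.to_csv("vox-metadata-filtered.csv", index=False)
print(f"kept {len(filtered)} of {len(meta)} rows")
```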
Thank you for the tip. I'll have a look.
@AliaksandrSiarohin Also, what's the use of
Thank you. Which one would you suggest using as the config file, "vox-adv-256.yaml" or "vox-256.yaml"? (Considering that I will only change the frame_shape and scale factors for 512x512.)
Without the adversarial loss it is more stable.
Thank you for your insight.
Hello @AliaksandrSiarohin, I've started the training using the following command. After 15 seconds, this error occurs. Can you help me find the problem?
40 is too large. The batch size should be approximately 4 times smaller than for 256, e.g. 16 or 12.
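If it helps, the corresponding train_params change might look like this (same caveat as the earlier sketch: key names are assumed from the public configs):

```python
# Hedged sketch: shrink the batch size for 512x512 training in the derived config.
import yaml

with open("config/vox-512.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["train_params"]["batch_size"] = 12    # roughly 4x smaller than the 256 setup
cfg["train_params"]["num_epochs"] = 100   # per the estimate earlier in this thread

with open("config/vox-512.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```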
@AliaksandrSiarohin Checked again after leaving it for an hour; it's still stuck here. When closing with "Ctrl + C", the console output is like this. Also, I've set the
Probably it is just slow. You can try changing num_repeats to 1 to see. Also, you may want to start with the pretrained 256 checkpoint to accelerate convergence.
@AliaksandrSiarohin Thank you. Changing num_repeats to 1 seems to work. The log file is showing the loss. As your YAML files suggest,
Depends on how much training you can afford. The more the better.
I can train for up to 5 days on my setup; what num_epochs and num_repeats would you suggest? (Besides the 4 NVIDIA Tesla V100 GPUs I've mentioned earlier, I can add another 4, so 8 V100 GPUs in total.)
@stark-akib Can you please share the checkpoints or the config file? Thanks.
I re-trained the net for 512. The script https://github.com/AliaksandrSiarohin/video-preprocessing/blob/master/crop_vox.py was giving many errors in my case and not working, so I just took https://github.com/AliaksandrSiarohin/video-preprocessing/blob/master/vox-metadata.csv and selected the videos >= 512x512. In this case, I had 5827 mp4 videos in the train folder and 166 mp4 videos in the test folder. I performed 100 epochs with num_repeats = 20, and the result is not extremely good. In the training, I increased the resolution of KP_detector and dense motion to 256 (by using scale_factor 0.5).
The losses at the last epoch are:
I guess the problem is the high resolution for KP_detector and dense motion. Have you tried scale_factor: 0.125, and maybe even taking a pretrained dense motion and KP_detector?
@AliaksandrSiarohin Oh, I used 0.5 because I read in this issue that you were suggesting increasing the resolution of the keypoint detector and dense motion.
I don't know, you should try.
If I try to start the training from your weights, it gives me the error:
Probably because I am using scale_factor = 0.125 whereas you used 0.25 (since you trained at resolution 256, while I am training at resolution 512)?
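One way around that kind of size mismatch (a sketch, not the repo's own loader; it assumes the released checkpoint stores per-module state dicts under keys like "generator" and "kp_detector") is to copy only the tensors whose shapes still agree and let the rest train from scratch:

```python
import torch

def load_compatible(module, pretrained_state):
    """Copy only the pretrained tensors whose shapes match this module."""
    own = module.state_dict()
    kept = {k: v for k, v in pretrained_state.items()
            if k in own and own[k].shape == v.shape}
    own.update(kept)
    module.load_state_dict(own)
    print(f"loaded {len(kept)}/{len(own)} tensors")

def warm_start(generator, kp_detector, checkpoint_path="vox-cpk.pth.tar"):
    # Mismatched tensors (e.g. kernels whose size depends on scale_factor)
    # are skipped instead of raising a size-mismatch error.
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    load_compatible(generator, ckpt["generator"])
    load_compatible(kp_detector, ckpt["kp_detector"])
```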
100 epochs, num_repeats 20, scale_factor 0.125 for both dense motion and kp_detector, and this is the result. Do you think the dataset is too small? @AliaksandrSiarohin
Well, it is hard to say based on a single photo. Hard-set sigma in AntialiasingInterpolation and try with the pretrained checkpoint.
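For context, a hedged reading of that suggestion, assuming the anti-aliasing module derives its Gaussian blur from the scale factor roughly as sigma = (1/scale - 1)/2: pinning sigma to the value used during 256 training keeps the blur kernel the same size, so the pretrained downsampling weights still fit even with scale_factor 0.125.

```python
# Hedged sketch of "hard-set sigma"; the formula is my reading of the public
# anti-aliasing module, so verify it against modules/util.py first.
def blur_params(scale, hardset_sigma=None):
    sigma = hardset_sigma if hardset_sigma is not None else (1 / scale - 1) / 2
    kernel_size = 2 * round(sigma * 4) + 1
    return sigma, kernel_size

print(blur_params(0.25))                      # 256 training: sigma 1.5, kernel 13
print(blur_params(0.125))                     # 512 at 0.125: sigma 3.5, kernel 29
print(blur_params(0.125, hardset_sigma=1.5))  # hard-set: same kernel as the 256 model
```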
If I use the pretrained model with hard-set sigma at 512x512, it works, but very badly, which is why I was trying to retrain.
I mean, I seriously did the same stuff that @stark-akib did:
So there are three possibilities here:
I thought you were using png format; that is why it is so slow for you.
Hey @AliaksandrSiarohin, I downloaded it in png format.
What could be the reason for this?
This I don't know.
@AliaksandrSiarohin I tried that. Now it gives this one:
Maybe it's a problem with the names of the folders? They are saved with the name of the video plus the mp4 extension. For example, the name of a folder is
No, this should not be a problem if you are on Linux. Check what is inside the folder id10001#7w0IBEWc9Qw#000993#001143.mp4 and send the filenames and some files from there.
Now I substituted that line. The training starts, but after a while I get
It seems like there is a problem with the images, like it's not reading them.
I guess you are right; try to print the names of the images that cause the error and inspect them manually.
@AliaksandrSiarohin I did that. It was printing them as bytes, so something like this. So I changed that by converting the paths to strings, and the name was correct: it could identify the correct number of frames and also the correct names for the frames. However, a bit after the training started, I got this
It is strange because I never had problems with the mp4 format. Maybe it's the "animation" format in the config file that should be .png? If I print num_frames, frames, and path in frames_dataset.py, I get the correct names. I even printed the os.path.join output for some of them, and it looks correct:
Yes, this I get. Can you check specifically which image is producing the error and validate whether it is a good image?
OK, I made it work. It was a problem with some corrupted files.
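For anyone hitting the same thing, a small standalone sketch (not part of the repo) that walks the extracted dataset and flags frames Pillow cannot fully decode; the root path is a placeholder for your own layout:

```python
import os
from PIL import Image

root = "data/vox-png/train"   # placeholder; point this at your own dataset
for dirpath, _, filenames in os.walk(root):
    for name in sorted(filenames):
        if not name.lower().endswith(".png"):
            continue
        path = os.path.join(dirpath, name)
        try:
            with Image.open(path) as img:
                img.load()    # force a full decode, not just a header read
        except Exception as exc:
            print(f"corrupted: {path} ({exc})")
```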
@alessiapacca Hi, I also want to get a higher resolution and just want to know how your results turned out. Can you tell me? Thank you.
@Aaron2286 The training is extremely slow, so I still don't know whether the result will be good or not. I am still training it, though.
@alessiapacca Yes, I know, thank you very much. Actually, I am not very good at this, but I think this is very important to my grandmother, so I am studying hard. If there are results, can you provide some information? Thank you.
@stark-akib Can you share your checkpoints/model weights? Thanks.
@stark-akib @alessiapacca Hello, can you share the results of your training? :) I'm really curious about the video quality after training at 512, since I'm trying to do the same. Your answer would be very much appreciated. Thank you!
To sum up the whole issue: there is no model for higher resolution (e.g., 512x512). Am I right? If you have one, can you share it?
Hello @AliaksandrSiarohin. First of all, congratulations on the great work and thank you for sharing the repository.
I'm planning to train the model to generate higher resolution output (such as 512x512, 1024x1024). I would really appreciate your insight on my approach.
You mentioned here #14
Do I need to change this behavior for better motion transfer performance (while training on higher resolution)? How would you suggest doing it?
Looking forward to hearing from you. :)