
Line 442 in PDCNet.py. #5

Closed
zwyking opened this issue Aug 7, 2021 · 15 comments

Comments

@zwyking

zwyking commented Aug 7, 2021

Hello, thanks for your excellent work and code. I have a question about line 442 in PDCNet.py: I think the code should be "c_t=c23, c_s=c13", which means the source is image1 and the target is image2. (Maybe I have misunderstood your code, please correct me if so (@_@;))

@PruneTruong
Owner

Hi, so im1 is the target image (also referred to as the reference image, for example in PDC-Net paper) and im2 is the source image (also referred to as the query image, for example in PDC-Net paper). Therefore, the provided original version of the code is correct. Here, we estimate the flow field relating the target/reference to the source/query. This flow can be used to align the source/query to the target/reference by warping the source/query according to the flow. Let me know if you have other questions :)
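For intuition, here is a minimal sketch of that warping step (my own illustration, not the repository's actual warping utility, and assuming a PyTorch-style flow of shape (B, 2, H, W) in pixels): each pixel in the target/reference pulls its value from the source/query at the position given by the flow, which yields the source aligned to the target.

```python
import torch
import torch.nn.functional as F

def warp_source_to_target(source, flow):
    # source: (B, C, H, W) source/query image
    # flow:   (B, 2, H, W) flow defined in the target/reference frame, in pixels;
    #         channel 0 is the horizontal (x) and channel 1 the vertical (y) displacement
    b, _, h, w = source.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xx, yy), dim=0).float().to(source.device)  # (2, H, W) pixel coordinates
    grid = grid.unsqueeze(0) + flow                                # sampling positions in the source
    # normalize to [-1, 1] as expected by grid_sample
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    norm_grid = torch.stack((grid_x, grid_y), dim=-1)              # (B, H, W, 2)
    # each target pixel samples the source image -> source warped onto the target frame
    return F.grid_sample(source, norm_grid, align_corners=True)
```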

@zwyking
Author

zwyking commented Aug 11, 2021

Thanks for your explanation of this issue; I understand this part now. I have found another problem: memory consumption gradually grows with each training epoch. I would like to know whether this is normal.

@PruneTruong
Owner

I don't think I had this issue before, but let me double check the published version, I will get back to you.

@zwyking
Author

zwyking commented Aug 11, 2021

Ok. This phenomenon is obvious in PDCNet_stage2. During my training process, memory consumption reaches nearly 200 GB by epoch 25.

@PruneTruong
Owner

You're right, there seems to be a memory leak that I didn't have before. I will investigate; in the meantime, setting the number of workers to 0 considerably reduces the required memory but might make the training slower. Also, if training crashes, you can restart it and it will resume from where it crashed. Sorry for the inconvenience.
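For reference, a minimal, self-contained illustration of that workaround (a dummy dataset stands in for the real one; this is not the repo's train_settings): with num_workers=0, loading runs in the main process, so there are no per-worker copies of dataset state, at the cost of slower loading.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy stand-in for the actual training dataset
dummy_dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.zeros(64))

# num_workers=0 keeps data loading in the main process, which caps host-memory usage
loader = DataLoader(dummy_dataset, batch_size=8, shuffle=True, num_workers=0)

for images, labels in loader:
    pass  # training step would go here
```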

@zwyking
Author

zwyking commented Aug 15, 2021

Ok, thanks for your patient responses. This is great work.

zwyking closed this as completed Aug 15, 2021
zwyking reopened this Aug 21, 2021
@zwyking
Author

zwyking commented Aug 21, 2021

I think the memory leak occurs in the dataloader, because I see the memory usage increase when a new epoch starts.

@PruneTruong
Owner

PruneTruong commented Aug 26, 2021

Hi, it should be fixed now. There is an increase within one epoch but only a very minor increase between epochs (I am also storing some validation logs). I trained 16 epochs of PDCNet_stage2 with about 40 GB of CPU memory without issue. Let me know if you still encounter any problems!

@zwyking
Author

zwyking commented Aug 26, 2021

Sorry for the late response. I would like to know which files you have fixed, so I can directly replace them in my code (I have modified the original code) and watch the memory. Thanks a lot!

@PruneTruong
Owner

Ah sure! The main fix is in https://github.com/PruneTruong/DenseMatching/blob/main/datasets/mixture_of_datasets.py, where I had a list that kept growing at each sampling (therefore at the beginning of each epoch). I also added another sampling option for MegaDepth (https://github.com/PruneTruong/DenseMatching/blob/main/datasets/MegaDepth/megadepth.py) that uses less memory, but it is not absolutely essential; it also requires changing the arguments in the train_settings.
I also fixed the COCO loader (https://github.com/PruneTruong/DenseMatching/blob/main/datasets/object_augmented_dataset/coco.py), so the try/except in the object dataset (https://github.com/PruneTruong/DenseMatching/blob/main/datasets/object_augmented_dataset/synthetic_object_augmentation_for_pairs_multiple_ob.py) is not needed anymore. However, I think that with only the first fix in the mixture dataset, you should no longer get a leak. If you try it, could you please let me know how it goes?
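To illustrate the kind of fix described above (hypothetical names, not the actual mixture_of_datasets.py code): the per-epoch index list should be rebuilt on every sampling call rather than appended to a list that persists across epochs.

```python
import random

class MixtureSampler:
    """Hypothetical sampler drawing from several datasets; names do not match the repo."""

    def __init__(self, dataset_sizes, samples_per_epoch):
        self.dataset_sizes = dataset_sizes
        self.samples_per_epoch = samples_per_epoch

    def sample_epoch_indices(self):
        # Rebuild the list from scratch each epoch. Extending a list stored on `self`
        # (e.g. self.indices.extend(...)) would grow it at every sampling call,
        # i.e. at the start of every epoch, which is the leak described above.
        indices = []
        for _ in range(self.samples_per_epoch):
            dataset_id = random.randrange(len(self.dataset_sizes))
            sample_id = random.randrange(self.dataset_sizes[dataset_id])
            indices.append((dataset_id, sample_id))
        return indices
```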

@zwyking
Author

zwyking commented Aug 26, 2021 via email

@zwyking
Author

zwyking commented Aug 27, 2021

Hi, I have trained your new code without changes. PDCNet_stage1 still shows an obvious memory increase; I don't know whether this is normal.

@zwyking
Author

zwyking commented Aug 28, 2021

[Screenshot QQ截图20210828094629: memory usage over 6 epochs of PDCNet_stage2 training]
This is a snapshot of PDCNet_stage2 over 6 epochs; a clear memory leak can be seen. Meanwhile, I notice a big memory increase after each validation.

@PruneTruong
Owner

Hi, I am really sorry: I was running the code on my main internal repo, and I only realized now that I hadn't correctly pushed all files to this repo. That is why I was not seeing the leak that you see.
It was in https://github.com/PruneTruong/DenseMatching/blob/main/training/losses/multiscale_loss.py: the .item() calls were missing in the logging of the loss, so the whole computation graph was being dragged along throughout the epochs. With that fixed, I trained on exactly this repo.
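For anyone hitting the same issue, a minimal sketch of this pitfall (a hypothetical training loop, not the actual multiscale_loss.py code): appending the loss tensor itself to a log keeps its entire autograd graph alive, while .item() stores only a Python float.

```python
import torch

running_losses = []  # e.g. kept for logging across the epoch

def training_step(model, batch, optimizer, criterion):
    optimizer.zero_grad()
    loss = criterion(model(batch["input"]), batch["target"])
    loss.backward()
    optimizer.step()

    # Buggy version: running_losses.append(loss) keeps the loss tensor, and with it the
    # whole computation graph it references, alive for as long as the list exists.
    running_losses.append(loss.item())  # fixed: store a plain Python float instead
```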

In stage 1, I see an increase of about 1.8 GB WITHIN one epoch (reduced to about 1 GB when using cv2.imread instead of imageio.imread), but the memory is freed at the end of the epoch. Therefore, I only see an absolute increase of about 400 MB at the end of the first epoch (because all the classes for logging metrics are created) and at most 200 MB between epochs after that (due to the logging kept in memory).

For stage 2, I see an increase of about 5 GB within one epoch, also released at the end of the epoch, and an absolute increase between epochs of at most 100 MB.

I am very sorry about the inconvenience and thanks for your patience!

@zwyking
Author

zwyking commented Aug 28, 2021

Yeah, thanks for your reply. I had already corrected this problem during my debugging. Thanks for your excellent work!
