
Line 442 in PDCNet.py. #5

Closed
zwyking opened this issue Aug 7, 2021 · 15 comments

Comments

@zwyking

zwyking commented Aug 7, 2021

Hello, thanks for your excellent work and code. I have a question about line 442 in PDCNet.py: I think the code should be "c_t=c23, c_s=c13", which means the source is image1 and the target is image2. (Maybe I have misunderstood your code, please correct me if so (@_@;))

@PruneTruong
Owner

Hi, so im1 is the target image (also referred to as the reference image, for example in PDC-Net paper) and im2 is the source image (also referred to as the query image, for example in PDC-Net paper). Therefore, the provided original version of the code is correct. Here, we estimate the flow field relating the target/reference to the source/query. This flow can be used to align the source/query to the target/reference by warping the source/query according to the flow. Let me know if you have other questions :)
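For intuition, here is a minimal sketch of that warping step (my own illustration, not the repository's actual warping utility, and assuming a PyTorch-style flow of shape (B, 2, H, W) in pixels): each pixel in the target/reference pulls its value from the source/query at the position given by the flow, which yields the source aligned to the target.

```python
import torch
import torch.nn.functional as F

def warp_source_to_target(source, flow):
    # source: (B, C, H, W) source/query image
    # flow:   (B, 2, H, W) flow defined in the target/reference frame, in pixels;
    #         channel 0 is the horizontal (x) and channel 1 the vertical (y) displacement
    b, _, h, w = source.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xx, yy), dim=0).float().to(source.device)  # (2, H, W) pixel coordinates
    grid = grid.unsqueeze(0) + flow                                # sampling positions in the source
    # normalize to [-1, 1] as expected by grid_sample
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    norm_grid = torch.stack((grid_x, grid_y), dim=-1)              # (B, H, W, 2)
    # each target pixel samples the source image -> source warped onto the target frame
    return F.grid_sample(source, norm_grid, align_corners=True)
```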

@zwyking
Author

zwyking commented Aug 11, 2021

Thanks for your explanation of this issue; I understand this part now. I have found another problem: memory consumption gradually grows with each training epoch. I would like to know whether this is normal.

@PruneTruong
Owner

I don't think I had this issue before, but let me double check the published version, I will get back to you.

@zwyking
Author

zwyking commented Aug 11, 2021

Ok. This phenomenon is obvious in PDCNet_stage2. During my training process, memory consumption reaches nearly 200 GB by epoch 25.

@PruneTruong
Owner

You're right, there seems to be a memory leak that I didn't have before. I will investigate; in the meantime, setting the number of workers to 0 considerably reduces the required memory but might make the training slower. Also, if training crashes, you can restart it and it will resume from where it crashed. Sorry for the inconvenience.
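For reference, a minimal, self-contained illustration of that workaround (a dummy dataset stands in for the real one; this is not the repo's train_settings): with num_workers=0, loading runs in the main process, so there are no per-worker copies of dataset state, at the cost of slower loading.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy stand-in for the actual training dataset
dummy_dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.zeros(64))

# num_workers=0 keeps data loading in the main process, which caps host-memory usage
loader = DataLoader(dummy_dataset, batch_size=8, shuffle=True, num_workers=0)

for images, labels in loader:
    pass  # training step would go here
```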

@zwyking
Author

zwyking commented Aug 15, 2021

Ok, thanks for your patient responses. This is great work.

zwyking closed this as completed Aug 15, 2021
zwyking reopened this Aug 21, 2021
@zwyking
Author

zwyking commented Aug 21, 2021

I think the memory leak occurs in the dataloader, because I see the memory usage increase when a new epoch starts.

@PruneTruong
Owner

PruneTruong commented Aug 26, 2021

Hi, it should be fixed now. There is an increase within one epoch but only a very minor increase between epochs (I am also storing some validation logs). I trained 16 epochs of PDCNet_stage2 with about 40 GB of CPU memory without issue. Let me know if you still encounter any problems!

@zwyking
Author

zwyking commented Aug 26, 2021

Sorry for the late response. I would like to know which files you have fixed, so I can directly replace them in my code (I have modified the original code) and watch the memory. Thanks a lot!

@PruneTruong
Owner

Ah sure! The main fix is in https://github.com/PruneTruong/DenseMatching/blob/main/datasets/mixture_of_datasets.py, where I had a list that kept growing at each sampling (therefore at the beginning of each epoch). I also added another sampling option for MegaDepth (https://github.com/PruneTruong/DenseMatching/blob/main/datasets/MegaDepth/megadepth.py) that uses less memory, but it is not absolutely essential; it also requires changing the arguments in the train_settings.
I also fixed the COCO loader (https://github.com/PruneTruong/DenseMatching/blob/main/datasets/object_augmented_dataset/coco.py), so the try/except in the object dataset (https://github.com/PruneTruong/DenseMatching/blob/main/datasets/object_augmented_dataset/synthetic_object_augmentation_for_pairs_multiple_ob.py) is not needed anymore. However, I think that with only the first fix in the mixture dataset, you should no longer get a leak. If you try it, could you please let me know how it goes?
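To illustrate the kind of fix described above (hypothetical names, not the actual mixture_of_datasets.py code): the per-epoch index list should be rebuilt on every sampling call rather than appended to a list that persists across epochs.

```python
import random

class MixtureSampler:
    """Hypothetical sampler drawing from several datasets; names do not match the repo."""

    def __init__(self, dataset_sizes, samples_per_epoch):
        self.dataset_sizes = dataset_sizes
        self.samples_per_epoch = samples_per_epoch

    def sample_epoch_indices(self):
        # Rebuild the list from scratch each epoch. Extending a list stored on `self`
        # (e.g. self.indices.extend(...)) would grow it at every sampling call,
        # i.e. at the start of every epoch, which is the leak described above.
        indices = []
        for _ in range(self.samples_per_epoch):
            dataset_id = random.randrange(len(self.dataset_sizes))
            sample_id = random.randrange(self.dataset_sizes[dataset_id])
            indices.append((dataset_id, sample_id))
        return indices
```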

@zwyking
Author

zwyking commented Aug 26, 2021 via email

@zwyking
Author

zwyking commented Aug 27, 2021

Hi, I have trained your new code without changes. PDCNet_stage1 still shows an obvious memory increase; I don't know whether this is normal.

@zwyking
Author

zwyking commented Aug 28, 2021

[Screenshot QQ截图20210828094629: memory usage over 6 epochs of PDCNet_stage2 training]
This is a snapshot of PDCNet_stage2 over 6 epochs; a clear memory leak can be seen. Meanwhile, I notice a big memory increase after each validation.

@PruneTruong
Owner

Hi, I am really sorry: I was running the code on my main internal repo, and I only realized now that I hadn't correctly pushed all files to this repo. That is why I was not seeing the leak that you see.
It was in https://github.com/PruneTruong/DenseMatching/blob/main/training/losses/multiscale_loss.py: the .item() calls were missing in the logging of the loss, so the whole computation graph was being dragged along throughout the epochs. With that fixed, I trained on exactly this repo.
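For anyone hitting the same issue, a minimal sketch of this pitfall (a hypothetical training loop, not the actual multiscale_loss.py code): appending the loss tensor itself to a log keeps its entire autograd graph alive, while .item() stores only a Python float.

```python
import torch

running_losses = []  # e.g. kept for logging across the epoch

def training_step(model, batch, optimizer, criterion):
    optimizer.zero_grad()
    loss = criterion(model(batch["input"]), batch["target"])
    loss.backward()
    optimizer.step()

    # Buggy version: running_losses.append(loss) keeps the loss tensor, and with it the
    # whole computation graph it references, alive for as long as the list exists.
    running_losses.append(loss.item())  # fixed: store a plain Python float instead
```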

In stage 1, I see an increase of about 1.8 GB WITHIN one epoch (reduced to about 1 GB when using cv2.imread instead of imageio.imread), but the memory is freed at the end of the epoch. Therefore, I only see an absolute increase of about 400 MB at the end of the first epoch (because all the classes for logging metrics are created) and at most 200 MB between epochs after that (due to the logging kept in memory).

For stage 2, I see an increase of about 5 GB within one epoch, also released at the end of the epoch, and an absolute increase between epochs of at most 100 MB.

I am very sorry about the inconvenience and thanks for your patience!

@zwyking
Author

zwyking commented Aug 28, 2021

Yeah, thanks for your reply. I had already corrected this problem during my debugging. Thanks for your excellent work!
