
CPU memory usage is too high and other queries #37

Closed
prajjwal1 opened this issue Jan 19, 2020 · 19 comments

Comments

@prajjwal1

prajjwal1 commented Jan 19, 2020

Thanks for sharing this code. When I'm fine-tuning on VQA, my RAM usage blows up: with num_workers set to 4, it requires 207 GB. I've also tried different batch sizes. The script runs successfully with the --tiny flag, but when I load both train and nominival, memory usage blows up and I get a "memory can't be allocated" error. Do you know a workaround for this? I think it's because we store all the faster_rcnn features in RAM?

prajjwal1 changed the title from "CPU memeory usage is too high" to "CPU memory usage is too high" on Jan 19, 2020
@airsplay
Owner

Thanks.

It is because the data loader loads the image features into memory 4 times (i.e., 4 copies of the image features are held in memory, one per worker).

Because of the GIL, the num_workers setting in the PyTorch data loader mainly helps parallelize data loading. Since the image features are already in memory here, num_workers > 1 is not needed; setting num_workers=1 is efficient.
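To illustrate (a minimal sketch with made-up names, not the repo's actual loader): each DataLoader worker is a separate process and effectively ends up with its own copy of a dataset object whose features live in RAM.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class InMemoryFeatureDataset(Dataset):
    """Hypothetical stand-in for a dataset that keeps all RoI features in RAM."""
    def __init__(self, features):
        self.features = features                      # e.g. {img_idx: 36 x 2048 feature tensor}

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx]

features = {i: torch.randn(36, 2048) for i in range(100)}  # dummy features
dataset = InMemoryFeatureDataset(features)

# Each worker process effectively holds its own copy of `features`, so
# num_workers=4 can keep ~4 copies resident. Since the data is already in
# memory, one worker is enough here.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=1)
```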

@prajjwal1
Author

prajjwal1 commented Jan 19, 2020

Thanks for replying.

  1. Why 4? Are you using multi-scaling, i.e., 4 feature maps extracted from intermediate layers of the backbone?
  2. Did you train the faster_rcnn on COCO with cross-entropy only, or did you use the five mentioned pretraining objectives with BERT to train it? I'm asking because I want to extend your approach beyond VQA, and I will have to train the faster_rcnn myself since features won't be available.

@airsplay
Owner

  1. It is because you set num_workers to 4, so there are 4 copies.

  2. The faster_rcnn is trained only on object detection and is frozen when extracting features.

prajjwal1 changed the title from "CPU memory usage is too high" to "CPU memory usage is too high and other queries" on Jan 20, 2020
@prajjwal1
Author

Could you suggest a better way of loading features? I'm not able to fit them even with num_workers=1. Should I use the faster_rcnn from torchvision (it is also pretrained on COCO) for VQA and obtain features on the fly? But then LXMERT + faster_rcnn might not fit on a single GPU. How did you manage?

@airsplay
Owner

May I ask how large your main memory is?
You can use the fp16 branch to halve the memory usage with the command

git checkout fp16

If it still exceeds the memory limitation, the code might need to load features from disk.
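As a rough illustration of why the fp16 branch helps (assuming the features are decoded into float32 NumPy arrays), casting to float16 halves the per-image footprint:

```python
import numpy as np

feats = np.random.rand(36, 2048).astype(np.float32)  # one image: 36 RoIs x 2048-d
print(feats.nbytes)        # 294912 bytes (~288 KB) in float32

feats_fp16 = feats.astype(np.float16)                # conceptually what fp16 storage does
print(feats_fp16.nbytes)   # 147456 bytes (~144 KB), i.e. half
```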

@prajjwal1
Author

prajjwal1 commented Jan 22, 2020

I have 8 cores in my GCP instance (around 48 GB of RAM). I just tried with num_workers=0 (it worked, but slowly). I am thinking of using pandas with chunksize; reading the entire very large TSV file at once seems inefficient. What do you think?

@airsplay
Owner

Thanks. With 48 GB of main memory, it should work fine with fp16, which takes around 30 GB.

Loading features from disk is definitely possible; multiple workers should then be involved to balance the loading. The current implementation loads everything into memory, which removes the cost of per-batch loading but becomes infeasible when memory is not large enough.
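If even fp16 does not fit, a common pattern for loading features from disk (sketched below with hypothetical names, not code from this repo) is to index the TSV by byte offset once and then seek to the requested row inside __getitem__, so that only the current batch is ever held in memory:

```python
import csv
import sys
from torch.utils.data import Dataset

csv.field_size_limit(sys.maxsize)  # rows with base64-encoded features are very long

class DiskFeatureDataset(Dataset):
    """Hypothetical disk-backed reader: index the TSV once, seek per item."""
    def __init__(self, tsv_path):
        self.tsv_path = tsv_path
        self.offsets = []
        with open(tsv_path, "rb") as f:            # record the byte offset of every row
            while True:
                offset = f.tell()
                if not f.readline():
                    break
                self.offsets.append(offset)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        with open(self.tsv_path, "r") as f:
            f.seek(self.offsets[idx])
            row = next(csv.reader([f.readline()], delimiter="\t"))
        # decode the base64 feature columns of `row` here, as the repo's loader does
        return row
```

With this layout, each worker opens its own file handle, so several workers can overlap disk reads with GPU computation, in line with the suggestion above.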

@prajjwal1
Author

prajjwal1 commented Jan 23, 2020

Thanks for your reply. In your experiments, did you try other configurations (e.g., 9, 6, 6 for L, X, R)?

@airsplay
Owner

I did not try them, given the computational resources required.

@prajjwal1
Author

prajjwal1 commented Jan 27, 2020

Thanks.

  1. For pretraining, you used all 5 datasets (Visual Genome, MS COCO, VQA 2.0, GQA, VG-QA). For finetuning on VQA, the model will then come across the same sentence-image pairs it encountered during pretraining, right? So during finetuning, has the model already been exposed to the dataset being used for finetuning?

  2. You don't seem to be using an lr_scheduler; is there any reason for that?

@airsplay
Owner

  1. May I ask whether you consider using VQA in pre-training a problematic setup? If so, could you specify the reason? Using part of the data in pre-training is actually a common strategy, given the limited amount of data; as long as the test data is not touched, improvements on the test data are considered solid. For example, every work following bottom-up attention (which, if you are not familiar with this thread of VQA research, covers essentially every recent VQA work of the last two years) uses an object detector pretrained on Visual Genome. Visual Genome contains half of the VQA training images, which means the ground-truth object annotations of those training images are used in the pre-training of every VQA paper. However, the test data is never touched when training the detection system, so the validity of these works still holds.

  2. There is a triangular lr scheduler inside the optimizer.
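For reference, a minimal sketch of a triangular schedule (linear warmup, then linear decay), similar in shape to the warmup-then-decay schedule that BERT-style optimizers apply internally; the function name and parameters are illustrative, not the repo's code:

```python
def triangular_lr(step, total_steps, base_lr, warmup_frac=0.1):
    """Linear warmup to base_lr, then linear decay back to 0 (triangular shape)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Example: with total_steps=1000 and base_lr=5e-5, the lr ramps up over the
# first 100 steps, peaks at 5e-5, and decays linearly to 0 at step 1000.
```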

@prajjwal1
Author

prajjwal1 commented Jan 28, 2020

Thanks for replying. Sorry, I just wanted to learn more about the pretraining; I don't consider it problematic. The reason I asked is that pretraining isn't feasible for me right now.

  1. I am working specifically on the finetuning part, and I think the effect of finetuning will be less pronounced if the model has already seen the data during pretraining. Although performance improves, as your paper (and other works, e.g. ViLBERT, VL-BERT, UNITER) has shown, I think this imposes some upper bound on performance to some extent (moreover, these datasets have some overlap). For example: using non-COCO images and pairs (along with your proposed objectives) for pretraining, and using VQA (which has COCO images) for finetuning, similar to what we do in ImageNet training.

  2. Thanks for providing such a wonderful codebase; it is really helpful. I wanted to clarify: has the pretrained model you provide been trained on all 4 objectives (Image QA, cross-modality matching, masked object prediction, masked cross-modality LM)?

@airsplay
Owner

  1. ViLBERT is mainly trained on Conceptual Captions, which contains (mostly) out-of-domain images and data. However, another paper, UNITER, shows that a clean dataset can still be better for vision-and-language tasks (see their Table 3): on the out-of-domain dataset (NLVR2), the COCO + VG setup still wins even though Conceptual Captions has 10x more images. I currently do not have a clear answer for this and am waiting for more results; a clean comparison between clean/in-domain/small datasets and noisy/out-of-domain/large datasets requires too many computational resources.

  2. Yes. The losses are added together.

@prajjwal1
Author

Hi,
In the paper, you report results on the test set, but predict in this repo provides results on the validation set. How did you compute the results on test-dev and test-std?

@airsplay
Owner

Thanks. The results on test-dev and test-std require using the test servers. The detailed process for each dataset is provided at the end of each section, e.g.:
https://github.com/airsplay/lxmert#submitted-to-vqa-test-server
https://github.com/airsplay/lxmert#submitted-to-gqa-test-server
https://github.com/airsplay/lxmert#unreleased-test-sets

@prajjwal1
Author

prajjwal1 commented Mar 1, 2020

Hi,
Could you please share the attention visualization code (as in the appendix of your paper)? It seems very useful from an interpretability point of view.

@airsplay
Owner

airsplay commented Mar 3, 2020

Currently, I have not found a clean way to fetch the attention graphs, so the code is badly organized: I just gathered all the outputs, saved them to TSV files, and visualized them in an ipynb notebook. So for now, I do not plan to release it.

prajjwal1 reopened this on Mar 15, 2020
@prajjwal1
Author

How did you gather the output? If you could point to the line in your current codebase where you got the output from, that would be really helpful.

@airsplay
Owner

My way is simple but not elegant: I create a global list and append the attention outputs to it. The list is cleared before each forward pass and logged after the forward.
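For readers who want to reproduce this, here is a minimal sketch of the "global list" trick (hypothetical module names, not the repo's code), shown with PyTorch's built-in multi-head attention so the returned weights can be collected:

```python
import torch

ATTENTION_MAPS = []  # module-level ("global") list that collects attention weights

class CrossAttention(torch.nn.Module):
    """Stand-in for an attention layer whose weights we want to visualize."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query, context):
        out, weights = self.attn(query, context, context, need_weights=True)
        ATTENTION_MAPS.append(weights.detach().cpu())  # stash for later plotting / TSV dump
        return out

# Usage pattern described above:
#   ATTENTION_MAPS.clear()        # clear before each forward pass
#   output = model(batch)         # forward; each attention layer appends its weights
#   save_or_plot(ATTENTION_MAPS)  # then dump to TSV / visualize in a notebook
```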
