
CPU memory usage is too high and other queries #37

Closed
prajjwal1 opened this issue Jan 19, 2020 · 19 comments

Comments

@prajjwal1

prajjwal1 commented Jan 19, 2020

Thanks for sharing this code. When I'm fine-tuning on VQA, my RAM usage blows up: with num_workers set to 4, it requires 207 GB. I've also tried different batch sizes. The script runs successfully with the --tiny flag, but when I load both train and nominival, memory usage blows up and I get a "memory can't be allocated" error. Do you know a workaround for this? I think it's because we store all the faster_rcnn features in RAM?

prajjwal1 changed the title from "CPU memeory usage is too high" to "CPU memory usage is too high" on Jan 19, 2020
@airsplay
Owner

Thanks.

It is because the data loader loads the image features into memory 4 times (i.e., 4 copies of the image features are held in memory, one per worker).

Because of the GIL, the num_workers setting in the PyTorch data loader mainly helps parallelize data loading. Since the image features are already in memory here, num_workers > 1 is not needed; setting num_workers=1 is efficient.
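To illustrate (a minimal sketch with made-up names, not the repo's actual loader): each DataLoader worker is a separate process and effectively ends up with its own copy of a dataset object whose features live in RAM.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class InMemoryFeatureDataset(Dataset):
    """Hypothetical stand-in for a dataset that keeps all RoI features in RAM."""
    def __init__(self, features):
        self.features = features                      # e.g. {img_idx: 36 x 2048 feature tensor}

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx]

features = {i: torch.randn(36, 2048) for i in range(100)}  # dummy features
dataset = InMemoryFeatureDataset(features)

# Each worker process effectively holds its own copy of `features`, so
# num_workers=4 can keep ~4 copies resident. Since the data is already in
# memory, one worker is enough here.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=1)
```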

@prajjwal1
Author

prajjwal1 commented Jan 19, 2020

Thanks for replying.

  1. Why 4? Are you using multi-scaling, i.e., 4 feature maps extracted from intermediate layers of the backbone?
  2. Did you train the faster_rcnn on COCO with cross-entropy only, or did you use the five mentioned pretraining objectives with BERT to train it? I'm asking because I want to extend your approach beyond VQA, and I will have to train the faster_rcnn myself since features won't be available.

@airsplay
Owner

  1. It is because you set num_workers to 4, so there are 4 copies.

  2. The faster_rcnn is trained only on object detection and is frozen when extracting features.

prajjwal1 changed the title from "CPU memory usage is too high" to "CPU memory usage is too high and other queries" on Jan 20, 2020
@prajjwal1
Author

Could you suggest a better way of loading features? I'm not able to fit them even with num_workers=1. Should I use the faster_rcnn from torchvision (it is also pretrained on COCO) for VQA and obtain features on the fly? But then LXMERT + faster_rcnn might not fit on a single GPU. How did you manage?

@airsplay
Owner

May I ask how large your main memory is?
You can use the fp16 branch to halve the memory usage with the command

git checkout fp16

If it still exceeds the memory limitation, the code might need to load features from disk.
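As a rough illustration of why the fp16 branch helps (assuming the features are decoded into float32 NumPy arrays), casting to float16 halves the per-image footprint:

```python
import numpy as np

feats = np.random.rand(36, 2048).astype(np.float32)  # one image: 36 RoIs x 2048-d
print(feats.nbytes)        # 294912 bytes (~288 KB) in float32

feats_fp16 = feats.astype(np.float16)                # conceptually what fp16 storage does
print(feats_fp16.nbytes)   # 147456 bytes (~144 KB), i.e. half
```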

@prajjwal1
Author

prajjwal1 commented Jan 22, 2020

I have 8 cores in my GCP instance (around 48 GB of RAM). I just tried with num_workers=0 (it worked, but slowly). I am thinking of using pandas with chunksize; reading the entire very large TSV file at once seems inefficient. What do you think?

@airsplay
Owner

Thanks. With 48 GB of main memory, it should work fine with fp16, which takes around 30 GB.

Loading features from disk is definitely possible; multiple workers should then be involved to balance the loading. The current implementation loads everything into memory, which removes the cost of per-batch loading but becomes infeasible when memory is not large enough.
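If even fp16 does not fit, a common pattern for loading features from disk (sketched below with hypothetical names, not code from this repo) is to index the TSV by byte offset once and then seek to the requested row inside __getitem__, so that only the current batch is ever held in memory:

```python
import csv
import sys
from torch.utils.data import Dataset

csv.field_size_limit(sys.maxsize)  # rows with base64-encoded features are very long

class DiskFeatureDataset(Dataset):
    """Hypothetical disk-backed reader: index the TSV once, seek per item."""
    def __init__(self, tsv_path):
        self.tsv_path = tsv_path
        self.offsets = []
        with open(tsv_path, "rb") as f:            # record the byte offset of every row
            while True:
                offset = f.tell()
                if not f.readline():
                    break
                self.offsets.append(offset)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        with open(self.tsv_path, "r") as f:
            f.seek(self.offsets[idx])
            row = next(csv.reader([f.readline()], delimiter="\t"))
        # decode the base64 feature columns of `row` here, as the repo's loader does
        return row
```

With this layout, each worker opens its own file handle, so several workers can overlap disk reads with GPU computation, in line with the suggestion above.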

@prajjwal1
Author

prajjwal1 commented Jan 23, 2020

Thanks for your reply. In your experiments, did you try other configurations (e.g., 9, 6, 6 for L, X, R)?

@airsplay
Owner

I did not try them, given the computational resources required.

@prajjwal1
Author

prajjwal1 commented Jan 27, 2020

Thanks.

  1. For pretraining, you used all 5 datasets (Visual Genome, MS COCO, VQA 2.0, GQA, VG-QA). For finetuning on VQA, the model will then come across the same sentence-image pairs it encountered during pretraining, right? So during finetuning, has the model already been exposed to the dataset being used for finetuning?

  2. You don't seem to be using an lr_scheduler; is there any reason for that?

@airsplay
Owner

  1. May I ask whether you consider using VQA in pre-training a problematic setup? If so, could you specify the reason? Using part of the data in pre-training is actually a common strategy, given the limited amount of data; as long as the test data is not touched, improvements on the test data are considered solid. For example, every work following bottom-up attention (which, if you are not familiar with this thread of VQA research, covers essentially every recent VQA work of the last two years) uses an object detector pretrained on Visual Genome. Visual Genome contains half of the VQA training images, which means the ground-truth object annotations of those training images are used in the pre-training of every VQA paper. However, the test data is never touched when training the detection system, so the validity of these works still holds.

  2. There is a triangular lr scheduler inside the optimizer.
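For reference, a minimal sketch of a triangular schedule (linear warmup, then linear decay), similar in shape to the warmup-then-decay schedule that BERT-style optimizers apply internally; the function name and parameters are illustrative, not the repo's code:

```python
def triangular_lr(step, total_steps, base_lr, warmup_frac=0.1):
    """Linear warmup to base_lr, then linear decay back to 0 (triangular shape)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Example: with total_steps=1000 and base_lr=5e-5, the lr ramps up over the
# first 100 steps, peaks at 5e-5, and decays linearly to 0 at step 1000.
```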

@prajjwal1
Author

prajjwal1 commented Jan 28, 2020

Thanks for replying. Sorry, I just wanted to learn more about the pretraining; I don't consider it problematic. The reason I asked is that pretraining isn't feasible for me right now.

  1. I am working specifically on the finetuning part, and I think the effect of finetuning will be less pronounced if the model has already seen the data during pretraining. Although performance improves, as your paper (and other works, e.g. ViLBERT, VL-BERT, UNITER) has shown, I think this imposes some upper bound on performance to some extent (moreover, these datasets have some overlap). For example: using non-COCO images and pairs (along with your proposed objectives) for pretraining, and using VQA (which has COCO images) for finetuning, similar to what we do in ImageNet training.

  2. Thanks for providing such a wonderful codebase; it is really helpful. I wanted to clarify: has the pretrained model you provide been trained on all 4 objectives (Image QA, cross-modality matching, masked object prediction, masked cross-modality LM)?

@airsplay
Owner

  1. ViLBERT is mainly trained on Conceptual Captions, which contains (mostly) out-of-domain images and data. However, another paper, UNITER, shows that a clean dataset can still be better for vision-and-language tasks (see their Table 3): on the out-of-domain dataset (NLVR2), the COCO + VG setup still wins even though Conceptual Captions has 10x more images. I currently do not have a clear answer for this and am waiting for more results; a clean comparison between clean/in-domain/small datasets and noisy/out-of-domain/large datasets requires too many computational resources.

  2. Yes. The losses are added together.

@prajjwal1
Author

Hi,
In the paper, you report results on the test set, but predict in this repo provides results on the validation set. How did you compute the results on test-dev and test-std?

@airsplay
Owner

Thanks. The results on test-dev and test-std require using the test servers. The detailed process for each dataset is provided at the end of each section, e.g.:
https://github.com/airsplay/lxmert#submitted-to-vqa-test-server
https://github.com/airsplay/lxmert#submitted-to-gqa-test-server
https://github.com/airsplay/lxmert#unreleased-test-sets

@prajjwal1
Author

prajjwal1 commented Mar 1, 2020

Hi,
Could you please share the attention visualization code (as in the appendix of your paper)? It seems very useful from an interpretability point of view.

@airsplay
Owner

airsplay commented Mar 3, 2020

Currently, I have not found a clean way to fetch the attention graphs, so the code is badly organized: I just gathered all the outputs, saved them to TSV files, and visualized them in an ipynb notebook. So for now, I do not plan to release it.

prajjwal1 reopened this on Mar 15, 2020
@prajjwal1
Author

How did you gather the output? If you could point to the line in your current codebase where you got the output from, that would be really helpful.

@airsplay
Owner

My way is simple but not elegant: I create a global list and append the attention outputs to it. The list is cleared before each forward pass and logged after the forward.
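For readers who want to reproduce this, here is a minimal sketch of the "global list" trick (hypothetical module names, not the repo's code), shown with PyTorch's built-in multi-head attention so the returned weights can be collected:

```python
import torch

ATTENTION_MAPS = []  # module-level ("global") list that collects attention weights

class CrossAttention(torch.nn.Module):
    """Stand-in for an attention layer whose weights we want to visualize."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query, context):
        out, weights = self.attn(query, context, context, need_weights=True)
        ATTENTION_MAPS.append(weights.detach().cpu())  # stash for later plotting / TSV dump
        return out

# Usage pattern described above:
#   ATTENTION_MAPS.clear()        # clear before each forward pass
#   output = model(batch)         # forward; each attention layer appends its weights
#   save_or_plot(ATTENTION_MAPS)  # then dump to TSV / visualize in a notebook
```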
