Split .tsv files #79

yikuan8 · 2020-08-24T05:52:08Z

Thank you for your great repo. I am trying to create a Colab version of a bunch of V+L models (LXMERT, UNITER, VisualBERT, etc.). However, due to the RAM limit of Colab, it is hard to read the huge .tsv files in full. Would you mind if I split your .tsv files by image into many small files and published them? Of course, I will send them to you first.

Thank you,

airsplay · 2020-08-25T18:07:40Z

Great! You could do that for sure :).

yezhengli-Mr9 · 2021-01-04T20:48:09Z

How are train2014_obj36.tsv and val2014_obj36.tsv in mscoco_imgfeat generated?
I am asking in case I want to run on my own datasets with my own images.

airsplay · 2021-01-04T21:18:18Z

@yezhengli-Mr9 Thanks. Please refer to here.

yezhengli-Mr9 · 2021-01-04T21:24:29Z

@airsplay Thanks, I know they are generated with Faster R-CNN. I am struggling with incompatibilities between package versions. Is it possible that Mask R-CNN could produce useful *.tsv files as well? (Personally, I work with Mask R-CNN on other tasks more often.)

yezhengli-Mr9 · 2021-01-04T21:26:35Z

OK, I see the Docker. Let me try.

yezhengli-Mr9 · 2021-01-05T00:46:54Z

I do not see /workspace/features inside the Docker container, let alone /workspace/features/extract_nlvr2_image.py. Is some additional setup necessary for the "regeneration"?

Anyway, I am looking into butd/tools/generate_tsv.py in case /workspace/features/extract_nlvr2_image.py is necessary.

yezhengli-Mr9 · 2021-01-05T06:46:47Z

I retried the Docker image multiple times and still do not see /workspace/features inside the container, let alone /workspace/features/extract_nlvr2_image.py. @airsplay @jayleicn

freeIsa · 2021-01-05T16:44:01Z

@yezhengli-Mr9
I can share with you the notes I wrote down when experimenting with this (it's mainly copy&paste of instructions I found in the repository itself). Perhaps these can help you (note that maybe you will need to change details like folder names):

1. Download the Docker container with the pre-set environment:
   docker pull airsplay/bottom-up-attention

2. Download the Faster R-CNN model into the folder data/mscoco_imgfeat/:
   wget --no-check-certificate 'https://www.dropbox.com/s/2h4hmgcvpaewizu/resnet101_faster_rcnn_final_iter_320000.caffemodel?dl=1' -O data/mscoco_imgfeat/resnet101_faster_rcnn_final_iter_320000.caffemodel

3. Run Docker: /path/to/mscoco/images is the folder with your images (in subfolders train, test, valid); features will be saved to lxmert/data/mscoco_imgfeat. Use absolute paths.
   docker run --gpus all -v /path/to/mscoco/images:/workspace/images:ro -v $(pwd)/data/mscoco_imgfeat:/workspace/features --rm -it airsplay/bottom-up-attention bash

4. Inside Docker:
   cd /workspace/features
   CUDA_VISIBLE_DEVICES=0 python extract_coco_image.py --split train
   CUDA_VISIBLE_DEVICES=0 python extract_coco_image.py --split valid
   CUDA_VISIBLE_DEVICES=0 python extract_coco_image.py --split test

5. If you get CUDA errors, edit the config file faster_rcnn_end2end_resnet.yml under /opt/butd/experiments/cfgs using vim.tiny (inside the Docker container) and add the following two lines under the item TEST:

   TEST:
     SCALES: [300,]
     MAX_SIZE: 400

   Reason: "Faster R-CNN is single batched when testing, thus the batch size would not affect the memory usage. And it uses a large size of image (i.e., short edge >= 600, long edge <= 1000 instead of (224, 224) in ResNet) in order to capture small objects. To save memory, the current solution I came up with is to reduce the size of the image."
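To make the memory argument in the quoted reason concrete, here is a small Python sketch of the usual Faster R-CNN test-time resizing rule: scale the short edge to SCALES, then shrink further if the long edge would exceed MAX_SIZE. This is my understanding of how those two settings interact, not code taken from the container; the function names resize_scale and resized_pixels and the 480x640 example image are made up for illustration.

```python
def resize_scale(height, width, target_size, max_size):
    """Scale factor for one image (hypothetical helper, not from the repo).

    Assumed rule: bring the short edge to `target_size`, but cap the
    long edge at `max_size`.
    """
    short_edge = min(height, width)
    long_edge = max(height, width)
    scale = target_size / short_edge
    if round(scale * long_edge) > max_size:
        scale = max_size / long_edge
    return scale


def resized_pixels(height, width, target_size, max_size):
    """Number of pixels the network would see after resizing."""
    scale = resize_scale(height, width, target_size, max_size)
    return round(height * scale) * round(width * scale)


# Illustrative 480x640 image under both settings.
default_px = resized_pixels(480, 640, target_size=600, max_size=1000)
reduced_px = resized_pixels(480, 640, target_size=300, max_size=400)
print(default_px, reduced_px, default_px / reduced_px)
```

For this example image the reduced setting feeds the network a quarter of the pixels, which is where the memory saving would come from; the cost is that small objects become harder to detect.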

yezhengli-Mr9 · 2021-01-06T01:40:47Z

Hi @freeIsa, I just noticed your response. Thanks a lot! I think yours is an adjusted version of the official tutorial.
However, I am stuck on step 4:

inside docker:
cd /workspace/features
CUDA_VISIBLE_DEVICES=0 python extract_coco_image.py --split train
CUDA_VISIBLE_DEVICES=0 python extract_coco_image.py --split valid
CUDA_VISIBLE_DEVICES=0 python extract_coco_image.py --split test

because I do not see /workspace/features inside the Docker container, let alone /workspace/features/extract_nlvr2_image.py. I pulled the Docker image multiple times, and I suspect the image was updated at some point so that /workspace/features/extract_nlvr2_image.py no longer exists.

yezhengli-Mr9 · 2021-01-06T17:59:38Z

OK, I just saw issue #3: the file extract_nlvr2_image.py is the same as the Python file here. Thanks anyway. Let me try.

LeungWaiHo · 2021-09-13T02:33:09Z

Could you provide the code for splitting the .tsv file? Thanks a lot!
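No split script was posted in this thread. As a starting point, here is a minimal Python sketch of one way to do it: stream the file row by row so that only one row is in memory at a time, and write each row to its own small file. It assumes one image per row, no header row, the image id in the first tab-separated column, and ids that are safe to use as file names. The function name split_tsv_by_image and the output layout are hypothetical, not something from the LXMERT repo.

```python
from pathlib import Path


def split_tsv_by_image(tsv_path, out_dir, id_column=0):
    """Write every row of a large TSV to <out_dir>/<image id>.tsv.

    Hypothetical helper. Assumes one image per row and the image id in
    column `id_column`; a repeated id would overwrite the earlier file.
    Returns the number of rows written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    # Binary mode keeps each row byte-for-byte identical to the source.
    with open(tsv_path, "rb") as src:
        for line in src:  # iterates one row at a time, never the whole file
            if not line.strip():
                continue  # skip blank lines
            # Split only as far as the id column; the long feature
            # fields to its right are left untouched.
            raw_id = line.split(b"\t", id_column + 1)[id_column]
            img_id = raw_id.strip().decode("utf-8")
            (out_dir / f"{img_id}.tsv").write_bytes(line)
            written += 1
    return written
```

A call such as split_tsv_by_image("train2014_obj36.tsv", "train2014_obj36_split") would then produce one small file per image, and a loader can open just the file for the image it needs instead of the full .tsv.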

yezhengli-Mr9 mentioned this issue Jan 4, 2021: number of data characters(457269) cannot be 1 more than a multiple of 4 #70 (Open)

yezhengli-Mr9 mentioned this issue Jan 5, 2021: which proto is for generate_tsv.py? peteanderson80/bottom-up-attention#38 (Open)

This was referenced Jan 5, 2021: Using LXMERT for image captioning #86 (Open); Put back LXMert example huggingface/transformers#9401 (Merged)

This was referenced Jan 5, 2021: Can not use multiply gpus for bottom-up feature extraction #48 (Open); Non-docker way to extract image features? #64 (Open)

LeungWaiHo mentioned this issue Sep 13, 2021: About the split of tsv file yikuan8/Transformers-VQA#9 (Open)
