
Split .tsv files #79

Open
yikuan8 opened this issue Aug 24, 2020 · 11 comments

Comments

@yikuan8

yikuan8 commented Aug 24, 2020

Thank you for your great repo. I am trying to create a Colab version of several V+L models (LXMERT, UNITER, VisualBERT, etc.). However, because of Colab's RAM limit, it is hard to read the entire huge .tsv files. Would you mind if I split your .tsv files by image into many small files and published them? Of course, I will send them to you first.

Thank you,

@airsplay
Owner

Great! You could do that for sure :).

@yezhengli-Mr9

How were train2014_obj36.tsv and val2014_obj36.tsv in mscoco_imgfeat generated? I would like to run the models on my own dataset of personal images.

@airsplay
Owner

airsplay commented Jan 4, 2021

@yezhengli-Mr9 Thanks. Please refer to here.

@yezhengli-Mr9

@airsplay Thanks, I know the features are generated with Faster R-CNN, but I am struggling with incompatibilities between different package versions. Could Mask R-CNN also produce usable *.tsv files? (Personally, I have used Mask R-CNN more often on other tasks.)
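Whatever detector is used, LXMERT only consumes the resulting .tsv, so the format side of the question comes down to writing rows in the same layout. Below is a minimal sketch of that step, assuming the 10-column layout read by LXMERT's `load_obj_tsv` (tab-separated, no header line, numpy arrays stored as base64-encoded raw bytes). `make_row` and `write_tsv` are hypothetical helpers, not part of the repo. Note that format compatibility is not feature compatibility: the released LXMERT weights were pretrained on the bottom-up-attention Faster R-CNN features (2048-d per region by default), so regions from a different detector such as Mask R-CNN would likely need fine-tuning or re-pretraining to work well.

```python
import base64
import csv

import numpy as np

# Column order as read by LXMERT's load_obj_tsv (assumption; check src/utils.py).
FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf",
              "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]


def _b64(arr, dtype):
    """Cast to dtype, serialize the raw bytes, and base64-encode them as text."""
    return base64.b64encode(np.ascontiguousarray(arr, dtype=dtype).tobytes()).decode("ascii")


def make_row(img_id, img_h, img_w, boxes, features,
             objects_id, objects_conf, attrs_id, attrs_conf):
    """Build one .tsv row (hypothetical helper) from a detector's outputs for one image.

    boxes: (N, 4) as x1, y1, x2, y2 in pixels; features: (N, D) pooled region features.
    In the original pipeline the object/attribute ids and confidences come from
    Visual Genome classification heads; another detector must supply its own.
    """
    return {
        "img_id": img_id, "img_h": img_h, "img_w": img_w,
        "objects_id": _b64(objects_id, np.int64),
        "objects_conf": _b64(objects_conf, np.float32),
        "attrs_id": _b64(attrs_id, np.int64),
        "attrs_conf": _b64(attrs_conf, np.float32),
        "num_boxes": len(boxes),
        "boxes": _b64(boxes, np.float32),
        "features": _b64(features, np.float32),
    }


def write_tsv(path, rows):
    """Write rows tab-separated, one image per line, without a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, FIELDNAMES, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow(row)
```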

@yezhengli-Mr9

yezhengli-Mr9 commented Jan 4, 2021

OK, I see the Docker image. Let me try.

@yezhengli-Mr9

I did not see /workspace/features inside the container, let alone /workspace/features/extract_nlvr2_image.py. Is this a necessary setup step for regenerating the features?

Meanwhile, I am looking into butd/tools/generate_tsv.py in case /workspace/features/extract_nlvr2_image.py turns out to be necessary.

@yezhengli-Mr9

yezhengli-Mr9 commented Jan 5, 2021

I retried the Docker image multiple times and still do not see /workspace/features inside the container, let alone /workspace/features/extract_nlvr2_image.py. @airsplay @jayleicn

@freeIsa

freeIsa commented Jan 5, 2021

@yezhengli-Mr9
I can share the notes I wrote down when experimenting with this (they are mostly copied from instructions in the repository itself). Perhaps they can help you (note that you may need to change details such as folder names):

  1. download the Docker image with the pre-set environment
    docker pull airsplay/bottom-up-attention

  2. download the Faster R-CNN model into the folder data/mscoco_imgfeat/
    wget --no-check-certificate 'https://www.dropbox.com/s/2h4hmgcvpaewizu/resnet101_faster_rcnn_final_iter_320000.caffemodel?dl=1' -O data/mscoco_imgfeat/resnet101_faster_rcnn_final_iter_320000.caffemodel

  3. run the Docker container: /path/to/mscoco/images is the folder with your images (in subfolders train, test, valid); features will be saved to lxmert/data/mscoco_imgfeat. Use absolute paths.
    docker run --gpus all -v /path/to/mscoco/images:/workspace/images:ro -v $(pwd)/data/mscoco_imgfeat:/workspace/features --rm -it airsplay/bottom-up-attention bash

  4. inside the container:
    cd /workspace/features
    CUDA_VISIBLE_DEVICES=0 python extract_coco_image.py --split train
    CUDA_VISIBLE_DEVICES=0 python extract_coco_image.py --split valid
    CUDA_VISIBLE_DEVICES=0 python extract_coco_image.py --split test

  5. if you get CUDA memory errors, edit the config file faster_rcnn_end2end_resnet.yml under /opt/butd/experiments/cfgs (inside the container, e.g. with vim.tiny) and add the following two lines under the TEST item:

TEST:
  SCALES: [300,]
  MAX_SIZE: 400

Reason: "Faster R-CNN is single batched when testing, thus the batch size would not affect the memory usage. And it uses a large size of image (i.e., short edge >= 600, long edge <= 1000 instead of (224, 224) in ResNet) in order to capture small objects. To save memory, the current solution I came up with is to reduce the size of the image."
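To make the memory argument concrete, here is a minimal sketch of the test-time resize rule used by py-faster-rcnn (its `_get_image_blob`), which the butd code in the container is assumed to inherit: scale the short edge to SCALES, unless that pushes the long edge past MAX_SIZE. `faster_rcnn_test_size` is a hypothetical helper, and it handles a single scale only. For a typical 480x640 COCO image, the suggested settings cut the pixel count, and roughly the activation memory, by about 4x.

```python
def faster_rcnn_test_size(img_h, img_w, scale=600, max_size=1000):
    """Return the (height, width) an image is resized to at test time.

    Short edge goes to `scale`, unless the long edge would then exceed
    `max_size`, in which case the long edge is capped at `max_size`.
    """
    short_edge, long_edge = min(img_h, img_w), max(img_h, img_w)
    im_scale = scale / short_edge
    if round(im_scale * long_edge) > max_size:
        im_scale = max_size / long_edge
    return round(img_h * im_scale), round(img_w * im_scale)


# Default config vs. the reduced TEST settings above, for a 480x640 image.
for cfg in [(600, 1000), (300, 400)]:
    h, w = faster_rcnn_test_size(480, 640, *cfg)
    print(cfg, "->", (h, w), h * w, "pixels")
# (600, 1000) -> (600, 800) 480000 pixels
# (300, 400) -> (300, 400) 120000 pixels
```

Smaller inputs also make small objects harder to detect, which is the trade-off the quoted reason refers to.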

@yezhengli-Mr9

yezhengli-Mr9 commented Jan 6, 2021


Hi @freeIsa, I just noticed your response. Thanks a lot! I think yours is an adjusted version of the official tutorial.
However, I get stuck on step 4 (running extract_coco_image.py from /workspace/features inside the container), because I do not see /workspace/features inside the container, let alone /workspace/features/extract_nlvr2_image.py. I have pulled the Docker image multiple times, and I suspect the image was updated at some point so that /workspace/features/extract_nlvr2_image.py no longer exists.

@yezhengli-Mr9

OK, I just saw issue #3: the file extract_nlvr2_image.py is the same as the Python file here. Thanks anyway, let me try.

@LeungWaiHo
Could you share the code you used to split the .tsv file? Thanks a lot!
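A minimal sketch of one way to do the split, assuming the LXMERT .tsv layout (one image per line, tab-separated, no header, numpy arrays stored as base64-encoded raw bytes). `split_tsv` and `load_image_features` are hypothetical helper names, not part of the repo; the decode step is assumed to mirror the scheme in LXMERT's `load_obj_tsv`. The big file is streamed line by line, so only one image's row is ever held in memory.

```python
import base64
import csv
import os

import numpy as np

# Column order as read by LXMERT's load_obj_tsv (assumption; check src/utils.py).
FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf",
              "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]

# A 36-box x 2048-d feature field is roughly 390 KB of base64, above csv's default
# 128 KB field limit; 2**31 - 1 still fits in a 32-bit C long on every platform.
csv.field_size_limit(2**31 - 1)


def split_tsv(tsv_path, out_dir):
    """Stream a large .tsv and copy each image's line to out_dir/<img_id>.tsv.

    Returns the number of images written.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(tsv_path) as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            img_id = line.split("\t", 1)[0]
            with open(os.path.join(out_dir, img_id + ".tsv"), "w") as dst:
                dst.write(line)
            count += 1
    return count


def load_image_features(out_dir, img_id):
    """Read and decode one image's row from its small .tsv file."""
    with open(os.path.join(out_dir, img_id + ".tsv"), newline="") as f:
        item = next(csv.DictReader(f, FIELDNAMES, delimiter="\t"))
    for key in ("img_h", "img_w", "num_boxes"):
        item[key] = int(item[key])
    n = item["num_boxes"]
    decode_config = [
        ("objects_id", (n,), np.int64),
        ("objects_conf", (n,), np.float32),
        ("attrs_id", (n,), np.int64),
        ("attrs_conf", (n,), np.float32),
        ("boxes", (n, 4), np.float32),
        ("features", (n, -1), np.float32),
    ]
    for key, shape, dtype in decode_config:
        item[key] = np.frombuffer(base64.b64decode(item[key]), dtype=dtype).reshape(shape)
    return item
```

Usage would be a one-time `split_tsv(".../val2014_obj36.tsv", ".../val2014_split")`, then `load_image_features(...)` per image inside a dataset's `__getitem__`. For COCO train this creates tens of thousands of small files; packing images into a few hundred shard files is a possible alternative if the filesystem handles many small files poorly.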
