Some doubts regarding executing the scripts #19

architlatkar27 · 2021-09-10T08:13:38Z

I was following the instructions on this page on how to use the built-in MSCOCO dataset for image captioning. I have doubts on the following points:

In which directory should I put my images and annotations?
I was trying to run the tools/create_feats.py script to convert karpathy_train_resnet101_faster_rcnn_genome.tsv.0 into .npz format. However, I am running out of space on Colab Pro. What could be the reason for this, and is there a fix available?

architlatkar27 · 2021-09-12T06:53:30Z

Another error that I got while running train_net.py:
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).

YehLi · 2021-09-12T11:00:42Z

I was following the instructions on this page on how to use the built-in MSCOCO dataset for image captioning. I have doubts on the following points:

In which directory should I put my images and annotations?

I was trying to run the tools/create_feats.py script to convert karpathy_train_resnet101_faster_rcnn_genome.tsv.0 into .npz format. However, I am running out of space on Colab Pro. What could be the reason for this, and is there a fix available?

You can follow this structure:

open_source_dataset/
    mscoco_dataset/
        annotations files
        features/
            up_down/*.npz
xmodaler/
    configs/
    xmodaler/
    tools/
    train_net.py

Please try the following command:

python2 tools/create_feats.py --infeats bottom_up_tsv --outfolder ./mscoco/feature/up_down_10_100

YehLi · 2021-09-12T11:09:13Z

Another error that I got while running train_net.py:
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).

Which PyTorch, CUDA, and driver versions are you using? I tested with torch 1.9.0, CUDA 10.2, and driver version 440.95.01 on P40/V100 GPUs and saw no error.
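
For anyone answering the version question above, a minimal sketch that collects the relevant versions in one place may help. The helper name `collect_versions` is hypothetical and not part of X-modaler; it only reads attributes that PyTorch exposes and falls back gracefully when PyTorch or NCCL is not available.

```python
import platform


def collect_versions():
    """Return Python, PyTorch, CUDA, and NCCL versions as strings."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = "not installed"
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None on CPU-only builds).
    info["cuda (build)"] = str(torch.version.cuda)
    info["cuda available"] = str(torch.cuda.is_available())
    try:
        from torch.cuda import nccl
        info["nccl"] = str(nccl.version())
    except Exception as exc:  # NCCL can be missing, e.g. on CPU-only builds
        info["nccl"] = "unavailable (%s)" % exc
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print("%s: %s" % (name, value))
```

The GPU driver version is not reported by this sketch; it is shown in the header of `nvidia-smi` output.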

architlatkar27 · 2021-09-12T11:16:43Z

For that, I downgraded PyTorch to 1.6.0 and it worked properly.

architlatkar27 · 2021-09-14T05:34:05Z

FileNotFoundError: [Errno 2] No such file or directory: '../open_source_dataset/mscoco_dataset/features/mil/resnet101_mil.pkl'

Where can I find this file, and what exactly does it contain?

YehLi · 2021-09-15T16:42:17Z

The features have been uploaded to the mscoco_dataset folder (https://drive.google.com/drive/folders/16rCskpeG5ci02YiE8uR6P8AimV6WIkcB?usp=sharing).

architlatkar27 · 2021-09-15T17:31:41Z

Thanks a lot

architlatkar27 · 2021-09-16T05:54:11Z

Hi, this file is also missing:
../open_source_dataset/mscoco_dataset/features/global_feat/resnet101_pool5.pkl

YehLi · 2021-09-16T09:09:23Z

The file has been uploaded.
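
To catch missing feature files before launching train_net.py, a small check like the following sketch can be run first. The list of paths is taken only from this thread (the up_down folder from the layout reply and the two .pkl paths from the error messages); other configs may need different files. The helper name `find_missing` is hypothetical, and the dataset root is assumed to be `../open_source_dataset` relative to the repository.

```python
from pathlib import Path

# Paths mentioned in this thread, relative to the dataset root.
EXPECTED = [
    "mscoco_dataset/features/up_down",
    "mscoco_dataset/features/mil/resnet101_mil.pkl",
    "mscoco_dataset/features/global_feat/resnet101_pool5.pkl",
]


def find_missing(dataset_root, expected=EXPECTED):
    """Return the expected paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumption: run from the xmodaler repository root, with
    # open_source_dataset/ as a sibling directory.
    missing = find_missing("../open_source_dataset")
    for rel in missing:
        print("missing:", rel)
    if not missing:
        print("all expected paths found")
```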

architlatkar27 · 2021-09-19T07:30:56Z

Hi, thanks to the two files you uploaded, I was able to run train_net.py for LSTM-A.
However, I am a bit confused, as the whole script finished within a few seconds without reporting any errors. Here is a snapshot of the last part of the output:

So what exactly does this script do?
Also, the snapshot says that it failed to import detectron2. Is it a requirement for image captioning?

Thanks for helping out.

YehLi · 2021-09-24T06:17:55Z

The first output is about loading data for training and evaluation.
detectron2 is not required for image captioning and other tasks. You can ignore the message.

architlatkar27 · 2021-09-24T06:32:19Z

Hi, thanks for getting back.
How exactly do we start training, then? If I run train_net.py, it just loads the data and exits.

YehLi · 2021-10-12T14:34:17Z

Merging this issue into #21.

YehLi closed this as completed Oct 12, 2021
