Some doubts regarding executing the scripts #19

Closed
architlatkar27 opened this issue Sep 10, 2021 · 13 comments

@architlatkar27

I was following the instructions on this page for using the built-in MSCOCO dataset for image captioning. I have questions on the following points:

  1. In which directory should I have my images and annotations?
  2. I was trying to run the tools/create_feats.py script to convert karpathy_train_resnet101_faster_rcnn_genome.tsv.0 into .npz format. However, I am running out of disk space on Colab Pro. What could be the reason for this, and is there a fix available?

@architlatkar27
Author

Another error that I got while running train_net.py:
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).

@YehLi
Owner

YehLi commented Sep 12, 2021

> I was following the instructions on this page for using the built-in MSCOCO dataset for image captioning. I have questions on the following points:
>
>   1. In which directory should I have my images and annotations?
>   2. I was trying to run the tools/create_feats.py script to convert karpathy_train_resnet101_faster_rcnn_genome.tsv.0 into .npz format. However, I am running out of disk space on Colab Pro. What could be the reason for this, and is there a fix available?

  1. You can follow the structure:
    open_source_dataset/
      mscoco_dataset/
        annotations files
        features/
          up_down/*.npz
    xmodaler/
      configs/
      xmodaler/
      tools/
      train_net.py
  2. Please try the following script:
    python2 tools/create_feats.py --infeats bottom_up_tsv --outfolder ./mscoco/feature/up_down_10_100
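
On the disk-space question, one thing worth checking is how much room the decoded features need. The sketch below is not part of xmodaler. It assumes the TSV follows the six-column, header-less layout used by the bottom-up-attention feature files (image_id, image_w, image_h, num_boxes, boxes, features, with boxes and features base64-encoded), and the function names are made up for illustration. It averages the decoded size over the first rows and compares a rough uncompressed total with the free space at the output folder.

```python
import base64
import csv
import shutil

# Assumed column order of the bottom-up-attention TSV files (no header row).
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def decoded_bytes_per_image(tsv_path, max_rows=100):
    """Average decoded size (boxes + features) over the first max_rows rows."""
    # Feature fields are far larger than the csv module's default field limit.
    csv.field_size_limit(2**31 - 1)
    total, rows = 0, 0
    with open(tsv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            total += len(base64.b64decode(row["boxes"]))
            total += len(base64.b64decode(row["features"]))
            rows += 1
            if rows >= max_rows:
                break
    return total / rows if rows else 0.0


def fits_on_disk(tsv_path, out_dir, num_images, max_rows=100):
    """Compare a rough uncompressed output estimate with free space at out_dir."""
    needed = decoded_bytes_per_image(tsv_path, max_rows) * num_images
    free = shutil.disk_usage(out_dir).free
    return needed, free, needed <= free
```

The output folder has to exist before `fits_on_disk` is called, and `num_images` should be the number of rows in the TSV shard being converted. The estimate ignores .npz container overhead and any compression, and the TSV itself also stays on disk during conversion, so treat the result as a lower bound on the space required.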

@YehLi
Owner

YehLi commented Sep 12, 2021

> Another error that I got while running train_net.py:
> RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, invalid usage, NCCL version 2.7.8
> ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).

Which PyTorch, CUDA, and driver versions are you using? I tested with torch 1.9.0, CUDA 10.2, and driver version 440.95.01 on P40/V100 GPUs and saw no error.
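
For anyone hitting the same NCCL error, a small script like the following can gather the version information asked about here. It is only a sketch: `collect_versions` is a made-up helper name, it reports "unavailable" when PyTorch or `nvidia-smi` cannot be found, and the NCCL lookup is wrapped in a broad `try` because not every PyTorch build exposes it.

```python
import shutil
import subprocess


def collect_versions():
    """Return PyTorch, CUDA, NCCL and driver versions, or 'unavailable' where unreadable."""
    info = {
        "torch": "unavailable",
        "cuda": "unavailable",
        "nccl": "unavailable",
        "driver": "unavailable",
    }
    try:
        import torch

        info["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None on CPU-only builds).
        info["cuda"] = str(torch.version.cuda)
        try:
            info["nccl"] = str(torch.cuda.nccl.version())
        except Exception:
            pass  # build without NCCL support
    except ImportError:
        pass  # PyTorch is not installed in this environment

    # The driver version comes from nvidia-smi, if it is on PATH.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0 and result.stdout.strip():
            info["driver"] = result.stdout.strip().splitlines()[0]
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Pasting the printed lines into the issue makes it easier to compare against the tested combination above.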

@architlatkar27
Author

Hey, for that I downgraded PyTorch to 1.6.0 and it worked properly.

@architlatkar27
Author

architlatkar27 commented Sep 14, 2021

FileNotFoundError: [Errno 2] No such file or directory: '../open_source_dataset/mscoco_dataset/features/mil/resnet101_mil.pkl'

Where can I find this file, and what exactly does it contain?

@YehLi
Owner

YehLi commented Sep 15, 2021

The features have been uploaded to the mscoco_dataset folder (https://drive.google.com/drive/folders/16rCskpeG5ci02YiE8uR6P8AimV6WIkcB?usp=sharing).
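
The thread does not say what the .pkl file contains. One way to find out after downloading it is to load it and look at the top-level structure. The helper below is a generic sketch with a hypothetical name (`describe_pickle`); it makes no assumption about the file's contents beyond it being a standard Python pickle, and it should only be used on files from a source you trust, since unpickling can execute code.

```python
import pickle


def describe_pickle(path, max_keys=5):
    """Return the top-level type of a pickle and, for dicts, a few keys with value types."""
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        obj = pickle.load(f, encoding="latin1")
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["len"] = len(obj)
    if isinstance(obj, dict):
        summary["sample"] = {
            key: type(value).__name__ for key, value in list(obj.items())[:max_keys]
        }
    return summary
```

Calling `describe_pickle("../open_source_dataset/mscoco_dataset/features/mil/resnet101_mil.pkl")` from the repository root would then show whether the file is a dict, list, or something else, and how many entries it holds.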

@architlatkar27
Author

Thanks a lot

@architlatkar27
Author

Hi, this file is also missing:
'../open_source_dataset/mscoco_dataset/features/global_feat/resnet101_pool5.pkl'

@YehLi
Owner

YehLi commented Sep 16, 2021

The file has been uploaded.
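
To catch missing files like these before launching train_net.py, the paths named in this thread can be checked up front. This is a minimal sketch, not something shipped with xmodaler: `missing_paths` and the `EXPECTED` list are illustrative, and the list only covers the three locations mentioned in this issue, so other configs may need additional files.

```python
from pathlib import Path

# Paths relative to open_source_dataset/mscoco_dataset, taken from this thread only.
EXPECTED = [
    "features/up_down",
    "features/mil/resnet101_mil.pkl",
    "features/global_feat/resnet101_pool5.pkl",
]


def missing_paths(dataset_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the xmodaler repository root, matching the '../' in the error message.
    for rel in missing_paths("../open_source_dataset/mscoco_dataset"):
        print(f"missing: {rel}")
```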

@architlatkar27
Author

Hi, thanks to the two files you uploaded, I was able to run train_net.py for lstm-A.
However, I am a bit confused, as the whole script finished within a few seconds without reporting any errors. Here is a snapshot of the last part of the output:
[image]
So what exactly does this script do?
Also, this snapshot says that it failed to import detectron2. Is it a requirement for image captioning?
[image]

Thanks for helping out.

@YehLi
Owner

YehLi commented Sep 24, 2021

  1. The first output is about loading data for training and evaluation.
  2. detectron2 is not required for image captioning and other tasks. You can ignore the message.

@architlatkar27
Author

Hi, thanks for getting back.
How exactly do we start training, then? If I run train_net.py, it just loads the data and exits.

@YehLi
Owner

YehLi commented Oct 12, 2021

Merging this issue into #21.

YehLi closed this as completed Oct 12, 2021