
Question about the dataset. #111

Open
Rexzhan opened this issue Dec 30, 2023 · 8 comments

Comments


Rexzhan commented Dec 30, 2023

Nice work! But I hit an error when trying to reproduce your training process. Could you tell me how to prepare these three files: "coco/annotations/panoptic_train2017_filtrefgumdval.json", "coco/annotations/captions_train2017_filtrefgumdval.json", and "coco/annotations/grounding_train2017_filtrefgumd.json"?
Following your dataset preparation guide, I think I only got panoptic_train2017_filtrefgumdval_filtvlp / captions_train2017_filtrefgumdval_filtvlp / grounding_train2017_filtrefgumdval_filtvlp.
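For reference, filtering a COCO-style annotation file down to a subset of images usually amounts to dropping excluded image ids from both the "images" and "annotations" arrays. A minimal sketch, assuming the standard COCO JSON layout (the function name and arguments here are hypothetical, not part of this repo):

```python
import json

def filter_coco_annotations(in_path, out_path, exclude_image_ids):
    """Drop every image (and its annotations) whose id is in exclude_image_ids."""
    with open(in_path) as f:
        data = json.load(f)
    exclude = set(exclude_image_ids)
    data["images"] = [im for im in data["images"] if im["id"] not in exclude]
    data["annotations"] = [a for a in data["annotations"]
                           if a["image_id"] not in exclude]
    with open(out_path, "w") as f:
        json.dump(data, f)
```

The actual "filtrefgumdval" files presumably exclude images overlapping the refcoco/val splits; the exact exclusion set would have to come from the maintainers.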

ziqipang commented

@MaureenZOU Thanks for the update! It seems these new files are not reflected on the README yet. To reproduce SEEM, I assume these new files are correct?


Rexzhan commented Dec 31, 2023

@ziqipang, @MaureenZOU May I ask when the model consumes coco_caption_karpathy_test.arrow, filtcoco2017val_caption_karpathy_train.arrow, etc. in pretrain_arrows_code224 during the training stage? I followed the script in TRAIN.md, but the provided config files do not seem to use vlp_dataset/coco_caption_karpathy. Please correct me if I am wrong.


ziqipang commented Jan 2, 2024

@Rexzhan I think X-Decoder uses that vision-language data, but SEEM uses only the segmentation data. I am just starting to work in this field, so please double-check whether my understanding is correct.

xpzwzwz commented Jan 4, 2024

@MaureenZOU Sorry, but I don't see the file "panoptic_val2017.json" you mentioned in DATASET.md. Could you upload it? Thanks.

MaureenZOU (Collaborator) commented

This can be downloaded from the official website.

seungyoungshin commented

According to DATASET.md, I have downloaded the COCO 2017 dataset from the official website (https://cocodataset.org/#download).
I also created some *.arrow files, but I can't create all of the files needed for training.

How to create "4M Image Text Pairs" with ViLT?

4M Image Text Pairs (X-Decoder)
We follow the exact data preparation for the image text pairs data with [ViLT](https://github.com/dandelin/ViLT/blob/master/DATA.md).

# The pretrained arrow file are put under .xdecoder_data/pretrain_arrows_code224 with the following list of files.
["filtcoco2017val_caption_karpathy_train.arrow", "filtcoco2017val_caption_karpathy_val.arrow", "filtcoco2017val_caption_karpathy_restval.arrow"] + ["code224_vg.arrow"] + [f"code224_sbu_{i}.arrow" for i in range(9)] + [f"code224_conceptual_caption_train_{i}.arrow" for i in range(31)]
# ["filtcoco2017val_caption_karpathy_train.arrow", "filtcoco2017val_caption_karpathy_val.arrow", "filtcoco2017val_caption_karpathy_restval.arrow"] are derived from ["coco_caption_karpathy_train.arrow", "coco_caption_karpathy_val.arrow", "coco_caption_karpathy_restval.arrow"] by deleting images that overlap with coco val2017, to avoid information leakage.
To get started quickly:

# Download coco karparthy test set (we hack the training data to be coco_caption_karpathy_test.arrow only for quick start in the codebase)
wget https://huggingface.co/xdecoder/X-Decoder/resolve/main/coco_caption_karpathy_test.arrow
After dataset preparation, the dataset structure would be:

.xdecoder_data
└── pretrain_arrows_code224/
    ├── coco_caption_karpathy_test.arrow
    ├── *filtcoco2017val_caption_karpathy_train.arrow
    ├── ...
    ├── *code224_vg.arrow
    ├── *code224_sbu_0.arrow
    ├── ...
    ├── *code224_conceptual_caption_train_0.arrow
    └── ...
* Starred datasets are optional when only debugging the pipeline, but they NEED to be added back when you are training the model.
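To avoid a failed training run from a missing file, the expected file list above can be checked up front. A minimal sketch (the helper name and split into quick-start vs. full-training lists are my own, taken from the snippet above):

```python
from pathlib import Path

# File lists copied from the DATASET.md snippet above: the test arrow is enough
# for the quick start, the rest are needed for real training.
QUICK_START = ["coco_caption_karpathy_test.arrow"]
FULL_TRAINING = (
    ["filtcoco2017val_caption_karpathy_train.arrow",
     "filtcoco2017val_caption_karpathy_val.arrow",
     "filtcoco2017val_caption_karpathy_restval.arrow",
     "code224_vg.arrow"]
    + [f"code224_sbu_{i}.arrow" for i in range(9)]
    + [f"code224_conceptual_caption_train_{i}.arrow" for i in range(31)]
)

def missing_arrows(root, names):
    """Return the subset of expected .arrow files absent under root."""
    root = Path(root)
    return [n for n in names if not (root / n).exists()]
```

For example, `missing_arrows(".xdecoder_data/pretrain_arrows_code224", FULL_TRAINING)` returns an empty list only when every training arrow is in place.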
