Fine tuning DiT for object detection task #175

deshwalmahesh · 2022-08-28T06:01:16Z

I want to fine-tune DiT for object detection (text and diagram detection only) on my own dataset. I have been searching the web for quite some time but could not find anything on fine-tuning a Transformer backbone for object detection.

Your GitHub answer about using a custom backbone with DETR describes how to change the backbone, and you said that ANY model from the timm library can be used. There are almost 890 models available there but, unfortunately, not DiT.
The HuggingFace model supports feature extraction via BeitFeatureExtractor.from_pretrained("microsoft/dit-large"), so I think it could be used as a backbone, but I found nothing on this either.

I tried changing the code in your tutorial on training DETR on custom data by replacing the code in Cell 8,

#feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")

feature_extractor = BeitFeatureExtractor.from_pretrained("microsoft/dit-large")

but while running the code for Cell 11,

from torch.utils.data import DataLoader

def collate_fn(batch):
  pixel_values = [item[0] for item in batch]
  encoding = feature_extractor.pad_and_create_pixel_mask(pixel_values, return_tensors="pt")
  labels = [item[1] for item in batch]
  batch = {}
  batch['pixel_values'] = encoding['pixel_values']
  batch['pixel_mask'] = encoding['pixel_mask']
  batch['labels'] = labels
  return batch

train_dataloader = DataLoader(train_dataset, collate_fn=collate_fn, batch_size=4, shuffle=True)
val_dataloader = DataLoader(val_dataset, collate_fn=collate_fn, batch_size=2)
batch = next(iter(train_dataloader))

it gave me the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-446d81c845dd> in <module>
     13 train_dataloader = DataLoader(train_dataset, collate_fn=collate_fn, batch_size=4, shuffle=True)
     14 val_dataloader = DataLoader(val_dataset, collate_fn=collate_fn, batch_size=2)
---> 15 batch = next(iter(train_dataloader))

5 frames
/usr/local/lib/python3.7/dist-packages/transformers/feature_extraction_utils.py in __getitem__(self, item)
     85         """
     86         if isinstance(item, str):
---> 87             return self.data[item]
     88         else:
     89             raise KeyError("Indexing with integers is not available when using Python based feature extractors")

KeyError: 'labels'

Can you please help me with the problem at hand?

Thank you :)
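A note on the traceback above: the KeyError is raised while the DataLoader fetches an item, that is, before collate_fn runs. One plausible reading (an assumption, since the dataset code is not shown in the post) is that the tutorial's dataset calls the feature extractor with COCO annotations and then reads encoding["labels"], while an image-only extractor such as BeitFeatureExtractor returns only pixel_values. The sketch below reproduces that mechanism with hypothetical stand-in classes and functions; none of these names come from transformers.

```python
# Hypothetical stand-in for the object a feature extractor returns.
# It only mimics the string-key lookup seen in the traceback.
class FakeBatchFeature:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, item):
        if isinstance(item, str):
            return self.data[item]
        raise KeyError("Indexing with integers is not available")


def detection_style_extractor(image, annotations=None):
    # Assumption: a detection extractor turns annotations into "labels".
    return FakeBatchFeature({"pixel_values": [image], "labels": [annotations]})


def image_only_extractor(image, annotations=None):
    # Assumption: an image-only extractor ignores annotations entirely.
    return FakeBatchFeature({"pixel_values": [image]})


def get_item(extractor, image, annotations):
    # Mirrors what a tutorial-style dataset __getitem__ is assumed to do.
    encoding = extractor(image, annotations)
    return encoding["pixel_values"][0], encoding["labels"][0]


if __name__ == "__main__":
    target = {"boxes": [[0, 0, 10, 10]], "class_labels": [1]}
    print(get_item(detection_style_extractor, "img", target))
    try:
        get_item(image_only_extractor, "img", target)
    except KeyError as err:
        print("KeyError:", err)
```

If this reading is right, swapping only the feature extractor cannot work, because the detection targets never reach the model; the data pipeline and the detection head both expect the DETR-style output.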

arvisioncode · 2023-07-10T09:55:11Z

Any solution to this?

deshwalmahesh · 2023-07-20T06:01:34Z

No solution to this yet. For now, you can use Detectron2, as is done in the official DiT code for object detection.

NielsRogge · 2023-07-20T07:34:28Z

Yes, for the moment you need to use Detectron2 if you want to use DiT + Mask R-CNN.

However, I'm working on adding support for it in Transformers.

vbeutner · 2024-02-14T22:24:55Z

Hi @NielsRogge Any update on this? I assume it's probably lower prio for you. Just curious

vegabs · 2024-04-16T15:38:58Z

Downgrading transformers to version 4.32.0 worked for me.
