blogs/the-illustrated-image-captioning-using-transformers/ #2
Comments
To train on a large set, you can use a torch data iterator.

```python
import torch
from PIL import Image

class ImageCapatioingDataset(torch.utils.data.Dataset):
    def __init__(self, ds, ds_type, max_target_length):
        self.ds = ds
        self.max_target_length = max_target_length
        self.ds_type = ds_type

    def __getitem__(self, idx):
        image_path = self.ds[self.ds_type]['image_path'][idx]
        caption = self.ds[self.ds_type]['caption'][idx]
        model_inputs = dict()
        model_inputs['labels'] = self.tokenization_fn(caption, self.max_target_length)
        model_inputs['pixel_values'] = self.feature_extraction_fn(image_path)
        return model_inputs

    def __len__(self):
        return len(self.ds[self.ds_type])

    # text preprocessing step
    def tokenization_fn(self, caption, max_target_length):
        """Run tokenization on caption."""
        labels = tokenizer(caption,
                           padding="max_length",
                           truncation=True,
                           max_length=max_target_length).input_ids
        return labels

    # image preprocessing step
    def feature_extraction_fn(self, image_path):
        """Run feature extraction on the image at `image_path`."""
        image = Image.open(image_path)
        encoder_inputs = feature_extractor(images=image, return_tensors="np")
        return encoder_inputs.pixel_values[0]

train_ds = ImageCapatioingDataset(ds, 'train', 64)
eval_ds = ImageCapatioingDataset(ds, 'validation', 64)

# instantiate trainer
trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=feature_extractor,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=default_data_collator,
)
```
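For reference, a minimal sketch of the globals the snippet above assumes (`model`, `feature_extractor`, `tokenizer`, `training_args`); the checkpoints follow the blog post, the exact training arguments here are assumptions, and `compute_metrics` is as defined in the post:

```python
from transformers import (
    AutoTokenizer,
    Seq2SeqTrainingArguments,
    ViTFeatureExtractor,
    VisionEncoderDecoderModel,
    default_data_collator,
)

# ViT encoder + GPT-2 decoder, as in the blog post
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "gpt2")
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token

training_args = Seq2SeqTrainingArguments(
    output_dir="./image-captioning-output",
    per_device_train_batch_size=8,
    predict_with_generate=True,
)
```

With these in place, training runs with `trainer.train()`.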
Hi, thanks for the tutorial!
Thanks in advance!
@yesidc
I finally managed to run it. For one thing, the
I am getting this kind of error; why is it so?
`Unrecognized feature extractor in /content/image-captioning-output. Should have a`
The error occurs in the inference stage, when I am trying to load the pipeline.
Hi Ankur, what if we want multiple captions for the same image?
Hi Ankur, I want to do something between the encoder and decoder, so I define the model as follows:

```python
import torch.nn as nn
from transformers import GPT2Config, GPT2LMHeadModel, ViTModel

class caption_model(nn.Module):
    def __init__(self, args):
        super(caption_model, self).__init__()
        self.args = args
        self.gpt2_type = self.args.gpt2_type
        self.config = GPT2Config.from_pretrained('./gpt/' + self.gpt2_type)
        self.config.add_cross_attention = True
        # self.config.is_decoder = True
        self.config.is_encoder_decoder = True
        self.encoder = ViTModel.from_pretrained('./vit', local_files_only=True)
        self.decoder = GPT2LMHeadModel.from_pretrained('./gpt/' + self.gpt2_type, config=self.config)

    def forward(self, pixel_values, input_ids):
        image_feat = self.encoder(pixel_values)
        encoder_outputs = image_feat.last_hidden_state
        # encoder_outputs = do something
        output = self.decoder(input_ids=input_ids, encoder_hidden_states=encoder_outputs)
        return output.logits
```

But I run into trouble at the inference stage. It seems I should set `is_encoder_decoder = True` to use `class BeamSearchEncoderDecoderOutput(ModelOutput):` in generation_utils.py, but then I get `torch.nn.modules.module.ModuleAttributeError: 'GPT2LMHeadModel' object has no attribute 'get_encoder'`.
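A possible workaround, untested and not from the thread: keep the stock VisionEncoderDecoderModel, which already implements `get_encoder()` and `generate()`, and inject the custom step by wrapping its encoder. A minimal sketch, with the hook left as an identity placeholder:

```python
import torch.nn as nn
from transformers import VisionEncoderDecoderModel

class EncoderWithHook(nn.Module):
    """Wraps the ViT encoder so a custom transform runs on its hidden
    states; generate() then sees the modified states via get_encoder()."""
    def __init__(self, encoder, hook):
        super().__init__()
        self.encoder = encoder
        self.config = encoder.config              # generate() inspects this
        self.main_input_name = encoder.main_input_name
        self.hook = hook

    def forward(self, *args, **kwargs):
        outputs = self.encoder(*args, **kwargs)
        outputs.last_hidden_state = self.hook(outputs.last_hidden_state)
        return outputs

# local paths as in the snippet above
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "./vit", "./gpt/gpt2")
model.encoder = EncoderWithHook(model.encoder, hook=lambda h: h)  # identity placeholder
```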
@newbietuan I think you should ask this at https://github.com/huggingface/transformers/issues. They will give you a better response. I will try; if I get anything, I will update you here.
@Aaryan562
You may have to use a combination of num_return_sequences, num_beams, penalty_alpha, top_k, top_p, etc. You can refer to:

```python
from transformers import pipeline

image_to_text = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

generate_kwargs = {
    "num_return_sequences": 3,
    "num_beams": 3
}

image_to_text("https://ankur3107.github.io/assets/images/image-captioning-example.png", generate_kwargs=generate_kwargs)
```
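Note that with beam search, num_return_sequences must be less than or equal to num_beams, otherwise generate() raises a ValueError.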
Thank you very much!
Are you sure that if I copy-pasted each and every line it would not give me any errors? Or are there some changes?
I also checked the config.json and it had the model_type key = 'vit' in it, but it is still giving a ValueError.
Can you also tell me how to resolve the version issue, please?
Got it, @Ankur3107, thank you for the explanation.
Hi, I also have this issue. Have you found a solution?
No, I have not. Are you also getting the error in the inference stage?
So, which version of transformers should we use?
How do I load a custom local dataset using load_data()? I have downloaded the Flickr30k dataset, which has images and captions in separate folders.
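A minimal sketch of one way to do this, assuming the common Flickr30k layout with a single tab-separated captions file; the helper name `load_flickr30k` is hypothetical, and the column names match the `image_path`/`caption` fields used by the dataset class above:

```python
import os
from datasets import Dataset, DatasetDict

def load_flickr30k(image_dir, captions_file):
    """Hypothetical helper: builds a dataset with 'image_path' and
    'caption' columns from lines like '1000092795.jpg#0<TAB>A caption.'"""
    image_paths, captions = [], []
    with open(captions_file, encoding="utf-8") as f:
        for line in f:
            name_with_id, caption = line.rstrip("\n").split("\t")
            image_name = name_with_id.split("#")[0]
            image_paths.append(os.path.join(image_dir, image_name))
            captions.append(caption)
    ds = Dataset.from_dict({"image_path": image_paths, "caption": captions})
    return DatasetDict({"train": ds})  # add a validation split as needed

ds = load_flickr30k("flickr30k/images", "flickr30k/results_20130124.token")
```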
Hi, I also got the same error during the inference stage.
I resolved it by downgrading transformers to
Hello,
Hi @Ankur, thanks for this amazing work. Is there a way to extract the probabilities of the predicted tokens at inference? Best,
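A hedged sketch of one way to get them, assuming you call model.generate directly rather than through the pipeline (compute_transition_scores requires a reasonably recent transformers release):

```python
# pixel_values: preprocessed image tensor from the feature extractor
outputs = model.generate(pixel_values,
                         return_dict_in_generate=True,
                         output_scores=True)
# per-token log-probabilities (pass beam_indices=outputs.beam_indices
# when beam search is used)
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True)
token_probs = transition_scores.exp()
```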
@katiele47 this seems to solve it :)
I receive
I met the same problem. Applying your solution, it turned out: 'You are using a model of type vit to instantiate a model of type vision-encoder-decoder. This is not supported for all configurations of models and can yield errors.'
I found the solution for 'ValueError: Unrecognized feature extractor in ./instagram-captioning-output. Should have a
dramab's solution ("you just add the "feature_extractor_type": "ViTFeatureExtractor" entry to the preprocessor_config.json file") worked for me to avoid the error. However, when I run image_captioner("sample_image.png") as the last step, I just get a warning and no other output. What is the expected output of running this line? I just get "UserWarning: Using the model-agnostic default
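For anyone hitting the same ValueError, a minimal sketch of dramab's fix applied programmatically (the output directory path is taken from the comments above):

```python
import json

path = "./image-captioning-output/preprocessor_config.json"
with open(path) as f:
    cfg = json.load(f)
cfg["feature_extractor_type"] = "ViTFeatureExtractor"  # the missing key
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```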
@pleomax0730 can you provide me your Colab, please?
@Aaryan562 did you find the solution for the error? I am also getting the same error.
Hello Ankur,
Apologies for the delayed response; I couldn't resolve the issue despite attempting various solutions. Ultimately, I resorted to using a different model, though it too didn't achieve 100% accuracy. Nevertheless, we successfully incorporated it into our Final Year Project.
Hi, did you succeed in solving that? I am trying to solve the exact same problem.
Hi @Ankur, if I want a certain type of caption, can I provide a prompt to the model? I've been trying it but am not able to get the desired results.
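One hedged possibility, not confirmed in the thread: for an encoder-decoder model you can prime generation with decoder_input_ids, though a GPT-2 decoder fine-tuned only on plain captions is not trained for prompted captioning, so results may be poor:

```python
# prime the decoder with a text prefix (illustrative prompt)
prompt_ids = tokenizer("a photo of", return_tensors="pt").input_ids
output_ids = model.generate(pixel_values, decoder_input_ids=prompt_ids)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```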
You may create a variable to keep the result and then print it out.
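For example, using the image_captioner pipeline from the comment above (the caption shown is illustrative; image-to-text pipelines return a list of dicts with a generated_text key):

```python
result = image_captioner("sample_image.png")
print(result)  # e.g. [{'generated_text': 'a man riding a wave on a surfboard'}]
```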
The Illustrated Image Captioning using transformers - Ankur NLP Enthusiast
https://ankur3107.github.io/blogs/the-illustrated-image-captioning-using-transformers/