# Image Captioning with InternVL


In [4]:
!pip install transformers torch torchvision pandas tqdm pillow huggingface-hub

Defaulting to user installation because normal site-packages is not writeable


## Download the InternVL2 Model and VLM4Bio Dataset from Hugging Face

In [2]:
from download_script import download_internvl2_model, download_vlm4bio_dataset

  from .autonotebook import tqdm as notebook_tqdm


### Download InternVL2

In [4]:
# Uncomment if you want to interact with the model directly in Jupyter
#model, tokenizer = download_internvl2_model()

Downloading OpenGVLab/InternVL2-2B model...


InternLM2ForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.


### Download VLM4Bio Dataset

In [6]:
data_dir = download_vlm4bio_dataset()

Downloading VLM4Bio dataset...


Fetching 31482 files: 100%|██████████| 31482/31482 [00:20<00:00, 1563.29it/s]

Dataset downloaded to: /storage/ice1/6/1/tdeatherage3/Image-Captioning/data/VLM4Bio
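
The `download_script` module is not shown in this notebook. The "Fetching 31482 files" progress bar resembles what `huggingface_hub.snapshot_download` prints, so a minimal sketch of such a helper, assuming that function is what the script uses, might look like the following. The name `fetch_dataset_snapshot`, its parameters, and the injectable `downloader` argument are hypothetical, and the dataset repository id is left as a parameter because the notebook does not state it.

```python
from pathlib import Path
from typing import Callable, Optional


def fetch_dataset_snapshot(
    repo_id: str,
    local_dir: str = "data/VLM4Bio",
    downloader: Optional[Callable[..., str]] = None,
) -> Path:
    """Download every file of a Hugging Face dataset repo into local_dir.

    Hypothetical helper: the real download_script may work differently.
    """
    if downloader is None:
        # Imported lazily so this module loads even without huggingface-hub.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)

    print(f"Downloading {repo_id} dataset...")
    downloaded = downloader(
        repo_id=repo_id,
        repo_type="dataset",  # dataset repos need an explicit repo_type
        local_dir=str(target),
    )
    print(f"Dataset downloaded to: {downloaded}")
    return Path(downloaded)
```

Passing a stand-in `downloader` makes it possible to exercise the directory handling without network access.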





In [1]:
# Reorganize the nonstandard VLM4Bio directory structure:

import reorganize_vlm4bio

reorganize_vlm4bio.reorganize_vlm4bio_dataset()
reorganize_vlm4bio.verify_reorganization()

Dataset reorganization completed!

Butterfly:
  - Number of images: 10013

Bird:
  - Number of images: 11092

Fish:
  - Number of images: 10347
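
The verification output above reports one image count per category. A minimal sketch of such a check is given below, assuming the reorganized layout is `<data_dir>/datasets/<Category>/images/` (the layout that appears in the image paths of the final CSV). The function names and the set of accepted file extensions are hypothetical and not taken from `reorganize_vlm4bio`.

```python
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(data_dir: str, categories=("Butterfly", "Bird", "Fish")) -> dict:
    """Return {category: number of image files} for the assumed layout."""
    counts = {}
    for category in categories:
        image_dir = Path(data_dir) / "datasets" / category / "images"
        if not image_dir.is_dir():
            # A missing folder is reported as zero images rather than an error.
            counts[category] = 0
            continue
        counts[category] = sum(
            1
            for path in image_dir.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def print_report(counts: dict) -> None:
    """Print the counts in the same shape as the notebook output."""
    for category, total in counts.items():
        print(f"{category}:")
        print(f"  - Number of images: {total}")
        print()
```

For example, `print_report(count_images("data/VLM4Bio"))` would print one block per category.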


## Run the Image Captioning Script

In [1]:
import internvl_img_caption as caption_images



In [2]:
data_dir = "data/VLM4Bio"
output_path = "vlm4bio_captions.csv"

caption_images.process_vlm4bio_dataset(data_dir, output_path)


2024-11-06 10:57:00,071 - INFO - vision_select_layer: -1
2024-11-06 10:57:00,083 - INFO - ps_version: v2
2024-11-06 10:57:00,085 - INFO - min_dynamic_patch: 1
2024-11-06 10:57:00,085 - INFO - max_dynamic_patch: 12
2024-11-06 10:57:01,962 - INFO - num_image_token: 256
2024-11-06 10:57:01,962 - INFO - ps_version: v2
InternLM2ForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
2024-11-06 10:57:14,521 - INFO -
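
The `internvl_img_caption` module is not shown here. The sketch below illustrates only the outer loop such a script plausibly runs: walk each category folder, caption every image, and write one CSV row per image. The model call is abstracted behind a `caption_fn` argument so the example runs without a GPU. All names are hypothetical, the `<data_dir>/datasets/<Category>/images/` layout is inferred from the paths in the final CSV, and the `scientific_name` and `has_metadata` columns are omitted because the notebook does not show where they come from.

```python
import csv
from pathlib import Path
from typing import Callable

FIELDNAMES = ["category", "image_name", "caption", "image_path"]


def caption_directory_tree(
    data_dir: str,
    output_path: str,
    caption_fn: Callable[[Path], str],
    categories=("Bird", "Fish", "Butterfly"),
) -> int:
    """Caption every .jpg under each category folder and write the rows to a CSV.

    caption_fn stands in for the InternVL2 call; it receives an image path
    and returns the caption text. Returns the number of rows written.
    """
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        for category in categories:
            image_dir = Path(data_dir) / "datasets" / category / "images"
            if not image_dir.is_dir():
                continue
            # Sorted so repeated runs produce rows in the same order.
            for image_path in sorted(image_dir.glob("*.jpg")):
                writer.writerow(
                    {
                        "category": category,
                        "image_name": image_path.name,
                        "caption": caption_fn(image_path),
                        "image_path": image_path.as_posix(),
                    }
                )
                rows_written += 1
    return rows_written
```

Keeping the model behind a callable also makes it straightforward to test the CSV logic with a stub before spending GPU time on the full dataset.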

## Split Training and Test Data

In [1]:
import train_test_split

In [2]:
train_test_split.create_train_test_split('vlm4bio_captions.csv')


Dataset split statistics:
-------------------------

Bird:
Training samples: 7764
Test samples: 3328

Fish:
Training samples: 7242
Test samples: 3105

Butterfly:
Training samples: 7009
Test samples: 3004
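
The counts above are consistent with a 70/30 split applied separately to each category, with the test-set size rounded up. That is the rounding convention scikit-learn's `train_test_split` uses for a fractional `test_size`, though whether `train_test_split.py` actually calls scikit-learn is an assumption. The standard-library sketch below reproduces the sizes and shows one way to assign the labels; `split_sizes`, `split_category`, the 0.3 fraction, and the seed are all hypothetical names and values.

```python
import math
import random


def split_sizes(n_samples: int, test_fraction: float = 0.3) -> tuple:
    """Return (n_train, n_test), rounding the test set up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


def split_category(items, test_fraction: float = 0.3, seed: int = 42) -> dict:
    """Shuffle with a fixed seed, then label each item 'train' or 'test'.

    Items must be unique and hashable (for example, image file names).
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, _ = split_sizes(len(shuffled), test_fraction)
    return {
        item: ("train" if index < n_train else "test")
        for index, item in enumerate(shuffled)
    }


# Per-category image totals reported earlier in the notebook.
for name, total in {"Bird": 11092, "Fish": 10347, "Butterfly": 10013}.items():
    n_train, n_test = split_sizes(total)
    print(f"{name}: train={n_train}, test={n_test}")
# Bird: train=7764, test=3328
# Fish: train=7242, test=3105
# Butterfly: train=7009, test=3004
```

Splitting per category rather than over the whole CSV keeps the train/test ratio the same for birds, fish, and butterflies.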


| Unnamed: 0 | category | image_name | scientific_name | caption | image_path | has_metadata | split |
|---|---|---|---|---|---|---|---|
| 0 | Bird | Red_Headed_Woodpecker_0011_182803.jpg | Melanerpes erythrocephalus | This image depicts a Melanerpes erythrocephalu... | data/VLM4Bio/datasets/Bird/images/Red_Headed_W... | True | train |
| 1 | Bird | Belted_Kingfisher_0032_70573.jpg | Megaceryle alcyon | This image depicts Megaceryle alcyon, a majest... | data/VLM4Bio/datasets/Bird/images/Belted_Kingf... | True | test |
| 2 | Bird | Slaty_Backed_Gull_0023_796030.jpg | Larus schistisagus | This image depicts a Larus schistisagus, a sea... | data/VLM4Bio/datasets/Bird/images/Slaty_Backed... | True | train |
| 3 | Bird | Brewer_Sparrow_0024_107439.jpg | Spizella breweri | This image depicts a Spizella breweri bird per... | data/VLM4Bio/datasets/Bird/images/Brewer_Sparr... | True | train |
| 4 | Bird | Magnolia_Warbler_0090_166087.jpg | Setophaga magnolia | The image depicts a Setophaga magnolia, a brig... | data/VLM4Bio/datasets/Bird/images/Magnolia_War... | True | test |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 31447 | Butterfly | Butterfly_imbalanced_train_Heliconius_melpomen... | Heliconius melpomene | Heliconius melpomene, commonly known as the bu... | data/VLM4Bio/datasets/Butterfly/images/Butterf... | True | train |
| 31448 | Butterfly | Butterfly_imbalanced_train_Mechanitis_lysimnia... | Mechanitis lysimnia | The image shows a close-up of two wings of Mec... | data/VLM4Bio/datasets/Butterfly/images/Butterf... | True | test |
| 31449 | Butterfly | Butterfly_imbalanced_train_Heliconius_elevatus... | Heliconius elevatus | Heliconius elevatus, commonly known as the but... | data/VLM4Bio/datasets/Butterfly/images/Butterf... | True | train |
| 31450 | Butterfly | Butterfly_imbalanced_train_Heliconius_telesiph... | Heliconius telesiphe | Heliconius telesiphe, a butterfly with iridesc... | data/VLM4Bio/datasets/Butterfly/images/Butterf... | True | test |
