This project implements a ViT-GPT2-based image captioning model, with experiments analyzing model robustness under different levels of image occlusion (10%, 50%, 80%). Link to dataset: https://github.com/eco-mini/custom_captions_dataset
The repository includes the following files:
1. Notebooks
- `20_PartA-1.ipynb`: Zero-shot smolVLM captioning and the `ImageCaptioningModel`, along with training and evaluation on the dataset given in the PS.
- `20_PartB-1.ipynb`: Part 1 of the robustness analysis, evaluating captions generated by smolVLM on occluded images at varying occlusion levels (10%, 50%, 80%).
- `20_PartB-2.ipynb`: Part 2 of the robustness analysis, evaluating captions generated by the `ImageCaptioningModel` on occluded images at the same occlusion levels (10%, 50%, 80%).
- `20_PartC-1.ipynb`: A custom BERT classifier that classifies generated captions into two classes, smolVLM (0) and ImageCaptioningModel (1). This notebook trains and evaluates the classifier.
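The occlusion levels used in the Part B robustness analysis can be reproduced with a simple masking helper. A minimal sketch, assuming a single random square patch is zeroed out to cover the target fraction of the image (the function name and the square-patch choice are illustrative; the notebooks may occlude images differently):

```python
import numpy as np

def occlude(img: np.ndarray, frac: float, seed: int = 0) -> np.ndarray:
    """Zero out one random square patch covering ~`frac` of the image area.

    `frac` is the occlusion level, e.g. 0.1, 0.5, or 0.8.
    """
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    # Side length of a square whose area is frac * (h * w).
    side = min(int(round((frac * h * w) ** 0.5)), h, w)
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    out = img.copy()
    out[top:top + side, left:left + side] = 0
    return out
```

The occluded copies at each level can then be fed to either captioning model and scored against the clean-image captions.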
2. Captions
- `Captions - Custom`: This folder contains captions generated by the custom captioning model at all occlusion levels listed above.
- `Captions - SmolVLM`: This folder contains captions generated by smolVLM at all occlusion levels listed above.
3. Results
Results: This folder contains the scores for the generated captions (both smolVLM and the custom model) at all occlusion levels. The file `predictions.csv` contains the results of the `CaptionClassifier`.
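Given the classifier's predictions file, its accuracy can be recomputed offline. A minimal sketch, assuming `predictions.csv` has `true_label` and `pred_label` columns with 0 = smolVLM and 1 = ImageCaptioningModel (the column names are assumptions, not confirmed by the repo):

```python
import csv

def classifier_accuracy(path: str) -> float:
    """Fraction of rows where pred_label matches true_label.

    Assumes columns `true_label` and `pred_label` (0 = smolVLM,
    1 = ImageCaptioningModel); adjust names to the actual CSV header.
    """
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    correct = sum(r["true_label"] == r["pred_label"] for r in rows)
    return correct / len(rows)
```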
There are no specific dependencies beyond standard Python libraries and Hugging Face Transformers. All experiments were run in Kaggle Notebooks.
To reproduce results:
- Open any notebook in Kaggle.
- Ensure GPU is enabled.
- Run all cells top to bottom.
The trained model file (best_captioning_model.pt) is loaded directly from a Kaggle dataset in the captioning and evaluation notebooks.
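Since the checkpoint is a PyTorch `state_dict`, loading it follows the standard pattern. A minimal sketch using a hypothetical tiny module in place of the real `ImageCaptioningModel` (whose definition lives in the notebooks); the save step exists only to make the example self-contained:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: the real ImageCaptioningModel class is
# defined in the captioning and evaluation notebooks.
class TinyCaptioner(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.head = nn.Linear(8, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x)

# The notebooks load best_captioning_model.pt from a Kaggle dataset;
# here we write a checkpoint first so the example runs standalone.
torch.save(TinyCaptioner().state_dict(), "best_captioning_model.pt")

model = TinyCaptioner()
state = torch.load("best_captioning_model.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()  # inference mode for caption generation
```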
Gayathri Anant
Email: gayathrianant05@gmail.com Roll No: 22CS30026
Tuhin Mondal
Email: email2tuhin04@gmail.com Roll No: 22CS10087
Diganta Mandal
Email: digantamindia@gmail.com Roll No: 22CS30062
We sincerely thank our Deep Learning course instructor and TAs for their support and feedback throughout the course.
We also acknowledge the open-source community and tools that made this work possible:
- Hugging Face Transformers
- PyTorch
- Kaggle Datasets and Notebooks