
COCO Caption Classification

Open In Colab · Hugging Face Spaces

This project focuses on Multimodal Classification using the COCO dataset. It evaluates a CLIP-based model's ability to align visual semantics with complex linguistic context under Zero-shot and Few-shot learning.

Instead of standard categorical classification, the model is presented with an image and a set of $N$ candidate text descriptions ($N \ge 12$). The objective is to identify the one true ground-truth description, overcoming the $N-1$ distractors.
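Concretely, the task reduces to ranking the $N$ candidate captions by image-text similarity. Below is a minimal Zero-shot sketch using the openai/CLIP package; the image path and captions are placeholders for illustration, not the project's actual data:

import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# One image and N candidate captions (1 ground truth + N-1 distractors).
image = preprocess(Image.open("coco_subset_images/images/example.jpg")).unsqueeze(0).to(device)
captions = ["a dog catching a frisbee", "a bowl of fruit on a table", "a man riding a horse"]  # N >= 12 in practice
tokens = clip.tokenize(captions).to(device)

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(tokens)
    # Cosine similarity via normalized dot products.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    logits = img_emb @ txt_emb.T  # shape (1, N)

pred = logits.argmax(dim=-1).item()  # index of the predicted caption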


📂 Project Structure

Coco_caption_classification/
├── streamlit_app.py                 # Streamlit demo entry
├── requirements_streamlit.txt       # Streamlit dependencies
├── README.md                        # Documentation
├── coco_subset_images/              # Extracted image data + JSON
│   ├── coco_multimodal_subset.json  # COCO subset metadata
│   └── images/                      # Image files
├── docs/                            # Results landing page and assets
│   ├── index.html                   # Results dashboard
│   ├── images/                      # Figures and diagrams
│   └── plot/                        # Plotly JSON exports
│       ├── acc.json
│       ├── f1_macro.json
│       ├── f1_micro.json
│       ├── f1_weighted.json
│       └── time.json
├── models/                          # Checkpoints
│   ├── best_rn50_8shot.pth
│   └── best_vit_b32_8shot.pth
├── notebooks/                       # Training and experiment logs
│   ├── train_colab.ipynb            # Colab training notebook
│   └── wandb/                       # W&B run logs
├── settings/                        # Configuration and requirements
│   ├── config.yaml                  # Model and W&B settings
│   └── requirements.txt             # Primary environment targets
└── src/                             # Core code
    ├── __init__.py
    ├── data_loader.py               # Dataset objects, subset loading and splits
    ├── model_arch.py                # Wrapper models based on openai/CLIP
    ├── notebook_utils.py            # Notebook helpers
    ├── utils.py                     # Evaluation functions and counters
    └── Image_ID_Exclusive_Sampling.ipynb  # Data subset generation

🚀 How to Run

Development & Training

  1. Clone the repository to Google Drive or your local machine.
  2. Upload coco_multimodal_subset.json and the corresponding images to coco_subset_images/.
  3. Open notebooks/train_colab.ipynb in Colab.
  4. Mount your Drive and install the dependencies from settings/requirements.txt.
  5. Run the cells to train. The configuration can be modified in settings/config.yaml (a loading sketch follows these steps).
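The notebook reads its hyperparameters from settings/config.yaml. Here is a minimal sketch of how such a file is typically loaded; the key names (backbone, shots, wandb_project) are illustrative assumptions, not the file's confirmed schema:

import yaml  # pip install pyyaml

# Hypothetical keys for illustration; see settings/config.yaml for the real schema.
with open("settings/config.yaml") as f:
    cfg = yaml.safe_load(f)

backbone = cfg.get("backbone", "ViT-B/32")   # e.g. "ViT-B/32" or "RN50"
shots = cfg.get("shots", 8)                  # k in {0, 8}
wandb_project = cfg.get("wandb_project")     # W&B logging target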

Streamlit Demo (Local)

To launch the Streamlit demo:

pip install -r requirements_streamlit.txt
streamlit run streamlit_app.py

The demo supports Zero-shot and 8-shot inference with attention visualizations.
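As a rough illustration of how such attention maps can be produced for the RN50 backbone, here is a sketch using the third-party pytorch-grad-cam package; this is an assumption for illustration, and the demo's own visualization code may differ:

import numpy as np
import clip
from PIL import Image
from pytorch_grad_cam import EigenCAM                    # pip install grad-cam
from pytorch_grad_cam.utils.image import show_cam_on_image

# Load on CPU so the visual backbone runs in fp32 (CLIP uses fp16 on CUDA).
model, preprocess = clip.load("RN50", device="cpu")

img = Image.open("coco_subset_images/images/example.jpg").convert("RGB")
input_tensor = preprocess(img).unsqueeze(0)

# EigenCAM is gradient-free: it takes the principal component of the
# activations at the target layer, here the last ResNet stage.
cam = EigenCAM(model=model.visual, target_layers=[model.visual.layer4])
grayscale_cam = cam(input_tensor=input_tensor)[0]        # (H, W) in [0, 1]

rgb = np.asarray(img.resize((224, 224)), dtype=np.float32) / 255.0
overlay = show_cam_on_image(rgb, grayscale_cam, use_rgb=True)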

Streamlit Demo (Hugging Face Spaces)

  1. Create a new Space and select the Streamlit SDK.
  2. Add the following files to the Space repo:
    • streamlit_app.py
    • requirements.txt (copy from requirements_streamlit.txt)
    • src/ and models/ (or download weights inside the app)
  3. Commit and push to trigger the build.

🔬 Core Workflow

  1. Vision Extractor: Encodes images with the preprocessing and normalization standard to the CLIP variants (ViT-B/32 or RN50).
  2. Text Extractor: Tokenizes the $N$ candidate captions and encodes them into embedding tensors.
  3. Linear Classification Probing: Applies cross-entropy over image-text dot-product similarities, evaluated at $k \in \{0, 8\}$ shots (a sketch follows this list).
  4. Analysis: Accuracy and F1 metrics (macro/micro/weighted), inference time, and GradCAM/EigenCAM map generation.
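A minimal sketch of step 3, the cross-entropy over dot-product similarities; the tensor shapes and scale factor are illustrative assumptions, and the project's actual wrapper lives in src/model_arch.py:

import torch
import torch.nn.functional as F

# Hypothetical shapes: each image is scored against its own N candidate captions.
# img_emb: (B, D) image features; txt_emb: (B, N, D) caption features; label: (B,)
def nway_loss(img_emb, txt_emb, label, scale=100.0):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    # Dot-product similarities between each image and its N candidates: (B, N)
    logits = scale * torch.einsum("bd,bnd->bn", img_emb, txt_emb)
    return F.cross_entropy(logits, label)

# Zero-shot (k=0): argmax over the logits with frozen CLIP features.
# Few-shot (k=8): the same logits supervise a lightweight trainable head.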

📊 Report and Info

Please refer to the GitHub Pages site or the provided YouTube links for the complete metric visualizations and a breakdown of the architectural diagrams.
