A Vision Transformer for Fine-grained Classification by Reducing Noise and Enhancing Discriminative Information

PyTorch code for the paper: A Vision Transformer for Fine-grained Classification by Reducing Noise and Enhancing Discriminative Information

Framework

Dependencies:

Python 3.7.3
PyTorch 1.8.0
torchvision 0.9.0
ml_collections 0.1.0
numpy 1.20.1
pandas 1.2.3
scipy 1.6.2

Usage

1. Download Google pre-trained ViT models

We use ViT-B_16 as the backbone. The official download link for the pretrained weights is https://storage.googleapis.com/vit_models/sam/ViT-B_16.npz
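The download can also be scripted. The following is a minimal sketch using only the standard library; the helper name `fetch_weights` and the `pretrained` target folder are assumptions made for illustration and are not part of this repository, so adjust the path to whatever config.py expects.

```python
import os
import urllib.request

# URL taken from the step above; everything else here is illustrative.
VIT_URL = "https://storage.googleapis.com/vit_models/sam/ViT-B_16.npz"


def fetch_weights(url=VIT_URL, dest_dir="pretrained"):
    """Download the weight file into dest_dir unless it is already present.

    Note: `dest_dir="pretrained"` is a hypothetical default, not a path
    required by the repository.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(dest_path):
        # Only hit the network when the file is missing.
        urllib.request.urlretrieve(url, dest_path)
    return dest_path
```

Calling `fetch_weights()` returns the local path of ViT-B_16.npz. The file is a NumPy archive, so it can be inspected with `numpy.load` if you want to check which arrays it contains.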

2. Prepare data

In the paper, we use images from three publicly available datasets. Download them from the official websites and put them in the corresponding folders.

To read all datasets in a uniform way, we load the data from CSV files. Organize the train set and the test set in the following form:

	img_name	label
0	001.Black_footed_Albatross/Black_Footed_Albatross_0009_34.jpg	1
1	001.Black_footed_Albatross/Black_Footed_Albatross_0074_59.jpg	1
...	...	...
5993	200.Common_Yellowthroat/Common_Yellowthroat_0049_190708.jpg	200

Note:

Category labels start from 1, not 0.
The order of the rows does not matter.
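A CSV in the layout above can be generated from a folder of images. The sketch below is one possible way to do it, under several assumptions: the images of a single split (train or test) sit under `image_root` with one sub-folder per class, labels follow the sorted folder order, and the file is comma-separated with an unnamed index column. The function names `write_split_csv` and `read_split_csv` are hypothetical and do not come from datasets.py.

```python
import csv
import os


def write_split_csv(image_root, csv_path):
    """Write one row per image: running index, relative image path, 1-based label."""
    # Assumption: one sub-folder per class; sorted folder order defines the label.
    class_dirs = sorted(
        d for d in os.listdir(image_root)
        if os.path.isdir(os.path.join(image_root, d))
    )
    rows = []
    for label, class_dir in enumerate(class_dirs, start=1):  # labels start from 1
        for fname in sorted(os.listdir(os.path.join(image_root, class_dir))):
            if fname.lower().endswith((".jpg", ".jpeg", ".png")):
                rows.append((f"{class_dir}/{fname}", label))
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Empty first header cell mirrors the unnamed index column in the table.
        writer.writerow(["", "img_name", "label"])
        for idx, (img_name, label) in enumerate(rows):
            writer.writerow([idx, img_name, label])
    return len(rows)


def read_split_csv(csv_path):
    """Return (img_name, label) pairs; labels stay 1-based, as stored."""
    with open(csv_path, newline="") as f:
        return [(row["img_name"], int(row["label"])) for row in csv.DictReader(f)]
```

Run `write_split_csv` once per split, for example once on the training images and once on the test images, and point config.py at the resulting files. `read_split_csv` is only a quick way to check that the file has the expected columns and that no label is 0.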

3. Install required packages

Install dependencies with the following command:

pip install -r requirements.txt

4. Train

Modify the configuration in config.py as needed, then start training:

python main.py

5. Validate

After training, the model weights are saved in the checkpoints folder. Modify the configuration in config.py as needed, then run validation:

python val.py
