A Vision Transformer for Fine-grained Classification by Reducing Noise and Enhancing Discrimnative Information
PyTorch code for the paper: A Vision Transformer for Fine-grained Classification by Reducing Noise and Enhancing Discrimnative Information
- python 3.7.3
- PyTorch 1.8.0
- torchvision 0.9.0
- ml_collections 0.1.0
- numpy 1.20.1
- pandas 1.2.3
- scipy 1.6.2
We use ViT-B_16 as the backbone. The official download link for pretrained weight https://storage.googleapis.com/vit_models/sam/ViT-B_16.npz.
In the paper, we use images from there publicly available datasets:
Please download them from the official websites and put them in the corresponding folders.
In order to read datasets in a unified approach, we use CSV files to read data. Please organize the train set and test set into the following forms:
img_name | label | |
---|---|---|
0 | 001.Black_footed_Albatross/Black_Footed_Albatross_0009_34.jpg | 1 |
1 | 001.Black_footed_Albatross/Black_Footed_Albatross_0074_59.jpg | 1 |
... | ... | ... |
5993 | 200.Common_Yellowthroat/Common_Yellowthroat_0049_190708.jpg | 200 |
Note:
- The label of categories starts from 1 not 0.
- The order of each image does not matter.
Install dependencies with the following command:
pip install -r requirements.txt
You can modify the configuration in the config.py
file and run it directly:
python main.py
After model training, you will get the weight in the checkpoints folder. You can modify the configuration in the config.py
file and run it directly:
python val.py