This repository contains a script to fine-tune a transformer encoder model for Pokemon image classification. The model is based on the Vision Transformer (ViT) and has been fine-tuned on 1st Generation Pokemon images.
The final model can discriminate between the Pokemon from the 1st Generation, and it can tell when a provided image is not a Pokemon, or at least when it is unable to recognize it (e.g. Pokemon from later generations).
Feel free to try it out on my demo hosted on Spaces!
- Clone this repository:

  ```bash
  git clone https://github.com/A-Duss/GottaClassifyEmAll.git
  cd GottaClassifyEmAll
  ```
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
You can use my fine-tuned model hosted on the Hugging Face Model Hub, `Dusduo/Pokemon-classification-1stGen`, by running the `predict.py` script as follows:

```bash
python predict.py --img_path=./data/sample_imgs/01abra.jpg --load_from_hf
```
Change the `--img_path` value to the path of the image you want to classify.
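If you would rather call the hosted checkpoint directly instead of going through `predict.py`, a `transformers` pipeline should also work. This is a minimal sketch, assuming the checkpoint exposes the standard image-classification interface; how `predict.py` actually flags non-Pokemon images (e.g. a dedicated class or a confidence threshold) is not specified here:

```python
from transformers import pipeline

# Load the hosted checkpoint as a standard image-classification pipeline.
classifier = pipeline(
    "image-classification",
    model="Dusduo/Pokemon-classification-1stGen",
)

# Classify the sample image used in the command above.
for pred in classifier("./data/sample_imgs/01abra.jpg"):
    print(f"{pred['label']}: {pred['score']:.3f}")
```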
Fine-tune the model by running the `train.py` script:

```bash
python train.py
```
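For reference, here is a rough sketch of the kind of fine-tuning loop `train.py` likely implements, using the pre-trained checkpoint and dataset listed below. The `image`/`label` column names, the `train` split, and every hyperparameter are assumptions, not the values the script actually uses:

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoImageProcessor,
    AutoModelForImageClassification,
    Trainer,
    TrainingArguments,
)

checkpoint = "google/vit-base-patch16-224-in21k"
dataset = load_dataset("Dusduo/1stGen-Pokemon-Images")
processor = AutoImageProcessor.from_pretrained(checkpoint)

# Assumption: the labels are stored as a ClassLabel feature named "label".
labels = dataset["train"].features["label"].names

def transform(batch):
    # Turn PIL images into the pixel tensors ViT expects.
    inputs = processor([img.convert("RGB") for img in batch["image"]], return_tensors="pt")
    inputs["labels"] = batch["label"]
    return inputs

dataset = dataset.with_transform(transform)

model = AutoModelForImageClassification.from_pretrained(
    checkpoint,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={name: i for i, name in enumerate(labels)},
)

def collate(examples):
    # Stack individual examples back into a batch.
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["labels"] for ex in examples]),
    }

args = TrainingArguments(
    output_dir="vit-pokemon",        # placeholder output directory
    per_device_train_batch_size=16,  # placeholder
    num_train_epochs=3,              # placeholder
    learning_rate=2e-4,              # placeholder
    remove_unused_columns=False,     # keep "image" available to the transform
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    data_collator=collate,
).train()
```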
After fine-tuning, the model can be used to classify images by running `predict.py` in the following fashion:

```bash
python predict.py --img_path=./data/sample_imgs/01abra.jpg
```
Change the `--img_path` value to the path of the image you want to classify.
- Pre-trained model: `google/vit-base-patch16-224-in21k`
- Fine-tuning dataset: `Dusduo/1stGen-Pokemon-Images` (see the loading sketch below)
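If you want to look at the data itself, the dataset can be pulled straight from the Hub with the `datasets` library. This is a minimal sketch; the `train` split name and the `ClassLabel` feature named `label` are assumptions about the dataset schema:

```python
from datasets import load_dataset

# Download the fine-tuning dataset from the Hugging Face Hub.
dataset = load_dataset("Dusduo/1stGen-Pokemon-Images")

# Inspect the available splits and columns.
print(dataset)

# Assumption: the labels are stored as a ClassLabel feature named "label".
print(dataset["train"].features["label"].names[:10])
```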
You can find my final fine-tuned model on the Hugging Face Model Hub: `Dusduo/Pokemon-classification-1stGen`.
It achieves the following results on the evaluation set:
- Loss: 0.4182
- F1: 0.9272 (see the metric sketch below)
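For context, an F1 score like the one above can be computed with the `evaluate` library. This is a minimal sketch with made-up labels; the `"weighted"` averaging strategy is an assumption, since the README does not state which one `train.py` uses:

```python
import evaluate

f1_metric = evaluate.load("f1")

# Made-up predictions and references, purely for illustration.
predictions = [0, 1, 2, 2, 1]
references = [0, 1, 1, 2, 1]

# "weighted" averaging is an assumption, not necessarily what train.py uses.
print(f1_metric.compute(predictions=predictions, references=references, average="weighted"))
```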
Don't forget to try out my demo hosted on Spaces!