This repository contains the code for car image classification from the in-class Kaggle challenge. See the report for more details.
Our model achieves 95.04% accuracy on the testing set.
To reproduce my submission without retraining, follow the steps below:
All requirements are listed in requirements.txt. Using Anaconda is strongly recommended.
conda create -n carClassifier python=3.7
conda activate carClassifier
pip install -r requirements.txt
The official images can be downloaded from the Kaggle challenge or directly from this repository.
After downloading the images, the data directory is expected to be structured as follows:
data
+- training_data         # all training data from Kaggle
|  |- 000001.jpg
|  |- 000002.jpg
|  |- ...
+- testing_data          # all testing data from Kaggle
|  |- 000004.jpg
|  |- 000005.jpg
|  |- ...
+- training_labels.csv   # CSV file containing each image's id and label
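Before preprocessing, it can help to confirm that the downloads landed where the scripts expect them. The following is a minimal sketch and not part of the repository: the helper name `missing_entries` is hypothetical, and it checks only the three entries listed in the layout above.

```python
from pathlib import Path

# Entries expected under data/ after downloading from Kaggle (from the layout above).
EXPECTED = ("training_data", "testing_data", "training_labels.csv")


def missing_entries(data_dir="data"):
    """Return the expected entries that do not exist under data_dir (hypothetical helper)."""
    root = Path(data_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


# An empty list means all three entries are present.
print(missing_entries("data"))
```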
Run the following commands to create the training and validation splits:
mkdir data\train
mkdir data\val
python src/dataPreprocessing.py
After running the commands, the data directory should have the following structure:
data
+- training_data                # all training data from Kaggle
+- testing_data                 # all testing data from Kaggle
+- train                        # training set split from training_data
+- val                          # validation set split from training_data
+- train.csv                    # ids and labels of the images in the train folder
+- val.csv                      # ids and labels of the images in the val folder
+- training_labels.csv          # CSV file containing each image's id and label
+- training_labelsWithInt.csv   # CSV file containing each image's id, label, and integer label
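The internals of src/dataPreprocessing.py are not described here, so the following is only a sketch of how a train/validation split of the label rows could be done. The function name `split_rows`, the 80/20 ratio, and the fixed seed are all assumptions and are not taken from the repository. The sketch splits rows only and does not copy any image files.

```python
import random


def split_rows(rows, val_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, val) lists.

    val_fraction and seed are illustrative defaults, not the repository's values.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Made-up (id, label) pairs standing in for the rows of training_labels.csv.
rows = [(f"{i:06d}", f"label_{i % 3}") for i in range(1, 11)]
train_rows, val_rows = split_rows(rows)
print(len(train_rows), len(val_rows))
```

Using a fixed seed keeps the split reproducible across runs, which matters when several models are later compared on the same validation set.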
Training configuration can be specified in src/configs.py.
Then, run:
python src/train.py
By default, a pretrained ResNet-50 is used.
You can also pass the parameter -m "model_name"
to specify which pretrained model to train.
For instance:
python src/train.py -m "resnet50"
The pretrained models come from the official PyTorch models [1] and rwightman/pytorch-image-models [2]. The values of model_name
available for this task are listed below:
"resnet50", "resnet101", "tresnet_l", "tresnet_m", "densenet121", "resnext50_32x4d", "resnext101_32x8d"
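A command-line option like -m is commonly handled with argparse. The following is a minimal sketch under assumptions, not the code in src/train.py: the long form `--model`, the helper name `parse_args`, and the use of `choices` to reject unknown names are all illustrative.

```python
import argparse

# Model names listed above.
AVAILABLE_MODELS = [
    "resnet50", "resnet101", "tresnet_l", "tresnet_m",
    "densenet121", "resnext50_32x4d", "resnext101_32x8d",
]


def parse_args(argv=None):
    """Parse the -m option; the long form --model is an assumption."""
    parser = argparse.ArgumentParser(description="Sketch of model selection.")
    parser.add_argument(
        "-m", "--model",
        default="resnet50",          # the README states ResNet-50 is the default
        choices=AVAILABLE_MODELS,    # unknown names cause argparse to exit with an error
        help="name of the pretrained model to train",
    )
    return parser.parse_args(argv)


args = parse_args(["-m", "resnext50_32x4d"])
print(args.model)
```

Restricting the option with `choices` means a mistyped model name fails immediately with a usage message instead of partway through training.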
The trained model will be saved as src/model/trained_model/model.pth
Once a trained model is available, use it to run inference on your testing data.
Run:
python src/infer.py -m "model_name" -mp "trained_model_path"
This will save the testing predictions in test_result/test_pred.csv.
In this task, we apply bagging (majority vote), a kind of ensemble method, to improve our accuracy on the testing set.
Make sure the testing results (*.csv) from the inference stage are inside ./test_result.
The program will use all .csv files in ./test_result
to perform the bagging (majority vote) method.
Run:
python src/ensemble.py
The result will be saved in ./ensemble.csv.
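The majority vote itself can be sketched in a few lines. This is an illustration under assumptions, not the code in src/ensemble.py: the function name `majority_vote` is hypothetical, predictions are held in memory as dicts rather than read from the .csv files, and the ids and labels in the example are made up.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model predictions by majority vote.

    predictions: list of dicts, one per model, mapping image id -> predicted label.
    Ties go to the label encountered first, i.e. from the earliest model in the list.
    """
    combined = {}
    for image_id in predictions[0]:
        votes = Counter(p[image_id] for p in predictions)
        combined[image_id] = votes.most_common(1)[0][0]
    return combined


# Made-up predictions from three models for two test images.
preds = [
    {"000004": "Audi", "000005": "BMW"},
    {"000004": "Audi", "000005": "Ford"},
    {"000004": "Honda", "000005": "Ford"},
]
print(majority_vote(preds))
```

With an even number of models, ties become possible, so the order in which result files are read can affect the outcome under this tie-breaking rule.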
To reproduce the submission: ./test_result already contains some testing results
that we produced earlier. You can directly run python src/ensemble.py
without training.
Provide more command-line arguments to make the program easier to use.