Description: A comparative study of CNN and ViT architectures for anomaly detection and defect classification in low-resolution leather surface images.
- EfficientNet-B0 - Model modified to match the required input and output tensor dimensions.
- Inception-V3 - Model modified to match the required input and output tensor dimensions.
- ResNet-50 - Model modified to match the required input and output tensor dimensions.
- Vision Transformer 01 - Dataset handling and preprocessing adapted to the leather dataset; original model structure retained.
- Vision Transformer 02 - ViT model structure modified to improve results on the same modified leather dataset.
- MVTec AD - Small, high-resolution defect benchmark dataset with annotated ground truth (Leather module only).
- Kaggle - Low-resolution open-source Leather Defect Dataset
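
To illustrate what adapting a model to the required input dimensions can involve for the ViT variants listed above, here is a minimal sketch of the patch arithmetic in a standard ViT. The function name `vit_token_count` and the image and patch sizes used are hypothetical, chosen only for illustration; they are not the settings used in this study.

```python
def vit_token_count(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Return the sequence length a standard ViT sees for a square image.

    The image is split into non-overlapping square patches, so each side
    holds image_size // patch_size patches. One extra token is added when
    a learnable class token is prepended.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    return num_patches + (1 if class_token else 0)


if __name__ == "__main__":
    # Hypothetical (image_size, patch_size) pairs, not the study's settings.
    for size, patch in [(224, 16), (64, 8)]:
        print(f"{size}x{size} image, {patch}x{patch} patches:",
              vit_token_count(size, patch), "tokens")
```

Changing the input resolution or the patch size changes the sequence length, which is why the positional embeddings generally have to be resized or retrained when a ViT is moved to images of a different size.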