
LKCA Official Repository / Mini ViTs Trainer :octocat:

Welcome to the official GitHub repository for LKCA! This repository is designed to be a centralized resource for developers, researchers, and tech enthusiasts to explore and implement cutting-edge machine learning technology. We focus on providing high-quality code, pre-trained models, and comprehensive documentation to help you apply state-of-the-art techniques across a wide range of applications. 🚀

What You Can Do with This Repository

1. Official Repository of LKCA 📚

As the official repository for LKCA, we offer a suite of tools and libraries to support the development of machine learning and artificial intelligence projects. Whether you're looking for the latest research findings or reliable code implementations, this is your go-to starting point.

2. Train Various ViTs on Small Datasets with Just a Single GPU 💻

This repository emphasizes machine learning practices in resource-constrained environments, specifically training various Vision Transformers (ViTs) on small datasets. We understand that not everyone has access to large-scale computational resources, so we provide optimized algorithms and practical advice for effective training with just a single GPU. Our goal is to lower the entry barrier, enabling more people to innovate and research with the latest technologies.

Train a Vision Transformer on a 2080Ti in Just 4 Hours 🕓

Efficient training of machine learning models on accessible hardware is a critical need for researchers and developers with limited resources. This repository enables practical training of Vision Transformer (ViT) models from scratch on a single 2080Ti GPU within just four hours 🚀. It's a step towards making advanced AI models more accessible to a broader audience.

What You Can Find Here

  • Training Code for Mainstream Small-Scale Datasets 📚: We provide ready-to-use training scripts for CIFAR-10, CIFAR-100, SVHN, and Tiny ImageNet. These datasets allow for quick experimentation and development, ideal for those looking to work within hardware or time constraints.

  • Variety of ViT Models 🔍: The repository hosts a selection of over ten ViT variants, including but not limited to ViT, CaiT, CvT, MobileViT, CrossViT, DeepViT, RegionViT, RvT, and T2T. This variety ensures that you can explore different architectures and find the one that suits your project's needs best.

This repository aims to provide a straightforward, no-frills approach to training ViT models 😊. By focusing on small-scale datasets and offering a selection of ViT architectures, we hope to facilitate easier entry into the field of computer vision for those without access to large computational resources.

Whether you're exploring different ViT architectures for academic purposes or developing applications with state-of-the-art AI technology, this repository offers the tools and resources needed to achieve efficient results without extensive computational power 💪.

Getting Started with Our Repository 🚀

Welcome to our repository! Before diving into training Vision Transformers (ViTs) on a 2080Ti in just 4 hours, let's set up your environment. Follow these steps to ensure you have all the necessary packages and dependencies installed 🛠️.

Requirements 📋

To get started, you'll need to install the required Python packages. We recommend creating a new conda environment and then using pip to install the packages listed in the provided requirements.txt. Follow these steps in your terminal:

conda create --name myenv python=3.8
conda activate myenv
pip install -r requirements.txt

Here are the specific versions of the packages you'll need:

einops==0.7.0
numpy==1.18.0
ply==3.11
pyasn1-modules==0.2.8
PyQt5-sip==12.11.0
requests-oauthlib==1.3.0
tensorboard==2.10.0
timm==0.6.7
torch==1.10.0
torchsummary==1.5.1
torchvision==0.11.1
tqdm==4.66.1
colorama==0.4.6
setuptools==59.5.0
google-auth==2.6.0
google-auth-oauthlib==0.4.2
grpcio==1.48.2
protobuf==3.16.0
six==1.16.0

Ensure that you have Python 3.7 or newer installed on your machine before proceeding with the installation of these packages 🐍. This setup is essential for running the training scripts and utilizing the full potential of our repository 🌟.
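Once everything is installed, the short snippet below (an optional check, not part of the repository) is a quick way to confirm that PyTorch was built with CUDA support and can see your GPU before you launch a multi-hour training run:

# Optional sanity check (not part of the repository): verify that PyTorch can see the GPU
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))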

Train & Test

To train any model in our repository, you can use the following command, which allows you to specify the GPU devices, model, dataset, and number of epochs. This flexibility ensures you can tailor the training process to your specific needs and hardware capabilities.

CUDA_VISIBLE_DEVICES={0-7} python train.py --model {model name} --dataset {dataset name} --epochs {num epochs}

Available Model Names

You can specify one of the following model names using the --model parameter:

vit, vit-lkca, mbv2, swin-s, swin-t, cait-s, cait-t, t2t-b, cvt-b, deepvit-t, deepvit-s, rvt-s, rvt-t, regionvit-b, crossvit-t, crossvit-s, xcit-s, xcit-t, twinssvt-b, twinssvt-s

Each model offers unique configurations and capabilities, ranging from the standard ViT to specialized architectures such as MobileViT and the Swin Transformer.

Supported Datasets

The repository supports training and testing on the following datasets, which you can specify using the --dataset parameter (a minimal loading sketch follows the list):

  • CIFAR10 - A dataset of 60,000 32x32 color images in 10 classes, with 6,000 images per class.
  • CIFAR100 - Similar to CIFAR10 but with 100 classes.
  • SVHN - The Street View House Numbers (SVHN) dataset, a real-world image dataset for developing machine learning and object recognition algorithms.
  • T-IMNET (Tiny ImageNet) - A scaled-down version of the ImageNet dataset, consisting of 200 classes, each with 500 training images.
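The training scripts handle data loading for you; purely as an illustration, the sketch below shows how these datasets are typically obtained with torchvision (CIFAR10, CIFAR100, and SVHN ship with torchvision.datasets, while Tiny ImageNet is usually prepared as an ImageFolder-style directory). The paths, transforms, and batch size here are assumptions, not the repository's actual settings:

# Illustrative data-loading sketch; train.py handles this internally and may differ
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

transform = T.Compose([T.RandomCrop(32, padding=4), T.ToTensor()])

# CIFAR10 / CIFAR100 / SVHN can be downloaded directly through torchvision
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
# Tiny ImageNet is usually laid out as an ImageFolder-style directory after download
# train_set = torchvision.datasets.ImageFolder("./data/tiny-imagenet-200/train", transform=transform)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)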

Example Usage

For example, to train a vit model on the CIFAR10 dataset for 100 epochs, your command would look like this:

CUDA_VISIBLE_DEVICES=0 python train.py --model vit --dataset CIFAR10 --epochs 100
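The argument handling itself lives in train.py; as a rough sketch of how such a command line is typically parsed with argparse (the defaults and help strings below are assumptions, not the repository's exact code):

# Hypothetical sketch of the command-line interface; the real parsing lives in train.py
import argparse

parser = argparse.ArgumentParser(description="Train a small ViT variant on a small dataset")
parser.add_argument("--model", type=str, default="vit", help="e.g. vit, vit-lkca, swin-t, cait-s, ...")
parser.add_argument("--dataset", type=str, default="CIFAR10", help="CIFAR10, CIFAR100, SVHN, or T-IMNET")
parser.add_argument("--epochs", type=int, default=100, help="number of training epochs")
args = parser.parse_args()
print(f"Training {args.model} on {args.dataset} for {args.epochs} epochs")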

Calculating Model Parameters and Computational Complexity 📊

To understand the efficiency and demands of each model, you can calculate the model's parameters and its computational complexity using the count.py script. This script provides valuable insights into the model's size and the computational resources it requires, which is crucial for evaluating its suitability for your specific hardware and use case. 🔍

How to Use 🛠️

Run the following command in your terminal, replacing {model name} with the name of the model you wish to evaluate:

python count.py --model {model name}
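count.py does this for you; as a rough illustration of the parameter side, the raw count of trainable parameters of any PyTorch model can be obtained as in the sketch below (FLOP counting generally requires an additional profiling tool and is not shown; count.py's actual implementation may differ):

# Illustrative parameter counting for any nn.Module; count.py's actual implementation may differ
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # Sum the number of elements in every trainable tensor
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

if __name__ == "__main__":
    # Toy model used only to demonstrate the helper
    toy = nn.Sequential(nn.Linear(384, 384), nn.GELU(), nn.Linear(384, 384))
    print(f"{count_parameters(toy) / 1e6:.2f}M trainable parameters")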

Adding and Running Your Own Custom Model 🛠️

Integrating and testing your own custom models within our framework is straightforward and can be accomplished in just two steps.

Step 1: Add Your Custom Model

First, you need to place your custom-defined model into the models folder. Your model should be a PyTorch nn.Module. Ensure your model file follows the best practices for defining PyTorch models, including proper initialization and forward pass definitions.
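As a purely illustrative example (the file name, class name, and constructor arguments below are placeholders, not part of the repository), a minimal custom model file placed under models/ might look like this:

# models/your_model.py -- hypothetical example; replace the toy layers with your architecture
import torch.nn as nn

class Your_Model(nn.Module):
    def __init__(self, num_classes=10, dim=192, **kwargs):
        # **kwargs absorbs any extra configuration passed in by create_model.py
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(dim),
            nn.GELU(),
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))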

Step 2: Register Your Model

Next, you will need to import and register your model within the create_model.py file. To do this, add an import statement at the top of the file to include your custom model. Then, add the following code snippet to the model creation logic, replacing 'model name' with your model's unique identifier and Your_Model with the class name of your custom model.

elif args.model == 'model name':
    model = Your_Model(**kargs)
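Once registered, your model can be trained with the same command pattern as the built-in ones, where {your model name} is the identifier you used in create_model.py:

CUDA_VISIBLE_DEVICES=0 python train.py --model {your model name} --dataset CIFAR10 --epochs 100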

Experimental Results

The table below presents the experimental results of various models on the Tiny-ImageNet dataset, including accuracy, parameter count, and floating-point operations (FLOPs).

| Model | Tiny-ImageNet Accuracy (%) | # Parameters | FLOPs |
| --- | --- | --- | --- |
| T2T-T | 53.92 | 1.07M | 78.23M |
| RvT-T | 50.65 | 1.10M | 57.61M |
| Swin-T | 54.93 | 1.06M | 38.90M |
| CaiT-T | 54.76 | 1.03M | 61.85M |
| XCiT-T | 56.78 | 0.96M | 51.44M |
| ViT-Lite | 53.46 | 1.11M | 69.64M |
| DeepViT-T | 34.64 | 0.99M | 62.96M |
| RegionViT-T | 54.32 | 0.97M | 29.38M |
| CrossViT-T | 47.03 | 1.04M | 57.59M |
| LKCA-T | 57.29 | 1.07M | 66.10M |
| T2T-S | 41.25 | 2.56M | 52.96M |
| RvT-S | 55.51 | 2.72M | 145.09M |
| Swin-S | 58.61 | 2.93M | 95.55M |
| CaiT-S | 59.21 | 2.77M | 164.46M |
| XCiT-S | 60.09 | 2.81M | 157.54M |
| ViT-Small | 55.74 | 2.76M | 176.06M |
| DeepViT-S | 44.45 | 2.54M | 162.85M |
| Twins SVT-S | 37.13 | 2.76M | 197.00M |
| RegionViT-S | 53.96 | 2.86M | 53.82M |
| CrossViT-S | 52.70 | 2.40M | 126.11M |
| LKCA-B | 60.95 | 2.76M | 172.78M |
| T2T-B | 58.46 | 13.45M | 853.02M |
| CvT-B | 55.88 | 6.52M | 102.56M |
| MobileViTv2 | 58.28 | 8.17M | 189.77M |
| Twins SVT-B | 49.24 | 9.04M | 308.74M |
| RegionViT-B | 57.83 | 12.39M | 195.02M |
| LKCA-L | 63.43 | 12.65M | 802.99M |

For more experimental data and details, please refer to the preprint paper LKCA: Large Kernel Convolutional Attention (arXiv:2401.05738).

Acknowledgements

We would like to express our sincere appreciation to the authors of the pytorch-image-models and Vision Transformer for Small-Size Datasets repositories for their valuable contributions to the machine learning community.

Citation

If you would like to cite this work, you can use the following citation format:

@article{li2024lkca,
  title={LKCA: Large Kernel Convolutional Attention},
  author={Li, Chenghao and Zeng, Boheng and Lu, Yi and Shi, Pengbo and Chen, Qingzi and Liu, Jirui and Zhu, Lingyun},
  journal={arXiv preprint arXiv:2401.05738},
  year={2024}
}
