
Deep Learning in Practice with Python and LUA (VITMAV45) - project work

Topic

Convolutional Neural Network (CNN) vs Vision Transformer (ViT) for Cloud Image Classification

Team details

name: Obscure Intelligence

Barancsuk Lilla - BMMMRS
Kássa Kristóf - HGNB1P

Abstract

Convolutional Neural Network (CNN) algorithms have been prominent models for image classification, but in recent years Transformer-based methods have also gained popularity. To get a clear understanding of the two architectures on an image classification task with a cloud dataset of approximately 2000 data points, this project compares the characteristics of CNNs and the Vision Transformer (ViT). For each model we provide a clear and comprehensive review of architectural and functional differences. We then compare their computing capacity requirements, validation accuracy and training time on an online image dataset, using our own input pipeline implemented from scratch.

Documentation

/preprocessing

main.py: Cloud image dataset download and preprocessing script.
definitions.py: Basic category definitions.
load_data.py: Download script.
visualization.py: Visualization of the data.
preprocessing.py: Data preparation for learning; creates the training, validation and test inputs and outputs.
scaler.py: Dataset standardization.
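
The split and standardization steps listed above can be sketched as follows. This is a minimal illustration, not the code in preprocessing.py or scaler.py: the function names, the split fractions and the per-channel statistics are all assumptions made for the example, and images are assumed to be a NumPy array of shape (N, height, width, channels).

```python
import numpy as np


def split_dataset(x, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle and split arrays into training, validation and test parts.

    The fractions are illustrative defaults, not the project's settings.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_test = int(len(x) * test_frac)
    n_val = int(len(x) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (
        (x[train_idx], y[train_idx]),
        (x[val_idx], y[val_idx]),
        (x[test_idx], y[test_idx]),
    )


def fit_scaler(x_train, eps=1e-7):
    """Compute per-channel mean and standard deviation on training images only."""
    mean = x_train.mean(axis=(0, 1, 2))
    std = x_train.std(axis=(0, 1, 2)) + eps  # eps avoids division by zero
    return mean, std


def standardize(x, mean, std):
    """Apply the training-set statistics to any split."""
    return (x - mean) / std
```

Fitting the statistics on the training split and reusing them for the validation and test splits keeps information from the held-out images out of training.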

CNN_vs_ViT.ipynb

Model building and evaluation in Google Colab

CNN_vs_ViT_model_comparison.ipynb

Build and evaluate ViT and CNN models with various hyperparameters in Google Colab.

/documentation

CNN_vs_ViT.tex: LaTeX source code of the documentation
CNN_vs_ViT.pdf: Documentation PDF
bibliography.bib: Bibliography file containing the cited references
nips_2016.sty: NIPS 2016 LaTeX style file
/figures: Figures included in the documentation

test_data.zip

Independent test set for evaluating the models

Milestone 1

Open Milestone 1 in Google Colab here.

Milestone 2

Transformer architecture

Open the transformer code in Google Colab here.

For file preprocessing, run cells under the section "1.) Data preprocessing" in the notebook.

For training the network, run cells under the section "2.) Training ViT model" in the notebook.

For model evaluation, run cells under the section "3.) Evaluating ViT model" in the notebook.

CNN architecture

Open the CNN code in the same Google Colab here.

For file preprocessing, run cells under the section "1.) Data preprocessing" in the notebook.

For training the CNN network, run cells under the section "4.) Training CNN model" in the notebook.

For evaluating the CNN network, run cells under the section "5.) Evaluating CNN model" in the notebook.

Milestone 3

Define, evaluate and compare CNN and ViT models with various hyperparameter sets. Open the comparison of the various models in Google Colab here.
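
A comparison over several hyperparameter sets can be organized roughly as in the sketch below. It is only an illustration under stated assumptions: `train_and_evaluate` is a hypothetical callable that the caller supplies (it trains a model for one configuration and returns its validation accuracy), and the search-space keys are placeholders rather than the hyperparameters used in the notebook.

```python
import time
from itertools import product


def hyperparameter_grid(search_space):
    """Return one dict per combination of the listed hyperparameter values."""
    names = list(search_space)
    combos = product(*(search_space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def compare_models(train_and_evaluate, search_space):
    """Run every configuration; record validation accuracy and wall-clock time.

    `train_and_evaluate` is a hypothetical callable: it takes a config dict,
    trains a model, and returns the validation accuracy as a float.
    """
    results = []
    for config in hyperparameter_grid(search_space):
        start = time.perf_counter()
        val_accuracy = train_and_evaluate(config)
        elapsed = time.perf_counter() - start
        results.append(
            {
                "config": config,
                "val_accuracy": val_accuracy,
                "train_seconds": elapsed,
            }
        )
    # Highest validation accuracy first.
    return sorted(results, key=lambda r: r["val_accuracy"], reverse=True)
```

A call such as `compare_models(my_training_fn, {"model": ["cnn", "vit"], "learning_rate": [1e-3, 1e-4]})` would then return one record per configuration, sorted by validation accuracy, with the training time of each run alongside it.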
