Implementation of the Vision Transformer (ViT) paper for urban sound classification, in PyTorch.

adinmg/Vit-classifier-pytorch

Urban Sound Classification Using PyTorch Vision Transformer

In this project, I've implemented the Vision Transformer (ViT) architecture to tackle the task of classifying urban sounds.

My goal is to replicate the ViT computer vision model described in the paper titled "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" and adapt it for classifying urban sounds. I've applied this model to the UrbanSound8K dataset.

Libraries Used

To accomplish this project, I've utilized several libraries:

Results

The "results" folder contains a series of CSV files that compare runs at different image sizes. Download them to your environment and inspect them with the plot_summary() function in Section 9.
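
Image size matters for a ViT because it sets how many patch tokens the transformer has to process. As a rough illustration, the sketch below computes the token sequence length for a square input, following the patching scheme from the ViT paper (non-overlapping square patches plus one class token). The helper name `vit_sequence_length`, the 16-pixel patch size, and the image sizes in the loop are assumptions chosen for illustration; they are not taken from this repository's code or results.

```python
def vit_sequence_length(image_size: int, patch_size: int = 16,
                        class_token: bool = True) -> int:
    """Return the number of tokens a ViT sees for a square input.

    Hypothetical helper for illustration only; it is not part of this repository.
    """
    if image_size <= 0 or patch_size <= 0:
        raise ValueError("image_size and patch_size must be positive")
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    # Non-overlapping patches tile the image in a square grid.
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2

    # The ViT paper prepends one learnable class token to the patch sequence.
    return num_patches + (1 if class_token else 0)


if __name__ == "__main__":
    # Example sizes only; the sizes compared in the CSV files may differ.
    for size in (64, 128, 224):
        print(f"{size}x{size} image -> {vit_sequence_length(size)} tokens")
```

Self-attention cost grows quadratically with this sequence length, so doubling the image side roughly quadruples the number of patches and raises the attention cost by about sixteen times. That trade-off is worth keeping in mind when reading results across image sizes.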
