Urban Sound Classification Using PyTorch Vision Transformer

In this project, I implement the Vision Transformer (ViT) architecture in PyTorch to classify urban sounds.

My goal is to replicate the ViT model described in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" and adapt it to urban sound classification on the UrbanSound8K dataset.
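As the paper's title suggests, ViT cuts each input image into fixed-size patches (16x16 pixels by default) and feeds them to the transformer as a sequence of tokens, usually with one extra class token in front. The sketch below only illustrates that arithmetic; it is not code from the notebook. It assumes each audio clip has already been turned into a single-channel, spectrogram-like image, and the image sizes in the loop are illustrative values rather than the ones used in this project. The function names are hypothetical.

```python
def vit_sequence_length(height, width, patch_size=16, class_token=True):
    """Number of tokens the ViT encoder receives for one image."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by the patch size")
    # Non-overlapping patches tile the image in a grid.
    num_patches = (height // patch_size) * (width // patch_size)
    # The class token, when used, adds one position to the sequence.
    return num_patches + (1 if class_token else 0)


def flattened_patch_size(patch_size=16, channels=1):
    """Length of one flattened patch before the linear projection.

    channels=1 is an assumption (single-channel spectrogram image);
    RGB images as in the paper would use channels=3.
    """
    return patch_size * patch_size * channels


# Illustrative square image sizes, not the ones evaluated in this project.
for side in (64, 128, 224):
    tokens = vit_sequence_length(side, side)
    print(f"{side}x{side} image: {tokens} tokens of {flattened_patch_size()} values each")
```

Because the number of tokens grows with the square of the image side, the image size has a direct effect on memory use and training time.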

Libraries Used

To accomplish this project, I've utilized several libraries:

Results

The "results" folder contains a series of CSV files for comparing different image sizes. You can download them to your environment and inspect them with the plot_summary() function in Section 9 of the ViT_classifier_urban_sound.ipynb notebook.
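Before plotting, it can help to check what each downloaded file contains. The following is a minimal sketch that is independent of the notebook's plot_summary() function: it uses only the Python standard library, assumes the CSV files sit in a local folder named "results", and makes no assumption about the column names. The helper name summarize_csv_folder is hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv_folder(folder):
    """Map each CSV file name in `folder` to (header columns, number of data rows)."""
    summary = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            rows = list(csv.reader(handle))
        # Treat the first row as the header; an empty file has neither header nor data.
        header = rows[0] if rows else []
        summary[path.name] = (header, max(len(rows) - 1, 0))
    return summary


if __name__ == "__main__":
    # "results" is the assumed local download location.
    for name, (header, n_rows) in summarize_csv_folder("results").items():
        print(f"{name}: {n_rows} rows, columns = {header}")
```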
