- mnistdatadouble.py creates the dataset for ynet.
- gpuautotransformer.py contains the code for the image autoencoder.
- imageprep.py contains the code for the audio autotransformer.
- The main code is adapted from this tutorial: https://medium.com/mlearning-ai/vision-transformers-from-scratch-pytorch-a-step-by-step-guide-96c3313c2e0c
- The AudioMNIST dataset is available at https://www.kaggle.com/datasets/sripaadsrinivasan/audio-mnist
- The MFCC code is adapted from https://github.com/aniruddhapal211316/spoken_digit_recognition/blob/main/dataset.py
- Ynet100.py and ynet100a.py build 100 pairings each: Ynet100.py pairs one vision input, and ynet100a.py pairs one audio input, with 100 inputs from the other modality.
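
The cross-modal pairing described for Ynet100.py and ynet100a.py can be illustrated with a minimal sketch. This is not the code from those scripts: the function `make_pairings`, its parameters, and the placeholder strings are hypothetical, and sampling without replacement is an assumption, since the README does not say how the 100 partners are chosen.

```python
import random


def make_pairings(anchor, cross_pool, n_pairs=100, seed=0):
    """Pair one anchor sample with n_pairs samples from the other modality.

    Hypothetical helper: the real scripts may select partners differently.
    """
    if len(cross_pool) < n_pairs:
        raise ValueError("cross_pool has fewer than n_pairs samples")
    rng = random.Random(seed)
    # Assumption: partners are drawn without replacement.
    partners = rng.sample(cross_pool, n_pairs)
    return [(anchor, partner) for partner in partners]


# Placeholder strings stand in for image and audio tensors.
image = "image_of_digit_3"
audio_pool = [f"audio_clip_{i}" for i in range(500)]

pairs = make_pairings(image, audio_pool)
print(len(pairs))  # 100
```

Swapping the roles (one audio clip as `anchor`, a pool of images as `cross_pool`) gives the mirrored case.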
IBM/autotransformer