IBM/autotransformer

Auto-encoding transformer

  1. mnistdatadouble.py creates the dataset for the Y-net.
  2. gpuautotransformer.py contains the code for the image autoencoder.
  3. imageprep.py contains the code for the audio autotransformer.
  4. The main code follows the tutorial at https://medium.com/mlearning-ai/vision-transformers-from-scratch-pytorch-a-step-by-step-guide-96c3313c2e0c
  5. The AudioMNIST dataset is available at https://www.kaggle.com/datasets/sripaadsrinivasan/audio-mnist
  6. The MFCC code is adapted from https://github.com/aniruddhapal211316/spoken_digit_recognition/blob/main/dataset.py
  7. Ynet100.py and ynet100a.py handle 100 pairings of one vision input and one audio input, respectively, each with 100 inputs from the cross modality.
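The tutorial referenced in item 4 builds a vision transformer from scratch; its first step is to split each image into a grid of flattened patches that become the transformer's input tokens. Below is a minimal numpy sketch of that patchify step for MNIST-sized images. The function and parameter names are illustrative and not taken from this repository's code.

```python
import numpy as np

def patchify(images, n_patches=7):
    """Split square images (n, c, h, w) into n_patches x n_patches
    flattened patches, the token layout used in from-scratch ViT guides."""
    n, c, h, w = images.shape
    assert h == w, "patchify expects square images"
    p = h // n_patches                                # patch side length
    patches = images.reshape(n, c, n_patches, p, n_patches, p)
    patches = patches.transpose(0, 2, 4, 1, 3, 5)     # (n, ph, pw, c, p, p)
    return patches.reshape(n, n_patches * n_patches, c * p * p)

imgs = np.random.rand(2, 1, 28, 28)   # a batch of two MNIST-sized images
tokens = patchify(imgs)
print(tokens.shape)                   # (2, 49, 16): 49 tokens of length 16
```

Each 28x28 image becomes 49 tokens of 16 values; a learned linear projection and positional embeddings are then applied before the transformer encoder.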
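The MFCC code adapted in item 6 follows the standard recipe: pre-emphasis, framing and windowing, power spectrum, mel filterbank, log, and a DCT to decorrelate. The following self-contained numpy sketch shows that recipe end to end; all parameter values (sample rate, FFT size, filter counts) are illustrative defaults, not the settings used in this repository.

```python
import numpy as np

def mfcc(signal, sr=8000, n_fft=512, n_mels=26, n_ceps=13):
    """Textbook MFCC extraction; parameter values are illustrative."""
    # Pre-emphasis boosts high frequencies
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame into 25 ms windows with 10 ms hops, apply a Hamming window
    flen, hop = int(0.025 * sr), int(0.010 * sr)
    n_frames = 1 + (len(emph) - flen) // hop
    idx = np.arange(flen)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emph[idx] * np.hamming(flen)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, cen, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:cen] = (np.arange(lo, cen) - lo) / max(cen - lo, 1)
        fbank[m - 1, cen:hi] = (hi - np.arange(cen, hi)) / max(hi - cen, 1)
    # Log filterbank energies, then a DCT-II to decorrelate
    logfb = np.log(power @ fbank.T + 1e-10)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None]
                   * (2 * np.arange(n_mels) + 1)[None, :] / (2 * n_mels))
    return logfb @ basis.T            # shape: (n_frames, n_ceps)

tone = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)  # 1 s test tone
print(mfcc(tone).shape)               # (98, 13)
```

The resulting (frames x coefficients) matrix is the kind of fixed-size spectral feature an audio autotransformer can consume as input.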