Combination_transformer

Holds a multimodal model that can interact with varying numbers of modalities

This repository holds a multimodal model that can detect the type of sleep event occurring in a given sample from the CAP Sleep Database. The goal is to test whether it is better to combine the inputs into a shared language early on, so that there can be more interaction between the modalities (called late fusion, as the work is done late in the process), or to let each modality be processed by its own model and then do a smaller amount of work combining the results (called early fusion, as the work is done earlier in the process). [NOTE: Late in the process of writing the paper we changed our definitions of "early fusion" and "late fusion". This means that the code referred to here as early fusion is called late fusion in the paper, and vice versa.]

This model can currently handle text data, arbitrary tabular data, image data, and EDF data.

To get the base models, run example.py, found in the main directory, to construct both the early fusion and late fusion models. NOTE: this must be run from the main directory, not from the Evaluation directory.

The EDF data can be found at the following link: https://physionet.org/content/capslpdb/1.0.0/

  • This is because the data are too large for us to store in GitHub
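
The repository reads these recordings with pyedflib. As a minimal sketch of inspecting one record (the file path is a placeholder for a recording downloaded from the link above):

```python
import pyedflib

# Placeholder path: substitute an EDF file from the CAP Sleep Database.
reader = pyedflib.EdfReader("path/to/recording.edf")

# Each EDF file holds several signals (EEG, EOG, ECG, ...).
labels = reader.getSignalLabels()
for i in range(reader.signals_in_file):
    signal = reader.readSignal(i)        # NumPy array of raw samples
    rate = reader.getSampleFrequency(i)  # samples per second
    print(f"{labels[i]}: {len(signal)} samples at {rate} Hz")

reader.close()
```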

The following pretrained models were used for each modality:

  • Text: openai/clip-vit-large-patch14
  • Numerical: openai/clip-vit-large-patch14
  • Image: timm/vit_base_patch16_224.augreg_in21k_ft_in1k
  • EDF: bookbot/distil-ast-audioset

We have also made thorough use of the torch (i.e., PyTorch), numpy, pyedflib, and transformers libraries.
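
As a hedged sketch of loading these checkpoints (the variable names are ours, and the repository's own loading code may differ):

```python
import timm
from transformers import AutoModel, CLIPModel

# Text and numerical/tabular inputs both build on a CLIP backbone.
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")

# Image inputs use a ViT checkpoint from timm.
image_encoder = timm.create_model(
    "vit_base_patch16_224.augreg_in21k_ft_in1k", pretrained=True
)

# EDF signals are handled by a distilled Audio Spectrogram Transformer.
edf_encoder = AutoModel.from_pretrained("bookbot/distil-ast-audioset")
```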

Running get_data.py runs the training and testing phases and reports on the success of the models. Within this program there are seven flag variables:

  • test_early_untrained
  • train_early
  • test_early_trained
  • test_late_untrained
  • train_late
  • test_late_trained
  • use_gemini

Setting one of these to True makes the program perform the associated action (e.g., test_early_untrained runs the untrained early fusion model through testing, train_early trains the early fusion model, and so on).

Setting use_gemini to True causes the program to work with Enhanced Fusion rather than Basic Fusion.
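
As an illustrative sketch of how these flags drive the program (the flag names come from this README; the model and loader objects and the exact call sites inside get_data.py are assumptions):

```python
# Flags near the top of get_data.py; edit these by hand before running.
test_early_untrained = False
train_early = True
test_early_trained = True
test_late_untrained = False
train_late = False
test_late_trained = False
use_gemini = False  # True switches from Basic Fusion to Enhanced Fusion

# Hypothetical control flow built from the flags above; train_model and
# test_model are the repository's functions, the other names are placeholders.
if test_early_untrained:
    test_model(early_fusion_model, test_loader)
if train_early:
    train_model(early_fusion_model, train_loader)
if test_early_trained:
    test_model(early_fusion_model, test_loader)
```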

To adjust which modalities are being used, you need to edit the output of the __getitem__ method in MultimodalDataSet (found in get_data.py) to include or exclude the needed modalities, as sketched below. This is due to an issue in the PyTorch Dataset class that causes errors when it tries to return None. You also need to make the same changes in train_model and test_model so that they match the output of the dataloader.
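
A minimal sketch of the kind of edit described above (the field names and the internal sample layout are assumptions; the real MultimodalDataSet may store its data differently):

```python
from torch.utils.data import Dataset

class MultimodalDataSet(Dataset):
    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        # Return only the modalities in use. PyTorch's default collate
        # function cannot batch None, so an unused modality must be
        # dropped from the tuple rather than returned as None.
        return sample["text"], sample["image"], sample["label"]
        # To re-enable EDF data, extend the tuple instead:
        # return sample["text"], sample["image"], sample["edf"], sample["label"]
```

train_model and test_model must then unpack the same number of items from each batch.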

If you want to add new modalities, you will need to edit example.py to load their pretrained models into the main model, as well as edit get_data.py to include the new modality in the dataset and the training/testing code.
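
As a rough, hypothetical sketch of the first half of that change (the checkpoint name, the combined_model object, and its encoders attribute are placeholders, not the repository's actual API):

```python
from transformers import AutoModel

# Hypothetical pretrained checkpoint for the new modality.
new_encoder = AutoModel.from_pretrained("org/new-modality-checkpoint")

# Hypothetical registration with the combined model built in example.py;
# the real attribute and constructor names may differ.
combined_model.encoders["new_modality"] = new_encoder
```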
