For details, please see the Baseline Paper. If you want to sign up for the challenge, please fill out the form here.
- MuSe-Perception: predicting 16 different dimensions of social perception (e.g. Assertiveness, Likability, Warmth, ...). Official baseline: .3573 mean Pearson's correlation over all 16 classes.
- MuSe-Humor: predicting the presence/absence of humor in cross-cultural (German/English) football press conference recordings. Official baseline: .8682 AUC.
It is highly recommended to run everything in a Python virtual environment. Please make sure to install the packages listed in `requirements.txt` and adjust the paths in `config.py` (especially `BASE_PATH` and `HUMOR_PATH` and/or `PERCEPTION_PATH`, respectively).
You can then, e.g., run the unimodal baseline reproduction calls in the `*_full.sh` file provided for each sub-challenge.
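For example, the setup might look like this (a minimal sketch assuming a Unix-like shell; `humor_full.sh` stands in for whichever `*_full.sh` file ships with your sub-challenge):

```bash
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the required packages
pip install -r requirements.txt

# After adjusting the paths in config.py, run the baseline
# reproduction calls, e.g. for the Humor Sub-Challenge
bash humor_full.sh
```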
The `main.py` script is used for training and evaluating models. Its most important options (an example call is shown after the list):

- `--task`: choose either `perception` or `humor`
- `--feature`: choose a feature set provided in the data (in the `PATH_TO_FEATURES` defined in `config.py`). Adding `--normalize` ensures normalization of features (recommended for `eGeMAPS` features).
- Options defining the model architecture: `d_rnn`, `rnn_n_layers`, `rnn_bi`, `d_fc_out`
- Options for the training process: `--epochs`, `--lr`, `--seed`, `--n_seeds`, `--early_stopping_patience`, `--reduce_lr_patience`, `--rnn_dropout`, `--linear_dropout`
- In order to use a GPU, please add the flag `--use_gpu`
- Predict labels for the test set: `--predict`
- Specific parameters for MuSe-Perception: `label_dim` (one of the 16 labels, cf. `config.py`), `win_len` and `hop_len` for segmentation.
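For instance, a training run could be launched as follows (a sketch only: the feature name and all hyperparameter values here are illustrative, not the official baseline configuration):

```bash
# Illustrative call; the feature name must match a feature set available
# under PATH_TO_FEATURES (cf. config.py)
python3 main.py --task humor --feature egemaps --normalize \
    --epochs 100 --lr 0.0001 --seed 101 --n_seeds 5 \
    --early_stopping_patience 10 --use_gpu
```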
For more details, please see the `parse_args()` method in `main.py`.
Please note that exact reproducibility cannot be expected due to dependence on hardware.
For every challenge, a `*_full.sh` file is provided with the respective call (and, thus, configuration) for each of the precomputed features.
Moreover, you can directly load one of the checkpoints corresponding to the results in the baseline paper. Note that
the checkpoints are only available to registered participants.
A checkpoint model can be loaded and evaluated as follows:

```bash
main.py --task humor --feature faus --eval_model /your/checkpoint/directory/humor_faus/model_102.pth
```
We utilize a simple late fusion approach, which averages different models' predictions.
First, predictions for the development and test sets have to be created using the `--predict` option in `main.py`. This will create prediction folders under the folder specified as the prediction directory in `config.py`.
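A sketch of such a prediction call for MuSe-Perception (assuming the label-dimension flag is spelled `--label_dim` as in `late_fusion.py`; the label and feature values are illustrative):

```bash
# Illustrative: train on one perception label dimension and write
# predictions for the development and test sets
python3 main.py --task perception --label_dim assertiveness \
    --feature egemaps --normalize --predict
```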
Then, `late_fusion.py` merges these predictions. Its options (an example call follows the list):

- `--task`: choose either `humor` or `perception`
- `--label_dim`: for MuSe-Perception, cf. `PERCEPTION_LABELS` in `config.py`
- `--model_ids`: list of model IDs whose predictions are to be merged. These predictions must first be created (`--predict` in `main.py` or `personalisation.py`). The model ID is a folder under `{config.PREDICTION_DIR}/humor` for humor and `{config.PREDICTION_DIR}/perception/{label_dim}` for perception. It is the parent folder of the folders named after the seeds (e.g. `101`), which contain the files `predictions_devel.csv` and `predictions_test.csv`.
- `--seeds`: seeds for the respective model IDs.
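A minimal sketch of a fusion call (the model IDs and seeds below are illustrative placeholders for folders that exist in your prediction directory):

```bash
# Assumed layout (illustrative):
#   {config.PREDICTION_DIR}/humor/RNN_faus/101/predictions_devel.csv
#   {config.PREDICTION_DIR}/humor/RNN_egemaps/102/predictions_devel.csv
python3 late_fusion.py --task humor --model_ids RNN_faus RNN_egemaps --seeds 101 102
```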
- Checkpoints for the Perception Sub-Challenge
- Checkpoints for the Humor Sub-Challenge
The MuSe 2024 baseline paper is currently only available in a preliminary version: https://www.researchgate.net/publication/380664467_The_MuSe_2024_Multimodal_Sentiment_Analysis_Challenge_Social_Perception_and_Humor_Recognition