ThomasGorges/pasal


PASAL

This repository contains the implementation of the paper "PASAL: Progress- and sparsity-aware loss balancing for heterogeneous dataset fusion" by Gorges et al., published in Information Fusion.

This is a fork of the initial implementation hosted on the Fraunhofer GitLab. This fork includes changes required for the revision of the article. The last commit from Fraunhofer was commit 6308136f with the message "Replace files with their compressed versions". Subsequent commits were made by the Friedrich-Alexander-Universität Erlangen-Nürnberg.

Citation

If you use PASAL in your research, please cite it as follows:

@article{gorges2025pasal,
  title={PASAL: Progress- and sparsity-aware loss balancing for heterogeneous dataset fusion},
  author={Gorges, Thomas and Scholz, Teresa and Saloman, Stefan and Zinnen, Mathias and Hoffmann, Juliane and Gourmelon, Nora and Maier, Andreas and Hettenkofer, Sebastian and Christlein, Vincent},
  journal={Information Fusion},
  pages={103038},
  year={2025},
  publisher={Elsevier},
  doi={10.1016/j.inffus.2025.103038}
}

Getting started

A Dockerfile is provided to ease the setup of the environment.

To build the image:

docker build . -t pasal

To create the container:

docker run --gpus all -it -v ./paper:/home/pasal/paper:Z pasal

GPU acceleration is optional, but it speeds up the calculation of embeddings.
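Because GPU access is optional, the container command can be assembled with or without the GPU flag. The following is a minimal sketch, not part of the repository: the helper name `docker_run_command` is hypothetical, and treating an `nvidia-smi` binary on the PATH as a sign of a usable GPU is an assumption.

```python
import shutil


def docker_run_command(image="pasal", use_gpu=None):
    """Build the `docker run` argument list from this README, with or without GPU access."""
    if use_gpu is None:
        # Assumption: an `nvidia-smi` binary on PATH indicates a usable NVIDIA GPU.
        use_gpu = shutil.which("nvidia-smi") is not None
    cmd = ["docker", "run"]
    if use_gpu:
        cmd += ["--gpus", "all"]
    # Volume mount and image name are taken from the command shown above.
    cmd += ["-it", "-v", "./paper:/home/pasal/paper:Z", image]
    return cmd


if __name__ == "__main__":
    print(" ".join(docker_run_command()))
```

The function only returns the argument list; pass it to `subprocess.run` or copy the printed line into a terminal to start the container.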

Preprocessing

Preprocessing is split into multiple parts:

Dataset preprocessing

To process the DREAM Olfaction Prediction Challenge datasets and datasets from Pyrfume, run:

cd paper/source/preprocessing/datasets/
./preprocess.sh

Old feature calculation

During development, the input for the feature selection was frozen to ensure reproducibility. These files are located at source/preprocessing/old_features/.

Steps to reproduce:

cd paper/source/preprocessing/old_features/datasets/
./preprocess.sh
cd ..
python3 fusion.py
python3 to_sdf.py
python3 calculate_map4.py
python3 calculate_mordred_features_.py

PaDEL features were calculated with the GUI, which can be obtained here.

Note that the feature calculation relies on third-party tools that can produce non-deterministic results.
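Since recomputed features may differ from the frozen ones, it can help to check whether a regenerated file is byte-identical to its frozen counterpart. The sketch below compares SHA-256 digests of two files. The helper names and any file paths passed to them are hypothetical, and a byte-level comparison is an assumption about what counts as "identical" here: it flags any difference, including harmless ones such as row order or float formatting.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(frozen_path, recomputed_path):
    """True if both files have identical contents (byte for byte)."""
    return sha256_of(frozen_path) == sha256_of(recomputed_path)
```

A mismatch does not necessarily mean the features are wrong, only that the output of the third-party tool is not reproduced exactly.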

Feature calculation

To run the feature calculation steps, execute the following commands, which may take a while:

cd paper/source/preprocessing/
./preprocess.sh

Training

The --num_workers flag must be adjusted to match the available hardware.
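One way to pick a worker count is to derive it from the number of CPUs visible to the process. The sketch below is illustrative only: the helper names are hypothetical, and tying the worker count to the CPU count (minus one reserved core) is an assumption, since memory or GPU limits may call for a lower value. The flags are the ones used in the hyperparameter search commands in this section.

```python
import os


def suggest_num_workers(reserve=1):
    """Return a worker count based on the CPUs visible to this process."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cpus - reserve)


def search_command(study_name, num_models=1000, seed=0, fold_id=None, num_workers=None):
    """Build the main.py argument list for a hyperparameter search."""
    if num_workers is None:
        num_workers = suggest_num_workers()
    cmd = [
        "python3", "main.py",
        "--study_name", study_name,
        "--num_workers", str(num_workers),
        "--num_models", str(num_models),
        "--random_search_seed", str(seed),
    ]
    if fold_id is not None:
        # Only the single-model search in this README passes --fold_id.
        cmd += ["--fold_id", str(fold_id)]
    return cmd


if __name__ == "__main__":
    print(" ".join(search_command("11082023_fold_999", fold_id=999)))
```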

Ablation study:

cd paper/source/
./run_ablation_study.sh
python3 analyze_ablation_study.py descriptors
python3 analyze_ablation_study.py embeddings

Hyperparameter search (single):

cd paper/source/
python3 main.py --study_name 11082023_fold_999 --num_workers 70 --num_models 1000 --random_search_seed 0 --fold_id 999
python3 main.py --study_name 11082023_fold_999 --fetch_results --fold_id 999

Hyperparameter search (ensemble):

cd paper/source/
python3 main.py --study_name 07082023 --num_workers 70 --num_models 1000 --random_search_seed 0
python3 main.py --study_name 07082023 --fetch_results

Output will be saved at paper/output/study_results/.

Results

Single model (non-ensemble)

cd paper/source/
python3 analyze_results.py single

Ensemble

cd paper/source/
python3 analyze_results.py ensemble

The respective Z-score is printed, and the predictions are saved at paper/output/predictions/.

Retraining models & loss balancing

For constant loss balancing:

python3 main.py --retrain study:///11082023_fold_999_999/718 --fold_id 999 --alpha 0.0 --beta 0.0

Alpha and beta can be set to other values. The values evaluated for alpha are 1.4, 1.5 and 1.6, and for beta 0.7, 0.8 and 0.9.
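The retraining command can be generated for each alpha and beta pair. The sketch below builds the commands for all nine combinations of the listed values. Whether the full grid or only selected pairs were evaluated is not stated here, so the Cartesian product is an assumption, and the helper name `retrain_commands` is hypothetical.

```python
from itertools import product

ALPHAS = (1.4, 1.5, 1.6)
BETAS = (0.7, 0.8, 0.9)
# Retrain target copied from the constant loss balancing command above.
RETRAIN_TARGET = "study:///11082023_fold_999_999/718"


def retrain_commands(alphas=ALPHAS, betas=BETAS, fold_id=999):
    """Return one main.py argument list per (alpha, beta) combination."""
    return [
        [
            "python3", "main.py",
            "--retrain", RETRAIN_TARGET,
            "--fold_id", str(fold_id),
            "--alpha", str(alpha),
            "--beta", str(beta),
        ]
        for alpha, beta in product(alphas, betas)
    ]


if __name__ == "__main__":
    for cmd in retrain_commands():
        print(" ".join(cmd))
```

The commands are only printed, not executed, so they can be reviewed or pasted into a job script before any retraining starts.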

Plots

To reproduce the plots, execute the following commands:

cd paper/plots/
python3 [FILE_NAME]

Output will be saved at paper/output/plots/. The significance test is included in the human_performance.py script.
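To run several plot scripts in one go, the scripts in the plots directory can be collected and executed in sequence. This is a minimal sketch with hypothetical helper names. It assumes that every .py file in the directory is a standalone plot script, which may not hold if the directory also contains helper modules.

```python
import subprocess
import sys
from pathlib import Path


def plot_scripts(plots_dir="."):
    """Return all .py files directly inside plots_dir, sorted by name."""
    return sorted(p for p in Path(plots_dir).glob("*.py") if p.is_file())


def run_all(plots_dir=".", dry_run=True):
    """Build (and optionally run) one command per plot script."""
    commands = [[sys.executable, script.name] for script in plot_scripts(plots_dir)]
    if not dry_run:
        for cmd in commands:
            # Run from inside the plots directory, as in the commands above.
            subprocess.run(cmd, cwd=plots_dir, check=True)
    return commands
```

With the default `dry_run=True`, nothing is executed and the list of commands is returned for inspection; call `run_all("paper/plots", dry_run=False)` to actually run them.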

Third-party tools & data

This software uses third-party sources. See the license folder.

Data

The following additional third-party data is used:

This work uses information derived from the IFRA Fragrance Ingredient Glossary, developed by The International Fragrance Association.

License

See LICENSE file.
