Follow these steps to set up the conda environment for the MultiTab project.

Prerequisites:

- Anaconda or Miniconda installed on your system
- NVIDIA GPU with CUDA support (recommended for training)

1. Create the conda environment:

   ```bash
   conda env create -f environment.yml
   ```

2. Activate the environment:

   ```bash
   conda activate multitab
   ```

3. Verify the installation:

   ```bash
   python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
   ```
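Beyond the one-liner above, a slightly more defensive check avoids a hard crash when the environment is not activated. This is a hedged sketch: `check_env` is a hypothetical helper for illustration, not part of the repository.

```python
# Hypothetical helper, not part of the MultiTab repo: report whether the
# key dependency (torch) is importable, without crashing if it is missing.
import importlib.util

def check_env() -> str:
    """Return a human-readable status string for the PyTorch install."""
    if importlib.util.find_spec("torch") is None:
        return "torch not found: did you run `conda activate multitab`?"
    import torch  # safe to import now that we know it exists
    return f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"

print(check_env())
```

Either outcome prints a clear diagnosis, which is handier than a bare `ModuleNotFoundError` traceback.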
This project supports three datasets for multitask learning experiments:
- Higgs Dataset: High-energy physics dataset for binary classification with additional regression targets
- ACS Income Dataset: American Community Survey data for income prediction and demographic analysis
- AliExpress Dataset: E-commerce dataset for click-through rate and conversion prediction
The easiest way to set up all datasets is to use our preprocessed H5 files available on Hugging Face:
1. Configure the data root directory: edit `download_data.sh` and set your desired data root:

   ```bash
   DATA_ROOT="/path/to/your/data/"
   ```

2. Make the script executable:

   ```bash
   chmod +x download_data.sh
   ```

3. Run the download script:

   ```bash
   ./download_data.sh
   ```
This will automatically download all three preprocessed datasets (Higgs, AliExpress, and ACS Income) in H5 format and organize them in the correct directory structure.
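Once the download finishes, you can sanity-check the resulting layout before training. The subdirectory names below are assumptions about what `download_data.sh` creates; adjust `EXPECTED` to match your actual tree.

```python
# Hypothetical layout check, not part of the repo. The dataset folder names
# are assumptions; edit EXPECTED to match what download_data.sh creates.
from pathlib import Path

EXPECTED = ("higgs", "aliexpress", "acs_income")  # assumed subdirectory names

def missing_datasets(data_root: str) -> list[str]:
    """Return the expected dataset directories that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED if not (root / name).is_dir()]

# Example: report anything missing before you start training.
# print(missing_datasets("/path/to/your/data/"))
```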
If you prefer to download the datasets from their original sources and perform the preprocessing yourself, please refer to the manual dataset setup instructions.
Once you have set up your datasets, you can run experiments using the provided training script:
1. Make the script executable:

   ```bash
   chmod +x run.sh
   ```

2. Configure the experiment parameters: edit the variables at the top of `run.sh`:

   ```bash
   DATA_ROOT="/path/to/data/"   # Path to your processed datasets
   MODEL_NAME="mtt"             # Model to use (mtt, mmoe, ple, etc.)
   DATASET="acs_income"         # Dataset name (acs_income, higgs, etc.)
   GPU_ID=0                     # GPU ID for training
   SEED=42                      # Random seed for reproducibility
   PATIENCE=5                   # Early stopping patience
   ```

3. Run the experiment:

   ```bash
   ./run.sh
   ```
The script will automatically start training with the specified configuration and save results to the logs directory.
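To sweep over seeds or models without hand-editing `run.sh` each time, one option is to override its variables through the environment. This is a hedged sketch: it assumes you adapt `run.sh` to respect pre-set values (e.g. `SEED="${SEED:-42}"`), which the stock script may not do, and `build_launch` is a hypothetical helper.

```python
# Hypothetical sweep driver, not part of the repo. Assumes run.sh is adapted
# to read MODEL_NAME/DATASET/SEED from the environment (e.g. SEED="${SEED:-42}").
import subprocess

def build_launch(model: str, dataset: str, seed: int) -> list[str]:
    """Build an `env`-prefixed invocation of run.sh with variable overrides."""
    return ["env", f"MODEL_NAME={model}", f"DATASET={dataset}", f"SEED={seed}", "./run.sh"]

for seed in (0, 1, 2):
    cmd = build_launch("mtt", "acs_income", seed)
    print(" ".join(cmd))            # show the command for each run
    # subprocess.run(cmd, check=True)  # uncomment to actually launch
```

Keeping the launch logic in one small function makes it easy to log exactly which configuration produced each entry in the logs directory.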
If you use this code or find our work helpful, please cite:

```bibtex
@inproceedings{sinodinos2026multitab,
  title={MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data},
  author={Sinodinos, Dimitrios and Wei, Jack Yi and Armanfard, Narges},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={30},
  pages={25499--25507},
  year={2026}
}
```