Code for "Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework" published at AAAI 2026.
- Check the requirements.txt file. We recommend installing the pyxclib library by following the pyxclib installation instructions.
Choose a root folder on your storage to hold the data and the artifacts generated by ViXML training. It must have the following structure:
+-- <root_dir>
|   +-- data
|       +-- <dataset_name>
|   +-- models
|   +-- results
Place each dataset in its own <dataset_name> folder under the data folder. When the training code runs, the models and results folders are created (with the same <dataset_name>).
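The data part of this layout can be prepared ahead of time. The following is a minimal sketch, not part of the repository: `make_data_layout` is a hypothetical helper, and the root path in the usage comment is a placeholder. It only creates `data/<dataset_name>`, since the models and results folders are created by the training code.

```python
from pathlib import Path


def make_data_layout(root_dir, dataset_names):
    """Create <root_dir>/data/<dataset_name> for each dataset.

    Hypothetical helper for illustration; the repository does not ship it.
    Returns the list of dataset directories (created or already present).
    """
    root = Path(root_dir)
    dataset_dirs = []
    for name in dataset_names:
        dataset_dir = root / "data" / name
        # parents=True also creates <root_dir> and <root_dir>/data if missing;
        # exist_ok=True makes the call safe to repeat.
        dataset_dir.mkdir(parents=True, exist_ok=True)
        dataset_dirs.append(dataset_dir)
    return dataset_dirs


# Example usage (placeholder root path):
# make_data_layout("/path/to/root_dir", ["LF-AmazonTitles-131K"])
```

Calling the helper twice with the same arguments leaves existing folders untouched.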
* Download the raw data (zipped file) from The XML repository [5].
* Extract the zipped file into the data directory.
* Process the extracted files using extract_text_and_labels.py.
* The following files should be available in <root_dir>/data/<dataset_name> (create an empty filter file if it is unavailable):
- trn.raw.txt
- tst.raw.txt
- lbl.raw.txt
- trn_X_Y.txt
- tst_X_Y.txt
- filter_labels_text.txt
Download data. You may use one of the following options:
- We provide SigLIP 2 embeddings for the datasets used in the paper. After downloading, place the embedding files (<split_name>_embs_imgs_3_1152_siglip2.npy and <split_name>_imgs_384_3_map.npy) in the corresponding <dataset_name> folder.
- We provide the image URLs of the LF-AmazonTitles datasets in the file img_urls.parquet. Run download_amazon_imgs.py (set the datapath inside the script to the corresponding <dataset_name>) to download the images (<split_name>_imgs_384_3.bin and <split_name>_imgs_384_3_map.npy). Then run extract_img_embed.py to extract image embeddings with SigLIP 2, i.e. to create <split_name>_embs_imgs_3_1152_siglip2.npy. For MM-AmazonTitles-300K, you can obtain the URLs from the official dataset, create an img_urls.parquet file from them, and then proceed as for the LF-AmazonTitles datasets.
All resulting files mentioned above must be placed in the <dataset_name> folder.
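Before tokenizing, it can help to verify that the dataset folder is complete. The sketch below is an illustration under assumptions, not repository code: `check_dataset_dir` is a hypothetical helper, and the split names must be supplied by the caller because this README does not enumerate them. It checks the six text and label files listed earlier, optionally checks the two image files per split, and creates an empty filter file when none exists.

```python
from pathlib import Path

# Text and label files named in this README.
REQUIRED_FILES = [
    "trn.raw.txt",
    "tst.raw.txt",
    "lbl.raw.txt",
    "trn_X_Y.txt",
    "tst_X_Y.txt",
    "filter_labels_text.txt",
]

# Per-split image files named in this README.
IMAGE_FILE_PATTERNS = [
    "{split}_embs_imgs_3_1152_siglip2.npy",
    "{split}_imgs_384_3_map.npy",
]


def check_dataset_dir(dataset_dir, split_names=(), create_empty_filter=True):
    """Return the names of expected files missing from dataset_dir.

    Hypothetical helper. split_names is an assumption supplied by the
    caller; pass an empty tuple to skip the image-file check.
    """
    dataset_dir = Path(dataset_dir)
    filter_file = dataset_dir / "filter_labels_text.txt"
    # The README allows an empty filter file when none is provided.
    if create_empty_filter and dataset_dir.is_dir() and not filter_file.exists():
        filter_file.touch()

    expected = list(REQUIRED_FILES)
    for split in split_names:
        expected.extend(p.format(split=split) for p in IMAGE_FILE_PATTERNS)
    return [name for name in expected if not (dataset_dir / name).exists()]
```

An empty return value means every expected file was found; otherwise the list names what still needs to be downloaded or generated.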
1) Tokenize the data using create_tokenized_files.py.
2) Training: we provide example scripts for running the encoder and decoder alternatives on LF-AmazonTitles-131K. See run_vixml_miniLML3_amzTitles131K.sh and run_vixml_qwen25_3B_amzTitles131K.sh.
./run_vixml_qwen25_3B_amzTitles131K.sh <gpu_id> LF-AmazonTitles-131K
This run trains the model and, at the end, stores the model and the embeddings for each split. During training, the model is evaluated at certain epochs.
3) If you need to generate your own embeddings, run encode_test_and_labels_mm.py.
4) To evaluate your model, run eval_mm.py.
@InProceedings{Ortego26,
author = "Ortego, D. and Rodr{\'i}guez, M. and Almagro, M. and Dahiya, K. and Jim{\'e}nez, D. and SanMiguel, J.C.",
title = "Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework",
booktitle = "Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI)",
year = 2026}