Bundle Recommendation

This project provides new data sources for product bundling on real e-commerce platforms across three domains: Electronic, Clothing and Food. We construct three high-quality bundle datasets with rich meta information, in particular bundle intents, through a carefully designed crowd-sourcing task.

1. Worker Basic Information

Figure 1 shows the distribution of workers' age, education, country, occupation, gender and shopping frequency for the two batches. In particular, 'Others' in the country distribution includes Argentina, Australia, Anguilla, Netherlands, Albania, Georgia, Tunisia, Belgium, Armenia, Guinea, Austria, Switzerland, Iceland, Lithuania, Egypt, Venezuela, Bangladesh, American Samoa, Vanuatu, Colombia, United Arab Emirates, Ashmore and Cartier Islands, United States, Wales, Turkey, Angola, Scotland, Philippines, Iran and Bahamas.


Figure 1: Worker basic information in the first and second batches.

2. Feedback from Workers

Figure 2 visualizes worker feedback towards our task, where (a) shows feedback on the difficulty of identifying bundles and naming the corresponding intents for the two batches, and (b-c) depict the general feedback for the two batches.

Figure 2: Feedback from workers for the two batches.

3. Bundle Detection

Source code

  • Pattern Mining (code)

Parameter Settings

A grid search in {0.0001, 0.001, 0.01} is applied to find the optimal settings for support and confidence; both are set to 0.001 across the three domains.
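As an illustration of the idea (not the repository's implementation), a toy frequent-pair miner over sessions with support and confidence thresholds might look like:

```python
from itertools import combinations
from collections import Counter

def mine_bundles(sessions, min_support=0.001, min_confidence=0.001):
    """Toy association-rule miner: treat each session as a transaction and
    keep item pairs whose support and confidence clear the thresholds."""
    n = len(sessions)
    item_count = Counter()
    pair_count = Counter()
    for items in sessions:
        unique = set(items)
        item_count.update(unique)
        pair_count.update(frozenset(p) for p in combinations(sorted(unique), 2))
    rules = []
    for pair, cnt in pair_count.items():
        support = cnt / n
        if support < min_support:
            continue
        a, b = tuple(pair)
        # confidence of a -> b and of b -> a; keep the pair if either clears the bar
        if cnt / item_count[a] >= min_confidence or cnt / item_count[b] >= min_confidence:
            rules.append((pair, support))
    return rules
```

Real pattern-mining code would extend this to itemsets larger than pairs (e.g. via Apriori), but the support/confidence filtering is the same.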

4. Bundle Completion

Source code

  • ItemKNN & BPRMF (code)

  • Mean-VAE & Concat-VAE & Concat-CVAE (code)

  • TSF (code)

Parameter Settings

The dimension d of item and bundle representations for all methods is 20. Grid search is adopted to find the best settings for the other key parameters. In particular, the learning rate and regularization coefficient are searched in {0.0001, 0.001, 0.01}; the number of neighbors K in ItemKNN in {10, 20, 30, 50}; the weight of the KL divergence in VAE in {0.001, 0.01, 0.1}; the hidden layer sizes in {50, 100, 200}; the batch size in {64, 128, 256}; the number of heads (n_heads) for TSF in {1, 2, 4}; and the number of layers for TSF in {1, 2, 3}. The optimal parameter settings are shown in Table 1.

Table 1: Parameter settings for bundle completion (d=20) for ItemKNN, BPRMF, mean-VAE, concat-VAE, concat-CVAE and TSF across the Electronic, Clothing and Food domains. (The per-cell values were rendered as equation images and are omitted here.)
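The exhaustive search over the ranges listed in this section can be sketched generically; `evaluate` below stands in for training a model with a given configuration and scoring it on validation data (it is not part of the released code):

```python
from itertools import product

# Hypothetical search space mirroring the ranges reported in this section.
grid = {
    "lr": [0.0001, 0.001, 0.01],
    "reg": [0.0001, 0.001, 0.01],
    "hidden": [50, 100, 200],
    "batch_size": [64, 128, 256],
}

def grid_search(evaluate, grid):
    """Evaluate every configuration in the grid and return the best one.
    `evaluate` maps a config dict to a validation score (higher is better)."""
    best_score, best_cfg = float("-inf"), None
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = evaluate(cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```

The same loop applies to the other tasks below; only the grid changes.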

5. Bundle Ranking

Source code

Parameter Settings

The dimension d of representations is set to 20. We apply the same grid search for the learning rate, regularization coefficient and batch size as in bundle completion. In addition, the size of the predictive layer D for AttList is searched in {20, 50, 100}, and the node and message dropout rates for GCN and BGCN are searched in {0, 0.1, 0.3, 0.5}. As the training cost of GCN and BGCN is quite high, we set their batch size to 2048 as suggested by the original paper. The optimal parameter settings are presented in Table 2. Note that the parameter settings for BGCN are for the version without pre-training.

Table 2: Parameter settings for bundle ranking (d=20) for ItemKNN, BPRMF, DAM, AttList, GCN and BGCN across the Electronic, Clothing and Food domains. (The per-cell values were rendered as equation images and are omitted here.)
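Among these baselines, BPRMF optimizes the BPR pairwise objective. A minimal NumPy sketch of one SGD step, assuming plain dot-product scoring (the repository's implementation may differ):

```python
import numpy as np

def bpr_step(U, V, u, i, j, lr=0.001, reg=0.001):
    """One SGD step of the BPR pairwise objective used by BPRMF:
    push user u's score for observed bundle i above unobserved bundle j."""
    u_vec = U[u].copy()                      # snapshot before updating
    x_uij = u_vec @ (V[i] - V[j])            # pairwise score difference
    g = 1.0 / (1.0 + np.exp(x_uij))          # gradient scale of -log sigmoid(x)
    U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
    V[i] += lr * (g * u_vec - reg * V[i])
    V[j] += lr * (-g * u_vec - reg * V[j])
```

Repeated over sampled (user, positive, negative) triples, this drives the observed bundle's score above the unobserved one's.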

6. Bundle Generation Explanation

Source code

Please refer to code

Parameter Settings

For LSTM, BiLSTM and Transformer, the dimension of word embeddings is 300; the learning rate is searched in {0.0001, 0.001, 0.01}; the batch size in {16, 32, 64}; the hidden size in {128, 256, 512}; the number of heads (nhead) for Transformer in the range [1, 8] with a step of 1; and the number of encoder/decoder layers in {1, 2, 3, 4}. For the pre-trained models, i.e., BertGeneration, BART-base and T5-base, the maximum input length of the encoder is set to 512 and the maximum output length of the decoder to 64; the learning rate is searched in {0.00002, 0.00005, 0.00007, 0.0001}; and the number of epochs in {3, 4, 5}. The optimal parameter settings are shown in Table 3.

Table 3: Parameter settings for bundle generation explanation for LSTM, BiLSTM, Transformer, BertGeneration, BART-base and T5-base. (The values were rendered as equation images and are omitted here.)

7. Bundle Ranking Explanation

Source code

Parameter Settings

For RM, we apply a grid search in {0.0001, 0.001, 0.01, 0.1} for support and confidence, and in {1, 2, 3, 4} for lift. For EFM, the two trade-off coefficients are searched in the range (0, 1] with a step of 0.1, while the three regularization coefficients are searched in {0.0001, 0.001, 0.01, 0.1}; the total number of factors r is searched in {20, 50, 100}; the ratio of explicit factors in the range [0, 1] with a step of 0.1; and the number of most-cared features k in [10, 100] with a step of 10. For PGPR and KGAT, we apply the same grid search for the learning rate, batch size, and node and message dropout rates as in bundle ranking; the dimension of representations d is searched in {20, 50, 100}; the action space and the weight of the entropy loss for PGPR are searched in {100, 200, 300} and {0.0001, 0.001, 0.01}, respectively; the sampling sizes at the three steps for PGPR are searched in {20, 25, 30}, {5, 10, 15} and {1}, respectively; and the regularization coefficient for KGAT is searched in {0.0001, 0.001, 0.01, 0.1}. The optimal parameter settings are shown in Table 4.

Table 4: Parameter settings for bundle ranking explanation for RM, EFM, PGPR and KGAT across the Electronic, Clothing and Food domains. (The per-cell values were rendered as equation images and are omitted here.)
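For intuition, the support, confidence and lift thresholds used by RM can be illustrated with a generic computation (this is not the RM implementation itself):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent,
    over a list of transactions given as sets of items."""
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)
    c = sum(1 for t in transactions if consequent <= t)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = both / n
    confidence = both / a if a else 0.0
    lift = confidence / (c / n) if c else 0.0
    return support, confidence, lift
```

A lift above 1 means the antecedent makes the consequent more likely than its base rate, which is why lift is searched over {1, 2, 3, 4} as a cut-off.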

8. Bundle Auto-Naming

Source code

Please refer to code

Parameter Settings

For the ImageCap model, the maximum output length of the decoder is set to 64; the learning rate is searched in {0.00002, 0.00005, 0.00007, 0.0001}; and the number of epochs in {3, 4, 5}. The optimal parameter settings are shown in Table 5.

Table 5: Parameter settings for bundle auto-naming for ImageCap. (The values were rendered as equation images and are omitted here.)

9. Statistics of Datasets

Table 6: Statistics of datasets.

|                                     | Electronic | Clothing | Food  |
|-------------------------------------|-----------:|---------:|------:|
| #Users                              | 888        | 965      | 879   |
| #Items                              | 3499       | 4487     | 3767  |
| #Sessions                           | 1145       | 1181     | 1161  |
| #Bundles                            | 1750       | 1910     | 1784  |
| #Intents                            | 1422       | 1466     | 1156  |
| Average Bundle Size                 | 3.52       | 3.31     | 3.58  |
| #User-Item Interactions             | 6165       | 6326     | 6395  |
| #User-Bundle Interactions           | 1753       | 1912     | 1785  |
| Density of User-Item Interactions   | 0.20%      | 0.15%    | 0.19% |
| Density of User-Bundle Interactions | 0.11%      | 0.10%    | 0.11% |
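The density figures follow directly from #interactions / (#rows × #columns); for example, for the Electronic domain:

```python
def density(num_interactions, num_rows, num_cols):
    """Density of an interaction matrix, as a percentage."""
    return 100.0 * num_interactions / (num_rows * num_cols)

# Electronic domain, using the counts from Table 6
ui = density(6165, 888, 3499)   # user-item density
ub = density(1753, 888, 1750)   # user-bundle density
```

Rounding `ui` and `ub` to two decimals reproduces the 0.20% and 0.11% entries in the table.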

10. Descriptions of Data Files

Under the `dataset` folder there are three domains: clothing, electronic and food. Each domain contains the following data files.

Table 7: The descriptions of the data files.

- `user_item_pretrain.csv`: user-item interactions used to pre-train item representations via BPRMF for model initialization. Tab-separated, 3 columns: user ID | item ID | timestamp.
- `user_item.csv`: user-item interactions. Tab-separated, 3 columns: user ID | item ID | timestamp.
- `session_item.csv`: sessions and their associated items; each session has at least 2 items. Tab-separated, 2 columns: session ID | item ID.
- `user_session.csv`: users and their associated sessions. Tab-separated, 3 columns: user ID | session ID | timestamp.
- `session_bundle.csv`: sessions and their detected bundles; each session has at least 1 bundle. A session ID present in session_item.csv but absent from session_bundle.csv indicates that no bundle was detected in that session. Tab-separated, 2 columns: session ID | bundle ID.
- `bundle_intent.csv`: bundles and their annotated intents. Tab-separated, 2 columns: bundle ID | intent.
- `bundle_item.csv`: bundles and their associated items; each bundle has at least 2 items. Tab-separated, 2 columns: bundle ID | item ID.
- `user_bundle.csv`: user-bundle interactions. Tab-separated, 3 columns: user ID | bundle ID | timestamp.
- `item_categories.csv`: items and their affiliated categories; the categories column holds a list of strings. Tab-separated, 2 columns: item ID | categories.
- `item_titles.csv`: items and their titles. Tab-separated, 2 columns: item ID | titles.
- `item_idx_mapping.csv`: items and their source IDs in the Amazon datasets. Tab-separated, 2 columns: item ID | source ID.
- `user_idx_mapping.csv`: users and their source IDs in the Amazon datasets. Tab-separated, 2 columns: user ID | source ID.
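Given the layouts above, each file can be read with a plain TSV parser. The column names below are illustrative, derived from the descriptions rather than defined by the files themselves (the files have no header row):

```python
import csv
from io import StringIO

def load_tsv(stream, columns):
    """Parse a tab-separated data file into a list of row dicts."""
    reader = csv.reader(stream, delimiter="\t")
    return [dict(zip(columns, row)) for row in reader]

# Example: a two-row user_item.csv-style snippet (user ID, item ID, timestamp)
sample = StringIO("0\t17\t1512345600\n0\t42\t1512349200\n")
rows = load_tsv(sample, ["user_id", "item_id", "timestamp"])
```

For a real file, replace the `StringIO` with `open("dataset/electronic/user_item.csv", newline="")`.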

Cite

Please cite the following papers if you use our dataset in a research paper:

@inproceedings{sun2022revisiting,
  title={Revisiting Bundle Recommendation: Datasets, Tasks, Challenges and Opportunities for Intent-Aware Product Bundling},
  author={Sun, Zhu and Yang, Jie and Feng, Kaidong and Fang, Hui and Qu, Xinghua and Ong, Yew Soon},
  booktitle={Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2022}
}

@article{sun2024revisiting,
  title={Revisiting Bundle Recommendation for Intent-aware Product Bundling},
  author={Sun, Zhu and Feng, Kaidong and Yang, Jie and Fang, Hui and Qu, Xinghua and Ong, Yew-Soon and Liu, Wenyuan},
  journal={ACM Transactions on Recommender Systems},
  year={2024},
  publisher={ACM New York, NY}
}

Acknowledgements

Our datasets are constructed on the basis of Amazon datasets (http://jmcauley.ucsd.edu/data/amazon/links.html).

All pre-trained models in bundle generation explanation and bundle auto-naming are implemented based on Hugging Face (https://huggingface.co/).

All seq2seq models in bundle generation explanation are implemented based on PyTorch tutorials (https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html).
