
How can we use this model on custom datasets other than the benchmark datasets? #1

Closed
satsaras opened this issue Mar 23, 2022 · 5 comments

@satsaras

No description provided.


hyp1231 commented Mar 23, 2022

Good question! If you are familiar with RecBole, you can simply clone the latest code and run it, as NCL has already been added.

Otherwise, you can follow the steps below to run NCL on a customized dataset in a few minutes.

Data preparation

First, run on one of the benchmark datasets once. For example:

python main.py --dataset ml-1m

Then you can find a file named dataset/ml-1m/ml-1m.inter containing the user-item interactions.

If your dataset is named abc, create dataset/abc/abc.inter in the same format as dataset/ml-1m/ml-1m.inter. (These files are called Atomic Files.)
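To make the format concrete, here is a minimal sketch of writing such an .inter file for a hypothetical dataset named abc. The header row uses RecBole's "name:type" convention with tab-separated columns; the column names and example rows below are placeholders, so adjust them to your own data.

```python
# Sketch: write a tiny RecBole Atomic File (dataset/abc/abc.inter).
# The user/item IDs and timestamps here are made up for illustration.
from pathlib import Path

rows = [
    ("u1", "i1", 5, 978300760),
    ("u1", "i2", 3, 978302109),
    ("u2", "i1", 4, 978301968),
]

out_dir = Path("dataset/abc")
out_dir.mkdir(parents=True, exist_ok=True)

with open(out_dir / "abc.inter", "w") as f:
    # Header: each column is "field_name:field_type", tab-separated.
    f.write("user_id:token\titem_id:token\trating:float\ttimestamp:float\n")
    for user, item, rating, ts in rows:
        f.write(f"{user}\t{item}\t{rating}\t{ts}\n")
```

The easiest way to get the header right is to copy it from the generated dataset/ml-1m/ml-1m.inter and keep only the columns your data actually has.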

Properties

cp properties/ml-1m.yaml abc.yaml

You can then modify the detailed arguments in abc.yaml as you wish.

By the way, please remove the following two lines from abc.yaml (not from ml-1m.yaml) if you don't need to filter interactions by rating:

val_interval:
rating: "[3,inf)"
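For reference, a trimmed abc.yaml might look like the sketch below. The field names mirror those that appear in the amazon-books log later in this thread; treat the exact values as placeholders for your own data.

```yaml
# abc.yaml -- sketch of a minimal config; val_interval dropped
# because we are not filtering interactions by rating.
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
load_col:
  inter: [user_id, item_id]
user_inter_num_interval: "[15,inf)"
item_inter_num_interval: "[15,inf)"
```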

Running

python main.py --dataset abc --config abc.yaml


hyp1231 commented Mar 31, 2022

Closing due to inactivity. Please comment if you're still having issues.

hyp1231 closed this as completed Mar 31, 2022
@Iwillcome

There is a problem with the Alibaba dataset: when I try to auto-download it, I find it doesn't exist, so the dataset seems to be broken. How should I run on Alibaba? Also, my results on amazon-book can't reach the numbers reported in the paper, while the other datasets can.

@Iwillcome

We use the parameters from the amazon-books.yaml file, but the results are poor and don't reach the expected values.

@Iwillcome

General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 2020
state = INFO
reproducibility = True
data_path = dataset/amazon-books
show_progress = True
save_dataset = False
save_dataloaders = False
benchmark_filename = None

Training Hyper Parameters:
checkpoint_dir = saved
epochs = 300
train_batch_size = 4096
learner = adam
learning_rate = 0.001
eval_step = 1
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4

Evaluation Hyper Parameters:
eval_args = {'split': {'RS': [0.8, 0.1, 0.1]}, 'group_by': 'user', 'order': 'RO', 'mode': 'full'}
metrics = ['Recall', 'NDCG']
topk = [10, 20, 50]
valid_metric = NDCG@10
valid_metric_bigger = True
eval_batch_size = 4096000
metric_decimal_place = 4

Dataset Hyper Parameters:
field_separator =
seq_separator =
USER_ID_FIELD = user_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = None
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = {'inter': ['user_id', 'item_id', 'rating']}
unload_col = None
unused_col = None
additional_feat_suffix = None
rm_dup_inter = None
val_interval = {'rating': '[3,inf)'}
filter_inter_by_user_or_item = True
user_inter_num_interval = [15,inf)
item_inter_num_interval = [15,inf)
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = None
normalize_field = None
normalize_all = None
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id

Other Hyper Parameters:
neg_sampling = {'uniform': 1}
repeatable = False
MODEL_TYPE = ModelType.GENERAL
eval_setting = {'split': {'RS': [0.8, 0.1, 0.1]}, 'order': 'RO', 'group_by': 'user', 'mode': 'full'}
embedding_size = 64
n_layers = 3
reg_weight = 1e-06
ssl_temp = 0.05
ssl_reg = 1e-06
hyper_layers = 1
alpha = 0.8
proto_reg = 1e-07
num_clusters = 2000
m_step = 1
warm_up_step = 20
MODEL_INPUT_TYPE = InputType.PAIRWISE
eval_type = EvaluatorType.RANKING
device = cuda
train_neg_sample_args = {'strategy': 'by', 'by': 1, 'distribution': 'uniform'}
eval_neg_sample_args = {'strategy': 'full', 'distribution': 'uniform'}

26 Nov 22:20 INFO amazon-books
The number of users: 4610
Average actions of users: 34.132566717292256
The number of items: 4138
Average actions of items: 38.026831036983324
The number of inters: 157317
The sparsity of the dataset: 99.17532231295783%
Remain Fields: ['user_id', 'item_id', 'rating']
26 Nov 22:20 INFO [Training]: train_batch_size = [4096] negative sampling: [{'uniform': 1}]
26 Nov 22:20 INFO [Evaluation]: eval_batch_size = [4096000] eval_args: [{'split': {'RS': [0.8, 0.1, 0.1]}, 'group_by': 'user', 'order': 'RO', 'mode': 'full'}]
26 Nov 22:20 INFO NCL(
(user_embedding): Embedding(4610, 64)
(item_embedding): Embedding(4138, 64)
(mf_loss): BPRLoss()
(reg_loss): EmbLoss()
)
Trainable parameters: 559872
26 Nov 22:20 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:20 INFO epoch 0 training [time: 3.59s, train_loss1: 22.1784, train_loss2: 0.0474]
26 Nov 22:20 INFO epoch 0 evaluating [time: 0.25s, valid_score: 0.001300]
26 Nov 22:20 INFO valid result:
recall@10 : 0.0021 recall@20 : 0.0052 recall@50 : 0.0134 ndcg@10 : 0.0013 ndcg@20 : 0.0023 ndcg@50 : 0.0044
26 Nov 22:20 INFO Saving current best: saved/NCL-Nov-26-2022_22-20-46.pth
26 Nov 22:20 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:20 INFO epoch 1 training [time: 3.55s, train_loss1: 22.1781, train_loss2: 0.0056]
26 Nov 22:20 INFO epoch 1 evaluating [time: 0.25s, valid_score: 0.001300]
26 Nov 22:20 INFO valid result:
recall@10 : 0.0021 recall@20 : 0.0049 recall@50 : 0.0127 ndcg@10 : 0.0013 ndcg@20 : 0.0021 ndcg@50 : 0.0042
26 Nov 22:20 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:20 INFO epoch 2 training [time: 1.40s, train_loss1: 22.1778, train_loss2: 0.0035]
26 Nov 22:20 INFO epoch 2 evaluating [time: 0.26s, valid_score: 0.001500]
26 Nov 22:20 INFO valid result:
recall@10 : 0.0024 recall@20 : 0.0044 recall@50 : 0.0125 ndcg@10 : 0.0015 ndcg@20 : 0.0021 ndcg@50 : 0.0042
26 Nov 22:20 INFO Saving current best: saved/NCL-Nov-26-2022_22-20-46.pth
26 Nov 22:20 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 3 training [time: 2.12s, train_loss1: 22.1775, train_loss2: 0.0027]
26 Nov 22:21 INFO epoch 3 evaluating [time: 0.22s, valid_score: 0.001400]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0023 recall@20 : 0.0044 recall@50 : 0.0133 ndcg@10 : 0.0014 ndcg@20 : 0.0021 ndcg@50 : 0.0043
26 Nov 22:21 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 4 training [time: 3.51s, train_loss1: 22.1772, train_loss2: 0.0022]
26 Nov 22:21 INFO epoch 4 evaluating [time: 0.23s, valid_score: 0.001500]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0025 recall@20 : 0.0044 recall@50 : 0.0134 ndcg@10 : 0.0015 ndcg@20 : 0.002 ndcg@50 : 0.0043
26 Nov 22:21 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 5 training [time: 3.53s, train_loss1: 22.1769, train_loss2: 0.0019]
26 Nov 22:21 INFO epoch 5 evaluating [time: 0.23s, valid_score: 0.001500]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0025 recall@20 : 0.0041 recall@50 : 0.013 ndcg@10 : 0.0015 ndcg@20 : 0.002 ndcg@50 : 0.0042
26 Nov 22:21 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 6 training [time: 3.56s, train_loss1: 22.1764, train_loss2: 0.0016]
26 Nov 22:21 INFO epoch 6 evaluating [time: 0.25s, valid_score: 0.001500]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0024 recall@20 : 0.0046 recall@50 : 0.0126 ndcg@10 : 0.0015 ndcg@20 : 0.0021 ndcg@50 : 0.0041
26 Nov 22:21 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 7 training [time: 3.51s, train_loss1: 22.1759, train_loss2: 0.0014]
26 Nov 22:21 INFO epoch 7 evaluating [time: 0.26s, valid_score: 0.001400]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0023 recall@20 : 0.005 recall@50 : 0.0125 ndcg@10 : 0.0014 ndcg@20 : 0.0022 ndcg@50 : 0.004
26 Nov 22:21 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 8 training [time: 3.49s, train_loss1: 22.1753, train_loss2: 0.0012]
26 Nov 22:21 INFO epoch 8 evaluating [time: 0.23s, valid_score: 0.001300]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0022 recall@20 : 0.0046 recall@50 : 0.0126 ndcg@10 : 0.0013 ndcg@20 : 0.002 ndcg@50 : 0.004
26 Nov 22:21 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 9 training [time: 3.47s, train_loss1: 22.1747, train_loss2: 0.0011]
26 Nov 22:21 INFO epoch 9 evaluating [time: 0.29s, valid_score: 0.001400]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0024 recall@20 : 0.0048 recall@50 : 0.0131 ndcg@10 : 0.0014 ndcg@20 : 0.0022 ndcg@50 : 0.0042
26 Nov 22:21 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 10 training [time: 3.53s, train_loss1: 22.1739, train_loss2: 0.0010]
26 Nov 22:21 INFO epoch 10 evaluating [time: 0.25s, valid_score: 0.001400]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0021 recall@20 : 0.0048 recall@50 : 0.0127 ndcg@10 : 0.0014 ndcg@20 : 0.0023 ndcg@50 : 0.0042
26 Nov 22:21 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 11 training [time: 1.40s, train_loss1: 22.1729, train_loss2: 0.0009]
26 Nov 22:21 INFO epoch 11 evaluating [time: 0.24s, valid_score: 0.001400]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0021 recall@20 : 0.0051 recall@50 : 0.0131 ndcg@10 : 0.0014 ndcg@20 : 0.0023 ndcg@50 : 0.0043
26 Nov 22:21 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 12 training [time: 1.65s, train_loss1: 22.1718, train_loss2: 0.0009]
26 Nov 22:21 INFO epoch 12 evaluating [time: 0.23s, valid_score: 0.001500]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0022 recall@20 : 0.0055 recall@50 : 0.0126 ndcg@10 : 0.0015 ndcg@20 : 0.0025 ndcg@50 : 0.0043
26 Nov 22:21 INFO Running E-step !
WARNING clustering 4610 points to 2000 centroids: please provide at least 78000 training points
WARNING clustering 4138 points to 2000 centroids: please provide at least 78000 training points
26 Nov 22:21 INFO epoch 13 training [time: 3.52s, train_loss1: 22.1705, train_loss2: 0.0008]
26 Nov 22:21 INFO epoch 13 evaluating [time: 0.22s, valid_score: 0.001500]
26 Nov 22:21 INFO valid result:
recall@10 : 0.0021 recall@20 : 0.0057 recall@50 : 0.0129 ndcg@10 : 0.0015 ndcg@20 : 0.0025 ndcg@50 : 0.0044
26 Nov 22:21 INFO Finished training, best eval result in epoch 2
26 Nov 22:21 INFO Loading model structure and parameters from saved/NCL-Nov-26-2022_22-20-46.pth
26 Nov 22:21 INFO best valid : {'recall@10': 0.0024, 'recall@20': 0.0044, 'recall@50': 0.0125, 'ndcg@10': 0.0015, 'ndcg@20': 0.0021, 'ndcg@50': 0.0042}
26 Nov 22:21 INFO test result: {'recall@10': 0.0024, 'recall@20': 0.0058, 'recall@50': 0.0133, 'ndcg@10': 0.0016, 'ndcg@20': 0.0026, 'ndcg@50': 0.0045}
