
[🐛BUG] Dataset save_path mismatches load_path for sequential dataset #1697

Closed · ShadowTinker opened this issue Mar 16, 2023 · 6 comments
Labels: bug

ShadowTinker (Contributor) commented Mar 16, 2023

Describe the bug
I set save_dataset=True in recbole/properties/overall.yaml, but the model still re-processes the dataset on every run.

To Reproduce
Steps to reproduce the behavior:

  1. Set save_dataset=True in recbole/properties/overall.yaml.
  2. Execute python run_recbole.py --model=DIN --dataset=ml-100k.
  3. Observe the mismatched paths:
     • save_path: path_prefix/ml-100k-dataset.pth
     • load_path: path_prefix/ml-100k-SequentialDataset.pth
  4. Way to fix it: I changed the following line

         file = os.path.join(save_dir, f'{self.config["dataset"]}-dataset.pth')

     to

         file = os.path.join(save_dir, f'{self.config["dataset"]}-{self.__class__.__name__}.pth')

     and it works for DIN, AFM, BPR, and SASRec. A minimal sketch of the resulting convention follows this list.
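For illustration, here is a minimal sketch of the naming convention this fix establishes, with the class name used in both the save path and the load path. This is not RecBole's actual save/load code: DatasetSaveMixin, _checkpoint_path, and try_load are hypothetical names, and self.config is assumed to behave like a dict.

    import os
    import pickle


    class DatasetSaveMixin:
        def _checkpoint_path(self, save_dir):
            # e.g. "path_prefix/ml-100k-SequentialDataset.pth"
            return os.path.join(
                save_dir, f'{self.config["dataset"]}-{self.__class__.__name__}.pth'
            )

        def save(self, save_dir):
            # Write the processed dataset under the class-name-based filename.
            with open(self._checkpoint_path(save_dir), "wb") as f:
                pickle.dump(self, f)

        def try_load(self, save_dir):
            # Look for the same filename; return None to fall back to re-processing.
            path = self._checkpoint_path(save_dir)
            if os.path.isfile(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            return None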

Desktop (please complete the following information):

  • OS: Linux
  • RecBole Version 1.1.1
  • Python Version 3.9.15
  • PyTorch Version 1.13.1
  • cudatoolkit Version 11.6
ShadowTinker (Contributor, Author) commented Mar 16, 2023

Besides, I also found some bugs when loading saved dataloaders with save_dataloaders=True in the config file. In the following code, eval_neg_sample_args does not exist in the config, while valid_neg_sample_args and test_neg_sample_args do:

    def update_config(self, config):
        self._set_neg_sample_args(
            config, self._dataset, InputType.POINTWISE, config["eval_neg_sample_args"]
        )
        super().update_config(config)

This results in a KeyError at line 139 in the following code:

    def _set_neg_sample_args(self, config, dataset, dl_format, neg_sample_args):
        self.uid_field = dataset.uid_field
        self.iid_field = dataset.iid_field
        self.dl_format = dl_format
        self.neg_sample_args = neg_sample_args
        self.times = 1
        if (
            self.neg_sample_args["distribution"] == "uniform"
            or "popularity"
            and self.neg_sample_args["sample_num"] != "none"
        ):

I opened a pull request #1698 to fix the two problems mentioned above.
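For illustration only, a hypothetical sketch of reading the phase-specific key instead of the missing eval_neg_sample_args (the actual fix is in #1698 and may differ; resolve_neg_sample_args is an invented helper, and config is assumed to behave like a dict):

    def resolve_neg_sample_args(config, phase):
        # phase is "valid" or "test"; the config holds "valid_neg_sample_args"
        # and "test_neg_sample_args" rather than a single "eval_neg_sample_args".
        return config[f"{phase}_neg_sample_args"]

    # e.g. inside update_config, instead of config["eval_neg_sample_args"]:
    # neg_sample_args = resolve_neg_sample_args(config, phase="valid")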

Paitesanshi self-assigned this Mar 18, 2023
Paitesanshi (Collaborator) commented
Hello @ShadowTinker,

Thank you for your attention and contributions to RecBole! You are correct that there are some bugs. To improve the code structure and keep the changes simple, I have made some minor modifications. Once you have reviewed the modifications in #1698, the updates can be merged into the main code.

Thank you again for your support of our team!

ShadowTinker (Contributor, Author) commented
Hello @Paitesanshi,

Thank you for your reply! However, I just found another problem: the saved dataloaders are still loaded after modifying train_batch_size or eval_batch_size, so the batch size is not updated accordingly. This is caused by the following lines:

    for arg in dataset_arguments + ["seed", "repeatable", "eval_args"]:
        if config[arg] != train_data.config[arg]:
            return None

where neither train_batch_size nor eval_batch_size is checked.

But I'm not sure whether this is intentional, because these arguments are usually fixed. So I wonder whether the two arguments should be added to the check above (see the sketch below).
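As an illustration of that suggestion only (not the merged behavior; dataloaders_reusable is an invented helper name, and config and train_data.config are assumed to behave like dicts):

    def dataloaders_reusable(config, train_data, dataset_arguments):
        # Treat a change in either batch size as a reason to rebuild the
        # dataloaders instead of loading the cached ones.
        checked_args = dataset_arguments + [
            "seed",
            "repeatable",
            "eval_args",
            "train_batch_size",  # proposed addition
            "eval_batch_size",   # proposed addition
        ]
        return all(config[arg] == train_data.config[arg] for arg in checked_args)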

Paitesanshi (Collaborator) commented
@ShadowTinker We think that a user who loads an existing dataloader intends to use a fixed batch_size. We will consider whether to modify this setting after collecting more user feedback. Thanks again for your suggestion!

ShadowTinker (Contributor, Author) commented
@Paitesanshi Thank you for the explanation. I've reviewed the modifications in #1698, and it is my honor to contribute to RecBole. Thank you again for the great repository and your contributions to the community.

Paitesanshi added a commit that referenced this issue Mar 18, 2023
Fix: fix saving and loading for datasets and dataloaders (#1697)
christopheralex commented
This fails again when run with multiprocessing on 4 GPUs on one node. One process writes the file to disk while another process sees that the file exists and tries to read it. Even if that race does not occur, four processes writing the same pickle file in "wb" mode will leave it corrupt when it is loaded later for inference.
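A common mitigation for this kind of race, sketched here as a suggestion rather than anything RecBole currently does: let only rank 0 write the cached file, and make the other ranks wait at a barrier before reading it (save_dataset_once is an invented helper).

    import os
    import pickle

    import torch.distributed as dist


    def save_dataset_once(dataset, path):
        is_dist = dist.is_available() and dist.is_initialized()
        if not is_dist or dist.get_rank() == 0:
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump(dataset, f)
            os.replace(tmp, path)  # atomic rename so readers never see a partial file
        if is_dist:
            dist.barrier()  # other ranks continue only after the file is complete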
