About the Tmall datasets #3

Closed
xiaxin1998 opened this issue Dec 21, 2020 · 6 comments

Comments

@xiaxin1998

Hi,
Thanks for sharing this paper!
About the Tmall dataset: I found the website given in your paper and downloaded the data. However, the test set has no labels; only the training set does. Would you mind sharing the Tmall test set with the session labels?
Thanks!

@Mikrokosmos1997
Collaborator

Thank you for your interest in our paper.
The label information in the raw data indicates whether the user is a repeat buyer, which is not related to our task and has not been used in our work. For the Tmall dataset used in our work, the label of each session is the last item in the session.
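
The labelling rule described here (the last item of a session is its label) can be illustrated with a minimal sketch. The function name `split_session` and the example item IDs are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence, Tuple


def split_session(session: Sequence[int]) -> Tuple[List[int], int]:
    """Return (input_items, label), where the label is the last item clicked."""
    if len(session) < 2:
        # A single-item session has no input left once the label is removed.
        raise ValueError("a session needs at least two items")
    return list(session[:-1]), session[-1]


# Hypothetical session of item IDs, in click order.
inputs, label = split_session([12, 7, 7, 31])
print(inputs, label)
```

Under this rule no separate label file is needed: any session with at least two items carries its own target.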

@xiaxin1998
Author

Thanks for your response!
But when I preprocess the dataset, the statistics differ from those in your paper.
For the Tmall dataset, I use train_format2.csv and test_format2.csv from the website.
For preprocessing, I first extract all the sessions and keep the first 120000.
From those 120000 sessions, I remove sessions of length 1 and sessions longer than 40. Then I remove the items that appear fewer than 5 times, and then split the sessions.
But my dataset still differs from the one shown in Table 1 of your paper.
So would you mind sharing the preprocessed dataset or telling me where my preprocessing goes wrong?
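
For reference, the steps listed in this comment can be sketched roughly as follows. This is only one reading of the description, not the authors' pipeline: the function names, the parameter names, and the final re-check of session length after item filtering are assumptions.

```python
from collections import Counter
from typing import Dict, List


def preprocess(
    sessions: List[List[int]],
    max_sessions: int = 120_000,
    min_len: int = 2,
    max_len: int = 40,
    min_item_count: int = 5,
) -> List[List[int]]:
    # Step 1: keep only the first `max_sessions` sessions.
    kept = sessions[:max_sessions]
    # Step 2: drop sessions of length 1 and sessions longer than `max_len`.
    kept = [s for s in kept if min_len <= len(s) <= max_len]
    # Step 3: drop items that appear fewer than `min_item_count` times.
    counts = Counter(item for s in kept for item in s)
    kept = [[i for i in s if counts[i] >= min_item_count] for s in kept]
    # Assumption: sessions shortened below `min_len` by step 3 are dropped too.
    return [s for s in kept if len(s) >= min_len]


def stats(sessions: List[List[int]]) -> Dict[str, float]:
    """Summary numbers to compare against a published statistics table."""
    items = {i for s in sessions for i in s}
    total = sum(len(s) for s in sessions)
    avg = total / len(sessions) if sessions else 0.0
    return {"sessions": len(sessions), "items": len(items), "avg_length": avg}
```

The order of the steps matters: counting item frequencies before or after the length filter, or skipping the final length check, gives different session and item counts, which is one common reason reproduced statistics do not match a paper's table.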

@xiaxin1998
Author

We would appreciate it if you could share the preprocessed dataset. It would make it more convenient for us to compare against your model.

@xiaxin1998
Author

UPDATE: train_format2.csv contains only 4995 items, but according to your paper there should be 40728.
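
A count like this can be double-checked with a small sketch such as the one below. The column names `user_id` and `item_id` and the inline sample data are placeholders, not the real header of train_format2.csv, so the header of the actual file should be checked first to confirm which column holds item IDs.

```python
import csv
import io
from typing import Dict, Iterable, Set


def distinct_values(rows: Iterable[Dict[str, str]], column: str) -> Set[str]:
    """Collect the distinct non-empty values found in one CSV column."""
    return {row[column] for row in rows if row.get(column)}


# Tiny in-memory stand-in for a CSV file; replace with open("<path>", newline="").
sample = io.StringIO("user_id,item_id\n1,10\n1,11\n2,10\n")
reader = csv.DictReader(sample)
print(reader.fieldnames)  # inspect the header before choosing a column
print(len(distinct_values(reader, "item_id")))
```

Counting the wrong column, for example an ID column other than the item ID, is an easy way to end up with a number far from the published one.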

@Mikrokosmos1997
Collaborator

Thank you for your interest.
We have updated the preprocessed Tmall datasets. You can also download the raw Tmall data from the Dropbox link in the paper "Evaluation of Session-based Recommendation Algorithms".

@xiaxin1998
Author

Thank you very much!!!!!!!!!!


2 participants