Datasets are not sorted by time, model uses information from the future #3
Comments
Hello, your finding is really crucial. How about the results on the other datasets? Are they incorrect as well, like those for ml-100k?
Hello, I haven't checked the results for the other datasets. I've only checked whether there was a similar sorting problem, and unfortunately there was. If you run process_amazon.py without any changes, the output isn't sorted by timestamp, so I would expect it to affect the results as well. I'm not sure in what way, though.
Hi all, The codebase did have a bug: the indices were not sorted by timestamp. I have updated the preprocessing code for the Amazon and ml-100k datasets. We will rerun the experiments as soon as we can. Best to all, Ziwei
Hello, may I ask you some questions about the code of this paper? I'm sorry to disturb you. Could you leave some contact information, if it's convenient for you?
Has the paper been updated?
I went through the code and one thing is bothering me. I think there is a major bug in the implementation. It is possible that I don't understand something, so please correct me if I'm wrong, but as far as I currently understand, this code trains and validates using information "from the future".
If you examine the values in the code below, you will see that there are negative values for the delta.
TGSRec/model.py
Line 557 in 0c7ba17
I can see that the mask is created only for the 0 values, so negative values are still used.
TGSRec/model.py
Line 575 in 0c7ba17
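To make the concern concrete, here is a minimal sketch, not the code from model.py. It assumes the delta is the query time minus the neighbor's timestamp and that id 0 marks padding; `time_deltas`, `padding_mask`, and `causal_mask` are hypothetical names used only for illustration. A mask built from the 0 entries alone leaves a neighbor with a negative delta (an interaction later than the query time) unmasked.

```python
def time_deltas(cut_time, neighbor_times):
    # Positive delta: neighbor interaction is in the past.
    # Negative delta: neighbor interaction is in the future.
    return [cut_time - t for t in neighbor_times]


def padding_mask(neighbor_ids):
    # Masks only padding slots (assumed here to use id 0).
    return [n == 0 for n in neighbor_ids]


def causal_mask(neighbor_ids, deltas):
    # Hypothetical stricter mask: also hides neighbors from the future.
    return [n == 0 or d < 0 for n, d in zip(neighbor_ids, deltas)]


# Made-up example: slot 0 is padding, neighbor 12 interacts at t=130,
# which is after the query time t=100.
neighbor_ids = [0, 7, 12, 5]
neighbor_times = [0.0, 90.0, 130.0, 100.0]
cut_time = 100.0

deltas = time_deltas(cut_time, neighbor_times)
print(deltas)                             # [100.0, 10.0, -30.0, 0.0]
print(padding_mask(neighbor_ids))         # [True, False, False, False]
print(causal_mask(neighbor_ids, deltas))  # [True, False, True, False]
```

With the padding-only mask, neighbor 12 still contributes even though its delta is -30.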
The data here is sorted by edge_ids, not timestamps, so a possible fix would be to sort by x[2] instead of x[1].
TGSRec/graph.py
Lines 34 to 39 in 0c7ba17
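A small sketch of why the sort key matters, under two assumptions that are mine and not confirmed by the repository: each adjacency entry is a `(neighbor_id, edge_id, timestamp)` tuple (consistent with x[1] being the edge id and x[2] the timestamp), and neighbors before a query time are found with a binary search over the timestamps. `neighbors_before` is a hypothetical helper.

```python
from bisect import bisect_left


def neighbors_before(entries, cut_time):
    # Binary search is only valid if the timestamps are in ascending order.
    timestamps = [e[2] for e in entries]
    return entries[:bisect_left(timestamps, cut_time)]


# Made-up entries where edge ids do not follow time order.
entries = [(3, 2, 50.0), (4, 1, 80.0), (5, 3, 20.0)]

by_edge = sorted(entries, key=lambda x: x[1])  # sorted by edge id
by_time = sorted(entries, key=lambda x: x[2])  # sorted by timestamp

cut_time = 60.0
print(neighbors_before(by_time, cut_time))  # only entries with timestamp < 60
print(neighbors_before(by_edge, cut_time))  # also returns the t=80 entry
```

When the list is sorted by edge id, the search runs over unsorted timestamps and the returned slice can contain interactions that happen after the query time.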
If you look at:
TGSRec/datasets/ml-100k/u.data
the data is not sorted by timestamp, and as far as I can tell there is no point in your codebase where this sorting happens.
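For reference, a rough sketch of how the input could be checked and sorted before training. It assumes the standard MovieLens layout of tab-separated `user`, `item`, `rating`, `timestamp` columns; the helper names and the sample rows are made up for illustration and are not taken from the repository or the dataset.

```python
def parse_rows(text):
    # Each line: user <TAB> item <TAB> rating <TAB> timestamp
    rows = []
    for line in text.strip().splitlines():
        user, item, rating, ts = line.split("\t")
        rows.append((int(user), int(item), int(rating), int(ts)))
    return rows


def is_time_sorted(rows):
    # True if timestamps never decrease from one row to the next.
    return all(a[3] <= b[3] for a, b in zip(rows, rows[1:]))


def sort_by_time(rows):
    # sorted() is stable, so rows with equal timestamps keep their order.
    return sorted(rows, key=lambda r: r[3])


# Made-up rows in the assumed layout, deliberately out of time order.
sample = "1\t10\t3\t300\n2\t11\t4\t100\n3\t12\t5\t200\n"

rows = parse_rows(sample)
print(is_time_sorted(rows))                # False
print(is_time_sorted(sort_by_time(rows)))  # True
```

Running `is_time_sorted` on the parsed file is a quick way to confirm whether a dataset needs this step.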
I ran experiments on ml-100k for both scenarios: your original implementation and the same code with the input data sorted by timestamp. The results I got with sorting are significantly worse, at least in the early stages of training. I haven't run it for 200 epochs, so maybe the final results are closer to each other, but first I would like to see if my assumption is correct.
Results after 20 epochs:
Without sorting:
With sorting: