Train on custom dataset #7

Closed
mehrazi opened this issue Nov 20, 2022 · 4 comments

@mehrazi

mehrazi commented Nov 20, 2022

I want to train on a custom dataset for a recommendation task.
The dataset consists of user/group data, where each user participates in one or more groups.
How should I split the data into training and test sets?
When I randomly select some nodes and edges for the test set, I get this error:
File "/home/BiGI/BiGI_src/utils/GraphMaker.py", line 135, in preprocessing
    UV_adj = sp.coo_matrix((np.ones(UV_edges.shape[0]), (UV_edges[:, 0], UV_edges[:, 1])),
File "/home/.local/lib/python3.8/site-packages/scipy/sparse/coo.py", line 196, in __init__
    self._check()
File "/home/.local/lib/python3.8/site-packages/scipy/sparse/coo.py", line 283, in _check
    raise ValueError('row index exceeds matrix dimensions')
ValueError: row index exceeds matrix dimensions

@caojiangxia
Owner

Hi, thank you for your interest in our work.

The likely cause is that the user/item IDs are not re-indexed to start from 0. Re-index them first, then use the pre-processed dataset as input.
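
As a rough illustration of the re-indexing step, here is a minimal sketch in plain Python. The function name `reindex` and the sample IDs are hypothetical and are not part of the BiGI code base; the sketch only assumes that the input is a list of (user, item) pairs with arbitrary raw IDs.

```python
def reindex(pairs):
    """Map raw user/item IDs to contiguous indices starting at 0.

    `pairs` is a list of (raw_user_id, raw_item_id) tuples.
    Returns the re-indexed pairs plus both ID mappings.
    """
    user_map, item_map = {}, {}
    reindexed = []
    for raw_user, raw_item in pairs:
        # A new ID gets the next free index; a known ID keeps its index.
        user_idx = user_map.setdefault(raw_user, len(user_map))
        item_idx = item_map.setdefault(raw_item, len(item_map))
        reindexed.append((user_idx, item_idx))
    return reindexed, user_map, item_map


# Hypothetical raw IDs, for illustration only.
raw_pairs = [(10, 500), (42, 500), (10, 731)]
pairs, user_map, item_map = reindex(raw_pairs)
print(pairs)  # [(0, 0), (1, 0), (0, 1)]
```

Keeping the two mappings around makes it possible to translate predictions back to the original IDs later.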

@mehrazi
Author

mehrazi commented Nov 21, 2022

Thank you for your reply.
You are right, and I'm aware of this problem.
I already re-index the user/item IDs from zero, but when I randomly select some user/item pairs for testing, some indices disappear from the training data, which causes this error.
I want to find a way to split the user/item IDs into training and test sets without this happening.
For example, if I have data like this:
1 5
2 6
3 7
4 8
and I pick 3 7 for the test set, the problem occurs, because user 3 and item 7 no longer appear in the training data.

@caojiangxia
Owner

You can try the following approach.

For each user, randomly select 80% of their interactions for training and use the rest for testing.

If an item ID still causes the above problem, repeat the split (this should happen with low probability).
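
The suggestion above can be sketched roughly as follows. This is not BiGI's own preprocessing code; the function name `split_per_user` and its parameters are hypothetical. The sketch assumes the data is a list of (user, item) pairs that are already re-indexed from 0, keeps at least one interaction per user in the training set, and repeats the split until every item also appears in training.

```python
import random
from collections import defaultdict


def split_per_user(pairs, train_ratio=0.8, seed=0, max_tries=100):
    """Per-user random split that keeps every user and item in training."""
    by_user = defaultdict(list)
    for user, item in pairs:
        by_user[user].append(item)
    all_items = {item for _, item in pairs}

    rng = random.Random(seed)
    for _ in range(max_tries):
        train, test = [], []
        for user, items in by_user.items():
            shuffled = items[:]
            rng.shuffle(shuffled)
            # Keep at least one interaction so the user stays in training.
            n_train = max(1, round(train_ratio * len(shuffled)))
            train += [(user, item) for item in shuffled[:n_train]]
            test += [(user, item) for item in shuffled[n_train:]]
        # Accept the split only if no item index vanished from training.
        if {item for _, item in train} == all_items:
            return train, test
    raise RuntimeError("no split kept every item in the training set")


# Toy data: 5 users who each interacted with the same 10 items.
pairs = [(u, i) for u in range(5) for i in range(10)]
train, test = split_per_user(pairs)
print(len(train), len(test))  # 40 10
```

Note that when every user and item has only a single interaction, as in the four-row example earlier in this thread, nothing can be held out without removing an index, so this sketch places all pairs in the training set and leaves the test set empty.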

@mehrazi
Author

mehrazi commented Nov 23, 2022

The problem is solved.
Thanks for your hints.

@mehrazi mehrazi closed this as completed Nov 23, 2022