Train on custom dataset #7

mehrazi · 2022-11-20T12:43:58Z

I want to train on a custom dataset for a recommendation task.
The dataset is user/group data, where users participate in groups.
How should I split the data into train and test sets?
When I randomly select some nodes and edges, I get this error:
  File "/home/BiGI/BiGI_src/utils/GraphMaker.py", line 135, in preprocessing
    UV_adj = sp.coo_matrix((np.ones(UV_edges.shape[0]), (UV_edges[:, 0], UV_edges[:, 1])),
  File "/home/.local/lib/python3.8/site-packages/scipy/sparse/coo.py", line 196, in __init__
    self._check()
  File "/home/.local/lib/python3.8/site-packages/scipy/sparse/coo.py", line 283, in _check
    raise ValueError('row index exceeds matrix dimensions')
ValueError: row index exceeds matrix dimensions

caojiangxia · 2022-11-21T08:08:25Z

Hi, thank you for your attention to our work.

The likely cause is that the user/item ids need to be re-indexed to start from 0. After re-indexing, use the pre-processed dataset as input.
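
A minimal sketch of such a re-indexing step, assuming the edges are available as (user, item) pairs. The function name `reindex_edges` and the sample ids are hypothetical and not part of the BiGI code.

```python
def reindex_edges(edges):
    """Map raw user and item ids to contiguous integers starting at 0.

    Users and items get separate id spaces, assigned in order of first appearance.
    """
    user_map, item_map = {}, {}
    reindexed = []
    for user, item in edges:
        # setdefault assigns the next free index the first time an id is seen.
        u = user_map.setdefault(user, len(user_map))
        i = item_map.setdefault(item, len(item_map))
        reindexed.append((u, i))
    return reindexed, user_map, item_map


raw_edges = [(1, 5), (2, 6), (3, 7), (4, 8)]
edges, user_map, item_map = reindex_edges(raw_edges)
print(edges)  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

Keeping `user_map` and `item_map` makes it possible to translate predictions back to the original ids later.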

mehrazi · 2022-11-21T15:37:13Z

Thank you for your reply.
You are right, and I'm aware of this problem.
I'm re-indexing the user/item ids from zero, but when I randomly select some users/items for testing, some indices are removed from the training data, which causes this problem.
I want to find a way to split the user/item ids into training and test parts without this happening.
For example, if I have data like this:
1 5
2 6
3 7
4 8
and pick 3 7 for the test set, the problem occurs.
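
The gap described above can be checked before training with a small diagnostic. This is only a sketch: it assumes the preprocessing sizes the adjacency matrix from the ids present in the training edges, which this thread does not confirm, and the helper name `missing_ids` is hypothetical.

```python
def missing_ids(train_edges, column):
    """Return the ids in 0..max that never occur in one column of train_edges.

    column 0 checks user ids, column 1 checks item ids.
    """
    seen = {edge[column] for edge in train_edges}
    return sorted(set(range(max(seen) + 1)) - seen)


# The four edges from the example, re-indexed from zero.
all_edges = [(0, 0), (1, 1), (2, 2), (3, 3)]
test_edges = [(2, 2)]  # corresponds to picking "3 7" for the test set
train_edges = [e for e in all_edges if e not in test_edges]

print(missing_ids(train_edges, 0))  # [2]: user id missing from training
print(missing_ids(train_edges, 1))  # [2]: item id missing from training
```

An empty list for both columns means the training ids are contiguous from 0.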

caojiangxia · 2022-11-22T03:38:33Z

You can try it this way.

For each user, randomly select 80% of their interactions for training and use the rest for testing.

If an item id still ends up missing from the training set (low probability), repeat the random selection.
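
The suggestion above could be sketched roughly as follows. This is not code from the BiGI repository: the function name `split_per_user`, its parameters, and the toy data are assumptions for illustration, and the edges are assumed to be already re-indexed (user, item) pairs.

```python
import random
from collections import defaultdict


def split_per_user(edges, train_ratio=0.8, max_tries=100, seed=0):
    """Per-user random split that retries until every item appears in training."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item in edges:
        by_user[user].append(item)
    all_items = {item for _, item in edges}

    for _ in range(max_tries):
        train, test = [], []
        for user, items in by_user.items():
            shuffled = items[:]
            rng.shuffle(shuffled)
            # Keep at least one interaction per user in training,
            # so no user id disappears from the training set.
            n_train = max(1, int(round(train_ratio * len(shuffled))))
            train += [(user, i) for i in shuffled[:n_train]]
            test += [(user, i) for i in shuffled[n_train:]]
        # Accept the split only if no item id is missing from training.
        if {item for _, item in train} == all_items:
            return train, test
    raise RuntimeError("no split found that keeps every item in training")


# Toy data: 4 users, each interacting with the same 5 items.
edges = [(u, i) for u in range(4) for i in range(5)]
train, test = split_per_user(edges)
print(len(train), len(test))  # 16 4
```

Because each user keeps at least one training interaction, only item ids can go missing, which is why the retry checks items alone. If some items are so rare that no 80/20 split can keep them all in training, the function raises after `max_tries` attempts instead of looping forever.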

mehrazi · 2022-11-23T08:23:47Z

The problem is solved.
Thanks for your hints.

mehrazi closed this as completed Nov 23, 2022
