reshape data in dataset.py #3

Closed
ZiweiHou opened this issue Oct 21, 2022 · 4 comments
Comments

@ZiweiHou

Hi,

In line 10 of dataset.py, the feature is reshaped to (1+neg_num, feature.shape[-1]).
What is neg_num, and why does the feature need to be reshaped?

@MogicianXD
Owner

We need negative samples to compute the cross-entropy loss, which is the traditional way to train FM models.
See our loss function:

    def fit_nll_neg(self, input_batch, epsilon=1e-9):
        preds = torch.sigmoid(self.forward(input_batch))
        cost = - torch.log(preds[:, 0] + epsilon).sum() - torch.log(1 - preds[:, 1:] + epsilon).sum()
        return cost / preds.shape[0]
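
Written out, the loss above corresponds to the formula below. The symbols are introduced here for illustration and do not appear in the code: $s_{i,j}$ is the raw score `self.forward(input_batch)[i, j]`, $\sigma$ is the sigmoid, $B$ is `preds.shape[0]`, and $N$ is neg_num, assuming `preds` has 1 + neg_num columns with the positive sample in column 0.

$$
\mathcal{L} = -\frac{1}{B}\left[\sum_{i=1}^{B}\log\big(\sigma(s_{i,0})+\epsilon\big) + \sum_{i=1}^{B}\sum_{j=1}^{N}\log\big(1-\sigma(s_{i,j})+\epsilon\big)\right]
$$

The first sum pushes the positive score up, and the second pushes each negative score down.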

In my data file, each positive sample comes first, followed by its negative samples, then the next positive sample with its negatives, and so on. So the raw data has shape (n_samples * (1 + neg_num)) $\times$ n_features. We need to reshape it to n_samples $\times$ (1 + neg_num) $\times$ n_features, so that each row the dataloader reads holds one positive sample together with its negatives.
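
The layout described above can be illustrated with a small NumPy sketch. This is not the code from dataset.py: the sizes, the toy data, and the label stored in column 0 are all made up for illustration, and the leading `-1` in the reshape is an assumption about how the sample dimension is inferred.

```python
import numpy as np

neg_num = 2     # hypothetical: negatives per positive sample
n_samples = 3   # hypothetical: number of positive samples
n_features = 4  # hypothetical: features per row

# Build a flat file-like layout: one positive row, then its negatives, repeated.
# Column 0 is a made-up marker (1 = positive, 0 = negative) used only for checking.
rows = []
for i in range(n_samples):
    rows.append([1, i, i, i])
    for _ in range(neg_num):
        rows.append([0, i, i, i])
flat = np.array(rows)
assert flat.shape == (n_samples * (1 + neg_num), n_features)

# Group every (1 + neg_num) consecutive rows into one training example.
grouped = flat.reshape(-1, 1 + neg_num, flat.shape[-1])

print(grouped.shape)     # (n_samples, 1 + neg_num, n_features)
print(grouped[:, 0, 0])  # marker of the first row in each group (the positives)
```

Because a reshape keeps row order, index 0 along the second axis is always the positive sample, which is what `preds[:, 0]` in the loss relies on.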

@ZiweiHou
Author

Hi @MogicianXD,

Thanks for your reply. Is there any particular reason for using such a data format? Can you release an example of your data?

@MogicianXD
Owner

No particular reason. The data format is determined by your data preprocessing.
I've pushed the frappe dataset now. The code was written two years ago, and I haven't tested whether it still works.

@ZiweiHou
Author

Thank you soooo much!

MogicianXD mentioned this issue Sep 25, 2023