
Loss not reducing, high validation and test metric values #11

Open
parth-shettiwar opened this issue Jul 9, 2022 · 10 comments
Comments

@parth-shettiwar

I tried to run the code with the DLA algorithm on the Yahoo dataset; the output is attached below. I am not sure how to interpret the following observation: I am getting an almost constant training loss of about 4 (with the rank loss and exam loss each at about 2), together with high validation and test metric values of more than 0.9. I did check the parameter values of the two models, and they are actually updating. The loss just keeps fluctuating in the range of 3.9 to 4.5. Is there something I should change in the hyperparameters? I have kept the default learning rate of 0.05 and selection_bias_cutoff = 10. This is with respect to the PyTorch implementation of the code.
[Screenshot: training output log with loss values and validation/test metrics]

@rowedenny
Collaborator

Yes, I think the value of the loss looks normal. As you mentioned, the rank loss is about 2. Keep in mind that it is a listwise loss computed over 10 documents, so the cross-entropy is about 0.2 per document on average, which is OK.
I would encourage you to run 10K steps and then check the performance on the test set. In general, ndcg@10 on Yahoo with DLA should be approximately 0.756. If your result is much lower than that, then something is likely wrong.
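
For intuition, here is a minimal sketch (not the repository's exact code) of a listwise softmax cross-entropy over a 10-document list; with a single click and near-uniform scores the per-list value lands around 2, i.e. roughly 0.2 per document:

```python
import torch
import torch.nn.functional as F

# Hypothetical example: one query, 10 documents (selection_bias_cutoff = 10).
scores = torch.randn(1, 10)        # ranker outputs, shape [batch, list_size]
clicks = torch.zeros(1, 10)
clicks[0, 0] = 1.0                 # one simulated click

# Listwise softmax cross-entropy, summed over the list:
#   loss = -sum_i click_i * log_softmax(scores)_i
log_probs = F.log_softmax(scores, dim=-1)
rank_loss = -(clicks * log_probs).sum(dim=-1).mean()
print(rank_loss.item())            # around log(10) ~= 2.3 for near-uniform scores
```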

@parth-shettiwar
Author

Thanks @rowedenny for the clarification. Another thing I wanted to ask: the training output scores for the documents keep increasing with the number of steps and even reach the range of 10^5. Since it is a log-softmax and only the relative differences between document scores matter, the loss stays almost the same.
For now, I tried adding a small L2 regularization during training to keep this in check, and the training output scores now stay between 0 and 10, but is this required, or am I missing something?
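
For reference, the shift-invariance mentioned above is easy to verify; this is just an illustrative snippet, not code from the repository:

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[3.0, 1.5, -0.25]], dtype=torch.float64)
shifted = scores + 1e5   # add the same large constant to every document's score

# log_softmax depends only on the score differences within a list, so shifting
# all scores by a constant leaves the listwise loss unchanged.
print(torch.allclose(F.log_softmax(scores, dim=-1),
                     F.log_softmax(shifted, dim=-1)))   # True
```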

@rowedenny
Collaborator

Yes, I do observe that the scale of the output scores keeps increasing. However, for ranking tasks trained with pairwise or listwise approaches, the scores are not comparable between different queries anyway. In other words, we usually care about the order within a query and rarely worry about scores across different queries.

In addition, L2 regularization may help, but the ranking model's performance may suffer.
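
If one does want to keep the score scale in check, a common option is an L2 penalty via weight decay. Here is a minimal sketch with a hypothetical stand-in model; the repository may also expose its own regularization hyperparameter, so check the algorithm's config first:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the ranking model; in practice this comes from ULTRA.
ranking_model = nn.Linear(700, 1)

# weight_decay applies an L2 penalty on all parameters at every update step.
optimizer = torch.optim.SGD(ranking_model.parameters(), lr=0.05, weight_decay=1e-5)

# Equivalent explicit form, folded into the loss instead:
#   l2 = sum(p.pow(2).sum() for p in ranking_model.parameters())
#   loss = rank_loss + 1e-5 * l2
# (Penalizing the output scores directly is another option, with a similar effect
#  on their magnitude.)
```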

@parth-shettiwar
Author

parth-shettiwar commented Jul 25, 2022

Adding on, I am a bit confused about the current implementation of the loss. As described in the paper, the variable pi_q represents the ranked list produced by our model. However, in the code the train_outputs are not sorted at all. This would lead to an incorrect multiplication of weights while computing both the IRW and IPW losses.
What exactly is the ranked list pi_q referring to? (The initial ranked list produced by the SVM ranker, or the list produced by the ranking model?)
Attached is the code for the computation of the loss.
[Screenshot: the loss computation code]

@rowedenny
Collaborator

rowedenny commented Jul 25, 2022

However, in the code the train_outputs are not sorted at all. This would lead to an incorrect multiplication of weights while computing both the IRW and IPW losses.

Actually, there is no need to sort, because train_output contains the predictions for positions 1 to 10 and the model is optimized against the click labels from positions 1 to 10, so the two are already matched.

Recall that we optimize the IPW or IRW loss with the clicked items, so the multiplication here simply ignores the items without clicks and assigns them a small value.
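
A rough sketch of that alignment and of the click-masked, propensity-weighted loss (a simplification with hypothetical variable names, not the exact DLA code):

```python
import torch
import torch.nn.functional as F

# train_output[:, i] is the score predicted for the document shown at position
# i+1, and train_labels[:, i] is the click observed at that position, so the two
# are already aligned by position and no sorting is required.
train_output = torch.randn(4, 10)                     # [batch, selection_bias_cutoff]
train_labels = torch.randint(0, 2, (4, 10)).float()   # simulated clicks
propensity_weights = torch.rand(4, 10) + 0.5          # hypothetical inverse propensities

log_probs = F.log_softmax(train_output, dim=-1)
# Multiplying by the click labels zeroes out unclicked positions, so only
# clicked documents contribute to the propensity-weighted rank loss.
ipw_rank_loss = -(train_labels * propensity_weights * log_probs).sum(dim=-1).mean()
```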

@parth-shettiwar
Author

Thanks for the reply

  1. "Model is optimized with labels via clicks", Is this due to initial ordering from SVM ranker at start?
  2. I am currently trying to obtain results on a different dataset, will:
    a) The model still be optimized for positions from 1 to 10 ?
    b) What about for positions after 10?
    c) Will sorting train_output affect the performance ?

@rowedenny
Collaborator

  1. "Model is optimized with labels via clicks", Is this due to initial ordering from SVM ranker at start?

Unbiased Learning to Rank uses clicks as positive labels while unclicks as negative labels.

  1. I am currently trying to obtain results on a different dataset, will:
    a) The model still be optimized for positions from 1 to 10 ?

We predict 1 to 10 because of the selection_bias_cutoff, which indicates the maximum number of items that users can see based on the assumption.

b) What about for positions after 10?
It depends on the examination model, specifically the examination probability for items after position 10. If the prob is 0, then it means item after pos@10 will not be examined.

c) Will sorting train_output affect the performance ?

No, it will not.
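
As an illustration of the cutoff behavior discussed above, here is a minimal sketch (not the repository's data pipeline; all variable names are hypothetical):

```python
selection_bias_cutoff = 10

# Only the top `selection_bias_cutoff` positions of the initially ranked list are
# assumed to be examinable; documents below the cutoff are treated as having
# examination probability 0 and therefore never receive (simulated) clicks.
ranked_doc_ids = list(range(25))                          # a hypothetical 25-document list
examinable = ranked_doc_ids[:selection_bias_cutoff]       # positions 1..10
never_examined = ranked_doc_ids[selection_bias_cutoff:]   # positions 11..25
```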

@parth-shettiwar
Author

parth-shettiwar commented Jul 25, 2022

I am still unsure about the underlying assumption here. As we know, the examination model does not depend on the documents' features. Every time, we feed it a one-hot encoded vector for each rank position and get representation weights for them.
So are we assuming some sort of ordering of the documents while loading the batch? Can we load the query-document pairs in any order, even keeping the relevant documents at the end?
(In the formulation in the paper, the observation probability of a document is denoted by o^x_q for document x and query q, showing a dependence on the document x. But in the implementation in the codebase, I don't see any dependence of the observation probability on the document x or on its rank.)
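
For concreteness, here is a minimal sketch of a position-only examination model of the kind being discussed (hypothetical code, not necessarily the repository's propensity model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical position-only examination model: the input is a one-hot encoding
# of the rank position, so the learned observation probability depends only on
# where a document was shown, not on the document's features.
list_size = 10
propensity_net = nn.Linear(list_size, 1)

positions = F.one_hot(torch.arange(list_size), num_classes=list_size).float()
examination_logits = propensity_net(positions).squeeze(-1)  # one logit per position 1..10
examination_probs = torch.softmax(examination_logits, dim=-1)

# Every document inherits the examination probability of the slot it occupied in
# the presented list, regardless of its content.
print(examination_probs)
```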

@rowedenny
Collaborator

rowedenny commented Jul 25, 2022 via email

@parth-shettiwar
Author

Thanks. Is there a particular timestamp or page you would like me to go through to address the above query?
