
Loss not reducing, high validation and test metric values #11

Open
parth-shettiwar opened this issue Jul 9, 2022 · 10 comments
Comments

@parth-shettiwar

I tried to run the code with the DLA algorithm on the Yahoo dataset; the output is attached below. I am not sure how to interpret the following observation: I am getting an almost constant training loss of about 4 (with the rank loss and exam loss each at about 2), together with high validation and test metric values of more than 0.9. I did check the parameter values of the two models, and they are actually updating. The loss just keeps fluctuating in the range of 3.9 to 4.5. Is there something I should change in the hyperparameters? I have kept the default learning rate of 0.05 and selection_bias_cutoff = 10. This is with respect to the PyTorch implementation of the code.
[Screenshot: training output log with loss values and validation/test metrics]

@rowedenny
Collaborator

Yes, I think the value of the loss looks normal. As you mentioned, the rank loss is about 2. Keep in mind that it is a listwise loss computed over 10 documents, so the cross-entropy is about 0.2 per document on average, which is OK.
I would encourage you to run 10K steps and then check the performance on the test set. In general, ndcg@10 on Yahoo with DLA should be approximately 0.756. If your result is much lower than that, then something is likely wrong.
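
For intuition, here is a minimal sketch (not the repository's exact code) of a listwise softmax cross-entropy over a 10-document list; with a single click and near-uniform scores the per-list value lands around 2, i.e. roughly 0.2 per document:

```python
import torch
import torch.nn.functional as F

# Hypothetical example: one query, 10 documents (selection_bias_cutoff = 10).
scores = torch.randn(1, 10)        # ranker outputs, shape [batch, list_size]
clicks = torch.zeros(1, 10)
clicks[0, 0] = 1.0                 # one simulated click

# Listwise softmax cross-entropy, summed over the list:
#   loss = -sum_i click_i * log_softmax(scores)_i
log_probs = F.log_softmax(scores, dim=-1)
rank_loss = -(clicks * log_probs).sum(dim=-1).mean()
print(rank_loss.item())            # around log(10) ~= 2.3 for near-uniform scores
```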

@parth-shettiwar
Author

Thanks @rowedenny for the clarification. Another thing I wanted to ask: the training output scores for the documents keep increasing with the number of steps and even reach the range of 10^5. Since it is a log-softmax and only the relative differences between document scores matter, the loss stays almost the same.
For now, I tried adding a small L2 regularization during training to keep this in check, and the training output scores now stay between 0 and 10, but is this required, or am I missing something?
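
For reference, the shift-invariance mentioned above is easy to verify; this is just an illustrative snippet, not code from the repository:

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[3.0, 1.5, -0.25]], dtype=torch.float64)
shifted = scores + 1e5   # add the same large constant to every document's score

# log_softmax depends only on the score differences within a list, so shifting
# all scores by a constant leaves the listwise loss unchanged.
print(torch.allclose(F.log_softmax(scores, dim=-1),
                     F.log_softmax(shifted, dim=-1)))   # True
```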

@rowedenny
Collaborator

Yes, I do observe that the scale of the output scores keeps increasing. However, for ranking tasks trained with pairwise or listwise approaches, the scores are not comparable between different queries anyway. In other words, we usually care about the order within a query and rarely worry about scores across different queries.

In addition, L2 regularization may help, but the ranking model's performance may suffer.
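
If one does want to keep the score scale in check, a common option is an L2 penalty via weight decay. Here is a minimal sketch with a hypothetical stand-in model; the repository may also expose its own regularization hyperparameter, so check the algorithm's config first:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the ranking model; in practice this comes from ULTRA.
ranking_model = nn.Linear(700, 1)

# weight_decay applies an L2 penalty on all parameters at every update step.
optimizer = torch.optim.SGD(ranking_model.parameters(), lr=0.05, weight_decay=1e-5)

# Equivalent explicit form, folded into the loss instead:
#   l2 = sum(p.pow(2).sum() for p in ranking_model.parameters())
#   loss = rank_loss + 1e-5 * l2
# (Penalizing the output scores directly is another option, with a similar effect
#  on their magnitude.)
```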

@parth-shettiwar
Author

parth-shettiwar commented Jul 25, 2022

Adding on, I am a bit confused about the current implementation of the loss. As described in the paper, the variable pi_q represents the ranked list produced by our model. However, in the code the train_outputs are not sorted at all. This would lead to an incorrect multiplication of weights while computing both the IRW and IPW losses.
What exactly is the ranked list pi_q referring to? (The initial ranked list produced by the SVM ranker, or the list produced by the ranking model?)
Attached is the code for the computation of the loss.
[Screenshot: the loss computation code]

@rowedenny
Collaborator

rowedenny commented Jul 25, 2022

However, in the code the train_outputs are not sorted at all. This would lead to an incorrect multiplication of weights while computing both the IRW and IPW losses.

Actually, there is no need to sort, because train_output contains the predictions for positions 1 to 10 and the model is optimized against the click labels from positions 1 to 10, so the two are already matched.

Recall that we optimize the IPW or IRW loss with the clicked items, so the multiplication here simply ignores the items without clicks and assigns them a small value.
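
A rough sketch of that alignment and of the click-masked, propensity-weighted loss (a simplification with hypothetical variable names, not the exact DLA code):

```python
import torch
import torch.nn.functional as F

# train_output[:, i] is the score predicted for the document shown at position
# i+1, and train_labels[:, i] is the click observed at that position, so the two
# are already aligned by position and no sorting is required.
train_output = torch.randn(4, 10)                     # [batch, selection_bias_cutoff]
train_labels = torch.randint(0, 2, (4, 10)).float()   # simulated clicks
propensity_weights = torch.rand(4, 10) + 0.5          # hypothetical inverse propensities

log_probs = F.log_softmax(train_output, dim=-1)
# Multiplying by the click labels zeroes out unclicked positions, so only
# clicked documents contribute to the propensity-weighted rank loss.
ipw_rank_loss = -(train_labels * propensity_weights * log_probs).sum(dim=-1).mean()
```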

@parth-shettiwar
Author

Thanks for the reply

  1. "Model is optimized with labels via clicks", Is this due to initial ordering from SVM ranker at start?
  2. I am currently trying to obtain results on a different dataset, will:
    a) The model still be optimized for positions from 1 to 10 ?
    b) What about for positions after 10?
    c) Will sorting train_output affect the performance ?

@rowedenny
Collaborator

  1. "Model is optimized with labels via clicks", Is this due to initial ordering from SVM ranker at start?

Unbiased Learning to Rank uses clicks as positive labels while unclicks as negative labels.

  1. I am currently trying to obtain results on a different dataset, will:
    a) The model still be optimized for positions from 1 to 10 ?

We predict 1 to 10 because of the selection_bias_cutoff, which indicates the maximum number of items that users can see based on the assumption.

b) What about for positions after 10?
It depends on the examination model, specifically the examination probability for items after position 10. If the prob is 0, then it means item after pos@10 will not be examined.

c) Will sorting train_output affect the performance ?

No, it will not.
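
As an illustration of the cutoff behavior discussed above, here is a minimal sketch (not the repository's data pipeline; all variable names are hypothetical):

```python
selection_bias_cutoff = 10

# Only the top `selection_bias_cutoff` positions of the initially ranked list are
# assumed to be examinable; documents below the cutoff are treated as having
# examination probability 0 and therefore never receive (simulated) clicks.
ranked_doc_ids = list(range(25))                          # a hypothetical 25-document list
examinable = ranked_doc_ids[:selection_bias_cutoff]       # positions 1..10
never_examined = ranked_doc_ids[selection_bias_cutoff:]   # positions 11..25
```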

@parth-shettiwar
Author

parth-shettiwar commented Jul 25, 2022

I am still unsure about the underlying assumption here. As we know, the examination model does not depend on the documents' features. Every time, we feed it a one-hot encoded vector for each rank position and get representation weights for them.
So are we assuming some sort of ordering of the documents while loading the batch? Can we load the query-document pairs in any order, even keeping the relevant documents at the end?
(In the formulation in the paper, the observation probability of a document is denoted by o^x_q for document x and query q, showing a dependence on the document x. But in the implementation in the codebase, I don't see any dependence of the observation probability on the document x or on its rank.)
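
For concreteness, here is a minimal sketch of a position-only examination model of the kind being discussed (hypothetical code, not necessarily the repository's propensity model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical position-only examination model: the input is a one-hot encoding
# of the rank position, so the learned observation probability depends only on
# where a document was shown, not on the document's features.
list_size = 10
propensity_net = nn.Linear(list_size, 1)

positions = F.one_hot(torch.arange(list_size), num_classes=list_size).float()
examination_logits = propensity_net(positions).squeeze(-1)  # one logit per position 1..10
examination_probs = torch.softmax(examination_logits, dim=-1)

# Every document inherits the examination probability of the slot it occupied in
# the presented list, regardless of its content.
print(examination_probs)
```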

@rowedenny
Collaborator

rowedenny commented Jul 25, 2022 via email

@parth-shettiwar
Author

Thanks. Is there a particular timestamp or page you would like me to go through to address the above query?
