
In a supervised setting, why are the results of the test set evaluated during training? #2

Closed
Gyyyym opened this issue Jul 8, 2022 · 4 comments
Labels
question Further information is requested

Comments

@Gyyyym

Gyyyym commented Jul 8, 2022

Hello. In a supervised setting, why are the results of the test set evaluated during training? Wouldn't that make the test-set results misleadingly high?
I don't quite understand this part; please advise. Thanks.

@YJiangcm
Owner

YJiangcm commented Jul 8, 2022

Hi, I am a little confused about your question. During training, we only use the dev set for evaluation in order to save the best checkpoint. Do you mean that you cannot reproduce the results under supervised settings?
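For clarity, here is a minimal sketch of dev-set-only checkpoint selection (toy stand-ins throughout; this is not the repository's actual training loop):

```python
import copy

# Toy stand-ins so the loop runs end to end; in a real run these would be
# the actual model, training step, and dev-set metric (all hypothetical here).
model = {"weights": 0.0}

def train_one_epoch(m):
    # hypothetical: one pass over the training data
    m["weights"] += 0.1

def dev_score(m):
    # hypothetical: e.g. Spearman correlation on the dev set
    return -abs(m["weights"] - 0.3)

best_score, best_ckpt = float("-inf"), None
for epoch in range(5):
    train_one_epoch(model)
    score = dev_score(model)        # only the dev set is evaluated here
    if score > best_score:          # keep the checkpoint with the best dev score
        best_score = score
        best_ckpt = copy.deepcopy(model)

# The held-out test set is evaluated once, on best_ckpt, after training ends.
print(best_score, best_ckpt)
```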

@Gyyyym
Author

Gyyyym commented Jul 8, 2022

> Hi, I am a little confused about your question. During training, we only use the dev set for evaluation in order to save the best checkpoint. Do you mean that you cannot reproduce the results under supervised settings?

I am using the dataset of another task. The accuracy on the training, validation, and test sets reaches 100%, yet the loss stays above 4.0, which seems very strange to me. I see that there is 5-fold cross-validation in the code. Does it also train on the test set, which would explain the very high accuracy? And why is 5-fold cross-validation used during evaluation? Thank you very much for your reply.

@YJiangcm
Owner

YJiangcm commented Jul 8, 2022

Why 5-fold cross-validation is used:
After the model is well trained, it can derive sentence embedding vectors, which can be directly used to compute cosine similarity for STS tasks. However, for transfer tasks like text classification, we still need to train a logistic regression classifier on top of the (frozen) sentence embeddings. So, after the training of DCPCSE is finished, we use 5-fold cross-validation, following the SentEval toolkit (https://github.com/facebookresearch/SentEval), to evaluate performance on transfer tasks.
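Roughly, this transfer-task evaluation can be sketched with scikit-learn (a simplified stand-in for SentEval's own inner loop; the random arrays below take the place of real sentence embeddings and labels):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data: in practice X would be the frozen sentence embeddings
# produced by the trained encoder, and y the task labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))   # e.g. 768-dim BERT-style embeddings
y = rng.integers(0, 2, size=1000)  # binary classification labels

# Fit a logistic regression classifier on top of the frozen embeddings and
# evaluate it with 5-fold cross-validation (simplified: SentEval also tunes
# the regularization strength inside each fold).
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```

Note that the encoder itself is never updated in this step; only the small classifier on top is refitted in each fold.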

Why "The accuracy rate on the training set, validation set, and test set can reach 100%, and the loss value is above 4.0":
I haven't come across such a phenomenon yet. Which dataset did you use?

@Gyyyym
Author

Gyyyym commented Jul 10, 2022

> Why 5-fold cross-validation is used: After the model is well trained, it can derive sentence embedding vectors, which can be directly used to compute cosine similarity for STS tasks. However, for transfer tasks like text classification, we still need to train a logistic regression classifier on top of the (frozen) sentence embeddings. So, after the training of DCPCSE is finished, we use 5-fold cross-validation, following the SentEval toolkit (https://github.com/facebookresearch/SentEval), to evaluate performance on transfer tasks.
>
> Why "The accuracy rate on the training set, validation set, and test set can reach 100%, and the loss value is above 4.0": I haven't come across such a phenomenon yet. Which dataset did you use?

I used the CMV dataset. I don't know why the loss does not drop while the accuracy still improves.

@YJiangcm YJiangcm reopened this Jan 14, 2023
@YJiangcm YJiangcm added the question Further information is requested label Apr 4, 2023