
In a supervised setting, why are the results of the test set evaluated during training? #2

Closed
Gyyyym opened this issue Jul 8, 2022 · 4 comments
Labels
question Further information is requested

Comments

@Gyyyym

Gyyyym commented Jul 8, 2022

Hello. In a supervised setting, why are the results of the test set evaluated during training? Wouldn't that make the test-set results misleadingly high?
I don't quite understand this part; please advise. Thanks.

@YJiangcm
Owner

YJiangcm commented Jul 8, 2022

Hi, I am a little confused about your question. During training, we only use the dev set for evaluation in order to save the best checkpoint. Do you mean that you cannot reproduce the results under supervised settings?
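For clarity, here is a minimal sketch of dev-set-only checkpoint selection (toy stand-ins throughout; this is not the repository's actual training loop):

```python
import copy

# Toy stand-ins so the loop runs end to end; in a real run these would be
# the actual model, training step, and dev-set metric (all hypothetical here).
model = {"weights": 0.0}

def train_one_epoch(m):
    # hypothetical: one pass over the training data
    m["weights"] += 0.1

def dev_score(m):
    # hypothetical: e.g. Spearman correlation on the dev set
    return -abs(m["weights"] - 0.3)

best_score, best_ckpt = float("-inf"), None
for epoch in range(5):
    train_one_epoch(model)
    score = dev_score(model)        # only the dev set is evaluated here
    if score > best_score:          # keep the checkpoint with the best dev score
        best_score = score
        best_ckpt = copy.deepcopy(model)

# The held-out test set is evaluated once, on best_ckpt, after training ends.
print(best_score, best_ckpt)
```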

@Gyyyym
Author

Gyyyym commented Jul 8, 2022

> Hi, I am a little confused about your question. During training, we only use the dev set for evaluation in order to save the best checkpoint. Do you mean that you cannot reproduce the results under supervised settings?

I am using the dataset of another task. The accuracy on the training, validation, and test sets reaches 100%, yet the loss stays above 4.0, which seems very strange to me. I see that there is 5-fold cross-validation in the code. Does it also train on the test set, which would explain the very high accuracy? And why is 5-fold cross-validation used during evaluation? Thank you very much for your reply.

@YJiangcm
Owner

YJiangcm commented Jul 8, 2022

Why 5-fold cross-validation is used:
After the model is well trained, it can derive sentence embedding vectors, which can be directly used to compute cosine similarity for STS tasks. However, for transfer tasks like text classification, we still need to train a logistic regression classifier on top of the (frozen) sentence embeddings. So, after the training of DCPCSE is finished, we use 5-fold cross-validation, following the SentEval toolkit (https://github.com/facebookresearch/SentEval), to evaluate performance on transfer tasks.
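Roughly, this transfer-task evaluation can be sketched with scikit-learn (a simplified stand-in for SentEval's own inner loop; the random arrays below take the place of real sentence embeddings and labels):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data: in practice X would be the frozen sentence embeddings
# produced by the trained encoder, and y the task labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))   # e.g. 768-dim BERT-style embeddings
y = rng.integers(0, 2, size=1000)  # binary classification labels

# Fit a logistic regression classifier on top of the frozen embeddings and
# evaluate it with 5-fold cross-validation (simplified: SentEval also tunes
# the regularization strength inside each fold).
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```

Note that the encoder itself is never updated in this step; only the small classifier on top is refitted in each fold.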

Why "The accuracy rate on the training set, validation set, and test set can reach 100%, and the loss value is above 4.0":
I haven't come across such a phenomenon yet. Which dataset did you use?

@Gyyyym
Author

Gyyyym commented Jul 10, 2022

> Why 5-fold cross-validation is used: After the model is well trained, it can derive sentence embedding vectors, which can be directly used to compute cosine similarity for STS tasks. However, for transfer tasks like text classification, we still need to train a logistic regression classifier on top of the (frozen) sentence embeddings. So, after the training of DCPCSE is finished, we use 5-fold cross-validation, following the SentEval toolkit (https://github.com/facebookresearch/SentEval), to evaluate performance on transfer tasks.
>
> Why "The accuracy rate on the training set, validation set, and test set can reach 100%, and the loss value is above 4.0": I haven't come across such a phenomenon yet. Which dataset did you use?

I used the CMV dataset. I don't know why the loss does not drop while the accuracy still improves.

@YJiangcm YJiangcm reopened this Jan 14, 2023
@YJiangcm YJiangcm added the question Further information is requested label Apr 4, 2023