
Result reproduction #10

Closed
Pwoer-zy opened this issue Nov 17, 2022 · 4 comments

Comments

@Pwoer-zy

Hello, I got the following results when I ran your model:
the reported result for the SumMe dataset in Table 3 is an average F-score of 55.64, and the reported result for SumMe in Table 4 is also an average F-score of 55.64.
However, when I train the model using the code you provided, the evaluation result is an average F-score of 38.52. Do you know why? I hope to get your answer, thank you!

@e-apostolidis
Owner

Hi. Thanks for your interest in our method. First of all, the reported summarization performance for SumMe in Tables 3 and 4 is 57.1 and 55.6, respectively. These different scores correspond to different models of our PGL-SUM architecture. The first is trained in a one-by-one manner (i.e., the model is updated after seeing every single training sample), and we keep the maximum performance on the test set. The second is trained in a full-batch manner (i.e., all training samples are seen before a single update of the model at the end of each training epoch), and we report the performance of a model that is automatically selected based on the training losses.
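
To illustrate the distinction, here is a minimal PyTorch sketch of the two update schemes. This is illustrative only (the model, loss, and loop structure are placeholders, not the actual PGL-SUM training code):

```python
import torch
from torch import nn, optim

model = nn.Linear(1024, 1)          # placeholder for the summarization model
optimizer = optim.Adam(model.parameters(), lr=5e-5)
criterion = nn.MSELoss()

def train_one_by_one(samples):
    # One-by-one manner: update the model after every single training sample.
    for features, target in samples:
        optimizer.zero_grad()
        loss = criterion(model(features), target)
        loss.backward()
        optimizer.step()            # one update per sample

def train_full_batch(samples):
    # Full-batch manner: accumulate gradients over all training samples,
    # then apply a single update at the end of the epoch.
    optimizer.zero_grad()
    for features, target in samples:
        loss = criterion(model(features), target)
        loss.backward()             # gradients accumulate across samples
    optimizer.step()                # single update per epoch
```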

Now, concerning the significantly lower performance observed in your experiments: can you please let me know whether these experiments were run using the software packages mentioned on our GitHub page (see "Main dependencies"), and what type of GPU you are using? Sometimes the hardware can affect the training process and thus the model-selection step of our method.

@Pwoer-zy
Author

Hello, thanks for your reply. First of all, among the software packages mentioned on your GitHub page (see "Main dependencies"), I use PyTorch 1.10.0 and CUDA 11.3; the other major dependencies are the same. The GPU I use is an NVIDIA GeForce GTX 1080 Ti.
I want to know: does your script evaluation/evaluate_exp.sh use the training method provided in the full-batch manner? Hope to get your reply again, thank you!

@e-apostolidis
Owner

Hi. First, regarding your question: the code in evaluation/evaluate_exp.sh is independent of the training methodology. It just evaluates the performance of a trained model, without being affected by the training mode. The training mode (which affects only the outcome of the training process) can be defined in https://github.com/e-apostolidis/PGL-SUM/blob/master/model/configs.py using the --batch_size argument. Full-batch training on the SumMe and TVSum datasets is selected by setting batch_size=20 and 40, respectively (since the remaining 5 and 10 videos of each dataset are used for testing). Other possible choices are: batch_size=1 (training in a one-by-one manner) or batch_size=X with X < 20 and 40 for SumMe and TVSum, respectively (training using equally-sized mini-batches of the training data).
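
As a rough illustration of how these batch_size choices behave (the dataset construction below is a random placeholder, not the repository's actual data-loading code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder stand-in for the 20 SumMe training videos (random features).
train_set = TensorDataset(torch.randn(20, 1024), torch.randn(20, 1))

full_batch = DataLoader(train_set, batch_size=20)  # one update per epoch (full-batch)
one_by_one = DataLoader(train_set, batch_size=1)   # one update per training sample
mini_batch = DataLoader(train_set, batch_size=5)   # equally-sized mini-batches (X < 20)
```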

Second, concerning your experimental settings: can you please re-run our experiments using the software packages listed in the "Main dependencies" section of the GitHub page (Python 3.8.8, PyTorch 1.7.1, and CUDA 11.0)? The used software (and in some cases hardware) can affect the evaluation outcomes, especially when training relies on small datasets and the selection of a trained model relies on the training losses.
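
A quick way to verify that your environment matches those versions (note that torch.version.cuda reports the CUDA build that PyTorch was compiled against):

```python
import sys
import torch

print("Python :", sys.version.split()[0])   # expected: 3.8.8
print("PyTorch:", torch.__version__)        # expected: 1.7.1
print("CUDA   :", torch.version.cuda)       # expected: 11.0
if torch.cuda.is_available():
    print("GPU    :", torch.cuda.get_device_name(0))
```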
