
SubER computation on more than 1 srt file #1

Closed
sarapapi opened this issue May 28, 2022 · 5 comments


@sarapapi

Hi, this is more a question than a problem: if I have more than one .srt file against which I can make a comparison, how can I compute the SubER metric (and also the AS-BLEU and t-BLEU metrics)?
Is it sufficient to concatenate them and then compute the metrics, or do we need something more sophisticated?

Thanks for your work.

@patrick-wilken
Collaborator

Hi, let me make sure I understand the question:

I have more than 1 .srt file with which I can make a comparison

So you are referring to using multiple references? This is not yet supported, and it's also not fully obvious how to do this with SubER; it's actually an interesting question we hadn't considered yet! I could put some thought into it. 😊

Or do you mean you have the subtitles for one video in multiple files corresponding to subsequent video segments? Then yes, you would need to concatenate to score the video as a whole.

@sarapapi
Author

Hi, thanks for your quick response!
The second option you mentioned was my original question, but the first one would also be helpful.
I was wondering how to concatenate them correctly, because if I concatenate them directly I obtain a terrible SubER score compared to the ones obtained by computing the metric on the single .srt files.
Another option is to offset the start times of the .srt files that follow the first, so that they resemble a single subtitle file, but I think this will influence the evaluation (maybe the end of the previous .srt could match the beginning of the next .srt file in the reference). Maybe I am wrong. Do you have any suggestions for doing this more correctly?

Another question, not related to this "issue", is about the outcome of some evaluations I made. One system has a high SubER but also high t-BLEU and AS-BLEU scores, while the other system shows exactly the opposite behavior. Do you have any hint about how to interpret these results?
Thanks again

@patrick-wilken
Collaborator

patrick-wilken commented May 29, 2022

Another option is to offset the start time [...] but I think that this will influence the evaluation.

Reference and hypothesis subtitles have to be consistent on the time scale, whatever you do. So if one is shifted this will lead to terrible SubER (and t-BLEU) scores. I will make that clear in the README. (But the absolute position in time should not matter, so you could shift both by the same duration.)

If I understand you correctly, you want to build a test set out of several video clips. And I guess what happens if you just concatenate the files is that the clips all start at 0 seconds and will therefore overlap in time, which breaks the metric computation. I will add an assertion to only allow input files where the subtitle timings are monotonically increasing.

So yes, you currently would have to shift all segments in time when concatenating. This simply corresponds to concatenating the original audio / video files to create a test set in the first place.
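As an illustration of that shifting, here is a minimal stdlib-Python sketch that concatenates SRT contents while offsetting each clip's timestamps by the cumulative duration of the preceding clips. The function names and the idea of passing the true clip durations explicitly are my own for illustration, not part of the SubER tool:

```python
import re
from datetime import timedelta

# Matches SRT timestamps of the form HH:MM:SS,mmm
TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def parse_time(match):
    h, m, s, ms = (int(g) for g in match.groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)

def format_time(td):
    # Integer arithmetic avoids float rounding from total_seconds().
    total_ms = (td.days * 86400 + td.seconds) * 1000 + td.microseconds // 1000
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def shift_srt(text, offset):
    """Shift every timestamp in an SRT string by a timedelta offset."""
    return TIME_RE.sub(lambda m: format_time(parse_time(m) + offset), text)

def concatenate_srt(srt_texts, clip_durations):
    """Concatenate SRT file contents, offsetting each clip's timestamps
    by the cumulative duration of the preceding clips, i.e. as if the
    original videos had been concatenated. Subtitle index numbers are
    left unchanged; renumber them if your tooling requires it."""
    assert len(srt_texts) == len(clip_durations)
    offset = timedelta(0)
    parts = []
    for text, duration in zip(srt_texts, clip_durations):
        parts.append(shift_srt(text, offset))
        offset += duration
    return "\n".join(parts)
```

Note that `clip_durations` should be the actual video durations, not the end time of each clip's last subtitle, so that the shifted timings match a real concatenation of the audio/video files.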

maybe the end of the previous srt could match the beginning of the next srt file in the reference

That's also how I would do it. We could add support for multiple files and do this automatically. Then again, this should be possible with other existing software. I'm not sure... Also note that evaluating the single files and then computing a weighted average should get you close to the score for the concatenated file. Although having an exact score is obviously better...
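The weighted-average fallback mentioned above can be sketched as follows. This assumes an edit-rate-style metric like SubER, where errors are normalized by the reference length, so the reference token count of each file is a reasonable weight; the function name is illustrative:

```python
def weighted_average(scores, weights):
    """Weighted mean of per-file metric scores.

    For an edit-rate-style metric such as SubER (errors normalized by
    the reference length), weighting each file's score by its reference
    token count approximates the score of the concatenated file.
    """
    assert len(scores) == len(weights) and sum(weights) > 0
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# e.g. two files scoring 30.0 and 50.0 with 100 and 300 reference tokens:
weighted_average([30.0, 50.0], [100, 300])  # -> 45.0
```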

@patrick-wilken
Collaborator

One system has a high SubER but also high t-BLEU and AS-BLEU scores, while the other system shows exactly the opposite behavior.

Hard to say without seeing the file. But very bad segmentation, i.e. many line breaks at different positions than in the reference, would be one explanation.

@sarapapi
Author

So yes, you currently would have to shift all segments in time when concatenating. [...]

Yes, that is exactly what I meant. I will shift the following segments when concatenating, as you suggested. I think it would be useful to add this option to the library in the future, since it can happen that one evaluates subtitling systems on a test corpus containing multiple files and, consequently, multiple .srt files.
Thank you for your replies!

patrick-wilken added a commit that referenced this issue Jun 1, 2022
Necessary for hyp to ref time alignment to work correctly.
Also prevents user from using e.g. concatenated SRT files,
see #1