SubER computation on more than 1 srt file #1
Hi, let me make sure I understand the question:
So you are referring to using multiple references? This is not yet supported, and it's also not fully obvious how to do this with SubER. Actually an interesting question we hadn't considered yet! I could put some thought into it. 😊 Or do you mean you have the subtitles for one video in multiple files corresponding to subsequent video segments? Then yes, you would need to concatenate them to score the video as a whole.
Hi, thanks for your quick response! Another question, not related to this "issue", is about the outcome of some evaluations I made. One system turns out to have a high SubER but also high t-BLEU and AS-BLEU scores, while the other system shows exactly the opposite behavior. Do you have any hint about how to interpret these results?
Reference and hypothesis subtitles have to be consistent on the time scale, whatever you do. So if one is shifted, this will lead to terrible SubER (and t-BLEU) scores. I will make that clear in the README. (But the absolute position in time should not matter, so you could shift both by the same duration.)

If I understand you correctly, you want to build a test set out of several video clips. I guess what happens if you just concatenate the files is that the clips all start at 0 seconds and will therefore overlap in time, which breaks the metric computation. I will add an assertion to only allow input files where the subtitle timings are monotonically increasing. So yes, currently you would have to shift all segments in time when concatenating. This simply corresponds to concatenating the original audio / video files to create a test set in the first place.
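For illustration, a minimal sketch of this kind of concatenation, using the third-party `srt` package (not part of SubER); the file names, clip durations and the helper function are made up. The key point is that the same per-clip offsets must be applied to both the hypothesis and the reference files so they stay on a common, monotonically increasing time scale:

```python
import srt
from datetime import timedelta

def concatenate_srt(paths, offsets):
    """Concatenate SRT files, shifting the subtitles of paths[i] by offsets[i]."""
    merged = []
    for path, offset in zip(paths, offsets):
        with open(path, encoding="utf-8") as f:
            for sub in srt.parse(f.read()):
                merged.append(srt.Subtitle(
                    index=len(merged) + 1,
                    start=sub.start + offset,
                    end=sub.end + offset,
                    content=sub.content,
                ))
    return srt.compose(merged)

if __name__ == "__main__":
    # Hypothetical clip durations: shift each clip by the total duration
    # of the clips that come before it. Use identical offsets for the
    # reference and the hypothesis side.
    offsets = [timedelta(0), timedelta(minutes=12), timedelta(minutes=27)]
    for side in ("reference", "hypothesis"):
        paths = [f"{side}.clip{i}.srt" for i in (1, 2, 3)]
        with open(f"{side}.combined.srt", "w", encoding="utf-8") as f:
            f.write(concatenate_srt(paths, offsets))
```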
That's also how I would do it. We could add support for multiple files and do this automatically. Then again, this should be possible with other existing software, I'm not sure... Also note that evaluating the single files and then computing a weighted average should get you close to the score for the concatenated file, although having an exact score is obviously better...
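Since SubER is an edit rate normalized by the reference size, weighting each per-file score by (for example) the number of reference words should give a reasonable approximation of the concatenated score. A quick sketch with made-up numbers:

```python
# Approximate the corpus-level score from per-file scores by a weighted
# average. The per-file SubER values and reference word counts below are
# example numbers only.
def weighted_average(scores, weights):
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

per_file_suber = [23.4, 31.0, 27.8]   # SubER (%) of each clip
reference_words = [1520, 940, 2210]   # rough size of each reference file

print(f"approximate corpus SubER: {weighted_average(per_file_suber, reference_words):.1f}")
```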
Hard to say without seeing the file. But very bad segmentation, i.e. many line breaks at different positions than in the reference, would be one explanation.
Yes, that's exactly what I meant. I will shift the subsequent segments when I concatenate, as you suggested. I think it would be useful to add this option to the library in the future, since it can happen that a subtitling system is evaluated on a test corpus containing several files and consequently several SRTs.
Necessary for hypothesis-to-reference time alignment to work correctly. Also prevents the user from using e.g. naively concatenated SRT files, see #1
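The assertion referenced in this commit could look roughly like the following; this is only an illustration under the assumption of subtitle objects with a `start` attribute, not SubER's actual code:

```python
def assert_monotonic(subtitles):
    """Reject input whose subtitle start times are not monotonically increasing,
    e.g. naively concatenated clips that all restart at 0 seconds."""
    for previous, current in zip(subtitles, subtitles[1:]):
        assert current.start >= previous.start, (
            "Subtitle timings must be monotonically increasing; "
            "did you concatenate SRT files without shifting timestamps?"
        )
```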
Hi, this is more a question than a problem: if I have more than one .srt file against which to compare, how can I compute the SubER metric (and also the AS-BLEU and t-BLEU metrics)?
Is it sufficient to concatenate them and then compute the metrics, or do we need something more sophisticated?
Thanks for your work.