New metric "SubER-cased"; also use tokenizer for "WER-cased" #6
See #5
Adds the metric "SubER-cased", a case- and punctuation-sensitive variant of SubER. Tokenization is used so that punctuation marks are treated as separate tokens. Note that the analysis in our paper shows a weaker correlation with human post-editing effort for this kind of variant. It may still be useful when punctuation and casing errors are considered highly important.
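To illustrate what "treating punctuation as separate tokens" means, here is a minimal sketch. It assumes a simple regex tokenizer. The pattern and the `tokenize` name are hypothetical and are not the tokenizer SubER actually uses.

```python
import re
from typing import List

# Hypothetical stand-in tokenizer (not SubER's actual one):
# a token is either a word (optionally with apostrophe parts, e.g. "Don't")
# or a single punctuation character. Whitespace is dropped.
_TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split text into words and standalone punctuation tokens, keeping case."""
    return _TOKEN_PATTERN.findall(text)


# Whitespace splitting glues punctuation onto words...
print("Hello, world!".split())   # ['Hello,', 'world!']
# ...while tokenization separates it, so a punctuation error costs one token
# instead of turning the whole neighbouring word into an error.
print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```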
I also added tokenization to "WER-cased" for consistency with "SubER-cased". It makes sense intuitively, and it also yields a slightly higher correlation than what we reported for "WER + case/punct" in the paper. (The numbers in Table 1, row 2 become -0.685, -0.520, -0.504, -0.657.) I think no one relies on the exact behaviour of "WER-cased" yet, so this breaking change should be acceptable.
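The effect of the change on a cased WER can be sketched with a plain word-level Levenshtein distance, computed once on whitespace-split text and once on tokenized text. This is an illustration under assumptions: the regex tokenizer and the `wer_cased` / `use_tokenizer` names are hypothetical, and SubER's actual implementation and tokenizer may differ.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical tokenizer: words and single punctuation characters as tokens.
_TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(text: str) -> List[str]:
    return _TOKEN_PATTERN.findall(text)


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance over tokens with unit costs (ins/del/sub)."""
    prev = list(range(len(ref) + 1))  # distances for an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # extra hypothesis token
                curr[j - 1] + 1,         # missing reference token
                prev[j - 1] + (h != r),  # match or substitution (case-sensitive)
            )
        prev = curr
    return prev[-1]


def wer_cased(hypothesis: str, reference: str, use_tokenizer: bool = True) -> float:
    """Case- and punctuation-sensitive WER; no lowercasing or punctuation removal."""
    split: Callable[[str], List[str]] = tokenize if use_tokenizer else str.split
    hyp_tokens, ref_tokens = split(hypothesis), split(reference)
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(hyp_tokens, ref_tokens) / len(ref_tokens)


# Without tokenization both words count as substituted: 2 / 2 = 1.0.
print(wer_cased("Hello world.", "Hello, world!", use_tokenizer=False))
# With tokenization only the punctuation errors count: 2 / 4 = 0.5.
print(wer_cased("Hello world.", "Hello, world!"))
```

With whitespace splitting, a single wrong punctuation mark makes the attached word an error. With tokenization, the error is confined to the punctuation token, which is the behaviour the change aims for.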