[Metrics] WER #52
Comments
I can work on this issue, but I need help defining the UI. This metric originally works over strings; is it okay if the metric takes a list of strings as input? It is possible to run it on tokens, but that will only work correctly with a char-based tokenizer (nowadays most people use BPE, etc.).
@Borda what do you think?
@stas6626 we already have BLEU working on strings, so this wouldn't be a problem. Why do you want a list of strings here? Shouldn't a predicted string and a ground-truth string be sufficient?
@stas6626 How is it going there?
I believe both CER (Character Error Rate) and WER (Word Error Rate) are commonly used in OCR problems, and probably in audio/speech as well. I implemented these metrics in an older version of Lightning, so I think upgrading the code would not be a problem. However, it might rely on another library such as
It could be interesting to use pytorch-edit-distance instead of editdistance. It should support CUDA operations.
I think this implementation has some redundant features, such as Moreover, the current implementation requires both
I think, in general, both CER and WER are computed using the Levenshtein distance, whose inputs only need to be a list of strings of predicted tokens and the list of corresponding target tokens. Thus,
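To make the Levenshtein-based definition concrete, here is a minimal pure-Python sketch, not the Lightning implementation. The names `edit_distance` and `error_rate` and the `char_level` flag are hypothetical, and it assumes whitespace tokenization for WER and one prediction per target. Errors and reference lengths are summed over the whole batch before dividing.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(preds: List[str], targets: List[str], char_level: bool = False) -> float:
    """WER (default) or CER (char_level=True) over a batch of strings.

    Hypothetical helper: assumes len(preds) == len(targets).
    """
    errors, total = 0, 0
    for pred, target in zip(preds, targets):
        hyp = list(pred) if char_level else pred.split()
        ref = list(target) if char_level else target.split()
        errors += edit_distance(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("targets contain no tokens")
    return errors / total


wer = error_rate(["hello world", "the cat sat"],
                 ["hello there world", "the cat sat"])
cer = error_rate(["sitting"], ["kitten"], char_level=True)
```

A single prediction/target pair is just the batch-of-one case, so the list-of-strings interface also covers the two-string usage.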
I agree with you, @VinhLoiIT, although I have found the package https://github.com/life4/textdistance to be better supported; for example, see roy-ht/editdistance#46. If we agree on a package, we can move forward with the implementation.
Add WER cc @oplatek
Lightning-AI/pytorch-lightning#973 (comment)