[Metrics] WER #52

justusschock · 2020-05-19T15:47:13Z

Lightning-AI/pytorch-lightning#973 (comment)

stas6626 · 2020-10-07T18:34:36Z

I can work on this issue, but I need help defining the interface. This metric is originally computed over strings; is it okay if the metric takes a list of strings as input? It is possible to run it on tokens, but it will only work correctly with a character-based tokenizer (nowadays most people use BPE, etc.).
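For reference, here is the usual definition, which the thread itself does not spell out. Let S, D and I be the numbers of substitutions, deletions and insertions in a minimal word-level alignment of the hypothesis against a reference of N words:

```latex
\mathrm{WER} = \frac{S + D + I}{N}, \qquad N = \text{number of words in the reference}
```

CER is the same ratio computed over characters. Because S + D + I is exactly the Levenshtein distance between the two word sequences, subword tokens such as BPE pieces would have to be merged back into words before applying it. Otherwise the metric counts subword errors, not word errors. The value can also exceed 1 when the hypothesis contains many insertions.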

stas6626 · 2020-10-07T18:35:21Z

@Borda what do you think?

justusschock · 2020-10-08T07:27:16Z

@stas6626 we already have BLEU working on strings, so this wouldn't be a problem. Why do you want a list of strings here? Shouldn't a predicted string and a ground-truth string be sufficient?
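One possible reason to accept lists rather than a single pair is aggregation. Corpus-level WER is commonly computed as total word errors divided by total reference words, and that is not the same as averaging per-sentence WER. The minimal pure-Python sketch below assumes whitespace word splitting. The names `edit_distance`, `corpus_wer` and `mean_sentence_wer` are hypothetical and are not an existing Lightning API.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of a reference word
                curr[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),   # substitution (free if words match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(preds: List[str], targets: List[str]) -> float:
    """Corpus-level WER: total word errors / total reference words."""
    errors = sum(edit_distance(t.split(), p.split()) for p, t in zip(preds, targets))
    words = sum(len(t.split()) for t in targets)
    return errors / words


def mean_sentence_wer(preds: List[str], targets: List[str]) -> float:
    """Unweighted mean of per-sentence WER, shown only for comparison."""
    rates = [
        edit_distance(t.split(), p.split()) / len(t.split())
        for p, t in zip(preds, targets)
    ]
    return sum(rates) / len(rates)
```

Take targets `["hello there world", "the cat sat on the mat"]` and predictions `["hello world", "the cat sat on the mat"]`. The corpus-level value is 1/9, while the per-sentence average is 1/6, because the short sentence gets the same weight as the long one.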

justusschock · 2020-11-09T09:49:31Z

@stas6626 How is it going there?

VinhLoiIT · 2021-03-21T05:59:24Z

I believe both CER (Character Error Rate) and WER (Word Error Rate) are commonly used in OCR problems, and they may be used in audio/speech as well.

I implemented these metrics in an older version of Lightning, so I don't think upgrading the code would be a problem. However, it relies on an external library such as editdistance, and I don't know whether external libraries can be used when making a contribution.

janvainer · 2021-03-24T10:04:36Z

It could be interesting to use pytorch-edit-distance instead of editdistance. It should support CUDA operations.

VinhLoiIT · 2021-03-25T00:58:03Z

I think this implementation has some features we don't need, such as remove_blank and strip, but the CUDA support is useful.

Moreover, from skimming the CUDA code, the current implementation seems to require both blank and space characters. That does not cover the general use of this metric. For example:

In OCR, we also need the CER (Character Error Rate) metric, which is basically the same as WER except that the space character is also kept as a token.
In OCR, word splitting should be done by a user-defined function that is not coupled with compute_wer.
Lastly, in my opinion, users of this metric should not need to know about the blank character. The blank character is only used with CTC loss, and we don't always use CTC loss.
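The point that the blank symbol belongs to CTC decoding, not to the metric, can be illustrated with a minimal sketch of greedy CTC collapsing done as a separate preprocessing step. After that step the metric only ever sees plain strings. The blank index `0`, the toy vocabulary and the helper name `ctc_greedy_collapse` are hypothetical choices for illustration.

```python
from typing import List, Optional, Sequence


def ctc_greedy_collapse(frame_ids: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse a greedy (argmax) CTC frame sequence: merge repeats, drop blanks."""
    out: List[int] = []
    prev: Optional[int] = None
    for idx in frame_ids:
        # Emit a label only when it differs from the previous frame and is not
        # the blank; a blank between two equal labels keeps them as separate outputs.
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out


# Toy example: the blank between the two 3s preserves the double "l".
vocab = {1: "h", 2: "e", 3: "l", 4: "o"}
frames = [1, 1, 0, 2, 3, 3, 0, 3, 4, 4]
decoded = "".join(vocab[i] for i in ctc_greedy_collapse(frames))  # "hello"
```

With this split, a WER/CER metric needs no `blank` argument at all. Models trained without CTC simply skip the collapsing step.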

In general, I think both CER and WER are computed using the Levenshtein distance, whose inputs need only be a list of predicted tokens and a list of corresponding target tokens. Thus, the editdistance package is just fine.
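Here is a minimal sketch of that idea: one Levenshtein distance over arbitrary token sequences, with the tokenizer supplied by the caller. WER then splits on whitespace, while CER treats every character (including spaces) as a token. The names `levenshtein`, `error_rate`, `wer` and `cer` are hypothetical. The editdistance package mentioned above provides `editdistance.eval`, which should be able to stand in for `levenshtein` here.

```python
from typing import Callable, Hashable, List, Sequence

Tokenizer = Callable[[str], Sequence[Hashable]]


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance with unit costs, using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y))
        prev = curr
    return prev[-1]


def error_rate(preds: List[str], targets: List[str], tokenize: Tokenizer) -> float:
    """Total edit distance over total reference tokens, for any tokenizer."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    errors, total = 0, 0
    for p, t in zip(preds, targets):
        ref = tokenize(t)
        errors += levenshtein(tokenize(p), ref)
        total += len(ref)
    return errors / total  # assumes at least one non-empty reference


def wer(preds: List[str], targets: List[str]) -> float:
    # Words: split on whitespace (a user could pass any other splitter instead).
    return error_rate(preds, targets, str.split)


def cer(preds: List[str], targets: List[str]) -> float:
    # Characters: spaces are kept as ordinary tokens.
    return error_rate(preds, targets, list)
```

For example, the prediction "the cat sat" against the target "the cat sat down" gives a WER of 1/4 (one deleted word) and a CER of 5/16 (the deleted " down"). A custom OCR word splitter would be passed to `error_rate` as `tokenize` without touching the distance computation.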

carmocca · 2021-03-25T12:25:31Z

I agree with you @VinhLoiIT

That said, I have found the package https://github.com/life4/textdistance to be better supported; see, for example, roy-ht/editdistance#46.

If we agree on a package, we can move forward with the implementation.

Borda transferred this issue from Lightning-AI/pytorch-lightning Mar 12, 2021

Borda added enhancement New feature or request help wanted Extra attention is needed labels Mar 17, 2021

stale bot added the wontfix label Jun 1, 2021

Lightning-AI deleted a comment from stale bot Jun 1, 2021

stale bot removed the wontfix label Jun 1, 2021

carmocca mentioned this issue Jun 29, 2021

training_epoch_end is not computed at the end of epoch Lightning-AI/pytorch-lightning#8204

Closed

SkafteNicki added this to To do in Text via automation Jul 7, 2021

This was referenced Jul 15, 2021

Adding NLP Metrics SQuADv2 and WER #368

Closed

Adding WER metric #383

Merged

Lightning-AI deleted a comment from stale bot Jul 16, 2021

Borda added this to the v0.5 milestone Jul 16, 2021

SkafteNicki closed this as completed in #383 Jul 24, 2021

Text automation moved this from To do to Done Jul 24, 2021

Borda added the topic: Text label Aug 25, 2023

karwojan mentioned this issue May 23, 2024

DataLoader worker is killed in Docker #2559

Open
