Metric | Publication Year | Conference | NLG Metricverse | Jury | HF/datasets | NLG-eval | TorchMetrics |
---|---|---|---|---|---|---|---|
BLEU | 2002 | ACL | ✅ | ✅ | ✅ | ✅ | ✅ |
NIST | 2002 | HLT | ✅ | ❌ | ✅ | ❌ | ❌ |
CER | 2004 | ICSLP | ✅ | ✅ | ✅ | ❌ | ✅ |
ROUGE | 2004 | ACL | ✅ | ✅ | ✅ | ✅ | ✅ |
WER | 2004 | ICSLP | ✅ | ✅ | ✅ | ❌ | ✅ |
CIDEr | 2015 | CVPR | ✅ | ❌ | ❌ | ✅ | ❌ |
METEOR | 2005 | ACL | ✅ | ✅ | ✅ | ❌ | ❌ |
TER | 2006 | AMTA | ✅ | ✅ | ✅ | ❌ | ❌ |
ChrF(++) | 2015 | ACL | ✅ | ✅ | ✅ | ❌ | ✅ |
WMD | 2015 | ICML | ✅ | ❌ | ❌ | ❌ | ❌ |
SacreBLEU | 2018 | ACL | ✅ | ✅ | ✅ | ❌ | ✅ |
MOVERScore | 2019 | EMNLP | ✅ | ❌ | ❌ | ❌ | ❌ |
BERTScore | 2020 | ICLR | ✅ | ✅ | ✅ | ❌ | ✅ |
BLEURT | 2020 | ACL | ✅ | ✅ | ✅ | ❌ | ❌ |
COMET | 2020 | EMNLP | ✅ | ✅ | ✅ | ❌ | ❌ |
NUBIA | 2020 | EvalNLGEval, NeurIPS talk | ✅ | ❌ | ❌ | ❌ | ❌ |
PRISM | 2020 | EMNLP | ✅ | ✅ | ❌ | ❌ | ❌ |
BARTScore | 2021 | NeurIPS | ✅ | ✅ | ❌ | ❌ | ❌ |
MAUVE | 2021 | NeurIPS | ✅ | ❌ | ✅ | ❌ | ❌ |
Abstractness | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
Accuracy | / | / | ✅ | ✅ | ✅ | ❌ | ❌ |
Average Unique N-gram Ratios | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
Coleman-Liau Index | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
F1 | / | / | ✅ | ✅ | ✅ | ❌ | ❌ |
Flesch-Kincaid Index | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
Gunning-Fog Index | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
Perplexity | / | / | ✅ | ❌ | ✅ | ❌ | ❌ |
Precision | / | / | ✅ | ✅ | ✅ | ❌ | ❌ |
Recall | / | / | ✅ | ✅ | ✅ | ❌ | ❌ |
Repetitiveness | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
Carburacy | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
Compression | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
Coverage | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
Density | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
NID | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |
UNR | / | / | ✅ | ❌ | ❌ | ❌ | ❌ |