<a href="https://colab.research.google.com/github/PINC-Project/EMU-utils/blob/master/Calculating_WER.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Calculating Word Error Rate

This is a simple example of how to compute the word error rate (WER) in Python.

The cell below has to be run for each new session in order to install the library for computing WER:

In [1]:
!pip install jiwer



Here we will use a famous sentence from the TIMIT corpus as a sample reference. This sentence was created to include many different phonemes in English:

In [2]:
reference_sentence='she had her dark suit in greasy wash water all year'

Now let's create a sample hypothesis that includes errors:

In [3]:
hypothesis_sentence='she had her dank suite in greasy water all last year'

This simple command computes several word error measures, including WER, MER, WIL and WIP. Refer to this paper to learn more about them:

https://www.researchgate.net/publication/221478089_From_WER_and_RIL_to_MER_and_WIL_improved_evaluation_measures_for_connected_speech_recognition

In [4]:
import jiwer
jiwer.compute_measures(reference_sentence,hypothesis_sentence)

{'deletions': 1,
 'hits': 8,
 'insertions': 1,
 'mer': 0.3333333333333333,
 'substitutions': 2,
 'wer': 0.36363636363636365,
 'wil': 0.47107438016528924,
 'wip': 0.5289256198347108}
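
To see where these numbers come from, the measures can be recomputed by hand from the four counts. The following is a minimal sketch, not jiwer's own implementation: `error_measures` is a hypothetical helper name, and the formulas are the standard definitions (WER as errors over reference length, MER as errors over all aligned positions, WIP as the product of the hit ratios on both sides, WIL as its complement). It assumes a non-empty reference and hypothesis.

```python
def error_measures(hits, substitutions, deletions, insertions):
    """Recompute WER, MER, WIP and WIL from alignment counts.

    Hypothetical helper for illustration; assumes the reference and the
    hypothesis each contain at least one word.
    """
    # Words in the reference: everything except insertions.
    ref_len = hits + substitutions + deletions
    # Words in the hypothesis: everything except deletions.
    hyp_len = hits + substitutions + insertions
    errors = substitutions + deletions + insertions

    wer = errors / ref_len             # can exceed 1 with many insertions
    mer = errors / (hits + errors)     # always between 0 and 1
    wip = (hits / ref_len) * (hits / hyp_len)
    wil = 1 - wip
    return {'wer': wer, 'mer': mer, 'wil': wil, 'wip': wip}


# Counts taken from the jiwer output above.
measures = error_measures(hits=8, substitutions=2, deletions=1, insertions=1)
print(measures)
```

With the counts above, the reference has 11 words and there are 4 errors, so WER is 4/11 and MER is 4/12, which matches the values reported by jiwer.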

## Visualizing the errors

Apart from getting a single score, it's useful to actually see the errors. That takes a bit more code.

The code below relies on the `editops` function from the Levenshtein library, which lists the operations ("replace", "delete" and "insert") that have to be performed on the reference side in order to obtain the hypothesis:

In [5]:
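from Levenshtein import editops

# NOTE: _preprocess is a private jiwer helper (leading underscore), so it may
# change or disappear between jiwer releases.
# It applies the default WER transform to both sides. The returned sequences
# have one element per word, so editops reports word positions.
ref,hyp=jiwer.measures._preprocess([reference_sentence],[hypothesis_sentence],jiwer.measures.wer_default,jiwer.measures.wer_default)
# Each tuple is (operation, position in reference, position in hypothesis).
ops=editops(ref[0],hyp[0])

print(ops)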

[('replace', 3, 3), ('replace', 4, 4), ('delete', 7, 7), ('insert', 10, 9)]
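
To make the meaning of these tuples concrete, here is a minimal pure-Python sketch of a word-level edit-operation search. It is not the Levenshtein library's implementation: `word_editops` is a hypothetical name, and it uses a plain dynamic-programming table with unit costs. When several alignments are equally cheap, it may pick a different one than the library does.

```python
def word_editops(ref_words, hyp_words):
    """Return (operation, ref_index, hyp_index) tuples turning ref into hyp.

    Hypothetical helper for illustration; every edit costs 1.
    """
    n, m = len(ref_words), len(hyp_words)
    # dist[i][j] = edit distance between ref_words[:i] and hyp_words[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or replace
                             dist[i - 1][j] + 1,         # delete from ref
                             dist[i][j - 1] + 1)         # insert into ref

    # Walk back from the bottom-right corner and collect the edits.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and ref_words[i - 1] == hyp_words[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1                          # correct word
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(('replace', i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(('delete', i - 1, j))
            i -= 1
        else:
            ops.append(('insert', i, j - 1))
            j -= 1
    ops.reverse()
    return ops


sketch_ops = word_editops(
    'she had her dark suit in greasy wash water all year'.split(),
    'she had her dank suite in greasy water all last year'.split())
print(sketch_ops)
```

For this sentence pair the cheapest alignment is unique, so the sketch yields the same four operations as the output above.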


Here I wrote a function that does this and draws a neat HTML table to display the results:

In [10]:
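import tabulate
from IPython.display import display,HTML

# Fixed-width, centered cells so that the three rows line up column by column.
style='<style>table{border:1px solid black;width:100%;table-layout:fixed;text-align:center}</style>'

def visualize(reference: str, hypothesis:str):
  """Display reference, operations and hypothesis as an aligned HTML table."""
  # _preprocess is a private jiwer helper and may change between releases.
  ref,hyp=jiwer.measures._preprocess([reference],[hypothesis],jiwer.measures.wer_default,jiwer.measures.wer_default)
  ops=editops(ref[0],hyp[0])

  ref_tok=reference.split()
  hyp_tok=hypothesis.split()
  # Start by marking every reference word as correct.
  op_tok=['C']*len(ref_tok)
  # Offsets track how many '***' placeholders were already added to each row.
  ref_off=0
  hyp_off=0
  for op,ref_idx,hyp_idx in ops:
    if op=='replace':
      op_tok[ref_idx+ref_off]='S'
    elif op=='delete':
      # The word is missing from the hypothesis: pad the hypothesis row.
      hyp_tok.insert(hyp_idx+hyp_off,'***')
      hyp_off+=1
      op_tok[ref_idx+ref_off]='D'
    elif op=='insert':
      # The word is extra in the hypothesis: pad the reference and ops rows.
      ref_tok.insert(ref_idx+ref_off,'***')
      op_tok.insert(ref_idx+ref_off,'I')
      ref_off+=1

  tab=tabulate.tabulate([ref_tok,op_tok,hyp_tok], tablefmt='html')
  display(HTML(style+tab))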


This is how we use the method:

In [11]:
visualize(reference_sentence,hypothesis_sentence)

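| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| she | had | her | dark | suit | in | greasy | wash | water | all | *** | year |
| C | C | C | S | S | C | C | D | C | C | I | C |
| she | had | her | dank | suite | in | greasy | *** | water | all | last | year |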


The first row contains the reference, the second the operation at each position and the third the hypothesis.

The letters in the ops row are:
* C - correct
* S - substitute/replace
* I - insert
* D - delete

In case of an insertion, a `***` token is added to the reference, and in case of a deletion, to the hypothesis, in order to keep everything aligned.
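
Outside a notebook there is no HTML display, so a plain-text variant of the same alignment can be handy. This is a minimal sketch under a few assumptions: `align_rows` and `format_rows` are hypothetical names, the padding logic mirrors the `visualize` function above, and the list of operations is passed in directly (here copied from the `editops` output shown earlier) so the example runs without jiwer.

```python
def align_rows(reference, hypothesis, ops):
    """Return (reference row, ops row, hypothesis row), padded with '***'."""
    ref_tok = reference.split()
    hyp_tok = hypothesis.split()
    op_tok = ['C'] * len(ref_tok)
    ref_off = 0  # placeholders already added to the reference row
    hyp_off = 0  # placeholders already added to the hypothesis row
    for op, ref_idx, hyp_idx in ops:
        if op == 'replace':
            op_tok[ref_idx + ref_off] = 'S'
        elif op == 'delete':
            hyp_tok.insert(hyp_idx + hyp_off, '***')
            hyp_off += 1
            op_tok[ref_idx + ref_off] = 'D'
        elif op == 'insert':
            ref_tok.insert(ref_idx + ref_off, '***')
            op_tok.insert(ref_idx + ref_off, 'I')
            ref_off += 1
    return ref_tok, op_tok, hyp_tok


def format_rows(rows):
    """Left-justify every column to its widest token and join the rows."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return '\n'.join(
        ' '.join(tok.ljust(w) for tok, w in zip(row, widths)).rstrip()
        for row in rows)


# Operations copied from the editops output earlier in this notebook.
example_ops = [('replace', 3, 3), ('replace', 4, 4),
               ('delete', 7, 7), ('insert', 10, 9)]
rows = align_rows('she had her dark suit in greasy wash water all year',
                  'she had her dank suite in greasy water all last year',
                  example_ops)
print(format_rows(rows))
```

Returning the rows as plain lists, rather than displaying them directly, also makes it easy to write the alignment to a log file or to count the operations afterwards.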