**Automatic Evaluation Metrics**

Install the scoring libraries: NLTK (BLEU, METEOR) and rouge-score (ROUGE).

In [109]:
%pip install nltk rouge-score



In [110]:
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

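# Tokenizer models used by nltk.word_tokenize (recent NLTK releases load 'punkt_tab').
nltk.download('punkt')
nltk.download('punkt_tab')
# WordNet provides the synonym matching used by METEOR.
nltk.download('wordnet')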

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

Define the machine translation to evaluate and the human reference translation it is scored against.

In [111]:
machine_translation = [
    "Having lived until this age, whether it was good or bad, we have lived a significant part of our lives: distance, separation, arguments, conflicts - we have come through experiencing these troubles. Now, having reached middle age, we are weary and tired; we have seen the futility and instability of the things we have done, and realized that everything is vanity. So, how should we spend the rest of our lives? I myself am perplexed by this question. To serve the people? No, I have no capacity to serve the people. If those who are already burdened by their own troubles don't serve the people, and if the young and impetuous don't serve the people, then God forbid! To raise livestock? No, I cannot raise livestock. Children will raise themselves according to their needs. Now, in old age, I do not wish to waste my remaining life serving as food for thieves, the cruel, and the beggars, not being able to fully enjoy life. To pursue knowledge? No, there are no people to discuss knowledge with. To whom would you teach what you know, and from whom would you learn what you don't know? What is the use of spreading a mat in an empty place and sitting on it? Since there is no one to share your sorrows with, knowledge itself is a quick aging process. To become a Sufi, to practice religion? No, that won't work either; that requires peace. There is no peace in my heart or in the world I see; what kind of Sufism would that be in this land and in this place? To raise children? No, I cannot. Even if I wanted to, I don't know how to raise them properly; what if I raise them, to which nation or cause would I contribute them? I haven't found a peaceful place where my children can benefit from their life and knowledge; I don't know where to go or what to do, so what should I do? I couldn't even find amusement in that. Finally, I thought: I will just write down these thoughts on paper, I will amuse myself with white paper and black ink. "
    "Whoever finds a useful word in it, let them write it down or read it; if they don't need it, then it's my own words, I said, and finally I committed myself to this, and now I have no other work."
]
reference_translation = [
    "Whether for good or ill, I have lived my life, traveling a long road fraught with struggles and quarrels, disputes and arguments, suffering and anxiety, and reached these advanced years to find myself at the end of my tether, tired of everything. I have realized the vanity and futility of my labors and the meanness of my existence. What shall I occupy myself with now and how shall I live out the rest of my days? I am puzzled that I can find no answer to this question. Rule the people? No, the people are ungovernable. Let this burden be shouldered by someone who is willing to contract an incurable malady, or else by an ardent youth with a burning heart. But may Allah spare me this load which is beyond my powers! Shall I multiply the herds? No, I cannot do that. Let the young folk raise livestock if they need them. But I shall not darken the evening of my days by tending livestock to give joy to rogues, thieves and spongers. Occupy myself with learning? But how shall I engage in scholarship when I have no one to exchange an intelligent word with? And then to whom shall I pass on the knowledge I will have amassed? Whom shall I ask what I do not know myself? What's the good of sitting on a desolate steppe with an arshin[1] in hand trying to sell cloth? Too much knowledge becomes gall and wormwood that hastens old age if you have no one by your side to share your joys and sorrows. Choose the path of the Sufi and dedicate myself to the service of religion? No, I'm afraid that won't do either. This vocation calls for serenity and complete peace of mind. But I have not known peace either in my soul or in my life—and what sort of piety can there be amongst these people, in this land! Educate children, maybe? No, this, too, is beyond my powers. I could instruct children, true, but I don't know what I should teach them and how. For what occupation, for what purpose and for what kind of community am I to educate them? "
    "How can I instruct them and direct their paths if I don't see where my pupils could usefully apply their learning? And so here, too, I have been unable to put myself to any good use. Well, I have decided at length: henceforth, pen and paper shall be my only solace, and I shall set down my thoughts. Should anyone find something useful here, let him copy it down or memorise it. And if no one has any need of my words, they will remain with me anyway. And now I have no other concern than that."
]

Calculation of metrics

In [112]:
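# ROUGE-1/ROUGE-2 count unigram/bigram overlap; ROUGE-L uses the longest common subsequence.
# use_stemmer=True applies Porter stemming, so inflected forms such as "years" and "year" can match.
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

# NLTK's BLEU and METEOR expect pre-tokenized input (lists of tokens).
reference_tokens = nltk.word_tokenize(reference_translation[0])
machine_tokens = nltk.word_tokenize(machine_translation[0])

# BLEU Score: geometric mean of clipped 1- to 4-gram precisions, times a brevity penalty.
# method1 smoothing avoids a zero score when some n-gram order has no matches.
bleu_score = sentence_bleu([reference_tokens], machine_tokens, smoothing_function=SmoothingFunction().method1)

# METEOR Score: aligns unigrams by exact form, stem, and WordNet synonym, then penalizes fragmented alignments.
meteor = meteor_score([reference_tokens], machine_tokens)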

# ROUGE Scores
rouge = scorer.score(reference_translation[0], machine_translation[0])
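To make the BLEU number above easier to read, here is a minimal from-scratch sketch of single-reference sentence BLEU with uniform 1- to 4-gram weights: clipped (modified) n-gram precisions, their geometric mean, and a brevity penalty. It assumes whitespace-tokenized input, and its zero-match handling (replacing a zero count with `epsilon / total`, `epsilon = 0.1`) is intended to mirror the idea behind NLTK's `SmoothingFunction().method1`. It is an illustration, not NLTK's code, so on real data it should land close to, but is not guaranteed to equal, `sentence_bleu`. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu_sketch(reference, hypothesis, max_n=4, epsilon=0.1):
    """Single-reference BLEU with uniform weights (illustrative sketch).

    A zero n-gram match count is replaced by epsilon / total, in the spirit
    of NLTK's SmoothingFunction().method1.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        total = sum(hyp_counts.values())
        if total == 0:
            return 0.0  # hypothesis has fewer than n tokens
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
        if matches == 0 and n == 1:
            return 0.0  # no unigram overlap at all: score is zero
        precision = matches / total if matches > 0 else epsilon / total
        log_precision_sum += math.log(precision) / max_n

    # Brevity penalty: penalize hypotheses that are not longer than the reference.
    c, r = len(hypothesis), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


reference = "the cat is on the mat".split()
hypothesis = "the cat sat on the mat".split()
print(sentence_bleu_sketch(reference, hypothesis))
```

In this toy pair the 4-gram precision is zero, so without smoothing the geometric mean, and hence BLEU, would collapse to 0; that is the same reason the notebook passes a smoothing function for a long literary passage where exact 4-gram matches are rare.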

Print each metric with a rough interpretation. The thresholds are ad hoc rules of thumb, not standard cut-offs.

In [113]:
print(f"BLEU Score: {bleu_score}")

if bleu_score > 0.5:
    print("This BLEU score indicates a high level of n-gram overlap.")
elif bleu_score > 0.3:
    print("This BLEU score indicates a moderate level of n-gram overlap.")
else:
    print("This BLEU score indicates a low level of n-gram overlap.")


print(f"\nMETEOR Score: {meteor}")

if meteor > 0.5:
    print("This METEOR score indicates strong alignment with the reference (exact, stem, and synonym matches).")
elif meteor > 0.3:
    print("This METEOR score indicates moderate alignment with the reference (exact, stem, and synonym matches).")
else:
    print("This METEOR score indicates weak alignment with the reference (exact, stem, and synonym matches).")


# Each ROUGE entry has precision, recall, and fmeasure; the F1 (fmeasure) is reported here.
print(f"\nROUGE-1 F1: {rouge['rouge1'].fmeasure}")

if rouge['rouge1'].fmeasure > 0.5:
    print("This ROUGE-1 score suggests substantial unigram overlap with the reference.")
else:
    print("This ROUGE-1 score suggests limited unigram overlap with the reference.")

print(f"\nROUGE-2 F1: {rouge['rouge2'].fmeasure}")

if rouge['rouge2'].fmeasure > 0.3:
    print("This ROUGE-2 score indicates that many reference bigrams were preserved.")
else:
    print("This ROUGE-2 score suggests that few reference bigrams were preserved.")


print(f"\nROUGE-L F1: {rouge['rougeL'].fmeasure}")

if rouge['rougeL'].fmeasure > 0.4:
    print("This ROUGEL score indicates good sequence preservation in the translation.")
else:
    print("This ROUGEL score suggests poor sequence preservation.")


BLEU Score: 0.09520625399304872
This BLEU score indicates a low level of n-gram overlap.

METEOR Score: 0.33767660114289905
This METEOR score indicates moderate alignment with the reference (exact, stem, and synonym matches).

ROUGE-1 F1: 0.5545454545454545
This ROUGE-1 score suggests substantial unigram overlap with the reference.

ROUGE-2 F1: 0.13211845102505693
This ROUGE-2 score suggests that few reference bigrams were preserved.

ROUGE-L F1: 0.2863636363636363
This ROUGEL score suggests poor sequence preservation.
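The gap between ROUGE-1 (about 0.55) and ROUGE-L (about 0.29) is easier to interpret once you see how the two F1 values are formed. ROUGE-N F1 is the harmonic mean of n-gram precision (matched n-grams / hypothesis n-grams) and recall (matched n-grams / reference n-grams); ROUGE-L swaps the matched-n-gram count for the length of the longest common subsequence (LCS), which only credits words that appear in the same order. A large gap therefore suggests that many words are shared but in a different order. The sketch below is a minimal illustration under stated assumptions: whitespace tokens with no lowercasing or stemming, whereas rouge-score normalizes tokens with its own tokenizer and, in this notebook, applies Porter stemming, so it will not reproduce the numbers above exactly. The helper names are hypothetical, not part of the rouge-score API.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, hyp_total, ref_total):
    """Harmonic mean of precision (overlap/hyp_total) and recall (overlap/ref_total)."""
    if overlap == 0:
        return 0.0
    precision = overlap / hyp_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(reference, hypothesis, n):
    ref, hyp = ngram_counts(reference, n), ngram_counts(hypothesis, n)
    overlap = sum((ref & hyp).values())  # Counter '&' keeps the minimum count (clipping)
    return f1(overlap, sum(hyp.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming (one row at a time)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, hypothesis):
    return f1(lcs_length(reference, hypothesis), len(hypothesis), len(reference))


reference = "the cat is on the mat".split()
hypothesis = "the cat sat on the mat".split()
print(rouge_n_f1(reference, hypothesis, 1), rouge_n_f1(reference, hypothesis, 2),
      rouge_l_f1(reference, hypothesis))
```

Because an LCS is itself a set of matched unigrams, ROUGE-L F1 can never exceed ROUGE-1 F1 on the same tokens; how far below it falls is a rough measure of reordering.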
