# BLEU

## Let's have a look at one of the submissions

In [1]:
!ls ../data/webnlg2017/submissions/upf/

UPF_All_sent_final.txt	WebNLG_V0.2.pdf


In [2]:
!head ../data/webnlg2017/submissions/upf/UPF_All_sent_final.txt

the abilene regional airport serves abilene ( texas ) .
adolfo suárez madrid–barajas airport is in madrid ( paracuellos de jarama , san sebastián de los reyes and alcobendas ) .
the name of the runway of adolfo suárez madrid–barajas airport is 18l/36r .
the icao location identifier of the afonso pena international airport is sbct .
the afonso pena international airport serves curitiba .
the al-taqaddum air base serves fallujah .
the length of the runway of al-taqaddum air base is 3684 meters .
the name of the runway of alderney airport is 14/32 .
the length of the runway at allama iqbal international airport is 3360.12 meters .
the number of the first runway at amsterdam airport schiphol is 18 .


In [3]:
# One candidate sentence per line; the trailing newlines are dropped later by split().
with open('../data/webnlg2017/submissions/upf/UPF_All_sent_final.txt') as f:
    candidates = f.readlines()

## Let's have a look at reference texts

In [4]:
import xml.etree.ElementTree as ET

tree = ET.parse("../data/webnlg2017/testdata_with_lex.xml")
root = tree.getroot()

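references = []

# Each <entry> may carry several <lex> verbalisations; keep them all,
# lowercased to match the lowercased submission.
for entry in root.iter('entry'):
    references_of_entry = [ref.text.lower() for ref in entry.findall('lex')]
    references.append(references_of_entry)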

In [5]:
references[0]

['abilene, texas is served by the abilene regional airport.',
 'abilene regional airport serves the city of abilene in texas.']

## Let's calculate BLEU with NLTK

In [12]:
from nltk.translate.bleu_score import corpus_bleu

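# Whitespace tokenization. The references keep punctuation attached to words
# ("abilene,", "airport."), while the submission is already tokenized.
split_candidates = [candidate.split() for candidate in candidates]
split_references = [[reference_text.split() for reference_text in reference] for reference in references]

# corpus_bleu takes the references first, then the hypotheses.
corpus_bleu(split_references, split_candidates)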

0.22506998751796697
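
Because the references are split on whitespace only, tokens such as `abilene,` and `airport.` can never match the submission's `abilene` and `.`. The sketch below shows one way to peel punctuation off token edges before scoring. `simple_tokenize` is a hypothetical helper, not the tokenizer used by the submission or by the challenge organizers, and the score above was not computed with it.

```python
import re

# Hypothetical helper: split leading/trailing punctuation off each
# whitespace-delimited chunk, leaving inner punctuation ("3360.12", "14/32") intact.
_EDGE_PUNCT = re.compile(r"^(\W*)(.*?)(\W*)$")

def simple_tokenize(text):
    tokens = []
    for chunk in text.lower().split():
        lead, core, trail = _EDGE_PUNCT.match(chunk).groups()
        tokens.extend(lead)       # each leading punctuation character is its own token
        if core:
            tokens.append(core)
        tokens.extend(trail)      # same for trailing punctuation
    return tokens

example_tokens = simple_tokenize('Abilene, Texas is served by the Abilene Regional Airport.')
```

Applying the same function to both candidates and references would at least make the two sides comparable; whether it reproduces the submission's tokenization exactly is not guaranteed.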

In [9]:
from nltk.translate.bleu_score import sentence_bleu

In [13]:
sentence_bleu(split_references[0], split_candidates[0])

0.35930411196308415
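
To make the two numbers above less opaque, here is a minimal pure-Python sketch of corpus-level BLEU: clipped n-gram counts are summed over all sentences, combined by a geometric mean, and multiplied by a brevity penalty. `corpus_bleu_sketch` is an illustrative name, not an NLTK function, and it simplifies NLTK's behavior (no smoothing, and it returns 0 as soon as one n-gram order has no match), so it is not expected to reproduce NLTK's values in every case.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu_sketch(list_of_references, hypotheses, max_n=4):
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for refs, hyp in zip(list_of_references, hypotheses):
        hyp_len += len(hyp)
        # Use the reference whose length is closest to the hypothesis length.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref_counts = Counter()
            for r in refs:
                max_ref_counts |= ngrams(r, n)   # per n-gram maximum over references
            clipped = hyp_counts & max_ref_counts  # per n-gram minimum
            matches[n - 1] += sum(clipped.values())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # simplification: no smoothing
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    mean_log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    return brevity_penalty * math.exp(mean_log_precision)

toy_hyp = 'the cat sat on the mat'.split()
toy_ref = 'the cat sat on a mat'.split()
toy_score = corpus_bleu_sketch([[toy_ref]], [toy_hyp])
```

The corpus score pools counts before taking the geometric mean, which is why it generally differs from an average of per-sentence scores.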

# Let's calculate BLEU with multi-bleu.perl

https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/multi-bleu.perl

## First we need to generate reference files

In [34]:
import xml.etree.ElementTree as ET

tree = ET.parse('test_reference_files_bleu/testdata_with_lex.xml')
root = tree.getroot()

references = []
for entry in root.iter('entry'):
    # Original casing is kept here (no .lower()).
    references.append([lex.text for lex in entry.findall('lex')])

In [56]:
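# ref_per_id[i] holds the i-th reference of every entry, one per line.
# 8 is assumed to cover the largest number of references per entry.
ref_per_id = [[] for _ in range(8)]

for i in range(8):
    for reference_list in references:
        if len(reference_list) > i:
            ref_per_id[i].append(reference_list[i])
        else:
            # Pad with an empty line so line numbers stay aligned across files.
            ref_per_id[i].append('')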

In [57]:
for i, refs in enumerate(ref_per_id):
    with open('test_reference_files_bleu/ref_{}.txt'.format(i), 'w') as f:
        f.writelines('{}\n'.format(ref) for ref in refs)

In [58]:
!head test_reference_files_bleu/ref_2.txt


Adolfo Suarez Madrid-Barajas Airport is located in Madrid, Paracuellos de Jarama, San Sebastian de los Reyes and Alcobendas.
The runway name of Adolfo Suarez Madrid-Barajas Airport is 18L/36R.




The runway name of Alderney Airport is 14/32.
The runway at Allama Iqbal International Airport is 3360.12 long.
The number of the 1st runway at Amsterdam Airport Schiphol is 18.
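
The padding loop used to build the reference files can also be written without a hard-coded count of reference slots. The following is a small sketch using `itertools.zip_longest` from the standard library; `transpose_references` and the toy data are illustrative names, not part of the original notebook.

```python
from itertools import zip_longest

def transpose_references(references, pad=''):
    """Turn per-entry reference lists into per-slot lists, padding missing slots."""
    return [list(column) for column in zip_longest(*references, fillvalue=pad)]

# Toy data: three entries with 2, 1 and 3 references respectively.
toy_references = [
    ['a1', 'a2'],
    ['b1'],
    ['c1', 'c2', 'c3'],
]
toy_ref_per_id = transpose_references(toy_references)
```

The number of slots then follows from the longest reference list in the data rather than from a constant.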


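## And then let's try multi-bleu.perl

Note that the reference files keep their original casing and punctuation, while the submission is lowercased and tokenized, and that only three of the eight reference files are passed to the script. The scores below are on a 0-100 scale and are not directly comparable with the NLTK score above.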

In [61]:
!../evaluation/webnlg2017/webnlg-baseline-master/multi-bleu.perl test_reference_files_bleu/ref_0.txt test_reference_files_bleu/ref_1.txt test_reference_files_bleu/ref_2.txt < ../data/webnlg2017/submissions/upf/UPF_All_sent_final.txt

BLEU = 6.51, 33.2/12.7/4.4/1.0 (BP=1.000, ratio=1.168, hyp_len=44700, ref_len=38257)


In [62]:
!../evaluation/webnlg2017/webnlg-baseline-master/multi-bleu.perl test_reference_files_bleu/ref_0.txt test_reference_files_bleu/ref_1.txt test_reference_files_bleu/ref_2.txt < baseline_sorted.txt

BLEU = 10.26, 34.8/14.8/7.0/3.2 (BP=0.993, ratio=0.993, hyp_len=35312, ref_len=35561)


In [63]:
!../evaluation/webnlg2017/webnlg-baseline-master/multi-bleu.perl test_reference_files_bleu/ref_0.txt test_reference_files_bleu/ref_1.txt test_reference_files_bleu/ref_2.txt < test_reference_files_bleu/ref_0.txt

BLEU = 100.00, 100.0/100.0/100.0/100.0 (BP=1.000, ratio=1.000, hyp_len=36351, ref_len=36351)
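
The summary lines printed by the script can be read as: overall score, then the 1- to 4-gram precisions, then the brevity penalty (BP) with the length statistics it is derived from. As a sanity check, the sketch below recombines those parts into a score. `parse_multi_bleu` and `recompute_bleu` are hypothetical helpers written for this notebook, and because the printed precisions are rounded to one decimal, the recomputed value only approximates the reported one.

```python
import math
import re

_SUMMARY = re.compile(
    r"BLEU = ([\d.]+), ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) "
    r"\(BP=([\d.]+), ratio=([\d.]+), hyp_len=(\d+), ref_len=(\d+)\)"
)

def parse_multi_bleu(line):
    """Extract the fields of a multi-bleu.perl summary line into a dict."""
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError('not a multi-bleu.perl summary line: {!r}'.format(line))
    groups = match.groups()
    return {
        'bleu': float(groups[0]),
        'precisions': [float(p) for p in groups[1:5]],
        'bp': float(groups[5]),
        'hyp_len': int(groups[7]),
        'ref_len': int(groups[8]),
    }

def recompute_bleu(parsed):
    """Brevity penalty times the geometric mean of the n-gram precisions (0-100 scale)."""
    if min(parsed['precisions']) == 0:
        return 0.0
    hyp_len, ref_len = parsed['hyp_len'], parsed['ref_len']
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    mean_log = sum(math.log(p / 100) for p in parsed['precisions']) / 4
    return 100 * bp * math.exp(mean_log)

upf_line = 'BLEU = 6.51, 33.2/12.7/4.4/1.0 (BP=1.000, ratio=1.168, hyp_len=44700, ref_len=38257)'
upf_recomputed = recompute_bleu(parse_multi_bleu(upf_line))
```

For the second run, the hypothesis is slightly shorter than the reference (ratio below 1), which is why its brevity penalty drops under 1.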
