
## Text Named Entity Similarity Score (evaluation)
                                                 
In the following, we compute the similarity scores for named entities extracted from the translated body text of the articles in the evaluation dataset. <br>

Extracting named entities after translation is risky: the translation step can turn a proper noun into a common word (e.g. the German city name _"Essen"_ becomes the word _"Food"_), and the entity is then lost. We keep track of these issues in **insert DIRECTORY or NAME OF FILE**

In [16]:
import pandas as pd
import spacy 
nlp = spacy.load('en_core_web_sm') 

In [27]:
# 'complete' means that no feature (title, keywords, text) is missing from the scraped JSON files.
# These features are fed into the neural network, so entries missing at least one of them
# cannot be used.
df = pd.read_csv('eval/_EVAL_details_complete_in_df.csv')

In [28]:
df.columns

Index(['pair_id', 'url1_lang', 'url2_lang', 'title1', 'title2', 'keywords1',
       'keywords2', 'text1', 'text2', 'Geography', 'Entities', 'Time',
       'Narrative', 'Overall', 'Style', 'Tone', 'translated_body1',
       'translated_body2'],
      dtype='object')


### Use overlap score for named entities

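As a measure of similarity for named entities we use a set-overlap score: the number of distinct entities shared by both articles divided by the number of distinct entities found in either one. This is intersection over union, i.e. the Jaccard index, rather than the overlap coefficient in the strict sense, which divides by the size of the smaller set. We exclude the entity classes _"CARDINAL"_, _"DATE"_, _"MONEY"_, _"ORDINAL"_, _"PERCENT"_, _"QUANTITY"_, and _"TIME"_. <br><br>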

The reason for this is that normalizing entities in these categories is difficult. For instance, recognizing that _"20/3/22"_ and _"the previous Sunday"_ refer to the same date is far from trivial. Moreover, we agreed that named entities referring to people or places are likely enough to capture the basic facts of a story.
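
To make the two choices above concrete, here is a minimal sketch on hand-written `(text, label)` tuples rather than real spaCy output. The helper names `keep_entity_texts`, `jaccard`, and `overlap_coefficient`, as well as the sample entities and their labels, are illustrative assumptions and not part of the notebook. The sketch drops the ignored labels and then contrasts intersection over union (which is what `overlapScore` in the next cell computes) with the overlap coefficient in the strict sense.

```python
IGNORE_LABELS = {"CARDINAL", "DATE", "MONEY", "ORDINAL", "PERCENT", "QUANTITY", "TIME"}


def keep_entity_texts(entities):
    """Return the lowercased texts of all entities whose label is not ignored."""
    return {text.lower() for text, label in entities if label not in IGNORE_LABELS}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|); 0.0 when either set is empty."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical entities for two articles (hand-written, not spaCy output).
article1 = [("Alex Jones", "PERSON"), ("Infowars", "ORG"),
            ("Texas", "GPE"), ("20/3/22", "DATE")]
article2 = [("Texas", "GPE"), ("Alex Jones", "PERSON"),
            ("the previous Sunday", "DATE"), ("Austin", "GPE"),
            ("Travis County", "GPE")]

set1 = keep_entity_texts(article1)  # the DATE entity is filtered out
set2 = keep_entity_texts(article2)

print(jaccard(set1, set2))              # 2 shared entities out of 5 distinct ones
print(overlap_coefficient(set1, set2))  # 2 shared entities out of the 3 in the smaller set
```

For the same pair of sets the overlap coefficient is never smaller than the Jaccard index, so the two scores are not interchangeable when they are compared against a fixed threshold.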

In [30]:
ignore_labels = ['CARDINAL','DATE', 'MONEY', 'ORDINAL', 'PERCENT', 'QUANTITY', 'TIME']

def overlapScore(key1, key2):
    # Share of distinct entities common to both lists: |A ∩ B| / |A ∪ B|
    key1 = set(key1)
    key2 = set(key2)
    interset = key1.intersection(key2)
    union = key1.union(key2)
    # An empty intersection covers every degenerate case (one or both lists empty)
    if len(interset) < 1:
        return 0
    return len(interset) / len(union)


def normalize_entity(text):
    # Lowercase, drop the possessive "'s", turn hyphens into spaces, remove periods.
    # Only the straight apostrophe is handled; a typographic "’s" is left unchanged.
    return text.lower().replace("'s", "").replace("-", " ").replace(".", "")


# Collect one dict per article pair and build the DataFrame once at the end;
# DataFrame.append was removed in pandas 2.0.
entries = []

for index, row in df.iterrows():
    body1 = str(row['translated_body1'])
    body2 = str(row['translated_body2'])
    doc1 = nlp(body1)
    doc2 = nlp(body2)
    ner1 = [normalize_entity(ent.text) for ent in doc1.ents if ent.label_ not in ignore_labels]
    ner2 = [normalize_entity(ent.text) for ent in doc2.ents if ent.label_ not in ignore_labels]
    score = overlapScore(ner1, ner2)
    if score > 0:
        # Print a short preview of every pair that shares at least one entity
        print('---------------------', index, '-----------------------')
        print(body1[:20])
        print(ner1[:3])
        print(body2[:20])
        print(ner2[:3])
        print(score)

    pair = str(row['pair_id'])
    entries.append({"pair_id": pair, "url1_lang": row['url1_lang'], "url2_lang": row['url2_lang'],
                    "text1": row["text1"], "text2": row["text2"],
                    "translated_body1": row['translated_body1'], "translated_body2": row['translated_body2'],
                    "key1": ner1, "key2": ner2, "key_score": score})

text_key_df = pd.DataFrame(entries, columns=["pair_id", "url1_lang", "url2_lang", "text1", "text2",
                                             "translated_body1", "translated_body2", "key1", "key2", "key_score"])

--------------------- 0 -----------------------
MARTINSBURG, W.Va. —
['martinsburg', 'wva', 'new year’s day']
PORT-AU-PRINCE, Hait
['haiti', 'haitian', 'jovenel moïse']
0.045454545454545456
--------------------- 1 -----------------------
Uber has sold its on
['india', 'zomato', 'china']
Rapid digitisation a
['india', 'google', 'boston consulting group']
0.12195121951219512
--------------------- 2 -----------------------
BENGALURU (Reuters) 
['bengaluru', 'reuters', 'india']
BANGALORE: India pla
['india', 'indian space research organisation', 'k sivan']
0.3181818181818182
--------------------- 4 -----------------------
From The Guardian

F
['guardian', 'boeing', 'whole foods']
Police have arrested
['orange country', 'ny', 'jews']
0.03125
--------------------- 5 -----------------------
An exhibition outlin
['spike island', 'the war of independence', 'cork harbour']
Taoiseach Leo Varadk
['taoiseach leo varadkar', 'state', 'new year']
0.1282051282051282
--------------------- 7 ------------

--------------------- 63 -----------------------


George Mason Natio
['tim evanson', 'aristocracy', 'monarchy']
Karl R. H. Frick on 
['karl r h frick', 'karl r h frick’s die erleuchteten', 'ende des 18']
0.005847953216374269
--------------------- 64 -----------------------
DETROIT – Police sai
['detroit', 'detroit', 'oakland']
LIVONIA (WWJ) - The 
['wwj', 'detroit', 'the taco bell']
0.35
--------------------- 67 -----------------------
(CNN) Much has been 
['cnn', 'donald trump', 'christianity']
Who Cares: OZY's 202
['cares', 'ozy', 'cares']
0.1282051282051282
--------------------- 69 -----------------------
2019: Oregon Interna
['oregon international air show', 'mcminnville', 'hillsboro']
2019: Deputies wound
['hagg lake', 'gaston', "the news times'"]
0.13513513513513514
--------------------- 71 -----------------------
Text Size: A- A+

Du
['iraqi', 'us embassy', 'baghdad']
Iran’s Supreme Leade
['iran', 'supreme leader', 'ayatollah ali khamenei']
0.25
--------------------- 73 -------

--------------------- 132 -----------------------
As Northern Ireland 
['northern ireland', 'arlene foster', 'northern ireland']
Irish Foreign Affair
['irish', 'simon coveney', 'arlene foster']
0.23529411764705882
--------------------- 133 -----------------------
Sign up to FREE emai
['free', 'cornwalllive   court insider subscribe thank', 'invalid email']
Sign up to FREE dail
['free', 'invalid email', 'new year']
0.012422360248447204
--------------------- 135 -----------------------
QNA

A two-day works
['qna', 'texas a&m university', 'qatar']
(MENAFN - The Penins
['menafn   the peninsula', 'qatar', 'ooredoo']
0.11428571428571428
--------------------- 137 -----------------------
From Common Dreams


['us', 'dmca\n\n\n\nimpeachment', 'trump']


Protesters burn pr
['the us embassy', 'baghdad', 'khalid mohammed/ap']
0.14414414414414414
--------------------- 138 -----------------------
( MENAFN - Saudi Pre
['menafn   saudi press agency', 'riyadh', 'salman bin']
(MENAFN - Saudi Pres
['mena

--------------------- 189 -----------------------
A woman accused of s
['jewish', 'brooklyn', 'tiffany harris']
A Brooklyn miscreant
['brooklyn', 'orthodox', 'jewish']
0.2647058823529412
--------------------- 190 -----------------------
New Delhi: Delhi Dep
['new delhi', 'delhi', 'manish sisodia']
Guwahati: The Assam 
['guwahati', 'esma', 'the essential services maintenance (assam) act']
0.2916666666666667
--------------------- 191 -----------------------
LOS ANGELES (AFP) - 
['los angeles', 'american', 'syd mead']
LOS ANGELES — The ce
['los angeles', 'southern california’s', 'the central coast']
0.04
--------------------- 192 -----------------------
New Delhi (Sputnik):
['new delhi', 'sputnik', 'india']
India seeking to bec
['india', 'russia', 'the united states']
0.3333333333333333
--------------------- 194 -----------------------
With its healthcare 
['the middle east', 'qatar', 'legatum institute']
Narrowing down a lis
['us', 'barack obama', 'obama']
0.03773584905660377
-----------

--------------------- 252 -----------------------


(Image by SANA) De
['dmca', 'syrians', 'sunnis']
The Archbishop of Ca
['the archbishop of canterbury', 'justin portal welby', 'christians']
0.10526315789473684
--------------------- 253 -----------------------
From Truthdig



Car
['truthdig\n\n\n\ncarbon', 'brian boucheron / flickr', 'the us house of representatives']
As we head into the 
['the roaring twenties 20', 'senate', 'donald trump']
0.07142857142857142
--------------------- 254 -----------------------
Hours after North Ko
['north korean', 'kim jong un', 'donald trump']
North Korean leader 
['north korean', 'kim jong un', 'north korean']
0.4
--------------------- 255 -----------------------
‘Tell them that we C
['christians', 'east', 'west']
» 05/03/2010 09:56


['christian', 'mosul', 'the university of mosul']
0.12195121951219512
--------------------- 256 -----------------------
HYDERABAD: A 37-year
['chennai', 'panjagutta', 'osmania general hospital']
The 37-year-old woma
[

--------------------- 311 -----------------------
A prominent media mo
['pakistan', 'hameed haroon', 'pakistani']
Hundreds of patients
['nhs england', 'calea', 'runcorn']
0.03812316715542522
--------------------- 314 -----------------------
A Montana man was ar
['montana', 'daniel scheihing', 'yellowstone county']
RAPID CITY, S.D. (AP
['rapid city', 'sd', 'ap']
0.14814814814814814
--------------------- 315 -----------------------
Start the New Year o
['the new year', 'the kent street activity centre', 'the kent street activity centre']
There are a lot of f
['the new year', 'course', 'jenn']
0.07142857142857142
--------------------- 316 -----------------------
A hotel chain is sla
['the texas hill country', 'wyndham hotels & resorts', 'dolce hotels']
Texas has long faced
['texas', 'canada', 'us']
0.05128205128205128
--------------------- 317 -----------------------
Advertisement

While
['uk', 'new year', 'kirkwall']
Poor old Rudolph (Pi
['rudolph', 'the north sea', 'rudolph']
0.06976744

--------------------- 368 -----------------------
Larry Jones

The fir
['larry jones', 'oaklawn', 'havre de grace']
Pauseforthecause

Ch
['pauseforthecause', 'mary broman', 'pauseforthecause']
0.030303030303030304
--------------------- 371 -----------------------
This piece was repri
['deafening silence on latest wikileaks drops is its own scandal', 'caitlin johnstone', 'the organisation for the prohibition of chemical weapons']
This piece was repri
['putin meddling', 'folsomnatural', 'the united states']
0.07142857142857142
--------------------- 372 -----------------------
Emami Ltd is quoting
['nifty', 'nifty', 'sensex']
﻿

Indian shares are
['\ufeff\n\n', 'indian', 'asian']
0.09090909090909091
--------------------- 373 -----------------------
FINANCE

(Yonhap)

B
['finance', 'yonhap', 'south korea’s']
US President Donald 
['us', 'donald trump', 'china']
0.2222222222222222
--------------------- 375 -----------------------
US Chief Justice Joh
['us', 'john roberts', 'donald trump']
Ch

--------------------- 432 -----------------------
The Royal Irish Cons
['the royal irish constabulary', 'britain', 'ireland']
The Government is to
['the royal irish constabulary', 'the dublin metropolitan police', 'dmp']
0.2857142857142857
--------------------- 433 -----------------------
“Pasadena, Californi
['pasadena', 'california', 'usa']
After a flyover by a
['b 2', 'california', 'the 131st rose parade']
0.23529411764705882
--------------------- 435 -----------------------
Fire broke out in an
['las vegas', '3750 s palos verdes st', 'glenn puit/']
A report of shots fi
['las vegas', 'atlas apartment homes', 'las vegas']
0.26666666666666666
--------------------- 436 -----------------------
The Reserve Bank rel
['the reserve bank', 'india', 'international investment position']
The Indian Space Res
['the indian space research organisation', 'chinese', 'xiaomi']
0.21052631578947367
--------------------- 439 -----------------------
Horse racing lost so
['marylou whitney', 'randy romero'

--------------------- 500 -----------------------
In a letter to Senat
['senate', 'mitch mcconnell', 'the house of representatives nancy pelosi']
In much of the right
['fox news', 'breitbart', 'donald trump']
0.04838709677419355
--------------------- 501 -----------------------
The Witcher season 1
['henry cavill', 'yennefer', 'vengerberg']
The Witcher season 1
['geralt', 'henry cavill', 'yennefer']
0.2727272727272727
--------------------- 502 -----------------------
The Witcher season 1
['henry cavill', 'yennefer', 'vengerberg']
The Witcher season 1
['geralt', 'henry cavill', 'yennefer']
0.2727272727272727
--------------------- 504 -----------------------
A yellow box indicat
['ai', 'northwestern university', 'ai']
A yellow box indicat
['ai', 'northwestern university', 'reuters']
0.2894736842105263
--------------------- 505 -----------------------
The University has a
['university', 'the phd common entrance test', 'the periyar university ug']
Periyar University (
['periyar university 

--------------------- 561 -----------------------
  anti-semitism  (Im
['muslims', 'jewish', 'jewish']
We’re still more tha
['democratic', 'politico', 'vermont']
0.13513513513513514
--------------------- 562 -----------------------
Aligarh: Aligarh Mus
['aligarh muslim university', 'amu', 'tariq mansoor']
The winter vacation 
['aligarh muslim university', 'the citizenship (amendment) act', 'amu']
0.3076923076923077
--------------------- 564 -----------------------
Conspiracy theorist 
['alex jones', 'infowars', 'texas']
A Texas judge has or
['texas', 'alex jones', 'infowars']
0.5
--------------------- 565 -----------------------
Chicago: Two toddler
['chicago', 'chicago', 'chicago']
Guwahati: The Assam 
['guwahati', 'esma', 'the essential services maintenance (assam) act']
0.5
--------------------- 566 -----------------------
Sign up to FREE dail
['free', 'invalid email', 'new year day']
ROCK HILL, S.C. — Po
['rock hill', 'sc', 'rock hill']
0.09090909090909091
--------------------- 568

--------------------- 620 -----------------------
THE unending saga of
['education', 'ruel reid', 'caribbean maritime university']
Entornointeligente.c
['entornointeligentecom', 'andrew holness', 'administration']
0.05084745762711865
--------------------- 622 -----------------------
One way or another, 
['guardian', 'cowspiracy', 'ellie goulding']
Financial inclusion 
['the 17 sustainable development goals', 'the bill & melinda gates foundation’s', 'goalkeeper’s report']
0.015625
--------------------- 623 -----------------------
Jennifer Dulos’ atto
['jennifer dulos', 'fotis', 'dulos']
A Greek registered t
['greek', 'cameroon', 'greek']
0.022727272727272728
--------------------- 624 -----------------------
(MENAFN)  British au
['menafn', 'british', 'the new year']
ES News email The la
['es news', 'fill', 'register']
0.13953488372093023
--------------------- 625 -----------------------
WASHINGTON, DC-MAY 1
['washington', 'dc', 'michael s williamson']
One of three suspect
['bethesda', 'm

--------------------- 676 -----------------------
The FIFA Club World 
['europe', 'england', 'liverpool']
Hassan Al Thawadi - 
['hassan al thawadi   club', 'qatar', 'world cup  the middle east']
0.08108108108108109
--------------------- 677 -----------------------
Google recently roll
['chrome', 'android', 'chrome']
Realme has very conf
['realme', 'bang', 'realme']
0.1111111111111111
--------------------- 679 -----------------------
Zenerchi, a Salt Lak
['salt lake city', 'bryan brandenburg', 'comic con']
Text size  One way t
['the new york times', 'whatsapp', 'facebook']
0.02857142857142857
--------------------- 681 -----------------------
A number of those de
['the brussels west', 'brussels elsene', 'brussels south']
The evening was not 
['belgium', 'beekkant', 'sint jans molenbeek']
0.07142857142857142
--------------------- 684 -----------------------
Chris Brunskill/Fant
['chris brunskill/fantasista/getty images  liverpool', 'premier league', 'sheffield united']
Klopp reveals how h

--------------------- 725 -----------------------
“They treat us like 
['arizona', 'evan blake', 'jerry white']
  Moderators for Dec
['democratic', 'pbs newshour', 'democrats']
0.07792207792207792
--------------------- 727 -----------------------
A truck driver was a
['east harlem', 'new year eve', 'madison avenue']
A woman was shot and
['northeast philadelphia', 'new year eve', 'frontenac st']
0.05
--------------------- 728 -----------------------
The son of the woman
['utah', 'charlie noxon', 'christopher noxon  ']
Charlie Noxon, the s
['charlie noxon', 'jenji kohan', 'park city']
0.26666666666666666
--------------------- 729 -----------------------
More people died in 
['maine', 'the us census bureau', 'new hampshire']
The annual populatio
['the united states', 'the us census bureau', 'us']
0.37037037037037035
--------------------- 730 -----------------------
PHOENIX — Since it f
['phoenix', 'international rescue committee community engagement coordinator', 'stanford prescott']
Mexi

--------------------- 792 -----------------------
Hello all, and welco
['newcastle united', 'leicester city', 'steve bruce']
'It's not our concer
["liverpool'   rodgers", 'newcastle', 'new year']
0.06451612903225806
--------------------- 793 -----------------------
JAKARTA, ANN, Jan 2 
['jakarta', 'ann', 'halim perdanakusuma airport']
Jakarta experienced 
['jakarta', 'climatology and geophysics agency', 'bmkg']
0.1875
--------------------- 794 -----------------------
More than a million 
['hong kong', 'new year day', 'victoria park']
Tear gas fired durin
['hong kong', 'new year day', 'hong kong']
0.14285714285714285
--------------------- 795 -----------------------
More than a million 
['hong kong', 'new year day', 'victoria park']
Tear gas fired durin
['hong kong', 'new year day', 'hong kong']
0.14285714285714285
--------------------- 796 -----------------------
During one of his fi
['marielle heller', 'a beautiful day', 'neighborhood']
Sign up to FREE dail
['free', 'kentlive   daily 

--------------------- 846 -----------------------
Berlin (DPA-AFX) - S
['berlin', 'dpa afx', 'achim post']
Hannover / Münster (
['dpa afx', 'germany', 'european']
0.18518518518518517
--------------------- 848 -----------------------
Bad Mergentheim.The 
['mergentheim', 'voft environmental and fluid technology', 'h brombach gmbh']
In 2019, findings we
['the data protection basic regulation', 'the "handelsblatt"', 'mecklenburg vorpommern']
0.01694915254237288
--------------------- 850 -----------------------
Rheinische Post Comm
['rheinische post comment: animals', 'horst thoren dusseldorf', 'krefeld']
Düsseldorf (ots) - D
['the düsseldorf "rheinische post', 'deutsche bahn', 'germany']
0.18181818181818182
--------------------- 851 -----------------------
Since today 5 o'cloc
['group', 'deutsche bahn', 'group']
A WLAN logo on the w
['wlan', 'berlin wi fi', 'content view']
0.08
--------------------- 852 -----------------------
Tehran (DPA) - the I
['tehran', 'the iranian ministry of foreig

--------------------- 898 -----------------------
German Children's Fu
['german children fund calls anchoring of children rights', 'the basic law', 'german']
Deutsche Bahn Schult
['deutsche bahn', 'the düsseldorf "rheinische post', 'deutsche bahn']
0.03571428571428571
--------------------- 899 -----------------------
Share now: Share now
['german', 'böller', 'new year eve']
Berlin riots in Leip
['berlin', 'leipzig', 'stuttgart']
0.12121212121212122
--------------------- 900 -----------------------
Slowly lose the view
['netflix', 'amazon', 'sky']
The new series year 
['the witcher', 'the third season', 'babylon berlin']
0.5
--------------------- 901 -----------------------
From DPA 01 January 
['dpa', 'tehran', 'the iranian ministry of foreign affairs']
From DPA 02. January
['the us embassy', 'iraq', 'baghdad']
0.15384615384615385
--------------------- 903 -----------------------
Tragic avalanche det
['tyrol', 'austria', 'german']
His parents reported
['kitzbühel', 'austrian', 'tyrol']

--------------------- 958 -----------------------
By Guillermo Lawyer 
['guillermo lawyer', 'gonzález milton keynes', 'ing']
Of the twenty pilots
['red bull', 'gasly', 'kvyat']
0.15254237288135594
--------------------- 959 -----------------------
It will be the sixth
['muti', 'riccardo muti', 'the vienna philharmonic']
Once again, and Van 
['van 80', 'the vienna philharmonic', 'the new year']
0.0851063829787234
--------------------- 960 -----------------------
"We decided to freez
['oscar herrera ahuad', 'argentina', 'argentina']
The President climbe
['argentina', 'argentina', 'argentina']
0.16666666666666666
--------------------- 961 -----------------------
The celebration for 
['germany', 'krefeld', 'krefeld']
14:41 The most affec
['reuters', 'the new year eve', 'germany']
0.125
--------------------- 963 -----------------------
Caracas (Sputnik) - 
['caracas', 'venezuela', 'cubans']
The Granma was compo
['granma', 'venezuela', 'cuba']
0.19230769230769232
--------------------- 964 ---

--------------------- 1010 -----------------------
Heliciculture Snail 
['heliciculture snail', 'seville', 'spain']
Television The provi
['córdoba', 'spain', 'córdoba updated']
0.10869565217391304
--------------------- 1011 -----------------------
Mourinho was always 
['the premier league', 'marcelo bielsa', 'portuguese']
A particular moment 
['the tottenham hotspur', 'josé mourinho', 'southampton']
0.07142857142857142
--------------------- 1012 -----------------------
David Stern, brain o
['david stern', 'nba', 'the ballocente league']
David Stern, Excommi
['david stern', 'nba', 'the national basketball association']
0.15873015873015872
--------------------- 1013 -----------------------
Baghdad, Iraq - The 
['baghdad', 'iraq', 'baghdad']
At least six people 
['the us embassy', 'baghdad', 'the us embassy']
0.20588235294117646
--------------------- 1014 -----------------------
Stopped to be exchan
['murcia', 'murcia', 'murcia']
The Palma Firemen ha
['the new year', 'twitter']
0.25
-----

--------------------- 1055 -----------------------
The year that has al
['sakoneta gymnastics erritmiko saldea', 'spain', 'saioa agirre']
Today he played his 
['david villa', 'puro', 'japan']
0.05357142857142857
--------------------- 1056 -----------------------
Conserving their Cat
['catholic', 'clara luz roldán', 'the church of san francisco']
Roldán, who won the 
['roldán', 'the catholic church', 'evangelical']
0.045454545454545456
--------------------- 1057 -----------------------
The National Statist
['the national statistics institute', 'spain', 'ine']
Notimex The Mexican 
['notimex', 'mexican', 'mexico']
0.08695652173913043
--------------------- 1058 -----------------------
In Paris, tens of th
['paris', 'the champs elysees', 'times square']
AFP / Santiago thous
['afp /', 'chileans', 'santiago']
0.06896551724137931
--------------------- 1060 -----------------------
Sara Garkunel reques
['sara garkunel', 'the federal courts of comodoro p', 'alberto nisman']
WhatsApp Facebook Tw
[

--------------------- 1111 -----------------------
The next detainees o
['mormon »number of', 'american mormons', 'mexico']
Manifestans who prot
['the us embassy', 'baghdad', 'iraqi']
0.02040816326530612
--------------------- 1112 -----------------------
We express a deep an
['abp', 'mark jędraszewski', 'lech kaczyński']
Meteorologists predi
['north and western poland', 'lublin', 'podlasie']
0.046511627906976744
--------------------- 1113 -----------------------
The US sent addition
['us', 'iraq', 'donald turmp']
Protests Before the 
['the united states', 'baghdad', 'american']
0.2682926829268293
--------------------- 1114 -----------------------
Police officers from
['the food not bombs', 'the metropolitan police headquarters suylwester marczak', 'the metropolitan police']
Dead birds on paveme
['warsaw', 'vote vote share', 'new year eve']
0.05
--------------------- 1116 -----------------------
Jessica Enslow lives
['jessica enslow', 'american', 'salt lake city']
About the Union of I
[

--------------------- 1185 -----------------------
In Bolu, the hand di
['bolu', 'bolu i̇zzet', 'the bayal state hospital']
Some traffic acciden
['sakarya', 'bolu', 'mobese']
0.05555555555555555
--------------------- 1186 -----------------------
13 people who have r
['nigeria', 'west african', 'nigeria']
Due to the events of
['nigeria', 'nigeria', 'nigeria']
0.16666666666666666
--------------------- 1187 -----------------------
Seah disinfected Sak
['research hospital', 'sakarya education', 'research hospital']
Turning from abroad,
['sakarya', 'mevlüt çavuşoğlu', 'turkish']
0.13636363636363635
--------------------- 1188 -----------------------
The ease of WhatsApp
['whatsapp', 'mersin', 'the mersin water and sewerage administration']
MERSIN COMMUNICATION
['mersin', 'mersin', 'noodle kg']
0.1111111111111111
--------------------- 1193 -----------------------
Korona Virus Control
['korona virus control', 'the ministry of the ministry', 'the ministry of interior']
Auditors of the Mini
['th

--------------------- 1239 -----------------------
Getty Images A man d
['wuhan', 'china', 'tencent']
A Wall Street Journa
['wall street journal', 'china', 'asia']
0.15254237288135594
--------------------- 1240 -----------------------
AVANESHNI Govender w
['avaneshni govender', 'scottburgh high’s', 'english']
Jealousy has destruc
['jealousy', 'the south coast herald’s', 'twitter']
0.26666666666666666
--------------------- 1241 -----------------------
Norwegian-born model
['norwegian', 'frida aasen', 'victoria’s']
Norwegian beauty Fri
['norwegian', 'frida aasen', 'instagram']
0.5714285714285714
--------------------- 1243 -----------------------
× Expand File photo 
['lake george', 'albany', 'lake george winter carnival']
× Expand Photo by Ti
['tim weatherwax', 'lake george winter carnival', 'atv poker run']
0.18181818181818182
--------------------- 1244 -----------------------
Sen. Chuck Grassley 
['chuck grassley', 'twitter', 'donald trump']
Sen. Chuck Grassley 
['chuck grassley', 'iow

--------------------- 1291 -----------------------
Mohanlal, the comple
['mohanlal', 'marakkar arabikadalinte simham', 'marakkar arabikadalinte simham']
Marakkar Arabikadali
['marakkar arabikadalinte simham', 'mohanlal', 'malayalam']
0.20833333333333334
--------------------- 1292 -----------------------
Please enable Javasc
['javascript', 'kansas city', 'mo']
KANSAS CITY, Mo. — B
['kansas city', 'mo', 'boulevard brewing company']
0.26666666666666666
--------------------- 1293 -----------------------
Please enable Javasc
['javascript', 'kansas city', 'mo']
KANSAS CITY, Mo. — B
['kansas city', 'mo', 'boulevard brewing company']
0.26666666666666666
--------------------- 1294 -----------------------
In response to Gov. 
['steve sisolak’s', 'covid 19', 'lyon county']
Send this page to so
['the assiniboine park zoo', 'covid 19', 'creature feature']
0.1111111111111111
--------------------- 1297 -----------------------
A longtime reader wh
['democratic', 'democratic', 'bernie sanders']
Democra

--------------------- 1341 -----------------------
Welcome to our new a
['harvey mackay', 'atlanta', 'minneapolis']
"I need you to go ou
['harvey mackay', 'mackaymitchell envelope co', 'elm st se']
0.36363636363636365
--------------------- 1342 -----------------------
The Trump administra
['trump', 'iranian', 'qassem soleimani']
Sri Lanka’s attempt 
['guinness world records', 'colombo', 'independent premium']
0.06329113924050633
--------------------- 1343 -----------------------
With so many busines
['covid 19', 'dread river distillery', 'birmingham']
Hide Transcript Show
['transcript show transcript  news', 'us', 'broad branch']
0.14285714285714285
--------------------- 1344 -----------------------
Bernie Sanders would
['socialists', 'venezuela', 'nicaragua']
From Common Dreams  
['bernie', 'bernie sanders', 'democratic']
0.15384615384615385
--------------------- 1345 -----------------------
On February 28, fede
['venezuelan', 'washington', 'dc']
A jury of 12 Washing
['washington dc',

--------------------- 1394 -----------------------
Mainz (ots) - The la
['mainz', 'pompeii', 'terra x']
Mainz (ots) - to und
['mainz', 'mirko drotschmann', 'youtube']
0.24324324324324326
--------------------- 1395 -----------------------
It follows the stock
['british', 'rishi sunak', 'finance']
It follows the stock
['the federal government', 'eu', 'markit itraxx europe crossover']
0.1
--------------------- 1396 -----------------------
Because he stealed c
['duisburg', 'duisburg rheinhausen', 'duisburg']
A telephone owner tr
['duisburg marxloh', 'duisburg marxloh', 'duisburg']
0.1111111111111111
--------------------- 1398 -----------------------
Cologne (DPA-AFX) - 
['dpa afx', 'corona', 'ströer']
Cologne (DPA-AFX) - 
['dpa afx', 'ströer', 'corona']
0.2727272727272727
--------------------- 1399 -----------------------
Manja Schüle (2.v.r.
['science, research and culture', 'brandenburg', 'the annual press conference of the foundation brandenburg memorial places']
Potsdam.Where are fu
['

--------------------- 1441 -----------------------
Havana.- Cuba confir
['cuba', 'coronavirus', 'italian']
Havana (Sputnik) - C
['havana', 'cuban', 'coronavirus']
0.3333333333333333
--------------------- 1442 -----------------------
Two extensions The p
['josé enrique', 'abuín gey', 'el gum`']
The defense of José 
['josé enrique', 'abuín gey', 'permanent revisable prison']
0.3333333333333333
--------------------- 1443 -----------------------
The new forms of sha
['blablacar', 'galicia', 'a coruña']
The start of the new
['galicia', 'spain', 'galician']
0.25
--------------------- 1444 -----------------------
The President of Ven
['venezuela', 'nicolás maduro', 'washington']
Posted in: Coronavir
['coronavirus', 'chavista', 'coronavirus']
0.23076923076923078
--------------------- 1445 -----------------------
nnn the government u
['labor', 'yolanda diaz', 'ugt']
Published on 02/14/2
['yolanda díaz', 'the agricultural section of the trade unions', 'europa press']
0.171875
-------------------

--------------------- 1485 -----------------------
United Kingdom will 
['united kingdom', 'the united kingdom', 'rishi sunak']
London, 17 Mar (EFE)
['london', 'efe', 'the united kingdom']
0.5
--------------------- 1486 -----------------------
By 2030, it is expec
['the growth and development panorama of colombia', 'colombia', 'colombians']
A growth of 9.1% in 
['colombia', 'colombia', 'juan carlos salazar']
0.2
--------------------- 1487 -----------------------
Beyond the global co
['coronavirus', 'disney', 'paramount pictures']
Due to the Coronavir
['coronavirus', 'paramount pictures', 'venice']
0.45454545454545453
--------------------- 1488 -----------------------
On Monday night, the
['american', 'floyd mayweather jr', 'california']
The hobby of the for
['us', 'floyd mayweather', 'josie harris']
0.46153846153846156
--------------------- 1489 -----------------------
Coffee produced by 3
['abangares', 'monteverde', 'carbon neutral']
A total of 380 Famil
['abangares', 'monteverde', 't

--------------------- 1529 -----------------------
Following the Outbre
['outbreak', 'iraqi', 'the north atlantic treaty']
Ottawa, Jan 8 (Notim
['ottawa', 'notimex', 'canadian']
0.08333333333333333
--------------------- 1530 -----------------------
For 30 years, the me
['the historical affairs commission', 'san luis río colorado', 'the actual city council of san luis']
After several visits
['santos gonzález yescas', 'san luis', 'standing tribune']
0.20833333333333334
--------------------- 1531 -----------------------
1 1 1 1 1 1 1 1 1 1 
['the municipal company of potable water', 'guayaquil', 'directory']
1 1 1 1 1 1 1 1 1 1 
['compances of the municipal company of water', 'guayaquil', 'mount sinai']
0.1111111111111111
--------------------- 1533 -----------------------
At the beginning of 
['sanna marin', 'finnish', 'orb']
Sanna Marin, Finland
['sanna marin', 'finland', 'marin']
0.25
--------------------- 1534 -----------------------
«The Municipal Group
['llerena', 'the interiorritori

--------------------- 1576 -----------------------
Madeleine McCann has
['madeleine mccann', 'portugal', 'maddie mccann']
Madeleine McCann has
['madeleine mccann', 'portugal', 'maddie mccann']
0.42857142857142855
--------------------- 1578 -----------------------
The tragic accident 
['helicopter', 'kobe bryant', 'gianna']
On February 24, the 
['kobere bryanta', 'los angeles', 'beyonce']
0.10810810810810811
--------------------- 1580 -----------------------
Rysie recently relea
['rysie', 'the dobrzany district district', 'west pomeranian']
The Minister of Nati
['national education', 'matur', 'national education']
0.037037037037037035
--------------------- 1581 -----------------------
Two twenty-year-olds
['damasławek', 'dominik zieliński', 'wągrowiec']
The man was stopped.
['society', 'the animal protection act', 'sierż']
0.3333333333333333
--------------------- 1582 -----------------------
During the pandemic,
['pko bank polski', 'sw research', 'treasury']
PKO BP warns against
['pko b

--------------------- 1629 -----------------------
Thousands of Thousan
["dede korkutin '", 'namamgah   thousands of thousands of people', 'namamgah']
The cost of the cost
['eskişehir', 'eskişehir, agriculture', 'bekir pakdemirli']
0.041666666666666664
--------------------- 1631 -----------------------
03.01.2020 12:45 | L
['latest update', 'istanbul', 'eagle']
In the written state
['the public prosecutor of the republic of anadolu', 'ee', 'istanbul']
0.06666666666666667
--------------------- 1632 -----------------------
Bursa Metropolitan M
['bursa metropolitan municipality', 'alinur aktas', 'alinur aktas']
Bursa Bursa Metropol
['bursa bursa metropolitan municipality', 'akp', 'bursa bursa metropolitan municipality']
0.17647058823529413
--------------------- 1635 -----------------------
Belarus President Al
['aleksandr lukashenko', 'lukashenko', 'lukashenko']
883 cases were found
['belarus', 'the belarus ministry of health', 'aleksandr lukashenko']
0.2222222222222222
------------------

--------------------- 1675 -----------------------
The 'underpass is mo
["the maintenance house'", 'istanbul', 'the hashim occupancy pass']
In line with the ins
['the ministry of interior', 'istanbul governorship', 'istanbul']
0.25
--------------------- 1678 -----------------------
Turkey Newspaper Tur
['turkey', 'germany', 'korona']
Turkey Newspaper of 
['turkey newspaper', 'turkey', 'the eu commission']
0.3448275862068966
--------------------- 1679 -----------------------
The 70-year-old Hati
['hatice', 'hatice dalboy', 'the dalboy of dalboy']
Support to the Natio
['the national solidarity campaign support', 'kilis', 'recep tayyip erdogan']
0.047619047619047616
--------------------- 1680 -----------------------
Ouch in Osmaniye Kad
['osmaniye kadirli', 'ministry of hatice foundation', 'certificate kadirli hatice foundation']
The Hatice Foundatio
['the hatice foundation', 'the ministry of hatice foundation', 'the ministry of interior']
0.0967741935483871
--------------------- 1684 ---

--------------------- 1719 -----------------------
The Minister of Manp
['manpower mohammed savan', 'corona', 'ap']
He wrote - Khaled Ha
['khaled hassan:', 'the ministry of manpower', 'manpower mohammed savan']
0.14285714285714285
--------------------- 1720 -----------------------
Japan's Russian emba
['japan', 'russian', 'russian']
Tokyo - France-Press
['tokyo', 'tokyo', 'france']
0.3333333333333333
--------------------- 1721 -----------------------
WASHINGTON (Reuters)
['washington', 'reuters', 'us']
WASHINGTON (Reuters)
['washington', 'reuters', 'us']
0.4375
--------------------- 1722 -----------------------
The Lebanese preside
['lebanese', 'aoun', 'emergency beirut   "jerusalem']
Lebanon: 4 new casua
['lebanon', 'corona', 'beirut']
0.23076923076923078
--------------------- 1723 -----------------------
Shadi Ryan, chairman
['shadi ryan', 'the egyptian automotive company', 'egyptian']
Eng. Shadi Ryan, cha
['shadi ryan', 'egyptian', 'corona']
0.2857142857142857
--------------------- 

--------------------- 1763 -----------------------
Libya News 24-Specia
['libya news', 'the steering board', 'corona']
Libya News 24 - Prim
['libya news', 'tobruk municipal municipality', 'corona']
0.14285714285714285
--------------------- 1764 -----------------------
Asayish Kirkuk confi
['asayish kirkuk', 'arab', 'iraq']
A security force in 
['kirkuk', 'iraq', 'daash']
0.375
--------------------- 1765 -----------------------
Beijing-SANA The Chi
['beijing', 'the china national health commission', 'corona']
Beijing - (dpa): off
['beijing', 'hubei', 'china']
0.2857142857142857
--------------------- 1766 -----------------------
Agencies: Russian Fo
['russian', 'sergei lavrov', 'moscow']
Turkish President Re
['turkish', 'recep tayyip erdogan', 'libyan']
0.4838709677419355
--------------------- 1768 -----------------------
Southern Sudan leade
['juba', 'sudan', 'bloomberg']
Juba (AFP): Southern
['juba', 'afp', 'sudan']
0.6
--------------------- 1769 -----------------------
Hamada al-Sadqi

--------------------- 1814 -----------------------
Poland barely felt t
['poland', 'die welt', 'poland']
The newspaper resemb
['the european commission', 'eu', 'poland']
0.3225806451612903
--------------------- 1815 -----------------------
In the province Sile
['silesia', 'pgg', 'pgg']
Most infections (574
['rybnik jankowice', 'row', 'the katowice mound mine staszic']
0.21951219512195122
--------------------- 1817 -----------------------
According to "Corrie
['corriere della', "sant'anna", 'italy']
A 71-year-old infect
['italy', 'bergamo', 'lombardy']
0.3333333333333333
--------------------- 1818 -----------------------
He has more than 14 
['szczecin', 'the puck island', 'kocin']
"This is a peaceful 
['temporary guardians', 'maja', 'mai shelter guardians']
0.21428571428571427
--------------------- 1820 -----------------------
Bukowina Tatrzańska.
['bukowina', 'tatrzańska', 'tatras']
Bukowina Tatrzańska.
['bukowina', 'tatrzańska', 'tatras']
1.0
--------------------- 1821 --------------

--------------------- 1863 -----------------------
28.02.2020 12:38 | L
['tashköprü', 'kastamonu kaşköprü', 'the ünal atalay']
Kastamonu was conduc
['kastamonu', 'the tashköpron county gendarmerie command', 'jandamonu']
0.08333333333333333
--------------------- 1864 -----------------------
In the case of the P
['mayä', 'balä ± fries', 'bursa']
The Chalkananıkkanış
['mayä ± s', 'lerı', 'corona virilurgiti normal normal survari normal normal survari']
0.5263157894736842
--------------------- 1865 -----------------------
The Chairman of the 
['the united states', 'usa', 'donald trump']
The number of viruse
['us', 'coronavirus', 'usa']
0.36363636363636365
--------------------- 1866 -----------------------
The corps of 7 of th
['iranian', 'van   van instructor', 'iranian']
Research and examina
['iranian', 'iranian', 'afghanistan']
0.36363636363636365
--------------------- 1867 -----------------------
22.01.2020 16:46 | R
['hatay', 'yayladağı district', 'tahir yılmaz']
In the last days the
[

--------------------- 1905 -----------------------
In Maltepe Central M
['maltepe central mosque', 'the social distance friday prayer', 'maltepe central mosque']
27.02.2020 23:40 | L
['maltepe peaceful streets', 'the maltepe county police department', 'gülsuyu']
0.16666666666666666
--------------------- 1906 -----------------------
"Was the President H
['hawk', 'bilecik', 'semih sahin']
Bilecik's infrastruc
['bilecik', 'bilecik', 'semih sahin']
0.5
--------------------- 1909 -----------------------
11.02.2020 14:25 | L
['recep tayyip erdogan', 'recep tayyip erdogan', 'the cherry harbor neighborhood']
President Recep Tayy
['recep tayyip erdogan', 'erdogan', 'couples']
0.14285714285714285
--------------------- 1910 -----------------------
Eat. FDP Council Fac
['hans peter schönweiß', 'thuringia', 'thuringia']
Food: FDP split to P
['thuringia the essen fdp', 'fdp', 'thomas kemmerich']
0.2
--------------------- 1911 -----------------------
Thuringia CDU top ag
['thuringia cdu', 'althaus', 

--------------------- 1952 -----------------------
London (DPA-AFX) - t
['london', 'dpa afx', 'british']
London (DPA-AFX) - I
['london', 'dpa afx', 'england']
0.35714285714285715
--------------------- 1953 -----------------------
Archaeologists have 
['chinese', 'chinese', 'henan']
Archaeologists have 
['east asia', 'chinese', 'plos one']
0.2777777777777778
--------------------- 1954 -----------------------
The planned construc
['oberdieten', 'bürgerhaus', 'mark natel share']
No photo from Lahnau
['lahnau', 'the spd group', 'heick lyding']
0.12121212121212122
--------------------- 1956 -----------------------
Police Bonn POL-BN: 
['bonn', 'königswinter oberplesis bonn', 'ots']
Police Bonn POL-BN: 
['bonn', 'bonn', 'the duisdorfer police watch']
0.14285714285714285
--------------------- 1957 -----------------------
Potsdam The security
['potsdam', 'jewish', 'brandenburg']
Potsdam. Half a year
['potsdam', 'halle', 'interior']
0.19047619047619047
--------------------- 1958 ---------------

--------------------- 2001 -----------------------
A country in the exc
['the national guard', 'us', 'the white house']
US President Donald 
['us', 'donald trump', 'us']
0.22727272727272727
--------------------- 2002 -----------------------
Just on the day when
['place hirsch gastronom', 'herbert schwarz', 'angie']
Secretly, quiet and 
['herbert schwarz', 'herbert osterbauer', 'project']
0.3333333333333333
--------------------- 2003 -----------------------
Wiesbaden - Police H
['wiesbaden   police headquarters', 'west hesse', 'pol wi:']
Wiesbaden - Police H
['wiesbaden   police headquarters', 'west hesse', 'pol wi:']
0.14953271028037382
--------------------- 2004 -----------------------
Berlin (DPA-AFX) - S
['berlin', 'dpa afx', 'australia']
Berlin (DPA-AFX) - T
['berlin', 'dpa afx', 'luisa neubauer']
0.5714285714285714
--------------------- 2005 -----------------------
Avatar_shz by DPA 30
['dpa', 'hamburger sv', 'nuremberg']
For the gates, howev
['bakery jatta', 'luke hinterseer', 's

--------------------- 2043 -----------------------
From recommendation 
['novartis', 'switzerland', 'the pharmaconzern novartis']
The novel coronaviru
['swiss', 'the swiss federal technical college', 'eth']
0.20833333333333334
--------------------- 2044 -----------------------
Schwerin. According 
['schwerin', 'verdi', 'the municipal employer association']
Schwerin. In public 
['schwerin', 'mecklenburg vorpommern', 'lowbus nordwestmecklenburg gmbh']
0.3076923076923077
--------------------- 2045 -----------------------
Neos to Corona: "Pre
['corona', 'neos', 'beate meinl reisinger']
Neos apply for a sup
['vienna', 'ots', 'neos']
0.25
--------------------- 2046 -----------------------
At a heavy bus accid
['india', 'uttar pradesh', 'the firozab district']
Many other people we
['poles', 'bihar', 'uttar pradesh']
0.2222222222222222
--------------------- 2048 -----------------------
Zurich (DPA-AFX) - T
['zurich', 'dpa afx', 'the swiss bank credit suisse']
Zurich (DPA-AFX) - T
['zurich', 'd

--------------------- 2091 -----------------------
United Internet with
['united internet ag', 'annual results/dividend', 'cet/cest disclosure']
United Internet invo
['united internet', 'eur', 'united internet ag']
0.30612244897959184
--------------------- 2092 -----------------------
Drive-in cinemas ope
['hohenfelden', 'erfurt', 'thuringia']
Erfurt. The number o
['erfurt', 'erfurt', 'next page']
0.02702702702702703
--------------------- 2093 -----------------------
Baku, May 17, Azerta
['baku', 'azertac', 'turkey']
Ankara, April 18, Az
['ankara', 'azertac the turkish health', 'fahrettin koca']
0.1
--------------------- 2094 -----------------------
Editorial by Jürgen 
['jürgen elsässer', 'compact magazine', 'corona']
While Ministerskin J
['ministerskin jens spahn', 'china', 'germany']
0.04838709677419355
--------------------- 2095 -----------------------
First infection with
['coronavirus', 'remsmurrecreis', 'rudersberg']
InterCommunale Train
['intercommunale training square fair pho

--------------------- 2133 -----------------------
Frankfurt (DPA-AFX) 
['frankfurt', 'dpa afx', 'the german press agency']
Berlin (DPA-AFX) - t
['berlin', 'dpa afx', 'germany']
0.3333333333333333
--------------------- 2134 -----------------------
Came the Coronavirus
['coronavirus', 'germany', 'italy']
The Coronavirus has 
['coronavirus', 'kiel   how', 'china']
0.22448979591836735
--------------------- 2135 -----------------------
During the night, th
['holzerstrasse', 'basel', 'the rescue basel city']
A passenger car drov
['basel', 'the cantonal police basel city', 'sanität']
0.08333333333333333
--------------------- 2136 -----------------------
advertisement  Good 
['karlsruhe', 'pforzheimer', 'kaiserstraße']
Kahlschlag at Galeri
['kahlschlag', 'galeria karstadt kaufhof', 'pforzheim / karlsruhe']
0.3125
--------------------- 2137 -----------------------
Paris / London (DPA-
['paris', 'london', 'dpa afx']
Paris / London (DPA-
['paris', 'london', 'dpa afx']
0.25
--------------------- 

--------------------- 2180 -----------------------
The anti-fascist all
['oberhausen', 'oberhausen', 'alliance oberhausen']
At the main station 
['oberhausen', 'oberhausen', 'plant villion guard']
0.1111111111111111
--------------------- 2181 -----------------------
With 120 participant
['the painted birds', 'jumping tourism, paysandú', 'río negro']
Last Wednesday, the 
['uruguayan', 'uruguayans', 'the tourism painted birds']
0.21739130434782608
--------------------- 2182 -----------------------
The Director of the 
['the bank nation', 'claudio lozano', 'radio nacional']
The director of the 
['the bank nation', 'claudio lozano', 'salvemos kamchatka']
0.18181818181818182
--------------------- 2183 -----------------------
Santo Domingo, RD Ph
['rd physicians', 'luis abinader', 'víctor atallah']
The presidential can
['modern revolutionary party', 'prm', 'luis abinader']
0.3333333333333333
--------------------- 2184 -----------------------
The State American S
['the state american state', 

--------------------- 2220 -----------------------
Appearance of the X-
['the united states air force', 'x 37b space drone', 'cape cañaveral']
The logos have been 
['the united states', 'donald trump', 'twitter']
0.08108108108108109
--------------------- 2221 -----------------------
The Mexican peso is 
['mexican', 'the federal reserve', 'the united states']
Reuters.- The Mexica
['mexican', 'the federal reserve', 'the united states']
0.4444444444444444
--------------------- 2223 -----------------------
Córdoba.- A man lost
['córdoba ', 'villa', 'corina']
A 37-year-old man wa
['pitbull race', 'barrio villa corina', 'córdoba']
0.16666666666666666
--------------------- 2224 -----------------------
The scandals for the
['the procurator office of córdoba', 'luis antonio renal', 'juan felipe angulo']
The W knew that in n
['the department of córdoba', 'the regional office of córdoba', 'certé']
0.14705882352941177
--------------------- 2225 -----------------------
As reported by Execu
['anses'

--------------------- 2264 -----------------------
TPA Group wins Real 
['tpa group', 'real estate brand award', 'vienna']
Deadline: 10 March 2
['albania', 'bosnia herzegovina', 'bulgaria']
0.16666666666666666
--------------------- 2266 -----------------------
After the Covid-19-d
['british', 'boris johnson', 'london']
Boris Johnson,Britis
['boris johnson', 'british', 'oxygen']
0.2413793103448276
--------------------- 2267 -----------------------
Magdeburg - Saxony-A
['magdeburg', 'corona', 'saxony anhalt']
Hydrogen is indispen
['hydrogen', 'the fraunhofer institute for factory operation', 'automation iff']
0.09090909090909091
--------------------- 2268 -----------------------
Weather experts prov
['the british weather service', 'germany', 'dominik jung']
Elements of 2nd Armo
['combat team', '1st cavalry division', 'the hohenfels training area for combined resolve xiii']
0.06666666666666667
--------------------- 2269 -----------------------
Singer of the former
['the boyband dream stre

--------------------- 2322 -----------------------
The new normality pr
['federal finance', 'olaf scholz', 'german']
by Martin Armstrong 
['martin armstrong  the saudis arabia', 'coronavirus', 'olaf scholz']
0.2727272727272727
--------------------- 2323 -----------------------
The igniter decides 
['karl friedrich schröder', 'dortmund', 'the second world war']
Frank Leboeuf questi
['frank leboeuf', 'dortmund', 'jadon sancho']
0.041666666666666664
--------------------- 2328 -----------------------
News ticker for the 
['the free state  + ©', 'corona', 'markus söder']
A day after he was p
['angela merkel', 'corona', 'germany']
0.0851063829787234
--------------------- 2329 -----------------------
The next returnees f
['the corona crisis area', 'the chinese province of hubei', 'germany']
The Berlin governmen
['berlin', 'german', 'german']
0.1388888888888889
--------------------- 2330 -----------------------
1 Lighthouse on Rüge
['germany', 'adobe stock / madlen steiner', 'corona']
As of Mo

--------------------- 2384 -----------------------
The coronavirus has 
['austria', 'tyrol', 'switzerland']
Shortly before the s
['austria', 'munich', 'germany']
0.16071428571428573
--------------------- 2385 -----------------------
Berlin / Grünheide (
['berlin / grünheide', 'us', 'berlin']
Tesla Inc. (NASDAQ: 
['tesla inc', 'nasdaq', 'berlin']
0.125
--------------------- 2386 -----------------------
Interview «For inter
['michel rochat', 'the hotelfachschule lausanne', 'china']
COVID-19 is a pandem
['covid 19', 'france', 'chinese']
0.11428571428571428
--------------------- 2388 -----------------------
The Israeli Foreign 
['israeli', 'israel katz', 'the world forum']
cnxps.cmd.push(funct
['cnxps', 'iran', 'benjamin netanyahu']
0.2727272727272727
--------------------- 2389 -----------------------
Nairobi (dpa) - in K
['nairobi', 'kenya', 'kenyans']
press release  For 4
['juliet m', 'kenyan', 'nairobi']
0.14285714285714285
--------------------- 2390 -----------------------
Health Minis

--------------------- 2434 -----------------------
In the midst of the 
['meghan markle', 'prince harry', 'george floyd']
Sign up to FREE dail
['free', 'invalid email', 'meghan markle’s']
0.24242424242424243
--------------------- 2435 -----------------------
Image: Pixnio.com by
['bicanski', 'italian', 'kornelia kirchweger']
A distressing video 
['bergamo', 'italy', 'sky news']
0.23809523809523808
--------------------- 2436 -----------------------
The crisis summit in
['wef', 'us', 'donald trump']
Davos forum: Trump t
['iranian', 'klaus schwab', 'founder']
0.2236842105263158
--------------------- 2437 -----------------------
"Our house is still 
['swedish', 'greta thunberg', 'world economic forum']
DAVOS — U.S. Preside
['us', 'donald trump', 'greta thunberg']
0.3125
--------------------- 2438 -----------------------
Greta Thunberg is kn
['greta thunberg', 'swedine', 'stockholm']
Greta Thunberg, a fr
['greta thunberg', 'greta inc', 'keean bexte']
0.23529411764705882
--------------------

--------------------- 2480 -----------------------
Germany has begun to
['germany', 'austria', 'france']
Mar 16, 2020 , 8:37A
['8:37am germany', 'air pics germany', 'france']
0.4166666666666667
--------------------- 2481 -----------------------
"Help must also be w
['bundeswehr', 'iraq', 'ktcc']
With the situation s
['iraq', 'iran', 'bundeswehr']
0.16
--------------------- 2482 -----------------------
With Judowüfen and K
['judowüfen', 'kartactiffen', 'bond girl pussy galore']
Honor Blackman, best
['honor blackman', 'james bond', 'pussy galore']
0.14
--------------------- 2483 -----------------------
ESA-Mission  + © Pic
['esa mission  + © picture alliance', 'mercury', 'bepicolombo']
Earth To Be Buzzed B
['esa', 'atg', 'earth']
0.25
--------------------- 2484 -----------------------
Knowledge gap at the
['the robert koch institut bernd murawski', 'the united kingdom', 'imperial college london']
South Korea’s Center
['south korea’s', 'centers for disease control and prevention', 'daegu'

--------------------- 2524 -----------------------
After the emperor of
['palestine', 'mashreq', 'the white house']
The Kremlin said on 
['kremlin', 'russia', 'us']
0.30158730158730157
--------------------- 2525 -----------------------
"The exhibition, whi
['red expeditions', 'britain', 'corona']
British Prime Minist
['british', 'boris johnson', 'twitter']
0.1111111111111111
--------------------- 2526 -----------------------
The Public Security 
['the public security sector', 'qena', 'qena sahrawi sahrawi']
Subscribe to receive
['qena', 'al wakar', 'abucott']
0.3
--------------------- 2527 -----------------------
Despite her free fro
['corona', 'yemen', 'yemen']
Follow-up March 20 T
['the joint technical committee', 'corona', 'abdul hakim al kahlani']
0.35714285714285715
--------------------- 2528 -----------------------
He said in a dialogu
['german', 'der spiegel', 'iranian']
Iranian Foreign Mini
['iranian', 'mohammad jawad zarif', 'europe']
0.5714285714285714
--------------------- 2

--------------------- 2574 -----------------------
In Sakarya, two brot
['sakarya', 'the national solidarity campaign', 'the national solidarity campaign, misra']
Donated half of the 
['the national solidarity campaign', 'hüseyin aydin', 'keban']
0.16666666666666666
--------------------- 2575 -----------------------
According to the Dep
['the department of communication', 'oktay interior', 'süleyman soylu']
AA President of Pres
['fuat oktay', 'interior', 'süneyman soylu']
0.21428571428571427
--------------------- 2576 -----------------------
22.02.2020 13:16 | L
['antalya', 'halil özçelik', 'behçet']
A different diagnosi
['sspe', 'sspe', 'özçelik']
0.125
--------------------- 2577 -----------------------
We need to solve the
['idlib', 'russia', 'moscow']
President Recep Tayy
['recep tayyip erdogan', 'azerbaijan', 'libya']
0.1
--------------------- 2578 -----------------------
Gümüşhane quarantine
['gümüşhane', 'gumushane', 'belgium']
Fenerbahce's Goalkee
['fenerbahce goalkeeper berke o

In [33]:
text_key_df.isna().sum()

pair_id             0
url1_lang           0
url2_lang           0
text1               0
text2               0
translated_body1    0
translated_body2    0
key1                0
key2                0
key_score           0
dtype: int64

In [34]:
text_key_df[text_key_df['translated_body1'].isna()]

Unnamed: 0,pair_id,url1_lang,url2_lang,text1,text2,translated_body1,translated_body2,key1,key2,key_score


In [35]:
text_key_df

Unnamed: 0,pair_id,url1_lang,url2_lang,text1,text2,translated_body1,translated_body2,key1,key2,key_score
0,1484084337_1484110209,en,en,"MARTINSBURG, W.Va. — A suspected drunken drive...","PORT-AU-PRINCE, Haiti — Haitian President Jove...","MARTINSBURG, W.Va. — A suspected drunken drive...","PORT-AU-PRINCE, Haiti — Haitian President Jove...","[martinsburg, wva, new year’s day, west virgin...","[haiti, haitian, jovenel moïse, moïse, the nat...",0.045455
1,1576314516_1576455088,en,en,Uber has sold its online food-ordering busines...,Rapid digitisation and growth in both online b...,Uber has sold its online food-ordering busines...,Rapid digitisation and growth in both online b...,"[india, zomato, china, ant financial, zomato, ...","[india, google, boston consulting group, bcg, ...",0.121951
2,1484036253_1483894099,en,en,BENGALURU (Reuters) - India has approved its t...,BANGALORE: India plans to make a fresh attempt...,BENGALURU (Reuters) - India has approved its t...,BANGALORE: India plans to make a fresh attempt...,"[bengaluru, reuters, india, indian space resea...","[india, indian space research organisation, k ...",0.318182
3,1484034982_1483785560,en,en,Asserting that India too should protect the in...,The air quality recorded at 9.38 am was 433 in...,Asserting that India too should protect the in...,The air quality recorded at 9.38 am was 433 in...,"[india, malaysia, indonesia, sea, indonesia, c...","[met, meteorological]",0.000000
4,1484188439_1484378177,en,en,From The Guardian\n\nFrom Boeing to Whole Food...,Police have arrested a 38-year-old black man f...,From The Guardian\n\nFrom Boeing to Whole Food...,Police have arrested a 38-year-old black man f...,"[guardian, boeing, whole foods, donald trump, ...","[orange country, ny, jews, hasidic, monsey, ha...",0.031250
...,...,...,...,...,...,...,...,...,...,...
2585,1586195445_1598778991,tr,tr,"BM, Aden'de 2 bini aşkın iç göçmenin selden za...",BM'den Yemen'de kadınların doğumda ölüm riski ...,"The UN announced that more than 2,000 domestic...","In Yemen, Women's Death Risk Warning Explanati...","[un, aden, the united nations, un, yemen, the ...","[yemen, women death risk warning explanation, ...",0.300000
2586,1590915424_1590940388,tr,tr,Kovid-19'dan dolayı La Liga kulüplerinde hayat...,Yeni tip koronavirüs (Kovid-19) salgınının eko...,"Because of the Kovid-19, the Survival Struggle...",The new type of coronavirus (Kovid-19) is cons...,"[la liga clubs, football league, la liga, span...","[football league, la liga, spanish, the cadena...",0.857143
2587,1526157103_1492737005,tr,tr,\n\n\n\n\n\n\n\nİflas noktasındaki kulüplerin ...,"TFF, resmi internet sitesinden Beşiktaş'ın fai...",It is stated that the sales of the clubs at th...,TFF has published an explanation on the offici...,"[besiktas, ahmet nur dek, turkey banks associa...","[tff, besiktas, team spend limits, ahmet nur r...",0.090909
2588,1603274500_1618292937,tr,tr,Ergene Belediyesi yol çalışmalarına aksatmadan...,Ergene'de Ahimehmet ve Yeşiltepe mahallelerind...,Ergene Municipality continues without disrupti...,"In Ergene, the mask was distributed in Ahimehm...","[ergene municipality, tekirdag, ergene municip...","[ergene, ergene county, ergene, ergene, ergene...",0.166667


In [None]:
# Save the text named-entity scores; the frame built in this notebook is text_key_df
path = 'eval/_EVAL_text_named-entity_score.csv'
text_key_df.to_csv(path, index=False)
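
One caveat when reloading the saved file: the `key1` and `key2` columns hold Python lists, and writing them to CSV stores their string representation (e.g. `"['juba', 'sudan']"`). The sketch below illustrates the round trip using only the standard library and a hypothetical two-row sample (the rows are made up for illustration and are not taken from the evaluation data). With pandas, the same parsing can be done by passing `converters={'key1': ast.literal_eval, 'key2': ast.literal_eval}` to `pd.read_csv`.

```python
import ast
import csv
import io

# Hypothetical sample mimicking three of the saved columns (not real data).
rows = [
    {"pair_id": "a_b", "key1": ["bukowina", "tatras"], "key_score": 1.0},
    {"pair_id": "c_d", "key1": [], "key_score": 0.0},
]

# Write to an in-memory buffer; list cells are stored via str(), as in to_csv.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["pair_id", "key1", "key_score"])
writer.writeheader()
writer.writerows(rows)

# Read back: every cell is now a string, so the list column must be parsed.
buffer.seek(0)
restored = []
for row in csv.DictReader(buffer):
    row["key1"] = ast.literal_eval(row["key1"])   # "['a', 'b']" -> ['a', 'b']
    row["key_score"] = float(row["key_score"])
    restored.append(row)

print(restored[0]["key1"])
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for parsing these columns.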