# Adversarial analysis

We are still wondering why some labels in Maltese dataset are predicted correctly, while others are not. Let's analyse whether we can find factors that have an impact to the accuracy of label predictions by:
- shuffling the word order - to see what is the impact when syntax is destroyed (only lexical information is kept)
- removing punctuation marks - see impact of punctuation marks
- removing numbers - see impact of numbers
- removing function words
- removing suffixes of words (last tokens) - destroying syntax
- identifying key words from the text and removing them - the impact of removing lexical information

In [1]:
# Define the gpu  on the gpu machine
%env CUDA_DEVICE_ORDER=PCI_BUS_ID
%env CUDA_VISIBLE_DEVICES=6


env: CUDA_DEVICE_ORDER=PCI_BUS_ID
env: CUDA_VISIBLE_DEVICES=6


In [2]:
import json
import pandas as pd
import random
import regex as re

In [3]:
# Import the dataset
# Import the final dataset with test sets
with open("manual-annotations/multilingual-genre-annotated-test-set.json") as main_file:
	main_dict = json.load(main_file)

main_dict.keys()

dict_keys(['mt', 'el', 'tr', 'sq', 'is', 'uk', 'ca', 'mk', 'hr', 'sl'])

In [4]:
main_dict["mt"].keys()

dict_keys(['accuracy', 'micro_f1', 'macro_f1', 'dataset', 'token_overlap'])

In [5]:
# Get out a dataset
df = pd.DataFrame(main_dict["mt"]["dataset"])
df.head(2)

Unnamed: 0,text_id,y_pred,text,translation,metadata,y_true,tokens,token_ids,text_norm,tokens_norm
0,macocu.mt.402244,News,"Angelo Chetcuti, se jkun qed jieħu post Bjorn ...","Angelo Chetcuti, will be replacing Bjorn Vassa...",{'text_id': 'macocu.mt.402244'},News,"[▁Angel, o, ▁Che, t, cuti, ,, ▁se, ▁j, kun, ▁q...","[26902, 31, 5024, 18, 64969, 4, 40, 1647, 6262...","angelo chetcuti, se jkun qed jiehu post bjorn ...","[▁angel, o, ▁che, t, cuti, ,, ▁se, ▁j, kun, ▁q..."
1,macocu.mt.377203,Prose/Lyrical,Poltergeist jirreferi għal fenomeni oħra tal-m...,"Poltergeist refers to other woman's phenomena,...",{'text_id': 'macocu.mt.377203'},Opinion/Argumentation,"[▁Pol, ter, geist, ▁jir, re, feri, ▁g, ħ, al, ...","[9017, 720, 178490, 52826, 107, 26926, 706, 24...",poltergeist jirreferi ghal fenomeni ohra tal-m...,"[▁pol, ter, geist, ▁jir, re, feri, ▁, ghal, ▁f..."


In [6]:
# Get label-level scores
label_scores = pd.read_json("datasets/label-level-scores.json")

label_scores

Unnamed: 0,mt,el,tr,sq,is,uk,ca,mk,hr,sl
News,0.692308,0.9,0.952381,0.888889,0.727273,1.0,0.823529,0.909091,0.947368,0.9
Opinion/Argumentation,0.333333,0.869565,0.818182,0.7,0.818182,0.909091,0.842105,0.777778,0.777778,0.823529
Instruction,0.689655,0.705882,0.9,0.947368,0.777778,0.952381,0.75,1.0,0.75,1.0
Information/Explanation,0.521739,0.695652,0.823529,0.8,0.588235,1.0,0.72,0.842105,0.952381,0.9
Promotion,0.818182,0.625,0.857143,0.947368,0.782609,0.777778,0.782609,0.952381,0.869565,1.0
Legal,1.0,1.0,1.0,0.947368,0.842105,1.0,0.947368,0.947368,1.0,0.952381
Forum,0.181818,0.952381,0.888889,0.842105,0.947368,0.947368,0.842105,1.0,0.909091,1.0
Prose/Lyrical,0.181818,1.0,0.952381,0.857143,1.0,1.0,0.909091,0.952381,0.857143,0.909091


In [7]:
# Transform a dataset and predict genres on it

#1. Shuffle words
def shuffle(df):
	texts = df["text"].to_list()

	shuffled_text_list = []

	for text in texts:
		# Remove symbols for new lines
		text = text.replace("\n", "")

		# Split into words
		word_list = text.split(" ")

		# Shuffle words
		random.shuffle(word_list)

		# Join shuffled words into string
		shuffled_text = (" ").join(word_list)

		shuffled_text_list.append(shuffled_text)

	df["shuffled-text"] = shuffled_text_list

	display(df[["text", "shuffled-text"]].head(2))

	return df

In [8]:
df = shuffle(df)

Unnamed: 0,text,shuffled-text
0,"Angelo Chetcuti, se jkun qed jieħu post Bjorn ...",waqt kariga tiegħu illum fil-kariga jkun il-FI...
1,Poltergeist jirreferi għal fenomeni oħra tal-m...,"Xi - reminixxenti tal-familja, Poltergeist dif..."


In [9]:
# Remove structural information - punctuations, new lines

def remove_punct(df):
	texts = df["text"].to_list()

	text_list_no_punct = []

	for text in texts:
		# Remove symbols for new lines
		text = text.replace("\n", "")

		# Remove punctuation
		for punct in [",", ".", "!", "?", ":", "'",  ";", '"', "”", "“"]:
			text = text.replace(punct, "")

		text_list_no_punct.append(text)

	df["text_no_punct"] = text_list_no_punct

	print(df[["text", "text_no_punct"]][df["y_true"] == "News"].head(3).to_dict(orient="records"))

	return df

In [10]:
df = remove_punct(df)

[{'text': "Angelo Chetcuti, se jkun qed jieħu post Bjorn Vassallo bħala segretarju ġenerali tal-Malta Football Association, wara li dan tal-aħħar ingħata kariga fi ħdan il-FIFA. \n\nChetcuti huwa viċi President ta' Birzebbuga FC u Membru tal-Kunsill u tal-Eżekuttiv tal-MFA. Din l-aħbar kienet ikkonfermata fil-Kunsill tal-MFA li sar illum wara nofsinhar fis-sala tas-Centenary Stadium, f'Ta' Qali. \n\nChetcuti jibda fil-kariga ta' Segretarju Ġenerali tal-MFA b'mod effettiv mill-1 ta' Diċembru 2016. \n\nFi stqarrija, l-MFA ferħet lill-Vassallo fl-irwol il-ġdid tiegħu waqt li awgurat lil Angelo Chetcuti fil-kariga l-ġdida tiegħu. \n\n", 'text_no_punct': 'Angelo Chetcuti se jkun qed jieħu post Bjorn Vassallo bħala segretarju ġenerali tal-Malta Football Association wara li dan tal-aħħar ingħata kariga fi ħdan il-FIFA Chetcuti huwa viċi President ta Birzebbuga FC u Membru tal-Kunsill u tal-Eżekuttiv tal-MFA Din l-aħbar kienet ikkonfermata fil-Kunsill tal-MFA li sar illum wara nofsinhar fis-sa

In [71]:
text = df["text"].to_list()
text_list = text.split(" ")
text_list_removed_caps = []
removed_counter = 0

for word in text_list:
	new_word = re.sub(r'.*?[A-Z].*', '', word, re.ASCII)
	# Count how many words were substituted
	if new_word != word:
		# If word was changed, count it
		removed_counter += 1
		#print(f"{word} changed to {new_word}")
	text_list_removed_caps.append(new_word)

new_text = " ".join(text_list_removed_caps)
print(new_text)

# Now also change random words in the word list

text_list_rand_removed = []
removed_list = text_list

for i in range(removed_counter):
	random_word_index = random.randint(0,len(removed_list)-1)
	# Check if element is not empty and remove a word if it is not
	# Otherwise try again
	while len(removed_list[random_word_index]) == 0:
		random_word_index = random.randint(0,len(removed_list)-1)
	#Substitute random word with empty space
	#print(f"Removed word: {removed_list[random_word_index]}")
	removed_list[random_word_index] = ""

random_text = " ".join(removed_list)
print(random_text)

print(len(text.split(" ")))
print(len(new_text.split(" ")))
print(len(random_text.split(" ")))


  fis ieħor fid-dinja.  tip ta 'portal bejn ir-realtà u l-immaġinazzjoni tal-awtur. tè sħun u popolari xogħol ta 'letteratura se tagħmel pastime memorabbli. 

 bħala l-aqwa ħabib 

 wieħed minna għandu preferenzi tagħhom stess.  huma manifestat fl-għażla ta 'letteratura.  ħadd iħobb fantaxjenza, xi ħadd bħal ditektifs u ruħ waħda huwa li rumanzi.  kollu hekk individwalment. 

 ħaġa importanti f'dan il-każ - li pick up ktieb korrett.  il-każ, inti tista 'tgħin lill-istituzzjonijiet speċjali, li firxa ta' letteratura hija tant kbir li jista 'jissodisfa l-aktar arbitrarja tal-qarrej.  minnhom jaġixxi   msemmi wara    min qatt żar dan, qatt ma tinsa l-iskala enormi tal-kelma letterarja.  bla preċedent ta 'kotba se timpressjona anki l-waħda li ma kinux mimsus minnhom. 

  hija waħda mill-eqdem bliet u isbaħ mhux biss  iżda madwar id-dinja,    arkitettoniku tikkawża l-xi ħadd tidwiba li jottjeni hemm.  u lussu, li hija mogħnija bl-librerija presidenzjali  sempliċiment aqwa. 

armament letter

In [98]:
# Another transformation is to remove proper nouns - words with capital letters (this would remove also all beginning of the sentence, but we cannot do it otherwise)

def remove_cap(df):
	texts = df["text"].to_list()
	text_list_no_capital = []
	text_list_random_removed = []
	for text in texts:
		text_list = text.split(" ")
		text_list_removed_caps = []
		removed_counter = 0
		removed_list = text_list

		for word in text_list:
			new_word = re.sub(r'.*?[A-Z].*', '', word, re.UNICODE)
			# Count how many words were substituted
			if new_word != word:
				# If word was changed, count it
				removed_counter += 1
				#print(f"{word} changed to {new_word}")
			text_list_removed_caps.append(new_word)

		new_text = " ".join(text_list_removed_caps)
		text_list_no_capital.append(new_text)

		# Now also change random words in the word list
		for i in range(removed_counter):
			random_word_index = random.randint(0,len(removed_list)-1)
			# Check if element is not empty and remove a word if it is not
			# Otherwise try again
			while len(removed_list[random_word_index]) == 0:
				random_word_index = random.randint(0,len(removed_list)-1)
			#Substitute random word with empty space
			#print(f"Removed word: {removed_list[random_word_index]}")
			removed_list[random_word_index] = ""

		random_text = " ".join(removed_list)
		text_list_random_removed.append(random_text)
		#print(len(text.split(" ")))
		#print(len(new_text.split(" ")))
		#print(len(random_text.split(" ")))

	for id in list(range(5)):
		print(len(text_list_no_capital[id].split(" ")), len(text_list_random_removed[id].split(" ")))

	df["text_no_capital"] = text_list_no_capital
	df["text_no_capital_rand"] = text_list_random_removed

	with pd.option_context('display.max_colwidth', 500):
		display(df[["text", "text_no_capital", "text_no_capital_rand"]][df["y_true"] == "Legal"].head(3))
	return df

In [99]:
df = remove_cap(df)

84 84
264 264
512 512
512 512
512 512


Unnamed: 0,text,text_no_capital,text_no_capital_rand
20,Document 62009CJ0162 \n\nJudgment of the Court (Third Chamber) of 7 October 2010.#Secretary of State for Work and Pensions v Taous Lassal.#Reference for a preliminary ruling: Court of Appeal (England &amp; Wales) (Civil Division) - United Kingdom.#Reference for preliminary ruling - Freedom of movement for persons - Directive 2004/38/EC - Article 16 - Right of permanent residence - Temporal application - Periods completed before the date of transposition.#Case C-162/09. \n\nSentenza tal-Qorti...,\n\n of the of 7 of for and v for a preliminary ruling: of &amp; - for preliminary ruling - of movement for persons - - 16 - of permanent residence - application - completed before the date of \n\n tal-Ġustizzja tas-7 ta' 2010. of for and vs għal deċiżjoni preliminari: of &amp; - preliminari - liberu tal-persuni - - 16 - ta' residenza permanenti - fiż-żmien - li ġew fi tmiemhom qabel id-data ta' traspożizzjoni. \n\n of &amp; \n\n...,Document the Court (Third Chamber) of 7 2010.#Secretary State for Work v Taous Lassal.#Reference for a ruling: Court Appeal (England &amp; Wales) (Civil Division) - Kingdom.#Reference for preliminary ruling - Freedom of for persons - Directive - Article - Right of permanent residence Temporal application Periods completed before the date of transposition.#Case C-162/09. tal-Ġustizzja (it-Tielet tas-7 ta' 2010. Secretary of State for Work and Taous Talba għal deċiżjoni p...
26,"Press Release \n\nWhilst referring to the introduction of gay marriage which was pledged by both major political parties in their respective electoral manifestos, the Cana Movement affirms the following. \n\n1. Every person, regardless of his sexual orientation or life preferences should be treated with utmost dignity. \n\n2. Equality before the law should not be a pretext to deny differentiation. Whereas it is the legislator's duty to regulate different forms of relationships, this is not a...","\n\n referring to the introduction of gay marriage which was pledged by both major political parties in their respective electoral manifestos, the affirms the following. \n\n1. person, regardless of his sexual orientation or life preferences should be treated with utmost dignity. \n\n2. before the law should not be a pretext to deny differentiation. it is the legislator's duty to regulate different forms of relationships, this is not a justification to redefine marriage or attempt to ...","Press \n\nWhilst referring to the introduction of gay marriage which was pledged by both political parties in their respective electoral manifestos, Cana Movement affirms the following. \n\n1. Every person, of his sexual orientation or life preferences should be treated with utmost dignity. \n\n2. before the law should not be a pretext to deny differentiation. Whereas it is the legislator's duty to regulate different forms of relationships, this is not a to redefine or attempt to alte..."
28,"Plaintiff \n\nDefendant \n\nKeywords \n\nUnfair Contract Terms Directive, Article 2 Unfair Contract Terms Directive, Article 3, 1. Unfair Contract Terms Directive, Article 4, 1. \n\nHeadnote \n\nRefusal of tribunal to take cognizance of submission that conditions of carriage are irregular. \n\nFacts \n\nPlaintiff company made a lawsuit after being surrogated in the rights of various consumers insured with it, claiming damages suffered by the said consumers following non-delivery and /or dama...","\n\n \n\n \n\n 2 3, 1. 4, 1. \n\n \n\n of tribunal to take cognizance of submission that conditions of carriage are irregular. \n\n \n\n company made a lawsuit after being surrogated in the rights of various consumers insured with it, claiming damages suffered by the said consumers following non-delivery and /or damage to luggage after flying with defendant company. company after undertaking its investigations had duly paid the various consumers for the damages incurred as a ...","Plaintiff \n\nDefendant \n\nKeywords \n\nUnfair Contract Terms Directive, 2 Unfair Contract Terms Directive, Article 3, 1. Contract Terms Directive, Article 4, 1. \n\nHeadnote \n\nRefusal of tribunal to take cognizance of submission that conditions of carriage are irregular. \n\nFacts \n\nPlaintiff company made a lawsuit after being surrogated the various consumers insured with it, claiming damages suffered by the said following non-delivery and /or damage to luggage after flying defe..."


In [106]:
with pd.option_context('display.max_colwidth', 500):
	display(df[df["y_pred"] == "Legal"][["text"]])

Unnamed: 0,text
20,Document 62009CJ0162 \n\nJudgment of the Court (Third Chamber) of 7 October 2010.#Secretary of State for Work and Pensions v Taous Lassal.#Reference for a preliminary ruling: Court of Appeal (England &amp; Wales) (Civil Division) - United Kingdom.#Reference for preliminary ruling - Freedom of movement for persons - Directive 2004/38/EC - Article 16 - Right of permanent residence - Temporal application - Periods completed before the date of transposition.#Case C-162/09. \n\nSentenza tal-Qorti...
26,"Press Release \n\nWhilst referring to the introduction of gay marriage which was pledged by both major political parties in their respective electoral manifestos, the Cana Movement affirms the following. \n\n1. Every person, regardless of his sexual orientation or life preferences should be treated with utmost dignity. \n\n2. Equality before the law should not be a pretext to deny differentiation. Whereas it is the legislator's duty to regulate different forms of relationships, this is not a..."
28,"Plaintiff \n\nDefendant \n\nKeywords \n\nUnfair Contract Terms Directive, Article 2 Unfair Contract Terms Directive, Article 3, 1. Unfair Contract Terms Directive, Article 4, 1. \n\nHeadnote \n\nRefusal of tribunal to take cognizance of submission that conditions of carriage are irregular. \n\nFacts \n\nPlaintiff company made a lawsuit after being surrogated in the rights of various consumers insured with it, claiming damages suffered by the said consumers following non-delivery and /or dama..."
37,"Document 01962R0031-20140501 \n\nTHE COUNCIL OF THE EUROPEAN ECONOMIC COMMUNITY, \n\nTHE COUNCIL OF THE EUROPEAN ATOMIC ENERGY COMMUNITY, \n\nHaving regard to the Treaty establishing the European Economic Community, and in particular Articles 179, 212 and 215 thereof; \n\nHaving regard to the Treaty establishing the European Atomic Energy Community, and in particular Articles 152, 186 and 188 thereof; \n\nHaving regard to the Protocol on the Privileges and Immunities of the European Economic..."
41,Document 62019CJ0223 \n\nJudgment of the Court (Third Chamber) of 24 September 2020.#YS v NK.#Request for a preliminary ruling from the Landesgericht Wiener Neustadt.#Reference for a preliminary ruling – Equal treatment in employment and occupation – Directives 2000/78/EC and 2006/54/EC – Scope – Prohibition of indirect discrimination on grounds of age or sex – Justifications – National legislation providing for an amount to be withheld from pensions paid directly to their recipients by unde...
45,"Document 62019CJ0219 \n\nJudgment of the Court (Tenth Chamber) of 11 June 2020.#Parsec Fondazione Parco delle Scienze e della Cultura v Ministero delle Infrastrutture e dei Trasporti and Autorità nazionale anticorruzione (ANAC).#Request for a preliminary ruling from the Tribunale Amministrativo Regionale per il Lazio.#Reference for a preliminary ruling – Public works contracts, public supply contracts and public service contracts – Directive 2014/24/EU – Procurement procedure for the award o..."
55,Document 62008CJ0480 \n\nJudgment of the Court (Grand Chamber) of 23 February 2010.#Maria Teixeira v London Borough of Lambeth and Secretary of State for the Home Department.#Reference for a preliminary ruling: Court of Appeal (England &amp; Wales) (Civil Division) - United Kingdom.#Freedom of movement for persons - Right of residence - National of a Member State who worked in another Member State and remained there after ceasing to work - Child in vocational training in the host Member Stat...
57,"Document 62014CJ0547 \n\nJudgment of the Court (Second Chamber) of 4 May 2016.#Philip Morris Brands SARL and Others v Secretary of State for Health.#Request for a preliminary ruling from the High Court of Justice, Queen's Bench Division (Administrative Court).#Reference for a preliminary ruling – Approximation of laws – Directive 2014/40/EU – Articles 7, 18 and 24(2) and (3) – Articles 8(3), 9(3), 10(1)(a), (c) and (g), 13 and 14 – Manufacture, presentation and sale of tobacco products – Val..."
69,"(1) The price stipulated in a contract is not to be considered an unfair term. \n\n(2) A contract may include standard expressions, created to protected the needs of the trade, however these must also take into account the interests and rights of the consumer. \n\nFacts \n\nThe parties had entered into an agreement whereby the plaintiff was to provide advertising services to the defendants. Payment was to be effected to the plaintiff only on condition that the property being advertised was s..."
73,Document 32016R0429 \n\nRegulation (EU) 2016/429 of the European Parliament and of the Council of 9 March 2016 on transmissible animal diseases and amending and repealing certain acts in the area of animal health (‘Animal Health Law') (Text with EEA relevance) \n\nDraft implementing regulation draft Commission Implementing Regulation on categories of listed animal diseases and related control rules; \n\nDraft delegated regulation Movements within the Union of germinal products of certain kep...


In [96]:
# Do the same, but with numbers
def remove_num(df):
	texts = df["text"].to_list()
	text_list_no_num = []
	text_list_random_removed = []
	for text in texts:
		text_list = text.split(" ")
		text_list_removed_nums = []
		removed_counter = 0
		removed_list = text_list

		for word in text_list:
			new_word = re.sub(r'.*?\d.*', '', word, re.UNICODE)
			#new_word = re.sub(r'\w*\d\w*', '', word, re.UNICODE)
			# Count how many words were substituted
			if new_word != word:
				# If word was changed, count it
				removed_counter += 1
				#print(f"{word} changed to {new_word}")
			text_list_removed_nums.append(new_word)

		new_text = " ".join(text_list_removed_nums)
		text_list_no_num.append(new_text)

		# Now also change random words in the word list
		for i in range(removed_counter):
			random_word_index = random.randint(0,len(removed_list)-1)
			# Check if element is not empty and remove a word if it is not
			# Otherwise try again
			while len(removed_list[random_word_index]) == 0:
				random_word_index = random.randint(0,len(removed_list)-1)
			#Substitute random word with empty space
			#print(f"Removed word: {removed_list[random_word_index]}")
			removed_list[random_word_index] = ""

		random_text = " ".join(removed_list)
		text_list_random_removed.append(random_text)
		#print(len(text.split(" ")))
		#print(len(new_text.split(" ")))
		#print(len(random_text.split(" ")))

	for id in list(range(5)):
		print(len(text_list_no_num[id].split(" ")), len(text_list_random_removed[id].split(" ")))

	df["text_no_num"] = text_list_no_num
	df["text_no_num_rand"] = text_list_random_removed
	
	with pd.option_context('display.max_colwidth', 500):
		display(df[["text", "text_no_num", "text_no_num_rand"]][df["y_true"] == "Legal"].head(3))

	return df

In [107]:
df

Unnamed: 0,text_id,y_pred,text,translation,metadata,y_true,tokens,token_ids,text_norm,tokens_norm,shuffled-text,text_no_punct,text_no_capital,text_no_capital_rand,text_no_num,text_no_num_rand
0,macocu.mt.402244,News,"Angelo Chetcuti, se jkun qed jieħu post Bjorn ...","Angelo Chetcuti, will be replacing Bjorn Vassa...",{'text_id': 'macocu.mt.402244'},News,"[▁Angel, o, ▁Che, t, cuti, ,, ▁se, ▁j, kun, ▁q...","[26902, 31, 5024, 18, 64969, 4, 40, 1647, 6262...","angelo chetcuti, se jkun qed jiehu post bjorn ...","[▁angel, o, ▁che, t, cuti, ,, ▁se, ▁j, kun, ▁q...",waqt kariga tiegħu illum fil-kariga jkun il-FI...,Angelo Chetcuti se jkun qed jieħu post Bjorn V...,se jkun qed jieħu post bħala segretarju ġe...,se jkun qed segretarju ġenerali tal-Mal...,"Angelo Chetcuti, se jkun qed jieħu post Bjorn ...","Angelo Chetcuti, se jkun qed jieħu post Bjorn ..."
1,macocu.mt.377203,Prose/Lyrical,Poltergeist jirreferi għal fenomeni oħra tal-m...,"Poltergeist refers to other woman's phenomena,...",{'text_id': 'macocu.mt.377203'},Opinion/Argumentation,"[▁Pol, ter, geist, ▁jir, re, feri, ▁g, ħ, al, ...","[9017, 720, 178490, 52826, 107, 26926, 706, 24...",poltergeist jirreferi ghal fenomeni ohra tal-m...,"[▁pol, ter, geist, ▁jir, re, feri, ▁, ghal, ▁f...","Xi - reminixxenti tal-familja, Poltergeist dif...",Poltergeist jirreferi għal fenomeni oħra tal-m...,"jirreferi għal fenomeni oħra tal-mara, spirti...","jirreferi għal fenomeni oħra tal-mara, spirti...",Poltergeist jirreferi għal fenomeni oħra tal-m...,Poltergeist jirreferi għal fenomeni oħra tal-m...
2,macocu.mt.109995,Forum,Chrysler: Brand ta 'lussu jew le? \n\nBrand ji...,Chrysler: Luxury brand or not?\n\nBrand moves ...,{'text_id': 'macocu.mt.109995'},Opinion/Argumentation,"[▁Chrysler, :, ▁Brand, ▁ta, ▁', lus, su, ▁je, ...","[237562, 12, 23243, 308, 242, 5782, 1159, 55, ...",chrysler: brand ta 'lussu jew le? \n\nbrand ji...,"[▁chr, ys, ler, :, ▁brand, ▁ta, ▁', lus, su, ▁...",biex 'jirrispetta. il-klijenti Wara 'lussu ħal...,Chrysler Brand ta lussu jew le Brand jiċċaqlaq...,"ta 'lussu jew le? \n\n jiċċaqlaq mainstream,...",Chrysler: Brand ta 'lussu jew le? \n\nBrand ji...,Chrysler: Brand ta 'lussu jew le? \n\nBrand ji...,Chrysler: Brand ta 'lussu jew le? \n\nBrand ji...
3,macocu.mt.243402,Forum,Kif tkellem lit-tfal dwar id-diżabbiltajiet \n...,How to talk to children about disabilities\n\n...,{'text_id': 'macocu.mt.243402'},Instruction,"[▁Ki, f, ▁t, kel, lem, ▁lit, -, t, fal, ▁d, wa...","[1519, 420, 808, 2590, 6153, 16060, 9, 18, 871...",kif tkellem lit-tfal dwar id-dizabbiltajiet \n...,"[▁ki, f, ▁t, kel, lem, ▁lit, -, t, fal, ▁d, wa...","kuġin simili, tiegħek 'tagħlim hu il-persuna, ...",Kif tkellem lit-tfal dwar id-diżabbiltajiet Għ...,tkellem lit-tfal dwar id-diżabbiltajiet \n\n ...,Kif tkellem lit-tfal dwar id-diżabbiltajiet \n...,Kif tkellem lit-tfal dwar id-diżabbiltajiet \n...,Kif tkellem lit-tfal dwar id-diżabbiltajiet \n...
4,macocu.mt.213859,Forum,Kif tneħħi hangover sewwa u bla perikolu \n\nH...,How to remove a proper and safe hangover\n\nHa...,{'text_id': 'macocu.mt.213859'},Instruction,"[▁Ki, f, ▁, tne, ħ, ħ, i, ▁hang, over, ▁se, w,...","[1519, 420, 6, 23738, 245766, 245766, 14, 1075...",kif tnehhi hangover sewwa u bla perikolu \n\nh...,"[▁ki, f, ▁, tne, h, hi, ▁hang, over, ▁se, w, w...",livelli tagħha jew li għodu fil-vitamini parac...,Kif tneħħi hangover sewwa u bla perikolu Hango...,tneħħi hangover sewwa u bla perikolu \n\n - k...,Kif tneħħi hangover sewwa u bla perikolu \n\nH...,Kif tneħħi hangover sewwa u bla perikolu \n\nH...,Kif tneħħi hangover sewwa u bla perikolu \n\nH...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77,macocu.mt.501882,News,Il-Ġermanja tikseb rebħa importanti u l-Olanda...,Germany get an important victory and the Nethe...,{'text_id': 'macocu.mt.501882'},News,"[▁Il, -, Ġ, er, man, ja, ▁tik, seb, ▁, reb, ħ,...","[891, 9, 244871, 56, 669, 145, 2126, 27359, 6,...",il-germanja tikseb rebha importanti u l-olanda...,"[▁il, -, ger, man, ja, ▁tik, seb, ▁re, bha, ▁i...",bi l-Belġju fis-76 Minn ALDERWEIRELD minn Grup...,Il-Ġermanja tikseb rebħa importanti u l-Olanda...,tikseb rebħa importanti u tkompli l-perjodu ...,Il-Ġermanja tikseb importanti u l-Olanda tkom...,Il-Ġermanja tikseb rebħa importanti u l-Olanda...,Il-Ġermanja tikseb rebħa importanti u l-Olanda...
78,macocu.mt.395570,News,Attard's AD councillor condemns vandalism of t...,Attard's AD CONCILLOR CONDEMNS VANDALISM OF TR...,{'text_id': 'macocu.mt.395570'},News,"[▁Att, ard, ', s, ▁AD, ▁council, lor, ▁con, de...","[9208, 5861, 25, 7, 19831, 215394, 1484, 158, ...",attard's ad councillor condemns vandalism of t...,"[▁att, ard, ', s, ▁ad, ▁council, lor, ▁con, de...",ġimgħa were to the was Lokali x'qed sar bħala ...,Attards AD councillor condemns vandalism of tr...,councillor condemns vandalism of trees at t...,Attard's AD councillor condemns vandalism of ...,Attard's AD councillor condemns vandalism of t...,Attard's AD councillor condemns vandalism of t...
79,macocu.mt.247552,Instruction,Kif tapplika l-verniċ ġell: tips u tricks \n\n...,How to apply the Varnish Gel: Tips and Tricks\...,{'text_id': 'macocu.mt.247552'},Instruction,"[▁Ki, f, ▁tap, plika, ▁l, -, ver, ni, ċ, ▁, ġ,...","[1519, 420, 10704, 168837, 96, 9, 814, 93, 245...",kif tapplika l-vernic gell: tips u tricks \n\n...,"[▁ki, f, ▁tap, plika, ▁l, -, ver, nic, ▁ge, ll...","dettall il-parti, sigrieti li il-wiċċ, - kapaċ...",Kif tapplika l-verniċ ġell tips u tricks Ġel l...,tapplika l-verniċ ġell: tips u tricks \n\nĠel...,Kif tapplika l-verniċ tips u tricks \n\nĠel l...,Kif tapplika l-verniċ ġell: tips u tricks \n\n...,Kif tapplika l-verniċ ġell: tips u tricks \n\n...
80,macocu.mt.307028,Instruction,Preheat il-forn għal 200 grad. Il-brunġiel sħa...,Preheat the oven to 200 degrees.Whole eggplant...,{'text_id': 'macocu.mt.307028'},Instruction,"[▁Pre, heat, ▁il, -, for, n, ▁g, ħ, al, ▁200, ...","[1914, 156253, 211, 9, 2472, 19, 706, 245766, ...",preheat il-forn ghal 200 grad. il-brungiel sha...,"[▁pre, heat, ▁il, -, for, n, ▁, ghal, ▁200, ▁g...",- mbagħad il-basal u u naqqsu Hekk ħallat. sak...,Preheat il-forn għal 200 grad Il-brunġiel sħaħ...,il-forn għal 200 grad. sħaħ huma minjieri u ...,Preheat il-forn għal 200 grad. Il-brunġiel sħa...,Preheat il-forn għal grad. Il-brunġiel sħaħ h...,il-forn għal 200 grad. Il-brunġiel sħaħ huma ...


In [97]:
df = remove_num(df)

84 84
264 264
512 512
512 512
512 512


Unnamed: 0,text,text_no_num,text_no_num_rand
20,Document 62009CJ0162 \n\nJudgment of the Court (Third Chamber) of 7 October 2010.#Secretary of State for Work and Pensions v Taous Lassal.#Reference for a preliminary ruling: Court of Appeal (England &amp; Wales) (Civil Division) - United Kingdom.#Reference for preliminary ruling - Freedom of movement for persons - Directive 2004/38/EC - Article 16 - Right of permanent residence - Temporal application - Periods completed before the date of transposition.#Case C-162/09. \n\nSentenza tal-Qorti...,Document \n\nJudgment of the Court (Third Chamber) of October of State for Work and Pensions v Taous Lassal.#Reference for a preliminary ruling: Court of Appeal (England &amp; Wales) (Civil Division) - United Kingdom.#Reference for preliminary ruling - Freedom of movement for persons - Directive - Article - Right of permanent residence - Temporal application - Periods completed before the date of transposition.#Case \n\nSentenza tal-Qorti tal-Ġustizzja (it-Tielet Awla) ta' Ottubru Se...,Document 62009CJ0162 \n\nJudgment of the Court (Third Chamber) of October 2010.#Secretary of State for Work and Pensions v Taous Lassal.#Reference for a preliminary ruling: Court Appeal (England &amp; Wales) (Civil Division) - United Kingdom.#Reference for preliminary ruling - Freedom of movement for persons - 2004/38/EC - Article - Right of permanent residence - Temporal application Periods completed before the date of transposition.#Case \n\nSentenza tal-Qorti tal-Ġustizzja (it-Tiele...
26,"Press Release \n\nWhilst referring to the introduction of gay marriage which was pledged by both major political parties in their respective electoral manifestos, the Cana Movement affirms the following. \n\n1. Every person, regardless of his sexual orientation or life preferences should be treated with utmost dignity. \n\n2. Equality before the law should not be a pretext to deny differentiation. Whereas it is the legislator's duty to regulate different forms of relationships, this is not a...","Press Release \n\nWhilst referring to the introduction of gay marriage which was pledged by both major political parties in their respective electoral manifestos, the Cana Movement affirms the following. \n\n Every person, regardless of his sexual orientation or life preferences should be treated with utmost dignity. \n\n Equality before the law should not be a pretext to deny differentiation. Whereas it is the legislator's duty to regulate different forms of relationships, this is not a jus...","Press \n\nWhilst referring to the introduction of gay marriage which was pledged by both major political parties in their respective electoral manifestos, the Cana Movement affirms the following. \n\n1. Every person, regardless of his sexual orientation or life preferences should be treated with utmost dignity. \n\n2. Equality before the law should not be a pretext to deny differentiation. Whereas it is the legislator's duty to regulate different forms of relationships, this is not a justif..."
28,"Plaintiff \n\nDefendant \n\nKeywords \n\nUnfair Contract Terms Directive, Article 2 Unfair Contract Terms Directive, Article 3, 1. Unfair Contract Terms Directive, Article 4, 1. \n\nHeadnote \n\nRefusal of tribunal to take cognizance of submission that conditions of carriage are irregular. \n\nFacts \n\nPlaintiff company made a lawsuit after being surrogated in the rights of various consumers insured with it, claiming damages suffered by the said consumers following non-delivery and /or dama...","Plaintiff \n\nDefendant \n\nKeywords \n\nUnfair Contract Terms Directive, Article Unfair Contract Terms Directive, Article Unfair Contract Terms Directive, Article \n\nHeadnote \n\nRefusal of tribunal to take cognizance of submission that conditions of carriage are irregular. \n\nFacts \n\nPlaintiff company made a lawsuit after being surrogated in the rights of various consumers insured with it, claiming damages suffered by the said consumers following non-delivery and /or damage to lug...","Plaintiff \n\nDefendant \n\nKeywords \n\nUnfair Contract Terms Directive, Article 2 Unfair Contract Terms Directive, Article 3, 1. Unfair Contract Terms Directive, Article 4, 1. \n\nHeadnote \n\nRefusal of tribunal to take of submission that conditions of carriage are irregular. \n\nFacts \n\nPlaintiff company made a lawsuit after being surrogated in the rights of various consumers insured with it, claiming damages suffered by the said consumers following non-delivery /or damage to luggage..."
