### __CE-406771:__ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; _Natural Language Processing, Fall 23._ <br>
### __Homework #3:__ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; _Drug Name Prediction._ <br>
### __Student Information:__ &nbsp; _Mohammad M. Gharaguzlo, 401206836, [My mail](moh.gharaguzlo13@sharif.edu)._ <br>

## _Introduction:_ <br>
As stated in the [homework document](./NLP_HW3.pdf), we are to obtain text embeddings for the provided data sets. <br>
These embeddings will be extracted using pretrained FastText and BERT models. <br>
The aforementioned sets include drug names and their descriptions. <br>
Finally, using a prediction method based on cosine similarity, we should find the three drugs most similar to a given input description.

In [None]:
from nltk import download
download("popular")
!pip install hazm

In [None]:
!pip install fasttext

### _Installing Prerequisites and Dependencies:_ <br>
Alongside the common frameworks for language processing and text manipulation (NLTK, Hazm, etc.), we're going to need
[FastText](https://fasttext.cc/docs/en/crawl-vectors.html)
and
[BERT](https://huggingface.co/HooshvareLab/bert-base-parsbert-uncased)
to extract embeddings. <br>

FastText is quite a hefty model: its English and Farsi packages together take up more than 10 gigabytes. So take your time; downloading them can take more than an hour.

In [17]:
from fasttext.util import download_model

In [None]:
download_model('en', if_exists='ignore')

In [18]:
download_model('fa', if_exists='ignore')

'cc.fa.300.bin'

### _Parsing And Preprocessing:_ <br>
DrugBank includes valuable information for about 9000 drugs. Note that the data set actually contains 15000 drugs with a description tag, but only 9000 of those descriptions have content; the rest are empty. <br>

Initially I wanted to use [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
to extract the content I needed. However, given the size of the DrugBank file (more than 1 gigabyte), BS4 failed to provide the desired result (it took me about 6 hours and a few crashes to realize BS4 wasn't going to work).<br>

Finally, I resorted to
[lxml](https://lxml.de/)'s
etree, which solved the problem (somehow...). You can find the parser script in the [drugbank_parser](./drugbank_parser.py) file. <br>
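For readers who want the gist without opening the script, here is a minimal sketch of the streaming idea, written with the standard library's `xml.etree.ElementTree.iterparse` (lxml's `etree.iterparse` offers a similar interface). It is not the code in `drugbank_parser.py`: the element names `drug`, `name` and `description` are assumptions about the DrugBank layout, and XML namespaces are simply stripped. The point is that each top-level `<drug>` is read and then cleared, so the 1+ gigabyte tree is never held in memory at once, and drugs with empty descriptions are skipped.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix, e.g. '{http://x}drug' -> 'drug'."""
    return tag.rsplit("}", 1)[-1]


def iter_drugs(source):
    """Yield (name, description) pairs for top-level <drug> elements.

    Assumed (hypothetical) layout: <root><drug><name/><description/>...</drug>...</root>.
    Drugs whose description is missing or blank are skipped.
    """
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # After decrementing, depth == 1 means elem is a direct child of the root,
        # so <drug> elements nested deeper inside a drug are ignored.
        if depth == 1 and local_name(elem.tag) == "drug":
            fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
            if fields.get("name") and fields.get("description"):
                yield fields["name"], fields["description"]
            elem.clear()  # free the finished subtree to keep memory flat


sample = b"""<drugbank xmlns="http://example.org/db">
  <drug><name>A</name><description>first drug</description></drug>
  <drug><name>B</name><description></description></drug>
</drugbank>"""
print(list(iter_drugs(io.BytesIO(sample))))  # [('A', 'first drug')]
```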


Now that we have the data we need, the next step is to apply some preprocessing to the descriptions. This includes removing stop words, normalizing letter case (lowercasing) and so on.<br>
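As a rough illustration of these steps for English, the sketch below lowercases the text, splits it into simple tokens with a regular expression and drops stop words. The tiny stop-word set is hypothetical and only for demonstration; the actual `PreProcess` class in `preprocess.py` may tokenize differently (a full English list is available from NLTK as `stopwords.words("english")`).

```python
import re

# Hypothetical, deliberately small stop-word set used only for this demo.
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "it", "of", "the", "to"}


def preprocess_english(text: str) -> str:
    """Lowercase, tokenize on letters/digits (keeping '/' and '-'), drop stop words."""
    tokens = re.findall(r"[a-z0-9/-]+", text.lower())
    return " ".join(token for token in tokens if token not in STOP_WORDS)


print(preprocess_english("Lepirudin is a recombinant hirudin formed by 65 amino acids."))
# lepirudin recombinant hirudin formed 65 amino acids
```

Keeping `/` inside tokens mirrors what the processed output below shows (for example `human/mouse`), though that is an inference from the output rather than a description of the author's code.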

Obviously, the Farsi pipeline differs from the English one, and I used Hazm for the Farsi preprocessing. <br>

You can find the respective code in the [preprocess](./preprocess.py) file. It includes a <em>"PreProcess"</em> class which takes the language name as a constructor argument.
This can be <em>fa</em> or <em>eng</em>, specifying <em>Farsi</em> and <em>English</em> respectively. <br>

The <em>"PreProcess"</em> class has two methods you can use. The <em>"process"</em> method takes a sentence and returns the processed text, while <em>"get_and_save_processed"</em> takes a data frame, the name column (<em>name_att</em>) and a list of column names (<em>designated_atts</em>), and applies the preprocessing to every row of those columns. It saves the results to the file name you provide as an argument (<em>file_name</em>).
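The column-wise behaviour of `get_and_save_processed` can be approximated as follows. This is a hypothetical stand-in built on the standard `csv` module rather than pandas; it only borrows the parameter names used in the calls below (`name_att`, `designated_atts`) and takes the text-cleaning function as an argument, so it should not be read as the actual implementation.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List, TextIO


def process_columns(rows: Iterable[Dict[str, str]], name_att: str,
                    designated_atts: List[str], process: Callable[[str], str],
                    out: TextIO) -> List[Dict[str, str]]:
    """Clean the designated text columns of every row and write them as CSV.

    Hypothetical stand-in for PreProcess.get_and_save_processed: the name
    column is copied unchanged, every designated column goes through `process`.
    """
    writer = csv.DictWriter(out, fieldnames=[name_att, *designated_atts])
    writer.writeheader()
    processed = []
    for row in rows:
        record = {name_att: row[name_att]}
        for att in designated_atts:
            # Treat missing cells as empty text instead of failing.
            record[att] = process(row.get(att) or "")
        writer.writerow(record)
        processed.append(record)
    return processed


buffer = io.StringIO()  # stands in for the output file
rows = [{"name": "Drug X", "description": "Some TEXT", "indication": "Other Text"}]
print(process_columns(rows, "name", ["description", "indication"], str.lower, buffer))
print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.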

In [3]:
from preprocess import PreProcess

fa_processor = PreProcess(language="fa")
eng_processor = PreProcess(language="eng")

In [None]:
import pandas as pd

drugbank = pd.read_csv(r"./drugbank_raw.csv")
farsibank = pd.read_csv(r"./farsi_drug_data.csv")


drugbank_processed = eng_processor.get_and_save_processed(data= drugbank,
                                     name_att= "name",
                                     designated_atts=["description", "indication"],
                                     file_name=r"./drugbank_processed.csv")


farsibank_processed = fa_processor.get_and_save_processed(data= farsibank,
                                    name_att= "name_tejary",
                                    designated_atts=["mavared_masraf",
                                                     "avarez_janebi",
                                                     "amozesh",
                                                     "tavajohat"],
                                    file_name=r"./farsibank_processed.csv")

In [None]:
print("Drug Bank Processed Length: {} \n".format(len(drugbank_processed)))
drugbank_processed.head(3)

Drug Bank Processed Length: 8974 



| Unnamed: 0 | name | description | indication |
|---|---|---|---|
| 0 | Lepirudin | lepirudin recombinant hirudin formed 65 amino ... | lepirudin indicated anticoagulation adult pati... |
| 1 | Cetuximab | cetuximab recombinant chimeric human/mouse igg... | cetuximab indicated treatment locally regional... |
| 2 | Dornase alfa | dornase alfa biosynthetic form human deoxyribu... | used adjunct therapy treatment cystic fibrosis |


In [None]:
print("Farsi Bank Processed Length: {} \n".format(len(farsibank_processed)))
farsibank_processed.head(3)

Farsi Bank Processed Length: 457 



| Unnamed: 0 | name_tejary | mavared_masraf | avarez_janebi | amozesh | tavajohat |
|---|---|---|---|---|---|
| 0 | Milk Of Magnesia | عنوان آنتی اسید ملین استفاده قرار میگیرد بیما... | اسهال مشکلات قلب اختلالات نوار قلب EEG 1 | دارو همراه مایعات استفاده مصرف, شیشه دارو تکا... | توصیه لیوان آب دارو میل مصرف دراز وابستگی ایجاد |
| 1 | Catapres | کنترل فشار خون همراه بیمار های کلیوی درمان مح... | خشکی دهان تهوع استفراغ یبوست نارسایی قلب عصبان... | مصرف همزمان آنتی هیستامین خوددار نوبت دارو خو... | معاینه چشم مرتب انجام ارو تدریج قطع فشار خون ب... |
| 2 | Dobutrex | درمان نارسایی حاد قلب شوک قلب عفونی جراح قلب ا... | حملات آسم کاهش فشار خون حساسیت شدید, تهوع سردر... | صورت تشدید علائم پزشک اطلاع داد#ده محل تزریق ... | مایعات بیمار تامین نوار قلب فشار خون بیمار کنترل |


### _Text Representation:_ <br>

As mentioned before, we'll use <em>FastText</em> and <em>BERT</em>
to produce embeddings. The operation is encapsulated in the <em>Embedding</em> class, which can be found in the [produce_embedding](./produce_embedding.py) file. <br>

Similar to <em>PreProcess</em>, the <em>Embedding</em> class has two methods, <em>sentence_embedding</em> and <em>get_and_save_embedding</em>. The former returns the embedding of a single sentence (or whatever kind of text you give it), while the latter takes whole data frame columns and saves their embeddings in pickled format. <br>
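A common way to turn word vectors into one sentence vector is to average them; FastText's own `get_sentence_vector` works along these lines (it averages the word vectors after normalizing each one), and for BERT a frequent choice is averaging the token outputs or taking the `[CLS]` vector. The notebook does not state which pooling `produce_embedding.py` uses, so the sketch below only illustrates plain mean pooling over a hypothetical toy vocabulary.

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(sentence: str, word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of known words; unknown words are skipped.

    Returns a zero vector when no word of the sentence is in the vocabulary.
    """
    known = [word_vectors[w] for w in sentence.split() if w in word_vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the i-th component of every word vector together.
    return [sum(values) / len(known) for values in zip(*known)]


# Hypothetical 3-dimensional toy vocabulary (real FastText vectors have 300 dimensions).
toy = {"insulin": [1.0, 0.0, 2.0], "injection": [3.0, 2.0, 0.0]}
print(mean_pool("insulin injection daily", toy, dim=3))  # [2.0, 1.0, 1.0]
```

Skipping unknown words is a simplification: FastText itself can build vectors for unseen words from character n-grams, which this toy version does not attempt.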

I have already extracted the required embeddings, so there's no need for you to run the following cells.<br>

Just a quick heads-up: since the Farsi data set descriptions are brief and concise (unlike DrugBank, which has quite long descriptions), I decided to produce embeddings for some other attributes as well. These include
<em>mavared_masraf</em> (indications), <em>avarez_janebi</em> (side effects), <em>amozesh</em> (patient instructions) and <em>tavajohat</em> (precautions).



In [None]:
from produce_embedding import Embedding

ft_eng_embedder = Embedding("fast eng")
bert_eng_embedder = Embedding("bert eng")

In [None]:
from produce_embedding import Embedding

ft_fa_embedder = Embedding("fast fa")
bert_fa_embedder = Embedding("bert fa")

In [None]:
drugbank_embedding_ft = ft_eng_embedder.get_and_save_embedding(data= drugbank_processed,
                                       name_att= "name",
                                       designated_atts= ["description"],
                                       file_name= r"./drugbank_embedding.pkl")

In [26]:
farsibank_embedding_ft = ft_fa_embedder.get_and_save_embedding(data= farsibank_processed,
                                    name_att= "name_tejary",
                                    designated_atts=["mavared_masraf",
                                                     "avarez_janebi",
                                                     "amozesh",
                                                     "tavajohat"],
                                    file_name= r"./farsibank_embedding.pkl")

100%|██████████| 457/457 [00:57<00:00,  8.00it/s]


In [None]:
drugbank_embedding_bert = bert_eng_embedder.get_and_save_embedding(data= drugbank_processed,
                                       name_att= "name",
                                       designated_atts= ["description"],
                                       file_name= r"./drugbank_embedding_bert.pkl")

In [27]:
farsibank_embedding_bert = bert_fa_embedder.get_and_save_embedding(data= farsibank_processed,
                                    name_att= "name_tejary",
                                    designated_atts=["mavared_masraf",
                                                     "avarez_janebi",
                                                     "amozesh",
                                                     "tavajohat"],
                                    file_name= r"./farsibank_embedding_bert.pkl")

100%|██████████| 457/457 [03:33<00:00,  2.14it/s]


### _Comparing Embeddings:_ <br>
By default, <em>FastText</em> embeddings are 300-dimensional vectors.<br>
<em>BERT</em> embeddings, however, are longer: 768 dimensions here, as the outputs below show.<br>

Note that I didn't fine-tune <em>BERT</em> on my data set, as the homework document didn't require it. <br>

In [None]:
import pickle

with open('drugbank_embedding.pkl', 'rb') as f:
  drugbank_embedding_ft = pickle.load(f)

with open('drugbank_embedding_bert.pkl', 'rb') as f:
  drugbank_embedding_bert = pickle.load(f)

In [None]:
print("English FastText embedding for {} length: {}".format(drugbank_embedding_ft.iloc[1]["name"], len(drugbank_embedding_ft.iloc[1]["description"])))
print("English BERT embedding for {} length: {}".format(drugbank_embedding_bert.iloc[1]["name"], len(drugbank_embedding_bert.iloc[1]["description"])))

English FastText embedding for Cetuximab length: 300
English BERT embedding for Cetuximab length: 768


In [29]:
import pickle

with open('farsibank_embedding.pkl', 'rb') as f:
  farsibank_embedding_ft = pickle.load(f)

with open('farsibank_embedding_bert.pkl', 'rb') as f:
  farsibank_embedding_bert = pickle.load(f)

In [None]:
print("Farsi FastText embedding for {} length: {}".format(farsibank_embedding_ft.iloc[1]["name_tejary"], len(farsibank_embedding_ft.iloc[1]["mavared_masraf"])))
print("Farsi BERT embedding for {} length: {}".format(farsibank_embedding_bert.iloc[1]["name_tejary"], len(farsibank_embedding_bert.iloc[1]["mavared_masraf"])))

Farsi FastText embedding for Catapres length: 300
Farsi BERT embedding for Catapres length: 768


### _Prediction:_ <br>
At long last! Now that we have the embeddings, we can start predicting drug names.<br>

To this end, we'll use [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html)
to compute the cosine similarity between the input and the pickled embeddings.<br>

This is done by extracting the representation of the new input sentence with <em>Embedding</em>'s <em>sentence_embedding</em> method. We then calculate the cosine similarity between this embedding and each of the embeddings we previously extracted from our data set. <br>
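For reference, the cosine similarity that scikit-learn computes between the query embedding $\mathbf{q}$ and a stored drug embedding $\mathbf{d}$, both of dimension $n$ (300 for FastText, 768 for BERT in this notebook), is

$$
\cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} d_i^2}}
$$

The score lies in $[-1, 1]$ and depends only on the directions of the two vectors, not their lengths, so printed values slightly above 1 (such as 1.0000002) are floating-point rounding. Because both vectors must have the same $n$, the query always has to be embedded with the same model that produced the stored embeddings.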

You can also specify an attribute; the <em>predict</em>
function will use that attribute to find similarities. <br>

Finally, by sorting the similarities in decreasing order, the three most similar drugs are extracted.

In [8]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from tqdm import tqdm


def predict(test_input: str, drug_embeddings: pd.DataFrame, embedder: Embedding, attribute: str) -> list[tuple[str, float]]:
  """Return the three (drug name, cosine similarity) pairs closest to test_input."""
  # Embed the (already preprocessed) query with the same model as the stored embeddings
  input_embedding = embedder.sentence_embedding(sentence= test_input)
  # The English and Farsi data frames keep the drug name in different columns
  if "eng" in embedder.model_and_language:
    name = "name"
  else:
    name = "name_tejary"

  # cosine_similarity expects 2-D inputs of shape (n_samples, n_features)
  input_embedding = np.asarray(input_embedding).reshape(1, -1)
  similarities = []
  for _, row in tqdm(drug_embeddings.iterrows(), total=len(drug_embeddings)):
    drug_name = row[name]
    drug_embedding = np.array(row[attribute]).reshape(1, -1)
    similarity = cosine_similarity(input_embedding, drug_embedding)
    similarities.append((drug_name, similarity[0][0]))

  # Highest similarity first, then keep the top three
  similarities.sort(key=lambda x: x[1], reverse=True)
  most_similar_drugs = similarities[:3]

  return most_similar_drugs
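Sorting all 8974 scores is perfectly fine at this size, but the same top-3 search can also be written without a full sort. The sketch below is a dependency-free illustration, not the notebook's code: it computes cosine similarity directly and keeps the best `k` with `heapq.nlargest`. The stored embeddings are assumed to be plain `(name, vector)` pairs instead of a data frame.

```python
import heapq
import math
from typing import Iterable, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(query: Sequence[float],
                  drugs: Iterable[Tuple[str, Sequence[float]]],
                  k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (name, similarity) pairs most similar to `query`, best first."""
    scored = ((name, cosine(query, vec)) for name, vec in drugs)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


drugs = [("A", [1.0, 0.0]), ("B", [0.0, 1.0]), ("C", [1.0, 1.0]), ("D", [-1.0, 0.0])]
print(top_k_similar([1.0, 0.1], drugs, k=3))  # names in order: A, C, B
```

With NumPy arrays the usual alternative is to stack the stored vectors into one matrix and call `cosine_similarity(input_embedding, matrix)` once, which also removes the per-row `iterrows` loop.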



### _Evaluating English Embeddings:_ <br>



In [None]:
test_drug_1 = "Etanercept"
test_input_1 = "Dimeric fusion protein consisting of the extracellular ligand-binding portion of the human 75 kilodalton (p75) tumor necrosis factor receptor (TNFR) linked to the Fc portion of human IgG1.[L14862,A216522] The Fc component of etanercept contains the CH2 domain, the CH3 domain and hinge region, but not the CH1 domain of IgG1. Etanercept is produced by recombinant DNA technology in a Chinese hamster ovary (CHO) mammalian cell expression system. It consists of 934 amino acids. It is used to treat or manage a variety of inflammatory conditions including rheumatoid arthritis (RA), ankylosing spondylitis (AS), and juvenile idiopathic poly-articular arthritis (JIA)."
test_input_1 = eng_processor.process(test_input_1)
test_input_1_modified ="HELLO I CHANGED THIS protein consisting of the extracellular ligand-binding portion of the human 75 kilodalton (p75) tumor necrosis factor receptor (TNFR) linked to the Fc portion of human IgG1.[L14862,A216522] The Fc component of etanercept contains the CH2 domain, the CH3 domain and hinge region, but not the CH1 domain of IgG1. Etanercept is produced by recombinant DNA technology in a Chinese hamster ovary (CHO) mammalian cell expression system. It consists of 934 amino acids. It is used to treat or manage a variety of inflammatory conditions including rheumatoid arthritis (RA), ankylosing spondylitis (AS), and juvenile idiopathic poly-articular arthritis (JIA)."
test_input_1_modified = eng_processor.process(test_input_1_modified)


test_drug_2 = "Sargramostim"
test_input_2 = "Sargramostim is a human recombinant granulocyte macrophage colony-stimulating factor (GM-CSF) expressed in yeast. It is a glycoprotein that is 127 residues. Substitution of Leu23 leads to a difference from native protein."
test_input_2 = eng_processor.process(test_input_2)
test_input_2_modified = "HELLO I CHANGED THIS is a human recombinant granulocyte macrophage colony-stimulating factor (GM-CSF) expressed in yeast. It is a glycoprotein that is 127 residues. Substitution of Leu23 leads to a difference from native protein."
test_input_2_modified = eng_processor.process(test_input_2_modified)


test_drug_3 = "Amifostine"
test_input_3 = "A phosphorothioate proposed as a radiation-protective agent. It causes splenic vasodilation and may block autonomic ganglia."
test_input_3 = eng_processor.process(test_input_3)
test_input_3_modified = "HELLO I CHANGED THIS proposed as a radiation-protective agent. It causes splenic vasodilation and may block autonomic ganglia."
test_input_3_modified = eng_processor.process(test_input_3_modified)



print("FastText prediction for unmodified {} is: {}".format(test_drug_1, predict(test_input_1, drugbank_embedding_ft, ft_eng_embedder, "description")))
print("Bert prediction for unmodified {} is: {}".format(test_drug_1, predict(test_input_1, drugbank_embedding_bert, bert_eng_embedder, "description")))
print("-----------------------------------")
print("FastText prediction for modified {} is: {}".format(test_drug_1, predict(test_input_1_modified, drugbank_embedding_ft, ft_eng_embedder, "description")))
print("Bert prediction for modified {} is: {}".format(test_drug_1, predict(test_input_1_modified, drugbank_embedding_bert, bert_eng_embedder, "description")))
print("===================================")



print("FastText prediction for unmodified {} is: {}".format(test_drug_2, predict(test_input_2, drugbank_embedding_ft, ft_eng_embedder, "description")))
print("Bert prediction for unmodified {} is: {}".format(test_drug_2, predict(test_input_2, drugbank_embedding_bert, bert_eng_embedder, "description")))
print("-----------------------------------")
print("FastText prediction for modified {} is: {}".format(test_drug_2, predict(test_input_2_modified, drugbank_embedding_ft, ft_eng_embedder, "description")))
print("Bert prediction for modified {} is: {}".format(test_drug_2, predict(test_input_2_modified, drugbank_embedding_bert, bert_eng_embedder, "description")))
print("===================================")



print("FastText prediction for unmodified {} is: {}".format(test_drug_3, predict(test_input_3, drugbank_embedding_ft, ft_eng_embedder, "description")))
print("Bert prediction for unmodified {} is: {}".format(test_drug_3, predict(test_input_3, drugbank_embedding_bert, bert_eng_embedder, "description")))
print("-----------------------------------")
print("FastText prediction for modified {} is: {}".format(test_drug_3, predict(test_input_3_modified, drugbank_embedding_ft, ft_eng_embedder, "description")))
print("Bert prediction for modified {} is: {}".format(test_drug_3, predict(test_input_3_modified, drugbank_embedding_bert, bert_eng_embedder, "description")))

100%|██████████| 8974/8974 [00:04<00:00, 1891.11it/s]


FastText prediction for unmodified Etanercept is: [('Etanercept', 0.9999999), ('Abatacept', 0.93334967), ('Belatacept', 0.91063845)]


100%|██████████| 8974/8974 [00:05<00:00, 1755.59it/s]


Bert prediction for unmodified Etanercept is: [('Etanercept', 0.9999999), ('Coagulation Factor IX (Recombinant)', 0.907702), ('Reteplase', 0.9063566)]
-----------------------------------


100%|██████████| 8974/8974 [00:05<00:00, 1697.31it/s]


FastText prediction for modified Etanercept is: [('Etanercept', 0.99841696), ('Abatacept', 0.929565), ('Belatacept', 0.9091094)]


100%|██████████| 8974/8974 [00:04<00:00, 1956.82it/s]


Bert prediction for modified Etanercept is: [('Etanercept', 0.9944751), ('Coagulation Factor IX (Recombinant)', 0.91497695), ('Padimate O', 0.910378)]


100%|██████████| 8974/8974 [00:04<00:00, 1893.99it/s]


FastText prediction for unmodified Sargramostim is: [('Sargramostim', 1.0), ('Coagulation factor VIIa Recombinant Human', 0.82511944), ('Interferon alfa-2a', 0.81604797)]


100%|██████████| 8974/8974 [00:04<00:00, 1795.09it/s]


Bert prediction for unmodified Sargramostim is: [('Sargramostim', 1.0), ('Thrombomodulin Alfa', 0.92537725), ('Regramostim', 0.92339206)]
-----------------------------------


100%|██████████| 8974/8974 [00:04<00:00, 2144.12it/s]


FastText prediction for modified Sargramostim is: [('Sargramostim', 0.97761035), ('Coagulation factor VIIa Recombinant Human', 0.8153826), ('Interferon alfa-2a', 0.814155)]


100%|██████████| 8974/8974 [00:05<00:00, 1721.51it/s]


Bert prediction for modified Sargramostim is: [('Sargramostim', 0.97339237), ('Reteplase', 0.92151475), ('Galsulfase', 0.9207161)]


100%|██████████| 8974/8974 [00:04<00:00, 1964.15it/s]


FastText prediction for unmodified Amifostine is: [('Amifostine', 1.0), ('Bencyclane', 0.81368625), ('Gallamine triethiodide', 0.80637664)]


100%|██████████| 8974/8974 [00:04<00:00, 1936.18it/s]


Bert prediction for unmodified Amifostine is: [('Amifostine', 0.9999998), ('Mivacurium', 0.9292915), ('Bethanidine', 0.9287053)]
-----------------------------------


100%|██████████| 8974/8974 [00:06<00:00, 1438.01it/s]


FastText prediction for modified Amifostine is: [('Amifostine', 0.9574679), ('Gallamine triethiodide', 0.80024874), ('Bencyclane', 0.7970907)]


100%|██████████| 8974/8974 [00:05<00:00, 1772.64it/s]

Bert prediction for modified Amifostine is: [('Domperidone', 0.90957), ('Amifostine', 0.90950537), ('Dialyzable leukocyte extract', 0.90517956)]
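To summarize the six English BERT runs above at a glance, a small helper can count how often the expected drug is ranked first and how often it appears anywhere in the top three. The predicted names are copied from the printed BERT outputs; the helper itself is only an illustration and not part of the original notebook.

```python
from typing import List, Tuple

# (expected drug, predicted names in rank order), copied from the BERT outputs above.
bert_runs: List[Tuple[str, List[str]]] = [
    ("Etanercept", ["Etanercept", "Coagulation Factor IX (Recombinant)", "Reteplase"]),
    ("Etanercept", ["Etanercept", "Coagulation Factor IX (Recombinant)", "Padimate O"]),
    ("Sargramostim", ["Sargramostim", "Thrombomodulin Alfa", "Regramostim"]),
    ("Sargramostim", ["Sargramostim", "Reteplase", "Galsulfase"]),
    ("Amifostine", ["Amifostine", "Mivacurium", "Bethanidine"]),
    ("Amifostine", ["Domperidone", "Amifostine", "Dialyzable leukocyte extract"]),
]


def hit_rate(runs: List[Tuple[str, List[str]]], k: int) -> float:
    """Fraction of runs whose expected drug is among the first k predictions."""
    hits = sum(1 for expected, predicted in runs if expected in predicted[:k])
    return hits / len(runs)


print(f"top-1: {hit_rate(bert_runs, 1):.2f}, top-3: {hit_rate(bert_runs, 3):.2f}")
# top-1: 0.83, top-3: 1.00
```

Applied to the FastText lines, the same count gives 6/6 for top-1, since FastText ranked the correct drug first in every English run, including the modified inputs.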





### _Evaluating Farsi Embeddings:_ <br>


In [31]:
farsibank = pd.read_csv(r"./farsi_drug_data.csv")

# For drug_1 we'll use "mavared_masraf"
test_drug_1 = farsibank.loc[160]["name_tejary"]
test_input_1 = farsibank.loc[160]["mavared_masraf"]
test_input_1 = fa_processor.process(test_input_1)
test_input_1_modified = test_drug_1 + " اضافه "
test_input_1_modified = fa_processor.process(test_input_1_modified)


# For drug_2 we'll use "avarez_janebi"
test_drug_2 = farsibank.loc[53]["name_tejary"]
test_input_2 = farsibank.loc[53]["avarez_janebi"]
test_input_2 = fa_processor.process(test_input_2)
test_input_2_modified = test_drug_2 + " اضافه "
test_input_2_modified = fa_processor.process(test_input_2_modified)


# For drug_3 we'll use "tavajohat"
test_drug_3 = farsibank.loc[380]["name_tejary"]
test_input_3 = farsibank.loc[380]["tavajohat"]
test_input_3 = fa_processor.process(test_input_3)
test_input_3_modified = test_drug_3 + " اضافه "
test_input_3_modified = fa_processor.process(test_input_3_modified)



print("FastText prediction for unmodified {} is: {}".format(test_drug_1, predict(test_input_1, farsibank_embedding_ft, ft_fa_embedder, "mavared_masraf")))
print("Bert prediction for unmodified {} is: {}".format(test_drug_1, predict(test_input_1, farsibank_embedding_bert, bert_fa_embedder, "mavared_masraf")))
print("-----------------------------------")
print("FastText prediction for modified {} is: {}".format(test_drug_1, predict(test_input_1_modified, farsibank_embedding_ft, ft_fa_embedder, "mavared_masraf")))
print("Bert prediction for modified {} is: {}".format(test_drug_1, predict(test_input_1_modified, farsibank_embedding_bert, bert_fa_embedder, "mavared_masraf")))
print("===================================")



print("FastText prediction for unmodified {} is: {}".format(test_drug_2, predict(test_input_2, farsibank_embedding_ft, ft_fa_embedder, "avarez_janebi")))
print("Bert prediction for unmodified {} is: {}".format(test_drug_2, predict(test_input_2, farsibank_embedding_bert, bert_fa_embedder, "avarez_janebi")))
print("-----------------------------------")
print("FastText prediction for modified {} is: {}".format(test_drug_2, predict(test_input_2_modified, farsibank_embedding_ft, ft_fa_embedder, "avarez_janebi")))
print("BERT prediction for modified {} is: {}".format(test_drug_2, predict(test_input_2_modified, farsibank_embedding_bert, bert_fa_embedder, "avarez_janebi")))
print("===================================")



print("FastText prediction for unmodified {} is: {}".format(test_drug_3, predict(test_input_3, farsibank_embedding_ft, ft_fa_embedder, "tavajohat")))
print("Bert prediction for unmodified {} is: {}".format(test_drug_3, predict(test_input_3, farsibank_embedding_bert, bert_fa_embedder, "tavajohat")))
print("-----------------------------------")
print("FastText prediction for modified {} is: {}".format(test_drug_3, predict(test_input_3_modified, farsibank_embedding_ft, ft_fa_embedder, "tavajohat")))
print("Bert prediction for modified {} is: {}".format(test_drug_3, predict(test_input_3_modified, farsibank_embedding_bert, bert_fa_embedder, "tavajohat")))

100%|██████████| 457/457 [00:00<00:00, 2035.65it/s]


FastText prediction for unmodified Salmon is: [('Salmon', 1.0), ('Drionel', 0.85113543), ('Endoxan', 0.81451327)]


100%|██████████| 457/457 [00:00<00:00, 882.62it/s]


Bert prediction for unmodified Salmon is: [('Salmon', 1.0), ('Fosamax', 0.87427545), ('Drionel', 0.84084404)]
-----------------------------------


100%|██████████| 457/457 [00:00<00:00, 1863.03it/s]


FastText prediction for modified Salmon is: [('Adecaps', 0.35229343), ('Dextraran', 0.35143468), ('------', 0.34656754)]


100%|██████████| 457/457 [00:00<00:00, 2219.36it/s]


Bert prediction for modified Salmon is: [('Buspar', 0.70213836), ('Capastat', 0.70213836), ('Sinemet', 0.70213836)]


100%|██████████| 457/457 [00:00<00:00, 2337.61it/s]


FastText prediction for unmodified Cosmegen is: [('Cosmegen', 1.0000002), ('Cytosar', 0.90485376), ('Paracetamol', 0.8949)]


100%|██████████| 457/457 [00:00<00:00, 1861.37it/s]


Bert prediction for unmodified Cosmegen is: [('Cosmegen', 0.99999994), ('Norplant', 0.9344437), ('Laniazid', 0.93442386)]
-----------------------------------


100%|██████████| 457/457 [00:00<00:00, 2114.02it/s]


FastText prediction for modified Cosmegen is: [('Celestone', 0.43488458), ('Panthoderm', 0.414363), ('Corisan', 0.414363)]


100%|██████████| 457/457 [00:00<00:00, 1814.01it/s]


BERT prediction for modified Cosmegen is: [('Calamox', 0.6796804), ('Murine', 0.6796804), ('Desitin', 0.66405934)]


100%|██████████| 457/457 [00:00<00:00, 1876.39it/s]


FastText prediction for unmodified Timoptic is: [('Timoptic', 0.9999999), ('Cyclogyl', 0.81283665), ('Taxotere', 0.7773459)]


100%|██████████| 457/457 [00:00<00:00, 1971.58it/s]


Bert prediction for unmodified Timoptic is: [('Timoptic', 1.0000002), ('Adriblastina', 0.8327975), ('Betoptic', 0.82929534)]
-----------------------------------


100%|██████████| 457/457 [00:00<00:00, 1925.13it/s]


FastText prediction for modified Timoptic is: [('Duraphat', 0.49165392), ('Plasil', 0.48066857), ('Acthar', 0.47884744)]


100%|██████████| 457/457 [00:00<00:00, 1998.59it/s]

Bert prediction for modified Timoptic is: [('Topiccy cline', 0.61484325), ('Mentopin', 0.61484325), ('Sorbilax', 0.61484325)]



