# fastText

## Fun Facts

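- fastText handles out-of-vocabulary (`OOV`) words better than `word2vec`: it builds a word's vector from its character n-grams, so it can still produce a vector for a word it never saw during training, while word2vec has no vector for such a word.
- fastText is often the first choice for training custom word embeddings on text from your own domain.
- fastText is both a technique (similar to word2vec) and a library.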
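The OOV advantage comes from subword information: fastText represents a word by its character n-grams (lengths 3 to 6 by default for unsupervised training, with `<` and `>` marking the word boundaries) and builds the word's vector from the vectors of those n-grams. The sketch below only extracts the n-grams in plain Python. `char_ngrams` is a hypothetical helper written for illustration, not part of the `fasttext` library, and it ignores details such as hashing n-grams into buckets and the extra whole-word token fastText also keeps.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Hypothetical helper: wrap the word in '<' and '>' and return every
    substring whose length is between minn and maxn (inclusive)."""
    token = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>']

# A word missing from the training data still shares many n-grams with
# words that were seen, so a vector can be assembled from those n-grams.
seen = set(char_ngrams("chutney"))
unseen = char_ngrams("chutneys")
shared = [g for g in unseen if g in seen]
print(f"{len(shared)} of {len(unseen)} n-grams of 'chutneys' also occur in 'chutney'")
# 18 of 26 n-grams of 'chutneys' also occur in 'chutney'
```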

In [None]:
!pip install fasttext



In [None]:
# create a folder for the fastText models
!mkdir -p fasttext_models

In [None]:
# Download the English bin model
!wget -P fasttext_models https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz

--2024-12-21 06:59:21--  https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 18.238.176.126, 18.238.176.19, 18.238.176.115, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|18.238.176.126|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4503593528 (4.2G) [application/octet-stream]
Saving to: ‘fasttext_models/cc.en.300.bin.gz’


2024-12-21 06:59:48 (161 MB/s) - ‘fasttext_models/cc.en.300.bin.gz’ saved [4503593528/4503593528]



In [None]:
# Unzip the downloaded file
!gunzip fasttext_models/cc.en.300.bin.gz

In [None]:
# downloading the Hindi model
!wget -P fasttext_models https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.hi.300.bin.gz


--2024-12-21 07:02:49--  https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.hi.300.bin.gz
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 18.238.176.115, 18.238.176.44, 18.238.176.19, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|18.238.176.115|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4371554972 (4.1G) [application/octet-stream]
Saving to: ‘fasttext_models/cc.hi.300.bin.gz’


2024-12-21 07:04:02 (57.9 MB/s) - ‘fasttext_models/cc.hi.300.bin.gz’ saved [4371554972/4371554972]



In [None]:
!gunzip fasttext_models/cc.hi.300.bin.gz

## Loading the model

In [None]:
import fasttext

model_en = fasttext.load_model('/content/fasttext_models/cc.en.300.bin')

In [None]:
# checking the methods available

dir(model_en)

['__class__',
 '__contains__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_labels',
 '_words',
 'f',
 'get_analogies',
 'get_dimension',
 'get_input_matrix',
 'get_input_vector',
 'get_label_id',
 'get_labels',
 'get_line',
 'get_meter',
 'get_nearest_neighbors',
 'get_output_matrix',
 'get_sentence_vector',
 'get_subword_id',
 'get_subwords',
 'get_word_id',
 'get_word_vector',
 'get_words',
 'is_quantized',
 'labels',
 'predict',
 'quantize',
 'save_model',
 'set_args',
 'set_matrices',
 'test',
 'test_label',
 'words']

In [None]:
model_en.get_nearest_neighbors("good")  # similar to gensim's most_similar(); returns the top k=10 (similarity, word) pairs

[(0.7517593502998352, 'bad'),
 (0.7426098585128784, 'great'),
 (0.7299689054489136, 'decent'),
 (0.7123614549636841, 'nice'),
 (0.6796907186508179, 'Good'),
 (0.6737031936645508, 'excellent'),
 (0.669592022895813, 'goood'),
 (0.6602178812026978, 'ggod'),
 (0.6479219794273376, 'semi-good'),
 (0.6417751908302307, 'good.Good')]
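The scores above are cosine similarities between the query word's vector and the vectors of other vocabulary words. A minimal NumPy sketch of that ranking over a made-up toy vocabulary is below; the 3-dimensional vectors and the helper `nearest_neighbors` are hypothetical, whereas fastText performs the same kind of ranking over its whole vocabulary in C++. It also hints at why `bad` tops the list for `good`: antonyms tend to occur in very similar contexts, so their vectors end up close.

```python
import numpy as np


def nearest_neighbors(query, vectors, k=3):
    """Hypothetical helper: rank words by cosine similarity to `query`,
    skipping the query word itself."""
    q = vectors[query] / np.linalg.norm(vectors[query])
    scores = []
    for word, vec in vectors.items():
        if word == query:
            continue
        scores.append((float(q @ vec / np.linalg.norm(vec)), word))
    return sorted(scores, reverse=True)[:k]


# toy 3-dimensional vectors, invented for illustration only
toy = {
    "good": np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "bad": np.array([0.7, -0.3, 0.1]),
    "lion": np.array([0.0, 0.1, 0.9]),
}
print(nearest_neighbors("good", toy))  # great first, then bad, then lion
```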

In [None]:
model_en.get_nearest_neighbors("Lion")

[(0.6564886569976807, 'Lion-'),
 (0.6405343413352966, 'LionThe'),
 (0.6277195811271667, 'Leopard'),
 (0.6234227418899536, 'Lion.'),
 (0.6087048053741455, 'lion'),
 (0.5858215093612671, 'Tiger'),
 (0.5717810988426208, 'Lioness'),
 (0.5446290969848633, 'Elephant'),
 (0.5444255471229553, 'Eagle'),
 (0.5406057238578796, 'Lions')]

In [None]:
# get the vector of an individual word with get_word_vector()
model_en.get_word_vector("good")

array([-0.09213716, -0.0634383 ,  0.00173813,  0.13524324, -0.06561062,
        0.00619071,  0.12609869, -0.01646539,  0.0174491 , -0.00126792,
       -0.09709831,  0.02329333,  0.00996784,  0.00463419,  0.01587938,
        0.00689824,  0.08575399, -0.01988525, -0.0601579 , -0.02327966,
        0.01183712,  0.08217917,  0.01488847,  0.00902181,  0.00696296,
       -0.06426616,  0.03345198, -0.02101481,  0.06767873,  0.03022419,
        0.07203474, -0.05689922, -0.04370377,  0.00642597,  0.0439174 ,
        0.0604848 , -0.00611545, -0.12256738, -0.03530414, -0.02696739,
       -0.02058216,  0.00752347, -0.00686451,  0.0362783 , -0.03308735,
        0.05801626,  0.00832448, -0.06336953, -0.05775082,  0.01089846,
       -0.0925179 ,  0.01559984, -0.04079024,  0.0066871 , -0.06374165,
        0.05881973,  0.07209535, -0.05387195, -0.14658651, -0.04046486,
       -0.02507038, -0.04954465, -0.05224417, -0.06846938,  0.0467079 ,
        0.00459271, -0.07522177,  0.03627685, -0.0698283 ,  0.01

The vector has 300 dimensions, matching the `300` in the model name `cc.en.300.bin`.

In [None]:
model_en.get_word_vector("good").shape

(300,)

In [None]:
# find analogies with the get_analogies() method

model_en.get_analogies("berlin", "germany", "india")

[(0.7148876190185547, 'delhi'),
 (0.6974374055862427, 'mumbai'),
 (0.648612916469574, 'jaipur'),
 (0.6349966526031494, 'kolkata'),
 (0.6279922723770142, 'pune'),
 (0.6277596354484558, 'bangalore'),
 (0.6044078469276428, 'hyderabad'),
 (0.6021745800971985, 'noida'),
 (0.6018899083137512, 'bhubaneswar'),
 (0.599077582359314, 'nashik')]

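*`get_analogies("berlin", "germany", "india")` returns the words whose vectors are closest to berlin - germany + india, leaving out the three input words. The difference berlin - germany roughly encodes "capital of", so adding india lands near `delhi`.*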
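As far as we understand the fastText source, `get_analogies(A, B, C)` normalizes the three word vectors, forms the query A - B + C, and returns that query's nearest neighbours by cosine similarity while skipping A, B and C. The NumPy sketch below reproduces that procedure on invented 3-dimensional vectors whose axes loosely stand for "is a capital", "German" and "Indian"; the vectors and the helper `analogy` are hypothetical.

```python
import numpy as np


def unit(v):
    """Scale a vector to length 1."""
    return v / np.linalg.norm(v)


def analogy(a, b, c, vectors, k=1):
    """Hypothetical helper: words closest (cosine) to unit(a) - unit(b) + unit(c),
    excluding the three input words."""
    query = unit(unit(vectors[a]) - unit(vectors[b]) + unit(vectors[c]))
    scores = [
        (float(query @ unit(vec)), word)
        for word, vec in vectors.items()
        if word not in {a, b, c}
    ]
    return sorted(scores, reverse=True)[:k]


# toy vectors: axis 0 ~ "is a capital", axis 1 ~ "German", axis 2 ~ "Indian"
toy = {
    "berlin": np.array([1.0, 1.0, 0.0]),
    "germany": np.array([0.0, 1.0, 0.0]),
    "india": np.array([0.0, 0.0, 1.0]),
    "delhi": np.array([1.0, 0.0, 1.0]),
    "mumbai": np.array([0.2, 0.0, 1.0]),
    "munich": np.array([0.2, 1.0, 0.0]),
}
print(analogy("berlin", "germany", "india", toy, k=3))  # delhi, then mumbai, then munich
```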

In [None]:
model_en.get_analogies("driving", "car", "phone")

[(0.610385537147522, 'texting'),
 (0.5203558802604675, 'phone-calling'),
 (0.5153835415840149, 'cellphone'),
 (0.5135326981544495, 'cell-phone'),
 (0.5117910504341125, 'dialing'),
 (0.5087355971336365, 'texing'),
 (0.5079342722892761, 'text-messaging'),
 (0.500900387763977, 'txting'),
 (0.4960441589355469, 'texting.'),
 (0.4951859414577484, 'Texting')]

In [None]:
model_en.get_analogies("Lion", "carnivorous", "deer")

[(0.5342251658439636, 'Deer'),
 (0.4726506173610687, 'deers'),
 (0.46525630354881287, 'Leopard'),
 (0.46158280968666077, 'Deers'),
 (0.4545937776565552, 'lion'),
 (0.44877374172210693, 'elk'),
 (0.44499215483665466, 'Stag'),
 (0.4286802411079407, 'Lion.'),
 (0.4252222180366516, 'Bear'),
 (0.42179834842681885, 'Lion-')]

## Training custom word embeddings on Indian food recipes 😋


In [None]:
import pandas as pd
import numpy as np

df = pd.read_csv('/content/Cleaned_Indian_Food_Dataset.csv')

In [None]:
df.head()

| | TranslatedRecipeName | TranslatedIngredients | TotalTimeInMins | Cuisine | TranslatedInstructions | URL | Cleaned-Ingredients | image-url | Ingredient-count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Masala Karela Recipe | 1 tablespoon Red Chilli powder,3 tablespoon Gr... | 45 | Indian | To begin making the Masala Karela Recipe,de-se... | https://www.archanaskitchen.com/masala-karela-... | salt,amchur (dry mango powder),karela (bitter ... | https://www.archanaskitchen.com/images/archana... | 10 |
| 1 | Spicy Tomato Rice (Recipe) | 2 teaspoon cashew - or peanuts, 1/2 Teaspoon ... | 15 | South Indian Recipes | To make tomato puliogere, first cut the tomato... | https://www.archanaskitchen.com/spicy-tomato-r... | tomato,salt,chickpea lentils,green chilli,rice... | https://www.archanaskitchen.com/images/archana... | 12 |
| 2 | Ragi Semiya Upma Recipe - Ragi Millet Vermicel... | 1 Onion - sliced,1 teaspoon White Urad Dal (Sp... | 50 | South Indian Recipes | To begin making the Ragi Vermicelli Recipe, fi... | https://www.archanaskitchen.com/ragi-vermicell... | salt,rice vermicelli noodles (thin),asafoetida... | https://www.archanaskitchen.com/images/archana... | 12 |
| 3 | Gongura Chicken Curry Recipe - Andhra Style Go... | 1/2 teaspoon Turmeric powder (Haldi),1 tablesp... | 45 | Andhra | To begin making Gongura Chicken Curry Recipe f... | https://www.archanaskitchen.com/gongura-chicke... | tomato,salt,ginger,sorrel leaves (gongura),fen... | https://www.archanaskitchen.com/images/archana... | 15 |
| 4 | Andhra Style Alam Pachadi Recipe - Adrak Chutn... | oil - as per use, 1 tablespoon coriander seed... | 30 | Andhra | To make Andhra Style Alam Pachadi, first heat ... | https://www.archanaskitchen.com/andhra-style-a... | tomato,salt,ginger,red chillies,curry,asafoeti... | https://www.archanaskitchen.com/images/archana... | 12 |


In [None]:
df.shape

(5938, 9)

In [None]:
df.columns

Index(['TranslatedRecipeName', 'TranslatedIngredients', 'TotalTimeInMins',
       'Cuisine', 'TranslatedInstructions', 'URL', 'Cleaned-Ingredients',
       'image-url', 'Ingredient-count'],
      dtype='object')

In [None]:
df['TranslatedInstructions'][0]

'To begin making the Masala Karela Recipe,de-seed the karela and slice.\nDo not remove the skin as the skin has all the nutrients.\nAdd the karela to the pressure cooker with 3 tablespoon of water, salt and turmeric powder and pressure cook for three whistles.\nRelease the pressure immediately and open the lids.\nKeep aside.Heat oil in a heavy bottomed pan or a kadhai.\nAdd cumin seeds and let it sizzle.Once the cumin seeds have sizzled, add onions and saute them till it turns golden brown in color.Add the karela, red chilli powder, amchur powder, coriander powder and besan.\nStir to combine the masalas into the karela.Drizzle a little extra oil on the top and mix again.\nCover the pan and simmer Masala Karela stirring occasionally until everything comes together well.\nTurn off the heat.Transfer Masala Karela into a serving bowl and serve.Serve Masala Karela along with Panchmel Dal and Phulka for a weekday meal with your family.\n'

In [None]:
# performing preprocessing with regex.
import re

# the raw instructions of the first recipe (the same string printed above)
text = df['TranslatedInstructions'][0]

In [None]:
# replace every character that is neither a word character nor whitespace with a space
re.sub(r'[^\w\s]', ' ', text)

'To begin making the Masala Karela Recipe de seed the karela and slice \nDo not remove the skin as the skin has all the nutrients \nAdd the karela to the pressure cooker with 3 tablespoon of water  salt and turmeric powder and pressure cook for three whistles \nRelease the pressure immediately and open the lids \nKeep aside Heat oil in a heavy bottomed pan or a kadhai \nAdd cumin seeds and let it sizzle Once the cumin seeds have sizzled  add onions and saute them till it turns golden brown in color Add the karela  red chilli powder  amchur powder  coriander powder and besan \nStir to combine the masalas into the karela Drizzle a little extra oil on the top and mix again \nCover the pan and simmer Masala Karela stirring occasionally until everything comes together well \nTurn off the heat Transfer Masala Karela into a serving bowl and serve Serve Masala Karela along with Panchmel Dal and Phulka for a weekday meal with your family \n'

In [None]:
# collapse runs of spaces into one (no visible change here: the raw text has no repeated spaces)
re.sub(" +", " ", text)

'To begin making the Masala Karela Recipe,de-seed the karela and slice.\nDo not remove the skin as the skin has all the nutrients.\nAdd the karela to the pressure cooker with 3 tablespoon of water, salt and turmeric powder and pressure cook for three whistles.\nRelease the pressure immediately and open the lids.\nKeep aside.Heat oil in a heavy bottomed pan or a kadhai.\nAdd cumin seeds and let it sizzle.Once the cumin seeds have sizzled, add onions and saute them till it turns golden brown in color.Add the karela, red chilli powder, amchur powder, coriander powder and besan.\nStir to combine the masalas into the karela.Drizzle a little extra oil on the top and mix again.\nCover the pan and simmer Masala Karela stirring occasionally until everything comes together well.\nTurn off the heat.Transfer Masala Karela into a serving bowl and serve.Serve Masala Karela along with Panchmel Dal and Phulka for a weekday meal with your family.\n'

In [None]:
# replace newlines with spaces
re.sub(r'\n', ' ', text)

'To begin making the Masala Karela Recipe,de-seed the karela and slice. Do not remove the skin as the skin has all the nutrients. Add the karela to the pressure cooker with 3 tablespoon of water, salt and turmeric powder and pressure cook for three whistles. Release the pressure immediately and open the lids. Keep aside.Heat oil in a heavy bottomed pan or a kadhai. Add cumin seeds and let it sizzle.Once the cumin seeds have sizzled, add onions and saute them till it turns golden brown in color.Add the karela, red chilli powder, amchur powder, coriander powder and besan. Stir to combine the masalas into the karela.Drizzle a little extra oil on the top and mix again. Cover the pan and simmer Masala Karela stirring occasionally until everything comes together well. Turn off the heat.Transfer Masala Karela into a serving bowl and serve.Serve Masala Karela along with Panchmel Dal and Phulka for a weekday meal with your family. '

In [None]:
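# combine all the steps into one function
def preprocess(text):
    # replace punctuation (anything except word characters, whitespace and apostrophes) with a space
    text = re.sub(r"[^\w\s']", ' ', text)
    # collapse runs of spaces and newlines into a single space
    text = re.sub(r'[ \n]+', ' ', text)
    # trim leading/trailing spaces and lowercase everything
    return text.strip().lower()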

In [None]:
preprocess(text)

'to begin making the masala karela recipe de seed the karela and slice do not remove the skin as the skin has all the nutrients add the karela to the pressure cooker with 3 tablespoon of water salt and turmeric powder and pressure cook for three whistles release the pressure immediately and open the lids keep aside heat oil in a heavy bottomed pan or a kadhai add cumin seeds and let it sizzle once the cumin seeds have sizzled add onions and saute them till it turns golden brown in color add the karela red chilli powder amchur powder coriander powder and besan stir to combine the masalas into the karela drizzle a little extra oil on the top and mix again cover the pan and simmer masala karela stirring occasionally until everything comes together well turn off the heat transfer masala karela into a serving bowl and serve serve masala karela along with panchmel dal and phulka for a weekday meal with your family'

In [None]:
df.TranslatedInstructions = df.TranslatedInstructions.map(preprocess)

In [None]:
df.head()

| | TranslatedRecipeName | TranslatedIngredients | TotalTimeInMins | Cuisine | TranslatedInstructions | URL | Cleaned-Ingredients | image-url | Ingredient-count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Masala Karela Recipe | 1 tablespoon Red Chilli powder,3 tablespoon Gr... | 45 | Indian | to begin making the masala karela recipe de se... | https://www.archanaskitchen.com/masala-karela-... | salt,amchur (dry mango powder),karela (bitter ... | https://www.archanaskitchen.com/images/archana... | 10 |
| 1 | Spicy Tomato Rice (Recipe) | 2 teaspoon cashew - or peanuts, 1/2 Teaspoon ... | 15 | South Indian Recipes | to make tomato puliogere first cut the tomatoe... | https://www.archanaskitchen.com/spicy-tomato-r... | tomato,salt,chickpea lentils,green chilli,rice... | https://www.archanaskitchen.com/images/archana... | 12 |
| 2 | Ragi Semiya Upma Recipe - Ragi Millet Vermicel... | 1 Onion - sliced,1 teaspoon White Urad Dal (Sp... | 50 | South Indian Recipes | to begin making the ragi vermicelli recipe fir... | https://www.archanaskitchen.com/ragi-vermicell... | salt,rice vermicelli noodles (thin),asafoetida... | https://www.archanaskitchen.com/images/archana... | 12 |
| 3 | Gongura Chicken Curry Recipe - Andhra Style Go... | 1/2 teaspoon Turmeric powder (Haldi),1 tablesp... | 45 | Andhra | to begin making gongura chicken curry recipe f... | https://www.archanaskitchen.com/gongura-chicke... | tomato,salt,ginger,sorrel leaves (gongura),fen... | https://www.archanaskitchen.com/images/archana... | 15 |
| 4 | Andhra Style Alam Pachadi Recipe - Adrak Chutn... | oil - as per use, 1 tablespoon coriander seed... | 30 | Andhra | to make andhra style alam pachadi first heat o... | https://www.archanaskitchen.com/andhra-style-a... | tomato,salt,ginger,red chillies,curry,asafoeti... | https://www.archanaskitchen.com/images/archana... | 12 |


In [None]:
# write one preprocessed recipe per line: fastText trains on plain, whitespace-tokenized text
df.to_csv('food_receipes.txt', columns=['TranslatedInstructions'], header=False, index=False)

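**`train_unsupervised` uses the `skipgram` model by default (`cbow` is the other option). Learning embeddings this way is unsupervised: no labels are needed, only raw text.**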
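Skip-gram turns raw text into a prediction task: each word is used to predict the words around it, so the (center, context) training pairs come straight from the text and no labels are needed. The sketch below generates those pairs for a short phrase from the recipes. `skipgram_pairs` is a hypothetical helper; real fastText additionally samples the window size at each position (up to `ws`, default 5), uses negative sampling, and represents the center word by its character n-gram vectors rather than a single word vector.

```python
def skipgram_pairs(tokens, window=2):
    """Hypothetical helper: return (center, context) pairs, where each word is
    paired with every word at most `window` positions away on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "add cumin seeds and let it sizzle".split()
for pair in skipgram_pairs(tokens, window=1)[:4]:
    print(pair)
# ('add', 'cumin')
# ('cumin', 'add')
# ('cumin', 'seeds')
# ('seeds', 'cumin')
```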

In [None]:
# defaults include model='skipgram', dim=100, ws=5, epoch=5, minCount=5, minn=3, maxn=6
model = fasttext.train_unsupervised("food_receipes.txt")

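After training, `model` holds a 100-dimensional vector (the default `dim`) for every word that occurs at least 5 times in the recipes (the default `minCount`), plus vectors for character n-grams, so it can also embed words it never saw.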

In [None]:
# the nearest neighbours now reflect the recipe domain (e.g. dhaniya, imli and pudina chutneys)
model.get_nearest_neighbors("chutney")

[(0.9275704622268677, 'chutneys'),
 (0.7463748455047607, 'dhaniya'),
 (0.7132056951522827, 'imli'),
 (0.7042087316513062, 'khajur'),
 (0.6639349460601807, 'kanchipuram'),
 (0.6590506434440613, 'pudina'),
 (0.6549491286277771, 'gothsu'),
 (0.6544407606124878, 'chammanthi'),
 (0.6525646448135376, 'south'),
 (0.6511055827140808, 'madurai')]

In [None]:
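# "Halwa" (capitalized) never occurs in the lowercased corpus, so fastText builds its vector from character n-grams
model.get_nearest_neighbors("Halwa")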

[(0.9746202230453491, 'halwa'),
 (0.712103545665741, 'khoya'),
 (0.6994524598121643, 'burfi'),
 (0.6804381012916565, 'rabri'),
 (0.6798492670059204, 'sheera'),
 (0.6793428063392639, 'mawa'),
 (0.6720821261405945, 'badam'),
 (0.6708599328994751, 'mohan'),
 (0.6636401414871216, 'kheer'),
 (0.6382583379745483, 'basundi')]

In [None]:
model.get_nearest_neighbors("paneer")

[(0.7046746611595154, 'tikka'),
 (0.6630706191062927, 'tikkas'),
 (0.6622522473335266, 'tandoori'),
 (0.6518504619598389, 'bhurji'),
 (0.6466901302337646, 'reshmi'),
 (0.6369193196296692, 'nawabi'),
 (0.6190375685691833, 'makhanwala'),
 (0.6179590821266174, 'hariyali'),
 (0.6143130660057068, 'makhani'),
 (0.5987952351570129, 'malai')]