# fastText Embedding

<img src="img/embedding.png" width="800">

## Installation

fastText builds on modern macOS and Linux distributions. Because it uses C++11 features, it requires a compiler with good C++11 support. <br>
**You will need Python (version 2.7 or ≥ 3.4), NumPy & SciPy and pybind11.**

In [None]:
# !pip install fasttext

## Word Representation

In [3]:
import fasttext

### Training

In [None]:
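# Skipgram model: predicts the surrounding words from the current word
model = fasttext.train_unsupervised('data.txt', model='skipgram')

# vs

# CBOW model: predicts the current word from its surrounding words
model = fasttext.train_unsupervised('data.txt', model='cbow')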

In [1]:
meFind = {"FindCADT": "CADT is at BLAH BLAH"}

result = meFind["FindCADT"]

result

'CADT is at BLAH BLAH'

For more info: https://fasttext.cc/docs/en/python-module.html#train_unsupervised-parameters

Continuous Bag of Words (CBOW) vs Skipgram: https://towardsdatascience.com/nlp-101-word2vec-skip-gram-and-cbow-93512ee24314
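To make the difference between the two models concrete, here is a minimal pure-Python sketch of the training examples each one sees for a fixed context window. The helper names (`context_window`, `skipgram_pairs`, `cbow_pairs`) and the sample sentence are made up for illustration; fastText's real training loop adds further details (such as subsampling of frequent words) that are left out here.

```python
def context_window(tokens, center, window_size):
    """Return the tokens within `window_size` positions of index `center`."""
    left = tokens[max(0, center - window_size):center]
    right = tokens[center + 1:center + 1 + window_size]
    return left + right


def skipgram_pairs(tokens, window_size):
    """Skipgram: every (center word, context word) pair is its own example."""
    return [
        (tokens[i], ctx)
        for i in range(len(tokens))
        for ctx in context_window(tokens, i, window_size)
    ]


def cbow_pairs(tokens, window_size):
    """CBOW: the whole context is used together to predict the center word."""
    return [
        (context_window(tokens, i, window_size), tokens[i])
        for i in range(len(tokens))
    ]


# Illustrative sentence, not taken from any dataset.
tokens = "the king rules the land".split()
print(skipgram_pairs(tokens, window_size=1)[:3])
print(cbow_pairs(tokens, window_size=1)[:2])
```

Skipgram produces one example per (word, neighbour) pair, while CBOW produces one example per position with all neighbours grouped as the input.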

### Model

The returned **model** object represents your learned model, and you can use it to retrieve information.

In [None]:
print(model.words)   # list of words in dictionary
print(model['king']) # get the vector of the word 'king'

For more info: https://fasttext.cc/docs/en/python-module.html#model-object
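As a small sketch of what can be read off the model object, the helper below gathers a few basic facts in one place. `summarize_model` is a hypothetical name introduced here; it relies only on the `words` property and the `get_dimension()` and `get_word_vector()` methods of the fastText model object.

```python
def summarize_model(model, probe_word, preview=5):
    """Collect a few basic facts about a trained fastText model."""
    words = model.words                       # vocabulary as a list of strings
    vector = model.get_word_vector(probe_word)
    return {
        "vocab_size": len(words),
        "dimension": model.get_dimension(),   # size of each word vector
        "first_words": words[:preview],
        "probe_vector_length": len(vector),   # should equal "dimension"
    }
```

With a trained model in memory, `summarize_model(model, 'king')` returns a dictionary you can print to sanity-check the vocabulary size and vector dimension.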

### Saving and loading a model object

You can save the trained model with:

In [None]:
model.save_model("model_filename.bin")

Saved models are usually stored in the **.bin** file format.

You can load the model later, without training again, using:

In [None]:
model = fasttext.load_model("model_filename.bin")

### The Tutorial

https://fasttext.cc/docs/en/unsupervised-tutorial.html

### Task

Using the following **hyperparameters**, try to train the word representation/embedding model:

In [8]:
# embedding_size = 300
# window_size = 5
# min_word = 2
# max_word = 5
# down_sampling = 1e-2
# embedding_epoch=100
# learning_rate = 0.25
# # loss='softmax'
# loss='hs'
# model='skipgram'

In [9]:
# %%time
# model = fasttext.train_unsupervised()
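One possible way to pass the values above to `fasttext.train_unsupervised` is sketched below. The mapping of `min_word` / `max_word` to `minn` / `maxn` (character n-gram lengths) is an assumption, since the task does not say which parameter they refer to; if a word-frequency cutoff was intended, `minCount` would be the parameter to use. `train_embedding` is a hypothetical helper name.

```python
# Keyword arguments for fasttext.train_unsupervised, built from the task values.
# ASSUMPTION: min_word / max_word are read as minn / maxn (character n-gram
# lengths). If a frequency cutoff was meant, use minCount instead.
params = {
    "model": "skipgram",
    "dim": 300,     # embedding_size
    "ws": 5,        # window_size
    "minn": 2,      # min_word (assumed mapping)
    "maxn": 5,      # max_word (assumed mapping)
    "t": 1e-2,      # down_sampling threshold
    "epoch": 100,   # embedding_epoch
    "lr": 0.25,     # learning_rate
    "loss": "hs",   # hierarchical softmax
}


def train_embedding(corpus_path, **overrides):
    """Train with the settings above; overrides replace individual values."""
    import fasttext  # imported here so the settings can be inspected without it

    return fasttext.train_unsupervised(corpus_path, **{**params, **overrides})
```

For example, `model = train_embedding('dataset/task.txt')` trains with all task settings, and `train_embedding('dataset/task.txt', loss='softmax')` swaps only the loss function.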

In [10]:
# !head dataset/task.txt

Save the model and restart the notebook kernel, then load the model and view the similarities between words.
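A common way to measure similarity between two word vectors is cosine similarity. The sketch below uses plain Python and made-up 3-dimensional vectors in place of real embeddings; the function name `cosine_similarity` and the toy values are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Toy vectors standing in for real embeddings (values are made up).
king = [0.9, 0.1, 0.4]
queen = [0.8, 0.2, 0.5]
apple = [-0.3, 0.9, 0.0]

# Related words should score higher than unrelated ones.
print(cosine_similarity(king, queen) > cosine_similarity(king, apple))
```

With a loaded model, the same function can be applied to `model.get_word_vector('king')` and the vector of any other word. The model object also provides `model.get_nearest_neighbors('king')`, which returns the most similar words together with their similarity scores.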