## fastText tutorial

In [None]:
%%bash
set -x
# Create tutorial directory
mkdir -p /workspace/fasttext-tutorial
cd /workspace/fasttext-tutorial

# Download data
wget https://dl.fbaipublicfiles.com/fasttext/data/cooking.stackexchange.tar.gz
tar xvzf cooking.stackexchange.tar.gz

In [5]:
%%bash
cd /workspace/fasttext-tutorial
wc -l cooking.stackexchange.txt
head -n 5 cooking.stackexchange.txt

15404 cooking.stackexchange.txt
__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?
__label__food-safety __label__acidity Dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove How do I cover up the white spots on my cast iron stove?
__label__restaurant Michelin Three Star Restaurant; but if the chef is not there
__label__knife-skills __label__dicing Without knife skills, how can I quickly and accurately dice vegetables?
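
As the sample shows, each line holds one document: one or more labels, each prefixed with `__label__`, followed by the question text. A minimal Python sketch of how such a line could be split into labels and text (the `parse_line` helper is hypothetical and not part of fastText):

```python
LABEL_PREFIX = "__label__"

def parse_line(line):
    """Split one line of the dataset into (labels, text)."""
    labels, words = [], []
    for token in line.split():
        if token.startswith(LABEL_PREFIX):
            # Strip the prefix to keep only the label name.
            labels.append(token[len(LABEL_PREFIX):])
        else:
            words.append(token)
    return labels, " ".join(words)

labels, text = parse_line(
    "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?"
)
print(labels)
print(text)
```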


### Create test and training sets

In [6]:
%%bash
cd /workspace/fasttext-tutorial
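# Hold out the last 3000 lines for testing and train on the rest
# (a negative count for head -n requires GNU coreutils)
head -n -3000 cooking.stackexchange.txt > cooking.train
tail -n 3000 cooking.stackexchange.txt > cooking.test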

### Train classifier

Train a supervised classifier on the documents in `cooking.train`.
The `-output model_cooking` argument is a file prefix: the trained model is written to `model_cooking.bin`.

In [7]:
%%bash
cd /workspace/fasttext-tutorial
fasttext supervised -input cooking.train -output model_cooking

Read 0M words
Number of words:  14543
Number of labels: 735
Progress: 100.0% words/sec/thread:   25868 lr:  0.000000 avg.loss: 10.104228 ETA:   0h 0m 0s


### Ad hoc prediction

In [14]:
%%bash
cd /workspace/fasttext-tutorial
set -x 
echo "How do I make sourdough?" | fasttext predict model_cooking.bin -

+ echo 'How do I make sourdough?'
+ fasttext predict model_cooking.bin -


__label__baking


### Test model

Run predictions on data in `cooking.test`

In [15]:
%%bash
cd /workspace/fasttext-tutorial
fasttext test model_cooking.bin cooking.test

N	3000
P@1	0.14
R@1	0.0605


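- N = 3000 is the number of test documents the model was evaluated on.
- P@1 = 0.14 (precision at 1, i.e. 14%) is the fraction of documents for which the top predicted label is one of the correct labels.
- R@1 = 0.0605 (recall at 1) is the fraction of all correct labels that the top predictions recovered. It is lower than precision because a document can have multiple labels, but only one label per document is predicted.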
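
The definitions above can be made concrete with a small sketch. It assumes that correct predictions, predictions made, and true labels are each summed over all documents before dividing, which is consistent with the numbers reported here; the function name and the toy data are made up for illustration.

```python
def precision_recall_at_k(predictions, gold, k=1):
    """predictions: ranked label lists, one per document.
    gold: sets of true labels, one per document."""
    n_predicted = 0
    n_gold = 0
    n_correct = 0
    for ranked, truth in zip(predictions, gold):
        top_k = ranked[:k]
        n_predicted += len(top_k)
        n_gold += len(truth)
        n_correct += sum(1 for label in top_k if label in truth)
    return n_correct / n_predicted, n_correct / n_gold

# Toy data: 2 documents with 3 true labels in total.
predictions = [["baking", "bread"], ["sauce", "cheese"]]
gold = [{"baking", "bread"}, {"knife-skills"}]

p1, r1 = precision_recall_at_k(predictions, gold, k=1)
print(p1, r1)
```

In the toy data one of the two top predictions is correct, so precision is 1/2, while only one of the three true labels was recovered, so recall is 1/3.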


#### Predict more than one label

We can ask fastText to predict more than just the single most likely label for each document. Let’s try asking for the top 5:

In [16]:
%%bash
cd /workspace/fasttext-tutorial
fasttext test model_cooking.bin cooking.test 5

N	3000
P@5	0.0687
R@5	0.149


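Recall has improved, because five guesses per document recover more of the
true labels. Precision has gone down, because most of the five predicted
labels do not match any label in the test data: a question typically has only a few labels.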
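
The reported numbers can be cross-checked with a little arithmetic. The sketch below assumes precision is the number of correct predictions divided by `N * k`, and recall is the number of correct predictions divided by the total number of true labels; under that assumption, both runs should imply roughly the same label count.

```python
n_docs = 3000

# Values reported by the two test runs above.
p_at_1, r_at_1 = 0.14, 0.0605
p_at_5, r_at_5 = 0.0687, 0.149

# Assumption: precision = correct / (n_docs * k), recall = correct / total_true_labels.
correct_at_1 = p_at_1 * n_docs * 1
correct_at_5 = p_at_5 * n_docs * 5

# Both runs should imply about the same number of true labels in the test set.
labels_per_doc_k1 = correct_at_1 / r_at_1 / n_docs
labels_per_doc_k5 = correct_at_5 / r_at_5 / n_docs

print(round(labels_per_doc_k1, 1), round(labels_per_doc_k5, 1))
```

Both estimates come out at about 2.3 labels per document. With that few true labels, most of five predictions per document have to be wrong even for a perfect ranker, which is why P@5 is so much lower than P@1.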

### Tune analyzer

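Let's preprocess the input to:
- Convert each document to lowercase.
- Separate punctuation from the surrounding words with spaces.

This results in a different tokenization: for example, `recipe?` becomes the two tokens `recipe` and `?`, and `How` and `how` become the same token.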


In [18]:
%%bash
cd /workspace/fasttext-tutorial
set -x
cat cooking.stackexchange.txt | sed -e "s/\([.\!?,'/()]\)/ \1 /g" | tr "[:upper:]" "[:lower:]" > cooking.preprocessed.txt
head -n -3000 cooking.preprocessed.txt > cooking.pp.train
tail -n 3000 cooking.preprocessed.txt > cooking.pp.test
head -n 5 cooking.preprocessed.txt

+ cat cooking.stackexchange.txt
+ sed -e 's/\([.\!?,'\''/()]\)/ \1 /g'
+ tr '[:upper:]' '[:lower:]'
+ head -n -3000 cooking.preprocessed.txt
+ tail -n 3000 cooking.preprocessed.txt
+ head -n 5 cooking.preprocessed.txt


__label__sauce __label__cheese how much does potato starch affect a cheese sauce recipe ? 
__label__food-safety __label__acidity dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove how do i cover up the white spots on my cast iron stove ? 
__label__restaurant michelin three star restaurant; but if the chef is not there
__label__knife-skills __label__dicing without knife skills ,  how can i quickly and accurately dice vegetables ? 
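
For readers who prefer Python to `sed` and `tr`, the same normalization can be sketched as follows. This is a rough equivalent of the pipeline above, not a fastText feature; the `preprocess` function name is made up, and the character class simply mirrors the one in the `sed` expression.

```python
import re

# Same characters as the sed bracket expression: . \ ! ? , ' / ( )
PUNCTUATION = re.compile(r"([.\\!?,'/()])")

def preprocess(line):
    """Surround punctuation with spaces, then lowercase the whole line."""
    return PUNCTUATION.sub(r" \1 ", line).lower()

print(preprocess(
    "__label__knife-skills __label__dicing Without knife skills, "
    "how can I quickly and accurately dice vegetables?"
))
```

Note that the labels are lowercased as well, and that characters outside the class (such as `;` or `-`) are left attached to their words, exactly as in the shell version.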


In [19]:
%%bash
cd /workspace/fasttext-tutorial
set -x

fasttext supervised -input cooking.pp.train -output model_cooking.pp
fasttext test model_cooking.pp.bin cooking.pp.test

+ fasttext supervised -input cooking.pp.train -output model_cooking.pp
Read 0M words
Number of words:  8952
Number of labels: 735
Progress: 100.0% words/sec/thread:   27770 lr:  0.000000 avg.loss:  9.869978 ETA:   0h 0m 0s
+ fasttext test model_cooking.pp.bin cooking.pp.test


N	3000
P@1	0.165
R@1	0.0712


### Increase number of epochs

By default, fastText performs 5 epochs. Since we don’t have that much training data, let’s try increasing the number of epochs to 25.

In [20]:
%%bash
cd /workspace/fasttext-tutorial
set -x

fasttext supervised -input cooking.pp.train -output model_cooking.e25 -epoch 25
fasttext test model_cooking.e25.bin cooking.pp.test

+ fasttext supervised -input cooking.pp.train -output model_cooking.e25 -epoch 25
Read 0M words
Number of words:  8952
Number of labels: 735
Progress: 100.0% words/sec/thread:   27926 lr:  0.000000 avg.loss:  7.106458 ETA:   0h 0m 0s
+ fasttext test model_cooking.e25.bin cooking.pp.test


N	3000
P@1	0.517
R@1	0.223


This is a huge improvement: P@1 rose from 14% for the original model (16.5% after preprocessing) to 51.7%.

### Trying with 100 epochs

In [21]:
%%bash
cd /workspace/fasttext-tutorial
set -x

fasttext supervised -input cooking.pp.train -output model_cooking.e100 -epoch 100
fasttext test model_cooking.e100.bin cooking.pp.test

+ fasttext supervised -input cooking.pp.train -output model_cooking.e100 -epoch 100
Read 0M words
Number of words:  8952
Number of labels: 735
Progress: 100.0% words/sec/thread:   28450 lr:  0.000000 avg.loss:  3.251314 ETA:   0h 0m 0s
+ fasttext test model_cooking.e100.bin cooking.pp.test


N	3000
P@1	0.543
R@1	0.235


P@1 moved from 51.7% to 54.3%, but we’re clearly facing diminishing returns.
Increasing the number of epochs can only get you so far. You’re squeezing
as much signal as you can from the training data by repeatedly iterating
through it, but ultimately you run out of juice.

### Changing the learning rate

The learning rate, a real number that should be set between 0 and 1, determines how quickly the model adjusts to each example it encounters in the training data. A learning rate of 0 would mean that the model doesn’t learn at all, while a learning rate of 1 means that the model reacts strongly to each example. fastText sets the default learning rate to 0.1. Let’s raise it to 1.0, and also try 0.5 for comparison, keeping the 25 epochs in both runs.
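
The progress lines in the training output end with `lr: 0.000000` even though the default is 0.1. This is consistent with the learning rate being decayed over the course of training, so that `-lr` sets only the starting value. A minimal sketch, assuming a linear decay to zero (the function name is hypothetical, and the exact schedule is an assumption rather than something shown in this notebook):

```python
def decayed_lr(initial_lr, progress):
    """Learning rate after a fraction `progress` (0.0 to 1.0) of training,
    assuming it decays linearly from initial_lr to zero."""
    return initial_lr * (1.0 - progress)

# Compare the default starting rate with the raised one at three points in training.
for progress in (0.0, 0.5, 1.0):
    print(progress, decayed_lr(0.1, progress), decayed_lr(1.0, progress))
```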

In [22]:
%%bash
cd /workspace/fasttext-tutorial
set -x

fasttext supervised -input cooking.pp.train -output model_cooking.lr10 -lr 1.0 -epoch 25
fasttext test model_cooking.lr10.bin cooking.pp.test

fasttext supervised -input cooking.pp.train -output model_cooking.lr05 -lr 0.5 -epoch 25
fasttext test model_cooking.lr05.bin cooking.pp.test

+ fasttext supervised -input cooking.pp.train -output model_cooking.lr10 -lr 1.0 -epoch 25
Read 0M words
Number of words:  8952
Number of labels: 735
Progress: 100.0% words/sec/thread:   29137 lr:  0.000000 avg.loss:  4.510390 ETA:   0h 0m 0s
+ fasttext test model_cooking.lr10.bin cooking.pp.test


N	3000
P@1	0.589
R@1	0.255


+ fasttext supervised -input cooking.pp.train -output model_cooking.lr05 -lr 0.5 -epoch 25
Read 0M words
Number of words:  8952
Number of labels: 735
Progress: 100.0% words/sec/thread:   29123 lr:  0.000000 avg.loss:  3.606066 ETA:   0h 0m 0s
+ fasttext test model_cooking.lr05.bin cooking.pp.test


N	3000
P@1	0.572
R@1	0.248


With a learning rate of 1.0 we’re now at about 59% precision (P@1 = 0.589), compared with 57.2% for a learning rate of 0.5 and 51.7% for the default of 0.1 at the same 25 epochs. Not too shabby! You can experiment with different values of the learning rate and number of epochs to see what happens. Unfortunately, manually tuning parameters like these is a bit of a dark art, which is why it’s better to let a machine do it for you by performing automatic hyperparameter optimization.
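
As a step between manual tuning and full automation, the train and test commands for a small grid of settings can be generated rather than typed by hand. The sketch below only builds command strings from the flags already used in this tutorial; the grid values and the output naming scheme are arbitrary choices for illustration.

```python
import itertools
import shlex

learning_rates = [0.1, 0.5, 1.0]
epochs = [5, 25, 100]

commands = []
for lr, epoch in itertools.product(learning_rates, epochs):
    # Hypothetical naming scheme, e.g. model_cooking.lr0.5.e25
    output = f"model_cooking.lr{lr}.e{epoch}"
    train = ["fasttext", "supervised", "-input", "cooking.pp.train",
             "-output", output, "-lr", str(lr), "-epoch", str(epoch)]
    test = ["fasttext", "test", f"{output}.bin", "cooking.pp.test"]
    commands.append((shlex.join(train), shlex.join(test)))

for train_cmd, test_cmd in commands:
    print(train_cmd)
    print(test_cmd)
```

If settings are chosen this way, it would be safer to compare them on a separate validation split rather than on `cooking.pp.test`, so that the final test score is not biased by the selection.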

### Using bigrams

While fastText normally learns from individual tokens, it can also pay attention to n-grams, that is, sequences of two or more tokens. Bigrams are especially useful for content understanding, since a two-word phrase (e.g., “machine learning”) often denotes a single concept that is not just the sum of the two words. The `-wordNgrams 2` option used below adds word bigrams as features.
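
To make the idea concrete, here is a small sketch that lists the word bigrams in one preprocessed question. It only illustrates what a bigram is; how fastText stores and uses n-grams internally is not covered here, and the `word_ngrams` helper is made up for this example.

```python
def word_ngrams(tokens, n=2):
    """Return all contiguous n-token sequences from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "how do i cover up the white spots on my cast iron stove ?".split()
print(word_ngrams(tokens, 2))
```

The resulting list includes `cast iron`, a phrase whose meaning in a cooking context is lost if `cast` and `iron` are only seen as separate tokens.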

In [23]:
%%bash
cd /workspace/fasttext-tutorial
set -x

fasttext supervised -input cooking.pp.train -output model_cooking.bigrams -lr 1.0 -epoch 25 -wordNgrams 2
fasttext test model_cooking.bigrams.bin cooking.pp.test

+ fasttext supervised -input cooking.pp.train -output model_cooking.bigrams -lr 1.0 -epoch 25 -wordNgrams 2
Read 0M words
Number of words:  8952
Number of labels: 735
Progress: 100.0% words/sec/thread:   29131 lr:  0.000000 avg.loss:  3.151454 ETA:   0h 0m 0s
+ fasttext test model_cooking.bigrams.bin cooking.pp.test


N	3000
P@1	0.613
R@1	0.265
