This follows the tutorial at https://fasttext.cc/docs/en/supervised-tutorial.html.

We build a classifier that automatically assigns StackExchange questions about cooking to one or more of several possible tags, such as pot, bowl or baking.

In [1]:
import fasttext

In [2]:
help(fasttext.FastText)

Help on module fasttext.FastText in fasttext:

NAME
    fasttext.FastText

DESCRIPTION
    # Copyright (c) 2017-present, Facebook, Inc.
    # All rights reserved.
    #
    # This source code is licensed under the MIT license found in the
    # LICENSE file in the root directory of this source tree.

FUNCTIONS
    cbow(*kargs, **kwargs)
    
    eprint(*args, **kwargs)
    
    load_model(path)
        Load a model given a filepath and return a model object.
    
    read_args(arg_list, arg_dict, arg_names, default_values)
    
    skipgram(*kargs, **kwargs)
    
    supervised(*kargs, **kwargs)
    
    tokenize(text)
        Given a string of text, tokenize it and return a list of tokens
    
    train_supervised(*kargs, **kwargs)
        Train a supervised model and return a model object.
        
        input must be a filepath. The input text does not need to be tokenized
        as per the tokenize function, but it must be preprocessed and encoded
        as UTF-8. You might wan

# Getting and preparing the data

In [4]:
# Check out the data
!head data/cooking.stackexchange.txt

__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?
__label__food-safety __label__acidity Dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove How do I cover up the white spots on my cast iron stove?
__label__restaurant Michelin Three Star Restaurant; but if the chef is not there
__label__knife-skills __label__dicing Without knife skills, how can I quickly and accurately dice vegetables?
__label__storage-method __label__equipment __label__bread What's the purpose of a bread box?
__label__baking __label__food-safety __label__substitutions __label__peanuts how to seperate peanut oil from roasted peanuts at home?
__label__chocolate American equivalent for British chocolate terms
__label__baking __label__oven __label__convection Fan bake vs bake
__label__sauce __label__storage-lifetime __label__acidity __label__mayonnaise Regulation and balancing of readymade packed mayonnaise and other sauces

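Each line holds one or more tags written with the `__label__` prefix, followed by the question text. The minimal sketch below is plain Python and does not need fastText. Its hypothetical helper `parse_line` splits such a line into its labels and its text. It assumes the labels come first, which is how this file is laid out.

```python
# Hypothetical helper: split one line of the supervised training format
# into (labels, text). Assumes the default "__label__" prefix and that all
# labels precede the question text, as in cooking.stackexchange.txt.
LABEL_PREFIX = "__label__"


def parse_line(line):
    tokens = line.split()
    labels = []
    i = 0
    # Collect the leading tokens that carry the label prefix.
    while i < len(tokens) and tokens[i].startswith(LABEL_PREFIX):
        labels.append(tokens[i][len(LABEL_PREFIX):])
        i += 1
    text = " ".join(tokens[i:])
    return labels, text


labels, text = parse_line(
    "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?"
)
print(labels)  # ['sauce', 'cheese']
print(text)    # How much does potato starch affect a cheese sauce recipe?
```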

In [5]:
!wc data/cooking.stackexchange.txt

   15404  169582 1401900 data/cooking.stackexchange.txt


In [7]:
# Split into train and validation set
!head -n 12404 data/cooking.stackexchange.txt > data/cooking.train
!tail -n 3000 data/cooking.stackexchange.txt > data/cooking.valid
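The `head`/`tail` split keeps the file order. The first 12404 lines become the training set and the last 3000 become the validation set. Together they cover all 15404 lines reported by `wc`. Below is a rough pure-Python equivalent using the hypothetical helpers `split_lines` and `split_file`. Like the shell version, it does not shuffle.

```python
def split_lines(lines, n_train, n_valid):
    """First n_train lines for training, last n_valid lines for validation."""
    if n_train + n_valid > len(lines):
        raise ValueError("training and validation sets would overlap")
    train = lines[:n_train]
    valid = lines[len(lines) - n_valid:]  # also correct when n_valid == 0
    return train, valid


def split_file(src, train_path, valid_path, n_train=12404, n_valid=3000):
    """File-based wrapper mirroring `head -n 12404` and `tail -n 3000`."""
    with open(src, encoding="utf-8") as f:
        lines = f.readlines()
    train, valid = split_lines(lines, n_train, n_valid)
    with open(train_path, "w", encoding="utf-8") as f:
        f.writelines(train)
    with open(valid_path, "w", encoding="utf-8") as f:
        f.writelines(valid)


# Toy check: 10 lines split 7 / 3.
train, valid = split_lines([f"line {i}\n" for i in range(10)], 7, 3)
print(len(train), len(valid))  # 7 3
```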

# Our first classifier

In [9]:
# Build the classifier
model = fasttext.train_supervised(input="data/cooking.train")
model

<fasttext.FastText._FastText at 0x118f6d128>

In [10]:
# Save the model to a file; it can be loaded again later with fasttext.load_model.
model.save_model("model_cooking.bin")

In [11]:
# Test the classifier
model.predict("Which baking dish is best to bake a banana bread ?")

(('__label__baking',), array([0.06229293]))

In [12]:
# Test the classifier
model.predict("Why not put knives in the dishwasher?")

(('__label__food-safety',), array([0.06416821]))

In [14]:
# Test on validation data
model.test("data/cooking.valid")

(3000, 0.15, 0.06486954014703762)

The outputs are the number of samples (here 3000), the precision at one (0.15) and the recall at one (0.0649). 'At one' means that only one label is generated for each prediction. Below, k=5 means that five labels are generated for each prediction.
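The two numbers returned by `test` can be reproduced by hand. The sketch below assumes micro-averaging over the whole file. Precision is the number of correct labels divided by all predicted labels, and recall is the number of correct labels divided by all true labels. This is our understanding of how fastText aggregates. `precision_recall_at_k` and the toy labels are hypothetical.

```python
def precision_recall_at_k(gold, predicted, k):
    """Micro-averaged precision@k and recall@k over a whole dataset.

    gold: one set of true labels per example.
    predicted: one list of labels per example, ranked by probability.
    """
    n_correct = n_predicted = n_gold = 0
    for true_labels, ranked in zip(gold, predicted):
        top_k = ranked[:k]
        n_correct += sum(1 for label in top_k if label in true_labels)
        n_predicted += len(top_k)
        n_gold += len(true_labels)
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    return precision, recall


# Two toy questions with hypothetical true and predicted labels.
gold = [{"baking", "bread"}, {"knives", "equipment"}]
predicted = [
    ["baking", "equipment", "bread"],
    ["food-safety", "equipment", "baking"],
]
print(precision_recall_at_k(gold, predicted, k=1))  # (0.5, 0.25)
print(precision_recall_at_k(gold, predicted, k=3))  # (0.5, 0.75)
```

Raising k only adds labels to the end of each ranked list, so recall can only stay the same or grow. Precision usually falls, as in the `k=5` evaluation in this notebook (precision 0.0664, recall 0.1436).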

In [15]:
# Compute the precision at five and recall at five 
model.test("data/cooking.valid", k=5)

(3000, 0.0664, 0.14357791552544327)

In [16]:
model.predict("Why not put knives in the dishwasher?", k=5)

(('__label__food-safety',
  '__label__baking',
  '__label__bread',
  '__label__equipment',
  '__label__substitutions'),
 array([0.06416821, 0.06336509, 0.03612485, 0.03470869, 0.03225474]))

# Making the model better

## pre-processing the data

In [18]:
!cat data/cooking.stackexchange.txt | sed -e "s/\([.\!?,'/()]\)/ \1 /g" | tr "[:upper:]" "[:lower:]" > data/cooking.preprocessed.txt

In [19]:
!head -n 12404 data/cooking.preprocessed.txt > data/cooking.train
!tail -n 3000 data/cooking.preprocessed.txt > data/cooking.valid

In [21]:
model = fasttext.train_supervised(input="data/cooking.train")
model.test("data/cooking.valid")

(3000, 0.17266666666666666, 0.07467204843592332)

We observe that, thanks to the pre-processing, the vocabulary is smaller (from 14k words to 9k). Precision at one also goes up, from 0.150 to 0.173.
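The `sed` command pads the punctuation characters `. ! ? , ' / ( )` with spaces, and `tr` lowercases everything. As a result, `Bread,` and `bread` become the same word. The rough Python equivalent below uses the hypothetical helpers `preprocess` and `vocabulary`. One difference: Python's `lower()` also handles non-ASCII letters, which `tr` may not, depending on the locale. On three toy lines, it shows the vocabulary shrinking.

```python
import re

# Punctuation that the sed command surrounds with spaces.
PUNCT = re.compile(r"([.!?,'/()])")


def preprocess(line):
    """Pad punctuation with spaces and lowercase, roughly like `sed | tr`."""
    return PUNCT.sub(r" \1 ", line).lower()


def vocabulary(lines, label_prefix="__label__"):
    """Distinct whitespace-separated words, ignoring label tokens."""
    return {
        tok
        for line in lines
        for tok in line.split()
        if not tok.startswith(label_prefix)
    }


raw = [
    "__label__baking Bake bread?",
    "__label__baking How long to bake bread",
    "__label__bread Bread, bake, or toast?",
]
print(len(vocabulary(raw)))                                # 11
print(len(vocabulary(preprocess(line) for line in raw)))   # 9
```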

## more epochs and larger learning rate

In [22]:
model = fasttext.train_supervised(input="data/cooking.train", epoch=25)

In [24]:
model.test("data/cooking.valid")

(3000, 0.52, 0.22488107250973044)

This is much better! Raising `epoch` from its default of 5 to 25 means the model sees each training example 25 times instead of 5. Another way to change the learning speed of our model is to increase (or decrease) the learning rate of the algorithm. This corresponds to how much the model changes after processing each example. A learning rate of 0 would mean that the model does not change at all and thus does not learn anything. Good values of the learning rate are in the range 0.1 to 1.0.
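Epochs and learning rate interact as in plain stochastic gradient descent. Each epoch is one pass over the training data, and each example moves the weights by the learning rate times the gradient. The toy below is a one-feature logistic regression, not fastText's model. fastText also decays its learning rate linearly towards 0 during training, which this sketch leaves out. `train` is a hypothetical helper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, lr, epochs):
    """Plain SGD on a one-feature logistic regression (toy, not fastText's code)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):          # one epoch = one pass over every example
        for x, y in examples:
            p = sigmoid(w * x + b)
            grad = p - y             # derivative of the log loss w.r.t. w*x + b
            w -= lr * grad * x       # step size scaled by the learning rate
            b -= lr * grad
    return w, b


data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
print(train(data, lr=0.0, epochs=25))  # (0.0, 0.0): lr = 0 never moves the weights
print(train(data, lr=0.1, epochs=5))   # small steps, few passes: weights barely grow
print(train(data, lr=1.0, epochs=25))  # larger steps, more passes: a confident model
```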

In [25]:
model = fasttext.train_supervised(input="data/cooking.train", lr=1.0)
model.test("data/cooking.valid")

(3000, 0.5686666666666667, 0.24592763442410265)

In [26]:
model = fasttext.train_supervised(input="data/cooking.train", lr=1.0, epoch=25)
model.test("data/cooking.valid")

(3000, 0.5786666666666667, 0.25025227043390513)

## word n-grams

In [27]:
model = fasttext.train_supervised(input="data/cooking.train", lr=1.0, epoch=25, wordNgrams=2)
model.test("data/cooking.valid")

(3000, 0.6036666666666667, 0.2610638604584114)
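With `wordNgrams=2`, each question is represented by its words and also by its pairs of consecutive words. A bigram such as "not put" carries information that "not" and "put" alone do not. fastText does not keep a dictionary of n-grams. Instead, it hashes them into a fixed number of buckets (the `bucket` parameter). The sketch below extracts n-grams and hashes them. `zlib.crc32` is only a stand-in for fastText's own hash function, and the helper names are hypothetical.

```python
import zlib


def word_ngrams(tokens, n):
    """All contiguous word n-grams of length 2..n, joined with a space."""
    grams = []
    for size in range(2, n + 1):
        for i in range(len(tokens) - size + 1):
            grams.append(" ".join(tokens[i:i + size]))
    return grams


def bucket_ids(ngrams, bucket):
    """Map each n-gram to one of `bucket` slots (crc32 is a stand-in hash)."""
    return [zlib.crc32(g.encode("utf-8")) % bucket for g in ngrams]


tokens = "why not put knives in the dishwasher ?".split()
bigrams = word_ngrams(tokens, 2)
print(bigrams[:3])                       # ['why not', 'not put', 'put knives']
print(bucket_ids(bigrams[:3], 200000))   # three slot indices in [0, 200000)
```

Different n-grams can land in the same bucket. Fewer buckets means more collisions but a smaller model.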

# Scaling things up

In [30]:
model = fasttext.train_supervised(input="data/cooking.train", lr=1.0, epoch=25, wordNgrams=2, bucket=200000, dim=50, loss='hs')
model.test("data/cooking.valid")

(3000, 0.5863333333333334, 0.2535678247080871)
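These settings aim at faster training and, in this run, cost a little precision (0.586 against 0.604 above). `dim=50` halves the default embedding size of 100. `bucket=200000` shrinks the n-gram hash table from its default of 2,000,000 slots. `loss='hs'` switches to hierarchical softmax. Instead of scoring every label, hierarchical softmax places the labels at the leaves of a Huffman tree built from label frequencies. Computing one label's probability then takes one binary decision per level, and frequent labels sit close to the root. The sketch below computes those depths for made-up label counts. `huffman_depths` is hypothetical, not fastText's code.

```python
import heapq
import itertools


def huffman_depths(counts):
    """Depth of each label in a Huffman tree built from label counts.

    The depth is the number of binary decisions hierarchical softmax makes
    on the path from the root to that label.
    """
    tie = itertools.count()  # tie-breaker so heapq never compares the dicts
    heap = [(c, next(tie), {label: 0}) for label, c in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; their labels move one level down.
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {label: depth + 1 for label, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next(tie), merged))
    return heap[0][2]


# Made-up label counts for illustration.
counts = {"baking": 8, "bread": 4, "equipment": 2, "bananas": 1, "knives": 1}
depths = huffman_depths(counts)
print(depths)  # {'baking': 1, 'bread': 2, 'equipment': 3, 'bananas': 4, 'knives': 4}

# Average number of binary decisions per training example, weighted by label
# frequency, compared with scoring all 5 labels in a full softmax.
avg_depth = sum(counts[label] * d for label, d in depths.items()) / sum(counts.values())
print(avg_depth)  # 1.875
```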

# Multi-label classification

When we want to assign a document to multiple labels, we can still use the softmax loss and tune the prediction parameters, namely the number of labels to predict and the threshold on the predicted probability. However, tuning these arguments can be tricky and unintuitive, because the softmax probabilities must sum to 1.

A convenient way to handle multiple labels is to use an independent binary classifier for each label. On the command line this is done with `-loss one-vs-all` or `-loss ova`; in the Python API, the equivalent is `loss='ova'`.
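The difference between the two losses shows up when the same raw scores are turned into probabilities. With softmax, the labels compete for a total mass of 1, so at most one label can exceed 0.5. With one-vs-all, each label gets its own sigmoid. The scores and labels below are made up for illustration.

```python
import math


def softmax(scores):
    """Probabilities that compete: they always sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def independent_sigmoids(scores):
    """One-vs-all: each label gets its own probability in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


labels = ["baking", "bread", "equipment", "knives"]
scores = [3.0, 2.5, 2.0, -1.0]  # hypothetical raw scores for one question


def above_half(probs):
    """Labels whose probability is at least 0.5."""
    return [label for label, p in zip(labels, probs) if p >= 0.5]


print(above_half(softmax(scores)))               # ['baking']
print(above_half(independent_sigmoids(scores)))  # ['baking', 'bread', 'equipment']
```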

In [35]:
model = fasttext.train_supervised(input="data/cooking.train", lr=0.5, epoch=25, wordNgrams=2, bucket=200000, dim=50, loss='ova')
model.test("data/cooking.valid")

(3000, 0.609, 0.26337033299697277)

Now let's look at the predictions. We want as many predictions as possible (k=-1) and only labels with a probability greater than or equal to 0.5:

In [36]:
model.predict("Which baking dish is best to bake a banana bread ?", k=-1, threshold=0.5)

(('__label__baking',
  '__label__bread',
  '__label__equipment',
  '__label__bananas'),
 array([1.00001001, 0.99194801, 0.95397609, 0.86704576]))
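The selection rule behind `k=-1, threshold=0.5` can be sketched as follows. Rank the labels by probability, drop those below the threshold, and keep at most `k` of the rest, with `k=-1` meaning no limit. This is our reading of the parameters. It is applied to a hypothetical dictionary of probabilities rather than a real model, and `select_labels` is a hypothetical helper.

```python
def select_labels(probs, k=1, threshold=0.0):
    """Rank labels by probability, keep those >= threshold; k=-1 means no limit.

    probs: dict mapping label -> probability (hypothetical model output).
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = [(label, p) for label, p in ranked if p >= threshold]
    return kept if k == -1 else kept[:k]


probs = {"baking": 0.97, "bread": 0.91, "bananas": 0.62, "knives": 0.08, "sauce": 0.03}
print(select_labels(probs, k=-1, threshold=0.5))
# [('baking', 0.97), ('bread', 0.91), ('bananas', 0.62)]
print(select_labels(probs, k=2))
# [('baking', 0.97), ('bread', 0.91)]
```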

In [39]:
model.test("data/cooking.valid", k=-1)

(3000, 0.003146031746031746, 1.0)
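With `k=-1` and no threshold, `test` returns every label for every question. Every true label is therefore retrieved (recall 1.0), but nearly all of the returned labels are wrong. Precision collapses to the total number of true labels divided by (number of questions × number of labels). The toy below checks that arithmetic on a hypothetical five-label set; `precision_recall_all_labels` is a hypothetical helper.

```python
def precision_recall_all_labels(gold, all_labels):
    """Precision and recall when every label is predicted for every example."""
    n_predicted = len(gold) * len(all_labels)
    n_gold = sum(len(g) for g in gold)
    n_correct = sum(len(g & all_labels) for g in gold)
    return n_correct / n_predicted, n_correct / n_gold


all_labels = {"baking", "bread", "equipment", "knives", "sauce"}  # hypothetical label set
gold = [{"baking", "bread"}, {"knives"}, {"sauce", "equipment", "baking"}]
print(precision_recall_all_labels(gold, all_labels))  # (0.4, 1.0)
```

With many more labels, as in the cooking data, the same arithmetic drives precision towards 0. This is why a threshold is needed for `k=-1` to be informative.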