## Introduction
Flair NLP consists of Sentences and Tokens to define a statement. 

In [1]:
from flair.data import Sentence
text = "I would like to look for a flight from Miami to Dubai on the 1st of September."

In [2]:
sent = Sentence(text)
print(sent)
print(sent.get_token(4)) 
print(sent[3])
for token in sent:
    print(f'The token is {token}')

Sentence: "I would like to look for a flight from Miami to Dubai on the 1st of September ."   [− Tokens: 18]
Token: 4 to
Token: 4 to
The token is Token: 1 I
The token is Token: 2 would
The token is Token: 3 like
The token is Token: 4 to
The token is Token: 5 look
The token is Token: 6 for
The token is Token: 7 a
The token is Token: 8 flight
The token is Token: 9 from
The token is Token: 10 Miami
The token is Token: 11 to
The token is Token: 12 Dubai
The token is Token: 13 on
The token is Token: 14 the
The token is Token: 15 1st
The token is Token: 16 of
The token is Token: 17 September
The token is Token: 18 .


The token is nothing but the individual words composing  a sentence. Now here we can see another application if you dont want to tokenise the sentence.

In [3]:
sent= Sentence(text,use_tokenizer=False)
print(sent)

Sentence: "I would like to look for a flight from Miami to Dubai on the 1st of September."   [− Tokens: 17]


## Tokenizer

Using external tokenizers such as Spacy and Japanese or writing your own tokenizer

In [4]:
from flair.tokenization import SpacyTokenizer

# init japanese tokenizer
tokenizer = SpacyTokenizer(model='en_core_web_lg')

# make sentence (and tokenize)
sentence = Sentence(text, use_tokenizer=tokenizer)

# output tokenized sentence
print(sentence)

Sentence: "I would like to look for a flight from Miami to Dubai on the 1st of September ."   [− Tokens: 18]


Here we could use the Spacy Tokenizer or we can write our own tokenizer by extending flair.data.Tokenizer. Let us see an example.

In [5]:
text = "My QFFnumber is 8972927292"
from typing import List, Callable, Optional
from flair.data import Sentence, Token, Tokenizer
class MyOwnTokenizer(Tokenizer):
    def tokenize(self, text: str) -> List[Token]:
        text = text.replace("QFF","QFF ")
        words = text.split(" ")
        previous_token = None
        index=1;
        tokens: List[Token] = []
        for word in words:
            if len(word.strip()) == 0:
                continue
            token = Token(
                text=word, start_position=index, whitespace_after=True
            )
            tokens.append(token)
            if (previous_token is not None) and (
                    token.start_pos == previous_token.start_pos + len(previous_token.text)):
                previous_token.whitespace_after = False

            previous_token = token
            index += index+1
        return tokens

    
tokenizer = MyOwnTokenizer()

# make sentence (and tokenize)
sentence = Sentence(text, use_tokenizer=tokenizer)

# output tokenized sentence
print(sentence)

Sentence: "My QFF number is 8972927292"   [− Tokens: 5]


## Labeling Tokens and Sentences
We assign tags to tokens and labels to sentences
Now we are going to add a ner label to a word in the sentence. 

In [6]:
sentence = Sentence(text, use_tokenizer=tokenizer)
sentence[4].add_tag('ner','FFN')
# get the 'ner' tag of the token
tag = sentence[4].get_tag('ner')
# print token
print(f'"{sentence[4]}" is tagged as "{tag.value}" with confidence score "{tag.score}"')

"Token: 5 8972927292" is tagged as "FFN" with confidence score "1.0"


Here we can tag a sentence to a single or multiple labels and these can be of a single type or can be more as well. Here the sentence labels also have a confidence score.

In [7]:
sentence.add_label('topic', 'airline')
sentence.add_label('topic', 'loyalty')
sentence.add_label('language', 'English')
print(sentence)

Sentence: "My QFF number is 8972927292"   [− Tokens: 5  − Token-Labels: "My QFF number is 8972927292 <FFN>"  − Sentence-Labels: {'topic': [airline (1.0), loyalty (1.0)], 'language': [English (1.0)]}]


Again very similar to a tag, here the label also have value and score.

In [8]:
print(sentence.to_plain_string())
for label in sentence.labels:
    print(f' - classified as "{label.value}" with score {label.score}')

MyQFF number is 8972927292
 - classified as "airline" with score 1.0
 - classified as "loyalty" with score 1.0
 - classified as "English" with score 1.0


In [9]:
for label in sentence.get_labels('topic'):
    print(label)

airline (1.0)
loyalty (1.0)


## Working with pretrained models

In [10]:
from flair.models import SequenceTagger

tagger = SequenceTagger.load('ner')

2021-11-19 04:03:27,030 --------------------------------------------------------------------------------
2021-11-19 04:03:27,030 The model key 'ner' now maps to 'https://huggingface.co/flair/ner-english' on the HuggingFace ModelHub
2021-11-19 04:03:27,030  - The most current version of the model is automatically downloaded from there.
2021-11-19 04:03:27,030  - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/ner/en-ner-conll03-v0.4.pt)
2021-11-19 04:03:27,030 --------------------------------------------------------------------------------
2021-11-19 04:03:28,475 loading file C:\Users\A-7651\.flair\models\ner-english\4f4cdab26f24cb98b732b389e6cebc646c36f54cfd6e0b7d3b90b25656e4262f.8baa8ae8795f4df80b28e7f7b61d788ecbb057d1dc85aacb316f1bd02837a4a4


In [11]:
text = "I would like to look for a flight from Miami to Dubai on the 1st of September."
sentence = Sentence(text, use_tokenizer=tokenizer)
tagger.predict(sentence)
print(sentence.to_tagged_string())

I would like to look for a flight from Miami <S-LOC> to Dubai <S-LOC> on the 1st of September.


Now we will try the same with another preloaded model from the Hugging Face Model Hub

In [12]:
tagger = SequenceTagger.load('flair/ner-english-large')
tagger.predict(sentence)
print(sentence.to_tagged_string())

2021-11-19 04:03:32,832 loading file C:\Users\A-7651\.flair\models\ner-english-large\07301f59bb8cb113803be316267f06ddf9243cdbba92a4c8067ef92442d2c574.554244d3476d97501a766a98078421817b14654496b86f2f7bd139dc502a4f29
I would like to look for a flight from Miami <S-LOC> to Dubai <S-LOC> on the 1st of September.


In [13]:
#If the same entity spans across multiple tokens
for entity in sentence.get_spans('ner'):
    print(entity)
print(sentence.to_dict(tag_type='ner'))

Span [10]: "Miami"   [− Labels: LOC (1.0)]
Span [12]: "Dubai"   [− Labels: LOC (1.0)]
{'text': ' I wouldlike   to              look                            for                                                             a                                                                                                                               flight                                                                                                                                                                                                                                                          from                                                                                                                                                                                                                                                                                                                                                                                                     

## Getting tags associated with a sentence

Getting the spans in the sentence and their corresponding entities.

In [14]:
for entity in sentence.get_spans('ner'):
    print(entity)

Span [10]: "Miami"   [− Labels: LOC (1.0)]
Span [12]: "Dubai"   [− Labels: LOC (1.0)]


In [15]:
#TO load  the tagger and ner at the same time.
from flair.models import MultiTagger
tagger = MultiTagger.load(['pos','ner'])
tagger.predict(sentence)
print(sentence)

2021-11-19 04:03:58,639 --------------------------------------------------------------------------------
2021-11-19 04:03:58,640 The model key 'pos' now maps to 'https://huggingface.co/flair/pos-english' on the HuggingFace ModelHub
2021-11-19 04:03:58,642  - The most current version of the model is automatically downloaded from there.
2021-11-19 04:03:58,643  - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/pos/en-pos-ontonotes-v0.5.pt)
2021-11-19 04:03:58,644 --------------------------------------------------------------------------------
2021-11-19 04:04:00,015 loading file C:\Users\A-7651\.flair\models\pos-english\a9a73f6cd878edce8a0fa518db76f441f1cc49c2525b2b4557af278ec2f0659e.121306ea62993d04cd1978398b68396931a39eb47754c8a06a87f325ea70ac63
2021-11-19 04:04:00,631 --------------------------------------------------------------------------------
2021-11-19 04:04:00,631 The model key 'ner' now maps to 'https://huggin

In [16]:
#Tagging a group of sentences
from flair.models import SequenceTagger
from flair.tokenization import SegtokSentenceSplitter

# example text with many sentences
text = "I took the Qantas flight from Dubai. I lost my baggage. My name is Darwin. Can you please help?"

# initialize sentence splitter
splitter = SegtokSentenceSplitter()

# use splitter to split text into list of sentences
sentences = splitter.split(text)

# predict tags for sentences
tagger = SequenceTagger.load('ner')
tagger.predict(sentences,mini_batch_size=10)

# iterate through sentences and print predicted labels
for sentence in sentences:
    print(sentence.to_tagged_string())

2021-11-19 04:04:06,061 --------------------------------------------------------------------------------
2021-11-19 04:04:06,062 The model key 'ner' now maps to 'https://huggingface.co/flair/ner-english' on the HuggingFace ModelHub
2021-11-19 04:04:06,063  - The most current version of the model is automatically downloaded from there.
2021-11-19 04:04:06,064  - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/ner/en-ner-conll03-v0.4.pt)
2021-11-19 04:04:06,064 --------------------------------------------------------------------------------
2021-11-19 04:04:07,433 loading file C:\Users\A-7651\.flair\models\ner-english\4f4cdab26f24cb98b732b389e6cebc646c36f54cfd6e0b7d3b90b25656e4262f.8baa8ae8795f4df80b28e7f7b61d788ecbb057d1dc85aacb316f1bd02837a4a4
I took the Qantas <S-ORG> flight from Dubai <S-LOC> .
I lost my baggage .
My name is Darwin <S-PER> .
Can you please help ?


## Text Classification with pretrained models

Flair provides pretrained models for sentiment and communicative functions. 

In [17]:
from flair.models import TextClassifier

# load tagger
classifier = TextClassifier.load('sentiment')



2021-11-19 04:04:09,773 loading file C:\Users\A-7651\.flair\models\sentiment-en-mix-distillbert_4.pt


In [18]:
# make example sentence
sentence = Sentence("Staff were very friendly but the rooms were not clean")

# call predict
classifier.predict(sentence)

# check prediction
print(sentence)

sentence = Sentence("I thought you would take care of it. Now I will have to do the thing over.")

# call predict
classifier.predict(sentence)

# check prediction
print(sentence)

sentence = Sentence("Can i speak to your manager?")

# call predict
classifier.predict(sentence)

# check prediction
print(sentence)

Sentence: "Staff were very friendly but the rooms were not clean"   [− Tokens: 10  − Sentence-Labels: {'label': [NEGATIVE (0.9976)]}]
Sentence: "I thought you would take care of it . Now I will have to do the thing over ."   [− Tokens: 19  − Sentence-Labels: {'label': [NEGATIVE (0.9995)]}]
Sentence: "Can i speak to your manager ?"   [− Tokens: 7  − Sentence-Labels: {'label': [NEGATIVE (0.7642)]}]


## Zero Shot Classification

Flair gives a zero shot model for classification. Below will give you an example.

In [19]:
from flair.models import TARSClassifier
from flair.data import Sentence

# 1. Load our pre-trained TARS model for English
tars = TARSClassifier.load('tars-base')

# 2. Prepare a test sentence
sentence = Sentence("I am so glad you liked it!")

# 3. Define some classes that you want to predict using descriptive names
classes = ["happy", "sad"]

#4. Predict for these classes
tars.predict_zero_shot(sentence, classes)

# Print sentence with predicted labels
print(sentence)

2021-11-19 04:04:21,436 loading file C:\Users\A-7651\.flair\models\tars-base-v8.pt
{'TREC_6': {'label_dictionary': <flair.data.Dictionary object at 0x0000021395F80190>, 'multi_label': False, 'multi_label_threshold': 0.5, 'label_type': None, 'beta': 1.0}, 'DBPedia': {'label_dictionary': <flair.data.Dictionary object at 0x0000021395F80460>, 'multi_label': False, 'multi_label_threshold': 0.5, 'label_type': None, 'beta': 1.0}, 'AGNews': {'label_dictionary': <flair.data.Dictionary object at 0x00000213889ACF40>, 'multi_label': False, 'multi_label_threshold': 0.5, 'label_type': None, 'beta': 1.0}, 'IMDB': {'label_dictionary': <flair.data.Dictionary object at 0x000002138892D0A0>, 'multi_label': False, 'multi_label_threshold': 0.5, 'label_type': None, 'beta': 1.0}, 'SST': {'label_dictionary': <flair.data.Dictionary object at 0x000002138892D160>, 'multi_label': False, 'multi_label_threshold': 0.5, 'label_type': None, 'beta': 1.0}, 'GO_EMOTIONS': {'label_dictionary': <flair.data.Dictionary object

In [20]:
text=" I can not log into my account, i have forgotten the pin and when i try to use the retrieval system, the text message"
classes = ["profile update","new booking","modify booking"]
sentences = splitter.split(text)


classes = ["forgotten pin","new booking","modify booking"]

tars.predict_zero_shot(sentences, classes)
for sentence in sentences:
    print(sentence)

Sentence: "I can not log into my account , i have forgotten the pin and when i try to use the retrieval system , the text message"   [− Tokens: 26]


In [21]:
sentence=Sentence("I have forgotten my pin and would like to reset it.")
classes = ["forgotten pin","Booking", "Change Book","Covid"]
tars.predict_zero_shot(sentence, classes)
print(sentence)

sentence=Sentence("I want to book a flight from Paris to Dubai")
tars.predict_zero_shot(sentence, classes)
print(sentence)

sentence=Sentence("I want to modify my existing booking")
tars.predict_zero_shot(sentence, classes)
print(sentence)

sentence=Sentence("Due to Covid, will there be any additional checks or restrictions in the airport")
tars.predict_zero_shot(sentence, classes)
print(sentence)

Sentence: "I have forgotten my pin and would like to reset it ."   [− Tokens: 12  − Sentence-Labels: {'Booking-Change Book-forgotten pin-Covid': [Change Book (0.6227), forgotten pin (0.9139)]}]
Sentence: "I want to book a flight from Paris to Dubai"   [− Tokens: 10  − Sentence-Labels: {'Booking-Change Book-forgotten pin-Covid': [Booking (0.8506)]}]
Sentence: "I want to modify my existing booking"   [− Tokens: 7  − Sentence-Labels: {'Booking-Change Book-forgotten pin-Covid': [Booking (0.9189), Change Book (0.5405)]}]
Sentence: "Due to Covid , will there be any additional checks or restrictions in the airport"   [− Tokens: 15  − Sentence-Labels: {'Booking-Change Book-forgotten pin-Covid': [Booking (0.9777), Covid (0.8474)]}]


## Entity Disambuigation

When the same word could mean different in different contexts. Flair has a Frame loader which helps us to disambuigate between entities.


In [22]:
# # load model
# tagger = SequenceTagger.load('frame')

# # make English sentence
# sentence_1 = Sentence('I went to the bank to deposit $100')
# sentence_2 = Sentence('I was sitting on the river bank for some wind.')

# # predict NER tags
# tagger.predict(sentence_1)
# tagger.predict(sentence_2)

# # print sentence with predicted tags
# print(sentence_1.to_tagged_string())
# print(sentence_2.to_tagged_string())

## Word Embeddings - Tokenisation

## Flair supports multiple types of embeddings. 
It deals with both word embeddings and sentence embeddings. 

### Token embeddings
Glove, flair embeddings, word2vec and so on.
Flair token embeddings include contextual information as well there by creating different embeddings for each word.
It also provides option to stack embeddings.
https://github.com/flairNLP/flair/blob/master/resources/docs/embeddings/CLASSIC_WORD_EMBEDDINGS.md - list of all the embeddings supported by Flair. Please note that the it does not support word2vec by default and will need to download gensim vectors and supply.

In [23]:
from flair.embeddings import WordEmbeddings,FlairEmbeddings
from flair.data import Sentence

# init embedding
glove_embedding = WordEmbeddings('glove')

In [24]:
sentence=Sentence('What are the plans for iCargo?')
glove_embedding.embed(sentence)
print('Note that OOV words dont have an embedding')
for token in sentence:
    print(token)
    print(token.embedding)
    print(token.embedding.shape)

Note that OOV words dont have an embedding
Token: 1 What
tensor([-1.5180e-01,  3.8409e-01,  8.9340e-01, -4.2421e-01, -9.2161e-01,
         3.7988e-02, -3.2026e-01,  3.4119e-03,  2.2101e-01, -2.2045e-01,
         1.6661e-01,  2.1956e-01,  2.5325e-01, -2.9267e-01,  1.0171e-01,
        -7.5491e-02, -6.0406e-02,  2.8194e-01, -5.8519e-01,  4.8271e-01,
         1.7504e-02, -1.2086e-01, -1.0990e-01, -6.9554e-01,  1.5600e-01,
         7.0558e-02, -1.5058e-01, -8.1811e-01, -1.8535e-01, -3.6863e-01,
         3.1650e-02,  7.6616e-01,  8.4041e-02,  2.6928e-03, -2.7440e-01,
         2.1815e-01, -3.5157e-02,  3.2569e-01,  1.0032e-01, -6.0932e-01,
        -7.0316e-01,  1.8299e-01,  3.3134e-01, -1.2416e-01, -9.0542e-01,
        -3.9157e-02,  4.4719e-01, -5.7338e-01, -4.0172e-01, -8.2234e-01,
         5.5740e-01,  1.5101e-01,  2.4598e-01,  1.0113e+00, -4.6626e-01,
        -2.7133e+00,  4.3273e-01, -1.6314e-01,  1.5828e+00,  5.5081e-01,
        -2.4738e-01,  1.4184e+00, -1.6867e-02, -1.9368e-01,  1.0090

In [25]:
sentence=Sentence('What are the plans for iCargo?')
flair_embedding = FlairEmbeddings('news-forward')
flair_embedding.embed(sentence)
print('Note that OOV words also have an embedding')
for token in sentence:
    print(token)
    print(token.embedding)
    print(token.embedding.shape)

Note that OOV words also have an embedding
Token: 1 What
tensor([-0.0138, -0.0010,  0.0710,  ..., -0.0033,  0.0016,  0.0274])
torch.Size([2048])
Token: 2 are
tensor([ 0.0003, -0.0217, -0.0197,  ...,  0.0007,  0.0001,  0.0035])
torch.Size([2048])
Token: 3 the
tensor([-0.0039,  0.0006,  0.0103,  ...,  0.0015,  0.0124,  0.0510])
torch.Size([2048])
Token: 4 plans
tensor([-1.2313e-03, -8.0977e-05,  3.0217e-02,  ..., -5.0141e-04,
        -2.7164e-03,  4.7440e-03])
torch.Size([2048])
Token: 5 for
tensor([ 1.4394e-03,  5.1840e-05,  6.1391e-03,  ..., -6.3837e-04,
        -4.0412e-03,  6.9544e-03])
torch.Size([2048])
Token: 6 iCargo
tensor([ 0.0052, -0.0003,  0.0368,  ..., -0.0002,  0.0020,  0.0003])
torch.Size([2048])
Token: 7 ?
tensor([ 6.5264e-04,  1.5800e-05, -7.8801e-02,  ...,  5.5658e-06,
         1.3291e-03,  1.8110e-03])
torch.Size([2048])


Here we can not only combine simple embeddings we can even combine BERT and FLAIR. For using BERT embeddings, we need to include TransformerEmbeddings

In [26]:
from flair.embeddings import StackedEmbeddings

# create a StackedEmbedding object that combines glove and forward/backward flair embeddings
stacked_embeddings = StackedEmbeddings([
                                        glove_embedding,
                                        flair_embedding
                                       ])

In [27]:
sentence=Sentence('What are the plans for iCargo?')
stacked_embeddings.embed(sentence)
print('Note that OOV words also have an embedding. Now note that we are trying to stack two embedding one is 100 \
and other is 2048 and the resulting one is 2148.')
for token in sentence:
    print(token)
    print(token.embedding)
    print(token.embedding[2248:2278])
    print(token.embedding.shape)

Note that OOV words also have an embedding. Now note that we are trying to stack two embedding one is 100 and other is 2048 and the resulting one is 2148.
Token: 1 What
tensor([-0.1518,  0.3841,  0.8934,  ..., -0.0033,  0.0016,  0.0274])
tensor([])
torch.Size([2148])
Token: 2 are
tensor([-5.1533e-01,  8.3186e-01,  2.2457e-01,  ...,  7.1151e-04,
         1.4361e-04,  3.5160e-03])
tensor([])
torch.Size([2148])
Token: 3 the
tensor([-0.0382, -0.2449,  0.7281,  ...,  0.0015,  0.0124,  0.0510])
tensor([])
torch.Size([2148])
Token: 4 plans
tensor([ 0.0481,  0.0155, -0.1667,  ..., -0.0005, -0.0027,  0.0047])
tensor([])
torch.Size([2148])
Token: 5 for
tensor([-0.1440,  0.3255,  0.1426,  ..., -0.0006, -0.0040,  0.0070])
tensor([])
torch.Size([2148])
Token: 6 iCargo
tensor([ 0.0000,  0.0000,  0.0000,  ..., -0.0002,  0.0020,  0.0003])
tensor([])
torch.Size([2148])
Token: 7 ?
tensor([1.6382e-01, 6.0464e-01, 1.0789e+00,  ..., 5.5658e-06, 1.3291e-03,
        1.8110e-03])
tensor([])
torch.Size([2148])

## Document Embeddings
They are four different ways to generate a document embedding in Flair.
They are,
<li> DocumentPoolEmbeddings - average of all word embeddings
<li> DocumentRNNEmbeddings - generate the sentence embedding through a RNN
<li> TransformerDocumentEmbeddings - Transformer based single vector representation. Best for classification tasks
<li> SentenceTransformerDocumentEmbeddings - Transformer based best for vector representation.
    
I am just going to doa  cosine similarity between two vectors to see how well the context is maintained across these sentences.

In [28]:
sent1 = Sentence('I would like to book a ticket from Singapore to Dubai')
sent2= Sentence('I am looking for a flight to Dubai from Sydney')

In [29]:
from flair.embeddings import WordEmbeddings, DocumentPoolEmbeddings
# initialize the word embeddings
glove_embedding = WordEmbeddings('glove')

# initialize the document embeddings, mode = mean
document_embeddings = DocumentPoolEmbeddings([glove_embedding])

In [30]:
document_embeddings.embed(sent1)
document_embeddings.embed(sent2)
from scipy.spatial import distance
print(distance.cosine(sent1.embedding ,sent2.embedding))

0.0411607027053833


Here we can see that when using the DocumentPool embedding it is not able to maintain the meaning that well. Now let us try the others and see.

In [31]:
from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings
glove_embedding = WordEmbeddings('glove')
document_rnn_embeddings = DocumentRNNEmbeddings([glove_embedding])

In [32]:
sent1 = Sentence('I would like to book a ticket from Singapore to Dubai')
sent2= Sentence('I am looking for a flight to Dubai from Sydney')
document_rnn_embeddings.embed(sent1)
document_rnn_embeddings.embed(sent2)
print(sent1.get_embedding().shape)
print(sent2.get_embedding().shape)
from scipy.spatial import distance
print(distance.cosine(sent1.get_embedding().detach().numpy() ,sent2.get_embedding().detach().numpy()))

torch.Size([128])
torch.Size([128])
0.12523549795150757


Note that this is much better than the Document Pool embedding the fact that it is a RNN model offers us many choices in terms of training the network. Next we will look at the transformer based embeddings. This also uses Hugging Face pretrained models. The complete list of model names can be found here. https://huggingface.co/transformers/pretrained_models.html . In the below code, we will use roberta-base.

In [33]:

from flair.embeddings import TransformerDocumentEmbeddings
# init embedding
transformer_embedding = TransformerDocumentEmbeddings('bert-base-uncased')
sent1 = Sentence('I would like to book a ticket from Singapore to Dubai')
sent2= Sentence('I am looking for a flight to Dubai from Sydney')
transformer_embedding.embed(sent1)
transformer_embedding.embed(sent2)
print(sent1.get_embedding().shape)
print(sent2.get_embedding().shape)
from scipy.spatial import distance
print(distance.cosine(sent1.get_embedding().detach().numpy() ,sent2.get_embedding().detach().numpy()))

torch.Size([768])
torch.Size([768])
0.03231412172317505


In [34]:
from flair.data import Sentence
from flair.embeddings import SentenceTransformerDocumentEmbeddings
# init embedding
embedding = SentenceTransformerDocumentEmbeddings('bert-base-nli-mean-tokens')
sent1 = Sentence('I would like to book a ticket from Singapore to Dubai')
sent2= Sentence('I am looking for a flight to Dubai from Sydney')
embedding.embed(sent1)
embedding.embed(sent2)
print(sent1.get_embedding().shape)
print(sent2.get_embedding().shape)
from scipy.spatial import distance
print(distance.cosine(sent1.get_embedding().detach().numpy() ,sent2.get_embedding().detach().numpy()))

torch.Size([768])
torch.Size([768])
0.19691473245620728


## Loading Corpus and Datasets

In [36]:
import flair.datasets
corpus = flair.datasets.UD_ENGLISH()

2021-11-19 04:07:32,507 https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu not found in cache, downloading to C:\Users\A-7651\AppData\Local\Temp\tmpwk4ez8t6


1737028B [00:07, 230295.72B/s]                                                                                         

2021-11-19 04:07:40,516 copying C:\Users\A-7651\AppData\Local\Temp\tmpwk4ez8t6 to cache at C:\Users\A-7651\.flair\datasets\ud_english\en_ewt-ud-dev.conllu
2021-11-19 04:07:40,525 removing temp file C:\Users\A-7651\AppData\Local\Temp\tmpwk4ez8t6





2021-11-19 04:07:41,391 https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu not found in cache, downloading to C:\Users\A-7651\AppData\Local\Temp\tmp616mz5gf


1738118B [00:06, 266770.50B/s]                                                                                         

2021-11-19 04:07:48,420 copying C:\Users\A-7651\AppData\Local\Temp\tmp616mz5gf to cache at C:\Users\A-7651\.flair\datasets\ud_english\en_ewt-ud-test.conllu
2021-11-19 04:07:48,424 removing temp file C:\Users\A-7651\AppData\Local\Temp\tmp616mz5gf





2021-11-19 04:07:50,257 https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu not found in cache, downloading to C:\Users\A-7651\AppData\Local\Temp\tmpe_i78zpb


13679077B [00:02, 6752527.53B/s]                                                                                       

2021-11-19 04:07:57,060 copying C:\Users\A-7651\AppData\Local\Temp\tmpe_i78zpb to cache at C:\Users\A-7651\.flair\datasets\ud_english\en_ewt-ud-train.conllu
2021-11-19 04:07:57,083 removing temp file C:\Users\A-7651\AppData\Local\Temp\tmpe_i78zpb
2021-11-19 04:07:57,083 Reading data from C:\Users\A-7651\.flair\datasets\ud_english
2021-11-19 04:07:57,083 Train: C:\Users\A-7651\.flair\datasets\ud_english\en_ewt-ud-train.conllu
2021-11-19 04:07:57,093 Dev: C:\Users\A-7651\.flair\datasets\ud_english\en_ewt-ud-dev.conllu
2021-11-19 04:07:57,094 Test: C:\Users\A-7651\.flair\datasets\ud_english\en_ewt-ud-test.conllu





In [37]:
print(len(corpus.train))
print(len(corpus.dev))
print(len(corpus.test))

12543
2001
2077


In [40]:
print(corpus.train[0].to_tagged_string('pos'))
print(corpus.train[0].to_tagged_string('ner'))

Al <NNP> - <HYPH> Zaman <NNP> : <:> American <JJ> forces <NNS> killed <VBD> Shaikh <NNP> Abdullah <NNP> al <NNP> - <HYPH> Ani <NNP> , <,> the <DT> preacher <NN> at <IN> the <DT> mosque <NN> in <IN> the <DT> town <NN> of <IN> Qaim <NNP> , <,> near <IN> the <DT> Syrian <JJ> border <NN> . <.>
Al - Zaman : American forces killed Shaikh Abdullah al - Ani , the preacher at the mosque in the town of Qaim , near the Syrian border .


In [44]:
# create label dictionary for a Universal Part-of-Speech tagging task
upos_dictionary = corpus.make_label_dictionary(label_type='foreign')

# print dictionary
print(upos_dictionary)

2021-11-19 04:21:11,130 Computing label dictionary. Progress:


100%|█████████████████████████████████████████████████████████████████████████| 12543/12543 [00:00<00:00, 21758.81it/s]

2021-11-19 04:21:11,711 Corpus contains the labels: upos (#204580), lemma (#204579), pos (#204579), dependency (#204579), number (#77297), verbform (#35405), prontype (#33571), person (#30486), tense (#20232), mood (#16549), degree (#13941), definite (#13301), case (#12093), numtype (#4448), gender (#4041), poss (#3040), voice (#1206), typo (#466), abbr (#230), extpos (#189), reflex (#100), style (#33), foreign (#19)
2021-11-19 04:21:11,711 Created (for label 'foreign') Dictionary with 1 tags: Yes
Dictionary with 1 tags: Yes





Make Label Dictionary can be used for creating labels in Text Classification tasks as well. We could also use the MultiCorpus object to initiate and pass multiple corpuses.

## Predicting using Flair Zero Shot Model


In [1]:
from flair.models import TARSClassifier
from flair.data import Sentence

# 1. Load our pre-trained TARS model for English
tars = TARSClassifier.load('tars-base')

# 2. Prepare a test sentence
sentence = Sentence("I am so glad you liked it!")

# 3. Define some classes that you want to predict using descriptive names
classes = ["happy", "sad"]

#4. Predict for these classes
tars.predict_zero_shot(sentence, classes)

# Print sentence with predicted labels
print(sentence)

2021-12-02 10:05:23,577 loading file C:\Users\A-7651\.flair\models\tars-base-v8.pt
{'TREC_6': {'label_dictionary': <flair.data.Dictionary object at 0x0000019628DD4F70>, 'multi_label': False, 'multi_label_threshold': 0.5, 'label_type': None, 'beta': 1.0}, 'DBPedia': {'label_dictionary': <flair.data.Dictionary object at 0x0000019628DD4FA0>, 'multi_label': False, 'multi_label_threshold': 0.5, 'label_type': None, 'beta': 1.0}, 'AGNews': {'label_dictionary': <flair.data.Dictionary object at 0x0000019628DFE250>, 'multi_label': False, 'multi_label_threshold': 0.5, 'label_type': None, 'beta': 1.0}, 'IMDB': {'label_dictionary': <flair.data.Dictionary object at 0x0000019628DFE310>, 'multi_label': False, 'multi_label_threshold': 0.5, 'label_type': None, 'beta': 1.0}, 'SST': {'label_dictionary': <flair.data.Dictionary object at 0x0000019628DFE370>, 'multi_label': False, 'multi_label_threshold': 0.5, 'label_type': None, 'beta': 1.0}, 'GO_EMOTIONS': {'label_dictionary': <flair.data.Dictionary object

Now we will try this one of our sentences relevant for travel scenario

In [15]:
# 2. Prepare a test sentence
sentence = Sentence("Please book a cargo shipment for next week from dxb to syd")

# 3. Define some classes that you want to predict using descriptive names
classes = ["Cargo Booking", "Check Rates"]

#4. Predict for these classes
tars.predict_zero_shot(sentence, classes)
print(sentence)

Sentence: "Please book a cargo shipment for next week from dxb to syd"   [− Tokens: 12  − Sentence-Labels: {'Check Rates-Cargo Booking': [Check Rates (0.664), Cargo Booking (0.9632)]}]


In [16]:
from flair.models import SequenceTagger
sentence = Sentence('I love Berlin .')

# load the NER tagger
tagger = SequenceTagger.load('ner')

2021-12-02 10:16:26,512 --------------------------------------------------------------------------------
2021-12-02 10:16:26,515 The model key 'ner' now maps to 'https://huggingface.co/flair/ner-english' on the HuggingFace ModelHub
2021-12-02 10:16:26,515  - The most current version of the model is automatically downloaded from there.
2021-12-02 10:16:26,518  - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/ner/en-ner-conll03-v0.4.pt)
2021-12-02 10:16:26,519 --------------------------------------------------------------------------------
2021-12-02 10:16:28,610 loading file C:\Users\A-7651\.flair\models\ner-english\4f4cdab26f24cb98b732b389e6cebc646c36f54cfd6e0b7d3b90b25656e4262f.8baa8ae8795f4df80b28e7f7b61d788ecbb057d1dc85aacb316f1bd02837a4a4


In [18]:
# run NER over sentence
tagger.predict(sentence)
print('The following NER tags are found:')

# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)

The following NER tags are found:
Span [3]: "Berlin"   [− Labels: LOC (0.999)]


## Training a new model
This introduces us to Flair.. Now i am going to create a new notebook for training using flair. That will be separate.
