<a href="https://colab.research.google.com/github/SaifAlmaliki/flair-python/blob/main/Flair_python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%pip install flair

**Tokenization**

In [None]:
from flair.data import Sentence
sentence = Sentence('The grass is green.', use_tokenizer=True)
print(sentence)

Sentence: "The grass is green ."   [− Tokens: 5]


In [None]:
# Look up a token by its 1-based id: id 4 is "green"
print(sentence.get_token(4))

# Look up a token by its 0-based index: index 2 is "is"
print(sentence[2])

Token: 4 green
Token: 3 is


In [None]:
# print all tokens
for token in sentence:
  print(token)

Token: 1 The
Token: 2 grass
Token: 3 is
Token: 4 green
Token: 5 .
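
The output above shows two things: the final period becomes its own token, and token ids start at 1 while `sentence[i]` uses 0-based positions. Below is a rough sketch of both ideas in plain Python. The regex and the `simple_tokenize` helper are illustrative assumptions and are not Flair's actual tokenizer.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single punctuation marks (illustrative only)."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("The grass is green.")

# Token ids in the printed output are 1-based; Python list indexing is 0-based.
ids = {token_id: tok for token_id, tok in enumerate(tokens, start=1)}

print(tokens)     # the period is a separate token
print(ids[4])     # lookup by 1-based id
print(tokens[2])  # lookup by 0-based index, i.e. id 3
```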


In [None]:
ar_sentence = Sentence('انا احب لغة بايثون')

# print all tokens
for token in ar_sentence:
  print(token)


Token: 1 انا
Token: 2 احب
Token: 3 لغة
Token: 4 بايثون


**Add Tags to Tokens**

In [None]:
# add a tag to a token in the sentence; 'ner' stands for named entity recognition
sentence[3].add_tag('ner', 'color')

# print the sentence with all tags
print(sentence.to_tagged_string())

The grass is green <color> .


In [None]:
from flair.data import Label

# retrieve the tag's value and confidence score
tag: Label = sentence[3].get_tag('ner')

print(f'"{sentence[3]}" is tagged as "{tag.value}" with confidence score "{tag.score}"')

"Token: 4 green" is tagged as "color" with confidence score "1.0"


**Named Entity Recognition (NER)**

In [None]:
from flair.models import SequenceTagger

tagger = SequenceTagger.load('ner')

sentence = Sentence('Goerge Washington went to Washington.')

# predict NER tags
tagger.predict(sentence)

# Print sentence with predicted tags
print(sentence.to_tagged_string())

for entity in sentence.get_spans('ner'):
    print(entity)

print(sentence.to_dict(tag_type='ner'))

2020-10-08 19:57:25,757 loading file /root/.flair/models/en-ner-conll03-v0.4.pt
Goerge <B-PER> Washington <E-PER> went to Washington <S-LOC> .
Span [1,2]: "Goerge Washington"   [− Labels: PER (0.9053)]
Span [5]: "Washington"   [− Labels: LOC (0.9988)]
{'text': 'Goerge Washington went to Washington.', 'labels': [], 'entities': [{'text': 'Goerge Washington', 'start_pos': 0, 'end_pos': 17, 'labels': [PER (0.9053)]}, {'text': 'Washington', 'start_pos': 26, 'end_pos': 36, 'labels': [LOC (0.9988)]}]}
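
The tagged string above uses BIOES-style prefixes (`B-` begins a span, `E-` ends it, `S-` marks a single-token span), and the dictionary reports character offsets for each entity. Here is a minimal sketch of how such tags could be grouped into spans, written in plain Python. The `bioes_to_spans` helper is hypothetical and is not part of Flair; the token/tag pairs are copied from the output above, with `O` assumed for untagged tokens.

```python
def bioes_to_spans(tagged_tokens):
    """Group (token, tag) pairs in BIOES format into (text, label) spans."""
    spans, current, label = [], [], None
    for token, tag in tagged_tokens:
        prefix, _, tag_label = tag.partition("-")
        if prefix == "S":
            # Single-token entity
            spans.append((token, tag_label))
            current, label = [], None
        elif prefix == "B":
            # Start of a multi-token entity
            current, label = [token], tag_label
        elif prefix in ("I", "E") and current and tag_label == label:
            current.append(token)
            if prefix == "E":
                spans.append((" ".join(current), label))
                current, label = [], None
        else:
            # "O" or a malformed sequence: discard any open span
            current, label = [], None
    return spans

tagged = [("Goerge", "B-PER"), ("Washington", "E-PER"), ("went", "O"),
          ("to", "O"), ("Washington", "S-LOC"), (".", "O")]
spans = bioes_to_spans(tagged)
print(spans)

# start_pos / end_pos behave like Python slice bounds (end is exclusive).
text = "Goerge Washington went to Washington."
print(text[0:17], "|", text[26:36])
```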


**Text classification and prediction**

In [None]:
from flair.models import TextClassifier

classifier =  TextClassifier.load('en-sentiment')

sentence = Sentence('This Film hurt. Its so bad that I am confused')

# predict the sentiment label
classifier.predict(sentence)

# print sentence with predicted label
print(sentence.labels)

2020-10-08 20:02:06,393 loading file /root/.flair/models/sentiment-en-mix-distillbert_3.1.pt
[NEGATIVE (0.9999)]


In [None]:
sentence = Sentence('Flair framwork is perfect')

# predict the sentiment label
classifier.predict(sentence)

# print sentence with predicted label
print(sentence.labels)

[POSITIVE (0.9997)]


**Word Embedding**

In [None]:
from flair.embeddings import WordEmbeddings

glove_embedding = WordEmbeddings('glove')

# create sentence
sentence = Sentence('The grass is green.')

# embed a sentence using glove
glove_embedding.embed(sentence)

# inspect the embedded tokens
for token in sentence:
  print(token)
  print(token.embedding)
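
Once each token carries an embedding vector, a common next step is to compare vectors with cosine similarity. The sketch below uses plain Python lists and made-up toy vectors, not real GloVe values. With Flair, one could presumably pass `token.embedding.tolist()` to the same function, since the embedding is a tensor.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy vectors for illustration only (not actual embeddings).
vec_a = [1.0, 2.0, 0.0]
vec_b = [2.0, 4.0, 0.0]   # same direction as vec_a
vec_c = [0.0, 0.0, 3.0]   # orthogonal to vec_a

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
print(sim_ab, sim_ac)
```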

**Document Embedding**

In [None]:
from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings
sentence = Sentence("The grass is green. And the sky is blue")

document_embeddings = DocumentRNNEmbeddings([glove_embedding])

# Embed the sentence with our document embedding
document_embeddings.embed(sentence)

print(sentence.get_embedding())

**Loading Training Data**

In [None]:
import flair.datasets
corpus_dataset = flair.datasets.UD_ENGLISH()

# Print the number of sentences in train set
print("Train set sentences: ", len(corpus_dataset.train), "sentences")

# Print the number of sentences in test set
print("Test set Sentences: ", len(corpus_dataset.test), "Sentences")

# Print the first sentence in the test set
print(corpus_dataset.test[0])

print(corpus_dataset.test[0].to_tagged_string('pos'))

2020-10-09 17:51:08,370 Reading data from /root/.flair/datasets/ud_english
2020-10-09 17:51:08,371 Train: /root/.flair/datasets/ud_english/en_ewt-ud-train.conllu
2020-10-09 17:51:08,372 Dev: /root/.flair/datasets/ud_english/en_ewt-ud-dev.conllu
2020-10-09 17:51:08,373 Test: /root/.flair/datasets/ud_english/en_ewt-ud-test.conllu
Train set sentences:  12543 sentences
Test set Sentences:  2077 Sentences
Sentence: "What if Google Morphed Into GoogleOS ?"   [− Tokens: 7  − Token-Labels: "What <what/PRON/WP/root/Int> if <if/SCONJ/IN/mark> Google <Google/PROPN/NNP/nsubj/Sing> Morphed <morph/VERB/VBD/advcl/Ind/Past/Fin> Into <into/ADP/IN/case> GoogleOS <GoogleOS/PROPN/NNP/obl/Sing> ? <?/PUNCT/./punct>"]
What <WP> if <IN> Google <NNP> Morphed <VBD> Into <IN> GoogleOS <NNP> ? <.>
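
The tagged string printed above pairs each token with its part-of-speech tag in angle brackets. As a small sketch, the string can be split back into `(token, tag)` pairs with a regular expression and the tags counted. The pattern is an assumption that fits this particular output, where every token has exactly one tag, and it is not a Flair API.

```python
import re
from collections import Counter

# Copied from the printed output above.
tagged = "What <WP> if <IN> Google <NNP> Morphed <VBD> Into <IN> GoogleOS <NNP> ? <.>"

# Each match is a token followed by its tag in angle brackets.
pairs = re.findall(r"(\S+) <([^>]+)>", tagged)

# Count how often each tag occurs in the sentence.
tag_counts = Counter(tag for _, tag in pairs)

for token, tag in pairs:
    print(f"{token}\t{tag}")
print(tag_counts.most_common())
```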
