
# **Creating a Sentence**

Two types of objects are central to this library: Sentence and Token. A Sentence holds a textual sentence and is essentially a list of Token objects.

Let's start by making a Sentence object for an example sentence.

In [None]:
!pip install flair

In [2]:
from flair.data import Sentence
sent=Sentence('The grass is green.')
print(sent)

Sentence: "The grass is green ."


In [4]:
print(type(sent))

<class 'flair.data.Sentence'>


In [5]:
for i in range(len(sent)):
  print(sent[i])

Token[0]: "The"
Token[1]: "grass"
Token[2]: "is"
Token[3]: "green"
Token[4]: "."


In [7]:
# get_token() looks tokens up by their 1-based idx, so get_token(0) returns None
for i in range(len(sent)+1):
  print(sent.get_token(i))

None
Token[0]: "The"
Token[1]: "grass"
Token[2]: "is"
Token[3]: "green"
Token[4]: "."
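
The `None` at the top of this output comes from an indexing mismatch: `sent[i]` uses 0-based list positions, while `get_token` looks tokens up by their `idx` field, which starts at 1 (compare the `token.idx` value printed for "green" later in this notebook). The sketch below imitates that behaviour with hypothetical `MiniToken` and `MiniSentence` classes. They are simplified stand-ins written for illustration, not Flair's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MiniToken:
    idx: int   # 1-based position, mirroring token.idx in the notebook output
    text: str


class MiniSentence:
    """Hypothetical stand-in for a tokenized sentence (illustration only)."""

    def __init__(self, words: List[str]) -> None:
        self.tokens = [MiniToken(i, w) for i, w in enumerate(words, start=1)]

    def __len__(self) -> int:
        return len(self.tokens)

    def __getitem__(self, position: int) -> MiniToken:
        # Square-bracket access is a plain 0-based list lookup.
        return self.tokens[position]

    def get_token(self, token_id: int) -> Optional[MiniToken]:
        # Lookup by the 1-based idx field; unknown ids give None.
        for token in self.tokens:
            if token.idx == token_id:
                return token
        return None


mini = MiniSentence(["The", "grass", "is", "green", "."])
first_by_position = mini[0]        # 0-based access
first_by_id = mini.get_token(1)    # 1-based access, same token
missing = mini.get_token(0)        # no token has idx 0
```

Starting the loop at 1, as in `range(1, len(sent) + 1)`, would avoid the `None` entry.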


In [8]:
for i in sent:
  print(i)

Token[0]: "The"
Token[1]: "grass"
Token[2]: "is"
Token[3]: "green"
Token[4]: "."


When a Sentence is created from a string, Flair tokenizes the text automatically using the lightweight segtok library.
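
To give a feel for what tokenization does here, the following sketch splits a string into word and punctuation tokens with a single regular expression. It is a deliberately crude approximation, not segtok's actual rule set, and the function name `simple_tokenize` is made up for this example.

```python
import re
from typing import List


def simple_tokenize(text: str) -> List[str]:
    """Split text into runs of word characters and single punctuation marks."""
    # \w+ matches a word; [^\w\s] matches one character that is neither
    # a word character nor whitespace (for example the final period).
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenize("The grass is green.")
print(tokens)
```

A dedicated tokenizer typically handles cases that this pattern splits badly, such as abbreviations, URLs, and decimal numbers.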

# **Adding Labels**

In Flair, any data point can be labeled. For instance, you can label a word or label a sentence:

**Adding labels to tokens**

You can add a label by specifying the label type and the label value. In this example, we add a label of type 'ner' with the value 'color' to the word 'green'.

This means that we've tagged this word as an entity of type color.

In [9]:
# add a tag to a word in the sentence
sent[3].set_label('ner', 'color')

# print the sentence (now with this annotation)
print(sent)

Sentence: "The grass is green ." → ["green"/color]


The output indicates that the word "green" in this sentence is labeled as a "color". 

You can also iterate through each token and print it to see if it has labels:

In [10]:
for token in sent:
    print(token)

Token[0]: "The"
Token[1]: "grass"
Token[2]: "is"
Token[3]: "green" → color (1.0)
Token[4]: "."


**Accessing Label information**

Each label is an instance of the Label class, which holds a value together with a score indicating confidence. You can access these fields as follows:

In [11]:
# get and print token 3 in the sentence
token = sent[3]
print(token)

# get the 'ner' label of the token
label = token.get_label('ner')
# print the text and idx fields of the token (idx counts from 1), and the value and score fields of the label
print(f'token.text is: "{token.text}"')
print(f'token.idx is: "{token.idx}"')
print(f'label.value is: "{label.value}"')
print(f'label.score is: "{label.score}"')

Token[3]: "green" → color (1.0)
token.text is: "green"
token.idx is: "4"
label.value is: "color"
label.score is: "1.0"


Our color tag has a score of 1.0 since we added it manually. If a tag is predicted by a sequence labeler, the score indicates the classifier's confidence.
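
When labels come from a model rather than from manual annotation, the score is often used to discard low-confidence predictions. A minimal sketch of that idea follows, using a hypothetical `MiniLabel` dataclass and made-up scores rather than real model output:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MiniLabel:
    value: str
    score: float = 1.0  # manually added labels default to full confidence


def confident(labels: List[MiniLabel], threshold: float) -> List[MiniLabel]:
    """Keep only the labels whose score reaches the threshold."""
    return [label for label in labels if label.score >= threshold]


# Made-up scores for illustration; they are not output from any model.
candidates = [
    MiniLabel("color"),            # manual label, score 1.0
    MiniLabel("person", 0.35),
    MiniLabel("location", 0.80),
]
kept = confident(candidates, threshold=0.5)
kept_values = [label.value for label in kept]
```

The threshold of 0.5 is an arbitrary choice for the example; a suitable cut-off depends on the model and the task.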

**Multiple labels**

In [13]:
# set_label replaces any existing 'ner' label on the token
sent[3].set_label('ner', 'color')
# add_label appends a second 'ner' label without removing the first
sent[3].add_label('ner', 'person')

# print the sentence (now with this annotation)
print(sent)

Sentence: "The grass is green ." → ["green"/color/person]
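
The difference between the two calls is that `set_label` replaces whatever labels of that type the token already has, while `add_label` appends one more. The sketch below models this with a dictionary that maps each label type to a list of values. The `MiniDataPoint` class is a hypothetical simplification, not Flair's own code.

```python
from collections import defaultdict
from typing import Dict, List


class MiniDataPoint:
    """Hypothetical stand-in for a labelable object (illustration only)."""

    def __init__(self) -> None:
        # One list of label values per label type, e.g. {"ner": ["color"]}.
        self.layers: Dict[str, List[str]] = defaultdict(list)

    def set_label(self, label_type: str, value: str) -> "MiniDataPoint":
        # Overwrite: any existing labels of this type are discarded.
        self.layers[label_type] = [value]
        return self

    def add_label(self, label_type: str, value: str) -> "MiniDataPoint":
        # Append: existing labels of this type are kept.
        self.layers[label_type].append(value)
        return self

    def get_labels(self, label_type: str) -> List[str]:
        # Return a copy so callers cannot modify the stored list.
        return list(self.layers.get(label_type, []))


point = MiniDataPoint()
point.set_label("ner", "color")      # layer is ["color"]
point.set_label("ner", "color")      # still ["color"], not duplicated
point.add_label("ner", "person")     # layer is ["color", "person"]
ner_labels = point.get_labels("ner")
```

This is why repeating the `set_label` call in the cell above does not produce a duplicate "color" entry.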


**Adding labels to sentences**

You can also add a Label to a whole Sentence. For instance, the example below shows how we add the label 'sports' to a sentence, thereby labeling it as belonging to the sports "topic".

In [15]:
sentence = Sentence('France is the current world cup winner.')

# add a label to a sentence
sentence.add_label('topic', 'sports')

print(sentence)



Sentence: "France is the current world cup winner ." → sports (1.0)


In [16]:
# Alternatively, you can also create a sentence with label in one line
sentence = Sentence('France is the current world cup winner.').add_label('topic', 'sports')

print(sentence)

Sentence: "France is the current world cup winner ." → sports (1.0)


**Multiple labels on sentences**

In [17]:
sentence = Sentence('France is the current world cup winner.')

# this sentence has multiple topic labels
sentence.add_label('topic', 'sports')
sentence.add_label('topic', 'soccer')
sentence

Sentence: "France is the current world cup winner ." → sports (1.0); soccer (1.0)

In [18]:
sentence = Sentence('France is the current world cup winner.')

# this sentence has multiple "topic" labels
sentence.add_label('topic', 'sports')
sentence.add_label('topic', 'soccer')

# this sentence has a "language" label
sentence.add_label('language', 'English')

print(sentence)

Sentence: "France is the current world cup winner ." → sports (1.0); soccer (1.0); English (1.0)


**Accessing a sentence's labels**

In [19]:
for label in sentence.labels:
    print(label)

Sentence: "France is the current world cup winner ." → sports (1.0)
Sentence: "France is the current world cup winner ." → soccer (1.0)
Sentence: "France is the current world cup winner ." → English (1.0)


In [20]:
print(sentence.to_plain_string())
for label in sentence.labels:
    print(f' - classified as "{label.value}" with score {label.score}')

France is the current world cup winner.
 - classified as "sports" with score 1.0
 - classified as "soccer" with score 1.0
 - classified as "English" with score 1.0


In [21]:
for label in sentence.get_labels('topic'):
    print(label)

Sentence: "France is the current world cup winner ." → sports (1.0)
Sentence: "France is the current world cup winner ." → soccer (1.0)
