# FLAIR BASICS

Check the documentation and tutorials: 

https://github.com/flairNLP/flair/tree/master/resources/docs

Flair is a state-of-the-art neural toolkit for sequence labelling and text classification.

The aim of this lab is to learn how to install Flair, understand the intuition behind character-based contextual word representations, and become familiar with its API, which is built around the Token and Sentence objects.

In [51]:
# !pip install --upgrade git+https://github.com/flairNLP/flair.git
# !pip install --upgrade pip

!pip install flair==0.8


In [52]:
from flair.data import Sentence
from flair.models import SequenceTagger
from flair.models import TextClassifier

In [53]:
# "use_tokenizer" parameter for tokenizing the input text

sentence = Sentence('Washington University, which is located in Missouri, is named after George Washington.', use_tokenizer=False)
tokenized_sentence = Sentence('Washington University, which is located in Missouri, is named after George Washington.', use_tokenizer=True)

In [54]:
print(sentence)
print(tokenized_sentence)

Sentence: "Washington University, which is located in Missouri, is named after George Washington."   [− Tokens: 12]
Sentence: "Washington University , which is located in Missouri , is named after George Washington ."   [− Tokens: 15]
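
The difference between the two token counts can be approximated in plain Python. This is only a sketch: the regular expression below is an assumption standing in for Flair's own tokenizer, and it happens to reproduce the 12 and 15 token counts for this particular sentence.

```python
import re

text = ("Washington University, which is located in Missouri, "
        "is named after George Washington.")

# use_tokenizer=False: tokens are just the whitespace-separated chunks,
# so punctuation stays glued to the preceding word.
whitespace_tokens = text.split()

# Rough stand-in for use_tokenizer=True: separate runs of word characters
# from individual punctuation marks.
punct_aware_tokens = re.findall(r"\w+|[^\w\s]", text)

print(len(whitespace_tokens), whitespace_tokens[-1])     # 12 Washington.
print(len(punct_aware_tokens), punct_aware_tokens[-2:])  # 15 ['Washington', '.']
```

Note that the last whitespace token is `Washington.` with the period attached, which matters later when looking the token up in a fixed vocabulary.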


In [55]:
## get_token() function retrieves the token by index (starting from 1)
print(sentence.get_token(3))

Token: 3 which


In [56]:
## indexes to obtain the tokens (starting from 0)
print(sentence[2])

Token: 3 which


In [57]:
for token in sentence:
  print(token)

Token: 1 Washington
Token: 2 University,
Token: 3 which
Token: 4 is
Token: 5 located
Token: 6 in
Token: 7 Missouri,
Token: 8 is
Token: 9 named
Token: 10 after
Token: 11 George
Token: 12 Washington.


# WORD REPRESENTATIONS

1. Static Word Embeddings (fastText, GloVe, etc.)
2. Flair character-based contextual embeddings


In [58]:
from flair.embeddings import WordEmbeddings

# init embedding
en_embedding = WordEmbeddings('glove')

In [59]:
#sentence = Sentence('Washington University, which is located in Missouri, is named after George Washington.')

# Obtain vector-based representation from glove pre-trained model
en_embedding.embed(sentence)

# print the vector representing each word in the sentence
for token in sentence:
    print(token)
    print(token.embedding)

Token: 1 Washington
tensor([-2.2048e-01, -1.1316e-01,  9.4277e-01, -3.9024e-01,  2.5004e-01,
        -4.1651e-01, -1.4640e-01,  2.3628e-03, -1.2966e-01, -1.1173e-01,
        -2.1546e-01, -8.6271e-01,  1.3817e-01,  3.3118e-01, -6.6500e-01,
         3.7134e-01,  2.0050e-01, -3.4055e-01, -1.2422e+00, -7.6653e-01,
        -1.1253e-02,  3.8440e-01, -5.0105e-02, -1.8869e-01,  1.0785e-01,
         1.7502e-01, -1.0167e-01, -5.7925e-01,  2.3529e-01,  3.2626e-02,
         3.2353e-01,  9.7457e-01,  4.5231e-01,  4.9740e-01, -8.8874e-01,
         4.9170e-01,  1.1230e-01, -2.1484e-01,  9.3187e-02,  4.7039e-01,
        -7.8776e-01, -6.8219e-01, -2.3741e-01,  2.2351e-01,  2.0269e-01,
        -1.0166e+00,  1.3095e-01, -2.3654e-01,  3.1501e-01, -3.1880e-01,
         5.9744e-01, -2.8722e-01,  2.9970e-01,  3.4877e-01, -1.6597e-01,
        -2.8483e+00,  3.2219e-01, -7.8469e-01,  1.3754e+00,  1.5050e-01,
        -8.5193e-01,  2.5303e-01,  2.0142e-01, -5.9176e-01,  8.9212e-02,
        -3.5561e-01,  2.6522e-0

In [60]:
# Washington embedding in "Washington University"
sentence[0].get_embedding()

tensor([-2.2048e-01, -1.1316e-01,  9.4277e-01, -3.9024e-01,  2.5004e-01,
        -4.1651e-01, -1.4640e-01,  2.3628e-03, -1.2966e-01, -1.1173e-01,
        -2.1546e-01, -8.6271e-01,  1.3817e-01,  3.3118e-01, -6.6500e-01,
         3.7134e-01,  2.0050e-01, -3.4055e-01, -1.2422e+00, -7.6653e-01,
        -1.1253e-02,  3.8440e-01, -5.0105e-02, -1.8869e-01,  1.0785e-01,
         1.7502e-01, -1.0167e-01, -5.7925e-01,  2.3529e-01,  3.2626e-02,
         3.2353e-01,  9.7457e-01,  4.5231e-01,  4.9740e-01, -8.8874e-01,
         4.9170e-01,  1.1230e-01, -2.1484e-01,  9.3187e-02,  4.7039e-01,
        -7.8776e-01, -6.8219e-01, -2.3741e-01,  2.2351e-01,  2.0269e-01,
        -1.0166e+00,  1.3095e-01, -2.3654e-01,  3.1501e-01, -3.1880e-01,
         5.9744e-01, -2.8722e-01,  2.9970e-01,  3.4877e-01, -1.6597e-01,
        -2.8483e+00,  3.2219e-01, -7.8469e-01,  1.3754e+00,  1.5050e-01,
        -8.5193e-01,  2.5303e-01,  2.0142e-01, -5.9176e-01,  8.9212e-02,
        -3.5561e-01,  2.6522e-01,  1.1283e+00, -3.7

In [61]:
# Washington embedding in "George Washington"
sentence[11].get_embedding()

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0.], device='cuda:0')

## ASSIGNMENT 1

In theory, the representations for tokens sentence[0] and sentence[11] should be the same (the same GloVe vector).

+ Write code to establish whether the vectors are actually the same. The output should look like the one below.



In [62]:
# TODO: why is this False?
sentence[0].get_embedding() == sentence[11].get_embedding()

# `==` compares the two vectors element by element; every dimension differs, so every entry is False.

tensor([False, False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False, False],
       device='cuda:0')

+ TODO: You need to find out why this is the case.
+ TODO: Once you find out, write code to obtain the embeddings again and to establish that they are indeed the same representations (for both occurrences of 'Washington').

In [63]:
# TODO: obtain new representation using the glove embeddings and compare them again.


# TODO: You need to find out why this is the case
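# The first token is "Washington" but the last one is "Washington." (with the trailing period).
# "Washington." is not in the GloVe vocabulary, so it gets an all-zero vector.
# This is the risk of not tokenizing the sentence.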
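
The all-zero vector can be reproduced with a toy lookup table. The sketch below is not Flair's implementation: the three-dimensional vectors and the `lookup` helper are made up for illustration, and it assumes, as the output above suggests, that a word missing from the vocabulary is mapped to zeros.

```python
# Toy static embedding table (made-up 3-dimensional vectors).
toy_vectors = {
    "Washington": [0.1, -0.2, 0.3],
    "George": [0.5, 0.0, -0.4],
}
DIM = 3


def lookup(word):
    """Return the stored vector, or zeros for an out-of-vocabulary word."""
    return toy_vectors.get(word, [0.0] * DIM)


print(lookup("Washington"))   # [0.1, -0.2, 0.3]
print(lookup("Washington."))  # [0.0, 0.0, 0.0]
```

A static table has exactly one entry per surface form, so once the period is split off, both occurrences of `Washington` hit the same entry.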

In [64]:
# embed the tokenized sentence with GloVe
en_embedding.embed(tokenized_sentence)

# print tokens
for token in tokenized_sentence:
  print(token)

# print vectors
for token in tokenized_sentence:
  print(token.embedding)

# compare both occurrences of 'Washington' (tokens 1 and 14)
tokenized_sentence[0].get_embedding() == tokenized_sentence[13].get_embedding()


Token: 1 Washington
Token: 2 University
Token: 3 ,
Token: 4 which
Token: 5 is
Token: 6 located
Token: 7 in
Token: 8 Missouri
Token: 9 ,
Token: 10 is
Token: 11 named
Token: 12 after
Token: 13 George
Token: 14 Washington
Token: 15 .
tensor([-2.2048e-01, -1.1316e-01,  9.4277e-01, -3.9024e-01,  2.5004e-01,
        -4.1651e-01, -1.4640e-01,  2.3628e-03, -1.2966e-01, -1.1173e-01,
        -2.1546e-01, -8.6271e-01,  1.3817e-01,  3.3118e-01, -6.6500e-01,
         3.7134e-01,  2.0050e-01, -3.4055e-01, -1.2422e+00, -7.6653e-01,
        -1.1253e-02,  3.8440e-01, -5.0105e-02, -1.8869e-01,  1.0785e-01,
         1.7502e-01, -1.0167e-01, -5.7925e-01,  2.3529e-01,  3.2626e-02,
         3.2353e-01,  9.7457e-01,  4.5231e-01,  4.9740e-01, -8.8874e-01,
         4.9170e-01,  1.1230e-01, -2.1484e-01,  9.3187e-02,  4.7039e-01,
        -7.8776e-01, -6.8219e-01, -2.3741e-01,  2.2351e-01,  2.0269e-01,
        -1.0166e+00,  1.3095e-01, -2.3654e-01,  3.1501e-01, -3.1880e-01,
         5.9744e-01, -2.8722e-01,  2.99

tensor([True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True], device='cuda:0')
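
The `==` operator on tensors compares element by element, which is why the cells above print one boolean per dimension. To get a single yes/no answer, the element-wise result has to be reduced. The sketch below shows the idea with plain Python lists and made-up values; on PyTorch tensors, `torch.equal(a, b)` returns the same kind of single boolean.

```python
def elementwise_equal(a, b):
    """Mimic tensor `==`: one boolean per dimension."""
    return [x == y for x, y in zip(a, b)]


def same_vector(a, b):
    """Collapse the element-wise comparison into a single boolean."""
    return len(a) == len(b) and all(elementwise_equal(a, b))


washington_1 = [0.1, -0.2, 0.3]   # made-up values
washington_2 = [0.1, -0.2, 0.3]
oov_zeros = [0.0, 0.0, 0.0]

print(elementwise_equal(washington_1, oov_zeros))  # [False, False, False]
print(same_vector(washington_1, washington_2))     # True
print(same_vector(washington_1, oov_zeros))        # False
```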

## ASSIGNMENT 2

In this assignment we will show that the representations obtained for both occurrences of 'Washington' differ when they are obtained from Flair contextual embeddings.



In [65]:
from flair.embeddings import FlairEmbeddings

# init Flair embedding
flair_embedding_forward = FlairEmbeddings('news-forward')
tokenized_sentence = Sentence('Washington University, which is located in Missouri, is named after George Washington.', use_tokenizer=True)


+ TODO: compare the representations obtained for 'Washington' for both sentences, tokenized and raw.

In [66]:
# TODO compare the Flair embeddings obtained for 'Washington'
flair_embedding_forward.embed(tokenized_sentence)

tokenized_sentence[0].get_embedding() == tokenized_sentence[13].get_embedding()

tensor([False, False, False,  ..., False, False, False], device='cuda:0')

In [67]:
# TODO compare the Flair embeddings obtained for 'Washington' in the raw (untokenized) sentence
flair_embedding_forward.embed(sentence)

sentence[0].get_embedding() == sentence[11].get_embedding()

tensor([False, False, False,  ..., False, False, False], device='cuda:0')
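
Why the two occurrences differ can be illustrated without any neural network. In the toy sketch below, a running integer stands in for the hidden state of a forward character-level language model: it is updated after every character, and a token's representation is the state right after its last character. The update rule and the helper name `forward_states` are assumptions made for illustration and have nothing to do with Flair's actual model.

```python
def forward_states(tokens):
    """Return one 'state' per token, computed over all characters read so far."""
    state = 0
    states = []
    for token in tokens:
        for ch in token:
            state = state * 31 + ord(ch)  # depends on the whole left context
        states.append(state)              # state right after the token's last character
        state = state * 31 + ord(" ")     # consume the separating space
    return states


tokens = ["Washington", "University", ",", "which", "is", "located", "in",
          "Missouri", ",", "is", "named", "after", "George", "Washington", "."]

states = forward_states(tokens)

# Same surface form, different left context, different representation.
print(tokens[0] == tokens[13])   # True
print(states[0] == states[13])   # False
```

A static lookup table would return the same vector for both positions, whereas anything computed from the preceding characters cannot.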



---


# Tagging

Now we will learn how to tag our sentence using Flair pre-trained models for the following tasks:

1. POS tagging
2. Chunking
3. Named Entity Recognition
4. Frame Semantics (event detection)
5. Polarity classification

**Check the following link to see the list of available models and languages:**
[Flair Tagging Info](https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_2_TAGGING.md)

---




In [68]:
pos_tagger = SequenceTagger.load('pos')

2022-03-01 18:33:58,078 --------------------------------------------------------------------------------
2022-03-01 18:33:58,080 The model key 'pos' now maps to 'https://huggingface.co/flair/pos-english' on the HuggingFace ModelHub
2022-03-01 18:33:58,084  - The most current version of the model is automatically downloaded from there.
2022-03-01 18:33:58,085  - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/pos/en-pos-ontonotes-v0.5.pt)
2022-03-01 18:33:58,088 --------------------------------------------------------------------------------
2022-03-01 18:33:58,253 loading file /root/.flair/models/pos-english/a9a73f6cd878edce8a0fa518db76f441f1cc49c2525b2b4557af278ec2f0659e.121306ea62993d04cd1978398b68396931a39eb47754c8a06a87f325ea70ac63


In [69]:
pos_tagger.predict(sentence)

In [70]:
for postag in sentence.get_spans('pos'):
  print(postag)

Span [1]: "Washington"   [− Labels: NNP (1.0)]
Span [2]: "University,"   [− Labels: NNP (1.0)]
Span [3]: "which"   [− Labels: WDT (1.0)]
Span [4]: "is"   [− Labels: VBZ (1.0)]
Span [5]: "located"   [− Labels: VBN (0.9999)]
Span [6]: "in"   [− Labels: IN (1.0)]
Span [7]: "Missouri,"   [− Labels: NNP (1.0)]
Span [8]: "is"   [− Labels: VBZ (1.0)]
Span [9]: "named"   [− Labels: VBN (1.0)]
Span [10]: "after"   [− Labels: IN (0.9994)]
Span [11]: "George"   [− Labels: NNP (1.0)]
Span [12]: "Washington."   [− Labels: NNP (1.0)]


In [71]:
print(sentence.to_tagged_string())

Washington <NNP> University, <NNP> which <WDT> is <VBZ> located <VBN> in <IN> Missouri, <NNP> is <VBZ> named <VBN> after <IN> George <NNP> Washington. <NNP>


In [72]:
chunker = SequenceTagger.load('chunk')

2022-03-01 18:33:59,097 --------------------------------------------------------------------------------
2022-03-01 18:33:59,102 The model key 'chunk' now maps to 'https://huggingface.co/flair/chunk-english' on the HuggingFace ModelHub
2022-03-01 18:33:59,108  - The most current version of the model is automatically downloaded from there.
2022-03-01 18:33:59,110  - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/chunk/en-chunk-conll2000-v0.4.pt)
2022-03-01 18:33:59,114 --------------------------------------------------------------------------------
2022-03-01 18:33:59,257 loading file /root/.flair/models/chunk-english/5b53097d6763734ee8ace8de92db67a1ee2528d5df9c6d20ec8e3e6f6470b423.d81b7fd7a38422f2dbf40f6449b1c63d5ae5b959863aa0c2c1ce9116902e8b22


In [73]:
chunker.predict(sentence)

In [74]:
for chunktag in sentence.get_spans('np'):
  print(chunktag)

Span [1,2]: "Washington University,"   [− Labels: NP (0.7409)]
Span [3]: "which"   [− Labels: NP (0.9995)]
Span [4,5]: "is located"   [− Labels: VP (0.8574)]
Span [6]: "in"   [− Labels: PP (1.0)]
Span [7]: "Missouri,"   [− Labels: NP (0.9999)]
Span [8,9]: "is named"   [− Labels: VP (0.9624)]
Span [10]: "after"   [− Labels: PP (0.9981)]
Span [11,12]: "George Washington."   [− Labels: NP (0.798)]


In [75]:
print(sentence.to_tagged_string())

Washington <NNP/B-NP> University, <NNP/E-NP> which <WDT/S-NP> is <VBZ/B-VP> located <VBN/E-VP> in <IN/S-PP> Missouri, <NNP/S-NP> is <VBZ/B-VP> named <VBN/E-VP> after <IN/S-PP> George <NNP/B-NP> Washington. <NNP/E-NP>
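
The prefixes in the tagged string follow a BIOES-style scheme: `B-` begins a multi-token span, `I-` continues it, `E-` ends it, and `S-` marks a single-token span. A minimal sketch of how such tags can be grouped back into spans is shown below; `decode_bioes` is a hypothetical helper written for this illustration, not a Flair function, and the tags are copied from the chunking output above.

```python
def decode_bioes(tokens, tags):
    """Group BIOES-tagged tokens into (text, label) spans; 'O' tokens are skipped."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = []
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":                 # single-token span
            spans.append((token, label))
            current = []
        elif prefix == "B":               # start collecting a new span
            current = [token]
        elif prefix == "I":               # continue the open span
            current.append(token)
        elif prefix == "E":               # close the span and emit it
            current.append(token)
            spans.append((" ".join(current), label))
            current = []
    return spans


tokens = ["Washington", "University,", "which", "is", "located", "in", "Missouri,"]
tags = ["B-NP", "E-NP", "S-NP", "B-VP", "E-VP", "S-PP", "S-NP"]

for span in decode_bioes(tokens, tags):
    print(span)
```

The resulting spans correspond to the first five chunks printed by `get_spans('np')`.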


In [76]:
ner_tagger = SequenceTagger.load('ner')

2022-03-01 18:33:59,991 --------------------------------------------------------------------------------
2022-03-01 18:33:59,999 The model key 'ner' now maps to 'https://huggingface.co/flair/ner-english' on the HuggingFace ModelHub
2022-03-01 18:34:00,002  - The most current version of the model is automatically downloaded from there.
2022-03-01 18:34:00,008  - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/ner/en-ner-conll03-v0.4.pt)
2022-03-01 18:34:00,009 --------------------------------------------------------------------------------
2022-03-01 18:34:00,162 loading file /root/.flair/models/ner-english/4f4cdab26f24cb98b732b389e6cebc646c36f54cfd6e0b7d3b90b25656e4262f.8baa8ae8795f4df80b28e7f7b61d788ecbb057d1dc85aacb316f1bd02837a4a4


In [77]:
ner_tagger.predict(sentence)

In [78]:
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)

Span [1,2]: "Washington University,"   [− Labels: ORG (0.8766)]
Span [7]: "Missouri,"   [− Labels: LOC (0.9987)]
Span [11,12]: "George Washington."   [− Labels: PER (0.9916)]


In [79]:
print(sentence.to_tagged_string())

Washington <NNP/B-NP/B-ORG> University, <NNP/E-NP/E-ORG> which <WDT/S-NP> is <VBZ/B-VP> located <VBN/E-VP> in <IN/S-PP> Missouri, <NNP/S-NP/S-LOC> is <VBZ/B-VP> named <VBN/E-VP> after <IN/S-PP> George <NNP/B-NP/B-PER> Washington. <NNP/E-NP/E-PER>


In [80]:
sem_tagger = SequenceTagger.load('frame')

2022-03-01 18:34:03,370 --------------------------------------------------------------------------------
2022-03-01 18:34:03,375 The model key 'frame' now maps to 'https://huggingface.co/flair/frame-english' on the HuggingFace ModelHub
2022-03-01 18:34:03,381  - The most current version of the model is automatically downloaded from there.
2022-03-01 18:34:03,382  - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/frame/en-frame-ontonotes-v0.4.pt)
2022-03-01 18:34:03,384 --------------------------------------------------------------------------------
2022-03-01 18:34:03,560 loading file /root/.flair/models/frame-english/c397b8bbddf56e35a7d4b64295712a42a1a9b7ccf430dff76d03c8c7e26b9707.fd7786a36026b383ca73a1413c0a29aa1e67551621b805a0d28ca547636353b9


In [81]:
sem_tagger.predict(sentence)

In [82]:
for event in sentence.get_spans('frame'):
  print(event)
print(sentence.to_tagged_string())

Span [4]: "is"   [− Labels: be.03 (0.9955)]
Span [5]: "located"   [− Labels: locate.01 (0.9831)]
Span [8]: "is"   [− Labels: be.03 (0.9987)]
Span [9]: "named"   [− Labels: name.01 (0.7091)]
Washington <NNP/B-NP/B-ORG> University, <NNP/E-NP/E-ORG> which <WDT/S-NP> is <VBZ/B-VP/be.03> located <VBN/E-VP/locate.01> in <IN/S-PP> Missouri, <NNP/S-NP/S-LOC> is <VBZ/B-VP/be.03> named <VBN/E-VP/name.01> after <IN/S-PP> George <NNP/B-NP/B-PER> Washington. <NNP/E-NP/E-PER>


In [83]:
print(sentence.to_dict(tag_type='pos'))

{'text': 'Washington University, which is located in Missouri, is named after George Washington.', 'labels': [], 'entities': [{'text': 'Washington', 'start_pos': 0, 'end_pos': 10, 'labels': [NNP (1.0)]}, {'text': 'University,', 'start_pos': 11, 'end_pos': 22, 'labels': [NNP (1.0)]}, {'text': 'which', 'start_pos': 23, 'end_pos': 28, 'labels': [WDT (1.0)]}, {'text': 'is', 'start_pos': 29, 'end_pos': 31, 'labels': [VBZ (1.0)]}, {'text': 'located', 'start_pos': 32, 'end_pos': 39, 'labels': [VBN (0.9999)]}, {'text': 'in', 'start_pos': 40, 'end_pos': 42, 'labels': [IN (1.0)]}, {'text': 'Missouri,', 'start_pos': 43, 'end_pos': 52, 'labels': [NNP (1.0)]}, {'text': 'is', 'start_pos': 53, 'end_pos': 55, 'labels': [VBZ (1.0)]}, {'text': 'named', 'start_pos': 56, 'end_pos': 61, 'labels': [VBN (1.0)]}, {'text': 'after', 'start_pos': 62, 'end_pos': 67, 'labels': [IN (0.9994)]}, {'text': 'George', 'start_pos': 68, 'end_pos': 74, 'labels': [NNP (1.0)]}, {'text': 'Washington.', 'start_pos': 75, 'end_po

In [84]:
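# Note: the chunker stores its labels under the 'np' tag type (see get_spans('np') above),
# so asking for 'chunk' returns an empty list of entities.
print(sentence.to_dict(tag_type='chunk'))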

{'text': 'Washington University, which is located in Missouri, is named after George Washington.', 'labels': [], 'entities': []}


In [85]:
print(sentence.to_dict(tag_type='ner'))

{'text': 'Washington University, which is located in Missouri, is named after George Washington.', 'labels': [], 'entities': [{'text': 'Washington University,', 'start_pos': 0, 'end_pos': 22, 'labels': [ORG (0.8766)]}, {'text': 'Missouri,', 'start_pos': 43, 'end_pos': 52, 'labels': [LOC (0.9987)]}, {'text': 'George Washington.', 'start_pos': 68, 'end_pos': 86, 'labels': [PER (0.9916)]}]}
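
The `start_pos` and `end_pos` values are character offsets into the original text, with `end_pos` exclusive, so each entity can be recovered by slicing. The sketch below checks this with the offsets copied from the output above; the labels are written as plain strings here, whereas Flair returns label objects.

```python
text = ("Washington University, which is located in Missouri, "
        "is named after George Washington.")

# Offsets and labels copied from the to_dict(tag_type='ner') output.
entities = [
    {"text": "Washington University,", "start_pos": 0, "end_pos": 22, "label": "ORG"},
    {"text": "Missouri,", "start_pos": 43, "end_pos": 52, "label": "LOC"},
    {"text": "George Washington.", "start_pos": 68, "end_pos": 86, "label": "PER"},
]

for entity in entities:
    surface = text[entity["start_pos"]:entity["end_pos"]]
    print(entity["label"], repr(surface), surface == entity["text"])
```

Because the sentence was not tokenized, the sliced entities still carry their trailing punctuation.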


In [86]:
print(sentence.to_dict(tag_type='frame'))

{'text': 'Washington University, which is located in Missouri, is named after George Washington.', 'labels': [], 'entities': [{'text': 'is', 'start_pos': 29, 'end_pos': 31, 'labels': [be.03 (0.9955)]}, {'text': 'located', 'start_pos': 32, 'end_pos': 39, 'labels': [locate.01 (0.9831)]}, {'text': 'is', 'start_pos': 53, 'end_pos': 55, 'labels': [be.03 (0.9987)]}, {'text': 'named', 'start_pos': 56, 'end_pos': 61, 'labels': [name.01 (0.7091)]}]}


In [87]:
polarity_classifier = TextClassifier.load('en-sentiment')

2022-03-01 18:34:05,409 loading file /root/.flair/models/sentiment-en-mix-distillbert_4.pt


In [88]:
polarity_classifier.predict(sentence)

In [89]:
print(sentence.to_tagged_string())
print(sentence.labels)

Washington <NNP/B-NP/B-ORG> University, <NNP/E-NP/E-ORG> which <WDT/S-NP> is <VBZ/B-VP/be.03> located <VBN/E-VP/locate.01> in <IN/S-PP> Missouri, <NNP/S-NP/S-LOC> is <VBZ/B-VP/be.03> named <VBN/E-VP/name.01> after <IN/S-PP> George <NNP/B-NP/B-PER> Washington. <NNP/E-NP/E-PER>
[POSITIVE (0.9722)]


## ASSIGNMENT 3

Check out the following list of sentences and perform the following tasks using the Flair system and models:

1. Perform POS tagging and Named Entity Recognition on sentences 1-4.
2. Perform chunking and frame detection on sentences 5-6.
3. Perform sentiment analysis on sentences 7-8.

**Do not repeat the instructions; use a loop to annotate and display the annotations of every sentence in one step per task.**

In [90]:
sentence_1 = Sentence('Jackson is placed in Microsoft located in Redmond .')
sentence_2 = Sentence('Redmond is coming to New York city .')
sentence_3 = Sentence('Redmond is coming to New York City .')
sentence_4 = Sentence('Redmond is coming to New York City.')
sentence_5 = Sentence('Redmond returned to New York City to return his hat .')
sentence_6 = Sentence('He had a look at different hats .')
sentence_7 = Sentence('This film hurts.')
sentence_8 = Sentence('It is so bad that I am confused.')

In [91]:
# Put the sentences in a list so that they can be processed in a loop.

all_sentences = [sentence_1, sentence_2, sentence_3, sentence_4, sentence_5, sentence_6, sentence_7, sentence_8]

In [92]:
from flair.models.sequence_tagger_model import MultiTagger

tagger1= MultiTagger.load(["pos","ner"]) #1
tagger2= MultiTagger.load(["chunk","frame"]) #2


2022-03-01 18:34:07,959 --------------------------------------------------------------------------------
2022-03-01 18:34:07,965 The model key 'pos' now maps to 'https://huggingface.co/flair/pos-english' on the HuggingFace ModelHub
2022-03-01 18:34:07,967  - The most current version of the model is automatically downloaded from there.
2022-03-01 18:34:07,969  - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/pos/en-pos-ontonotes-v0.5.pt)
2022-03-01 18:34:07,972 --------------------------------------------------------------------------------
2022-03-01 18:34:08,115 loading file /root/.flair/models/pos-english/a9a73f6cd878edce8a0fa518db76f441f1cc49c2525b2b4557af278ec2f0659e.121306ea62993d04cd1978398b68396931a39eb47754c8a06a87f325ea70ac63
2022-03-01 18:34:08,635 --------------------------------------------------------------------------------
2022-03-01 18:34:08,638 The model key 'ner' now maps to 'https://huggingface.co/f

In [93]:
#1. Perform POS tagging and Named Entity Recognition on sentences 1-4.

for i in range(4):
  tagger1.predict(all_sentences[i])
  print(all_sentences[i].to_tagged_string())


Jackson <NNP/S-PER> is <VBZ> placed <VBN> in <IN> Microsoft <NNP/S-ORG> located <VBN> in <IN> Redmond <NNP/S-LOC> . <.>
Redmond <NNP/S-PER> is <VBZ> coming <VBG> to <IN> New <NNP/B-LOC> York <NNP/E-LOC> city <NN> . <.>
Redmond <NNP/S-PER> is <VBZ> coming <VBG> to <IN> New <NNP/B-LOC> York <NNP/I-LOC> City <NNP/E-LOC> . <.>
Redmond <NNP/S-PER> is <VBZ> coming <VBG> to <IN> New <NNP/B-LOC> York <NNP/I-LOC> City <NNP/E-LOC> . <.>


In [94]:
# 2. Chunking and Frame detection for sentences 5-6.


for i in range(4,6):
  tagger2.predict(all_sentences[i])
  print(all_sentences[i].to_tagged_string())

Redmond <S-NP> returned <S-VP/return.01> to <S-PP> New <B-NP> York <I-NP> City <E-NP> to <B-VP> return <E-VP/return.02> his <B-NP> hat <E-NP> .
He <S-NP> had <S-VP/have.03> a <B-NP> look <E-NP/look.01> at <S-PP> different <B-NP> hats <E-NP> .


In [95]:
# 3. Sentiment Analysis for sentences 7-8.

for i in range(6,8):
  polarity_classifier.predict(all_sentences[i])
  print(all_sentences[i].labels)

[NEGATIVE (0.9999)]
[NEGATIVE (0.9999)]


## ASSIGNMENT 4 (BONUS)

In this task you will be annotating a movie review at document and sentence level.

1. Open the text in this file: '/content/drive/My Drive/Colab Notebooks/2021-ILTAPP/resources/movie-review.txt'
2. Predict Named Entities and Sentiment for the **whole document**.
3. Predict Named Entities and Sentiment for each of the sentences in the document.
> 3.1 Hint: You will need to segment the document at sentence level using the segtok segmenter and store each sentence as a Sentence object. The final result must be a list of Sentence objects. The segtok segmenter is used as in the following code snippet:

```
from segtok.segmenter import split_single
split_single(docText)
```

4. Print both the sentiment classification output and Named Entities.
5. Spot the differences in the annotations when performed at document and at sentence level. Write the difference at the end of this notebook.
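
One reason to use a dedicated segmenter for step 3 is that splitting on punctuation alone is fragile. The sketch below applies a deliberately naive regular expression (an assumption for illustration, not how segtok works) to a short two-sentence text resembling the review: the abbreviation `Mr.` is wrongly treated as a sentence boundary.

```python
import re

doc_text = ("Once again Mr. Costner has dragged out a movie for far longer "
            "than necessary. His only obstacle appears to be winning over Costner.")

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace
# and an uppercase letter.
naive_sentences = re.split(r"(?<=[.!?])\s+(?=[A-Z])", doc_text)

for number, sent in enumerate(naive_sentences, start=1):
    print(number, sent)
# 1 Once again Mr.
# 2 Costner has dragged out a movie for far longer than necessary.
# 3 His only obstacle appears to be winning over Costner.
```

The text contains two sentences, yet the naive rule produces three pieces, so sentence-level predictions would be computed on a broken fragment.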




In [97]:
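from google.colab import drive
from segtok.segmenter import split_single

drive.mount('/content/drive')

with open("/content/drive/MyDrive/2022-ILTAPP/resources/movie-review.txt", "r") as file:
  data = file.read()

# Document level: treat the whole review as a single Sentence object
tokenized_text = Sentence(data)

ner_tagger.predict(tokenized_text)
polarity_classifier.predict(tokenized_text)

# NOTE: without call parentheses this prints the bound method itself, as the output below shows;
# write to_tagged_string() to print the NER-tagged text instead
print(tokenized_text.to_tagged_string)
print(tokenized_text.labels)

# Sentence level: segment the review and wrap each sentence in a Sentence object
listsentences = [Sentence(sent) for sent in split_single(data)]

ner_tagger.predict(listsentences)
polarity_classifier.predict(listsentences)
for item in listsentences:
    print(item.to_tagged_string())
    print(item.labels)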



Mounted at /content/drive
<bound method Sentence.to_tagged_string of Sentence: "Once again Mr. Costner has dragged out a movie for far longer than necessary . Aside from the terrific sea rescue sequences , of which there are very few I just did not care about any of the characters . Most of us have ghosts in the closet , and Costner 's character are realized early on , and then forgotten until much later , by which time I did not care . The character we should really care about is a very cocky , overconfident Ashton Kutcher . The problem is he comes off as kid who thinks he 's better than anyone else around him and shows no signs of a cluttered closet . His only obstacle appears to be winning over Costner . Finally when we are well past the half way point of this stinker , Costner tells us all about Kutcher 's ghosts . We are told why Kutcher is driven to be the best with no prior inkling or foreshadowing . No magic here , it was all I could do to keep from turning it off an hour in ."