### Word Sense Disambiguation

In [1]:
from nltk import word_tokenize, sent_tokenize
from nltk.corpus import stopwords, state_union, wordnet
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import PunktSentenceTokenizer
from nltk.wsd import lesk
import nltk
from nltk.tag.brill import BrillTagger, Pos, Word
from nltk.tbl import Template

Given the following sentences:

    The agent will book the to the show for the entire family.
    But you can generally book tickets online.
    When you book tickets online they provide you with a book of stamps
    
In the sentences above, the word *book* is used in different contexts. In the first two sentences and the first half of the third, *book* (verb) means 'reserve', while in the second half of the third sentence *book* (noun) refers to a physical object.

## Part - 1

Use the Lesk module to find the sense (synset) of the word *book* in each of the sentences above. Record your observations.
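
The idea behind Lesk can be sketched in a few lines: pick the sense whose dictionary gloss shares the most words with the context. This is a simplified illustration, not NLTK's implementation. The sense names and glosses in `TOY_SENSES` are made up for this example and are not WordNet definitions, and `simplified_lesk` is a hypothetical helper.

```python
# Toy sense inventory: these glosses are invented for illustration,
# they are NOT taken from WordNet.
TOY_SENSES = {
    "book.v.reserve": "arrange for and reserve tickets or a seat in advance",
    "book.n.volume": "a written work or a bound set of pages or stamps",
}


def simplified_lesk(context_tokens, senses):
    """Return the sense whose gloss shares the most words with the context."""
    context = set(token.lower() for token in context_tokens)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        # Overlap = number of distinct words common to context and gloss.
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


verb_choice = simplified_lesk(
    "But you can generally book tickets online".split(), TOY_SENSES
)
noun_choice = simplified_lesk(
    "they provide you with a book of stamps".split(), TOY_SENSES
)
print(verb_choice)
print(noun_choice)
```

Because the choice depends only on word overlap, short contexts and short glosses can easily produce ties or wrong senses, which is worth keeping in mind when recording observations.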
    
## Part - 2

Tag sentences using Brill Tagger.

### Brill Tagger

The BrillTagger class is a **transformation-based tagger**. The BrillTagger class uses a series
of rules to correct the results of an initial tagger. These rules are scored based on how many
errors they correct minus the number of new errors they produce.

The idea is simple: the Brill tagger tries to correct the mistakes made by the initial tagger. It takes an initial tagger and a set of templates, which tell the trainer how to generate new rules automatically from the training set.
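
The scoring described above (errors corrected minus new errors introduced) can be sketched on a toy example. This is a minimal illustration under stated assumptions, not the NLTK trainer: the tag sequences are invented, and `apply_rule` and `rule_score` are hypothetical helpers that handle only one kind of rule ("change tag A to tag B when the previous tag is C").

```python
def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag wherever the preceding tag is prev_tag."""
    new_tags = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            new_tags[i] = to_tag
    return new_tags


def rule_score(initial_tags, gold_tags, new_tags):
    """Return (fixed, broken, fixed - broken) for one candidate rule."""
    triples = list(zip(initial_tags, gold_tags, new_tags))
    fixed = sum(1 for init, gold, new in triples if init != gold and new == gold)
    broken = sum(1 for init, gold, new in triples if init == gold and new != gold)
    return fixed, broken, fixed - broken


# Invented data: "He will book tickets" followed by "A can opener".
words = ["He", "will", "book", "tickets", "A", "can", "opener"]
initial = ["PRP", "MD", "NN", "NNS", "DT", "MD", "NN"]  # initial tagger output
gold = ["PRP", "MD", "VB", "NNS", "DT", "NN", "NN"]     # hand-corrected tags

# Candidate rule: NN -> VB when the previous tag is MD.
after_rule = apply_rule(initial, "NN", "VB", "MD")
fixed, broken, score = rule_score(initial, gold, after_rule)
print(fixed, broken, score)
```

Here the rule repairs "book" but damages "opener", so its net score is zero. A trainer that ranks rules by this score would prefer a rule that fixes the same error without the side effect.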

**Recommended Steps:**

1. First tag the sentences with the default POS tagger, then observe the tags assigned to the word *book* in each context.
2. Build a tagged sentence from that output, manually correcting the mistakes the tagger made.
3. Create a Brill tagger from an initial tagger (the POS tagger) and a set of templates (rule patterns).
4. Train the Brill tagger on the corrected tagged sentence.
5. Test the Brill tagger on the following sentences:
       > "I bought this book from Kerala"
       > "He will book tickets to Kerala"
       
## Part - 3

Repeat Part 1, this time passing Lesk the POS tags produced by the Brill tagger.
    

## Part 1

In [2]:
str1 = 'The agent will book the to the show for the entire family.'
str2 = 'But you can generally book tickets online.'
str3 = 'When you book tickets online they provide you with a book of stamps'

In [3]:
print(lesk(str1, 'book'))
print(lesk(str2, 'book'))
print(lesk(str3, 'book'))

Synset('script.n.01')
Synset('script.n.01')
Synset('script.n.01')
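
One likely reason all three calls return the same synset: as far as I can tell from the NLTK source, `lesk` builds its context with `set(context_sentence)`, so it expects a list of tokens. Passing a raw string would then yield a set of single characters. The sketch below shows the difference in plain Python. The gloss is a made-up stand-in, not a WordNet definition, and `str.split` stands in for a real tokenizer.

```python
sentence = 'But you can generally book tickets online.'

# Hypothetical gloss for the 'reserve' sense, invented for this example.
gloss_words = set('reserve tickets or a seat in advance'.split())

# Raw string as context: iterating a string yields characters.
char_context = set(sentence)
char_overlap = char_context & gloss_words

# Token list as context: iterating a list yields whole words.
token_context = set(sentence.lower().split())
token_overlap = token_context & gloss_words

print(sorted(char_overlap))
print(sorted(token_overlap))
```

With a character-level context, only one-letter gloss words such as "a" can ever match, so the sentence content barely influences the result. Tokenizing first (for example with `word_tokenize`) gives the overlap count something meaningful to work with.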


In [19]:
for i in wordnet.synsets('project'):
    print(i, i.definition())

Synset('undertaking.n.01') any piece of work that is undertaken or attempted
Synset('project.n.02') a planned undertaking
Synset('project.v.01') communicate vividly
Synset('stick_out.v.01') extend out or project in space
Synset('project.v.03') transfer (ideas or principles) from one domain into another
Synset('project.v.04') project on a screen
Synset('project.v.05') cause to be heard
Synset('project.v.06') draw a projection of
Synset('plan.v.03') make or work out a plan for; devise
Synset('project.v.08') present for consideration, examination, criticism, etc.
Synset('visualize.v.01') imagine; conceive of; see in one's mind
Synset('project.v.10') put or send forth
Synset('project.v.11') throw, send, or cast forward
Synset('project.v.12') regard as objective


## Part 2

In [29]:
sentences = """Please project this on the screen.
This is an interesting project."""

In [30]:
[lesk(s, 'project') for s in sent_tokenize(sentences)]

[Synset('project.v.06'), Synset('project.v.06')]

In [31]:
tagged_sentences = [nltk.pos_tag(word_tokenize(sentence)) for sentence in sent_tokenize(sentences)]

In [32]:
tagged_sentences

[[('Please', 'VB'),
  ('project', 'NN'),
  ('this', 'DT'),
  ('on', 'IN'),
  ('the', 'DT'),
  ('screen', 'NN'),
  ('.', '.')],
 [('This', 'DT'),
  ('is', 'VBZ'),
  ('an', 'DT'),
  ('interesting', 'JJ'),
  ('project', 'NN'),
  ('.', '.')]]

In [33]:
tagger = nltk.data.load('taggers/maxent_treebank_pos_tagger/PY3/english.pickle')

In [34]:
tagged_sent = ' '.join([nltk.tag.tuple2str(j) for i in tagged_sentences for j in i])
tagged_sent

'Please/VB project/NN this/DT on/IN the/DT screen/NN ./. This/DT is/VBZ an/DT interesting/JJ project/NN ./.'

In [35]:
training_sentence = """Please/VB project/VB this/DT on/IN the/DT screen/NN ./. This/DT is/VBZ an/DT interesting/JJ project/NN"""

In [36]:
training_data = [nltk.tag.str2tuple(w) for w in training_sentence.split(' ')]
training_data

[('Please', 'VB'),
 ('project', 'VB'),
 ('this', 'DT'),
 ('on', 'IN'),
 ('the', 'DT'),
 ('screen', 'NN'),
 ('.', '.'),
 ('This', 'DT'),
 ('is', 'VBZ'),
 ('an', 'DT'),
 ('interesting', 'JJ'),
 ('project', 'NN')]

In [37]:
def getTemplates():
    return [
        Template(Pos([-1])),
        Template(Pos([1])),
        Template(Pos([-2])),
        Template(Pos([2])),
        Template(Pos([-2, -1])),
        Template(Pos([1, 2])),
        Template(Pos([-3, -2, -1])),
        Template(Pos([1, 2, 3])),
        Template(Pos([-1]), Pos([1])),
        Template(Word([-1])),
        Template(Word([1])),
        Template(Word([-2])),
        Template(Word([2])),
        Template(Word([-2, -1])),
        Template(Word([1, 2])),
        Template(Word([-3, -2, -1])),
        Template(Word([1, 2, 3])),
        Template(Word([-1]), Word([1])),
    ]


In [56]:
tt = nltk.tag.brill_trainer.BrillTaggerTrainer(tagger, getTemplates())

In [58]:
trained_tagger = tt.train([training_data])

In [59]:
trained_tagger.rules()

(Rule('050', 'NN', 'VB', [(Word([1, 2]),'this')]),)
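
The learned rule reads as: change `NN` to `VB` when the word "this" appears one or two positions to the right. The sketch below applies that logic by hand to the initial tagging of the first training sentence. It is an illustration of how I read the rule, not NLTK's own rule application, and `apply_word_rule` is a hypothetical helper.

```python
def apply_word_rule(tagged, from_tag, to_tag, offsets, trigger_word):
    """Retag from_tag as to_tag if trigger_word occurs at any given offset."""
    result = list(tagged)
    for i, (word, tag) in enumerate(tagged):
        if tag != from_tag:
            continue
        for offset in offsets:
            j = i + offset
            # Skip offsets that fall outside the sentence.
            if 0 <= j < len(tagged) and tagged[j][0] == trigger_word:
                result[i] = (word, to_tag)
                break
    return result


# Initial tagging of the first sentence, copied from the output above.
initial = [('Please', 'VB'), ('project', 'NN'), ('this', 'DT'), ('on', 'IN'),
           ('the', 'DT'), ('screen', 'NN'), ('.', '.')]

corrected = apply_word_rule(initial, 'NN', 'VB', (1, 2), 'this')
print(corrected)
```

Only "project" is retagged, because "screen" has no "this" within two positions to its right. The rule is tied to the literal word "this", so it will not fire on test sentences such as "Please project it on the big screen", which is a consequence of training on a single sentence pair.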

In [101]:
test_sentence = """The agent will project it tomorrow. Please project it on the big screen. Let's do this project. Project on a screen"""

In [102]:
test_sent_tagged = [trained_tagger.tag(word_tokenize(s)) for s in sent_tokenize(test_sentence)]

## Part 3

In [99]:
brill_tagged_sent = ' '.join([nltk.tag.tuple2str(j) for i in test_sent_tagged for j in i])
brill_tagged_sent

"The/DT agent/NN will/MD project/VB it/PRP tomorrow/NN ./. Please/NN project/NN it/PRP on/IN the/DT big/JJ screen/NN ./. Let/NNP 's/POS do/VBP this/DT project/NN ./. Project/NNP the/DT presesntation/NN on/IN the/DT screen.Project/JJ on/IN a/DT screen/NN"

In [103]:
for i, j in zip(sent_tokenize(test_sentence), ['v', 'v', 'n', 'v']):
    print(lesk(i.lower(), 'project', pos=j))

Synset('project.v.06')
Synset('project.v.06')
Synset('undertaking.n.01')
Synset('project.v.06')
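
The cell above passes a hand-written list of POS letters. To feed Lesk the tags the Brill tagger actually produced, the Penn Treebank tags would first need to be mapped to the single-letter codes that the `pos` argument takes. A minimal sketch of such a mapping follows. `penn_to_wordnet` and `pos_for_word` are hypothetical helpers, and the two tagged sentences are copied from the Brill output shown earlier.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if unmapped."""
    if tag.startswith('JJ'):
        return 'a'  # adjective
    if tag.startswith('VB'):
        return 'v'  # verb
    if tag.startswith('NN'):
        return 'n'  # noun
    if tag.startswith('RB'):
        return 'r'  # adverb
    return None


def pos_for_word(tagged_sentence, target):
    """Return the WordNet POS letter of the first occurrence of target."""
    for word, tag in tagged_sentence:
        if word.lower() == target:
            return penn_to_wordnet(tag)
    return None


first = [('The', 'DT'), ('agent', 'NN'), ('will', 'MD'), ('project', 'VB'),
         ('it', 'PRP'), ('tomorrow', 'NN'), ('.', '.')]
second = [('Please', 'NN'), ('project', 'NN'), ('it', 'PRP'), ('on', 'IN'),
          ('the', 'DT'), ('big', 'JJ'), ('screen', 'NN'), ('.', '.')]

first_pos = pos_for_word(first, 'project')
second_pos = pos_for_word(second, 'project')
print(first_pos, second_pos)
```

The resulting letter could then be passed as `pos=` to `lesk`. Note that the second sentence yields a noun tag because the Brill tagger mislabeled "project" there, so any tagging error carries straight through to the sense Lesk is allowed to choose.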


In [87]:
[lesk(i, 'project') for i in sent_tokenize(test_sentence)]

[Synset('project.v.06'),
 Synset('project.v.06'),
 Synset('visualize.v.01'),
 Synset('project.v.06')]

In [88]:
[(i, i.definition()) for i in wordnet.synsets('project')]

[(Synset('undertaking.n.01'),
  'any piece of work that is undertaken or attempted'),
 (Synset('project.n.02'), 'a planned undertaking'),
 (Synset('project.v.01'), 'communicate vividly'),
 (Synset('stick_out.v.01'), 'extend out or project in space'),
 (Synset('project.v.03'),
  'transfer (ideas or principles) from one domain into another'),
 (Synset('project.v.04'), 'project on a screen'),
 (Synset('project.v.05'), 'cause to be heard'),
 (Synset('project.v.06'), 'draw a projection of'),
 (Synset('plan.v.03'), 'make or work out a plan for; devise'),
 (Synset('project.v.08'),
  'present for consideration, examination, criticism, etc.'),
 (Synset('visualize.v.01'), "imagine; conceive of; see in one's mind"),
 (Synset('project.v.10'), 'put or send forth'),
 (Synset('project.v.11'), 'throw, send, or cast forward'),
 (Synset('project.v.12'), 'regard as objective')]

In [108]:
lesk(context_sentence="This was projected on the screen", ambiguous_word='project', pos='v').definition()

'draw a projection of'