### Word Sense Disambiguation

##### Given the following sentences:
    Project is very difficult for completion in 5 days.
    The project should have proper documentation.
    Please project the document on the screen.
    Project this project in the workshop

In [1]:
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.wsd import lesk
from nltk.corpus import wordnet as wn
from nltk.tag import BrillTaggerTrainer
from nltk.tag.brill import Pos, Word  # feature types used in the Brill templates
from nltk.tbl import Template


In [2]:
sent1 = 'Project is very difficult for completion in 5 days.'
sent2 = 'The project should have proper documentation.'
sent3 = 'Please project the document on the screen.'
sent4 = 'Project this project in the workshop'
word = 'project'

# Part - 1

## Use the Lesk algorithm (`nltk.wsd.lesk`) to find the sense of the word *project* in each of the sentences above. Record your observations.


In [3]:
for ss in wn.synsets(word):
    print(ss, ss.definition())

Synset('undertaking.n.01') any piece of work that is undertaken or attempted
Synset('project.n.02') a planned undertaking
Synset('project.v.01') communicate vividly
Synset('stick_out.v.01') extend out or project in space
Synset('project.v.03') transfer (ideas or principles) from one domain into another
Synset('project.v.04') project on a screen
Synset('project.v.05') cause to be heard
Synset('project.v.06') draw a projection of
Synset('plan.v.03') make or work out a plan for; devise
Synset('project.v.08') present for consideration, examination, criticism, etc.
Synset('visualize.v.01') imagine; conceive of; see in one's mind
Synset('project.v.10') put or send forth
Synset('project.v.11') throw, send, or cast forward
Synset('project.v.12') regard as objective


In [4]:
lesk(sent1, word)  # sent1 is a str, so lesk's context is a set of characters, not words

Synset('project.v.06')

In [5]:
lesk(sent2, word)

Synset('project.v.06')

In [6]:
lesk(sent3, word)

Synset('project.v.06')

In [7]:
lesk(sent4, word)

Synset('visualize.v.01')
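Why do the noun uses of *project* come back as `project.v.06`? `nltk.wsd.lesk` builds `set(context_sentence)` and, for each candidate synset, counts how many whitespace-separated words of its definition occur in that set. The cells above pass raw strings, so the context is a set of single characters and the only definition word that can ever match is the one-letter word "a". Sentences 1 to 3 contain the letter *a* (days, have, Please), so every gloss containing "a" ties at one; sentence 4 contains no *a*, so all senses tie at zero. The pure-Python sketch below mirrors that overlap rule on the glosses copied from the listing above; `simplified_lesk` and `GLOSSES` are hypothetical names, not NLTK code. It breaks ties by the larger synset name, which matches how NLTK's `max` over `(score, synset)` pairs resolves them, and under that assumption it reproduces all four outputs. Passing a token list such as `word_tokenize(sent1)` gives `lesk` a word-level context instead.

```python
# Glosses copied from the wn.synsets('project') listing above
GLOSSES = {
    "undertaking.n.01": "any piece of work that is undertaken or attempted",
    "project.n.02": "a planned undertaking",
    "project.v.01": "communicate vividly",
    "stick_out.v.01": "extend out or project in space",
    "project.v.03": "transfer (ideas or principles) from one domain into another",
    "project.v.04": "project on a screen",
    "project.v.05": "cause to be heard",
    "project.v.06": "draw a projection of",
    "plan.v.03": "make or work out a plan for; devise",
    "project.v.08": "present for consideration, examination, criticism, etc.",
    "visualize.v.01": "imagine; conceive of; see in one's mind",
    "project.v.10": "put or send forth",
    "project.v.11": "throw, send, or cast forward",
    "project.v.12": "regard as objective",
}


def simplified_lesk(context, glosses=GLOSSES, pos=None):
    """Hypothetical sketch of the overlap rule used by nltk.wsd.lesk.

    The context is turned into a set as-is, so a str becomes a set of
    characters while a list of tokens becomes a set of words. `pos` is a
    single WordNet POS letter such as 'n' or 'v', or None for all senses.
    """
    context = set(context)
    # The POS letter sits after the first dot, e.g. 'v' in 'project.v.06'
    candidates = [name for name in glosses
                  if pos is None or name.split(".")[1] == pos]
    if not candidates:
        return None
    # Score = gloss words found in the context; ties go to the larger name
    _, best = max(
        (len(context.intersection(glosses[name].split())), name)
        for name in candidates
    )
    return best


sentences = [
    "Project is very difficult for completion in 5 days.",
    "The project should have proper documentation.",
    "Please project the document on the screen.",
    "Project this project in the workshop",
]
for s in sentences:
    # Raw string (as in the cells above) vs. a crude whitespace tokenization
    print(simplified_lesk(s), simplified_lesk(s.split()))
```

With even a crude whitespace tokenization, the sketch picks `project.v.04` ("project on a screen") for sentence 3, because "project" and "on" both overlap with that gloss.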

# Part - 2

## Tag sentences using Brill Tagger.

### Brill Tagger

#### The BrillTagger class is a transformation-based tagger. The BrillTagger class uses a series of rules to correct the results of an initial tagger. These rules are scored based on how many errors they correct minus the number of new errors they produce.
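The scoring rule can be illustrated with a toy, pure-Python sketch (not NLTK's implementation; `apply_rule` and `score_rule` are hypothetical names). A rule here is a triple `(from_tag, to_tag, prev_tag)`: change `from_tag` to `to_tag` when the previous token is tagged `prev_tag`. The example uses the maxent tagger's initial tags for "Please project the document" and the hand-corrected tags for the same words, both taken from later cells of this notebook.

```python
def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Return a copy of `tags` with from_tag -> to_tag wherever the previous tag is prev_tag.

    All matching positions are found first and then changed together, so a
    change cannot trigger the same rule again within one pass.
    """
    hits = [i for i in range(1, len(tags))
            if tags[i] == from_tag and tags[i - 1] == prev_tag]
    new_tags = list(tags)
    for i in hits:
        new_tags[i] = to_tag
    return new_tags


def score_rule(corpus, rule):
    """Errors corrected minus errors introduced over (initial, gold) tag pairs."""
    fixed = broken = 0
    for initial, gold in corpus:
        for before, after, correct in zip(initial, apply_rule(initial, *rule), gold):
            if before == after:
                continue  # the rule did not touch this token
            if after == correct:
                fixed += 1
            elif before == correct:
                broken += 1
    return fixed - broken


# "Please project the document": maxent initial tags vs. hand-corrected tags
corpus = [(["NN", "NN", "DT", "NN"], ["NN", "VB", "DT", "NN"])]
print(score_rule(corpus, ("NN", "VB", "NN")))  # NN -> VB after NN: fixes 'project' -> 1
print(score_rule(corpus, ("NN", "VB", "DT")))  # NN -> VB after DT: breaks 'document' -> -1
```

A change from one wrong tag to another wrong tag counts as neither a fix nor a break, so it does not affect the score.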

#### The idea is simple: the Brill tagger tries to correct the mistakes made by the initial tagger. It takes an initial tagger and a set of templates, and uses the templates to generate candidate rules automatically from the training set.
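Templates are what turn tagging errors into candidate rules. The sketch below is a simplified, hypothetical version (`candidate_rules` is not an NLTK function) that only supports single-position templates of the form "the tag at offset k is X", roughly what `Pos([k])` expresses in NLTK. At every position where the initial tag is wrong, it instantiates each template with the tag actually found at that offset. NLTK's `Template` additionally supports multi-position features, `Word` features and conjunctions of features.

```python
def candidate_rules(initial, gold, offsets=(-1, 1, -2, 2)):
    """Instantiate 'tag at offset k is X' templates at every tagging error.

    Returns sorted (from_tag, to_tag, offset, trigger_tag) tuples, meaning
    "change from_tag to to_tag when the tag at `offset` is trigger_tag".
    """
    rules = set()
    for i, (cur, correct) in enumerate(zip(initial, gold)):
        if cur == correct:
            continue  # templates are only instantiated where the tagger erred
        for k in offsets:
            j = i + k
            if 0 <= j < len(initial):  # skip offsets that fall outside the sentence
                rules.add((cur, correct, k, initial[j]))
    return sorted(rules)


# "Please project the document": maxent initial tags vs. hand-corrected tags
for rule in candidate_rules(["NN", "NN", "DT", "NN"], ["NN", "VB", "DT", "NN"]):
    print(rule)
```

Each candidate would then be scored over the whole training set (corrections minus new errors) and the best one kept. The first candidate printed, `('NN', 'VB', -1, 'NN')`, has the same shape as the `Rule('000', 'NN', 'VB', [(Pos([-1]),'NN')])` the trainer learns later in this notebook.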

#### Recommended Steps:

##### 1. First tag the sentences with a POS tagger, then observe the POS tags the word *project* receives in different contexts.
##### 2. Then create `tagged_sentence`: a hand-corrected copy of the POS tagger's output with its mistakes fixed.
##### 3. Create a Brill tagger from an initial tagger (the POS tagger) and a set of templates (rule patterns).
##### 4. Train the Brill tagger on the tagged sentences.
##### 5. Test the Brill Tagger.



In [8]:
train_text = '''Project is very difficult for completion in 5 days.
                The project should have proper documentation.
                Please project the document on the screen.
                Project this project in the workshop'''

In [9]:
pos_tags = [nltk.pos_tag(word_tokenize(sentence)) for sentence in sent_tokenize(train_text)]
pos_tags

[[('Project', 'NN'),
  ('is', 'VBZ'),
  ('very', 'RB'),
  ('difficult', 'JJ'),
  ('for', 'IN'),
  ('completion', 'NN'),
  ('in', 'IN'),
  ('5', 'CD'),
  ('days', 'NNS'),
  ('.', '.')],
 [('The', 'DT'),
  ('project', 'NN'),
  ('should', 'MD'),
  ('have', 'VB'),
  ('proper', 'JJ'),
  ('documentation', 'NN'),
  ('.', '.')],
 [('Please', 'NNP'),
  ('project', 'VB'),
  ('the', 'DT'),
  ('document', 'NN'),
  ('on', 'IN'),
  ('the', 'DT'),
  ('screen', 'NN'),
  ('.', '.')],
 [('Project', 'NN'),
  ('this', 'DT'),
  ('project', 'NN'),
  ('in', 'IN'),
  ('the', 'DT'),
  ('workshop', 'NN')]]
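Step 1 asks you to look at the tags *project* receives. A small helper (the name `tags_for` is hypothetical) pulls them out of the `nltk.pos_tag` output above, re-entered here as plain lists so the snippet runs without NLTK. It makes one error visible: in "Project this project in the workshop" the first *Project* is an imperative verb but is tagged `NN`.

```python
# nltk.pos_tag output from the cell above, rebuilt as (token, tag) pairs
tagged = [
    list(zip("Project is very difficult for completion in 5 days .".split(),
             "NN VBZ RB JJ IN NN IN CD NNS .".split())),
    list(zip("The project should have proper documentation .".split(),
             "DT NN MD VB JJ NN .".split())),
    list(zip("Please project the document on the screen .".split(),
             "NNP VB DT NN IN DT NN .".split())),
    list(zip("Project this project in the workshop".split(),
             "NN DT NN IN DT NN".split())),
]


def tags_for(word, tagged_sents):
    """Return (sentence index, token, tag) for every case-insensitive match of `word`."""
    return [(i, tok, tag)
            for i, sent in enumerate(tagged_sents)
            for tok, tag in sent
            if tok.lower() == word.lower()]


for hit in tags_for("project", tagged):
    print(hit)
```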

In [10]:
pic_tagger = nltk.data.load('taggers/maxent_treebank_pos_tagger/english.pickle')
pickle_tagger = [pic_tagger.tag(word_tokenize(sentence)) for sentence in sent_tokenize(train_text)]
pickle_tagger

[[('Project', 'NNP'),
  ('is', 'VBZ'),
  ('very', 'RB'),
  ('difficult', 'JJ'),
  ('for', 'IN'),
  ('completion', 'NN'),
  ('in', 'IN'),
  ('5', 'CD'),
  ('days', 'NNS'),
  ('.', '.')],
 [('The', 'DT'),
  ('project', 'NN'),
  ('should', 'MD'),
  ('have', 'VB'),
  ('proper', 'JJR'),
  ('documentation', 'NN'),
  ('.', '.')],
 [('Please', 'NN'),
  ('project', 'NN'),
  ('the', 'DT'),
  ('document', 'NN'),
  ('on', 'IN'),
  ('the', 'DT'),
  ('screen', 'NN'),
  ('.', '.')],
 [('Project', 'NNP'),
  ('this', 'DT'),
  ('project', 'NN'),
  ('in', 'IN'),
  ('the', 'DT'),
  ('workshop', 'NN')]]
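Comparing the two taggers token by token shows where they disagree. The tags are copied from the two outputs above and `disagreements` is a hypothetical helper. Note that the maxent tagger, which is the initial tagger handed to the Brill trainer, tags *project* in "Please project the document on the screen." as `NN`; this is the error the hand-corrected data fixes.

```python
tokens = [
    "Project is very difficult for completion in 5 days .".split(),
    "The project should have proper documentation .".split(),
    "Please project the document on the screen .".split(),
    "Project this project in the workshop".split(),
]
pos_tag_tags = [  # nltk.pos_tag
    "NN VBZ RB JJ IN NN IN CD NNS .".split(),
    "DT NN MD VB JJ NN .".split(),
    "NNP VB DT NN IN DT NN .".split(),
    "NN DT NN IN DT NN".split(),
]
maxent_tags = [  # maxent_treebank_pos_tagger
    "NNP VBZ RB JJ IN NN IN CD NNS .".split(),
    "DT NN MD VB JJR NN .".split(),
    "NN NN DT NN IN DT NN .".split(),
    "NNP DT NN IN DT NN".split(),
]


def disagreements(tokens, tags_a, tags_b):
    """List (sentence index, token, tag_a, tag_b) wherever the two taggers differ."""
    return [
        (i, tok, a, b)
        for i, (toks, ta, tb) in enumerate(zip(tokens, tags_a, tags_b))
        for tok, a, b in zip(toks, ta, tb)
        if a != b
    ]


for row in disagreements(tokens, pos_tag_tags, maxent_tags):
    print(row)
```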

In [11]:
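# Hand-corrected tags as "word/TAG" strings; str2tuple upper-cases tags, so './None' becomes ('.', 'NONE')
tagged_sentence = [['Project/NNP', 'is/VBZ', 'very/RB', 'difficult/JJ', 'for/IN', 'completion/NN', 'in/IN', '5/CD', 'days/NNS', './None'],
                   ['The/DT', 'project/NN', 'should/MD', 'have/VB', 'proper/JJR', 'documentation/NN', './None'],
                   ['Please/NN', 'project/VB', 'the/DT', 'document/NN', 'on/IN', 'the/DT', 'screen/NN', './None'],
                   ['Project/VB', 'Please/NN', 'project/NN', 'in/IN', 'the/DT', 'workshop/NN'],  # sent4, but with 'Please' in place of 'this'
                   ['Please/NN', 'project/VB', 'this/DT']]  # extra sentence with 'project' as a verb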

In [12]:
# Convert each "word/TAG" string into a (word, TAG) tuple, one list per sentence
training_set = [[nltk.str2tuple(token) for token in sentence] for sentence in tagged_sentence]
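The conversion relies on `nltk.str2tuple`, which splits each string at its last `/` and upper-cases the tag. A stand-in with the same behaviour (`str_to_tuple` is a hypothetical name) shows what that means for the hand-tagged periods: `'./None'` becomes `('.', 'NONE')`, a tag string rather than Python's `None`. This is why the trained tagger later learns a rule that rewrites `.` to `NONE`; writing `'./.'` would keep the Penn Treebank tag for the period.

```python
def str_to_tuple(s, sep="/"):
    """Hypothetical stand-in mirroring nltk.str2tuple.

    Splits at the LAST separator and upper-cases the tag; strings without a
    separator come back with tag None.
    """
    loc = s.rfind(sep)
    if loc >= 0:
        return (s[:loc], s[loc + len(sep):].upper())
    return (s, None)


for item in ["Project/NNP", "./None", "and/or/CC", "untagged"]:
    print(repr(item), "->", str_to_tuple(item))
```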

In [13]:
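def nltk18():
    # The 18 templates of NLTK's original Brill demo (the same list nltk.tag.brill.nltkdemo18() returns)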
    return [
        Template(Pos([-1])),
        Template(Pos([1])),
        Template(Pos([-2])),
        Template(Pos([2])),
        Template(Pos([-2, -1])),
        Template(Pos([1, 2])),
        Template(Pos([-3, -2, -1])),
        Template(Pos([1, 2, 3])),
        Template(Pos([-1]), Pos([1])),
        Template(Word([-1])),
        Template(Word([1])),
        Template(Word([-2])),
        Template(Word([2])),
        Template(Word([-2, -1])),
        Template(Word([1, 2])),
        Template(Word([-3, -2, -1])),
        Template(Word([1, 2, 3])),
        Template(Word([-1]), Word([1])),
    ]

In [14]:
brill_tag_trainer = BrillTaggerTrainer(pic_tagger, nltk18())
brill_tagger = brill_tag_trainer.train(training_set)

In [15]:
brill_tagger.rules()

(Rule('006', '.', 'NONE', [(Pos([-3, -2, -1]),'IN')]),
 Rule('000', 'NN', 'VB', [(Pos([-1]),'NN')]))

In [16]:
[brill_tagger.tag(word_tokenize(sentence)) for sentence in sent_tokenize(train_text)]

[[('Project', 'NNP'),
  ('is', 'VBZ'),
  ('very', 'RB'),
  ('difficult', 'JJ'),
  ('for', 'IN'),
  ('completion', 'NN'),
  ('in', 'IN'),
  ('5', 'CD'),
  ('days', 'NNS'),
  ('.', 'NONE')],
 [('The', 'DT'),
  ('project', 'NN'),
  ('should', 'MD'),
  ('have', 'VB'),
  ('proper', 'JJR'),
  ('documentation', 'NN'),
  ('.', '.')],
 [('Please', 'NN'),
  ('project', 'VB'),
  ('the', 'DT'),
  ('document', 'NN'),
  ('on', 'IN'),
  ('the', 'DT'),
  ('screen', 'NN'),
  ('.', 'NONE')],
 [('Project', 'NNP'),
  ('this', 'DT'),
  ('project', 'NN'),
  ('in', 'IN'),
  ('the', 'DT'),
  ('workshop', 'NN')]]
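The two learned rules read as: change `.` to `NONE` when any of the three preceding tags is `IN` (template 006, `Pos([-3, -2, -1])`), and change `NN` to `VB` when the preceding tag is `NN` (template 000, `Pos([-1])`). Replaying them by hand on the maxent tagger's initial tags reproduces the output above. The sketch is a simplified, hypothetical re-implementation (`replay` is not an NLTK function); mirroring NLTK's `Rule.apply`, it finds all matching positions for a rule before changing any of them, and it applies the rules in the order `rules()` listed them.

```python
def replay(tagged, rules):
    """Apply Brill-style rules in order to a list of (word, tag) pairs.

    Each rule is (from_tag, to_tag, offsets, trigger_tag): it fires at
    position i if the tag there is from_tag and ANY position i + k
    (k in offsets, inside the sentence) carries trigger_tag. Matches are
    collected first and then changed together, one rule at a time.
    """
    tags = [tag for _, tag in tagged]
    for from_tag, to_tag, offsets, trigger in rules:
        hits = [
            i for i, tag in enumerate(tags)
            if tag == from_tag
            and any(0 <= i + k < len(tags) and tags[i + k] == trigger for k in offsets)
        ]
        for i in hits:
            tags[i] = to_tag
    return [(word, tag) for (word, _), tag in zip(tagged, tags)]


# The two learned rules, in the order brill_tagger.rules() printed them
learned = [
    (".", "NONE", (-3, -2, -1), "IN"),  # Rule('006', '.', 'NONE', [(Pos([-3, -2, -1]),'IN')])
    ("NN", "VB", (-1,), "NN"),          # Rule('000', 'NN', 'VB', [(Pos([-1]),'NN')])
]

# Initial tags from the maxent tagger (In [10])
initial = [
    list(zip("Project is very difficult for completion in 5 days .".split(),
             "NNP VBZ RB JJ IN NN IN CD NNS .".split())),
    list(zip("The project should have proper documentation .".split(),
             "DT NN MD VB JJR NN .".split())),
    list(zip("Please project the document on the screen .".split(),
             "NN NN DT NN IN DT NN .".split())),
    list(zip("Project this project in the workshop".split(),
             "NNP DT NN IN DT NN".split())),
]
for sent in initial:
    print(replay(sent, learned))
```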

In [17]:
tagged_sent = brill_tagger.tag(word_tokenize("Project please project this"))

In [18]:
tagged_sent

[('Project', 'NNP'), ('please', 'NN'), ('project', 'VB'), ('this', 'DT')]

# Part - 3

## Repeat Part 1, this time passing the POS tags produced by the Brill tagger.
    

In [19]:
sent1 = 'Project is very difficult for completion in 5 days.'
sent2 = 'The project should have proper documentation.'
sent3 = 'Please project the document on the screen.'
sent4 = 'Project this project in the workshop'
word = 'project'

In [20]:
lesk(context_sentence=sent1.lower(), ambiguous_word='project', pos='n').definition()

'a planned undertaking'

In [21]:
lesk(context_sentence=sent2.lower(), ambiguous_word='project', pos='n').definition()

'a planned undertaking'

In [22]:
lesk(context_sentence=sent3.lower(), ambiguous_word='project', pos='v').definition()

'draw a projection of'

In [23]:
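# Returns None (hence no output): pos must be a single letter such as 'v' or 'n', and a list never matches
lesk(context_sentence=sent4.lower(), ambiguous_word='project', pos=['v', 'n'])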

In [24]:
def transform(penn_tag):
    # Map a Penn Treebank tag to a WordNet POS: 'V...' tags -> verb, everything else -> noun
    tag = wn.NOUN
    if penn_tag.startswith('V'):
        tag = wn.VERB
    return tag
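`transform` treats every non-verb tag as a noun, so an adjective or adverb would be looked up among noun senses. A slightly fuller mapping is sketched below; `penn_to_wordnet` is a hypothetical name, and the letters it returns are the values of `wn.NOUN`, `wn.VERB`, `wn.ADJ` and `wn.ADV`. Returning `None` for unmapped tags lets `lesk` consider every sense instead of forcing a noun reading.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if unmapped.

    'n', 'v', 'a' and 'r' equal nltk's wn.NOUN, wn.VERB, wn.ADJ and wn.ADV.
    """
    if tag.startswith("V"):
        return "v"
    if tag.startswith("J"):
        return "a"
    if tag.startswith("RB"):  # RB, RBR, RBS; excludes the particle tag RP
        return "r"
    if tag.startswith("N"):
        return "n"
    return None  # e.g. DT, IN: let lesk consider all senses


for t in ["VB", "NNP", "JJR", "RB", "RP", "DT"]:
    print(t, "->", penn_to_wordnet(t))
```

One caveat: WordNet marks satellite adjectives with `'s'`, and `lesk` compares the POS string exactly, so filtering on `'a'` skips those senses.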

In [25]:
# sentence is a plain str, so lesk builds its context from characters, not words
sentence = ' '.join(token for token, _ in tagged_sent)
for token, tag in tagged_sent:
    if token.lower() == 'project':
        print(lesk(context_sentence=sentence, ambiguous_word=token.lower(), pos=transform(tag)).definition())

a planned undertaking
draw a projection of
