---
## Section 3: The Lesk algorithm
---
We now have all the necessary building blocks for the actual algorithm. Here is a plain-text pseudocode representation of running Simplified Lesk on a single word:

```
target = target word for disambiguation                                  (1)
context = context words for disambiguation                                |

for all senses of target word                                            (2)
    words_definition = words from the definition of the sense            (3)
    words_examples = words from the example sentences of the sense        |
    
    overlap = calculate vocabulary overlap between                       (4)
              context and definition/examples                             |
    
    if overlap is higher than before                                     (5)
        remember current sense as the best option                         |
```

---

**Ex 3.1** Let's implement the algorithm one step at a time. The numbers on the right-hand side of the pseudocode mark the lines corresponding to each step of this exercise.

---

**Step 1.** In the first two lines of our pseudocode we initialize the target word we want to disambiguate as well as the context we will use. Your first task is to initialize these two things. Use the sentence "*time flies like an arrow*" and disambiguate the word "*time*". In the code cell below, initialize a variable `target` to contain the string "time". Also initialize a variable `context` that contains the context words as a Python set. You can find examples of how to do this in the review section.
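
As a reminder of the string-to-set pattern, here is a minimal sketch on a different sentence. The sentence and the `example_` variable names are made up for illustration and are not part of the exercise.

```python
# Hypothetical example sentence, not the one used in the exercise.
example_sentence = "the bank raised interest rates"

# str.split() with no arguments splits on any run of whitespace.
example_words = example_sentence.split()

# set() keeps one copy of each word and discards the order.
example_context = set(example_words)

print(example_context)
```

Because sets are unordered, the words may print in a different order each time you run the cell.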


**Step 2.** The next step in the algorithm is to go through all the senses of the target word. Remember, you can access all the senses (synsets) of a word with `wn.synsets(word)`, which returns a list of synset objects. The `for`-loop for iterating through the senses is already given. To make sure you got this right, you can print out the synset objects. The output should look something like this:

    Synset('time.n.01')
    Synset('time.n.02')
        .  .  .
            
**Step 3.** You now need to gather the words in the definition and the examples of the sense, so that we can compare them to the context words. Recall the two methods of the Synset object we saw before, `definition()` and `examples()`. We have already extracted the words from the examples for you, so only the definition is left. We also combine the two sets of words using the `union` method of the set class.
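
The sketch below shows the same word-gathering pattern on hand-written data, so it runs without WordNet. The `toy_` strings are hypothetical stand-ins: a single string plays the role of a definition and a list of strings plays the role of the examples.

```python
# Hypothetical stand-ins for a definition (one string)
# and a list of example sentences (a list of strings).
toy_definition = "a financial institution that accepts deposits"
toy_examples = ["he cashed a check at the bank", "the bank is closed"]

# Split the definition into words and keep the unique ones.
toy_definition_words = set(toy_definition.split())

# Join the example sentences into one string, then split into unique words.
toy_example_words = set(" ".join(toy_examples).split())

# union() returns a new set with every word that occurs in either set.
toy_all_words = toy_definition_words.union(toy_example_words)

print(len(toy_all_words))
```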

**Step 4.** In the fourth step you need to calculate the overlap between the context and the definition/examples. The set class offers another convenient method, `a.intersection(b)`, that returns the set of elements found in both `a` and `b`. Once you have the set of common words, you can count its elements using the familiar `len` function.
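
Here is a small illustration of `intersection` and `len` on two hand-made sets. The `toy_` sets are invented for this example and do not come from WordNet.

```python
# Two hypothetical word sets: a context and the words describing one sense.
toy_context = {"the", "bank", "raised", "interest", "rates"}
toy_sense_words = {"a", "bank", "pays", "interest", "on", "deposits"}

# intersection() returns a new set holding only the shared words.
toy_common = toy_context.intersection(toy_sense_words)

# len() counts the shared words; this count is the overlap score.
toy_overlap = len(toy_common)

print(toy_common, toy_overlap)
```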

**Step 5.** In the final step you need to compare this overlap to the previous overlaps. For this we initialized the variables `best_sense` and `best_overlap` for you at the start of the code cell. If the current overlap is greater than `best_overlap`, assign the current sense to the `best_sense` variable and update the `best_overlap` variable to keep track of the best one so far. Use the supplied `if`-statement for this.
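
The "keep the best one so far" pattern can be sketched on its own with made-up scores. The labels and numbers below are hypothetical and only stand in for senses and their overlaps.

```python
# Hypothetical (label, score) pairs standing in for senses and their overlaps.
toy_scores = [("sense_a", 1), ("sense_b", 3), ("sense_c", 3), ("sense_d", 0)]

toy_best_label = None
toy_best_score = -1  # lower than any possible overlap, so the first item always wins

for label, score in toy_scores:
    # The strict '>' keeps the earlier item when two scores tie.
    if score > toy_best_score:
        toy_best_label = label
        toy_best_score = score

print(toy_best_label, toy_best_score)
```

Note that with a strict comparison, ties are resolved in favour of whichever item comes first in the list.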


You can now uncomment the `print`-statement on the final line of the code cell to check the correctness of your algorithm. The output of the cell should be:

    time time.v.05 adjust so that a force is applied and an action occurs at the desired time
    
Feel free to play around with different target words and sentences by changing the variables `target` and `sentence`. How well does the algorithm work?

```python
# We need these variables in the fifth step.
# Don't worry about them until then.
best_sense = None
best_overlap = -1


# Step 1. TODO: Initialize target word and context here.
# 'context' should be a set containing the words in 'sentence'
target = ""
sentence = "time flies like an arrow"
context = set()


# Step 2. TODO: Get the list of senses using wn.synsets()
senses = []

# Iterate through all the senses using a for-loop
for sense in senses:
    # You can print out the 'sense' objects here to
    # check correctness of step 2. You can comment this
    # out when you are finished with this step to declutter
    # the output
    print(sense)


    # Step 3. TODO: Retrieve the definition of the
    # sense under consideration and turn it into a list
    definition_as_string = ""
    definition_as_list = []
    # Step 3. TODO: Represent the words as a set
    words_in_definition = set()

    # Here we give you the words in the examples
    words_in_examples = set(" ".join(sense.examples()).split())
    # And here we combine the two sets of words
    words_in_both = words_in_definition.union(words_in_examples)


    # Step 4. TODO: Use the 'intersection' method here to get the
    # common words in 'context' and 'words_in_both'. See how
    # the 'union' method was used above. The intersection
    # works in a similar way.
    words_overlapping = set()

    # Step 4. TODO: Use the 'len' function to calculate the number of
    # overlapping words and assign the number to the variable 'overlap'
    overlap = -1


    if overlap > best_overlap:
        # Step 5. TODO: Update these accordingly
        best_sense = None
        best_overlap = -1


# Uncomment the print-statement below when you are done with
# steps 1-5 to see if you implemented the algorithm correctly.

# print(target, best_sense.name(), best_sense.definition())
```

You can now move on to Part 2 of this lab.