**NOTA**: questa esercitazione non è da considerarsi completa

# Esercitazione

## Mapping di Frame in WordNet synset

In [7]:
import hashlib
from random import randint
from random import seed
import string
from nltk.corpus import framenet as fn, stopwords, wordnet as wn

Il seguente metodo, fornito a lezione, permette di generare un set di ID di FrameNet da usare durante l'esercitazione.

In [3]:
def get_frame_set_for_student(surname, list_len=5):
    nof_frames = len(fn.frames())
    base_idx = (abs(int(hashlib.sha512(surname.encode('utf-8')).hexdigest(), 16)) % nof_frames)
    print('\nstudent: ' + surname)
    framenet_IDs = [f.ID for f in fn.frames()]
    i = 0
    offset = 0
    seed(1)
    my_ids = []
    while i < list_len:
        fID = framenet_IDs[(base_idx + offset) % nof_frames]
        my_ids.append(fID)
        f = fn.frame(fID)
        fNAME = f['name']
        print(f'\tID: {fID:4d}\tframe: {fNAME}')
        offset = randint(0, nof_frames)
        i += 1
    return my_ids

In [4]:
ids = get_frame_set_for_student('Mario Bifulco')


student: Mario Bifulco
	ID:   12	frame: Feigning
	ID: 2615	frame: Noncombatant
	ID: 2622	frame: Endeavor_failure
	ID: 2664	frame: Inhibit_motion_scenario
	ID:   31	frame: Scrutiny


Per evitare di svolgere più chiamate a FrameNet, ho deciso di riorganizzare le informazioni rilevanti in una semplice classe che raccoglie tutte le parole da disambiguare del frame.

In [6]:
class MFrame:
    def __init__(self, frame_id):
        self.id = frame_id
        self.name = fn.frame(frame_id)['name']
        self.fes = []
        for fe in fn.frame(frame_id)['FE']:
            self.fes.append(fe)
        self.lus = []
        for lu in fn.frame(frame_id)['lexUnit']:
            self.lus.append(lu)

    def __str__(self):
        return f'{self.name}:\n\tFEs: {self.fes}\n\tLUs: {self.lus}'

    def __repr__(self):
        return str(self)

    def get_words(self):
        res = [self.name]
        res.extend(self.fes)
        res.extend(self.lus)
        return res

Per trovare il senso a partire dal termine, e dal contesto, utilizzo l'algoritmo Lesk con approccio bag-of-words, rimuovendo le stopwords come operazione di preprocessing.

In [11]:
def remove_stopwords(phrase):
    phrase = phrase.split()
    for p in string.punctuation:
        phrase = {item.replace(p, '') for item in phrase}
    phrase = {item.replace('\'s', '') for item in phrase}
    stop = stopwords.words('english')
    return {t for t in phrase if t not in stop}

In [12]:
def lesk(word, ctx_w):
    split_w = word.split('.')
    pos = None
    if len(split_w) == 2:
        word = split_w[0]
        pos = split_w[1]
    synset = wn.synsets(word, pos=pos)
    if synset is None or len(synset) == 0:
        return None
    guess = synset[0]
    max_overlap = 0
    ctx_w = remove_stopwords(ctx_w)
    for sense in synset:
        ctx_s = remove_stopwords(sense.definition())
        ext_s = sense.hypernyms()
        ext_s.extend(sense.hyponyms())
        for s in ext_s:
            ctx_s.add(str(s).split('.')[0][8:])
        for ex in sense.examples():
            ctx_s.union(remove_stopwords(ex))
        overlap = len(ctx_s.intersection(ctx_w))
        if overlap > max_overlap:
            max_overlap = overlap
            guess = sense
    return guess

In [13]:
frame_list = []
for idx in ids:
    frame_list.append(MFrame(idx))
for f in frame_list:
    ctx = fn.frame(f.id)['definition'].replace('_', ' ')
    for fe in f.fes:
        ctx += f' {fe.replace("_", " ")}'
    words = f.get_words()

    for i, w in enumerate(words):
        best_sense = lesk(w, ctx)
        print(f'{i:3d}: {w:25s} - {str(best_sense):25s}')

  0: Feigning                  - Synset('pretense.n.02')  
  1: Agent                     - Synset('agent.n.02')     
  2: Original                  - Synset('original.s.02')  
  3: Copy                      - Synset('transcript.n.02')
  4: Manner                    - Synset('manner.n.02')    
  5: Means                     - Synset('means.n.01')     
  6: Degree                    - Synset('degree.n.01')    
  7: State_of_affairs          - Synset('situation.n.01') 
  8: Purpose                   - Synset('purpose.n.01')   
  9: Explanation               - Synset('explanation.n.01')
 10: Frequency                 - Synset('frequency.n.01') 
 11: Time                      - Synset('time.n.06')      
 12: Period_of_iterations      - None                     
 13: Duration                  - Synset('duration.n.01')  
 14: Depictive                 - Synset('delineative.s.01')
 15: Circumstances             - Synset('fortune.n.04')   
 16: Place                     - Synset('place.n.02') 

I risultati ottenuti in output dall'algoritmo vanno quindi confrontati con le annotazioni svolte manualmente, di seguito sono riportati i sensi relativi ai termini dei primi due frame dell'esercitazione.

| id   | word             | sense                 | real annotation           |
|------|------------------|-----------------------|---------------------------|
| 12   | Feigning         | feign.v.01            |                           |
| 12   | Agent            | agent.n.02            |                           |
| 12   | Original         | original.n.02         |                           |
| 12   | Copy             | copy.n.02             |                           |
| 12   | Manner           | manner.n.01           |                           |
| 12   | Means            | means.n.01            |                           |
| 12   | Degree           | degree.n.01           |                           |
| 12   | State of affairs | state_of_affairs.n.01 |                           |
| 12   | Purpose          | purpose.n.03          |                           |
| 12   | Explanation      | explanation.n.01      |                           |
| 12   | Frequency        | frequency.n.03        |                           |
| 12   | Time             | time.n.05             |                           |
| 12   | iteration        | iteration.n.03        | Period_of_iterations      |
| 12   | Duration         | duration.n.01         |                           |
| 12   | Depictive        | deceptive.a.01        |                           |
| 12   | Circumstances    | circumstance.n.02     |                           |
| 12   | Place            | place.n.05            |                           |
| 12   | counterfeit.v    | counterfeit.v.01      |                           |
| 12   | fake.v           | fake.v.02             |                           |
| 12   | feign.v          | feign.v.01            |                           |
| 12   | stage.v          | stage.v.02            |                           |
| 12   | affect.v         | affect.v.04           |                           |
| 12   | pretend.v        | pretend.v.03          |                           |
| 12   | simulate.v       | simulate.v.01         |                           |
| 2615 | Noncombatant     | noncombatant.n.01     |                           |
| 2615 | Origin           | origin.n.01           |                           |
| 2615 | Person           | person.n.01           |                           |
| 2615 | characteristic   | characteristic.n.01   | Persistent_characteristic |
| 2615 | Descriptor       | descriptor.n.01       |                           |
| 2615 | Age              | age.v.01              |                           |
| 2615 | Ethnicity        | ethnicity.n.01        |                           |
| 2615 | context          | context.n.02          | Context_of_acquaintance   |
| 2615 | Conflict         | conflict.n.03         |                           |
| 2615 | noncombatant.n   | noncombatant.n.01     | non-combatant.n           |
| 2615 | civilian.n       | civilian.n.01         |                           |
| 2615 | civilian         | civilian.n.01         | civvie.n                  |
