# Lab2.6: Words, concepts, semantic relations in FrameNet-NLTK

Copyright, Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

FrameNet is a database of situation semantics developed at the International Computer Science Institute (ICSI) in Berkeley under the leadership of Charles Fillmore:

https://framenet.icsi.berkeley.edu

FrameNet provides over a thousand frames that represent conceptual schemata for events involving participants in certain roles.
We are going to use the FrameNet module in the NLTK package to assign frames to the main predicates of sentences, as identified by spaCy.


## 1. FrameNet in NLTK

In [1]:
import nltk

We assume you have NLTK already installed. To use the FrameNet module, you need to download FrameNet within it. Run the following cell to download it within NLTK. If you have already done this before, FrameNet is included in NLTK and you do not need to download it again.

In [2]:
nltk.download('framenet_v17')

[nltk_data] Downloading package framenet_v17 to
[nltk_data]     /Users/piek/nltk_data...
[nltk_data]   Package framenet_v17 is already up-to-date!


True
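Instead of commenting out the download cell by hand, the download can be made conditional. The following is a minimal sketch and not part of the original lab: the helper name `ensure_framenet` and its `find`/`download` parameters are made up for illustration, and it assumes that `nltk.data.find` raises a `LookupError` for a missing resource and that the unpacked corpus is found under `corpora/framenet_v17`.

```python
def ensure_framenet(find=None, download=None):
    """Download FrameNet 1.7 only if NLTK cannot find it locally.

    `find` and `download` default to nltk.data.find and nltk.download;
    they are parameters only so the logic can be tried without NLTK.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the defaults are needed
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find('corpora/framenet_v17')  # assumed resource path
    except LookupError:
        download('framenet_v17')
        return True
    return False
```

Calling `ensure_framenet()` without arguments would then replace the plain `nltk.download('framenet_v17')` call.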

After a successful download you can comment out the previous cell.

To check whether the installation was successful, the following code cell should run without errors:

In [3]:
from nltk.corpus import framenet as fn
len(fn.frames())

1221

Now you know how many different frames there are.

There are some instructions on how to use FrameNet in NLTK, although they are quite sparse:

http://www.nltk.org/howto/framenet.html


In [4]:
### get the frame identifier for a specific Frame
print(fn.frames('Killing'))

[<frame ID=590 name=Killing>]


In [8]:
### get the frames for a specific lemma and print all information for each
word = 'inject'
frames = fn.frames_by_lemma(word)
for frame in frames:
    print(frame)

frame (262): Abounding_with

[URL] https://framenet2.icsi.berkeley.edu/fnReports/data/frame/Abounding_with.xml

[definition]
  A Location is filled or covered with the Theme.  The Location is
  realized as the External Argument, and the Theme either as PP
  complement headed by with, in or of.  NB:  This frame does not
  include uses of adjectives like paved when they merely specify
  the Type of some location, as in "paved and unpaved roads".  'The
  waters of the bay teemed with fish.' 'The waters of the bay were
  teeming with fish.' 'The road was completely covered in mud.'

[semTypes] 0 semantic types

[frameRelations] 7 frame relations
  <Parent=Abounding_with -- Inheritance -> Child=Lively_place>
  <Parent=Locative_relation -- Inheritance -> Child=Abounding_with>
  <Parent=Abounding_with -- Using -> Child=Expensiveness>
  <Parent=Abounding_with -- Using -> Child=Mass_motion>
  <Parent=Abundance -- Using -> Child=Abounding_with>
  <MainEntry=Distributed_position -- See_also -> Re

Take your time to read the information, which is quite rich.

In [16]:
### get frames with the substring 'medical' regardless of case
frames = fn.frames(r'(?i)medical')
for frame in frames:
    print(frame.name)


Medical_conditions
Medical_instruments
Medical_interaction_scenario
Medical_intervention
Medical_professionals
Medical_specialties
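The argument of `fn.frames` is a regular expression that is searched for inside the frame names, which is why `(?i)medical` returns six frames. A call such as `fn.frames('Killing')` could therefore also return several frames if other names contained that string. The sketch below shows one way to ask for an exact name instead. The helper name `frames_named` is hypothetical, and the corpus reader is passed in explicitly as `fn`.

```python
import re

def frames_named(fn, name, ignore_case=False):
    """Return the frames whose name equals `name` exactly.

    `fn` is the FrameNet corpus reader (nltk.corpus.framenet).
    The name is escaped and anchored with ^...$ so that it is not
    treated as a substring or as a regular expression.
    """
    pattern = '^' + re.escape(name) + '$'
    if ignore_case:
        pattern = '(?i)' + pattern
    return fn.frames(pattern)
```

In this notebook it could be called as `frames_named(fn, 'Killing')`. When exactly one frame is wanted, `fn.frame('Killing')` also accepts a frame name instead of an identifier.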


In [17]:
### get a specific frame through its identifier
f = fn.frame(59)
### inspect the properties that are stored for a frame
dict(f)

{'cBy': 'ChW',
 'cDate': '02/07/2001 04:12:13 PST Wed',
 'name': 'Filling',
 'ID': 59,
 '_type': 'frame',
 'definition': "These are words relating to filling containers and covering areas with some thing, things or substance, the Theme. The area or container can appear as the direct object with all these verbs, and is designated Goal because it is the goal of motion of the Theme. Corresponding to its nuclear argument status, it is also affected in some crucial way, unlike goals in other frames.  'Lionel Hutz coated the wall with paint. '",
 'definitionMarkup': '<def-root>These are words relating to filling containers and covering areas with some thing, things or substance, the <fen>Theme</fen>. The area or container can appear as the direct object with all these verbs, and is designated <fen>Goal</fen> because it is the goal of motion of the <fen>Theme</fen>. Corresponding to its nuclear argument status, it is also affected in some crucial way, unlike goals in other frames.\n <ex><fex 

In [18]:
#### print some properties of a frame structure in NLTK

print('ID', f.ID)
print('FRAME:',f.name)
print('DEFINITION', f.definition)
print()
print('LEXICAL UNITS:')
for lu in f.lexUnit:
    print(lu)
print()
print('FRAME ELEMENTS:')
for fe in f.FE:
    print(fe)

ID 59
FRAME: Filling
DEFINITION These are words relating to filling containers and covering areas with some thing, things or substance, the Theme. The area or container can appear as the direct object with all these verbs, and is designated Goal because it is the goal of motion of the Theme. Corresponding to its nuclear argument status, it is also affected in some crucial way, unlike goals in other frames.  'Lionel Hutz coated the wall with paint. '

LEXICAL UNITS:
adorn.v
anoint.v
cover.v
dust.v
load.v
pack.v
smear.v
spread.v
stuff.v
wrap.v
plaster.v
drape.v
dab.v
daub.v
inject.v
cram.v
sow.v
seed.v
brush.v
hang.v
spatter.v
splash.v
splatter.v
spray.v
sprinkle.v
squirt.v
shower.v
drizzle.v
heap.v
pile.v
pump.v
jam.v
plant.v
scatter.v
butter.v
asphalt.v
surface.v
tile.v
wallpaper.v
coat.v
suffuse.v
fill.v
strew.v
douse.v
flood.v
crowd.v
pave.v
varnish.v
paint.v
gild.v
glaze.v
embellish.v
panel.v
wax.v
wash.v
plank.v
yoke.v
dress.v
accessorize.v

FRAME ELEMENTS:
Agent
Theme
Source
Path
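Iterating over `f.FE` prints only the names of the frame elements. Each entry also records whether the element is core to the frame or peripheral. The following sketch groups the names accordingly. It assumes that `f.FE` behaves like a dictionary from element name to an object with a `coreType` attribute (with values such as `'Core'` and `'Peripheral'`); the helper name is made up for illustration.

```python
from collections import defaultdict

def frame_elements_by_core_type(frame):
    """Group the frame element names of `frame` by their core type."""
    grouped = defaultdict(list)
    for fe_name, fe in frame.FE.items():
        grouped[fe.coreType].append(fe_name)
    # Sort the names so the result is stable across runs.
    return {core_type: sorted(names) for core_type, names in grouped.items()}
```

With the frame loaded above, `frame_elements_by_core_type(f)` would return one list of names per core type.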


In [19]:
print('FRAME RELATIONS:')
for relation in f.frameRelations:
    # print(relation.subFrameName)
    print(relation.superFrameName)
    # print(relation)

FRAME RELATIONS:
Container_focused_placing
Cause_motion
Distributed_position
Placing
Filling
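The cell above prints only `superFrameName`, so `Filling` itself appears in the list whenever it is the super frame of a relation. To see both sides together with the kind of relation, the sketch below collects triples. The attributes `superFrameName` and `subFrameName` are the ones used in the cell above; `type.name` for the relation label (for example `Inheritance` or `Using`) is an assumption about the relation objects, and both helper names are hypothetical.

```python
def relation_triples(frame):
    """Return (relation type, super frame, sub frame) for each relation."""
    return [
        (rel.type.name, rel.superFrameName, rel.subFrameName)
        for rel in frame.frameRelations
    ]

def related_frames(frame):
    """Return the names of the frames on the other side of each relation."""
    others = set()
    for _rel_type, super_name, sub_name in relation_triples(frame):
        # Drop the frame itself and keep whichever side is the other frame.
        others.update({super_name, sub_name} - {frame.name})
    return sorted(others)
```

For the frame above, `relation_triples(f)` gives one triple per relation, and `related_frames(f)` lists every frame that `Filling` is connected to, regardless of direction.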


## 2. Getting frames for predicates

Frames can be evoked by many different words and phrases. In the following example, the subject and object of *cause* also denote events and actually carry more information than the main predicate:

```Vaccination can cause autism```

In this notebook, however, we restrict ourselves to predicates, as it is more complex to decide whether subjects and objects denote events as well. To find the predicates, we can rely on spaCy's syntactic parsing, as we did in the previous notebook.

We repeat here for convenience the cells with our example sentence and the dependency tree rendering. We also re-use our function for obtaining event tuples from the dependency relations.

In [20]:
import spacy
from spacy import displacy
# depending on how you installed spaCy, the name of the model might be different
nlp = spacy.load(name='en_core_web_sm') 
text = "John makes the cake . He got sick . He went to bed ."
doc = nlp(text)

In [21]:
displacy.render(doc, jupyter=True, style='dep')

In [22]:
def get_predicate_subject_object(doc, rels={'nsubj', 'dobj', 'prep'}):
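    """
    Extract predicates together with their subject and object.
    
    :param spacy.tokens.doc.Doc doc: spaCy object after processing text
    :param set rels: dependency labels to collect for each head token
        (only 'nsubj' and 'dobj' end up in the output tuples)
    
    :rtype: list 
    :return: list of tuples (predicate, subject, object)
    """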
    predicates = {}
    
    for token in doc:
        if token.dep_ in rels:
            
            head = token.head
            head_id = head.i
            
            if head_id not in predicates:
                predicates[head_id] = dict()
            
            predicates[head_id][token.dep_] = token.lemma_
    
    output = []
    for pred_token, pred_info in predicates.items():
        one_row = (doc[pred_token].lemma_, 
                   pred_info.get('nsubj', None),
                   pred_info.get('dobj', None)
                  )
        output.append(one_row)
    
    return output

Given that we can process the text with spaCy and obtain the events, we can now write a simple script that iterates over the event tuples and obtains all the frames for each event word. The next cell does that:

In [23]:
events = get_predicate_subject_object(doc)
for event in events:
    predicate=event[0]
    print(event)
    frames = fn.frames_by_lemma(predicate)
    print('Number of frames:', len(frames))
    frame_names=[]
    for frame in frames:
        frame_names.append(frame.name)
    print(frame_names)

('make', 'John', 'cake')
Number of frames: 27
['Arriving', 'Behind_the_scenes', 'Body_decoration', 'Building', 'Causation', 'Cause_change', 'Communicate_categorization', 'Cooking_creation', 'Creating', 'Destroying', 'Earnings_and_losses', 'Fame', 'Historic_event', 'Intentionally_create', 'Leadership', 'Make_acquaintance', 'Making_arrangements', 'Manufacturing', 'People_by_vocation', 'Personal_success', 'Procreative_sex', 'Reparation', 'Self_motion', 'Sex', 'Theft', 'Type', 'Verification']
('get', '-PRON-', None)
Number of frames: 51
['Abandonment', 'Accompaniment', 'Accoutrements', 'Activity_prepare', 'Activity_start', 'Aiming', 'Amalgamation', 'Arriving', 'Board_vehicle', 'Body_movement', 'Bringing', 'Building', 'Cause_to_amalgamate', 'Cause_to_wake', 'Clothing', 'Collaboration', 'Come_down_with', 'Come_together', 'Contacting', 'Cooking_creation', 'Disembarking', 'Dressing', 'Dynamism', 'Escaping', 'Evading', 'Food', 'Gathering_up', 'Getting', 'Getting_underway', 'Getting_up', 'Giving

We can see that these predicates are very polysemous! Many of these frames are very general and non-specific. So which of these frames are most relevant for our sentences? In other words, which frames tell the story?
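Part of the long candidate lists comes from how the lookup works: `fn.frames_by_lemma` matches its argument as a regular expression against lexical unit names of the form `lemma.POS`, so a bare `make` can also match lexical units that merely contain that string. A first, modest step is to keep only frames with a verb lexical unit whose lemma is exactly the predicate. The sketch below does that. It does not resolve the polysemy, it only removes accidental matches, and the helper names `lu_pattern` and `candidate_frames` are hypothetical.

```python
import re

def lu_pattern(lemma, pos='v'):
    """Build a regex that matches exactly the LU name `lemma.pos`."""
    return '^' + re.escape(lemma) + r'\.' + re.escape(pos) + '$'

def candidate_frames(fn, events, pos='v'):
    """Map each predicate lemma in `events` to its candidate frame names.

    `fn` is the FrameNet corpus reader and `events` is a list of
    (predicate, subject, object) tuples as returned by
    get_predicate_subject_object.
    """
    candidates = {}
    for predicate, _subject, _object in events:
        if predicate not in candidates:  # look each lemma up only once
            frames = fn.frames_by_lemma(lu_pattern(predicate, pos))
            candidates[predicate] = sorted(frame.name for frame in frames)
    return candidates
```

In this notebook it could be called as `candidate_frames(fn, events)`. Comparing the resulting lists with the output above shows how many candidates were due to partial matches or other parts of speech.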

## End of this Notebook