# Lab2.6: Words, concepts, semantic relations in FrameNet-NLTK

Copyright, Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

FrameNet is a database about situation semantics developed at the International Computer Science Institute (ICSI) in Berkeley under the leadership of Charles Fillmore:

https://framenet.icsi.berkeley.edu

FrameNet provides over a thousand frames that represent conceptual schemata for events involving participants in certain roles.

We are going to use the FrameNet module in the NLTK package to assign frames to the main predicates of sentences, as identified by spaCy.

## 1. FrameNet in NLTK

In [1]:
import nltk

We assume you already have NLTK installed. To use the FrameNet module, you also need to download the FrameNet data. Run the following cell to download it within NLTK. If you have done this before, the data is already available and you do not need to download it again.

In [2]:
nltk.download('framenet_v17')

[nltk_data] Downloading package framenet_v17 to
[nltk_data]     /Users/piek/nltk_data...
[nltk_data]   Package framenet_v17 is already up-to-date!


True

After a successful download you can comment out the previous cell. To check whether the installation was successful, run the following cell; it should work without errors:

In [3]:
from nltk.corpus import framenet as fn
len(fn.frames())

1221

Now you know how many different frames there are. The function *frames* returns a list of frame objects. Let's have a look at the first 5 items in the list.

In [4]:
fn.frames()[:5]

[frame (2031): Abandonment

[URL] https://framenet2.icsi.berkeley.edu/fnReports/data/frame/Abandonment.xml

[definition]
  An Agent leaves behind a Theme effectively rendering it no longer
  within their control or of the normal security as one's property.
  'Carolyn abandoned her car and jumped on a red double decker
  bus.'  'Perhaps he left the key in the ignition'  'Abandonment of
  a child is considered to be a serious crime in many
  jurisdictions.'  There are also metaphorically used examples:
  'She left her old ways behind .'

[semTypes] 0 semantic types

[frameRelations] 1 frame relations
  <Parent=Intentionally_affect -- Inheritance -> Child=Abandonment>

[lexUnit] 5 lexical units
  abandon.v (14839), abandoned.a (14843), abandonment.n (14842),
  forget.v (15317), leave.v (14841)


[FE] 12 frame elements
            Core: Agent (12338), Theme (12339)
      Peripheral: Degree (14482), Duration (12343), Manner (12342), Means (15920), Place (12340), Purpose (15921), Time (12341

Some instructions on how to use FrameNet in NLTK are available, although they are sparse:

http://www.nltk.org/howto/framenet.html


In [5]:
### look up frames by name (a regular expression); the output shows the frame identifier
print(fn.frames('Killing'))

[<frame ID=590 name=Killing>]


In [6]:
### get the frames for a specific lemma and print all information for each
word = 'inject'
frames = fn.frames_by_lemma(word)
for frame in frames:
    print(frame)

frame (262): Abounding_with

[URL] https://framenet2.icsi.berkeley.edu/fnReports/data/frame/Abounding_with.xml

[definition]
  A Location is filled or covered with the Theme.  The Location is
  realized as the External Argument, and the Theme either as PP
  complement headed by with, in or of.  NB:  This frame does not
  include uses of adjectives like paved when they merely specify
  the Type of some location, as in "paved and unpaved roads".  'The
  waters of the bay teemed with fish.' 'The waters of the bay were
  teeming with fish.' 'The road was completely covered in mud.'

[semTypes] 0 semantic types

[frameRelations] 7 frame relations
  <Parent=Abounding_with -- Inheritance -> Child=Lively_place>
  <Parent=Locative_relation -- Inheritance -> Child=Abounding_with>
  <Parent=Abounding_with -- Using -> Child=Expensiveness>
  <Parent=Abounding_with -- Using -> Child=Mass_motion>
  <Parent=Abundance -- Using -> Child=Abounding_with>
  <MainEntry=Distributed_position -- See_also -> Re

Take your time to read this output; it contains a lot of information.

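If you do not know the precise name of a frame, you can also search for a substring. The argument is interpreted as a regular expression, so `(?i)` makes the search case-insensitive: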

In [7]:
### get frames with the substring 'medical' regardless of case
frames = fn.frames(r'(?i)medical')
for frame in frames:
    print(frame.name)


Medical_conditions
Medical_instruments
Medical_interaction_scenario
Medical_intervention
Medical_professionals
Medical_specialties
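Both `fn.frames('Killing')` and `fn.frames(r'(?i)medical')` treat the argument as a regular expression that may match anywhere in a frame name (as far as we can tell from the NLTK source, the matching is done with Python's `re.search`). The sketch below reproduces this kind of matching with the standard `re` module on a few frame names copied from the outputs above, so it runs without the FrameNet data; the helper `match_names` is our own. It shows why `(?i)` matters and how anchoring with `^` and `$` turns a substring search into an exact-name search.

```python
import re

# A few frame names copied from the outputs above (not the full FrameNet inventory).
FRAME_NAMES = [
    "Killing",
    "Abandonment",
    "Abounding_with",
    "Medical_conditions",
    "Medical_instruments",
    "Medical_specialties",
]


def match_names(pattern, names):
    """Return the names in which `pattern` matches anywhere (re.search semantics)."""
    return [name for name in names if re.search(pattern, name)]


print(match_names(r"(?i)medical", FRAME_NAMES))  # (?i) makes the search case-insensitive
print(match_names(r"medical", FRAME_NAMES))      # case-sensitive: none of these names contains 'medical'
print(match_names(r"Kill", FRAME_NAMES))         # a substring is enough to match 'Killing'
print(match_names(r"^Kill$", FRAME_NAMES))       # anchored: the whole name must be 'Kill'
print(match_names(r"^Killing$", FRAME_NAMES))    # anchored exact match
```

In the notebook, the same anchored pattern can be passed to NLTK, e.g. `fn.frames(r'^Killing$')`.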


In [8]:
### get a specific frame through its identifier
f = fn.frame(590)
### check what properties and functions are provided for a frame
dict(f)

{'cBy': 'MJE',
 'cDate': '03/19/2003 04:20:05 PST Wed',
 'name': 'Killing',
 'ID': 590,
 '_type': 'frame',
 'definition': "A Killer or Cause causes the death of the Victim. 'John drowned Martha.'",
 'definitionMarkup': '<def-root>A <fen>Killer</fen> or <fen>Cause</fen> causes the death of the <fen>Victim</fen>.\n<ex><fex name="Killer">John</fex> <t>drowned</t> <fex name="Victim">Martha</fex>.</ex></def-root>',
 'FE': {'Beneficiary': <fe ID=11725 name=Beneficiary>, 'Cause': <fe ID=4452 name=Cause>, 'Circumstances': <fe ID=11726 name=Circumstances>, 'Containing_event': <fe ID=14606 name=Containing_event>, 'Degree': <fe ID=4752 name=Degree>, 'Depictive': <fe ID=4458 name=Depictive>, 'Explanation': <fe ID=4455 name=Explanation>, 'Frequency': <fe ID=13175 name=Frequency>, 'Instrument': <fe ID=4480 name=Instrument>, 'Killer': <fe ID=4450 name=Killer>, 'Manner': <fe ID=4482 name=Manner>, 'Means': <fe ID=4454 name=Means>, 'Period_of_iterations': <fe ID=11728 name=Period_of_iterations>, 'Place'

In [9]:
#### print some properties of a frame structure in NLTK

print('ID', f.ID)
print('FRAME:',f.name)
print('DEFINITION', f.definition)
print()
print('LEXICAL UNITS:')
for lu in f.lexUnit:
    print(lu)
print()
print('FRAME ELEMENTS:')
for fe in f.FE:
    print(fe)

ID 590
FRAME: Killing
DEFINITION A Killer or Cause causes the death of the Victim. 'John drowned Martha.'

LEXICAL UNITS:
kill.v
annihilate.v
assassinate.v
behead.v
eliminate.v
exterminate.v
drown.v
liquidate.v
murder.v
slay.v
terminate.v
suffocate.v
smother.v
starve.v
asphyxiate.v
suicide.v
homicide.n
suicide.n
euthanize.v
euthanasia.n
dispatch.v
lethal.a
annihilation.n
liquidation.n
crucify.v
crucifixion.n
decapitate.v
massacre.v
slaughter.v
lynch.v
liquidator.n
killing.n
murder.n
slaughter.n
killer.n
massacre.n
infanticide.n
smothering.n
slaying.n
regicide.n
pogrom.n
shooting.n
holocaust.n
matricide.n
genocide.n
butcher.v
beheading.n
assassination.n
butchery.n
carnage.n
suffocation.n
assassin.n
blood-bath.n
decapitation.n
extermination.n
fratricide.n
garrotte.v
murderer.n
slaughterer.n
slayer.n
immolation.n
patricide.n
silence.v
deadly.a
fatal.a
fatality.n
destroy.v
take out.v
bloodshed.n
do in.v
take (someone's) life.idio

FRAME ELEMENTS:
Killer
Victim
Cause
Purpose
Means
Explanati
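The lexical units of a frame are named `lemma.POS`, e.g. *kill.v* or *homicide.n*, and multiword expressions such as *take (someone's) life.idio* contain spaces. The sketch below, with a helper name of our own, groups such names by their part-of-speech suffix, which makes it easier to see, for example, which verbs evoke a frame. It splits on the last dot only. Since iterating over `f.lexUnit` yields these names (as the cell above shows), in the notebook you could call `group_lus_by_pos(f.lexUnit)`.

```python
from collections import defaultdict


def group_lus_by_pos(lu_names):
    """Group lexical-unit names such as 'kill.v' by the POS suffix after the last dot."""
    groups = defaultdict(list)
    for name in lu_names:
        # rpartition splits on the LAST dot only, so the lemma part keeps any spaces
        lemma, _, pos = name.rpartition(".")
        groups[pos].append(lemma)
    return dict(groups)


# A few LU names copied from the Killing output above.
sample = ["kill.v", "murder.v", "murder.n", "lethal.a", "take out.v",
          "take (someone's) life.idio"]
for pos, lemmas in sorted(group_lus_by_pos(sample).items()):
    print(pos, lemmas)
```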

In [10]:
print('FRAME RELATIONS:')
for relation in f.frameRelations:
   # print(relation.subFrameName)
    print(relation.superFrameName)
    #print(relation)

FRAME RELATIONS:
Killing
Transitive_action
Killing
Killing
Killing
Killing
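Printing only `superFrameName` gives *Killing* five times, because Killing is the parent frame in five of its six relations, so the output says little about which frames it is related to. The printed form of a relation in section 1, e.g. `<Parent=Intentionally_affect -- Inheritance -> Child=Abandonment>`, shows that each relation also has a type and a frame at its other end. The sketch below, with helper names of our own, reports for each relation its type, the role of the other frame and the other frame's name. It assumes the relation objects expose `type.name` (our assumption, based on the printed form) as well as `superFrameName` and `subFrameName` (used in the cell above). To run without the FrameNet data, it uses stand-in objects built from the Abounding_with relations printed earlier; in the notebook you would call `describe_relations(f.name, f.frameRelations)`. Note that we label the two ends 'parent' and 'child' for every relation type, although FrameNet uses other labels for some types (e.g. MainEntry for See_also).

```python
from types import SimpleNamespace


def describe_relations(frame_name, relations):
    """Return (relation type, role of the other frame, other frame) for each relation.

    Assumes every relation has .type.name, .superFrameName and .subFrameName.
    """
    rows = []
    for rel in relations:
        if rel.superFrameName == frame_name:
            # frame_name is the parent, so the other end is the child
            rows.append((rel.type.name, "child", rel.subFrameName))
        else:
            # frame_name is the child, so the other end is the parent
            rows.append((rel.type.name, "parent", rel.superFrameName))
    return rows


def make_relation(rel_type, parent, child):
    """Hypothetical stand-in for an NLTK frame relation (illustration only)."""
    return SimpleNamespace(type=SimpleNamespace(name=rel_type),
                           superFrameName=parent, subFrameName=child)


# Three of the Abounding_with relations printed in section 1.
demo_relations = [
    make_relation("Inheritance", "Abounding_with", "Lively_place"),
    make_relation("Inheritance", "Locative_relation", "Abounding_with"),
    make_relation("Using", "Abounding_with", "Expensiveness"),
]
for row in describe_relations("Abounding_with", demo_relations):
    print(row)
```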


## 2. Getting frames for predicates

Frames can be evoked by many different words and phrases. In the following example, the subject and object of *cause* are also events and actually more informative than the main predicate:

```Vaccination can cause autism```

In this notebook, we are restricting ourselves to predicates, as it is more complex to decide whether subjects and objects denote events as well. To find the predicates, we can rely on the syntactic parsing by spaCy as we did in the previous notebook.

In [12]:
import spacy
from spacy import displacy
# depending on how you installed spaCy, the name of the model might be different
nlp = spacy.load(name='en_core_web_sm') 
text = "John makes the cake . He got sick . He went to bed ."
doc = nlp(text)

Let us first see what frames we would get for all the words in the above text:

In [13]:
for token in doc:
    print(token)
    frames = fn.frames_by_lemma(token.lemma_)
    print('Number of frames:', len(frames))


John
Number of frames: 0
makes
Number of frames: 27
the
Number of frames: 107
cake
Number of frames: 3
.
Number of frames: 1073
He
Number of frames: 203
got
Number of frames: 51
sick
Number of frames: 5
.
Number of frames: 1073
He
Number of frames: 203
went
Number of frames: 79
to
Number of frames: 227
bed
Number of frames: 9
.
Number of frames: 1073


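So that is a lot of possible frames, and it is not clear how they connect. Furthermore, each period is associated with 1073 frames, which does not make sense. The reason is that `frames_by_lemma` interprets its argument as a regular expression and returns every frame that has a lexical unit whose name contains a match. A period matches any character, so every frame with at least one lexical unit is returned; likewise, short lemmas such as *the* or *to* are found inside many longer words. We could limit the analysis by only considering the main predicates and the tokens that have a dependency relation to them. Let's see what dependency structure spaCy gives for the text.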
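The effect described above can be reproduced with the standard `re` module on the five lexical units of the Abandonment frame listed in section 1. The helper names are our own: `substring_matches` mimics the search that (as far as we can tell from the NLTK source) `frames_by_lemma` performs over lexical-unit names, and `exact_lemma_pattern` builds an escaped, anchored pattern that only accepts names of the form `lemma.POS`.

```python
import re

# LU names of the Abandonment frame, as listed in section 1.
ABANDONMENT_LUS = ["abandon.v", "abandoned.a", "abandonment.n", "forget.v", "leave.v"]


def substring_matches(pattern, lu_names):
    """Return the LU names in which `pattern` matches anywhere (re.search semantics)."""
    return [name for name in lu_names if re.search(pattern, name)]


def exact_lemma_pattern(lemma):
    """Build a pattern that only matches LU names of the form '<lemma>.<pos>'."""
    # re.escape turns '.' into a literal dot; ^ and $ forbid matches inside longer words
    return r"^" + re.escape(lemma) + r"\.[a-z]+$"


print(substring_matches(r".", ABANDONMENT_LUS))    # '.' matches any character: all five names
print(substring_matches(r"get", ABANDONMENT_LUS))  # 'get' is found inside 'forget.v'
print(substring_matches(exact_lemma_pattern("."), ABANDONMENT_LUS))
print(substring_matches(exact_lemma_pattern("get"), ABANDONMENT_LUS))
print(substring_matches(exact_lemma_pattern("leave"), ABANDONMENT_LUS))
```

In the notebook, such a pattern can be passed straight to NLTK, e.g. `fn.frames_by_lemma(exact_lemma_pattern(token.lemma_))`, so that a period no longer matches every frame.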

In [14]:
displacy.render(doc, jupyter=True, style='dep')

We could get the frames for *make*, *get* and *go* (the lemmas of *makes*, *got* and *went*) and see how the subject, object and other complements relate to them. For this we build a specific function.

Given the spaCy objects and the dependency relations, we can define a function that does the following:

* find tokens that have a specific dependency relation to a head, i.e. nsubj, dobj or prep
* create a dictionary with the head and the tokens with these dependency relations
* output tuples with the predicate and the dependent tokens


In [15]:
def get_predicate_subject_object(doc, rels={'nsubj', 'dobj', 'prep'}):
    """
    extract predicates with:
    -subject
    -object
    -preposition

    :param spacy.tokens.doc.Doc doc: spaCy object after processing text
    :param set rels: dependency labels to collect (default: nsubj, dobj, prep)

    :rtype: list
    :return: list of tuples (predicate lemma, subject, object, preposition);
             missing elements are None
    """
    ### We create an empty dictionary as a structure to collect all the predicates
    ### A dictionary has keys and some information for each key.
    ### How this works will be explained in the Python course
    ### Below, we will use the token identifiers from spaCy as the keys
    predicates = {} ### No entries yet

    for token in doc:
        if token.dep_ in rels:

            head = token.head
            head_id = head.i

            ## In case there is no entry for the head_id yet, we first create an empty dict()
            if head_id not in predicates:
                predicates[head_id] = dict()

            ## Now we know for sure that there is an entry for the head_id
            ## and we can add information to it.
            ## Note: if a head has two dependents with the same label (e.g. two preps),
            ## the later one overwrites the earlier one.
            predicates[head_id][token.dep_] = token.text

    ### After the previous loop we have a dictionary with head_ids and the tokens that depend on them
    ### We can iterate over all entries in this dictionary and obtain the information we need
    ### Each row added to the output is a tuple of 4 elements:
    ### the lemma of the predicate, and the nsubj, dobj and prep tokens if present (else None)
    output = []
    for pred_token, pred_info in predicates.items():
        one_row = (doc[pred_token].lemma_,
                   pred_info.get('nsubj', None),
                   pred_info.get('dobj', None),
                   pred_info.get('prep', None)
                  )
        output.append(one_row)

    return output
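`get_predicate_subject_object` keeps only one dependent per label, so a second prepositional phrase attached to the same predicate overwrites the first. The sketch below, a variant of our own, collects all dependents in lists instead. It only relies on `.dep_`, `.text` and `.head.i`, so it can be called on a spaCy `Doc` (iterating over it yields tokens) as `collect_dependents(doc)`. To run without a spaCy model, the demo uses hand-built stand-in tokens for a sentence of our own, not real spaCy output.

```python
from collections import defaultdict
from types import SimpleNamespace


def collect_dependents(tokens, rels=frozenset({"nsubj", "dobj", "prep"})):
    """Map each head index to {dependency label: [texts of all dependents with that label]}.

    Works on any tokens exposing .dep_, .text and .head.i, such as spaCy tokens.
    """
    predicates = defaultdict(lambda: defaultdict(list))
    for token in tokens:
        if token.dep_ in rels:
            # append instead of assign, so a second 'prep' does not overwrite the first
            predicates[token.head.i][token.dep_].append(token.text)
    # convert the nested defaultdicts into plain dicts for readable printing
    return {head: dict(deps) for head, deps in predicates.items()}


def make_token(text, dep, head_i):
    """Hypothetical stand-in for a spaCy token (illustration only)."""
    return SimpleNamespace(text=text, dep_=dep, head=SimpleNamespace(i=head_i))


# Hand-built analysis of "He went to bed after dinner" (our own example, not spaCy output).
demo_tokens = [
    make_token("He", "nsubj", 1),
    make_token("went", "ROOT", 1),
    make_token("to", "prep", 1),
    make_token("bed", "pobj", 2),
    make_token("after", "prep", 1),
    make_token("dinner", "pobj", 4),
]
print(collect_dependents(demo_tokens))
```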

Now that we can process the text with spaCy and extract the events, we can write a simple script that iterates over the event tuples and obtains all the frames for each predicate. We then pick the first frame from the list and get the frame elements that belong to it.

In [16]:
events = get_predicate_subject_object(doc)
print(events)
print()

for event in events:
    predicate = event[0]
    print(event)
    frames = fn.frames_by_lemma(predicate)
    print('Number of frames:', len(frames))
    if not frames:
        ### skip predicates for which FrameNet returns no frame
        print()
        continue
    frame_names = [frame.name for frame in frames]
    #print(frame_names)
    first_frame = frames[0]
    ### iterating over first_frame.FE yields the names of its frame elements
    first_frame_elements = list(first_frame.FE)
    print('First frame listed', first_frame.name, first_frame_elements)
    print()

[('make', 'John', 'cake', None), ('get', 'He', None, None), ('go', 'He', None, 'to')]

('make', 'John', 'cake', None)
Number of frames: 27
First frame listed Arriving ['Theme', 'Source', 'Path', 'Goal', 'Manner', 'Means', 'Mode_of_transportation', 'Cotheme', 'Time', 'New_situation', 'Depictive', 'Period_of_iterations', 'Circumstances', 'Purpose', 'Degree', 'Event_description', 'Re-encoding', 'Frequency', 'Place']

('get', 'He', None, None)
Number of frames: 51
First frame listed Abandonment ['Agent', 'Theme', 'Place', 'Time', 'Manner', 'Duration', 'Explanation', 'Depictive', 'Degree', 'Means', 'Purpose', 'Event_description']

('go', 'He', None, 'to')
Number of frames: 79
First frame listed Accoutrements ['Wearer', 'Style', 'Material', 'Accoutrement', 'Descriptor', 'Use', 'Part', 'Body_location', 'Creator', 'Time_of_creation', 'Name']



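These predicates appear to be very polysemous, and many of the frames are very general and unspecific. Part of the ambiguity is real, but part of it is an artifact of the lookup: `frames_by_lemma` searches for its argument anywhere in the lexical-unit names, so *get* also matches *forget.v*, which is why Abandonment (see its lexical units in section 1) is listed as the first frame for *get*. So which of these frames are most relevant for our sentences? In other words, which frames tell the story?

We only show the first frame here. Semantic parsing mainly consists of choosing the correct frame for the sentence, which is a disambiguation problem. Once the frame is selected, the frame elements need to be associated with the dependent phrases.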
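A simple first step towards disambiguation is to make the lookup exact and to use the part of speech that spaCy already assigns: only frames with a lexical unit named exactly `lemma.POS` are kept. The sketch below is one possible way to do this, not the method of this course; the helper names and the partial mapping from spaCy's coarse tags (`token.pos_`) to FrameNet suffixes are our own assumptions. It builds an index from lexical-unit names to frame names, relying on `frame.lexUnit` being keyed by names such as *kill.v*, as the Killing output shows. The demo uses stand-in frames built from lexical units shown earlier in this notebook; with the real data you would build the index once with `build_lu_index(fn.frames())` and then call `candidate_frames(token.lemma_, token.pos_, lu_index)`.

```python
from collections import defaultdict
from types import SimpleNamespace

# Partial mapping from spaCy coarse POS tags to FrameNet LU suffixes (our assumption).
SPACY_TO_FN_POS = {"VERB": "v", "NOUN": "n", "ADJ": "a", "ADV": "adv"}


def build_lu_index(frames):
    """Map each LU name (e.g. 'kill.v') to the names of the frames that contain it.

    `frames` is any iterable of objects with .name and .lexUnit, where .lexUnit
    is keyed by LU name (as for the frames returned by fn.frames()).
    """
    index = defaultdict(list)
    for frame in frames:
        for lu_name in frame.lexUnit:
            index[lu_name].append(frame.name)
    return dict(index)


def candidate_frames(lemma, spacy_pos, lu_index):
    """Frames with an LU named exactly '<lemma>.<pos>'; [] if the POS is not mapped."""
    fn_pos = SPACY_TO_FN_POS.get(spacy_pos)
    if fn_pos is None:
        return []
    return lu_index.get(f"{lemma}.{fn_pos}", [])


# Stand-in frames using lexical units shown earlier in this notebook.
demo_frames = [
    SimpleNamespace(name="Abandonment",
                    lexUnit={"abandon.v": None, "forget.v": None, "leave.v": None}),
    SimpleNamespace(name="Killing", lexUnit={"kill.v": None, "killing.n": None}),
]
demo_index = build_lu_index(demo_frames)
print(candidate_frames("get", "VERB", demo_index))      # [] : no match inside 'forget.v'
print(candidate_frames("forget", "VERB", demo_index))   # ['Abandonment']
print(candidate_frames("killing", "NOUN", demo_index))  # ['Killing']
```

This removes spurious matches such as *forget.v* for *get*, but a verb like *get* can still evoke several frames, so choosing between them remains the actual disambiguation problem.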

## End of this Notebook