# FrameNet Examples using NLTK

**(C) 2019-2024 by [Damir Cavar](http://damir.cavar.me/)**

**Version:** 0.5, January 2024

**Download:** This and various other Jupyter notebooks are available from my [GitHub repo](https://github.com/dcavar/python-tutorial-for-ipython).

**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))

**Prerequisites:**

In [None]:
!pip install -U nltk

This is a tutorial related to the discussion of grammar engineering and parsing in the classes *Alternative Syntactic Theories* and *Advanced Natural Language Processing* taught at Indiana University in Spring 2019 and 2020.

## Introduction

Using FrameNet in NLTK requires that the NLTK module and the FrameNet data set are installed in your Python environment. You can install and update the NLTK module using, for example, *pip* or *conda*, depending on the distribution you are using. On the command line, use the following command to install or update the module:

*pip install -U nltk*

To run the following code examples, you will need at least the FrameNet data. Install the necessary data set using:

In [1]:
import nltk
nltk.download('framenet_v17')

[nltk_data] Downloading package framenet_v17 to
[nltk_data]     /home/damir/nltk_data...
[nltk_data]   Package framenet_v17 is already up-to-date!


True

## Using FrameNet

We can load FrameNet from the NLTK corpus collection using:

In [2]:
from nltk.corpus import framenet as fn

The list of all frames in the FrameNet data set is returned by *fn.frames()*, so its length gives the number of frames:

In [3]:
len(fn.frames())

1221

Specific frames can be selected by name using a regular expression. The *(?i)* prefix makes the match case-insensitive:

In [4]:
fn.frames(r'(?i)medical')

[<frame ID=239 name=Medical_conditions>, <frame ID=257 name=Medical_instruments>, ...]
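
To illustrate how such a pattern behaves, here is a minimal sketch using Python's *re* module on a small list of frame names taken from this notebook. It assumes that the pattern is applied as a search anywhere in the frame name. The helper *search_names* and the list *frame_names* are illustrative only and are not part of NLTK.

```python
import re

# A few frame names that appear in this notebook; the full list
# would come from fn.frames() once the FrameNet data is installed.
frame_names = [
    "Medical_conditions",
    "Medical_instruments",
    "Medical_specialties",
    "Degree",
]


def search_names(pattern, names):
    """Return the names in which the regular expression matches anywhere."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# With the inline flag (?i) the lower-case pattern matches "Medical_...".
case_insensitive = search_names(r"(?i)medical", frame_names)
# Without the flag the lower-case pattern finds nothing in this list.
case_sensitive = search_names(r"medical", frame_names)

print(case_insensitive)
print(case_sensitive)
```

Dropping the *(?i)* prefix is useful when the capitalization of the frame name is known and a stricter match is wanted.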

A specific frame can be selected by its ID:

In [5]:
myFrame = fn.frame(256)

Accessing the ID of the specific frame:

In [6]:
myFrame.ID

256

Accessing the name of the specific frame:

In [7]:
myFrame.name

'Medical_specialties'

Accessing the definition of the specific frame:

In [8]:
from pprint import pprint
pprint(myFrame.definition)

('This frame includes words that name medical specialties and is closely '
 'related to the Medical_professionals frame.  The FE Type characterizing a '
 "sub-are in a Specialty may also be expressed. 'Ralph practices paediatric "
 "oncology.'")


The Lexical Units (LU) of the frame are available through *lexUnit*, so its length gives the number of LUs in the frame:

In [9]:
len(myFrame.lexUnit)

29

Frame Elements (FE) can be retrieved using the *FE* attribute, a dictionary keyed by FE name:

In [10]:
sorted([x for x in myFrame.FE])

['Affliction', 'Body_system', 'Specialty', 'Type']
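
Since *FE* maps names to FE objects, the values can be inspected as well as the keys. The following sketch groups FE names by core type. It assumes that each FE object carries a *coreType* attribute. The helper *group_fes_by_core_type* is hypothetical, and the stand-in frame with its FE names is invented for illustration so that the code runs without the FrameNet data.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_fes_by_core_type(frame):
    """Group the FE names of a frame by the coreType of each FE.

    Assumes frame.FE is a dict-like mapping from FE name to an object
    with a coreType attribute.
    """
    groups = defaultdict(list)
    for fe_name, fe in frame.FE.items():
        groups[fe.coreType].append(fe_name)
    return {core_type: sorted(names) for core_type, names in groups.items()}


# Hypothetical stand-in for a frame object; names and core types are invented.
demo_frame = SimpleNamespace(
    FE={
        "Agent": SimpleNamespace(coreType="Core"),
        "Place": SimpleNamespace(coreType="Peripheral"),
        "Theme": SimpleNamespace(coreType="Core"),
    }
)

grouped = group_fes_by_core_type(demo_frame)
print(grouped)
```

With the data installed, the same helper could be called as *group_fes_by_core_type(myFrame)*.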

Accessing the Frame Relations:

In [11]:
myFrame.frameRelations

[<Parent=Medical_interaction_scenario -- Using -> Child=Medical_specialties>]

### Lexical Units

We can access the list of all Lexical Units (LU) using *fn.lus()*, so its length gives the total number of LUs:

In [12]:
len(fn.lus())

13572

A specific LU can be searched for using a regular expression:

In [13]:
fn.lus(r'(?i)a little')

[<lu ID=14744 name=a little bit.adv>, <lu ID=14743 name=a little.adv>, ...]

We can pick a particular LU by its ID:

In [14]:
myLU = fn.lu(14744)

The properties of this LU can be retrieved using various attributes. For example, the *name* of the LU can be accessed using:

In [15]:
myLU.name

'a little bit.adv'

The name is encoded using the dotted notation. The string preceding the dot is the lemma. The string following the dot represents the part of speech. The parts of speech are:
- *a*: adjective
- *adv*: adverb
- *art*: article
- *c*: conjunction
- *intj*: interjection
- *n*: noun
- *num*: number
- *prep*: preposition
- *scon*: subordinating conjunction
- *v*: verb
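
The dotted notation can be taken apart with plain string operations. The following is a minimal sketch, not part of NLTK: the helper *split_lu_name* and the lookup table *POS_LABELS* are names introduced here for illustration. The split is made at the last dot, on the assumption that a lemma may itself contain a dot.

```python
# Part-of-speech abbreviations as listed above.
POS_LABELS = {
    "a": "adjective",
    "adv": "adverb",
    "art": "article",
    "c": "conjunction",
    "intj": "interjection",
    "n": "noun",
    "num": "number",
    "prep": "preposition",
    "scon": "subordinating conjunction",
    "v": "verb",
}


def split_lu_name(lu_name):
    """Split an LU name such as 'a little bit.adv' into (lemma, pos)."""
    lemma, sep, pos = lu_name.rpartition(".")
    if not sep:
        # No dot found: treat the whole string as the lemma.
        return lu_name, ""
    return lemma, pos


lemma, pos = split_lu_name("a little bit.adv")
print(lemma)                           # a little bit
print(POS_LABELS.get(pos, "unknown"))  # adverb
```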

The definition is available using:

In [16]:
myLU.definition

'FN: to a small degree'

The name of the frame that the LU belongs to:

In [17]:
myLU.frame.name

'Degree'

The lexemes that make up the LU can be listed using:

In [18]:
for x in myLU.lexemes:
    print(x)

[order] 1
[headword] false
[breakBefore] false
[POS] ART
[name] a

[order] 2
[headword] false
[breakBefore] false
[POS] A
[name] little

[order] 3
[headword] true
[breakBefore] false
[POS] N
[name] bit



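Individual lexemes can be accessed by index, for example the *name* and the *POS* of the first lexeme:

In [19]: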
myLU.lexemes[0].name

'a'

In [20]:
myLU.lexemes[0].POS

'ART'
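
For a multi-word LU it is often useful to pick out the lexeme marked as headword. The sketch below does this with a hypothetical helper *headword_of*. It assumes that the *headword* flag is either a boolean or a string such as *'true'*, as the lower-case values in the listing above suggest. The stand-in objects mirror the three lexemes printed above so that the code runs without the FrameNet data.

```python
from types import SimpleNamespace


def headword_of(lexemes):
    """Return the name of the first lexeme flagged as headword, or None."""
    for lexeme in lexemes:
        # Accept both a real boolean and the strings 'true' / 'false'.
        if str(lexeme.headword).lower() == "true":
            return lexeme.name
    return None


# Stand-ins mirroring the lexemes of 'a little bit.adv' printed above.
demo_lexemes = [
    SimpleNamespace(order=1, name="a", POS="ART", headword="false"),
    SimpleNamespace(order=2, name="little", POS="A", headword="false"),
    SimpleNamespace(order=3, name="bit", POS="N", headword="true"),
]

head = headword_of(demo_lexemes)
print(head)  # bit
```

With the data installed, the call would be *headword_of(myLU.lexemes)*.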

### Annotated Documents

FrameNet contains a set of annotated documents. The list of documents can be accessed in the following way:

In [21]:
docs = fn.docs()
len(docs)

107

We can print a particular document using:

In [22]:
docs[0]

full-text document (25397) chapter8_911report:

[corpid] 195
[corpname] ANC
[description] chapter8_911report
[URL] https://framenet2.icsi.berkeley.edu/fnReports/data/fulltext/ANC__chapter8_911report.xml

[sentence]
[0] '' THE SYSTEM WAS BLINKING RED"
[1] THE SUMMER OF THREAT
[2] As 2001 began , counterterrorism officials were receiving frequent but fragmentary
[3] reports about threats .
[4] Indeed , there appeared to be possible threats almost
[5] everywhere the United States had interests-including at home .
[6] To understand how the escalation in threat reporting was handled in the summer of
[7] 2001 , it is useful to understand how threat information in general is collected and
[8] conveyed .
[9] Information is collected through several methods , including signals
[10] intelligence and interviews of human sources , and gathered into intelligence
[11] reports .
[12] Depending on the source and nature of the reporting , these reports may be
[13] highly classified-and therefore tightl

The document is a dictionary data structure. The individual keys can be retrieved using:

In [33]:
docs[0].keys()

dict_keys(['_type', 'sentence', 'description', 'name', 'ID', 'filename', 'URL', 'corpname', 'corpid'])
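
Because each document is a dictionary, the keys listed above can be used to filter the document list. The following sketch selects documents by their *corpname* value. The helper *docs_in_corpus* is hypothetical. The first stand-in entry uses the values shown above for *docs[0]*, and the second entry is invented for contrast.

```python
def docs_in_corpus(docs, corpname):
    """Return the documents whose 'corpname' entry equals corpname."""
    return [doc for doc in docs if doc["corpname"] == corpname]


# Plain dictionaries standing in for the entries returned by fn.docs().
demo_docs = [
    # Values taken from the output of docs[0] above.
    {"corpname": "ANC", "description": "chapter8_911report"},
    # Invented entry, only here to show that filtering excludes it.
    {"corpname": "HypotheticalCorpus", "description": "hypothetical_document"},
]

anc_descriptions = [doc["description"] for doc in docs_in_corpus(demo_docs, "ANC")]
print(anc_descriptions)  # ['chapter8_911report']
```

With the data installed, *docs_in_corpus(fn.docs(), 'ANC')* would apply the same filter to the real document list.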

To print the list of sentences:

In [34]:
docs[0]["sentence"]

[full-text sentence (4154255) in chapter8_911report:
 
 
 [POS] 6 tags
 
 [POS_tagset] PENN
 
 [text] + [annotationSet]
 
 '' THE SYSTEM WAS BLINKING RED"
  
  
  
 ,
 full-text sentence (4154256) in chapter8_911report:
 
 
 [POS] 4 tags
 
 [POS_tagset] PENN
 
 [text] + [annotationSet]
 
 THE SUMMER OF THREAT
  
  
  
 ,
 full-text sentence (4154257) in chapter8_911report:
 
 
 [POS] 11 tags
 
 [POS_tagset] PENN
 
 [text] + [annotationSet]
 
 As 2001 began , counterterrorism officials were receiving 
  
  
  
 
 frequent but fragmentary
  
  
  
 ,
 full-text sentence (4154258) in chapter8_911report:
 
 
 [POS] 4 tags
 
 [POS_tagset] PENN
 
 [text] + [annotationSet]
 
 reports about threats .
  
  
  
 ,
 full-text sentence (4154259) in chapter8_911report:
 
 
 [POS] 9 tags
 
 [POS_tagset] PENN
 
 [text] + [annotationSet]
 
 Indeed , there appeared to be possible threats almost
  
  
  
 ,
 full-text sentence (4154260) in chapter8_911report:
 
 
 [POS] 9 tags
 
 [POS_tagset] PENN
 
 [t

We can print out the text of the sentences using:

In [23]:
for s in docs[0]['sentence']:
    print(s.text)

'' THE SYSTEM WAS BLINKING RED"
THE SUMMER OF THREAT
As 2001 began , counterterrorism officials were receiving frequent but fragmentary
reports about threats .
Indeed , there appeared to be possible threats almost
everywhere the United States had interests-including at home .
To understand how the escalation in threat reporting was handled in the summer of
2001 , it is useful to understand how threat information in general is collected and
conveyed .
Information is collected through several methods , including signals
intelligence and interviews of human sources , and gathered into intelligence
reports .
Depending on the source and nature of the reporting , these reports may be
highly classified-and therefore tightly held-or less sensitive and widely
disseminated to state and local law enforcement agencies .
Threat reporting must be
disseminated , either through individual reports or through threat advisories .
Such
advisories , intended to alert their recipients , may address a specif
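
The sentence text is already tokenized, with tokens separated by whitespace. As a small sketch, the helper below (*token_counts*, a name introduced here) counts the whitespace-separated tokens per sentence. The stand-in objects carry the first four sentence texts printed above. For these four sentences the counts agree with the *[POS]* tag counts in the sentence listing above, though this is not guaranteed for every sentence.

```python
from types import SimpleNamespace


def token_counts(sentences):
    """Return the number of whitespace-separated tokens in each sentence text."""
    return [len(sentence.text.split()) for sentence in sentences]


# Stand-ins carrying the first four sentence texts printed above.
demo_sentences = [
    SimpleNamespace(text="'' THE SYSTEM WAS BLINKING RED\""),
    SimpleNamespace(text="THE SUMMER OF THREAT"),
    SimpleNamespace(
        text="As 2001 began , counterterrorism officials were receiving "
             "frequent but fragmentary"
    ),
    SimpleNamespace(text="reports about threats ."),
]

counts = token_counts(demo_sentences)
print(counts)  # [6, 4, 11, 4]
```

With the data installed, the call would be *token_counts(docs[0]['sentence'])*.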

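The annotation sets of each sentence, which include the POS annotation, can be printed in the same way:

In [24]: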
for s in docs[0]['sentence']:
    print(s.annotationSet)

[POS annotation set (6653358) PENN in sentence 4154255:

'' THE SYSTEM WAS BLINKING RED"
-- --- ------ --- -------- ----
'' dt  NP     NP  NP       NP  
]
[POS annotation set (6653359) PENN in sentence 4154256:

THE SUMMER OF THREAT
--- ------ -- ------
dt  NP     in NP    
]
[POS annotation set (6653360) PENN in sentence 4154257:

As 2001 began , counterterrorism officials were receiving 
-- ---- ----- - ---------------- --------- ---- --------- 
in cd   VVD   , nn               nns       VBD  VVG       

frequent but fragmentary
-------- --- -----------
jj       cc  jj         
]
[POS annotation set (6653361) PENN in sentence 4154258:

reports about threats .   
------- ----- ------- -   
nns     in    nns     sent
]
[POS annotation set (6653362) PENN in sentence 4154259:

Indeed , there appeared to be possible threats almost
------ - ----- -------- -- -- -------- ------- ------
rb     , ex    VVD      to vb jj       nns     rb    
]
[POS annotation set (6653363) PENN in sentence 415
