danielcstone/frameBERT

frameBERT is available for both English FrameNet 1.7 and Korean FrameNet 1.2.

About

frameBERT is a BERT-based frame-semantic parser that interprets the meaning of text in terms of FrameNet.

A frame (in frame semantics) is a schematic representation of a situation or an event. For the example sentence "The center's director pledged a thorough review of safety procedures", frameBERT identifies several frames such as Being_born and Death for lexical units (e.g., center.n, director.n and pledge.v).

Prerequisites

  • python 3
  • pytorch
  • transformers
  • Korean FrameNet
  • keras
  • nltk (for target identification)
  • flask_restful (for REST API service)
  • flask_cors (for REST API service)

For nltk, please download the following packages in a Python session:

import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

How to use

Install

Install frameBERT and Korean FrameNet.

(Note: Korean FrameNet will no longer be a mandatory package after the next update.)

git clone https://github.com/machinereading/frameBERT.git
cd frameBERT
git clone https://github.com/machinereading/koreanframenet.git

How to use a frame-semantic parser for a language (English or Korean)

1. Download the pretrained model

Download two pretrained model files to {your_model_dir} (e.g. /home/model/).

  • English Model (recommended for English): (download)
  • Multilingual Model (English & Korean): (download)

2. Import the model in your Python code (make sure that your code is in the parent folder of frameBERT)

from frameBERT import frame_parser

model_path = '/home/model/'  # absolute path to {your_model_dir}
parser = frame_parser.FrameParser(model_path=model_path, language='en')

Optional: if you do not want to use the LU dictionary, set the argument masking=False.

3. Parse the input text

text = 'Hemingway was born on July 21, 1899 in Illinois, and died of suicide at the age of 62.'
parsed = parser.parser(text, sent_id='1', result_format='graph')

Then, your result would be:

[('frame:Giving_birth#1', 'frdf:lu', 'born'),
 ('frame:Giving_birth#1', 'frdf:Giving_birth-Child', 'Hemingway'),
 ('frame:Giving_birth#1', 'frdf:Giving_birth-Time', 'on July 21, 1899'),
 ('frame:Giving_birth#1', 'frdf:Giving_birth-Place', 'in Illinois,'),
 ('frame:Death#1', 'frdf:lu', 'died'),
 ('frame:Death#1', 'frdf:Death-Protagonist', 'Hemingway'),
 ('frame:Death#1', 'frdf:Death-Explanation', 'of suicide'),
 ('frame:Killing#1', 'frdf:lu', 'suicide'),
 ('frame:Killing#1', 'frdf:Killing-Victim', 'Hemingway'),
 ('frame:Age#1', 'frdf:lu', 'age'),
 ('frame:Age#1', 'frdf:Age-Age', 'of 62.')]

You can also run the Korean frameBERT on Korean text:

parser = frame_parser.FrameParser(model_path=model_path, language='ko')
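# English: 'Hemingway was born on July 21, 1899 in Illinois, USA, and died by suicide at the age of 62.'
text = '헤밍웨이는 1899년 7월 21일 미국 일리노이에서 태어났고 62세에 자살로 사망했다.'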
parsed = parser.parser(text, sent_id='1', result_format='all')

Optional: sent_id and result_format are not mandatory arguments. result_format accepts the following values: conll, graph, textae, and all. The result consists of the following three parts:

(1) triple format (result_format='graph')
(2) conll format (result_format='conll')
(3) pubannotation format (result_format='textae')

Alternatively, you can get all results in JSON with result_format='all'.

Result Format

triple format (as a graph)

The result is a list of triples.

[
    ('frame:Giving_birth#1', 'frdf:lu', 'born'), 
    ('frame:Giving_birth#1', 'frdf:Giving_birth-Child', 'Hemingway'), 
    ('frame:Giving_birth#1', 'frdf:Giving_birth-Time', 'on July 21, 1899'), 
    ('frame:Giving_birth#1', 'frdf:Giving_birth-Place', 'in Illinois,'), 
    ...
]

conll format

The result is a list of frame-semantic structures. Each structure is a list of four parallel lists: (1) tokens, (2) lexical units, (3) frames, and (4) arguments. For example, for the given input text, the output is in the following format:

[
    [
        ['Hemingway', 'was', 'born', 'on', 'July', '21,', '1899', 'in', 'Illinois,', 'and', 'died', 'of', 'suicide', 'at', 'the', 'age', 'of', '62.'], 
        ['_', '_', 'bear.v', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_'], 
        ['_', '_', 'Giving_birth', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_'], 
        ['B-Child', 'O', 'O', 'B-Time', 'I-Time', 'I-Time', 'I-Time', 'B-Place', 'I-Place', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
    ], 
    [
    ...
]
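
Since the four lists of a conll structure are parallel, the argument spans can be recovered by walking the B-/I-/O tags. The following is a minimal sketch of that decoding, not code from frameBERT; `decode_structure` is a hypothetical helper, and it assumes one target per structure and well-formed tags as in the example above.

```python
def decode_structure(structure):
    """Turn one [tokens, lus, frames, tags] structure into a small dict.

    Hypothetical helper, not part of frameBERT. Assumes exactly one
    target (the position where the LU is not '_') per structure.
    """
    tokens, lus, frames, tags = structure
    target = next(i for i, lu in enumerate(lus) if lu != '_')

    arguments = []            # list of (role, text) pairs
    role, words = None, []    # span currently being collected
    for token, tag in zip(tokens, tags):
        if tag.startswith('B-'):
            # A new span starts; flush the previous one first.
            if role is not None:
                arguments.append((role, ' '.join(words)))
            role, words = tag[2:], [token]
        elif tag.startswith('I-') and role == tag[2:]:
            words.append(token)
        else:
            # 'O' (or an unexpected tag) ends the current span.
            if role is not None:
                arguments.append((role, ' '.join(words)))
            role, words = None, []
    if role is not None:
        arguments.append((role, ' '.join(words)))

    return {
        'target': tokens[target],
        'lu': lus[target],
        'frame': frames[target],
        'arguments': arguments,
    }


structure = [
    ['Hemingway', 'was', 'born', 'on', 'July', '21,', '1899', 'in', 'Illinois,',
     'and', 'died', 'of', 'suicide', 'at', 'the', 'age', 'of', '62.'],
    ['_', '_', 'bear.v'] + ['_'] * 15,
    ['_', '_', 'Giving_birth'] + ['_'] * 15,
    ['B-Child', 'O', 'O', 'B-Time', 'I-Time', 'I-Time', 'I-Time',
     'B-Place', 'I-Place'] + ['O'] * 9,
]
decoded = decode_structure(structure)
print(decoded)
```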

Running REST API service

Running restApp.py starts a standalone REST service on your own server.

How to run REST API service

python restApp.py --port {port number} --language {en|ko} --model {model path}

Example

python restApp.py --port 8888 --language en --model ./models/en

You can then send POST requests to the URL XXX.XXX.XXX.XXX:8888/frameBERT, where XXX.XXX.XXX.XXX is your server's IP address.

Input format

# JSON format
{
 "text": "Hemingway was born on July 21, 1899 in Illinois, and died of suicide at the age of 62.",
 "result_format": "all"
}
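
A client for this service could look roughly like the sketch below, which uses only the Python standard library. Several details are assumptions rather than documented behavior: that the service is reachable over plain `http`, that it expects the JSON body with a `Content-Type: application/json` header, and that it answers with JSON. `build_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(host, port, text, result_format='all'):
    """Build a POST request for the /frameBERT endpoint.

    Hypothetical helper. The http scheme and the JSON content type
    are assumptions, not taken from the frameBERT documentation.
    """
    payload = {'text': text, 'result_format': result_format}
    return urllib.request.Request(
        url='http://{}:{}/frameBERT'.format(host, port),
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send(request, timeout=30):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


# Building the request does not contact the server.
request = build_request('127.0.0.1', 8888, 'Hemingway was born in Illinois.')
# result = send(request)  # uncomment once restApp.py is running
```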

How to train a model?

Prepare the FrameNet dataset

# such as
[
 [
  ['Greece', 'wildfires', 'force', 'thousands', 'to', '<tgt>', 'evacuate', '</tgt>'], # token list (target is indicated by the special tokens)
  ['_', '_', '_', '_', '_', '_', 'evacuate.v', '_'],                                  # lu list (lu for target, else '_')
  ['_', '_', '_', '_', '_', '_', 'Escaping', '_'],                                    # Frame list (frame for target, else '_')
  ['O', 'O', 'O', 'B-Escapee', 'O', 'X', 'O', 'X']                                    # FE list (IOB scheme, 'X' for the special tokens)
 ],
 ...
]
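
To make the alignment of the four lists concrete, the sketch below builds one such instance from a plain token list. It is only an illustration of the format shown above, not the preprocessing used by train.py; `make_instance` is a hypothetical helper, and it assumes a single-token target and frame-element spans given as inclusive (start, end) token indices.

```python
def make_instance(tokens, target_index, lu, frame, fe_spans):
    """Build [tokens, lus, frames, fes] with <tgt> markers around the target.

    Hypothetical helper, not part of frameBERT. fe_spans maps a frame
    element name to an inclusive (start, end) index pair over `tokens`.
    """
    # IOB tags over the original tokens, before the markers are inserted.
    tags = ['O'] * len(tokens)
    for role, (start, end) in fe_spans.items():
        tags[start] = 'B-' + role
        for i in range(start + 1, end + 1):
            tags[i] = 'I-' + role

    out_tokens, out_lus, out_frames, out_fes = [], [], [], []

    def add(token, lu_value, frame_value, fe_value):
        out_tokens.append(token)
        out_lus.append(lu_value)
        out_frames.append(frame_value)
        out_fes.append(fe_value)

    for i, token in enumerate(tokens):
        if i == target_index:
            add('<tgt>', '_', '_', 'X')       # special token before the target
            add(token, lu, frame, tags[i])    # the target carries LU and frame
            add('</tgt>', '_', '_', 'X')      # special token after the target
        else:
            add(token, '_', '_', tags[i])
    return [out_tokens, out_lus, out_frames, out_fes]


instance = make_instance(
    tokens=['Greece', 'wildfires', 'force', 'thousands', 'to', 'evacuate'],
    target_index=5,
    lu='evacuate.v',
    frame='Escaping',
    fe_spans={'Escapee': (3, 3)},
)
for row in instance:
    print(row)
```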

Train the model

(reference: train.ipynb)

python train.py --train {TRAINING DATA, e.g., efn} --model_path {DIRECTORY TO SAVE YOUR MODEL} --pretrained_model {default="bert-base-multilingual-cased"} --early_stopping {default=TRUE} --epochs {default=20}

Evaluate the model

(reference: train.ipynb)

python evaluate.py --language {default='ko'} --model {DIRECTORY OF YOUR MODEL} --test {test_data} --result {DIRECTORY TO SAVE THE RESULT}

Licenses

Publisher

Machine Reading Lab @ KAIST

Contact

Younggyun Hahm. hahmyg@kaist.ac.kr, hahmyg@gmail.com

Acknowledgement

This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2013-0-00109, WiseKB: Big data based self-evolving knowledge base and reasoning platform).
