Korean FrameNet

About

Korean FrameNet (KFN) is a lexical database with rich annotations that represent the meaning of Korean text using semantic frames.

KFN Statistics

frame (frame semantics): a schematic representation of a situation. Korean FrameNet is based on ICSI FrameNet 1.7. For example, Verification is a frame, defined as: "An Inspector attains a degree of certainty in the Unconfirmed_content, generally by inspecting some evidence."
lexical unit (lu): a word with its part-of-speech. e.g. 입증하다.v
LU.frame: a pairing of an LU and a frame. e.g. 입증하다.v.Verification
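Because an LU.frame string simply appends the frame name to the LU, it can be split back into its parts. The helpers below are a hypothetical sketch, not part of the KFN API, and they assume that neither the part-of-speech tag nor the frame name contains a dot.

```python
from typing import Tuple


def parse_lu_frame(lu_frame: str) -> Tuple[str, str, str]:
    """Split an LU.frame string such as '입증하다.v.Verification'
    into (lemma, part_of_speech, frame).

    Hypothetical helper: assumes the POS tag and the frame name contain
    no '.', so splitting from the right at most twice is safe.
    """
    lemma, pos, frame = lu_frame.rsplit(".", 2)
    return lemma, pos, frame


def lu_of(lu_frame: str) -> str:
    """Drop the frame name to recover the bare LU, e.g. '입증하다.v'."""
    lemma, pos, _frame = parse_lu_frame(lu_frame)
    return f"{lemma}.{pos}"


print(parse_lu_frame("입증하다.v.Verification"))  # ('입증하다', 'v', 'Verification')
print(lu_of("입증하다.v.Verification"))  # 입증하다.v
```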

Prerequisites

python 3
nltk (optional; used only by the functions marked "(optional)" below)

How to use

Install

git clone https://github.com/machinereading/koreanframenet.git

Import Korean FrameNet (in your Python code)

from koreanframenet import koreanframenet
version = 1.2 
kfn = koreanframenet.interface(version=version)

Get LUs by word

lus = kfn.lus_by_word('입증하다')
print(lus)

[
  {'lu': '입증하다.v', 'frame': 'Statement', 'lu_id': 5565}, 
  {'lu': '입증하다.v', 'frame': 'Verification', 'lu_id': 5566}, 
  {'lu': '입증하다.v', 'frame': 'Evidence', 'lu_id': 5564}
]
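Since lus_by_word returns one dictionary per sense, a common next step is to look up the lu_id of one sense and pass it to annotations_by_lu. The sketch below works only on the output printed above; keying by the (lu, frame) pair is our own choice, so that a word with several parts of speech in the same frame would not collide.

```python
# Output of kfn.lus_by_word('입증하다'), copied from the example above.
lus = [
    {'lu': '입증하다.v', 'frame': 'Statement', 'lu_id': 5565},
    {'lu': '입증하다.v', 'frame': 'Verification', 'lu_id': 5566},
    {'lu': '입증하다.v', 'frame': 'Evidence', 'lu_id': 5564},
]

# Index each sense by (LU, frame) so the pair identifies a single lu_id.
sense_ids = {(entry['lu'], entry['frame']): entry['lu_id'] for entry in lus}

# Frames this word can evoke, in alphabetical order.
print(sorted(frame for _lu, frame in sense_ids))  # ['Evidence', 'Statement', 'Verification']

# The id of the Verification sense, which can be passed to kfn.annotations_by_lu.
print(sense_ids[('입증하다.v', 'Verification')])  # 5566
```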

Get LUs by frame

lus = kfn.lus_by_frame('Verification')
print(lus)

[
  {'frame': 'Verification', 'lu': '인정받다.v', 'lu_id': 5441},
  {'frame': 'Verification', 'lu': '인정시키다.v', 'lu_id': 5442},
  {'frame': 'Verification', 'lu': '검증되다.v', 'lu_id': 419},
  {'frame': 'Verification', 'lu': '검증시키다.v', 'lu_id': 420},
  {'frame': 'Verification', 'lu': '확인.n', 'lu_id': 7782},
  {'frame': 'Verification', 'lu': '검증하다.v', 'lu_id': 423},
  {'frame': 'Verification', 'lu': '확인되다.v', 'lu_id': 7784},
  {'frame': 'Verification', 'lu': '인증되다.v', 'lu_id': 5449},
  {'frame': 'Verification', 'lu': '인증하다.v', 'lu_id': 5451},
  {'frame': 'Verification', 'lu': '확인하다.v', 'lu_id': 7792},
  ...
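The LUs of one frame can mix parts of speech (here mostly verbs, plus the noun 확인.n). The sketch below groups entries by the POS suffix after the last dot; it uses only the ten entries printed above, not the full result of lus_by_frame.

```python
from collections import defaultdict

# The first entries of kfn.lus_by_frame('Verification'), copied from above.
lus = [
    {'frame': 'Verification', 'lu': '인정받다.v', 'lu_id': 5441},
    {'frame': 'Verification', 'lu': '인정시키다.v', 'lu_id': 5442},
    {'frame': 'Verification', 'lu': '검증되다.v', 'lu_id': 419},
    {'frame': 'Verification', 'lu': '검증시키다.v', 'lu_id': 420},
    {'frame': 'Verification', 'lu': '확인.n', 'lu_id': 7782},
    {'frame': 'Verification', 'lu': '검증하다.v', 'lu_id': 423},
    {'frame': 'Verification', 'lu': '확인되다.v', 'lu_id': 7784},
    {'frame': 'Verification', 'lu': '인증되다.v', 'lu_id': 5449},
    {'frame': 'Verification', 'lu': '인증하다.v', 'lu_id': 5451},
    {'frame': 'Verification', 'lu': '확인하다.v', 'lu_id': 7792},
]

# Group lemmas by part of speech (the suffix after the last '.').
by_pos = defaultdict(list)
for entry in lus:
    lemma, pos = entry['lu'].rsplit('.', 1)
    by_pos[pos].append(lemma)

print(sorted(by_pos))  # ['n', 'v']
print(by_pos['n'])  # ['확인']
```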

Get all frames in KFN

frames = kfn.frames()
print(len(frames))
print(frames)

809
['Abandonment', 'Abounding_with', 'Absorb_heat', 'Abundance', 'Abusing', 'Accompaniment', 'Accomplishment', ...]

Get annotations by LU id

annotations = kfn.annotations_by_lu(5566)
print(annotations[4])

{'arguments': ['한국 축구팬들에게 첫선을 보인 마이클 오언이 [Inspector]',
               '세계 최고 골잡이의 명성을 [Unconfirmed_Content]',
               '그대로 [Manner]'],
 'lu': '입증하다.v.Verification',
 'text': '한국 축구팬들에게 첫선을 보인 마이클 오언이 세계 최고 골잡이의 명성을 그대로 입증했다.'}
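Each entry in arguments ends with its frame element in square brackets. The sketch below turns those strings into (frame element, span) pairs. It assumes every argument follows the "span [Frame_element]" pattern seen in this one printed example, and it silently skips any string that does not; parse_arguments is a hypothetical helper, not a KFN function.

```python
import re

# annotations[4] from kfn.annotations_by_lu(5566), copied from the example above.
annotation = {
    'arguments': ['한국 축구팬들에게 첫선을 보인 마이클 오언이 [Inspector]',
                  '세계 최고 골잡이의 명성을 [Unconfirmed_Content]',
                  '그대로 [Manner]'],
    'lu': '입증하다.v.Verification',
    'text': '한국 축구팬들에게 첫선을 보인 마이클 오언이 세계 최고 골잡이의 명성을 그대로 입증했다.',
}

# Assumed format: argument text, optional whitespace, then '[Frame_element]' at the end.
ARG_PATTERN = re.compile(r"^(.*?)\s*\[([^\[\]]+)\]$")


def parse_arguments(arguments):
    """Return (frame_element, span_text) pairs from strings like 'text [Role]'."""
    pairs = []
    for arg in arguments:
        match = ARG_PATTERN.match(arg)
        if match is None:
            continue  # skip strings that do not follow the assumed pattern
        span, role = match.groups()
        pairs.append((role, span))
    return pairs


for role, span in parse_arguments(annotation['arguments']):
    print(f"{role}: {span}")
```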

(optional) Get frame definition

For advanced search, you can use the NLTK FrameNet library (http://www.nltk.org/howto/framenet.html) directly; this function is a simple interface to it.

definition = kfn.get_frame_definition('Verification')
print(definition)

An Inspector attains a degree of certainty in the Unconfirmed_content, generally by inspecting some evidence. ...

(optional) Get frame by English word

Like get_frame_definition, this function is a simple interface to the NLTK FrameNet library (http://www.nltk.org/howto/framenet.html), so it also requires NLTK.

frames = kfn.get_frames_by_trans('verify')
print(frames)

['Evidence', 'Verification']

How to load the Korean FrameNet dataset: training, dev, and test data

training_data, dev_data, test_data = kfn.load_data()

Each split is a list of annotated sentences. Each sentence is represented by four parallel lists: tokens, targets, frames, and arguments. For example, the sentence '한국 축구팬들에게 첫선을 보인 마이클 오언이 세계 최고 골잡이의 명성을 그대로 입증했다.' is represented by the following four lists (dev_data[2330]):

TOKENS (dev_data[2330][0]): ['한국', '축구팬들에게', '첫선을', '보인', '마이클', '오언이', '세계', '최고', '골잡이의', '명성을', '그대로', '입증했다.']
TARGET (dev_data[2330][1]): ['_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '입증하다.v']
FRAME (dev_data[2330][2]): ['_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', 'Verification']
ARGUMENTS (dev_data[2330][3]): ['B-Inspector', 'I-Inspector', 'I-Inspector', 'I-Inspector', 'I-Inspector', 'I-Inspector', 'B-Unconfirmed_Content', 'I-Unconfirmed_Content', 'I-Unconfirmed_Content', 'I-Unconfirmed_Content', 'B-Manner', 'O']
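Because the four lists are aligned token by token, the target word and its frame can be found by zipping them and keeping positions whose TARGET tag is not _. The sketch below rebuilds dev_data[2330] from the lists shown above; targets_of is a hypothetical helper, not part of the KFN API.

```python
# dev_data[2330], copied from above: [tokens, targets, frames, arguments].
sentence = [
    ['한국', '축구팬들에게', '첫선을', '보인', '마이클', '오언이',
     '세계', '최고', '골잡이의', '명성을', '그대로', '입증했다.'],
    ['_'] * 11 + ['입증하다.v'],      # same as the TARGET list above
    ['_'] * 11 + ['Verification'],    # same as the FRAME list above
    ['B-Inspector'] + ['I-Inspector'] * 5
    + ['B-Unconfirmed_Content'] + ['I-Unconfirmed_Content'] * 3
    + ['B-Manner', 'O'],              # same as the ARGUMENTS list above
]


def targets_of(sentence):
    """Yield (index, token, lu, frame) for every token whose TARGET tag is not '_'."""
    tokens, targets, frames, _arguments = sentence
    for i, (token, lu, frame) in enumerate(zip(tokens, targets, frames)):
        if lu != '_':
            yield i, token, lu, frame


print(list(targets_of(sentence)))  # [(11, '입증했다.', '입증하다.v', 'Verification')]
```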

The TARGET list marks the target word: the tag _ means the token is not a target, and any other tag is the lexical unit evoked by that token. In the example above, the LU "입증하다.v" is annotated on the target word "입증했다." (the 12th token in the TOKENS list, i.e. dev_data[2330][0][11]), and the FRAME list assigns it the frame Verification.

In FrameNet terms, the ARGUMENTS list annotates the frame elements of Verification using the BIO scheme: B- marks the first token of an argument, I- marks a token that continues it, and O marks a token outside any argument. In the example above, the argument "한국 축구팬들에게 첫선을 보인 마이클 오언이" is labeled Inspector, "세계 최고 골잡이의 명성을" is labeled Unconfirmed_Content, and "그대로" is labeled Manner.
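The BIO tags can be decoded back into argument spans over the token list. The sketch below is one common way to do this, not necessarily how KFN's own tools read the data: it returns (label, start, end) spans with an exclusive end, and it leniently treats an I- tag that does not continue the current label as the start of a new span.

```python
tokens = ['한국', '축구팬들에게', '첫선을', '보인', '마이클', '오언이',
          '세계', '최고', '골잡이의', '명성을', '그대로', '입증했다.']
arguments = (['B-Inspector'] + ['I-Inspector'] * 5
             + ['B-Unconfirmed_Content'] + ['I-Unconfirmed_Content'] * 3
             + ['B-Manner', 'O'])  # same as dev_data[2330][3] above


def bio_to_spans(tags):
    """Convert BIO tags into (label, start, end) spans, end exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        if tag.startswith('B-') or (tag.startswith('I-') and tag[2:] != label):
            # A new span begins: close the open one first, if any.
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
        elif tag.startswith('I-'):
            continue  # continuation of the current span
        else:
            # 'O' (or any other tag) closes the current span.
            if label is not None:
                spans.append((label, start, i))
            label, start = None, None
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


for label, start, end in bio_to_spans(arguments):
    print(label, ' '.join(tokens[start:end]))
```

This prints each frame element followed by its argument text, matching the three arguments described above.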

Licenses

CC BY-NC-SA (Attribution-NonCommercial-ShareAlike)
If you want to use this resource commercially, please contact us.

Publisher

Machine Reading Lab @ KAIST

Contact

Younggyun Hahm. hahmyg@kaist.ac.kr, hahmyg@gmail.com

Acknowledgement

This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2013-0-00109, WiseKB: Big data based self-evolving knowledge base and reasoning platform).
