Korean FrameNet is a lexical database that has rich annotations to represent the meaning of text using semantic frames.
KFN Statistics
- frame (frame semantics): a schematic representation of a situation. Korean FrameNet is based on ICSI FrameNet 1.7. Verification is an example of frames. (definition: An Inspector attains a degree of certainty in the Unconfirmed_content, generally by inspecting some evidence.)
- lexical unit (lu): a word with its part-of-speech. e.g. 입증하다.v
- LU.frame: a paring of a lu and frame. e.g. 입증하다.v.Verification
python 3
nltk
(optional)
Install
git clone https://github.com/machinereading/koreanframenet.git
Import Korean FrameNet (in your python code)
from koreanframenet import koreanframenet
version = 1.2
kfn = koreanframenet.interface(version=version)
lus = kfn.lus_by_word('입증하다')
print(lus)
[
{'lu': '입증하다.v', 'frame': 'Statement', 'lu_id': 5565},
{'lu': '입증하다.v', 'frame': 'Verification', 'lu_id': 5566},
{'lu': '입증하다.v', 'frame': 'Evidence', 'lu_id': 5564}
]
lus = kfn.lus_by_frame('Verification')
print(lus)
[
{'frame': 'Verification', 'lu': '인정받다.v', 'lu_id': 5441},
{'frame': 'Verification', 'lu': '인정시키다.v', 'lu_id': 5442},
{'frame': 'Verification', 'lu': '검증되다.v', 'lu_id': 419},
{'frame': 'Verification', 'lu': '검증시키다.v', 'lu_id': 420},
{'frame': 'Verification', 'lu': '확인.n', 'lu_id': 7782},
{'frame': 'Verification', 'lu': '검증하다.v', 'lu_id': 423},
{'frame': 'Verification', 'lu': '확인되다.v', 'lu_id': 7784},
{'frame': 'Verification', 'lu': '인증되다.v', 'lu_id': 5449},
{'frame': 'Verification', 'lu': '인증하다.v', 'lu_id': 5451},
{'frame': 'Verification', 'lu': '확인하다.v', 'lu_id': 7792},
...
frames = kfn.frames()
print(len(frames))
print(frames)
809
['Abandonment', 'Abounding_with', 'Absorb_heat', 'Abundance', 'Abusing', 'Accompaniment', 'Accomplishment', ...]
annotations = kfn.annotations_by_lu(5566)
print(annotations[4])
{'arguments': ['한국 축구팬들에게 첫선을 보인 마이클 오언이 [Inspector]',
'세계 최고 골잡이의 명성을 [Unconfirmed_Content]',
'그대로 [Manner]'],
'lu': '입증하다.v.Verification',
'text': '한국 축구팬들에게 첫선을 보인 마이클 오언이 세계 최고 골잡이의 명성을 그대로 입증했다.'}
For advanced search, you can use NLTK FrameNet library (http://www.nltk.org/howto/framenet.html). This is a simple interface for NLTK.
definition = kfn.get_frame_definition('Verification')
print(definition)
A Money_owner exchanges Sum_1 in the Source_currency for Sum_2 in the Target_currency at some Exchange_service. 'After checking into the Intourist-operated $30-a-day Hotel Zerafshon, we each exchanged $50 for Uzbek currency with a shifty loiterer in the lobby at the rate .' 'We recently exchanged $100 for £64.00 UK pounds.' 'If he exchanged the money into Deutsch-marks, his 18 marks in Germany can just barely obtain four Big Macs.' 'He said the most crucial evidence was that of former police reservist, Simo Petersen-Jessen, now in Australia, who had converted the money into rands.'
For advanced search, you can use NLTK FrameNet library (http://www.nltk.org/howto/framenet.html). This is a simple interface for NLTK.
frames = kfn.get_frames_by_trans('verify')
print(frames)
['Evidence', 'Verification']
training_data, dev_data, test_data = kfn.load_data()
Each data is a list for a sentence and its FrameNet annotations. Each sentence consists of four lists: tokens, target, frame, and its arguments. For example, a sentence '한국 축구팬들에게 첫선을 보인 마이클 오언이 세계 최고 골잡이의 명성을 그대로 입증했다.' is shown in following four lists (dev_data[2330]
):
- TOKENS (
dev_data[2330][0]
):['한국', '축구팬들에게', '첫선을', '보인', '마이클', '오언이', '세계', '최고', '골잡이의', '명성을', '그대로', '입증했다.']
- TARGET (
dev_data[2330][1]
):['_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '입증하다.v']
- FRAME (
dev_data[2330][2]
):['_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', 'Verification']
- ARGUMENTS (
dev_data[2330][3]
):['B-Inspector', 'I-Inspector', 'I-Inspector', 'I-Inspector', 'I-Inspector', 'I-Inspector', 'B-Unconfirmed_Content', 'I-Unconfirmed_Content', 'I-Unconfirmed_Content', 'I-Unconfirmed_Content', 'B-Manner', 'O']
TARGET list provides target annotation. The tag _
means that the token is not target word and other tag is target word. For above example, the lexeme "입증하다.v" is annotated for the target word "입증했다" (12th token in TOKEN list. i.e. dev_data[2330][0][11]
). In this case, the lexeme "입증하다.v" is annotated with the frame Verification. In terms of FrameNet, arguments is annotated with frame element tags of the frame Verification with BIO scheme. For above example, the argument "한국 축구팬들에게 첫선을 보인 마이클 오언이" is annotated with Inspector, the argument '세계 최고 골잡이의 명성을' is with Unconfirmed_Content, and the argument '그대로' is with Manner.
CC BY-NC-SA
Attribution-NonCommercial-ShareAlike- If you want to commercialize this resource, please contact to us
Machine Reading Lab @ KAIST
Younggyun Hahm. hahmyg@kaist.ac.kr
, hahmyg@gmail.com
This work was supported by Institute for Information & communications Technology Promotion(IITP) grant funded by the Korea government(MSIT) (2013-0-00109, WiseKB: Big data based self-evolving knowledge base and reasoning platform)