This is the repository containing the code used to produce the results in the paper "Modeling semantics from syntax in the Korean language". Transcripts from conversations between children and their parents in Korean are fed into a contextual self-organizing map (SOM) to learn semantic representations of the Korean language.
The scripts require Python 3 and Jupyter Notebook to run.
The Python libraries required by the scripts are listed in `requirements.txt`. They can be installed by running `pip install -r requirements.txt` within the Python environment of your choice.
In addition to the standard KoNLPy library, the scripts require the Korean MeCab segmentation library. Instructions for installing this can be found here.
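As a quick sanity check that the segmentation dependency is set up, KoNLPy exposes the MeCab tagger as `konlpy.tag.Mecab`. The sketch below is illustrative only (it is not taken from this repository's scripts) and falls back to a naive whitespace split when KoNLPy or the MeCab backend is unavailable, so it stays runnable anywhere:

```python
def segment(sentence):
    """Segment a Korean sentence into morphemes via KoNLPy's MeCab wrapper.

    If KoNLPy or the MeCab backend is not installed, fall back to a
    naive whitespace split (illustrative only; not real segmentation).
    """
    try:
        from konlpy.tag import Mecab
        return Mecab().morphs(sentence)
    except Exception:  # ImportError, or MeCab backend missing
        return sentence.split()


print(segment("아기가 웃어요"))
```

A correct installation should return a list of morphemes rather than whole space-delimited words.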
This section goes through the directory structure and files included in this repository.
A copy of the data used in the paper is stored in `Ko/`. It was retrieved at this link on March 19, 2021, from:

Jo, J., & Ko, E.-S. (2018). Korean mothers attune the frequency and acoustic saliency of sound symbolic words to the linguistic maturity of their children. Frontiers in Psychology, 9:2225. doi: 10.3389/fpsyg.2018.02225
A notebook tutorial can be found in `tutorial.ipynb`. It walks through the process of training the contextual SOMs on the Ko corpus, including data extraction, preprocessing, training, and evaluation.
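The training step in the tutorial is based on the standard SOM update rule: for each input, find the best-matching unit (BMU) and pull it and its grid neighbors toward the input, with the learning rate and neighborhood radius shrinking over time. As a rough illustration only (this is not the paper's implementation; the grid size, schedules, and distance metric are placeholder choices), a minimal pure-Python SOM might look like:

```python
import math
import random


def train_som(data, grid_w=3, grid_h=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a minimal Self-Organizing Map on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, randomly initialized.
    nodes = [[rng.random() for _ in range(dim)] for _ in range(grid_w * grid_h)]
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)              # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 0.5  # neighborhood radius shrinks
        for x in data:
            b = bmu(nodes, x)
            bx, by = b % grid_w, b // grid_w
            for i, node in enumerate(nodes):
                ix, iy = i % grid_w, i // grid_w
                d2 = (ix - bx) ** 2 + (iy - by) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # neighborhood kernel
                for j in range(dim):
                    node[j] += lr * h * (x[j] - node[j])
    return nodes


def bmu(nodes, x):
    """Index of the best-matching unit (smallest squared Euclidean distance)."""
    return min(range(len(nodes)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(nodes[i], x)))
```

In the contextual variant used in the paper, the input vectors are word-context representations built from the corpus rather than raw features; after training, nearby grid nodes come to represent semantically similar words.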
The exact code used to produce the results in the paper can be found in `paper_results/`. This directory includes `contextual_som.py`, an aggregated Python script of the cells in `tutorial.ipynb`, and `main.ipynb`, the exact notebook used to produce the results in the paper. Note that due to the randomness in the algorithm, re-running `main.ipynb` may produce slightly different outputs than those reported in the paper.
All code was written by Calvin Choi.