This repository has been archived by the owner on Jan 6, 2022. It is now read-only.
Final Project for CS579 Computational Linguistics, Fall 2021, at KAIST.

Lee-Janggun/CS579-Project

Konglog: Write your favorite Konglish correctly with Prolog

Final Project for CS579 Computational Linguistics at KAIST, Fall 2021, by Janggun Lee.

Introduction:

Konglog is an implementation of the Korean Loanword Orthography in Prolog and Python. It aims to faithfully encode the rules of the orthography and to provide a simple API for anyone to use.

Dependencies:

Python 3

  • Install Python 3.
  • swiplserver is tested on 3.7 and above, and NLTK only supports up to 3.9, so pick a version from 3.7 through 3.9.

Prolog

NLTK

  • Install NLTK.
  • Download the cmudict corpus by running the following Python script. This downloads only the necessary data.
import nltk
nltk.download('cmudict')
  • If the download fails to start and reports [SSL: CERTIFICATE_VERIFY_FAILED], check this comment for a solution.
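
Once the corpus is available, a pronunciation lookup can be sketched roughly as follows. This is only an illustration, not Konglog's own code: the helper names `strip_stress` and `lookup_phonemes` are hypothetical, the sample entry is hand-written in the shape that `nltk.corpus.cmudict.dict()` returns (word mapped to a list of pronunciations), and whether Konglog itself strips stress digits is an assumption.

```python
import re

def strip_stress(phones):
    """Drop the trailing stress digit (0, 1, or 2) that CMUdict attaches to vowels."""
    return [re.sub(r"\d$", "", p) for p in phones]

def lookup_phonemes(word, pron_dict):
    """Return the first listed pronunciation without stress marks, or None if the word is absent."""
    prons = pron_dict.get(word.lower())
    if not prons:
        return None
    return strip_stress(prons[0])

# Hand-written stand-in for the real dictionary (an assumption for this sketch).
# With NLTK installed and the corpus downloaded, nltk.corpus.cmudict.dict()
# returns the full mapping in the same shape.
sample = {"shrimp": [["SH", "R", "IH1", "M", "P"]]}
print(lookup_phonemes("Shrimp", sample))
```

Passing the dictionary in as an argument keeps the lookup testable without the corpus installed; the real mapping can be swapped in unchanged.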

API:

Konglog provides a simple function, eng_to_kong, that takes an English word as input and returns its Konglish translation as output. A very simple example is shown below.

import konglog

def main():
    word = "shrimp"

    # Translate the English word and print its Konglish spelling.
    print(konglog.eng_to_kong(word))

if __name__ == "__main__":
    main()

For a more complete example, run the script in __init__.py with python3 __init__.py. This starts a CLI where the user can type in words to translate, as shown below.

CS579-Project % python3 __init__.py
Welcome to Konglog mini example.
Type an English word you want to translate.
English word, or N to exit: shrimp
Translating to Konglish. This may take a bit...
"shrimp" is translated into "슈림프"

Type an English word you want to translate.
English word, or N to exit: N
Thanks for trying Konglog!

Structure:

Konglog has three main steps in its architecture, depicted in the picture below.

[Image: Architecture]

  1. First, the input word is translated into phonemes by looking it up in the CMU Pronouncing Dictionary, provided by NLTK.
  2. Second, the phonemes are translated into jaeum (consonants) and moeum (vowels). The translation rules are provided in ipa.pl.
  3. Finally, the jaeum and moeum are combined into Hangul syllables. The tools for this combination are in unicode.py and are taken from hangulutils.
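
The final step can be illustrated with the standard Unicode arithmetic for precomposed Hangul syllables. The sketch below is not the code in unicode.py; the `compose` function and the table names are hypothetical, and the jamo for "shrimp" are supplied by hand rather than produced by the rules in ipa.pl.

```python
# Jamo in Unicode order: 19 initial consonants, 21 vowels, 27 finals plus "no final".
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def compose(cho, jung, jong=""):
    """Combine an initial consonant, a vowel, and an optional final into one syllable."""
    index = (CHOSEONG.index(cho) * 21 + JUNGSEONG.index(jung)) * 28 + JONGSEONG.index(jong)
    # U+AC00 is the first precomposed Hangul syllable.
    return chr(0xAC00 + index)

# Hand-picked jamo groups for the README's "shrimp" example (an assumption for this sketch).
syllables = [("ㅅ", "ㅠ"), ("ㄹ", "ㅣ", "ㅁ"), ("ㅍ", "ㅡ")]
print("".join(compose(*s) for s in syllables))  # prints 슈림프
```

An unknown jamo raises ValueError from `index`, which is enough for a sketch; a fuller implementation would likely validate its input first.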
