<a href="https://colab.research.google.com/github/KelianF/KoreanPalindromes/blob/master/KrPalindrome.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Korean Palindromes

I am interested in how palindromes are formed in Korean. Coming from a language written in a Latin alphabet, I was not sure how palindromes work in Hangul, a featural alphabet.
From the [Wiktionary appendix on Korean palindromes](https://en.wiktionary.org/wiki/Appendix:Korean_palindromes), it is clear that they are formed "block-wise", one syllable block at a time.
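
To make "block-wise" concrete: a precomposed Hangul syllable block is a single Unicode code point, so reversing a Python string reverses the blocks, not the letters inside them. A minimal sketch, assuming the text is in precomposed (NFC) form; the helper name `is_block_palindrome` is only illustrative:

```python
def is_block_palindrome(word: str) -> bool:
    """True if a multi-block word reads the same with its blocks reversed."""
    return len(word) > 1 and word == word[::-1]


word = "다르다"
print(len(word))                  # counts syllable blocks, not letters
print(word[::-1])                 # blocks in reverse order
print(is_block_palindrome(word))
```

Single-block words are excluded here because they would be palindromes trivially.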

## 0. Data Scraping

In [1]:
# Necessary imports

!pip install jamo

import pandas as pd
import requests
from jamo import h2j, j2hcj

Collecting jamo
  Downloading https://files.pythonhosted.org/packages/ac/cc/49812faae67f9a24be6ddaf58a2cf7e8c3cbfcf5b762d9414f7103d2ea2c/jamo-0.4.1-py3-none-any.whl
Installing collected packages: jamo
Successfully installed jamo-0.4.1


In [0]:
# We will be scraping the data from topikguide.com, which compiles the 6000 most used words in Korean.
# (Too bad they are not all on the same page, hence the loop.)

from io import StringIO

Data = {}

for i in range(0, 10):
    url1 = 'https://www.topikguide.com/6000-most-common-korean-words-' + str(i) + '/'

    page = requests.get(url1).text
    if "<table" not in page:
        continue  # Skip pages without a word table.
    # Wrap the HTML in StringIO: recent pandas versions deprecate passing a literal HTML string.
    Table = pd.read_html(StringIO(page))
    for x in range(len(Table[0])):
        # Column 1 holds the Korean word, column 2 its English gloss; keep the first gloss seen.
        if Table[0].iloc[x, 1] not in Data:
            Data[Table[0].iloc[x, 1]] = Table[0].iloc[x, 2]

## 1. Korean Palindromes

Let's first look at palindromes from the Korean perspective (block-wise).

In [7]:
Res = []

for x in Data.keys():
    if len(x) > 1: # Here we only select words which have more than one block.
        if x == x[::-1]:
            Res.append((x, Data[x]))

pd.DataFrame(Res)

Unnamed: 0,0,1
0,다르다,Be different
1,다니다,Go to and from a aplace
2,각각,Each and every
3,부부,Man and wife
4,다루다,"Treat, deal with"
5,점점,"More and more, by degrees"
6,적극적,Positively
7,다지다,"Make sure, to harden oneself, firm up one\\`s ..."
8,일주일,One whole day
9,다하다,"Finish, go through, be exhausted, run out of"


From this we can see that there are two types of palindromes: those built on a repeated block (종종 or 다가가다) and those mirrored around a central block (더욱더 or 다치다).
It is interesting to notice that many of them are words in the 다 form (the dictionary form of verbs and adjectives).
Otherwise, not very interesting. Let's see if going letter by letter gives us funnier results.
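
The two shapes described above can be told apart by length parity: an even number of blocks means the centre is a doubled block, an odd number means a single block sits in the middle. A small sketch of that idea; the function name and labels are my own, not part of the notebook:

```python
def palindrome_kind(word: str) -> str:
    """Classify a block-wise palindrome by the shape of its centre."""
    if len(word) < 2 or word != word[::-1]:
        return "not a palindrome"
    # Even length: the two middle blocks are identical. Odd length: one pivot block.
    return "doubled centre" if len(word) % 2 == 0 else "central block"


for w in ["종종", "다가가다", "더욱더", "다치다"]:
    print(w, palindrome_kind(w))
```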


## 2. Palindromes, letter by letter

In [8]:
SpellWords = {}

for x in Data.keys():
    SpellWords[j2hcj(h2j(x)).replace(' ', '')] = x

Res = []
for x in SpellWords.keys():
    if len(x) > 1:
        if x == x[::-1]:
            Res.append((SpellWords[x], Data[SpellWords[x]]))
pd.DataFrame(Res)

Unnamed: 0,0,1
0,년,Year
1,눈,Eyes
2,몸,"body,physique"
3,밥,"Rice, a meal"
4,각,Each or every
5,법,"A law, the law"
6,각각,Each and every
7,응,"Yes, i see!"
8,왕,King
9,양,"quantity,volume"


Here, no surprise: most of them are single-block words such as 영 or 맘, but a few are more interesting: 고속 and 만남.
Finally, one word made it into both lists: 각각, which makes it the only block-wise and letter-wise palindrome in our Korean word list.
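
To see why 고속 and 만남 qualify, it helps to spell them out. The sketch below does not use the `jamo` package; it relies on the arithmetic layout of the Unicode Hangul Syllables block (19 initials × 21 vowels × 28 finals, starting at U+AC00). The function `spell` is a hypothetical stand-in that should behave much like `j2hcj(h2j(x))` for ordinary words, though I have not checked every edge case against the library:

```python
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"   # 21 vowels
# 28 finals; the empty string stands for "no final consonant".
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def spell(word: str) -> str:
    """Spell each precomposed syllable block as compatibility jamo letters."""
    out = []
    for ch in word:
        offset = ord(ch) - 0xAC00
        if 0 <= offset < 19 * 21 * 28:
            lead, rest = divmod(offset, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)  # Leave anything that is not a Hangul syllable as-is.
    return "".join(out)


for w in ["고속", "만남", "각각", "다르다"]:
    letters = spell(w)
    print(w, letters, letters == letters[::-1])
```

Initial and final consonants are deliberately mapped to the same letters here, since the palindrome test only works if ㄱ at the start of a block and ㄱ at the end compare as equal.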

However, noticing words like 으응 (really an onomatopoeia), and knowing that Korean can be written top to bottom, I wonder whether it is possible to find palindromes with point symmetry (a 180° rotation), which I will loosely call mirror-wise.

## 3. Mirror-Wise Palindromes

In [0]:
# First I had to build a custom set of mirrored letters (which is definitely open to correction).

# set([item for sublist in SpellWords.keys() for item in sublist])

# Mirror = {'ㄱ': 'ㄴ',
#  'ㄴ': 'ㄱ',
#   'ㅏ': 'ㅏ',
#   'ㅑ':'ㅑ',
#   'ㅓ': 'ㅓ',
#   'ㅕ':'ㅕ',
#   'ㅔ':'ㅔ',
#   'ㅖ':'ㅖ',
#   'ㅐ': 'ㅐ',
#   'ㅒ':'ㅒ',
#   'ㄹ': 'ㄹ',
#   'ㅁ': 'ㅁ',
#   'ㅇ': 'ㅇ',
#   'ㅍ':'ㅍ',
#  'ㅗ': 'ㅜ',
#  'ㅜ': 'ㅗ',
#  'ㅛ': 'ㅠ',
#  'ㅠ': 'ㅛ',
#   'ㅡ': 'ㅡ',
#   'ㅣ': 'ㅣ',
#   'ㅎ': 'ㅇㅜ',
#   'ㅇㅜ' : 'ㅎ'}

Mirror = {
    'ㄱ': 'ㄴ',
    'ㄴ': 'ㄱ',
    'ㅏ': 'ㅓ',
    'ㅑ': 'ㅕ',
    'ㅓ': 'ㅏ',
    'ㅕ': 'ㅑ',
    'ㅔ': 'ㅔ',
    'ㅖ': 'ㅖ',
    'ㅐ': 'ㅐ',
    'ㅒ': 'ㅒ',
    'ㄹ': 'ㄹ',
    'ㅁ': 'ㅁ',
    'ㅇ': 'ㅇ',
    'ㅍ': 'ㅍ',
    'ㅗ': 'ㅜ',
    'ㅜ': 'ㅗ',
    'ㅛ': 'ㅠ',
    'ㅠ': 'ㅛ',
    'ㅡ': 'ㅡ',
    'ㅣ': 'ㅣ',
    # Note: the lookups below go letter by letter, so the two-letter key 'ㅇㅜ' is never matched.
    'ㅎ': 'ㅇㅜ',
    'ㅇㅜ': 'ㅎ',
}


In [0]:
# Keep only the words whose letters all have a mirrored counterpart.
SelectedList = [word for word in SpellWords if all(letter in Mirror for letter in word)]

In [0]:
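# Map each word's mirrored spelling (letters swapped, then reversed) back to its original spelling.
MirrorList = {}
for x in SelectedList:
    MirrorList[''.join([Mirror[y] for y in x])[::-1]] = x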


In [12]:
Res = []

for x in MirrorList.keys():
    if x in SelectedList:
        Res.append((SpellWords[x], Data[SpellWords[x]], SpellWords[MirrorList[x]], Data[SpellWords[MirrorList[x]]]))

pd.DataFrame(Res)

Unnamed: 0,0,1,2,3
0,곡,A tune or an air,눈,Eyes
1,건,A matter or an object or a case,간,The interval between
2,곰,Bear,문,Door
3,미움,Hatred,모임,"A group, a party"
4,운,"Fortune, luck, fate",공,Ball
5,응,"Yes, i see!",응,"Yes, i see!"
6,영,"Really, totally",양,"quantity,volume"
7,간,The interval between,건,A matter or an object or a case
8,영양,Nutrition,영양,Nutrition
9,국,Soup or broth,논,A rice field


Because of Korean orthography, a few flukes appear here: a vowel cannot start a block and must be preceded by a silent ㅇ.
Apart from these, some interesting pairs appear, such as 눈 with 곡, and 곰 with 문.
Nevertheless, I would say the most visually pleasing one is 뉴욕.
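
The 뉴욕 case can be checked by hand on its letter sequence. Below is a minimal sketch with a reduced rotation table: it keeps only single-letter entries from the mapping used in this section, and the names `ROTATE` and `rotate` are hypothetical. It works on already-spelled letters rather than on syllable blocks:

```python
from typing import Optional

# Reduced table: which letter each letter looks like after a 180° turn.
ROTATE = {
    "ㄱ": "ㄴ", "ㄴ": "ㄱ",
    "ㅏ": "ㅓ", "ㅓ": "ㅏ", "ㅑ": "ㅕ", "ㅕ": "ㅑ",
    "ㅗ": "ㅜ", "ㅜ": "ㅗ", "ㅛ": "ㅠ", "ㅠ": "ㅛ",
    "ㄹ": "ㄹ", "ㅁ": "ㅁ", "ㅇ": "ㅇ", "ㅍ": "ㅍ",
    "ㅡ": "ㅡ", "ㅣ": "ㅣ",
}


def rotate(letters: str) -> Optional[str]:
    """Return the letters as read after a 180° turn, or None if a letter has no counterpart."""
    if any(ch not in ROTATE for ch in letters):
        return None
    # Turning the word around reverses the reading order and swaps each letter.
    return "".join(ROTATE[ch] for ch in reversed(letters))


print(rotate("ㄱㅗㄱ"))                        # letters of 곡
print(rotate("ㄴㅠㅇㅛㄱ") == "ㄴㅠㅇㅛㄱ")      # letters of 뉴욕
```

For single-letter entries, reversing first and swapping afterwards gives the same result as swapping first and reversing afterwards, so the order of the two steps is a matter of taste.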


An interesting follow-up exercise would be to look at mirrored words that are not necessarily palindromes, purely for their visual effect.