
about_unicode

KyusungDev edited this page Nov 4, 2017 · 9 revisions

Unicode

Character Sets

Character sets were defined so that computers could represent characters. A character set in which each character is assigned a code (a binary number) is called a coded character set (CCS). The way those codes are expressed as bytes so that they can be stored on a computer is called a character encoding scheme (CES).

The best-known example is ASCII (American Standard Code for Information Interchange). It uses 0x00 to 0x7F to define 128 characters in total, including control and special characters, and can represent only English. Extended ASCII variants appeared later and defined another 128 characters in 0x80 to 0xFF, which made it possible to represent European languages such as French and German.
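The ranges above can be checked with a short Python sketch. It is only an illustration: ISO-8859-1 (Latin-1) is chosen here as one example of an 8-bit extension of ASCII and is not named in the original text, and the variable names are hypothetical.

```python
# ASCII assigns codes 0x00 through 0x7F, which is 128 values in total.
ascii_count = 0x7F - 0x00 + 1

# "\u00e9" is the accented letter e-acute; it has no ASCII code,
# so encoding it as ASCII raises UnicodeEncodeError.
try:
    "\u00e9".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Assumption: Latin-1 stands in for "extended ASCII" in this sketch.
# It places e-acute in the upper half (0x80 to 0xFF) as a single byte.
latin1_bytes = "\u00e9".encode("latin-1")

print(ascii_count)         # 128
print(ascii_ok)            # False
print(latin1_bytes.hex())  # e9
```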

The Emergence of Unicode

As ASCII spread, a problem arose: the characters of Korean, Chinese, Japanese, and other languages could not be handled with ASCII codes. Each country therefore created and began using its own character set (KS C 5601 in Korea), and on the web, text could not be displayed unless the reader's software supported the character set in question.

In addition, software companies in the United States incurred high costs managing encoding conversions that differed from region to region, so encodings became a major barrier to entering those markets.

Efforts to create a unified character set began in order to overcome these problems. The work on a unified standard character set was started by ISO (the International Organization for Standardization) and produced the international standard ISO 10646.

This international standard proposed a 4-byte scheme in which every character is assigned 4 bytes. However, several large software companies disagreed with it, and those companies joined together to form the Unicode Consortium. The scheme the Unicode Consortium proposed at the time was a 2-byte scheme.

The Unicode Consortium proposed that ISO adopt its standard, and with that proposal partly accepted, the international standard ISO 10646-1 was published in 1993.

The formal name of the ISO 10646 standard is the Universal Coded Character Set (UCS). Since 1991, the Unicode Consortium has worked jointly with ISO to develop the Unicode Standard and ISO/IEC 10646.

What Is Unicode

Unicode is an industry standard designed so that computers can consistently represent and handle all of the world's characters. The standard includes the ISO 10646 character set, character encodings, a character information database, and algorithms for handling characters.

UTF-8

Unicode can be encoded in several forms: UCS-2 and UCS-4, and the transformation formats (UTF, UCS Transformation Format) UTF-7, UTF-8, UTF-16, and UTF-32. Of these, UTF-8 is the most widely used because it can represent Unicode while remaining compatible with ASCII.
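As a rough illustration of the ASCII compatibility mentioned above, the following Python sketch encodes the same characters with three of the formats. The helper name `encode_all` is hypothetical, and the big-endian variants of UTF-16 and UTF-32 are chosen only so that no byte order mark appears in the output.

```python
def encode_all(ch: str) -> dict:
    """Return the hex bytes of one character under three Unicode encodings."""
    result = {}
    for codec in ("utf-8", "utf-16-be", "utf-32-be"):
        encoded = ch.encode(codec)
        result[codec] = " ".join(f"{b:02x}" for b in encoded)
    return result

# "A" is U+0041; "\uac00" is the Hangul syllable at U+AC00.
for ch in ("A", "\uac00"):
    print(f"U+{ord(ch):04X}", encode_all(ch))

# Only UTF-8 produces the same single byte that ASCII uses for "A".
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```

UTF-16 and UTF-32 pad "A" with zero bytes (00 41 and 00 00 00 41), so text in those formats is not byte-compatible with ASCII.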

UTF-8 is one of the variable-length character encodings for Unicode and was created by Ken Thompson and Rob Pike. UTF-8 stands for Universal Coded Character Set + Transformation Format – 8-bit.

UTF-8 uses from 1 to 4 bytes to represent a single Unicode character.

  • One byte can represent the 128 characters defined in ASCII (simply put, the basic alphabet).
  • Two bytes can represent Latin-script letters with diacritics and several other scripts such as Greek and Hebrew.
  • Three bytes can represent most other scripts in common use (this is where Hangul falls).
  • Four bytes can represent characters that are generally used less often.

์ถœ์ฒ˜: http://kkckc.tistory.com/55 [kkckc์˜ ์ผ์ƒ]
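The four byte lengths listed above can be observed directly. This is a minimal sketch: the sample characters are chosen for illustration and do not come from the original text, and `utf8_length` is a hypothetical helper.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

# One illustrative character per byte length.
samples = {
    "Latin capital A (U+0041)": "A",
    "Greek small alpha (U+03B1)": "\u03b1",
    "Hangul syllable (U+AC00)": "\uac00",
    "Emoji (U+1F600)": "\U0001F600",
}

for label, ch in samples.items():
    print(label, utf8_length(ch))

lengths = [utf8_length(ch) for ch in samples.values()]
print(lengths)  # [1, 2, 3, 4]
```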

UTF-8 Encoding

Hangul requires 3 bytes in UTF-8. Why are 3 bytes needed?

In the Unicode table, Hangul code points fall within the range U+0800 to U+FFFF (the Hangul Syllables block starts at U+AC00).
Code points in this range are encoded in UTF-8 using the form 1110xxxx 10xxxxxx 10xxxxxx.
The Hangul syllable '가' is mapped to U+AC00 in the Unicode table, which lies within the range U+0800 to U+FFFF.
(https://unicode-table.com/en/#AC00)

Unicode table value: 0xAC00
Decimal: 44,032
Binary: 10101100 00000000

Placing these binary digits into the x positions of the byte template (1110xxxx 10xxxxxx 10xxxxxx) gives the following.

1110{1010}  10{1100}{00}  10{000000}
234         176           128
EA          B0            80

As a result, we can see that Hangul uses 3 bytes.
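The bit placement worked through above can be written as a few shifts and masks. The function below is a minimal sketch for the 3-byte case only: `encode_3byte` is a hypothetical name, and the sketch does not reject the surrogate range U+D800 to U+DFFF, which a real encoder would refuse.

```python
def encode_3byte(code_point: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF as 1110xxxx 10xxxxxx 10xxxxxx."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch handles only the 3-byte range")
    byte1 = 0b11100000 | (code_point >> 12)          # top 4 bits
    byte2 = 0b10000000 | ((code_point >> 6) & 0x3F)  # middle 6 bits
    byte3 = 0b10000000 | (code_point & 0x3F)         # low 6 bits
    return bytes([byte1, byte2, byte3])

encoded = encode_3byte(0xAC00)
print(list(encoded))   # [234, 176, 128]
print(encoded.hex())   # eab080

# Cross-check against Python's built-in UTF-8 encoder.
print(encoded == "\uac00".encode("utf-8"))  # True
```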

Exercise: https://goo.gl/GWjlch


References
