[Back to Overview](overview.ipynb)

# UNICODE IN PYTHON

<a data-flickr-embed="true"  href="https://www.flickr.com/photos/kirbyurner/31061849265/in/album-72157660337424600/" title="chessboard_black_orange"><img src="https://farm6.staticflickr.com/5773/31061849265_21895e52c8_z.jpg" width="463" height="533" alt="chessboard_black_orange"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

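Python 3 strings are sequences of Unicode code points, and UTF-8 is the default encoding for source files and for `str.encode()` and `bytes.decode()`.  However, the default encoding of the operating system platform is also critical: it may take precedence when you open files or write to the console without naming an encoding explicitly.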
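To see which encodings are in play on a given machine, a minimal sketch along these lines may help. It uses only the standard library; the variable names are illustrative, and the platform values will differ from system to system.

```python
import locale
import sys

# Encoding Python itself uses by default for str.encode() / bytes.decode().
internal_default = sys.getdefaultencoding()

# Encoding the operating system's locale suggests for text data.
# Passing False asks for the current setting without changing the locale.
platform_default = locale.getpreferredencoding(False)

# Encoding attached to standard output, if the stream reports one.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print("Python default: ", internal_default)
print("Platform locale:", platform_default)
print("Standard output:", stdout_encoding)
```

If the platform value is not UTF-8, passing `encoding="utf-8"` to `open()` is one way to avoid surprises when reading or writing files.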

In [4]:
from IPython.display import HTML  # public location; IPython.core.display is deprecated for this import

def chess():
    pieces = [chr(codepoint) 
        for codepoint in range(int('2654', 16), 
                               int('2660', 16))]
    return pieces
    
HTML("<span style='font-size: 50px'><br/>{}</span>".format(" ".join(chess())))

<a data-flickr-embed="true"  href="https://www.flickr.com/photos/kirbyurner/24979793752/in/album-72157660337424600/" title="unicode_vs_ascii"><img src="https://farm2.staticflickr.com/1674/24979793752_7ef905a515_n.jpg" width="320" height="265" alt="unicode_vs_ascii"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

<a data-flickr-embed="true"  href="https://www.flickr.com/photos/kirbyurner/24733599809/in/album-72157660337424600/" title="Unicode Cards"><img src="https://farm2.staticflickr.com/1709/24733599809_15e9cd3a95_n.jpg" width="320" height="232" alt="Unicode Cards"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script>

Studying the UTF-8 specification is a great way to familiarize yourself with a low-level "bits & bytes" format.  One might typically approach this subject by looking at ASCII first, and then seeing how the two relate.

ASCII = American Standard Code for Information Interchange
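One way to see how the two relate is to print the UTF-8 bytes of a few characters in binary. The sketch below is illustrative only: `utf8_bits` is a hypothetical helper, not part of the notebook's code, and the sample characters were chosen to need one, two, three, and four bytes.

```python
def utf8_bits(char):
    """Return the UTF-8 bytes of one character as space-separated 8-bit strings."""
    return " ".join(format(byte, "08b") for byte in char.encode("utf-8"))

# One sample each for the 1-, 2-, 3- and 4-byte UTF-8 forms.
samples = ["A", "é", "€", "😀"]

for char in samples:
    encoded = char.encode("utf-8")
    print(f"U+{ord(char):04X}  {len(encoded)} byte(s)  {utf8_bits(char)}")

# Any pure-ASCII text yields identical bytes under ASCII and UTF-8.
same_bytes = "chess".encode("ascii") == "chess".encode("utf-8")
print(same_bytes)
```

In the output, the single-byte form starts with a `0` bit, which is exactly the 7-bit ASCII range. Multi-byte sequences begin with a lead byte of `110`, `1110`, or `11110`, followed by continuation bytes that each start with `10`.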

In [5]:
# Playing Cards block: U+1F0A0..U+1F0FF
pool     =  [chr(x) for x in range(int('1F0A1', 16), int('1F0FF', 16))]
spades   =  pool[0:11]  + pool[12:14]
hearts   =  pool[16:27] + pool[28:30]
diamonds =  pool[32:43] + pool[44:46]
clubs    =  pool[48:59] + pool[60:62]

black_joker = chr(int('1F0CF', 16))  # U+1F0CF is the black joker; the red joker is U+1F0BF
white_joker = chr(int('1F0DF', 16))
jokers = [ black_joker, white_joker ]

cards = spades + hearts + diamonds + clubs + jokers

HTML("<span style='font-size: 50px'><br/>{}</span>".format(" ".join(spades)))

In [2]:
from IPython.lib.display import YouTubeVideo
YouTubeVideo('5aJKKgSEUnY')

Other encodings besides ASCII were important as well.  However, the PC (personal computer) revolution brought ASCII to the forefront, after which Unicode gradually took over, since it is a vastly larger mapping capable of encoding many more symbols.
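To put numbers on the difference in size, here is a small sketch. The helper `fits_in_ascii` is a hypothetical name introduced for illustration; it simply attempts an ASCII encode and reports whether it succeeded.

```python
import sys

ascii_size = 128                    # ASCII covers code points 0..127
unicode_size = sys.maxunicode + 1   # Unicode covers code points 0..0x10FFFF

def fits_in_ascii(text):
    """Return True if every character in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(ascii_size, unicode_size)
print(fits_in_ascii("chess"))   # plain Latin letters
print(fits_in_ascii("♔"))       # WHITE CHESS KING, U+2654
```

Not every one of those code points is assigned to a character, but the available space is several thousand times larger than ASCII's.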

In [3]:
YouTubeVideo('Z_sl99D2a18')

In [4]:
# %load unicode_fun.py
#!/usr/bin/env python3
"""
Created on Tue Jul 31 11:36:12 2018

@author: Kirby Urner
"""

def emoji():
    for codepoint in range(int('1F600', 16), int('1F620', 16)):
        print(chr(codepoint), end="")
    
def hebrew():
    print([chr(codepoint) 
        for codepoint in range(int('05D0', 16), 
                               int('05DA', 16))]) 
    
def greek():
    for codepoint in range(int('03D0', 16), int('03FF', 16)):
        print(chr(codepoint), end="")
        
def korean():
    for codepoint in range(int('BB00', 16), int('BBAF', 16)):
        print(chr(codepoint), end="")
        
def arabic():
    print([chr(codepoint) 
        for codepoint in range(int('0681', 16), 
                               int('06AF', 16))])

def main():
    print("\nEMOJI")
    emoji()
    print("\n\nHEBREW")
    hebrew()
    print("\n\nGREEK & COPTIC")        
    greek()
    print("\n\nKOREAN")
    korean()
    print("\n\nARABIC")
    arabic()

In [5]:
from unicode_fun import main
main()


EMOJI
😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏😐😑😒😓😔😕😖😗😘😙😚😛😜😝😞😟

HEBREW
['א', 'ב', 'ג', 'ד', 'ה', 'ו', 'ז', 'ח', 'ט', 'י']


GREEK & COPTIC
ϐϑϒϓϔϕϖϗϘϙϚϛϜϝϞϟϠϡϢϣϤϥϦϧϨϩϪϫϬϭϮϯϰϱϲϳϴϵ϶ϷϸϹϺϻϼϽϾ

KOREAN
묀묁묂묃묄묅묆묇묈묉묊묋묌묍묎묏묐묑묒묓묔묕묖묗묘묙묚묛묜묝묞묟묠묡묢묣묤묥묦묧묨묩묪묫묬묭묮묯묰묱묲묳무묵묶묷문묹묺묻물묽묾묿뭀뭁뭂뭃뭄뭅뭆뭇뭈뭉뭊뭋뭌뭍뭎뭏뭐뭑뭒뭓뭔뭕뭖뭗뭘뭙뭚뭛뭜뭝뭞뭟뭠뭡뭢뭣뭤뭥뭦뭧뭨뭩뭪뭫뭬뭭뭮뭯뭰뭱뭲뭳뭴뭵뭶뭷뭸뭹뭺뭻뭼뭽뭾뭿뮀뮁뮂뮃뮄뮅뮆뮇뮈뮉뮊뮋뮌뮍뮎뮏뮐뮑뮒뮓뮔뮕뮖뮗뮘뮙뮚뮛뮜뮝뮞뮟뮠뮡뮢뮣뮤뮥뮦뮧뮨뮩뮪뮫뮬뮭뮮

ARABIC
['ځ', 'ڂ', 'ڃ', 'ڄ', 'څ', 'چ', 'ڇ', 'ڈ', 'ډ', 'ڊ', 'ڋ', 'ڌ', 'ڍ', 'ڎ', 'ڏ', 'ڐ', 'ڑ', 'ڒ', 'ړ', 'ڔ', 'ڕ', 'ږ', 'ڗ', 'ژ', 'ڙ', 'ښ', 'ڛ', 'ڜ', 'ڝ', 'ڞ', 'ڟ', 'ڠ', 'ڡ', 'ڢ', 'ڣ', 'ڤ', 'ڥ', 'ڦ', 'ڧ', 'ڨ', 'ک', 'ڪ', 'ګ', 'ڬ', 'ڭ', 'ڮ']


In [None]:
import emoji_fun  # we'll look at the code in class

Check out this page through the [Jupyter Notebook viewer](http://nbviewer.jupyter.org/github/4dsolutions/SAISOFT/blob/master/Unicode_Fun.ipynb).

[Back to Overview](overview.ipynb)