# HTML ENTITIES

> Python 3.4 added another way to convert to and from Unicode, this time using HTML
character entities.

This may be easier than looking up Unicode names, especially if you’re working on the web:

In [2]:
import html 

html.unescape("&egrave;")

'è'
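
html.unescape() is not limited to a single entity. As a small sketch (the sample string here is made up for illustration), it should replace every entity it recognizes in a longer string and leave the rest of the text alone:

```python
import html

# Made-up sample text containing several named entities.
text = "caf&eacute; &amp; cr&egrave;me br&ucirc;l&eacute;e"

# unescape() scans the whole string and converts each entity it finds.
decoded = html.unescape(text)
print(decoded)
```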

>This conversion also works with numbered entities, decimal or hex:

In [7]:
import html
print(html.unescape("&#233;"))
print(html.unescape("&#xe0;"))
print(html.unescape("&#xe1;"))
print(html.unescape("&#xe2;"))
print(html.unescape("&#xe3;"))
print(html.unescape("&#xe4;"))
print(html.unescape("&#xe5;"))
print(html.unescape("&#xe6;"))
print(html.unescape("&#xe7;"))
print(html.unescape("&#xe8;"))
print(html.unescape("&#xe9;"))
print(html.unescape("&#666;"))

é
à
á
â
ã
ä
å
æ
ç
è
é
ʚ


>You can even import the named entity translations as a dictionary and do the conversion
yourself. Drop the initial '&' for the dictionary key. Keep the final ';' when you can: every
entity has a key ending in ';', but only some older entities (such as egrave) also have a key without it:


In [9]:
from html.entities import html5
print(html5["egrave"])
print(html5["egrave;"])

è
è
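
Not every name can be looked up without the semicolon. The sketch below uses a hypothetical helper, lookup_entity (not part of the standard library), that always adds the trailing ';' before looking in html5, on the assumption that the ';' form is the one present for every entity:

```python
from html.entities import html5

def lookup_entity(name):
    """Hypothetical helper: look up an entity name given without the leading '&'.

    Returns the matching character(s), or None if the name is unknown.
    """
    key = name if name.endswith(";") else name + ";"
    return html5.get(key)

print(lookup_entity("egrave"))  # has keys both with and without ';'
print(lookup_entity("alpha"))   # found only because the helper adds ';'
print("alpha" in html5)         # the bare key is missing
print("alpha;" in html5)        # the key with ';' is present
```

Using dict.get() instead of indexing avoids a KeyError for names that are not HTML entities at all.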


>To go the other direction (from a single Python Unicode character to an HTML
entity name), first get the decimal value of the character with ord():


In [11]:
import html.entities  # import the submodule explicitly rather than relying on `import html`
char = '\u00e9'
dec_value = ord(char)  # ord() returns the Unicode code point of a one-character string
html.entities.codepoint2name[dec_value]


'eacute'
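
codepoint2name only covers characters that have an HTML 4 entity name, so indexing it with any other code point raises a KeyError. One way to handle that, sketched here with a hypothetical helper char_to_entity, is to fall back to a decimal numeric reference when no name exists:

```python
from html.entities import codepoint2name

def char_to_entity(char):
    """Hypothetical helper: return an HTML entity for a single character.

    Uses the named form when codepoint2name has one, else a decimal reference.
    """
    code_point = ord(char)
    name = codepoint2name.get(code_point)
    if name is not None:
        return f"&{name};"
    return f"&#{code_point};"

print(char_to_entity("\u00e9"))  # has a name: eacute
print(char_to_entity("\u029a"))  # no name, falls back to a numeric reference
```

This helper converts whatever character it is given, including plain ASCII letters, so it is meant for characters you have already decided to escape.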

>For Unicode strings with more than one character, use this two-step conversion:

In [12]:
place = 'caf\u00e9'
byte_value = place.encode('ascii', 'xmlcharrefreplace')
print(byte_value)
print(byte_value.decode())

b'caf&#233;'
caf&#233;


The expression place.encode('ascii', 'xmlcharrefreplace') returned only ASCII
characters, but as type bytes (because encode() always produces bytes). The following
byte_value.decode() is needed to convert byte_value back to an HTML-compatible
string (type str).
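
The two steps can be wrapped in one function and checked with a round trip through html.unescape(). This is a minimal sketch; to_ascii_html is a hypothetical name, not a standard library function:

```python
import html

def to_ascii_html(text):
    """Hypothetical helper: replace non-ASCII characters with numeric references."""
    # encode() yields bytes; decode('ascii') turns them back into a str.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

place = "caf\u00e9"
ascii_html = to_ascii_html(place)
print(ascii_html)
# unescape() reverses the conversion and gives back the original string.
print(html.unescape(ascii_html) == place)
```

Note that 'xmlcharrefreplace' only touches characters that cannot be encoded as ASCII. Characters such as '&', '<', and '>' pass through unchanged, so text that may contain them would still need html.escape() before being placed in a page.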
