<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/laf-fabric-xsmall.png"/></a>
<a href="http://www.godgeleerdheid.vu.nl/etcbc" target="_blank"><img align="left" src="images/VU-ETCBC-xsmall.png"/></a>
<a href="http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71" target="_blank"><img align="left" src="images/etcbc4easy-small.png"/></a>
<a href="http://tla.mpi.nl" target="_blank"><img align="right" src="images/TLA-xsmall.png"/></a>
<a href="http://www.dans.knaw.nl" target="_blank"><img align="right" src="images/DANS-xsmall.png"/></a>

# Hebrew fonts in browsers: exploring rendering issues

Browsers sometimes render Hebrew characters badly, especially when portions of text are wrapped in tags.

It turns out, however, that much can be remedied by adding a &amp;nbsp; in the right places.

Here is a visualization of the problem in a nutshell.

<img src="images/fontrendering.png"/>

The green panels show the sequence Yod, Shewa (= the diacritic :), Beth.
The top one uses the font SBL Hebrew, the bottom one uses Ezra SIL.

The purple panels show the same sequence, but interrupted by a &lt;/span&gt;&lt;span&gt; sequence right after the Shewa.

The pale orange panels show the same as the purple, but with an &amp;nbsp; added right after the Shewa but before the
&lt;/span&gt;.

You see the problem with Chrome: the span element disrupts the spacing calculations somehow.
A bit of experimenting made clear that this could be remedied by adding a non-breaking space just before the span end tag.
I have checked this for all possible diacritics in a variety of contexts (see a
[pdf made on 2014-09-29 with Chrome](images/hebtest-chrome.pdf)).
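
The remedy can be sketched as a small post-processing step on generated HTML. This is only a minimal sketch, not the code used to produce the screenshots: the function name `add_nbsp_before_span_end` is hypothetical, and the character class is an assumption that covers all Hebrew combining marks (accents and points) rather than the exact set of diacritics found to need adaptation.

```python
import re

# Hebrew combining marks (accents and points). This is an assumed, deliberately
# broad class, not the tested set of diacritics from this notebook.
HEBREW_MARKS = '\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7'
MARK_BEFORE_SPAN_END = re.compile('([' + HEBREW_MARKS + '])(</span>)')

def add_nbsp_before_span_end(html):
    """Insert &nbsp; between a Hebrew diacritic and a directly following </span>."""
    return MARK_BEFORE_SPAN_END.sub(r'\1&nbsp;\2', html)

yod_shewa = '\u05D9\u05B0'  # Yod followed by Shewa
beth = '\u05D1'
spanned = '<span>' + yod_shewa + '</span><span>' + beth + '</span>'
adapted = add_nbsp_before_span_end(spanned)
print(adapted)
```

Only the span that ends in a diacritic is changed; the span holding the bare Beth is left alone.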

A minority of the diacritics does not need adaptation.
Sometimes the adaptation overdoes the white spacing.

Firefox is better behaved: no adaptation is needed.

Before 2014-09-29 the Safari rendering was identical to the Chrome rendering, since both are WebKit browsers.
Now the Safari rendering has improved and resembles that of Firefox.

All screenshots were made on a MacBook Air running OS X 10.9.5 (Mavericks).

In [1]:
import collections
import unicodedata
from IPython.display import clear_output, display, HTML
from laf.fabric import LafFabric
from etcbc.lib import Transcription

fabric = LafFabric()
fabric.load('etcbc4', '--', 'fontrender', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
    ''',''),
    "primary": False,
})
exec(fabric.localnames.format(var='fabric'))

  0.00s This is LAF-Fabric 4.5.0
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: http://shebanq-doc.readthedocs.org/en/latest/texts/welcome.html

  0.00s LOADING API: please wait ... 
  0.00s INFO: USING DATA COMPILED AT: 2014-07-23T09-31-37
  0.91s LOGFILE=/Users/dirk/SURFdrive/laf-fabric-output/etcbc4/fontrender/__log__fontrender.txt
  0.91s INFO: DATA LOADED FROM SOURCE etcbc4 AND ANNOX -- FOR TASK fontrender AT 2015-06-01T15-18-50


In [2]:
to_be_skipped = {
    'A', 'E', 'I', 'O', 'U', '<', '>', '#', '55', '56', '57', '_',
}
not_to_be_adapted = {
    '&', '.', '.c', '.f', '00', '01', '05', 'O',
}
to_be_adapted = {
    '*', ',', '02', '03', '04', '10', '11', '13', '14',
    '24', '33', '35', '44', '52', '53', '60', '61', '62',
    '63', '64', '65', '70', '71', '72', '73', '74', '75',
    '80', '81', '82', '83', '84', '85', '91', '92', '93',
    '94', '95', 
    ':', ':@', ':A', ':E', ';', '@', 'A', 'E', 'I', 'U',
}

# 02, 03, 04, 10, 13, 24, 84: 
# sbl goes wrong in firefox: eats space in x y plain and adapted

# 14, 44:
# even after adaptation still very tight

# @, A:
# In SBL: heth discards more after-space than he

klegenda = ('adapted', 'spanned', 'plain')
kcolor = (('#ffddbb','#ffeecc'), ('#ffbbbb','#ffcccc'), ('#bbffbb','#ccffcc'))
plegenda = ('x y', 'x-y', 'xy')

In [3]:
def htrans(tr):
    return Transcription.to_hebrew_x(tr)

def replace_condition(charset):
    return '{' + ','.join('0x{:04X}'.format(ord(Transcription.hebrew_mapping[c])) for c in charset) + '}'

def replace_reclass(charset):
    return '[' + ''.join('\\u{:04X}'.format(ord(Transcription.hebrew_mapping[c])) for c in charset) + ']'

print(replace_condition(to_be_adapted))
print(replace_reclass(to_be_adapted))

{0x0596,0x05BD,0x05B3,0x05A5,0x05AB,0x05A0,0x05AD,0x05BD,0x0599,0x05A9,0x059D,0x05B1,0x05B7,0x05B5,0x05B2,0x05A4,0x05A6,0x05BF,0x0593,0x0597,0x05AF,0x05B8,0x0591,0x05A9,0x05BB,0x05A1,0x05B6,0x059A,0x05AC,0x05B4,0x059C,0x0595,0x05BD,0x05C5,0x0598,0x05A8,0x05AA,0x05A8,0x059E,0x05A7,0x05B0,0x05A3,0x0594,0x059F,0x05AE,0x05C4,0x059B,0x05A0}
[\u0596\u05BD\u05B3\u05A5\u05AB\u05A0\u05AD\u05BD\u0599\u05A9\u059D\u05B1\u05B7\u05B5\u05B2\u05A4\u05A6\u05BF\u0593\u0597\u05AF\u05B8\u0591\u05A9\u05BB\u05A1\u05B6\u059A\u05AC\u05B4\u059C\u0595\u05BD\u05C5\u0598\u05A8\u05AA\u05A8\u059E\u05A7\u05B0\u05A3\u0594\u059F\u05AE\u05C4\u059B\u05A0]
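
The printed sets contain repeats (`0x05BD`, for instance, occurs three times), presumably because several transcription codes map to the same Unicode character. The repeats are harmless in a set literal or a regex character class, but they can be removed. The sketch below shows one way to do that; `sample_mapping` is a made-up stand-in for `Transcription.hebrew_mapping` and `reclass_unique` is a hypothetical helper.

```python
# Illustrative stand-in for Transcription.hebrew_mapping. The keys are made up;
# only the idea that two codes can share one character matters here.
sample_mapping = {
    'code_a': '\u05BD',
    'code_b': '\u05BD',  # same character under a second code
    'code_c': '\u05B0',
}

def reclass_unique(charset, mapping):
    """Build a regex character class from the distinct code points, in sorted order."""
    codepoints = sorted({ord(mapping[c]) for c in charset})
    return '[' + ''.join('\\u{:04X}'.format(cp) for cp in codepoints) + ']'

unique_class = reclass_unique(sample_mapping.keys(), sample_mapping)
print(unique_class)
```

Sorting the code points also makes the output stable between runs, which iterating over a plain set does not guarantee.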


In [4]:
hfile = outfile('hebtest.html')
hfile.write('''<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<table rules="all" border="all">
''')

font = dict(
sil='''
font-family: Ezra SIL;
font-size: 20pt;
line-height:28pt;
margin-right:0.5em;
direction:rtl;
unicode-bidi:bidi-override;
text-align: right;
''',
sbl='''
font-family: SBL Hebrew;
font-size: 24pt;
line-height:28pt;
margin-right:0.5em;
direction:rtl;
unicode-bidi:bidi-override;
text-align: right;
''')

cnotadapt = 0
cadapt = 0
cremaining = 0
cskip = 0
cdone = ''
first = True
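for x in sorted(Transcription.hebrew_mapping):
    # codes in to_be_skipped and purely alphabetic codes are not tested
    if x in to_be_skipped or x.isalpha():
        cskip += 1
        continue
    # placeholder: an empty collection, so this branch never fires
    if x in {}:
        cskip += 1
        continue
    # diacritics that have already been inspected and classified are only counted
    if x in not_to_be_adapted:
        cnotadapt += 1
        continue
    if x in to_be_adapted:
        cadapt += 1
        continue
    # generate test rows for the first unclassified diacritic only; count the rest
    if not first:
        cremaining += 1
        continue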
    data = collections.defaultdict(lambda: collections.defaultdict(lambda: []))
    for cons in ('>', 'H', 'X', '<', 'W', '#'):
        for (p, pat) in enumerate(('{} {}', '{}-{}', '{}{}')):
            # consonant + diacritic, followed by beth; '-' marks a span boundary
            heb = pat.format(cons + x, 'B')
            comps = heb.split(' ')
            plain = ''
            spanned = ''
            aspanned = ''
            sep = ''
            for comp in comps:
                words = comp.split('-')
                plain += sep + ''.join(htrans(word) for word in words)
                spanned += sep + ''.join('<span>' + htrans(word) + '</span>' for word in words)
                # adapted: &nbsp; goes inside each span, right before the end tag
                aspanned += sep + ''.join('<span>' + htrans(word) + '&nbsp;</span>' for word in words)
                if sep == '': sep = ' '
            # the plain variant has no x-y pattern (p == 1)
            if p != 1:
                data[2][p].append((heb, plain))
            data[1][p].append((heb, spanned))
            data[0][p].append((heb, aspanned))
    for k in sorted(data):
        for p in sorted(data[k]):
            for (heb, text) in data[k][p]:
                for (f, fnt) in enumerate(sorted(font)):
                    para = '''
<tr style="font-family: Menlo; font-size: 12pt; background-color: {};">
<td>{}</td><td>{}</td><td>{}</td><td>{}</td><td>{}</td>
<td style="{}">{}</td>
</tr>
        '''.format(
                        kcolor[k][f], 
                        x.replace('&', '&amp;'), klegenda[k], plegenda[p], 
                        heb.replace('&','&amp;').replace('<','&lt;').replace('>','&gt;'), 
                        fnt,
                        font[fnt],
                        text,
        )
                    hfile.write(para)
    cdone = x
    first = False
print('''
Skipped           = {:>3}
To be adapted     = {:>3}
Not to be adapted = {:>3}
Done              = '{}'
Remaining         = {:>3}
'''.format(cskip, cadapt, cnotadapt, cdone, cremaining)
)    

hfile.write('''
</table>
</body>
</html>
''')
hfile.close()


Skipped           =  39
To be adapted     =  44
Not to be adapted =   7
Done              = ''
Remaining         =   0



# Results

For each diacritic we inspect a series of contexts of the form CdB, where C is a Hebrew consonant, B is the Hebrew consonant beth, and d is a diacritic. 

More precisely, we examine the following patterns:

plain (x y resp. xy)

    Cd B
    CdB

spanned (x y resp. x-y resp. xy)

    <span>Cd</span> <span>B</span>
    <span>Cd</span><span>B</span>
    <span>CdB</span>

adapted (x y resp. x-y resp. xy)

    <span>Cd&nbsp;</span> <span>B</span>
    <span>Cd&nbsp;</span><span>B</span>
    <span>Cd&nbsp;B</span>

If the patterns in the *spanned* category look right, no space insertion is needed.
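
As a rough illustration, the sketch below builds the markup for these patterns directly from Unicode characters. It follows the pattern lists given here rather than the notebook code, and the helper `make_patterns` is hypothetical.

```python
def make_patterns(cd, b):
    """Return the plain, spanned and adapted markup variants for Cd followed by B."""
    nbsp = '&nbsp;'

    def span(s):
        return '<span>' + s + '</span>'

    return {
        'plain': {
            'x y': cd + ' ' + b,
            'xy': cd + b,
        },
        'spanned': {
            'x y': span(cd) + ' ' + span(b),
            'x-y': span(cd) + span(b),
            'xy': span(cd + b),
        },
        'adapted': {
            'x y': span(cd + nbsp) + ' ' + span(b),
            'x-y': span(cd + nbsp) + span(b),
            'xy': span(cd + nbsp + b),
        },
    }

patterns = make_patterns('\u05D9\u05B0', '\u05D1')  # Yod + Shewa, then Beth
for kind, variants in patterns.items():
    for label, markup in variants.items():
        print(kind, label, markup)
```

Writing each variant to an HTML page and viewing it in the browsers under test gives a quick visual comparison for a single diacritic.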

Here are the results of all diacritics where extra spacing is needed, in a
[pdf made on 2014-09-29 with Chrome](images/hebtest-chrome.pdf).

You see how the patterns behave in the indicated contexts for two fonts (SBL Hebrew and Ezra SIL) on a WebKit browser (Safari). The results for Chrome are identical. Firefox behaves better: less extra space is needed.
I have not tested anything on Windows.

For the adaptation to work, it is essential that the ``&nbsp;`` occurs inside the span element.

In [5]:
show_case_init = '''<html><head><style type="text/css">
.casehd {border-top: 4pt solid black; text-align: center; font-size: 24pt; height: 28pt; font-weight: bold; color: #00B060;}
.heb {padding-right: 12pt; text-align: right; font-family: Ezra SIL; font-size: 24pt; height: 32pt;}
.code {font-family: Menlo;}
.name {font-family: Verdana; font-variant: small-caps;}
body {margin-left: 2em; margin-right: 2em; margin-top: 2em; margin-bottom: 2em;}
td {padding: 2pt;}
th {padding: 6pt;}
    </style></head><body>
'''
show_case_table_init = '''<table rules="all" border="all">
<tr><th>ETCBC4 code</th><th>UNICODE</th><th>GLYPH</th><th>NAME</th></tr>
'''
show_case_table_final = '''</table>
'''
show_case_final = '''</body></html>
'''

tfile = outfile('trans.html')
tfile.write(show_case_init)
tfile.write('''
<h2>Transcription table for ETCBC4</h2>
<p>Below is a table that connects the transcription codes for Hebrew letters, vowels, accents, and punctuation with
their Unicode code points and names.</p>
''')
tfile.write(show_case_table_init)
for (c, u) in sorted(Transcription.hebrew_mapping.items()):
    if c in {'55', '56', '57'}: continue
    un = unicodedata.name(u[0]).replace('HEBREW ','')
    tfile.write('''
<tr><td class="code">{}</td><td class="code">{:04X}</td><td class="heb">&nbsp;{}&nbsp;</td><td class="name">{}</td></tr>
'''.format(c, ord(u[0]), u, un))
tfile.write(show_case_table_final)
tfile.write(show_case_final)
tfile.close()
