Currently, Sefaria's API seems to send Hebrew text as Unicode-escaped JSON strings. Each Hebrew character then takes 6 ASCII characters (a `\uXXXX` escape). The following is a single verse in Kohelet:
(And ironically, it turns out to be about King Shlomo advising us to keep our words short 😀)
Is there a reason you aren't simply using Unicode characters in the JSON? Do some clients (that support Unicode) not support Unicode in JSON? That would be very surprising, all the more so because the JSON spec actually says JSON text "shall" be encoded in Unicode, with UTF-8 as the default. UTF-8 encodes each Hebrew character in 2 bytes instead of 6, and UTF-16 does the same.
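To make the 6-versus-2 comparison concrete, here is a minimal Python sketch. It assumes nothing about how Sefaria's server actually serializes its responses; the `payload` dict and its `he` key are made up for illustration, and the only library behaviour relied on is the `ensure_ascii` flag of the standard `json` module.

```python
import json

# Hypothetical payload: a made-up key holding the 4-letter word "shalom"
# (shin, lamed, vav, final mem), written with escapes to keep the source ASCII.
payload = {"he": "\u05e9\u05dc\u05d5\u05dd"}

# Default behaviour: every non-ASCII character becomes a 6-character \uXXXX escape.
escaped = json.dumps(payload)

# ensure_ascii=False keeps the characters as-is; UTF-8 then uses 2 bytes each.
raw = json.dumps(payload, ensure_ascii=False)

escaped_bytes = len(escaped.encode("utf-8"))
raw_bytes = len(raw.encode("utf-8"))

# Both forms decode to the same data, so clients see no difference in content.
same_content = json.loads(escaped) == json.loads(raw)

print(escaped_bytes, raw_bytes, same_content)
```

For this four-letter string the escaped form is 16 bytes longer (4 bytes saved per Hebrew character), while both forms parse to identical data.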
Trying a few different texts, with commentaries and without, I saw data savings ranging between 27% and 59%. (To easily test this, navigate to an API URL like this one, wait for it to load, and paste this one-liner into the JS console:)
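The same kind of measurement can also be sketched offline in Python. This is not the console one-liner referred to above, just an illustration: `savings_percent` is a hypothetical helper, and it re-serializes the data both ways so that whitespace differences don't skew the comparison.

```python
import json


def savings_percent(body: str) -> float:
    """Percentage of bytes saved by sending raw UTF-8 instead of \\uXXXX escapes.

    `body` is any JSON text (for example, a saved API response).
    """
    data = json.loads(body)
    # Serialize twice with identical settings except for ensure_ascii,
    # so the only size difference comes from the escaping.
    escaped = json.dumps(data, ensure_ascii=True).encode("utf-8")
    raw = json.dumps(data, ensure_ascii=False).encode("utf-8")
    return 100.0 * (len(escaped) - len(raw)) / len(escaped)


# Made-up sample bodies, not real API output.
hebrew_body = '{"he": "\\u05e9\\u05dc\\u05d5\\u05dd"}'
ascii_body = '{"en": "peace"}'

print(round(savings_percent(hebrew_body), 1))
print(savings_percent(ascii_body))
```

The result depends on the ratio of Hebrew text to ASCII (keys, English text, markup) in the response, which is why the savings vary from text to text.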
It's also possible to use a custom encoding to represent Hebrew in 1 byte per character or even less, but that might be out of scope.
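As one illustration of a 1-byte-per-character representation (not necessarily the custom encoding meant above), the legacy Windows-1255 code page maps each Hebrew letter to a single byte. The sketch below uses Python's built-in `cp1255` codec; note that such bytes could not be placed directly inside a JSON document, which is expected to be Unicode.

```python
# The 4-letter word "shalom", written with escapes to keep the source ASCII.
word = "\u05e9\u05dc\u05d5\u05dd"

utf8_bytes = word.encode("utf-8")      # 2 bytes per Hebrew letter
cp1255_bytes = word.encode("cp1255")   # 1 byte per Hebrew letter

# The single-byte form round-trips back to the same string.
round_trip = cp1255_bytes.decode("cp1255")

print(len(utf8_bytes), len(cp1255_bytes), round_trip == word)
```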
At some point I thought I saw Sefaria returning mixed escaped strings and actual Unicode, but I can't find where now.
A related suggestion would be to optionally remove cantillation marks (trop) and/or vowelisation (nikud) from the text before sending. I can post this as a separate issue if you like. This would be far more feasible with a GraphQL API (GraphQL API #602).
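A rough sketch of what that stripping could look like, in Python. The function names are hypothetical and the character ranges are an assumption based on the Unicode Hebrew block (accents at U+0591 to U+05AF, points and related marks mostly at U+05B0 to U+05C7); punctuation such as maqaf (U+05BE) and sof pasuq (U+05C3) is deliberately left in place.

```python
import re

# Cantillation marks (trop): the Hebrew accents U+0591..U+05AF.
TROP = re.compile("[\u0591-\u05AF]")

# Vowel points and related marks (nikud), assumed here to be U+05B0..U+05BD,
# rafe, shin/sin dots, upper/lower dots and qamats qatan.
NIKUD = re.compile("[\u05B0-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]")


def strip_trop(text: str) -> str:
    """Remove cantillation marks, keeping letters and vowel points."""
    return TROP.sub("", text)


def strip_nikud(text: str) -> str:
    """Remove vowel points, keeping letters and cantillation marks."""
    return NIKUD.sub("", text)


# "shalom" with shin dot, qamats, holam and one accent (etnahta) added.
pointed = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd\u0591"
bare = strip_nikud(strip_trop(pointed))

print(len(pointed.encode("utf-8")), len(bare.encode("utf-8")))
```

Each removed mark is itself a 2-byte UTF-8 character (or a 6-character escape), so heavily pointed text shrinks noticeably once the marks are dropped.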
Thanks in advance!