js: support UTF-8 mujs API with a header-only wrapper
The mujs C API expects input strings to be encoded as CESU-8, and
similarly strings from the VM are CESU-8 encoded.

CESU-8 differs from UTF-8 encoding only for codepoints >= U+10000.
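To make the difference concrete, here is a minimal sketch of encoding a single codepoint both ways. The helper names (put3, encode_bmp, utf8_encode, cesu8_encode) are made up for illustration and are not taken from mujs or mujsutf8.h. Below U+10000 the two encodings produce the same bytes; at or above it, UTF-8 emits one 4-byte sequence while CESU-8 emits the UTF-16 surrogate pair as two 3-byte sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit value as a 3-byte sequence (1110xxxx 10xxxxxx 10xxxxxx). */
static size_t put3(uint32_t v, unsigned char *out)
{
    out[0] = (unsigned char)(0xE0 | (v >> 12));
    out[1] = (unsigned char)(0x80 | ((v >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (v & 0x3F));
    return 3;
}

/* Codepoints below U+10000: identical bytes in UTF-8 and CESU-8. */
static size_t encode_bmp(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    return put3(cp, out);
}

/* UTF-8: codepoints >= U+10000 become one 4-byte sequence. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x10000)
        return encode_bmp(cp, out);
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* CESU-8: codepoints >= U+10000 become a surrogate pair, 3 bytes each. */
static size_t cesu8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x10000)
        return encode_bmp(cp, out);
    cp -= 0x10000;
    size_t n = put3(0xD800 | (cp >> 10), out);       /* high surrogate */
    return n + put3(0xDC00 | (cp & 0x3FF), out + n); /* low surrogate */
}
```

For U+1F600 this yields F0 9F 98 80 as UTF-8 and ED A0 BD ED B8 80 as CESU-8. The output buffer must hold at least 6 bytes.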

Until now mpv ignored this requirement, and as a result emoji and other
non-BMP codepoints were not processed correctly (nothing blew up, though).

Now such codepoints work and are converted correctly in all mujs APIs
to/from CESU-8, including in script source files.

This commit uses a single-header wrapper which replaces all the mujs
CESU-8 string APIs with UTF-8 ones with identical names.
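One way a header-only wrapper can keep identical names is to define a converting function and then redirect the original name to it with a macro. The sketch below illustrates that general pattern only. It is an assumption and not the actual contents of mujsutf8.h; vm_push, vm_last, utf8_to_cesu8 and utf8_vm_push are hypothetical stand-ins rather than mujs functions, and the fixed buffer sizes are a simplification.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a VM API that expects CESU-8 (not mujs). */
static char vm_last[64];
static void vm_push(const char *cesu8)
{
    strncpy(vm_last, cesu8, sizeof vm_last - 1);
    vm_last[sizeof vm_last - 1] = '\0';
}

/* Convert UTF-8 to CESU-8; only 4-byte sequences are rewritten.
 * Assumes valid UTF-8 and a dst of at least strlen(src) * 3 / 2 + 1 bytes. */
static void utf8_to_cesu8(const char *src, char *dst)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    while (*s) {
        if ((*s & 0xF8) == 0xF0 && s[1] && s[2] && s[3]) {
            unsigned long cp = ((unsigned long)(s[0] & 0x07) << 18) |
                               ((unsigned long)(s[1] & 0x3F) << 12) |
                               ((unsigned long)(s[2] & 0x3F) << 6) |
                               (unsigned long)(s[3] & 0x3F);
            cp -= 0x10000;
            unsigned long hi = 0xD800 | (cp >> 10);
            unsigned long lo = 0xDC00 | (cp & 0x3FF);
            *d++ = (unsigned char)(0xE0 | (hi >> 12));
            *d++ = (unsigned char)(0x80 | ((hi >> 6) & 0x3F));
            *d++ = (unsigned char)(0x80 | (hi & 0x3F));
            *d++ = (unsigned char)(0xE0 | (lo >> 12));
            *d++ = (unsigned char)(0x80 | ((lo >> 6) & 0x3F));
            *d++ = (unsigned char)(0x80 | (lo & 0x3F));
            s += 4;
        } else {
            *d++ = *s++; /* 1..3-byte sequences are the same in both */
        }
    }
    *d = '\0';
}

/* UTF-8 front end: convert, then call the original CESU-8 function. */
static void utf8_vm_push(const char *utf8)
{
    char buf[64];
    if (strlen(utf8) > 40) /* sketch only: skip input that may not fit */
        return;
    utf8_to_cesu8(utf8, buf);
    vm_push(buf); /* still the original: the macro is defined below */
}

/* From here on, code that calls vm_push() gets the UTF-8 wrapper. */
#define vm_push utf8_vm_push
```

Because the macro is defined after the wrapper body, the call inside utf8_vm_push still reaches the original function, while any code that follows keeps using the familiar name.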

In mpv it's almost entirely transparent because mpv currently
doesn't use a custom allocator or callbacks (which are not wrapped
automatically).

mpv does use js_Report, but for now we'll live with the rare cases
where reports would require conversion to UTF-8 (e.g. reported function
names containing emoji), though it's trivial to make it UTF-8 as well.

The wrapper header - mujsutf8.h - is standalone and does not depend
on any mpv code or headers.
avih committed May 10, 2020
1 parent 0142dc9 commit 8130668
Showing 2 changed files with 787 additions and 0 deletions.
1 change: 1 addition & 0 deletions player/javascript.c
@@ -26,6 +26,7 @@
 #include <stdint.h>

 #include <mujs.h>
+#include "mujsutf8.h"

 #include "osdep/io.h"
 #include "mpv_talloc.h"