add multibyte support. increased vim.js compiled size from 25MB to 44MB. #25
Conversation
@@ -72,6 +75,38 @@ var LibraryVIM = { | |||
|
|||
dropbox: null, | |||
|
|||
// from https://github.com/s-macke/jor1k/blob/master/js/lib/utf8.js | |||
// LICENSE: https://github.com/s-macke/jor1k/blob/master/LICENSE.md | |||
UnicodeToUTF8Stream: function(key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remembered that there are utf8 related functions in emscripten js libararies. Have you checked out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I saw those, but at least for the one called from Pointer_stringify, there were bugs, so I used the custom one instead. The code looks quite similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of bugs are they? If they are general, you might report them.
On the other hand, we can also use C code in Vim, if that's not hard. Because the code will be generated in ASM.js, which are likely to be faster than manually written JS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I use Pointer_stringify, the line containing UTF-8 characters shows up blank.
It does show up if I move the cursor over the characters one by one.
It might be something about the state of the UTF-8 decoder becoming wrong, but I haven't looked into it deeply yet. I should report a bug to emscripten once I figure out the details.
I could try to call the UTF8 function directly without going through Pointer_stringify. Also, I can try the function for serializing to UTF-8 since I didn't try that one.
Yes, we could also use C code, but is more work for me since this already looks fast and is tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem could be that our string is not null terminated. Now I used the function UTF8ArrayToString instead of Pointer_stringify on a null terminated array and it seems to work fine.
Now I am looking to see if I can make it work with emscripten functions also the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I made it work using emscripten UTF8 functions. The problem was just with null termination.
That's still a few more megabytes. The online version is supposed to be a fast demo, and I don't think this feature is worth it. Although I do think it'll be useful to include this in the NW.js version. |
Can you squash your commits? |
Increased vim.js compiled size from 25MB to 44MB. It is disabled by default. Uses builtin emscripten UTF8 functions.
8a8cb26
to
8840a84
Compare
Ok, I squashed them. I rebased the other two pull requests as well. |
var MAX_UTF8_BYTES = 6; | ||
var chars = new Uint8Array(MAX_UTF8_BYTES + 1); // null-terminated | ||
var charLen = stringToUTF8Array(String.fromCharCode(charCode), chars, 0, MAX_UTF8_BYTES); | ||
if (charLen == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this if clause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look into it today. Maybe it's not necessary. Sorry for taking so long to respond.
Closing this since #45 now includes it. |
Also fix a small bug about EM_DIR in build.sh.