Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trim output size & complexity #34

Merged
merged 29 commits into from Feb 19, 2014
Merged

Conversation

eemeli
Copy link
Member

@eemeli eemeli commented Feb 7, 2014

With these changes, any real-world sized localization is likely to shrink by a huge margin. With the sort-of realistic test case I've been using, the output is now 70% smaller than with version 0.1.7. The included examples stay at roughly their current size, as they are rather minimal cases.

The main reason this is possible is by including a small set of utility functions in the generated code, which then allows for removing duplicate code, making some of the inner function calls into strings, inlining all the generated local variables, and some other minor changes.

I've also minified the output slightly (including all the simpler locales), but the output should still be at least as readable as it was before. Programmatically minifying it can probably shave another 10-20% from it, but at the cost of legibility.

The utility functions are included in two places; a new file lib/messageformat.include.js and near the top of lib/messageformat.dev.js. They differ only in amounts of comments and white space, to make the latter easier to read.

I had to change a couple of the test cases slightly, as they were referring to code that now gets called from a different function. All tests should be valid for all commits.

@eemeli
Copy link
Member Author

eemeli commented Feb 8, 2014

I realised all of the internal function calls should be replaced by string concatenation, so I did that.

Also, I uploaded the real-world localization data I'm working with to a gist, along with the resulting i18n.js file.

@eemeli
Copy link
Member Author

eemeli commented Feb 9, 2014

Rather than repeat window.i18n["NM"] for each line of output, we can use the shorter "key": value notation.

I'm working on a few further changes on top of the ones I've included here, adding an i18n.get() function in 6656f61, baa3131 and 83f0931 that can take a variable number of parameters and adding i18n.fill() in d893c4b for processing variables defined in HTML.

In that branch, I've also allowed with 83f0931 the top-level entries in i18n["NM"] to be strings rather than functions when there's no d access. This required a bit of fiddling with MessageFormat.compile() to make it wrap such strings as functions so that the tests still pass, and makes it necessary to use a wrapper like i18n.get() to access the data.

With all of that, I'm currently seeing JavaScript output that adds about 30% to the size of the source JSON, while v0.1.7's output adds about 300%.

@bleadof
Copy link

bleadof commented Feb 9, 2014

👍 Awesome @eemeli !

@SlexAxton
Copy link
Member

Looking through all these. Look great from the bird's eye view.

One quick question:

MessageFormat.locale.en is fine, but do we handle MessageFormat.locale.en-CA and/or MessageFormat.locale.in (javascript keyword) specially?

MessageFormat.locale["in"] = function(n) {
return 'other';
};
MessageFormat.locale.in=function(n){return "other"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this will likely need to move back to ["in"] for compat reasons.

@SlexAxton
Copy link
Member

I think I'm good with this. I'm gonna pull it an change that one spot myself, so I don't forget about it.

SlexAxton added a commit that referenced this pull request Feb 19, 2014
Trim output size & complexity
@SlexAxton SlexAxton merged commit d6b90a5 into messageformat:master Feb 19, 2014
@iammerrick
Copy link
Contributor

I think this could potentially be the cause of issue #38

@eemeli eemeli deleted the add-util branch February 23, 2014 09:09
return d[k];
}
// plural
p=function(d,k,o,l,p){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @eemeli - So I hadn't realized before that these were global functions. Can we at least namespace them on MessageFormat? I know it's longer, but after gzip it should only cost 1 byte per repeat. Feels very odd leaking single letter global variables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drat, you're right. I'd been mainly looking at the include code, and had missed that those functions' versions in lib/messageformat.dev.js were missing a var prefix. So they end up as globals, when they really shouldn't and needn't. The compiled code is fine, though, there the single-letter functions only exist within the wrapping anonymous function. Do you want me to fix this and do a pull request, or will you do it yourself?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when we actually compile a message and it references p - wouldn't that
break if these weren't global?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, since they're valid within the scope of the wrapping function. When used directly, compile() returns a function that's coming from a scope where p & co. are valid; when precompiled into output JavaScript, the functions are included from lib/messageformat.include.js.

The only way it breaks is if you use an external wrapper to the library that calls precompile(), and tries to parse the results without the utility functions. But if you're doing that, it should break really easily and obviously, and hence be fixable.

Actually, in the precompiled output, the only global we end up with is i18n, even MessageFormat is scoped only within the wrapping function. As its only member is the locale array, we could consider getting rid of that as well, and use something like var locale = { 'en': ... }; instead, but those are currently hard-coded into locale/*.js...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yea. That's how I use it, I just hadn't upgraded my old stuff to use the new smaller hotness. I'm writing some bindings for handlebars to open source and I probably want to make it to where there's only a single global. I definitely don't want to restrict things to only work with the command line tools, etc.

We do only end up with the global i18n when we do the precompile, but it still tries to call several of the functions that no longer exist. So that's probably not ideal.

If you know of an easy fix, I'd love a pull request. Otherwise I can get into how some of this new stuff works tomorrow or the next day.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I've no experience with Handlebars, so I don't really know how the extension needs to be structured to make sure that the utility functions are available in the proper scope.

If the issue is with the Handlebars-precompiled output, can't you use build() from bin/messageformat.js? That should get you working output, with a single definable global, and the utility functions as local variables within that. Or is the problem something else entirely?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea. The handlebars part was an unneeded detail.

I think compiled functions should work by running them (even after saving
them as strings into files and losing scope, etc) with only a MessageFormat
runtime as a single global (or module). It shouldn't require a packaged
build in that manner. It trades a tiny bit of space for a lot of output
flexibility. The new version I think breaks all current integrations I've
done because of the way I precompile (I've never actually used the command
line tool outside of testing it). So I'd be excited to fix it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to make the utility functions public, they'll need to be defined as window.i18n.p = ... in lib/messageformat.include.js as MessageFormat is not a global in the precompiled code. You'll also therefore need to know about i18n in lib/messageformat.dev.js, which is currently not a requirement. Or you'll need to switch from the current configurable namespace that defaults to window.i18n to a fixed MessageFormat global namespace.

I wouldn't do any of that, tbh. Unless you're planning on inlining everything back in to the output of precompile(), you'll still need a custom function for storing functions as strings in order to accommodate for different locales, and if you have that (build() in bin/messageformat.js), why not keep it as is, which already enables the use of the single-letter local functions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spose I think it should pretty much just work like it did before. Any code that wasn't explicitly in the precompiled function called a MessageFormat property function. If we need to change the name of a local variable inside the precompiled function that's fine.

I think it's pretty weird to have references to single-letter functions and stuff. I also think it's weird to create a separate namespace for them. They essentially make up the 'messageformat runtime' vs. the 'messageformat full parser and compiler'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I fixed the issues with #45. Just making the utility functions local to the wrapping function in lib/messageformat.dev.js wasn't enough since new Function() doesn't end up in its calling scope, as I'd assumed. Hence the solution's a bit different.

The output of compile() is now independent of all external factors, as the utility functions are explicitly included in its scope. Calling toString() on its results isn't quite enough to create portable code, though; you need to include the output of MessageFormat.LocalFunctions() as well. Technically that could be fixable, but I'd recommend not doing so, as it would encourage duplicating code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants