Predictive Optimizing Code Loading
Study of application acceleration by reducing the size of code to be loaded.
1.1 The minimization is based on statistics of which functions are actually invoked at runtime; only those functions are loaded the next time the application starts. (The remaining functions stay available and are loaded on first invocation.)
1.2 Initially load only the code used immediately at application startup (rendering the initial UI, loading the initial data). Other code can be loaded afterwards in the background, or upon certain events (e.g. the user navigates to a certain part of the app).
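The statistics gathering behind 1.1 could look roughly like the sketch below. It is only an illustration: the names instrument and invokedNames and the three sample functions are made up here, and the code does not show how the prototype actually instruments the sources.

```javascript
// Hypothetical helper: wraps every function in `fns` so that each call is counted
// in `stats` (a Map from function name to number of invocations).
function instrument(fns, stats) {
  const wrapped = {};
  for (const name of Object.keys(fns)) {
    const original = fns[name];
    wrapped[name] = function (...args) {
      stats.set(name, (stats.get(name) || 0) + 1); // record this invocation
      return original.apply(this, args);           // then behave like the original
    };
  }
  return wrapped;
}

// Names of functions seen invoked at least once: the candidates for the
// "load at startup" set on the next run.
function invokedNames(stats) {
  return [...stats.keys()].sort();
}

// Made-up "codebase" with three functions, only two of which get called.
const stats = new Map();
const app = instrument(
  {
    renderUi: () => "ui",
    loadData: () => [1, 2, 3],
    exportPdf: () => "pdf",
  },
  stats
);

app.renderUi();
app.loadData();
app.loadData();

console.log(invokedNames(stats)); // loadData and renderUi, but not exportPdf
```

Counting calls rather than just flagging them keeps the door open for 1.2, where call frequency or timing could decide what is loaded first.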
After publishing the results described below I've been pointed to two other projects based on the same idea:
- From a Microsoft Research researcher in 2008: http://research.microsoft.com/en-us/projects/doloto/
What can we save?
I also tested it with my web application http://testsheet.biz. This application is used later to experiment with actual code removal. The average JS function usage ratio was 37%. The application's main JS file, testsheet.min.js, has a usage ratio of 50%. The other files, Google libraries loaded dynamically, have lower usage ratios.
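For illustration, a per-file usage ratio could be computed as below. This assumes the ratio means "functions invoked at least once divided by functions defined in the file", counted by number of functions rather than by bytes; the helper name usageRatio and the sample data are invented for the example.

```javascript
// Hypothetical helper: share of a file's functions that were invoked at least once.
// `allNames` lists every function defined in the file; `stats` maps name -> call count.
function usageRatio(allNames, stats) {
  if (allNames.length === 0) return 0; // avoid dividing by zero for an empty file
  const used = allNames.filter((name) => (stats.get(name) || 0) > 0).length;
  return used / allNames.length;
}

// Made-up example data, not measurements from the study.
const allNames = ["renderUi", "loadData", "exportPdf", "printPage"];
const stats = new Map([
  ["renderUi", 1],
  ["loadData", 5],
]);

console.log((usageRatio(allNames, stats) * 100).toFixed(0) + "%"); // 50%
```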
Actually removing the unused code
We have created a prototype which removes the unused code based on the function call statistics.
            call statistics
                    |
                    V
                ┌------┐
myapp.min.js -> | POCL | -> myapp.min.a.js
                |      | -> myapp.min.r.js
                └------┘
The web page includes only the "active" script, myapp.min.a.js.
Initially, when there are no call statistics, the "active" file includes all the functions. Once enough statistics have accumulated, the compilation is repeated and new .a and .r files are generated. The .a file now contains only the functions seen invoked. Should the application call any other function, its stub loads the .r file.
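A stub of this kind might be sketched as follows. Everything here is an assumption made for illustration: the names makeStub and loadRest, the app object, and the sample functions are hypothetical, and loadRest merely stands in for fetching and evaluating the .r file. The source does not say how the prototype performs that load.

```javascript
// Hypothetical stub factory. The returned stub loads the "rest" code on its first
// call, then forwards the call to the real implementation that the load installed.
function makeStub(app, name, loadRest) {
  return function stub(...args) {
    loadRest(app); // expected to replace the stubs on `app` with real functions
    if (app[name] === stub) {
      throw new Error("rest file did not define " + name);
    }
    return app[name].apply(this, args); // forward the original call
  };
}

let restLoads = 0;

// Stand-in for the ".r" file: installs the functions left out of the ".a" file.
function loadRest(target) {
  restLoads += 1;
  target.exportPdf = (title) => "pdf:" + title;
  target.printPage = () => "printed";
}

// Stand-in for the ".a" file: one real function plus stubs for the rest.
const app = { renderUi: () => "ui" };
app.exportPdf = makeStub(app, "exportPdf", loadRest);
app.printPage = makeStub(app, "printPage", loadRest);

console.log(app.exportPdf("report")); // first call to a stub triggers the load
console.log(app.printPage());         // already replaced by the real function
console.log(restLoads);               // the rest code was loaded once
```

Because the caller expects a return value immediately, a real stub in a browser would have to obtain the .r file synchronously, or the call sites would have to become asynchronous; this sketch sidesteps that question by using an in-memory loader.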
This setup makes it possible to employ POCL even for otherwise static web sites. (TODO: make it possible to include hash codes in file names, for more reliable work with HTTP caches and for deploying new versions of the myapp.min.js file or of POCL.)
It was tested on http://testsheet.biz. The original testsheet.min.js file is generated by Google Closure Compiler. After invoking every app feature in the UI, the "active" testsheet.min.a.js file is 65.89% of the size of testsheet.min.js.
Given that only 50% of the JS functions in testsheet.min.js are used, and that in many other applications only around 35% of functions are used, we can estimate that the code of an average application could be reduced to roughly half of its original size.
Compared to dead code elimination based on static analysis, POCL yields better minimization even in its simplest implementation (1.1), which loads ALL the code seen to be called at least once. This is because some code paths are possible in theory, and therefore not "dead" from the static analysis point of view, yet are never or only rarely executed by the application in practice.
This approach can free JS programmers from having to use and develop libraries that are "only X kilobytes gzipped": we could use large libraries while the app loads only what is necessary (and only when necessary, if we implement more intelligent statistics handling).
The difference from the approach currently used in JS: instead of the programmer explicitly specifying what code to load and when, the programmer only specifies what code constitutes the "codebase", and the runtime system itself decides what code to load and when.
Not only the application code, but also the standard library of the language could be minimized that way.
POCL could be deployed as a tool individually for every web application, as a cloud web acceleration service, or as part of a browser implementation. In the last case we could achieve better minimization if the browser's JS engine is adjusted to support POCL natively, eliminating the need for support code injected into the JS sources.
Differentiate between what to download from the Internet and what to load into any particular web page. For popular libraries used by many web pages (e.g. jquery), a large part of the library may be downloaded from the Internet to the local cache, while only a subset of that code is loaded into each particular web page. (This assumes we don't require all the application code to be combined into a single file.)
When deciding what to load, consider not just which web application it is, but also which browser is used; maybe also divide users into classes (for example, occasional users who invoke only a minor part of the functionality versus pro users who use more features).
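One way to picture this is to keep a separate set of statistics per browser and user class, as in the sketch below. The bucket key, the class names "occasional" and "pro", and the functions profileKey, record and activeSetFor are all hypothetical; how users would actually be classified is left open.

```javascript
// Hypothetical bucket key: one set of statistics per browser and user class.
function profileKey(browser, userClass) {
  return browser + "|" + userClass;
}

// Record that `fnName` was invoked by a user belonging to the given profile.
function record(buckets, browser, userClass, fnName) {
  const key = profileKey(browser, userClass);
  if (!buckets.has(key)) buckets.set(key, new Set());
  buckets.get(key).add(fnName);
}

// Functions to put into the "active" file served to a given profile.
function activeSetFor(buckets, browser, userClass, allNames) {
  const seen = buckets.get(profileKey(browser, userClass));
  if (!seen) return allNames.slice(); // no statistics yet: load everything
  return allNames.filter((name) => seen.has(name));
}

// Made-up codebase and usage data.
const allNames = ["renderUi", "loadData", "exportPdf", "macros"];
const buckets = new Map();
record(buckets, "firefox", "occasional", "renderUi");
record(buckets, "firefox", "occasional", "loadData");
record(buckets, "firefox", "pro", "renderUi");
record(buckets, "firefox", "pro", "loadData");
record(buckets, "firefox", "pro", "macros");

console.log(activeSetFor(buckets, "firefox", "occasional", allNames)); // renderUi, loadData
console.log(activeSetFor(buckets, "firefox", "pro", allNames));        // renderUi, loadData, macros
console.log(activeSetFor(buckets, "chrome", "pro", allNames));         // all four: no data yet
```

Falling back to the full function list for an unknown profile mirrors the initial state, in which the "active" file contains everything until statistics exist.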
Privacy: we upload code usage statistics to online storage. Does this violate the user's privacy? No. First of all, web servers already see every URL the user accesses; if they wanted to spy, the URLs would be more than enough, and seeing which JS functions are invoked doesn't change much. Most importantly, the POCL statistics are anonymous.
It would be nice to find support for several months of work to continue investigating the POCL concept.
To be continued...