
Improve mobile browser performance #653

Closed
r-tock opened this Issue Jan 19, 2017 · 40 comments

@r-tock
Contributor

r-tock commented Jan 19, 2017

protobuf.js version: 6.5.3

Performance currently lags significantly on mobile browsers. Opening this issue to aid investigation.

Here are live sites using this library that you can use for profiling. Heritage loads in 1.1s on a MacBook Pro, while it takes several seconds to load on my Nexus 6:

https://theheritage.tocktix.com
https://theaviary.tocktix.com
https://next.tocktix.com
https://alinea.tocktix.com

@dcodeIO

Owner

dcodeIO commented Jan 20, 2017

For reference: 5-6 seconds on a OnePlus One on first load, about 2 seconds on subsequent loads.

Is the data processed through protobuf.js on each page load, even when serving from cache?

@r-tock

Contributor

r-tock commented Jan 20, 2017

Yes.

@r-tock

Contributor

r-tock commented Jan 20, 2017

There is also server latency of the calendar call, you will have to ignore the rare uncached hits there.

@dcodeIO

Owner

dcodeIO commented Jan 20, 2017

One thing that comes to mind is that the code for the used definitions is generated on the fly, once. This may result in an initial performance hit (recurring on every page load, though) that could be lighter with static code.

@dcodeIO

Owner

dcodeIO commented Jan 20, 2017

A bit of profiling (red marks the calls most likely related to protobuf.js):

Would be useful to have source maps, but that'd of course show your business code to everyone.

@r-tock

Contributor

r-tock commented Jan 20, 2017

@dcodeIO

Owner

dcodeIO commented Jan 20, 2017

Yup, the timeline view reveals that bundle.js, which seems to be a file of 2.17 MB of minified JS (~344 KB gzipped), takes relatively long to parse.

The red frame shows the raw network download time (for my connection) and where bundle.js is parsed. The blue frame marks where I'd expect protobuf.js to actually execute (see the question below, which might falsify that):

Does bundle.js include and load all the definitions synchronously, like through root.fromJSON?

@r-tock

Contributor

r-tock commented Jan 20, 2017

You know what would be great? To compute the transitive closure of the protos we use during the build process, based on import statements, thus pruning away excess proto fat. A static bundle might then be useful. Protobufs were a server-side thing, so no one really cared about pruning code size; this is a web-perf-motivated optimization.

Maybe too much effort for not enough benefit.

@r-tock

Contributor

r-tock commented Jan 20, 2017

Yes, it loads and does a resolveAll().

@dcodeIO

Owner

dcodeIO commented Jan 20, 2017

You know what would be great? To compute transitive closure of the protos we use during build process based on import statements. Thus pruning away excess proto fat.

Maybe some mechanism to determine which reflection objects have actually been used. Not exactly sure if that should be in core, but it could certainly be patched into the prototypes of reflection objects or maybe into generated (en/de)code.

If not that, then there's still Namespace#lookup for a rough estimate. Quirk: it should be injected after resolveAll(), and the code must be run at least once beforehand to gain the intel.

If not that, then calling resolve() instead of resolveAll() explicitly on everything actually used should nail it down to a reflection tree of resolved and not-yet-resolved types, with the not-yet-resolved types being viable candidates to exclude.
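To illustrate that last idea, here is a minimal sketch. It uses a plain-object stand-in for the reflection tree; real protobuf.js objects are Namespace/Type instances, but the same `resolved` flag and `nested` map apply. Everything still unresolved after selectively calling resolve() on the types actually used is a candidate for exclusion:

```javascript
// Walk a reflection-like tree and collect the names of objects whose
// `resolved` flag is still false (candidates for exclusion).
function collectUnresolved(obj, out = []) {
  if (!obj.resolved) out.push(obj.name);
  if (obj.nested) {
    for (const child of Object.values(obj.nested)) {
      collectUnresolved(child, out);
    }
  }
  return out;
}

// Illustrative stand-in: only Order and messages were resolve()d.
const root = {
  name: '', resolved: true,
  nested: {
    Order:    { name: 'Order', resolved: true },
    Legacy:   { name: 'Legacy', resolved: false },
    messages: {
      name: 'messages', resolved: true,
      nested: { Unused: { name: 'Unused', resolved: false } }
    }
  }
};

console.log(collectUnresolved(root)); // → [ 'Legacy', 'Unused' ]
```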

dcodeIO added a commit that referenced this issue Jan 20, 2017

@dcodeIO

Owner

dcodeIO commented Jan 20, 2017

Now we have a starting point, at least. Just because the types returned by debug.unusedTypes (which is merely a proof of concept) are effectively unused, this doesn't imply that they aren't required to resolve something else. Basically, even if this information is already sufficient for a use case, the types should be properly removed from any dependents prior to actually excluding them.

Based on that, however, pretty much everything should be interceptable, be it enums actually accessed, counters for fields or what not.

@r-tock

Contributor

r-tock commented Jan 20, 2017

Will check. I was doing some basic measurements using performance.now(), which confirm what you saw.

On a Nexus 6 with Chrome 55, for one of the sites:
Root.fromJSON takes 900ms and resolveAll() takes 200ms.

On a Nexus 6 with Chrome 56 beta:
Root.fromJSON takes 660ms and resolveAll() takes 166ms.

Secondly, decoding the calendar data takes ~250-300ms on either, and
toObject from proto to JSON takes ~20ms on either.

@dcodeIO

Owner

dcodeIO commented Jan 20, 2017

One thing that comes to mind is that the required call to resolveAll() to populate additional properties for runtime compatibility could be relieved by moving this work to add/remove (which moves some, but not all, of the work to fromJSON), effectively resulting in a situation where only the reflection objects actually used have to be resolved.

Do you know how much of the durations above are from resolveAll() alone?

@r-tock

Contributor

r-tock commented Jan 20, 2017

200 and 166ms (Chrome 55 and Chrome 56 respectively)

@dcodeIO

Owner

dcodeIO commented Jan 20, 2017

Hmm, ok, that would improve it a bit, but not by much. Todo: benchmark the reflection stuff.

Just noticed that the durations were in there already, whoops.

dcodeIO added a commit that referenced this issue Jan 20, 2017

New: Relieved the requirement to call .resolveAll() on roots in order to populate static code-compatible properties, see #653

@dcodeIO dcodeIO added the enhancement label Jan 20, 2017

@dcodeIO

Owner

dcodeIO commented Jan 20, 2017

With this now in place, it should also be possible to use ReflectionObject#resolved as an indicator.

@r-tock

Contributor

r-tock commented Jan 22, 2017

Update: Our dependency graph is so huge that pruning would shave off no more than 5% of the data.

A few more interesting things I discovered.

I ran the protobuf benchmark https://jsperf.com/protobufjs-robin
on Safari 10.0.2, Safari Technical Preview 21, and Chrome 55.

Note this only performs the encode, decode tests.

On Safari 10.0.2, encoding is ~427K/s and decoding is ~681K/s.
Page load (median of 10 runs) is ~800ms.

On Safari 10.2.0 (Technical Preview 21), encoding is ~173K/s and decoding is ~251K/s.
Page load (median of 10 runs) is ~550ms (2x better than Chrome).

On Chrome 55, encoding is ~620K/s and decoding is ~782K/s.
Page load (median of 10 runs) is ~1000ms.

Even though Chrome was better at the raw encoding/decoding tests, Safari, especially the new one, improved page load by 2x. I have a few hypotheses; let me know if any of these sound plausible.

  1. I suspect the new GC improvements in Safari are helping page load quite a lot in general: https://webkit.org/blog/7122/introducing-riptide-webkits-retreating-wavefront-concurrent-garbage-collector/

  2. GC is probably reducing Root.fromJSON() performance for a bundle as large as ours.

  3. Safari may just be switching from the interpreter to JIT-compiled mode much more quickly and may be caching the results aggressively. Chrome's interpreter might be running protobuf.js slowly, while in Crankshaft mode Chrome blows Safari out of the water.

  4. Safari may in general be better at network/HTTP performance on a Mac.

  5. The benchmarks may need more work: maybe run them on Node with Crankshaft disabled and capture performance with a larger protobuf bundle. It might be useful to write a protobuf generator that fills fields with random data; that way we can point at any proto bundle and let the benchmark use those protos.

  6. Maybe figure out a better test setup for Node on ARM CPUs.

JS Performance is hard.

dcodeIO added a commit that referenced this issue Jan 22, 2017

Breaking: Refactored util.extend away; Other: Reflection performance pass, see #653; Other: Added TS definitions to alternative builds' index files; Other: Removed unnecessary prototype aliases, improves gzip ratio
@dcodeIO

Owner

dcodeIO commented Jan 22, 2017

This performance pass includes quite a lot of micro-optimizations like inlining Array#forEach and similar, using cached vars instead of getters where possible, and so on. Slightly reduces the number of total ticks in a simple test case.
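As an illustration of the kind of micro-optimization meant here (illustrative code, not protobuf.js's actual source), replacing Array#forEach with a plain indexed loop and caching the length lookup in a variable:

```javascript
// Idiomatic but slower on some engines: a callback allocation per call
// plus Array#forEach dispatch overhead.
function sumForEach(values) {
  let sum = 0;
  values.forEach(function (v) { sum += v; });
  return sum;
}

// The inlined variant: no callback, length cached in a local var.
function sumInlined(values) {
  let sum = 0;
  for (let i = 0, k = values.length; i < k; ++i) sum += values[i];
  return sum;
}

console.log(sumInlined([1, 2, 3])); // → 6
```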

@r-tock

Contributor

r-tock commented Jan 23, 2017

That gave a small boost for Chrome (~5%) and a big boost for Safari Technical Preview 21: encoding is now ~554K/s and decoding ~683K/s.

@dcodeIO

Owner

dcodeIO commented Jan 23, 2017

Interesting that this affected encoding/decoding speed at all, as there haven't been any changes to generated code; it was meant for parsing/setup speed.

One more thing that could be done is disabling any assertions when loading pre-asserted modules through .fromJSON, but I haven't looked through the potential of such an optimization yet.

Do you have updated timings on page load speed (especially regarding "Evaluate Script (bundle.js:1)" above)?

@r-tock

Contributor

r-tock commented Jan 23, 2017

I don't have it yet. Whatever measurements I make locally won't be like production with production data, and I need a protobuf.js release to take it all the way to production :|

dcodeIO added a commit that referenced this issue Jan 23, 2017

@dcodeIO

Owner

dcodeIO commented Jan 23, 2017

Hmm, there are still some possible tweaks to investigate. I could really use a rather large real-world JSON definition for local profiling.

@r-tock

Contributor

r-tock commented Jan 23, 2017

@dcodeIO I can send you our large definition via email if you are willing to sign an NDA and never push it to any public repo.

@r-tock

Contributor

r-tock commented Jan 23, 2017

Actually, I realized that's kind of a stupid concern; our bundles are in JS anyway. Let me send over the protos.

@dcodeIO

Owner

dcodeIO commented Jan 23, 2017

So, are you using static code in your bundles / not a JSON module?

@r-tock

Contributor

r-tock commented Jan 23, 2017

We are using a JSON file, importing it, then calling resolveAll() before re-exporting. I have emailed you the files.

@dcodeIO

Owner

dcodeIO commented Jan 23, 2017

I see, then I am still optimizing in the right spots. Currently using tests/test.proto for fromJSON profiling; that's already a lot better than bench.proto.

@r-tock

Contributor

r-tock commented Jan 23, 2017

import { Root } from 'protobufjs';
const root = Root.fromJSON(require('core/dist/core.json'));

const protos = {
  ...root.resolveAll().lookup('tocktix.messages')
};

module.exports = protos;

@dcodeIO It would help if we could somehow find a middle ground between fully expanded static classes (which, at 5 MB for these files, are too big) and a file containing a bunch of exports, one per type, that are resolved on import. Webpack is smart enough to resolve them on demand. Just throwing it out there.
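One way to sketch that middle ground is a module whose exports are lazy getters that resolve each type on first access. Everything here is illustrative: fakeRoot stands in for a real Root (in practice it would come from Root.fromJSON(...) without the eager resolveAll()), and lazyExports is a hypothetical helper, not a protobuf.js API:

```javascript
// Plain-object stand-in for a protobuf.js Root.
const fakeRoot = {
  lookup(name) {
    // pretend that resolving a type here is the expensive part
    return { name: name, resolved: true };
  }
};

// Define one enumerable getter per type; each type is resolved on first
// access only and cached afterwards.
function lazyExports(root, typeNames) {
  const out = {};
  typeNames.forEach(function (name) {
    let cached = null;
    Object.defineProperty(out, name, {
      enumerable: true,
      get: function () {
        if (cached === null) cached = root.lookup(name);
        return cached;
      }
    });
  });
  return out;
}

const protos = lazyExports(fakeRoot, ['Order', 'Calendar']);
console.log(protos.Order.name); // → Order
```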

dcodeIO added a commit that referenced this issue Jan 24, 2017

dcodeIO added a commit that referenced this issue Jan 24, 2017

@dcodeIO

Owner

dcodeIO commented Jan 24, 2017

If we can somehow find a middle ground between fully expanded static classes (which is too big 5MB for these files) and a file containing a bunch of exports for each type that is resolved on import. Webpack is smart enough to resolve them based on demand. Just throwing it out there.

Hmm, ideally there'd be a way to split messages, services, enums, etc. over multiple files to make this possible. As a starting point, the last commit reuses the root name specified through pbjs's --root parameter for static code (JSON modules still todo), so that multiple files can define their types on the same shared root instance more easily.
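The shared-root mechanism can be sketched like this, with a plain-object registry standing in for the library's internal roots cache (the names below are illustrative, not protobuf.js's actual internals):

```javascript
// Registry of named roots: generated modules attach their types to the
// entry matching the --root name they were built with.
const roots = {};

function getRoot(name) {
  return roots[name] || (roots[name] = { name: name, nested: {} });
}

// Generated file A targets root "app":
const a = getRoot('app');
a.nested.Order = { name: 'Order' };

// Generated file B targets the same root and sees A's types too:
const b = getRoot('app');
b.nested.Calendar = { name: 'Calendar' };

console.log(Object.keys(getRoot('app').nested)); // → [ 'Order', 'Calendar' ]
```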

dcodeIO added a commit that referenced this issue Jan 24, 2017

@dcodeIO

Owner

dcodeIO commented Jan 24, 2017

With this now in place, it still looks like a lot of work to implement proper splitting. A good approach could be to build some sort of abstract bundler for the CLI that takes operations to perform (defining sub-namespaces, setting options, adding (sub-)types, and importing types where required) and then reuses that, cutting through the operations, to generate actual JSON and static modules.

@r-tock

Contributor

r-tock commented Jan 24, 2017

@dcodeIO can you cut a release? I will try upgrading and report back with improvements.

@r-tock

Contributor

r-tock commented Jan 24, 2017

On a Nexus 6 with Chrome 56 beta:
Root.fromJSON takes ~550ms and resolveAll() takes ~120ms, but there is huge variance, especially when measuring with remote debugging.

dcodeIO added a commit that referenced this issue Jan 24, 2017

@dcodeIO

Owner

dcodeIO commented Jan 24, 2017

Just finished (the last) minor improvements, on npm now as 6.6.0.

@r-tock

Contributor

r-tock commented Jan 24, 2017

Latest optimized build with the latest release of protobuf.js:

On a Nexus 6 with Chrome 56 beta:
Root.fromJSON takes ~318ms and resolveAll() takes ~87ms, and performance is much more consistent.

Btw, on an iPhone 7 (tested today on a friend's phone), the load time is ridiculously fast, almost blink speed. I guess it's all due to hardware. I will try to get more scientific measurements tomorrow.

@r-tock

Contributor

r-tock commented Jan 24, 2017

These will be live tomorrow; I will update the thread when the release happens. The code is in QA right now.

@r-tock

Contributor

r-tock commented Jan 26, 2017

The new release is live. The same links are now serving the latest protobuf.js library.

@dcodeIO

Owner

dcodeIO commented Jan 26, 2017

The timeline alone looks pretty much identical to before, hmm.

@dcodeIO

Owner

dcodeIO commented Feb 4, 2017

Maybe of interest: pbjs now has an experimental --sparse option that excludes anything not directly or indirectly referenced from within the main .proto files (those directly specified on the command line).

This is used internally to generate JSON for some new google/api datatypes.

@r-tock

Contributor

r-tock commented Feb 4, 2017

Update:
There is a way to do CPU throttling in the timeline view to replicate low-powered smartphones. When I do that, it looks like the majority of the time is not spent in fromJSON.

@dcodeIO

Owner

dcodeIO commented Mar 22, 2017

Closing this issue for now as it hasn't received any replies recently (and seems to be solved). Feel free to reopen it if necessary!

@dcodeIO dcodeIO closed this Mar 22, 2017

dcodeIO added a commit that referenced this issue Apr 23, 2017
