Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Cold start time is slow on Android #4027

Closed
marcaaron opened this issue Jul 14, 2021 · 45 comments
Closed

[Performance] Cold start time is slow on Android #4027

marcaaron opened this issue Jul 14, 2021 · 45 comments
Assignees

Comments

@marcaaron
Copy link
Contributor

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

  1. Log into the app
  2. Kill the app
  3. Start up the app again

Expected Result:

App should take a reasonable amount of time to boot back up

Actual Result:

App takes too long to boot up

Proposals (PLEASE READ)

This issue mainly affects native platforms. We don't have a specific target in mind of "how long it should take" and of course this will vary from device to device. But any proposals to this issue should provide clear evidence that the proposal will reduce boot time / cold start.

Workaround:

N/A

Platform:

iOS
Android

Version Number:
Logs: https://stackoverflow.com/c/expensify/questions/4856
Notes/Photos/Videos: Any additional supporting documentation
Expensify/Expensify Issue URL:

View all open jobs on Upwork

@marcaaron marcaaron added the AutoAssignerTriage Auto assign issues for triage to an available triage team member label Jul 14, 2021
@MelvinBot
Copy link

Triggered auto assignment to @JmillsExpensify (AutoAssignerTriage), see https://stackoverflow.com/c/expensify/questions/4749 for more details.

@MelvinBot MelvinBot removed the AutoAssignerTriage Auto assign issues for triage to an available triage team member label Jul 14, 2021
@marcaaron
Copy link
Contributor Author

@JmillsExpensify curious if you think it'd be valuable to maybe have a goal in mind for this and generally base that on some of our most used apps and how long they take to start? Otherwise, I think going for "general improvements" to boot time can work, but there are probably many factors that can move us closer to where we want to be.

@marcaaron
Copy link
Contributor Author

Some conversation here on RAM bundles as it was suggested by @jsamr that we can use Repack for cross platform lazy loading of modules. This could in theory reduce the app start time. I like this idea, but needs some more investigation :)

@JmillsExpensify
Copy link

@marcaaron Catching up here. I'm with you. I think we should be aiming for comparison with best-in-class. Slack and WhatsApp are the favorites, but I'm sure there are others we could include if needed.

Then separately, posting on Upwork now.

@JmillsExpensify
Copy link

Hi contributors! If you are coming from our Upwork posting, welcome! Make sure to post your proposal in this issue and then someone from our team will approve so we can hire you. Thanks!

@MelvinBot MelvinBot added the Help Wanted Apply this label when an issue is open to proposals by contributors label Jul 15, 2021
@MelvinBot
Copy link

Triggered auto assignment to @NikkiWines (Exported), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

@JmillsExpensify JmillsExpensify added Daily KSv2 Engineering External Added to denote the issue can be worked on by a contributor labels Jul 15, 2021
@rushatgabhane
Copy link
Member

rushatgabhane commented Jul 19, 2021

We don't have a specific target in mind of "how long it should take" and of course this will vary from device to device

I was going through this research paper.
If a screen refreshes within 1 second, then the users perceive it as fast.
image

Our goal for boot time, for a flagship phone should be <1 second.

@marcaaron
Copy link
Contributor Author

1 second seems like a good target to me. Did a few tests on Android, but pretty much tells us what we already know... 😅

App Cold Launch Time
Gmail < 1 second
Cash app < 1 second
WhatsApp < 1 second
Expensify.cash ~ 5 seconds

@marcaaron marcaaron changed the title [Performance] Boot time / Cold start still seems slow on native [Performance] Boot time / Cold start is slow on native Jul 19, 2021
@marcaaron
Copy link
Contributor Author

Ideas for more actionable issues we can breakout...

  1. Start benchmarking the cold start time so that we have a better idea of how long it takes (some ideas here)] TL;DR use Firebase.
  2. Improve code splitting
  3. Audit the amount of JS that is executing when the app first boots.
  4. Maybe defer some of that executing to the native layer?
  5. Seems like Peter's Onyx improvements might also help somewhat which means it is important to get set up with 1

@marcaaron marcaaron self-assigned this Jul 21, 2021
@MelvinBot MelvinBot removed the Overdue label Jul 21, 2021
@marcaaron marcaaron removed Exported External Added to denote the issue can be worked on by a contributor labels Jul 21, 2021
@marcaaron
Copy link
Contributor Author

@NikkiWines @JmillsExpensify Since we haven't gotten much action I'm going to close this Upwork posting so I can start investigating the plan laid out above. I think this is one of the most important things we can look at perf wise.

@marcaaron
Copy link
Contributor Author

Looking into setting up a regular benchmark via Firebase that will record:

  1. Time it takes to boot the app (automatic with Firebase Performance package)
  2. Time it takes from launch til JS to be evaluated - can be set up easily with a native module

https://github.com/Expensify/Expensify/issues/171518

@marcaaron
Copy link
Contributor Author

Firebase Performance is hooked up now (not yet on staging but will be soon and data should start to hit this dashboard)

2021-07-22_09-03-32

Still waiting for an iPhone to test on locally, but did a quick output of pre-main time in simulator for now (probably not super accurate, but still seems like there is no obvious issue there)

Total pre-main time: 376.31 milliseconds (100.0%)
         dylib loading time: 170.80 milliseconds (45.3%)
        rebase/binding time:  54.05 milliseconds (14.3%)
            ObjC setup time:  50.47 milliseconds (13.4%)
           initializer time: 100.97 milliseconds (26.8%)
           slowest intializers :
             libSystem.B.dylib :  10.76 milliseconds (2.8%)
                Expensify.cash : 112.61 milliseconds (29.9%)

I'm not an expert, but this seems to suggest that the app is launching fast, but takes a lot longer to become interactive - which makes sense. Will look to the JS next.

@marcaaron
Copy link
Contributor Author

The going is a bit slow since to properly measure the impacts of a change on start up time we have to create release build locally - the real startup time is hard to measure when in "dev" mode because the JS bundle is served by metro and must be downloaded. Also still working with simulator on iOS.

My investigations are still early here, but chased a few leads...

Use Hermes on iOS

Apparently Hermes is enabled for iOS as of react-native 0.64, but we aren't using it yet.

enabling Hermes will result in improved start-up time, decreased memory usage, and smaller app size

On the one hand, this is a one line change so might be worth a shot. But on the other, there could be some unexpected consequences of switching to a different JS engine. Ultimately, we are using it on Android so probably we should use it on both platforms for consistency.

We are maybe "doing too much" when the app first boots ::waves hands::

Some random observations that might be worth looking into:

  • The sidebar + the report are both potentially expensive components to render - we could delay rendering the report a bit to let the sidebar appear.
  • We kick off a pretty large chain of API requests without any sort of delay when the app inits. All the data fetching is front loaded. That in turn will lead to data being processed by Onyx. Maybe we can defer some of those as well to create some breathing room.

Other stuff / dead ends

  • Looked into setting up the performance plugin for Flipper, but it didn't really provide too much additional information. Maybe with more tracing added the Timeline feature can be a bit more useful for other things - but doesn't help for boot time.

Screen Shot 2021-07-22 at 1 28 15 PM

  • Checked the react native JS bundle with npx react-native-bundle-visualizer but didn't see anything too unusual. The JS bundle size is a little less than 4mb which doesn't seem unreasonable.

Screen Shot 2021-07-22 at 1 33 14 PM
Screen Shot 2021-07-22 at 1 32 50 PM

  • Hooked up react-native-fast-image again for sanity to see if caching avatar images would help. Didn't help at all.

Still have to look into...

@MelvinBot
Copy link

@marcaaron Eep! 4 days overdue now. Issues have feelings too...

@marcaaron
Copy link
Contributor Author

Going to update this issue to focus only on Android. I think at this point iOS is cold booting very fast.

Check this comparison...

iOS (iPhone 8)

iOS.Cold.Start.-.after.improvements.mov

Android (Galaxy S8)

Android.-.Cold.Boot.-.After.Improvements.mp4

Android is faster than it's ever been, but still feels like it hangs a bit compared to other apps on my device.

@MelvinBot MelvinBot removed the Overdue label Aug 3, 2021
@marcaaron marcaaron changed the title [Performance] Boot time / Cold start is slow on native [Performance] Cold start time is slow on Android Aug 3, 2021
@quinthar
Copy link

quinthar commented Aug 3, 2021 via email

@marcaaron
Copy link
Contributor Author

Did a quick experiment today to see if enabling Proguard on Android would help (noticed it was not actually enabled despite the presence of a recently edited config file). It didn't do too much (if anything) so I don't think optimizing the build is going to have much of an effect.

I'm not entirely sure what to look into next - but coming around to "it's an Android storage problem". Ran a bunch more benchmarking with release builds to try to identify if there are any bottlenecks in the code (specifically around Onyx connected components that are blocked while getting data from Onyx). Which seems to suggest that Onyx performance is not equivalent when comparing Android to iOS (times in ms).

Time spent waiting for data via Onyx Android iOS
Expensify.js 263 40
AuthScreens.js 17 2
MainDrawerNavigator.js 77 20
SidebarLinks.js 70 21
SidebarScreen.js 38 13
Total Blocking Time 465 96

It feels like we're just trying to read a bunch of data at once and then end up blocking all these components from rendering until a parent component finishes getting what it needs. The effects of that aren't felt as much on iOS for whatever reason.

@quinthar
Copy link

quinthar commented Aug 4, 2021 via email

@marcaaron
Copy link
Contributor Author

Not yet! It could be the next thing to look into. Unsure how large of an effort that will be, but sounds like we might want something fairly custom based on the Slack conversations.

One idea could be to do a quick test and set up react-native-mmkv in Onyx. I am not suggesting we use it as it sounds like we want to RYO. But, it claims to be about 30x faster than AsyncStorage + uses the JSI instead of the bridge. If that's true, it might be a simple way to gauge whether it's worth investing in a custom storage solution over AsyncStorage. If we hook it up and all our performance issues go away then we'll know we're heading in the right direction.

@quinthar
Copy link

quinthar commented Aug 4, 2021 via email

@marcaaron
Copy link
Contributor Author

This is branching off of this comment and @kidroca's observations that a lot of keys are fetched when the app inits. While we are looking into optimizing storage stuff it still feels like some opportunities for optimization at a higher level is possible...

Stop reading report actions from storage on init

  • We fetch all the report actions from storage when the app inits and probably can come up with a way to avoid that. They are the largest / most complex items we have. And we really only do this as a network optimization to compare the last message in storage with the latest on the report to see if there's anything we need to "catch up" on - but we don't really need this to happen immediately when the app inits. We might not need to do this at all if we "catch up" when navigating to a report - or another option would be to push this work to a less critical time.
    Onyx.connect({
    key: ONYXKEYS.COLLECTION.REPORT_ACTIONS,
    callback: (actions, key) => {

In this video, the left side skips fetching the report actions from storage on init

without.report.actions.mp4

Stop reading reports on init?

The next thing I'd wonder is whether we really need to load all the reports from storage immediately on init e.g. it might be better to cache / store only what we need to build the LHN in a single object vs. aggregating a ton of individual report objects to create the sidebar, main drawer navigator, etc.

This one might be tricky to work out and I'm not sure if it's valuable yet but all the data we need for the LHN comes from the Get&returnValueList=reportSummaryList call. We currently take that data and store it on individual report objects, but maybe that data structure isn't serving us well when the app inits since we need to fetch something like 200+ keys instead of just 1 (Note: Just speculating here - haven't tested this theory yet).

Looking forward, the LHN will also need to be paginated in the future when reports can be in the tens of thousands so perhaps it makes sense to store a reportSummaries key instead of aggregating the many report keys.

@quinthar
Copy link

quinthar commented Aug 5, 2021 via email

@marcaaron
Copy link
Contributor Author

hilarious idea: what if we captured a screenshot of the app upon close, and then just... showed that whole screenshot when you open the app, and while we're sorting out the background? It would give the impression of super instant responsiveness.

LOL, but maybe something like it? I think we can just...

  • load the minimum amount of data required to create the LHN (e.g. only about 8 chats are visible for me on init on Android so why bother loading another 190 that are totally hidden from view?)
  • continue retrieving the rest of the reports after that happens

The report summary list (if we start storing a minimal version of it) should tell us something about the order in which reports need to be shown then we can stagger the loading and rendering of anything not in view.

@quinthar
Copy link

quinthar commented Aug 5, 2021 via email

@kidroca
Copy link
Contributor

kidroca commented Aug 6, 2021

One more thing that can be optimized is the chat loaded under the LHN
On web/desktop you see LHN and you last chat
But on mobile this chat is loaded while you don't see it at all

It can be lazy loaded instead - either load/pre-render it after LHN is done, or load it when it is selected (only on small screens though)

@marcaaron
Copy link
Contributor Author

It can be lazy loaded instead

Recently added some code to load this only after the sidebar is loaded

@MelvinBot
Copy link

@marcaaron Whoops! This issue is 2 days overdue. Let's get this updated quick!

@MelvinBot
Copy link

@marcaaron Eep! 4 days overdue now. Issues have feelings too...

@marcaaron
Copy link
Contributor Author

Still looking into this. Have a draft that reduces the number of reportActions_* keys on init that I hope will help boot time.

@MelvinBot MelvinBot removed the Overdue label Aug 11, 2021
@marcaaron
Copy link
Contributor Author

Grabbed a JS profile on startup for Android and just leaving a breakdown of what is happening in the time between metro loading the bundle and the app displaying the sidebar...

Total time ~ 2.5 - 3 seconds

2021-08-12_09-21-16

Going block by block...

271 ms - renderApplication.. I think this is just react native starting…

211 ms - Looks like Onyx doing something there are some getItem calls - probably this is reading from AsyncStorage

184 ms - About half of this time is spent parsing JSON the other half rendering something (probably the Expensify.js component)

60 ms - SafeAreaProvider renders

74 ms - HTML render engine provider

128 ms - I’m a little unsure what this block of time is doing… invokeCallbackAndReturnFlushedQueue, something with callTimer. Those methods indicate that we are firing a callback in a JS timer (which requires some communication over the bridge - but doesn't look like anything really happens - maybe it's the network request queue doing the tick it does each second)

124 ms - This block looks pretty similar to the previous except there is a JSON parse which makes me think we have a network request.

172 ms - Navigation Container renders in this block

612 ms - Largest block here.. we can see several “sendDataToConnection” calls so we are rendering the MainDrawerNavigator, DrawerView, Header, etc - react navigation stuff.

431 ms - The Sidebar is rendering in this block

216 ms - The sidebar seems to render again for some reason (probably it doesn’t need to)

At this point the initial sidebar is “rendered” to the screen.

Here's the profile:
sampling-profiler-trace4670664525343774149-converted.json.zip


Having a couple thoughts...

  1. We can try to limit the sidebar from rendering again and save that 200ms (not much but it's something)
  2. The react-navigation stuff is taking a pretty long time to init together with the sidebar (about 1 second) so anything we can do to make that faster would be good.

It would be nice to compare this to a profile on iOS and see whether the JS there is taking a similar amount of time. But it seems like profiling Hermes on iOS doesn't work at all.

@marcaaron
Copy link
Contributor Author

I'm going to close out this issue and create a new one so we can get some more eyes on the problem. Now that there is a reliable way to test changes I think this would be better as a rolling job where we accept all proposals that are clear improvements (something like 500ms or more and can be verified by anyone).

@kidroca
Copy link
Contributor

kidroca commented Aug 16, 2021

BTW I've tried removing screens/components to see how much different components affect loading, but I used the Timing:... logs

I get this after a fresh login and then app restart

Timing:expensify.cash.homepage_reports_loaded 5396

I've removed all the navigators and screens besidese LHN and the ReportScreen and the time didn't change much

The time dropped to 2833 only after I've removed the whole ReportScreen

Timing:expensify.cash.homepage_reports_loaded 2833

Even then I've seen the LHN maybe about a second earlier

I'll try to run the same with the new performance metrics and post on the new ticket
This should give an idea on how much is there to be gained

@marcaaron
Copy link
Contributor Author

Not sure if it's relevant but that particular timing log only completes when an API request finishes ->

Timing.end(CONST.TIMING.HOMEPAGE_REPORTS_LOADED);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants