accounts_loading: use load account with cache for filtering #1408

HaoranYi · 2024-05-17T15:38:58Z

Problem

account_matches_owner doesn't put missed account into read-only cache, which can cause one additional read-only cache misse in subsequent transaction accounts loading.

Summary of Changes

Replace account_matches_owner with regular accounts load.

Fixes #

codecov-commenter · 2024-05-17T17:12:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.7%. Comparing base (20ee70c) to head (64cef8c).
Report is 8 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##           master    #1408     +/-   ##
=========================================
- Coverage    82.7%    82.7%   -0.1%     
=========================================
  Files         871      871             
  Lines      369894   369992     +98     
=========================================
+ Hits       306158   306226     +68     
- Misses      63736    63766     +30

HaoranYi · 2024-05-17T17:15:05Z

blue: load_account put into cache
red: accounts_match_owner

blue has better load_time and less cache misses.

jeffwashington · 2024-05-17T18:17:17Z

another impl here would be to modify account_matches_owners to check owner using the read cache, but if it is not in the read cache, to look on disk as it does today. This change does cause us to put the entry IN the read cache. It also returns an arc to the data, which isn't necessary since we're just checking owners (This is what getting an AccountSharedData does.). Do we want the account in the read cache? If so, then it is cheap to get AccountSharedData to just check owner. If it we don't want it in the read cache, then it is unnecessarily expensive to add to the read cache just to check the owner (if it is not in the read cache already). Do we have metrics on how often account_matches_owners is called for entries IN the write or read caches?

Lichtso · 2024-05-17T18:21:37Z

This change does cause us to put the entry IN the read cache.
Do we want the account in the read cache?

That is intentional, we wanted to runt the counter experiment of #1157 (comment).

It also returns an arc to the data, which isn't necessary since we're just checking owners

That is acting as a pre-fetch for the subsequent actual load in transaction account loading.

jeffwashington

lgtm
Sounds like the side effect of putting these accounts in the read cache (if they aren't already in the write cache) is desired.

brooksprumo · 2024-05-23T13:05:01Z

account_matches_owner doesn't leverage read_only cache and always goes to the underlying storage, which is inefficient.

When I look at the impl for accounts_matches_owner(), it looks to me that we do check the read cache:

agave/accounts-db/src/accounts_db.rs

Lines 5085 to 5120 in 9dd01f5

    
           pub fn account_matches_owners( 
        
               &self, 
        
               ancestors: &Ancestors, 
        
               account: &Pubkey, 
        
               owners: &[Pubkey], 
        
           ) -> Result<usize, MatchAccountOwnerError> { 
        
               let (slot, storage_location, _maybe_account_accesor) = self 
        
                   .read_index_for_accessor_or_load_slow(ancestors, account, None, false) 
        
                   .ok_or(MatchAccountOwnerError::UnableToLoad)?; 
        
               if !storage_location.is_cached() { 
        
                   let result = self.read_only_accounts_cache.load(*account, slot); 
        
                   if let Some(account) = result { 
        
                       return if account.is_zero_lamport() { 
        
                           Err(MatchAccountOwnerError::NoMatch) 
        
                       } else { 
        
                           owners 
        
                               .iter() 
        
                               .position(|entry| account.owner() == entry) 
        
                               .ok_or(MatchAccountOwnerError::NoMatch) 
        
                       }; 
        
                   } 
        
               } 
        
               let (account_accessor, _slot) = self 
        
                   .retry_to_get_account_accessor( 
        
                       slot, 
        
                       storage_location, 
        
                       ancestors, 
        
                       account, 
        
                       None, 
        
                       LoadHint::Unspecified, 
        
                   ) 
        
                   .ok_or(MatchAccountOwnerError::UnableToLoad)?; 
        
               account_accessor.account_matches_owners(owners) 
        
           }

but we do not put the loaded account into the read cache if it's not there. Is the problem this PR is solving actually to put the account into the read cache?

Lichtso · 2024-05-23T13:37:22Z

Is the problem this PR is solving actually to put the account into the read cache?

Yes, that is the idea. Though, I suspect it will make little difference because in the subsequent transaction account loading we put programs in the read cache, so in the next TX batch they are already in.

brooksprumo · 2024-05-23T13:59:02Z

Though, I suspect it will make little difference because in the subsequent transaction account loading we put programs in the read cache, so in the next TX batch they are already in.

I see. Is this PR still the right direction then? The results from @HaoranYi indicate an improvement. Is that still accurate?

HaoranYi · 2024-05-23T14:06:54Z

The profiler from @alessandrod shows that account_match_owner causes 2/3 of all the page faults. We had thought that the page faults were due to not using read cache. Apparently, that's not the reason. Maybe the page faults are due to accounts_index.
@alessandrod When you have time, can you run your profiler with this PR to see if the page faults are reduced?

HaoranYi · 2024-05-23T14:12:38Z

Is the problem this PR is solving actually to put the account into the read cache?

Yes, that is the idea. Though, I suspect it will make little difference because in the subsequent transaction account loading we put programs in the read cache, so in the next TX batch they are already in.

Putting into the cache after checking owners helps the account loading later in the same TX batch. Instead of two read-cache misses, we only have one miss. That's probably why read cache miss is lower with this PR.

brooksprumo

alessandrod · 2024-05-23T14:32:44Z

@alessandrod When you have time, can you run your profiler with this PR to see if the page faults are reduced?

Yep planning to do this today, please don't merge yet

alessandrod · 2024-05-23T15:28:09Z

As you can see the cursed account_matches_owners is 59% of execution. About 3/4 is AppendVec::account_matches_owners - notice AppendVec - meaning we're hitting disk. The rest is the index.

agave/accounts-db/src/accounts_db.rs

Line 5045 in ec9bd79

if !storage_location.is_cached() {

this will only look inside the read cache, not the write cache. If the account is not in the read cache, it'll hit disk. Since we pass all accounts to account_matches_owners, we're probably hitting disk for anything that is in the write cache.

We should probably just remove this cursed function. Load all the accounts, then filter the programs later once you have the accounts.

HaoranYi · 2024-05-23T16:32:24Z

We should probably just remove this cursed function. Load all the accounts, then filter the programs later once you have the accounts.

That's similar to #1157. The only difference is that we don't put program accounts into read cache in #1157.
Should we update #1157 to put program accounts into read cache too?

HaoranYi · 2024-05-23T17:26:54Z

Performance comparison for a longer time frame 24h
load time and replay time

red: master; blue: this PR.

alessandrod · 2024-05-24T02:30:46Z

We should probably just remove this cursed function. Load all the accounts, then filter the programs later once you have the accounts.

That's similar to #1157. The only difference is that we don't put program accounts into read cache in #1157. Should we update #1157 to put program accounts into read cache too?

Yeah that sounds better than keeping account_matches_owners

use load account with cache for filtering

64cef8c

HaoranYi changed the title ~~use load account with cache for filtering~~ accounts_loading: use load account with cache for filtering May 17, 2024

HaoranYi requested a review from Lichtso May 17, 2024 17:18

HaoranYi marked this pull request as ready for review May 17, 2024 17:18

HaoranYi requested review from alessandrod, brooksprumo and jeffwashington May 17, 2024 17:19

jeffwashington approved these changes May 17, 2024

View reviewed changes

brooksprumo approved these changes May 23, 2024

View reviewed changes

HaoranYi mentioned this pull request Jun 6, 2024

get rid of get_account_meta #1620

Merged

alessandrod mentioned this pull request Jun 17, 2024

runtime/Optimize accounts loading (one pass) #1157

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

accounts_loading: use load account with cache for filtering #1408

accounts_loading: use load account with cache for filtering #1408

HaoranYi commented May 17, 2024 •

edited

Loading

codecov-commenter commented May 17, 2024

HaoranYi commented May 17, 2024

jeffwashington commented May 17, 2024

Lichtso commented May 17, 2024

jeffwashington left a comment

brooksprumo commented May 23, 2024

Lichtso commented May 23, 2024

brooksprumo commented May 23, 2024

HaoranYi commented May 23, 2024 •

edited

Loading

HaoranYi commented May 23, 2024 •

edited

Loading

brooksprumo left a comment

alessandrod commented May 23, 2024

alessandrod commented May 23, 2024

HaoranYi commented May 23, 2024 •

edited

Loading

HaoranYi commented May 23, 2024

alessandrod commented May 24, 2024

accounts_loading: use load account with cache for filtering #1408

Are you sure you want to change the base?

accounts_loading: use load account with cache for filtering #1408

Conversation

HaoranYi commented May 17, 2024 • edited Loading

Problem

Summary of Changes

codecov-commenter commented May 17, 2024

Codecov Report

HaoranYi commented May 17, 2024

jeffwashington commented May 17, 2024

Lichtso commented May 17, 2024

jeffwashington left a comment

Choose a reason for hiding this comment

brooksprumo commented May 23, 2024

Lichtso commented May 23, 2024

brooksprumo commented May 23, 2024

HaoranYi commented May 23, 2024 • edited Loading

HaoranYi commented May 23, 2024 • edited Loading

brooksprumo left a comment

Choose a reason for hiding this comment

alessandrod commented May 23, 2024

alessandrod commented May 23, 2024

HaoranYi commented May 23, 2024 • edited Loading

HaoranYi commented May 23, 2024

alessandrod commented May 24, 2024

HaoranYi commented May 17, 2024 •

edited

Loading

HaoranYi commented May 23, 2024 •

edited

Loading

HaoranYi commented May 23, 2024 •

edited

Loading

HaoranYi commented May 23, 2024 •

edited

Loading