
[feat](stats) Support preheating for stats #18460

Closed
Kikyou1997 wants to merge 2 commits into apache:master from Kikyou1997:feat/stats_preheating

Kikyou1997 (Contributor) commented Apr 7, 2023

Proposed changes

Load the most recently used stats when the FE boots.

StatsCacheLoader records each column it loads (keyed as tblid-idxid-colname) in a map. This PR adds a shutdown hook to the FE so that the records in the map can be written to the metadata.

When the FE boots, the stats of the previously recorded columns are loaded by a single thread.
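A minimal Java sketch of the record-and-preheat flow described above, assuming a set of `tblId-idxId-colName` keys, a JVM shutdown hook that hands them to a persistence callback, and a single-thread executor that replays them at boot. `StatsPreheatSketch`, `recordLoad`, `installShutdownHook` and `preload` are hypothetical names, not the Doris `StatsCacheLoader` API; the two callbacks stand in for writing to and loading from the FE metadata.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch; names are illustrative, not the actual Doris API.
class StatsPreheatSketch {
    // Columns whose stats have been loaded, keyed as "tblId-idxId-colName".
    private final Set<String> loadedColumns = ConcurrentHashMap.newKeySet();

    static String key(long tblId, long idxId, String colName) {
        return tblId + "-" + idxId + "-" + colName;
    }

    // Called each time the cache loader loads stats for a column.
    void recordLoad(long tblId, long idxId, String colName) {
        loadedColumns.add(key(tblId, idxId, colName));
    }

    // Sorted copy of the recorded keys, so the output is deterministic.
    List<String> snapshot() {
        List<String> keys = new ArrayList<>(loadedColumns);
        Collections.sort(keys);
        return keys;
    }

    // On JVM shutdown, hand the recorded keys to a persistence callback
    // (in the PR, the records are written to the FE metadata).
    void installShutdownHook(Consumer<List<String>> persist) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> persist.accept(snapshot())));
    }

    // At boot, load stats for the previously recorded columns on a single thread.
    // For simplicity the sketch waits for completion before returning.
    static void preload(List<String> recordedKeys, Consumer<String> loadStats) {
        ExecutorService single = Executors.newSingleThreadExecutor();
        for (String k : recordedKeys) {
            single.submit(() -> loadStats.accept(k));
        }
        single.shutdown();
        try {
            single.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        StatsPreheatSketch recorder = new StatsPreheatSketch();
        recorder.recordLoad(10, 11, "c1");
        recorder.installShutdownHook(keys -> System.out.println("persist " + keys));
        preload(recorder.snapshot(), k -> System.out.println("preload " + k));
    }
}
```

Because the executor has exactly one worker thread, the recorded columns are loaded one at a time in submission order, which keeps the boot-time load on the statistics tables small.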

NOTE

This PR adds a new operation type to the edit log, so the metadata is not backward compatible: once a user or developer runs their FE on this version, they cannot roll back to a previous FE version.
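The rollback restriction follows from how an edit log is replayed. The sketch below assumes, hypothetically, that the replayer treats an unrecognized op code as fatal; the op codes and names are illustrative, not Doris's actual values. An older FE that only knows codes 1 and 2 fails as soon as it meets an entry written by a newer FE.

```java
import java.util.Map;

// Hypothetical sketch: op codes and names are illustrative only.
class EditLogReplaySketch {
    // Op codes that an "old" FE knows how to replay.
    static final Map<Short, String> KNOWN_OPS = Map.of(
            (short) 1, "OP_CREATE_TABLE",
            (short) 2, "OP_DROP_TABLE");

    static String replay(short opCode) {
        String name = KNOWN_OPS.get(opCode);
        if (name == null) {
            // The old replayer cannot interpret this entry, so it aborts.
            throw new IllegalStateException("unknown op code: " + opCode);
        }
        return name;
    }

    public static void main(String[] args) {
        short newOp = 3; // e.g. an entry written by a newer FE that persists stats-cache records
        for (short op : new short[] {1, 2, newOp}) {
            try {
                System.out.println("replayed " + replay(op));
            } catch (IllegalStateException e) {
                System.out.println("replay failed: " + e.getMessage());
            }
        }
    }
}
```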

Problem summary

Describe your changes.

Checklist (Required)

  • Does it affect the original behavior
  • Have unit tests been added
  • Has documentation been added or modified
  • Does it need to update dependencies
  • Does this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

Kikyou1997 (Contributor, Author):

run buildall

Kikyou1997 force-pushed the feat/stats_preheating branch from 41b233a to 26669ee on April 7, 2023 07:26

englefly (Contributor) commented Apr 7, 2023

If there is no history information, can we load some stats on FE startup?

Kikyou1997 (Contributor, Author):

> If there is no history information, can we load some stats on FE startup?

No

morrySnow (Contributor):

I think just loading the top k newest column statistics is good enough; logging the cache status is not necessary.
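One way to read the top-k suggestion is to pick the k columns whose statistics were updated most recently. A minimal sketch, assuming each statistics row carries a key and an update timestamp (`ColumnStatsRow` and `topKNewest` are hypothetical stand-ins, not Doris types), using a bounded min-heap:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: selects the k most recently updated column stats.
class TopKNewestStats {
    // Stand-in for one row of a column statistics table.
    static final class ColumnStatsRow {
        final String key;        // e.g. "tblId-idxId-colName"
        final long updateTimeMs; // last update time of the stats

        ColumnStatsRow(String key, long updateTimeMs) {
            this.key = key;
            this.updateTimeMs = updateTimeMs;
        }
    }

    // Returns up to k keys with the largest update times, newest first.
    static List<String> topKNewest(List<ColumnStatsRow> rows, int k) {
        // Min-heap on update time: the root is the oldest of the rows kept so far.
        PriorityQueue<ColumnStatsRow> heap =
                new PriorityQueue<>(Comparator.comparingLong((ColumnStatsRow r) -> r.updateTimeMs));
        for (ColumnStatsRow row : rows) {
            heap.offer(row);
            if (heap.size() > k) {
                heap.poll(); // drop the oldest once more than k rows are held
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().key); // polled oldest first
        }
        Collections.reverse(result); // newest first
        return result;
    }

    public static void main(String[] args) {
        List<ColumnStatsRow> rows = List.of(
                new ColumnStatsRow("1-1-a", 100),
                new ColumnStatsRow("1-1-b", 300),
                new ColumnStatsRow("2-2-c", 200),
                new ColumnStatsRow("2-2-d", 50));
        System.out.println(topKNewest(rows, 2)); // [1-1-b, 2-2-c]
    }
}
```

The same selection could also be pushed down to the statistics table as an `ORDER BY ... LIMIT k` query; the heap version only makes the selection logic explicit, and it needs no cache-status log, which is the point of the suggestion.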

morrySnow (Contributor):

And we should load the statistics info immediately after an analyze job succeeds.

Kikyou1997 (Contributor, Author):

> And we should load the statistics info immediately after an analyze job succeeds.

This logic already exists.

englefly (Contributor) commented Apr 7, 2023

If there is no history information, could you just load k stats?

Kikyou1997 closed this Apr 9, 2023
morrySnow pushed a commit that referenced this pull request Apr 11, 2023
1. Support prefetching some column stats when the FE boots: it loads the column stats that were updated most recently, following @morrySnow's comment on PR #18460.
2. Refactor the stats cache by splitting the histogram cache from the column stats cache, so that we can avoid some redundant queries to the column statistics table. For example, when only the histogram or only the column stats are updated, the previous implementation's unified cache loader would still send query requests to both the column stats table and the histogram table.
3. Extract some common logic into StatsUtil.
4. Remove some useless code in unit tests: that code is hard to maintain and, as @englefly advised, it is not a good way to test the accuracy of stats estimation.
5. Add a field type restriction when creating analysis tasks to avoid unnecessary failures.
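Item 2 can be illustrated with two independent caches, each with its own loader, so that an update to one kind of statistics only reloads that cache. This is a sketch under stated assumptions: the class, the key format, and the strings standing in for query results are hypothetical, not the Doris stats cache classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: one cache per stats kind, each with its own loader.
class SplitStatsCacheSketch {
    static final class Cache<V> {
        private final Map<String, V> entries = new ConcurrentHashMap<>();
        private final Function<String, V> loader;
        final AtomicInteger loads = new AtomicInteger(); // loader invocations, for illustration

        Cache(Function<String, V> loader) {
            this.loader = loader;
        }

        V get(String key) {
            // Only this cache's loader runs on a miss; the other table is not queried.
            return entries.computeIfAbsent(key, k -> {
                loads.incrementAndGet();
                return loader.apply(k);
            });
        }

        void invalidate(String key) {
            entries.remove(key);
        }
    }

    // Stand-ins for queries against the column statistics table and the histogram table.
    final Cache<String> columnStats = new Cache<>(k -> "colStats(" + k + ")");
    final Cache<String> histograms = new Cache<>(k -> "histogram(" + k + ")");

    // When only a column's histogram is updated, only the histogram entry is dropped.
    void onHistogramUpdated(String key) {
        histograms.invalidate(key);
    }

    public static void main(String[] args) {
        SplitStatsCacheSketch caches = new SplitStatsCacheSketch();
        System.out.println(caches.columnStats.get("10-11-c1"));
        System.out.println(caches.histograms.get("10-11-c1"));
        caches.onHistogramUpdated("10-11-c1");
        caches.histograms.get("10-11-c1");
        System.out.println("column stats loads: " + caches.columnStats.loads.get()
                + ", histogram loads: " + caches.histograms.loads.get());
    }
}
```

Running `main` prints `column stats loads: 1, histogram loads: 2` on its last line: the histogram update triggers one extra histogram query and no column stats query, whereas a unified loader would have queried both tables.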
gnehil pushed a commit to gnehil/doris that referenced this pull request Apr 21, 2023
Reminiscent pushed a commit to Reminiscent/doris that referenced this pull request May 15, 2023
3 participants