This repository has been archived by the owner on Jun 7, 2021. It is now read-only.

[TRAFODION-2376] Improve UPDATE STATS performance on varchar columns #1029

Merged
merged 1 commit into from Mar 31, 2017

Conversation

DaveBirdsall
Contributor

@DaveBirdsall DaveBirdsall commented Mar 28, 2017

This pull request submits a performance enhancement to the UPDATE STATISTICS utility. This work completes a prototype originally written by Barry Fritchman (@blfritch).

For the moment, the feature is turned off by default. Use CQD USTAT_COMPARE_VARCHARS 'ON' to turn on this enhancement.

This feature compacts varchars in memory on the internal sort code path of UPDATE STATISTICS. In the old code, varchars are expanded to their full declared length. (Actually, we already truncate them at 256 bytes, the setting of CQD USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES. This perhaps gives up some accuracy in UEC computation, but it improves performance dramatically for very long varchar columns.) In the new code, we estimate the average length of the column and allocate space assuming the column's values adhere to that average. For columns that already have statistics, we use the average varchar length stored in column V2 of SB_HISTOGRAMS. For columns that don't, we guess that the average is half the declared length of the column.
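A minimal Python sketch of the per-value sizing rule described above, for illustration only. The function name `bytes_per_value`, the constant `MAX_CHAR_COL_LENGTH_IN_BYTES`, and the choice to cap the average-based estimate at the truncation limit are assumptions, not the actual Trafodion C++ implementation. The sketch also ignores length prefixes and per-row overhead.

```python
from typing import Optional

# Hypothetical constant mirroring the 256-byte setting of
# CQD USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES.
MAX_CHAR_COL_LENGTH_IN_BYTES = 256


def bytes_per_value(declared_len: int,
                    avg_len: Optional[float] = None,
                    compact: bool = True) -> int:
    """Estimate the bytes reserved per varchar value in the internal sort.

    declared_len: declared VARCHAR length in bytes.
    avg_len: average length from existing statistics (e.g. SB_HISTOGRAMS.V2),
             or None if the column has no statistics yet.
    compact: True models the new behavior (USTAT_COMPARE_VARCHARS 'ON'),
             False models the old behavior.
    """
    # Both paths truncate long values at the configured maximum.
    full_len = min(declared_len, MAX_CHAR_COL_LENGTH_IN_BYTES)
    if not compact:
        # Old path: every value is expanded to its (truncated) full length.
        return full_len
    if avg_len is None:
        # No statistics yet: guess half the declared length.
        avg_len = declared_len / 2
    # Assumption: never reserve more than the truncated full length,
    # and always reserve at least one byte.
    return min(full_len, max(1, round(avg_len)))
```

For example, under these assumptions a VARCHAR(200) column with no statistics reserves 200 bytes per value on the old path and 100 on the new one; with a stored average length of 37.4 it reserves 37.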

The performance gain comes from reducing the number of scans of the table (or sample table): because each column takes less memory, more columns fit in memory in each scan.
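To see why smaller per-value allocations reduce the number of scans, here is a hypothetical model that packs columns into a fixed memory budget greedily, in order. The function `count_scans` and the greedy packing policy are illustrative assumptions; the actual UPDATE STATISTICS logic for grouping columns into scans may differ.

```python
from typing import List


def count_scans(column_bytes: List[int], rows: int, memory_budget: int) -> int:
    """Count the scans needed when columns are packed greedily, in order.

    column_bytes: bytes reserved per value for each column.
    rows: number of rows read per scan (table or sample table).
    memory_budget: bytes available for one scan's in-memory sort.
    """
    scans = 0
    used = 0
    for width in column_bytes:
        need = rows * width
        if need > memory_budget:
            raise ValueError("a single column does not fit in memory")
        # Start a new scan for the first column, or when this one won't fit.
        if scans == 0 or used + need > memory_budget:
            scans += 1
            used = 0
        used += need
    return scans
```

With one million rows and a budget of 10^9 bytes, ten columns at 200 bytes per value need two scans in this model, while the same ten columns at 100 bytes per value fit in a single scan.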

This pull request also includes a tool, analyzeULOG.py, that scans ULOGs from UPDATE STATISTICS runs and extracts timing data. This is useful for determining where time is spent during UPDATE STATISTICS processing.

@Traf-Jenkins

Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1682/

@Traf-Jenkins

Contributor

@zellerh zellerh left a comment


+1
Sorry, I don't understand all the details of the code, but at a high level, and from what I do understand, it looks good to me.

@asfgit asfgit merged commit 3366fdb into apache:master Mar 31, 2017
4 participants