[TRAFODION-2376] Improve UPDATE STATS performance on varchar columns #1029

DaveBirdsall · 2017-03-28T17:29:11Z

This pull request submits a performance enhancement to the UPDATE STATISTICS utility. This work is the completion of a prototype originally done by Barry Fritchman (@blfritch).

For the moment, the feature is turned off by default. Use CQD USTAT_COMPARE_VARCHARS 'ON' to turn on this enhancement.

This feature compacts varchars in memory for the internal sort code path in UPDATE STATISTICS. In the old code, varchars are expanded out to their full length. (Actually, we already truncate them at 256 characters -- the setting of CQD USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES -- giving up some accuracy in UEC computation, perhaps, but improving performance dramatically for very long varchar columns.) In the new code, we estimate the average length of the column and allocate space on the assumption that the column still adheres to that average. For columns that already have statistics, we use the average varchar length stored in SB_HISTOGRAMS column V2. For columns that don't, we guess that the average is one-half the declared length of the column.
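
As an illustration only, the length-estimation rule described above might be sketched as follows. This is not the code in the pull request: the function name and its parameters are hypothetical, and the real logic lives in the UPDATE STATISTICS C++ sources.

```python
from typing import Optional


def estimate_avg_varchar_len(declared_len: int,
                             stored_avg_len: Optional[float] = None) -> float:
    """Estimate the average in-memory length of a varchar column.

    Hypothetical helper, for illustration only.
    stored_avg_len stands in for the average varchar length that an
    earlier UPDATE STATISTICS run would have saved (SB_HISTOGRAMS
    column V2); None means the column has no statistics yet.
    """
    if stored_avg_len is not None:
        # Column already has statistics: reuse the recorded average.
        return float(stored_avg_len)
    # No statistics yet: guess one-half of the declared length.
    return declared_len / 2


# A VARCHAR(200) column with no prior statistics is assumed to average 100.
print(estimate_avg_varchar_len(200))
# With a stored average, that value is used instead of the guess.
print(estimate_avg_varchar_len(200, stored_avg_len=37.5))
```

The sketch leaves out the existing truncation at USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES; how that limit interacts with the estimate is not stated here, so it is not modeled.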

The performance gain from using this feature comes from reducing the number of scans of the table or sample table because more columns can fit in memory in each scan.
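
To make the effect on scan count concrete, here is a rough model, not the actual batching logic in UPDATE STATISTICS. It assumes a fixed memory budget, a simple greedy packing of columns into scans in the order given, and a per-column cost of bytes-per-value times row count. The function name and all numbers are made up for illustration.

```python
def count_scans(bytes_per_value, row_count, memory_budget):
    """Count the scans needed when columns are packed greedily into memory.

    Hypothetical model: each scan loads as many columns as fit in
    memory_budget, and a column costs bytes_per_value * row_count bytes.
    """
    scans = 0
    free = 0  # no scan is open yet, so no memory is available
    for width in bytes_per_value:
        need = width * row_count
        if need > memory_budget:
            raise ValueError("column does not fit in memory on its own")
        if need > free:
            # Column does not fit in the current scan: start a new one.
            scans += 1
            free = memory_budget
        free -= need
    return scans


rows = 1000
budget = 600_000  # bytes, an arbitrary figure for the example

# Four varchar columns expanded to 256 bytes per value.
full_width = count_scans([256] * 4, rows, budget)
# The same columns allocated at an assumed 64-byte average.
compacted = count_scans([64] * 4, rows, budget)
print(full_width, compacted)
```

In this toy setup only two full-width columns fit per scan, while all four compacted columns fit in a single scan.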

Also included in this pull request is a tool, analyzeULOG.py, that can be used to scan ULOGs from UPDATE STATISTICS runs to extract timing data. This is useful for determining where time is spent during UPDATE STATISTICS processing.

Traf-Jenkins · 2017-03-28T17:33:15Z

Check Test Started: https://jenkins.esgyn.com/job/Check-PR-master/1682/

Traf-Jenkins · 2017-03-28T20:33:26Z

Test Passed. https://jenkins.esgyn.com/job/Check-PR-master/1682/

zellerh

+1
Sorry, I don't understand all the details of the code, but at a high level, and looking at what I do understand, it looks good to me.

[TRAFODION-2376] Improve UPDATE STATS performance on varchar columns

3366fdb

zellerh approved these changes Mar 30, 2017

asfgit merged commit 3366fdb into apache:master Mar 31, 2017
