Improve performance of STARTING WITH with insensitive collations #7038

asfernandes · 2021-11-04T18:27:08Z

To process STARTING WITH with an insensitive collation, the engine must first generate the canonical bytes of the matching string.

If the matching string is much longer than the pattern string, time is wasted generating canonical bytes that are never compared.

Only the initial substring of the matching string, with the same length as the pattern string, needs canonical bytes.
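The idea above can be sketched outside the engine. This is an illustration only, not Firebird's implementation: `canonical` is a hypothetical stand-in for a collation's canonical function, assumed here to map each character independently (so the canonical form of a prefix equals the prefix of the canonical form), and both `starts_with_*` helpers are names invented for this example.

```python
def canonical(text):
    """Hypothetical canonical form: one case-folded entry per character.

    Stands in for a collation's canonical function; assumes each
    character is canonicalized independently of its neighbours.
    """
    return [ch.casefold() for ch in text]


def starts_with_naive(s, p):
    """Canonicalize the whole matching string, then compare the prefix."""
    cs = canonical(s)  # work proportional to len(s)
    cp = canonical(p)
    return cs[:len(cp)] == cp


def starts_with_prefix_only(s, p):
    """Canonicalize only as many characters of s as the pattern has."""
    if len(s) < len(p):
        return False
    cp = canonical(p)
    return canonical(s[:len(p)]) == cp  # work proportional to len(p)


if __name__ == "__main__":
    s = "X" + "1234567890" * 5 + "123456789"  # 60 characters
    print(starts_with_naive(s, "x"), starts_with_prefix_only(s, "x"))
```

Both functions return the same result for every input under the per-character assumption; the second simply skips canonicalizing the part of the matching string that can never affect the comparison.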

In my tests with character set WIN1252 and collation WIN_PTBR, using matching strings of length 60 and a pattern string of length 1, I see a performance improvement of ~30%.

With character set UTF8 and collation UNICODE_CI, I see a performance improvement of ~50% in the same test.

Test:

execute block
as
    declare p varchar(1) character set win1252 collate win_ptbr = 'x';
    declare s varchar(60) character set win1252 collate win_ptbr = 'x12345678901234567890123456789012345678901234567890123456789';
    declare n integer = 0;
    declare b boolean;
begin
    while (n < 1000000)
    do
    begin
        b = s starting with p;
        n = n + 1;
    end
end!

execute block
as
    declare p varchar(1) character set utf8 collate unicode_ci = 'x';
    declare s varchar(60) character set utf8 collate unicode_ci = 'x12345678901234567890123456789012345678901234567890123456789';
    declare n integer = 0;
    declare b boolean;
begin
    while (n < 1000000)
    do
    begin
        b = s starting with p;
        n = n + 1;
    end
end!

asfernandes · 2021-11-04T19:25:53Z

Updated the performance improvement verified in the test after changes in the implementation.

asfernandes · 2021-11-05T12:32:42Z

After small changes to the INTL API description of the canonical function and small changes in the engine, I verified an improvement of more than 50% with UTF8 and UNICODE_CI in the same test, so I'm changing this issue to also cover the optimization of multi-byte character sets.
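One general point about multi-byte character sets: a prefix has to be cut by characters, not by bytes, or the cut may land in the middle of a character. The sketch below shows that for UTF-8 only. It is an assumption-laden illustration, not a description of how the Firebird engine or its INTL API does it, and `utf8_prefix` is a hypothetical helper name.

```python
def utf8_prefix(data: bytes, n_chars: int) -> bytes:
    """Return the bytes of the first n_chars characters of valid UTF-8 data.

    If data holds fewer than n_chars characters, it is returned unchanged.
    """
    count = 0
    for i, byte in enumerate(data):
        # Bytes of the form 10xxxxxx continue a character;
        # any other byte starts a new one.
        if byte & 0xC0 != 0x80:
            if count == n_chars:
                return data[:i]
            count += 1
    return data


if __name__ == "__main__":
    raw = "ção".encode("utf-8")  # 3 characters, 5 bytes
    print(utf8_prefix(raw, 1).decode("utf-8"))
```

Cutting `raw` at one byte would split the two-byte character `ç`; counting lead bytes keeps the prefix a valid string that can then be canonicalized on its own.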

pavel-zotov · 2021-11-05T22:10:45Z

Currently implemented only for Windows: the 'psutil' package cannot be installed on Python 2.7 when it is running on Linux (Debian).

asfernandes added affect-version: 3.0.7 affect-version: 4.0.0 affect-version: 5.0 Initial labels Nov 4, 2021

asfernandes self-assigned this Nov 4, 2021

asfernandes added type: improvement fix-version: 5.0 Beta 1 labels Nov 4, 2021

asfernandes added a commit that referenced this issue Nov 4, 2021

Improvement #7038 - Improve performance of STARTING WITH of fixed-byte charsets with insensitive collations.

551ed99

asfernandes closed this as completed Nov 4, 2021

asfernandes changed the title ~~Improve performance of STARTING WITH of fixed-byte charsets with insensitive collations~~ Improve performance of STARTING WITH with insensitive collations Nov 5, 2021

asfernandes added a commit that referenced this issue Nov 5, 2021

Improvement #7038 - Improve performance of STARTING WITH with insensitive collations. Now also for MBCS.

279274f

pavel-zotov added the qa: done with caveats label Nov 5, 2021

asfernandes added a commit that referenced this issue Jun 23, 2022

Backport improvement #7038 - Improve performance of STARTING WITH with insensitive collations.

05f0cb2

asfernandes added the fix-version: 4.0.3 label Jun 23, 2022

asfernandes added fix-version: 4.0.2 and removed fix-version: 4.0.3 labels Jul 11, 2022
