Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance enhancement to LIKE without wildcards #3976

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

Performance enhancement to LIKE without wildcards #3976

monetdb-team opened this issue Nov 30, 2020 · 0 comments

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2016-04-06 17:47:03 +0200
From: Richard Hughes <<richard.monetdb>>
To: MonetDB5 devs <>
Version: -- development

Last updated: 2016-12-21 13:07:51 +0100

Comment 22018

Date: 2016-04-06 17:47:03 +0200
From: Richard Hughes <<richard.monetdb>>

Created attachment 393
Patch

The attached patch makes expressions of the form "col LIKE 'x'" run approximately 8 times faster.

This seems like a totally useless optimization, except that the ILIKE operator is by far the fastest way to do a case-insensitive string comparison. Some benchmarks:

select count() from t where lower(x)='x': 159s
select count(
) from t where x ilike 'x' [pre-patch]: 4.1s
select count(*) from t where x ilike 'x' [with patch]: 0.520s

The implementation is not very neat but the performance gain is worth it to us.

Attached file: pcre-strcmp.patch (application/octet-stream, 4648 bytes)
Description: Patch

Comment 22019

Date: 2016-04-06 21:24:39 +0200
From: @sjoerdmullender

Interesting idea. I'll take a closer look.

Another idea so that I don't forget:
If pattern ends in % and has no other _ or %, use strncmp or strncasecmp.

Comment 22022

Date: 2016-04-07 12:03:52 +0200
From: Richard Hughes <<richard.monetdb>>

That optimization is actually already covered by the 'use_re' code path. See re_simple() for the conditions under which it is invoked and re_match_ignore() for the matcher.

Adding to my benchmarks above:
select count() from t where x ilike 'x%': 0.411s
select count(
) from t where x ilike '%x%': 3.8s

I'm slightly curious about how the first one ended up even faster than my implementation. It's possible that glibc's strncasecmp is faster than strcasecmp I suppose.

Here's a list of expressions that I would consider to be theoretically amenable to a simplified matcher but which still fall back to the pcre matcher:
'x%' escape ''
'x' escape ''
'%x'

I wrote this patch in order to speed up a set of queries I have that are very frequently run. I got my 8x speed-up for my use case and so I'm happy - anything beyond that is a bonus.

Comment 24685

Date: 2016-11-15 14:00:23 +0100
From: MonetDB Mercurial Repository <>

Changeset 35d404f6597c made by Sjoerd Mullender sjoerd@acm.org in the MonetDB repo, refers to this bug.

For complete details, see http//devmonetdborg/hg/MonetDB?cmd=changeset;node=35d404f6597c

Changeset description:

Use str(case)cmp for (I)LIKE comparisons without wildcards.
This is from the patch in bug #3976.

Comment 24686

Date: 2016-11-15 14:01:13 +0100
From: @sjoerdmullender

This is in the Dec2016 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant