Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibly incorrect call to pcre_exec #6721

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

Possibly incorrect call to pcre_exec #6721

monetdb-team opened this issue Nov 30, 2020 · 0 comments

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2019-07-01 16:24:39 +0200
From: @swingbit
To: SQL devs <>
Version: 11.33.3 (Apr2019)

Last updated: 2019-09-02 16:05:26 +0200

Comment 27095

Date: 2019-07-01 16:24:39 +0200
From: @swingbit

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Build Identifier:

I stumbled upon a couple of calls of pcre_exec that, according to the manual, should be fixed.
The manual says, about the size of the output vector (ovector):

"The first two-thirds of the vector is used to pass back captured substrings, each substring using a pair of integers. The remaining third of the vector is used as workspace by pcre_exec() while matching capturing subpatterns, and is not available for passing back information. The number passed in ovecsize should always be a multiple of three. If it is not, it is rounded down."

Then I suppose the following call will always be rounded down to 9. Better use 9 then.

define BODY (pcre_exec(re, pe, v, (int) strlen(v), 0, 0, ovector, 10) >= 0)

The following one seems more dangerous.
The third element ("used as workspace by pcre_exec") is not allocated. Also, ovectsize is 2, which becomes 0 when rounded down:

PCREindex(int *res, const pcre *pattern, const str *s)
{
ifdef HAVE_LIBPCRE
int v[2];

v[0] = v[1] = *res = 0;
if (pcre_exec(pattern, NULL, *s, (int) strlen(*s), 0, 0, v, 2) >= 0) {
	*res = v[1];

}

The function PCREindex doesn't seem to work correctly indeed:

sql>select pcre_index('1','abc1def');
+------+
| L2 |
+======+
| 4 |
+------+

sql>select pcre_index('\d','abc1def');
+------+
| L2 |
+======+
| 0 |
+------+

Reproducible: Always

Comment 27096

Date: 2019-07-01 19:44:53 +0200
From: @sjoerdmullender

You're right about the multiple of 3. I'll fix that.

How do you use PCREindex and what do you expect that it does?
There is no SQL function pcre_index in the code we distribute, so how did you define that?

Comment 27097

Date: 2019-07-02 11:12:27 +0200
From: @swingbit

True, I had forgotten that I had defined that myself (it is actually an alias for SQL function "patindex"):

CREATE FUNCTION pcre_index(pat string, s string) RETURNS integer EXTERNAL NAME pcre."patindex";

I would expect this to return 4 in both my previous calls. It does when the pattern is a literal '1', but not when it is a '\d' or a '[0-9]' (even after fixing the size of the output vector).

I'm not actually using this function, I just found this by chance.

Speaking of this, I wonder why the most useful pcre functions are available in mal but not mapped in SQL.
These are the extra functions I define:

CREATE OR REPLACE FUNCTION pcre_match(s string, pattern string) RETURNS boolean EXTERNAL NAME pcre."match";
CREATE OR REPLACE FUNCTION pcre_imatch(s string, pattern string) RETURNS boolean EXTERNAL NAME pcre."imatch";
CREATE OR REPLACE FUNCTION pcre_index(pat string, s string) RETURNS integer EXTERNAL NAME pcre."patindex";
CREATE OR REPLACE FUNCTION pcre_replace(s string, pattern string, repl string, flags string) RETURNS string EXTERNAL NAME pcre."replace";
CREATE OR REPLACE FUNCTION pcre_replacefirst(s string, pattern string, repl string, flags string) RETURNS string EXTERNAL NAME pcre."replace_first";

Comment 27098

Date: 2019-07-02 11:48:10 +0200
From: @sjoerdmullender

The problem with pcre.patindex is that it quotes all special characters in the pattern, so it is basically just an expensive way to do an ordinary substring search. This should probably be changed.
But it means that the result with pattern \d or [0-9] is totally expected: those strings don't occur in the to-be-matched string, and the "nomatch" return value is 0.

Comment 27099

Date: 2019-07-02 11:53:00 +0200
From: @swingbit

Ah, I see now. It maps to PCREpatindex, which is meant to accept SQL-like patterns, not pcre patterns.

Comment 27105

Date: 2019-07-03 16:14:17 +0200
From: @sjoerdmullender

I fixed the bug with the multiple-of-three. The rest of the report is, I think, I misunderstandment of what the pcre_patindex function does. So I'm closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant