Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDFs declared with large varchars take excessive time to execute [CORE5314] #2050

Closed
firebird-issue-importer opened this issue Jul 20, 2016 · 17 comments

Comments

@firebird-issue-importer

Submitted by: @reevespaul

Attachments:
test_udf_large_varchars.zip
glds_udflib.dll
glds_udflib.cpp
udfTest.sql

UDFs declared with large varchars take excessive time to execute

An IBPhoenix client reported the following problem to us.

A single udf, when declared with a 32k string length takes 20 times longer to execute than the same udf declared with a 16k string length.

Likewise, declaring the UDF with a string length of 1K takes half the time again.

Here is a quick summary of the times:

f_strpos_huge - 60s
f_strpos_middling - 3s
f_strpos_1024 - 1.5s

In all cases the same actual function was executed. The only difference was in the declaration.

I have a test case which demonstrates the problem.

Commits: 6c2e26c 13fd2f7

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Jul 20, 2016

Modified by: @reevespaul

Attachment: test_udf_large_varchars.zip [ 12992 ]

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Jul 20, 2016

Commented by: @reevespaul

The test SP should produce output similar to the following if a null string is passed:

SQL> select aresult from tmp_sp(null);

ARESULT

Testing f_strpos_huge_un took 63.2850
Testing f_strpos_huge took 64.8270
Testing f_strpos_middling took 3.5160
Testing f_strpos_middling_varchar took 4.0160
Testing f_strpos_1024 took 1.5150
Testing f_strpos_huge_alt took 61.9690
Testing f_strpos_middling_alt took 3.6090
Testing f_strpos_middling_alt_varchar took 4.0470
Testing f_strpos_1024_alt took 1.5940
Testing built-in position with large varchar took 0.6250
Testing built-in position with 16383 varchar took 0.6250

Execution is slightly longer when an actual string is searched:

SQL> select aresult from tmp_sp('astring with some text in it');

ARESULT

Testing f_strpos_huge_un took 62.7500
Testing f_strpos_huge took 60.9380
Testing f_strpos_middling took 3.1400
Testing f_strpos_middling_varchar took 5.1100
Testing f_strpos_1024 took 1.5780
Testing f_strpos_huge_alt took 60.2500
Testing f_strpos_middling_alt took 3.1870
Testing f_strpos_middling_alt_varchar took 5.6570
Testing f_strpos_1024_alt took 1.7660
Testing built-in position with large varchar took 1.0150

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Jul 20, 2016

Modified by: @reevespaul

priority: Major [ 3 ] => Minor [ 4 ]

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Jul 20, 2016

Commented by: Claudio Valderrama C. (robocop)

For what I remember, strings are copied.
Passing the string by descriptor may reduce time and memory used.

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Jul 21, 2016

Commented by: Sean Leyne (seanleyne)

A running/pre-compiled version of the UDF and/or a db script with the Test SP and showing how the UDF was defined (BY DESCRIPTOR?) would be appreciated.

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Oct 11, 2016

Commented by: Paulius Pazera (ppazera)

glds_udflib.cpp -- UDF source code
glds_udflib.dll -- binary for 32 bit windows
udfTest.sql -- UDF declarations, test SP, cleanup SQL

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Oct 11, 2016

Modified by: Paulius Pazera (ppazera)

Attachment: glds_udflib.dll [ 13022 ]

Attachment: glds_udflib.cpp [ 13023 ]

Attachment: udfTest.sql [ 13024 ]

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Oct 11, 2016

Commented by: Paulius Pazera (ppazera)

I attached source code and win32 binary for UDF functions doing nothing, accepting two string parameters (cstring, varchar and varchar by desriptor), and returning integer constant

I also attached stored procedure which demonstrating performance of UDF calls when passing two string parameters (cstring, varchar and varchar by descriptor):

ACTION DURATION_IN_SECONDS CYCLES
============================== ======================= ============
empty loop 0.1989792000000000 1000000
position 0.4009824000000000 1000000
2 cstring 100 0.5640192000000001 1000000
2 varchar 100 0.5979744000000000 1000000
2 by descriptor 100 0.4650048000000000 1000000
2 cstring 32000 20.49001920000000 1000000
2 varchar 32000 23.53501440000000 1000000
2 by descriptor 32000 20.17396800000000 1000000

just for comparison I included built-in 'position' function which also accepts two strings and returns integer

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Oct 14, 2016

Commented by: @asfernandes

Optimized space for arguments data is 800 bytes. More than that requires an allocation in the heap.

There is also core function * (used not only for UDFs) that clears (why?) whole varchar descriptor beforing move something in it.

* src/common/cvt.cpp:

case dtype\_varying:
	MOVE\_CLEAR\(to\-\>dsc\_address, to\-\>dsc\_length\);
	// fall through \.\.\.
case dtype\_text:
case dtype\_cstring:

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Oct 14, 2016

Commented by: @dyemanov

I could be missing something but cannot we reserve the function's input message size inside the impure area (instead of using some temporary buffer) and thus avoid its reallocation at runtime?

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Oct 14, 2016

Commented by: @asfernandes

Yes, could be.

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Oct 14, 2016

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Oct 17, 2016

Commented by: @asfernandes

The times I see in 3.0 Linux debug build are no near your ones, but anyway I'm fixing the per-call allocation in v3 and master. Please test it.

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Oct 17, 2016

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 Alpha 1 [ 10731 ]

Fix Version: 3.0.2 [ 10785 ]

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Oct 19, 2016

Commented by: Paulius Pazera (ppazera)

right, big string UDF parameter performance in fb3 is much better than in 2.5.6, looks like main issue was already fixed in fb3. Latest fix doesn't make significant difference:

3.0.2.32619 (snapshot with a fix):

ACTION DURATION_IN_SECONDS CYCLES
============================== ======================= ============
empty loop 0.2240352000000000 1000000
position 0.3749760000000000 1000000
2 cstring 100 0.8419679999999999 1000000
2 varchar 100 0.8740224000000000 1000000
2 by descriptor 100 0.6910272000000001 1000000
2 cstring 32000 2.527027200000000 1000000
2 varchar 32000 6.509030400000000 1000000
2 by descriptor 32000 2.358979200000000 1000000

but the question still remains why built-in 'position' function is still ~6..17 times faster than UDF calls doing nothing. I played a bit more trying to measure malloc/memcpy/free/ib_util_malloc, here are results for 3.0.1 release:

ACTION DURATION_IN_SECONDS CYCLES
======================================== ======================= ============
empty loop 0.2280096000000000 1000000
position 0.3480192000000001 1000000
2 cstring 100 0.7859808000000000 1000000
2 varchar 100 0.8180352000000001 1000000
2 by descriptor 100 0.6709824000000000 1000000
2 cstring 32000 2.565043200000000 1000000
2 varchar 32000 6.635001600000001 1000000
2 by descriptor 32000 2.440972800000000 1000000
2 mallocmemcpyfree 100 0.7389792000000001 1000000
2 mallocmemcpyfree 32000 3.372969600000000 1000000
2 ibutilmallocmemcpy 100/100 1.557014400000000 1000000
2 ibutilmallocmemcpy 100/32000 6.922972800000000 1000000
2 ibutilmallocmemcpy 32000/32000 9.058003200000000 1000000

sounds like malloc/memcpy/free might be done in all cases. If so, could it be avoided?

here is code for new tests:

extern "C" ISC_LONG EXPORT fn_2mallocmemcpyfree(ISC_LONG *size)
{
void *a=malloc(*size), *b=malloc(*size);
memcpy(a, b, *size);
memcpy(b, a, *size);
free(a);
free(b);
return 4;
}

extern "C" char* EXPORT fn_ibutilmallocmemcpy(ISC_LONG *size)
{
char *a=(char*)ib_util_malloc(*size);
memcpy(a, a+1, (*size)-1);
return a;
}

declare external function f_2mallocmemcpyfree
integer
returns integer by value
entry_point 'fn_2mallocmemcpyfree' module_name 'glds_udflib';

declare external function f_ibutilmallocmemcpy_100
integer
returns cstring(100) free_it
entry_point 'fn_ibutilmallocmemcpy' module_name 'glds_udflib';

declare external function f_ibutilmallocmemcpy_32000
integer
returns cstring(32000) free_it
entry_point 'fn_ibutilmallocmemcpy' module_name 'glds_udflib';

/*test UDF and declarations, expected 4, 5, 5*/
select f_2mallocmemcpyfree(32000), f_ibutilmallocmemcpy_100(100), f_ibutilmallocmemcpy_32000(32000) from rdb$database;

action='2 mallocmemcpyfree 100';
i=0;
started=cast('NOW' as timestamp);
while (i<cycles) do begin
j=f_2mallocmemcpyfree(100);
i=i+1;
end
finished=cast('NOW' as timestamp);
duration_in_seconds=(finished-started)*24*60*60;
suspend;

action='2 mallocmemcpyfree 32000';
i=0;
started=cast('NOW' as timestamp);
while (i<cycles) do begin
j=f_2mallocmemcpyfree(32000);
i=i+1;
end
finished=cast('NOW' as timestamp);
duration_in_seconds=(finished-started)*24*60*60;
suspend;

action='2 ibutilmallocmemcpy 100/100';
i=0;
started=cast('NOW' as timestamp);
while (i<cycles) do begin
f_ibutilmallocmemcpy_100(100);
f_ibutilmallocmemcpy_100(100);
i=i+1;
end
finished=cast('NOW' as timestamp);
duration_in_seconds=(finished-started)*24*60*60;
suspend;

action='2 ibutilmallocmemcpy 100/32000';
i=0;
started=cast('NOW' as timestamp);
while (i<cycles) do begin
f_ibutilmallocmemcpy_100(32000);
f_ibutilmallocmemcpy_100(32000);
i=i+1;
end
finished=cast('NOW' as timestamp);
duration_in_seconds=(finished-started)*24*60*60;
suspend;

action='2 ibutilmallocmemcpy 32000/32000';
i=0;
started=cast('NOW' as timestamp);
while (i<cycles) do begin
f_ibutilmallocmemcpy_32000(32000);
f_ibutilmallocmemcpy_32000(32000);
i=i+1;
end
finished=cast('NOW' as timestamp);
duration_in_seconds=(finished-started)*24*60*60;
suspend;

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Nov 25, 2017

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Cannot be tested

@firebird-issue-importer
Copy link
Author

firebird-issue-importer commented Nov 25, 2017

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants