New built-in function BLOB_APPEND #6983
I see that the function may fall back to a non-optimized way of working (copying the blob) when it's already opened. Also, it was implemented in a simple way that makes this happen frequently. For example:
There was no reason to copy Much more important example could be with I don't support the function in this way it is done. IMO things should be different.
And then BLOB_APPEND should not fall back to copying. It should fail when it is not possible to append. Documentation should be added specifying which operations close a just-created blob. I see value in the function (over Though I think it may be very confusing. It will make blob variables sometimes work by value and sometimes work by reference. |
And it does not copy in this case. Please read description of 3rd case for 1st argument:
= = = = =
Why is it more important?
I don't want to open a can of worms. But perhaps I'm too careful. It could be discussed.
When is it not possible to append?
= = = = =
Blob variables always work as references, I'd say, though not everyone understands it. |
I don't see in the code how during the initial assignment "v1 = 'abc';" the variable receives the BLB_close_on_read flag. |
Ah, sorry. Yes, in this case it has no such flag.
or
|
Shouldn't it receive one? In the documentation I see example of call like |
On 9/27/21 6:39 PM, Vlad Khorsun wrote:
And it does not copy in this case.
I don't see in the code how during initial assignment "v1 =
'abc';" the variable receives BLB_close_on_read flag.
Ah, sorry. Yes, in this case it have no such flag.
If one needs efficient code, one should write it as:
v1 = blob_append(v1, 'abc', 'd');
or
v1 = blob_append(null, 'abc'); v1 = blob_append(v1, 'd');
It appears we are limited here by the syntax of an SQL function (it returns a
single value, arguments are not passed by reference) and have to invent
artificial flags to support an otherwise simple operation.
Maybe replace it with something like:
v1 = 'abc';
...
v1 ||= 'd';
I understand that this does not look SQL-ish, but it correctly reflects what we
want to do.
|
|
Generally speaking, I would prefer the opposite logic: every blob is created as appendable and receives a "copy on write" flag during materialization. |
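To make the "opposite logic" suggested above concrete, here is a toy Python model (not Firebird code). The class, the `copy_on_write` field and the `materialize` step are hypothetical names chosen for illustration; the sketch only assumes that materialization means the blob becomes visible outside the routine, for example by being stored into a table.

```python
class Blob:
    """Toy model of the proposal; all names here are hypothetical."""

    def __init__(self, data=""):
        self.data = data
        self.copy_on_write = False  # every new blob starts out appendable

    def materialize(self):
        # Assumed meaning: the blob is now referenced from outside,
        # so it must no longer be changed in place.
        self.copy_on_write = True


def append(blob, piece):
    """Append in place while allowed; copy first once the blob is materialized."""
    if blob.copy_on_write:
        blob = Blob(blob.data)  # the materialized blob stays untouched
    blob.data += piece
    return blob


work = Blob("abc")
work = append(work, "d")    # appended in place, no copy
stored = work
stored.materialize()        # from now on 'stored' must keep its contents
work = append(work, "e")    # triggers a copy
print(stored.data, work.data)
```

In this model a copy happens only for blobs that have been materialized, so the fast path is the default rather than something the caller has to request.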
It could be a large blob. But I mean, every large blob comes from the engine. More examples:
Yes, IMO it should be a function that the user really knows is working fast, and the user must understand when it can be used. Otherwise there is a real risk that someone writes the code to be fast and with maintenance it becomes slower (and untested with large data).
Ok, but for example
In PSQL you'd first need to create and close it, then assign it to other variables. And when you assign to a different sub_type or charset, it's copied (so in this case it's like by-value). So in both cases it appears like a by-value operation, with the reference being like an optimization. (I know, there is the blob-id, but the user-visible effect is like by-value in all cases.) With this function, some (but not all) usages will appear as by-reference, as you may have the bid in two variables and updating one will affect the other. |
No. This ticket is not about assignment.
There is no example with string literal at 1st arg. By doc it should not be allowed, but I'm not sure it
No, it is not cast, and BLOB_APPEND sees it as is.
See above. We might demand blob or null at 1st param, and initial implementation did so, but |
IIRC, it was discussed but considered as too risky. |
It is there:
|
Ah, I didn't notice that this PR is into the stable branch. In this case I'm against it (though my opinion is worth nothing, of course). |
But it is not because of BLOB_APPEND's presence. One could use concatenation and get the same old issue.
That's what the documentation is for. Yes, we might disallow passing a closed blob as the 1st argument, but IMHO it gives nothing to the end users.
We can't fix all mistakes that users might do :)
Agree. This case could be documented.
Yes, sure.
Yes, and this also could be documented. BTW, if follow your suggestion to defer blob's closure for all blobs, |
You are right. Anyway, I already answered your question, I think. |
Will see. |
Forgot to add: this code was developed a few months ago and was extensively tested by a customer. |
What should be the result of the following code?
Will all three variables have the same value? |
Yes:
B1: abcdefghi |
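The code being asked about is not preserved in this thread. As a toy Python model (not Firebird code), assume three hypothetical chained calls where each result is passed as the first argument of the next call. If BLOB_APPEND keeps appending to the same unclosed blob, all three variables end up referring to one object, which is consistent with the answer given above.

```python
class TempBlob:
    """Toy stand-in for an unclosed temporary blob (hypothetical model)."""

    def __init__(self):
        self.chunks = []

    def read(self):
        return "".join(self.chunks)


def blob_append(target, *parts):
    """Model: reuse the open blob given as first argument, or create one for None."""
    blob = TempBlob() if target is None else target
    blob.chunks.extend(parts)
    return blob


# Hypothetical reconstruction of the question: three chained appends.
b1 = blob_append(None, "abc")
b2 = blob_append(b1, "def")
b3 = blob_append(b2, "ghi")

print(b1.read(), b2.read(), b3.read())
print(b1 is b2 and b2 is b3)
```

In this model the variables behave like references to one blob, which is the by-reference effect discussed earlier in the thread.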
For me, the primary benefit of this function (as compared to IMO, no limitations should be applied to the first argument, literals and expressions should be allowed (with their contents copied to the newly created blob in the beginning of execution). As for other possible improvements (e.g. every blob is created as appendable), I don't mind considering them for v5. Am I right that |
May I ask why this function is needed at all?
and this is totally counterintuitive from any programming language POV
It would be better if blob_append were a procedure, not a function.
and then result |
It is explained in the description; please read the first message.
Because we have no better way to control its lifetime.
It is not as simple as you see it. Consider a blob variable as a reference to the real blob object. Don't forget that a blob could be stored into a table, and then its ID is changed. Then the same blob could be referred to by its temporary (initial) BlobID and/or by its new (materialized) BlobID. Also, blobs could be referred to by triggers, derived columns (VIEW) and so on.
Probably. But nobody has implemented it so far. Guess why? ;) ...
Only if one is not looking into the details. And the details are not too hard to understand, I believe ;)
If you need 'abcdef' in B2, use concatenation:
BLOB_APPEND is special in that it returns a blob that is not closed. There is no way to "re-open" a blob for writing. Please read the description and examples again; I hope you'll change your opinion about BLOB_APPEND. |
I suppose you overcomplicate things (but maybe I oversimplify them). The real problem is that declared variables in PSQL and the operations on them are "remembered" and not forgotten during all operations inside e.g. a stored procedure. e.g.
and if you really go into direction of PS. I can compare a blob here to a file. I can append to it, unless someone accesses it with an exclusive flag. |
Maybe this is more understandable than the above description. This is how it should work during compilation to BLR.
|
Is the above at all readable, or does it make no sense to you? |
Karol, I'm currently busy with other things, but when I return to blobs I'll consider your suggestion. |
Hi All, |
Last year I was asked to run intensive tests of the BLOB_APPEND() function, which was expected to be introduced in FB 4.0.x. Some issues were found and (quickly) fixed, and several tests were performed (both artificial ones and tests on a production DB with real data and complex PSQL). I cannot show here the PSQL code from production that was refactored in order to estimate the performance gain of BLOB_APPEND(), and not only because it is proprietary: there is a lot of complexity and there are dependencies that cannot be dropped. So, I want to show the benchmark of a simple (synthetic) test: And do this 1000 times, i.e.:
This code was compared with another:
For both of them the following values were logged:
The test was done on a machine with 128 GB RAM (half of this volume is a RAM drive); the database was created on a Samsung SSD 870 QVO 1TB. I ran two scripts from the attached .7z FIVE times:
Firebird ran as a service and was restarted before each measurement. One may see the results in the logs, but I also made an .xlsx file for better readability. It seems that the ratio of results does not depend on FB config parameters. Content of the firebird.conf that was in use:
One may see that BLOB_APPEND() is much faster than the usual concatenation: the difference in elapsed time is more than 8400 times(!). The difference for other counters is also very high: about 200 times for page fetches/marks and about 50 times for memory used/allocated. Of course this was just a trivial and artificial test, but it demonstrates the potential speed gain. |
Hello All, 2 months have passed since the test results were presented. Since there are no comments or objections, let's commit this awesome feature? |
There were comments and objections ;-) But I do not know if my comments were understandable. But I have only a little voice, so you all do what you need. |
Hello, |
My 2 cents: it looks like blob_append is a fully isolated new feature that is already tested (in HQBird) and works well, bringing an awesome speed enhancement for such operations. As Alexey pointed out, it would not block any further enhancements in blob "optimization", so I really don't see a reason for not committing it and allowing all Firebird users to benefit from it. |
No, I only pointed to files as acquiring some lock; my comment is close to reference counting, but a slightly different mechanism. |
Well, I think it is much harder to implement than to describe :) However, I don't see a reason why we cannot use a less universal but working solution right now. If an improved universal concatenation appears in the future (v6 or v7? in 2025-2027), it can coexist with blob_append. |
On 4/23/22 12:26, Karol Bieniaszewski wrote:
There were comments and objections ;-) But I do not know if my
comments were understandable.
Even if blob_append is provided instead of a full fix, it
should be a procedure, not a function, as described in the comments above.
In that case we will need a second function to initially create a
ready-for-append blob; at least I do not see a simple way to avoid it.
But from the SQL semantics and code readability POVs I suppose it's better
than the current blob_append().
But I have only a little voice, so you all do what you need.
As a programmer, I know that when such a function is made available,
it will be difficult to withdraw it.
Here I agree: adding such temporary solutions means keeping them
permanently and supporting them even in the presence of a good, correct solution.
|
The approach mentioned by @livius2 is good; we asked about similar capabilities when we ordered the development of the blob_append function. The advantage of blob_append is that this function creates a temporary blob (not inside the database file), so all blob concat operations can be executed in fast temporary storage instead of the database file. In parallel we ordered another improvement: the option BlobTempSpace in firebird.conf, to enable by default the creation of all PSQL blobs in temporary space outside the database. I think that the BlobTempSpace option and the @livius2 proposal are the best universal solution. But I think that one thing does not exclude the other. In many RDBMSs there is an internal concat function in addition to the standard concatenation syntax. I even proposed giving the "blob_append" function a more common name, for example "concat". So we can implement an internal function which for current versions will be more efficient for blobs, and in the future a universal optimization of concatenation (if it appears) will not conflict with the internal function. |
You only need to recognize where the blob is exposed to the outside world. All other blobs created in PSQL can be modified in any manner. Only a returned blob should stay unmodified, and a new one is needed if someone needs to modify it inside the code, e.g. in the next loop iteration. |
@livius2 @AlexPeshkoff Do you have any plans for the implementation of the discussed universal optimization? If yes, then it will be a perfect solution. If no, I think you should consider the current function for optimized blob concatenation (at least as an intermediate solution), because at present concatenation of big blobs in PSQL is a pain. We started using the blob_append function in our HQBird installations and performance increased up to 10 times. I think we are not the only ones with blob performance issues like this, unexpected database growth and so on. |
Thinking about naming, maybe it should be named |
Just my 5 cents, but context first, then action, is better. It is more natural from a programming POV. |
Programming may use different rules; see e.g. the Win32 API: SetTimer, PostQuitMessage, TerminateProcess, etc. |
Oracle has DBMS_LOB.APPEND(). It makes sense to ease migration by using similar naming. |
We already have a PR with |
Regarding "second function to create initially ready-for-append blob" -- I'd rather avoid it. After all, this function is a custom "shorthand" solution for quick concatenation, and |
I generally approve this PR, with a rename suggestion still pending.
On 6/15/22 17:58, Dmitry Yemanov wrote:
And surely this function does not stop us from enhancing blob handling
inside the engine in different ways.
The main downside is that we will have to support that function later.
That's like what now happens with the pre-mapping tools for rdb$admin support.
|
Regarding renaming to |
It seems this function got merged without documentation (README). And the link to the external repository does not exist anymore. |
Looks so, although I remember myself paying attention to this issue before merging. It seems it was forgotten, sigh. |
Ah, no, it wasn't forgotten, just committed separately: |
The regular || (concatenation) operator with BLOB arguments creates a temporary BLOB for every pair of
arguments that includes a BLOB. This could lead to excessive memory consumption and growth of the database file.
The new BLOB_APPEND function is designed to concatenate (or append) BLOBs without creating intermediate BLOBs.
A more detailed description is at
https://github.com/sim1984/blob_append/blob/main/blob_append_en.md
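The difference described above can be illustrated with a toy Python cost model (not Firebird code, and not the benchmark from this thread). It assumes that each pairwise `||` writes a fresh intermediate blob holding the whole result so far, while an append writes only the new piece into one open blob. The piece size and count are hypothetical.

```python
def concat_cost(pieces):
    """Model of `||`: every pairwise concatenation produces a new intermediate blob."""
    blobs_created = 0
    bytes_written = 0
    current = ""
    for piece in pieces:
        current = current + piece       # new blob with the full result so far
        blobs_created += 1
        bytes_written += len(current)
    return blobs_created, bytes_written


def append_cost(pieces):
    """Model of an append: one blob stays open, only the new piece is written."""
    blobs_created = 1
    bytes_written = sum(len(p) for p in pieces)
    return blobs_created, bytes_written


# Hypothetical workload: 1000 pieces of 100 characters each.
pieces = ["x" * 100] * 1000
print(concat_cost(pieces))
print(append_cost(pieces))
```

Under these assumptions the data written by repeated concatenation grows quadratically with the number of pieces, while the append path grows linearly. Real engine numbers depend on page size, caching and storage, so the model only shows the shape of the difference.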