
Implement new "datatype" for blob handling #7739

Open

livius2 opened this issue Sep 6, 2023 · 32 comments

@livius2

livius2 commented Sep 6, 2023

Hi,

currently we have the BLOB field type, which requires an additional round trip to retrieve its contents. This is very slow.
Consider implementing a different data type to handle "blob" data: it should be transmitted as part of the record, like a normal field. This applies especially when the column is read-only, as with the result of the LIST function in the query below.

Below are two queries for a speed comparison. Test them over a remote connection (e.g. 20-30 ms ping).
I have tested on FB3 and FB4. The first query returns the BLOB produced by the LIST function. The second CASTs the list to VARCHAR(32000), and I have added an extra ~1 KB of data to every record to show that even though much more data is retrieved, it is still ~60x faster!

query 1: 15089 ms, 540 records
query 2:   253 ms, 540 records

query 1.

WITH RECURSIVE CTE AS (
  SELECT RF.RDB$RELATION_NAME, RF.RDB$FIELD_NAME, CAST(1 AS INTEGER) AS LEVEL
  FROM RDB$RELATION_FIELDS RF
  WHERE RF.RDB$SYSTEM_FLAG = 1
  UNION ALL
  SELECT C.RDB$RELATION_NAME, C.RDB$FIELD_NAME, C.LEVEL + 1
  FROM CTE C
  WHERE C.LEVEL < 10
)
SELECT X.RDB$RELATION_NAME, X.LEVEL, LIST(DISTINCT X.RDB$FIELD_NAME) AS SEP
FROM CTE X
GROUP BY X.RDB$RELATION_NAME, X.LEVEL

query 2.

WITH RECURSIVE CTE AS (
  SELECT RF.RDB$RELATION_NAME, RF.RDB$FIELD_NAME, CAST(1 AS INTEGER) AS LEVEL
  FROM RDB$RELATION_FIELDS RF
  WHERE RF.RDB$SYSTEM_FLAG = 1
  UNION ALL
  SELECT C.RDB$RELATION_NAME, C.RDB$FIELD_NAME, C.LEVEL + 1
  FROM CTE C
  WHERE C.LEVEL < 10
)
SELECT X.RDB$RELATION_NAME, X.LEVEL,
  CAST(LIST(DISTINCT X.RDB$FIELD_NAME) AS VARCHAR(32000)) || REPLACE('aaaaaaaaaaaaaaaaaaaaaaaaaaa', 'a', uuid_to_char(gen_uuid())) AS SEP
FROM CTE X
GROUP BY X.RDB$RELATION_NAME, X.LEVEL

If you would like more records to test, simply change `C.LEVEL < 10` to something else, like `C.LEVEL < 50`.

@mrotteveel
Member

This could be solved by means other than a new data type, for example by streaming the blobs inline in a result set, or maybe with some compound format that contains the blob ID and the first x bytes of the blob (e.g. +/- 32KB, or the actual length if it is less).

For example, by introducing a blr_blob3 to replace blr_blob2, with the message data laid out as <8-bytes blob id><2-bytes length><data>, where, given the existing limitations of the message, <data> would be at most 32KiB - 10 bytes. Alternatively, if those limits are dropped, we could use <8-bytes blob id><4-bytes length><data> and maybe make the amount of blob data sent in a message configurable.

If a blob id from a blr_blob3 message is opened, it will be positioned at the first byte after the returned data (or after the end of the blob).
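
To make the proposed layout concrete, here is a minimal C++ sketch of the first variant. blr_blob3 is only a proposal, and the struct and its field names are invented for illustration:

#include <cstdint>

// hypothetical wire layout: <8-bytes blob id><2-bytes length><data>
#pragma pack(push, 1)
struct Blr_blob3_field
{
    uint64_t blob_id;            // the existing 8-byte blob id
    uint16_t inline_length;      // number of inline data bytes that follow
    uint8_t  data[32768 - 10];   // at most 32 KiB minus the 10-byte header
};
#pragma pack(pop)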

@aafemt
Contributor

aafemt commented Sep 6, 2023

If you know for sure that your data fits in 32k, you don't need BLOBs. And if your data doesn't fit in 32k, you don't want to transfer it over a thin wire without an explicit request.

@mrotteveel
Member

If you know for sure that your data fits in 32k, you don't need BLOBs. And if your data doesn't fit in 32k, you don't want to transfer it over a thin wire without an explicit request.

In a lot of use cases for blobs, the majority of blobs may be relatively small, with a few larger outliers. So you use blobs to be able to store the outliers too, or because the fact that blobs are stored out-of-band is a benefit when querying things that don't involve the blob data. Not to mention that you may need them to avoid the 64KiB row limit.

Returning the first ~32KiB inline returns all the data in the cases where the blob is actually smallish, and where the blob is larger, you already have the first 32KiB available for processing, which could very well improve performance even for larger blobs.
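
A minimal sketch of how a client could consume such a message, reusing the hypothetical Blr_blob3_field above; blobTotalLength() and appendRemainingSegments() are invented helpers standing in for the usual open/getSegment/close sequence:

#include <cstdint>
#include <vector>

uint64_t blobTotalLength(uint64_t blobId);                                // assumed helper
void appendRemainingSegments(uint64_t blobId, std::vector<uint8_t>& out); // assumed helper

void readBlobColumn(const Blr_blob3_field& f, std::vector<uint8_t>& out)
{
    // the inline prefix arrived with the row, at no extra round trip
    out.assign(f.data, f.data + f.inline_length);

    if (f.inline_length < blobTotalLength(f.blob_id))
    {
        // per the proposal, an opened blob is positioned right after the
        // inline data, so reading simply continues where the prefix ended
        appendRemainingSegments(f.blob_id, out);
    }
}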

@hvlad
Member

hvlad commented Sep 6, 2023

This could be solved by means other than a new data type, for example by streaming the blobs inline in a result set

I also believe this is the way to go. To make it more flexible, I can imagine a per-statement or per-attachment setting that controls whether blob data is streamed inline or not. It could also be useful to be able to set the number of bytes streamed inline.

And no, I don't like blr_blob3; I see no need for it.

@mrotteveel
Member

And no, I don't like blr_blob3; I see no need for it.

Well, I needed something to distinguish it from the current behaviour of blr_blob2, which only reports the blob id.

@asfernandes
Member

Why a setting?

select blob_field from table is an explicit request for a blob, and faster is preferable to slow.

@hvlad
Member

hvlad commented Sep 6, 2023

To let the user decide.

@AlexPeshkoff
Member

AlexPeshkoff commented Sep 6, 2023 via email

@hvlad
Member

hvlad commented Sep 6, 2023

select blob_field from table is an explicit request for a blob, and faster is preferable to slow.

Many apps are implemented in the following way: they have a grid with many fields and show blob content on demand, when the user presses a button. Almost always, the result set behind the grid contains all fields, including the blob ones. There is no need, and no way, to show all blobs from all records at the same time. If we always include blob data in the result set, it will slow such apps down significantly. Moreover, it will force users to exclude blob fields from the main result set and issue an additional request just to fetch the desired blob - which is no faster than the current separate blob open/fetch/close.

@hvlad
Member

hvlad commented Sep 6, 2023

Using new datatype, where first N (not too big value) bytes of BLOB is stored inline

Are we talking about storage or about network transfer here?
Blob storage is a different topic, I believe.

may have one more advantage - such field may be indexed. Something like Paradox used to have such fields.

Couldn't expression indices serve this goal?

@asfernandes
Member

Using new datatype, where first N (not too big value) bytes of BLOB is stored inline, may have one more advantage - such field may be indexed. Something like Paradox used to have such fields.

How would applications degrade to having the data not indexed (or not fully indexed) when the field length increases?

@AlexPeshkoff
Member

AlexPeshkoff commented Sep 6, 2023 via email

@hvlad
Member

hvlad commented Sep 6, 2023

Blob storage is a different topic, I believe.

Yes, certainly. But it is surely related.

I don't think so - could you explain?

Couldn't expression indices serve this goal?

Am I wrong that, for an expression index to be used in a plan, the expression must appear in the WHERE clause?

You are correct, but... I still see no connection between the index idea and the storage details, nor with the subject of this issue.

@AlexPeshkoff
Member

AlexPeshkoff commented Sep 6, 2023 via email

@aafemt
Contributor

aafemt commented Sep 6, 2023

Why invent a new data type when IStatement::openCursor() has enough room in its "flags" parameter for a CURSOR_PREFETCH_BLOBS flag?
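
A hedged sketch of what that could look like against the existing OO API. The openCursor() call below matches the current IStatement interface, but CURSOR_PREFETCH_BLOBS does not exist today: both the flag and its value are invented for illustration.

#include <firebird/Interfaces.h>

using namespace Firebird;

const unsigned CURSOR_PREFETCH_BLOBS = 0x10;  // hypothetical flag, not in the API

IResultSet* openWithBlobPrefetch(ThrowStatusWrapper* status, IStatement* stmt,
                                 ITransaction* tra, IMessageMetadata* outMeta)
{
    return stmt->openCursor(status, tra,
                            nullptr, nullptr,        // no input message
                            outMeta,
                            CURSOR_PREFETCH_BLOBS);  // ask the server to ship blob contents eagerly
}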

@dyemanov
Member

dyemanov commented Sep 6, 2023

Why invent a new data type when IStatement::openCursor() has enough room in its "flags" parameter for a CURSOR_PREFETCH_BLOBS flag?

The client-side record is a message whose format is described using BLR. If we extend the message with a blob data chunk, that chunk must be described somehow; AFAIU, this is what Mark suggests. If you're going to prefetch blobs into some internal buffers of the statement object and feed getSegment() from those buffers, then of course a new data type is not required.

@aafemt
Contributor

aafemt commented Sep 6, 2023

Yes, that's what I suggest doing: transfer BLOBs by id as usual, but if the flag is set, send the content of the blob immediately (perhaps requested automatically by the client on fetch()) and store it in a temporary file. The client application then requests the BLOB content as usual, but it turns out to be a client-only call, which is vastly faster.
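
A minimal sketch of the client-side cache this implies, assuming the server pushes blob contents alongside fetched rows. Nothing like this exists in the Firebird client today, and all names are invented; a temporary file is suggested above, while an in-memory map is used here only for brevity:

#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

class BlobPrefetchCache
{
    std::map<uint64_t, std::vector<uint8_t>> cache;  // blob id -> prefetched content

public:
    // called by the wire-protocol layer when blob content arrives with a row
    void store(uint64_t blobId, const uint8_t* data, std::size_t len)
    {
        cache[blobId].assign(data, data + len);
    }

    // called when the application opens a blob: a hit means getSegment()
    // can be served locally, with no further round trips to the server
    const std::vector<uint8_t>* find(uint64_t blobId) const
    {
        const auto it = cache.find(blobId);
        return it == cache.end() ? nullptr : &it->second;
    }
};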

@aafemt
Contributor

aafemt commented Sep 6, 2023

Blob data can be sent by the server automatically, as a usual sequence of packets preceding the op_fetch response, and the client can handle it while waiting for that response. This would make blob delivery zero-round-trip.

@dmitry-lipetsk
Contributor

dmitry-lipetsk commented Sep 10, 2023

Hello,

Let me describe how this was already done in another "deprecated" API - OLEDB :)

  1. A result value has a status. The status can be S_OK, S_ISNULL, S_TRUNCATED and so on. See "enum DBSTATUS" in the OLEDB documentation.

  2. You can bind (include) a column in a fetched row twice.

For example, you can bind a BLOB column

  1. As direct data (bytes or text)
  2. As a storage object (stream)

In your case, you can define two binds for the BLOB (see the sketch after this comment):

  1. (bind1) As direct data with a size limit, for example 4KB.
  2. (bind2) As the BLOB ID.

If the data of bind1 has S_OK status, you use this data directly.

If the data of bind1 has S_ISNULL status, then the data of bind2 will have S_ISNULL too.

If the data of bind1 has S_TRUNCATED status, you can take the BLOB ID from bind2 and read the BLOB through separate calls.


OLEDB also allows reading the row data twice, because the fetch operation does not return the row data but a row handle.

You can try to read the BLOB column directly into a fixed-length user memory buffer. If you get a value with status S_TRUNCATED, you can read the column again as a storage object (BLOB ID).


Of course, you can always read all the BLOB data directly: just use the DBTYPE_BYREF modifier for the bind data type. The OLEDB provider will allocate memory for the BLOB data and return a pointer to this buffer; the user must free this memory.


This was invented 25 years ago.
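
Below is a hedged C++ sketch of the dual-binding pattern described above. DBBINDING, DBOBJECT and the DBSTATUS_S_* values are the real OLEDB definitions, but the RowBuffer layout and the column ordinal are illustrative, error handling is omitted, and whether a given provider accepts the same column bound both as bytes and as a stream varies.

#include <cstddef>
#include <oledb.h>

struct RowBuffer
{
    DBSTATUS dataStatus;        // status of bind1 (inline data)
    DBLENGTH dataLength;
    BYTE data[4096];            // bind1: first 4 KB of the BLOB, inline
    DBSTATUS streamStatus;      // status of bind2 (storage object)
    ISequentialStream* stream;  // bind2: stream for reading the full BLOB
};

static DBOBJECT streamObject = { STGM_READ, IID_ISequentialStream };

static void setupBindings(DBBINDING b[2])  // assumed zero-initialized by the caller
{
    // bind1: the BLOB column as direct bytes, truncated to 4 KB
    b[0].iOrdinal = 1;
    b[0].dwPart   = DBPART_VALUE | DBPART_LENGTH | DBPART_STATUS;
    b[0].obStatus = offsetof(RowBuffer, dataStatus);
    b[0].obLength = offsetof(RowBuffer, dataLength);
    b[0].obValue  = offsetof(RowBuffer, data);
    b[0].cbMaxLen = sizeof(RowBuffer::data);
    b[0].wType    = DBTYPE_BYTES;

    // bind2: the same column again, as a storage object (stream)
    b[1].iOrdinal = 1;
    b[1].dwPart   = DBPART_VALUE | DBPART_STATUS;
    b[1].obStatus = offsetof(RowBuffer, streamStatus);
    b[1].obValue  = offsetof(RowBuffer, stream);
    b[1].pObject  = &streamObject;
    b[1].wType    = DBTYPE_IUNKNOWN;
}

// After IRowset::GetData(): if dataStatus == DBSTATUS_S_OK the inline copy is
// complete; if DBSTATUS_S_TRUNCATED, read the rest via the stream from bind2.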

@livius2
Author

livius2 commented Feb 13, 2024

I have thought about introducing a new blob data type, e.g. GBLOB - Good BLOB ;-)
It would be beneficial if it did not have a handle at all, contrary to the current BLOB, which does. Additionally, if stored somewhere, it could simply be forgotten after a SELECT. Currently a BLOB is not freed by operations such as SELECT, since the client can still ask for the BLOB content by its ID. The new data type would be straightforward to use, and even BLOB_APPEND would not be needed, as there is no delayed fetch at all.

@sim1984

sim1984 commented Feb 13, 2024

No new type is needed; the protocol needs to be optimized for the current blob implementation. For example, it should be possible to prefetch a small part of a blob: if the entire blob fits into this prefetch, then open, get_segment and close will not create any additional network packets at all.

@livius2
Author

livius2 commented Feb 13, 2024

@sim1984 Yes, that addresses the current issue with blobs; it can conceal some problems, but not the overarching issue of a blob's lifespan. A GBLOB would be forgotten after it is retrieved, while the current BLOB persists "forever", until you commit or roll back, because the engine never knows when someone might request the blob by its BLOB ID.

@dyemanov
Member

How would GBLOB be different from e.g. an unlimited-length VARCHAR? Well, nothing is really unlimited in real life, so let's say 4GB-limited. It's also a different data type (the API is going to be affected) and it also requires some work to implement, but the user-visible behaviour should be exactly as you describe.

@aafemt
Contributor

aafemt commented Feb 14, 2024

AFAIU, GBLOB is suggested as PSQL-only, never appearing on the client side.

@livius2
Author

livius2 commented Apr 22, 2024

@aafemt AFAIU, GBLOB is suggested as PSQL-only, never appearing on the client side.

It appears in SELECT too, so it does reach the client side.

@dyemanov unlimited-length VARCHAR

An interesting idea, but I think we would then have to cast everything to VARCHAR, e.g. the result of LIST.
I am thinking of making GBLOB the default everywhere; anyone who needs a blob ID would use the old BLOB explicitly in declarations.

@sim1984

sim1984 commented Apr 22, 2024

No new types need to be invented; work with the existing ones needs to be optimized instead. New data types will see practically no use, because they require rewriting access components and drivers (ODBC, etc.). For example, look at DECFLOAT: do many components support it?

@livius2
Author

livius2 commented Apr 23, 2024

@sim1984 Adoption of new data types in drivers is slow, but it does grow. The current design of blobs, however, rather prohibits speed here: handing out a handle to an object that cannot be forgotten and can be fetched at any time has big consequences, as the very existence of BLOB_APPEND shows...

@sim1984

sim1984 commented Apr 23, 2024

There is no need to mix up the method of transferring BLOBs (network protocol) with the retention of record versions (database bloat).

  1. Problems with transmission speed occur only for short BLOBs, which are not transmitted along with the fetch of each record. Long BLOBs will have to be transmitted in parts in any case (no one will cram 100 megabytes into one network packet).

  2. Version retention does not depend in any way on whether the BLOB was transferred entirely or by identifier. This only matters for the legacy transaction isolation levels (READ COMMITTED [NO] RECORD VERSION). In READ COMMITTED READ CONSISTENCY, versions (including temporary BLOBs) are retained until the cursor is closed (or the last record is fetched); in SNAPSHOT, as expected, until the transaction completes.

  3. Problems with LIST could be solved by adding an extra parameter to it indicating whether the BLOBs are created in temporary space or in the main database. A further parameter could indicate whether the BLOB should be streamed or segmented.

  4. Inside PSQL, for local variables and output parameters, it should be possible to somehow indicate that a BLOB is needed only in temporary space, so that operations on it do not bloat the main database.

@livius2
Author

livius2 commented Apr 23, 2024

Any improvement in this matter will be a huge benefit; we will use whatever is provided.
But look at point 2: the blob must be remembered and cannot be forgotten until the transaction ends - isn't that very bad?
No other data type has this problem of the server having to maintain the value's lifespan.

@aafemt
Contributor

aafemt commented Apr 23, 2024

But look at point 2: the blob must be remembered and cannot be forgotten until the transaction ends - isn't that very bad?

First, in this case they occupy temporary space, not database space.
Second, they can be forgotten as soon as they are transferred to the client and cached there; in that case they occupy the client's temporary space.

@dmitry-lipetsk
Contributor

Second, they can be forgotten as soon as they are transferred to the client and cached there; in that case they occupy the client's temporary space.

I hear that FB supports sharing a transaction between two connections.

If that is true and I am right, I can share a transaction between two different processes or computers... and get the blob ID in one but read the blob data in another :)

@aafemt
Contributor

aafemt commented Apr 23, 2024

I hear that FB supports sharing a transaction between two connections.

Not the transaction, only the database snapshot.
