Revamp BLOB implementations #992

Krzmbrzl · 2022-09-15T13:32:35Z

This PR revamps the BLOB implementation across all supported backends, except DB2 and ODBC (which remain at not supporting BLOBs). The goal is to have a unified interface for BLOB objects in order to be able to write code that is portable between different backends. Besides unification, this PR also introduces new features to the soci::blob type that makes it fit in better with the rest of SOCI's API. The following features are covered by this PR:

BLOBs can be created as empty objects via

soci::blob myBlob(sql);

I/O is possible on BLOBs initialized in this way (mostly behaving like a regular container)

std::size_t blobSize = myBlob.get_len();
myBlob.write_from_start("dummy", 5);

Once the blob contains the desired data, it can be inserted into the database (including its current data) as usual

sql << "insert into my_table(blob_val) values(:b)", soci::use(myBlob);

Selecting a blob into a blob object overwrites its previous contents (if any)
Changes to a blob are not automatically reflected in the database. In order for changes to be committed to the database, the blob must be explicitly used in an insert or update query.
When selecting a blob type via a rowset, they are now actually represented by a blob object instead of a std::string. The blob can be obtained from a given row via the new move_as<> function that complements the already existing get<> function for non-copyable types (such as blobs). Trying to use get<> for blob types will yield a compiler error stating that this is not supported.
blobs work with prepared statements - insert statements as well as select ones.
BLOBs can be reused - e.g. the same blob object can be used multiple times, always yielding the same data in the query (instead of being empty after executing the first query)

Test cases ensure all of this behavior is consistent across the backends that support BLOBs

Notes

Since the MySQL interface seems to completely work by assembling the to-be-executed query as a string and then executing that, handling BLOBs is also embedded in this framework (The data is converted into a hex-literal). Thus, the larger the BLOB that is to be inserted, the larger the resulting query. If the objects are too large for MySQL's message limits, the query will fail.
A more efficient implementation should be possible, but I think only by refactoring the MySQL backend to use prepared statements instead of string queries. I'll leave this optimization for another day.
Using blob objects in soci::values objects for ORMs (Bind a soci::blob field into a custom class with type_conversion<> #476) will probably require an introduction of a move_as function as a complement to the existing get function just as has been done for row. However, I'm not touching ORMs for now as I don't need them and this PR is big enough as it is.
For the same reason bulk operations remain unsupported

The test cases of the affected backends have been extended to cover the new API behavior.

Fixes #985
Fixes #922

TODO:

Write more (actual) tests for rowset access of BLOBs
~~Write tests for batch-insert and -select of BLOBs~~

Krzmbrzl · 2022-09-16T16:35:49Z

@vadz I think I am (mostly) done with this PR. I updated the PR's description to include a summary of the changes. By looking at the individual commits you should get a more detailed picture, still.

The macOS CI runs seem to have been cancelled for some reason, which leads to them being reported as "failing".

Review(s) welcome :)

vadz · 2022-09-16T17:08:35Z

Thanks!

I have no idea what's going on with macOS CI jobs, I've just fixed PostgreSQL one a couple of days ago... I'll try to look at this soon.

Krzmbrzl · 2022-09-16T17:11:57Z

My guess for the macOS jobs is that the free runs that GitHub actions provides have been used up for the current period - but again: just a guess 🤷
EDIT: Maybe not - it seems GA is completely free for public repos...

src/backends/postgresql/blob.cpp

vadz

Thanks, but I have a couple of questions:

What is the purpose of all the changes to PostgreSQL code? I am not really sure why is the new version better than the old one?
Why can't we have backend-independent tests? It seems like they overlap/duplicate each other a lot.

include/soci/soci-backend.h

include/soci/mysql/soci-mysql.h

src/backends/sqlite3/statement.cpp

Krzmbrzl · 2022-09-18T07:08:44Z

Because the old code does not offer the needed functionality? Sorry, but I am a bit confused by your comment. Are you referring to a specific part of the changes?
Don't ask me. I simply extended what was already there. If you ask me, it doesn't really make any sense to have Backend-specific tests for a library like SOCI (except for really Backend-specific functionality) but apparently this is the way that was chosen 🤷

vadz · 2022-09-18T11:21:05Z

Because the old code does not offer the needed functionality? Sorry, but I am a bit confused by your comment. Are you referring to a specific part of the changes?

I was asking about the changes to PostgreSQL because it wasn't obvious to me why was it necessary to introduce blob_details to do this. It's not that I think it's a bad idea, but this results in a lot of changes which make it not really obvious to be sure that there are no regressions, which is my main concern. So I'd just like to understand why was it a good one?

2. Don't ask me. I simply extended what was already there. If you ask me, it doesn't really make any sense to have Backend-specific tests for a library like SOCI (except for really Backend-specific functionality) but apparently this is the way that was chosen shrug

Normally the backend-specific tests are only supposed to test for backend-specific functionality. I realize that this is not always the case and I guess it could be due to the fact that BLOB support was originally implemented only for Oracle, then extended to PostgreSQL and then to the other backends. Still, it would be nice to at least stop compounding the problem and add the new tests to the common code. If they are really exactly the same, could you please move them to common-tests.h? TIA!

Krzmbrzl · 2022-09-18T12:06:19Z

I was asking about the changes to PostgreSQL because it wasn't obvious to me why was it necessary to introduce blob_details to do this. It's not that I think it's a bad idea, but this results in a lot of changes which make it not really obvious to be sure that there are no regressions, which is my main concern. So I'd just like to understand why was it a good one?

Ah, okay 💡
Strictly speaking it was not necessary, but I needed getters and setters for the OID and for the file descriptor and since these two are kinda linked, I didn't like the idea of having separate access functions for these fields as that could lead to an invalid object state (where the set OID does not refer to the same LOB as the file descriptor). Thus, I merged them into a single data structure.

If they are really exactly the same, could you please move them to common-tests.h

Wouldn't that mean that this would also run for the Oracle backend? If so, they will fail as I did not streamline the Oracle Blob API as I didn't quite understand the Oracle docs on the subject.

vadz · 2022-09-18T13:51:44Z

I hope to merge #996 soon, so using std::uint8_t shouldn't be a problem.

I'm a bit worried about the state of Oracle, as it's the most idiosyncratic backend and it wouldn't be a good idea to provide an API that can't be implemented for it. Unfortunately I don't know much about Oracle BLOB support, but surely it should be possible to write a BLOB into it. In fact, this is the part I don't understand: the description of this PR seems to imply that currently you can't do this at all, but this is not true, is it?

Krzmbrzl · 2022-09-18T13:57:25Z

surely it should be possible to write a BLOB into it

Oh, I'm almost certain it is possible. I just had no motivation to dig through the docs and figure out how to do it.

In fact, this is the part I don't understand: the description of this PR seems to imply that currently you can't do this at all, but this is not true, is it?

It is, except for Firebird (and SQLite). To verify, consider this code:

soci::blob b(sql);

char buf[] = "Some data";
b.write_from_start(buf, sizeof(buf));

This will error on most backends due to the BLOB's internals not being initialitzed yet.

In constrast, with this PR, this works for all backends that support BLOBs, except for Oracle.

vadz

Thanks a lot for all your work here, there are clearly a lot of improvements and we need to merge this, however there are also still some questions that need to be addressed first.

Globally I'd like to understand whether we really need to allow using blob::read_from_start() with empty blob work. It's not clear to me why is it so, but I may be missing some good reasons.

In PostgreSQL backend I'm still lukewarm about extracting details because it results in a lot of changes without any obvious gain, but if you really prefer it this way, so be it. I'd like to improve/centralize handling of functions using 64-bit offsets, e.g. have a template which calls 32-bit function if the offset fits into 32 bits and 64-bit one otherwise without fall back on the 32-bit one in this case, as this would just hide the error, but won't do the right thing (and could corrupt the database data).

I also didn't have time to fully understand all "from base" machinery yet, but I'd like to submit the current remarks already and I'll try to get back to the rest a.s.a.p.

Thanks again for all this!

include/soci/blob.h

include/soci/firebird/soci-firebird.h

include/soci/oracle/soci-oracle.h

include/soci/soci-backend.h

tests/postgresql/test-postgresql.cpp

tests/common-tests.h

Krzmbrzl · 2024-01-16T19:32:09Z

Globally I'd like to understand whether we really need to allow using blob::read_from_start() with empty blob work. It's not clear to me why is it so, but I may be missing some good reasons.

It avoids special-casing on user's side. Semantically, it doesn't change a thing whether or not a blob contains any data. The API already works by specifying a max-size (the size of the buffer to read into) and returning the actual amount of read bytes. Consider e.g.

std::size_t read = blob.read_from_start(my_buffer, my_buffer.size());

This code seems like it should always work and it sometimes throwing an exception seems unnecessary.

In PostgreSQL backend I'm still lukewarm about extracting details because it results in a lot of changes without any obvious gain, but if you really prefer it this way, so be it.

You mean blob_details? It simply groups together what logically belongs together. This helps to make sure that these variables are not accidentally brought into an inconsistent state because someone overlooked that there is a relation between the two. Plus, it allows to return them together in a named struct instead of having to fall back to returning a tuple or something like that.

I'd like to improve/centralize handling of functions using 64-bit offsets, e.g. have a template which calls 32-bit function if the offset fits into 32 bits and 64-bit one otherwise without fall back on the 32-bit one in this case, as this would just hide the error, but won't do the right thing (and could corrupt the database data).

My stance here would be to simply drop the 32-bit API altogether. My gut feeling is that there is probably no supported RDBMS version out there anymore that doesn't have support for 64 bit yet.

vadz · 2024-01-16T19:38:36Z

This code seems like it should always work and it sometimes throwing an exception seems unnecessary.

But is it really normal for the blob to be empty when you read from it? Presumably, if you're trying to use it, it should have some data. I.e. I think that not throwing an exception hides a programming error here. But maybe I don't understand how empty blobs are used. In which cases can you end up with an empty blob?

My stance here would be to simply drop the 32-bit API altogether.

I agree that we probably could do this. I definitely would be fine with not providing BLOB support for them and just returning an error.

For the record, I think this is not a good change, but it was very strongly requested during code review, so here we are.

Krzmbrzl · 2024-01-28T18:09:11Z

@vadz I believe I have now addressed all review comments issued thus far

vadz

Thanks a lot for the fixes! I didn't even mean to ask you to change to void* everywhere in this PR (this was just an idea for later), but it's nice to have done it too, thanks!

I'll retest my own code with this PR and will merge it soon, if no unexpected problems are discovered.

include/soci/blob.h

src/core/blob.cpp

tests/common-tests.h

tests/firebird/test-firebird.cpp

Co-authored-by: VZ <vz-github@zeitlins.org>

vadz

Great, thanks, I'll (finally) merge this soon.

Krzmbrzl force-pushed the revamp-blob-impl branch 18 times, most recently from 5531184 to 6921357 Compare September 16, 2022 15:43

Krzmbrzl marked this pull request as ready for review September 16, 2022 16:33

Krzmbrzl changed the title ~~WIP: Revamp BLOB implementations~~ Revamp BLOB implementations Sep 16, 2022

Krzmbrzl commented Sep 16, 2022

View reviewed changes

src/backends/postgresql/blob.cpp Outdated Show resolved Hide resolved

vadz reviewed Sep 17, 2022

View reviewed changes

include/soci/soci-backend.h Outdated Show resolved Hide resolved

include/soci/soci-backend.h Outdated Show resolved Hide resolved

include/soci/mysql/soci-mysql.h Outdated Show resolved Hide resolved

src/backends/sqlite3/statement.cpp Outdated Show resolved Hide resolved

Krzmbrzl requested a review from vadz January 13, 2024 19:27

Krzmbrzl mentioned this pull request Jan 13, 2024

Revamp CMake support #1118

Draft

vadz reviewed Jan 14, 2024

View reviewed changes

Krzmbrzl force-pushed the revamp-blob-impl branch from ae3f8de to 5253e42 Compare January 16, 2024 19:41

Address review comments

be46771

Krzmbrzl force-pushed the revamp-blob-impl branch from 5253e42 to be46771 Compare January 16, 2024 19:45

Krzmbrzl added 5 commits January 16, 2024 21:03

Remove asserts

3e21119

For the record, I think this is not a good change, but it was very strongly requested during code review, so here we are.

Disable Blob tests for ODBC

4fc8736

Moved trivial blob backend impl to private header

766d7a2

Rename type-traits.h to is-detected.h

a5921ce

Switch to a void* based API for blobs

d92a250

Krzmbrzl force-pushed the revamp-blob-impl branch from 74eca35 to 1dc95a3 Compare January 28, 2024 17:28

Oracle: Add 64-bit size support for BLOBs

0dd4a5a

Krzmbrzl force-pushed the revamp-blob-impl branch from 1dc95a3 to 0dd4a5a Compare January 28, 2024 17:38

Krzmbrzl added 4 commits January 28, 2024 18:41

Postgresql: Add 64-bit size support for BLOBs

10d5f08

Make blobs default-constructible

f19c23b

Remove unnecessary reinterpret_casts

ea3e29e

Update BLOB docs

ca32ddd

vadz approved these changes Jan 29, 2024

View reviewed changes

include/soci/blob.h Outdated Show resolved Hide resolved

src/core/blob.cpp Outdated Show resolved Hide resolved

tests/common-tests.h Outdated Show resolved Hide resolved

tests/firebird/test-firebird.cpp Show resolved Hide resolved

Krzmbrzl and others added 3 commits February 6, 2024 19:55

Make blob default ctor implicit

08f750c

Co-authored-by: VZ <vz-github@zeitlins.org>

Fix coding style

8798c3f

Co-authored-by: VZ <vz-github@zeitlins.org>

Don't use macros for what can be done by a member function

f267fb1

vadz approved these changes Feb 6, 2024

View reviewed changes

vadz merged commit 0c26c65 into SOCI:master Mar 19, 2024
16 checks passed

vadz mentioned this pull request Mar 19, 2024

Add automatic type conversion to row::get() #1127

Closed

vadz mentioned this pull request Mar 27, 2024

Add back support for PostgreSQL 9.2 #1134

Merged

Spixmaster mentioned this pull request Jun 7, 2024

File "private/soci-trivial-blob-backend.h" missing #1153

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revamp BLOB implementations #992

Revamp BLOB implementations #992

Krzmbrzl commented Sep 15, 2022 •

edited

Loading

Krzmbrzl commented Sep 16, 2022

vadz commented Sep 16, 2022

Krzmbrzl commented Sep 16, 2022 •

edited

Loading

vadz left a comment

Krzmbrzl commented Sep 18, 2022

vadz commented Sep 18, 2022

Krzmbrzl commented Sep 18, 2022

vadz commented Sep 18, 2022

Krzmbrzl commented Sep 18, 2022

vadz left a comment

Krzmbrzl commented Jan 16, 2024

vadz commented Jan 16, 2024

Krzmbrzl commented Jan 28, 2024

vadz left a comment

vadz left a comment

Revamp BLOB implementations #992

Revamp BLOB implementations #992

Conversation

Krzmbrzl commented Sep 15, 2022 • edited Loading

Notes

Krzmbrzl commented Sep 16, 2022

vadz commented Sep 16, 2022

Krzmbrzl commented Sep 16, 2022 • edited Loading

vadz left a comment

Choose a reason for hiding this comment

Krzmbrzl commented Sep 18, 2022

vadz commented Sep 18, 2022

Krzmbrzl commented Sep 18, 2022

vadz commented Sep 18, 2022

Krzmbrzl commented Sep 18, 2022

vadz left a comment

Choose a reason for hiding this comment

Krzmbrzl commented Jan 16, 2024

vadz commented Jan 16, 2024

Krzmbrzl commented Jan 28, 2024

vadz left a comment

Choose a reason for hiding this comment

vadz left a comment

Choose a reason for hiding this comment

Krzmbrzl commented Sep 15, 2022 •

edited

Loading

Krzmbrzl commented Sep 16, 2022 •

edited

Loading