Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp BLOB implementations #992

Merged
merged 78 commits into from
Mar 19, 2024
Merged

Revamp BLOB implementations #992

merged 78 commits into from
Mar 19, 2024

Conversation

Krzmbrzl
Copy link
Contributor

@Krzmbrzl Krzmbrzl commented Sep 15, 2022

This PR revamps the BLOB implementation across all supported backends, except DB2 and ODBC (which remain at not supporting BLOBs). The goal is to have a unified interface for BLOB objects in order to be able to write code that is portable between different backends. Besides unification, this PR also introduces new features to the soci::blob type that makes it fit in better with the rest of SOCI's API. The following features are covered by this PR:

  • BLOBs can be created as empty objects via
soci::blob myBlob(sql);
  • I/O is possible on BLOBs initialized in this way (mostly behaving like a regular container)
std::size_t blobSize = myBlob.get_len();
myBlob.write_from_start("dummy", 5);
  • Once the blob contains the desired data, it can be inserted into the database (including its current data) as usual
sql << "insert into my_table(blob_val) values(:b)", soci::use(myBlob);
  • Selecting a blob into a blob object overwrites its previous contents (if any)
  • Changes to a blob are not automatically reflected in the database. In order for changes to be committed to the database, the blob must be explicitly used in an insert or update query.
  • When selecting a blob type via a rowset, they are now actually represented by a blob object instead of a std::string. The blob can be obtained from a given row via the new move_as<> function that complements the already existing get<> function for non-copyable types (such as blobs). Trying to use get<> for blob types will yield a compiler error stating that this is not supported.
  • blobs work with prepared statements - insert statements as well as select ones.
  • BLOBs can be reused - e.g. the same blob object can be used multiple times, always yielding the same data in the query (instead of being empty after executing the first query)

Test cases ensure all of this behavior is consistent across the backends that support BLOBs

Notes

  • Since the MySQL interface seems to completely work by assembling the to-be-executed query as a string and then executing that, handling BLOBs is also embedded in this framework (The data is converted into a hex-literal). Thus, the larger the BLOB that is to be inserted, the larger the resulting query. If the objects are too large for MySQL's message limits, the query will fail.
    A more efficient implementation should be possible, but I think only by refactoring the MySQL backend to use prepared statements instead of string queries. I'll leave this optimization for another day.
  • Using blob objects in soci::values objects for ORMs (Bind a soci::blob field into a custom class with type_conversion<> #476) will probably require an introduction of a move_as function as a complement to the existing get function just as has been done for row. However, I'm not touching ORMs for now as I don't need them and this PR is big enough as it is.
  • For the same reason bulk operations remain unsupported

The test cases of the affected backends have been extended to cover the new API behavior.

Fixes #985
Fixes #922


TODO:

  • Write more (actual) tests for rowset access of BLOBs
  • Write tests for batch-insert and -select of BLOBs

@Krzmbrzl Krzmbrzl force-pushed the revamp-blob-impl branch 18 times, most recently from 5531184 to 6921357 Compare September 16, 2022 15:43
@Krzmbrzl Krzmbrzl marked this pull request as ready for review September 16, 2022 16:33
@Krzmbrzl Krzmbrzl changed the title WIP: Revamp BLOB implementations Revamp BLOB implementations Sep 16, 2022
@Krzmbrzl
Copy link
Contributor Author

@vadz I think I am (mostly) done with this PR. I updated the PR's description to include a summary of the changes. By looking at the individual commits you should get a more detailed picture, still.

The macOS CI runs seem to have been cancelled for some reason, which leads to them being reported as "failing".

Review(s) welcome :)

@vadz
Copy link
Member

vadz commented Sep 16, 2022

Thanks!

I have no idea what's going on with macOS CI jobs, I've just fixed PostgreSQL one a couple of days ago... I'll try to look at this soon.

@Krzmbrzl
Copy link
Contributor Author

Krzmbrzl commented Sep 16, 2022

My guess for the macOS jobs is that the free runs that GitHub actions provides have been used up for the current period - but again: just a guess 🤷
EDIT: Maybe not - it seems GA is completely free for public repos...

Copy link
Member

@vadz vadz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but I have a couple of questions:

  1. What is the purpose of all the changes to PostgreSQL code? I am not really sure why is the new version better than the old one?
  2. Why can't we have backend-independent tests? It seems like they overlap/duplicate each other a lot.

include/soci/soci-backend.h Outdated Show resolved Hide resolved
include/soci/soci-backend.h Outdated Show resolved Hide resolved
include/soci/mysql/soci-mysql.h Outdated Show resolved Hide resolved
src/backends/sqlite3/statement.cpp Outdated Show resolved Hide resolved
@Krzmbrzl
Copy link
Contributor Author

  1. Because the old code does not offer the needed functionality? Sorry, but I am a bit confused by your comment. Are you referring to a specific part of the changes?
  2. Don't ask me. I simply extended what was already there. If you ask me, it doesn't really make any sense to have Backend-specific tests for a library like SOCI (except for really Backend-specific functionality) but apparently this is the way that was chosen 🤷

@vadz
Copy link
Member

vadz commented Sep 18, 2022

  1. Because the old code does not offer the needed functionality? Sorry, but I am a bit confused by your comment. Are you referring to a specific part of the changes?

I was asking about the changes to PostgreSQL because it wasn't obvious to me why was it necessary to introduce blob_details to do this. It's not that I think it's a bad idea, but this results in a lot of changes which make it not really obvious to be sure that there are no regressions, which is my main concern. So I'd just like to understand why was it a good one?

2. Don't ask me. I simply extended what was already there. If you ask me, it doesn't really make any sense to have Backend-specific tests for a library like SOCI (except for really Backend-specific functionality) but apparently this is the way that was chosen shrug

Normally the backend-specific tests are only supposed to test for backend-specific functionality. I realize that this is not always the case and I guess it could be due to the fact that BLOB support was originally implemented only for Oracle, then extended to PostgreSQL and then to the other backends. Still, it would be nice to at least stop compounding the problem and add the new tests to the common code. If they are really exactly the same, could you please move them to common-tests.h? TIA!

@Krzmbrzl
Copy link
Contributor Author

I was asking about the changes to PostgreSQL because it wasn't obvious to me why was it necessary to introduce blob_details to do this. It's not that I think it's a bad idea, but this results in a lot of changes which make it not really obvious to be sure that there are no regressions, which is my main concern. So I'd just like to understand why was it a good one?

Ah, okay 💡
Strictly speaking it was not necessary, but I needed getters and setters for the OID and for the file descriptor and since these two are kinda linked, I didn't like the idea of having separate access functions for these fields as that could lead to an invalid object state (where the set OID does not refer to the same LOB as the file descriptor). Thus, I merged them into a single data structure.

If they are really exactly the same, could you please move them to common-tests.h

Wouldn't that mean that this would also run for the Oracle backend? If so, they will fail as I did not streamline the Oracle Blob API as I didn't quite understand the Oracle docs on the subject.

@vadz
Copy link
Member

vadz commented Sep 18, 2022

I hope to merge #996 soon, so using std::uint8_t shouldn't be a problem.

I'm a bit worried about the state of Oracle, as it's the most idiosyncratic backend and it wouldn't be a good idea to provide an API that can't be implemented for it. Unfortunately I don't know much about Oracle BLOB support, but surely it should be possible to write a BLOB into it. In fact, this is the part I don't understand: the description of this PR seems to imply that currently you can't do this at all, but this is not true, is it?

@Krzmbrzl
Copy link
Contributor Author

surely it should be possible to write a BLOB into it

Oh, I'm almost certain it is possible. I just had no motivation to dig through the docs and figure out how to do it.

In fact, this is the part I don't understand: the description of this PR seems to imply that currently you can't do this at all, but this is not true, is it?

It is, except for Firebird (and SQLite). To verify, consider this code:

soci::blob b(sql);

char buf[] = "Some data";
b.write_from_start(buf, sizeof(buf));

This will error on most backends due to the BLOB's internals not being initialitzed yet.

In constrast, with this PR, this works for all backends that support BLOBs, except for Oracle.

@Krzmbrzl Krzmbrzl requested a review from vadz January 13, 2024 19:27
@Krzmbrzl Krzmbrzl mentioned this pull request Jan 13, 2024
Copy link
Member

@vadz vadz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for all your work here, there are clearly a lot of improvements and we need to merge this, however there are also still some questions that need to be addressed first.

Globally I'd like to understand whether we really need to allow using blob::read_from_start() with empty blob work. It's not clear to me why is it so, but I may be missing some good reasons.

In PostgreSQL backend I'm still lukewarm about extracting details because it results in a lot of changes without any obvious gain, but if you really prefer it this way, so be it. I'd like to improve/centralize handling of functions using 64-bit offsets, e.g. have a template which calls 32-bit function if the offset fits into 32 bits and 64-bit one otherwise without fall back on the 32-bit one in this case, as this would just hide the error, but won't do the right thing (and could corrupt the database data).

I also didn't have time to fully understand all "from base" machinery yet, but I'd like to submit the current remarks already and I'll try to get back to the rest a.s.a.p.

Thanks again for all this!

include/soci/blob.h Outdated Show resolved Hide resolved
include/soci/firebird/soci-firebird.h Outdated Show resolved Hide resolved
include/soci/firebird/soci-firebird.h Show resolved Hide resolved
include/soci/oracle/soci-oracle.h Show resolved Hide resolved
include/soci/soci-backend.h Outdated Show resolved Hide resolved
tests/postgresql/test-postgresql.cpp Outdated Show resolved Hide resolved
tests/common-tests.h Outdated Show resolved Hide resolved
tests/common-tests.h Outdated Show resolved Hide resolved
tests/common-tests.h Outdated Show resolved Hide resolved
tests/common-tests.h Outdated Show resolved Hide resolved
@Krzmbrzl
Copy link
Contributor Author

Globally I'd like to understand whether we really need to allow using blob::read_from_start() with empty blob work. It's not clear to me why is it so, but I may be missing some good reasons.

It avoids special-casing on user's side. Semantically, it doesn't change a thing whether or not a blob contains any data. The API already works by specifying a max-size (the size of the buffer to read into) and returning the actual amount of read bytes. Consider e.g.

std::size_t read = blob.read_from_start(my_buffer, my_buffer.size());

This code seems like it should always work and it sometimes throwing an exception seems unnecessary.

In PostgreSQL backend I'm still lukewarm about extracting details because it results in a lot of changes without any obvious gain, but if you really prefer it this way, so be it.

You mean blob_details? It simply groups together what logically belongs together. This helps to make sure that these variables are not accidentally brought into an inconsistent state because someone overlooked that there is a relation between the two. Plus, it allows to return them together in a named struct instead of having to fall back to returning a tuple or something like that.

I'd like to improve/centralize handling of functions using 64-bit offsets, e.g. have a template which calls 32-bit function if the offset fits into 32 bits and 64-bit one otherwise without fall back on the 32-bit one in this case, as this would just hide the error, but won't do the right thing (and could corrupt the database data).

My stance here would be to simply drop the 32-bit API altogether. My gut feeling is that there is probably no supported RDBMS version out there anymore that doesn't have support for 64 bit yet.

@vadz
Copy link
Member

vadz commented Jan 16, 2024

This code seems like it should always work and it sometimes throwing an exception seems unnecessary.

But is it really normal for the blob to be empty when you read from it? Presumably, if you're trying to use it, it should have some data. I.e. I think that not throwing an exception hides a programming error here. But maybe I don't understand how empty blobs are used. In which cases can you end up with an empty blob?

My stance here would be to simply drop the 32-bit API altogether.

I agree that we probably could do this. I definitely would be fine with not providing BLOB support for them and just returning an error.

@Krzmbrzl
Copy link
Contributor Author

@vadz I believe I have now addressed all review comments issued thus far

Copy link
Member

@vadz vadz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the fixes! I didn't even mean to ask you to change to void* everywhere in this PR (this was just an idea for later), but it's nice to have done it too, thanks!

I'll retest my own code with this PR and will merge it soon, if no unexpected problems are discovered.

include/soci/blob.h Outdated Show resolved Hide resolved
src/core/blob.cpp Outdated Show resolved Hide resolved
tests/common-tests.h Outdated Show resolved Hide resolved
tests/firebird/test-firebird.cpp Show resolved Hide resolved
Krzmbrzl and others added 3 commits February 6, 2024 19:55
Co-authored-by: VZ <vz-github@zeitlins.org>
Co-authored-by: VZ <vz-github@zeitlins.org>
Copy link
Member

@vadz vadz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks, I'll (finally) merge this soon.

@vadz vadz merged commit 0c26c65 into SOCI:master Mar 19, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MySQL BLOB appears to be unsupported soci::RowSet support for social::Blob
2 participants