Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle BDB keys of different length #156

Conversation

engboris
Copy link
Contributor

@engboris engboris commented Jul 9, 2024

Fix for issue #154

Problem

The function libcob/fileio.c:bdb_bt_compare fails on keys of different length passed in arguments. This is necessary since valid BDB files containing additional data interpreted as keys of different length may be used with GnuCOBOL (example given from one of our customers). Such files may be produced by external tools but still meant to be readable.

Solution

Use the already implemented libcob/common.c:cob_cmp_alnum function (now made external) to compare keys as alphanumeric fields of possibly different length (padded with spaces).

This PR introduces a new preparser flag USE_BDB_KEYDIFF with may be passed in the following way when compiling GnuCOBOL sources:

CPPFLAGS=-DUSE_BDB_KEYDIFF ./configure
make

This flag allows such BDB keys of different length as an option since it has an overhead cost.

Notes

I have also been suggested to

also include the variant that operates on not 100% confirming BDB files (always using a sort function) (from @GitMensch, cf. issue #154)

Which is something I do not currently understand.

@engboris engboris added the bug Something isn't working label Jul 9, 2024
libcob/fileio.c Outdated
@@ -919,20 +919,34 @@ bdb_bt_compare (DB *db, const DBT *k1, const DBT *k2
)
{
const unsigned char *col = (unsigned char *)DBT_GET_APP_DATA (k1);
#ifdef USE_BDB_KEYDIFF /* flag passed with CPPFLAGS */
COB_MODULE_PTR->collating_sequence = col;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the current module's collating sequence; we don't want that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, we also don't want cob_fields to be used here - so extract the logic that only operates on data + size + col from cob_cmp_alnum and locally export it, then calling this here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's still open... just drop that line

Copy link
Collaborator

@GitMensch GitMensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main open point is the previously unclear registration:

when USE_BDB_KEYDIFF is used then bdb_bt_compare needs to be always registered (it currently is only registered if there is a col in use)

For your example: parameters should be passed as arguments to configure, instead of being set inline/exported (if you are interested why: the autotools manual has an entry on that, but the biggest part is that passed arguments are checked to be identical between runs).

libcob/common.h Outdated Show resolved Hide resolved
libcob/fileio.c Outdated
@@ -919,20 +919,34 @@ bdb_bt_compare (DB *db, const DBT *k1, const DBT *k2
)
{
const unsigned char *col = (unsigned char *)DBT_GET_APP_DATA (k1);
#ifdef USE_BDB_KEYDIFF /* flag passed with CPPFLAGS */
COB_MODULE_PTR->collating_sequence = col;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, we also don't want cob_fields to be used here - so extract the logic that only operates on data + size + col from cob_cmp_alnum and locally export it, then calling this here

libcob/fileio.c Outdated Show resolved Hide resolved
@engboris engboris requested a review from GitMensch July 10, 2024 12:14
Copy link
Collaborator

@GitMensch GitMensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please adjust as noted in the fix-commit (I've screwed up and commented there instead of on the PR).

In any case you want to further add to the ChangeLog.

One thing I do wonder: should cob_cmp_strings() have a parameter int national and do a different compare for the spaces in this case? cob_cmp_alnum() would pass zero, bdb_bt_compare() would pass depending on the national flag in k1.

In any case this can be done in a later refactoring on as well.

@engboris engboris force-pushed the bdb-compare-keys-diff-length branch from 2408d79 to 986ffd8 Compare July 10, 2024 13:21
@engboris engboris changed the title [WIP] Handle BDB keys of different length Handle BDB keys of different length Jul 10, 2024
@engboris
Copy link
Contributor Author

engboris commented Jul 10, 2024

One thing I do wonder: should cob_cmp_strings() have a parameter int national and do a different compare for the spaces in this case? cob_cmp_alnum() would pass zero, bdb_bt_compare() would pass depending on the national flag in k1.

It sounds right to me. I've looked at the DBT structure and in particular its flags attribute but I don't see anything corresponding to a national flag

@GitMensch
Copy link
Collaborator

It sounds right to me. I've looked at the DBT structure and in particular its flags attribute but I don't see anything corresponding to a national flag

That's not inside of the DBT->flags (those really belong to BDB), but instead it is inside the cob_field, which only is used to set the DBT->data+size (see DBT_SET macro); we could change the appdata to be a struct, but then: this isn't really useful as we'd check this each time.
The much more easier solution is to register either bdb_bt_compare or bdb_bt_compare_nat, as we do have this information when checking f->keys[i] there.

@engboris
Copy link
Contributor Author

engboris commented Jul 10, 2024

It might be wiser for us to leave that for a later refactoring as there is a need from one of our customers and I'm not very familiar with character sets and national yet (I currently don't know what this int national argument here would be). What do you think?

@GitMensch
Copy link
Collaborator

I agree, let's finish this PR just leaving a note at the place where we register the compare function, possibly something like /* TODO: add national compare function later */.

Copy link
Collaborator

@GitMensch GitMensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some finalization, then ready to check-in upstream

libcob/common.c Outdated
Comment on lines 2034 to 2037
return cob_cmp_strings ((unsigned char *)data1, (unsigned char *)data2,
size1, size2, col);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to not pass the arguments directly without intermediate variables?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@engboris you've marked that as solved but the question is unanswered?

libcob/fileio.c Outdated
@@ -919,20 +919,34 @@ bdb_bt_compare (DB *db, const DBT *k1, const DBT *k2
)
{
const unsigned char *col = (unsigned char *)DBT_GET_APP_DATA (k1);
#ifdef USE_BDB_KEYDIFF /* flag passed with CPPFLAGS */
COB_MODULE_PTR->collating_sequence = col;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's still open... just drop that line

libcob/fileio.c Outdated
Comment on lines 925 to 929
unsigned char *data1 = k1->data;
unsigned char *data2 = k2->data;
size_t size1 = (size_t)k1->size;
size_t size2 = (size_t)k2->size;
return cob_cmp_strings (data1, data2, size1, size2, col);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for intermediate variables here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason. It was temporary and they were meant to be removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libcob/fileio.c Outdated Show resolved Hide resolved
libcob/ChangeLog Outdated Show resolved Hide resolved
@engboris engboris force-pushed the bdb-compare-keys-diff-length branch from 42bb2de to 373e465 Compare July 11, 2024 07:56
Copy link
Collaborator

@GitMensch GitMensch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the unanswered answer (and the doc comment) that's finished; would you like to commit that upstream then?
[note: upstream 3.x is currently failing CI, you may want to wait until @chuck-haatvedt did his final fix-commit tomorrow

libcob/common.c Outdated
Comment on lines 2034 to 2037
return cob_cmp_strings ((unsigned char *)data1, (unsigned char *)data2,
size1, size2, col);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@engboris you've marked that as solved but the question is unanswered?

libcob/common.c Show resolved Hide resolved
@engboris engboris force-pushed the bdb-compare-keys-diff-length branch from 59f5c22 to 62f1253 Compare July 11, 2024 08:26
@engboris
Copy link
Contributor Author

It should be good now!

@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 66.11%. Comparing base (db7db96) to head (62f1253).
Report is 39 commits behind head on gcos4gnucobol-3.x.

Files Patch % Lines
libcob/common.c 50.00% 0 Missing and 4 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@                  Coverage Diff                  @@
##           gcos4gnucobol-3.x     #156      +/-   ##
=====================================================
+ Coverage              65.78%   66.11%   +0.32%     
=====================================================
  Files                     32       33       +1     
  Lines                  59416    60044     +628     
  Branches               15694    15802     +108     
=====================================================
+ Hits                   39087    39698     +611     
+ Misses                 14311    14295      -16     
- Partials                6018     6051      +33     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@chuck-haatvedt
Copy link

chuck-haatvedt commented Jul 12, 2024 via email

@ddeclerck
Copy link
Contributor

Merged in SVN @ 5296.

@ddeclerck ddeclerck closed this Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants