Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 6057 - vlv search may result wrong result with lmdb #6091

Merged
merged 4 commits into from Feb 22, 2024

Conversation

progier389
Copy link
Contributor

@progier389 progier389 commented Feb 13, 2024

Different issue related to vlv index and import/bulk import:

  1. vlv sub database was not open when the backend was started
  2. vlv index was not cleaned by import/bulk import
  3. vlv index was not rebuilt by import/bulk import
  4. vlv index not rebuilt by explicit vlv reindex.
  5. vlv index not rebuilt by explicit vlv reindex if vlv name contains hyphen.
  6. vlv index not rebuilt if basedn is not the suffix.

In fact all theses issues had the same cause: the backend vlv search list is empty after the server get restarted.

Solution:
[For 1,2 and 3] Fix the test_vlv_cache_subdb_names to ensure that vlv index are properly cleaned
and recreated by a bulk import
Initialize the vlv search list if it is not yet initialized when starting an instance (just before opening
all the sub databases associated with the backend) rather than doing it before restarting the instance after the import.
[For 4] Add a new member for vlv in the import context and handle it properly.
[For 5] Convert the vlv name as a dbname and store it is a separate list - compare the dbname when checking if vlv is reindexed.
[for 6] Rebuild the proper entry dn (in case of reindex) to be able to evaluate the vlv scope
to rebuild the dn I used the entry_info data (stored in a temporary database) that contains the rdn/nrdn/
and ancestors IDs (used to to rebuild the entryrdn index) and now also store the dn which is simply
propagated by adding the entry rdn to the parent entry dn.

Issue: #6057

Reviewed by: @tbordaz , @droideck (Thanks!)

@progier389 progier389 self-assigned this Feb 13, 2024
@progier389 progier389 added this to the 3.0.0 milestone Feb 13, 2024
@progier389 progier389 linked an issue Feb 13, 2024 that may be closed by this pull request
@progier389 progier389 added the work in progress Work in Progress - can be reviewed, but not ready for merge. label Feb 13, 2024
@progier389 progier389 removed the work in progress Work in Progress - can be reviewed, but not ready for merge. label Feb 16, 2024
@progier389
Copy link
Contributor Author

A bit tedious, but this time, the freeipa test pass ...

@progier389 progier389 added the work in progress Work in Progress - can be reviewed, but not ready for merge. label Feb 19, 2024
@progier389
Copy link
Contributor Author

But unfortunately CI tests show regressions both in LDBM and BDB 😞
On bdb it is likely an issue in the CI test as only lmdb code is impacted by this fix.

@progier389
Copy link
Contributor Author

progier389 commented Feb 20, 2024

Fixed the LMDB regression about missing entries because ancestorid index was corrupted. (I forgot to realign a part of the code that was also using the entry_info record)
About BDB, one of the failure cause is due to an unrelated regression that impacts the main branch
(some weird messages about paged search timeout are logged and the presence of these error messages
cause failure in some tests)
And the other one is due to #6029
(Solved the issue by skipping test_vlv_reindex and test_vlv_cache_subdb_names when using bdb)

@progier389 progier389 removed the work in progress Work in Progress - can be reviewed, but not ready for merge. label Feb 21, 2024
@progier389
Copy link
Contributor Author

This time the CI tests look good (FYI: the import tests fail because of Issue #6103)

@jchapma
Copy link
Contributor

jchapma commented Feb 21, 2024

This time the CI tests look good (FYI: the import tests fail because of Issue #6103)

Fixed

Copy link
Contributor

@tbordaz tbordaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall fix is difficult to review without a deep dive in vlv indexes. I only looked at cosmetic and local logical without understanding the complete view.
LGTM

@@ -3111,7 +3155,8 @@ process_vlv_index(backentry *ep, ImportWorkerInfo *info)
struct vlvIndex *vlv_index = ps->vlv_index;
Slapi_PBlock *pb = slapi_pblock_new();
slapi_pblock_set(pb, SLAPI_BACKEND, be);
if (vlv_index && (ctx->indexAttrs==NULL || attr_in_list(vlv_index->vlv_name, ctx->indexAttrs))) {
if (vlv_index && vlv_index->vlv_attrinfo &&
is_reindexed_attr(vlv_index->vlv_attrinfo->ai_type , ctx, ctx->indexVlvs)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why relaying on vlv_index->vlv_attrinfo->ai_type rather than vlv_index->vlv_name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (BTW: I fall in that pitfall too! (I first used vlvName then got in trouble) 😉
That is because in truncate_index_dbi (that deletes the vlv index that are reindexed) the vlv_index is not available but only the ai.

memcpy(dn, rdn, rdnlen);
if (pinfo[INFO_IDX_DN_LEN]) {
dn[rdnlen] = ',';
memcpy(dn+rdnlen+1, INFO_DN(pinfo), pinfo[INFO_IDX_DN_LEN]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code quite difficult to understand, sorry in advance if my question is confusing.
Few lines above you computed 'dnlen' with +1 and I understood it was for '\0'. Here we are adding ',' between 'rdn' and '<rest_of_dn>'. I am concerned if that added ',' was already account in the dnlen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the space for ',' and final \0 are accounted in dnlen:
dnlen = rdnlen + 1 + pinfo[4]; /* dn len (including final \0) */
in other words: strlen(rdb) + 1 (for the comma) + parent dn len (including the \0)

BTW there is a minor glitch: should better be INFO_IDX_DN_LEN instead of 4:
dnlen = rdnlen + 1 + pinfo[INFO_IDX_DN_LEN];

}
return 0;
}

/* vlv_getindices callback that truncate vlv index (in reindex case) */
static int
truncate_index_dbi(struct attrinfo *ai, ImportCtx_t *ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function truncate_index_dbi looks specific to vlv. Would it make sense to rename it with 'vlv' in its name ?

Copy link
Member

@droideck droideck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that dirsrvtests/tests/suites/import/import_test.py::test_entry_with_escaped_characters_fails_to_import_and_index fails (but it doesn't fail on main). Could it be related?

@progier389
Copy link
Contributor Author

FYI: About the test_entry_with_escaped_characters_fails_to_import_and_index test: a look at the error log shows that it is #6103 That James fixed it yesterday but I have not rebased the branch with that fix. ==> The problem should be fixed by the merge.

@progier389
Copy link
Contributor Author

Fixed the minor glitch and rebased

@progier389 progier389 merged commit a380deb into 389ds:main Feb 22, 2024
193 of 195 checks passed
Copy link
Member

@droideck droideck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments and questions, but besides that - looks good! Ack

log.info(f'Adding {users_num} users')
for i in range(0, users_num):
uid = STARTING_UID_INDEX + i
def add_an_user(inst, users, uid):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick - the correct grammar will be add_a_user. But probably, add_user will be even better for the function name here.

dnlen = rdnlen + 1 + pinfo[INFO_IDX_DN_LEN]; /* dn len (including final \0) */
}

len = rdnlen + strlen(nrdn) + 2 + dnlen + (INFO_IDX_ANCESTORS + 1 + pinfo[INFO_IDX_NB_ANCESTORS]) * sizeof(ID);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a very minor thing, but why not calculate strlen(nrdn) one time in the beginning the same way as it's done for rdnlen? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree: it is cleaner - I added rdnlen because I needed it to compute the dn size but I did not need to use nrdnlen for the dn size computation so I just did not think about it ...

#define INFO_NRDN(info) ((char*)(&((ID*)(info))[INFO_IDX_ANCESTORS+((ID*)(info))[INFO_IDX_NB_ANCESTORS]]))
#define INFO_RDN(info) (INFO_NRDN(info)+((ID*)(info))[INFO_IDX_NRDN_LEN])
#define INFO_DN(info) (INFO_RDN(info)+((ID*)(info))[INFO_IDX_RDN_LEN])
#define INFO_RECORD_LEN(info) ((INFO_DN(info)-(char*)(info))+(info)[INFO_IDX_DN_LEN])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is INFO_RECORD_LEN? I don't see it used anywhere (or mentioned in the comment above)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's just a LEN of the data, but it could be nice to describe.

@@ -3111,7 +3155,8 @@ process_vlv_index(backentry *ep, ImportWorkerInfo *info)
struct vlvIndex *vlv_index = ps->vlv_index;
Slapi_PBlock *pb = slapi_pblock_new();
slapi_pblock_set(pb, SLAPI_BACKEND, be);
if (vlv_index && (ctx->indexAttrs==NULL || attr_in_list(vlv_index->vlv_name, ctx->indexAttrs))) {
if (vlv_index && vlv_index->vlv_attrinfo &&
is_reindexed_attr(vlv_index->vlv_attrinfo->ai_type , ctx, ctx->indexVlvs)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not related to this change, but it could be nice to fix/clarify. In this function (just a few lines below) there's error message slapi_log_err(SLAPI_LOG_ERR, "process_regular_index", "index_addordel_values_ext_sv failed.\n");.

I think it could be a typo, as it has the wrong function name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! looks like a cut and paste issue! 😉

slapi_task_log_notice(job->task, "%s: Indexing attribute: %s", job->inst->inst_name, mii->name);
if (ii->ai->ai_indexmask == INDEX_VLV) {
if (job->task) {
slapi_task_log_notice(job->task, "%s: Indexing VLV: %s", job->inst->inst_name, mii->name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, where mii is freed? I see one if -

if (a->offset_dbi) {
        *(MdbIndexInfo_t**)(((char*)ctx)+a->offset_dbi) = mii;
    }

But if we are not a->offset_dbi - what then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mii is stored in an avl hashtable so I think it is freed by: avl_free(ctx->indexes, (IFP) free_ii);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's put there only in case if (a->offset_dbi) {, right?

Anyway, I probably miss something, as I'm not sure I fully understand what really happens here with ii, ctx->job and how we build the index list in general. I guess I need dive deeper here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No: it is always stored in ctx->indexes map (last line of the function is: avl_insert(&ctx->indexes, mii, cmp_mii, NULL); )

progier389 added a commit that referenced this pull request Mar 13, 2024
)

* Issue 6057 - vlv search may result wrong result with lmdb - Fix 2
* Issue i6057 - Fix2 - Fix review comment

Previous fix is failing after a restart because of a chicken and egg issue related to vlv_init and backend initialization.
vlv_init requires that the backend get initialized to be able to generate the vlvSearch struct.
Because of deadlocks, and to be able to roll back the database instance open transaction I found it easier to avoid using vlv_getindices if vlv is not initialized but rather perform a search on cn=config to build a list of all possible vlv indexes filenames (ignoring the configuration errors) and use that list to open the database files for vlv indices and their cache.

Also fixed some minor issues:
@droideck minor remarks done about #6091 after the merge
a typo while logging info about the database environment parameters

Issue: #6057

Reviewed by: @tbordaz, @droideck , @mreynolds389 (Thanks!)
jchapma pushed a commit that referenced this pull request Apr 10, 2024
* Issue 6057 - vlv search may result wrong result with lmdb

Different issue related to vlv index and import/bulk import:

vlv sub database was not open when the backend was started
vlv index was not cleaned by import/bulk import
vlv index was not rebuilt by import/bulk import
vlv index not rebuilt by explicit vlv reindex.
vlv index not rebuilt by explicit vlv reindex if vlv name contains hyphen.
vlv index not rebuilt if basedn is not the suffix.
In fact all theses issues had the same cause: the backend vlv search list is empty after the server get restarted.

Solution:
[For 1,2 and 3] Fix the test_vlv_cache_subdb_names to ensure that vlv index are properly cleaned
and recreated by a bulk import
Initialize the vlv search list if it is not yet initialized when starting an instance (just before opening
all the sub databases associated with the backend) rather than doing it before restarting the instance after the import.
[For 4] Add a new member for vlv in the import context and handle it properly.
[For 5] Convert the vlv name as a dbname and store it is a separate list - compare the dbname when checking if vlv is reindexed.
[for 6] Rebuild the proper entry dn (in case of reindex) to be able to evaluate the vlv scope
to rebuild the dn I used the entry_info data (stored in a temporary database) that contains the rdn/nrdn/
and ancestors IDs (used to to rebuild the entryrdn index) and now also store the dn which is simply
propagated by adding the entry rdn to the parent entry dn.

Issue: #6057

Reviewed by: @tbordaz , @droideck (Thanks!)
jchapma pushed a commit that referenced this pull request Apr 10, 2024
)

* Issue 6057 - vlv search may result wrong result with lmdb - Fix 2
* Issue i6057 - Fix2 - Fix review comment

Previous fix is failing after a restart because of a chicken and egg issue related to vlv_init and backend initialization.
vlv_init requires that the backend get initialized to be able to generate the vlvSearch struct.
Because of deadlocks, and to be able to roll back the database instance open transaction I found it easier to avoid using vlv_getindices if vlv is not initialized but rather perform a search on cn=config to build a list of all possible vlv indexes filenames (ignoring the configuration errors) and use that list to open the database files for vlv indices and their cache.

Also fixed some minor issues:
@droideck minor remarks done about #6091 after the merge
a typo while logging info about the database environment parameters

Issue: #6057

Reviewed by: @tbordaz, @droideck , @mreynolds389 (Thanks!)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vlv search may result wrong result with lmdb
4 participants