Under specific values of nsDS5ReplicaName, replication may get broken or updates missing #826

Closed
389-ds-bot opened this issue Sep 12, 2020 · 10 comments
Labels
closed: fixed Migration flag - Issue

@389-ds-bot

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/47489


In a replication environment, if the changelog database file name contains the extension string more than once, the changelog file is recreated when db2ldif and ldif2db are run on the master/hub instance.

Ex: d736e482-198111e1-8d7bedb4-8c53b85f_502ce263000000020000.db4
In this file name, "db4" appears twice: once as the extension and once inside the replica name ("8d7bedb4").

There is a logic error in the function below, which checks whether the file name ends with the extension. It calls strstr() to search for "ext", and strstr() returns only the first occurrence of "ext" in the file name. If "ext" occurs more than once, that first match is not at the end of the name, so the function always returns false, and as a result additional changelog db files are created.

====
filename: cl5_api.c

/*
 * return 1: true (the "filename" ends with "ext")
 * return 0: false
 */
static int _cl5FileEndsWith(const char *filename, const char *ext)
{
    char *p = NULL;
    int flen = strlen(filename);
    int elen = strlen(ext);
    if (0 == flen || 0 == elen)
    {
        return 0;
    }
    /* strstr() returns only the FIRST occurrence of ext */
    p = strstr(filename, ext);
    if (NULL == p)
    {
        return 0;
    }
    /* true only if that first occurrence happens to end at the end of filename */
    if (p - filename + elen == flen)
    {
        return 1;
    }
    return 0;
}

I have modified this function to fix the issue. Could you please verify it and include the fix in the master branch?

/*
 * return 1: true (the "filename" ends with "ext")
 * return 0: false
 */
static int _cl5FileEndsWith(const char *filename, const char *ext)
{
    char *p = NULL;
    int flen = strlen(filename);
    int elen = strlen(ext);
    if (0 == flen || 0 == elen)
    {
        return 0;
    }
    p = strstr(filename, ext);
    if (NULL == p)
    {
        return 0;
    }

    /* keep searching past each match until one ends exactly at the end of filename */
    do {
        if (p - filename + elen == flen)
        {
            return 1;
        }
        p = strstr(p + elen, ext);
    } while (p != NULL);

    return 0;
}

Thanks and Regards,
Jyoti

@389-ds-bot 389-ds-bot added the closed: fixed Migration flag - Issue label Sep 12, 2020
@389-ds-bot 389-ds-bot added this to the 1.3.2 - 09/13 (September) milestone Sep 12, 2020
@389-ds-bot

Comment from jyotidas81 at 2013-08-29 12:52:47

Hi,

Can anyone please verify this fix?

Thanks in advance.

Regards,
Jyoti

@389-ds-bot

Comment from tbordaz (@tbordaz) at 2013-09-09 23:19:28

Here is the current status:

  • Thanks for pinpointing the problematic routine; I was able to reproduce the _cl5DBOpen failure.
    To reproduce it, I created a single master. Then, before making any update, I edited dse.ldif and
    changed the replica's 'nsDS5ReplicaName'
   from: c7c6377c-196e11e3-831c8895-1f2ce016
   to:   c7c6377c-196e11e3-831c88db-1f2ce016 (change '95' -> 'db' in the 3rd component)

Then I started DS and saw these logs:

[09/Sep/2013:16:37:34 +0200] NSMMReplicationPlugin - changelog program - _cl5AppInit: fetched backend dbEnv (1efff10)
[09/Sep/2013:16:37:34 +0200] NSMMReplicationPlugin - changelog program - _cl5DBOpen: opened 0 existing databases in /var/lib/dirsrv/slapd-master/changelogdb
[09/Sep/2013:16:37:51 +0200] NSMMReplicationPlugin - replica_add_by_dn: added dn (dc=com)
[09/Sep/2013:16:37:51 +0200] NSMMReplicationPlugin - _replica_configure_ruv: No ruv tombstone found for replica dc=com. Created a new one
[09/Sep/2013:16:37:51 +0200] NSMMReplicationPlugin - replica_delete_by_dn: removed dn (dc=com)
[09/Sep/2013:16:37:51 +0200] NSMMReplicationPlugin - changelog program - _cl5GetDBFile: no DB object found for database /var/lib/dirsrv/slapd-master/changelogdb/4ade9183-195d11e3-831cdb94-1f2ce016_522ddd3f000000010000.db
[09/Sep/2013:16:37:51 +0200] NSMMReplicationPlugin - changelog program - cl5GetOperationCount: could not get DB object for replica
[09/Sep/2013:16:37:51 +0200] NSMMReplicationPlugin - changelog program - _cl5GetDBFile: no DB object found for database /var/lib/dirsrv/slapd-master/changelogdb/4ade9183-195d11e3-831cdb94-1f2ce016_522ddd3f000000010000.db
[09/Sep/2013:16:37:51 +0200] NSMMReplicationPlugin - changelog program - cl5GetOperationCount: could not get DB object for replica
[09/Sep/2013:16:37:51 +0200] NSMMReplicationPlugin - changelog program - _cl5GetDBFile: no DB object found for database /var/lib/dirsrv/slapd-master/changelogdb/4ade9183-195d11e3-831cdb94-1f2ce016_522ddd3f000000010000.db

  • I could not confirm the reported test case: apart from those errors, db2ldif (master) followed
    by ldif2db (hub) worked, and after a restart replication was running well.

  • I created a test case in which replication skips updates.
    I do not know whether it is the reported issue, but it is the one I will use as the test case:

	Create Master, C1, C2
	Update nsDS5ReplicaName on Master so that it contains 'db' (my database suffix; it can be db3 or db4).
	Create user t1
	Create user t2
	<check replication is working>
	Stop C2
	Create user t3
	<check t3 is replicated on C1>
	Stop Master, C1
	export Master (-r)
	import C1 (this step can likely be skipped)
	Start Master, C1, C2
	Create user t4

		-> On Master: t1, t2, t3, t4
		-> On Cons.1: t1, t2, t3, t4
		-> On Cons.2: t1, t2,     t4
  • The changelog dump shows an incomplete record for user t3:
	dbid: 0000006f000000000000
		entry count: 7

	dbid: 000000de000000000000
		purge ruv:
			{replicageneration} 522dfa79000000010000
			{replica 1 ldap://pctbordaz.redhat.com:47489}

	dbid: 0000014d000000000000
		max ruv:
			{replicageneration} 522dfa79000000010000
			{replica 1} 522dfb19000000010000 522dfd1b000000010000

	dbid: 522dfb19000000010000
		uniqueid: 31464581-196f11e3-831cdb94-1f2ce016
		dn: uid=t1,dc=com
		operation: add

	dbid: 522dfb38000000010000
		uniqueid: 31464582-196f11e3-831cdb94-1f2ce016
		dn: uid=t2,dc=com
		operation: add

	dbid: 522dfb6c000000010000	<<<<<< broken entry
		uniqueid: 00000000-00000000-00000000-00000000
		dn: cn=start iteration
		operation: delete

	dbid: 522dfcc6000000010000
		uniqueid: 2809a881-197011e3-831cdb94-1f2ce016
		dn: uid=t4,dc=com
		operation: add

Here are the next steps:

- I will verify the fix.

@389-ds-bot

Comment from rmeggins (@richm) at 2013-09-12 20:04:15

Can we get this fix into RHEL 6.5? Does this affect 389-ds-base-1.2.11?

@389-ds-bot

Comment from rmeggins (@richm) at 2013-09-12 20:07:52

Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1007452

@389-ds-bot

Comment from tbordaz (@tbordaz) at 2013-09-12 20:16:27

Thanks Rich for the review.

At the source level, the fix applies to 1.2.11.
I will test and confirm whether I can reproduce the issue on 1.2.11.

@389-ds-bot

Comment from tbordaz (@tbordaz) at 2013-09-12 20:49:41

I confirm that the same bug affects 389-ds-base-1.2.11.
I can reproduce the skipped updates with the same test case; the only difference is that in 1.2.11 the database suffix is 'db4', so 'nsDS5ReplicaName' must contain 'db4' to reproduce the issue.

@389-ds-bot

Comment from tbordaz (@tbordaz) at 2013-09-13 01:11:18

Push to master:

git merge ticket47489

Updating b73f1e8..7a7609d
Fast-forward
ldap/servers/plugins/replication/cl5_api.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

git push origin master

Counting objects: 13, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 1.05 KiB, done.
Total 7 (delta 5), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
b73f1e8..7a7609d master -> master

commit 7a7609d
Author: Thierry bordaz (tbordaz) tbordaz@redhat.com
Date: Wed Sep 11 11:08:58 2013 +0200

@389-ds-bot

Comment from nhosoi (@nhosoi) at 2013-09-27 05:19:57

389-ds-base-1.3.1 branch: commit ac8aad8
389-ds-base-1.2.11 branch: commit f944cd0

@389-ds-bot

Comment from tbordaz (@tbordaz) at 2017-02-11 23:10:37

Metadata Update from @tbordaz:

  • Issue assigned to tbordaz
  • Issue set to the milestone: 1.3.2 - 09/13 (September)
