Changelog cache can upload updates from a wrong starting point (CSN) #4492

Closed
tbordaz opened this issue Dec 10, 2020 · 2 comments

tbordaz commented Dec 10, 2020

Issue Description
The changelog cache is the mechanism that loads updates from the changelog in bulk.
Each bulk load returns consecutive updates (in CSN order) from a given starting point.
The problem is that a bulk load can start from the beginning of the changelog even if the starting point CSN exists in the changelog.

The consequence is that the changelog cache iterates through the whole changelog to retrieve the next update to send. If the changelog is large, this delays the updates being sent and creates replication lag.

Package Version and Platform:
Likely since 1.2.12

Steps to Reproduce
Hammer a large topology with small updates.
The following settings make the problem easier to reproduce:
Replica:
nsds5ReplicaReleaseTimeout: 30
Replication agreement (RA):
nsds5replicaBusyWaitTime: 3
nsds5replicaSessionPauseTime: 5

Expected results
A replication session should not load updates from before the initial starting point
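
A minimal sketch of the symptom described in this issue, not the server's code. It models the changelog as a sorted Python list of CSN strings and compares a bulk load positioned on the starting CSN with one that restarts from the first record. The function name `bulk_load`, the batch size, and the synthetic CSN values are all hypothetical.

```python
from bisect import bisect_left


def bulk_load(changelog, start_csn, batch_size, positioned=True):
    """Return (batch, skipped): the next batch of CSNs >= start_csn and the
    number of older records that were walked to reach it."""
    if positioned:
        # Cursor placed directly on the starting point (key lookup).
        first = bisect_left(changelog, start_csn)
        skipped = 0
    else:
        # Cursor reset to the beginning: every older record is walked first.
        first = 0
        skipped = 0
        while first < len(changelog) and changelog[first] < start_csn:
            first += 1
            skipped += 1
    return changelog[first:first + batch_size], skipped


# Hypothetical changelog: 100000 fixed-width hex CSNs, already in CSN order.
changelog = [f"{n:08x}000000010000" for n in range(100_000)]
start = changelog[99_990]

good_batch, good_skipped = bulk_load(changelog, start, 5, positioned=True)
bad_batch, bad_skipped = bulk_load(changelog, start, 5, positioned=False)
```

Both calls return the same batch, but the second one walks every record older than the starting point first, which is where the replication lag comes from on a large changelog.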

@tbordaz tbordaz added the needs triage The issue will be triaged during scrum label Dec 10, 2020
@mreynolds389 mreynolds389 added replication Issue involves replication and removed needs triage The issue will be triaged during scrum labels Dec 10, 2020
@mreynolds389 mreynolds389 added this to the 1.4.3 milestone Dec 10, 2020
tbordaz added a commit to tbordaz/389-ds-base that referenced this issue Dec 14, 2020
… point (CSN)

Bug description:
          When a replication session starts, a starting point is computed
          according to the supplier/consumer RUVs.
          From the starting point, the updates are bulk loaded from the CL.
          When a bulk set has been fully evaluated, the server needs to bulk load another set.
          It iterates until there are no more updates to send.
          The bug is that during bulk load, it recomputes the CL cursor position
          and this computation can be wrong. For example, if there is a new update on
          a rarely updated replica (or an unknown replica), the new position will
          be set before the initial starting point.

Fix description:
          Fixing the invalid computation is a bit risky (complex code resulting from
          years of corner-case handling) and a fix could fail to address other flavors
          of the same symptom.
          The fix is only (sorry for that) a safety check that ends a replication session
          if the computed cursor position goes before the initial starting point.
          In case of a large jump (24h) behind the starting point, a warning is logged.

relates: 389ds#4492

Reviewed by: Mark Reynolds, William Brown

Platforms tested: F31
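
The safety check described in this commit message could look roughly like the sketch below. It is an illustration only and not the code from the patch: `check_cursor_position` and `csn_time` are hypothetical names, and the sketch assumes a CSN is a fixed-width lowercase hex string whose first 8 digits are a timestamp in seconds, so that string comparison matches CSN order.

```python
import logging

log = logging.getLogger("clcache_sketch")
ONE_DAY = 24 * 60 * 60  # seconds; the "large jump" threshold from the commit message


def csn_time(csn):
    """Assumed CSN layout: the first 8 hex digits are a timestamp in seconds."""
    return int(csn[:8], 16)


def check_cursor_position(start_csn, new_csn):
    """Return (keep_going, seconds_behind) for a recomputed cursor position."""
    if new_csn >= start_csn:
        # The cursor did not move before the initial starting point.
        return True, 0
    behind = csn_time(start_csn) - csn_time(new_csn)
    if behind > ONE_DAY:
        log.warning("cursor jumped %d seconds behind starting point %s",
                    behind, start_csn)
    # Any position before the starting point ends the session in this sketch.
    return False, behind


# Made-up CSNs for illustration.
start = "5fd7a000000000010000"
ahead = "5fd7a00a000000010000"
slightly_behind = "5fd79ff6000000020000"
far_behind = f"{csn_time(start) - 2 * ONE_DAY:08x}000000020000"
```

The check does not repair the cursor computation; it only detects the bad position after the fact, which matches the "safety check" wording above.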
tbordaz added a commit that referenced this issue Dec 14, 2020
… point (CSN) (#4493)

tbordaz added a commit that referenced this issue Dec 14, 2020
… point (CSN) (#4493)

tbordaz added a commit that referenced this issue Dec 14, 2020
… point (CSN)


tbordaz commented Dec 14, 2020

32413b5..f629f8f master
18184e5..260128f 389-ds-base-1.4.4
305b75e..82f51a6 389-ds-base-1.4.3
d8963a0..d3364ca 389-ds-base-1.3.10

@tbordaz tbordaz closed this as completed Dec 14, 2020
@tbordaz tbordaz self-assigned this Dec 14, 2020

tbordaz commented Dec 14, 2020

tbordaz added a commit that referenced this issue Jan 14, 2021
… point (CSN) (#4493)

@tbordaz tbordaz modified the milestones: 1.4.3, 1.3.10 Jan 14, 2021
tbordaz added a commit to tbordaz/389-ds-base that referenced this issue Feb 23, 2021
…the changelog

Bug description:
	The replication agreements use bulk load to load updates.
	Bulk load uses a cursor with DB_MULTIPLE_KEY and DB_NEXT.
	Before the cursor is used, it must be initialized with DB_SET.

	If, during the cursor DB_SET, the CSN refers to an update that is larger than
	the provided buffer, then the cursor remains uninitialized and
	c_get returns DB_BUFFER_SMALL.

	The consequence is that the next c_get(DB_MULTIPLE_KEY and DB_NEXT) returns the
	first record in the changelog DB. This breaks the CLcache.

Fix description:
	The fix hardens cursor initialization so that if DB_SET fails
	because of DB_BUFFER_SMALL, it reallocates buf_data and retries the DB_SET.
	If the cursor still cannot be initialized with DB_SET, it logs a warning.

	The patch also changes the behaviour of the fix for 389ds#4492.
	389ds#4492 detected a massive (1 day) jump before the starting CSN and ended the
	replication session. If the jump was systematic, for example
	if the CLcache got broken because of an overly large update, then
	replication was systematically stopped.
	This patch removes the systematic stop, letting the RA do a big jump.
	Only the warning remains from 389ds#4492.

relates: 389ds#4644

Reviewed by: Pierre Rogier (Thanks !!!!)

Platforms tested: F31
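
The failure mode and the retry described in this commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the patch and not the Berkeley DB API: `FakeCursor`, `BufferTooSmall`, and `init_cursor` are hypothetical stand-ins for the cursor, the DB_BUFFER_SMALL return code, and the hardened initialization.

```python
class BufferTooSmall(Exception):
    """Stand-in for the DB_BUFFER_SMALL return code."""

    def __init__(self, needed):
        super().__init__(f"need {needed} bytes")
        self.needed = needed


class FakeCursor:
    """Toy cursor over a {csn: bytes} changelog, iterated in key order."""

    def __init__(self, records):
        self.records = records
        self.keys = sorted(records)
        self.pos = None  # None means "not initialized"

    def set(self, csn, bufsize):
        """Position on csn (like DB_SET); fail if the record does not fit."""
        size = len(self.records[csn])
        if size > bufsize:
            raise BufferTooSmall(size)  # the cursor stays uninitialized
        self.pos = self.keys.index(csn)

    def next(self):
        """Like DB_NEXT: an uninitialized cursor starts at the first record."""
        self.pos = 0 if self.pos is None else self.pos + 1
        return self.keys[self.pos]


def init_cursor(cursor, csn, bufsize):
    """Hardened initialization: grow the buffer and retry once."""
    try:
        cursor.set(csn, bufsize)
    except BufferTooSmall as exc:
        bufsize = exc.needed
        cursor.set(csn, bufsize)
    return bufsize


# Made-up changelog with one oversized update at the starting CSN "a2".
records = {"a1": b"x" * 10, "a2": b"y" * 5000, "a3": b"z" * 10}

naive = FakeCursor(records)
try:
    naive.set("a2", 1024)
except BufferTooSmall:
    pass  # error ignored, as in the buggy path
naive_next = naive.next()

hardened = FakeCursor(records)
final_bufsize = init_cursor(hardened, "a2", 1024)
hardened_next = hardened.next()
```

In the naive path the cursor is left uninitialized, so the following read restarts at the first record; with the retry, the read continues from the record after the starting CSN.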
tbordaz added a commit that referenced this issue Feb 23, 2021
…the changelog (#4647)

tbordaz added a commit that referenced this issue Feb 23, 2021
…the changelog (#4647)

tbordaz added a commit that referenced this issue Feb 23, 2021
…the changelog (#4647)

tbordaz added a commit that referenced this issue Feb 23, 2021
…the changelog (#4647)

kimettog added a commit that referenced this issue Mar 17, 2021
* IDMDS-1068 Update failing ticket48234_test.py test

* [INTEROP-4009] CodeReady Studio on OpenShift - Run locally

* [IDMDS-1068] Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4654 Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4654 - Updates to tickets/ticket48234_test.py

Bug Description:

Update tickets/ticket48234_test.py, which is currently failing and uses
soon-to-be-obsolete classes

Fix Description:

Updated tickets/ticket48234_test.py and ported to the suites directory
Updated to utilise the DSLDAPObject class methods

relates: <The Issue URL>

Author: Gilbert Kimetto

Reviewed by: ???

* Issue 4609 - CVE - info disclosure when authenticating

Description:  If you bind as a user that does not exist, error 49 is returned
              instead of error 32, because error 32 discloses that the entry does
              not exist.  When you bind as an entry that does not have userpassword
              set, error 48 (inappropriate auth) is returned, but this
              discloses that the entry does indeed exist.  Instead we should
              always return error 49, even if the password is not set in the
              entry.  This way we do not disclose to an attacker whether the Bind
              DN exists or not.

Relates: #4609

Reviewed by: tbordaz(Thanks!)

* issue 4612 - Fix pytest fourwaymmr_test for non root user (#4613)

* Issue 4591 - RFE - improve openldap_to_ds help and features (#4607)

Bug Description: Improve the --help page, and finish wiring in some
features.

Fix Description: Wire in exclusion of attributes/schema for migration.

fixes: #4591

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389, @droideck

* Issue 4577 - Add GitHub actions

Description:

* Enable IPv6 support for docker daemon
* Set server.example.com as FQDN for container

Relates: #4577

Reviewed by: @droideck (Thanks!)

* Issue 4149 - UI - port TreeView and other components to PF4

Description:  This ports all the TreeViews to PF4, and also does some proof
              of concept changes for PF3 to PF4 migration.  There is much
              more needed, but this does not break anything

relates: #4149

Reviewed by: spichugi(Thanks!)

* Update dscontainer (#4564)

Issue 4564 - RFE - Add suffix to dscontainer rc file

Bug Description: The suffix was not added before, adding a hurdle to
automatic admin of the container instance

Fix Description: If the suffix is set, add it to the created rc file. 

fixes: #4564

Author: @Jackbennett

Review by: @Firstyear

* Issue 4469 - Backend redesign phase 3a - bdb dependency removal from back-ldbm

A massive change (https://directory.fedoraproject.org/docs/389ds/design/backend-redesign-phase3.html) that implements and uses the dbimpl API in the backend.

* Issue 4593 - RFE - Print help when nsSSLPersonalitySSL is not found (#4614)

Description: RHDS instance will fail to start if the TLS server
certificate nickname doesn't match the value of the configuration
parameter "nsSSLPersonalitySSL".

The mismatch typically happens when customers copy the NSS DB from
a previous instance or export the certificate's data but forget to set
the "nsSSLPersonalitySSL" value accordingly.

Log an additional message which should help a user to set up
nsSSLPersonalitySSL correctly.

Fixes: #4593

Reviewed by: @Firstyear (Thanks!)

* Issue 4324 - Some architectures the cache line size file does not exist

Bug Description:  When optimizing our mutexes we check for a system file called
                  coherency_line_size that contains the size value, but if
                  the file did not exist the server would crash in PR_Read
                  (NULL pointer for fd).

Fix Description:  Check that PR_Open() was successful before calling PR_Read().

Relates: #4324

Reviewed by: tbordaz(Thanks!)

* Issue 4469 - Backend redesign phase 3a - implement dbimpl API and use it in back-ldbm (#4618)

see design document https://directory.fedoraproject.org/docs/389ds/design/backend-redesign-phase3.html

* Issue 4615 - log message when psearch first exceeds max threads per conn

Description:  When a connection hits max threads per conn for the first time,
              log a message in the error log.  This will help customers diagnose
              misbehaving clients.

Fixes: #4615

Reviewed by: progier389(Thanks!)

* Issue 4619 - remove pytest requirement from lib389

Description:  Remove the requirement for pytest from lib389, it causes
              unneeded package requirements on Fedora/RHEL.

Fixes: #4619

Reviewed by: mreynolds(one line commit rule)

* Bump version to 2.0.3

* Issue 4513 - CI - make acl ip address tests more robust

Description:  The tests assume the system is using the IPv6 loopback address, but they
              should still check for the IPv4 loopback.

Relates: #4513

Reviewed by: ?

* Issue 2820 - Fix CI test suite issues

Description:
tickets/ticket48961_test.py was failing in CI nightly runs.
Fixed the failure by changing the code to use DSLdapObject
and moved the code into the config test suite.

Relates: #2820

Reviewed by: droideck (Thanks!)

* Issue 4169 - UI - port charts to PF4

Description:  Ported the charts under the monitor tab to use PF4 sparkline charts
              and provide realtime stats on the caches.

Relates: #4169

Reviewed by: spichugi(Thanks!)

* Issue 4595 - Paged search lookthroughlimit bug (#4602)

Bug Description: During a paged search with lookthroughlimit enabled,
lookthroughcount is used to keep track of how many entries are
examined. A paged search reads ahead one entry to catch the end of the
search so it doesn't show the prompt when there are no more entries.
lookthroughcount doesn't take read ahead into account when tracking
how many entries have been examined.

Fix Description: Keep lookthroughcount in sync with read ahead by
decrementing it during read ahead roll back.

Fixes: #4595

Relates: #4513

Reviewed by: droideck, mreynolds389, Firstyear, progier389 (Many thanks)

* Issue 4169 - UI - Migrate Accordions to PF4 ExpandableSection

Description:  Replace all the CustomCollapse components with PF4
              ExpandableSection component.

relates: #4169

Reviewed by: spichugi(Thanks!)

* Issue 4169 - UI - Migrate alerts to PF4

Description:  Migrate the toast notifications to PF4 Alerts.

              Also fixed a refresh problem on the Tuning page.

relates: #4169

Reviewed by: spichugi(Thanks!)

* Issue 4649 - crash in sync_repl when a MODRDN create a cenotaph (#4652)

Bug description:
	When an operation is flagged OP_FLAG_NOOP, it skips BETXN plugins but calls POST plugins.
	For sync_repl, betxn (sync_update_persist_betxn_pre_op) creates an operation extension to be
	consumed by the post (sync_update_persist_op). In case of OP_FLAG_NOOP, there is no
	operation extension.

Fix description:
	Test that the operation is OP_FLAG_NOOP if the operation extension is missing

relates: #4649

Reviewed by: William Brown (thanks)

Platforms tested: F31

* Issue 4644 - Large updates can reset the CLcache to the beginning of the changelog (#4647)

* Issue 4646 - CLI/UI - revise DNA plugin management

Bug Description:

There was a false assumption that you have to create the shared DNA
server configuration entry, but in fact the server creates and manages
this entry.  The only thing you should edit in this entry are the
remote Bind Method and Connection Protocol.

Fix Description:

Remove the options to create the shared config entry, and edit the
core/reserved attributes.

Also fixed some issues where we were not showing CLI plugin output in
proper JSON.  This required some changes to the UI as well.

Relates: #4646

Reviewed by: spichugi(Thanks!)

* Issue 4643 - Add a tool that generates Rust dependencies for a specfile (#4645)

Description: The Fedora builds of 389-DS use the vendored crates
to build the official packages for Rawhide. Vendoring and bundling
dependencies is in violation of Fedora policies. As an upstream project
we are free to ship vendored code. But as a downstream Fedora project
we must not use the vendored code.

Add a tool that helps generate 'Provides: bundled(crate(foo)) = version'
lines from the Cargo.lock file content.
Replace the License field in the specfile; it should list the licenses of
all the packages we bundle.

Fixes: #4643

Reviewed by: @Firstyear, @decathorpe, @mreynolds389 (Thanks!)

* issue 4552 - Backup Redesign phase 3b - use dbimpl in replication plugin (#4622)

* issue 4552 - Backup Redesign phase 3b - use dbimpl in replication plugin

Merge of a fix in cl5_clcache.c (changelog cache restarts from beginning if large update)
Rebase with master

* Issue 4469 - Backend redesign phase 3a - implement dbimpl API and use it in back-ldbm - fix test_maxbersize_repl pytest failure

* issue 4552 - Backup Redesign phase 3b - use dbimpl in replication plugin - fix indent issue

* issue 4552 - Backup Redesign phase 3b - use dbimpl in replication plugin - fix merge issue

manual Merge of fix about changelog cache iteration restarting from beginning in case of large update + automatic rebase to master

* Issue 4552 - Backend redesign phase 3b - fix indent issue + random crash and memory leak in tombstone handling

* Merge pull request #4664 from mreynolds389/issue4663

Issue 4663 - CLI - unable to add objectclass/attribute without x-origin

Issue 4654 Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4654 Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4654 - Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

Bug Description:

- Update ticket48234_test.py to verify tests on RHEL 7/8 and Fedora
- Update deprecated "*_s" methods to leverage the DSLDAPObject class
- Move test from the current location in ../tickets to appropriate ../suites/aci/* directory

Fix Description:
- Issue 4654 Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

relates:

Author: Gilbert Kimetto

Reviewed by: ???

Co-authored-by: Mark Reynolds <mreynolds@redhat.com>
Co-authored-by: progier389 <progier@redhat.com>
Co-authored-by: Firstyear <william@blackhats.net.au>
Co-authored-by: Viktor Ashirov <vashirov@redhat.com>
Co-authored-by: Jack <me@jackben.net>
Co-authored-by: Simon Pichugin <spichugi@redhat.com>
Co-authored-by: Barbora Simonova <bsmejkal@redhat.com>
Co-authored-by: James Chapman <jachapma@redhat.com>
Co-authored-by: tbordaz <tbordaz@redhat.com>
kimettog added a commit that referenced this issue Apr 1, 2021
…#4710)

* IDMDS-1068 Update failing ticket48234_test.py test

* IDMDS-1068 Update failing ticket48234_test.py test

* [INTEROP-4009] CodeReady Studio on OpenShift - Run locally

* [INTEROP-4009] CodeReady Studio on OpenShift - Run locally

* [IDMDS-1068] Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* [IDMDS-1068] Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* [IDMDS-1068] Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* [IDMDS-1068] Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* [IDMDS-1068] Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4654 Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4654 - Updates to tickets/ticket48234_test.py

Bug Description:

Update to tickets/ticket48234_test.py which are currently failing and using
soon to be obsolete classes

Fix Description:

Updated tickets/ticket48234_test.py and ported to the suites directory
Updated to utilise the DSLDAPObject class methods

relates: <The Issue URL>

Author: Gilbert Kimetto

Reviewed by: ???
IDMDS-1068 Update failing ticket48234_test.py test

[INTEROP-4009] CodeReady Studio on OpenShift - Run locally

[INTEROP-4009] CodeReady Studio on OpenShift - Run locally

[IDMDS-1068] Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

[IDMDS-1068] Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4609 - CVE - info disclosure when authenticating

Description:  If you bind as a user that does not exist.  Error 49 is returned
              instead of error 32.  As error 32 discloses that the entry does
              not exist.  When you bind as an entry that does not have userpassword
              set then error 48 (inappropriate auth) is returned, but this
              discloses that the entry does indeed exist.  Instead we should
              always return error 49, even if the password is not set in the
              entry.  This way we do not disclose to an attacker if the Bind
              DN exists or not.

Relates: #4609

Reviewed by: tbordaz(Thanks!)

* issue 4612 - Fix pytest fourwaymmr_test for non root user (#4613)

* Issue 4591 - RFE - improve openldap_to_ds help and features (#4607)

Bug Description: Improve the --help page, and finish wiring in some
features.

Fix Description: Wire in exclusion of attributes/schema for migration.

fixes: #4591

Author: William Brown <william@blackhats.net.au>

Review by: @mreynolds389, @droideck

* Issue 4577 - Add GitHub actions

Description:

* Enable IPv6 support for docker daemon
* Set server.example.com as FQDN for container

Relates: #4577

Reviewed by: @droideck (Thanks!)

* Issue 4149 - UI - port TreeView and opther components to PF4

Description:  This ports all th TreeViews to PF4, and also does some proof
              of concept changes for PF3 to PF4 migration.  There is much
              more needed, but this does not break anything

relates: #4149

Reviewed by: spichugi(Thanks!)

* Update dscontainer (#4564)

Issue 4564 - RFE - Add suffix to dscontainer rc file

Bug Description: The suffix was not added before, adding a hurdle to
automatic admin of the container instance

Fix Description: If the suffix is set, add it to the created rc file. 

fixes: #4564

Author: @Jackbennett

Review by: @Firstyear

* Issue 4469 - Backend redesign phase 3a - bdb dependency removal from back-ldbm

A massive change (https://directory.fedoraproject.org/docs/389ds/design/backend-redesign-phase3.html) that implements and uses the dbimpl API in the backend.

* Issue 4593 - RFE - Print help when nsSSLPersonalitySSL is not found (#4614)

Description: RHDS instance will fail to start if the TLS server
certificate nickname doesn't match the value of the configuration
parameter "nsSSLPersonalitySSL".

The mismatch typically happens when customers copy the NSS DB from
a previous instance or export the certificate's data but forget to set
the "nsSSLPersonalitySSL" value accordingly.

Log an additional message which should help a user to set up
nsSSLPersonalitySSL correctly.

Fixes: #4593

Reviewed by: @Firstyear (Thanks!)

* Issue 4324 - Some architectures the cache line size file does not exist

Bug Description:  When optimizing our mutexes we check for a system file
                  called coherency_line_size that contains the size value,
                  but if the file did not exist the server would crash in
                  PR_Read (NULL pointer for fd).

Fix Description:  Check that PR_Open() was successful before calling PR_Read().
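
The "check the open before reading" pattern can be sketched as below. The server itself uses NSPR (`PR_Open`/`PR_Read`); this Python analogue, the helper name `read_cache_line_size`, and the fallback value of 64 are assumptions for illustration only.

```python
import os
import tempfile


def read_cache_line_size(path, default=64):
    """Return the integer stored in `path`, or `default` if it is unusable.

    The default of 64 bytes is an assumed fallback, not a value from the patch.
    """
    try:
        with open(path, "r") as handle:  # fails cleanly if the file is absent
            text = handle.read().strip()
    except OSError:
        return default
    try:
        return int(text)
    except ValueError:
        return default


# Demonstrate both cases with a temporary stand-in for the sysfs file.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "coherency_line_size")
    with open(present, "w") as handle:
        handle.write("128\n")
    size_present = read_cache_line_size(present)
    size_missing = read_cache_line_size(os.path.join(tmp, "missing"))
```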

Relates: #4324

Reviewed by: tbordaz(Thanks!)

* Issue 4469 - Backend redesign phase 3a - implement dbimpl API and use it in back-ldbm (#4618)

see design document https://directory.fedoraproject.org/docs/389ds/design/backend-redesign-phase3.html

* Issue 4615 - log message when psearch first exceeds max threads per conn

Description:  When a connection hits max threads per conn for the first time
              log a message in the error log.  This will help customers
              diagnose misbehaving clients.
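
A minimal sketch of "log only the first time the limit is hit", assuming a hypothetical `Connection` class with a per-connection flag. The class, its method names, and the message text are illustrative and not taken from the server.

```python
class Connection:
    """Toy connection that tracks worker threads (hypothetical structure)."""

    def __init__(self, conn_id, max_threads, log=print):
        self.conn_id = conn_id
        self.max_threads = max_threads
        self.active_threads = 0
        self.limit_logged = False  # set once the limit was reported
        self.log = log

    def start_operation(self):
        """Try to take a thread; return False if the limit is reached."""
        if self.active_threads >= self.max_threads:
            if not self.limit_logged:
                self.limit_logged = True
                self.log("conn=%d reached max threads per conn (%d)"
                         % (self.conn_id, self.max_threads))
            return False
        self.active_threads += 1
        return True

    def finish_operation(self):
        """Release a thread taken by start_operation()."""
        self.active_threads -= 1


messages = []
conn = Connection(7, 2, log=messages.append)
results = [conn.start_operation() for _ in range(5)]
```

The flag keeps a misbehaving client from flooding the log while still leaving one trace per connection.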

Fixes: #4615

Reviewed by: progier389(Thanks!)

* Issue 4619 - remove pytest requirement from lib389

Description:  Remove the requirement for pytest from lib389, it causes
              unneeded package requirements on Fedora/RHEL.

Fixes: #4619

Reviewed by: mreynolds(one line commit rule)

* Bump version to 2.0.3

* Issue 4513 - CI - make acl ip address tests more robust

Description:  The tests assume the system is using the IPv6 loopback address,
              but they should still check for the IPv4 loopback.

Relates: #4513

Reviewed by: ?

* Issue 2820 - Fix CI test suite issues

Description:
tickets/ticket48961_test.py was failing in CI nightly runs.
Fixed the failure by changing the code to use DSLdapObject
and moved the code into the config test suite.

Relates: #2820

Reviewed by: droideck (Thanks!)

* Issue 4169 - UI - port charts to PF4

Description:  Ported the charts under the monitor tab to use PF4 sparkline charts
              and provide realtime stats on the caches.

Relates: #4169

Reviewed by: spichugi(Thanks!)

* Issue 4595 - Paged search lookthroughlimit bug (#4602)

Bug Description: During a paged search with lookthroughlimit enabled,
lookthroughcount is used to keep track of how many entries are
examined. A paged search reads ahead one entry to catch the end of the
search so it doesn't show the prompt when there are no more entries.
lookthroughcount doesn't take read ahead into account when tracking
how many entries have been examined.

Fix Description: Keep lookthroughcount in sync with read ahead by
decrementing it during read ahead roll back.
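
The counting problem can be illustrated with a toy model. `PagedSearch` and its methods are hypothetical names, and the real search code is structured differently; the sketch only shows why the counter must be decremented when the read-ahead entry is pushed back.

```python
class LookThroughLimitExceeded(Exception):
    pass


class PagedSearch:
    """Toy paged search that reads one entry ahead to detect the end."""

    def __init__(self, entries, lookthrough_limit):
        self.entries = entries
        self.position = 0
        self.lookthrough_limit = lookthrough_limit
        self.lookthrough_count = 0

    def _next_entry(self):
        if self.position >= len(self.entries):
            return None
        if self.lookthrough_count >= self.lookthrough_limit:
            raise LookThroughLimitExceeded()
        entry = self.entries[self.position]
        self.position += 1
        self.lookthrough_count += 1
        return entry

    def _roll_back_read_ahead(self):
        self.position -= 1
        # The fix: the rolled-back entry will be examined again on the next
        # page, so it must not be counted twice.
        self.lookthrough_count -= 1

    def next_page(self, page_size):
        """Return (page, more_entries_follow)."""
        page = []
        while len(page) < page_size:
            entry = self._next_entry()
            if entry is None:
                return page, False
            page.append(entry)
        if self._next_entry() is None:  # read ahead one entry
            return page, False
        self._roll_back_read_ahead()
        return page, True


search = PagedSearch(["a", "b", "c", "d"], lookthrough_limit=4)
first_page, more_after_first = search.next_page(2)
second_page, more_after_second = search.next_page(2)
```

Without the decrement in `_roll_back_read_ahead`, the second page in this example would hit the limit after examining only four distinct entries.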

Fixes: #4595

Relates: #4513

Reviewed by: droideck, mreynolds389, Firstyear, progier389 (Many thanks)

* Issue 4169 - UI - Migrate Accordions to PF4 ExpandableSection

Description:  Replace all the CustomCollapse components with PF4
              ExpandableSection component.

relates: #4169

Reviewed by: spichugi(Thanks!)

[IDMDS-1068] Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4169 - UI - Migrate alerts to PF4

Description:  Migrate the toast notifications to PF4 Alerts.

              Also fixed a refresh problem on the Tuning page.

relates: #4169

Reviewed by: spichugi(Thanks!)

* Issue 4649 - crash in sync_repl when a MODRDN creates a cenotaph (#4652)

Bug description:
	When an operation is flagged OP_FLAG_NOOP, it skips BETXN plugins but calls POST plugins.
	For sync_repl, betxn (sync_update_persist_betxn_pre_op) creates an operation extension to be
	consumed by the post (sync_update_persist_op). In case of OP_FLAG_NOOP, there is no
	operation extension.

Fix description:
	Test that the operation is OP_FLAG_NOOP if the operation extension is missing

relates: #4649

Reviewed by: William Brown (thanks)

Platforms tested: F31

* Issue 4644 - Large updates can reset the CLcache to the beginning of the changelog (#4647)

Bug description:
	The replication agreements are using bulk load to load updates.
	For bulk load it uses a cursor with DB_MULTIPLE_KEY and DB_NEXT.
	Before using the cursor, it must be initialized with DB_SET.

	If during the cursor/DB_SET the CSN refers to an update that is larger than
	the size of the provided buffer, then the cursor remains not initialized and
	c_get returns DB_BUFFER_SMALL.

	The consequence is that the next c_get(DB_MULTIPLE_KEY and DB_NEXT) will return the
	first record in the changelog DB. This breaks CLcache.

Fix description:
	The fix is to harden cursor initialization so that if DB_SET fails
	because of DB_BUFFER_SMALL, it reallocates buf_data and retries DB_SET.
	If the cursor still cannot be initialized, it logs a warning.

	The patch also changes the behaviour of the fix for #4492.
	#4492 detected a massive (1 day) jump prior to the starting CSN and ended the
	replication session. If the jump was systematic, for example
	if the CLcache got broken because of an update that was too large, then
	replication was systematically stopped.
	This patch removes the systematic stop, letting the RA do a big jump.
	Only the warning remains from #4492.
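
	The retry logic can be sketched with a toy cursor. This is not the Berkeley DB API: `FakeCursor`, `BufferTooSmall` (standing in for DB_BUFFER_SMALL), and `position_cursor` are hypothetical names used only to show the control flow described above.

```python
class BufferTooSmall(Exception):
    """Stand-in for Berkeley DB's DB_BUFFER_SMALL return code."""

    def __init__(self, needed):
        super().__init__("buffer too small, need %d bytes" % needed)
        self.needed = needed


class FakeCursor:
    """Toy cursor over (CSN, update) records kept in CSN order."""

    def __init__(self, records):
        self.records = sorted(records.items())
        self.index = None  # None means the cursor is not initialized

    def set(self, csn, buf_size):
        """Position on `csn`; fail if its update does not fit the buffer."""
        for i, (key, value) in enumerate(self.records):
            if key == csn:
                if len(value) > buf_size:
                    raise BufferTooSmall(len(value))
                self.index = i
                return
        raise KeyError(csn)

    def next_bulk(self):
        """Return the CSNs after the cursor; from the start if unpositioned."""
        start = 0 if self.index is None else self.index + 1
        return [key for key, _ in self.records[start:]]


def position_cursor(cursor, csn, buf_size, warn=print):
    """Position the cursor, growing the buffer once if the record is large."""
    try:
        cursor.set(csn, buf_size)
    except BufferTooSmall as exc:
        buf_size = exc.needed  # reallocate to the size the record needs
        try:
            cursor.set(csn, buf_size)
        except BufferTooSmall:
            warn("could not position changelog cursor on %s" % csn)
    return buf_size


records = {"csn1": b"a" * 10, "csn2": b"b" * 500, "csn3": b"c" * 10}
cursor = FakeCursor(records)
new_size = position_cursor(cursor, "csn2", 100, warn=lambda msg: None)
after_fix = cursor.next_bulk()
unpositioned = FakeCursor(records).next_bulk()
```

	In the toy model an unpositioned cursor hands back the whole changelog, which mirrors the replication lag described in #4492.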

relates: #4644

Reviewed by: Pierre Rogier (Thanks !!!!)

Platforms tested: F31

* Issue 4646 - CLI/UI - revise DNA plugin management

Bug Description:

There was a false assumption that you have to create the shared DNA
server configuration entry, but in fact the server creates and manages
this entry.  The only thing you should edit in this entry are the
remote Bind Method and Connection Protocol.

Fix Description:

Remove the options to create the shared config entry, and edit the
core/reserved attributes.

Also fixed some issues where we were not showing CLI plugin output in
proper JSON.  This required some changes to the UI as well.

Relates: #4646

Reviewed by: spichugi(Thanks!)

[IDMDS-1068] Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4643 - Add a tool that generates Rust dependencies for a specfile (#4645)

Description: The Fedora builds of 389-DS use the vendored crates
to build the official packages for Rawhide. Vendoring and bundling
dependencies is in violation of Fedora policies. As an upstream project
we are free to ship vendored code. But as a downstream Fedora project
we must not use the vendored code.

Add a tool that helps to generate 'Provides: bundled(crate(foo)) = version'
lines from the Cargo.lock file content.
Replace the License field, which should contain the licenses of all the
packages we bundle in the specfile.
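
As a rough illustration of the idea (not the tool added by this patch), the sketch below turns Cargo.lock text into `Provides:` lines. The function name `bundled_provides`, the simplified line-based parsing, and the sample crate versions are assumptions; it also does not filter out local workspace packages.

```python
import re

# Matches the two fields needed from each [[package]] table.
FIELD = re.compile(r'^(name|version) = "([^"]+)"$')


def bundled_provides(cargo_lock_text):
    """Return sorted 'Provides: bundled(crate(NAME)) = VERSION' lines."""
    provides = set()
    current = None  # fields of the [[package]] table being read, if any
    for raw_line in cargo_lock_text.splitlines():
        line = raw_line.strip()
        if line == "[[package]]":
            current = {}
            continue
        if line.startswith("["):
            current = None  # some other table, for example [metadata]
            continue
        match = FIELD.match(line)
        if current is not None and match:
            current[match.group(1)] = match.group(2)
            if "name" in current and "version" in current:
                provides.add("Provides: bundled(crate(%s)) = %s"
                             % (current["name"], current["version"]))
                current = None
    return sorted(provides)


# Hypothetical sample input; the versions are made up for the example.
SAMPLE = '''
[[package]]
name = "libc"
version = "0.2.86"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "cfg-if"
version = "1.0.0"
dependencies = [
 "libc",
]
'''

lines = bundled_provides(SAMPLE)
```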

Fixes: #4643

Reviewed by: @Firstyear, @decathorpe, @mreynolds389 (Thanks!)

* issue 4552 - Backup Redesign phase 3b - use dbimpl in replication plugin (#4622)

* issue 4552 - Backup Redesign phase 3b - use dbimpl in replication plugin

Merge of a fix in cl5_clcache.c (changelog cache restarts from beginning if large update)
Rebase with master

* Issue 4469 - Backend redesign phase 3a - implement dbimpl API and use it in back-ldbm - fix test_maxbersize_repl pytest failure

* issue 4552 - Backup Redesign phase 3b - use dbimpl in replication plugin - fix indent issue

* issue 4552 - Backup Redesign phase 3b - use dbimpl in replication plugin - fix merge issue

manual Merge of fix about changelog cache iteration restarting from beginning in case of large update + automatic rebase to master

* Issue 4552 - Backend redesign phase 3b - fix indent issue + random crash and memory leak in tombstone handling

* Merge pull request #4664 from mreynolds389/issue4663

Issue 4663 - CLI - unable to add objectclass/attribute without x-origin

Issue 4654 Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4654 Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

* Issue 4654 - Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

Bug Description:

- Update ticket48234_test.py to verify tests on RHEL 7/8 and Fedora
- Update deprecated "*_s" methods to leverage the DSLDAPObject class
- Move test from the current location in ../tickets to appropriate ../suites/aci/* directory

Fix Description:
- Issue 4654 Update ticket48234_test.py and move to suites/acl/aci_excl_filter_test.py

relates:

Author: Gilbert Kimetto

Reviewed by: ???

* Test new password policy attribute "pwdReset" by DM user

    Description: Verify that the DM user is not permitted to
    change the password policy attribute "pwdReset".

    Reviewed by: ?

Co-authored-by: Mark Reynolds <mreynolds@redhat.com>
Co-authored-by: progier389 <progier@redhat.com>
Co-authored-by: Firstyear <william@blackhats.net.au>
Co-authored-by: Viktor Ashirov <vashirov@redhat.com>
Co-authored-by: Jack <me@jackben.net>
Co-authored-by: Simon Pichugin <spichugi@redhat.com>
Co-authored-by: Barbora Simonova <bsmejkal@redhat.com>
Co-authored-by: James Chapman <jachapma@redhat.com>
Co-authored-by: tbordaz <tbordaz@redhat.com>