Skip to content

Conversation

@DomGarguilo
Copy link
Member

This PR simplifies the check for the Compactor that is used to determine when to change the idle process metric for the Compactor. The new code just returns true when no compaction is running and false when one is. It defers the time checking portion to the upstream logic in AbstractServer.

The old method was to check how much time has passed since the last compaction was completed. There are at least two issues with this:

  1. There is logic that makes sure the amount of time since the last compaction happened is at least idleReportingPeriodNanos. This check also happens upstream when the Supplier<boolean> is read in AbstractServer so we are getting rid of that duplicate check in the new code.
  2. There is a check that the System.nanoTime() value of the last completed compaction is not negative. This is not a safe check to make since System.nanoTime() may provide a negative value. (This bug was identified here)

@DomGarguilo DomGarguilo added the bug This issue has been verified to be a bug. label Jul 11, 2024
@DomGarguilo DomGarguilo self-assigned this Jul 11, 2024
@DomGarguilo
Copy link
Member Author

Working on a test for the compactor metric to make sure it goes back to busy when a compaction is running then back to idle afterwards.

@DomGarguilo
Copy link
Member Author

In 04ed08b I added a test case that checks that the compactor idle metric correctly tracks what we expect it to be before, during and after a compaction. I am able to get the test to pass using the old logic in compactor but with the new logic, the idle value is not returning to 0 as I expect. I am not sure why that is happening at the moment so was hoping someone might be able to spot a flaw in the new logic that sets the idle metric value.

@ctubbsii ctubbsii added this to the 2.1.3 milestone Jul 12, 2024
Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 04ed08b I added a test case that checks that the compactor idle metric correctly tracks what we expect it to be before, during and after a compaction. I am able to get the test to pass using the old logic in compactor but with the new logic, the idle value is not returning to 0 as I expect. I am not sure why that is happening at the moment so was hoping someone might be able to spot a flaw in the new logic that sets the idle metric value.

Made some comments on what I think might fix things. Looking at the way the code works thinking the following method in AbstractServer

protected void idleProcessCheck(Supplier<Boolean> idleCondition) {

should be changed to

protected void updateIdleStatus(boolean isIdle) {

The supplier that is currently passed in is not cached and is only called once. It seems like the method need something to call it periodically because when its called it updates the idle status. If it is never called again then AFAICT the status will never change.

@DomGarguilo
Copy link
Member Author

I added another test case to test that the scan server idle metric behaves as expected (very similar to compactor test case)

@DomGarguilo DomGarguilo requested a review from keith-turner July 16, 2024 14:18
@DomGarguilo
Copy link
Member Author

DomGarguilo commented Jul 17, 2024

Since the scope of this PR has changed slightly since it has been created, here is an updated description of changes in this PR:

  • AbstractServer.updateIdleStatus() was changed from having Supplier<Boolean> as its input to just a boolean. This was done to simplify things since the method was just using it as a boolean anyways
    • the method was also renamed from idleProcessCheck() to updateIdleStatus()
  • The idle check in Compactor was simplified to just set the idle status to true or false during its operation. This removes duplicate logic that checks it has been idle for a period of time (already done in AbstractServer.idleProcessCheck())
  • Adds test cases to ensure that the idle metric for Compactor and ScanServer behave as expected. Tserver was omitted since it is not expected to be idle in 2.1.

Edit: The most critical portion to review here is probably the changes to the Compactor idle check. Just making sure that it makes sense to mark it as idle or not where I have it marked.

Copy link
Contributor

@dlmarion dlmarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ddanielr ddanielr merged commit 35d60a5 into apache:2.1 Jul 24, 2024
@DomGarguilo DomGarguilo deleted the cleanupCompactorIdleCheck branch July 29, 2024 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug This issue has been verified to be a bug.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants