Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASSNDRA-18780 Fixed ConnectionBurnTest fails with NullPointerException #2613

Open
wants to merge 7 commits into
base: trunk
Choose a base branch
from

Conversation

Mmuzaf
Copy link
Contributor

@Mmuzaf Mmuzaf commented Aug 19, 2023

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should follow the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira

{
Reporters reporters = new Reporters(endpoints, connections);
List<Exception> exceptions = new LinkedList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why a linked list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, consider declaring more proximal to usage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a local buffer to fetch results from the unexpectedExceptions queue, so since it is only used locally it should be safe to use non-thread-safe collection, or did you mean something else?

I think we can tie drainUninterruptibly and this exceptions buffer, so that they use the same buffer size via constant. I set it to 1 since we are only interested in the first exception that occurred, but we could increase it if you'd like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that LinkedList is almost always inferior to ArrayList, and there appears to be no reason to prefer it here. It's not an important point, particularly here the distinction is unimportant. Thanks for making the change, though I'm not sure why Lists.newArrayList is preferred here to simply new ArrayList<>().

Regarding declaring it more proximal to usage, I meant I don't see why it is declared at the start of the method rather than above the loop that uses it. It's helpful IMO to declare variables close to where they are logically introduced in the code.

@@ -467,26 +491,42 @@ public void run() throws ExecutionException, InterruptedException, NoSuchFieldEx
}
});

while (deadline > nanoTime() && failed.getCount() > 0)
int added = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at most we need a boolean, but I'd say this should be a part of the loop guard; perhaps use a do/while loop and simply check 0 == Queues.drainUninterruptibly()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, fixed.

{
reporters.update();
reporters.print();
Uninterruptibles.awaitUninterruptibly(failed, 30L, TimeUnit.SECONDS);
added = Queues.drainUninterruptibly(unexpectedExceptions, exceptions, 10, 30L, TimeUnit.SECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 10 items, rather than 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, and move this value to a constant value for visibility.

}

executor.shutdownNow();
ExecutorUtils.awaitTermination(5L, TimeUnit.MINUTES, executor);

if (!exceptions.isEmpty())
throw exceptions.get(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably addSuppressed any items > index 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

WaitQueue.Signal signal = writerWaiting.register();
if (null != (e = chunkList.firstEntry()) && chunkSequenceId - e.getKey() > 1 << 12)
signal.await();
signal.await(5L, TimeUnit.SECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this now timed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for safety's sake and in accordance with the following comment which gave me the feeling that we'd get the next chunk within a reasonable timeframe (5 seconds), and if not - we could fail the test, but of course I could be missing something.

https://github.com/apache/cassandra/blob/trunk/test/burn/org/apache/cassandra/net/Verifier.java#L668

// TODO: even 2s or 5s are unreasonable periods of time without _any_ movement on a message waiting to arrive
//       this seems to happen regularly on MacOS, but we should confirm this does not happen on Linux

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, ok. Though given the earlier comment might be worth making this timeout a bit bigger to avoid false positives, or perhaps condition the wait time on the system under test.

@Mmuzaf
Copy link
Contributor Author

Mmuzaf commented Oct 31, 2023

@belliottsmith I've addressed your comments, please, take a look again :-)

@Mmuzaf
Copy link
Contributor Author

Mmuzaf commented Nov 14, 2023

@belliottsmith Hi Benedict, have you had a chance to look at the comments that have been fixed? :-)

@Mmuzaf
Copy link
Contributor Author

Mmuzaf commented Dec 19, 2023

@belliottsmith Hi Benedict, I think we are really close to fixing it! :-) Would you mind taking one last look?

@@ -467,26 +493,43 @@ public void run() throws ExecutionException, InterruptedException, NoSuchFieldEx
}
});

while (deadline > nanoTime() && failed.getCount() > 0)
while (deadline > nanoTime() &&
0 == Queues.drainUninterruptibly(unexpectedExceptions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the desired behaviour? It looks like the semantics of drainUninterruptibly is that it will wait for exceptionBufferCapacity before returning, whereas we want to return as soon as non-empty. This isn't super important here due to the timeout, but it is good to get this kind of thing right.

We should either wait for just 1 item, or use a different method for waiting, IMO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants