New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/GEODE-10228 DurableClientTestCase.testDurableHAFailover is failing #7608
Conversation
- The test was failing because it didn't wait for the HARegionQueue to clear before shutting down the durable client for test. Thus when it came back up, there was an extra message in the queue. - Reverse the order of readyforevents and registerinterest
adding a close for the control listener
Starting the server is not synchronous.
@@ -429,6 +431,10 @@ public void run2() throws CacheException { | |||
@Test | |||
public void testDurableNonHAFailover() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were two tests here previously testing two things each.
Broke them out into four tests doing one thing.
} | ||
}); | ||
|
||
registerInterest(durableClientVM, regionName, true, InterestResultPolicy.NONE); | ||
// Verify the durable client is connected to server1 | ||
verifyDurableClientPresent(VERY_LONG_DURABLE_TIMEOUT_SECONDS, durableClientId, server1VM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verify the servers are up and ready before proceeding.
verifyDurableClientPresent(VERY_LONG_DURABLE_TIMEOUT_SECONDS, durableClientId, server1VM); | ||
// Wait to until the HARegionQueue has emptied before disconnecting the Durable client | ||
// to avoid having left over messages that processed, in particular, the key 0 message | ||
waitUntilHARegionQueueSizeIsZero(server1VM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait for the HA region queues to clear before disconnecting the durable client we want to process all the messages first, then disconnect, otherwise we will see them again when the client restarts.
|
||
// Verify the durable client received the updates before failover | ||
if (redundancyLevel == 1) { | ||
checkListenerEvents(4, 1, -1, durableClientVM); | ||
checkListenerEvents(3, 1, -1, durableClientVM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These numbers were contradictory to the required record count because key 0 should not have been there to be counted.
...edTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientBug39997DUnitTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking a wack at cleanup. I would ask for consistency in the usage of AssertJ over junit assertions. Also, I feel like fail
is smell and there is usually an appropriate assertion to replace it. On await
please use the untilAsserted
and use AssertJ assertions in the block to give meaningful failures.
...stributedTest/java/org/apache/geode/cache/client/ClientServerRegisterInterestsDUnitTest.java
Outdated
Show resolved
Hide resolved
...stributedTest/java/org/apache/geode/cache/client/ClientServerRegisterInterestsDUnitTest.java
Outdated
Show resolved
Hide resolved
...stributedTest/java/org/apache/geode/cache/client/ClientServerRegisterInterestsDUnitTest.java
Outdated
Show resolved
Hide resolved
...stributedTest/java/org/apache/geode/cache/client/ClientServerRegisterInterestsDUnitTest.java
Outdated
Show resolved
Hide resolved
...edTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientBug39997DUnitTest.java
Outdated
Show resolved
Hide resolved
...distributedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientTestBase.java
Outdated
Show resolved
Hide resolved
...distributedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientTestBase.java
Outdated
Show resolved
Hide resolved
...distributedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientTestBase.java
Outdated
Show resolved
Hide resolved
...distributedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientTestBase.java
Outdated
Show resolved
Hide resolved
...distributedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientTestBase.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with all of Jake's comments. Fix those and add in my approval.
...edTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientBug39997DUnitTest.java
Outdated
Show resolved
Hide resolved
Would need to do a lot more changes for this to work which is just not a priority at this moment.
@mhansonp As you noticed, you can't convert DUnit based tests to JUni 5. You can rename them to the current naming convention though. |
...a/org/apache/geode/internal/cache/tier/sockets/DurableClientNoServerAvailabileDUnitTest.java
Outdated
Show resolved
Hide resolved
...butedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientStatsDUnitTest.java
Outdated
Show resolved
Hide resolved
...butedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientStatsDUnitTest.java
Outdated
Show resolved
Hide resolved
...tributedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientCQDUnitTest.java
Outdated
Show resolved
Hide resolved
...utedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientSimpleDUnitTest.java
Outdated
Show resolved
Hide resolved
Can you also please rename the updated tests to their current naming conventions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minor things to consider but nothing blocking from me now. Thanks!
await() | ||
.untilAsserted(() -> assertThat(stats.get_queueDroppedCount()).isEqualTo(queueDropCount)); | ||
await().untilAsserted( | ||
() -> assertThat(stats.get_eventEnqueuedWhileClientAwayCount()).isEqualTo(enqueueCount)); | ||
} catch (Exception e) { | ||
fail("Exception thrown while executing checkStatisticsWithExpectedValues()", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little hesitant on this fail
. I assume it was here to bubble up some proper task failure handling but the use of fail
still smells funny. First off this catch
block is going to catch that assertion failures and wrap them, which makes me curious about behavior on failure. We don't consistently catch exceptions and convert to failure, so maybe it isn't necessary. If it is I wonder if using the assertThatCode(...).doesNotThrowAnyException()
is more appropriate replacement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AssertionError
which is thrown by the assertions in this try block is an Error
rather than an Exception
so it won't be caught here and any assertions that fail will bubble up. I personally dislike the use of try/catch coupled with fail()
since it's not really adding anything here that wouldn't be clear from the stack trace of the exception, and it introduces inconsistent behaviour depending on the type of error/exception we might encounter.
// Verify the durable client still exists on the server | ||
verifyDurableClientPresent(VERY_LONG_DURABLE_TIMEOUT_SECONDS, durableClientId, | ||
server1VM); | ||
Thread.sleep(10000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still no keen on sleeps and would like to see this await on something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean-up! There's a lot of work still to do in these old tests, but some is always better than none, so none of my comments should be considered blocking.
protected <V> V put() { | ||
return (V) gemfireServerVm.invoke(() -> { | ||
Cache cache = CacheFactory.getAnyInstance(); | ||
cache.getRegion("/Example").put("2", "TWO"); | ||
return cache.getRegion("/Example").get("2"); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warnings here can be fixed by using:
protected String put() {
return gemfireServerVm.invoke(() -> {
Cache cache = CacheFactory.getAnyInstance();
cache.getRegion("/Example").put("2", "TWO");
return cache.<String, String>getRegion("/Example").get("2");
});
}
Since we explicitly create the Region as <String, String>
, there's no need to use generics here.
final long timeout = (System.currentTimeMillis() | ||
+ ClientServerRegisterInterestsDUnitTest.WAIT_TIME_MILLISECONDS); | ||
|
||
while (entryEvents.empty() && (System.currentTimeMillis() < timeout)) { | ||
synchronized (this) { | ||
try { | ||
TimeUnit.MILLISECONDS.timedWait(this, Math.min(500, waitTimeMilliseconds)); | ||
TimeUnit.MILLISECONDS.timedWait(this, Math.min(500, | ||
ClientServerRegisterInterestsDUnitTest.WAIT_TIME_MILLISECONDS)); | ||
} catch (InterruptedException ignore) { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole method should probably just be replaced with:
await().untilAsserted(() -> assertThat(entryEvents).isNotEmpty());
final int port = AvailablePortHelper.getRandomAvailableTCPPort(); | ||
vm0.invoke("create cache", () -> { | ||
getSystem(getClientProperties()); | ||
PoolImpl p = (PoolImpl) PoolManager.createFactory().addServer(hostName, port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be the interface Pool
rather than the concrete PoolImpl
.
import org.apache.geode.test.junit.categories.ClientSubscriptionTest; | ||
|
||
@Category({ClientSubscriptionTest.class}) | ||
public class DurableClientNoServerAvailabileDistributedTest extends JUnit4CacheTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo here, should be "NoServerAvailable"
startAndCloseNonDurableClientCache(durableClientTimeout); | ||
startAndCloseNonDurableClientCache(1); //////// -> Reconnection1 | ||
startAndCloseNonDurableClientCache(); | ||
startAndCloseNonDurableClientCache(); //////// -> Reconnection1 | ||
Wait.pause(1400); //////// -> Queue Dropped1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These waits should ideally be replaced with await()
calls on some reasonable condition. For this one and the one below it the following seems right:
await().untilAsserted(() -> assertThat(durableClientVM.invoke(() -> getClientCache().isClosed())).isTrue());
await() | ||
.untilAsserted(() -> assertThat(stats.get_queueDroppedCount()).isEqualTo(queueDropCount)); | ||
await().untilAsserted( | ||
() -> assertThat(stats.get_eventEnqueuedWhileClientAwayCount()).isEqualTo(enqueueCount)); | ||
} catch (Exception e) { | ||
fail("Exception thrown while executing checkStatisticsWithExpectedValues()", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AssertionError
which is thrown by the assertions in this try block is an Error
rather than an Exception
so it won't be caught here and any assertions that fail will bubble up. I personally dislike the use of try/catch coupled with fail()
since it's not really adding anything here that wouldn't be clear from the stack trace of the exception, and it introduces inconsistent behaviour depending on the type of error/exception we might encounter.
Enumeration<?> e = javaSystemProperties.propertyNames(); | ||
|
||
while (e.hasMoreElements()) { | ||
String key = (String) e.nextElement(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block can be simplified to:
if (javaSystemProperties != null && javaSystemProperties.size() > 0) {
for (String key : javaSystemProperties.stringPropertyNames()) {
System.setProperty(key, javaSystemProperties.getProperty(key));
}
}
if (javaSystemProperties != null && javaSystemProperties.size() > 0) { | ||
Enumeration e = javaSystemProperties.propertyNames(); | ||
Enumeration<?> e = javaSystemProperties.propertyNames(); | ||
|
||
while (e.hasMoreElements()) { | ||
String key = (String) e.nextElement(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block can be simplified to:
if (javaSystemProperties != null && javaSystemProperties.size() > 0) {
for (String key : javaSystemProperties.stringPropertyNames()) {
System.clearProperty(key);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No point in checking the size, the for range loop won't do anything if it is empty.
factory.setScope(Scope.LOCAL); | ||
factory.setPoolName(p.getName()); | ||
RegionAttributes attrs = factory.create(); | ||
RegionAttributes<Object, Object> attrs = factory.create(); | ||
cache.createRegion(regionName1, attrs); | ||
cache.createRegion(regionName2, attrs); | ||
pool = p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addControlListener
arguments to createCacheClientFromXmlN()
on line 197 and createCacheClientFromXml()
on line 219 are not used and can be removed.
/** | ||
* This method is not implemented to test event count matches the eventsToCheck.size() which is | ||
* confusing. It will wait and return if it got something in the eventsToCheck or not. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should either use correct javadoc semantics (add a @param
tag for each argument and an @return
tag for the returned value) or it should be a block comment rather than a javadoc comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree, I would rather have the hover over documentation with some information than none. Though completing the information should not be that hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a bee in my bonnet about javadocs after the big clean-up I did lately, I think. I saw so many places where there was stuff like
/**
* Constructs the object
* /
public SomeClass(int param1, int param2) {
...
that provided no value as a javadoc and caused the linter to throw warnings. I think sometimes people just use javadoc comments because they're green in IntelliJ and so stand out more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh for sure, that comment is pointless. I delete those when I find them. I should have been more precise, I would rather have some useful comments missing params and return tags than no comments at all.
@@ -576,6 +577,7 @@ public void run2() throws CacheException { | |||
|
|||
@Test | |||
public void testReadyForEventsNotCalledImplicitlyForRegisterInterestWithCacheXML() { | |||
int[] ports = getRandomAvailableTCPPorts(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable is not used.
So there is one remaining problem in the stress test and that is caused by the fact that the test DurableClientSimpleDUnitTest.testReadyForEventsNotCalledImplicitlyForRegisterInterestWithCacheXML uses a server with a defined port number of 10188. So when the test starts up it references an cache.xml that has the port defined in it. This means that the test cannot be run in parallel with itself as is done through the stress test and be ok. Depending on timing it will have resource contention. This has been in there for quite some time and is not really a problem except in stress test. If we do want to fix it, we will need to find a way to override it by generating it on the fly or modifying the XML on the fly. Generating it on the fly would be the easier path. |
@mhansonp I think we can reasonably ignore that repeated test failure. |
I filed https://issues.apache.org/jira/browse/GEODE-10265 for the stress test failure. |
There were a couple of basic timing issues with the code.
across a restart of the durable client.
Basically all of the problems here are timing issues in the tests.
See my PR comments for key changes