Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEODE-10155: Avoid threads hanging when function execution times-out #7493

Merged
merged 7 commits into from
Jun 6, 2022

Conversation

albertogpz
Copy link
Contributor

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • Has your PR been rebased against the latest commit within the target branch (typically develop)?

  • Is your initial contribution a single, squashed commit?

  • Does gradlew build run cleanly?

  • Have you written or updated unit tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

@albertogpz albertogpz marked this pull request as ready for review March 28, 2022 12:25
@albertogpz albertogpz force-pushed the feature/GEODE-10155 branch 2 times, most recently from eafb2bc to bb5e95d Compare March 28, 2022 16:11
Copy link
Contributor

@boglesby boglesby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still reviewing this, but here are a few small comments.

.setPRSingleHopEnabled(false);
if (connectTimeout > 0) {
factory.setSocketConnectTimeout(connectTimeout);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setConnectionTimeout is already being done right above this, so this check isn't necessary.

PoolFactory factory = PoolManager.createFactory().addServer(host, port1)
.addServer(host, port2).addServer(host, port3).setPingInterval(2000)
.setSubscriptionEnabled(true).setSubscriptionRedundancy(-1).setReadTimeout(2000)
.setSocketBufferSize(1000).setRetryAttempts(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need subscriptions in this test.

try {
Thread.sleep(waitBetweenEntriesMs);
} catch (InterruptedException e) {
e.printStackTrace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see other methods in this class are calling printStackTrace, but I'm not sure thats the best behavior here.

if (getId().equals(TEST_FUNCTION_SLOW)) {
return false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check can be moved down to the code below in the method like:

if (getId().equals(TEST_FUNCTION_NONHA_SERVER) || getId().equals(TEST_FUNCTION_NONHA_REGION)
    || getId().equals(TEST_FUNCTION_NONHA_NOP) || getId().equals(TEST_FUNCTION_NONHA)
    || getId().equals(TEST_FUNCTION_SLOW)) {
  return false;
}


import java.util.concurrent.TimeUnit;

import org.junit.Test;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Junit 5.


@Test
public void whenResponseToClientInLastResultFailsEndResultsIsCalled_OnlyLocal_NotOnlyRemote() {
// arrange
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments aren't adding to the readability, please remove them.

@Test
public void whenResponseToClientInLastResultFailsEndResultsIsCalled_NotOnlyLocal_NotOnlyRemote() {
// arrange
Mockito.doThrow(new FunctionException()).when(serverToClientFunctionResultSender)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static import the Mockito and AssertJ builder methods.

Comment on lines 534 to 541
Integer port1 = server1.invoke(() -> PRClientServerTestBase
.createCacheServer(commonAttributes, localMaxMemoryServer1, maxThreads));
Integer port2 = server2.invoke(() -> PRClientServerTestBase
.createCacheServer(commonAttributes, localMaxMemoryServer2, maxThreads));
Integer port3 = server3.invoke(() -> PRClientServerTestBase
.createCacheServer(commonAttributes, localMaxMemoryServer3, maxThreads));
client.invoke(() -> PRClientServerTestBase.createNoSingleHopCacheClient(
NetworkUtils.getServerHostName(server1.getHost()), port1, port2, port3, connectTimeout));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor nitpicks here (comment only):

  1. Change those Integer vars to int and prefer primitives to wrapper types if possible.
  2. I recommend using String hostname instance fields instead inlining those getServerHostName and getHost calls in the RMI lambdas. Some of them misbehave when called in the dunit ChildVMs so this avoids all issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kirklund Are you ok with the changes I made after your review? Anything left?

Comment on lines 54 to 57
try {
sender.lastResult(new FunctionException());
} catch (FunctionException expected) {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should always add assertions for expected exceptions:

Throwable thrown = catchThrowable(() -> {
  sender.lastResult(new FunctionException());
}

assertThat(thrown)
  .isInstanceOf(FunctionException.class)
  .(any other assertions that are valuable?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not add the assertion because it was not something I wanted to verify in the test. Nevertheless, thanks to your comment I have seen that the exception is not thrown so I have removed it.
I have also changed the test cases so that instead of sending an exception, an object is sent.

Comment on lines 71 to 74
try {
sender.lastResult(new FunctionException(), true, rc, null);
} catch (FunctionException expected) {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another expected exception that needs assertion(s).

Comment on lines 89 to 92
try {
sender.lastResult(new FunctionException(), true, rc, null);
} catch (FunctionException expected) {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another expected exception that needs assertion(s).

Comment on lines 106 to 95
try {
sender.sendResult(new FunctionException());
} catch (FunctionException expected) {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another expected exception that needs assertion(s).

Comment on lines 123 to 115
try {
sender.sendResult(new FunctionException());
} catch (FunctionException expected) {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another expected exception that needs assertion(s).

Comment on lines 140 to 135
try {
sender.sendResult(new FunctionException());
} catch (FunctionException expected) {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another expected exception that needs assertion(s).


@Test
public void whenResponseToClientInLastResultFailsEndResultsIsCalled_OnlyLocal_NotOnlyRemote() {
doThrow(new FunctionException()).when(serverToClientFunctionResultSender)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend either using a cause like new FunctionException(new CauseException("for test") and then include the cause and cause message in the assertions or at least use a custom message like new FunctionException("for test").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. It is not applicable anymore due to the changes described in a previous comment.

* request will never be served because there would be not ServerConnection
* threads available and the test case will time-out.
*/
@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename test to current convention, PRClientServerRegionFunctionExecutionNoSingleHopDistributedTest.

factory.setScope(Scope.LOCAL);
factory.setDataPolicy(DataPolicy.EMPTY);
factory.setPoolName(p.getName());
RegionAttributes attrs = factory.create();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't introduce more raw type usage. Preferably clean up all other raw types in existing test.

this(dm, pr, time, msg, function, bucketArray,
(x, y) -> FunctionStatsManager.getFunctionStats((String) x, (InternalDistributedSystem) y));
}

/**
* Have to combine next two constructor in one and make a new class which will send Results back.
*
*/
public PartitionedRegionFunctionResultSender(DistributionManager dm, PartitionedRegion pr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this new constructor is used only testing it should be package private. Evaluate access level on each of these constructors.

this.msg = msg;
this.dm = dm;
this.pr = pr;
this.time = time;
this.function = function;
this.bucketArray = bucketArray;

this.functionStatsFunctionProvider = functionStatsFunctionProvider;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have all overloaded constructs call a single initializing constructor.

Copy link
Contributor

@mhansonp mhansonp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure to leave the code better than you found it (reduced warnings). Please work to not introduce new warnings to the code.

// executions.
await().until(() -> {
client.invoke(() -> executeGet(PartitionedRegionName, "key"));
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the return be in this block?

Do you want to set a specific amount of time for the await here based on the idea that you have an expectation of time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the return be in this block?
Yes, it is needed as the method should return a boolean.

Do you want to set a specific amount of time for the await here based on the idea that you have an expectation of time?
Not really. I am just checking that it did not hang forever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the return have been outside of the await? That is what I was getting at once the await finishes, the return could be called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remove the return you get a compilation error.
The idea is to test that the call previous to the return does not hang.
Do you have a better way to achieve it?

private Object executeSlowFunctionOnRegionNoFilter(Function function, String regionName,
int functionTimeoutSecs) {
FunctionService.registerFunction(function);
Region region = cache.getRegion(regionName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wildcard added.

return region.get(key);
}

private Object executeSlowFunctionOnRegionNoFilter(Function function, String regionName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a type for Function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added wildcard.

FunctionService.registerFunction(function);
Region region = cache.getRegion(regionName);

Execution execution = FunctionService.onRegion(region);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typing? Execution<IN, OUT, AGG>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, function execution types are a mess. Best to just walk away.

CacheServerTestUtil.enableShufflingOfEndpoints();
}
pool = (PoolImpl) p;
AttributesFactory factory = new AttributesFactory();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttributesFactory<Object, Object> factory = new AttributesFactory<>(); ?

RegionAttributes<Object, Object> attrs ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The least common base type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at what is being put in the region, it varies. I prefer Object to ?, but I will defer to your opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you aren't calling any of the methods on this instance that have matching typed parameters then you can get away with ? but as soon as you try to invoke one with a value that does match type ? it will fail to compile. Java! Better type templating is coming...
My general rule is use ? when you don't care or know the type, use per-method generics when the type matters but is not know, use Object when the type matters and it doesn't have any other common ancestor or we don't know all the possible types.

@@ -99,10 +101,11 @@ public final void postSetUp() throws Exception {
server2 = host.getVM(1);
server3 = host.getVM(2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, these can be cleaned up to be
Host host = Host.getHost(0);
server1 = VM.getVM(0);
server2 = VM.getVM(1);
server3 = VM.getVM(2);
client = VM.getVM(3);

return createCacheServer(commonAttributes, localMaxMemory, -1);
}

public static Integer createCacheServer(ArrayList commonAttributes, Integer localMaxMemory,
public static Integer createCacheServer(ArrayList<?> commonAttributes, Integer localMaxMemory,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commonAttributes is of type Object

@@ -120,7 +123,8 @@ protected boolean shouldRegisterFunctionsOnClient() {
return ExecuteFunctionMethod.ExecuteFunctionByObject == functionExecutionType;
}

ArrayList createCommonServerAttributes(String regionName, PartitionResolver pr, int red,
ArrayList<Object> createCommonServerAttributes(String regionName, PartitionResolver<?, ?> pr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As seen here createCommonServerAttributes returns ArrayList

@@ -131,26 +135,26 @@ ArrayList createCommonServerAttributes(String regionName, PartitionResolver pr,
return commonAttributes;
}

public static Integer createCacheServer(ArrayList commonAttributes, Integer localMaxMemory) {
public static Integer createCacheServer(ArrayList<?> commonAttributes, Integer localMaxMemory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commonAttributes is of type Object

@@ -273,12 +259,12 @@ public static void createCacheClient(String host, Integer port1, Integer port2,
factory.setScope(Scope.LOCAL);
factory.setDataPolicy(DataPolicy.EMPTY);
factory.setPoolName(p.getName());
RegionAttributes attrs = factory.create();
Region region = cache.createRegion(PartitionedRegionName, attrs);
RegionAttributes<?, ?> attrs = factory.create();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use Object, Object instead of ?,? because we only Objects(String, Integer, Boolean) in region.

@mhansonp
Copy link
Contributor

If you take a look at #7608 you can see the level of cleanup we are hoping for.

Copy link
Contributor

@jake-at-work jake-at-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great step in the right direction for cleaning up the tests. I would ask that after you apply the conversions you check the results. If you used the Assertions2AssertJ plugin, or similar, it misses some things, especially around collection assertions.
Please use variations of the hasSize() and contains() assertions.

}
ResultCollector<?, ?> rc1 = executeOnAll(dataSet, Boolean.TRUE, function, isByName);
List<?> resultList = (List<?>) rc1.getResult();
logger.info("Result size : " + resultList.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid adding more logging to tests. If it is worth logging it is worth asserting and you don't need both.

Copy link
Contributor

@jake-at-work jake-at-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to approve now but would prefer you take a bit more cleanup before merging.

Copy link
Contributor

@mhansonp mhansonp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I only saw two problems of real concern what should probably be awaitility awaits rather than wait.pause followed by assertThat...

}
Map<String, Integer> resultMap = region.getAll(testKeysList);
assertThat(resultMap).containsExactlyInAnyOrderEntriesOf(origVals);
Wait.pause(2000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use awaitility here? using a pause is dangerous.

}
Map<String, Integer> resultMap = region.getAll(testKeysList);
assertThat(resultMap).containsExactlyInAnyOrderEntriesOf(origVals);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

FunctionService.registerFunction(function);
Execution dataSet = FunctionService.onRegion(region);
ResultCollector<?, ?> rc1 = execute(dataSet, singleKeySet, Boolean.TRUE, function, isByName);
List<?> l = (List<?>) rc1.getResult();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not your fault, but single letter variable names are not great.

List<?> l = (List<?>) rc1.getResult();
assertThat(l).hasSize(3);
for (Object item : l) {
assertThat(item).isEqualTo(Boolean.TRUE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "true" works instead of "Boolean.TRUE"

List<?> subL = (List<?>) value;
assertThat(subL).hasSizeGreaterThan(0);
for (Object o : subL) {
assertThat(foundVals.add((Integer) o)).isTrue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix the typing of the generics to avoid casting? subL is as List

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary though...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get what you are exactly proposing here.

dataSet.withFilter(testKeysSet).setArguments(Boolean.TRUE).execute(new FunctionAdapter() {
@Override
public void execute(FunctionContext context) {
@SuppressWarnings("unchecked")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.

}
});
List<?> l = (List<?>) rc1.getResult();
logger.info("Result size : " + l.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming and typing, not your fault, but could you improve it?

@mhansonp
Copy link
Contributor

There is a lot of technical debt in this file. Good job cleaning a lot up. It all helps.

@albertogpz albertogpz force-pushed the feature/GEODE-10155 branch 2 times, most recently from 55cae14 to 62cb006 Compare May 19, 2022 11:17
Copy link
Contributor

@boglesby boglesby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the JIRA to say that isHA has to be false for this behavior to exist.

@albertogpz
Copy link
Contributor Author

Can you change the JIRA to say that isHA has to be false for this behavior to exist.

Done. Thanks!

@albertogpz albertogpz merged commit 7ccdc82 into apache:develop Jun 6, 2022
@albertogpz albertogpz deleted the feature/GEODE-10155 branch June 6, 2022 18:18
albertogpz added a commit that referenced this pull request Sep 16, 2022
…7493)

* GEODE-10155: Avoid threads hanging when function execution times-out

* GEODE-10155: Updated after review

* GEODE-10155: More changes after review

* GEODE-10155: Changes after more reviews

* GEODE-10155: Some more changes after review

* GEODE-10155: More changes after review

* GEODE-10155: More clean-up after review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants