Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEODE-9204: thread hung waiting for response #6426

Merged
merged 10 commits into from
Sep 20, 2021

Conversation

kamilla1201
Copy link
Contributor

@kamilla1201 kamilla1201 commented May 3, 2021

The original PR #6392 was closed because I took over the work on this ticket.

A not serializable exception can cause a ServerConnection thread to get stuck
waiting for a reply from another member. This is caused by ReplyMessage
not handling a non-serializable "returnValue".

The fix is to catch and log the exception in the sending node and then
handle the problem in the receiving node.

I would have liked to serialize the "returnValue" field into a separate buffer
so as not to transmit a partial serialization of the object but that would require
an extra buffer copy for every ReplyMessage sent, and we send a lot of them.

Thank you for submitting a contribution to Apache Geode.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • Has your PR been rebased against the latest commit within the target branch (typically develop)?

  • Is your initial contribution a single, squashed commit?

  • Does gradlew build run cleanly?

  • Have you written or updated unit tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

Note:

Please ensure that once the PR is submitted, check Concourse for build issues and
submit an update to your PR as soon as possible. If you need help, please send an
email to dev@geode.apache.org.

Copy link
Contributor

@Bill Bill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved!

@kamilla1201
Copy link
Contributor Author

@BenjaminPerryRoss @boglesby @gesterzhou @kirklund @nabarunnag @pivotal-eshu Could you please take a look at this PR?

@agingade
Copy link

agingade commented Jun 9, 2021

@kamilla1201
Other options myself and @dschneider-pivotal were considering about:
Is this issue very specific to cache loader? in that case, let's check if the value is serializable as part of load, if not then send error response back to the caller.
@dschneider-pivotal Is it a safe assumption to make, anything we load through the cache-loader needs to be serializable...If not, then we may have to see if its serializable based on the thread/caller ?

@kamilla1201 kamilla1201 marked this pull request as draft June 10, 2021 20:53
@dschneider-pivotal
Copy link
Contributor

It is possible to have a loader on a local region in which case it would not need to be serializable. It is safer to defer the check until it is being serialized.

Copy link
Contributor

@gesterzhou gesterzhou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry of the testEventIdOutOfOrderInPartitionRegionSingleHop failure in precheckin. It will be fixed in other geode tickets: GEODE-9242, GEODE-9373.

@lgtm-com
Copy link

lgtm-com bot commented Jun 17, 2021

This pull request introduces 1 alert when merging ec97322 into c9d4f68 - view on LGTM.com

new alerts:

  • 1 for Potential output resource leak

@lgtm-com
Copy link

lgtm-com bot commented Jun 17, 2021

This pull request introduces 1 alert when merging c8efba6 into 2271b7f - view on LGTM.com

new alerts:

  • 1 for Potential output resource leak

@kamilla1201 kamilla1201 marked this pull request as ready for review June 24, 2021 17:52
@Rule
public DistributedRule distributedRule = new DistributedRule(2);

CacheRule cacheRule = CacheRule.builder().build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to annotate this with @Rule or it won't be torn down between tests.

}

@NotNull
private SerializableRunnableIF createRegionWithBadCacheLoader(Properties properties) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a request for change, just a note... I recommend not importing and using SerializableRunnableIF. Best way to use VM.invoke with methods is to use void as the return type:

private void createRegionWithBadCacheLoader(Properties properties)

And then have the caller invoke it with a lambda:

cacheLoaderVM.invoke("create a region with a bad cache loader",
        () -> createRegionWithBadCacheLoader(properties));

@kamilla1201 kamilla1201 marked this pull request as draft August 23, 2021 18:42
@kamilla1201
Copy link
Contributor Author

Need to run performance tests before merging this PR

bschuchardt and others added 10 commits September 17, 2021 15:36
A not serializable exception can cause a ServerConnection thread to get stuck
waiting for a reply from another member.  This is caused by ReplyMessage
not handling a non-serializable "returnValue".

The fix is to catch and log the exception in the sending node and then
handle the problem in the receiving node.

I would have liked to serialize the "returnValue" field into a separate buffer
so as not to transmit a partial serialization of the object but that would require
an extra buffer copy for every ReplyMessage sent, and we send a lot of them.
@kamilla1201 kamilla1201 marked this pull request as ready for review September 20, 2021 23:44
@kamilla1201 kamilla1201 merged commit 19f55ad into apache:develop Sep 20, 2021
kamilla1201 added a commit to kamilla1201/geode that referenced this pull request Oct 5, 2021
kamilla1201 added a commit to kamilla1201/geode that referenced this pull request Oct 6, 2021
kamilla1201 added a commit to kamilla1201/geode that referenced this pull request Oct 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
9 participants