
Conversation

@mivanac (Contributor) commented Nov 16, 2021

…ion is not completed

Co-authored-by: albertogpz alberto.gomez@est.tech

In this solution, for each server, if colocation is not completed (for any possible reason), we skip RedundancyRecovery. When the last server registers the partitioned region, it sets colocation as completed, notifies all other servers, and triggers the creation of buckets on all other servers.
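The gating logic described above can be sketched in plain Java. This is an illustrative model, not Geode's actual API: `ColocationTracker`, `registerRegion`, and the other names here are hypothetical.

```java
// Hypothetical sketch of the colocation-gated recovery decision: each
// registration is skipped for recovery until the last colocated region
// is registered, at which point recovery is triggered for everyone.
import java.util.ArrayList;
import java.util.List;

class ColocationTracker {
    private final int expectedRegions;  // colocated regions expected on this member
    private final List<String> registered = new ArrayList<>();
    private boolean recoveryTriggered = false;

    ColocationTracker(int expectedRegions) {
        this.expectedRegions = expectedRegions;
    }

    // Called when one of the colocated partitioned regions is registered.
    void registerRegion(String regionName) {
        registered.add(regionName);
        if (!isColocationCompleted()) {
            return; // skip redundancy recovery until colocation completes
        }
        // The last registration completes colocation: this is where the
        // member would notify the others and trigger bucket creation.
        recoveryTriggered = true;
    }

    boolean isColocationCompleted() {
        return registered.size() == expectedRegions;
    }

    boolean isRecoveryTriggered() {
        return recoveryTriggered;
    }
}
```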

For all changes:

  • [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • [x] Has your PR been rebased against the latest commit within the target branch (typically develop)?

  • [x] Is your initial contribution a single, squashed commit?

  • [x] Does gradlew build run cleanly?

  • [x] Have you written or updated unit tests to verify your changes?

  • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

"create gateway-sender --id=parallelPositions --remote-distributed-system-id=1 --enable-persistence=true --disk-store-name=data --parallel=true")
.statusIsSuccess();

GeodeAwaitility.await().until(() -> {
Contributor
Shouldn't you add an atMost() condition here?
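For context, the bounded wait that `atMost()` provides (Awaitility's `await().atMost(...).until(...)`) can be sketched with the standard library alone: poll a condition, but give up after a deadline instead of hanging forever. `BoundedAwait` and its parameters are illustrative names, not part of Geode or Awaitility.

```java
// Minimal stdlib sketch of a bounded wait: returns true if the condition
// became true before the deadline, false otherwise.
import java.time.Duration;
import java.util.function.BooleanSupplier;

class BoundedAwait {
    static boolean awaitAtMost(Duration timeout, Duration pollInterval,
                               BooleanSupplier condition) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;               // condition met within the bound
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return condition.getAsBoolean();   // one final check at the deadline
    }
}
```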

@mivanac (Contributor Author) Nov 16, 2021

This was commented on in the previous PR, and updated accordingly.

Contributor Author

I agree with you.

* immediately in all servers.
*/
@Test
public void alterInitializedRegionWithGwSenderOnManyServersDoesNotTakeTooLong() {
Contributor

This test case seems to pass on my laptop without the fix. Should more servers be added?

Contributor Author

Also, this test reproduces the fault from the old PR. If you repeat the test case several times, it will fail.

Contributor Author

This would certainly fail with a changed timeout, but the timeout cannot be changed. For details, see the comments in the previous PR.

@albertogpz (Contributor) left a comment

I have a general comment about how the tests have been modified.

I think the problem with the modified test cases is that they no longer check whether recovery is started in at least one server. They only check that if recovery is started, it is also finished; it may happen (because it is not checked) that recovery is never started in any server.

I suggest an alternative change that checks that recovery finishes in the last server started.

@mivanac (Contributor Author) commented Nov 17, 2021

But recovery could be started on more servers than just the last one. In real situations, when a region is created, recovery will be started on almost all servers. So I am not sure that checking recovery on only one server is a valid test.

@mivanac (Contributor Author) commented Nov 17, 2021

It only depends on whether colocation is completed.

@albertogpz (Contributor)

> It only depends on whether colocation is completed.

If servers are started sequentially, colocation will not be completed until the last one is started, which is why recovery will only be run on it.
If servers are started in parallel, it must be checked that recovery is executed in at least one of them.

As almost all test cases start the servers sequentially, I suggested checking recovery on the last one.
For test cases that start the servers in parallel, it must be checked that recovery is started in at least one of them; alternatively, the startup could be changed to start all the servers in parallel except for one, and then check that recovery was started on that last one.

@mivanac (Contributor Author) commented Nov 17, 2021

We are talking about the creation of the child region on servers, not about starting servers. The normal behavior when you create a region is that it is created on all servers in parallel.

@albertogpz (Contributor)

> We are talking about the creation of the child region on servers, not about starting servers. The normal behavior when you create a region is that it is created on all servers in parallel.

You are right. I meant region creation on each server and not server start.

@mivanac (Contributor Author) commented Nov 17, 2021

It would be good if a code owner could comment on this issue and give their opinion on the new solution and on what is expected from the test.

…is not triggered if colocation is not completed
@mivanac mivanac force-pushed the newfeature/GEODE-9642 branch from b21005c to 8d5d016 on November 17, 2021 at 20:51
@mhansonp (Contributor) left a comment

There seems to be some debate here about the behavior changes. Waiting to see the result.

@mhansonp mhansonp self-requested a review November 17, 2021 23:26
@mivanac mivanac force-pushed the newfeature/GEODE-9642 branch from e595393 to f431983 on November 18, 2021 at 14:47
@mhansonp (Contributor)

@albertogpz Did your concerns get addressed?

@albertogpz (Contributor)

> @albertogpz Did your concerns get addressed?

I think the modified tests still need to check that recovery was started at least in one of the servers.

@DonalEvans (Contributor)

> > @albertogpz Did your concerns get addressed?
>
> I think the modified tests still need to check that recovery was started at least in one of the servers.

As far as I understand the tests, any time `checkIfRecoveryExecuted()` is called, it's implicitly checking that recovery was started, because if recovery is never started on any servers, then the calls to `recoveryStarted.await()` will time out and `recoveryExecuted` will not be set to true.
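The implicit check described above can be sketched with a `CountDownLatch`: the executed flag can only become true if the latch was counted down, i.e. if recovery actually started somewhere. `RecoveryObserver` and `onRecoveryStarted` are illustrative names assumed for this sketch, not the test's exact code.

```java
// Sketch: a timed latch await makes "recovery finished" imply
// "recovery started", because the flag is only set if the latch fires.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class RecoveryObserver {
    private final CountDownLatch recoveryStarted = new CountDownLatch(1);
    private volatile boolean recoveryExecuted = false;

    // Called from the recovery hook on any server.
    void onRecoveryStarted() {
        recoveryStarted.countDown();
    }

    boolean checkIfRecoveryExecuted(long timeout, TimeUnit unit)
            throws InterruptedException {
        // If recovery never starts, await(...) times out and
        // recoveryExecuted stays false.
        if (recoveryStarted.await(timeout, unit)) {
            recoveryExecuted = true;
        }
        return recoveryExecuted;
    }
}
```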

Comment on lines +2178 to +2180
private boolean getRecoveryStatus() {
return recoveryExecuted;
}
Contributor

This can probably be inlined.

@albertogpz (Contributor) commented Dec 2, 2021

> > > @albertogpz Did your concerns get addressed?
> >
> > I think the modified tests still need to check that recovery was started at least in one of the servers.
>
> As far as I understand the tests, any time `checkIfRecoveryExecuted()` is called, it's implicitly checking that recovery was started, because if recovery is never started on any servers, then the calls to `recoveryStarted.await()` will time out and `recoveryExecuted` will not be set to true.

That's right. I had not seen that check.

@albertogpz (Contributor)

> > @albertogpz Did your concerns get addressed?
>
> I think the modified tests still need to check that recovery was started at least in one of the servers.

My concerns have been addressed. The check that recovery started in at least one server is now covered.

@mhansonp (Contributor) left a comment

See comment. It looks like you should use an await in one case.

public void waitForRegion(Region region, long timeout) throws InterruptedException {
long start = System.currentTimeMillis();
synchronized (this) {
while (!recoveryStartedOnRegions.contains(region)) {
Contributor

This should be simplified to an `await().until(...)`.
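The suggestion above, replacing the hand-rolled `synchronized`/`wait` loop with a bounded poll, can be sketched as follows. To keep the sketch self-contained, `Region` is replaced by a plain `String` key and the class name `RecoveryTracker` is hypothetical; in the actual test this would be the `GeodeAwaitility.await().until(...)` form.

```java
// Sketch: bounded polling instead of a hand-rolled synchronized wait loop.
// Returns true if recovery started for the region before the timeout.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RecoveryTracker {
    private final Set<String> recoveryStartedOnRegions = ConcurrentHashMap.newKeySet();

    void markStarted(String region) {
        recoveryStartedOnRegions.add(region);
    }

    // Rough equivalent of: await().atMost(timeoutMillis).until(() -> started)
    boolean waitForRegion(String region, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!recoveryStartedOnRegions.contains(region)) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // timed out waiting for recovery to start
            }
            Thread.sleep(10); // poll interval
        }
        return true;
    }
}
```

A concurrent set avoids the explicit `synchronized` block, and the deadline makes the wait bounded, which is the property the reviewer is asking for.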

@mivanac mivanac closed this Mar 15, 2022
4 participants