Skip to content

HDDS-4731. Fix compatibility issue caused by new enum DatanodeDetails.Port.Name.REPLICATION#1844

Merged
arp7 merged 14 commits intoapache:masterfrom
adoroszlai:HDDS-4731
Feb 2, 2021
Merged

HDDS-4731. Fix compatibility issue caused by new enum DatanodeDetails.Port.Name.REPLICATION#1844
arp7 merged 14 commits intoapache:masterfrom
adoroszlai:HDDS-4731

Conversation

@adoroszlai
Copy link
Contributor

@adoroszlai adoroszlai commented Jan 25, 2021

What changes were proposed in this pull request?

  1. Introduce version number to client messages. Old client does not send version number, in this case it defaults to 0 at the other side. Client with this change sends version = 1.
  2. Only include pre-existing ports of datanode (ie. STANDALONE, RATIS, REST) in DatanodeDetails protobuf message sent to v0 client, which cannot handle new enum constants.
  3. Make client more future-proof by ignoring unknown Port.Name values. This in itself is not enough as a fix, since old client does not have this logic. But this allows the server to send any new port from now on to v >= 1 clients.

https://issues.apache.org/jira/browse/HDDS-4731

How was this patch tested?

Added acceptance test which verifies cross-compatibility of key write/read among 0.5.0, 1.0.0, 1.1.0 versions.

Tested closed container replication using ozone-topology compose environment to check for regression.

https://github.com/adoroszlai/hadoop-ozone/actions/runs/510263073

@adoroszlai adoroszlai self-assigned this Jan 25, 2021
@adoroszlai
Copy link
Contributor Author

@GlenGeng

@adoroszlai adoroszlai requested a review from xiaoyuyao January 25, 2021 21:44
@GlenGeng-awx
Copy link
Contributor

We can help to do a upgrade test to verify whether this issue has been solved in the real environment. Please let me know when I can start the verification.
cc @Xushaohong who helped to find this issue.

@adoroszlai
Copy link
Contributor Author

We can help to do a upgrade test to verify whether this issue has been solved in the real environment. Please let me know when I can start the verification.

Thanks @GlenGeng for the offer. Is it repeatable (ie. after the test can you restore prior state), or do we have only one shot?

@GlenGeng-awx
Copy link
Contributor

It is repeatable. Just setup an ozone cluster with your dev branch, and run a testdfsio with old jars upon it.

@adoroszlai
Copy link
Contributor Author

It is repeatable. Just setup an ozone cluster with your dev branch, and run a testdfsio with old jars upon it.

Thanks, but I meant your test in real environment. I already added a test to reproduce the bug in CI.

@GlenGeng-awx
Copy link
Contributor

Thanks @adoroszlai.
The upgrade test was done in a temporary cluster. During the test, an empty ozone cluster with the new version is setup, a yarn and mapreduce with classpath of jars from old ozone is setup, and we just check the compatibility by running a testDFSIO. Everything is in sandbox.

@adoroszlai adoroszlai marked this pull request as draft January 27, 2021 16:01
@adoroszlai
Copy link
Contributor Author

On second thought it seems necessary to detect at server-side whether client is old (supports only specific port types) or new (can handle unknown ports). Otherwise no new ports for client can be introduced. The current patch should be fine for the REPLICATION port, since it is for datanodes only.

@adoroszlai adoroszlai marked this pull request as ready for review January 27, 2021 22:39
@adoroszlai adoroszlai added bug Something isn't working upgrade labels Jan 28, 2021
@elek
Copy link
Member

elek commented Jan 28, 2021

On second thought it seems necessary to detect at server-side whether client is old (supports only specific port types) or new (can handle unknown ports). Otherwise no new ports for client can be introduced. The current patch should be fine for the REPLICATION port, since it is for datanodes only.

Yes, I have the same thoughts. Current patch makes the new port internal, which is fine. But even after this patch we can introduce new public ports only after Ozone 2.0 (but I prefer to not break compatibility even between Ozone 1.x client and Ozone 2.x servers.)

@adoroszlai adoroszlai requested a review from ChenSammi January 29, 2021 14:29
@adoroszlai
Copy link
Contributor Author

@GlenGeng @Xushaohong can you please review, too?

Copy link
Contributor

@GlenGeng-awx GlenGeng-awx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix.

optional GetCommittedBlockLengthRequestProto getCommittedBlockLength = 22;

optional string encodedToken = 23;
optional uint32 version = 24;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better keep the style consistent. Up to you.

optional string traceID = 2;

optional UserInfo userInfo = 3;
optional uint32 version = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will clientVersion be more straightforward than version in these protos, in case we might add other version into proto message ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are the same.

Copy link
Contributor

@xiaoyuyao xiaoyuyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adoroszlai for working on this. The patch looks good to me, especially I like the new forward/backward compatibility testes added. Just one minor suggestion added inline.

}

public HddsProtos.DatanodeDetailsProto.Builder toProtoBuilder(
int clientVersion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we embed the version info inside the PORT Enum without passing clientVersion as parameter?

This way, we can make the the handling of the version specific deserialization easier for future changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xiaoyuyao for the review. I'm not sure I understand the proposal. Can you please clarify?

We can embed the minimum version required for handling each port in the enum constants (eg. 0 for existing ports and 1 for the new replication port). But don't we still need the client version to be able to decide whether to include it in the proto message or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We still need the client version. But that can be a static helper without many interface changes to introduce extra parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need the client version. But that can be a static helper

This conversion must be done on the server-side, and server can have clients of various versions. How would a static helper work?

Copy link
Member

@elek elek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the concept looks good to me. Let's merge when others have no more concerns.

But I couldn't resist to add some more thoughts about concepts:

Originally we had a pattern where the serialization are added to each of the objects (under different method names).

After a while we introduced the Codec interface to separate the serialization logic from POJOs, but still we call back to the original methods in most of the cases (just because it would be a big refactor to move everything to the codec).

And Codec implementations are used mainly for the persistence.

But long-term I would prefer to use Codec for network serialization, too, to separate the serialization logic and manage in the same way.

At that time, we can introduce VersionedCodec which can use different Codec (depends on the versions).

But this is just a dream, would be great to have more time to improve the code style/class hierarchy....

But even if it's just a dream, I am interested about your opinions ;-)

@arp7 arp7 merged commit 5f3e202 into apache:master Feb 2, 2021
@arp7
Copy link
Contributor

arp7 commented Feb 2, 2021

I merged this based on +1s from @xiaoyuyao and @elek.

@adoroszlai
Copy link
Contributor Author

Thanks @GlenGeng, @elek and @xiaoyuyao for the review, @arp7 for merging it.

@arp7
Copy link
Contributor

arp7 commented Feb 2, 2021

It looks like I missed a minor remaining comment from @xiaoyuyao. Let's address in a follow up jira if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working upgrade

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants