Skip to content

Conversation

@dawidwys
Copy link
Contributor

@dawidwys dawidwys commented Mar 1, 2019

What is the purpose of the change

The duplicate method of TypeSerializer used an equality check rather
than reference check of the element serializer to decide if we need a
deep copy. This commit uses proper reference comparison.

Brief change log

  • 79284d6 - Tuple0Serializer serialized null as just a proper Tuple0 instance, which if later deserialized resulted in a regular value rather than null. This commit changes it so that it is no longer possible to serialize null with Tuple0Serializer.
  • 871890d - adds snapshot for serializer TestListCompositeSerializer that is used only in tests
  • 8f970c6 - adds snapshot for serializer NothingSerializer which can be called transitively e.g. from EitherSerializer
  • f3549eb - some tests were comparing contents of the collections rather than the collections itself, what they claimed
  • 1f122e7 - calls all methods from SerializerTestBase from within SerializerTestInstance
  • 3b10292 - The duplicate method of TypeSerializer used an equality check rather
    than reference check of the element serializer to decide if we need a
    deep copy. This commit uses proper reference comparison.
  • 5896e89 - This commit introduced a matcher that performs a deepEquals comparison similarly to Objects#deepEquals. The only difference is that it also performs deep equality check for flink's specific types.

Verifying this change

  • enabled additional test (including duplicate method test in SerializerTestInstance

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Mar 1, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ✅ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

testSerializedCopyIndividually();
testSerializedCopyAsSequence();
testSerializabilityAndEquals();
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why these were not called before? Also, why do we need the try-catch around this now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess they were added later to the base class and the person that added those cases was probably not aware of existence of this class.

The try-catch is because a few of the new test cases throw exception, whereas the testAll doesn't. Did not want to change it in every place where the testAll is called. I think though in the long run we should revisit those places and actually get rid of this method at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this isn't even compiling at the moment?

Copy link
Contributor Author

@dawidwys dawidwys Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is compiling

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized though there some test failures for different serializers, investigating...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would probably be cleaner to not catch the exception and simply declare that it throws. Which might lead to other places where we need to declare, as you said.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest maybe replacing this with:

for (Method method : getClass().getMethods()) {
			if (method.getAnnotation(Test.class) == null) {
				continue;
			}
			try {
				method.invoke(this);
			}
			catch (IllegalAccessException e) {
				throw new RuntimeException("Unable to invoke test " + method.getName(), e);
			}
			catch (InvocationTargetException e) {
				sneakyThrow(e.getCause());
			}
		}

where sneakyThrow is defined as follows

private static <E extends Throwable> void sneakyThrow(Throwable e) throws E {
		throw (E) e;
	}

@aljoscha aljoscha self-requested a review March 1, 2019 09:23
@dawidwys dawidwys force-pushed the flink11420 branch 2 times, most recently from dcfb614 to 8ca7ced Compare March 1, 2019 16:05
@dawidwys
Copy link
Contributor Author

dawidwys commented Mar 1, 2019

Could you have a look at this PR @igalshilman @aljoscha

Copy link
Contributor

@igalshilman igalshilman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving the test coverage!
The changes look good to me, I've left some inline comments for your consideration.

testSerializedCopyIndividually();
testSerializedCopyAsSequence();
testSerializabilityAndEquals();
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest maybe replacing this with:

for (Method method : getClass().getMethods()) {
			if (method.getAnnotation(Test.class) == null) {
				continue;
			}
			try {
				method.invoke(this);
			}
			catch (IllegalAccessException e) {
				throw new RuntimeException("Unable to invoke test " + method.getName(), e);
			}
			catch (InvocationTargetException e) {
				sneakyThrow(e.getCause());
			}
		}

where sneakyThrow is defined as follows

private static <E extends Throwable> void sneakyThrow(Throwable e) throws E {
		throw (E) e;
	}

} catch (Exception ex) {
throw new RuntimeException(ex);
}
public void testAll() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed, check my previous comment regarding sneakyThrow

private static boolean deepEquals0(Object e1, Object e2) {
assert e1 != null;
boolean eq;
if (e1 instanceof Object[] && e2 instanceof Object[]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be replaced with e1.getClass().isArray()
and the recursively call deepEquals element wise (to catch Tuples of Tuples)
or if this is not needed for the test, then maybe just replace with Arrays.deepEquals

description.appendValue(wanted);
}

private static boolean deepEquals(Object o1, Object o2) {
Copy link
Contributor

@igalshilman igalshilman Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this logic can be simplified with something like

private static Stream<Object> flatten(Object o) {
		if (o == null) {
			return Stream.empty();
		}
		if (o.getClass().isArray()) {
			return Arrays.stream((Object[]) o)
				.flatMap(e -> flatten(e));
		}
		if (o instanceof Tuple) {
			Tuple t = (Tuple) o;
			return IntStream.range(0, t.getArity())
				.mapToObj(t::getField)
				.flatMap(e -> flatten(e));
		}
		return Stream.of(o);
	}

and then

private static boolean deepEquals(Object left ,Object right) {
		Iterator<Object> l = flatten(left).iterator();
		Iterator<Object> r = flatten(right).iterator();
			
		...
	}

update: I gave it a bit more thought, and this isn't exactly what you need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested that tho 😅

@dawidwys dawidwys force-pushed the flink11420 branch 4 times, most recently from f771de4 to 5593e8b Compare March 4, 2019 13:53
dawidwys added 7 commits March 4, 2019 15:30
Tuple0Serializer serialized null as just a proper Tuple0 instance, which if later deserialized resulted in a regular value rather than null. This commit changes it so that it is no longer possible to serialize null with Tuple0Serializer.
The duplicate method of TypeSerializer used an equality check rather
than reference check of the element serializer to decide if we need a
deep copy. This commit uses proper reference comparison.
…ng Tuples into account

This commit introduced a matcher that performs a deepEquals comparison
similarly to Objects#deepEquals. The only difference is that it also
performs deep equality check for some flink specific classes such as
e.g. Tuples, Rows.
@aljoscha
Copy link
Contributor

aljoscha commented Mar 4, 2019

@flinkbot approve consensus

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants