Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-7116] Remove use of KV in Schema transforms #10151

Merged
merged 1 commit into from Dec 9, 2019

Conversation

reuvenlax
Copy link
Contributor

Beam's KV type has no schema and due to special casing of KvCoder in Beam it is difficult to give it one. Here we modify the Beam schema transforms that return PCollection to instead return PCollection where the Row contains key and value fields. This is possible now that we support large iterables in schemas.

@reuvenlax
Copy link
Contributor Author

R: @TheNeuralBit

@reuvenlax
Copy link
Contributor Author

run sql postcommit

@reuvenlax
Copy link
Contributor Author

friendly ping

@TheNeuralBit
Copy link
Member

Run Java PreCommit

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. LGTM, just some minor comments.

}
}

public static class RowIterableFieldMatcher extends BaseMatcher<Row> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this dead? I don't see it used anywhere.

case ITERABLE:
Row[] expectedIterable = Iterables.toArray((Iterable<Row>) expected, Row.class);
List<Row> actualIterable = Lists.newArrayList(row.getIterable(fieldIndex));
return containsInAnyOrder(expectedIterable).matches(actualIterable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth documenting that RowFieldMatcher doesn't care about order for array and iterable types. Maybe that should even be indicated in the name somehow? It seems like that's the primary difference between this and just using equalTo(expectedRow).

@reuvenlax reuvenlax force-pushed the update_group_and_join_iterable branch from 815529d to e16606e Compare December 7, 2019 08:40
@reuvenlax reuvenlax merged commit 9501152 into apache:master Dec 9, 2019
@suztomo
Copy link
Contributor

suztomo commented Dec 11, 2019

@reuvenlax @TheNeuralBit AvroSchemaTest.testAvroPipelineGroupBy started failing after merging this commit. Do you think this PR is related?

https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/6002/

org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalStateException: ValueIterator can't be iterated more than once,otherwise there could be data lost

image

@TheNeuralBit
Copy link
Member

Yeah this PR must be the cause. It's not clear to me why this only seems to be a problem with the Spark runner.. I'll file a jira and exclude the test from Spark for now.

@suztomo
Copy link
Contributor

suztomo commented Dec 11, 2019

@TheNeuralBit Thank you for confirmation.

@TheNeuralBit
Copy link
Member

#10358 should fix Spark ValidatesRunner

@reuvenlax
Copy link
Contributor Author

reuvenlax commented Dec 12, 2019 via email

dpcollins-google pushed a commit to dpcollins-google/beam that referenced this pull request Dec 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants