[BEAM-1542] Introduced SpannerIO.readAll #3591
mairbek wants to merge 10 commits into apache:master
Conversation
SpannerIO.readFn
Also includes
jkff
left a comment
Thanks! Exciting to get more connectors following this pattern (reading a PCollection of sources).
 * {@link Read#createTransaction()}. If null, the transform doesn't provide transactional guarantees.
 * @return a {@link DoFn} that concurrently reads from multiple Cloud Spanner sources.
 */
@Experimental
DoFns should not be part of the public API of a PTransform: https://beam.apache.org/contribute/ptransform-style-guide/#exposing-a-ptransform-vs-something-else
Instead, please follow the example of TextIO.readAll() https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L170 and create a new PTransform SpannerIO.readAll()
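Following that suggestion, here is a minimal sketch of what a SpannerIO.readAll() wrapper in the style of TextIO.readAll() could look like. The ReadAll and NaiveSpannerReadFn names are taken from this PR; the builder/configuration details are illustrative, not the merged implementation.

```java
// Sketch only: a PTransform wrapper in the style of TextIO.readAll(),
// keeping the read DoFn out of the public API.
public static ReadAll readAll() {
  return new ReadAll();
}

/** Reads from Cloud Spanner once per input {@link ReadOperation}. */
public static class ReadAll
    extends PTransform<PCollection<ReadOperation>, PCollection<Struct>> {

  // Configured via a withSpannerConfig(...) builder method, omitted here.
  private SpannerConfig config;

  @Override
  public PCollection<Struct> expand(PCollection<ReadOperation> input) {
    // The DoFn stays package-private; callers only see the PTransform.
    return input.apply("Execute queries", ParDo.of(new NaiveSpannerReadFn(config)));
  }
}
```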
jkff
left a comment
Thanks, the approach looks good.
  sideInputs = Collections.singletonList(getTransaction());
}
NaiveSpannerReadFn fn = new NaiveSpannerReadFn(getSpannerConfig(), getTransaction());
return input.apply("Execute query", ParDo.of(fn).withSideInputs(sideInputs));
You might want to reshuffle the input collection before applying this fn, or it may be fused with another fn that was producing multiple queries per element, and they'll all be read sequentially.
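For reference, the fusion break being suggested is the "pair with a random key, group, ungroup" idiom. A sketch in this PR's terms (the transform names and the key range are illustrative):

```java
// Sketch of a fusion break: a random key plus GroupByKey forces a shuffle,
// so the reads are redistributed across workers instead of running fused
// (and therefore sequentially) with the upstream query-producing fn.
PCollection<ReadOperation> reshuffled = input
    .apply("Pair with random key",
        WithKeys.of(new SerializableFunction<ReadOperation, String>() {
          @Override
          public String apply(ReadOperation op) {
            return Integer.toString(ThreadLocalRandom.current().nextInt(100));
          }
        }))
    .apply("Group by key", GroupByKey.<String, ReadOperation>create())
    .apply("Drop keys", Values.<Iterable<ReadOperation>>create())
    .apply("Ungroup", Flatten.<ReadOperation>iterables());
```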
NaiveSpannerReadFn readFn = new NaiveSpannerReadFn(read);
DoFnTester<Object, Struct> fnTester = DoFnTester.of(readFn);
NaiveSpannerReadFn readFn = new NaiveSpannerReadFn(read.getSpannerConfig());
Add a TestPipeline test for readAll() to this file?
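Such a test could look roughly like the following. ReadOperation.create().withQuery(...) and withSpannerConfig(...) are assumed from this PR's API, and expectedStructs would come from the same mocked Spanner service the file already uses; this is a sketch, not the test that was actually added.

```java
@Test
public void testReadAll() throws Exception {
  // Two queries flow into readAll() as a PCollection<ReadOperation>.
  PCollection<ReadOperation> queries = pipeline.apply(Create.of(
      ReadOperation.create().withQuery("SELECT * FROM users"),
      ReadOperation.create().withQuery("SELECT * FROM orders")));

  PCollection<Struct> results =
      queries.apply(SpannerIO.readAll().withSpannerConfig(spannerConfig));

  // expectedStructs would be derived from the mocked result sets.
  PAssert.that(results).containsInAnyOrder(expectedStructs);
  pipeline.run();
}
```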
jkff
left a comment
Thanks, I'll make these changes myself and merge tomorrow.
@Override
public PCollection<Struct> expand(PCollection<ReadOperation> input) {
  PCollection<ReadOperation> reshuffled = input
      .apply("Pair with random key",
          WithKeys.of(new SerializableFunction<ReadOperation, String>() {
            @Override
            public String apply(ReadOperation input1) {
  sideInputs = Collections.singletonList(getTransaction());
}
NaiveSpannerReadFn fn = new NaiveSpannerReadFn(getSpannerConfig(), getTransaction());
return reshuffled.apply("Execute all query", ParDo.of(fn).withSideInputs(sideInputs));
}
}
}

/**
 * A {@link PTransform} that reads data from Google Cloud Spanner.
Usually we document this as "Implementation of {@link #read}" or something like that, to avoid redundancy.
private ReadOnlyTransaction mockTx;

private Type fakeType = Type.struct(Type.StructField.of("id", Type.int64()),
private transient Type fakeType = Type.struct(Type.StructField.of("id", Type.int64()),
You could also make these static final.
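For example, a sketch of that suggestion applied to the fakeType field shown above (the ALL_CAPS rename just follows the usual Java constant convention):

```java
// static final instead of a transient instance field: built once,
// never serialized along with the test/DoFn instance.
private static final Type FAKE_TYPE =
    Type.struct(Type.StructField.of("id", Type.int64()));
```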
SpannerIO.readAll allows piping reads from Cloud Spanner into other computations. The motivating use case is exporting data from a Spanner database when the list of tables is generated in a previous transform.

R: @jkff