[BEAM-10139][BEAM-10140] Add cross-language support for Java SpannerIO with python wrapper #12611
Conversation
@TheNeuralBit I know I'm merciless to give you such a big PR to review, but I think you're the most up-to-date person about rows and schemas :) There are some unit tests and TODOs left, but overall I think it's almost complete. The integration tests work well on FlinkRunner.
No worries, I'm happy to help review :) It might take me a few days to get to it though. Regarding testing: we could consider adding a spanner instance to apache-beam-testing for integration testing; I'd suggest raising it on dev@ if you want to pursue it. I also just came across https://cloud.google.com/spanner/docs/emulator which could be a good option too. It's a docker container that starts up an in-memory version of spanner to test against.
@TheNeuralBit Great advice as always! I tried to find something like this emulator on dockerhub but without success. I managed to successfully use this emulator; it has much better support than localstack's aws emulation. A few comments about this PR: I am almost certain that the Schema doesn't have to be sent as proto in Read, but I didn't come up with anything else. Another issue is representing the Mutation - for now it's a Row containing 4 fields: operation, table, rows and key_set. It does quite well, but I wonder whether I can do it better. I erased SpannerWriteResult and return PDone for now - I don't see a way to keep it without adding spanner dependencies to java core. Because of that, the failure mode is FAIL_FAST and I didn't include it in the configuration params. Transactions are not supported because they require a ptransform to be transferred. I suppose it's doable though, and it could be a good future improvement. FYI - I'll be OOO the next week so there is absolutely no haste :)
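For illustration, a minimal sketch of that mutation-as-Row shape. The field names come from the description above; the concrete Python types are assumptions for the sketch, not the PR's actual definitions:

import typing

class ExampleRow(typing.NamedTuple):
    id: int
    name: str

class MutationRow(typing.NamedTuple):
    # 'insert', 'update', 'delete', ... (hypothetical operation labels)
    operation: str
    table: str
    # payload rows for write operations
    rows: typing.Optional[typing.List[ExampleRow]] = None
    # keys identifying the rows to delete
    key_set: typing.Optional[typing.List[ExampleRow]] = None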
Codecov Report

@@            Coverage Diff             @@
##           master   #12611      +/-   ##
==========================================
- Coverage   82.48%   82.44%   -0.05%
==========================================
  Files         455      456       +1
  Lines       54876    54975      +99
==========================================
+ Hits        45266    45324      +58
- Misses       9610     9651      +41

Continue to review full report at Codecov.
Thanks for the contribution Piotr! Sorry it took me until after you were back from OOO to get to this :P
I have a few high-level comments and questions
List<Schema.Field> fields = schema.getFields();
Row.FieldValueBuilder valueBuilder = null;
// TODO: Remove this null-checking once nullable fields are supported in cross-language
What is the issue here? Nullable fields should be supported in cross-language
NullableCoder is not a standard coder, as was mentioned here: https://issues.apache.org/jira/browse/BEAM-10529?jql=project%20%3D%20BEAM%20AND%20text%20~%20%22nullable%20python%22
So I suppose the only way to support null values is not to set them.
I noticed that when I tried to read a null field from a Spanner table. But I may be wrong.
Hm, so it should be supported. RowCoder encodes nulls for top-level fields separately, so there's no need for NullableCoder. NullableCoder is only used when you have a nullable type in a container type, e.g. ARRAY<NULLABLE INT>. This wasn't supported in Python until recently - #12426 should have fixed it though.
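As a minimal sketch of the distinction (type names here are illustrative, assuming the standard RowCoder behavior described above):

import typing
from apache_beam import coders

class ExampleRow(typing.NamedTuple):
    id: int
    # A top-level nullable field: RowCoder encodes the None case itself,
    # so no NullableCoder is needed.
    name: typing.Optional[str]
    # A nullable type inside a container is the case that needs NullableCoder
    # and was only recently supported in Python (see #12426).
    scores: typing.List[typing.Optional[int]]

coders.registry.register_coder(ExampleRow, coders.RowCoder)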
I'm not sure where my message has gone, but I wrote that nulls come up with no problems; I had just used ImmutableMap, which does not allow null values. Replacing it with java.util.HashMap solved the issue.
public ReadRows(Read read, Schema schema) {
  super("Read rows");
  this.read = read;
  this.schema = schema;
It would be really great if SpannerIO.ReadRows could determine the schema at pipeline construction time so the user doesn't have to specify it. In SpannerIO.Read#expand we require the user to specify either a query or a list of columns:
(SpannerIO.java, lines 656 to 671 in 2872e37)
if (getReadOperation().getQuery() != null) {
  // TODO: validate query?
} else if (getReadOperation().getTable() != null) {
  // Assume read
  checkNotNull(
      getReadOperation().getColumns(),
      "For a read operation SpannerIO.read() requires a list of "
          + "columns to set with withColumns method");
  checkArgument(
      !getReadOperation().getColumns().isEmpty(),
      "For a read operation SpannerIO.read() requires a"
          + " list of columns to set with withColumns method");
} else {
  throw new IllegalArgumentException(
      "SpannerIO.read() requires configuring query or read operation.");
}
In both cases we're very close to a schema. We just need to analyze the query and/or get the output types for the projected columns. I looked into it a little bit, but I'm not quite sure of the best way to use the spanner client to look up the schema. The only thing I could figure out was to start a read and look at the type of ResultSet#getCurrentRowAsStruct, which seems less than ideal.
CC @nielm who's done some work with SpannerIO recently - do you have any suggestions for a way to determine the types of the Structs that SpannerIO.Read will produce?
We could also punt on this question and file a jira with a TODO here. I recognize this is a little out of scope for BEAM-10139, BEAM-10140.
I'd really like to do it in this PR, but the only thing that comes to mind is to do what you said - perform the read request with the client and then read the schema. The obvious disadvantage is that the Spanner query would be executed twice. I found that adding a LIMIT 1 to the end of the query will not improve performance, so this is not the thing to do for huge result sets.
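To make the cost concrete, a rough sketch of that workaround with the google-cloud-spanner Python client (the instance, database, and query names are made up, and the exact metadata attributes may differ between client versions):

from google.cloud import spanner

client = spanner.Client()
database = client.instance("test-instance").database("test-db")

with database.snapshot() as snapshot:
    results = snapshot.execute_sql("SELECT id, name FROM users")
    next(iter(results), None)  # consume one row so the result metadata is populated
    for field in results.fields:  # row-type fields from the ResultSetMetadata
        print(field.name)  # each field also carries its Spanner type
# ...and then the actual read executes the same query a second time.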
I can reach out to the Spanner team to see if there's a good way to do this, I'll let you know if I learn anything. For now we can just plan on a jira and a TODO
I don't see any good solution here...
When reading an entire table, it would be possible to read the table's schema first and determine the column types, but this does not work for a query, as the query's output columns may not correspond to table columns.
Adding LIMIT 1 would only work for simple queries; anything with joins, GROUP BY, or ORDER BY will require the majority of the query to be executed before a single row is returned.
So the only solution I can see is for the caller to specify the row Schema, as you do here.
It seems like it should be possible to analyze the query and determine the output schema - SqlTransform and JdbcIO both do this.
I got a similar response from my internal queries, though: it doesn't look like there's a good way to do this with the Spanner client.
Thank you @nielm! I thought about the LIMIT approach but then found the same arguments against it.
It appears there exists a JDBC client for Spanner: https://cloud.google.com/spanner/docs/jdbc-drivers . I'll try to figure out whether I can use it.
There is also ResultSetMetadata in Spanner's REST API (https://cloud.google.com/spanner/docs/reference/rest/v1/ResultSetMetadata), but at the end of the day it requires at least partially fetching the data.
I would leave it for another PR, though, as it supposedly requires moving SchemaUtils from io/jdbc to some more general place (extensions/sql?). Also, the Struct type is mapped to String/Varchar, as mentioned in the FAQ, so it may not be the best option:
"The Cloud Spanner STRUCT data type is mapped to a SQL VARCHAR data type, accessible through this driver as String types. All other types have appropriate mappings."
class WriteToSpanner(ExternalTransform): |
It looks like there's already a native SpannerIO in the Python SDK in apache_beam/io/gcp/experimental/spannerio.py. Are we planning on removing that one? Should the API for this one be compliant with that one?
I can try to make the API compliant with the native one. I think it'd be valuable for Beam to compare the performance of both IOs and then decide which one to leave.
Yeah that makes sense. There's definitely still value in adding this even if we end up preferring the native Python one, since we can use it from the Go SDK in the future.
Probably it makes sense to converge on one implementation. I'd prefer the Java implementation (hence cross-language), since it's been around longer and is used by many users. We have to make sure that the cross-language version works for all runners before the native version can be removed. For example, the cross-language version will not work on current production Dataflow (Runner v1), and we have to confirm that it works adequately on Dataflow Runner v2.
@TheNeuralBit I've upgraded it a bit.
This is looking pretty good overall, my biggest hangup is over the API for the Write transform. I suggested an alternative approach in a comment.
Also FYI - I'm going to be out of the office starting tomorrow (Friday), and back next Thursday. If you get blocked on this before then it may make sense to ask Cham to take a look in the meantime.
[('id', int), ('name', unicode)])
coders.registry.register_coder(ExampleRow, coders.RowCoder)

mutation_creator = MutationCreator('table', ExampleRow, 'ExampleMutation')
Overall I think it makes a lot of sense to use Rows for the Mutations, with a nested Row for the data, but this API is pretty tricky. Could you look into adding a separate PTransform (or multiple PTransforms) for converting the Rows to mutations? I think an API like this should be possible:
pc = ...  # some PCollection with a schema
pc | RowToMutation.insert('table')
   | WriteToSpanner(...)
OR
pc | RowToMutation.insertOrUpdate('table')
   | WriteToSpanner(...)
OR
pc | RowToMutation.delete('table')
   | WriteToSpanner(...)
The PTransform would be able to look at the element_type of the input PCollection and create a mutation type that wraps it in the expand method. There aren't a lot of examples of logic like this in the Python SDK (yet); the only one I know of is here:
beam/sdks/python/apache_beam/dataframe/schemas.py, lines 50 to 55 in cfa448d:
def expand(self, pcoll):
  columns = [
      name for name, _ in named_fields_from_element_type(pcoll.element_type)
  ]
  return pcoll | self._batch_elements_transform | beam.Map(
      lambda batch: pd.DataFrame.from_records(batch, columns=columns))
That way the user wouldn't need to pass the type they're planning on using to MutationCreator. What do you think of that?
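A hypothetical sketch of what that could look like (this is not the PR's code; the class and method names are made up for illustration):

import typing
import apache_beam as beam
from apache_beam.typehints.schemas import named_fields_from_element_type

class RowToMutation(beam.PTransform):
    @classmethod
    def insert(cls, table):
        return cls('insert', table)

    def __init__(self, operation, table):
        super().__init__()
        self._operation = operation
        self._table = table

    def expand(self, pcoll):
        # Raises if the input PCollection doesn't have a schema.
        named_fields_from_element_type(pcoll.element_type)
        # Build a mutation type nesting the input's row type, inside expand().
        mutation_type = typing.NamedTuple(
            'Mutation',
            [('operation', str), ('table', str), ('row', pcoll.element_type)])
        beam.coders.registry.register_coder(mutation_type, beam.coders.RowCoder)
        return pcoll | beam.Map(
            lambda row: mutation_type(self._operation, self._table, row)
        ).with_output_types(mutation_type)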
That way we lose the possibility of mixing different kinds of mutations. I don't imagine any sane usage of mixed insert/delete, as the order is not guaranteed, so I agree that removing this assumption is justified.
Since we will always map rows to mutations first anyway, it would be good to enclose that mapping inside WriteToSpanner. How about an API like this?:
pc.with_output_types(CustomRow) | WriteToSpanner(...).insert(table)
pc.with_output_types(CustomRow) | WriteToSpanner(...).delete(table)
pc.with_output_types(List[CustomRow]) | WriteToSpanner(...).delete(table)
It's not consistent with ReadFromSpanner(...), but I think it's better than forcing the user to call RowToMutation each time.
To be more consistent I could do something like ReadFromSpanner(...).from_table(table) and ReadFromSpanner(...).from_sql(sql_query).
@chamikaramj Brian asked me to ask you for further review, as he is going OOO this week. I'd be grateful :)
cc: @allenpradeep @nielm
In addition to Brian's review, @allenpradeep or @nielm, can you briefly look at the Java SpannerIO changes here?
@nielm Could you take a look at this thread? #12611 (comment)
Sorry for dropping the ball on this @piotr-szuberski. I'll look over the changes to the Python API this week.
The way WriteTransform is written, it's not possible to mix mutations that perform different operations. If we're going to have that limitation, I think this could be simplified if we just had a separate xlang transform for each write operation, and sent just the field values over the xlang boundary. Then the Java external transforms would be responsible for making the appropriate Mutation for each operation.
That would remove the need to construct a NamedTuple in RowToMutation.expand.
If we do want to keep using Mutations-as-Rows over xlang, there will need to be more work on the type system. The types for row and keyset should really be a union of the relevant types for each table that might be written to. Unfortunately, I'm not sure Python schemas are mature enough for users to be able to express this well. (Alternatively we might express mutations as a logical type that uses table/operation as a key for the union.)
I think what we should do for now is just have separate xlang transforms for each write operation (beam:external:java:spanner:{delete,insert,update,...}). We can file a follow-on jira to add a generic beam:external:java:spanner:write that will allow mixing mutations with various operations and tables, and note that it's blocked on support for unions of structs in Python/portable schemas. Does that sound reasonable?
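A hedged sketch of what one such per-operation transform might look like on the Python side; the URN and the config fields are illustrative assumptions, not the PR's final API:

import typing
from apache_beam.transforms.external import (
    ExternalTransform, NamedTupleBasedPayloadBuilder)

# Hypothetical URN, following the per-operation naming suggested above.
_INSERT_URN = 'beam:external:java:spanner:insert:v1'

class _InsertConfig(typing.NamedTuple):
    instance_id: str
    database_id: str
    table: str

class SpannerInsert(ExternalTransform):
    def __init__(self, instance_id, database_id, table, expansion_service=None):
        # The expansion service builds the Java insert Mutations, so only the
        # plain field values cross the xlang boundary.
        super().__init__(
            _INSERT_URN,
            NamedTupleBasedPayloadBuilder(
                _InsertConfig(
                    instance_id=instance_id,
                    database_id=database_id,
                    table=table)),
            expansion_service)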
[
    ('operation', unicode),
    ('table', unicode),
    ('keyset', List[row_type]) if is_delete else ('row', row_type),
We should make sure this works when schemas are specified via beam.Row as well; right now I think this will only work with the NamedTuple style.
You could use element_type = named_tuple_from_schema(schema_from_element_type(pcoll.element_type)) to make sure element_type is a NamedTuple that you can use here (it might be worth adding a convenience function for that pattern).
def named_tuple_from_schema(schema):
def schema_from_element_type(element_type):  # (type) -> schema_pb2.Schema
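As a sketch, that convenience function could be a one-liner built from the two helpers quoted above (the function name here is made up):

from apache_beam.typehints.schemas import (
    named_tuple_from_schema, schema_from_element_type)

def named_tuple_from_element_type(element_type):
    # Normalizes any schema'd element type (NamedTuple-style or beam.Row-style)
    # to a NamedTuple whose fields can be inspected and reused.
    return named_tuple_from_schema(schema_from_element_type(element_type))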
Done. I'm not sure whether you meant for me to add that convenience to schemas.py; I'm leaving it in spanner.py for now.
I think most of my comments in that review are actually not relevant any more if we go down the path of separate xlang transforms per operation.
@TheNeuralBit I promise that this is the last big review from me. I've just recently realized how much work I've made you do. Better late than never I guess!
This is moving towards what I had in mind, but I think we should just avoid having a concept of mutations on the Python side for now.
Run Python 3.7 PostCommit
LGTM, thank you @piotr-szuberski, and sorry for the incredibly long review cycle!
I just have one last request, which is to try to eliminate or minimize the places where we're suppressing the nullness warnings.
- https://beam.apache.org/roadmap/portability/

For more information specific to Flink runner see:
- https://beam.apache.org/documentation/runners/flink/
This information is getting duplicated across a lot of docstrings. It looks like #13317 will actually add similar information to the programming guide. I think we should re-write all these docstrings to refer to that once it's complete.
I agree - it refers to all the existing xlang transforms, so it'll be done in another PR?
Yeah it can be done in another PR. Filed BEAM-11269 to track this.
@SuppressWarnings({
  "nullness" // TODO(https://issues.apache.org/jira/browse/BEAM-10402)
})
Could you try to address any lingering nullness errors here and in the other files that have it suppressed? If there are any intractable issues, we could consider smaller @SuppressWarnings blocks around a few functions, but in general we should make sure that new classes pass the null checker.
Done. Oh, it was quite painful, as all of the row getters return a @Nullable value. Especially since checkNotNull doesn't work with the checker, and there's even no way to check for null inside a helper function (only if (var == null) { throw new NullPointerException("Null var"); } seems to work).
It doesn't even work across chained calls, as in this example:
@Nullable Object var = new Object();
if (var != null) {
  someObject.doSth().doChained(var); // the checker doesn't understand that var was checked for nullness
}
So it's quite unfriendly. In general I'm really excited about dealing with the NPE problem, but for now it adds much more complexity and reduces contributor friendliness. But I guess it's worth it, especially once the checker gets smarter and works with the Guava checks and chained functions (if that's even possible?).
Run Python 3.7 PostCommit
Run Python 3.7 PostCommit
Looks good, merging now. Thanks for all your work on this @piotr-szuberski :)
Thank you too for your reviews! :)