-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing Datasource v2 writing support #283
Implementing Datasource v2 writing support #283
Conversation
/gcbrun |
...tion-test/java/com/google/cloud/spark/bigquery/v2/SparkBigQueryConnectorIntegrationTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial set of feedback. I'm only half way through.
connector/src/main/java/com/google/cloud/spark/bigquery/AvroSchemaConverter.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/AvroSchemaConverter.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/AvroSchemaConverter.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/AvroSchemaConverter.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/AvroSchemaConverter.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/v2/IntermediateRecordWriter.java
Outdated
Show resolved
Hide resolved
connector/src/test/java/com/google/cloud/spark/bigquery/BinaryIteratorsTest.java
Outdated
Show resolved
Hide resolved
connector/src/test/scala/com/google/cloud/spark/bigquery/it/AdHocSuite.scala
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/v2/AvroIntermediateRecordWriter.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/v2/BigQueryDataSourceReader.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more feedback.
connector/src/main/java/com/google/cloud/spark/bigquery/AvroSchemaConverter.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/AvroSchemaConverter.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/AvroSchemaConverter.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/AvroSchemaConverter.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/v2/BigQueryDataSourceV2.java
Outdated
Show resolved
Hide resolved
import org.slf4j.LoggerFactory; | ||
|
||
/** | ||
* Responsible for recursively deleting the intermediate path. Implementing Runnable in order to act |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is a good idea to run data cleanup as a shutdown hook. Generally shutdown hooks should run quickly and it's entirely possible that the GCS clients and associated thread pools may be in the process of shutting down, so it's hard to guarantee that this cleanup actually happened.
Should we consider alternatives? For instance, can we upload the temporary files with random prefixes and use GCS lifecycle policies to automatically clean them up after a safe window?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done just as a safe-guard in case BigQueryIndirectDataSourceWriter.cleanTemporaryGcsPathIfNeeded
has not been called by an error. Relying on GCS object lifecycle will require the bucket to allow lifecycle management, which is not guaranteed.
connector/src/main/java/com/google/cloud/spark/bigquery/v2/ParquetIntermediateRecordWriter.java
Outdated
Show resolved
Hide resolved
065cd03
to
7cc33d3
Compare
/gcbrun |
1 similar comment
/gcbrun |
/gcbrun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general comment, 1) add comments for all public classes and methods; 2) break down a huge PR into multiple PRs, which will be much easier for review.
...tion-test/java/com/google/cloud/spark/bigquery/v2/SparkBigQueryConnectorIntegrationTest.java
Outdated
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/v2/BigQueryDataSourceReaderModule.java
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/v2/BigQueryDataSourceV2.java
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/v2/BigQueryDataSourceV2.java
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/v2/BigQueryDataSourceWriterModule.java
Show resolved
Hide resolved
connector/src/main/java/com/google/cloud/spark/bigquery/v2/BigQueryDataSourceWriterModule.java
Outdated
Show resolved
Hide resolved
...ector/src/main/java/com/google/cloud/spark/bigquery/v2/BigQueryIndirectDataSourceWriter.java
Show resolved
Hide resolved
63dc78a
to
737fcdb
Compare
/gcbrun |
No description provided.