Add MATERIALIZED_VIEW as table type #192

Merged
37 commits merged into GoogleCloudDataproc:master on Jun 23, 2020

Conversation

Gaurangi94
Contributor

No description provided.
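Since the PR carries no description, here is a minimal, illustrative sketch of the kind of read this change is meant to allow once MATERIALIZED_VIEW is accepted as a table type. The project, dataset, and view names are placeholders, and depending on the connector version the view-related options (e.g. viewsEnabled) may also be required; this is not code from the PR.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: read a BigQuery materialized view through the connector's
// standard read path, the same way a regular table is read.
// "my_project.my_dataset.my_materialized_view" is a placeholder name.
object MaterializedViewReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mv-read-sketch").getOrCreate()
    val df = spark.read
      .format("bigquery")
      .option("table", "my_project.my_dataset.my_materialized_view")
      .load()
    df.printSchema()
    spark.stop()
  }
}
```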

@Gaurangi94
Contributor Author

/gcbrun

@Gaurangi94
Contributor Author

/gcbrun

@davidrabinowitz
Member

In the cloudbuild/cloudbuild.yaml file, add the following entry to the integration test step:

env:
  - 'GOOGLE_CLOUD_PROJECT=${_GOOGLE_CLOUD_PROJECT}'

This is needed to provide the environment variable to the test.
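For illustration, the entry would sit on the integration test step roughly as sketched below; the step name, image, and args are placeholders (only the env entry mirrors the suggestion above):

```yaml
steps:
  # ... earlier build steps ...
  - name: 'gcr.io/cloud-builders/some-builder'    # placeholder image
    id: 'integration-tests'                       # placeholder step id
    args: ['run-integration-tests']               # placeholder args
    env:
      - 'GOOGLE_CLOUD_PROJECT=${_GOOGLE_CLOUD_PROJECT}'
```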

@Gaurangi94
Contributor Author

/gcbrun

@Gaurangi94
Contributor Author

Gaurangi94 commented Jun 23, 2020

@davidrabinowitz Integration tests are failing with the following error:

Cause: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
[info] POST https://www.googleapis.com/bigquery/v2/projects/google.com:hadoop-cloud-dev/datasets
[info] {
[info]   "code" : 403,
[info]   "errors" : [ {
[info]     "domain" : "global",
[info]     "message" : "Access Denied: Project google.com:hadoop-cloud-dev: User does not have bigquery.datasets.create permission in project google.com:hadoop-cloud-dev.",
[info]     "reason" : "accessDenied"
[info]   } ],
[info]   "message" : "Access Denied: Project google.com:hadoop-cloud-dev: User does not have bigquery.datasets.create permission in project google.com:hadoop-cloud-dev.",
[info]   "status" : "PERMISSION_DENIED"
[info] }

Could you grant me the required permissions?
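(For reference, dataset creation is normally unblocked by granting the test identity a role that includes bigquery.datasets.create, such as BigQuery User; the project and service account below are placeholders, not the actual ones used by this build.)

```sh
# Illustrative only: grant the BigQuery User role (which includes
# bigquery.datasets.create) to a placeholder service account on a placeholder project.
gcloud projects add-iam-policy-binding my-test-project \
  --member='serviceAccount:test-runner@my-test-project.iam.gserviceaccount.com' \
  --role='roles/bigquery.user'
```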

@davidrabinowitz
Member

/gcbrun

@davidrabinowitz davidrabinowitz merged commit beaed86 into GoogleCloudDataproc:master Jun 23, 2020
YuvalMedina pushed a commit to YuvalMedina/spark-bigquery-connector that referenced this pull request Jul 9, 2020
author Yuval Medina <ymed@google.com> 1592603880 +0000
committer Yuval Medina <ymed@google.com> 1594336084 +0000

Created ProtobufUtils.java in order to convert data and schema from Spark format into protobuf format for ingestion into BigQuery. Created Spark to BigQuery schema conversion suites in SchemaConverters.java. Created ProtobufUtilsTest.java and SchemaConverterTest.java to comprehensively test both classes. Translated Scala testing code in SchemaConvertersSuite.scala into java, and merged with SchemaConverters.java.

Fixing SparkBigQueryConnectorUserAgentProvider initialization bug (GoogleCloudDataproc#186)

prepare release 0.16.1

prepare for next development iteration

Sectioned the schema converter file for easier readability. Added a Table creation method.

Wrote comprehensive tests to check YuvalSchemaConverters. Now needs to improve equality testing: assertEquals does not check for more than superficial equality, so if further testing is to be done without the help of logs, it would be useful to write an equality function for schemas.

Spark->BQ Schema working correctly. Blocked out Map functionality, as it is not supported. Made SchemaConverters, Schema-unit-tests more readable. Improved use of BigQuery library functions/iteration in SchemaConverters

Adding acceptance test on Dataproc (GoogleCloudDataproc#193)

In order to run the test: `sbt package acceptance:test`

Added support for materialized views (GoogleCloudDataproc#192)

Applying Google Java format on compile (GoogleCloudDataproc#203)

Created Spark-BigQuery schema converter and created BigQuery schema - ProtoSchema converter. Now awaiting comprehensive tests before merging with master.

Fixing SparkBigQueryConnectorUserAgentProvider initialization bug (GoogleCloudDataproc#186)

prepare release 0.16.1

prepare for next development iteration

Sectioned the schema converter file for easier readability. Added a Table creation method.

Wrote comprehensive tests to check YuvalSchemaConverters. Now needs to improve equality testing: assertEquals does not check for more than superficial equality, so if further testing is to be done without the help of logs, it would be useful to write an equality function for schemas.

Spark->BQ Schema working correctly. Blocked out Map functionality, as it is not supported. Made SchemaConverters, Schema-unit-tests more readable. Improved use of BigQuery library functions/iteration in SchemaConverters

Renamed SchemaConverters file, about to merge into David's SchemaConverters. Improved unit tests to check the toBigQueryColumn method, instead of the more abstract toBigQuerySchema (in order to check that each data type is working correctly). Tackling toProtoRows converter.

BigQuery->ProtoSchema converter is passing all unit tests.

Merged my (YuvalMedina) schema converters with David's (davidrabinowitz) SchemaConverters under spark.bigquery. Renamed my schema converters to SchemaConvertersDevelopment, in which I will continue working on a ProtoRows converter.

SchemaConvertersDevelopment is passing all tests on Spark -> Protobuf Descriptor conversion, even on nested structs. Unit tests need to be written to test actual row conversion (Spark values -> Protobuf values). Minor fixes to SchemaConverters.java: code needs to be smoothed out.

ProtoRows converter is passing 10 unit tests, sparkRowToProtoRow test must be revised to confirm that ProtoRows conversion is fully working. All functions doing Spark InternalRow -> ProtoRow and BigQuery Schema -> ProtoSchema conversions were migrated from SchemaConverters.java to ProtoBufUtils.java. SchemaConverters.java now contains both Spark -> BigQuery as well as the original BigQuery -> Spark conversions. ProtoBufUtilsTests.java was created to test for functions in ProtoBufUtils separately.

All conversion suites for Spark -> BigQuery, BigQuery -> ProtoSchema, and Spark rows -> ProtoRows are working correctly, and comprehensive tests were written. SchemaConvertersSuite.scala, which tests for BigQuery -> Spark conversions was translated into .java, and merged with SchemaConvertersTests.java.

Cleaned up the SchemaConverter tests that were translated from Scala. Added a nesting-depth limit to Records created by the Spark->BigQuery converter.

Deleted unnecessary comments

Deleted a leftover TODO comment in SchemaConvertersTests

Deleted some unnecessary tests.

Last commit before write-support implementation

Made minor edits according to davidrab@'s comments.
Added license heading to all files that were created. Need to test if binary types are converted correctly to protobuf format.

Integrated all of DavidRab's suggestions

Adds implementation for supporting columnar batch reads from Spark. (GoogleCloudDataproc#198)

This bypasses most of the existing translation code for the following reasons:
1.  I think there might be a memory leak because the existing code doesn't close the allocator.
2.  This avoids continuously recopying the schema.

I didn't delete the old code because it appears the BigQueryRDD still relies on it partially.

I also couldn't find instructions on formatting/testing (I couldn't find explicit unit tests
for existing arrow code, I'll update accordingly if pointers can be provided).

Changed tests as well

Changed tests as well

Added functionality to support more complex Spark types (such as StructTypes within ArrayTypes) in SchemaConverters and ProtobufUtils. There are known issues with Timestamp conversion into BigQuery format when integrating with BigQuery Storage Write API.

Added support for materialized views (GoogleCloudDataproc#192)

Applying Google Java format on compile (GoogleCloudDataproc#203)

Created Spark-BigQuery schema converter and created BigQuery schema - ProtoSchema converter. Now awaiting comprehensive tests before merging with master.

Fixing SparkBigQueryConnectorUserAgentProvider initialization bug (GoogleCloudDataproc#186)

prepare release 0.16.1

prepare for next development iteration

Sectioned the schema converter file for easier readability. Added a Table creation method.

Wrote comprehensive tests to check YuvalSchemaConverters. Now needs to improve equality testing: assertEquals does not check for more than superficial equality, so if further testing is to be done without the help of logs, it would be useful to write an equality function for schemas.

Spark->BQ Schema working correctly. Blocked out Map functionality, as it is not supported. Made SchemaConverters, Schema-unit-tests more readable. Improved use of BigQuery library functions/iteration in SchemaConverters

Renamed SchemaConverters file, about to merge into David's SchemaConverters. Improved unit tests to check the toBigQueryColumn method, instead of the more abstract toBigQuerySchema (in order to check that each data type is working correctly). Tackling toProtoRows converter.

BigQuery->ProtoSchema converter is passing all unit tests.

Merged my (YuvalMedina) schema converters with David's (davidrabinowitz) SchemaConverters under spark.bigquery. Renamed my schema converters to SchemaConvertersDevelopment, in which I will continue working on a ProtoRows converter.

SchemaConvertersDevelopment is passing all tests on Spark -> Protobuf Descriptor conversion, even on nested structs. Unit tests need to be written to test actual row conversion (Spark values -> Protobuf values). Minor fixes to SchemaConverters.java: code needs to be smoothed out.

ProtoRows converter is passing 10 unit tests, sparkRowToProtoRow test must be revised to confirm that ProtoRows conversion is fully working. All functions doing Spark InternalRow -> ProtoRow and BigQuery Schema -> ProtoSchema conversions were migrated from SchemaConverters.java to ProtoBufUtils.java. SchemaConverters.java now contains both Spark -> BigQuery as well as the original BigQuery -> Spark conversions. ProtoBufUtilsTests.java was created to test for functions in ProtoBufUtils separately.

All conversion suites for Spark -> BigQuery, BigQuery -> ProtoSchema, and Spark rows -> ProtoRows are working correctly, and comprehensive tests were written. SchemaConvertersSuite.scala, which tests for BigQuery -> Spark conversions was translated into .java, and merged with SchemaConvertersTests.java.

Cleaned up the SchemaConverter tests that were translated from Scala. Added a nesting-depth limit to Records created by the Spark->BigQuery converter.

Deleted unnecessary comments

Deleted a leftover TODO comment in SchemaConvertersTests

Deleted some unnecessary tests.

Last commit before write-support implementation

Made minor edits according to davidrab@'s comments.
Added license heading to all files that were created. Need to test if binary types are converted correctly to protobuf format.

Adds implementation for supporting columnar batch reads from Spark. (GoogleCloudDataproc#198)

This bypasses most of the existing translation code for the following reasons:
1.  I think there might be a memory leak because the existing code doesn't close the allocator.
2.  This avoids continuously recopying the schema.

I didn't delete the old code because it appears the BigQueryRDD still relies on it partially.

I also couldn't find instructions on formatting/testing (I couldn't find explicit unit tests
for existing arrow code, I'll update accordingly if pointers can be provided).

Added functionality to support more complex Spark types (such as StructTypes within ArrayTypes) in SchemaConverters and ProtobufUtils. There are known issues with Timestamp conversion into BigQuery format when integrating with BigQuery Storage Write API.

Revert "Merge branch 'writesupport' of https://github.com/YuvalMedina/spark-bigquery-connector into writesupport"

This reverts commit 65294d8, reversing
changes made to 814a1bf.

Integrated David Rab's second round of suggestions.

Ran sbt build
YuvalMedina pushed a commit to YuvalMedina/spark-bigquery-connector that referenced this pull request Jul 10, 2020
YuvalMedina pushed a commit to YuvalMedina/spark-bigquery-connector that referenced this pull request Jul 23, 2020
YuvalMedina pushed a commit to YuvalMedina/spark-bigquery-connector that referenced this pull request Jul 31, 2020
YuvalMedina pushed a commit to YuvalMedina/spark-bigquery-connector that referenced this pull request Aug 10, 2020
YuvalMedina pushed a commit to YuvalMedina/spark-bigquery-connector that referenced this pull request Aug 13, 2020