
[SUPPORT] BQ synch tool not working with HUDI bundle jar #10629

Closed
masthanmca opened this issue Feb 6, 2024 · 6 comments
Labels
gcp-support (issues related to google ecosystem) · meta-sync · on-call-triaged · priority:major (degraded perf; unable to move forward; potential bugs)

Comments

@masthanmca

Tips before filing an issue

  • Have you gone through our FAQs? yes

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced
BQ sync is not working with the Hudi bundle jar.
I want to enable BQ sync while ingesting data into a Hudi table using a manifest file.
To Reproduce

Steps to reproduce the behavior:

  1. Create a data frame with any schema.
  2. Use the below options for BQ sync along with the other default Hudi configurations.
  3.     hiveConfigs.put("org.apache.hudi.gcp.bigquery.BigQuerySyncTool", "true")
     hiveConfigs.put("hoodie.gcp.bigquery.sync.project_id", bqSyncProjectId)
     hiveConfigs.put("hoodie.gcp.bigquery.sync.dataset_name", bqSyncDatasetName)
     hiveConfigs.put("hoodie.gcp.bigquery.sync.table_name", hoodieHiveSyncTable)
     hiveConfigs.put("hoodie.gcp.bigquery.sync.dataset_location", "us")
     hiveConfigs.put("hoodie.gcp.bigquery.sync.source_uri", bqSyncSourceUri)
     hiveConfigs.put("hoodie.gcp.bigquery.sync.source_uri_prefix", bqSyncSourceUriPrefix)
     hiveConfigs.put("hoodie.gcp.bigquery.sync.base_path", bqSyncBasePath)
     hiveConfigs.put("hoodie.gcp.bigquery.sync.partition_fields", hoodieHiveSyncPartitionFields)
     hiveConfigs.put("hoodie.gcp.bigquery.sync.use_bq_manifest_file", "true")
    
  4. Write the data frame to the Hudi table:
     ds.write.format(HudiFormat).options(hoodieConfigs).options(hiveConfigs).mode(writeMode).save(location)

Expected behavior


Environment Description

  • Hudi version : 0.14.0

  • Spark version : 3.3.2

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : GCS

  • Running on Docker? (yes/no) :no

Additional context


Stacktrace


No error, but the external table is not created in BigQuery.

@ad1happy2go
Collaborator

@masthanmca
Is this the first time you are facing this issue, or did it start after an upgrade?

Your configurations also look wrong. Where did you get these, and which doc did you refer to?
Can you refer to https://hudi.apache.org/docs/gcp_bigquery/ ?

@codope codope added meta-sync gcp-support issues related to google ecosystem priority:major degraded perf; unable to move forward; potential bugs labels Feb 7, 2024
@abhishekshenoy

abhishekshenoy commented Feb 19, 2024

Facing the same issue; it does not work with org.apache.hudi:hudi-spark3.3-bundle_2.12:0.14.1.

The Hudi write to the path works and Hive sync works, but BQ sync does not.

For now, based on a flag, I have taken the route of manually performing the BQ sync with BigQuerySyncTool after the dataframe.write:

#9355 (comment)

@ad1happy2go
Collaborator

@abhishekshenoy @masthanmca That (#9355 (comment)), i.e. BigQuerySyncTool, is the correct way of doing BQ sync with batch jobs.

The other way is doing this with HudiStreamer.

@abhishekshenoy

abhishekshenoy commented Feb 20, 2024

@ad1happy2go @the-other-tim-brown

But shouldn't that be called internally when we are providing the Hudi BQ configs and enabling META_SYNC_ENABLED?

In my case we use df.write.options(hudiAndHiveAndBQConfigs).save(), and hudiAndHiveAndBQConfigs has both the Hive and BQ related configs.

*But still only Hive sync happens implicitly.*

Is it by design that as part of our write function we need to perform both:

df.write.options(hudiAndHiveAndBQConfigs).save()
new BigQuerySyncTool(getBigQueryProps).syncHoodieTable()
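The manual workaround above can be sketched in Scala as follows. This is a hedged sketch, not the thread's exact code: `getBigQueryProps` is the commenter's own helper name, all property values are placeholders, and it assumes the Hudi 0.14.x `BigQuerySyncTool` constructor that accepts `java.util.Properties`; it needs a running Spark job with GCS and BigQuery access to actually execute.

```scala
import java.util.Properties
import org.apache.hudi.gcp.bigquery.BigQuerySyncTool

// Build the BQ sync properties by hand (all values below are placeholders).
def getBigQueryProps: Properties = {
  val props = new Properties()
  props.setProperty("hoodie.gcp.bigquery.sync.project_id", "my-gcp-project")            // placeholder
  props.setProperty("hoodie.gcp.bigquery.sync.dataset_name", "my_dataset")              // placeholder
  props.setProperty("hoodie.gcp.bigquery.sync.dataset_location", "us")
  props.setProperty("hoodie.gcp.bigquery.sync.table_name", "my_table")                  // placeholder
  props.setProperty("hoodie.gcp.bigquery.sync.source_uri", "gs://bucket/table/dt=*")    // placeholder
  props.setProperty("hoodie.gcp.bigquery.sync.source_uri_prefix", "gs://bucket/table/") // placeholder
  props.setProperty("hoodie.gcp.bigquery.sync.base_path", "gs://bucket/table")          // placeholder
  props.setProperty("hoodie.gcp.bigquery.sync.use_bq_manifest_file", "true")
  props
}

// After df.write...save(basePath) completes, trigger the BQ sync explicitly.
new BigQuerySyncTool(getBigQueryProps).syncHoodieTable()
```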

@ad1happy2go
Collaborator

@masthanmca @abhishekshenoy I went through the code and identified that we need to set both class names to do both meta syncs together. The default value for the below prop is just Hive sync. I tried with Hudi version 0.14.1, and after the write and the Hive sync completed, it tried to do the BigQuery sync as well.

"hoodie.meta.sync.client.tool.class" : "org.apache.hudi.hive.HiveSyncTool,org.apache.hudi.gcp.bigquery.BigQuerySyncTool"
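Applied to the write path from the original report, the fix can be sketched as below. This is a hedged sketch: the option keys come from this thread and the Hudi BigQuery sync docs, all values are placeholders, and `hoodieConfigs`/`basePath` stand in for the rest of the writer's existing configuration.

```scala
// Hedged sketch: enable meta sync and list BOTH sync tool classes, so that
// Hive sync and BigQuery sync both run as part of the Spark write.
val bqSyncOptions = Map(
  "hoodie.datasource.meta.sync.enable" -> "true",
  "hoodie.meta.sync.client.tool.class" ->
    "org.apache.hudi.hive.HiveSyncTool,org.apache.hudi.gcp.bigquery.BigQuerySyncTool",
  "hoodie.gcp.bigquery.sync.project_id"        -> "my-gcp-project",          // placeholder
  "hoodie.gcp.bigquery.sync.dataset_name"      -> "my_dataset",              // placeholder
  "hoodie.gcp.bigquery.sync.dataset_location"  -> "us",
  "hoodie.gcp.bigquery.sync.table_name"        -> "my_table",                // placeholder
  "hoodie.gcp.bigquery.sync.source_uri"        -> "gs://bucket/table/dt=*",  // placeholder
  "hoodie.gcp.bigquery.sync.source_uri_prefix" -> "gs://bucket/table/",      // placeholder
  "hoodie.gcp.bigquery.sync.base_path"         -> "gs://bucket/table",       // placeholder
  "hoodie.gcp.bigquery.sync.use_bq_manifest_file" -> "true"
)

// Single write; both meta syncs run implicitly after it.
ds.write.format("hudi")
  .options(hoodieConfigs)
  .options(bqSyncOptions)
  .mode("append")
  .save(basePath)
```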

@ad1happy2go
Collaborator

@masthanmca Closing out this issue as I confirmed it works. Please reopen in case you still see this issue.

@codope codope closed this as completed Feb 27, 2024