Configuring DataProc Cluster to Publish BigQuery Metrics on GCP #1233

Open
gmiano opened this issue May 24, 2024 · 1 comment

gmiano commented May 24, 2024

Hello,

I am currently using the spark-bigquery-connector with a DataProc cluster for my data processing tasks. I would like to configure the DataProc cluster to publish BigQuery metrics to Google Cloud Platform (GCP) for better monitoring and analysis.

I understand that DataProc supports custom metrics as described in the custom metric collection guide. Specifically, I am interested in capturing metrics related to scanned bytes, as this information is crucial for accountability and tracking the cost of the service.

Could you please provide guidance on how to set up the DataProc cluster to enable publishing BigQuery metrics, particularly scanned bytes? Specifically, I am looking for details on:

Any required configurations or properties that need to be set in the DataProc cluster.
How to ensure that the metrics are properly published to GCP.
Any additional steps or best practices for setting up metric collection and monitoring for BigQuery jobs executed via DataProc.
Thank you for your assistance!

Best regards,
Giuseppe.


isha97 commented Jun 3, 2024

Hi @gmiano,

Publishing BigQuery metrics is currently on our roadmap.
In the meantime, the connector does emit metrics for the read streams as log entries, which can be accessed directly. Please use the latest connector version, 0.39.0.
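For reference, here is a minimal PySpark sketch of a read that uses the connector pinned to 0.39.0. The session config and table name are illustrative only; on Dataproc you may instead pin the connector version through cluster or job properties rather than in code.

```python
from pyspark.sql import SparkSession

# Hypothetical session setup; on Dataproc the connector is often supplied via
# cluster/job properties (e.g. spark.jars.packages) instead of in code.
spark = (
    SparkSession.builder
    .appName("bq-read-metrics-example")
    .config(
        "spark.jars.packages",
        "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.39.0",
    )
    .getOrCreate()
)

# Illustrative public table; the read-stream metrics produced by this read
# show up in the driver/executor logs and in the Spark UI.
df = (
    spark.read.format("bigquery")
    .option("table", "bigquery-public-data.samples.shakespeare")
    .load()
)
print(df.count())
```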

You can grep the logs for "ReadStream Metrics" to get those metrics.
The read session metrics are also available in the Spark UI.
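As a rough sketch, the log lines can also be pulled programmatically with the google-cloud-logging Python client, assuming your cluster routes driver/executor logs to Cloud Logging; the project id and filter below are illustrative and may need adjusting to your log routing.

```python
from google.cloud import logging as cloud_logging

# Hypothetical project id; adjust the filter to match how your cluster's
# driver/executor logs are routed into Cloud Logging.
client = cloud_logging.Client(project="my-project")
log_filter = (
    'resource.type="cloud_dataproc_cluster" '
    'AND textPayload:"ReadStream Metrics"'
)

for entry in client.list_entries(filter_=log_filter):
    # Each matching entry carries the connector's read-stream metrics;
    # parse the payload to extract the values you need (e.g. scanned bytes).
    print(entry.payload)
```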
Currently, you would need to set up your own log processing if you would like to build dashboards on top of those metrics.
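As one illustration of that kind of custom processing, here is a minimal sketch that forwards a parsed value to Cloud Monitoring as a custom metric. The metric type, label, and bytes_read value are placeholders, not something the connector publishes for you.

```python
import time

from google.cloud import monitoring_v3

# Placeholder values; in practice these would come from parsing the
# "ReadStream Metrics" log lines shown above.
project_id = "my-project"
bytes_read = 123_456_789

client = monitoring_v3.MetricServiceClient()

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/spark_bigquery/bytes_read"  # hypothetical metric type
series.metric.labels["job_id"] = "my-dataproc-job"  # hypothetical label
series.resource.type = "global"
series.resource.labels["project_id"] = project_id

now = time.time()
seconds = int(now)
nanos = int((now - seconds) * 1e9)
interval = monitoring_v3.TimeInterval({"end_time": {"seconds": seconds, "nanos": nanos}})
point = monitoring_v3.Point({"interval": interval, "value": {"int64_value": bytes_read}})
series.points = [point]

# Write one data point of the custom metric to the project.
client.create_time_series(name=f"projects/{project_id}", time_series=[series])
```

A dashboard in Cloud Monitoring can then chart that custom metric per job, which covers the cost-tracking use case until native metric publishing lands.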

isha97 added the enhancement (New feature or request) label Jun 3, 2024