
Clarification on Billing and Improved README.md Explanation #1172

Open
LakshmanKishore opened this issue Jan 17, 2024 · 1 comment
@LakshmanKishore
Billing Concerns:

I'm a user of the Spark BigQuery Connector, and I've been using it in two different scenarios:

  1. Directly executing a SQL query using option("query", sql).
  2. Loading a table using load("project.dataset.table"), creating a temporary view, and then querying the view using Spark SQL.

In the first scenario, I observed a new BigQuery job being created, and the bytes billed were visible in the BigQuery console. However, in the second scenario, where I loaded the table and queried a temporary view, I didn't see a dedicated BigQuery job. Despite this, I'm uncertain whether there are still billing implications for the operations performed.

Request for Clarification:

  1. Does the loading of the entire table into Spark using load("project.dataset.table") incur any billing?
  2. How does the Spark BigQuery Connector optimize operations, and are there scenarios where operations don't result in explicit BigQuery jobs but still incur billing?
  3. Upon transitioning from the free tier to a billing account, what are the potential costs associated with creating temporary tables in BigQuery and querying from them using the Spark BigQuery Connector?

Example 1: Direct SQL Query

df_direct_query = (
    spark.read.format("bigquery")
    .option("credentialsFile", "creds.json")
    .option("parentProject", "your-project")
    .option("query", "SELECT * FROM project.dataset.table")
    .load()
)
df_direct_query.show()

Example 2: Loading Table and Querying Temporary View

df_load_table = (
    spark.read.format("bigquery")
    .option("credentialsFile", "creds.json")
    .option("parentProject", "your-project")
    .load("your-project.dataset.table")
)
df_load_table.createOrReplaceTempView("temp_view")
spark.sql("SELECT * FROM temp_view").show()

Documentation Enhancement:

For the benefit of new users, could the README.md file provide clearer information on the billing process, especially in scenarios where no explicit job is created but operations might still incur costs?

Additional Information:

  • Account Type: I'm using a free tier account, and I'm seeking clarification on the billing implications for different operations.
  • Observations: I noticed bytes billed for explicit BigQuery jobs but not for loading the entire table into Spark.
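For anyone trying to reason about question 3 (potential costs after leaving the free tier), on-demand query cost scales linearly with bytes billed. Below is a minimal back-of-the-envelope sketch; the $6.25/TiB rate is an assumption based on published on-demand pricing and should be checked against the current BigQuery pricing page:

```python
# Rough on-demand cost estimator from a bytes-billed figure.
# ASSUMPTION: $6.25 per TiB on-demand rate; verify against current pricing.
ON_DEMAND_USD_PER_TIB = 6.25
TIB = 1024 ** 4  # bytes in one tebibyte


def estimate_on_demand_cost(bytes_billed: int) -> float:
    """Estimate the on-demand query cost in USD for a given bytes-billed value."""
    return bytes_billed / TIB * ON_DEMAND_USD_PER_TIB


# Example: a query that billed 10 GiB costs roughly six cents.
print(round(estimate_on_demand_cost(10 * 1024 ** 3), 4))  # → 0.061
```

Note that Storage Read API traffic (example 2) is billed on a separate per-bytes-read basis, so this estimator only covers on-demand query jobs like example 1.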
@dmedora (Contributor) commented Mar 19, 2024

As a note to others who come across this before the docs are updated:

  • Example 1: as noted, this creates a BigQuery query job, so you'd be billed either by bytes scanned (if using on-demand pricing) or by query slot usage (if using reservations).
  • Example 2: if you read a table directly, with no intermediate materialization required (i.e. not using a query and not reading a view), it will be read using the Storage Read API. This is still billed. You can confirm that a read session was created by searching Cloud Logging with the filter protoPayload.methodName="google.cloud.bigquery.storage.v1.BigQueryRead.CreateReadSession".
    See the BQ pricing docs for more info on query and Read API pricing.
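To check whether a direct table read created a Storage Read API session, the Cloud Logging filter above can be queried from the gcloud CLI. A sketch, assuming you are authenticated and `your-project` is a placeholder for your project ID:

```shell
# List recent read-session creations in the project.
# Requires gcloud auth and permission to read logs (logging.logEntries.list).
gcloud logging read \
  'protoPayload.methodName="google.cloud.bigquery.storage.v1.BigQueryRead.CreateReadSession"' \
  --project=your-project \
  --limit=10 \
  --format='value(timestamp, protoPayload.authenticationInfo.principalEmail)'
```

Each returned entry corresponds to one read session, i.e. one billed Storage Read API read of a table.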
