# Apache Spark in BigQuery Studio

This notebook demonstrates how to use Apache Spark with your data in BigQuery.

You will use the `gcp_lakehouse_ds.order_items` table, which is a managed [BigQuery table for Apache Iceberg](https://cloud.google.com/bigquery/docs/iceberg-tables).

You can also install additional libraries via `pip` as desired.

Set your project ID and location.

In [1]:
project_id = "" # @param {type:"string"}
location = "" # @param {type:"string"}
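
Both values are empty by default, so it can help to fail early if either is missing before a session is requested. The following is a minimal sketch of such a check; the helper name `require_setting` and the example values are hypothetical and not part of any Google library.

```python
def require_setting(name: str, value: str) -> str:
    """Return the value with surrounding whitespace removed, or raise if it is empty."""
    cleaned = value.strip()
    if not cleaned:
        raise ValueError(f"{name} must be set before creating the Spark session")
    return cleaned


# Placeholder values for illustration only; substitute your own.
example_project_id = require_setting("project_id", " my-project ")
example_location = require_setting("location", "us-central1")
print(example_project_id, example_location)
```

In the notebook, the same check could be applied to `project_id` and `location` right after the parameter cell.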

Create a Spark session that is configured to use BigQuery Metastore as its Iceberg catalog.

In [None]:
from google.cloud.spark_connect import GoogleSparkSession
from google.cloud.dataproc_v1 import Session

session = Session()

catalog = "lakehouse_catalog"

session.runtime_config.properties[f"spark.sql.catalog.{catalog}"] = "org.apache.iceberg.spark.SparkCatalog"
session.runtime_config.properties[f"spark.sql.catalog.{catalog}.catalog-impl"] = "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog"
session.runtime_config.properties[f"spark.sql.catalog.{catalog}.gcp_project"] = project_id
session.runtime_config.properties[f"spark.sql.catalog.{catalog}.gcp_location"] = location
session.runtime_config.properties[f"spark.sql.catalog.{catalog}.warehouse"] = f"gs://lakehouse-warehouse-{project_id}/warehouse"

# Create a Spark session with the new configuration.
spark = GoogleSparkSession.builder.googleSessionConfig(session).getOrCreate()

Creating Spark session. It may take few minutes.
Interactive Session Detail View:  https://console.cloud.google.com/dataproc/interactive/us-central1/sc-20250402-202523-mup834?project=data-cloud-demo7
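
The five catalog properties set above all share the `spark.sql.catalog.<catalog>` prefix, so they can be assembled in one place and inspected before a session is created. This is a sketch only: the function name `iceberg_catalog_properties` is hypothetical, the project and location values are placeholders, and the property keys and class names are copied from the cell above.

```python
def iceberg_catalog_properties(
    catalog: str, project_id: str, location: str, warehouse: str
) -> dict:
    """Build the Spark properties for an Iceberg catalog backed by BigQuery Metastore."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog",
        f"{prefix}.gcp_project": project_id,
        f"{prefix}.gcp_location": location,
        f"{prefix}.warehouse": warehouse,
    }


# Placeholder values for illustration only.
props = iceberg_catalog_properties(
    catalog="lakehouse_catalog",
    project_id="my-project",
    location="us-central1",
    warehouse="gs://lakehouse-warehouse-my-project/warehouse",
)
for key, value in props.items():
    print(f"{key} = {value}")
```

Each entry could then be copied into `session.runtime_config.properties` key by key, as the cell above does.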


View the tables in your BigQuery dataset.

In [None]:
# Select the Iceberg catalog, then the namespace, and list the tables in it.
spark.sql(f"USE {catalog};")
spark.sql(f"USE {project_id};")
spark.sql("SHOW TABLES;").show()
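
Once the tables are listed, a natural next step is to preview one of them, such as the `gcp_lakehouse_ds.order_items` table mentioned in the introduction. The sketch below only builds the query string, so it runs without a live session. The helper names `quote_identifier` and `table_preview_query` are hypothetical, and it assumes the table is reachable as `catalog.namespace.table`.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a Spark SQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def table_preview_query(catalog: str, namespace: str, table: str, limit: int = 10) -> str:
    """Return a SELECT statement that previews the first rows of a table."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    qualified = ".".join(quote_identifier(part) for part in (catalog, namespace, table))
    return f"SELECT * FROM {qualified} LIMIT {int(limit)}"


query = table_preview_query("lakehouse_catalog", "gcp_lakehouse_ds", "order_items", limit=5)
print(query)
```

With a running session, the resulting string could be passed to `spark.sql(query).show()`.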

Try asking Gemini how to query a specific table, such as one in your generated `thelook_YOUR_PROJECT_ID` dataset.

In [None]:
# prompt: how do I add 2+2?