# Apache Spark: Query data as an engineer

### Overview

Snowflake’s native integration with Apache Iceberg lets organizations build an open, interoperable lakehouse architecture. Snowflake supports batch and streaming ingestion, transformation pipelines, and analytics on Iceberg tables. Snowflake Open Catalog, a managed service for Apache Polaris, adds role-based access control so that multiple engines can work on the same tables under consistent governance. This notebook connects Apache Spark to Open Catalog with an engineer's credentials and queries an Iceberg table.

### Step-by-Step Guide

For prerequisites and environment setup, please refer to the [QuickStart Guide](https://quickstarts.snowflake.com/guide/apache-iceberg-snowflake-open-catalog-snowpipe-streaming/index.html).

In [None]:
!pip install findspark==2.0.1 pyspark==3.5.0
# Spark 3.5 supports Java 8, 11, and 17, so install an LTS JDK rather than Java 13
!DEBIAN_FRONTEND=noninteractive apt-get install -y openjdk-17-jdk-headless
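
PySpark starts a JVM, so it can help to confirm which Java version is on the path before building the session. The following is a minimal sketch rather than part of the quickstart: `parse_java_major` and `installed_java_major` are hypothetical helper names, and the supported set reflects the Java 8, 11, and 17 support listed in the Spark 3.5 documentation.

```python
import re
import shutil
import subprocess

# Java major versions that the Spark 3.5 documentation lists as supported.
SUPPORTED_JAVA_MAJORS = {8, 11, 17}


def parse_java_major(version_output):
    """Extract the Java major version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 reports itself as "1.8.0_xxx".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Return the major version of the `java` on PATH, or None if absent."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)


major = installed_java_major()
if major is None:
    print("No java executable found on PATH")
elif major not in SUPPORTED_JAVA_MAJORS:
    print(f"Java {major} found; Spark 3.5 documents support for 8, 11, and 17")
else:
    print(f"Java {major} found")
```

If several JDKs are installed, the one found first on `PATH` (or the one `JAVA_HOME` points to) is the one Spark will use.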

In [None]:
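# TODO: Update POLARIS_ENGINEER_CLIENT_ID and POLARIS_ENGINEER_CLIENT_SECRET with your values.
# The account in the catalog URI and the client.region setting below are also specific to your deployment.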
from pyspark.sql import SparkSession

POLARIS_ENGINEER_CLIENT_ID='5sT7EAyxxxxxxxxxxxxxxxx'
POLARIS_ENGINEER_CLIENT_SECRET='4wAEK0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
WAREHOUSE = 'snowflake_catalog'
PRINCIPAL_ENGINEER_ROLE = 'spark_engineer_role'

try:
    # Build a Spark session whose default catalog ('polaris') is the Open Catalog
    # Iceberg REST endpoint. Storage credentials are vended by the catalog, and the
    # scope restricts the session to the engineer principal role.
    spark = SparkSession.builder.appName('iceberg') \
        .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160') \
        .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
        .config('spark.sql.defaultCatalog', 'polaris') \
        .config('spark.sql.catalog.polaris', 'org.apache.iceberg.spark.SparkCatalog') \
        .config('spark.sql.iceberg.vectorization.enabled', 'false') \
        .config('spark.sql.catalog.polaris.type', 'rest') \
        .config('spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation', 'vended-credentials') \
        .config('spark.sql.catalog.polaris.uri', 'https://obb44892.snowflakecomputing.com/polaris/api/catalog') \
        .config('spark.sql.catalog.polaris.credential', f"{POLARIS_ENGINEER_CLIENT_ID}:{POLARIS_ENGINEER_CLIENT_SECRET}") \
        .config('spark.sql.catalog.polaris.warehouse', WAREHOUSE) \
        .config('spark.sql.catalog.polaris.scope', f"PRINCIPAL_ROLE:{PRINCIPAL_ENGINEER_ROLE}") \
        .config('spark.sql.catalog.polaris.client.region', 'us-west-2') \
        .getOrCreate()
except Exception as e:
    # Report the failure and re-raise so later cells do not hit an undefined `spark`
    print(e)
    raise

# Last expression in the cell, so the notebook renders the session summary
spark
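
The cell above keeps the client ID and secret in the notebook itself. One common alternative is to read them from environment variables and assemble the catalog settings as a dictionary. The sketch below is illustrative only and is not the quickstart's method: `polaris_catalog_conf` is a hypothetical helper, the environment variable names are assumptions, and `<account>` is a placeholder for your own Open Catalog account.

```python
import os


def polaris_catalog_conf(client_id, client_secret, uri, warehouse, principal_role,
                         catalog="polaris"):
    """Return the Iceberg REST catalog settings used above as a plain dict."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.defaultCatalog": catalog,
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.header.X-Iceberg-Access-Delegation": "vended-credentials",
        f"{prefix}.uri": uri,
        f"{prefix}.credential": f"{client_id}:{client_secret}",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.scope": f"PRINCIPAL_ROLE:{principal_role}",
    }


# Hypothetical environment variable names; the fallbacks are placeholders only.
conf = polaris_catalog_conf(
    client_id=os.environ.get("POLARIS_ENGINEER_CLIENT_ID", "<client-id>"),
    client_secret=os.environ.get("POLARIS_ENGINEER_CLIENT_SECRET", "<client-secret>"),
    uri=os.environ.get(
        "POLARIS_CATALOG_URI",
        "https://<account>.snowflakecomputing.com/polaris/api/catalog",
    ),
    warehouse="snowflake_catalog",
    principal_role="spark_engineer_role",
)

# Print the keys only, so the secret never lands in notebook output.
for key in sorted(conf):
    print(key)
```

Each pair can then be passed to `SparkSession.builder.config(key, value)` in a loop before calling `getOrCreate()`.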

In [None]:
spark.sql("SHOW TABLES IN DASH_DB.RAW").show(truncate=False)

In [None]:
spark.sql("SELECT * FROM DASH_DB.RAW.STREAMING_VEHICLE_EVENTS").show(truncate=False)
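
For a table that receives streaming writes, it may be useful to bound the preview explicitly and to look at the table's commit history. The sketch below only builds the SQL strings: `preview_sql` and `snapshots_sql` are hypothetical helpers, the `snapshots` metadata table is a standard Iceberg feature, and whether the engineer role is allowed to read it in your catalog is an assumption.

```python
def preview_sql(table, limit=10):
    """Build a bounded preview query for a fully qualified table name."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {table} LIMIT {limit}"


def snapshots_sql(table):
    """Build a query against the Iceberg `snapshots` metadata table."""
    return (
        "SELECT committed_at, snapshot_id, operation "
        f"FROM {table}.snapshots ORDER BY committed_at"
    )


TABLE = "DASH_DB.RAW.STREAMING_VEHICLE_EVENTS"
print(preview_sql(TABLE))
print(snapshots_sql(TABLE))

# With the `spark` session created earlier, these would run as:
# spark.sql(preview_sql(TABLE)).show(truncate=False)
# spark.sql(snapshots_sql(TABLE)).show(truncate=False)
```

Each streaming commit should appear as a row in the snapshot listing, which gives a quick way to see whether new data is still arriving.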