# Spark Integration Example

This notebook demonstrates how to connect to Spark and interact with Iceberg tables using Spark Connect.

## Prerequisites

**⚠️ This notebook requires the integration test infrastructure to be running.**

To start the infrastructure, use one of these commands:
- `make test-integration-setup` - Start just the infrastructure
- `make notebook-infra` - Start infrastructure and launch JupyterLab

The infrastructure includes:
- Spark Connect server (port 15002)
- Iceberg REST catalog
- S3-compatible storage (MinIO)
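If a cell below fails with a connection error, the server may not be up yet. One quick check is to attempt a plain TCP connection to the Spark Connect port. This is a minimal sketch that uses only Python's standard library. The helper name `is_port_open` is hypothetical, and it only confirms that something is listening on port 15002. It does not tell you whether the REST catalog or MinIO are healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unresolvable: treat as "not up"
        return False


# 15002 is the Spark Connect port listed above
if not is_port_open("localhost", 15002):
    print("Spark Connect is not reachable on localhost:15002; "
          "run `make test-integration-setup` first.")
```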

In [None]:
# Import required libraries
from pyspark.sql import SparkSession

## Connecting to Spark

Create a SparkSession that talks to the Spark Connect server on `localhost:15002`, then list the catalogs registered with it.
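The next cell hard-codes `sc://localhost:15002`. If the server runs elsewhere, such as on another host or inside a container network, the sketch below reads the URL from the `SPARK_REMOTE` environment variable and falls back to the local default. `spark_remote_url` is a hypothetical helper. PySpark 3.4+ also recognizes `SPARK_REMOTE` when no remote is set on the builder. Passing the URL explicitly, however, keeps the choice visible in the notebook.

```python
import os

# The Spark Connect endpoint this notebook uses by default
DEFAULT_REMOTE = "sc://localhost:15002"


def spark_remote_url(default: str = DEFAULT_REMOTE) -> str:
    """Return the Spark Connect URL, preferring a non-empty SPARK_REMOTE env var."""
    # An unset or empty SPARK_REMOTE falls back to the local default
    return os.environ.get("SPARK_REMOTE", "").strip() or default


# Hypothetical usage in the connection cell:
# spark = SparkSession.builder.remote(spark_remote_url()).getOrCreate()
```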

In [None]:
# Create a SparkSession against the remote Spark Connect server
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# List the catalogs registered with this session
spark.sql("SHOW CATALOGS").show()

In [None]:
# Show namespaces (databases) in the current catalog
spark.sql("SHOW NAMESPACES").show()

In [None]:
# Show tables in the default namespace
spark.sql("SHOW TABLES FROM default").show()
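`SHOW TABLES FROM default` covers only one namespace. To list every table in the current catalog, one approach is to loop over `SHOW NAMESPACES` and run `SHOW TABLES IN` for each namespace. This sketch makes three assumptions: the Spark 3.x result layouts (`SHOW TABLES` rows are `namespace, tableName, isTemporary`), single-level namespace names, and that temporary views should be skipped. `list_all_tables` is a hypothetical helper.

```python
def list_all_tables(spark):
    """Return (namespace, table) pairs for every namespace in the current catalog."""
    tables = []
    # SHOW NAMESPACES returns one row per namespace; column 0 holds its name
    for ns_row in spark.sql("SHOW NAMESPACES").collect():
        namespace = ns_row[0]
        # SHOW TABLES rows are (namespace, tableName, isTemporary)
        for tbl_row in spark.sql(f"SHOW TABLES IN {namespace}").collect():
            if not tbl_row[2]:  # skip temporary views
                tables.append((namespace, tbl_row[1]))
    return tables


# Usage with the session created earlier:
# for ns, name in list_all_tables(spark):
#     print(f"{ns}.{name}")
```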

## Exploring Iceberg Tables

Use Spark SQL commands to explore Iceberg table structure and metadata.

In [None]:
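# Describe the table's columns; truncate=False keeps long type names intact.
# show() prints at most 20 rows by default, so pass a larger n for wide tables.
spark.sql("DESCRIBE TABLE default.test_all_types").show(truncate=False)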
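Beyond `DESCRIBE`, Iceberg exposes metadata tables. You query one by appending its name to the table identifier, for example `default.test_all_types.snapshots`. The sketch below assumes that `default.test_all_types` resolves in an Iceberg catalog that is also the session's current catalog. If it does not, prefix the identifier with the catalog name reported by `SHOW CATALOGS`. The helper names are hypothetical, and the sketch queries only three of Iceberg's metadata tables.

```python
# A subset of Iceberg's metadata tables (others include manifests and partitions)
ICEBERG_METADATA_TABLES = ("snapshots", "history", "files")


def metadata_table_queries(table: str) -> dict:
    """Map each metadata table name to a SELECT over `<table>.<metadata_table>`."""
    return {name: f"SELECT * FROM {table}.{name}" for name in ICEBERG_METADATA_TABLES}


def show_iceberg_metadata(spark, table: str, rows: int = 20) -> None:
    """Print each metadata table for `table` using the given SparkSession."""
    for name, query in metadata_table_queries(table).items():
        print(f"--- {name} ---")
        spark.sql(query).show(rows, truncate=False)


# Usage with the session created earlier:
# show_iceberg_metadata(spark, "default.test_all_types")
```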