# JDBC/SQL Catalog
This section sets up and tests the Iceberg JDBC/SQL catalog.

## Importing Required Libraries
We import `SparkSession` and `os`; the latter is used to read the environment variables that hold the MinIO access key and secret.

We also set some styling to display tables better.

In [1]:
from pyspark.sql import SparkSession
import os

# this is to better display pyspark dataframes
from IPython.core.display import HTML
display(HTML("<style>pre { white-space: pre !important; }</style>"))
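
The notebook imports `os` for the MinIO credentials but does not show the lookup itself. The following is a minimal sketch of such a check, assuming the credentials are exposed under the standard AWS SDK variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; this environment may use different names. `missing_env_vars` is a hypothetical helper, not part of the original notebook.

```python
import os

# Assumed variable names: the AWS SDK default credential chain reads these,
# but this environment may expose the MinIO keys under other names.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("MinIO credentials found in the environment.")
```

Running a check like this before creating the Spark session can surface a missing key early, rather than as an S3 error at write time.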

## Setting up Spark Session
Detailed documentation of the Spark configs to use with the JDBC catalog can be found [here](https://iceberg.apache.org/docs/1.5.0/jdbc/#configurations).
We will set up `iceberg` as the catalog name.

In [2]:
iceberg_catalog_name = "iceberg"
spark = SparkSession.builder \
  .appName("iceberg-jdbc") \
  .config("spark.driver.memory", "4g") \
  .config("spark.executor.memory", "4g") \
  .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
  .config("spark.jars", "/opt/extra-jars/iceberg-spark-runtime.jar,/opt/extra-jars/iceberg-aws-bundle.jar,/opt/extra-jars/postgresql.jar") \
  .config(f"spark.sql.catalog.{iceberg_catalog_name}", "org.apache.iceberg.spark.SparkCatalog") \
  .config(f"spark.sql.catalog.{iceberg_catalog_name}.type", "jdbc") \
  .config(f"spark.sql.catalog.{iceberg_catalog_name}.uri", "jdbc:postgresql://postgres:5432/iceberg") \
  .config(f"spark.sql.catalog.{iceberg_catalog_name}.jdbc.user", "postgres") \
  .config(f"spark.sql.catalog.{iceberg_catalog_name}.jdbc.password", "postgres") \
  .config(f"spark.sql.catalog.{iceberg_catalog_name}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
  .config(f"spark.sql.catalog.{iceberg_catalog_name}.warehouse", "s3://warehouse/iceberg-jdbc/") \
  .config(f"spark.sql.catalog.{iceberg_catalog_name}.s3.endpoint", "http://minio:9000") \
  .config(f"spark.sql.catalog.{iceberg_catalog_name}.s3.path-style-access", "true") \
  .getOrCreate()

24/09/02 15:33:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
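
Every catalog setting above shares the prefix `spark.sql.catalog.<name>`, so the same configuration can be generated from a few parameters. The sketch below is one way to do that; `jdbc_catalog_conf` is a hypothetical helper, and the optional `schema_version` argument is an assumption based on the `jdbc.schema-version=V1` hint in the JdbcCatalog warning that appears in this notebook's output.

```python
def jdbc_catalog_conf(name, uri, user, password, warehouse, s3_endpoint,
                      schema_version=None):
    """Build Spark config entries for an Iceberg JDBC catalog called `name`."""
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "jdbc",
        f"{prefix}.uri": uri,
        f"{prefix}.jdbc.user": user,
        f"{prefix}.jdbc.password": password,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.s3.endpoint": s3_endpoint,
        f"{prefix}.s3.path-style-access": "true",
    }
    if schema_version is not None:
        # Assumption: the key follows the hint in the JdbcCatalog warning.
        conf[f"{prefix}.jdbc.schema-version"] = schema_version
    return conf


conf = jdbc_catalog_conf(
    name="iceberg",
    uri="jdbc:postgresql://postgres:5432/iceberg",
    user="postgres",
    password="postgres",
    warehouse="s3://warehouse/iceberg-jdbc/",
    s3_endpoint="http://minio:9000",
)
# Print only the keys so the password is not echoed to the notebook output.
for key in conf:
    print(key)
```

Each entry can then be applied to the builder with `builder.config(key, value)` before calling `getOrCreate()`.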


## Load Test Data

In [3]:
df = spark.read.parquet("file:///home/iceberg/workspace/downloaded-data/yellow_tripdata_2024-01.parquet")

24/09/02 15:34:08 WARN GarbageCollectionMetrics: To enable non-built-in garbage collector(s) List(G1 Concurrent GC), users should configure it(them) to spark.eventLog.gcMetrics.youngGenerationGarbageCollectors or spark.eventLog.gcMetrics.oldGenerationGarbageCollectors


## Creating namespace under the catalog
Now we create the namespace `jdbc`, with the location `s3://warehouse/iceberg-jdbc/` in MinIO.

In [4]:
spark.sql("CREATE NAMESPACE IF NOT EXISTS iceberg.jdbc LOCATION 's3://warehouse/iceberg-jdbc/'")

24/09/02 15:34:46 WARN JdbcCatalog: JDBC catalog is initialized without view support. To auto-migrate the database's schema and enable view support, set jdbc.schema-version=V1


DataFrame[]
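
If the catalog name, namespace, or location varies between environments, the statement can be assembled from parameters instead of being hard-coded. This is a small sketch under that assumption; `create_namespace_sql` is a hypothetical helper, and it rejects quotes in the location rather than attempting to escape them.

```python
def create_namespace_sql(catalog, namespace, location):
    """Return a CREATE NAMESPACE statement with an explicit storage location."""
    # Refuse quotes instead of escaping them, to keep the literal well-formed.
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    return (
        f"CREATE NAMESPACE IF NOT EXISTS {catalog}.{namespace} "
        f"LOCATION '{location}'"
    )


stmt = create_namespace_sql("iceberg", "jdbc", "s3://warehouse/iceberg-jdbc/")
print(stmt)
```

The result can be passed to `spark.sql(stmt)`, and `spark.sql("SHOW NAMESPACES IN iceberg").show()` should then list the new namespace.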

## Writing the data to Iceberg Table
Finally, we write the data to the Iceberg table.

In [5]:
df.writeTo("iceberg.jdbc.yellow_tripdata_2024_01").create()
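
A quick sanity check after the write is to read the table back through the catalog and compare row counts with the source DataFrame. The sketch below assumes an active session and the `df` loaded earlier; `row_counts_match` is a hypothetical helper that only relies on `spark.table(...)` and `DataFrame.count()`.

```python
def row_counts_match(spark, source_df, table_name):
    """Compare the source DataFrame's row count with the written table's.

    Returns a tuple (matches, expected_rows, written_rows).
    """
    expected = source_df.count()
    written = spark.table(table_name).count()
    return written == expected, expected, written
```

In this notebook it could be called as `row_counts_match(spark, df, "iceberg.jdbc.yellow_tripdata_2024_01")`. A mismatch would suggest the write did not complete or the table already held other data.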

                                                                                

We then check the data saved to MinIO.

In [6]:
!mc ls --recursive minio/warehouse/iceberg-jdbc

]11;?\[6n[m[32m[2024-09-02 15:35:07 UTC][0m[33m  16MiB[0m [34mSTANDARD[0m[1m jdbc/yellow_tripdata_2024_01/data/00001-2-3060fcc0-acb0-45c8-b138-9623213545b9-0-00001.parquet[22m[m
[m[32m[2024-09-02 15:35:08 UTC][0m[33m  16MiB[0m [34mSTANDARD[0m[1m jdbc/yellow_tripdata_2024_01/data/00003-4-3060fcc0-acb0-45c8-b138-9623213545b9-0-00001.parquet[22m[m
[m[32m[2024-09-02 15:35:06 UTC][0m[33m  13MiB[0m [34mSTANDARD[0m[1m jdbc/yellow_tripdata_2024_01/data/00006-7-3060fcc0-acb0-45c8-b138-9623213545b9-0-00001.parquet[22m[m
[m[32m[2024-09-02 15:35:09 UTC][0m[33m 3.7KiB[0m [34mSTANDARD[0m[1m jdbc/yellow_tripdata_2024_01/metadata/00000-abf02028-2e56-46b1-9630-f76593fdb133.metadata.json[22m[m
[m[32m[2024-09-02 15:35:08 UTC][0m[33m 8.4KiB[0m [34mSTANDARD[0m[1m jdbc/yellow_tripdata_2024_01/metadata/35c15e93-be9a-4960-9e81-171ea6613ab1-m0.avro[22m[m
[m[32m[2024-09-02 15:35:08 UTC][0m[33m 4.2KiB[0m [34mSTANDARD[0m[1m jdbc/yellow_tripdata_2024_01/