# Jupyter Notebook with MonetDB-Spark

This notebook demonstrates how to use monetdb-spark with PySpark in a Jupyter notebook.
The most important part is making Spark load the required jars: the monetdb-spark jar and the monetdb-jdbc jar.

In the example below, they are in the directory `/home/jvr/work/2025/monetdb-spark/jars`:

In [1]:
from glob import glob
SPARK_JARS = glob('/home/jvr/work/2025/monetdb-spark/jars/*')
SPARK_JARS

['/home/jvr/work/2025/monetdb-spark/jars/monetdb-spark-0.1.1-SNAPSHOT.jar',
 '/home/jvr/work/2025/monetdb-spark/jars/monetdb-jdbc-12.1-SNAPSHOT.jar']
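
A missing jar only shows up later, as a connection or table-creation error, so it can help to check the glob result before starting Spark. The following is a minimal sketch: the helper name `missing_jars` and the matching on file-name prefixes are assumptions made for illustration, not part of monetdb-spark.

```python
from pathlib import Path

# File-name prefixes of the two jars this notebook relies on (assumed naming).
REQUIRED_PREFIXES = ('monetdb-spark', 'monetdb-jdbc')

def missing_jars(jar_paths, required=REQUIRED_PREFIXES):
    """Return the required prefixes for which no matching .jar file is listed."""
    names = [Path(p).name for p in jar_paths]
    return [
        prefix for prefix in required
        if not any(n.startswith(prefix) and n.endswith('.jar') for n in names)
    ]
```

Calling `missing_jars(SPARK_JARS)` returns an empty list when both jars were found, and the names of the missing ones otherwise.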

### Create a Spark context with those jars on the classpath

We set the Spark config setting `spark.jars` to a comma-separated list of our jars.

In [2]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
spark = SparkSession.builder \
    .appName('sparknotebook') \
    .config('spark.jars', ','.join(SPARK_JARS)) \
    .getOrCreate()

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/08/25 13:31:55 WARN Utils: Your hostname, totoro, resolves to a loopback address: 127.0.1.1; using 100.66.66.19 instead (on interface wlp165s0)
25/08/25 13:31:55 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
25/08/25 13:31:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
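
`getOrCreate()` returns an already running session if there is one, and in that case the `spark.jars` setting passed to the builder may not take effect. One rough way to see what the running context was configured with is to read the setting back. This is only a sketch: the helper name `configured_jars` is hypothetical, and it inspects the configuration rather than proving that the classes were actually loaded.

```python
def configured_jars(spark):
    """Return the jar paths listed in the session's spark.jars setting."""
    # SparkConf.get returns the default ('' here) when the key is not set.
    value = spark.sparkContext.getConf().get('spark.jars', '')
    return [p for p in value.split(',') if p]
```

In this notebook, `configured_jars(spark)` would be expected to list the same paths as `SPARK_JARS`. If it comes back empty, restarting the kernel before building the session is a reasonable first thing to try.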


### Check if the jars were successfully loaded

Without monetdb-jdbc.jar, Spark cannot connect to MonetDB.

Without monetdb-spark.jar, Spark cannot create a table with BOOLEAN columns in MonetDB.
The test below therefore writes a DataFrame that has a boolean column.

In [3]:
df = spark.range(5).withColumn('b', col('id') % 2 == 0)
df.show()

+---+-----+
| id|    b|
+---+-----+
|  0| true|
|  1|false|
|  2| true|
|  3|false|
|  4| true|
+---+-----+



In [6]:
OPTIONS = dict(
    driver='org.monetdb.jdbc.MonetDriver',
    url='jdbc:monetdb://localhost:44002/testspark',
    user='monetdb',
    password='monetdb',
    dbtable='foo',
)
df.write.format("jdbc").mode('overwrite').options(**OPTIONS).save()

25/08/25 13:46:36 WARN JdbcUtils: Requested isolation level 1 is not supported; falling back to default isolation level 8
25/08/25 13:46:36 WARN JdbcUtils: Requested isolation level 1 is not supported; falling back to default isolation level 8
25/08/25 13:46:36 WARN JdbcUtils: Requested isolation level 1 is not supported; falling back to default isolation level 8
25/08/25 13:46:36 WARN JdbcUtils: Requested isolation level 1 is not supported; falling back to default isolation level 8
25/08/25 13:46:36 WARN JdbcUtils: Requested isolation level 1 is not supported; falling back to default isolation level 8


It worked! Now append some data using the MonetDB-specific data source:

In [7]:
df.write.format("org.monetdb.spark").mode('append').options(**OPTIONS).save()

In [8]:
spark.read.format('jdbc').options(**OPTIONS).load().count()

10

Yay!
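
The cells above reuse one `OPTIONS` dict for both the `jdbc` and the `org.monetdb.spark` formats. When working with several tables or databases, a small helper can build that dict from its parts. This is a sketch under assumptions: the function name `monetdb_options` is hypothetical, and the default port 50000 is MonetDB's usual default rather than the port used in this notebook (44002).

```python
def monetdb_options(database, table, host='localhost', port=50000,
                    user='monetdb', password='monetdb'):
    """Build the options dict passed to df.write / spark.read in this notebook."""
    return dict(
        driver='org.monetdb.jdbc.MonetDriver',
        # Same URL shape as the OPTIONS dict above: jdbc:monetdb://host:port/database
        url=f'jdbc:monetdb://{host}:{port}/{database}',
        user=user,
        password=password,
        dbtable=table,
    )
```

For example, `monetdb_options('testspark', 'foo', port=44002)` reproduces the `OPTIONS` dict used above.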