The Ganymede Kernel is a Jupyter Notebook Java kernel. Java code is compiled and interpreted with the Java Shell tool, JShell. This kernel offers the following additional features:
- Integrated Project Object Model (POM) for Apache Maven artifact dependency resolution[1]
- Integrated support for Structured Query Language (SQL) through JDBC and jOOQ
- Integrated support for JSR 223 scripting languages, including Groovy, JavaScript,[2] Kotlin, and Scala
- Templates (via any of Thymeleaf, Markdown (CommonMark) with JMustache, FreeMarker, and Velocity)
- Support for Apache Spark and Scala binary distributions
The Ganymede Kernel is distributed as a single JAR (download here).
⚠️ Only Jupyter Notebook versions before 7 (< 7) are fully supported at this time. See the Pipfile in ganymede-notebooks for a minimal Python configuration.
Java 11 or later is required. In addition to Java, the Jupyter Notebook must be installed first, and the jupyter and python commands must be on the ${PATH}. Then the typical (and minimal) installation command line is:
$ java -jar ganymede-2.1.2.20230910.jar -i
The kernel will be configured to use the same java installation as was invoked in the install command above. The following additional command line options are supported:
Option | Action | Default |
---|---|---|
--id-prefix=<prefix> | Adds prefix to kernel ID | <none> |
--id=<id> | Specifies kernel ID | ganymede-${version}-java-${java.specification.version} |
--id-suffix=<suffix> | Adds suffix to kernel ID | <none> |
--display-name-prefix=<prefix> | Adds prefix to kernel display name | <none> |
--display-name=<name> | Specifies kernel display name | Ganymede ${version} (Java ${java.specification.version}) |
--display-name-suffix=<suffix> | Adds suffix to kernel display name | <none> |
--env | Specifies NAME=VALUE pair(s) to add to the kernel environment | |
--copy-jar=<boolean> | Copies the Ganymede Kernel JAR to the kernelspec directory | true |
--sys-prefix or --user | Installs in the system prefix or user path (see the jupyter kernelspec install command) | --user |
The following Java system properties may be configured:
System Property | Action | Default |
---|---|---|
maven.repo.local | Configures the local Maven repository | ${jupyter.data}/repository |
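For example, assuming the installer propagates standard JVM -D options into the generated kernelspec (as suggested by the kernel.json example below), an alternate local repository might be configured at install time (a sketch; the path is a placeholder):

$ java -Dmaven.repo.local=${HOME}/.m2/repository \
      -jar ganymede-2.1.2.20230910.jar -i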
The following OS environment variables may be configured:
Environment Variable | Option | Action |
---|---|---|
SPARK_HOME | --spark-home=<path> | If configured, the kernel will add the Apache Spark JARs to the kernel's classpath. |
HIVE_HOME | --hive-home=<path> | If configured, the kernel will add the Apache Hive JARs to the kernel's classpath. |
For example, a sophisticated configuration to test a snapshot out of a user's local Maven repository:
$ export JAVA_HOME=$(/usr/libexec/java_home -v 11)
$ ${JAVA_HOME}/bin/java \
-jar ${HOME}/.m2/repository/dev/hcf/ganymede/ganymede/2.2.0-SNAPSHOT/ganymede-2.2.0-SNAPSHOT.jar \
-i --sys-prefix --copy-jar=false \
--id-suffix=spark-3.3.4 --display-name-suffix="with Spark 3.3.4" \
--spark-home=/path/to/spark-home --hive-home=/path/to/hive-home
$ jupyter kernelspec list
Available kernels:
...
ganymede-2.2.0-java-11-spark-3.3.4 /.../share/jupyter/kernels/ganymede-2.2.0-java-11-spark-3.3.4
...
and would result in the following configured kernelspec at
${jupyter.data}/kernels/ganymede-2.2.0-java-11-spark-3.3.4/kernel.json:
{
"argv": [
"/Library/Java/JavaVirtualMachines/graalvm-ce-java11-22.3.0/Contents/Home/bin/java",
"--add-opens",
"java.base/jdk.internal.misc=ALL-UNNAMED",
"--illegal-access=permit",
"-Djava.awt.headless=true",
"-Djdk.disableLastUsageTracking=true",
"-Dmaven.repo.local=/Users/ball/Notebooks/.venv/share/jupyter/repository",
"-jar",
"/Users/ball/.m2/repository/dev/hcf/ganymede/ganymede/2.2.0-SNAPSHOT/ganymede-2.2.0-SNAPSHOT.jar",
"-f",
"{connection_file}"
],
"display_name": "Ganymede 2.2.0 (Java 11) with Spark 3.3.4",
"env": {
"JUPYTER_CONFIG_DIR": "/Users/ball/.jupyter",
"JUPYTER_CONFIG_PATH": "/Users/ball/.jupyter:/Users/ball/Notebooks/.venv/etc/jupyter:/usr/local/etc/jupyter:/etc/jupyter",
"JUPYTER_DATA_DIR": "/Users/ball/Library/Jupyter",
"JUPYTER_RUNTIME_DIR": "/Users/ball/Library/Jupyter/runtime",
"SPARK_HOME": "/path/to/spark-home"
},
"interrupt_mode": "message",
"language": "java"
}
The kernel makes extensive use of templates and POM fragments. While not strictly required, the authors suggest enabling the Hide Input extension so notebook authors can hide the input templates and POMs in any finished product. This may be set from the command line with:
$ jupyter nbextension enable hide_input/main --sys-prefix
(or --user as appropriate).
The following subsections outline many of the features of the kernel.
The Java REPL is JShell and provides all the Java features of the installed JVM. The minimum required Java version is 11; all subsequent versions are supported.
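For example, a trivial java cell (output appears as the cell's standard output):

%%java
System.out.println("Hello from Java " + System.getProperty("java.specification.version"));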
The JShell environment includes builtin functions implemented through methods that wrap the public methods defined in the NotebookContext class annotated with @NotebookFunction. These functions include:
Method | Description |
---|---|
print(Object) | Render the Object to a Notebook format |
display(Object) | Render the Object to a Notebook format |
asJson(Object) | Convert argument to JsonNode |
asYaml(Object) | Convert argument to YAML (String) |
The builtin functions are mostly concerned with "printing" or displaying (rendering) Objects to multimedia formats. For example, print(byte[]) will render the byte array as an image. Integrated renderers are provided for chart and plot objects, including XChart. The trig.ipynb notebook demonstrates rendering of an XChart.
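The builtin functions may be called like ordinary methods from any java cell. A minimal sketch (the variable name is arbitrary):

%%java
var point = java.util.Map.of("x", 1, "y", 2);
display(point);        // render the Map with the notebook's integrated renderers
print(asYaml(point));  // convert to a YAML String first, then render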
As discussed in the next section, the magic identifier for Java is %%java. A cell identified with %%java but containing no code will produce a table of the variable bindings in the context with their types and values. The types are links to the corresponding javadoc (if known).
Name | Type | Value |
---|---|---|
$$ | ganymede.notebook.NotebookContext | NotebookContext(super=ganymede.notebook.NotebookContext@af7e376) |
by_state | org.apache.spark.sql.Dataset<Row> | [Country/Region: string, Province/State: string ... 1 more field] |
chart | org.knowm.xchart.PieChart | org.knowm.xchart.PieChart@767f4a69 |
countries_aggregated | org.apache.spark.sql.Dataset<Row> | [Date: date, Country: string ... 3 more fields] |
dates | org.apache.spark.sql.Dataset<Row> | [Date: date] |
interval | org.apache.spark.sql.Row | [2020-01-22,2022-04-16] |
key_countries_pivoted | org.apache.spark.sql.Dataset<Row> | [Date: date, China: int ... 7 more fields] |
reader | org.apache.spark.sql.DataFrameReader | org.apache.spark.sql.DataFrameReader@5a88849 |
reference | org.apache.spark.sql.Dataset<Row> | [UID: int, iso2: string ... 10 more fields] |
session | org.apache.spark.sql.SparkSession | org.apache.spark.sql.SparkSession@1b6683c4 |
snapshot | org.apache.spark.sql.Dataset<Row> | [Country/Region: string, Deaths: int] |
time_series_19_covid_combined | org.apache.spark.sql.Dataset<Row> | [Date: date, Country/Region: string ... 4 more fields] |
us_confirmed | org.apache.spark.sql.Dataset<Row> | [Admin2: string, Date: date ... 3 more fields] |
us_deaths | org.apache.spark.sql.Dataset<Row> | [Admin2: string, Date: date ... 3 more fields] |
us_simplified | org.apache.spark.sql.Dataset<Row> | [Date: date, Admin2: string ... 4 more fields] |
worldwide_aggregate | org.apache.spark.sql.Dataset<Row> | [Date: date, Confirmed: int ... 3 more fields] |
Cell magic commands are identified by %% starting the first line of a code cell. The list of available magic commands is shown below. The default cell magic is java.
Name(s) | Description |
---|---|
!, script | Execute script with the argument command |
bash | Execute script with 'bash' command |
classpath | Add to or print JShell classpath |
env | Add/Update or print the environment |
freemarker | FreeMarker template evaluator |
groovy | Execute code in groovy REPL |
html | HTML template evaluator |
java | Execute code in Java REPL |
javascript, js | Execute code in javascript REPL |
kotlin | Execute code in kotlin REPL |
magics | Lists available cell magics |
markdown | Markdown template evaluator |
mustache, handlebars | Mustache template evaluator |
perl | Execute script with 'perl' command |
pom | Define the Notebook's Project Object Model |
ruby | Execute script with 'ruby' command |
scala | Execute code in scala REPL |
sh | Execute script with 'sh' command |
spark-session | Configure and start a Spark session |
sql | Execute code in SQL REPL |
thymeleaf | Thymeleaf template evaluator |
velocity | Velocity template evaluator |
script, bash, perl, etc. are executed by creating a Process instance. groovy, javascript, kotlin, etc. are provided through their respective JSR 223 interfaces.[3] Dependency and classpath management are provided by the classpath and pom magics and are described in detail in a subsequent subsection. thymeleaf and html provide Thymeleaf template evaluation.
The kernel does not implement any "line" magics.
The classpath magic adds JAR and directory paths to the JShell classpath. The pom magic resolves and downloads Maven artifacts and then adds those artifacts to the classpath.
The trig.ipynb notebook demonstrates the use of the pom magic to resolve the org.knowm.xchart:xchart:LATEST artifact and its transitive dependencies.
%%pom
dependencies:
- org.knowm.xchart:xchart:LATEST
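A hypothetical classpath cell might add local entries in a similar way (assuming one path per line; the paths are placeholders):

%%classpath
/path/to/local.jar
/path/to/classes/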
The POM is expressed in YAML, and both repositories and dependencies may be expressed. The Notebook's POM may be split across multiple cells, since each repository and dependency is added or merged, and dependency resolution is attempted whenever a pom cell is executed. The default/initial Notebook POM is:
repositories:
  - id: central
    layout: default
    url: https://repo1.maven.org/maven2
    snapshots:
      enabled: false
Dependencies may be expressed either in "expanded" YAML or in groupId:artifactId[:extension]:version format:

dependencies:
  - groupId: groupA
    artifactId: groupA-artifact1
    version: 1.0
  - groupB:groupB-artifact2:2.0
The specific attributes for repositories and dependencies are defined by the Apache Maven Artifact Resolver classes RemoteRepository (with RepositoryPolicy) and Dependency. (Note that these classes are slightly different from their Maven settings counterparts.)
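For example, a hypothetical pom cell that adds a snapshot repository (the id and URL are placeholders) alongside a dependency:

%%pom
repositories:
  - id: example-snapshots
    url: https://repo.example.com/maven2
    snapshots:
      enabled: true
dependencies:
  - groupA:groupA-artifact1:1.0-SNAPSHOT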
Whenever a JAR is added to the classpath, it is analyzed to determine its Maven coordinates; if they can be determined, the JAR is added as an artifact to the resolver. The following checks are made before adding the JAR to the JShell classpath:

- It is a new, unique path
- No previously resolved artifact with the same groupId:artifactId is already on the classpath
- Special heuristics for logging configuration:
  a. Ignore commons-logging:commons-logging:jar
  b. Allow only one of org.slf4j:jcl-over-slf4j:jar or org.springframework:spring-jcl:jar to be configured
  c. Allow only one of org.slf4j:slf4j-log4j12:jar and ch.qos.logback:logback-classic:jar to be configured

Artifacts that fail any of the above checks will be (mostly silently) ignored. Because only the first version of a resolved artifact is ever added to the classpath, the kernel must be restarted if a different version of the same artifact is specified for the change to take effect.
Finally, the kernel provides special processing to add artifacts from Apache Spark binary distributions. The kernel bundles, as resources, the dependencies for Spark SQL and the corresponding Scala compiler artifacts for currently available Spark binary distributions. The kernel searches ${SPARK_HOME} for JARs for which it has the corresponding dependencies and then resolves those dependencies from the ${SPARK_HOME} hierarchy with the heuristics described above.
The SQL Magic provides the client interface to database servers through JDBC and jOOQ. Its usage is as follows:
Usage: sql [--[no-]print] [<url>] [<username>] [<password>]
[<url>] JDBC Connection URL
[<username>] JDBC Connection Username
[<password>] JDBC Connection Password
--[no-]print Print query results. true by default
For example:
%%sql jdbc:mysql://127.0.0.1:33061/epg?serverTimezone=UTC
SELECT * FROM schedules LIMIT 3;
airDateTime | stationID | json | duration | md5 | programID |
---|---|---|---|---|---|
1533945600 | 10139 | { "programID" : "EP009370080215", "airDateTime" : "2018-08-11T00:00:00Z", "duration" : 3600, "md5" : "S1UDH1R60Eagc1E3V5Qslw", "audioProperties" : [ "cc" ], "ratings" : [ { "body" : "USA Parental Rating", "code" : "TVPG" } ] } | 3600 | S1UDH1R60Eagc1E3V5Qslw | EP009370080215 |
1533945600 | 10142 | { "programID" : "EP006062993248", "airDateTime" : "2018-08-11T00:00:00Z", "duration" : 3600, "md5" : "2FQ8y5PsXl1vtxcmUBeppg", "new" : true, "audioProperties" : [ "cc" ], "ratings" : [ { "body" : "USA Parental Rating", "code" : "TVPG" } ] } | 3600 | 2FQ8y5PsXl1vtxcmUBeppg | EP006062993248 |
1533945600 | 10145 | { "programID" : "EP022439260394", "airDateTime" : "2018-08-11T00:00:00Z", "duration" : 1800, "md5" : "mUewfiqM8+dh24WQg2WfpQ", "audioProperties" : [ "cc" ] } | 1800 | mUewfiqM8+dh24WQg2WfpQ | EP022439260394 |
The SQL Magic accepts the --print/--no-print options to print or suppress query results. If no JDBC URL is specified, the most recently used connection will be used. The List of the most recent jOOQ Queries is stored in $$.sql.queries, with $$.sql.results containing the corresponding Results. For example:
%%sql --no-print
SELECT COUNT(*) FROM programs;
%%java
print($$.sql.results.get(0));
count(*) |
---|
1024495 |
MySQL and PostgreSQL JDBC drivers are provided in the Ganymede runtime.
The spark-session magic is provided to initialize Apache Spark sessions.
Usage: spark-session [--[no-]enable-hive-if-available] [<master>] [<appName>]
[<master>] Spark master
[<appName>] Spark appName
--[no-]enable-hive-if-available
Enable Hive if available. true by default
Its typical usage:
%%spark-session local[*] covid-19
# Optional name/value pairs parsed as Properties
is roughly equivalent to:
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

var config = new SparkConf();
/*
 * Optional name/value pairs from the cell are copied to the SparkConf instance.
 */
var session =
    SparkSession.builder()
        .config(config)
        .master("local[*]").appName("covid-19")
        .getOrCreate();
The SparkSession can then be accessed in Java and other JVM code with the SparkSession.active() static method.
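For example, a subsequent java cell (a minimal sketch) can retrieve the active session and run a query:

%%java
var session = org.apache.spark.sql.SparkSession.active();
session.sql("SELECT 1 AS test").show();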
Other Languages (JSR 223)
The kernel leverages the java.scripting API to provide groovy, javascript, kotlin, and scala.[4]
The script magic (with the alias !) may be used to run an operating system command with the remaining code in the cell fed to the Process's standard input. bash, perl, ruby, and sh are provided as aliases for %%!bash, %%!perl, etc., respectively.
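For example, a minimal bash cell:

%%bash
echo "Today is $(date)"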
A number of templating languages are supported as magics:
- Markdown (CommonMark preprocessed with JMustache)
- Apache FreeMarker
- Apache Velocity
- JMustache
- Thymeleaf
The following subsections provide examples of the markdown and thymeleaf magics; the other template magics are similar. Please refer to the installation instructions for discussion of enabling the Hide Input extension so only the template output is displayed in the notebook.
The template magic markdown provides Markdown processing with JMustache preprocessing:
%%java
import java.util.stream.Stream;
import static java.util.stream.Collectors.toList;
var fib =
Stream.iterate(new int[] { 0, 1 }, t -> new int[] { t[1], t[0] + t[1] })
.mapToInt(t -> t[0])
.limit(10)
.boxed()
.collect(toList());
%%markdown
| Index | Value |
| --- | --- |
{{#fib}}| {{-index}} | {{this}} |
{{/fib}}
Index | Value |
---|---|
0 | 0 |
1 | 1 |
2 | 1 |
3 | 2 |
4 | 3 |
5 | 5 |
6 | 8 |
7 | 13 |
8 | 21 |
9 | 34 |
The template magics thymeleaf and html offer templating with Thymeleaf. All defined Java variables are bound into the Thymeleaf context before evaluation. For example (Java implementation detail removed):
%%java
...
var map = new TreeMap<Ranking,List<Card>>(Ranking.COMPARATOR.reversed());
...
var rankings = Arrays.asList(Ranking.values());
...
%%html
<table>
<tr th:each="ranking : ${rankings}">
<th:block th:if="${map.containsKey(ranking)}">
<th th:text="${ranking}"/><td th:each="card : ${map.get(ranking)}" th:text="${card}"/>
</th:block>
</tr>
<tr><th>Remaining</th><td th:each="card : ${deck}" th:text="${card}"/></tr>
</table>
This would generate:
RoyalFlush | A-♤ | K-♤ | Q-♤ | J-♤ | 10-♤ |
---|---|---|---|---|---|
StraightFlush | K-♡ | Q-♡ | J-♡ | 10-♡ | 9-♡ |
FourOfAKind | 8-♤ | 8-♡ | 8-♢ | 8-♧ | 2-♧ |
FullHouse | A-♡ | A-♢ | A-♧ | K-♢ | K-♧ |
Flush | Q-♢ | J-♢ | 10-♢ | 9-♢ | 7-♢ |
Straight | 7-♤ | 6-♤ | 5-♤ | 4-♤ | 3-♤ |
ThreeOfAKind | 6-♡ | 6-♢ | 6-♧ | 3-♧ | 4-♧ |
TwoPair | 9-♤ | 9-♧ | 7-♡ | 7-♧ | 5-♧ |
Pair | 5-♡ | 5-♢ | 10-♧ | J-♧ | 2-♢ |
HighCard | Q-♧ | 3-♢ | 4-♢ | 2-♡ | 3-♡ |
Remaining | 4-♡ | 2-♤ |
Javadoc is published at https://allen-ball.github.io/ganymede.
Ganymede Kernel is released under the Apache License, Version 2.0, January 2004.
[1] Implemented with Apache Maven Artifact Resolver.
[2] With the built-in Oracle Nashorn engine.
[3] scala is special cased: it requires additional dependencies be specified at runtime and is optimized to be used with Apache Spark.
[4] Ibid.