# Big Data HS 2023

## JSONiq tutorial - week 1

Every week, you will get a small tutorial notebook that introduces you to the JSONiq language with the RumbleDB engine. You can simply copy this notebook to the "notebooks" folder in your Exam MagicBox docker environment (provided by the TA team as a zip to download and uncompress on your laptop), and then open it in your browser.

Do not forget to use localhost:8888 as the URL to make sure the notebook is accessed via docker!

If it does not work, delete all containers, images, and volumes, then try again with:

````
docker-compose up
````

# A few words on JSONiq

JSONiq is a query language, just like SQL.

It is functional and declarative, just like SQL.

But there is something more: while SQL was designed for querying tables (ideally, in normal form), JSONiq can be used with denormalized data, even if it is very messy. We will see in the course, in due time, that the data that JSONiq can query is a superset of the data that SQL can query.

# A few words on RumbleDB

RumbleDB is a querying engine that supports JSONiq queries.

It is the result of several years of work by many ETH students, who contributed through their Master's theses, Bachelor's theses, or semester projects.

RumbleDB can query very small datasets (even just a few kilobytes), but it can also query very large datasets (we tested it well into the dozens of Terabytes with no issues, and we are confident it also works with Petabytes of data).

It works on your laptop just as well as on a large cluster in a data center (we tested it with up to 64 machines so far with no issues).

It can be invoked from the command line reading the query from a file, or can be used in shell mode, or can run as a server, interacting through Jupyter notebooks.

It is simply a jar file to download and only requires Java to work, although for your convenience, the TA team packaged it nicely in this Docker environment to work with Jupyter notebooks.

# JSONiq as a calculator

This week, we will start smoothly with simple functionality: that of a calculator. In this respect, JSONiq is similar to Python.

But first, some paperwork: just run the cell below to connect the Jupyter notebook with RumbleDB.

In [5]:
%load_ext rumbledb
%env RUMBLEDB_SERVER=http://localhost:9090/jsoniq

The rumbledb extension is already loaded. To reload it, use:
  %reload_ext rumbledb
env: RUMBLEDB_SERVER=http://localhost:9090/jsoniq


Done? Alright. Now, you can execute JSONiq queries in Jupyter cells, as long as you include %%jsoniq on the first line. Try to run this!

In [7]:
%%jsoniq
1+1

Took: 0.8739359378814697 ms
2


JSONiq supports basic arithmetic: addition (+), subtraction (-), multiplication (*), division (div), modulo (mod).

In [4]:
%%jsoniq
6-4

Took: 0.042000770568847656 ms
2


In [4]:
%%jsoniq
6*4

Took: 0.02055978775024414 ms
24


In [5]:
%%jsoniq
6 div 4

Took: 0.03036212921142578 ms
1.5


In [6]:
%%jsoniq
6 mod 4

Took: 0.038191795349121094 ms
2

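Besides `div`, JSONiq also defines an integer division operator, `idiv`. It is not in the list above and is mentioned here only as a small addition. It discards the fractional part of the quotient, so the query below should return 1 rather than 1.5.

```jsoniq
%%jsoniq
(: integer division: the quotient of 6 by 4, without its fractional part :)
6 idiv 4
```

The text between `(:` and `:)` is a JSONiq comment and is ignored by the engine.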

There are of course precedence rules (known as PEMDAS in English-speaking countries: search for it!).

Whenever you want to override the precedence, use parentheses. Whenever unsure about the precedence, use parentheses too. It is obvious that multiplication has precedence over addition, but when we introduce many more JSONiq expressions, it will become tough for a human being to remember all the precedence rules. Better to have too many parentheses than too few.

In [7]:
%%jsoniq
(4 + 6) * (6 mod 2 - 1) div 2

Took: 0.042469024658203125 ms
-5

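As a minimal illustration of overriding precedence: without parentheses, `2 + 3 * 4` multiplies first and gives 14. The query below forces the addition to happen first and should return 20.

```jsoniq
%%jsoniq
(: parentheses make the addition happen before the multiplication :)
(2 + 3) * 4
```

Try removing the parentheses and running the cell again to see the result change.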

# Logic

JSONiq supports Boolean logic.



In [8]:
%%jsoniq
true and false

Took: 0.033081769943237305 ms
false


In [9]:
%%jsoniq
(true or false) and (false or true)

Took: 0.01787710189819336 ms
true


The unary not is also available:

In [10]:
%%jsoniq
not true

Took: 0.030964136123657227 ms
false


Note that JSONiq, unlike SQL, uses two-valued logic: nulls are automatically converted to false.

In [11]:
%%jsoniq
null and true

Took: 0.03037095069885254 ms
false

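The same conversion applies to the other logical operators. As a small sketch: since null counts as false, negating it should return true.

```jsoniq
%%jsoniq
(: null is converted to false, so its negation is true :)
not null
```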

Some non-Booleans can also get converted. For example, non-empty strings are converted to true and empty strings to false. Non-zero numbers are converted to true, and zero to false.

In [25]:
%%jsoniq
not ""

Took: 0.07593774795532227 ms
true


In [13]:
%%jsoniq
not "non empty"

Took: 0.03122711181640625 ms
false


In [14]:
%%jsoniq
not 0

Took: 0.03744983673095703 ms
true

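These conversions also apply to the operands of `and` and `or`. In the sketch below, both operands are non-Booleans that convert to true (a non-empty string and a non-zero number), so the query should return true.

```jsoniq
%%jsoniq
(: both operands are converted to true before "and" is evaluated :)
"non empty" and 42
```

Replacing either operand with `""` or `0` should turn the result into false.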

Zero is converted to false, non-zero numbers to true.

# Comparison

JSONiq supports comparisons, like SQL and most programming languages, including Python. Note that equality is written with a single `=`.

In [15]:
%%jsoniq
2 = 1

Took: 0.04356193542480469 ms
false


In [16]:
%%jsoniq
2 != 1

Took: 0.03111863136291504 ms
true


In [17]:
%%jsoniq
2 > 1

Took: 0.04473590850830078 ms
true


In [18]:
%%jsoniq
2 < 1

Took: 0.02445697784423828 ms
false


In [19]:
%%jsoniq
2 >= 1

Took: 0.024837970733642578 ms
true


In [20]:
%%jsoniq
2 <= 1

Took: 0.031045913696289062 ms
false

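Comparisons are not limited to numbers. As a sketch, assuming the default string ordering (by code point), strings can be compared as well; the query below should return true because "apple" sorts before "banana".

```jsoniq
%%jsoniq
(: strings are compared character by character :)
"apple" < "banana"
```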

You can use these, as well as logic, in if-then-else expressions!

In [21]:
%%jsoniq
if(2 = 1 + 1 and 3 > 2) then "This is true!" else "This is false!"

Took: 0.03601789474487305 ms
"This is true!"


By the way, line breaks and indentation are irrelevant, unlike in Python, but it looks nice to a human if you spread such expressions over multiple lines.

In [22]:
%%jsoniq
if(2 = 1 + 1 and 3 > 2)
then "This is true!"
else "This is false!"

Took: 0.034491777420043945 ms
"This is true!"

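Because if-then-else is itself an expression, it can be nested in the else branch of another one. The sketch below combines arithmetic, comparison, and nesting; since 7 is divisible by neither 2 nor 3, it should return "neither". The indentation is only there for readability.

```jsoniq
%%jsoniq
(: the inner if-then-else is evaluated only when the first condition is false :)
if (7 mod 2 = 0)
then "even"
else if (7 mod 3 = 0)
     then "divisible by 3"
     else "neither"
```

Change 7 to 9 or to 8 to see the other branches being taken.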

# Try your own queries!

This notebook is interactive. You can edit all queries above and also execute your own! We will show you more features every week.

In [None]:
%%jsoniq
1+1
