# Big Data HS 2025

## JSONiq tutorial - week 1

Every week, you will get a small tutorial notebook that introduces you to the JSONiq language with the RumbleDB engine. You can simply copy this notebook to the "notebooks" subfolder in your Exam MagicBox docker environment (the same environment that contains past exams, PostgreSQL, Spark, RumbleDB, etc.).

If you have not set up your Exam MagicBox docker environment yet, then you can do so as follows:

- Download it as a zip file from the course Moodle.
- Uncompress it on your laptop.
- On the command line, go to the uncompressed folder.
- Launch the command `docker-compose up` (on newer Docker installations, the command may instead be `docker compose up`, without the dash).
- Open the Jupyter notebook interface by pointing your favorite browser to http://localhost:8888/lab
- Copy this file (JSONiq-tutorial-1.ipynb) to the "notebooks" subfolder (outside of the Jupyter interface, as you would copy any other file on your laptop).
- In the Jupyter notebook interface, this file should become visible on the left. Select it to open it, and continue reading.

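If anything does not work, what works in most situations is to "switch it off and on again": delete all containers, images, and volumes, then run the docker-compose up command again. Deleting containers, images, and volumes can be done in the Docker user interface or on the command line, most likely as follows. Note that these commands remove all Docker containers, images, and volumes on your machine, not only those of the course.

```
docker stop $(docker ps -aq)             # stop all running containers
docker rm $(docker ps -aq)               # remove all containers
docker rmi $(docker images -a -q)        # remove all images
docker volume rm $(docker volume ls -q)  # remove all volumes
docker-compose up                        # recreate and start the environment
```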

# A few words on JSONiq

JSONiq is a query language, just like SQL.

It is functional and declarative, just like SQL.

But there is something more: while SQL was designed for querying tables (ideally, in normal form), JSONiq can be used with denormalized data, even if it is very messy. We will see in the course, in due time, that the data that JSONiq can query is a superset of the data that SQL can query.

# A few words on RumbleDB

RumbleDB is a querying engine that supports JSONiq queries.

It is the result of several years of work by many ETH students, who contributed through their Master's theses, Bachelor's theses, or semester projects.

RumbleDB can query very small datasets (even just a few kilobytes), but it can also query very large datasets (we tested it well into the dozens of terabytes with no issues, and we are confident it also works with petabytes of data).

It works on your laptop just as well as on a large cluster in a data center (we tested it with up to 64 machines so far with no issues).

Since summer 2025, it has also been available as a Python library, which can be installed with `pip install jsoniq`. If you are using the course's docker environment, you need not worry about this, because it comes with the library preinstalled. If you felt adventurous and installed the jsoniq package outside of our docker (the "fun task" suggested in the introduction lecture), you can check whether this notebook also works with your setup. If anything does not work, you can always fall back to our docker setup.


# JSONiq as a calculator

This week, we will start smoothly with simple functionality: that of a calculator. In this respect, JSONiq is similar to Python.

But first, some paperwork: just run the cell below to activate the jsoniq magic (it is part of the jsoniq pip package).

In [None]:
%load_ext jsoniqmagic


Done? Alright. Now you can execute JSONiq queries in Jupyter cells, as long as you include %%jsoniq on the first line. Try running this!

In [None]:
%%jsoniq
1+1

JSONiq supports basic arithmetic: addition (+), subtraction (-), multiplication (*), division (div), modulo (mod).

In [None]:
%%jsoniq
6-4

In [None]:
%%jsoniq
6*4

In [None]:
%%jsoniq
6 div 4

In [None]:
%%jsoniq
6 mod 4
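The optional cell below is not part of the original tutorial. It highlights two details of these operators under standard JSONiq arithmetic rules:

- `div` applied to two integers returns a decimal rather than truncating.
- The sign of a `mod` result follows the left operand.

The comma separates several expressions in one cell and returns all their results in order, and `(: ... :)` is JSONiq's comment syntax. Paste the body into a cell that starts with `%%jsoniq`.

```jsoniq
(: div does not truncate: two integers give a decimal :)
6 div 4,   (: 1.5 :)
(: the result of mod takes the sign of the left operand :)
-7 mod 3,  (: -1 :)
7 mod -3   (: 1 :)
```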

There are, of course, precedence rules (known as PEMDAS in the US and BODMAS in the UK: search for them!).

Whenever you want to override the precedence, use parentheses. Whenever you are unsure about the precedence, use parentheses too. It is obvious that multiplication has precedence over addition, but as we introduce many more JSONiq expressions, it will become tough for a human being to remember all the precedence rules. Better to have too many parentheses than too few.

In [None]:
%%jsoniq
(4 + 6) * (6 mod 2 - 1) div 2
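To make the default precedence visible, here is an illustrative cell that evaluates expressions with and without parentheses. The last line is the cell above with every implicit grouping written out. Because `*` and `div` share the same precedence level and are evaluated from left to right, it should return the same value as the original expression.

```jsoniq
(: multiplication binds tighter than addition :)
1 + 2 * 3,    (: 7 :)
(: parentheses override the default :)
(1 + 2) * 3,  (: 9 :)
(: the cell above with all groupings made explicit :)
((4 + 6) * ((6 mod 2) - 1)) div 2
```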

## Logic

JSONiq supports Boolean logic with the operators `and`, `or`, and `not`.

In [None]:
%%jsoniq
true and false

In [None]:
%%jsoniq
(true or false) and (false or true)
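As in arithmetic, grouping matters here: `and` binds tighter than `or`. The illustrative cell below shows that adding parentheses changes the result.

```jsoniq
(: read as: true or (false and false) :)
true or false and false,    (: true :)
(: parentheses force the or to be evaluated first :)
(true or false) and false   (: false :)
```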

The unary not is also available:

In [None]:
%%jsoniq
not true
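`not` can also be applied to a parenthesized Boolean expression. As a small illustration, both lines below return true. This is an instance of De Morgan's law: not (a and b) is the same as (not a) or (not b).

```jsoniq
not (true and false),   (: true :)
not true or not false   (: true: read as (not true) or (not false) :)
```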

Note that JSONiq, unlike SQL, uses two-valued logic: there is no third "unknown" truth value, and nulls are automatically converted to false.

In [None]:
%%jsoniq
null and true
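Since null behaves as false in a logical context, it can be combined with the other logical operators as well. In the illustrative cell below, the values in the comments follow from treating null as false.

```jsoniq
not null,       (: true :)
null or true,   (: true :)
null or false   (: false :)
```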

Some non-Booleans can also get converted. For example, non-empty strings are converted to true and empty strings to false. Non-zero numbers are converted to true, and zero to false.

In [None]:
%%jsoniq
not ""

In [None]:
%%jsoniq
not "non empty"

In [None]:
%%jsoniq
not 0

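The same conversion rules cover a few less obvious cases, illustrated in the cell below:

- A negative number counts as true, because it is non-zero.
- A decimal zero counts as false, just like `0`.
- The string `"false"` counts as true, because it is non-empty.

```jsoniq
not -1,       (: false :)
not 0.0,      (: true :)
not "false"   (: false :)
```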

# Comparison

JSONiq supports comparisons, like SQL and most programming languages, including Python.

In [None]:
%%jsoniq
2 = 1

In [None]:
%%jsoniq
2 != 1

In [None]:
%%jsoniq
2 > 1

In [None]:
%%jsoniq
2 < 1

In [None]:
%%jsoniq
2 >= 1

In [None]:
%%jsoniq
2 <= 1
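Comparisons are not limited to numbers. In the illustrative cell below, strings are compared character by character, by Unicode code point under the default collation. As a result, every uppercase ASCII letter sorts before every lowercase one. Numbers of different numeric types are compared by value.

```jsoniq
"apple" < "banana",   (: true :)
"Zebra" < "apple",    (: true: "Z" (code point 90) comes before "a" (97) :)
1 = 1.0               (: true: an integer and a decimal compared by value :)
```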

You can use these, as well as logic, in if-then-else expressions!

In [None]:
%%jsoniq
if(2 = 1 + 1 and 3 > 2) then "This is true!" else "This is false!"

By the way, unlike in Python, line breaks and indentation are irrelevant in JSONiq, but spreading such expressions over multiple lines makes them easier for humans to read.

In [None]:
%%jsoniq
if(2 = 1 + 1 and 3 > 2)
then "This is true!"
else "This is false!"
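Because if-then-else is an expression that returns a value (much like Python's `x if cond else y`), it can be chained with `else if` or used inside arithmetic. The `else` branch is required. An illustrative cell:

```jsoniq
if (7 mod 2 = 0) then "even" else "odd",   (: "odd" :)
10 * (if (3 > 2) then 1 else -1),          (: 10 :)
if (5 < 0) then "negative"
else if (5 = 0) then "zero"
else "positive"                            (: "positive" :)
```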

# Try your own queries!

This notebook is interactive. You can edit all queries above and also execute your own! We will show you more features every week.

In [None]:
%%jsoniq
1+1

In [None]:
%%jsoniq
1+1

In [None]:
%%jsoniq
1+1

In [None]:
%%jsoniq
1+1

In [None]:
%%jsoniq
1+1

In [None]:
%%jsoniq
1+1