# Big Data HS 2023

## JSONiq tutorial - weeks 2-3

This is the JSONiq tutorial for weeks 2 and 3.

Do not forget to use localhost:8888 as the URL to make sure the notebook is accessed via Docker! If it does not work, delete all containers, images, and volumes, then try again with:



````
docker-compose up
````

Like last week, just run the cell below to connect the Jupyter notebook with RumbleDB.

In [3]:
%load_ext rumbledb
%env RUMBLEDB_SERVER=http://localhost:9090/jsoniq

The rumbledb extension is already loaded. To reload it, use:
  %reload_ext rumbledb
env: RUMBLEDB_SERVER=http://localhost:9090/jsoniq


## Variable bindings and let clauses

Done? Alright. Let us now get started with some new material.

Since JSONiq is a functional and declarative language, it does not have variable assignment as imperative languages such as Java or Python do: you cannot modify the value of a variable.

However, it does have variables and variable bindings; the difference between a variable binding and a variable assignment is that there is no "before" and "after". A variable is bound to some value for the purpose of evaluation in other expressions.

Variables (which start with a dollar sign) can be bound to values using let clauses, like so:


In [2]:
%%jsoniq
let $x := 1
return $x + $x

Took: 0.8695101737976074 ms
2


If you have already seen functional languages such as Haskell or OCaml, this should look familiar to you.

In the above JSONiq query, the variable $x is bound to the value 1, and then the expression $x + $x is evaluated knowing that $x is bound to the value 1, leading to the output 2.
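
As a further illustration of how a binding is used during evaluation, here is a minimal sketch that can be pasted into a %%jsoniq cell. The variable name $price is made up for this example and is not part of the tutorial. Every occurrence of $price in the return clause is evaluated with the bound value 20.

```jsoniq
(: $price is a hypothetical name; it is bound once and used twice :)
let $price := 20
return $price * 3 + $price
```

With $price bound to 20, this should evaluate to 80.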

Note that indentation and line breaks are irrelevant; the following queries also work:

In [4]:
%%jsoniq
let $x := 1 return $x + $x

Took: 0.04834318161010742 ms
2


In [5]:
%%jsoniq
let
  $x := 1 
return
  $x + $x

Took: 0.030689239501953125 ms
2


Having said that, we still strongly recommend sticking to the initial convention, with the let and return clauses nicely aligned, because this makes the query easier for a human to read.

Variables can also be bound to values other than numbers, for example booleans:

In [6]:
%%jsoniq
let $x := 1 < 2
return if($x)
       then "this is true"
       else "this is false"

Took: 0.033299922943115234 ms
"this is true"


It is also possible to have a cascade of let clauses -- as many as you want! But always remember to end with a return clause: this is a functional language, so the query must return something!

In [7]:
%%jsoniq
let $x := 1
let $y := $x + $x
let $z := $x < $y
return if($z)
       then "this is true"
       else "this is false"

Took: 0.01401209831237793 ms
"this is true"


As the above query shows, every clause can "see" the variables bound in previous clauses. In other words, the scope of a variable binding is all the clauses (let and return) that follow it.

In particular, the following query throws an error because variables are used outside of their scope:

In [8]:
%%jsoniq
let $x := $y
let $y := $z + $x
let $z := $x < $y
return if($z)
       then "this is true"
       else "this is false"

Took: 0.04521298408508301 ms
There was an error on line 1 in file:/home/:

let $x := $y
          ^

Code: [XPST0008]
Message: Uninitialized variable reference: y
Metadata: file:/home/:LINE:1:COLUMN:10:
This code can also be looked up in the documentation and specifications for more information.
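
One way to repair a query like the one above is to make sure every variable is bound before it is referenced. The sketch below is only one possible fix: the starting value 1 for $x and the expression $x + 1 for $y are assumptions made for illustration, since the failing query has no valid starting point of its own.

```jsoniq
(: each let clause only references variables bound in earlier clauses :)
let $x := 1            (: hypothetical starting value :)
let $y := $x + 1       (: $x is already in scope here :)
let $z := $x < $y      (: both $x and $y are in scope here :)
return if($z)
       then "this is true"
       else "this is false"
```

Since 1 < 2 holds, this version should return "this is true" instead of raising XPST0008.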



It is possible to hide a variable binding by reusing a variable name -- but keep in mind that this is not an assignment! It is a new binding that hides the previous one: the previous binding is still there, but it becomes invisible because there is no way to reference it any more.

In [9]:
%%jsoniq
let $x := 1
let $x := $x + $x
let $x := $x + $x
return $x

Took: 0.030206918716430664 ms
4


If this confuses you, simply avoid hiding variables and use a new name every time.
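
For comparison, here is a sketch of the same computation written without hiding any binding. The names $x1, $x2 and $x3 are made up for this example; each intermediate value simply gets its own name.

```jsoniq
(: same computation as before, but with a fresh (hypothetical) name per binding :)
let $x1 := 1
let $x2 := $x1 + $x1
let $x3 := $x2 + $x2
return $x3
```

This should also return 4, and all three bindings remain visible in the return clause.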

## Reading a text file

Next, we will learn how to read some data, starting with text files.

Download the text file (the start of Hamlet, by Shakespeare) from [this location](https://www.rumbledb.org/samples/hamlet.txt) and copy it into the notebooks folder, beside this tutorial file. Rename it to hamlet.txt.

Now, you can open it with:

In [None]:
%%jsoniq
unparsed-text("hamlet.txt")

The query above returned the contents of the text file as a single (big) string. This is very declarative, is it not?
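
Because the result is an ordinary string, it can be passed to other string functions instead of being printed in full. The sketch below assumes that hamlet.txt sits next to the notebook as described above and that the standard substring() function is available in your RumbleDB version; it returns only the first six characters of the file.

```jsoniq
(: peek at the beginning of the file rather than printing the whole string :)
let $doc := unparsed-text("hamlet.txt")
return substring($doc, 1, 6)
```

Positions in substring() start at 1, so the arguments 1 and 6 select characters one through six.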

By the way, RumbleDB can also directly get the file from the Web:


In [None]:
%%jsoniq
unparsed-text("https://www.rumbledb.org/samples/hamlet.txt")

## String functions

Now let us look into a few useful string functions. JSONiq has quite a large library of built-in functions.

One of them, contains(), tests whether the first string contains the second one. For example, it is not surprising that our text file contains the substring "Hamlet":

In [12]:
%%jsoniq
contains(unparsed-text("https://www.rumbledb.org/samples/hamlet.txt"), "Hamlet")

Took: 0.06026482582092285 ms
true


It is perhaps also not surprising that it does not contain the substring "Bitcoin". By the way, let us now use a variable to store our document:

In [13]:
%%jsoniq
let $doc := unparsed-text("https://www.rumbledb.org/samples/hamlet.txt")
return contains($doc, "Bitcoin")

Took: 0.08081912994384766 ms
false
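
Keep in mind that the comparison is expected to be case-sensitive under the default collation. The following sketch uses a made-up literal ($title) so that it runs without any file, and it assumes the standard lower-case() function is available in your RumbleDB version.

```jsoniq
(: "Hamlet" does not contain "hamlet" unless it is lower-cased first :)
let $title := "Hamlet"
return if(contains($title, "hamlet"))
       then "matched as is"
       else if(contains(lower-case($title), "hamlet"))
            then "matched only after lower-casing"
            else "no match"
```

Under these assumptions, the query should return "matched only after lower-casing".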


The function starts-with() tests whether the first string starts with the second one:

In [13]:
%%jsoniq
let $doc := unparsed-text("https://www.rumbledb.org/samples/hamlet.txt")
return starts-with($doc, "Hamlet")

Took: 0.5515632629394531 ms
true


And ends-with() tests whether the first string ends with the second one:

In [14]:
%%jsoniq
let $doc := unparsed-text("https://www.rumbledb.org/samples/hamlet.txt")
return ends-with($doc, "Who is there?")

Took: 0.5208020210266113 ms
false
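
The three functions above all return booleans, so they can be combined with the logical operator and. Here is a minimal sketch on a literal string (one of the character descriptions from the play, bound to the hypothetical variable $line) so that it does not depend on the file being available:

```jsoniq
(: all three tests hold for this line, so the conjunction is true :)
let $line := "HAMLET, Prince of Denmark, son of the late King Hamlet and Queen Gertrude"
return contains($line, "Prince")
   and starts-with($line, "HAMLET")
   and ends-with($line, "Gertrude")
```

This should return true; changing any of the three search strings to something that does not match makes the whole expression false.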


## Opening a file as a sequence of strings

Now, let us start showing the superpowers of JSONiq. Like SQL (which handles tables as large sets of records), JSONiq can handle large collections in so-called sequences.

The function unparsed-text-lines() returns a sequence of strings, rather than a single string, like so:

In [6]:
%%jsoniq
unparsed-text-lines("hamlet.txt")

Took: 0.12991595268249512 ms
"Hamlet"
"by William Shakespeare"
"Edited by Barbara A. Mowat and Paul Werstine"
"  with Michael Poston and Rebecca Niles"
"Folger Shakespeare Library"
"https://shakespeare.folger.edu/shakespeares-works/hamlet/"
"Created on Jul 31, 2015, from FDT version 0.9.2"
""
"Characters in the Play"
"THE GHOST"
"HAMLET, Prince of Denmark, son of the late King Hamlet and Queen Gertrude"
"QUEEN GERTRUDE, widow of King Hamlet, now married to Claudius"
"KING CLAUDIUS, brother to the late King Hamlet"
"OPHELIA"
"LAERTES, her brother"
"POLONIUS, father of Ophelia and Laertes, councillor to King Claudius"
"REYNALDO, servant to Polonius"
"HORATIO, Hamlet's friend and confidant"
"Courtiers at the Danish court:"
"  VOLTEMAND"
"  CORNELIUS"
"  ROSENCRANTZ"
"  GUILDENSTERN"
"  OSRIC"
"  Gentlemen"
"  A Lord"
"Danish soldiers:"
"  FRANCISCO"
"  BARNARDO"
"  MARCELLUS"
"FORTINBRAS, Prince of Norway"
"A Captain in Fortinbras's army"
"Ambassadors to Denmark from England"
"Players who

This ability to process sequences will be fundamental to scaling out: here we are reading from your local drive, but JSONiq can also read from S3, Azure Blob Storage or HDFS, and it can handle sequences with billions of strings like a charm. It would not be realistic to return a single big string for a dataset with terabytes of data: a huge sequence of small strings is then the way to go.
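
A sequence can be passed to functions as a whole. As a small sketch, assuming hamlet.txt is in the notebooks folder as before, the built-in count() function returns the number of items in the sequence, which here is the number of lines in the file:

```jsoniq
(: count the lines without printing them :)
count(unparsed-text-lines("hamlet.txt"))
```

The query only returns a single number, which is convenient when the sequence itself is too long to display.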

## Try your own queries!

This notebook is interactive. You can edit all queries above and also execute your own! We will show you more features every week.

In [5]:
%%jsoniq
1+1

Took: 19.654428243637085 ms
2

