# <center>Rumble sandbox</center>


This is a Rumble sandbox that allows you to play with simple JSONiq queries.

It is a jupyter notebook that you can also download and execute on your own machine, but if you arrived here from the Rumble website, it is likely to be shown within Google's Colab environment.

To get started, you first need to execute the cell below to activate the Rumble magic (you do not need to understand what it does, this is just initialization Python code).

In [1]:
import requests
import json
import time
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def rumble(line, cell=None):
    if cell is None:
        data = line
    else:
        data = cell

    start = time.time()                                                         
    response = json.loads(requests.post(server, data=data).text)                   
    end = time.time()                                                              
    print("Took: %s ms" % (end - start))

    if 'warning' in response:
        print(json.dumps(response['warning']))
    if 'values' in response:
        for e in response['values']:
            print(json.dumps(e))
    elif 'error-message' in response:
        return response['error-message']
    else:
        return response

By default, this notebook uses a small public backend provided by us (very limited in CPU and memory, and with only the http scheme activated) that is sufficient to discover Rumble. This is new and experimental, so that it may occasionally break, especially if too many users use it at the same time, so please bear with us!

In order to use our backend, just execute the cell below.

In [2]:
server='http://localhost:9090/jsoniq'

It is straightforward to execute your own Rumble server on your own Spark cluster (and then you can make full use of all the input file systems and file formats). In this case, just replace the above server with your own hostname and port.

Now we are all set! You can now start reading and executing the JSONiq queries as you go, and you can even edit them!

## JSON

As explained on the [official JSON Web site](http://www.json.org/), JSON is a lightweight data-interchange format designed for humans as well as for computers. It supports as values:
- objects (string-to-value maps)
- arrays (ordered sequences of values)
- strings
- numbers
- booleans (true, false)
- null

JSONiq provides declarative querying and updating capabilities on JSON data.

## Elevator Pitch

JSONiq is based on XQuery, which is a W3C standard (like XML and HTML). XQuery is a very powerful declarative language that originally manipulates XML data, but it turns out that it is also a very good fit for manipulating JSON natively.
JSONiq, since it extends XQuery, is a very powerful general-purpose declarative programming language. Our experience is that, for the same task, you will probably write about 80% less code compared to imperative languages like JavaScript, Python or Ruby. Additionally, you get the benefits of strong type checking without actually having to write type declarations.
Here is an appetizer before we start the tutorial from scratch.


In [13]:
%%rumble

let $stores :=[  { "store number" : 1, "state" : "MA" } ]
let $nations := [  { "name": "US", "state": "MA" }, { "name": "US", "state": "CA" } ]
let $join := 
    for $nation in $nations[], $store allowing empty in $stores[]
        [ $$.state eq $nation.state ] 
    return { "name": $nation.state, "stores": $store."store number" }
return [$join]


Took: 0.031745195388793945 ms
[{"name": "MA", "stores": 1}, {"name": "CA", "stores": null}]


## And here you go

### Actually, you already knew some JSONiq

The first thing you need to know is that a well-formed JSON document is a JSONiq expression as well.
This means that you can copy-and-paste any JSON document into a query. The following are JSONiq queries that are "idempotent" (they just output themselves):

In [49]:
%%rumble
{ "pi" : 3.14, "sq2" : 1.4 }

Took: 0.006221771240234375 ms
{"pi": 3.14, "sq2": 1.4}


In [50]:
%%rumble
[ 2, 3, 5, 7, 11, 13 ]

Took: 0.0064008235931396484 ms
[2, 3, 5, 7, 11, 13]


In [51]:
%%rumble
{
      "operations" : [
        { "binary" : [ "and", "or"] },
        { "unary" : ["not"] }
      ],
      "bits" : [
        0, 1
      ]
    }

Took: 0.008172035217285156 ms
{"operations": [{"binary": ["and", "or"]}, {"unary": ["not"]}], "bits": [0, 1]}


In [52]:
%%rumble
[ { "Question" : "Ultimate" }, ["Life", "the universe", "and everything"] ]

Took: 0.006453990936279297 ms
[{"Question": "Ultimate"}, ["Life", "the universe", "and everything"]]


This works with objects, arrays (even nested), strings, numbers, booleans, null.

It also works the other way round: if your query outputs an object or an array, you can use it as a JSON document.
JSONiq is a declarative language. This means that you only need to say what you want - the compiler will take care of the how. 

In the above queries, you are basically saying: I want to output this JSON content, and here it is.

## JSONiq basics

### The real JSONiq Hello, World!

Wondering what a hello world program looks like in JSONiq? Here it is:

In [53]:
%%rumble
"Hello, World!"

Took: 0.0066568851470947266 ms
"Hello, World!"


Not surprisingly, it outputs the string "Hello, World!".

### Numbers and arithmetic operations

Okay, so, now, you might be thinking: "What is the use of this language if it just outputs what I put in?" Of course, JSONiq can more than that. And still in a declarative way. Here is how it works with numbers:

In [54]:
%%rumble
2 + 2

Took: 0.0072820186614990234 ms
4


In [55]:
%%rumble
 (38 + 2) div 2 + 11 * 2


Took: 0.0064237117767333984 ms
42


(mind the division operator which is the "div" keyword. The slash operator has different semantics).

Like JSON, JSONiq works with decimals and doubles:

In [56]:
%%rumble
 6.022e23 * 42

Took: 0.007363080978393555 ms
25292400000000000000000000


### Logical operations

JSONiq supports boolean operations.

In [57]:
%%rumble
true and false

Took: 0.006527900695800781 ms
false


In [58]:
%%rumble
(true or false) and (false or true)

Took: 0.007046222686767578 ms
true


The unary not is also available:

In [59]:
%%rumble
not true

Took: 0.006941080093383789 ms
false


### Strings

JSONiq is capable of manipulating strings as well, using functions:


In [60]:
%%rumble
concat("Hello ", "Captain ", "Kirk")

Took: 0.005676984786987305 ms
"Hello Captain Kirk"


In [61]:
%%rumble
substring("Mister Spock", 8, 5)

Took: 0.00574493408203125 ms
"Spock"


JSONiq comes up with a rich string function library out of the box, inherited from its base language. These functions are listed [here](https://www.w3.org/TR/xpath-functions-30/) (actually, you will find many more for numbers, dates, etc).



### Sequences

Until now, we have only been working with single values (an object, an array, a number, a string, a boolean). JSONiq supports sequences of values. You can build a sequence using commas:


In [62]:
%%rumble
 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

Took: 0.0066449642181396484 ms
1
2
3
4
5
6
7
8
9
10


In [63]:
%%rumble
1, true, 4.2e1, "Life"

Took: 0.00654292106628418 ms
1
true
42
"Life"


The "to" operator is very convenient, too:

In [64]:
%%rumble
 (1 to 100)

Took: 0.006345033645629883 ms
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100


Some functions even work on sequences:

In [65]:
%%rumble
sum(1 to 100)

Took: 0.005728006362915039 ms
5050


In [66]:
%%rumble
string-join(("These", "are", "some", "words"), "-")

Took: 0.0058438777923583984 ms
"These-are-some-words"


In [67]:
%%rumble
count(10 to 20)

Took: 0.0066111087799072266 ms
11


In [68]:
%%rumble
avg(1 to 100)

Took: 0.005938053131103516 ms
50.5


Unlike arrays, sequences are flat. The sequence (3) is identical to the integer 3, and (1, (2, 3)) is identical to (1, 2, 3).

## A bit more in depth

### Variables

You can bind a sequence of values to a (dollar-prefixed) variable, like so:

In [69]:
%%rumble
let $x := "Bearing 3 1 4 Mark 5. "
return concat($x, "Engage!")

Took: 0.007143735885620117 ms
"Bearing 3 1 4 Mark 5. Engage!"


In [70]:
%%rumble
let $x := ("Kirk", "Picard", "Sisko")
return string-join($x, " and ")

Took: 0.006165742874145508 ms
"Kirk and Picard and Sisko"


You can bind as many variables as you want:

In [71]:
%%rumble
let $x := 1
let $y := $x * 2
let $z := $y + $x
return ($x, $y, $z)

Took: 0.006880044937133789 ms
1
2
3


and even reuse the same name to hide formerly declared variables:

In [72]:
%%rumble
let $x := 1
let $x := $x + 2
let $x := $x + 3
return $x

Took: 0.006127119064331055 ms
6


### Iteration

In a way very similar to let, you can iterate over a sequence of values with the "for" keyword. Instead of binding the entire sequence of the variable, it will bind each value of the sequence in turn to this variable.

In [73]:
%%rumble
for $i in 1 to 10
return $i * 2

Took: 0.006555080413818359 ms
2
4
6
8
10
12
14
16
18
20


More interestingly, you can combine fors and lets like so:

In [74]:
%%rumble
let $sequence := 1 to 10
for $value in $sequence
let $square := $value * 2
return $square

Took: 0.006516933441162109 ms
2
4
6
8
10
12
14
16
18
20


and even filter out some values:

In [75]:
%%rumble
let $sequence := 1 to 10
for $value in $sequence
let $square := $value * 2
where $square < 10
return $square

Took: 0.0077419281005859375 ms
2
4
6
8


Note that you can only iterate over sequences, not arrays. To iterate over an array, you can obtain the sequence of its values with the [] operator, like so:


In [76]:
%%rumble
[1, 2, 3][]

Took: 0.006000041961669922 ms
1
2
3


### Conditions

You can make the output depend on a condition with an if-then-else construct:

In [77]:
%%rumble
for $x in 1 to 10
return if ($x < 5) then $x
                   else -$x

Took: 0.0064771175384521484 ms
1
2
3
4
-5
-6
-7
-8
-9
-10


Note that the else clause is required - however, it can be the empty sequence () which is often when you need if only the then clause is relevant to you.

### Composability of Expressions

Now that you know of a couple of elementary JSONiq expressions, you can combine them in more elaborate expressions. For example, you can put any sequence of values in an array:

In [78]:
%%rumble
[ 1 to 10 ]

Took: 0.007096052169799805 ms
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


Or you can dynamically compute the value of object pairs (or their key):

In [79]:
%%rumble
{
      "Greeting" : (let $d := "Mister Spock"
                    return concat("Hello, ", $d)),
      "Farewell" : string-join(("Live", "long", "and", "prosper"),
                               " ")
}

Took: 0.007810831069946289 ms
{"Greeting": "Hello, Mister Spock", "Farewell": "Live long and prosper"}


You can dynamically generate object singletons (with a single pair):


In [80]:
%%rumble
{ concat("Integer ", 2) : 2 * 2 }

Took: 0.006745100021362305 ms
{"Integer 2": 4}


and then merge lots of them into a new object with the {| |} notation:

In [81]:
%%rumble
{|
    for $i in 1 to 10
    return { concat("Square of ", $i) : $i * $i }
|}

Took: 0.006300926208496094 ms
{"Square of 1": 1, "Square of 2": 4, "Square of 3": 9, "Square of 4": 16, "Square of 5": 25, "Square of 6": 36, "Square of 7": 49, "Square of 8": 64, "Square of 9": 81, "Square of 10": 100}


## JSON Navigation

Up to now, you have learnt how to compose expressions so as to do some computations and to build objects and arrays. It also works the other way round: if you have some JSON data, you can access it and navigate.
All you need to know is: JSONiq views
an array as an ordered list of values,
an object as a set of name/value pairs


### Objects

You can use the dot operator to retrieve the value associated with a key. Quotes are optional, except if the key has special characters such as spaces. It will return the value associated thereto:

In [82]:
%%rumble
let $person := {
    "first name" : "Sarah",
    "age" : 13,
    "gender" : "female",
    "friends" : [ "Jim", "Mary", "Jennifer"]
}
return $person."first name"

Took: 0.009386062622070312 ms
"Sarah"


You can also ask for all keys in an object:

In [83]:
%%rumble
let $person := {
    "name" : "Sarah",
    "age" : 13,
    "gender" : "female",
    "friends" : [ "Jim", "Mary", "Jennifer"]
}
return { "keys" : [ keys($person)] }

Took: 0.00790095329284668 ms
{"keys": ["name", "age", "gender", "friends"]}


### Arrays

The [[]] operator retrieves the entry at the given position:

In [84]:
%%rumble
let $friends := [ "Jim", "Mary", "Jennifer"]
return $friends[[1+1]]

Took: 0.00620579719543457 ms
"Mary"


It is also possible to get the size of an array:

In [85]:
%%rumble
let $person := {
    "name" : "Sarah",
    "age" : 13,
    "gender" : "female",
    "friends" : [ "Jim", "Mary", "Jennifer"]
}
return { "how many friends" : size($person.friends) }

Took: 0.006299018859863281 ms
{"how many friends": 3}


Finally, the [] operator returns all elements in an array, as a sequence:

In [86]:
%%rumble
let $person := {
    "name" : "Sarah",
    "age" : 13,
    "gender" : "female",
    "friends" : [ "Jim", "Mary", "Jennifer"]
}
return $person.friends[]

Took: 0.0063228607177734375 ms
"Jim"
"Mary"
"Jennifer"


### Relational Algebra

Do you remember SQL's SELECT FROM WHERE statements? JSONiq inherits selection, projection and join capability from XQuery, too.

In [87]:
%%rumble
let $stores :=
[
    { "store number" : 1, "state" : "MA" },
    { "store number" : 2, "state" : "MA" },
    { "store number" : 3, "state" : "CA" },
    { "store number" : 4, "state" : "CA" }
]
let $sales := [
    { "product" : "broiler", "store number" : 1, "quantity" : 20  },
    { "product" : "toaster", "store number" : 2, "quantity" : 100 },
    { "product" : "toaster", "store number" : 2, "quantity" : 50 },
    { "product" : "toaster", "store number" : 3, "quantity" : 50 },
    { "product" : "blender", "store number" : 3, "quantity" : 100 },
    { "product" : "blender", "store number" : 3, "quantity" : 150 },
    { "product" : "socks", "store number" : 1, "quantity" : 500 },
    { "product" : "socks", "store number" : 2, "quantity" : 10 },
    { "product" : "shirt", "store number" : 3, "quantity" : 10 }
]
let $join :=
    for $store in $stores[], $sale in $sales[]
    where $store."store number" = $sale."store number"
    return {
        "nb" : $store."store number",
        "state" : $store.state,
        "sold" : $sale.product
    }
return [$join]

Took: 0.01275491714477539 ms
[{"nb": 1, "state": "MA", "sold": "broiler"}, {"nb": 1, "state": "MA", "sold": "socks"}, {"nb": 2, "state": "MA", "sold": "toaster"}, {"nb": 2, "state": "MA", "sold": "toaster"}, {"nb": 2, "state": "MA", "sold": "socks"}, {"nb": 3, "state": "CA", "sold": "toaster"}, {"nb": 3, "state": "CA", "sold": "blender"}, {"nb": 3, "state": "CA", "sold": "blender"}, {"nb": 3, "state": "CA", "sold": "shirt"}]


### Access datasets

Rumble can read input from many file systems and many file formats. If you are using our backend, you can only use json-doc() with any URI pointing to a JSON file and navigate it as you see fit. 

In [9]:
%%rumble
json-doc("Put any HTTP URL pointing to a JSON document here!").foo[[1]].bar.foobar[]

Took: 0.07178306579589844 ms


'There was an error.\n\nCode: [FODC0002] (this code can be looked up in the documentation and specifications).\n\nLocation information: file:/home/ubuntu/:LINE:1:COLUMN:0:\n\nMalformed URI: Put any HTTP URL pointing to a JSON document here! Cause: Illegal character in path at index 3: Put any HTTP URL pointing to a JSON document here!'

If you are using your own Rumble server on your cluster, you can also use any other function and scheme.

In [10]:
%%rumble
json-file("put the path to a JSON lines file here. This will only work against your own Rumble backend and Spark cluster, though.")

Took: 0.049546241760253906 ms


'There was an error.\n\nCode: [FODC0002] (this code can be looked up in the documentation and specifications).\n\nLocation information: file:/home/ubuntu/:LINE:1:COLUMN:0:\n\nMalformed URI: put the path to a JSON lines file here. This will only work against your own Rumble backend and Spark cluster, though. Cause: Illegal character in path at index 3: put the path to a JSON lines file here. This will only work against your own Rumble backend and Spark cluster, though.'