# Advanced JSON: Unnesting JSON Arrays
Updated: 2019-10-03

## Unnesting Arrays
One of the challenges of dealing with JSON objects is how to handle arrays of values. The relational model was never designed to deal with a column of data that could be an array, so alternative techniques have to be used.
The `JSON_QUERY` function can be used to retrieve the entire contents of an array, while `JSON_VALUE` or `JSON_TABLE` can extract the individual elements. However, what method is available to extract all of the elements of an array when the actual array size is unknown?

For example, if we have the JSON array `["A","B","C"]` and we want to have the elements returned from an SQL query in a result set like this:
```
RESULTS
-------
A
B
C
```
How would we do this?

A complete implementation of the ISO SQL definition for `JSON_TABLE` would handle this case by returning multiple rows, with all the other row values duplicated. The Db2 implementation of `JSON_TABLE` is not yet at that stage of maturity and cannot handle this scenario. There is an older, proprietary Db2 JSON function in the `SYSTOOLS` schema (unfortunately also called `JSON_TABLE`) that can generate a simple result set where each row represents an element from the array. However, this function does not return multiple values per row and is not compliant with the ISO SQL JSON standard.
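
To make the idea of "multiple rows with the other row values duplicated" concrete, here is a minimal Python sketch of the result shape such an unnesting would produce. It only imitates the output and makes no claim about how any database implements it. The `doc` dictionary and the `unnest_with_parent` helper are hypothetical names invented for this illustration.

```python
# Hypothetical document: one scalar field plus one array field.
doc = {"id": 1, "letters": ["A", "B", "C"]}

def unnest_with_parent(document, scalar_field, array_field):
    """Return one row per array element, repeating the scalar value on each row."""
    return [(document[scalar_field], element) for element in document[array_field]]

rows = unnest_with_parent(doc, "id", "letters")
for row in rows:
    print(row)
```

Each output row carries the same `id` next to a different array element, which is the duplication the paragraph above describes.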

So, to retrieve all the elements of an array as a series of independent values, we have to combine the new ISO JSON functions (`JSON_EXISTS`, `JSON_VALUE`, `JSON_QUERY`) in a recursive SQL query.
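
The underlying idea, probing one index at a time until a probe fails, can be sketched in a few lines of Python. This is an analogy only: `element_exists` and `unnest` are hypothetical helpers that stand in for the roles of `JSON_EXISTS` and `JSON_VALUE`, and a loop stands in for the recursion.

```python
def element_exists(array, index):
    """Rough stand-in for JSON_EXISTS: True when the position is populated."""
    return 0 <= index < len(array)

def unnest(array):
    """Collect elements one index at a time until a probe fails."""
    results = []
    index = 0
    while element_exists(array, index):
        results.append(array[index])  # stand-in for JSON_VALUE
        index += 1
    return results

results = unnest(["A", "B", "C"])
print(results)
```

The caller never needs to know the array length in advance; the failed probe is what ends the loop.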

### Load Db2 Extensions and Connect to the Database
The `connection` notebook contains the `CONNECT` statement which allows access to the `SAMPLE` database. If you need to modify the connection information, edit the `connection.ipynb` notebook.

```
%run ../db2.ipynb
%run ../connection.ipynb
```

### Unnesting Simple JSON Arrays
The first example uses the book document, which contains a "simple" array field called `formats`. A simple array contains individual atomic values rather than complex objects.
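
As a rough illustration of the distinction, the following Python sketch classifies an array as simple when every item is atomic. The `is_simple_array` helper is a hypothetical name used only for this example, and the two sample lists are shortened stand-ins for the kinds of arrays discussed in this section.

```python
def is_simple_array(value):
    """True when value is a list whose items are all atomic (no nested objects or arrays)."""
    atomic = (str, int, float, bool, type(None))
    return isinstance(value, list) and all(isinstance(item, atomic) for item in value)

formats = ["Hardcover", "Paperback", "eBook", "PDF"]
authors = [{"first_name": "Paul", "last_name": "Bird"}]

print(is_simple_array(formats))  # True: every item is a string
print(is_simple_array(authors))  # False: items are objects
```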

```
book = {
   "authors": 
     [
       {"first_name": "Paul",  "last_name" : "Bird"},
       {"first_name": "George","last_name" : "Baklarz"}
     ],
   "foreword": 
     {
       "primary": {"first_name": "Thomas","last_name" : "Hronis"}
     },
   "formats": ["Hardcover","Paperback","eBook","PDF"]
}
```

The `formats` field has four values that need to be returned as a list. The following SQL uses recursion to extract the values from the array.

```
%%sql
WITH BOOKS(INFO) AS (VALUES :book),
FORMATS(INDEX, JSON_PATH, BOOKTYPE) AS 
(
  SELECT 
     0, '$.formats[1]',JSON_VALUE(INFO,'$.formats[0]')
  FROM BOOKS 
     WHERE JSON_EXISTS(INFO,'$.formats[0]') IS TRUE
  UNION ALL
  SELECT 
     INDEX+1, 
     '$.formats[' || TRIM(CHAR(INDEX + 2)) || ']',
     JSON_VALUE(INFO, JSON_PATH) 
  FROM BOOKS, FORMATS
     WHERE JSON_EXISTS(INFO, JSON_PATH) IS TRUE
)
SELECT BOOKTYPE FROM FORMATS
```

The code is broken down below, with line numbers added for reference. The first line of the original statement, `WITH BOOKS(INFO) AS (VALUES :book)`, is not included because it only creates the temporary table needed to run the SQL.
```
[ 1] WITH FORMATS(INDEX, JSON_PATH, BOOKTYPE) AS 
[ 2] (
[ 3]   SELECT 
[ 4]      0, '$.formats[1]',JSON_VALUE(INFO,'$.formats[0]')
[ 5]   FROM BOOKS 
[ 6]      WHERE JSON_EXISTS(INFO,'$.formats[0]') IS TRUE
[ 7]   UNION ALL
[ 8]   SELECT 
[ 9]      INDEX+1, 
[10]      '$.formats[' || TRIM(CHAR(INDEX + 2)) || ']',
[11]      JSON_VALUE(INFO, JSON_PATH) 
[12]   FROM BOOKS, FORMATS
[13]      WHERE JSON_EXISTS(INFO, JSON_PATH) IS TRUE
[14] )
[15] SELECT BOOKTYPE FROM FORMATS
```

`[1-14]` `WITH` Block

The first section of code is used to initialize a recursive SQL block. Recursive SQL allows us to continually add rows to an answer set based on the results from a SQL statement that gets repeated multiple times.
```
[1] WITH FORMATS(INDEX, JSON_PATH, BOOKTYPE) AS
```
The common table expression used in this example is called `FORMATS` and contains three columns. The `INDEX` column tracks the position of the array item being retrieved, `JSON_PATH` holds the path expression used to find the next value, and `BOOKTYPE` is the value extracted from the array.

`[3-5]` `SELECT` statement

The first part of the `SELECT` statement is used to initialize the recursion by providing the first row of the result set.
```
[ 3]   SELECT 
[ 4]      0, '$.formats[1]',JSON_VALUE(INFO,'$.formats[0]')
[ 5]   FROM BOOKS 
```
The values are:
* `INDEX = 0` – This is the first index value in an array
* `JSON_PATH = '$.formats[1]'` – The path to the next array value
* `BOOKTYPE = JSON_VALUE(INFO,'$.formats[0]')` – The first value in the formats array

The `JSON_PATH` column is used as the path expression to find the next array value. This value could be written directly into the SQL, but because the recursive step needs the expression twice (once in `JSON_VALUE` and once in `JSON_EXISTS`), carrying it as a column reduces the chance of a syntax error. The `JSON_PATH` expression is always set to the next value that we need rather than the current one.

`[6] WHERE JSON_EXISTS() IS TRUE`

The `WHERE` clause is used to check whether or not the first value in the array exists. If it does not, then we return no results.
```
[ 6]      WHERE JSON_EXISTS(INFO,'$.formats[0]') IS TRUE
[ 7] UNION ALL
```

The `UNION ALL` is required to make the SQL recursive in nature. As the SQL executes, it will add more rows to the `FORMATS` table and then the new rows will be acted upon by this SQL block. 
 
`[8-12]` Get the remainder of the array values
 
This block will continue to iterate as long as there are more array values.
```
[ 8]   SELECT 
[ 9]      INDEX+1, 
[10]      '$.formats[' || TRIM(CHAR(INDEX + 2)) || ']',
[11]      JSON_VALUE(INFO, JSON_PATH) 
[12]   FROM BOOKS, FORMATS
```
The `SELECT` statement increments the index number into the array, creates the next path expression, and retrieves the current array value.

The `JSON_PATH` is generated as a character string:
```
[10]      '$.formats[' || TRIM(CHAR(INDEX + 2)) || ']',
```

The first portion of the string is the path to the array. It is concatenated with the `INDEX` value of the row being read plus 2, which keeps the stored path one position ahead of the index of the row being produced (`INDEX + 1`).

The tables accessed by the SQL are the `BOOKS` table (with the original JSON) and the `FORMATS` table, which is what we are building recursively.
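
The string arithmetic on line 10 can be checked with a small Python sketch. The `next_json_path` function is a hypothetical helper that mirrors the SQL concatenation; its `index` argument plays the role of the `INDEX` value read from the previous row.

```python
def next_json_path(index, field="formats"):
    """Mirror '$.formats[' || TRIM(CHAR(INDEX + 2)) || ']' for a prior-row INDEX."""
    return "$." + field + "[" + str(index + 2) + "]"

for prior_index in range(3):
    # The new row gets index prior_index + 1, and its stored path points one further.
    print(prior_index, "->", prior_index + 1, next_json_path(prior_index))
```

For a prior `INDEX` of 0, the new row has index 1 and stores the path `$.formats[2]`, which is the path the following iteration will test and read.
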
`[13] WHERE JSON_EXISTS() IS TRUE`

The `WHERE` clause is used to check whether or not the current value in the array exists. If it does not exist, then we stop the recursion. This is often referred to as the stop condition in the recursion loop.
```
[13]      WHERE JSON_EXISTS(INFO, JSON_PATH) IS TRUE
```

`[15]` Final `SELECT` statement

Once the recursion is done, we can retrieve the contents of the array. We refer to the `BOOKTYPE` column because that is the only value we are interested in, but if you select everything you will see the index values and path expressions that were generated as part of the SQL.
```
INDEX  JSON_PATH     BOOKTYPE
-----  ------------  --------
    0  $.formats[1]  Hardcover
    1  $.formats[2]  Paperback
    2  $.formats[3]  eBook
    3  $.formats[4]  PDF
```
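
To see how these rows come about, the recursion can be imitated in Python. This is a sketch under stated assumptions, not a description of how Db2 evaluates the query: `json_value`, `json_exists`, and `unnest_formats` are hypothetical helpers, the path parser only understands the `$.field[n]` form, and a JSON `null` element would be treated as missing.

```python
import re

book = {"formats": ["Hardcover", "Paperback", "eBook", "PDF"]}

def json_value(doc, path):
    """Resolve a path of the form '$.field[n]'; return None when it does not exist."""
    match = re.fullmatch(r"\$\.(\w+)\[(\d+)\]", path)
    if match is None:
        return None
    field, position = match.group(1), int(match.group(2))
    array = doc.get(field)
    if not isinstance(array, list) or position >= len(array):
        return None
    return array[position]

def json_exists(doc, path):
    """Simplified existence test built on json_value."""
    return json_value(doc, path) is not None

def unnest_formats(doc):
    rows = []
    # Seed row (SQL lines 3-6): only produced when the first element exists.
    new_rows = []
    if json_exists(doc, "$.formats[0]"):
        new_rows.append((0, "$.formats[1]", json_value(doc, "$.formats[0]")))
    # Recursive step (SQL lines 8-13): each pass reads only the rows added by the
    # previous pass and stops when no stored path resolves to a value.
    while new_rows:
        rows.extend(new_rows)
        new_rows = [
            (index + 1, "$.formats[" + str(index + 2) + "]", json_value(doc, path))
            for index, path, _ in new_rows
            if json_exists(doc, path)
        ]
    return rows

rows = unnest_formats(book)
for row in rows:
    print(row)
```

The last row stores `$.formats[4]`, a path that does not exist in the document, so the next pass produces no rows and the loop ends.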

## Summary
While there is currently no single JSON function within Db2 to retrieve all array values, `JSON_EXISTS`, `JSON_VALUE`, `JSON_TABLE`, and `JSON_QUERY` can be combined with recursive SQL to extract array objects or individual values.

#### Credits: IBM 2019, George Baklarz [baklarz@ca.ibm.com]