# JSON_TABLE Function
Updated: 2019-10-03

## JSON Table Function
Up to this point, the notebooks have explored functions that check for the existence of an object and retrieve individual values. These functions can retrieve every value within a JSON document if you make multiple calls, but an easier method exists in the form of the new `JSON_TABLE` function, which is based on the ISO SQL standard. Db2 does not yet implement all of the ISO `JSON_TABLE` definition, but the part that has been implemented is still very useful and can simplify your queries.

### Load Db2 Extensions and Connect to the Database
The `connection` notebook contains the `CONNECT` statement which allows access to the `SAMPLE` database. If you need to modify the connection information, edit the `connection.ipynb` notebook.

In [None]:
%run ../db2.ipynb
%run ../connection.ipynb

### JSON_TABLE: Publishing JSON Data as a Table
The `JSON_TABLE` function provides two ways to define a column. Regular column expressions mimic the `JSON_VALUE` function, while formatted column expressions use features from the `JSON_QUERY` function. You can have different column definitions in the same `JSON_TABLE` invocation.

***JSON_TABLE Syntax***
![JSON_VALUE](images/JSON_TABLE.png)

***Regular Column Expression***
![JSON_VALUE](images/jt-regular.png)

***Regular Empty and Error Clause***
![JSON_VALUE](images/jt-regular-empty-clause.png)

***Formatted Column Expression***
![JSON_VALUE](images/jt-formatted.png)

***Formatted Wrapper Clause***
![JSON_VALUE](images/jt-wrapper-clause.png)

***Formatted Quotes Clause***
![JSON_VALUE](images/jt-quotes-clause.png)

***Formatted Empty and Error Clause***
![JSON_VALUE](images/jt-formatted-empty-clause.png)

**Important Note:**

There are actually two different `JSON_TABLE` functions provided with Db2, only one of which is the new ISO based function. If you are looking up `JSON_TABLE` in the Db2 documentation, make sure that you are looking at the new built-in `JSON_TABLE` table function under the `SYSIBM` schema and not the older `JSON_TABLE` under the `SYSTOOLS` schema. The former is in the SQL reference alongside the other new JSON functions while the latter is in a separate section with the other older (SYSTOOLS) JSON functions.

#### JSON Expression
The *json-expression*, *json-path-expression* and `ON EMPTY` and `ON ERROR` clauses were discussed in earlier notebooks.

#### STRICT Path Expression
The `JSON_TABLE` function includes a path modifier after the JSON expression. The `strict` modifier is mandatory (you cannot use `lax` here). The ISO standard defaults to `lax` when no modifier is specified, so requiring `strict` now ensures that your current use of `JSON_TABLE` remains compatible with the standard when the function is enhanced in the future. The `'strict $'` path modifier prevents multiple rows being generated in any single column definition. If you want to retrieve array values with `JSON_TABLE`, you will need to use the formatted column definition.

#### Columns
The `COLUMNS` clause includes all of the columns that you want to derive from the JSON document. There are two types of column definitions: regular and formatted. The column definition will be described in another section.

#### ERROR ON ERROR
The `ERROR ON ERROR` clause is mandatory at the function top level and will cause the function to raise an error in the event there is any error when retrieving values from the JSON document.

#### JSON_TABLE Minimal Syntax
The minimum syntax of the JSON_TABLE function is:
```sql
SELECT T.*
   FROM AUTHORS A, 
   JSON_TABLE(A.INFO, 'strict $'
              COLUMNS(... column list ...) 
              ERROR ON ERROR) AS T
```

Note how the `'strict $'` and `ERROR ON ERROR` keywords must be present in order for the function to work.

#### COLUMN Definitions
The body of the `JSON_TABLE` function includes the list of columns that you want to create from the JSON document. There are two formats of column definition available: regular and formatted. 

***Regular Column Expression***
![JSON_VALUE](images/jt-regular.png)

***Formatted Column Expression***
![JSON_VALUE](images/jt-formatted.png)

Each of these formats uses the same column name, data type and path definitions. When using a formatted column expression, the `FORMAT JSON` specification must be used.

The column can be defined in one of two ways:
* A column name derived from a JSON path expression and a data type

    `"foreword.primary.last_name" VARCHAR(20)`
<p>  
* A SQL column name with a data type and JSON path expression

    `NAME VARCHAR(20) PATH '$.foreword.primary.last_name'`

The first method can be a convenient shortcut when your JSON document has most of the data at the root (`$.`) level. The column names can become extremely long if you add index values and multi-level objects.

The following example demonstrates what the output would look like when querying the first and last name of one of the authors using the column name as the path.

In [None]:
book = {
   "authors": [{"first_name": "Paul",  "last_name" : "Bird"},
               {"first_name": "George","last_name" : "Baklarz"}],
   "foreword": {
              "primary": {
                          "first_name": "Thomas",
                          "last_name" : "Hronis"
                         }
              },
   "formats": ["Hardcover","Paperback","eBook","PDF"]
}

In [None]:
%%sql
WITH BOOKS(INFO) AS (VALUES :book)
SELECT T.* FROM BOOKS, 
  JSON_TABLE(INFO, 'strict $'
    COLUMNS( "authors[0].first_name" VARCHAR(20),
             "authors[0].last_name"  VARCHAR(20))
    ERROR ON ERROR) AS T;
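
To see which values these two columns pick out, the same paths can be followed by hand in Python. This is only an illustrative sketch outside Db2: `resolve_path` is a hypothetical helper that understands just dotted member names and `[n]` array indexes, not the full SQL/JSON path language.

```python
import re

# Same document as the `book` variable defined earlier in this notebook.
book = {
    "authors": [{"first_name": "Paul", "last_name": "Bird"},
                {"first_name": "George", "last_name": "Baklarz"}],
    "foreword": {"primary": {"first_name": "Thomas", "last_name": "Hronis"}},
    "formats": ["Hardcover", "Paperback", "eBook", "PDF"],
}

def resolve_path(doc, path):
    """Follow a simple path such as 'authors[0].first_name' through doc.

    Hypothetical helper for illustration only; it is not how Db2
    evaluates SQL/JSON paths internally.
    """
    current = doc
    # Each token is either a member name or an [index] step.
    for name, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        current = current[name] if name else current[int(index)]
    return current

first = resolve_path(book, "authors[0].first_name")
last = resolve_path(book, "authors[0].last_name")
print(first, last)
```

The printed pair corresponds to the single row that the query returns for the first author.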

Rewriting the query to use the PATH expression will produce the same results.

In [None]:
%%sql
WITH BOOKS(INFO) AS (VALUES :book)
SELECT T.* FROM BOOKS, 
  JSON_TABLE(INFO, 'strict $'
    COLUMNS( 
      FIRST_NAME VARCHAR(20) PATH '$.authors[0].first_name',
      LAST_NAME  VARCHAR(20) PATH '$.authors[0].last_name')
    ERROR ON ERROR) AS T;

### Column Name
The column name must adhere to normal Db2 naming rules:
* Must start with a letter A-Z
* Contains a combination of the letters A-Z, numbers 0-9, or the underscore character "_"
* Must be enclosed in double quotes (i.e. "$salary") if lowercase letters, a path expression, or special characters need to be used
* Lowercase letters are always folded to uppercase in SQL unless double quotes are used
* Maximum length of 128

### Data Type
The data types available to use in the column definition depend on which column format you use. 
* The regular column format can return data in any valid Db2 data type
* The formatted column format mandates the use of the `FORMAT JSON` clause, which restricts results to character strings only

`FORMAT JSON` will cause the `JSON_TABLE` function to return the data as a JSON value. This is useful for returning array data or complex objects as a character string. This format only supports character strings, so you cannot materialize an individual value as a numeric value, only as its character equivalent.

### Column Path Expression
The column path expression is identical to the *json-path-expression* that is discussed in earlier sections. The path is used to locate the object in the JSON document.
```sql
ADDRESS VARCHAR(300) FORMAT JSON PATH '$.address'
```
The path expression must be a constant string expression – there is currently no option for using SQL variables or the contents of a column as input to the path expression. The rules for the path expression are dependent on whether you use the PATH keyword or not. 
* `PATH` 'value'

   If you use the `PATH` keyword, the path expression must include the entire path including the anchor string '`$.`'.
<p>
* No `PATH` provided

    If you do not use the `PATH` keyword, the `JSON_TABLE` function assumes that the path will be found in the column name. In the event you have included the path expression in the column name and included the `PATH` keyword, the `PATH` expression will take precedence. 
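
The two rules above can be summarized in a few lines of Python. This is a sketch of the rule as described here, not Db2 code: `effective_path` is a hypothetical name, and prefixing the column name with `$.` is an assumption based on the anchor requirement stated for `PATH`.

```python
def effective_path(column_name, path=None):
    """Return the path a column definition would use, per the rules above."""
    if path is not None:
        # An explicit PATH wins, even if the column name looks like a path.
        return path
    # Without PATH, the column name itself is treated as the path.
    # Assumption: the name is anchored at the document root ('$.').
    return "$." + column_name

print(effective_path("foreword.primary.last_name"))
print(effective_path("NAME", "$.foreword.primary.last_name"))
```

Both calls print the same path, which is why the two column definition styles can return the same value.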

### Use of Quotes
Db2 and JSON both use quotes in different ways, and a `JSON_TABLE` column definition can involve both double and single quotes. A column name that follows standard Db2 naming rules does not need quotes. However, if you are using any special characters in a name, want lowercase letters respected in the name (i.e. a delimited column name), or are using the column name as the path expression, then you must enclose the string in double quotes: "column-name".

If you decide to use the `PATH` expression, then you must include the path expression in single quotes `'$.formats'`. The reason for the different quotes characters is due to the way Db2 handles string constants versus identifiers. A constant string is always enclosed in single quotes while delimited column identifiers use double quotes. When using a column name as a path expression, it must be surrounded by double quotes.

### Regular COLUMN Definition
A regular column definition will extract a single SQL value from a JSON document in the same way that `JSON_VALUE` does. 
![JSON_VALUE](images/jt-regular.png)

The path expression can be part of the column name or included as part of the `PATH` keyword. The rules for creating a `PATH` expression were described in a previous section.

The data-type field is required when defining a column result. The other Db2 JSON functions will return a result based on the best data type representation for the data. In the case of the `JSON_TABLE` function, the data type must be defined, or an error will be raised. You must ensure that the size of the field is large enough to support the data being retrieved, and that it is of the proper type. 

### ON EMPTY and ON ERROR with Regular Column Definition 
When an empty or error condition is encountered when using a regular column definition, Db2 will raise one of two exceptions: `ON EMPTY` or `ON ERROR`. While there is a higher level `ON ERROR` clause for the entire `JSON_TABLE` function, each column defined can also have its own `ON EMPTY` and `ON ERROR` clause specified if so desired. As usual, which condition fires is dependent on the use of the `lax` and `strict` keywords and details can be found in an earlier notebook. 
![JSON_VALUE](images/jt-regular-empty-clause.png)

The actions for these two exception handling clauses are:
* `NULL` – Return a null value instead of an error
* `ERROR` – Raise an error
* `DEFAULT <value>` – Return a default value instead

These actions are specified in front of the error handling clause, for example `NULL ON EMPTY`. The defaults are `NULL ON EMPTY` and `NULL ON ERROR`. The other option for handling missing values is to return a default value using the `DEFAULT` clause. 
![JSON_VALUE](images/DEFAULT.png)

This option allows the function to return a value rather than a null.
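
The three actions can be pictured with a small Python sketch. It is an analogy only: `apply_action` is a hypothetical helper, it looks at a single top-level key, and it does not model which of `ON EMPTY` or `ON ERROR` fires (that depends on `lax` and `strict`, as noted above).

```python
def apply_action(doc, key, action="NULL", default=None):
    """Return doc[key], or apply NULL / DEFAULT / ERROR when key is missing."""
    if key in doc:
        return doc[key]
    if action == "NULL":
        return None                      # NULL: return a null value
    if action == "DEFAULT":
        return default                   # DEFAULT <value>: return the default
    if action == "ERROR":
        raise LookupError(f"{key!r} not found")   # ERROR: raise an error
    raise ValueError(f"unknown action: {action}")

person = {"first_name": "Thomas", "last_name": "Hronis"}

print(apply_action(person, "first_name"))
print(apply_action(person, "middle_name"))                    # NULL is the default
print(apply_action(person, "middle_name", "DEFAULT", "n/a"))
```

With the `ERROR` action, the same missing key raises an exception instead of returning a value.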

### Formatted COLUMN Definition
A formatted column expression is similar to the `JSON_QUERY` function and will extract single JSON compatible values, arrays, and objects from a JSON document. 

### Formatted Column Expression

The path expression can be part of the column name or included as part of the `PATH` keyword. The rules for creating a `PATH` expression were described in a previous section.

The data-type field is required when defining a column result. When using a formatted column definition, the data type must be a character type and the size of the field must be large enough to support the data being retrieved.

![JSON_VALUE](images/jt-formatted.png)

### Wrappers
When using formatted column definitions, the results could end up producing a series of values. Similar to JSON_QUERY, the wrapper clause must be used to handle multiple values by making them into a JSON array. 

#### Formatted Wrapper Clause

![JSON_VALUE](images/jt-wrapper-clause.png)

There are three options when dealing with wrapping results:
* `WITHOUT (ARRAY) WRAPPER`
* `WITH CONDITIONAL (ARRAY) WRAPPER`
* `WITH UNCONDITIONAL (ARRAY) WRAPPER`

The `WITHOUT` clause is the default setting, which means that the results will not be wrapped as an array. If your search returns more than one value, the function will return NULL or an error (depending on the `ON ERROR` behavior specified for the column).

The two other options will create an `ARRAY WRAPPER` based on the number of values returned. An `UNCONDITIONAL WRAPPER` will always create an array of values, while a `CONDITIONAL WRAPPER` will only create an array if there are one or more elements returned or if it is an object. If the result is an array, it will not place an array wrapper around it. 

### Quotes
A formatted column definition has an option to eliminate the quotes that surround character strings. 

![JSON_VALUE](images/jt-quotes-clause.png)

There are two options:
* `KEEP QUOTES` – The default is to keep the existing quotes
* `OMIT QUOTES` – Remove the quotation marks around a string

The `OMIT QUOTES` option is limited to use with the `WITHOUT ARRAY WRAPPER` clause, so multiple values cannot be returned using this keyword. 
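
The difference between the two options is easiest to see on a single string value. The Python sketch below only imitates the visible effect using the standard `json` module; it is not Db2 output, and it does not address how escape sequences inside the string are treated.

```python
import json

value = "Hardcover"                 # one entry from the `formats` array

keep_quotes = json.dumps(value)     # the JSON text for a string includes its quotes
omit_quotes = keep_quotes[1:-1]     # drop the surrounding quote characters

print(keep_quotes)
print(omit_quotes)
```

The first line printed carries the surrounding double quotes and the second does not.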

### ON EMPTY and ON ERROR with Formatted Column Definition 
Formatted column definitions have similar `ON EMPTY` and `ON ERROR` clauses as `JSON_QUERY`. 

![JSON_VALUE](images/jt-formatted-empty-clause.png)

The difference between the regular column definitions and formatted ones is that formatted columns do not allow for a default value other than an empty object or array.

The actions for the `ON EMPTY` and `ON ERROR` clauses are:
* `NULL` – Return a null instead of an error
* `ERROR` – Raise an error
* `EMPTY ARRAY` – Return an empty array
* `EMPTY OBJECT` – Return an empty object

Similar to the `JSON_QUERY` function, you can add more control over what is returned for missing values and for error conditions by using the `ON EMPTY` and `ON ERROR` clauses. Both of these clauses can be added to the formatted column definition. 

### JSON_TABLE Example
The following example will retrieve contents from the CUSTOMER table using the `JSON_TABLE` function. Here is a snapshot of the CUSTOMER table with one document displayed.
```json
{
    "customerid": 100000,
    "identity":  {
                  "firstname": "Jacob",
                  "lastname": "Hines",
                  "birthdate": "1982-09-18"
                 },
    "contact":   {
                  "street": "Main Street North",
                  "city": "Amherst",
                  "state": "OH",
                  "zipcode": "44001",
                  "email": "Ja.Hines@yahii.com",
                  "phone": "813-689-8309"
                 },
    "payment":   {
                  "card_type": "MCCD",
                  "card_no": "4742-3005-2829-9227"
                 },
    "purchases": [
                   {
                    "tx_date": "2018-02-14",
                    "tx_no": 157972,
                    "product_id": 1860,
                    "product": "Ugliest Snow Blower",
                    "quantity": 1,
                    "item_cost": 51.86
                   }, …additional purchases…
                 ]
}
```
The results of the `JSON_TABLE` function will include:
* CUSTID (customerid) as an integer column
* FIRST_NAME, LAST_NAME as character strings
* STATE, ZIPCODE as character strings
* An array of the product IDs that they have purchased

The results are restricted to those customers who live in Ohio (OH).

We first locate the customer data file and then insert its contents into a Db2 table.

In [None]:
import os
# Build the path to the customer data file in the current working directory
fname = os.getcwd() + "/customers.js"
print("Input file: " + fname)

In [None]:
%%sql -quiet 
DROP TABLE CUSTOMERS;
CREATE TABLE CUSTOMERS
  (
  DETAILS VARCHAR(2000)
  );

This code will load the data into the table.

In [None]:
import io
import json
import time
print("Starting Load")
start_time = time.time()
i = 0   # row counter; set before the prepare so the final print works even if it fails
%sql autocommit off
x = %sql prepare INSERT INTO CUSTOMERS VALUES (?)
if (x != False):
    with open(fname,"r") as records:
        for record in records:
            i += 1
            rc = %sql execute :x using record@char
            if (rc == False): break
            if ((i % 5000) == 0): 
                print(str(i)+" rows read.")
                %sql commit hold
                
    %sql commit work  
%sql autocommit on
end_time = time.time()
print('Total load time for {:d} records is {:.2f} seconds'.format(i,end_time-start_time))

***Step 1: Filter the Results***
The `CUSTOMERS` table contains only one column called `DETAILS` which contains the JSON document for each customer. The shell of the JSON_TABLE command looks like this:
```sql
SELECT RESULTS.* FROM CUSTOMERS C, 
   JSON_TABLE(DETAILS, 'strict $'
      COLUMNS(...)
      ERROR ON ERROR) AS RESULTS
WHERE
   condition
```
The `WHERE` condition needs to filter the rows based on the `STATE` that the customer lives in. To select the documents that qualify, the `JSON_VALUE` function has to be used to check the state field within the contact object of each record and match it to Ohio (OH).

The SQL that is required to search for this value is:
```sql
WHERE JSON_VALUE(C.DETAILS,'$.contact.state' RETURNING CHAR(2)) = 'OH'
```
A quick check reveals that there are about 700 customers that live in Ohio.

**Note:** The count is dependent on the customer records that are randomly generated.

In [None]:
%%sql 
SELECT COUNT(*) FROM CUSTOMERS C
WHERE JSON_VALUE(C.DETAILS,'$.contact.state' RETURNING CHAR(2)) = 'OH'
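
As a rough cross-check outside the database, the same count can be taken directly from the input data in Python. This sketch assumes one JSON document per line, which is how the load loop above reads the file; `count_state` is a hypothetical helper and the second sample record is made up for illustration.

```python
import json

def count_state(lines, state):
    """Count JSON documents (one per line) whose contact.state equals state."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        doc = json.loads(line)
        if doc.get("contact", {}).get("state") == state:
            total += 1
    return total

sample = [
    '{"customerid": 100000, "contact": {"state": "OH"}}',
    '{"customerid": 100001, "contact": {"state": "NY"}}',   # hypothetical record
]
print(count_state(sample, "OH"))
```

Passing an open file handle for the customer file in place of `sample` should give a number comparable to the SQL count, provided every row loaded successfully.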

***Step 2: Determine the Path Expressions***

There are 6 fields that need to be returned so the JSON path expression has to be created for each one. Since we are mostly dealing with simple values and we want to return them as regular relational data types, the `PATH` expression syntax will be used in creating the regular column definitions. The first 5 fields are straightforward.
* CUSTID (`$.customerid`)
* FIRST_NAME (`$.identity.firstname`)
* LAST_NAME (`$.identity.lastname`)
* STATE (`$.contact.state`)
* ZIPCODE (`$.contact.zipcode`)
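
Before writing the SQL, the five paths can be tried against the sample document in plain Python. This is a sketch for checking the paths only: each path is written as a list of keys rather than a SQL/JSON path string, `follow` is a hypothetical helper, and the document is trimmed to the fields used here.

```python
# Trimmed copy of the sample customer document shown earlier.
doc = {
    "customerid": 100000,
    "identity": {"firstname": "Jacob", "lastname": "Hines"},
    "contact": {"state": "OH", "zipcode": "44001"},
}

# Column name -> sequence of keys, mirroring the paths in the list above.
column_paths = {
    "CUSTID": ["customerid"],
    "FIRST_NAME": ["identity", "firstname"],
    "LAST_NAME": ["identity", "lastname"],
    "STATE": ["contact", "state"],
    "ZIPCODE": ["contact", "zipcode"],
}

def follow(document, keys):
    """Walk down nested objects one key at a time."""
    value = document
    for key in keys:
        value = value[key]
    return value

row = {column: follow(doc, keys) for column, keys in column_paths.items()}
print(row)
```

Each entry in `row` corresponds to one regular column in the eventual `COLUMNS` clause.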

Creating the last field (Array of product IDs) needs to use a formatted column expression as it can contain multiple values. The JSON path expression for getting one product ID is: 
```json
$.purchases[0].product_id
```
Since there can be multiple `product_id` values, the column expression needs to use the syntax below and wrap the values in an array using the `WRAPPER` clause.
```
PATH '$.purchases[*].product_id' WITH UNCONDITIONAL WRAPPER
```
Since product_id is a numeric value in the document, there is no need to use the `OMIT QUOTES` clause. 
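
The effect of the unconditional wrapper can be imitated in Python. This is an illustration rather than Db2 output: the second purchase (product 1415) is hypothetical, and the whitespace inside the resulting array text may differ from what Db2 produces.

```python
import json

# Purchases array with the sample entry plus one hypothetical entry.
purchases = [{"product_id": 1860}, {"product_id": 1415}]

# '$.purchases[*].product_id' yields a sequence of values, one per purchase.
matches = [purchase["product_id"] for purchase in purchases]

# WITH UNCONDITIONAL WRAPPER: always enclose the sequence in a JSON array.
wrapped = json.dumps(matches)
print(wrapped)

# A single match is wrapped as well.
single = json.dumps([purchases[0]["product_id"]])
print(single)
```

Because the values are numbers, the array text contains no quotation marks to begin with.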

***Step 3: Build the COLUMNS clause***

We can now combine the path expressions to create the final `JSON_TABLE` function.

In [None]:
%%sql
SELECT RESULTS.* FROM CUSTOMERS C,
   JSON_TABLE(C.DETAILS, 'strict $'
      COLUMNS(
         CUSTID     INT          PATH '$.customerid',
         FIRST_NAME VARCHAR(20)  PATH '$.identity.firstname',
         LAST_NAME  VARCHAR(20)  PATH '$.identity.lastname',
         STATE      CHAR(2)      PATH '$.contact.state',
         ZIPCODE    CHAR(5)      PATH '$.contact.zipcode',
         PURCHASES  VARCHAR(200) FORMAT JSON 
                                 PATH '$.purchases[*].product_id'
                                 WITH UNCONDITIONAL WRAPPER 
      ) 
      ERROR ON ERROR) AS RESULTS
WHERE JSON_VALUE(C.DETAILS,'$.contact.state' RETURNING CHAR(2)) = 'OH'

## Summary
The `JSON_TABLE` function can help you publish the contents of a JSON document in a form that resembles a relational table. To use the `JSON_TABLE` function, you need to determine:
* The path expression of the fields you want to retrieve
* The name of the derived columns
* The format that you want to use when retrieving the fields
  * Db2 (SQL) data type (regular column expression)
  * JSON data type (formatted column expression)
* How to handle missing values or errors in the document
* Any additional WHERE clause logic to limit the rows returned

#### Credits: IBM 2019, George Baklarz [baklarz@ca.ibm.com]