# {"title": "Db2 and JSON" }


### Dr. Henrik Loeser, 26.04.2019

Offering Manager & Developer Advocate    
Focus: IBM Cloud with Data & Analytics, Data Security, Privacy & Compliance

* Email: hloeser@de.ibm.com
* Twitter: @data_henrik
* Blog: https://blog.4loeser.net
* LinkedIn: http://de.linkedin.com/in/henrikloeser
* GitHub: https://github.com/data-henrik



#### [Mod Pack 4, Fix Pack 4 Updates](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.wn.doc/doc/c0061179.html) - Application Development
> A new set of built-in JSON SQL functions are included with both new installations and upgrades to Db2 version 11.1.4.4. These JSON SQL functions better support SQL interaction with JSON data. With these functions, you can store, retrieve, and query JSON and BSON data directly, using SQL. You can also create JSON documents using SQL. These new functions follow the grammar and semantics outlined in the ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) SQL Technical Report "Part 6: SQL support for JavaScript Object Notation (JSON)" (TR 19075-6:2017). Part 6 of the report outlines a set of SQL language for storing, querying, and publishing JSON data. As built-in functions, the Db2 JSON SQL functions reside in the SYSIBM schema and do not require users to hold any privilege on a function to invoke it. The new built-in JSON SQL functions replace the original set of JSON SQL functions provided in the SYSTOOLS schema. For more information, see JSON scalar functions, JSON_TABLE table function, and JSON_EXISTS predicate.

# New JSON functionality
<br>

 * [JSON_QUERY](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0070413.html)   
 
 * [JSON_OBJECT](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0070412.html)   
 
 * [JSON_ARRAY](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0070411.html)
 
 * [JSON_VALUE](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0070417.html)   
 
 * [JSON_TABLE](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0070414.html)   
 
 * [JSON_EXISTS](https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0070428.html)   

# Lax and strict processing

* Lax (default): tolerate structural issues, return an empty result instead of an error

* Strict: error on any issue

The [Db2 documentation on sql-json-path-expression](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0070416.html) has the details.

# Load sql extension and connect to Db2 running in a Docker container

In [2]:
%load_ext sql

In [3]:
%sql db2+ibm_db://db2inst1:psswrd@localhost:50000/testdb

'Connected: db2inst1@testdb'

# Create a test table, doc holds the JSON data

In [6]:
%sql create table myjson(id int, doc varchar(3000))

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


[]

# Insert some random test data, DeDUG style... :)

In [7]:
%%sql
insert into myjson values(1, '{"id":"701", "name":{"first":"Henrik", "last":"Loeser"}}'), 
(2, '{"id":"702", "name":{"first":"Michael", "last":"Te"}}'), (3, '{"id":"703", "name":{"first":"Roland", "last":"Es"}}')

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
3 rows affected.


[]

# Functional index on the last name
maybe a performance test later on

In [8]:
%%sql
CREATE INDEX JIX1 ON MYJSON(JSON_VALUE(DOC, 'strict $.name.last' RETURNING varchar(60)));

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


[]

# Return the first name

In [9]:
%%sql
select json_value(doc,'$.name.first' RETURNING VARCHAR(30)) from myjson

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


1
Henrik
Michael
Roland


# the same again but with sorting

In [10]:
%%sql

select json_value(doc,'$.name.first' RETURNING VARCHAR(30)) as fname from myjson order by fname desc

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


fname
Roland
Michael
Henrik


# another doc, but with missing first name

In [11]:
%%sql
insert into myjson values(4, '{"id":"704", "name":{"last":"Laser"}}')

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
1 rows affected.


[]

# Now, repeat the query from above
NULL (displayed as **None**) is returned for the missing first name. Db2 sorts NULL values high, so that row comes first in descending order.

In [13]:
%%sql

select json_value(doc,'$.name.first' RETURNING VARCHAR(30)) as fname from myjson order by fname desc;

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


fname
""
Roland
Michael
Henrik


# apply "strict" rules
default is "lax"; the result does not change because JSON_VALUE defaults to NULL ON ERROR

In [14]:
%%sql
select json_value(doc,'strict $.name.first' RETURNING VARCHAR(30)) as fname from myjson order by fname desc;

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


fname
""
Roland
Michael
Henrik


# once we switch to "error on error"...
Db2 returns an error because of the failed access to the missing first name

In [15]:
%%sql
select json_value(doc,'strict $.name.first' RETURNING VARCHAR(30) error on error) as fname from myjson order by fname desc;

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


DBAPIError: (ibm_db_dbi.Error) ibm_db_dbi::Error: [IBM][CLI Driver][DB2/LINUXX8664] SQL16410N  SQL/JSON member not found.  SQLSTATE=2203A SQLCODE=-16410
(Background on this error at: http://sqlalche.me/e/dbapi)

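# Back to "lax" (default), everything is fine
In lax mode the missing member produces an empty result, not an error, so ERROR ON ERROR is not triggered and NULL is returned.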

In [16]:
%%sql
select json_value(doc,'$.name.first' RETURNING VARCHAR(30) error on error) as fname from myjson order by fname desc;

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


fname
""
Roland
Michael
Henrik
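
The four queries above differ only in the path mode and the ON ERROR clause. The following Python sketch models that interplay for simple member paths. It is a toy model, not how Db2 evaluates JSON_VALUE: the helper `json_value` and its parameters are names invented for this illustration, and it assumes the defaults NULL ON EMPTY and NULL ON ERROR.

```python
import json


def json_value(doc_text, keys, mode="lax", on_empty="null", on_error="null"):
    """Toy model of JSON_VALUE for member-only paths, e.g. keys=['name', 'first']."""
    current = json.loads(doc_text)
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
            continue
        # Missing member: lax mode reports "empty", strict mode reports "error".
        clause = on_empty if mode == "lax" else on_error
        if clause == "error":
            raise LookupError("SQL/JSON member not found")
        return None  # NULL ON EMPTY or NULL ON ERROR
    return current


doc = '{"id":"704", "name":{"last":"Laser"}}'
path = ["name", "first"]

print(json_value(doc, path))                                # lax, defaults
print(json_value(doc, path, mode="strict"))                 # strict, NULL ON ERROR
print(json_value(doc, path, mode="lax", on_error="error"))  # lax, ERROR ON ERROR
try:
    json_value(doc, path, mode="strict", on_error="error")  # strict, ERROR ON ERROR
except LookupError as exc:
    print("error:", exc)
```

Only the last combination raises, which matches the SQL16410N error seen with `strict` plus `error on error`.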


# Table function to extract values and return rows

In [23]:
%%sql
SELECT U.id, U.firstname, U.lastname 
FROM myjson my, JSON_TABLE(my.doc, 'strict $' 
                           COLUMNS( id INTEGER PATH '$.id', 
                                   firstname  VARCHAR(20) PATH '$.name.first',
                                   lastname  VARCHAR(20) PATH '$.name.last') error on error) AS U

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


id,firstname,lastname
701,Henrik,Loeser
702,Michael,Te
703,Roland,Es
704,,Laser
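
As a rough client-side analogue of what JSON_TABLE does here, the Python sketch below shreds the same four documents into (id, firstname, lastname) tuples. The function `shred` is hypothetical and only mirrors this one COLUMNS definition; Db2 performs the equivalent work inside the engine.

```python
import json

# The documents inserted earlier in this notebook.
rows = [
    (1, '{"id":"701", "name":{"first":"Henrik", "last":"Loeser"}}'),
    (2, '{"id":"702", "name":{"first":"Michael", "last":"Te"}}'),
    (3, '{"id":"703", "name":{"first":"Roland", "last":"Es"}}'),
    (4, '{"id":"704", "name":{"last":"Laser"}}'),
]


def shred(doc_text):
    """Turn one JSON document into an (id, firstname, lastname) tuple."""
    doc = json.loads(doc_text)
    name = doc.get("name", {})
    # The JSON id is a string; the INTEGER column type converts it.
    return int(doc["id"]), name.get("first"), name.get("last")


table = [shred(doc_text) for _, doc_text in rows]
for row in table:
    print(row)
```

The missing first name in document 704 becomes `None`, just as the SQL result shows an empty firstname column for that row.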


## Create unique index on the ID within the JSON docs

In [24]:
%%sql
CREATE UNIQUE INDEX JIX2 ON MYJSON(JSON_VALUE(DOC, 'strict $.id' RETURNING integer));

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


[]

## Now insert the same record again, bang!!/&%!!!

In [25]:
%%sql
insert into myjson values(4, '{"id":"704", "name":{"last":"Laser"}}')

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb


IntegrityError: (ibm_db_dbi.IntegrityError) ibm_db_dbi::IntegrityError: Statement Execute Failed: [IBM][CLI Driver][DB2/LINUXX8664] SQL0803N  One or more values in the INSERT statement, UPDATE statement, or foreign key update caused by a DELETE statement are not valid because the primary key, unique constraint or unique index identified by "2" constrains table "DB2INST1.MYJSON" from having duplicate values for the index key.  SQLSTATE=23505 SQLCODE=-803
[SQL: insert into myjson values(4, '{"id":"704", "name":{"last":"Laser"}}')]
(Background on this error at: http://sqlalche.me/e/gkpj)

## But this should succeed

In [27]:
%%sql
insert into myjson values(5, '{"id":"705", "name":{"last":"Loser"}}')

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
1 rows affected.


[]

# Index in queries (1)

In [12]:
result = %sql SELECT id FROM myjson WHERE JSON_VALUE(doc, 'strict $.id' returning integer)=701
print(result)

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.
+----+
| id |
+----+
| 1  |
+----+


# Index in queries (2)

In [5]:
%%sql
explain plan for SELECT id FROM myjson
  WHERE JSON_VALUE(doc, 'strict $.id' returning integer)=701

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


[]

In [9]:
the_plan=%sql select * from last_explained
print(the_plan)

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.
+---------------------------------------------------------------------------------------------+
|                                         Explain Plan                                        |
+---------------------------------------------------------------------------------------------+
|                         ID | Operation     |             Rows | Cost                        |
|                          1 | RETURN        |                  |    6                        |
|                          2 |  FETCH MYJSON | 1 of 1 (100.00%) |    6                        |
|                          3 |   IXSCAN JIX2 | 1 of 5 ( 20.00%) |    0                        |
|                                                                                             |
|                                    Predicate Information                                    |
|  3 - START ( JSON_VALUE() = 701)                                            

# Index in queries (3)

In [10]:
%%sql
explain plan for SELECT id FROM myjson
  WHERE JSON_VALUE(doc, 'strict $.name.last' returning varchar(60))='Loeser'

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


[]

In [11]:
the_plan=%sql select * from last_explained
print(the_plan)

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.
+---------------------------------------------------------------------------------------------+
|                                         Explain Plan                                        |
+---------------------------------------------------------------------------------------------+
|                         ID | Operation     |             Rows | Cost                        |
|                          1 | RETURN        |                  |    6                        |
|                          2 |  FETCH MYJSON | 1 of 1 (100.00%) |    6                        |
|                          3 |   IXSCAN JIX1 | 1 of 5 ( 20.00%) |    0                        |
|                                                                                             |
|                                    Predicate Information                                    |
|  3 - START ( JSON_VALUE() = 'Loeser')                                       

# Index in queries (4)

In [13]:
%%sql
explain plan for SELECT id FROM myjson
  WHERE JSON_VALUE(doc, 'strict $.name.last' returning varchar(60))='Loeser'
  AND JSON_VALUE(doc, 'strict $.id' returning integer)=701

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


[]

In [14]:
the_plan=%sql select * from last_explained
print(the_plan)

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.
+---------------------------------------------------------------------------------------------+
|                                         Explain Plan                                        |
+---------------------------------------------------------------------------------------------+
|                         ID | Operation     |             Rows | Cost                        |
|                          1 | RETURN        |                  |    6                        |
|                          2 |  FETCH MYJSON | 0 of 1 (   .00%) |    6                        |
|                          3 |   IXSCAN JIX2 | 1 of 5 ( 20.00%) |    0                        |
|                                                                                             |
|                                    Predicate Information                                    |
|   2 - SARG ( JSON_VALUE() = 'Loeser')                                       

# Generate some JSON from catalog tables (1)

In [72]:
%%sql
SELECT JSON_OBJECT(KEY 'name' VALUE TABNAME, 
                   KEY 'schema' VALUE TABSCHEMA)
FROM SYSCAT.TABLES
WHERE TABNAME LIKE 'SYSXML%'

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


1
"{""name"":""SYSXMLPATHS"",""schema"":""SYSIBM ""}"
"{""name"":""SYSXMLSTRINGS"",""schema"":""SYSIBM ""}"


# Generate some JSON from catalog tables (2)

In [71]:
%%sql
SELECT JSON_OBJECT(KEY 'name' VALUE TABNAME, 
                   KEY 'schema' VALUE TABSCHEMA,
                   KEY 'columns'  VALUE
                   JSON_ARRAY((SELECT COLNAME
                              FROM SYSCAT.COLUMNS C 
                              WHERE C.TABNAME=T.TABNAME AND C.TABSCHEMA=T.TABSCHEMA)) FORMAT JSON)
FROM SYSCAT.TABLES T
WHERE TABNAME LIKE 'SYSXML%'

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


1
"{""name"":""SYSXMLPATHS"",""schema"":""SYSIBM "",""columns"":[""PATH"",""PATHID"",""PATHTYPE""]}"
"{""name"":""SYSXMLSTRINGS"",""schema"":""SYSIBM "",""columns"":[""IS_TEMPORARY"",""STRING"",""STRINGID""]}"

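The generated documents arrive at the client as plain strings (the doubled quotes in the output above are only CSV quoting). A minimal Python sketch of consuming one of them, assuming the string has been copied from the result as shown; the variable names are made up for this example:

```python
import json

# Result string copied from the query output above (CSV quoting removed).
result = ('{"name":"SYSXMLPATHS","schema":"SYSIBM ",'
          '"columns":["PATH","PATHID","PATHTYPE"]}')

table_info = json.loads(result)
# The schema value in the output carries a trailing blank; drop it.
table_info["schema"] = table_info["schema"].rstrip()

print(table_info["name"], table_info["schema"])
print(table_info["columns"])
```
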

# Nesting ?! :(

In [63]:
%%sql
values JSON_ARRAY(JSON_OBJECT(Key 'foo' value 'bar') format json)

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


1
"[{""foo"":""bar""}]"


In [68]:
%%sql
values JSON_ARRAY((select JSON_OBJECT(KEY 'colname' VALUE 1234 ) from syscat.xmlstrings) format json)

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
(ibm_db_dbi.ProgrammingError) ibm_db_dbi::ProgrammingError: SQLNumResultCols failed: [IBM][CLI Driver][DB2/LINUXX8664] SQL0901N  The SQL statement or command failed because of a database system error. (Reason "Unexpected agg opparm".)  SQLSTATE=58004 SQLCODE=-901
[SQL: values JSON_ARRAY((select JSON_OBJECT(KEY 'colname' VALUE 1234 ) from syscat.xmlstrings) format json)]
(Background on this error at: http://sqlalche.me/e/f405)


# Find the docs that do not have a first name

### Make use of JSON_EXISTS

In [8]:
%%sql
SELECT id, doc FROM myjson
  WHERE NOT JSON_EXISTS(COALESCE(doc, ''), 'strict $.name.first' FALSE ON ERROR);

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


id,doc
4,"{""id"":""704"", ""name"":{""last"":""Laser""}}"
5,"{""id"":""705"", ""name"":{""last"":""Loser""}}"
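
For comparison, here is a hedged Python sketch of the same filter applied on the client side. The helper `has_first_name` is hypothetical and only approximates `JSON_EXISTS(..., 'strict $.name.first' FALSE ON ERROR)` for this document shape: any parsing or structural problem counts as "does not exist".

```python
import json

# The documents stored in myjson at this point of the notebook.
docs = {
    1: '{"id":"701", "name":{"first":"Henrik", "last":"Loeser"}}',
    2: '{"id":"702", "name":{"first":"Michael", "last":"Te"}}',
    3: '{"id":"703", "name":{"first":"Roland", "last":"Es"}}',
    4: '{"id":"704", "name":{"last":"Laser"}}',
    5: '{"id":"705", "name":{"last":"Loser"}}',
}


def has_first_name(doc_text):
    """Return True only if the document parses and contains name.first."""
    try:
        doc = json.loads(doc_text or "")  # None or '' is treated as invalid JSON
        return "first" in doc["name"]
    except (ValueError, KeyError, TypeError):
        return False  # mirrors FALSE ON ERROR


missing = [key for key, text in docs.items() if not has_first_name(text)]
print(missing)
```

Doing this in SQL is preferable for real workloads because the filter runs where the data lives; the sketch only illustrates the semantics.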


# Summary


* JSON functions in SQL standard

* similar to SQL/XML

* generate, extract, query JSON data

* no native JSON storage, store data as (binary) string

* use JSON_VALUE as input to functional indexes

* Start with Db2 documentation on [SQL access to JSON documents](https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.swg.im.dbclient.json.doc/doc/c0070285.html)

### {"closing" : ["Danke", "Thank you", "Das war's", "Servus"]}

# Index follow-up: enforce structure

In [15]:
%%sql
CREATE INDEX JIX3 ON MYJSON(JSON_VALUE(DOC, 'strict $.name.first' RETURNING varchar(60)));

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb
Done.


[]

In [16]:
%%sql
CREATE INDEX JIX4 ON MYJSON(JSON_VALUE(DOC, 'strict $.name.first' RETURNING varchar(60) error on error));

 * db2+ibm_db://db2inst1:***@localhost:50000/testdb


DataError: (ibm_db_dbi.DataError) ibm_db_dbi::DataError: Statement Execute Failed: [IBM][CLI Driver][DB2/LINUXX8664] SQL16410N  SQL/JSON member not found.  SQLSTATE=2203A SQLCODE=-16410
[SQL: CREATE INDEX JIX4 ON MYJSON(JSON_VALUE(DOC, 'strict $.name.first' RETURNING varchar(60) error on error));]
(Background on this error at: http://sqlalche.me/e/9h9h)