<a href="https://colab.research.google.com/github/brendanpshea/database_sql/blob/main/Monsters_of_JSON.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Monsters of JSON
### Brendan Shea, PhD
In the ever-evolving digital world, data has become the lifeblood of modern technology. Every day, countless amounts of data are generated, processed, and exchanged across various systems and platforms. This data can come in a multitude of formats, and one of the most prevalent is JSON, short for JavaScript Object Notation.

**JSON** is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others.

This chapter aims to provide a comprehensive introduction to JSON, its structure, and its practical uses. We will delve into the essential details of JSON and how it fits into the broader landscape of data formats. We will also explore why JSON has gained such popularity in the realm of databases and data exchange.

One such database that has embraced JSON is PostgreSQL. Known for its robustness and advanced capabilities, PostgreSQL offers substantial support for JSON, allowing developers to store, query, and manipulate JSON data efficiently. We will explore these features and provide a hands-on guide to interacting with JSON within the PostgreSQL environment.

To bring these concepts to life, we will be using a fascinating case study throughout this chapter. We will work with a JSON dataset related to the world of Dungeons and Dragons (D&D), a popular fantasy role-playing game. This dataset contains detailed information about various monsters in the game, encoded in JSON format. This dataset not only serves as a practical example of real-world JSON data but also adds an element of fun and intrigue to our exploration of JSON.

By the end of this chapter, you should have a solid understanding of JSON, its importance, and how to work with it in PostgreSQL. So, prepare yourself for a journey into the realm of data, where monsters lurk in JSON format, and PostgreSQL is our tool of choice for taming them.

## What is JSON?

JSON, or JavaScript Object Notation, is a text-based data format that is designed to be human-readable and easy for computers to parse and generate. JSON originated from the JavaScript language, but it is a language-independent data format. It is often used for transmitting data in web applications, serving as an alternative to XML.

JSON is built on two simple structures: objects and arrays.

### Objects
An **object** in JSON is an unordered set of name-value pairs, similar to a dictionary, a hash, or a record in other languages. Objects are enclosed in curly braces {}. A colon : separates the keys and the values, and a comma , separates the pairs. For example:

```
{
  "monsterName": "Fluffy Bunny",
  "monsterType": "Beast",
  "hitPoints": 3,
  "abilities": ["Hop", "Nibble", "Cuteness Overload"]
}
```
In this example, the object describes a somewhat unusual D&D monster, a "Fluffy Bunny" with abilities including "Hop", "Nibble", and "Cuteness Overload".
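As a minimal sketch of how a program might consume such an object, the snippet below parses the same text with Python's standard `json` module. The variable names (`text`, `monster`) are ours, chosen only for illustration.

```python
import json

# The same object as above, held as a string of JSON text.
text = """
{
  "monsterName": "Fluffy Bunny",
  "monsterType": "Beast",
  "hitPoints": 3,
  "abilities": ["Hop", "Nibble", "Cuteness Overload"]
}
"""

# json.loads turns a JSON object into a Python dict.
monster = json.loads(text)

print(monster["monsterName"])     # a JSON string becomes a Python str
print(monster["hitPoints"] + 1)   # a JSON number becomes an int
print(len(monster["abilities"]))  # a JSON array becomes a list
```

Each name-value pair becomes a dictionary entry, so values are looked up by key rather than by position.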

### Arrays
An **array** in JSON is an ordered list of values, similar to a list, a vector, or an array in other languages. Arrays are enclosed in square brackets [], and the values are separated by commas. For example:

```
[
  "Bumbling Beholder",
  "Kind Kobold",
  "Dancing Doppelganger",
  "Jolly Jelly Cube"
]
```
In this example, the array includes a list of whimsical D&D monster names. The order of these monster names in the array can be important, depending on how the data is to be used.
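To illustrate that arrays keep their order, here is a short sketch (again using Python's standard `json` module, with variable names of our own choosing) that parses the array and reads elements by position.

```python
import json

names = json.loads(
    '["Bumbling Beholder", "Kind Kobold", "Dancing Doppelganger", "Jolly Jelly Cube"]'
)

first = names[0]   # elements are addressed by position, starting at 0
last = names[-1]

# Writing the list back out and parsing it again keeps the same order.
round_trip = json.loads(json.dumps(names))

print(first, "|", last, "|", round_trip == names)
```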

## Why JSON is not Relational
Relational databases and JSON structures have fundamentally different ways of representing and organizing data.

In a relational database, data is structured into tables, similar to a spreadsheet. Each table has a defined schema, which is a set of attributes (or columns), and each attribute has a specific data type. Each row in the table represents a single record, and each cell in the row represents a value for the corresponding attribute. Relationships can be defined between tables using primary and foreign keys, allowing for complex data models to be represented.

In contrast, JSON data is represented as a collection of key-value pairs (objects) or ordered lists of values (arrays). JSON objects do not have a predefined schema, meaning each object can have a different set of keys. Arrays and objects can be nested within each other, allowing for complex, hierarchical data structures.

The differences between these two data structures have several implications:

1.  *Flexibility:* JSON is schema-less, meaning it can represent a wider variety of data structures compared to relational databases. It's easy to add new fields or nest data structures. However, this flexibility can lead to inconsistencies in data if not managed carefully.

2.  *Complexity of Relationships:* Relational databases excel at representing complex relationships between different entities (tables). In JSON, while it's possible to nest objects within each other to represent relationships, it's less straightforward to manage many-to-many or complex relationships.

3.  *Data Integrity:* Relational databases provide strong data integrity through constraints, such as unique, primary key, and foreign key constraints. JSON does not inherently support these constraints, so data integrity must be managed at the application level.

4.  *Querying:* Relational databases have a powerful querying language (SQL) that supports complex queries, aggregations, and joins. Querying JSON data can be less straightforward and often requires parsing the JSON structure at the application level. However, some databases like PostgreSQL offer JSON functions and operators that make querying JSON data easier.

5.  *Storage and Performance:* Generally, relational databases are optimized for performance and can handle large volumes of data efficiently. JSON can be less efficient for large data volumes, especially when the JSON structures are complex and deeply nested. However, some modern databases have improved JSON storage and query performance.

In the context of our D&D example, if we were to use a relational database, we might have separate tables for "Monsters", "Abilities", and "Types", and we would define relationships between these tables. With JSON, we could potentially store all this data in a single JSON object, with abilities and types nested within each monster. The best approach would depend on the specific requirements and complexity of our application.
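The contrast can be sketched in a few lines of Python. The nested document and the row layout below are hypothetical (the `monster_id` key and the column order are our own assumptions, not a schema used elsewhere in this chapter); the point is only to show how one nested object spreads across two relational tables.

```python
import json

# One nested JSON document: the abilities live inside the monster.
doc = json.loads("""
{
  "name": "Fluffy Bunny",
  "type": "Beast",
  "abilities": ["Hop", "Nibble"]
}
""")

# Relational shape: one row per monster, plus one row per (monster, ability) pair.
monster_id = 1  # hypothetical surrogate primary key
monsters_rows = [(monster_id, doc["name"], doc["type"])]
abilities_rows = [(monster_id, ability) for ability in doc["abilities"]]

print(monsters_rows)
print(abilities_rows)
```

In the relational version, `monster_id` plays the role of the foreign key that ties each ability row back to its monster.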

## Why is JSON Used?
JSON has found extensive use in modern software development because it is lightweight and easy to read. Here are some of its practical uses:

1.  *Data Storage:* JSON is often used for storing data locally or remotely in a structured, human-readable format. For example, a web application might store user settings in JSON format for easy retrieval and update.

2.  *Data Exchange:* JSON is commonly used as the data format for sending data between a server and a web application, or between different parts of a web application. Its human-readable nature and wide support across different programming languages make it ideal for this purpose.

3.  *Configuration Files:* JSON is often used to store configuration settings for web applications, desktop applications, servers, and more. These configuration files can be easily read and written by the application, and also easily understood and modified by humans.

4.  *Web APIs:* Most modern web APIs (Application Programming Interfaces) use JSON as their communication format. When a web application makes a request to an API (for example, to fetch data), the API often sends back the response in JSON format, which the web application can then easily parse and use.

5.  *Database Operations:* Many NoSQL databases, like MongoDB, use JSON for storing and manipulating data. Even SQL databases, like PostgreSQL, now often include support for JSON data types, allowing for more flexible data models.

6.  *Frontend Frameworks:* Applications built with modern frontend frameworks (like React and Vue.js) commonly receive their data as JSON and work with it as JavaScript objects within the application.

In the context of our Dungeons & Dragons example, a D&D game application might use JSON to store data about different monsters, characters, and game settings. The game might also fetch additional data from a D&D API, which sends back responses in JSON format. This JSON data can then be parsed and used by the application to enhance the gameplay experience.
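As a rough sketch of the storage and exchange uses described above, the snippet below serializes a Python dictionary to JSON text and parses it back. The `settings` contents are invented for illustration; a real application would write `payload` to a file or send it over the network.

```python
import json

# Hypothetical game settings held as an ordinary Python dict.
settings = {
    "difficulty": "hard",
    "max_party_size": 4,
    "house_rules": ["flanking", "critical fumbles"],
}

# Serialize to JSON text, e.g. to save as a config file or send in a response.
payload = json.dumps(settings, indent=2)

# The receiving side parses the text back into an equivalent structure.
received = json.loads(payload)

print(payload)
print(received == settings)
```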

## Example: Working With DnD Monsters
For the remainder of this chapter, we'll be working with the JSON file `srd_5e_monsters.json`, which contains data about 327 different monsters from the Dungeons and Dragons universe. Each monster is represented as a JSON object with several attributes. (https://github.com/brendanpshea/database_sql/blob/main/data/srd_5e_monsters.json)

Here are the keys (attributes) for the first monster in the data:

-   `name`: The name of the monster.
-   `meta`: Metadata about the monster, such as its size and alignment.
-   `Armor Class`: The monster's armor class.
-   `Hit Points`: The monster's hit points.
-   `Speed`: The monster's speed.
-   `STR`, `STR_mod`, `DEX`, `DEX_mod`, `CON`, `CON_mod`, `INT`, `INT_mod`, `WIS`, `WIS_mod`, `CHA`, `CHA_mod`: These are the monster's ability scores and their modifiers.
-   `Saving Throws`: The monster's saving throws.
-   `Skills`: An array (list) of the monster's skills.
-   `Senses`: An array (list) of the monster's senses.
-   `Languages`: The languages the monster can speak.
-   `Challenge`: The challenge rating of the monster.
-   `XP`: The amount of experience points the monster is worth.
-   `Traits`: Traits that the monster possesses.
-   `Actions`: Actions that the monster can take.
-   `Legendary Actions`: Legendary actions that the monster can take.
-   `img_url`: A URL to an image of the monster.

The JSON data is structured as an array of such objects. Each object represents a unique monster, and the keys within each object provide detailed information about that particular monster.
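Before loading a file like this into a database, it can help to inspect its shape. The sketch below uses a tiny inline sample rather than the real file: the first record is abridged from the dataset's Aboleth entry, and the second ("Fluffy Bunny") is invented to show that records need not share the same keys.

```python
import json

# A two-record stand-in for the real file: an array of monster objects.
sample = json.loads("""
[
  {"name": "Aboleth", "meta": "Large aberration, lawful evil",
   "Armor Class": 17, "Hit Points": 135,
   "Skills": ["History +12", "Perception +10"]},
  {"name": "Fluffy Bunny", "meta": "Tiny beast, unaligned",
   "Armor Class": 10, "Hit Points": 3}
]
""")

# The top level is a list; each element is a dict with its own set of keys.
print(type(sample).__name__, len(sample))
for monster in sample:
    print(monster["name"], sorted(monster.keys()))
```

With the real file, the same loop would run over the list returned by `json.load`.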

The next step, having understood the structure, would be to import this JSON data into a PostgreSQL database.

## Importing JSON data into SQL
Here's a script for importing JSON data into a PostgreSQL database.

In [None]:
!pip install SQLAlchemy==1.3.24 -q # Needed to avoid problems with more recent versions in Colab

!apt install postgresql postgresql-contrib &>log
!service postgresql start
!sudo -u postgres psql -c "CREATE USER root WITH SUPERUSER"
# set connection
%load_ext sql
%sql postgresql+psycopg2://@/postgres

 * Starting PostgreSQL 12 database server
   ...done.
CREATE ROLE


In [None]:
# download the file
!wget https://github.com/brendanpshea/database_sql/raw/main/data/processed_srd_5e_monsters.json -q

In [None]:
%%capture
import json

# Load JSON file
with open('processed_srd_5e_monsters.json', 'r') as f:
    data = json.load(f)

# Use SQL magic to connect to PostgreSQL
%sql postgresql+psycopg2://@/postgres

# Create table
%sql DROP TABLE IF EXISTS monsters;
%sql CREATE TABLE monsters (data JSONB);

# Insert JSON data into the table
for monster in data:
    monster_json = json.dumps(monster)
    %sql INSERT INTO monsters (data) VALUES (:monster_json)


In [None]:
## Make sure we have got all the monsters!
%sql SELECT COUNT(*) FROM monsters;

 * postgresql+psycopg2://@/postgres
1 rows affected.


count
327


In [None]:
## Display the first monster
%sql SELECT * FROM monsters LIMIT 1;

 * postgresql+psycopg2://@/postgres
1 rows affected.


data
"{'XP': 5, 'CHA': '18', 'CON': '15', 'DEX': '9', 'INT': '18', 'STR': '21', 'WIS': '15', 'meta': 'Large aberration, lawful evil', 'name': 'Aboleth', 'Speed': ['10 ft.', 'swim 40 ft.'], 'Senses': ['Darkvision 120 ft.', 'Passive Perception 20'], 'Skills': ['History +12', 'Perception +10'], 'Traits': ""<p><em><strong>Amphibious.</strong></em> The aboleth can breathe air and water. </p><p><em><strong>Mucous Cloud.</strong></em> While underwater, the aboleth is surrounded by transformative mucus. A creature that touches the aboleth or that hits it with a melee attack while within 5 feet of it must make a DC 14 Constitution saving throw. On a failure, the creature is diseased for 1d4 hours. The diseased creature can breathe only underwater. </p><p><em><strong>Probing Telepathy.</strong></em> If a creature communicates telepathically with the aboleth, the aboleth learns the creature's greatest desires if the aboleth can see the creature.</p>"", 'Actions': ""<p><em><strong>Multiattack.</strong></em> The aboleth makes three tentacle attacks. </p><p><em><strong>Tentacle.</strong></em> <em>Melee Weapon Attack:</em> +9 to hit, reach 10 ft., one target. <em>Hit:</em> 12 (2d6 + 5) bludgeoning damage. If the target is a creature, it must succeed on a DC 14 Constitution saving throw or become diseased. The disease has no effect for 1 minute and can be removed by any magic that cures disease. After 1 minute, the diseased creature's skin becomes translucent and slimy, the creature can't regain hit points unless it is underwater, and the disease can be removed only by heal or another disease-curing spell of 6th level or higher. When the creature is outside a body of water, it takes 6 (1d12) acid damage every 10 minutes unless moisture is applied to the skin before 10 minutes have passed. </p><p><em><strong>Tail.</strong></em> <em>Melee Weapon Attack:</em> +9 to hit, reach 10 ft. one target. <em>Hit:</em> 15 (3d6 + 5) bludgeoning damage. 
</p><p><em><strong>Enslave (3/Day).</strong></em> The aboleth targets one creature it can see within 30 feet of it. The target must succeed on a DC 14 Wisdom saving throw or be magically charmed by the aboleth until the aboleth dies or until it is on a different plane of existence from the target. The charmed target is under the aboleth's control and can't take reactions, and the aboleth and the target can communicate telepathically with each other over any distance. </p><p>Whenever the charmed target takes damage, the target can repeat the saving throw. On a success, the effect ends. No more than once every 24 hours, the target can also repeat the saving throw when it is at least 1 mile away from the aboleth.</p>"", 'CHA_mod': '(+4)', 'CON_mod': '(+2)', 'DEX_mod': '(-1)', 'INT_mod': '(+4)', 'STR_mod': '(+5)', 'WIS_mod': '(+2)', 'img_url': 'https://media-waterdeep.cursecdn.com/avatars/thumbnails/0/11/1000/1000/636238825975375671.jpeg', 'Challenge': 10, 'Languages': ['Deep Speech', 'Telepathy 120 ft.'], 'Hit Points': 135, 'Armor Class': 17, 'Saving Throws': ['CON +6', 'INT +8', 'WIS +6'], 'Legendary Actions': ""<p>The aboleth can take 3 legendary actions, choosing from the options below. Only one legendary action option can be used at a time and only at the end of another creature's turn. The aboleth regains spent legendary actions at the start of its turn. </p><p><em><strong>Detect.</strong></em> The aboleth makes a Wisdom (Perception) check. </p><p><em><strong>Tail Swipe.</strong></em> The aboleth makes one tail attack. </p><p><em><strong>Psychic Drain</strong></em> (Costs 2 Actions). One creature charmed by the aboleth takes 10 (3d6) psychic damage, and the aboleth regains hit points equal to the damage the creature takes.</p>""}"


## How to Query JSON Data Using Postgres
First, it's important to understand how we've stored our JSON data in PostgreSQL. We've created a table named `monsters`, which has a single column named `data` of type `JSONB`. Each row in this table contains a JSON object, which represents a single monster.

Now, when it comes to querying JSON data in PostgreSQL, we can use the `->`, `->>`, and `#>` operators.

-   `->`: This operator is used to get a JSON object field by key (or an array element by index). The result is itself JSON, whether that is an object, an array, a string, or a number. For example, `data->'name'` would give us the name of the monster as a JSON string.

-   `->>`: This operator is also used to get a JSON object field by key, but the result is text. For example, `data->>'name'` would give us the name of the monster as text.

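-   `#>`: This operator is used to get a JSON value at a path. A path is a text array of keys (and, for arrays, element indexes). This is useful when our JSON objects have nested objects or arrays. For example, `data #> '{Skills,0}'` returns the first element of the `Skills` array as JSON.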
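For readers who think more readily in Python, the three operators can be loosely mimicked with a few helper functions. This is only an analogy under simplifying assumptions: the function names are ours, the record is an abridged Aboleth, and the sketch does not distinguish a missing key from a JSON `null` the way PostgreSQL does.

```python
import json

monster = {"name": "Aboleth", "Hit Points": 135,
           "Skills": ["History +12", "Perception +10"]}

def get_json(value, key):
    """Like `value -> key`: the result is still a JSON value (None if absent)."""
    if isinstance(value, dict) and isinstance(key, str):
        return value.get(key)
    if isinstance(value, list) and isinstance(key, int) and -len(value) <= key < len(value):
        return value[key]
    return None

def get_text(value, key):
    """Like `value ->> key`: the result is plain text (None if absent)."""
    found = get_json(value, key)
    if found is None:
        return None
    # Strings lose their JSON quotes; other values are rendered as JSON text.
    return found if isinstance(found, str) else json.dumps(found)

def get_path(value, path):
    """Like `value #> path`: follow a sequence of keys and array indexes."""
    for step in path:
        value = get_json(value, step)
        if value is None:
            return None
    return value

print(json.dumps(get_json(monster, "name")))  # JSON string, quotes included
print(get_text(monster, "name"))              # plain text, no quotes
print(get_text(monster, "Hit Points"))        # a number rendered as text
print(get_path(monster, ["Skills", 0]))       # first element of a nested array
```

The difference between the first two lines of output is the same difference you see between `data->'name'` and `data->>'name'` in SQL.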

Let's start with a simple example:

In [None]:
%%sql
--Get the names of the first five monsters
SELECT data->>'name' AS Monster_Name
FROM monsters
LIMIT 5;

 * postgresql+psycopg2://@/postgres
5 rows affected.


monster_name
Aboleth
Acolyte
Adult Black Dragon
Adult Blue Dragon
Adult Brass Dragon


Here's what this query does:

-   `SELECT`: This keyword is used to select data from a database. The data returned is stored in a result table, called the result-set.

-   `data->>'name'`: Here, we're using the `->>` operator to get the `name` field from our `data` JSON objects. The result will be the name of the monster, as text.

-   `AS Monster_Name`: The `AS` keyword is used to rename a column or table with an alias. Here, we're renaming our `data->>'name'` column to `Monster_Name`.

-   `FROM monsters`: This specifies the name of the table that we're selecting data from. In this case, it's our `monsters` table.

-   `LIMIT 5`: This is used to limit the number of rows returned in a result set. Here, we're limiting our result set to the first 5 rows.

The result of this query should be the names of the first 5 monsters in our table.

## Retrieving Nested Data
Now that we've learned how to retrieve a specific attribute from our JSON data (in this case, the monster's name), let's explore more ways to interact with our JSONB column in PostgreSQL.

In our `monsters` table, each monster object has several attributes. Some of these attributes are objects or arrays themselves. For instance, the `Skills` attribute is an array of skills, so we can use the `->` operator to access this array and then `->0` to pick out its first element. Let's consider an example where we want to access the first skill of each monster (the query labels this column `First_Ability`):

In [None]:
%%sql
SELECT data->'name' AS Name,
  data->'Skills'->0  AS First_Ability
FROM monsters
LIMIT 5;


 * postgresql+psycopg2://@/postgres
5 rows affected.


name,first_ability
Aboleth,History +12
Acolyte,Medicine +4
Adult Black Dragon,Perception +11
Adult Blue Dragon,Perception +12
Adult Brass Dragon,History +7


## Filtering Data

We can also use these operators in the `WHERE` clause to filter our data. For example, suppose we want to get all monsters with more than 300 hit points.

Here, `(data->>'Hit Points')::int > 300` is used in the `WHERE` clause to filter our data. We're casting the hit points to an integer using `::int` because the `->>` operator returns text, but we need to perform a numerical comparison.

In [None]:
%%sql
SELECT data->>'name' AS Monster_Name
FROM monsters
WHERE (data->>'Hit Points')::int > 300;


 * postgresql+psycopg2://@/postgres
12 rows affected.


monster_name
Ancient Black Dragon
Ancient Blue Dragon
Ancient Bronze Dragon
Ancient Copper Dragon
Ancient Gold Dragon
Ancient Green Dragon
Ancient Red Dragon
Ancient Silver Dragon
Ancient White Dragon
Dragon Turtle
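The need for the cast can be seen in a small Python analogy. The sample records below are partly invented ("Hypothetical Titan" and "Fluffy Bunny" are not in the dataset), and `str(...)` stands in for the text that `->>` returns; the sketch is meant only to show why a numeric comparison needs a number rather than text.

```python
# Sample records; only the Aboleth's value comes from the dataset.
monsters = [
    {"name": "Aboleth", "Hit Points": 135},
    {"name": "Hypothetical Titan", "Hit Points": 400},
    {"name": "Fluffy Bunny", "Hit Points": 3},
]

# str(...) mimics ->> (text out); int(...) mimics the ::int cast.
strong = [m["name"] for m in monsters if int(str(m["Hit Points"])) > 300]
print(strong)

# Without a cast, text compares character by character, which misorders numbers.
print("99" > "300")       # True as text, because "9" sorts after "3"
print(int("99") > 300)    # False as numbers
```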


## Aggregating Data

We can perform aggregation operations like COUNT, MAX, MIN, AVG, etc., on the data. For instance, if we want to find the average, maximum, and minimum hit points across all monsters:

In [22]:
%%sql
SELECT AVG((data->>'Hit Points')::int) AS AVG_HP,
  MAX((data->>'Hit Points')::int) AS Max_HP,
  MIN((data->>'Hit Points')::int) AS Min_HP
FROM monsters;

 * postgresql+psycopg2://@/postgres
1 rows affected.


avg_hp,max_hp,min_hp
81.34250764525994,676,1


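In this query, we're using the `AVG`, `MAX`, and `MIN` functions to find the average, highest, and lowest hit points. Each call first casts the text returned by `->>` to an integer, just as we did when filtering.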
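The same cast-then-aggregate pattern looks like this in Python. The three hit point values are made up for illustration, so the results will not match the query output above.

```python
from statistics import mean

# Text values, as the ->> operator would return them (made-up sample).
hit_points_text = ["135", "9", "195"]

# Cast each value to an integer before aggregating, like ::int in the query.
hit_points = [int(text) for text in hit_points_text]

avg_hp = mean(hit_points)
max_hp = max(hit_points)
min_hp = min(hit_points)

print(avg_hp, max_hp, min_hp)
```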

Remember, the keys we use to access the data (`name`, `Skills`, `Hit Points`, etc.) are dependent on the structure of the JSON objects in your monsters table. Always make sure to understand the structure of your JSON data before attempting to query it.

Take some time to try out these queries, and see what kind of data you can extract from the monsters table. As you get more comfortable with these operators and JSON data in general, you'll find that you can perform very powerful and flexible queries with PostgreSQL and JSONB.

## JSON and Semi-Structured Data
One of the most significant advantages of JSON (JavaScript Object Notation) is its ability to handle semi-structured data. Unlike structured data, which is organized in a predefined manner (like tables in a relational database), semi-structured data is more flexible, allowing for the representation of hierarchical relationships, lists, and other complex structures.

Let's illustrate this concept with our Dungeons and Dragons (DnD) monsters dataset. In this dataset, not all monsters have "Legendary Actions". This is a perfect example of semi-structured data: while all monsters share some common attributes (like "name" or "Hit Points"), some monsters have additional attributes that others don't.

In a traditional relational database, dealing with such inconsistencies can be challenging. We would typically create a separate table for "Legendary Actions", and then use a foreign key to link this table to the main "Monsters" table. However, this approach can become complex and inefficient as the number of unique attributes grows, since each optional attribute needs its own table and join. The alternative, adding a "Legendary Actions" column directly to the "Monsters" table, would leave most of its entries null if only a few monsters have legendary actions.

On the other hand, JSON handles this gracefully. The JSON data model is based on key-value pairs and arrays, making it inherently flexible and suitable for semi-structured data. If a monster has legendary actions, we simply add a "Legendary Actions" field to its JSON object. If not, we omit this field. This approach is intuitive and efficient, as it avoids the need for null entries and complex joins.
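A short sketch shows how application code can cope with a field that is present on some records and absent on others. The two records are heavily abridged stand-ins (we assume here that the second one simply lacks the field), and the `"none"` default is our own choice.

```python
# Two abridged records: one with the optional field, one without.
monsters = [
    {"name": "Aboleth",
     "Legendary Actions": "<p>The aboleth can take 3 legendary actions, choosing from the options below.</p>"},
    {"name": "Acolyte"},
]

# Membership test: which monsters carry the optional key at all?
legendary = [m["name"] for m in monsters if "Legendary Actions" in m]
print(legendary)

# dict.get supplies a default instead of raising KeyError when the key is absent.
for m in monsters:
    actions = m.get("Legendary Actions", "none")
    print(m["name"], "->", actions[:30])
```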

JSON is also excellent at representing hierarchical relationships, which are common in real-world data. For instance, in an e-commerce application, an order might consist of multiple items, each with its own properties. This "Order -> Items" relationship can be represented as a JSON object, with "Order" being a key, and "Items" being an array of item objects.
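Here is what such an order might look like, sketched in Python. Every field name and value below (`id`, `sku`, `quantity`, `unit_price`) is hypothetical; the point is the shape, with an array of item objects nested under the order.

```python
import json

# A hypothetical order: one object containing an array of item objects.
order = {
    "order": {
        "id": 1001,
        "items": [
            {"sku": "DICE-D20", "quantity": 2, "unit_price": 1.5},
            {"sku": "MONSTER-MANUAL", "quantity": 1, "unit_price": 24.0},
        ],
    }
}

# Walking the hierarchy: order -> items -> each item's fields.
total = sum(item["quantity"] * item["unit_price"]
            for item in order["order"]["items"])

print(json.dumps(order, indent=2))
print(total)
```

In a relational design, the items would typically sit in their own table and be joined back to the order by a key.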

Beyond DnD and e-commerce, JSON is ubiquitous in today's software landscape. It's the de facto standard for web APIs due to its simplicity, readability, and wide support across programming languages. It's used in configuration files, for data serialization, in NoSQL databases like MongoDB, and much more. Its ability to handle semi-structured data makes it a versatile tool for many different use cases.

In conclusion, while relational databases are excellent for structured data with well-defined schemas, JSON provides a flexible and efficient alternative for handling semi-structured data. As with all tools, the key is to understand its strengths and use it where it excels. In a world where data is becoming increasingly complex and varied, JSON is a valuable tool to have in your data handling toolbox.