# Working with JSON Data

The American National Standards Institute (ANSI) SQL standard added syntax definitions for JSON & specified functions for creating & accessing JSON objects in 2016. Major database systems have added JSON support in recent years as well, although implementations vary. PostgreSQL, for example, supports some of the ANSI standard while implementing a number of nonstandard operators.

---

# Understanding JSON Structure

JSON data primarily comprises two structures: an *object*, which is an unordered set of name/value pairs, & an *array*, which is an ordered collection of values. If you've used programming languages such as JavaScript, Python, or C#, these aspects of JSON should look familiar.

Inside an object, we use name/value pairs as a structure for storing & referencing individual data items. The object in its entirety is enclosed within curly brackets, & each name, more often referred to as a *key*, is enclosed in double quotes, followed by a colon & its corresponding value. The object can encapsulate multiple key/value pairs, separated by commas. Here's an example using movie information:

```
{"title": "The Incredibles", "year": 2004}
```

The keys are `title` & `year`, & their values are `"The Incredibles"` & `2004`. If the value is a string, it goes in double quotes. If it's a number, a Boolean value, or a `null`, we omit the quotes. If you're familiar with the Python language, you'll recognise this structure as a *dictionary*.

An array is an ordered list of values enclosed in square brackets. We separate each value in the array with a comma. For example, we might list movie genres like this:

```
["animation", "action"]
```

Arrays are common in programming languages, & we've used them already in SQL queries. In Python, this structure is called a *list*.

We can create many permutations of these structures, including nesting objects & arrays inside each other. For example, we can create an array of objects or use an array as the value of a key. We can add or omit key/value pairs or create additional arrays of objects without violating a preset schema. This flexibility -- in contrast to the strict definition of a SQL table -- is both part of the appeal of using JSON as a data store as well as one of the biggest difficulties in working with JSON data.

As an example, the JSON below shows information about two films. The outermost structure is an array with two elements -- one object for each film. We know the outermost structure is an array because the entire JSON begins & ends with square brackets.

```
[{
  "title": "The Incredibles",
  "year": 2004,
  "rating": {"MPAA": "PG"},
  "characters": [{"name": "Mr. Incredible",
                  "actor": "Craig T. Nelson"},
                 {"name": "Elastigirl",
                  "actor": "Holly Hunter"},
                 {"name": "Frozone",
                  "actor": "Samuel L. Jackson"}],
  "genre": ["animation", "action", "sci-fi"]},
 {"title": "Cinema Paradiso",
  "year": 1988,
  "characters": [{"name": "Salvatore",
                  "actor": "Salvatore Cascio"},
                 {"name": "Alfredo",
                  "actor": "Philippe Noiret"}],
  "genre": ["romance", "drama"]}
]
```

Inside the outermost array, each film object is surrounded by curly brackets. The open brace starts the object for the first film *The Incredibles*. For both films, we store the `title` & `year` as key/value pairs, & they have string & integer values, respectively. The third key, `rating`, has a JSON object for its value. That object contains a single key/value pair showing the film's rating from the Motion Picture Association of America.

Here, we can see the flexibility JSON affords us as a storage medium. First, if we later wanted to add another country's rating for the film, we could easily add a second key/value pair to the `rating` value object.

Second, we're not required to include `rating` -- or any key/value pair -- in every film object. In fact, I omitted a `rating` for *Cinema Paradiso*. If a particular piece of data isn't available, in this case a rating, some systems that generate JSON might simply exclude that pair. Other systems might include `rating` but with a `null` value. Both are valid, & that flexibility is one of JSON's advantages: its data definition, or *schema*, can flex as needed.

The final two key/value pairs show other ways to structure JSON. For `characters`, the value is an array of objects, with each object surrounded by curly brackets & separated by a comma. The value of `genre` is an array of strings.

---

# Considering When to Use JSON with SQL

There are advantages to using *NoSQL* or *document* databases that store data in JSON or other text-based data formats, as opposed to the relational tables SQL uses. Document databases are flexible in terms of data definitions. You can redefine a data structure on the fly if needed. Document databases are often also used for high-volume applications because they can be scaled by adding servers. On the flip side, you may give up SQL advantages such as easily added constraints that enforce data integrity & support for transactions.

The arrival of JSON support in SQL has made it possible to enjoy the best of both worlds by adding JSON data as columns in relational tables. The decision to use a SQL or NoSQL database should be multifaceted. PostgreSQL performs favorably relative to NoSQL in terms of speed, but we must also consider the kinds & volume of data being stored, the applications being served, & more.

That said, some cases where you might want to take advantage of JSON in SQL include the following:

When storing related data in a JSON column instead of a separate table. An employees table could have the usual columns for name & contact information plus a JSON column with a flexible collection of key/value pairs that might hold additional attributes that don't apply to every employee, such as company awards or performance metrics.

When saving time by analysing JSON data fetched from other systems without first parsing it into a set of tables.

Keep in mind that using JSON in PostgreSQL or other SQL databases can also present challenges. Constraints that are trivial to set up on regular SQL tables can be more difficult to set & enforce on JSON data. JSON data can consume more space as key names get repeated in text along with the quotes, commas, & braces that define its structure. Finally, the flexibility of JSON can create issues for the code that interacts with it -- whether SQL or another language -- if keys unexpectedly disappear or the data type of a value changes.

---

# Using json & jsonb Data Types

PostgreSQL provides two data types for storing JSON. Both allow insertion of valid JSON only -- text that includes required elements of the JSON specification, such as open & closing curly brackets around an object, commas separating objects, & proper quoting of keys. If you try to insert invalid JSON, the database will generate an error.

The main difference between the two is that one stores JSON as text & the other as binary data. The binary implementation is newer to PostgreSQL & generally preferred because it's faster at querying & has indexing capabilities.

The two types are as follows:

**json:** Stores JSON as text, keeping white space & maintaining the order of keys. If a single JSON object contains a particular key more than once (which is valid), the `json` type will preserve each of the repeated key/value pairs. Finally, each time a database function processes `json`-stored text, it must parse the object to interpret its structure. This can make reads from the database slower than with the `jsonb` type. Indexing is not supported. Typically, the `json` type is useful when an application has duplicate keys or needs to preserve the order of keys.

**jsonb:** Stores JSON in a binary format, removing white space & not maintaining the order of keys. If a single JSON object contains a particular key more than once, the `jsonb` type will preserve only the last of the key/value pairs. The binary format adds some overhead to writing data to the table, but processing is faster. Indexing is supported.
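To see these differences concretely, here is a minimal sketch that casts the same text to both types. The sample object is made up for illustration & isn't part of this lesson's data files.

```sql
-- Hypothetical sample text: extra spaces plus a repeated "a" key.
SELECT '{"b": 1,   "a": 2, "a": 3}'::json  AS stored_as_json,
       '{"b": 1,   "a": 2, "a": 3}'::jsonb AS stored_as_jsonb;
```

The `json` column should echo the input text unchanged, while the `jsonb` column should come back as `{"a": 3, "b": 1}`: white space normalised, keys reordered, & only the last `a` pair kept.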

Neither `json` nor `jsonb` is part of the ANSI SQL standard, which doesn't specify a JSON data type & leaves it to database makers to decide how to implement support. The [PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype-json.html) recommends using `jsonb` unless there's a need to preserve the order of key/value pairs.

We'll use `jsonb` exclusively in the remainder of the chapter, both because of speed considerations & because many of PostgreSQL's JSON functions work the same way with both `json` & `jsonb` -- & there are more functions available for `jsonb`.

---

# Importing & Indexing JSON Data

The file *films.json* in this lesson's resources contains JSON data on films. View the file with a text editor & you'll see each film's JSON object is placed on a single line, with no line breaks between elements. The file contains two film objects -- each is a valid JSON object:

```
{"title": "The Incredibles", "year": 2004, ...}
{"title": "Cinema Paradiso", "year": 1988, ...}
```

Usually, there would be an outer bracket that encompasses the two film objects, but we have it set up this way so that PostgreSQL's `COPY` command will interpret each film's JSON object as a separate row on import, the same way it does when importing a CSV file. The query below makes a simple `films` table with a surrogate primary key & a `jsonb` column called `film`.

```
CREATE TABLE films (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    film jsonb NOT NULL
);

COPY films (film)
FROM '/YourDirectory/films.json';

CREATE INDEX idx_film ON films USING GIN (film);
```

Note that the `COPY` statement ends with the `FROM` clause instead of continuing to include a `WITH` statement as in previous examples. The reason we no longer need the `WITH` statement, which we've used to specify options for file headers & CSV formatting, is that this file has no header & isn't delimited. We just want the database to read each line & process it.

After import, we add an index to the `jsonb` column using the GIN index type. We discussed the generalised inverted index (GIN) with full-text search. GIN's implementation of indexing the location of words or key values within text is particularly suited to JSON data. Note that because index entries point to rows in a table, `jsonb` column indexing works best when each row contains a relatively small chunk of JSON -- as opposed to a table with one row that has a single enormous JSON value & repeated keys.

Execute the commands to create the table, import the data, & add the index. Run `SELECT * FROM films;` & you should see two rows containing the autogenerated `id` & the JSON object text.

<img src = "Creating a Table to Hold JSON Data.png" width = "600" style = "margin:auto"/>
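If you don't have the file handy, or just want to experiment, rows can also be added with an ordinary `INSERT`. The sketch below assumes a scratch table named `films_demo` (a hypothetical name, chosen so it doesn't collide with `films`) & uses trimmed-down versions of the sample film objects shown earlier.

```sql
-- Scratch table for experimenting; films_demo is a made-up name.
CREATE TEMPORARY TABLE films_demo (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    film jsonb NOT NULL
);

-- Each string literal is cast to jsonb; invalid JSON would raise an error.
INSERT INTO films_demo (film)
VALUES ('{"title": "The Incredibles", "year": 2004,
          "rating": {"MPAA": "PG"},
          "genre": ["animation", "action", "sci-fi"]}'::jsonb),
       ('{"title": "Cinema Paradiso", "year": 1988,
          "genre": ["romance", "drama"]}'::jsonb);

SELECT id, film FROM films_demo ORDER BY id;
```

Because the table is temporary, it disappears when the session ends, which keeps experiments separate from the imported data.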



---

# Using json & jsonb Extraction Operators

To retrieve values from our stored JSON, we can use PostgreSQL-specific *extraction operators*, which return either a JSON object, an element of an array, or an element that exists at a path in the JSON structure we specify. The table below shows the operators & their functions, which can vary based on the data type of the input. Each works with `json` & `jsonb` data types.

|Operator, syntax|Function|Returns|
|:---|:---|:---|
|`json -> text` <br>`jsonb -> text`|Extracts a key value, specified as text|`json` or `jsonb` (matching the input)|
|`json ->> text` <br>`jsonb ->> text`|Extracts a key value, specified as text|`text`|
|`json -> integer` <br>`jsonb -> integer`|Extracts an array element, specified as an integer denoting its array position|`json` or `jsonb` (matching the input)|
|`json ->> integer` <br>`jsonb ->> integer`|Extracts an array element, specified as an integer denoting its array position|`text`|
|`json #> text array` <br>`jsonb #> text array`|Extracts a JSON object at a specified path|`json` or `jsonb` (matching the input)|
|`json #>> text array` <br>`jsonb #>> text array`|Extracts a JSON object at a specified path|`text`|

## Key Value Extraction

In the query below, we use the `->` & `->>` operators followed by text naming the key value to retrieve. In that context, with text input, these are called *field extraction operators* because they extract a field, or key value, from the JSON. The difference between the two is that `->` returns the key value as JSON in the same type as stored, & `->>` returns the key value as text.

```
SELECT id, film -> 'title' AS title
FROM films
ORDER BY id;

SELECT id, film ->> 'title' AS title
FROM films
ORDER BY id;

SELECT id, film -> 'genre' AS genre
FROM films
ORDER BY id;
```

In the `SELECT` list, we specify our JSON column name followed by the operator & the key name in single quotes. In the first example, the syntax `-> 'title'` returns the value of the `title` key as JSON in the same data type as stored, `jsonb`. Run the first query & you should see output like this:

<img src = "Retrieving JSON Key Values with Field Extraction Operators 1.png" width = "600" style = "margin:auto"/>

In pgAdmin, the data type listed in the `title` column header should indicate `jsonb`, & the film titles remain quoted, as they are in the JSON object.

Changing the field extraction operator to `->>` returns the film title as text instead:

<img src = "Retrieving JSON Key Values with Field Extraction Operators 2.png" width = "600" style = "margin:auto"/>

Finally, we'll return an array. In our films JSON, the value of the key `genre` is an array of values. Using the field extraction operator `->` returns the array as `jsonb`.

<img src = "Retrieving JSON Key Values with Field Extraction Operators 3.png" width = "600" style = "margin:auto"/>

If we used `->>` here, we'd return the arrays as text.
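As a quick check that doesn't depend on the `films` table, the sketch below applies both operators to a one-row inline table & uses `pg_typeof()` to report the type of each result. The inline table & its alias `t` are stand-ins introduced only for this example.

```sql
-- A one-row inline table standing in for the films table.
SELECT film -> 'title'             AS title_json,
       pg_typeof(film -> 'title')  AS title_json_type,
       film ->> 'title'            AS title_text,
       pg_typeof(film ->> 'title') AS title_text_type
FROM (VALUES ('{"title": "The Incredibles", "year": 2004}'::jsonb))
     AS t (film);
```

The first pair of columns should show the quoted title with type `jsonb`, & the second pair the unquoted title with type `text`.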

## Array Element Extraction

To retrieve a specific value from an array, we follow the `->` & `->>` operators with an integer specifying the value's position, or *index* in the array. We call these *element extraction operators* because they retrieve an element from a JSON array. As with field extraction, `->` returns the value as JSON in the same type as stored, & `->>` returns as text.

The query below shows four examples using the array values of `"genre"`.

```
SELECT id, film -> 'genre' -> 0 AS genres
FROM films
ORDER BY id;

SELECT id, film -> 'genre' -> -1 AS genres
FROM films
ORDER BY id;

SELECT id, film -> 'genre' -> 2 AS genres
FROM films
ORDER BY id;

SELECT id, film -> 'genre' ->> 0 AS genres
FROM films
ORDER BY id;
```

We must first retrieve the array value from the key as JSON & then retrieve the desired element from the array. In the first example, we specify the JSON column `film`, followed by the field extraction operator `->` & the `genre` key name in single quotes. This returns the `genre` value as `jsonb`. We follow the key name with `->` & the integer `0` to get the first element.

Why not use `1` for the first value in the array? In many languages, including Python & JavaScript, index values start at zero, & that's also true when accessing JSON arrays with SQL.

Run the first query, & your results should look like this, showing the first element in each film's `genre` array, returned as `jsonb`:

<img src = "Retrieving JSON Array Values with Element Extraction Operators 1.png" width = "600" style = "margin:auto"/>

We can also access the last element of the array, even if we aren't sure of its index, because the number of genres per film can vary. We count backward from the end of the list using a negative index number. Supplying `-1` tells `->` to get the first element from the end of the list:

<img src = "Retrieving JSON Array Values with Element Extraction Operators 2.png" width = "600" style = "margin:auto"/>

We can count back further if we want -- an index of `-2` will get the next-to-last element.

Note that PostgreSQL won't return an error if there's no element at the supplied index position; it will simply return a `NULL` for that row. For example, if we supply `2` for the index, we see results for one of our films & a `NULL` for the other:

<img src = "Retrieving JSON Array Values with Element Extraction Operators 3.png" width = "600" style = "margin:auto"/>

We get a `NULL` back for *Cinema Paradiso* because it has only two elements in its `genre` value array, & index `2` (since we count up starting with zero) represents the third element.

Finally, changing the element extraction operator to `->>` returns the desired element as a `text` data type rather than JSON:

<img src = "Retrieving JSON Array Values with Element Extraction Operators 4.png" width = "600" style = "margin:auto"/>

This is the same pattern as we saw when extracting key values: `->` returns a JSON data type, & `->>` returns text.
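The four index behaviours can also be seen side by side on a single array. This sketch uses the two-element `genre` array from *Cinema Paradiso* as an inline value, so it runs without the `films` table; the column aliases are only illustrative.

```sql
SELECT genre -> 0   AS first_item,       -- "romance" as jsonb
       genre -> -1  AS last_item,        -- "drama" as jsonb
       genre -> 2   AS third_item,       -- NULL: no element at index 2
       genre ->> 0  AS first_item_text   -- romance as text
FROM (VALUES ('["romance", "drama"]'::jsonb)) AS t (genre);
```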

## Path Extraction

Both `#>` & `#>>` are *path extraction operators* that return an object located at a JSON path. A path is a series of keys or array indices that lead to the location of a value. In our example JSON, it might be just the `title` key if we want the name of the film. Or it could be more complex, such as the `characters` key followed by an index value of `1`, then the `actor` key; this would provide the path to the name of the actor at index `1`. The `#>` path extraction operator returns a JSON data type matching the stored data, & `#>>` returns text.

Consider the MPAA rating for the film *The Incredibles*, which appears in our JSON like this:

```
"rating": {"MPAA": "PG"}
```

The structure is a key named `rating` with an object for its value; inside that object is a key/value pair with `MPAA` as the key name. Thus, the path to the film's MPAA rating begins with the `rating` key & ends with the `MPAA` key. To denote this path's elements, we use the PostgreSQL string syntax for arrays, creating a comma-separated list inside curly brackets & single quotes. We then feed that string to the path extraction operators.

```
SELECT id, film #> '{rating, MPAA}' AS mpaa_rating
FROM films
ORDER BY id;

SELECT id, film #> '{characters, 0, name}' AS name
FROM films
ORDER BY id;

SELECT id, film #>> '{characters, 0, name}' AS name
FROM films
ORDER BY id;
```

To get each film's MPAA rating, we specify the path in an array: `{rating, MPAA}` with each item separated by commas. Run the query & you should see these results:

<img src = "Retrieving JSON Key Values with Path Extraction Operators 1.png" width = "600" style = "margin:auto"/>

The query returns the PG rating for *The Incredibles* & a `NULL` for *Cinema Paradiso* because, in our data, the latter film has no MPAA rating present.

The second example works with the array of `characters`, which in our JSON looks like this:

```
"characters": [{"name": "Salvatore",
                "actor": "Salvatore Cascio"},
               {"name": "Alfredo",
                "actor": "Philippe Noiret"}]
```

The `characters` array shown is for the second movie, but both films have a similar structure. Each object in the array represents a character, giving the character's name & the actor who played them. To locate the name of the first character in the array, we specify a path that starts at the `characters` key, continues to the first element of the array using the index `0`, & ends at the `name` key. The query results should look like this:

<img src = "Retrieving JSON Key Values with Path Extraction Operators 2.png" width = "600" style = "margin:auto"/>

The `#>` operator returns results as a JSON data type, in our case `jsonb`. If we want the results as text, we use `#>>` with the same path.
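A path is shorthand for a chain of field & element extractions. The sketch below follows the path to the actor at index `1` in the *Cinema Paradiso* `characters` array, first with the path operators & then with the equivalent chain of `->` & `->>`. The inline one-row table is a stand-in for `films`.

```sql
SELECT film #>  '{characters, 1, actor}'      AS actor_json,    -- jsonb
       film #>> '{characters, 1, actor}'      AS actor_text,    -- text
       film -> 'characters' -> 1 ->> 'actor'  AS actor_chained  -- text
FROM (VALUES ('{"characters": [
                  {"name": "Salvatore", "actor": "Salvatore Cascio"},
                  {"name": "Alfredo", "actor": "Philippe Noiret"}]}'::jsonb))
     AS t (film);
```

All three columns should point at Philippe Noiret; only the first keeps the JSON quoting. The path form tends to be easier to read once a value sits more than a level or two deep.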

## Containment & Existence

The final collection of operators we'll explore performs two kinds of evaluations. The first concerns *containment* & checks whether a specified JSON value contains a second specified JSON value. The second tests for *existence*: whether a string of text exists as a top-level key in a JSON object or as an element of a JSON array. Both kinds of operators return a Boolean value, which means we can use them in a `WHERE` clause to filter query results.

This set of operators works only with the `jsonb` data type -- another good reason to favor `jsonb` over `json` -- & can make use of our GIN index for efficient searching. The table below lists the operators with their syntax & function.

|Operator, syntax|Function|Returns|
|:---|:---|:---|
|`jsonb @> jsonb`|Tests whether the first JSON value contains the second JSON value|`boolean`|
|`jsonb <@ jsonb`|Tests whether the second JSON value contains the first JSON value|`boolean`|
|`jsonb ? text`|Tests whether the text exists as a top-level (not nested) key or an array value|`boolean`|
|`jsonb ?\| text array`|Tests whether any of the text elements in the array exist as a top-level (not nested) key or as an array value|`boolean`|
|`jsonb ?& text array`|Tests whether all of the text elements in the array exist as a top-level (not nested) key or as an array value|`boolean`|

### Using Containment Operators

In the query below, we use `@>` to evaluate whether one JSON value contains a second JSON value.

```
SELECT id, film ->> 'title' AS title,
       film @> '{"title": "The Incredibles"}'::jsonb
           AS is_incredible
FROM films
ORDER BY id;
```

In our `SELECT` list, we check whether the JSON stored in the `film` column in each row contains the key/value pair for *The Incredibles*. We use the `@>` containment operator in an expression that generates a column with the Boolean result `true` if `film` contains `"title": "The Incredibles"`. We give the name of our JSON column, `film`, then the `@>` operator, & then a string (cast to `jsonb`) specifying the key/value pair. In our `SELECT` list, we also return the text of the film title as a column. Running the query should produce these results:

<img src = "Demonstrating the @> Containment Operator.png" width = "600" style = "margin:auto"/>

As expected, the expression evaluates to `true` for *The Incredibles* & `false` for *Cinema Paradiso*. Because the expression evaluates to a Boolean result, we can use it in a query's `WHERE` clause, as shown in the query below:

```
SELECT film ->> 'title' AS title,
       film ->> 'year' AS year
FROM films
WHERE film @> '{"title": "The Incredibles"}'::jsonb;
```

Here, we again check that the JSON in the `film` column contains the key/value pair for the title of *The Incredibles*. By placing the evaluation in a `WHERE` clause, the query should return just the row where the expression returns `true`:

<img src = "Using a Containment Operator in a WHERE Clause.png" width = "600" style = "margin:auto"/>

Finally, we flip the order of evaluation to check whether the key/value pair specified is contained within the `film` column.

```
SELECT film ->> 'title' AS title,
       film ->> 'year' AS year
FROM films
WHERE '{"title": "The Incredibles"}'::jsonb <@ film;
```

Here we use the `<@` operator instead of `@>` to flip the order of evaluation. This expression also evaluates to `true`, returning the same result as the previous query.

<img src = "Demonstrating the <@ Containment Operator.png" width = "600" style = "margin:auto"/>
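Containment also works on nested objects & arrays, provided the right-hand value mirrors the structure of the stored JSON. The sketch below tests three candidate values against an inline, trimmed-down copy of *The Incredibles* object; the column aliases are illustrative.

```sql
SELECT film @> '{"genre": ["action"]}'::jsonb       AS has_action_genre,  -- true
       film @> '{"rating": {"MPAA": "PG"}}'::jsonb  AS has_pg_rating,     -- true
       film @> '{"MPAA": "PG"}'::jsonb              AS pg_at_top_level    -- false
FROM (VALUES ('{"title": "The Incredibles",
                "rating": {"MPAA": "PG"},
                "genre": ["animation", "action", "sci-fi"]}'::jsonb))
     AS t (film);
```

The last test is `false` because `MPAA` is not a top-level key; it matches only when wrapped in the `rating` object that encloses it.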

### Using Existence Operators

In the query below, we explore three existence operators. These check whether the text we supply exists as a top-level key or as an element of an array. All return a Boolean value.

```
SELECT film ->> 'title' AS title
FROM films
WHERE film ? 'rating';

SELECT film ->> 'title' AS title,
       film ->> 'rating' AS rating,
       film ->> 'genre' AS genre
FROM films
WHERE film ?| '{rating, genre}';

SELECT film ->> 'title' AS title,
       film ->> 'rating' AS rating,
       film ->> 'genre' AS genre
FROM films
WHERE film ?& '{rating, genre}';
```

The `?` operator checks for the existence of a single key or array element. In the first query's `WHERE` clause, we give the `film` column, the `?` operator & then the string `rating`. This syntax says, "In each row, does `rating` exist as a key in the JSON in the `film` column?" When we run the query the results should show one film that has a `rating` key, *The Incredibles*.

<img src = "Demonstrating Existence Operators 1.png" width = "600" style = "margin:auto"/>

The `?|` & `?&` operators act as `or` & `and`. For example, using `?|` tests whether either `rating` or `genre` exists as a top-level key. Running that second query returns both films, because both have at least one of those keys. Using `?&` tests whether both `rating` & `genre` exist as keys, & that's true for only *The Incredibles*.

<img src = "Demonstrating Existence Operators 2.png" width = "600" style = "margin:auto"/>

<img src = "Demonstrating Existence Operators 3.png" width = "600" style = "margin:auto"/>
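The "top-level" restriction is easy to trip over, so here is a minimal sketch on an inline copy of *The Incredibles* object. The key `awards` is hypothetical & deliberately absent from the data.

```sql
SELECT film ? 'rating'                    AS has_rating_key,      -- true
       film ? 'MPAA'                      AS has_mpaa_at_top,     -- false: nested
       (film -> 'rating') ? 'MPAA'        AS has_mpaa_in_rating,  -- true
       (film -> 'genre') ? 'action'       AS genre_has_action,    -- true: array element
       film ?| ARRAY['rating', 'awards']  AS has_any,             -- true
       film ?& ARRAY['rating', 'awards']  AS has_all              -- false
FROM (VALUES ('{"title": "The Incredibles",
                "rating": {"MPAA": "PG"},
                "genre": ["animation", "action", "sci-fi"]}'::jsonb))
     AS t (film);
```

To test for a nested key or an array element, first extract the enclosing value with `->` & then apply the existence operator to that result.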

---

# Analysing Earthquake Data

In this section, we'll analyse a collection of JSON data about earthquakes compiled by the US Geological Survey, an agency of the US Department of the Interior that monitors natural phenomena including volcanoes, landslides, & water quality. The USGS uses a network of seismographs that record the earth's vibrations, compiling data on each seismic event's location & intensity. Minor earthquakes occur around the world many times a day; the big ones are less frequent but potentially devastating.

We have a month's worth of JSON-formatted earthquake data from a USGS *application programming interface* or API. An API is a resource for transmitting data & commands between computers, & JSON is often used for APIs. You'll find the data file *earthquakes.json* along with this lesson's resources.

## Exploring & Loading the Earthquake Data

The JSON below shows the data structures for each earthquake record in the file, along with a selection of its key/value pairs:

```
{
 "type": "Feature",
 "properties": {"mag": 1.44,
                "place": "134 km W of Adak, Alaska",
                "time": 1612051063470,
                ...
                "type": "earthquake",
                "title": "M 1.4 - 134 km W of Adak, Alaska"},
 "geometry": {"type": "Point",
              "coordinates": [-178.581, 51.8418333333333, 22.48]},
 "id": "av91018173"
}
```

This data is in *GeoJSON* format, a JSON-based specification for spatial data. GeoJSON will include one or more `Feature` objects, denoted by inclusion of the key/value pair `"type": "Feature"`. Each `Feature` describes a single spatial object & contains both descriptive attributes (such as event time or related codes) under `properties` plus a `geometry` key that includes the coordinates of the spatial object. In our data, each `geometry` is a Point, a simple feature with the coordinates of one earthquake's longitude, latitude, & depth in kilometers. 
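Because `coordinates` is an array, each value can be reached with a path that ends in an array index. The sketch below pulls the three values out of an inline copy of the sample record's `geometry`; the column aliases & the cast to `numeric` are choices made for this illustration.

```sql
SELECT (quake #>> '{geometry, coordinates, 0}')::numeric AS longitude,
       (quake #>> '{geometry, coordinates, 1}')::numeric AS latitude,
       (quake #>> '{geometry, coordinates, 2}')::numeric AS depth_km
FROM (VALUES ('{"type": "Feature",
                "geometry": {"type": "Point",
                             "coordinates": [-178.581, 51.8418333333333, 22.48]}}'::jsonb))
     AS t (quake);
```

Note the order: GeoJSON lists longitude before latitude, which is the reverse of how coordinates are often spoken or written.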

Let's load our data into a table called `earthquakes` using the code below:

```
CREATE TABLE earthquakes (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    earthquake jsonb NOT NULL
);

COPY earthquakes (earthquake)
FROM '/YourDirectory/earthquakes.json';

CREATE INDEX idx_earthquakes ON earthquakes USING GIN (earthquake);
```

As with our `films` table, we use `COPY` to copy the data into a single `jsonb` column & add a GIN index. Running `SELECT * FROM earthquakes;` should return 12,899 rows.

<img src = "Creating & Loading the earthquakes Table.png" width = "600" style = "margin:auto"/>

## Working with Earthquake Times

The `time` key/value pair represents the moment the earthquake occurred. In the query below, we retrieve the value of `time` using a path extraction operator.

```
SELECT id,
       earthquake #>> '{properties, time}' AS time
FROM earthquakes
ORDER BY id
LIMIT 5;
```

In the `SELECT` list, we give the `earthquake` column followed by a `#>>` path extraction operator & the path to the time value denoted as an array. The `#>>` operator will return our value as text. Running the query should return five rows:

<img src = "Retrieving the Earthquake Time.png" width = "600" style = "margin:auto"/>

If those values don't look like times to you, that's not surprising. By default, the USGS represents time as milliseconds since the Unix epoch at 00:00 UTC on January 1, 1970. That's a variant of the standard epoch time, which measures seconds since the epoch. We can convert this USGS `time` value to something understandable using `to_timestamp()` & a little math, as shown below:

```
SELECT id,
       earthquake #>> '{properties, time}' AS time,
       to_timestamp((earthquake #>> '{properties,
           time}')::bigint / 1000) AS time_formatted
FROM earthquakes
ORDER BY id LIMIT 5;
```

Inside the parentheses of the `to_timestamp()` function, we repeat the code to extract the `time` value. The `to_timestamp()` function requires a number representing seconds, but the extracted value is text & in milliseconds, so we also cast the extracted text to `bigint` & divide by 1,000 to convert it to seconds.

The query generates the following results showing the extracted `time` value & its converted timestamp (your values will vary depending on your PostgreSQL server's time zone, because `time_formatted` shows when the earthquake occurred in that time zone):

<img src = "Converting the time Value to a Timestamp.png" width = "600" style = "margin:auto"/>
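The arithmetic can be checked on a single value. This sketch uses the `time` value from the sample record shown earlier & pins the display to UTC so the result doesn't depend on the server's time zone. Dividing a `bigint` by the integer `1000` discards the milliseconds; dividing by `1000.0` is one way to keep them.

```sql
-- 1612051063470 is the sample record's time value, in milliseconds.
SELECT to_timestamp(1612051063470 / 1000) AT TIME ZONE 'UTC'
           AS whole_seconds,   -- integer division drops the milliseconds
       to_timestamp(1612051063470 / 1000.0) AT TIME ZONE 'UTC'
           AS with_fraction;   -- keeps the fractional seconds
```

The `whole_seconds` column should show `2021-01-30 23:57:43`.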

Now that we have an understandable timestamp, let's find the oldest & newest earthquake times using the `min()` & `max()` aggregate functions in the query below:

```
SELECT min(to_timestamp((earthquake #>> '{properties,
           time}')::bigint / 1000)) AT TIME ZONE 'UTC'
           AS min_timestamp,
       max(to_timestamp((earthquake #>> '{properties,
           time}')::bigint / 1000)) AT TIME ZONE 'UTC'
           AS max_timestamp
FROM earthquakes;
```

We place `to_timestamp()` & our milliseconds-to-seconds conversion inside both the `min()` & `max()` functions in our `SELECT` list. This time, we add the keywords `AT TIME ZONE 'UTC'` after both functions; regardless of our server time zone settings, the results will display the timestamps in UTC, as USGS records them. Our results should look like this:

<img src = "Finding the Minimum & Maximum Earthquake Times.png" width = "600" style = "margin:auto"/>

This collection of earthquakes spans a month -- from early morning January 1, 2021, through the end of day on January 31. That's helpful context as we continue to dig for usable information.

## Finding the Largest & Most-Reported Earthquakes

Next, we'll look at two data points that measure an earthquake's size & the degree to which citizens reported feeling it, & we'll apply JSON extraction techniques to simple sorting of results.

### Extracting by Magnitude

The USGS reports each earthquake's magnitude in the `mag` key, beneath `properties`. Magnitude, according to the USGS, is a number representing the size of an earthquake at its source. Its scale is logarithmic: a magnitude 4 earthquake has seismic waves whose amplitude is about 10 times bigger than a quake with a magnitude of 3. With that context, let's find the five largest earthquakes in our data using the code in the query below:

```
SELECT earthquake #>> '{properties, place}' AS place,
       to_timestamp((earthquake #>> '{properties,
           time}')::bigint / 1000) AT TIME ZONE 'UTC'
           AS time,
       (earthquake #>> '{properties, mag}')::numeric
           AS magnitude
FROM earthquakes
ORDER BY (earthquake #>> '{properties,
             mag}')::numeric DESC NULLS LAST
LIMIT 5;
```

We again use path extraction operators to retrieve our desired elements, including values for `place` & `mag`. To show the largest five in our results, we add an `ORDER BY` clause with `mag`. We cast the value to `numeric` in both `ORDER BY` & `SELECT` because we want to sort & display the value as a number rather than as text. We also add the `DESC NULLS LAST` keywords, which sort the results in descending order & place `NULL` values (of which there are two) last. Your results should look like this:

<img src = "Finding Earthquakes with Largest Magnitudes.png" width = "600" style = "margin:auto"/>

The largest, of magnitude 7, was located beneath the ocean southeast of the small city of Pondaguitan in the Philippines. The second was in the Antarctic near the South Shetland Islands.
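Because the magnitude scale is logarithmic, each whole-number step multiplies wave amplitude by roughly 10. As a rough illustration (the magnitude 5 comparison quake is hypothetical, not a row from our results), the amplitude ratio follows from raising 10 to the difference in magnitudes:

```sql
-- Approximate amplitude ratio between a magnitude 7 quake
-- & a hypothetical magnitude 5 quake: 10 ^ (7 - 5).
SELECT power(10, 7 - 5) AS amplitude_ratio;
```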

### Extracting by Citizen Reports

Our JSON includes the number of reports for each earthquake under the key `felt`, beneath `properties`. Let's see which earthquakes in our data generated the most reports using the query below:

```
SELECT earthquake #>> '{properties, place}' AS place,
       to_timestamp((earthquake #>> '{properties,
           time}')::bigint / 1000) AT TIME ZONE 'UTC'
           AS time,
       (earthquake #>> '{properties, mag}')::numeric
           AS magnitude,
       (earthquake #>> '{properties, felt}')::integer
           AS felt
FROM earthquakes
ORDER BY (earthquake #>> '{properties,
             felt}')::integer DESC NULLS LAST
LIMIT 5;
```

Structurally, this query is similar to the query that found the largest quakes. We add a path extraction operator for the `felt` key, casting the returned text value to an `integer` type so that it's treated as a number for sorting & display. Finally, we place the extraction code in `ORDER BY`, using `NULLS LAST` because there are many earthquakes with no reports & we want those to appear last in the list. You should see these results:

<img src = "Finding Earthquakes with the Most Reports.png" width = "600" style = "margin:auto"/>

The top five are in California, which makes sense. California is earthquake-prone. Also, some of the largest quakes in our data occurred beneath oceans or in remote regions. The quake with more than 19,900 reports was moderate, but its nearness to cities meant more chance for people to notice it.
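The effect of the cast & of `NULLS LAST` in the two sorting queries can be seen on a small scale. The sketch below uses made-up values in a `VALUES` list (the `sample` alias & its contents are hypothetical) to stand in for text extracted from the `felt` key:

```sql
-- Hypothetical text values standing in for extracted `felt` data.
SELECT felt_text,
       felt_text::integer AS felt
FROM (VALUES ('9'), ('120'), (NULL)) AS sample (felt_text)
-- Sorting on the integer puts 120 ahead of 9; sorting on the text
-- would compare character by character & put '9' first.
-- NULLS LAST overrides the default for DESC, which lists NULLs first.
ORDER BY felt_text::integer DESC NULLS LAST;
```

Removing the `::integer` cast or the `NULLS LAST` keywords from the `ORDER BY` line is a quick way to see why the queries need both.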

## Converting Earthquake JSON to Spatial Data

Our JSON data has longitude & latitude values for each earthquake, meaning we can perform spatial analysis. For example, we'll use a PostGIS distance function to locate earthquakes that occurred within 50 miles of a city. First, though, we must convert the coordinates stored in JSON to a PostGIS data type.

The longitude & latitude values are found in the array of the `coordinates` key, under `geometry`. Here's an example:

```
"geometry": {"type": "Point",
             "coordinates": [-178.581, 51.8418333333333, 22.48]}
```

The first coordinate, at position `0` in the array, represents longitude; the second, at position `1`, is latitude. The third value denotes depth in kilometers, which we won't use. To extract these elements as text, we make use of a `#>>` path operator:

```
SELECT id,
       earthquake #>> '{geometry, coordinates}'
           AS coordinates,
       earthquake #>> '{geometry, coordinates, 0}'
           AS longitude,
       earthquake #>> '{geometry, coordinates, 1}'
           AS latitude
FROM earthquakes
ORDER BY id
LIMIT 5;
```

The query should return 5 rows:

<img src = "Extracting the Earthquake's Location Data.png" width = "600" style = "margin:auto"/>

A quick visual comparison of our result to the JSON `longitude` & `latitude` values tells us we've extracted the values properly. Next, we'll use a PostGIS function to convert those values to a Point in the `geography` data type.
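To see what the path operators return without touching the table, here is a minimal sketch that runs the same extraction against the sample geometry shown earlier. Wrapping it in braces & casting it to `jsonb` are assumptions made so the literal forms a complete object; the `sample` & `quake` names are hypothetical.

```sql
WITH sample (quake) AS (
    SELECT '{"geometry": {"type": "Point",
             "coordinates": [-178.581, 51.8418333333333, 22.48]}}'::jsonb
)
SELECT -- #>> returns the element as text ...
       quake #>> '{geometry, coordinates, 0}' AS longitude_text,
       pg_typeof(quake #>> '{geometry, coordinates, 0}') AS text_type,
       -- ... so a cast is needed before doing any math with it.
       (quake #>> '{geometry, coordinates, 0}')::numeric AS longitude,
       -- #> returns the same element as jsonb instead.
       quake #> '{geometry, coordinates, 0}' AS longitude_jsonb
FROM sample;
```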

The query below generates a Point of type `geography` for each earthquake, which we can use as input for PostGIS spatial functions.

```
SELECT ST_SetSRID(ST_MakePoint((earthquake #>>
           '{geometry, coordinates, 0}')::numeric,
           (earthquake #>> '{geometry, coordinates,
           1}')::numeric), 4326)::geography
           AS earthquake_point
FROM earthquakes
ORDER BY id;
```

Inside `ST_MakePoint()`, we place our code to extract longitude & latitude, casting both values from text to type `numeric` because the function expects numbers rather than text. We nest that function inside `ST_SetSRID()` to set a spatial reference system identifier (SRID) for the resulting Point. The SRID value `4326` denotes the commonly used WGS 84 coordinate system. Finally, we cast the entire output to the `geography` type. The first several rows should look like this:

<img src = "Converting JSON Location Data to PostGIS Geography.png" width = "600" style = "margin:auto"/>

We can't interpret those strings of digits & letters directly, but we can use pgAdmin's Geometry View to see the points plotted on a map. With your query results visible in the pgAdmin Data Output pane, click the map/brochure icon in the `earthquake_point` result header. You should see the earthquakes plotted on a map that uses OpenStreetMap as the base layer, as in the figure below:

<img src = "Viewing Earthquake Locations in pgAdmin.png" width = "600" style = "margin:auto"/>

Even with only a month of data, it's easy to see the abundance of earthquakes concentrated around the edges of the Pacific Ocean, in the so-called Ring of Fire where tectonic plates meet & volcanoes are more active.
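If a map viewer isn't handy, one way to make the `geography` values readable is PostGIS's `ST_AsText()` function, which returns the well-known text (WKT) form of a spatial value. This is a sketch that reuses the conversion code from the earlier query; the `earthquake_wkt` alias is just an illustrative name.

```sql
-- Requires the PostGIS extension.
SELECT id,
       ST_AsText(
           ST_SetSRID(ST_MakePoint(
               (earthquake #>> '{geometry, coordinates, 0}')::numeric,
               (earthquake #>> '{geometry, coordinates, 1}')::numeric),
               4326)::geography
       ) AS earthquake_wkt
FROM earthquakes
ORDER BY id
LIMIT 5;
```

Each row should come back as a `POINT` with longitude first & latitude second, matching the order of the JSON `coordinates` array.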

### Finding Earthquakes Within a Distance

Next, we'll narrow our study to earthquakes that occurred near Tulsa, Oklahoma -- a part of the country that has seen increased seismic activity since 2009 as a result of oil & gas processing, according to the USGS.

To perform more complex GIS tasks like this, it's easier if we permanently convert the JSON coordinates to a column of PostGIS type `geography` in the `earthquakes` table. That allows us to avoid the clutter of adding conversion code in each query.

The query below adds a column called `earthquake_point` to the `earthquakes` table & fills the new column with the JSON coordinates converted to type `geography`.

```
ALTER TABLE earthquakes
ADD COLUMN earthquake_point geography(POINT, 4326);

UPDATE earthquakes
SET earthquake_point = ST_SetSRID(ST_MakePoint(
        (earthquake #>> '{geometry, coordinates,
        0}')::numeric, (earthquake #>> '{geometry,
        coordinates, 1}')::numeric), 4326)::geography;

CREATE INDEX quake_pt_idx ON earthquakes
USING GIST (earthquake_point);
```

We use `ALTER TABLE` to add a column `earthquake_point` of type `geography`, specifying that the column will hold Points with an SRID of `4326`. Next, we `UPDATE` the table, setting the `earthquake_point` column using the same syntax as in the previous query. Finally, we add a spatial index using GIST to the new column.

That done, we can use the query below to find earthquakes within 50 miles of Tulsa.

```
SELECT earthquake #>> '{properties, place}' AS place,
       to_timestamp((earthquake -> 'properties' ->>
           'time')::bigint / 1000) AT TIME ZONE 'UTC'
           AS time,
       (earthquake #>> '{properties, mag}')::numeric
           AS magnitude,
       earthquake_point
FROM earthquakes
WHERE ST_DWithin(earthquake_point, ST_GeogFromText(
          'POINT(-95.989505 36.155007)'), 80468)
ORDER BY time;
```

In the `WHERE` clause, we employ the `ST_DWithin()` function, which returns a Boolean value of `true` if one spatial object is within a specified distance of another object. Here, we want to evaluate each earthquake Point to check whether it's within 50 miles of downtown Tulsa. We designate the city's coordinates in `ST_GeogFromText()` & supply the value of 50 miles using its equivalent in meters, `80468`, because the function measures distance between `geography` values in meters. The query should return 19 rows:

<img src = "Finding Earthquakes Within 50 Miles of Downtown Tulsa, Oklahoma.png" width = "600" style = "margin:auto"/>
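The `80468` figure comes from multiplying 50 miles by 1,609.344 meters per mile & rounding up. As a sketch of how the same conversion can be applied in reverse, the second statement below uses PostGIS's `ST_Distance()`, which returns meters for `geography` inputs, to report each quake's distance from downtown Tulsa in miles. The `miles_from_tulsa` alias is an illustrative name.

```sql
-- 50 miles expressed in meters (80467.2, rounded up to 80468).
SELECT 50 * 1609.344 AS fifty_miles_in_meters;

-- Distance of each nearby quake from downtown Tulsa, in miles.
SELECT earthquake #>> '{properties, place}' AS place,
       round((ST_Distance(earthquake_point,
                  ST_GeogFromText('POINT(-95.989505 36.155007)'))
              / 1609.344)::numeric, 1) AS miles_from_tulsa
FROM earthquakes
WHERE ST_DWithin(earthquake_point, ST_GeogFromText(
          'POINT(-95.989505 36.155007)'), 80468)
ORDER BY miles_from_tulsa;
```

The cast to `numeric` is there because `round()` with a precision argument doesn't accept the `double precision` value that `ST_Distance()` returns.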

View the earthquake locations by clicking the map/brochure icon atop the `earthquake_point` column in the results in pgAdmin. We should see 19 dots around the city, as in the figure below:

<img src = "Viewing Earthquakes Near Tulsa, Oklahoma in pgAdmin.png" width = "600" style = "margin:auto"/>

Achieving these results required some coding gymnastics that would have been unnecessary if the data had arrived in a shapefile or in a typical SQL table. Nevertheless, it's possible to extract meaningful insights from JSON data using PostgreSQL's support for the format.

---

# Generating & Manipulating JSON

We can use PostgreSQL functions to create JSON from existing rows in a SQL table & to modify JSON stored in a table by adding, removing, or changing keys & values.

## Turning Query Results into JSON

Because JSON is primarily a format for sharing data, it's useful to be able to quickly convert the results of a SQL query into JSON for delivery to another computer system. The query below uses the PostgreSQL-specific `to_json()` function to turn rows from the `employees` table into JSON.

```
SELECT to_json(employees) AS json_rows
FROM employees;
```

The `to_json()` function does what it says: transforms a supplied SQL value to JSON. To convert all values in each row of the `employees` table, we use `to_json()` in a `SELECT` & supply the table name as the function's argument; that returns each row as a JSON object with column names as keys:

<img src = "Turning Query Results into JSON with to_json().png" width = "600" style = "margin:auto"/>
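The function isn't limited to whole rows. As a small sketch with made-up literal values, `to_json()` can also convert individual SQL values, which shows how SQL types map to JSON strings, numbers, & arrays:

```sql
-- Literal values chosen for illustration only.
SELECT to_json('Pixar'::text) AS json_string,
       to_json(2004) AS json_number,
       to_json(ARRAY['animation', 'action']) AS json_array;
```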

We can modify our query a few ways to limit which columns to include in the results. In the query below, we use a `row()` constructor as the argument for `to_json()`.

```
SELECT to_json(row(emp_id, last_name))
           AS json_rows
FROM employees;
```

A `row()` constructor (which is ANSI SQL compliant) builds a row value from the arguments passed to it. In this case, we supply the column names `emp_id` & `last_name` & place `row()` inside `to_json()`. This syntax returns just those columns in the JSON result:

<img src = "Specifying Columns to Convert to JSON.png" width = "600" style = "margin:auto"/>

Notice, however, that the keys are named `f1` & `f2` instead of their source column names. That's a side effect of `row()`, which doesn't preserve column names when it builds the row record. We can set the names of the keys, which is often done to keep the names short & reduce JSON file size, improving transfer speeds. We can do this via a subquery.

```
SELECT to_json(employees) AS json_rows
FROM (SELECT emp_id, last_name AS ln
      FROM employees) AS employees;
```

We write a subquery that grabs the columns we want & alias the result as `employees`. In the process, we alias a column name to shorten its appearance as a key in the JSON. The result should look like this:

<img src = "Generating Key Names with a Subquery.png" width = "600" style = "margin:auto"/>
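Another way to control key names, offered here as an alternative sketch rather than the approach used above, is the PostgreSQL function `json_build_object()`, which takes alternating keys & values. The example is self-contained: the rows in the `VALUES` list are invented stand-ins for the `employees` table.

```sql
-- Hypothetical rows standing in for the employees table.
SELECT json_build_object('emp_id', emp_id,
                         'ln', last_name) AS json_rows
FROM (VALUES (1, 'Smith'),
             (2, 'Lee')) AS sample (emp_id, last_name);
```

This spells out each key explicitly, which can be easier to read than a subquery when only a few columns are involved.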

Finally, the query below shows how to compile all the rows of JSON into a single array of objects. You may want to do this if you're providing this data to another application that will iterate over the array of objects to perform a task, such as a calculation, or to render data on a device.

```
SELECT json_agg(to_json(employees)) AS json
FROM (SELECT emp_id, last_name AS ln
      FROM employees) AS employees;
```

We wrap `to_json()` in the PostgreSQL-specific `json_agg()` function, which aggregates values, including `NULL`, into a JSON array. Its output should look like this:

<img src = "Aggregating Rows & Converting to JSON.png" width = "600" style = "margin:auto"/>

These are simple examples, but you can build more complex JSON structures using subqueries to generate nested objects.
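Because `json_agg()` keeps `NULL` inputs, the resulting array can contain JSON `null` entries. The following sketch, built on invented values, shows that behaviour alongside one way to leave the nulls out using the standard aggregate `FILTER` clause:

```sql
-- Hypothetical last names, one of them missing.
SELECT json_agg(ln) AS with_nulls,
       json_agg(ln) FILTER (WHERE ln IS NOT NULL) AS without_nulls
FROM (VALUES ('Smith'), (NULL), ('Lee')) AS sample (ln);
```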

## Adding, Updating, & Deleting Keys & Values

We can add, update, & delete from JSON with a combination of concatenation & PostgreSQL-specific functions. Let's work through some examples.

### Adding or Updating a Top-Level Key/Value Pair

In the query below, we return to our `films` table & add a top-level key/value pair `"studio": "Pixar"` to the film *The Incredibles* using two different techniques:

```
UPDATE films
SET film = film || '{"studio": "Pixar"}'::jsonb
WHERE film @> '{"title": "The Incredibles"}'::jsonb;

UPDATE films
SET film = film || jsonb_build_object('studio', 'Pixar')
WHERE film @> '{"title": "The Incredibles"}'::jsonb;
```

Both examples use `UPDATE` statements to set new values for the `jsonb` column `film`. In the first, we use the PostgreSQL concatenation operator `||` to combine the existing film JSON with the new key/value pair, which we cast to `jsonb`. In the second, we use concatenation again but with `jsonb_build_object()`. This function takes a series of alternating keys & values as arguments & returns a `jsonb` object, letting us concatenate several key/value pairs at a time if we wanted.

Both statements will insert the new key/value pair if the key doesn't exist in the JSON being concatenated; they will overwrite the value of a key that's already present. There's no functional difference between the two statements, so feel free to use whichever you prefer. Note that this behaviour is specific to `jsonb`, which doesn't allow duplicate key names.

If you run `SELECT * FROM films;` & double-click the updated data in the `film` column, you should see the new key value pair:

<img src = "Adding Top-Level Key Value Pair via Concatenation.png" width = "600" style = "margin:auto"/>
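The overwrite behaviour of `||` is easy to confirm on literals. In this sketch, all values are invented for illustration. The second column also shows a limit worth knowing: concatenation merges only at the top level, so a nested object on the right replaces the one on the left rather than being merged into it.

```sql
SELECT -- The right-hand "studio" value replaces the left-hand one.
       '{"title": "The Incredibles", "studio": "Unknown"}'::jsonb
           || '{"studio": "Pixar"}'::jsonb AS overwritten,
       -- The whole "details" object is replaced, so "a" is lost.
       '{"details": {"a": 1}}'::jsonb
           || '{"details": {"b": 2}}'::jsonb AS nested_replaced;
```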

### Updating a Value at a Path

Currently, we have two entries for the `genre` key for *Cinema Paradiso*:

```
"genre": ["romance", "drama"]
```

To add a third entry to the array, we use the function `jsonb_set()`, which allows us to specify a value to update at a specific JSON path. In the query below, we use the `UPDATE` statement & `jsonb_set()` to add the genre `World War II`.

```
UPDATE films
SET film = jsonb_set(film, '{genre}',
        film #> '{genre}' || '["World War II"]', true)
WHERE film @> '{"title": "Cinema Paradiso"}'::jsonb;
```

In `UPDATE`, we `SET` the value of `film` to the result of `jsonb_set()` & use `WHERE` to limit the update to just the row with *Cinema Paradiso*. The function's first argument is the target JSON we want to modify, here `film`. The second argument is the path to the array value -- the `genre` key. Third, we give the new value for `genre`, which we specify as the current value of `genre` concatenated with an array with one value `"World War II"`. That concatenation will produce an array with three elements. The final argument is an optional Boolean value that dictates whether `jsonb_set()` should create the value if it's not already present. It's redundant here since `genre` already exists.

Run the query & then perform a quick `SELECT` to check the updated JSON. You should see the `genre` array including three values: `["romance", "drama", "World War II"]`.

<img src = "Adding an Array Value at a Path with jsonb_set().png" width = "600" style = "margin:auto"/>
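To see how the final Boolean argument changes the outcome, here is a sketch that calls `jsonb_set()` on a literal copy of the relevant JSON instead of the table. The `rating` key & its value are hypothetical & used only to show a path that doesn't exist yet.

```sql
WITH sample (film) AS (
    SELECT '{"title": "Cinema Paradiso",
             "genre": ["romance", "drama"]}'::jsonb
)
SELECT -- Same call as in the UPDATE: append to the existing array.
       jsonb_set(film, '{genre}',
           film #> '{genre}' || '["World War II"]', true)
           AS genre_extended,
       -- The key is missing, so true adds it ...
       jsonb_set(film, '{rating}', '"PG"', true) AS key_created,
       -- ... & false leaves the JSON unchanged.
       jsonb_set(film, '{rating}', '"PG"', false) AS unchanged
FROM sample;
```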

### Deleting a Value

We can remove keys & values from a JSON object using two deletion operators. The query below shows an `UPDATE` example for each.

```
UPDATE films
SET film = film - 'studio'
WHERE film @> '{"title": "The Incredibles"}'::jsonb;

UPDATE films
SET film = film #- '{genre, 2}'
WHERE film @> '{"title": "Cinema Paradiso"}'::jsonb;
```

The minus sign acts as a *deletion operator*, removing the key `studio` & its value, which we added earlier for *The Incredibles*. Supplying a text string after the minus sign indicates we want to remove a key & its value; supplying an integer will remove the element at that index.

The `#-` sign is a *path deletion operator* that removes the JSON element that exists at a path we specify. The syntax is similar to that of the path extraction operators `#>` & `#>>`. Here, we use `{genre, 2}` to indicate the third element of the array for `genre` (remember, JSON array indexes begin counting at zero). This will remove the value `World War II` that we added earlier to *Cinema Paradiso*.

Run both statements & then use `SELECT` to view the altered film JSON. You should see both elements removed.
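The three forms of deletion described above can be tried safely on literals before running an `UPDATE`. This is a minimal sketch with values copied from the film examples:

```sql
SELECT -- Text operand: removes a key & its value from an object.
       '{"title": "The Incredibles", "studio": "Pixar"}'::jsonb
           - 'studio' AS key_removed,
       -- Integer operand: removes the array element at that index.
       '["romance", "drama", "World War II"]'::jsonb
           - 2 AS index_removed,
       -- Path operand: removes the element found at the path.
       '{"genre": ["romance", "drama", "World War II"]}'::jsonb
           #- '{genre, 2}' AS path_removed;
```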

---

# Using JSON Processing Functions

To finish our JSON studies, we'll review a selection of PostgreSQL-specific functions for processing JSON data, including expanding array values into table rows & formatting output. You can find a complete listing of functions in the [PostgreSQL documentation](https://www.postgresql.org/docs/current/functions-json.html).

## Finding the Length of an Array

Counting the number of items in an array is a routine programming & analysis task. We might, for example, want to know how many actors are stored for each film in our JSON. To do this, we can use the `jsonb_array_length()` function in the query below:

```
SELECT id,
       film ->> 'title' AS title,
       jsonb_array_length(film -> 'characters')
           AS num_characters
FROM films
ORDER BY id;
```

As its only argument, the function takes an expression that extracts the value of the `characters` key from `film`. Running the query should produce these results.

<img src = "Finding the Length of an Array.png" width = "600" style = "margin:auto"/>

The output correctly shows we have three characters for *The Incredibles* & two for *Cinema Paradiso*. Note there's a similar `json_array_length()` function for the `json` type.
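One caveat: `jsonb_array_length()` raises an error if the value it receives isn't an array. Where the data may be inconsistent, a guard with `jsonb_typeof()` is one possible approach. The sketch below uses three invented documents (the `sample` & `doc` names are hypothetical) to show the idea:

```sql
WITH sample (doc) AS (
    VALUES ('{"characters": [{"name": "A"}, {"name": "B"}]}'::jsonb),
           ('{"characters": "unknown"}'::jsonb),
           ('{"title": "No characters key"}'::jsonb)
)
SELECT doc,
       -- Count only when the value really is an array;
       -- otherwise the CASE expression returns NULL.
       CASE WHEN jsonb_typeof(doc -> 'characters') = 'array'
            THEN jsonb_array_length(doc -> 'characters')
       END AS num_characters
FROM sample;
```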

## Returning Array Elements as Rows

The `jsonb_array_elements()` & `jsonb_array_elements_text()` functions convert array elements into rows, with one row per element. This is a useful tool for data processing. To convert JSON into structured SQL data, for example, we could use these functions to generate the rows to `INSERT` into a table or to generate rows that we can aggregate by grouping & counting.

The query below uses both functions to turn the `genre` key's array values into rows. Each function takes a `jsonb` array as an argument. The difference between the two is that `jsonb_array_elements()` returns the array elements as rows of `jsonb` values, while `jsonb_array_elements_text()` returns elements as, you guessed it, `text`.

```
SELECT id,
       jsonb_array_elements(film -> 'genre')
           AS genre_jsonb,
       jsonb_array_elements_text(film -> 'genre')
           AS genre_text
FROM films
ORDER BY id;
```

Running the query should produce these results:

<img src = "Returning Array Elements as Rows.png" width = "600" style = "margin:auto"/>
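As an example of the grouping & counting mentioned earlier, the following sketch expands each film's `genre` array into rows & then counts how many films carry each genre. It assumes the `films` table as used in this chapter; the `film_count` alias is an illustrative name.

```sql
-- One row per film/genre pair, then a count per genre.
SELECT genre,
       count(*) AS film_count
FROM films
CROSS JOIN LATERAL
    jsonb_array_elements_text(film -> 'genre') AS genre
GROUP BY genre
ORDER BY film_count DESC, genre;
```

Placing the function in the `FROM` clause rather than the `SELECT` list makes its output available as an ordinary column for `GROUP BY`.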

On an array with a simple list of values, that works nicely, but if an array contains a collection of JSON objects with their own key/value pairs, like `characters` in our `film` JSON, we need additional processing to unpack the values first. The query below walks through the process.

```
SELECT id,
       jsonb_array_elements(film -> 'characters')
FROM films
ORDER BY id;

WITH characters (id, json)
AS (
    SELECT id,
           jsonb_array_elements(film -> 'characters')
    FROM films
)
SELECT id,
       json ->> 'name' AS name,
       json ->> 'actor' AS actor
FROM characters
ORDER BY id;
```

We use `jsonb_array_elements()` to return the elements of the `characters` array, which should return each JSON object in the array as a row:

<img src = "Returning Key Values From Each Item in an Array 1.png" width = "600" style = "margin:auto"/>

To convert the `name` & `actor` values to columns, we employ a common table expression (CTE). Our CTE uses `jsonb_array_elements()` to generate a simple temporary `characters` table with two columns: the film's `id` & the unpacked array values in a column called `json`. We follow with a `SELECT` statement that queries the temporary table, extracting the values of `name` & `actor` from the `json` column:

<img src = "Returning Key Values From Each Item in an Array 2.png" width = "600" style = "margin:auto"/>

Those values are neatly parsed into a standard SQL structure & suitable for further analysis using standard SQL.
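For comparison, PostgreSQL also provides `jsonb_to_recordset()`, which can unpack an array of objects into typed columns in one step. This sketch is an alternative to the CTE shown earlier, not a replacement for it; the aliases `f` & `c` are arbitrary.

```sql
-- Each object in the characters array becomes a row with
-- the columns named in the AS clause.
SELECT f.id,
       c.name,
       c.actor
FROM films AS f
CROSS JOIN LATERAL jsonb_to_recordset(f.film -> 'characters')
    AS c (name text, actor text)
ORDER BY f.id;
```

Note that a film with no `characters` key would produce no rows in this form, so it would drop out of the results.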

---

# Wrapping Up

JSON is such a ubiquitous format that you're likely to encounter it often in your journey analysing data. Now you can handle loading, indexing, & parsing JSON, but JSON sometimes requires extra processing steps that aren't needed with data handled via standard SQL conventions. As with many areas of coding, your decision on whether to make use of JSON will depend on your specific circumstances. Now, you're equipped to understand the context.