## Executing queries with sqlalchemy and pandas

To work with data stored in Postgres tables, you'll use sqlalchemy's create_engine() and pandas' read_sql() functions. To get the hang of these tools, you'll practice connecting to a Postgres database and executing a query. Good luck!

### Instructions
    - Update the connection URI to create a connection to the disneyland database, over port 5432.
    - Use pandas to read the results of the provided SQL query into a DataFrame, using the connection object created in the previous step.

In [None]:
import pandas as pd
import sqlalchemy

# Create a connection to the disneyland database
db_engine = sqlalchemy.create_engine("postgresql+psycopg2://repl:password@localhost:5432/disneyland")

# Execute a query against the nested_reviews table
results = pd.read_sql("SELECT * FROM nested_reviews;", db_engine)
print(results)
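
The connection URI above follows the pattern `dialect+driver://user:password@host:port/database`. As a minimal sketch of how those parts fit together, the hypothetical helper `build_postgres_uri` below (not part of sqlalchemy) assembles the same string with the standard library only. It assumes that credentials containing characters such as `@` need URL-encoding before they are placed in the URI.

```python
from urllib.parse import quote_plus


def build_postgres_uri(user, password, host, port, database, driver="psycopg2"):
    """Assemble a SQLAlchemy-style Postgres URI (hypothetical helper)."""
    # quote_plus escapes characters such as "@" or "/" that would otherwise
    # be misread as URI delimiters.
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Same values as the exercise: user repl, port 5432, database disneyland
uri = build_postgres_uri("repl", "password", "localhost", 5432, "disneyland")
print(uri)
```

The resulting string can be passed to `sqlalchemy.create_engine()` in place of the hard-coded URI.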

## Querying JSON and JSONB data from Postgres

With Postgres' built-in JSON and JSONB data types, it's easy to store and interact with semi-structured data in a Postgres table. In this exercise, you'll observe some of the tooling that Postgres offers to query data of type JSON from the nested_reviews table. Best of luck!

### Instructions
    - Create a connection to the disneyland database with user repl, using sqlalchemy.
    - Execute the query stored in the query variable, using the previously-defined db_engine.
    - Output the review column of the results DataFrame, and observe the data that was returned.

In [None]:
import pandas as pd
import sqlalchemy

# Create a connection to the disneyland database
db_engine = sqlalchemy.create_engine("postgresql+psycopg2://repl:password@localhost:5432/disneyland")

query = """SELECT * FROM nested_reviews;"""

# Execute the query, check out the results
results = pd.read_sql(query, db_engine)

# Print the review column from the results DataFrame
print(results["review"])

## Converting tabular data to JSON

Sometimes, data is loaded to a Postgres table using INSERT INTO, or COPY ... FROM commands. Other times, it's generated from an existing table or set of columns. In this exercise, you'll explore some of Postgres' built-in tooling to create a JSON object.

To help get you started, pandas has been imported as pd, and a connection object has been created and stored in the variable db_engine. Good luck!

### Instructions
    - Use the row_to_json function to convert the review_id, rating, and year_month columns to a single column of type JSON.
    - Execute the query, and print the first ten rows of the resulting DataFrame. Inspect the table to confirm the row_to_json function worked as expected.

In [None]:
# Build a query to create a JSON-object
query = """
SELECT
	row_to_json(row(review_id, rating, year_month))
FROM reviews;
"""

# Execute the query, and output the results
results = pd.read_sql(query, db_engine)
print(results.head(10))
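
One thing to look for when inspecting the output: a bare `row(...)` constructor carries no column names, so Postgres labels the keys `f1`, `f2`, `f3`, and so on. The pure-Python sketch below mimics that behavior. The function `row_to_json_like` and the sample values are made up for illustration and do not come from the reviews table.

```python
import json


def row_to_json_like(*values):
    """Mimic row_to_json(row(...)): positional values get keys f1, f2, ..."""
    return json.dumps({f"f{i}": v for i, v in enumerate(values, start=1)})


# Made-up sample values standing in for review_id, rating, year_month
print(row_to_json_like(1, 5, "2019-4"))
```

If the original column names are needed as keys, one common approach is to pass a named subquery row instead, for example `SELECT row_to_json(t) FROM (SELECT review_id, rating, year_month FROM reviews) AS t;`.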

## Extracting keys from JSON objects with Postgres

When getting your feet wet with semi-structured data, you'll often be curious about the different keys that exist in a set of JSON objects. In this exercise, you'll practice doing just this with Postgres' built-in JSON functionality.

Like before, pandas has been imported as pd, and a connection object is available via the variable db_engine. Go get 'em!

### Instructions
    - Write a query to create a result set containing the unique set of keys in the JSON objects stored in the review column of the nested_reviews table.
    - Store the result set in a variable named unique_keys, and output the results. Validate that there are two keys in this DataFrame.

In [None]:
# Build a query to find the unique keys in the review column
query = """
SELECT
	DISTINCT json_object_keys(review)
FROM nested_reviews;
"""

# Execute the query, show the results
unique_keys = pd.read_sql(query, db_engine)
print(unique_keys)
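
To build intuition for what `DISTINCT json_object_keys(review)` does, here is a rough pure-Python equivalent. It collects the top-level keys of each document and de-duplicates them; nested keys such as `branch` are not included. The two sample documents are hypothetical stand-ins shaped like the review column.

```python
import json

# Hypothetical documents shaped like values of the review column
reviews = [
    '{"statement": "Great day out", "location": {"branch": "Disneyland_California", "reviewer": "Australia"}}',
    '{"statement": "Long queues", "location": {"branch": "Disneyland_Paris", "reviewer": "France"}}',
]

unique_keys = set()
for raw in reviews:
    # Only top-level keys are collected, mirroring json_object_keys
    unique_keys.update(json.loads(raw).keys())

print(sorted(unique_keys))
```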

## Querying top-level JSON data

With Postgres JSON, querying semi-structured data is a breeze! Postgres provides built-in operators, including -> and ->>. In this example, you'll practice using these operators to query review data from a column of type JSON. This table takes the form below:

<br/>![](../../imgs/Chapter-3_3-Exercises-2.png)<br/>

To give you a head-start, pandas has been imported as pd, and a connection object has been created and stored in the db_engine variable. Have fun!

### Instructions
    - Use the -> operator to query the location field from the review column in the nested_reviews table, as JSON.
    - Query the statement field as text from the review column in the nested_reviews table.
    - Execute the query using pandas, and print the result.

In [None]:
# Build the query to select the location and statement fields
query = """
    SELECT
        review -> 'location' AS location,
        review ->> 'statement' AS statement
    FROM nested_reviews;
"""

# Execute the query, render results
data = pd.read_sql(query, db_engine)
print(data)
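
The difference between `->` and `->>` is easiest to see on a string value: `->` keeps the value as JSON (a string stays quoted), while `->>` returns it as plain text. The sketch below imitates that distinction in pure Python. The helpers `arrow` and `arrow_text` and the sample document are assumptions for illustration, and how a JSON result appears in a DataFrame may depend on the database driver.

```python
import json

# Hypothetical document shaped like one value of the review column
review = json.loads(
    '{"statement": "Great day out", "location": {"branch": "Disneyland_California"}}'
)


def arrow(doc, key):
    """Rough analogue of ->: the value stays JSON, so strings keep their quotes."""
    return json.dumps(doc[key]) if key in doc else None


def arrow_text(doc, key):
    """Rough analogue of ->>: the value comes back as plain text."""
    if key not in doc or doc[key] is None:
        return None
    value = doc[key]
    # Strings are returned unquoted; other values keep their JSON spelling
    return value if isinstance(value, str) else json.dumps(value)


print(arrow(review, "statement"))       # JSON string, quotes included
print(arrow_text(review, "statement"))  # plain text, no quotes
```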

## Finding the type of JSON data

Sometimes, you may be tasked to work with a semi-structured dataset with little documentation. When this is the case, you may have to do a certain amount of discovery around data schema and types. To help with this, Postgres offers the json_typeof function, which you'll explore more of in this exercise.

A connection to the disneyland database has been created and is available in the db_engine variable. pandas has been imported as pd, and is ready to use. Happy querying!

### Instructions
    - Extract the type of the location field from the review column in the nested_reviews table, aliasing the result as location_type.
    - Execute the query using pandas, and print the result set.

In [None]:
# Find the data type of the location field
query = """
SELECT
    json_typeof(review -> 'location') AS location_type
FROM nested_reviews;
"""

# Execute the query, render results
data = pd.read_sql(query, db_engine)
print(data)
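
`json_typeof` reports one of six type names: object, array, string, number, boolean, or null. As a rough sketch of that classification, the hypothetical function `json_typeof_like` below maps parsed Python values to the same names. The sample document is made up.

```python
def json_typeof_like(value):
    """Map a parsed JSON value to the type name json_typeof would report."""
    if value is None:
        return "null"
    # bool must be checked before numbers because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


# Hypothetical document shaped like one value of the review column
review = {
    "statement": "Great day out",
    "location": {"branch": "Disneyland_California", "reviewer": "Australia"},
}

print(json_typeof_like(review["location"]))
print(json_typeof_like(review["statement"]))
```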

## Working with nested JSON objects

Often, working with semi-structured data means working with nested objects. In this exercise, you'll practice using the -> and ->> operators to query nested data from the nested_reviews table. As a reminder, this table takes the following form:

<br/>![](../../imgs/Chapter-3_3-Exercises-3.png)<br/>

Similar to before, pandas has been imported as pd, and a connection object has been created and stored in the db_engine variable. Best of luck!

### Instructions 1/3
    - Query the object stored at the location field, returning the value as JSON.

In [None]:
# Build the query to select the object stored at the 
# location key
query = """
SELECT 
	review -> 'location'
FROM nested_reviews;
"""

# Execute the query, render results
data = pd.read_sql(query, db_engine)
print(data)

### Instructions 2/3
    - Update the query to pull the nested branch field from the object stored in the location field, as text. Alias the column name as branch.

In [None]:
# Build the query to select the nested branch field
query = """
SELECT 
	review -> 'location' ->> 'branch' AS branch
FROM nested_reviews;
"""

# Execute the query, render results
data = pd.read_sql(query, db_engine)
print(data)

### Instructions 3/3
    - Add a line to the query to select the nested reviewer field, returning the result as text. Alias the column name as reviewer.

In [None]:
# Update the query to select the nested reviewer field
query = """
SELECT 
	review -> 'location' ->> 'branch' AS branch,
    review -> 'location' ->> 'reviewer' AS reviewer
FROM nested_reviews;
"""

# Execute the query, render results
data = pd.read_sql(query, db_engine)
print(data)

## Filtering document databases with Postgres JSON

Using Postgres JSON, data stored in documents can be queried and filtered using the -> and ->> operators. To practice, you'll filter reviews using Postgres JSON. As before, the nested_reviews table takes the form below. A sqlalchemy connection object has been configured and is available via the db_engine variable, and pandas has been imported as pd.

<br/>![](../../imgs/Chapter-3_3-Exercises-3.png)<br/>

### Instructions
    - Use Postgres JSON to retrieve the value stored at the statement key in the review column, for each record in the nested_reviews table.
    - Only return results with a branch nested in the location object of the review column equal to 'Disneyland_California'.

In [None]:
# Build the query to select the statement field, filtered by branch
query = """
SELECT
	review ->> 'statement' AS customer_review 
FROM nested_reviews 
WHERE review -> 'location' ->> 'branch' = 'Disneyland_California';
"""

# Execute the query, render results
data = pd.read_sql(query, db_engine)
print(data)

## #> and #>>

Previously, to query nested document data with Postgres JSON, you chained the -> and ->> operators together. However, when working with deeply nested data, these statements can become long and difficult to read and troubleshoot. To remedy this, Postgres offers the #> and #>> operators. In this example, you'll practice using these operators by querying the nested_reviews table, which takes the form below:

<br/>![](../../imgs/Chapter-3_3-Exercises-3.png)<br/>

pandas has been imported as pd, and a connection object has been created and stored in the variable db_engine. Best of luck!

### Instructions
    - Use the json_typeof() function and the #> operator to find the data type of the value stored in the statement key of the review column in the nested_reviews table.
    - Query the branch field, which is nested in the location object, from the review column, as text. Alias the field as branch.
    - Try to return the zipcode field nested in the location object, as text, aliasing the field as zipcode.

In [None]:
# Attempt to query the statement, nested branch, and nested
# zipcode fields from the review column
query = """
	SELECT 
    	json_typeof(review #> '{statement}'),
        review #>> '{location, branch}' AS branch,
        review #>> '{location, zipcode}' AS zipcode
    FROM nested_reviews;
"""

# Execute the query, render results
data = pd.read_sql(query, db_engine)
print(data)
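
The path operators walk a list of keys one step at a time, and Postgres returns NULL when any step of the path does not exist. The sketch below imitates `#>>` in pure Python to show that behavior. The helper `extract_path_text` and the sample document are assumptions for illustration; the document deliberately has no zipcode key.

```python
import json

# Hypothetical document shaped like one value of the review column
review = {
    "statement": "Great day out",
    "location": {"branch": "Disneyland_California", "reviewer": "Australia"},
}


def extract_path_text(doc, path):
    """Rough analogue of #>>: follow the keys in path, return text or None."""
    current = doc
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None  # stands in for the NULL Postgres returns
        current = current[key]
    if current is None:
        return None
    # Strings are returned unquoted; other values keep their JSON spelling
    return current if isinstance(current, str) else json.dumps(current)


print(extract_path_text(review, ["location", "branch"]))
print(extract_path_text(review, ["location", "zipcode"]))
```

A missing key therefore shows up as an empty value in the result set rather than as an error.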

## Extracting document data

In this exercise, you'll practice using the json_extract_path and json_extract_path_text functions to query the review column of the nested_reviews table, which is shown below.

<br/>![](../../imgs/Chapter-3_3-Exercises-3.png)<br/>

A connection object has been created and stored in the variable db_engine, and pandas has been imported as pd. Best of luck!

### Instructions
    - Query the value stored in the statement field in the review column of the nested_reviews table, using the json_extract_path function.
    - Query the nested reviewer field, using the json_extract_path_text function.
    - Refine your query to include only those records where the branch information, extracted as text from the JSON data, matches 'Disneyland_California'. Use the appropriate function to parse through the JSON structure and isolate this particular field to be filtered.

In [None]:
# Return the statement and reviewer fields, filter by the 
# nested branch field
query = """
    SELECT 
        json_extract_path(review, 'statement'),
        json_extract_path_text(review, 'location', 'reviewer')
    FROM nested_reviews
    WHERE json_extract_path_text(review, 'location', 'branch') = 'Disneyland_California';
"""

data = pd.read_sql(query, db_engine)
print(data)

## Manipulating document data

Throughout this chapter, you've explored a number of tools to work with semi-structured document data in Postgres. In this final exercise, you'll put all of these tools to work to create an analytics-ready dataset. You'll be working with the nested_reviews table, which takes the form shown below.

<br/>![](../../imgs/Chapter-3_3-Exercises-3.png)<br/>

To help get you started, pandas has been imported as pd, and a connection object has been created and stored in the variable db_engine. Best of luck!

### Instructions
    - Use the #> operator to return the nested branch field from the location object in the review column, as JSON. Alias as branch.
    - Query the statement field in the review column, using the ->> operator, aliasing the result as statement.
    - Filter results to only include records with a reviewer location of 'Australia', with the help of the json_extract_path_text function.

In [None]:
# Extract fields from JSON, and filter by reviewer location
query = """
    SELECT
    	review_id,
        review #> '{location, branch}' AS branch,
        review ->> 'statement' AS statement,
        rating
    FROM nested_reviews
    WHERE json_extract_path_text(review, 'location', 'reviewer') = 'Australia'
    ORDER BY rating DESC;
"""

data = pd.read_sql(query, db_engine)
print(data)
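
To make the shape of the final result concrete, the sketch below reproduces the query's logic in pure Python: extract the nested fields, keep only reviewers from Australia, and sort by rating in descending order. The three sample rows are hypothetical and only mimic the layout of nested_reviews.

```python
import json

# Hypothetical (review_id, review, rating) rows shaped like nested_reviews
rows = [
    (1, '{"statement": "Great day out", "location": {"branch": "Disneyland_California", "reviewer": "Australia"}}', 4),
    (2, '{"statement": "Loved the parade", "location": {"branch": "Disneyland_HongKong", "reviewer": "Australia"}}', 5),
    (3, '{"statement": "Long queues", "location": {"branch": "Disneyland_Paris", "reviewer": "France"}}', 3),
]

flattened = []
for review_id, raw, rating in rows:
    doc = json.loads(raw)
    # Equivalent of the WHERE clause on the nested reviewer field
    if doc["location"]["reviewer"] != "Australia":
        continue
    flattened.append(
        {
            "review_id": review_id,
            "branch": doc["location"]["branch"],
            "statement": doc["statement"],
            "rating": rating,
        }
    )

# Equivalent of ORDER BY rating DESC
flattened.sort(key=lambda row: row["rating"], reverse=True)

for row in flattened:
    print(row)
```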