# Snowpark Transforming JSON Data

In this lab you will perform the following:

- [ ] Upload a JSON file to a Snowflake internal stage
- [ ] Build a DataFrame that can read the data from the JSON file
- [ ] Transform a JSON array using the flatten function
- [ ] Load the transformed DataFrame to a table

---

## Create a Session

Create a Snowpark Session by passing in the connection properties file created in the [first lab exercise](../A-Dataframes/01-Sessions.ipynb).

```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._
import com.snowflake.snowpark.types._

// Build the path to the connection properties file from the current working directory
val pwd = sys.env.getOrElse("PWD", "")
val filename = s"$pwd/de_snowpark/connect.properties"

// Create the session from the properties file
val session = Session.builder.configFile(filename).create
```
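If `Session.builder.configFile` fails, a missing or blank key in the properties file is a common cause. The plain-Scala sketch below loads a properties source with `java.util.Properties` and reports which keys are absent. The key list (`URL`, `USER`, `ROLE`, `WAREHOUSE`, `DB`, `SCHEMA`) is an assumption based on the keys in Snowpark's sample profile, authentication keys are left out because they depend on your setup, and `ConnectPropertiesCheck` is a hypothetical helper, not part of Snowpark. The sample file contents are made up.

```scala
import java.io.{Reader, StringReader}
import java.util.Properties

object ConnectPropertiesCheck {
  // Assumed set of connection keys; adjust to match your own connect.properties.
  val requiredKeys: Seq[String] = Seq("URL", "USER", "ROLE", "WAREHOUSE", "DB", "SCHEMA")

  // Returns the required keys that are missing or blank in the given properties source.
  def missingKeys(source: Reader): Seq[String] = {
    val props = new Properties()
    props.load(source)
    requiredKeys.filter(k => Option(props.getProperty(k)).forall(_.trim.isEmpty))
  }
}

// Made-up in-memory file that omits SCHEMA.
val sample =
  """URL = https://myaccount.snowflakecomputing.com
    |USER = lab_user
    |ROLE = SYSADMIN
    |WAREHOUSE = COMPUTE_WH
    |DB = LAB_DB
    |""".stripMargin

println(ConnectPropertiesCheck.missingKeys(new StringReader(sample)))
```

To check the real file, pass a `java.io.FileReader` opened on `filename` instead of the `StringReader`.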

---
## Put the nested.json File

In this section, you will use the `file` method of the Session object. It returns a `FileOperation` object, which can `PUT` files to and `GET` files from Snowflake internal stages.

Before calling `file.put()`, create a map of options that override the default values.

Here, setting `AUTO_COMPRESS` to `FALSE` makes `PUT` upload the file without gzip-compressing it, and setting `OVERWRITE` to `TRUE` replaces the file if it already exists in the stage.

Run `PUT` to upload the local file `nested.json` (specified by `localFileName`) to your user stage `@~` (specified by `stageLocation`) with the options in `putOptions`.

See [PUT command](https://docs.snowflake.com/en/sql-reference/sql/put.html) for the full list of options.

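```scala
// Upload without gzip compression, replacing any existing file of the same name
val putOptions = Map("AUTO_COMPRESS" -> "FALSE", "OVERWRITE" -> "TRUE")
val localFileName = "./nested.json"
val stageLocation = "@~" // the current user's stage

session.file.put(localFileName, stageLocation, putOptions)
```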
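For comparison with the PUT documentation, the call above corresponds roughly to a SQL `PUT` statement. The sketch below only assembles that statement text from the same three values; `PutSqlSketch.putSql` is a hypothetical helper, not a Snowpark method, it executes nothing, and the exact SQL that Snowpark sends may differ. Options are sorted by name only to make the output deterministic.

```scala
object PutSqlSketch {
  // Hypothetical helper: renders PUT arguments as approximate SQL statement text.
  def putSql(localFileName: String, stageLocation: String, options: Map[String, String]): String = {
    // Render each option as KEY = VALUE, sorted by key so the output is stable.
    val rendered = options.toSeq.sortBy(_._1).map { case (k, v) => s"$k = $v" }
    (Seq(s"PUT file://$localFileName", stageLocation) ++ rendered).mkString(" ")
  }
}

println(PutSqlSketch.putSql("./nested.json", "@~", Map("AUTO_COMPRESS" -> "FALSE", "OVERWRITE" -> "TRUE")))
```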

---
### Progress: Check

- [X] Upload a JSON file to a Snowflake internal stage
- [ ] Build a DataFrame that can read the data from the JSON file
- [ ] Transform a JSON array using the flatten function
- [ ] Load the transformed DataFrame to a table

---

## Build a DataFrame to Read the File Data

The Session object's `read` method returns a `DataFrameReader`, which loads data from files in a Snowflake stage into a DataFrame. It supports several file formats and accepts format-specific options.

Call `options` with `readOptions` to set the JSON file format option `STRIP_OUTER_ARRAY`, which removes the outer array brackets so that each element of a top-level array becomes its own row. Then call `json` to read the JSON file in the stage path specified by `jsonFilePath`.


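```scala
// Strip the outer [ ] so that each element of the top-level array becomes its own row
val readOptions = Map("STRIP_OUTER_ARRAY" -> true)
val jsonFilePath = "@~/nested.json"
val dfRawJson = session.read.options(readOptions).json(jsonFilePath)

dfRawJson.show()
dfRawJson.count()
```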
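Why `STRIP_OUTER_ARRAY` matters: without it, a file whose top level is a JSON array loads as a single row holding the whole array; with it, each element of the outer array becomes its own row. The plain-Scala model below uses a tiny hypothetical `Json` type (not Snowpark or Snowflake code) and made-up values to show the difference in row counts.

```scala
// Minimal JSON model for illustration only.
sealed trait Json
final case class JStr(value: String) extends Json
final case class JArr(items: List[Json]) extends Json
final case class JObj(fields: Map[String, Json]) extends Json

object StripOuterArraySketch {
  // Models how a JSON document becomes rows: one row per top-level element when the
  // outer array is stripped, otherwise one row for the whole document.
  def rows(doc: Json, stripOuterArray: Boolean): List[Json] = doc match {
    case JArr(items) if stripOuterArray => items
    case other                          => List(other)
  }
}

val doc = JArr(List(
  JObj(Map("aircraft_type" -> JStr("jet"))),
  JObj(Map("aircraft_type" -> JStr("turboprop")))
))

println(StripOuterArraySketch.rows(doc, stripOuterArray = false).size)
println(StripOuterArraySketch.rows(doc, stripOuterArray = true).size)
```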

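Use the `sqlExpr` function to build columns from SQL expressions. Snowflake's path syntax (`$1:aircraft_type`) extracts a field from the JSON object in the `$1` column, `::string` casts the extracted value to a string, and `as` gives each column a name.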

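```scala
// Extract top-level fields from each JSON object; ::string casts the VARIANT value to a string
val df = dfRawJson.select(
  sqlExpr("$1:aircraft_type::string").as("aircraft_type"),
  sqlExpr("$1:engine_type").as("engine_type"),
  sqlExpr("$1:aircraft").as("aircraft"))
df.show()
```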

Examine the output of `show`. The DataFrame now exposes the top-level fields of each JSON object in `nested.json` as columns, while the `aircraft` column still holds a nested array.
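The path expression `$1:aircraft_type` looks up the `aircraft_type` field of the JSON object held in `$1`. The plain-Scala model below mimics that lookup, including dotted paths for nested objects, on a hypothetical `Json` type with made-up values. A missing field yields `None`, standing in for the NULL that Snowflake returns for a path that does not exist. It illustrates the semantics only; it is not how Snowpark or Snowflake evaluates the expression, and it does not model array indexing.

```scala
// Minimal JSON model for illustration only.
sealed trait Json
final case class JStr(value: String) extends Json
final case class JArr(items: List[Json]) extends Json
final case class JObj(fields: Map[String, Json]) extends Json

object PathSketch {
  // Models `col:field1.field2`: walk object fields one key at a time;
  // None plays the role of SQL NULL once any step fails.
  def getPath(json: Json, path: String): Option[Json] =
    path.split('.').foldLeft(Option(json)) {
      case (Some(JObj(fields)), key) => fields.get(key)
      case _                         => None
    }
}

val record: Json = JObj(Map(
  "aircraft_type" -> JStr("jet"),
  "aircraft" -> JArr(List(JObj(Map("model_name" -> JStr("m1")))))
))

println(PathSketch.getPath(record, "aircraft_type"))
println(PathSketch.getPath(record, "no_such_field"))
```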

---
### Progress: Check

- [X] Upload a JSON file to a Snowflake internal stage
- [X] Build a DataFrame that can read the data from the JSON file
- [ ] Transform a JSON array using the flatten function
- [ ] Load the transformed DataFrame to a table

---

## Transform a JSON array using the Flatten Function

The `flatten` method explodes compound values, such as arrays, into multiple rows, similar to the SQL `FLATTEN` table function.

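The `flatten` method adds the following columns to the returned DataFrame:

* SEQ: a unique sequence number for the input record
* KEY: for objects, the key of the exploded value (NULL for arrays)
* PATH: the path to the element within the data structure being flattened
* INDEX: for arrays, the index of the element (NULL otherwise)
* VALUE: the value of the element
* THIS: the element being flattened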



```scala
// Explode the aircraft array: one output row per array element
val flattened = df.flatten(df("aircraft"))
flattened.show()
```
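To see what the added columns contain, the plain-Scala model below reproduces FLATTEN's output for one input row whose value is an array, using the default `OUTER => FALSE`. For array elements, `KEY` is NULL, `INDEX` is the element position, `PATH` is `[index]`, `VALUE` is the element, and `THIS` is the whole array being flattened; every output row shares the input row's `SEQ`. `FlattenRow` and `flattenArray` are hypothetical names used only for this illustration, and the elements are plain strings to keep it small.

```scala
// One output row of the modeled FLATTEN; None stands in for SQL NULL.
final case class FlattenRow[A](
    seq: Long,
    key: Option[String],
    path: String,
    index: Option[Int],
    value: A,
    self: List[A] // models the THIS column (`this` is a Scala keyword)
)

object FlattenSketch {
  // Models FLATTEN over an array for a single input row identified by `seq`.
  def flattenArray[A](seq: Long, array: List[A]): List[FlattenRow[A]] =
    array.zipWithIndex.map { case (elem, i) =>
      FlattenRow(seq = seq, key = None, path = s"[$i]", index = Some(i), value = elem, self = array)
    }
}

val rows = FlattenSketch.flattenArray(seq = 1L, array = List("model_a", "model_b"))
rows.foreach(println)
```

An empty array produces no rows here, which matches FLATTEN's behavior when `OUTER` is FALSE.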

Select the parent `aircraft_type` and `engine_type` columns, and use `sqlExpr` with path syntax on the `VALUE` column to extract fields from each flattened array element.

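```scala
// Keep the parent columns and extract fields from each flattened element (the VALUE column)
val dx = flattened.select(flattened("aircraft_type"), flattened("engine_type"),
  sqlExpr("value:engine_model"), sqlExpr("value:manufacturer_name"),
  sqlExpr("value:manufacturer_year"), sqlExpr("value:model_name"),
  sqlExpr("value:number_seats"))
dx.show()
```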
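Conceptually, each flattened row repeats the parent record's columns (here `aircraft_type` and `engine_type`) next to one array element, and the `value:...` expressions pick fields out of that element. The plain-Scala model below expresses the same projection over ordinary case classes and maps; `Parent`, `OutRow`, `project`, and the sample values are all hypothetical and only illustrate the shape of the result.

```scala
// Hypothetical parent record: top-level fields plus the nested `aircraft` array.
final case class Parent(aircraftType: String, engineType: String, aircraft: List[Map[String, String]])

// One output row: parent columns repeated beside fields taken from a single array element.
final case class OutRow(aircraftType: String, engineType: String, modelName: Option[String], numberSeats: Option[String])

object ProjectSketch {
  // Models flatten + select: one output row per array element; a missing field becomes None (NULL).
  def project(parents: List[Parent]): List[OutRow] =
    for {
      p    <- parents
      elem <- p.aircraft
    } yield OutRow(p.aircraftType, p.engineType, elem.get("model_name"), elem.get("number_seats"))
}

val parents = List(
  Parent("jet", "turbofan", List(Map("model_name" -> "m1", "number_seats" -> "180"), Map("model_name" -> "m2"))),
  Parent("prop", "piston", List(Map("model_name" -> "m3", "number_seats" -> "4")))
)

ProjectSketch.project(parents).foreach(println)
```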


---
### Progress: Check

- [X] Upload a JSON file to a Snowflake internal stage
- [X] Build a DataFrame that can read the data from the JSON file
- [X] Transform a JSON array using the flatten function
- [ ] Load the transformed DataFrame to a table
---

## Load the Data into a Table

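Using the `dx` DataFrame above, load the flattened data into a table.

Use `cast` to define each column's data type and `as` to rename the columns. Columns such as `VALUE:ENGINE_MODEL` take their names from the path expressions and are not valid unquoted identifiers, so they are renamed to plain names such as `engine_model`.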
```scala
// Cast each column to its target type, rename it, and write the result to raw.NESTED_JSON
dx.select(col("aircraft_type").cast(StringType).as("aircraft_type"),
         col("engine_type").cast(StringType).as("engine_type"),
         col("VALUE:ENGINE_MODEL").cast(StringType).as("engine_model"),
         col("VALUE:MANUFACTURER_YEAR").cast(IntegerType).as("manufacturer_year"),
         col("VALUE:MODEL_NAME").cast(StringType).as("model_name"),
         col("VALUE:NUMBER_SEATS").cast(IntegerType).as("number_seats"))
  .write.mode(SaveMode.Overwrite) // replace the table if the cell is re-run
  .saveAsTable("raw.NESTED_JSON")

val dxCount = session.table("raw.NESTED_JSON").count()
println(s"The table raw.NESTED_JSON has $dxCount rows.")
```
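One way to sanity-check the row count: with FLATTEN's default `OUTER => FALSE`, each source record contributes one row per element of its `aircraft` array, and records whose array is empty or missing contribute none. With `OUTER => TRUE`, such records contribute exactly one row with NULLs instead. The plain-Scala sketch below computes the expected count under either setting; `expectedRowCount` is a hypothetical helper, and the array sizes in the example are made up rather than taken from `nested.json`.

```scala
object RowCountSketch {
  // Each entry is the size of one record's array; None means the array is missing.
  def expectedRowCount(arraySizes: Seq[Option[Int]], outer: Boolean): Int =
    arraySizes.map {
      case Some(n) if n > 0 => n // one row per element
      case _ if outer       => 1 // OUTER => TRUE keeps one NULL-filled row
      case _                => 0 // OUTER => FALSE drops the record
    }.sum
}

val sizes = Seq(Some(3), Some(0), None, Some(2))
println(RowCountSketch.expectedRowCount(sizes, outer = false))
println(RowCountSketch.expectedRowCount(sizes, outer = true))
```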

---
### Progress: Check

- [X] Upload a JSON file to a Snowflake internal stage
- [X] Build a DataFrame that can read the data from the JSON file
- [X] Transform a JSON array using the flatten function
- [X] Load the transformed DataFrame to a table
---