# Reviews (load)

Your business team is keen to understand how customers perceive SnowBearAir. Our customers are our lifeblood, so understanding where trouble or discontent exists is crucial to keeping SnowBearAir at the top of industry polls and customer satisfaction surveys.

![](../assets/gateclerk_bear.png)

Luckily, a set of airline reviews has been made [freely available on the web under a CC0 license](https://www.kaggle.com/efehandanisman/skytrax-airline-reviews).

Your job in this lab is to:

- [ ] Upload the CSV file to a Snowflake internal stage
- [ ] Create a StructType / UserSchema for reading the CSV file
- [ ] Build a DataFrame that can read the data from the file
- [ ] Load the entire file to the ALL_REVIEWS table

![](../assets/reviews_load.gif)


```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._
import com.snowflake.snowpark.types._
```

```scala
// Set connection properties built in de_snowpark/A-Dataframes/01-Sessions.ipynb
val pwd = sys.env.getOrElse("PWD", "")
val filename = s"$pwd/de_snowpark/connect.properties"

val session = Session.builder.configFile(filename).create
```
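The `configFile` call above reads connection settings from `connect.properties`. As a minimal sketch, assuming the file uses the `key = value` properties layout and the key names (URL, USER, PASSWORD, ROLE, WAREHOUSE, DB, SCHEMA) shown in the Snowpark for Scala session documentation, the file might look like the `sample` string below. Every value is a placeholder, and the `missingKeys` helper is hypothetical: a quick local sanity check, not part of Snowpark.

```scala
import java.io.StringReader
import java.util.Properties

object ConnectPropertiesCheck {
  // Placeholder contents; replace each value with your own account details.
  val sample: String =
    """URL = https://<account_identifier>.snowflakecomputing.com
      |USER = <your_user>
      |PASSWORD = <your_password>
      |ROLE = <your_role>
      |WAREHOUSE = <your_warehouse>
      |DB = <your_db>
      |SCHEMA = <your_schema>
      |""".stripMargin

  // Keys this lab assumes the file provides (an assumption, not an exhaustive list).
  val requiredKeys: Seq[String] =
    Seq("URL", "USER", "PASSWORD", "ROLE", "WAREHOUSE", "DB", "SCHEMA")

  // Hypothetical helper: parse the properties text and report which required keys are absent.
  def missingKeys(text: String): Seq[String] = {
    val props = new Properties()
    props.load(new StringReader(text))
    requiredKeys.filterNot(key => props.getProperty(key) != null)
  }
}
```

An empty result from `missingKeys` only means the keys are present; it says nothing about whether the credentials are valid.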

## Put the reviews.csv.gz File

In the following cell, `session.file` returns a `FileOperation` object, which can `PUT` files to and `GET` files from Snowflake internal stages. Notice that the call to `file.put()` also accepts a map of options. Here, `OVERWRITE` set to `TRUE` replaces any file with the same name already on the stage, and `AUTO_COMPRESS` set to `FALSE` tells `PUT` not to gzip the file during upload, since reviews.csv.gz is already compressed. See the [PUT command](https://docs.snowflake.com/en/sql-reference/sql/put.html) documentation for the full list of options.

```scala
session.file.put("./reviews.csv.gz", "@~", Map("OVERWRITE" -> "TRUE", "AUTO_COMPRESS" -> "FALSE"))
```
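Underneath, an upload like this corresponds to a SQL `PUT` statement. As a rough illustration only, the hypothetical `putCommand` helper below renders a statement of the shape described on the PUT reference page (`PUT file://<path> <stage> [options]`); it just builds a string and is not how Snowpark assembles the command internally. Options are passed as an ordered `Seq` of pairs, rather than a `Map`, so the output order is predictable.

```scala
object PutSketch {
  // Hypothetical helper: render a PUT statement from a local path, a stage, and options.
  // Each option is emitted as KEY = VALUE, in the order given.
  def putCommand(localPath: String, stage: String, options: Seq[(String, String)]): String = {
    val opts = options.map { case (key, value) => s"$key = $value" }.mkString(" ")
    s"PUT file://$localPath $stage $opts".trim
  }
}
```

With a placeholder absolute path, `PutSketch.putCommand("/path/to/reviews.csv.gz", "@~", Seq("OVERWRITE" -> "TRUE", "AUTO_COMPRESS" -> "FALSE"))` yields `PUT file:///path/to/reviews.csv.gz @~ OVERWRITE = TRUE AUTO_COMPRESS = FALSE`.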

<div class="alert alert-block alert-info">
<i class="fas fa-question fa-2x"></i>
    <b>Question:</b> What type of stage was <mark>reviews.csv.gz</mark> uploaded into?
</div>


### Progress: Check

- [X] Upload the CSV file to a Snowflake internal stage
- [ ] Create a StructType / UserSchema for reading the CSV file
- [ ] Build a DataFrame that can read the data from the file
- [ ] Load the entire file to the ALL_REVIEWS table

## Create a Schema that Maps CSV to SQL

CSV files aren't strongly typed: each field is just text that must be converted to a native SQL data type. Before Snowflake can use the file in a DataFrame, it needs to know the name, position, and data type of each column. Here we create a schema that lists the columns in file order, giving each one a column name and a Snowflake data type. Keep in mind that this Snowpark notion of a schema (a `StructType`) is different from a database schema in Snowflake.

```scala
val userSchema = StructType(Seq(
    StructField("AIRLINE", StringType)
    , StructField("OVERALL", ShortType)
    , StructField("AUTHOR", StringType)
    , StructField("REVIEW_DATE", StringType)
    , StructField("CUSTOMER_REVIEW", StringType)
    , StructField("AIRCRAFT", StringType)
    , StructField("TRAVELLER_TYPE", StringType)
    , StructField("CABIN", StringType)
    , StructField("ROUTE", StringType)
    , StructField("DATE_FLOWN", StringType)
    , StructField("SEAT_COMFORT", StringType)
    , StructField("CABIN_SERVICE", StringType)
    , StructField("FOOD_BEV", StringType)
    , StructField("ENTERTAINMENT", StringType)
    , StructField("GROUND_SERVICE", StringType)
    , StructField("VALUE_FOR_MONEY", StringType)
    , StructField("RECOMMENDED", StringType)
))
```
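The schema maps each position in a CSV record to a name and a type. A plain-Scala sketch of the same idea for the first three columns only: the `ReviewHead` case class and `parseHead` helper are hypothetical, and in the lab the actual casting happens inside Snowflake, not in your JVM.

```scala
import scala.util.Try

object SchemaSketch {
  // Hypothetical typed view of the first three positions in userSchema.
  final case class ReviewHead(airline: String, overall: Option[Short], author: String)

  // Map already-split fields by position: 0 -> AIRLINE, 1 -> OVERALL, 2 -> AUTHOR.
  // OVERALL is converted to Short. This sketch turns an empty or non-numeric value into None;
  // Snowflake itself may instead load NULL or raise a conversion error, depending on file format options.
  def parseHead(fields: IndexedSeq[String]): Option[ReviewHead] =
    if (fields.length < 3) None
    else Some(ReviewHead(fields(0), Try(fields(1).trim.toShort).toOption, fields(2)))
}
```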

### Progress: Check

- [X] Upload the CSV file to a Snowflake internal stage
- [X] Create a StructType / UserSchema for reading the CSV file
- [ ] Build a DataFrame that can read the data from the file
- [ ] Load the entire file to the ALL_REVIEWS table

## Build a DataFrame to Read the File Data

The Session object's `read` method returns a `DataFrameReader`, which can read data from files in a Snowflake stage.

```scala
val reviewsDF = session
    .read
        .option("field_optionally_enclosed_by", "'\"'")
        .option("skip_header", 1)
    .schema(userSchema)       // We pass in the schema we created above
    .csv("@~/reviews.csv.gz") // Read the reviews file from the STAGE
```
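To make the two reader options more concrete, here is a toy plain-Scala sketch of their effect on a tiny in-memory CSV: the first line is skipped (as with `skip_header` set to 1), and a field wrapped in double quotes may contain commas (as with `field_optionally_enclosed_by` set to a double quote). The `CsvOptionsSketch` parser is hypothetical and deliberately simplified (no escaped quotes, no multi-line fields); Snowflake's own CSV parser handles far more cases.

```scala
object CsvOptionsSketch {
  // Split one CSV record on commas, treating a double-quoted field as a single value.
  // Simplification: doubled quotes ("") inside a quoted field are not unescaped.
  def splitRecord(line: String): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    for (c <- line) {
      if (c == '"') inQuotes = !inQuotes    // enclosing quote: toggle state, don't keep the char
      else if (c == ',' && !inQuotes) {     // delimiter outside quotes ends the current field
        fields += current.toString
        current.clear()
      } else current += c
    }
    fields += current.toString
    fields.result()
  }

  // Drop the first `skipHeader` lines, then split each remaining record.
  def read(text: String, skipHeader: Int): Vector[Vector[String]] =
    text.split("\n").toVector.drop(skipHeader).map(splitRecord)
}
```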

<div class="alert alert-block alert-info">
<i class="fas fa-question fa-2x"></i>
    <b>Question:</b> Note the use of .option listed above (<mark>field_optionally_enclosed_by</mark> and <mark>skip_header</mark>)... What are these options exactly, and where are they documented in Snowflake?  Which SQL command will Snowpark build that will use these option definitions?
</div>

### Progress: Check

- [X] Upload the CSV file to a Snowflake internal stage
- [X] Create a StructType / UserSchema for reading the CSV file
- [X] Build a DataFrame that can read the data from the file
- [ ] Load the entire file to the ALL_REVIEWS table

## Load `RAW.ALL_REVIEWS`

The DataFrame above gives us access to the values in the reviews.csv.gz file. Let's use it to load a table with the full contents of the file.

```scala
reviewsDF.write.saveAsTable("raw.ALL_REVIEWS")
```

<div class="alert alert-block alert-info">
<i class="fas fa-question fa-2x"></i>
    <b>Question:</b> What command was executed above to load the table? An UPDATE, a COPY, or an INSERT?
</div>

### Progress: Check

- [X] Upload the CSV file to a Snowflake internal stage
- [X] Create a StructType / UserSchema for reading the CSV file
- [X] Build a DataFrame that can read the data from the file
- [X] Load the entire file to the ALL_REVIEWS table

## You Try It

You have two ways to review and explore the data you just loaded.

Either:

1. Head over to https://app.snowflake.com/ and log in to the class Snowflake account using your animal name and password. Create a worksheet and run the following commands:

```sql
// use [login]_db; -- use your default DB
use schema raw;
select * from raw.all_reviews limit 100;
```

2. Create a DataFrame object below and display a subset of the records.

```scala
// Hint: start with .table
// then use .show(100)
```