# Reviews Refine

In this lab section, we will clean up the reviews we loaded into the `RAW.ALL_REVIEWS` table to produce a conformed model. We will clean and cast the dates and select a handful of columns for display.

![](../assets/reviews_refine.gif)

In this lab we will:

- [ ] Build a DataFrame using a `sqlExpr` to clean and cast the date
- [ ] Review (evaluate) the DataFrame to validate our work
- [ ] Create a view using this DataFrame method `createOrReplaceView()`

In [None]:
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._
import com.snowflake.snowpark.types._

In [None]:
// Load the connection properties built in de_snowpark/A-Dataframes/01-Sessions.ipynb
val pwd = sys.env.getOrElse("PWD", "")
val filename = s"$pwd/de_snowpark/connect.properties"

val session = Session.builder.configFile(filename).create

// Set session to use the CONFORMED schema
session.sql("use schema CONFORMED").collect

## Build DataFrame

In this step we generate a DataFrame from our loaded `RAW.ALL_REVIEWS` table, then use a `sqlExpr` to add a typed date column to it. While the list of [functions available in Snowpark](https://docs.snowflake.com/en/developer-guide/snowpark/reference/scala/com/snowflake/snowpark/functions$.html) is comprehensive, you can also use SQL expressions directly.

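Here we step away from the programmatic DataFrame syntax and embed a snippet of raw SQL that strips the English ordinal suffixes (for example, the `rd` in `3rd`) from the dates in our DataFrame and then parses the result as a date. Removing `st` also turns `August` into `Augu`, so the outermost `replace` puts the full month name back: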

```sql
to_date(
 replace(replace(replace(replace(replace(
   review_date, 'th'), 'st'), 'nd'), 'rd'), 'Augu', 'August')
 , 'DD MON YYYY')
```
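To see why the final `'Augu'` to `'August'` step is needed, the nested replaces can be mirrored in plain Scala. This is only a local sketch of the string logic, not Snowpark code: the `ReviewDates` object and its methods are hypothetical names, the sample strings are made up rather than taken from `ALL_REVIEWS`, and `java.time` parsing stands in for Snowflake's `to_date`, whose format handling may differ.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical helper that mirrors the nested SQL replace() calls locally.
object ReviewDates {
  // Same order as the SQL: strip 'th', 'st', 'nd', 'rd', then repair 'Augu'.
  def stripOrdinals(raw: String): String =
    raw
      .replace("th", "")
      .replace("st", "")   // also turns "August" into "Augu"
      .replace("nd", "")
      .replace("rd", "")
      .replace("Augu", "August")

  // Local stand-in for to_date(..., 'DD MON YYYY'); assumes full English month names.
  private val fmt = DateTimeFormatter.ofPattern("d MMMM yyyy", Locale.ENGLISH)

  def parse(raw: String): LocalDate = LocalDate.parse(stripOrdinals(raw), fmt)
}

println(ReviewDates.stripOrdinals("3rd August 2019")) // 3 August 2019
println(ReviewDates.parse("21st May 2015"))           // 2015-05-21
```

August is the only English month name that contains one of the four suffixes, which is why a single repair step is enough.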


In [None]:
val conformedDF = session
    .table("RAW.ALL_REVIEWS")
    .withColumn("REVIEW_DATE_TYPED", sqlExpr("to_date(replace(replace(replace(replace(replace(review_date, 'th'), 'st'), 'nd'), 'rd'), 'Augu', 'August'), 'DD MON YYYY')"))
    .select(col("AIRLINE"), col("OVERALL"), col("REVIEW_DATE_TYPED"), col("TRAVELLER_TYPE"), col("CUSTOMER_REVIEW"))

<div class="alert alert-block alert-info">
<i class="fas fa-question fa-2x"></i>
    <b>Question:</b> The SQL expression has a column named <mark>review_date</mark>. What table did that come from, and what was its original form?  HINT: You can go into Snowflake and issue a SQL query on ALL_REVIEWS to explore.
</div>

### Progress Check
- [X] Build a DataFrame using a `sqlExpr` to clean and cast the date
- [ ] Review (evaluate) the DataFrame to validate our work
- [ ] Create a view using this DataFrame method `createOrReplaceView()`

## Evaluate and `show` the Results

In [None]:
conformedDF.show(10)

<div class="alert alert-block alert-warning">
<i class="fas fa-search fa-2x"></i>
<b>SQL Sleuth</b>: Did you spy your raw SQL expression? Copy, paste, and format the generated query in Snowflake, then run it. Do you see what Snowpark is generating on your behalf?
</div>
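
Beyond eyeballing the first rows, one way to validate the cast is to check the column types and the range of parsed dates. The following is a minimal sketch that assumes the `conformedDF` built earlier in this notebook and a live Snowflake session; `checkReviewDates` is a hypothetical helper name, not part of the lab.

```scala
import com.snowflake.snowpark.DataFrame
import com.snowflake.snowpark.functions.{col, max, min}

// Hypothetical helper (not part of the lab): prints each column's type and the
// earliest and latest parsed review dates so odd values stand out.
def checkReviewDates(df: DataFrame): Unit = {
  // REVIEW_DATE_TYPED should be reported as a date type, not a string
  df.schema.fields.foreach(f => println(s"${f.name}: ${f.dataType}"))

  df.select(
    min(col("REVIEW_DATE_TYPED")).as("EARLIEST"),
    max(col("REVIEW_DATE_TYPED")).as("LATEST")
  ).show()
}

// Usage, assuming conformedDF from the cell above:
// checkReviewDates(conformedDF)
```

A date range that falls outside what you would expect for airline reviews is a hint that the format string or one of the replacements needs another look.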

### Progress Check
- [X] Build a DataFrame using a `sqlExpr` to clean and cast the date
- [X] Review (evaluate) the DataFrame to validate our work
- [ ] Create a view using this DataFrame method `createOrReplaceView()`

## Create a View from Our DataFrame

You can build a DataFrame and then make it available outside of Snowpark. For instance, you may want to expose this DataFrame, or the flow of data behind it, to a BI tool. Snowpark can take the definition of your DataFrame and [publish it as a Snowflake view](https://docs.snowflake.com/en/developer-guide/snowpark/reference/scala/com/snowflake/snowpark/DataFrame.html#createOrReplaceView(viewName:String):Unit) so it is available to people who do not have Scala or notebook access.

In [None]:
conformedDF.createOrReplaceView("conformed.CLEAN_REVIEWS_VW")


### Progress Check
- [X] Build a DataFrame using a `sqlExpr` to clean and cast the date
- [X] Review (evaluate) the DataFrame to validate our work
- [X] Create a view using this DataFrame method `createOrReplaceView()`

## You Try It

1. Head over to https://app.snowflake.com/ and log in to the class Snowflake account using your animal name and password.
Explore and visualize the data available in `CONFORMED.CLEAN_REVIEWS_VW`.

1. See if you can recreate the same `CLEAN_REVIEWS_VW` without using `sqlExpr`.
Create a new DataFrame that produces the same output, but instead of using a SQL expression, build the entire expression using Snowpark operators only: [replace, to_date](https://docs.snowflake.com/en/developer-guide/snowpark/reference/scala/com/snowflake/snowpark/functions$.html).