# Scala Visualization

In this section, we use an existing Scala graphing library to visualize average arrival delays, in minutes, by carrier for 2019 flights from Seattle (SEA) to San Francisco (SFO).

In [None]:
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._
import com.snowflake.snowpark.types._

// Read the connection properties file created in de_snowpark/A-Dataframes/01-Sessions.ipynb
val pwd = sys.env.getOrElse("PWD", "")
val filename = s"$pwd/de_snowpark/connect.properties"

// Create a Snowpark session from the properties file
val session = Session.builder.configFile(filename).create

// Set the session to use the RAW schema
session.sql("use schema RAW").collect()

## Build DataFrame and Review

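First, we construct a DataFrame from the `raw.ONTIME_REPORTING` table, filtered to 2019 flights from Seattle (SEA) to San Francisco (SFO). We then select a subset of columns, group the rows by carrier (`OP_CARRIER`), compute the maximum, minimum, and mean arrival delay (`ARR_DELAY`) for each carrier, and sort the result by the mean in descending order. Each of these steps returns a new DataFrame; Snowpark evaluates them lazily, so the query only runs in Snowflake when an action such as `collect()` is called. We then collect the result and review its schema.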
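To make the semantics of this filter, group, aggregate, and sort pipeline concrete, here is a minimal pure-Scala sketch that performs the same computation on an in-memory collection instead of in Snowflake. The `Flight` and `DelayStats` case classes, the `delayStats` helper, and the sample values are all hypothetical, and the sketch assumes there are no missing (`NULL`) delays, which SQL aggregates would otherwise skip.

```scala
// Hypothetical, simplified stand-in for a row of ONTIME_REPORTING
final case class Flight(year: Int, origin: String, dest: String, carrier: String, arrDelay: Int)

// Per-carrier summary, mirroring the max, min and mean aggregates in the Snowpark query
final case class DelayStats(carrier: String, maxDelay: Int, minDelay: Int, avgDelay: Double)

// Hypothetical helper: assumes every flight has a (non-missing) arrival delay
def delayStats(flights: Seq[Flight]): Seq[DelayStats] =
  flights
    // like .filter(col("YEAR") === 2019 && col("ORIGIN") === "SEA" && col("DEST") === "SFO")
    .filter(f => f.year == 2019 && f.origin == "SEA" && f.dest == "SFO")
    // like .groupBy("OP_CARRIER")
    .groupBy(_.carrier)
    .toSeq
    // like .agg(max(...), min(...), mean(...))
    .map { case (carrier, fs) =>
      val delays = fs.map(_.arrDelay)
      DelayStats(carrier, delays.max, delays.min, delays.sum.toDouble / delays.size)
    }
    // like .sort(col("AVG_ARR_DELAY").desc): largest average delay first
    .sortWith(_.avgDelay > _.avgDelay)

// Hypothetical sample data
val sample = Seq(
  Flight(2019, "SEA", "SFO", "AS", 10),
  Flight(2019, "SEA", "SFO", "AS", -4),
  Flight(2019, "SEA", "SFO", "UA", 20),
  Flight(2019, "SEA", "SFO", "UA", 2),
  Flight(2018, "SEA", "SFO", "UA", 90), // excluded: different year
  Flight(2019, "SEA", "LAX", "AS", 60)  // excluded: different destination
)

val stats = delayStats(sample)
stats.foreach(println)
// DelayStats(UA,20,2,11.0)
// DelayStats(AS,10,-4,3.0)
```

In Snowpark the same work is pushed down to Snowflake as a single SQL query, so only the small per-carrier result travels back to the client.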

In [None]:
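val SEAtoSFODelayStatsDF = session.table("raw.ONTIME_REPORTING")
  // Keep only 2019 flights from Seattle (SEA) to San Francisco (SFO)
  .filter(col("YEAR") === 2019 &&
          col("ORIGIN") === "SEA" &&
          col("DEST") === "SFO")
  // Select the columns of interest, casting the delays to integers
  .select(
    col("ARR_DELAY").cast(IntegerType) as "ARR_DELAY",
    col("DEP_DELAY").cast(IntegerType) as "DEP_DELAY",
    col("ORIGIN"),
    col("DEST"),
    col("OP_CARRIER")
  )
  // Aggregate the arrival delays per carrier
  .groupBy("OP_CARRIER")
  .agg(
    max(col("ARR_DELAY")).as("MAX_ARR_DELAY"),
    min(col("ARR_DELAY")).as("MIN_ARR_DELAY"),
    mean(col("ARR_DELAY")).as("AVG_ARR_DELAY")
  )
  // Carriers with the largest average arrival delay first
  .sort(col("AVG_ARR_DELAY").desc)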



In [None]:
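// Run the query and bring the aggregated rows back to the client
val rows = SEAtoSFODelayStatsDF.collect()
// Inspect the schema to confirm the column order used when plotting
SEAtoSFODelayStatsDF.schema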

## Create Scala Method

Next, we import the open-source [EvilPlot data visualization library](https://cibotech.github.io/evilplot/), which is written in Scala, and define a helper method, `showPlot`, that renders an EvilPlot plot as a PNG image the notebook can display:

In [None]:
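// Load EvilPlot into the notebook with an Ammonite-style $ivy import
import $ivy.`io.github.cibotech::evilplot:0.8.1`

// Render an EvilPlot Drawable to a PNG image for display in the notebook
def showPlot(plot: com.cibo.evilplot.geometry.Drawable) =
  Image.fromRenderedImage(plot.asBufferedImage, Image.PNG)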

## Render Output

Finally, we pass the collected rows, `rows`, to EvilPlot to draw a bar chart. `OP_CARRIER` (column 0) labels the x-axis, and `AVG_ARR_DELAY` (column 3, the average arrival delay in minutes) sets the bar heights on the y-axis. Note that the code multiplies each average by -1: because a negative `ARR_DELAY` means a flight arrived early, a positive bar indicates a carrier whose flights arrived early on average. We can then review the resulting bar chart of `Delay in Minutes` for flights from Seattle to San Francisco.
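The following sketch, which needs neither Snowflake nor EvilPlot, illustrates how the plotting cell turns each row into an x-axis label and a bar height. The `sampleRows` tuples are hypothetical stand-ins for the collected `Row` objects, ordered like the DataFrame's columns, with the average held in a `java.math.BigDecimal` as returned by `getDecimal`.

```scala
// Hypothetical stand-ins for the collected rows, in DataFrame column order:
// (OP_CARRIER, max ARR_DELAY, min ARR_DELAY, AVG_ARR_DELAY)
val sampleRows: Seq[(String, Int, Int, java.math.BigDecimal)] = Seq(
  ("UA", 20, 2, new java.math.BigDecimal("11.0")),
  ("AS", 10, -4, new java.math.BigDecimal("3.0")),
  ("DL", 5, -20, new java.math.BigDecimal("-6.5"))
)

// x-axis labels: column 0 (getString(0) in the plotting cell)
val labels: Seq[String] = sampleRows.map(_._1)

// Bar heights: column 3 with its sign flipped (getDecimal(3).doubleValue() * -1)
val barValues: Seq[Double] = sampleRows.map(r => r._4.doubleValue() * -1)

labels.zip(barValues).foreach { case (carrier, height) => println(s"$carrier: $height") }
// UA: -11.0
// AS: -3.0
// DL: 6.5
```

Because the DataFrame is sorted by descending average delay, the most-delayed carrier comes first and, after the sign flip, has the most negative bar.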

In [None]:
import com.cibo.evilplot.plot._
import com.cibo.evilplot.plot.aesthetics.DefaultTheme._

val delaysPlot = BarChart(
    // Bar heights: AVG_ARR_DELAY (column 3) with its sign flipped
    rows.map(r => r.getDecimal(3).doubleValue() * -1).toSeq
  )
  .title("Seattle to San Francisco (Delayed Minutes)")
  // x-axis labels: OP_CARRIER (column 0)
  .xAxis(rows.map(r => r.getString(0)).toSeq)
  .yAxis()
  .frame()
  .bottomLegend()
  .render()

showPlot(delaysPlot)