# Read Data from Silver VAUsage Table and Write to Gold VAUsage Action Counts Table
To run this notebook, import it into Azure Synapse and attach it to an Apache Spark Pool.
When creating the Apache Spark Pool, choose "Small" as the Node Size. Choose the option to disable autoscaling. For the number of nodes, choose the lowest number, 3.
Be sure to run the "rate-streaming-to-bronze" Notebook and "bronze-to-silver-vausage" Notebook beforehand.


In [None]:
%%spark
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.SaveMode

## Configure the Storage Account (to read from)
Replace the value `<storageAccountName>` with the name of the storage account where the Silver VAUsage Table data is stored. The Gold table is written to the same storage account.

In [None]:
%%spark
val storageAccountName = "<storageAccountName>"
val silverDataLocation: String = "abfss://datalake@"+storageAccountName+".dfs.core.windows.net/silverSynapse/VAUsage"
val goldDataLocation : String = "abfss://datalake@"+storageAccountName+".dfs.core.windows.net/goldSynapseCDM/VAUsageActionCounts"  
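
Both locations above follow the same ADLS Gen2 pattern, `abfss://<container>@<account>.dfs.core.windows.net/<path>`. As a minimal sketch, the concatenation could be wrapped in a small helper. The function `abfssUri` and the account name `examplestorage` are hypothetical and used only for illustration.

```scala
// Hypothetical helper (not part of the original notebook): builds an
// ADLS Gen2 URI from the container, storage account name, and folder path.
def abfssUri(container: String, account: String, path: String): String =
  s"abfss://${container}@${account}.dfs.core.windows.net/${path}"

// "examplestorage" is a placeholder, not a real storage account.
val exampleSilver = abfssUri("datalake", "examplestorage", "silverSynapse/VAUsage")
val exampleGold = abfssUri("datalake", "examplestorage", "goldSynapseCDM/VAUsageActionCounts")
```

Building both paths through one function keeps the container and account name in a single place, so a typo cannot affect only one of the two locations.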

## Prepare to Use CDM
If this Notebook is being run for the first time (that is, the "goldSynapseCDM/VAUsageActionCounts" folder in the storage account contains neither a `default.manifest.cdm.json` file nor a `GoldVAUsage` folder), set `entitiesExist` to `false`.

If the Notebook has already been run against this storage account, set it to `true`.

In [None]:
%%spark
var entitiesExist : Boolean = false // change to 'true' if you have run the Notebook before
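
The rule for choosing the flag can be written down as a small predicate. This is only a sketch: `cdmEntitiesExist` is a hypothetical helper, the folder listing is passed in by hand rather than read from storage, and it assumes the entities count as existing only when both the manifest file and the entity folder are present.

```scala
// Hypothetical helper (not part of the original notebook).
// folderEntries: names found directly inside goldSynapseCDM/VAUsageActionCounts.
// Assumption: both the manifest and the entity folder must be present.
def cdmEntitiesExist(folderEntries: Set[String]): Boolean =
  folderEntries.contains("default.manifest.cdm.json") &&
    folderEntries.contains("GoldVAUsage")

// An empty folder corresponds to a first run.
val firstRun = cdmEntitiesExist(Set.empty[String])
// A folder holding both items corresponds to a later run.
val laterRun = cdmEntitiesExist(Set("default.manifest.cdm.json", "GoldVAUsage"))
```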

## Read the Data
Here the data is read from `silverDataLocation`, which was built in the storage account configuration cell from the value entered for `storageAccountName`.

In [None]:
%%spark
var silverDF = spark.read.format("delta").load(silverDataLocation)

## View the Data from the Silver VAUsage Table
Run this cell to see 10 rows from the Silver VAUsage Table, sorted by *ProcessedTimestamp* in descending order.
Change the value of `numRows` to display a different number of rows.

In [None]:
%%spark
silverDF.orderBy(col("ProcessedTimestamp").desc).show(numRows = 10)

In [None]:
%%spark
// Drop the processedTime column; the aggregation below does not use it.
silverDF = silverDF.drop("processedTime")

Run the following cell to print the schema of the Silver VAUsage Delta Table.

In [None]:
%%spark
silverDF.printSchema()

## Configure the Schema of the Data
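The DataFrame is reshaped to match the schema of the Gold VAUsage Action Counts Table: rows are counted per *Object* and *Action* within 10-second windows of *ProcessedTimestamp*, and a *WindowStartDate* column is derived from the start of each window.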

In [None]:
%%spark
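// Count rows per Object and Action in 10-second tumbling windows.
// withWatermark only affects streaming input; on this batch read it is a no-op.
var goldDF = silverDF.withWatermark("ProcessedTimestamp", "10 seconds").groupBy(
    window(col("ProcessedTimestamp"), "10 seconds", "10 seconds"),
    col("Object"),
    col("Action")
  ).count().withColumn("WindowStartDate", to_date(col("window").getItem("start")))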

goldDF.printSchema()
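
To make the windowing concrete, the following plain-Scala sketch mimics what the aggregation does without using Spark. Because the window length and slide are both 10 seconds, the windows do not overlap, and each timestamp falls into exactly one window that starts at a multiple of 10 seconds since the Unix epoch. The names `windowStart`, `sampleEvents`, and `sampleCounts`, along with the sample values, are made up for illustration.

```scala
// Length of each tumbling window in milliseconds (10 seconds).
val windowMillis: Long = 10000L

// Start of the window containing tsMillis (milliseconds since the Unix epoch).
// Math.floorMod keeps the result correct for timestamps before the epoch.
def windowStart(tsMillis: Long): Long =
  tsMillis - Math.floorMod(tsMillis, windowMillis)

// Made-up sample events: (timestamp in ms, Object, Action).
val sampleEvents = Seq(
  (1000L, "Door", "Open"),
  (4000L, "Door", "Open"),
  (9999L, "Door", "Close"),
  (10000L, "Door", "Open")
)

// Count events per (window start, Object, Action),
// analogous to groupBy(window, Object, Action).count().
val sampleCounts: Map[(Long, String, String), Int] =
  sampleEvents
    .groupBy { case (ts, obj, action) => (windowStart(ts), obj, action) }
    .map { case (key, rows) => key -> rows.size }
```

The event at 10000 ms lands in the second window because each window includes its start and excludes its end.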

## Write Data to Gold VAUsage Action Counts Table in the CDM Format

In [None]:
%%spark
val CDMStorageAccount : String = storageAccountName + ".dfs.core.windows.net" 
val manifestPath : String = "datalake/goldSynapseCDM/VAUsageActionCounts/default.manifest.cdm.json"
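
The two values above are the same pieces that make up `goldDataLocation`, arranged differently: the storage option is the host part of the URI, and the manifest path is the container name followed by the folder path and the manifest file name. A minimal sketch of that relationship, using `java.net.URI` and a placeholder account name (`examplestorage` and every `example...` value are hypothetical):

```scala
import java.net.URI

// Placeholder location; "examplestorage" is not a real storage account.
val exampleGoldLocation =
  "abfss://datalake@examplestorage.dfs.core.windows.net/goldSynapseCDM/VAUsageActionCounts"

val parsedLocation = new URI(exampleGoldLocation)

// The part before '@' is the container, the part after it is the host.
val exampleContainer = parsedLocation.getUserInfo
val exampleStorage = parsedLocation.getHost

// Container name, then the folder path, then the manifest file name.
val exampleManifestPath =
  s"${exampleContainer}${parsedLocation.getPath}/default.manifest.cdm.json"
```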

## Writing for the First Time
Run this cell if the CDM manifest and entities have not yet been created (if you are running this Notebook for the first time).

In [None]:
%%spark
if (!entitiesExist) { 
    goldDF.write.format("com.microsoft.cdm").
    option("storage", CDMStorageAccount).
    option("manifestPath", manifestPath).
    option("entity", "GoldVAUsage").
    option("format", "parquet").
    save()
}

## CDM Manifest and Entity Already Created
Run this cell if you have run the Notebook before, and already have a manifest and entity inside your storage account.

The manifest (default.manifest.cdm.json) and entity (GoldVAUsage) can be found in your storage account, inside the "datalake" container, and the "goldSynapseCDM/VAUsageActionCounts" folder.


In [None]:
%%spark
if (entitiesExist) { 
    goldDF.write.format("com.microsoft.cdm").
    option("storage", CDMStorageAccount).
    option("manifestPath", manifestPath).
    option("entity", "GoldVAUsage").
    mode(SaveMode.Append).
    option("format", "parquet").
    save()
}

## View the Data

In [None]:
%%spark
var goldReadDF = spark.read.format("com.microsoft.cdm").
option("storage", CDMStorageAccount).
option("manifestPath", manifestPath).
option("entity", "GoldVAUsage").
load()

In [None]:
%%spark
goldReadDF.orderBy(col("window.start").desc).show()

In [None]:
%%spark
goldReadDF.printSchema()

In [None]:
%%spark
display(goldReadDF.orderBy(col("window.start").desc))