# Linux Foundation Delta Lake example using EMR Serverless on EMR Studio

#### Topics covered in this example
<ol>
    <li> Configure a Spark session </li>
    <li> Create a Delta Lake table </li>
    <li> Query the table </li>
</ol>

***

## Prerequisites
<div class="alert alert-block alert-info">
<b>NOTE :</b> To run this notebook as is, complete the following prerequisites first.</div>

* Choose EMR Serverless as the compute. The application release version must be 6.14 or higher.
* Make sure the Studio user role has permission to attach the Workspace to the Application and to pass the runtime role to it.
* This notebook uses the `PySpark` kernel.
***

## 1. Configure your Spark session
Configure the Spark session for Delta Lake: register the Delta SQL extension and catalog, add the Delta jars, and use the AWS Glue Data Catalog as the metastore.

In [None]:
%%configure -f
{
    "conf": {
        "spark.sql.extensions" : "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        "spark.jars": "/usr/share/aws/delta/lib/delta-core.jar,/usr/share/aws/delta/lib/delta-storage.jar,/usr/share/aws/delta/lib/delta-storage-s3-dynamodb.jar",
        "spark.hadoop.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
}
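Once the session has started, it can be worth confirming that the Delta settings took effect. The sketch below is one possible check, not part of Spark or Delta Lake: `missing_delta_settings` and `EXPECTED_CONF` are hypothetical names introduced here, and the last lines assume the PySpark kernel exposes the session as `spark`.

```python
# Settings copied from the %%configure cell above.
EXPECTED_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def missing_delta_settings(get_conf, expected=EXPECTED_CONF):
    """Return {key: (expected, actual)} for every setting that does not match.

    get_conf is any callable that maps a key to its value, or None when unset.
    """
    mismatches = {}
    for key, want in expected.items():
        have = get_conf(key)
        if have != want:
            mismatches[key] = (want, have)
    return mismatches


# Assumption: `spark` only exists inside the notebook kernel, so guard the call.
if "spark" in globals():
    problems = missing_delta_settings(lambda key: spark.conf.get(key, None))
    print(problems or "Delta settings look OK")
```

An empty result means both settings match. Passing the lookup as a callable keeps the helper usable with a plain dictionary as well as with `spark.conf`.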

---
## 2. Create a Delta Lake table
We will create a Spark DataFrame with sample data and write it to a Delta Lake table.

<div class="alert alert-block alert-info">
    <b>NOTE :</b> Update <b>my_bucket</b> in the <code>basePath</code> value below to your own bucket. Make sure you have read and write permissions for this bucket.</div>

In [None]:
tableName = "delta_table"
basePath = "s3://my_bucket/aws_workshop/delta_data_location/" + tableName
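Because a forgotten placeholder only fails later, at write time, a small check on the path can catch it early. This is a minimal sketch using only the standard library; `check_base_path` and `PLACEHOLDER_BUCKET` are hypothetical names, and `example-bucket` is a made-up bucket used purely for illustration.

```python
from urllib.parse import urlparse

PLACEHOLDER_BUCKET = "my_bucket"  # the placeholder bucket name used in this notebook


def check_base_path(path):
    """Return (bucket, key_prefix) for an s3:// path, or raise ValueError."""
    parsed = urlparse(path)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// path, got: {path}")
    if not parsed.netloc or parsed.netloc == PLACEHOLDER_BUCKET:
        raise ValueError("replace the placeholder bucket with your own bucket name")
    return parsed.netloc, parsed.path.lstrip("/")


# Illustration with a made-up bucket name.
print(check_base_path("s3://example-bucket/aws_workshop/delta_data_location/delta_table"))
```

In the notebook, calling `check_base_path(basePath)` before the write cell would raise while the placeholder is still in place. It does not verify that the bucket exists or that the runtime role can access it.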


In [None]:
data = spark.createDataFrame([
 ("100", "2015-01-01", "2015-01-01T13:51:39.340396Z"),
 ("101", "2015-01-01", "2015-01-01T12:14:58.597216Z"),
 ("102", "2015-01-01", "2015-01-01T13:51:40.417052Z"),
 ("103", "2015-01-01", "2015-01-01T13:51:40.519832Z")
],["id", "creation_date", "last_update_time"])

In [None]:
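# Write the DataFrame in Delta format. The default save mode fails if a table already exists at basePath.
data.write.format("delta").save(basePath)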
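Re-running the write cell against a path that already holds a table raises an error under Spark's default save mode. One way to make the cell repeatable is to pass an explicit mode, as in the sketch below. `write_delta` is a hypothetical wrapper introduced here; it relies only on the standard `DataFrameWriter` calls `format`, `mode`, and `save`.

```python
# Save modes accepted by Spark's DataFrameWriter.mode().
SAVE_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}


def write_delta(df, path, mode="errorifexists"):
    """Write df to path in Delta format using an explicit save mode."""
    if mode not in SAVE_MODES:
        raise ValueError(f"unknown save mode: {mode!r}")
    df.write.format("delta").mode(mode).save(path)


# In the notebook (assumes `data` and `basePath` from the cells above):
# write_delta(data, basePath, mode="overwrite")  # replaces the existing table contents
# write_delta(data, basePath, mode="append")     # adds the rows to the existing table
```

Note that `overwrite` replaces the current table contents, so `append` is the safer choice when the existing rows should be kept.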


---
## 3. Query the table
We will read the table into a Spark DataFrame using `spark.read`.

In [None]:
df = spark.read.format("delta").load(basePath)
df.show()
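The same table can also be queried with Spark SQL through Delta Lake's path-based syntax, ``delta.`<path>` ``. The sketch below builds such a statement; `delta_path_query` is a hypothetical helper, `example-bucket` is a made-up bucket name, and the final lines assume the notebook's `spark` session and `basePath` variable exist.

```python
def delta_path_query(path, columns=("*",)):
    """Build a SELECT statement over a path-based Delta table."""
    if "`" in path:
        raise ValueError("path must not contain backticks")
    return f"SELECT {', '.join(columns)} FROM delta.`{path}`"


# Illustration with a made-up bucket name.
print(delta_path_query(
    "s3://example-bucket/aws_workshop/delta_data_location/delta_table",
    ["id", "creation_date"],
))

# Assumption: `spark` and `basePath` only exist inside the notebook, so guard the call.
if "spark" in globals() and "basePath" in globals():
    spark.sql(delta_path_query(basePath)).show()
```

Both approaches read the same data. The SQL form can be convenient when the rest of a workload is already written as SQL statements.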

### You have made it to the end of this notebook!