# Parquet Modular Encryption Example
## Key Management by Application

Learn how to write encrypted Parquet files and read them back by using a preconfigured IBM Analytics Engine.

This notebook demonstrates Parquet modular encryption as supported by IBM, which lets you encrypt individual columns and other sensitive parts of your data, such as Personal Information (PI). In this tutorial, you encrypt sample blood test data in Parquet format and learn to write and read such encrypted data by using [Key management by application](https://cloud.ibm.com/docs/AnalyticsEngine?topic=AnalyticsEngine-key-management-application). The notebook uses Scala and Spark.

### Table of Contents:
* [1. Set up the environment](#cell0)
* [2. Generate the data](#cell1)
* [3. Write the encrypted data](#cell2)
* [4. Read the encrypted data](#cell3)
* [Summary](#summary)
* [Authors](#authors)

<a id="cell0"></a>
## Set up the environment

<div class="alert alert-block alert-warning">
    A standard IBM Analytics Engine (IAE) instance cannot be used for Parquet modular encryption. Before you run the rest of this notebook, configure your IAE as described in the <a href="https://cloud.ibm.com/docs/AnalyticsEngine?topic=AnalyticsEngine-parquet-encryption">Analytics Engine Parquet Encryption</a> documentation.
</div>

1. After your IAE is configured, point your Spark classpaths to the Parquet JAR files, as described in [Running IBM Analytics Engine with Parquet encryption](https://cloud.ibm.com/docs/AnalyticsEngine?topic=AnalyticsEngine-parquet-encryption#running-ibm-analytics-engine-with-parquet-encryption).
1. Before you add this notebook to your project, associate your configured IAE with the project you are running this notebook in. 
1. When you create the notebook, select the configured IAE as the notebook's environment.
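Once the notebook runs on the configured IAE, you can optionally confirm that encryption-capable Parquet classes are on the classpath. The following is a minimal sketch, not part of the IBM documentation: `ClasspathCheck` and `isClassAvailable` are hypothetical names, and probing for `org.apache.parquet.crypto.FileEncryptionProperties` assumes the configured JAR files follow the package layout of Apache parquet-mr 1.12 and later.

```scala
object ClasspathCheck {
  // Returns true if the named class can be located by this object's class loader.
  // The class is loaded without being initialized, so none of its static code runs.
  def isClassAvailable(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }
}
```

For example, `ClasspathCheck.isClassAvailable("org.apache.parquet.crypto.FileEncryptionProperties")` returning `false` would suggest that the classpath still points to Parquet JAR files without encryption support (under the package-layout assumption above).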

Define the path to the folder with encrypted parquet files:

In [1]:
```scala
val encryptedParquetFullName = "/tmp/bloodtests.parquet.encrypted" // Change to your encrypted files path
```

```
encryptedParquetFullName = /tmp/bloodtests.parquet.encrypted
```

With key management by application, your application manages the master keys itself and passes them to Parquet encryption explicitly. Enter your master keys in the following cell. To learn more about provisioning master keys, see [Key management by application](https://cloud.ibm.com/docs/AnalyticsEngine?topic=AnalyticsEngine-key-management-application).
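If you still need master keys for testing, the following minimal sketch generates random AES keys with the JDK's `SecureRandom` and encodes them in Base64. It assumes, based on the `==` suffix in the key placeholders, that `encryption.key.list` expects Base64-encoded keys; a 128-bit key encodes to 24 characters ending in `==`. `MasterKeyGen`, `generateMasterKey`, and `keyList` are hypothetical helper names.

```scala
import java.security.SecureRandom
import java.util.Base64

object MasterKeyGen {
  private val random = new SecureRandom()

  // Generates a random AES key of the given length in bits and returns it Base64-encoded.
  def generateMasterKey(bits: Int = 128): String = {
    require(Set(128, 192, 256).contains(bits), s"Unsupported AES key length: $bits")
    val bytes = new Array[Byte](bits / 8)
    random.nextBytes(bytes)
    Base64.getEncoder.encodeToString(bytes)
  }

  // Formats (keyId, base64Key) pairs as "<keyId>: <key>, <keyId>: <key>",
  // matching the layout of the encryption.key.list value in this notebook.
  def keyList(keys: Seq[(String, String)]): String =
    keys.map { case (id, key) => s"$id: $key" }.mkString(", ")
}
```

For example, `MasterKeyGen.keyList(Seq("key1" -> MasterKeyGen.generateMasterKey(), "key2" -> MasterKeyGen.generateMasterKey()))` produces a value with the same shape as the one set in the next cell. Keys generated this way exist only in the current session, so record them if you need to decrypt the files later.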

In [2]:
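```scala
// Master keys as "<keyId>: <Base64-encoded key>" entries, separated by commas.
// Replace the placeholders with your own keys.
sc.hadoopConfiguration.set("encryption.key.list",
  "key1: <your_key_here>==, key2: <your_key_here>==")
```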


<a id="cell1"></a>
## Generate the data

In [3]:
```scala
// 40 sample records of (id, value), where value = 2 * id
val dataRange = (1 to 40).toList
val bloodTestList = dataRange.map(i => (i, i * 2))
```

```
dataRange = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40)
bloodTestList = List((1,2), (2,4), (3,6), (4,8), (5,10), (6,12), (7,14), (8,16), (9,18), (10,20), (11,22), (12,24), (13,26), (14,28), (15,30), (16,32), (17,34), (18,36), (19,38), (20,40), (21,42), (22,44), (23,46), (24,48), (25,50), (26,52), (27,54), (28,56), (29,58), (30,60), (31,62), (32,64), (33,66), (34,68), (35,70), (36,72), (37,74), (38,76), (39,78), (40,80))
```

<a id="cell2"></a>
## Write the encrypted data

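Write the data in Parquet format. The `encryption.column.keys` option encrypts the `id` column with master key `key1`, and `encryption.footer.key` encrypts the file footer with master key `key2`. The `value` column is not listed, so it is written unencrypted.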

In [4]:
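```scala
import spark.implicits._ // Enables .toDF on Scala collections (usually pre-imported in notebooks)

bloodTestList.toDF("id", "value").write
  // Configure which columns to encrypt with which master keys
  .option("encryption.column.keys", "key1: id")
  // Configure the master key for the file footer
  .option("encryption.footer.key", "key2")
  .mode("overwrite")
  .parquet(encryptedParquetFullName)
```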
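The example encrypts a single column. To encrypt several columns, possibly with different master keys, the value can list `<masterKeyId>: <column>, <column>` entries separated by semicolons; this format follows the Apache parquet-mr `parquet.encryption.column.keys` property and is assumed to apply to the IAE option as well. The following hypothetical helper builds such a string from key-to-column pairs.

```scala
object ColumnKeys {
  // Builds a value for "encryption.column.keys" in the form
  // "<masterKeyId>: <col>, <col>; <masterKeyId>: <col>".
  // A Seq of pairs (rather than a Map) keeps the output order deterministic.
  def columnKeys(keyToColumns: Seq[(String, Seq[String])]): String =
    keyToColumns
      .map { case (keyId, cols) =>
        require(cols.nonEmpty, s"No columns listed for master key $keyId")
        s"$keyId: ${cols.mkString(", ")}"
      }
      .mkString("; ")
}
```

For example, `.option("encryption.column.keys", ColumnKeys.columnKeys(Seq("key1" -> Seq("id", "value"))))` would encrypt both columns with `key1`. Every master key ID used here must also appear in `encryption.key.list`.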

<a id="cell3"></a>
## Read the encrypted data

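Read the encrypted files back. Because the master keys are set in the Hadoop configuration, Spark decrypts the data transparently, and the read needs no encryption-specific options.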

In [5]:
```scala
val encryptedDataDF = spark.read.parquet(encryptedParquetFullName)
encryptedDataDF.createOrReplaceTempView("bloodtests")
val queryResult = spark.sql("SELECT id, value FROM bloodtests")
queryResult.show(10, false)
```

```
+---+-----+
|id |value|
+---+-----+
|1  |2    |
|2  |4    |
|3  |6    |
|4  |8    |
|5  |10   |
|6  |12   |
|7  |14   |
|8  |16   |
|9  |18   |
|10 |20   |
+---+-----+
only showing top 10 rows

encryptedDataDF = [id: int, value: int]
queryResult = [id: int, value: int]
```

Verify that the files were written with Parquet encryption in encrypted footer mode by printing the last four bytes of the largest part file:

In [6]:
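```scala
import scala.sys.process._

// List the output files, largest first (-S), as bare paths (-C), and take the first one
val parquetPartitionFile =
  Seq("hdfs", "dfs", "-ls", "-S", "-C", encryptedParquetFullName).!!.split("\\s+")(0)
// Print the last 4 bytes of that part file, which hold the Parquet magic string
(Seq("hdfs", "dfs", "-tail", parquetPartitionFile) #| "tail -c 4").!
```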

```
PARE
parquetPartitionFile = /tmp/bloodtests.parquet.encrypted/part-00000-dcf7f17a-3065-49b5-8aef-56422d4402b6-c000.snappy.parquet
0
```

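If everything is configured correctly, the cell prints `PARE`, the magic string that ends a Parquet file with an encrypted footer. A file that is not encrypted, or that is encrypted in plaintext footer mode, ends with `PAR1` instead.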
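The cell above relies on the `hdfs` command-line tool. If you have copied a part file to the local file system (for example, with `hdfs dfs -get`), the following minimal sketch performs the same check in plain Scala by reading the file's last four bytes. `ParquetMagic` is a hypothetical helper, not part of any Parquet API.

```scala
import java.io.RandomAccessFile
import java.nio.charset.StandardCharsets

object ParquetMagic {
  // Reads the last 4 bytes of a local file, where Parquet stores its magic string.
  def trailingMagic(path: String): String = {
    val file = new RandomAccessFile(path, "r")
    try {
      require(file.length() >= 4, s"$path is too short to be a Parquet file")
      val buf = new Array[Byte](4)
      file.seek(file.length() - 4)
      file.readFully(buf)
      new String(buf, StandardCharsets.US_ASCII)
    } finally file.close()
  }

  // "PARE" marks encrypted footer mode; "PAR1" marks a plaintext footer
  // (either an unencrypted file or one written in plaintext footer mode).
  def describe(magic: String): String = magic match {
    case "PARE" => "encrypted footer"
    case "PAR1" => "plaintext footer"
    case other  => s"not a Parquet file (magic: $other)"
  }
}
```

For example, `ParquetMagic.describe(ParquetMagic.trailingMagic("part-00000.snappy.parquet"))` reports the footer mode of a local copy (the file name here is illustrative).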

<a id="summary"></a>
## Summary

Congratulations! You have completed this notebook. You learned how to associate a configured IAE with your notebook, how to write and read encrypted Parquet files, and how Parquet modular encryption helps protect the confidentiality and integrity of sensitive data.

<a id="authors"></a>
## Authors

**Maya Anderson** is a Cloud Storage Researcher at IBM.  

Copyright © 2020 IBM. This notebook and its source code are released under the terms of the MIT License.
