# Ensuring governance and security for our C360 lakehouse

Data governance and security are hard to get right across a complete data platform. SQL GRANTs on tables aren't enough: security must be enforced across many kinds of data assets (dashboards, models, files, etc.).

To reduce risk and drive innovation, Emily's team needs to:

- Unify all data assets (Tables, Files, ML models, Features, Dashboards, Queries)
- Onboard data with multiple teams
- Share & monetize assets with external organizations

<style>
.box{
  box-shadow: 20px -20px #CCC; height:300px; box-shadow:  0 0 10px  rgba(0,0,0,0.3); padding: 5px 10px 0px 10px;}
.badge {
  clear: left; float: left; height: 30px; width: 30px;  display: table-cell; vertical-align: middle; border-radius: 50%; background: #fcba33ff; text-align: center; color: white; margin-right: 10px}
.badge_b { 
  height: 35px}
</style>
<link href='https://fonts.googleapis.com/css?family=DM Sans' rel='stylesheet'>
<div style="padding: 20px; font-family: 'DM Sans'; color: #1b5162">
  <div style="width:200px; float: left; text-align: center">
    <div class="box" style="">
      <div style="font-size: 26px;">
        <strong>Team A</strong>
      </div>
      <div style="font-size: 13px">
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/da.png" style="" width="60px"> <br/>
        Data Analysts<br/>
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/ds.png" style="" width="60px"> <br/>
        Data Scientists<br/>
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/de.png" style="" width="60px"> <br/>
        Data Engineers
      </div>
    </div>
    <div class="box" style="height: 80px; margin: 20px 0px 50px 0px">
      <div style="font-size: 26px;">
        <strong>Team B</strong>
      </div>
      <div style="font-size: 13px">...</div>
    </div>
  </div>
  <div style="float: left; width: 400px; padding: 0px 20px 0px 20px">
    <div style="margin: 20px 0px 0px 20px">Permissions on queries, dashboards</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on tables, columns, rows</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on features, ML models, endpoints, notebooks…</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on files, jobs</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
  </div>
  
  <div class="box" style="width:550px; float: left">
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/gov.png" style="float: left; margin-right: 10px;" width="80px"> 
    <div style="float: left; font-size: 26px; margin-top: 0px; line-height: 17px;"><strong>Emily</strong> <br />Governance and Security</div>
    <div style="font-size: 18px; clear: left; padding-top: 10px">
      <ul style="line-height: 2px;">
        <li>Central catalog - all data assets</li>
        <li>Data exploration & discovery to unlock new use-cases</li>
        <li>Cross-team permissions</li>
        <li>Reduce risk with audit logs</li>
        <li>Measure impact with lineage</li>
      </ul>
      + Monetize & share data with external organizations (Delta Sharing)
    </div>
  </div>
  
  
</div>

# Implementing global data governance and security with Unity Catalog

<img style="float: right; margin-top: 30px" width="500px" src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/retail/lakehouse-churn/lakehouse-retail-c360-churn-2.png" />

Let's see how the Lakehouse can solve this challenge by leveraging Unity Catalog.

Our data has been saved as Delta tables by our Data Engineering team. The next step is to secure this data while giving other teams access to it. <br>
A typical setup would be the following:

* Data Engineers / Jobs can read and update the main data/schemas (ETL part)
* Data Scientists can read the final tables and update their feature tables
* Data Analysts have READ access to the Data Engineering and feature tables and can ingest/transform additional data in a separate schema
* Data is masked/anonymized dynamically based on each user access level

This is made possible by Unity Catalog. When tables are saved in Unity Catalog, they can be made accessible to the entire organization, across workspaces and across users.
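The typical setup above could be expressed as a small set of grants. The following is a minimal sketch, not the lab's actual configuration: the catalog `main`, the schemas `retail` and `retail_analysts`, and the three group names are all hypothetical and would need to match your own account.

```python
from typing import List, Tuple

# Hypothetical names: the catalog `main`, the schemas `retail` and
# `retail_analysts`, and the three groups below are illustrative only.
# Each tuple is (group, privilege, securable type, securable name).
GRANTS: List[Tuple[str, str, str, str]] = [
    ("data_engineers", "ALL PRIVILEGES", "SCHEMA", "main.retail"),
    ("data_scientists", "SELECT", "SCHEMA", "main.retail"),
    ("data_scientists", "MODIFY", "TABLE", "main.retail.churn_features"),
    ("data_analysts", "SELECT", "SCHEMA", "main.retail"),
    ("data_analysts", "ALL PRIVILEGES", "SCHEMA", "main.retail_analysts"),
]


def quote(name: str) -> str:
    """Backtick-quote each part of a dotted name."""
    return ".".join(f"`{part}`" for part in name.split("."))


def build_grants(grants: List[Tuple[str, str, str, str]]) -> List[str]:
    """Turn the grant tuples into GRANT statements, one per tuple."""
    return [
        f"GRANT {privilege} ON {kind} {quote(name)} TO `{group}`"
        for group, privilege, kind, name in grants
    ]


statements = build_grants(GRANTS)
for statement in statements:
    # In a notebook, each statement could be passed to spark.sql(...) instead.
    print(statement)
```

Groups that only receive `SELECT` also need `USE CATALOG` on the catalog and `USE SCHEMA` on the schema before they can query the tables. Dynamic masking is not covered by plain grants and needs a separate mechanism.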

Unity Catalog is key for data governance, including creating data products or organizing teams around a data mesh. Among other things, it provides:

* Fine-grained ACLs
* Audit log
* Data lineage
* Data exploration & discovery
* Sharing data with external organizations (Delta Sharing)


## Exploring our Customer360 database

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/uc-base-1.png" style="float: right" width="800px"/> 

Let's review the data created.

Unity Catalog works with 3 layers:

* CATALOG
* SCHEMA (or DATABASE)
* TABLE

All Unity Catalog objects can be managed with SQL (`CREATE CATALOG IF NOT EXISTS my_catalog` ...)

To access a table, you can specify its full path: `SELECT * FROM <CATALOG>.<SCHEMA>.<TABLE>`
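As an illustration of the three-level path, here is a small sketch that assembles a fully qualified table name and a query from it. The helper `qualified_name` is hypothetical. The catalog, schema and table names are the ones shown in this lab's cell outputs, so substitute your own.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build `catalog`.`schema`.`table`, escaping any backtick inside a part."""

    def quote(identifier: str) -> str:
        return "`" + identifier.replace("`", "``") + "`"

    return ".".join(quote(part) for part in (catalog, schema, table))


# Names taken from this lab's outputs; replace them with your own.
full_name = qualified_name(
    "cloud_lakehouse_labs",
    "odl_user_1237583_databrickslabs_com_retail",
    "churn_users",
)
query = f"SELECT * FROM {full_name} LIMIT 10"
print(query)  # in a notebook, spark.sql(query) would run it
```

Quoting each level separately keeps the path valid even when a schema or table name contains characters that would otherwise need escaping.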


## Let's review the tables we created under our schema

<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-churn-data-explorer.gif" style="float: right" width="800px"/> 

Unity Catalog provides a comprehensive Data Explorer that you can access from the left menu.

It lists all your tables and lets you access and administer them.

Users who are granted the appropriate privileges will also be able to create additional tables in this schema.

### Discoverability 

In addition, Unity Catalog provides explorability and discoverability.

Anyone with access to a table can search for it and analyze its main usage. <br>
You can use the Search menu (⌘ + P) to navigate your data assets (tables, notebooks, queries...)

In [0]:
%run ./includes/SetupLab

In [0]:
print("Working catalog and schema: " + labContext.catalogAndSchema())

Working catalog and schema: cloud_lakehouse_labs.odl_user_1237583_databrickslabs_com_retail


In [0]:
%sql
-- This lab uses a dedicated catalog if one has been set up; otherwise it falls back to "main".
SELECT CURRENT_CATALOG();

current_catalog()
cloud_lakehouse_labs


In [0]:
%sql
-- Shows the schemas (databases) in the current catalog
SHOW DATABASES;

databaseName
information_schema
odl_instructor_1036177_databrickslabs_com_retail
odl_user_1225167_databrickslabs_com_retail
odl_user_1237583_databrickslabs_com_retail
odl_user_1238721_databrickslabs_com_retail
odl_user_1238938_databrickslabs_com_retail


In [0]:
%sql
-- Shows the tables in the current database
SHOW TABLES;

database,tableName,isTemporary
odl_user_1237583_databrickslabs_com_retail,churn_app_events,False
odl_user_1237583_databrickslabs_com_retail,churn_features,False
odl_user_1237583_databrickslabs_com_retail,churn_orders,False
odl_user_1237583_databrickslabs_com_retail,churn_orders_bronze,False
odl_user_1237583_databrickslabs_com_retail,churn_users,False
odl_user_1237583_databrickslabs_com_retail,churn_users_bronze,False


In [0]:
%sql
-- FILL IN <SCHEMA> and <TABLE> before running: with the placeholders left as-is, the statements fail with a ParseException
GRANT USE SCHEMA ON SCHEMA <SCHEMA> TO `account users`;
GRANT SELECT ON TABLE <SCHEMA>.<TABLE> TO `account users`;

---------------------------------------------------------------------------
ParseException                            Traceback (most recent call last)
File <command-3235179284848309>:8
      6     display(df)
      7     return df
----> 8   _sqldf = ____databricks_percent_sql()
      9 finally:
     10   del ____databricks_percent_sql

File <command-3235179284848309>:4, in ____databricks_percent_sql()
      2 def ____databricks_percent_sql():
      3   import base64
----> 4   spark.sql(base64.standard_b64decode("LS0g


## Going further with Data governance & security

By bringing all your data assets together, Unity Catalog lets you build complete yet simple governance that helps you scale your teams.

Unity Catalog can be used for anything from simple GRANTs to building a complete data mesh organization.

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/lineage/lineage-table.gif" style="float: right; margin-left: 10px"/>

### Fine-grained ACL

Need more advanced control? You can choose to dynamically change your table output based on the user's permissions.
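One way to get this behavior is a dynamic view that checks group membership at query time. The sketch below only builds the SQL text and is one possible approach under stated assumptions. The view name, the `email` column and the `data_engineers` group are hypothetical. `is_account_group_member` is a Databricks SQL function.

```python
def masked_view_sql(view: str, source_table: str, column: str, group: str) -> str:
    """Return SQL for a view that shows `column` only to members of `group`."""
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        f"SELECT * EXCEPT ({column}),\n"
        f"  CASE WHEN is_account_group_member('{group}')\n"
        f"       THEN {column} ELSE '***' END AS {column}\n"
        f"FROM {source_table}"
    )


# Hypothetical view name, column and group; churn_users is a table from this lab.
sql = masked_view_sql(
    view="churn_users_masked",
    source_table="churn_users",
    column="email",
    group="data_engineers",
)
print(sql)  # in a notebook, spark.sql(sql) would create the view
```

Users are then granted `SELECT` on the view rather than on the underlying table, so the masking cannot be bypassed by querying the table directly.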

### Secure external location (S3/ADLS/GCS)

Unity Catalog lets you secure not only your managed tables but also your external locations.

### Lineage 

UC automatically captures table dependencies and lets you track how your data is used, down to the column level.

This lets you analyze downstream impact, or monitor sensitive information across the entire organization (GDPR).
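To make downstream impact analysis concrete, here is a minimal sketch that walks a set of lineage edges breadth-first. The edges are illustrative guesses based on the table names in this lab, not lineage actually captured by UC. In a real workspace they would come from the captured lineage, for example a lineage system table if it is enabled.

```python
from collections import deque
from typing import Dict, List, Tuple

# Illustrative (upstream, downstream) edges; not lineage captured by UC.
LINEAGE_EDGES: List[Tuple[str, str]] = [
    ("churn_users_bronze", "churn_users"),
    ("churn_orders_bronze", "churn_orders"),
    ("churn_users", "churn_features"),
    ("churn_orders", "churn_features"),
    ("churn_app_events", "churn_features"),
]


def downstream_tables(source: str, edges: List[Tuple[str, str]]) -> List[str]:
    """Return every table reachable from `source`, in breadth-first order."""
    children: Dict[str, List[str]] = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    visited = {source}
    impacted: List[str] = []
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in visited:
                visited.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


impacted = downstream_tables("churn_users_bronze", LINEAGE_EDGES)
print(impacted)
```

With these sample edges, a change to `churn_users_bronze` would reach `churn_users` and then `churn_features`.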


### Audit log

UC captures all events. Need to know who is accessing which data? Query your audit log.
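As a sketch of what such a query could look like, the helper below builds SQL against the audit system table. It assumes system tables are enabled in your account and that `system.access.audit` exposes the columns used here. Check both against your workspace before relying on it. The helper name `audit_query` is hypothetical.

```python
def audit_query(days: int = 7, limit: int = 100) -> str:
    """Build a query listing recent Unity Catalog audit events.

    Assumes the `system.access.audit` system table is available and has
    the columns referenced below.
    """
    days, limit = int(days), int(limit)  # keep the interpolated values numeric
    return (
        "SELECT event_time, user_identity.email AS user_email,\n"
        "       action_name, request_params\n"
        "FROM system.access.audit\n"
        "WHERE service_name = 'unityCatalog'\n"
        f"  AND event_date >= current_date() - INTERVAL {days} DAYS\n"
        "ORDER BY event_time DESC\n"
        f"LIMIT {limit}"
    )


query = audit_query(days=30)
print(query)  # in a notebook, display(spark.sql(query)) would show the events
```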


### Upgrading to UC

Already using Databricks without UC? Upgrading your tables to benefit from Unity Catalog is simple.

### Sharing data with external organization

Sharing your data with consumers outside your Databricks account is simple with Delta Sharing, and doesn't require them to use Databricks.