# Ensuring governance and security for our banking platform

Data governance and security are hard to get right across a complete data platform. SQL GRANT statements on tables aren't enough: security must be enforced for multiple data assets (dashboards, models, files, etc.).

To reduce risk and drive innovation, Emily's team needs to:

- Unify all data assets (Tables, Files, ML models, Features, Dashboards, Queries)
- Onboard data with multiple teams
- Share & monetize assets with external organizations

<style>
.box{
  box-shadow: 20px -20px #CCC; height:300px; box-shadow:  0 0 10px  rgba(0,0,0,0.3); padding: 5px 10px 0px 10px;}
.badge {
  clear: left; float: left; height: 30px; width: 30px;  display: table-cell; vertical-align: middle; border-radius: 50%; background: #fcba33ff; text-align: center; color: white; margin-right: 10px}
.badge_b { 
  height: 35px}
</style>
<link href='https://fonts.googleapis.com/css?family=DM Sans' rel='stylesheet'>
<div style="padding: 20px; font-family: 'DM Sans'; color: #1b5162">
  <div style="width:200px; float: left; text-align: center">
    <div class="box" style="">
      <div style="font-size: 26px;">
        <strong>Team A</strong>
      </div>
      <div style="font-size: 13px">
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/da.png" style="" width="60px"> <br/>
        Data Analysts<br/>
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/ds.png" style="" width="60px"> <br/>
        Data Scientists<br/>
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/de.png" style="" width="60px"> <br/>
        Data Engineers
      </div>
    </div>
    <div class="box" style="height: 80px; margin: 20px 0px 50px 0px">
      <div style="font-size: 26px;">
        <strong>Team B</strong>
      </div>
      <div style="font-size: 13px">...</div>
    </div>
  </div>
  <div style="float: left; width: 400px; padding: 0px 20px 0px 20px">
    <div style="margin: 20px 0px 0px 20px">Permissions on queries, dashboards</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on tables, columns, rows</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on features, ML models, endpoints, notebooks…</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on files, jobs</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
  </div>
  
  <div class="box" style="width:550px; float: left">
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/gov.png" style="float: left; margin-right: 10px;" width="80px"> 
    <div style="float: left; font-size: 26px; margin-top: 0px; line-height: 17px;"><strong>Emily</strong> <br />Governance and Security</div>
    <div style="font-size: 18px; clear: left; padding-top: 10px">
      <ul style="line-height: 2px;">
        <li>Central catalog - all data assets</li>
        <li>Data exploration & discovery to unlock new use-cases</li>
        <li>Permissions cross-teams</li>
        <li>Reduce risk with audit logs</li>
        <li>Measure impact with lineage</li>
      </ul>
      + Monetize & Share data with external organization (Delta Sharing)
    </div>
  </div>
  
</div>


# Implementing global data governance and security with Unity Catalog

<img style="float: right; margin-top: 30px" width="500px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/main/images/fsi/fraud-detection/lakehouse-fsi-fraud-overview-2.png" />

Let's see how the Lakehouse can solve this challenge leveraging Unity Catalog.

Our data has been saved as Delta tables by our Data Engineering team. The next step is to secure this data while giving other teams access to it. <br>
A typical setup would be the following:

* Data Engineers / Jobs can read and update the main data/schemas (ETL part)
* Data Scientists can read the final tables and update their feature tables
* Data Analysts have read access to the Data Engineering and feature tables, and can ingest/transform additional data in a separate schema
* Data is masked/anonymized dynamically based on each user's access level

This is made possible by Unity Catalog. When tables are saved in Unity Catalog, they can be made accessible to the entire organization, across workspaces and users.

Unity Catalog is key for data governance, including creating data products or organizing teams around a data mesh. Among other things, it brings:

* Fine-grained ACLs
* Audit logs
* Data lineage
* Data exploration & discovery
* Sharing data with external organizations (Delta Sharing)

## Cluster setup for UC

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/uc-cluster-setup-single-user.png" style="float: right"/>


To be able to run this demo, make sure you create a cluster with the security mode enabled.

Go to the Compute page and create a new cluster.

Select "Single User" and your UC user (the user needs to exist at both the workspace and the account level).

In [0]:
%run ../config

In [0]:
%python
dbutils.widgets.text("catalog", catalog)
dbutils.widgets.text("schema", schema)

In [0]:
%run ../_resources/00-setup $reset_all_data=false

## Exploring our Customer360 database

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/uc-base-1.png" style="float: right" width="800px"/> 

Let's review the data created.

Unity Catalog works with 3 layers:

* CATALOG
* SCHEMA (or DATABASE)
* TABLE

All Unity Catalog operations are available in SQL (`CREATE CATALOG IF NOT EXISTS my_catalog` ...)

To access a table, you can specify its full path: `SELECT * FROM <CATALOG>.<SCHEMA>.<TABLE>`
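
The three-level path can also be assembled programmatically. The following is a minimal sketch, not part of the original demo: the helper names `quote_identifier` and `qualified_name`, as well as the catalog and schema names, are assumptions made for illustration. In this SQL notebook the cell would start with `%python`.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any backtick it contains."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build the <CATALOG>.<SCHEMA>.<TABLE> path used by Unity Catalog."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


# Placeholder names for illustration only; use your own catalog and schema.
full_name = qualified_name("my_catalog", "my_schema", "gold_transactions")
query = f"SELECT * FROM {full_name}"
print(query)  # in a Databricks notebook: display(spark.sql(query))
```

Quoting each part separately keeps names that contain special characters (such as hyphens) valid in the generated statement.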

In [0]:
-- the catalog has been created for your user and is defined as default. 
-- make sure you run the 00-setup cell above to init the catalog to your user. 
SELECT CURRENT_CATALOG();


## Let's review the tables we created under our schema

<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-churn-data-explorer.gif" style="float: right" width="800px"/> 

Unity Catalog provides a comprehensive Data Explorer that you can access from the left menu.

You'll find all your tables there, and can use it to access and administer them.

Users with the required privileges can also create extra tables in this schema.

### Discoverability 

In addition, Unity Catalog provides explorability and discoverability.

Anyone with access to a table can search for it and analyze its main usage. <br>
You can use the Search menu (⌘ + P) to navigate your data assets (tables, notebooks, queries...).


# **_Task 3 Start_**

The current catalog has already been selected for you. Please check, via code, which tables exist in this catalog.


In [0]:
/* YOUR CODE HERE */

In [0]:
SHOW TABLES;

# **_Task 3 End_**

# **_Task 4 Start_**

Let's grant our `analysts` group SELECT permission on `gold_transactions`.

In addition, please grant SELECT and MODIFY permissions to the Data Engineer group `dataengineers` on the schema YourCatalogName.YourSchemaName.

Note: Make sure you or someone else created the `analysts` and `dataengineers` groups beforehand.
Not sure if they already exist? Check via code (ask the assistant to provide you with a code snippet).
You could also use the UI. Tip: click on your user icon at the top right and navigate to Settings.


In [0]:
/* YOUR CODE */

In [0]:
-- Let's grant our ANALYSTS a SELECT permission:
-- Note: make sure you created an analysts and dataengineers group first.
GRANT SELECT ON TABLE ${catalog}.${schema}.gold_transactions TO `analysts`;

-- We'll grant an extra MODIFY to our Data Engineer
GRANT SELECT, MODIFY ON SCHEMA ${catalog}.${schema} TO `dataengineers`;
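
Note that in Unity Catalog a `SELECT` grant on a table is only usable if the principal also holds `USE CATALOG` and `USE SCHEMA` on the parent objects. The following is a minimal sketch, not part of the original solution, of how the full set of statements could be generated and then checked with `SHOW GRANTS`. The helper names `table_read_grants` and `show_grants` and the placeholder catalog and schema names are assumptions. In this SQL notebook the cell would start with `%python`, and each statement could be passed to `spark.sql(...)`.

```python
def table_read_grants(catalog: str, schema: str, table: str, group: str) -> list:
    """Return the GRANT statements a group needs to read one table.

    Besides SELECT on the table, Unity Catalog requires USE CATALOG and
    USE SCHEMA on the parent catalog and schema.
    """
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{group}`",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{group}`",
    ]


def show_grants(catalog: str, schema: str, table: str) -> str:
    """Return a statement listing the privileges granted on the table."""
    return f"SHOW GRANTS ON TABLE {catalog}.{schema}.{table}"


# Placeholder names for illustration; in the notebook these would be the
# values of the `catalog` and `schema` widgets.
statements = table_read_grants("my_catalog", "my_schema", "gold_transactions", "analysts")
statements.append(show_grants("my_catalog", "my_schema", "gold_transactions"))
for statement in statements:
    print(statement)  # in a Databricks notebook: spark.sql(statement)
```

Running the final `SHOW GRANTS` statement is a quick way to confirm that the `analysts` group appears with the expected privilege.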

# **_Task 4 End_**


## Going further with Data governance & security

By bringing all your data assets together, Unity Catalog lets you build complete yet simple governance to help you scale your teams.

Unity Catalog can be leveraged for anything from a simple GRANT to building a complete data mesh organization.

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/lineage/lineage-table.gif" style="float: right; margin-left: 10px"/>

### Fine-grained ACL

Need more advanced control? You can choose to dynamically change your table output based on the user's permissions: `dbdemos.install('uc-01-acl')`

### Secure external location (S3/ADLS/GCS)

Unity Catalog lets you secure your managed tables as well as your external locations: `dbdemos.install('uc-02-external-location')`

### Lineage 

UC automatically captures table dependencies and lets you track how your data is used, including at a column level: `dbdemos.install('uc-03-data-lineage')`

This lets you analyze downstream impact, or monitor sensitive information across the entire organization (GDPR).


### Audit log

UC captures all events. Need to know who is accessing which data? Query your audit log: `dbdemos.install('uc-04-audit-log')`


### Upgrading to UC

Already using Databricks without UC? Upgrading your tables to benefit from Unity Catalog is simple: `dbdemos.install('uc-05-upgrade')`

### Sharing data with external organizations

Sharing your data outside of your Databricks users is simple with Delta Sharing, and doesn't require your data consumers to use Databricks: `dbdemos.install('delta-sharing-airlines')`

# **_Task 5 Start_**

Full transparency on billing tables, audit logs, and lineage is crucial.
Research and write a simple SQL statement to monitor some details (e.g. billing logs for used compute).


In [0]:
/* YOUR CODE */


In [0]:
/* Solution: https://notebooks.databricks.com/demos/uc-04-system-tables/index.html */

/*select * from system.billing.usage limit 50*/
/*select * from system.access.audit limit 50*/
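
As one possible starting point for this task, the sketch below builds a query that sums usage per day and SKU from `system.billing.usage`. It is an illustration rather than the official solution: the helper name `usage_by_sku_query` and the 7-day default window are assumptions, and system tables must be enabled and readable for your user. In this SQL notebook the cell would start with `%python`.

```python
def usage_by_sku_query(days: int = 7) -> str:
    """Build a query summing billed usage per day and SKU over recent days."""
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT usage_date, sku_name, usage_unit, "
        "SUM(usage_quantity) AS total_usage\n"
        "FROM system.billing.usage\n"
        f"WHERE usage_date >= date_sub(current_date(), {days})\n"
        "GROUP BY usage_date, sku_name, usage_unit\n"
        "ORDER BY usage_date DESC, total_usage DESC"
    )


# In a Databricks notebook: display(spark.sql(usage_by_sku_query(30)))
print(usage_by_sku_query(30))
```

Grouping by `usage_unit` keeps quantities measured in different units from being summed together.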

# **_Task 5 End_**


# Next: Start building analysis with Databricks SQL

Now that these tables are available in our Lakehouse and secured, let's see how our Data Analyst team can start leveraging them to run BI workloads.

Jump to the [BI / Data warehousing notebook]($../03-BI-data-warehousing/03-BI-Datawarehousing-fraud) or [Go back to the introduction]($../00-FSI-fraud-detection-introduction-lakehouse)