# Ensuring governance and security for our IoT platform

Data governance and security are hard to get right across a complete data platform. A SQL GRANT on tables isn't enough: security must be enforced across many kinds of data assets (dashboards, models, files, etc.).

To reduce risk and drive innovation, Emily's team needs to:

- Unify all data assets (Tables, Files, ML models, Features, Dashboards, Queries)
- Onboard data with multiple teams
- Share & monetize assets with external Organizations

<style>
.box{
  box-shadow: 20px -20px #CCC; height:300px; box-shadow:  0 0 10px  rgba(0,0,0,0.3); padding: 5px 10px 0px 10px;}
.badge {
  clear: left; float: left; height: 30px; width: 30px;  display: table-cell; vertical-align: middle; border-radius: 50%; background: #fcba33ff; text-align: center; color: white; margin-right: 10px}
.badge_b { 
  height: 35px}
</style>
<link href='https://fonts.googleapis.com/css?family=DM Sans' rel='stylesheet'>
<div style="padding: 20px; font-family: 'DM Sans'; color: #1b5162">
  <div style="width:200px; float: left; text-align: center">
    <div class="box" style="">
      <div style="font-size: 26px;">
        <strong>Team A</strong>
      </div>
      <div style="font-size: 13px">
        <img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/alice.png" style="" width="60px"> <br/>
        Data Analysts<br/>
        <img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/marc.png" style="" width="60px"> <br/>
        Data Scientists<br/>
        <img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/john.png" style="" width="60px"> <br/>
        Data Engineers
      </div>
    </div>
    <div class="box" style="height: 80px; margin: 20px 0px 50px 0px">
      <div style="font-size: 26px;">
        <strong>Team B</strong>
      </div>
      <div style="font-size: 13px">...</div>
    </div>
  </div>
  <div style="float: left; width: 400px; padding: 0px 20px 0px 20px">
    <div style="margin: 20px 0px 0px 20px">Permissions on queries, dashboards</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on tables, columns, rows</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on features, ML models, endpoints, notebooks…</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on files, jobs</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
  </div>
  
  <div class="box" style="width:550px; float: left">
    <img src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/emily.png" style="float: left; margin-right: 10px;" width="80px"> 
    <div style="float: left; font-size: 26px; margin-top: 0px; line-height: 17px;"><strong>Emily</strong> <br />Governance and Security</div>
    <div style="font-size: 18px; clear: left; padding-top: 10px">
      <ul style="line-height: 2px;">
        <li>Central catalog - all data assets</li>
        <li>Data exploration & discovery to unlock new use-cases</li>
        <li>Permissions cross-teams</li>
        <li>Reduce risk with audit logs</li>
        <li>Measure impact with lineage</li>
      </ul>
      + Monetize & Share data with external organization (Delta Sharing)
    </div>
  </div>
  
  
</div>


<!-- Collect usage data (view). Remove it to disable collection. View README for more details.  -->
<img width="1px" src="https://ppxrzfxige.execute-api.us-west-2.amazonaws.com/v1/analytics?category=lakehouse&org_id=4003492105941350&notebook=%2F02-Data-governance%2F02-UC-data-governance-security-iot-turbine&demo_name=lakehouse-iot-platform&event=VIEW&path=%2F_dbdemos%2Flakehouse%2Flakehouse-iot-platform%2F02-Data-governance%2F02-UC-data-governance-security-iot-turbine&version=1">

# Implementing global data governance and security with Unity Catalog

<img style="float: right; margin-top: 30px" width="500px" src="https://raw.githubusercontent.com/databricks-demos/dbdemos-resources/refs/heads/main/images/manufacturing/lakehouse-iot-turbine/team_flow_emily.png" />

Let's see how the Lakehouse can solve this challenge leveraging Unity Catalog.

Our data has been saved as Delta tables by our Data Engineering team. The next step is to secure this data while giving other teams access to it. <br>
A typical setup would be the following:

* Data Engineers / Jobs can read and update the main data/schemas (ETL part)
* Data Scientists can read the final tables and update their features tables
* Data Analysts have READ access to the Data Engineering and feature tables, and can ingest/transform additional data in a separate schema
* Data is masked/anonymized dynamically based on each user's access level

This is made possible by Unity Catalog. When tables are saved in Unity Catalog, they can be made accessible to the entire organization, across workspaces and across users.

Unity Catalog is key for data governance, including creating data products or organizing teams around a data mesh. Among other things, it brings:

* Fine-grained ACLs
* Audit logs
* Data lineage
* Data exploration & discovery
* Sharing data with external organizations (Delta Sharing)



## Cluster setup for UC

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/uc-cluster-setup-single-user.png" style="float: right; margin-left: 10px"/>

To run this demo, make sure your cluster has a security mode enabled and that Unity Catalog is enabled at the account level (see the [documentation](https://docs.databricks.com/data-governance/unity-catalog/get-started.html)).

In the compute page, make sure you select "Single User" and your UC-user (the user needs to exist at the workspace and the account level)

If you're using the cluster created by `dbdemos`, you're all good.

In [0]:
%run ../_resources/00-setup $reset_all_data=false

## Exploring our IoT platform database

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/uc-base-1.png" style="float: right" width="800px"/> 

Let's review the data created.

Unity Catalog works with 3 layers:

* CATALOG
* SCHEMA (or DATABASE)
* TABLE

All Unity Catalog operations are available in SQL (`CREATE CATALOG IF NOT EXISTS my_catalog` ...)

To access a table, you can specify its full path: `SELECT * FROM <CATALOG>.<SCHEMA>.<TABLE>`
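
As a minimal sketch of the three layers, the statements below create one object per level and then query it by its full path. The names `my_catalog`, `my_schema` and `my_table` are placeholders, and the sketch assumes your user is allowed to create catalogs in the metastore.

```sql
-- Hypothetical names, for illustration only.
CREATE CATALOG IF NOT EXISTS my_catalog;
CREATE SCHEMA IF NOT EXISTS my_catalog.my_schema;

-- Set the defaults so unqualified names resolve to this catalog and schema.
USE CATALOG my_catalog;
USE SCHEMA my_schema;

CREATE TABLE IF NOT EXISTS my_table (id INT, label STRING);

-- Both queries target the same table: one relies on the defaults, one uses the full path.
SELECT * FROM my_table;
SELECT * FROM my_catalog.my_schema.my_table;
```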

In [0]:
-- the catalog has been created for your user and is defined as default. 
-- make sure you run the 00-setup cell above to init the catalog to your user. 
SELECT CURRENT_CATALOG();


## Let's review the tables we created under our schema

<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-churn-data-explorer.gif" style="float: right" width="800px"/> 

Unity Catalog provides a comprehensive Data Explorer that you can access on the left menu.

**Open it and navigate under `dbdemos`.`lakehouse_iot` to review the tables created.**

You'll find all the options for your data management and governance: review the tables created, add new ones, share them with Delta Sharing...

**You'll also be able to explore data and GRANT permissions to your users directly in the UI.**


### Discoverability 

In addition, Unity Catalog provides explorability and discoverability.

Anyone with access to a table can search for it and see how it is mainly used. <br>
You can use the Search menu (⌘ + P) to navigate in your data assets (tables, notebooks, queries...)
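
Discovery also works from SQL. As a rough sketch, the query below lists the tables of a schema through the catalog's `information_schema`, and the `COMMENT ON` statement adds a description that then shows up in search and in the explorer. The catalog and schema names are the ones used in the GRANT statements of this notebook; adjust them to your setup. The comment text is only an example.

```sql
-- List the tables visible to the current user in the demo schema.
SELECT table_name, table_type, comment
FROM main.information_schema.tables
WHERE table_schema = 'dbdemos_iot_turbine'
ORDER BY table_name;

-- Document a table so that other teams can find and understand it (example text).
COMMENT ON TABLE main.dbdemos_iot_turbine.sensor_bronze
  IS 'Raw sensor readings ingested from the wind turbines';
```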

In [0]:
SHOW TABLES;

In [0]:
-- Let's grant our ANALYSTS a SELECT permission:
-- Note: make sure the `analysts` and `dataengineers` groups exist first.
GRANT SELECT ON TABLE main.dbdemos_iot_turbine.sensor_bronze TO `analysts`;
GRANT SELECT ON TABLE main.dbdemos_iot_turbine.sensor_hourly TO `analysts`;
GRANT SELECT ON TABLE main.dbdemos_iot_turbine.historical_turbine_status TO `analysts`;

-- We'll grant an extra MODIFY to our Data Engineer
GRANT SELECT, MODIFY ON SCHEMA main.dbdemos_iot_turbine TO `dataengineers`;

In [0]:
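-- Review the permissions on one of the tables we just granted
SHOW GRANTS ON TABLE main.dbdemos_iot_turbine.sensor_bronze;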
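
In Unity Catalog, a `SELECT` privilege on a table is only usable if the principal can also traverse the parent catalog and schema. A minimal sketch, assuming the same `main` catalog, `dbdemos_iot_turbine` schema and `analysts` group as above:

```sql
-- Let the group traverse the parent catalog and schema.
GRANT USE CATALOG ON CATALOG main TO `analysts`;
GRANT USE SCHEMA ON SCHEMA main.dbdemos_iot_turbine TO `analysts`;

-- Check what the group now holds at the schema level.
SHOW GRANTS `analysts` ON SCHEMA main.dbdemos_iot_turbine;

-- Privileges can be withdrawn the same way they were granted.
REVOKE SELECT ON TABLE main.dbdemos_iot_turbine.historical_turbine_status FROM `analysts`;
```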


## Dynamically filtering data based on the current user: row- and column-level filtering

Let's see how Unity Catalog can be used to filter data and return different results based on who is querying it.

Let's pretend we're based in Chicago, and we want the `parts` table to only return the parts available in the Chicago location as this is where we operate.

We'll add a new table that maps users to part locations *(Note: this could also be done with groups)*.

You'll be based in Chicago, John in Honolulu and Lea in Denver:

In [0]:
-- create the table matching the users to their country/location
CREATE OR REPLACE TABLE parts_users_country_permission (email STRING, country STRING);

INSERT INTO parts_users_country_permission (email, country)
  VALUES 
    (current_user(), 'America/Chicago'),
    ('john@mycompany.com', 'America/Honolulu'),
    ('lea@mycompany.com', 'America/Denver');

In [0]:
CREATE OR REPLACE VIEW parts_secured AS
SELECT
  CASE 
    WHEN is_account_group_member('iot_admin') THEN EAN  -- allow admin to see all
    ELSE '***' -- filter other users, they won't be able to see the EAN
  END as EAN,
  p.* EXCEPT (EAN)
FROM parts p 
INNER JOIN parts_users_country_permission u -- Get the country/location permission table
  ON p.stock_location = u.country 
  AND (u.email = current_user() OR is_account_group_member('iot_admin')); -- Filter on the current user; admins match every location in the permission table


-- Let's test our secured view. We'll only see the 'America/Chicago' parts, and the EAN will be masked (unless we are in the iot_admin group).
SELECT * FROM parts_secured;
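
As an alternative to a secured view, Unity Catalog can attach the same logic directly to the table with a row filter and a column mask, so every query on `parts` is filtered without going through a separate view. The sketch below is illustrative only: the function names `parts_location_filter` and `ean_mask` are hypothetical, it assumes `stock_location` and `EAN` are STRING columns, and it requires a runtime that supports row filters and column masks.

```sql
-- Row filter: keep a row when the user is an admin or is mapped to the part's location.
CREATE OR REPLACE FUNCTION parts_location_filter(part_location STRING)
RETURN is_account_group_member('iot_admin')
  OR EXISTS (
    SELECT 1
    FROM parts_users_country_permission u
    WHERE u.email = current_user()
      AND u.country = part_location
  );

ALTER TABLE parts SET ROW FILTER parts_location_filter ON (stock_location);

-- Column mask: only admins see the real EAN.
CREATE OR REPLACE FUNCTION ean_mask(ean_value STRING)
RETURN CASE WHEN is_account_group_member('iot_admin') THEN ean_value ELSE '***' END;

ALTER TABLE parts ALTER COLUMN EAN SET MASK ean_mask;

-- To go back to the unfiltered table:
-- ALTER TABLE parts DROP ROW FILTER;
-- ALTER TABLE parts ALTER COLUMN EAN DROP MASK;
```

Note that once the filter is set on `parts`, it also applies to any view built on top of that table, so you would typically pick one of the two approaches rather than combine them.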


## Sharing data with external organizations

We've seen how to GRANT access to our tables internally (to any entity within your Databricks account).

However, this might not be enough. You'll have to share this data with external organizations (for data monetization, partners etc). 

The Lakehouse provides this capability while remaining agnostic about your partners' data stack and cloud.

This is powered by [Delta Sharing](https://www.databricks.com/en/product/delta-sharing), an open protocol for sharing data that does not tie the recipient to a specific platform.

For a full example on Delta Sharing, run: `dbdemos.install('delta-sharing-airlines')`
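
To give a feel for what sharing looks like in SQL, here is a minimal sketch. The share name `iot_turbine_share` and the recipient name `partner_org` are hypothetical, the shared table is one of the tables granted earlier in this notebook, and running it assumes you hold the privileges to create shares and recipients on the metastore.

```sql
-- A share is a named collection of assets exposed to recipients.
CREATE SHARE IF NOT EXISTS iot_turbine_share
  COMMENT 'Hourly turbine sensor aggregates for partners';

ALTER SHARE iot_turbine_share
  ADD TABLE main.dbdemos_iot_turbine.sensor_hourly;

-- A recipient represents the external organization.
CREATE RECIPIENT IF NOT EXISTS partner_org;

-- Give the recipient read access to everything in the share.
GRANT SELECT ON SHARE iot_turbine_share TO RECIPIENT partner_org;

-- Review what the share contains.
SHOW ALL IN SHARE iot_turbine_share;
```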


## Going further with Data governance & security

By bringing all your data assets together, Unity Catalog lets you build complete yet simple governance that helps you scale your teams.

Unity Catalog scales from a simple GRANT to a complete data mesh organization.

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/lineage/lineage-table.gif" style="float: right; margin-left: 10px"/>

### Fine-grained ACL

Need more advanced control? You can choose to dynamically change your table output based on the user's permissions: `dbdemos.install('uc-01-acl')`

### Secure external location (S3/ADLS/GCS)

Unity Catalog lets you secure your managed tables as well as your external locations: `dbdemos.install('uc-02-external-location')`

### Lineage 

UC automatically captures table dependencies and lets you track how your data is used, down to the column level: `dbdemos.install('uc-03-data-lineage')`

This lets you analyze downstream impact, or monitor sensitive information across the entire organization (GDPR).
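
As a hedged sketch of a downstream impact analysis, the query below lists the tables that read from one source table. It assumes the lineage system table `system.access.table_lineage` is enabled in your account and that you have access to it. The source table name is the one used earlier in this notebook.

```sql
-- Which tables are built from sensor_bronze, and by what kind of workload?
SELECT DISTINCT
  target_table_full_name,
  entity_type
FROM system.access.table_lineage
WHERE source_table_full_name = 'main.dbdemos_iot_turbine.sensor_bronze'
  AND target_table_full_name IS NOT NULL;
```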


### Audit log

UC captures all events. Need to know who is accessing which data? Query your audit log: `dbdemos.install('uc-04-audit-log')`
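
A minimal sketch of such a query, assuming the audit system table `system.access.audit` is enabled and readable by your user; the 7-day window and the row limit are arbitrary choices for illustration:

```sql
-- Most recent Unity Catalog events over the last 7 days.
SELECT
  event_time,
  user_identity.email AS user_email,
  action_name,
  request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND event_date >= current_date() - INTERVAL 7 DAYS
ORDER BY event_time DESC
LIMIT 100;
```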


### Upgrading to UC

Already using Databricks without UC? Upgrading your tables to benefit from Unity Catalog is simple: `dbdemos.install('uc-05-upgrade')`

# Next: Start building analysis with Databricks SQL

Now that these tables are available in our Lakehouse and secured, let's see how our Data Analyst team can start leveraging them to run BI workloads.

Jump to the [BI / Data warehousing notebook]($../03-BI-data-warehousing/03-BI-Datawarehousing-iot-turbine) or [Go back to the introduction]($../00-IOT-wind-turbine-introduction-lakehouse)