# Lecture 37. Unity Catalog

In the previous lectures, we discussed the data governance model of the Databricks **Hive metastore**.

In this lecture, we will see an overview of **Unity Catalog**, which is the new governance solution of the Databricks platform.

You will understand **Unity Catalog** and its architecture.

And we will see the **three-level namespace** introduced by **Unity Catalog**.

Lastly, we will describe the **security model** of **Unity Catalog**.

**Unity Catalog** is a centralized governance solution across all your workspaces on any cloud. It unifies governance for all data and AI assets in your **Lakehouse**, including files, tables, machine learning models, and dashboards.

And this can be achieved simply by using **SQL**.

So, with **Unity Catalog**, you define your **data access rules** once across multiple workspaces and clouds.

Before **Unity Catalog**, users and groups were defined per workspace. Also, access control was managed via the **Hive metastore** within the workspace.

By contrast, **Unity Catalog** sits outside of the workspace and is accessed via a user interface called the **Account Console**.

Users and groups for **Unity Catalog** are managed through this **Account Console** and assigned to one or more workspaces.

Metastores are likewise separated out of the workspace and managed through the **Account Console**, where they can be assigned to the workspaces.

A **Unity Catalog metastore** can be assigned to more than one workspace, enabling multiple workspaces to share the same underlying storage and the same **access control lists**.



### Unity Catalog Three-Level Namespace

We previously saw the traditional two-level namespace, used to address tables within schemas.

**Unity Catalog** introduces a **third level**, which is **catalogs**.

Let us better understand the hierarchy of **Unity Catalog**:

- The **metastore** is the top-level logical container in **Unity Catalog**. It represents metadata, that is, information about the objects being managed by the metastore, as well as the **access control lists** that govern access to those objects.
  
- In a **metastore**, you have **catalogs**, which are the top-level containers for data objects in **Unity Catalog** and form the first part of the **three-level namespace** we just saw.

Don't confuse **Unity Catalog metastore** with the **Hive metastore**.

The **Hive metastore** is the default metastore linked to each Databricks workspace.

And while it may seem functionally similar to a **Unity Catalog metastore**, **Unity Catalog metastores** offer improved security and advanced features.

A **Unity Catalog metastore** can have as many catalogs as desired.

**Catalogs** contain **schemas**.

A **schema**, also known as a **database**, is the second part of the **three-level namespace**.

Schemas usually contain **data assets** like tables, views, and functions, forming the third part of the **three-level namespace**.
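To make the three-level namespace concrete, here is a minimal sketch in Python that assembles a fully qualified name of the form `catalog.schema.table` and uses it in a query string. The names `main`, `sales`, and `orders` are hypothetical and only serve as an illustration; the helper `qualified_name` is not a Databricks API.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Join catalog, schema, and table into a three-level name."""
    parts = (catalog, schema, table)
    for part in parts:
        # Keep the sketch simple: reject empty parts and embedded dots
        # instead of handling quoted identifiers.
        if not part or "." in part:
            raise ValueError(f"invalid namespace part: {part!r}")
    return ".".join(parts)


# Hypothetical names, for illustration only.
table_name = qualified_name("main", "sales", "orders")
print(f"SELECT * FROM {table_name}")  # prints: SELECT * FROM main.sales.orders
```

The only difference from the two-level form we used before is the extra catalog part in front of the schema.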



### Authentication and Storage Integration

**Unity Catalog** also supports authentication to the underlying cloud storage through **Storage Credentials**.

**Storage Credentials** apply to an entire storage container.

On the other hand, **External Locations** represent the storage directories within a cloud storage container.
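The relationship between the two objects can be pictured with a small toy model in Python. This is only a sketch under simplifying assumptions: the class names, the fields, and the `s3://example-bucket` URL are hypothetical, and the prefix check simply mirrors the idea that an external location is a directory inside the container covered by a storage credential.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageCredential:
    name: str
    container_url: str  # hypothetical: root of the storage container


@dataclass(frozen=True)
class ExternalLocation:
    name: str
    url: str  # a directory inside the container
    credential: StorageCredential

    def __post_init__(self) -> None:
        # Compare with trailing slashes so that "bucket" does not
        # accidentally match "bucket-2".
        root = self.credential.container_url.rstrip("/") + "/"
        if not (self.url.rstrip("/") + "/").startswith(root):
            raise ValueError(f"{self.url} is not inside {root}")


# Hypothetical names and URLs, for illustration only.
cred = StorageCredential("lake_cred", "s3://example-bucket")
raw = ExternalLocation("raw_zone", "s3://example-bucket/landing/raw", cred)
print(raw.credential.name)  # prints: lake_cred
```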

In addition, **Unity Catalog** adds **Shares** and **Recipients** related to **Delta Sharing**.

- **Shares** are collections of tables shared with one or more **Recipients**.
  
**Delta Sharing** is out of scope for this course.



### Identity Management in Unity Catalog

In **Unity Catalog**, we have three types of **identities** or **principals**, which are **users**, **service principals**, and **groups**.

- **Users** are individual physical users uniquely identified by their email addresses. A user can have an **admin role** to perform several administrative tasks important to **Unity Catalog**, such as managing and assigning metastores to workspaces and managing other users.

- A **Service Principal** is an individual identity for use with automated tools and applications. **Service Principals** are uniquely identified by an **Application ID**. Like users, **Service Principals** can have admin roles, which allow them to programmatically carry out administrative tasks.

- Lastly, we have **groups** that collect users and **Service Principals** into a single entity. **Groups** can be nested within other groups. For example, a parent group called **Employees** can contain two inner groups: **HR** and **Finance**.
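Nested groups can be illustrated with a short Python sketch that resolves the effective members of a parent group. The group contents, the email addresses, and the `app-id-1234` placeholder for a service principal are all hypothetical; this is a toy model of nesting, not how Databricks stores or resolves identities.

```python
from typing import Dict, Optional, Set

# Hypothetical groups: each group maps to its direct members.
# A member whose name is also a key is treated as a nested group.
GROUPS: Dict[str, Set[str]] = {
    "Employees": {"HR", "Finance"},
    "HR": {"alice@example.com"},
    "Finance": {"bob@example.com", "app-id-1234"},
}


def effective_members(
    group: str,
    groups: Dict[str, Set[str]],
    _seen: Optional[Set[str]] = None,
) -> Set[str]:
    """Return all users and service principals reachable from a group."""
    seen = set() if _seen is None else _seen
    if group in seen:
        return set()  # guard against cyclic nesting
    seen.add(group)
    members: Set[str] = set()
    for member in groups.get(group, set()):
        if member in groups:
            members |= effective_members(member, groups, seen)
        else:
            members.add(member)
    return members


print(sorted(effective_members("Employees", GROUPS)))
```

With this model, anything granted to **Employees** would reach the members of both **HR** and **Finance**.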



### Identity Federation and Privileges

Databricks identities exist at two levels: the **account level** and the **workspace level**.

**Unity Catalog** supports a feature called **Identity Federation**, where identities are simply created once in the **Account Console**. Then, they can be assigned to one or more workspaces as needed.

So, **Identity Federation** eliminates the need to manually create and maintain copies of identities at the workspace level.

**Unity Catalog** has **CREATE**, **USAGE**, **SELECT**, and **MODIFY** privileges.

In addition, we also have privileges related to the underlying storage, which are **READ FILES** and **WRITE FILES**, replacing the **ANY FILE** privilege we saw previously with the **Hive metastore**.

Lastly, we have the **EXECUTE** privilege to allow executing **user-defined functions**.



### Unity Catalog Security Model

So, putting it all together, we can see here the **security model** of **Unity Catalog**.

As you can see, **Unity Catalog** uses a different security model than **Hive metastores** for granting privileges. There are different privilege types and extra securable objects and principals.

Here, we continue to use the `GRANT` statement in order to give a privilege on a securable object to a principal.
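As a rough illustration of the shape of such a statement, the following Python sketch builds a `GRANT` string from a privilege, a securable object, and a principal. The helper `grant_statement`, the table `main.sales.orders`, and the group `finance` are hypothetical. The privilege set is copied from this lecture and is an assumption; check the current Databricks documentation for the exact privilege names your workspace accepts.

```python
# Privilege names as listed in this lecture (an assumption, not a
# definitive list of what a given workspace accepts).
LECTURE_PRIVILEGES = {
    "CREATE", "USAGE", "SELECT", "MODIFY",
    "READ FILES", "WRITE FILES", "EXECUTE",
}


def grant_statement(privilege: str, securable_type: str,
                    securable_name: str, principal: str) -> str:
    """Build: GRANT <privilege> ON <type> <name> TO `<principal>`."""
    privilege = privilege.upper()
    if privilege not in LECTURE_PRIVILEGES:
        raise ValueError(f"unknown privilege: {privilege}")
    # Backticks around the principal allow names such as email addresses.
    return (f"GRANT {privilege} ON {securable_type.upper()} "
            f"{securable_name} TO `{principal}`")


# Hypothetical table and group, for illustration only.
print(grant_statement("select", "table", "main.sales.orders", "finance"))
# prints: GRANT SELECT ON TABLE main.sales.orders TO `finance`
```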

**Unity Catalog** is additive, which means that your legacy **Hive metastore** is still accessible once **Unity Catalog** is enabled.

Regardless of the **Unity Catalog metastore** assigned to the workspace, the catalog named `hive_metastore` always provides access to the **Hive metastore** local to that workspace.
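One way to picture this is that a legacy two-level name such as `sales.orders` can still be addressed by putting `hive_metastore` in front of it. The Python sketch below shows that idea; the helper `to_three_level` and the table names are hypothetical, and it deliberately ignores quoted identifiers.

```python
def to_three_level(name: str, legacy_catalog: str = "hive_metastore") -> str:
    """Prefix a two-level schema.table name with the legacy catalog."""
    parts = name.split(".")
    if len(parts) == 3:
        return name  # already catalog.schema.table
    if len(parts) == 2:
        return f"{legacy_catalog}.{name}"
    raise ValueError(f"expected schema.table or catalog.schema.table: {name!r}")


# Hypothetical names, for illustration only.
print(to_three_level("sales.orders"))       # prints: hive_metastore.sales.orders
print(to_three_level("main.sales.orders"))  # prints: main.sales.orders
```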

In addition to its **centralized governance model**, **Unity Catalog** also has a built-in **data search and discovery** feature.

It also provides **automated lineage**, where you can identify the origin of your data and where it is used across all data types, like **tables**, **notebooks**, **workflows**, and **dashboards**.

Lastly, as we saw, **Unity Catalog** unifies existing legacy catalogs. So, there is no hard migration needed when enabling **Unity Catalog**.

Finally, in order to access the **Account Console**, you can log in as an **Account Administrator** via this link:

**accounts.cloud.databricks.com**

Great! That's it for this lecture.

See you in the next one.