# 00_unity_catalog_configuration

## Unity Catalog – Configuration of Catalog and Schemas

**Unity Catalog** is a unified governance layer for all data assets in the Databricks Lakehouse Platform. It provides centralized access control, auditing, lineage, and secure data sharing across workspaces.

### Key concepts:
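- **Catalog** – the top level of the three-level namespace (`catalog.schema.table`); it groups schemas. Each catalog can have its own managed storage location.
- **Schema** – also known as a database; it organizes tables, views, and other objects.
- **Table** – a managed or external dataset; the tables in this notebook are stored in Delta format. For a managed table, Unity Catalog governs both the metadata and the data files, while an external table registers data at a path you manage yourself.
- **Managed location** – a cloud storage directory in which Unity Catalog stores the data files of managed tables.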
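
The hierarchy above gives every table a three-level name of the form `catalog.schema.table`. A minimal sketch of composing such a name in Python follows. The helper `qualified_name` is hypothetical (it is not part of any Databricks API), and quoting each part with backticks is an assumption about how you want identifiers escaped.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a backtick-quoted three-level name: catalog.schema.table."""
    parts = (catalog, schema, table)
    for part in parts:
        # Reject empty parts and embedded backticks instead of trying to escape them.
        if not part or "`" in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


name = qualified_name("data_ml_preparation", "bronze", "customer_raw")
print(name)
```

The resulting string can be interpolated into statements passed to `spark.sql`.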

### This notebook demonstrates:
- Creating a catalog in Unity Catalog with a managed location in Azure Data Lake Storage
- Creating `bronze`, `silver`, and `gold` schemas according to the medallion architecture
- Creating managed and external Delta tables
- Cleaning up (dropping) the tables, schemas, and catalog

In [0]:
# Set parameters
catalog_name = "data_ml_preparation"
# No trailing slash here: the f-string below adds the separator itself,
# so a trailing slash would produce a double slash in the path
external_location_base = "abfss://uc-altkom-akademia@altkommltrainingwesa.dfs.core.windows.net"
full_location_path = f"{external_location_base}/{catalog_name}"

# Create the catalog (no-op if it already exists)
spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog_name} MANAGED LOCATION '{full_location_path}'")
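
Building a storage path by string concatenation is sensitive to whether the base URI ends in a slash. A minimal sketch of a helper that always yields exactly one separator between segments follows. The name `join_abfss` is hypothetical and not a Databricks function.

```python
def join_abfss(base: str, *parts: str) -> str:
    """Join path segments onto an abfss:// base with a single slash between them."""
    # Strip slashes at the joints only; the "//" after the scheme is left untouched.
    segments = [base.rstrip("/")]
    segments += [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(segments)


base = "abfss://uc-altkom-akademia@altkommltrainingwesa.dfs.core.windows.net/"
managed_location = join_abfss(base, "data_ml_preparation")
print(managed_location)
```

The helper gives the same result whether or not `base` carries a trailing slash.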

In [0]:
%sql

-- SQL equivalent of the previous Python cell
CREATE CATALOG IF NOT EXISTS data_ml_preparation MANAGED LOCATION 'abfss://uc-altkom-akademia@altkommltrainingwesa.dfs.core.windows.net/data_ml_preparation'

In [0]:
%sql
GRANT ALL PRIVILEGES ON CATALOG data_ml_preparation TO `account users`

In [0]:
%sql
USE CATALOG data_ml_preparation;

In [0]:
%sql
USE CATALOG data_ml_preparation;
CREATE SCHEMA IF NOT EXISTS bronze

In [0]:
%sql
CREATE SCHEMA IF NOT EXISTS silver

In [0]:
%sql
CREATE SCHEMA IF NOT EXISTS gold
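
The three schema cells above differ only in the layer name, so the statements can also be generated in a loop. The sketch below only builds the SQL strings. The names `MEDALLION_LAYERS` and `create_schema_statements` are hypothetical, and running the statements with `spark.sql` is left as a comment because `spark` exists only inside a Databricks session.

```python
MEDALLION_LAYERS = ("bronze", "silver", "gold")


def create_schema_statements(catalog: str, layers=MEDALLION_LAYERS) -> list:
    """Return one CREATE SCHEMA statement per medallion layer, fully qualified."""
    return [f"CREATE SCHEMA IF NOT EXISTS {catalog}.{layer}" for layer in layers]


statements = create_schema_statements("data_ml_preparation")
for stmt in statements:
    print(stmt)
    # In a Databricks notebook: spark.sql(stmt)
```

Qualifying each schema with the catalog name means the statements do not depend on a preceding `USE CATALOG`.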

In [0]:
# Set catalog and schema
spark.sql("USE CATALOG data_ml_preparation")
spark.sql("USE SCHEMA bronze")

data = [
    (1, "Anna", "2023-01-01"),
    (2, "Bartek", "2023-02-15"),
    (3, "Celina", "2023-03-20")
]
columns = ["customer_id", "name", "signup_date"]
df = spark.createDataFrame(data, columns)

# Save as managed table
df.write.mode("overwrite").saveAsTable("customer_raw")

In [0]:
# Write the data as Delta files to an external path
# (the table itself is registered in Unity Catalog in the next cell)
df.write.mode("overwrite").format("delta").save(
    "abfss://uc-altkom-akademia@altkommltrainingwesa.dfs.core.windows.net/altkom-training/data-ml-preparation/bronze/external_customer_raw"
)

In [0]:
%sql
CREATE TABLE IF NOT EXISTS data_ml_preparation.bronze.external_customer_raw
USING DELTA
LOCATION 'abfss://uc-altkom-akademia@altkommltrainingwesa.dfs.core.windows.net/altkom-training/data-ml-preparation/bronze/external_customer_raw'
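
To check whether a table ended up managed or external, one option is to read the `Type` row from `DESCRIBE TABLE EXTENDED`. The sketch below is a minimal version of that check. The helper `table_type` is hypothetical, and it assumes the command returns rows with `col_name` and `data_type` columns, one of which has `col_name` equal to `Type`.

```python
from typing import Optional


def table_type(spark, table_name: str) -> Optional[str]:
    """Return the value of the 'Type' row from DESCRIBE TABLE EXTENDED, if present."""
    rows = spark.sql(f"DESCRIBE TABLE EXTENDED {table_name}").collect()
    for row in rows:
        # Assumption: the detailed table information includes a row named "Type".
        if row["col_name"] == "Type":
            return row["data_type"]
    return None


# In a Databricks notebook, where `spark` is the active session:
# table_type(spark, "data_ml_preparation.bronze.customer_raw")
# table_type(spark, "data_ml_preparation.bronze.external_customer_raw")
```

Passing the session in as an argument keeps the function usable outside a notebook, for example with a stub in a unit test.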

In [0]:
# Clean up
# Disabled by default: the code is wrapped in a string so that it does not run.
# Remove the triple quotes to execute it.
"""
spark.sql("USE CATALOG data_ml_preparation")
spark.sql("DROP TABLE IF EXISTS bronze.customer_raw")
spark.sql("DROP TABLE IF EXISTS bronze.external_customer_raw")

for schema in ["bronze", "silver", "gold"]:
    spark.sql(f"DROP SCHEMA IF EXISTS {schema} CASCADE")

spark.sql("DROP CATALOG IF EXISTS data_ml_preparation CASCADE")
"""