### Catalog → Schema → Table hierarchy

- Databricks uses a three-level hierarchy to organize data and manage security.
- **Catalog:** The top-level container. It usually represents a broad business unit, a specific project, or an environment (like production vs. development).
- **Schema (or Database):** The middle layer inside a catalog. It organizes data into logical groups, such as sales, hr, or marketing.
- **Table:** The lowest level where the actual data rows and columns live.
- In a Databricks environment that uses Unity Catalog, three-part naming uniquely identifies every table.
- Even if two different projects each have a table named `sales`, the fully qualified name tells the system exactly which one you mean.
- The format is: `<catalog>.<schema>.<table_name>`
- To query this in SQL, you would write:
```
SELECT * FROM main.ecommerce.silver_events;
```
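
The naming rule above can be sketched in plain Python. This is only an illustration: `TableName` and `parse_table_name` are hypothetical helpers, not part of any Databricks library, and the `project_a` / `project_b` catalogs are made up. The sketch also ignores backtick-quoted identifiers that contain dots.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """A fully qualified three-part table name (hypothetical helper type)."""
    catalog: str
    schema: str
    table: str

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def parse_table_name(full_name: str) -> TableName:
    """Split 'catalog.schema.table' into its three parts."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <catalog>.<schema>.<table_name>, got {full_name!r}")
    return TableName(*parts)


events = parse_table_name("main.ecommerce.silver_events")
print(events.catalog, events.schema, events.table)

# Two tables that share the short name "sales" are still distinct objects,
# because their catalogs differ (project_a and project_b are made-up names).
a = parse_table_name("project_a.finance.sales")
b = parse_table_name("project_b.finance.sales")
print(a.table == b.table, a == b)  # True False
```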

### Access control (GRANT/REVOKE)

- In the Catalog → Schema → Table hierarchy, Access Control is the security gatekeeper. 
- We use two primary SQL commands to manage who can interact with our data: GRANT (to give permission) and REVOKE (to take it away).

- Think of the hierarchy like a building:
- **Catalog:** The building itself.
- **Schema:** A specific floor or department.
- **Table:** A specific filing cabinet or document.

**The Syntax**
- To grant access to data, follow the hierarchy down from the top. In Unity Catalog, a user needs `USE CATALOG` on the catalog and `USE SCHEMA` on the schema before they can `SELECT` (read) the table. (`USAGE` is the older privilege name from the legacy Hive metastore.)
- Example:
```
-- Step 1: Give access to the "building"
GRANT USE CATALOG ON CATALOG prod_catalog TO `analyst_team`;

-- Step 2: Give access to the "floor"
GRANT USE SCHEMA ON SCHEMA prod_catalog.ecommerce_dept TO `analyst_team`;

-- Step 3: Give access to the "document"
GRANT SELECT ON TABLE prod_catalog.ecommerce_dept.silver_events TO `analyst_team`;
```
- Why the hierarchy matters: if you `REVOKE` the catalog-level privilege (`USE CATALOG`), the user can no longer reach anything inside that catalog (all schemas and all tables), even if they still hold `SELECT` on a specific table. Access is checked top-down.
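
A minimal sketch of that top-down check, in plain Python. It uses the Unity Catalog privilege names from Task 3. The `grants` store and the `grant`, `revoke`, and `can_select` helpers are hypothetical, and the model deliberately ignores privilege inheritance, ownership, and group membership, so treat it as an illustration of the idea rather than how Databricks evaluates permissions.

```python
from collections import defaultdict

# principal -> set of (privilege, securable) pairs; a toy stand-in for the metastore
grants = defaultdict(set)


def grant(principal, privilege, securable):
    grants[principal].add((privilege, securable))


def revoke(principal, privilege, securable):
    grants[principal].discard((privilege, securable))


def can_select(principal, table):
    """True only if every level of the hierarchy allows access."""
    catalog, schema, _ = table.split(".")
    held = grants[principal]
    return (
        ("USE CATALOG", catalog) in held
        and ("USE SCHEMA", f"{catalog}.{schema}") in held
        and ("SELECT", table) in held
    )


table = "prod_catalog.ecommerce_dept.silver_events"
grant("analyst_team", "USE CATALOG", "prod_catalog")
grant("analyst_team", "USE SCHEMA", "prod_catalog.ecommerce_dept")
grant("analyst_team", "SELECT", table)
print(can_select("analyst_team", table))  # True

# Revoke only the catalog-level privilege; the SELECT grant is still recorded.
revoke("analyst_team", "USE CATALOG", "prod_catalog")
print(("SELECT", table) in grants["analyst_team"])  # True
print(can_select("analyst_team", table))  # False
```

The table-level grant survives the revoke, but it is unreachable until the catalog-level privilege is restored.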

### Data lineage

- Data lineage is the map that tracks data's journey from its origin to its final destination. It shows you exactly where data comes from, how it was transformed, and where it is used. 
- Think of it like a food traceability system: if someone gets sick, you need to be able to trace the lettuce back through the distributor, the processing plant, all the way to the specific farm it came from. 
- In data, lineage helps you "trace the lettuce" when a report looks wrong or a pipeline breaks.

**Why Lineage Is Critical**
- **Trust & Troubleshooting:** When a dashboard shows a weird number, lineage tells you exactly which upstream table or transformation code created that value.
- **Impact Analysis:** Before you delete a column or change a table, lineage shows you every report or downstream process that will break because of that change.
- **Compliance & Auditing:** For regulated industries (like banking or healthcare), lineage provides an audit trail of where data came from and how it was transformed, which helps demonstrate compliance with privacy rules.
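
Troubleshooting and impact analysis are both graph walks, which the sketch below shows in plain Python. The table names come from the tasks in this section, except `ecommerce_prod.bronze.raw_events`, which is a hypothetical source table. The edges between them are assumed for illustration. This is not the Unity Catalog lineage API, only a toy model of the idea.

```python
from collections import deque

# Edges point from an upstream object to the objects built from it.
# bronze.raw_events is hypothetical, and all edges are assumed for illustration.
downstream = {
    "ecommerce_prod.bronze.raw_events": ["ecommerce_prod.silver.cleaned_events"],
    "ecommerce_prod.silver.cleaned_events": ["ecommerce_prod.gold.brand_metrics"],
    "ecommerce_prod.gold.brand_metrics": ["ecommerce_prod.gold.high_value_sales"],
    "ecommerce_prod.gold.high_value_sales": [],
}


def reachable(graph, start):
    """Breadth-first walk returning every node reachable from start, in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def invert(graph):
    """Flip edge direction so the same walk finds upstream sources."""
    flipped = {node: [] for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            flipped.setdefault(dst, []).append(src)
    return flipped


upstream = invert(downstream)

# Impact analysis: what could break if cleaned_events changes?
print(reachable(downstream, "ecommerce_prod.silver.cleaned_events"))

# Troubleshooting: which sources feed the high_value_sales view?
print(reachable(upstream, "ecommerce_prod.gold.high_value_sales"))
```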

### Managed vs external tables

- In Databricks and Delta Lake, the distinction between Managed and External tables centers on who controls the data's lifecycle: the system or you.
- Think of it like the difference between staying in a hotel and renting an unfurnished apartment. In a hotel (Managed), when you "check out" and delete the reservation, the room and everything in it is cleared out by the hotel. In an apartment (External), if you end your lease, your furniture (your data) stays yours, and you have to move it yourself.
- Here is a breakdown of how they operate:

| Feature | Managed Table | External Table |
| ----- | ----- | ----- |
| Data Location | Managed by Databricks (in the "Hive Warehouse" or root storage). | Managed by you (in a specific path you provide, like S3 or ADLS). |
| DROP TABLE | Deletes everything: the metadata (schema) AND the physical data files. | Deletes the link only: the metadata is gone, but the physical data files remain. |
| Storage Path | You don't specify a path; Databricks picks it for you. | You must specify a `LOCATION '/mnt/data/...'`. |
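
The `DROP TABLE` difference can be modeled with a toy metastore on the local filesystem. Everything here is hypothetical (`ToyMetastore`, the table names, the `my_bucket` directory). It is not a Databricks API, and it deletes managed files immediately, whereas a real platform may delay the cleanup.

```python
import shutil
import tempfile
from pathlib import Path


class ToyMetastore:
    """Toy model of DROP TABLE semantics; not a Databricks API."""

    def __init__(self, managed_root: Path):
        self.managed_root = managed_root
        self.tables = {}  # name -> (path, is_managed)

    def create_managed(self, name: str) -> Path:
        # The system chooses the storage path.
        path = self.managed_root / name
        path.mkdir(parents=True)
        self.tables[name] = (path, True)
        return path

    def create_external(self, name: str, location: Path) -> Path:
        # The caller supplies the storage path.
        location.mkdir(parents=True, exist_ok=True)
        self.tables[name] = (location, False)
        return location

    def drop(self, name: str) -> None:
        path, is_managed = self.tables.pop(name)  # metadata is always removed
        if is_managed:
            shutil.rmtree(path)  # data files are deleted only for managed tables


workdir = Path(tempfile.mkdtemp())
store = ToyMetastore(workdir / "managed_root")

managed_path = store.create_managed("events_managed")
external_path = store.create_external("events_external", workdir / "my_bucket" / "events")
for p in (managed_path, external_path):
    (p / "part-0000.parquet").write_text("rows")

store.drop("events_managed")
store.drop("events_external")
print(managed_path.exists())   # False: files deleted with the table
print(external_path.exists())  # True: files left in place
```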


#### Task 1: Create Catalog & Schemas

```
%sql

CREATE CATALOG IF NOT EXISTS ecommerce_prod;

-- Create schemas for each layer
CREATE SCHEMA IF NOT EXISTS ecommerce_prod.bronze;
CREATE SCHEMA IF NOT EXISTS ecommerce_prod.silver;
CREATE SCHEMA IF NOT EXISTS ecommerce_prod.gold;
```

#### Task 2: Register Delta Tables

Updated the code in the `notebook_02_silver` file (under project01) to register the Silver DataFrame as a table:

```
silver_df.write.format("delta") \
    .mode("overwrite") \
    .saveAsTable("ecommerce_prod.silver.cleaned_events")
```

#### Task 3: Set Up Permissions (Access Control)

```
%sql
-- 1. For the catalog
GRANT USE CATALOG ON CATALOG ecommerce_prod TO `marketing_team`;

-- 2. For the schema (Unity Catalog uses USE SCHEMA rather than the legacy USAGE)
GRANT USE SCHEMA ON SCHEMA ecommerce_prod.gold TO `marketing_team`;

-- 3. For the table
GRANT SELECT ON TABLE ecommerce_prod.gold.brand_metrics TO `marketing_team`;
```

#### Task 4: Create Views for Controlled Access

```
%sql

CREATE OR REPLACE VIEW ecommerce_prod.gold.high_value_sales AS
SELECT brand, main_category, total_revenue
FROM ecommerce_prod.gold.brand_metrics
WHERE total_revenue > 500;

GRANT SELECT ON VIEW ecommerce_prod.gold.high_value_sales TO `marketing_team`;
```