###Process the Customers Data
1. Ingest the data into the lakehouse - bronze_customers
2. Perform data quality checks and transform the data as required - silver_customers_clean
3. Apply changes to the Customers data - silver_customers

####1. Ingest the data into the lakehouse : bronze_customers

In [0]:
-- CREATING DELTA LIVE TABLES in SQL

CREATE OR REFRESH STREAMING TABLE bronze_customers
COMMENT 'Raw customers data ingested from the source system operational data into the dlt bronze_customers table'
TBLPROPERTIES( 'quality' =  'bronze')
AS
SELECT *,
       _metadata.file_path AS input_file_path,
       current_timestamp AS ingestion_timestamp
FROM cloud_files('/Volumes/circuitbox/landing/operational_data/customers/', 'json', map('cloudFiles.inferColumnTypes', 'true'));

Name,Type
created_date,string
customer_id,bigint
customer_name,string
date_of_birth,string
email,string
telephone,string
_rescued_data,string
input_file_path,string
ingestion_timestamp,timestamp


In [0]:
-- Running the code above on its own is not enough. We also need to create a Delta Live Tables pipeline (ETL pipeline) to ingest the data from the landing schema into the bronze table.
-- This can be done either by clicking the "Create Pipeline" button or by going to the Workflows menu and clicking "Create ETL Pipeline".
-- After creating the pipeline, we can start it and monitor its progress in the Workflows menu.
-- If the pipeline fails with resource-related errors, we can go to Quotas and increase the resource limits for the workspace.
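
Once a pipeline update has completed, a quick query against the bronze table can confirm that the landing files were picked up. This is a minimal sketch to run in a regular SQL cell outside the pipeline. It assumes the pipeline publishes to a schema named `lakehouse` in the `circuitbox` catalog; that schema name is hypothetical, so substitute the target schema configured for the pipeline.

```sql
-- Hypothetical target: circuitbox.lakehouse (replace with the pipeline's catalog and schema)
SELECT input_file_path,
       COUNT(*)                 AS row_count,
       MIN(ingestion_timestamp) AS first_ingested_at,
       MAX(ingestion_timestamp) AS last_ingested_at
FROM circuitbox.lakehouse.bronze_customers
GROUP BY input_file_path
ORDER BY last_ingested_at DESC;
```

Each source file should appear once, with a row count matching the number of JSON records it contains.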

####2. Perform data quality checks and transform the data as required : silver_customers_clean
#####STREAM(LIVE.bronze_customers) reads the bronze_customers table as a streaming source, so each pipeline update loads only the rows added since the previous update, i.e., the data is processed incrementally

In [0]:
CREATE OR REFRESH STREAMING TABLE silver_customers_clean(
  CONSTRAINT valid_customer_id EXPECT (customer_id IS NOT NULL) ON VIOLATION FAIL UPDATE,
  CONSTRAINT valid_customer_name EXPECT (customer_name IS NOT NULL) ON VIOLATION DROP ROW,
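  -- No ON VIOLATION clause below: violating rows are kept and only counted in the data quality metrics
  CONSTRAINT valid_telephone EXPECT(LENGTH(telephone) >= 10),
  CONSTRAINT valid_email EXPECT(email IS NOT NULL),
  CONSTRAINT valid_date_of_birth EXPECT(date_of_birth >= '1920-01-01')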
)
COMMENT 'Cleaned customers data ingested from the bronze_customers table'
TBLPROPERTIES( 'quality' =  'silver')
AS
SELECT customer_id,
       customer_name,
       CAST(date_of_birth AS DATE) as date_of_birth,
       telephone,
       email,
       CAST(created_date AS DATE) AS created_date
FROM STREAM(LIVE.bronze_customers);
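
To preview how the constraints above would classify individual rows, the same predicates can be evaluated as ordinary boolean columns. This is a minimal sketch in plain Spark SQL, run outside the pipeline; the three sample rows are made up for illustration.

```sql
-- Made-up sample rows; in practice, select from the bronze table instead
WITH sample_customers AS (
  SELECT * FROM VALUES
    (1,    'Ann Lee', '0123456789', 'ann@example.com', '1985-04-12'),
    (2,    NULL,      '12345',      NULL,              '1901-01-01'),
    (NULL, 'Bo Chen', '0987654321', 'bo@example.com',  '1990-09-30')
  AS t(customer_id, customer_name, telephone, email, date_of_birth)
)
SELECT customer_id,
       customer_id IS NOT NULL                     AS valid_customer_id,    -- false would fail the update
       customer_name IS NOT NULL                   AS valid_customer_name,  -- false would drop the row
       LENGTH(telephone) >= 10                     AS valid_telephone,      -- false is only recorded
       email IS NOT NULL                           AS valid_email,          -- false is only recorded
       CAST(date_of_birth AS DATE) >= '1920-01-01' AS valid_date_of_birth   -- false is only recorded
FROM sample_customers;
```

Running a check like this against the bronze data first helps decide whether `FAIL UPDATE` is too strict for a given constraint.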
  

####3. Apply changes to the Customers data : silver_customers
1. silver_customers is an SCD Type 1 dimension table with customer_id as its primary key. Changes to the customers data are applied in created_date order, so the latest record for each customer overwrites the previous one
2. Databricks offers the APPLY CHANGES API to do this
3. The APPLY CHANGES API does not create the target table, so we must create it up front
4. When using this API, the target table must also be a streaming table

In [0]:
-- Creating the streaming table

CREATE OR REFRESH STREAMING TABLE silver_customers
  COMMENT 'SCD Type 1 customer data'
  TBLPROPERTIES( 'quality' =  'silver');


In [0]:
--Using APPLY CHANGES API 

APPLY CHANGES INTO LIVE.silver_customers
FROM STREAM(LIVE.silver_customers_clean)
KEYS(customer_id)
SEQUENCE BY created_date
STORED AS SCD Type 1; -- Optional. Type 1 is the default value
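
Conceptually, SCD Type 1 keeps one row per key: the one with the highest sequencing value. The query below is a rough equivalent in plain Spark SQL over made-up rows. It illustrates the outcome only and is not how APPLY CHANGES is implemented.

```sql
-- Made-up change records: customer 1 appears twice with different created_date values
WITH changes AS (
  SELECT * FROM VALUES
    (1, 'Ann Lee',   DATE'2024-01-05'),
    (1, 'Ann Leigh', DATE'2024-03-17'),
    (2, 'Bo Chen',   DATE'2024-02-01')
  AS t(customer_id, customer_name, created_date)
),
ranked AS (
  -- Rank each customer's records from newest to oldest by the sequencing column
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_date DESC) AS rn
  FROM changes
)
SELECT customer_id, customer_name, created_date
FROM ranked
WHERE rn = 1
ORDER BY customer_id;
```

Customer 1 resolves to the 2024-03-17 record, which is the row a Type 1 target would hold after both changes are applied.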