### We can read files directly with SELECT, but that approach has limitations.
### To overcome them, we have 2 options:
  1. Table-valued functions (e.g. read_files)
  2. External tables
    - Databricks does not allow us to create an external table on a Unity Catalog VOLUME. Hence for operational_data we used read_files()
    - Since we have not created a Unity Catalog volume on the external_data folder, we CAN use external tables here
    - We can use the abfss protocol to access the data, as we have already created the external location on the gizmobox container itself
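
As a rough sketch of the first option, the table-valued function can be pointed at a folder and queried like a table. The path is the payments folder used later in this notebook. The `format` and `header` settings are assumptions that mirror the CSV options used for the external table below, and the schema is left for read_files to infer.

```sql
-- Sketch: query CSV files through the read_files table-valued function.
-- Assumes the files have a header row; column types are inferred, not declared.
SELECT *
FROM read_files(
  'abfss://gizmobox@dbcertificationsa.dfs.core.windows.net/landing/external_data/payments',
  format => 'csv',
  header => true
)
LIMIT 10;
```

Unlike an external table, this registers nothing in the catalog; the files are listed and parsed each time the query runs.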

## Extract Data From the Payments Files
#### (using EXTERNAL TABLE to query the payments data in bronze schema)
### 1. List the files from the payments folder
### 2. Create External Table
### 3. Demonstrate the effect of adding/updating/deleting files
### 4. Demonstrate the effect of dropping the Table

### 1. List the files from the payments folder

In [0]:
%fs ls 'abfss://gizmobox@dbcertificationsa.dfs.core.windows.net/landing/external_data/payments'

path,name,size,modificationTime
abfss://gizmobox@dbcertificationsa.dfs.core.windows.net/landing/external_data/payments/payments_2024_10.csv,payments_2024_10.csv,487,1747676633000
abfss://gizmobox@dbcertificationsa.dfs.core.windows.net/landing/external_data/payments/payments_2024_11.csv,payments_2024_11.csv,773,1747676633000
abfss://gizmobox@dbcertificationsa.dfs.core.windows.net/landing/external_data/payments/payments_2024_12.csv,payments_2024_12.csv,1157,1747676633000
abfss://gizmobox@dbcertificationsa.dfs.core.windows.net/landing/external_data/payments/payments_2025_01.csv,payments_2025_01.csv,1084,1747676633000


### 2. Create External Table

In [0]:
-- The explicit LOCATION clause is what makes this an external table
CREATE TABLE IF NOT EXISTS gizmobox.bronze.payments
(
  payment_id INTEGER,
  order_id INTEGER,
  payment_timestamp TIMESTAMP,
  payment_status INTEGER,
  payment_method STRING
)
USING CSV
OPTIONS (
  HEADER = 'TRUE',
  DELIMITER = ','
)
LOCATION 'abfss://gizmobox@dbcertificationsa.dfs.core.windows.net/landing/external_data/payments'

In [0]:
SELECT * FROM gizmobox.bronze.payments

payment_id,order_id,payment_timestamp,payment_status,payment_method
35,13,2024-12-17T05:25:34Z,2,PayPal
36,16,2024-12-09T10:45:56Z,4,Bank Transfer
37,17,2024-12-02T13:50:20Z,1,Bank Transfer
38,24,2024-12-22T08:30:55Z,1,PayPal
39,27,2024-12-03T11:45:30Z,2,Credit Card
40,28,2024-12-27T05:40:10Z,4,Credit Card
41,32,2024-12-29T09:20:40Z,1,PayPal
42,37,2024-12-23T12:10:05Z,2,Credit Card
43,38,2024-12-26T20:15:15Z,1,Credit Card
44,44,2024-12-21T14:25:45Z,1,Credit Card


In [0]:
-- To check whether the table is external or managed, look at the Type row under "Detailed Table Information"
DESCRIBE EXTENDED gizmobox.bronze.payments



col_name,data_type,comment
payment_id,int,
order_id,int,
payment_timestamp,timestamp,
payment_status,int,
payment_method,string,
,,
# Detailed Table Information,,
Catalog,gizmobox,
Database,bronze,
Table,payments,
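
The DESCRIBE EXTENDED output is long, so the table type is easy to miss. As an alternative sketch, the same information can usually be read from the catalog's information schema. This assumes the `gizmobox` catalog exposes the standard Unity Catalog `information_schema.tables` view and that you have permission to read it.

```sql
-- Sketch: look up the table type (e.g. EXTERNAL or MANAGED) from the information schema.
SELECT table_catalog,
       table_schema,
       table_name,
       table_type
FROM gizmobox.information_schema.tables
WHERE table_schema = 'bronze'
  AND table_name = 'payments';
```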


### 3. Demonstrate the effect of adding/updating/deleting files
##### If files are added to, updated in, or deleted from the payments folder, the file listing cached for the table can become stale. Queries may then miss new files or fail because a cached file no longer exists. Hence we need to refresh the table metadata whenever files in the folder change.

In [0]:
-- This is done using REFRESH TABLE

REFRESH TABLE gizmobox.bronze.payments

##### When using external tables, it is a good idea to run REFRESH TABLE before querying them.
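
A minimal sketch of that habit: refresh first, then query. The row count here is only an illustrative check; comparing it before and after a file is added to the folder is one way to confirm that the new file was picked up.

```sql
-- Sketch: invalidate the cached file listing, then read the table.
REFRESH TABLE gizmobox.bronze.payments;

-- Illustrative check: the count should change once new files are visible.
SELECT count(*) AS payment_rows
FROM gizmobox.bronze.payments;
```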

### 4. Demonstrate the effect of dropping the Table

In [0]:
-- When we drop an external table, only the metadata is deleted from Unity Catalog. The data files remain in the storage account
DROP TABLE IF EXISTS gizmobox.bronze.payments;

-- But when we drop a managed table, the underlying data files in the storage account are deleted as well
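
To confirm that the files survive the drop, the folder can be listed again. This sketch uses the SQL `LIST` statement as an alternative to the `%fs ls` command from step 1, and assumes the workspace supports `LIST` on Unity Catalog external locations.

```sql
-- Sketch: after DROP TABLE, the CSV files should still be listed here,
-- because dropping an external table removes only its catalog entry.
LIST 'abfss://gizmobox@dbcertificationsa.dfs.core.windows.net/landing/external_data/payments';
```

Re-running the CREATE TABLE statement from step 2 against the same LOCATION would make the data queryable again without copying any files.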
