This notebook performs basic cleanup of the data while moving files from the Bronze layer to the Silver layer in ADLS.
The cleanup consists of setting dates to the required standard format of "MM-dd-yyyy" across all table columns:

In [0]:
dbutils.fs.ls("/mnt/bronze")

[FileInfo(path='dbfs:/mnt/bronze/SalesLT/', name='SalesLT/', size=0, modificationTime=0)]

In [0]:
%fs ls "/mnt/bronze/SalesLT"

path,name,size,modificationTime
dbfs:/mnt/bronze/SalesLT/Address/,Address/,0,0
dbfs:/mnt/bronze/SalesLT/Customer/,Customer/,0,0
dbfs:/mnt/bronze/SalesLT/CustomerAddress/,CustomerAddress/,0,0
dbfs:/mnt/bronze/SalesLT/Product/,Product/,0,0
dbfs:/mnt/bronze/SalesLT/ProductCategory/,ProductCategory/,0,0
dbfs:/mnt/bronze/SalesLT/ProductDescription/,ProductDescription/,0,0
dbfs:/mnt/bronze/SalesLT/ProductModel/,ProductModel/,0,0
dbfs:/mnt/bronze/SalesLT/ProductModelProductDescription/,ProductModelProductDescription/,0,0
dbfs:/mnt/bronze/SalesLT/SalesOrderDetail/,SalesOrderDetail/,0,0
dbfs:/mnt/bronze/SalesLT/SalesOrderHeader/,SalesOrderHeader/,0,0


Check one of the tables to see the current format of its date columns:

In [0]:
df = spark.read.format("parquet").load("/mnt/bronze/SalesLT/ProductModel/ProductModel.parquet")

In [0]:
display(df)

ProductModelID,Name,CatalogDescription,rowguid,ModifiedDate
1,Classic Vest,,29321d47-1e4c-4aac-887c-19634328c25e,2007-06-01T00:00:00Z
2,Cycling Cap,,474fb654-3c96-4cb9-82df-2152eeffbdb0,2005-06-01T00:00:00Z
3,Full-Finger Gloves,,a75483fe-3c47-4aa4-93cf-664b51192987,2006-06-01T00:00:00Z
4,Half-Finger Gloves,,14b56f2a-d4aa-40a4-b9a2-984f165ed702,2006-06-01T00:00:00Z
5,HL Mountain Frame,,fdd5407b-c2db-49d1-a86b-c13a2e3582a2,2005-06-01T00:00:00Z
6,HL Road Frame,,4d332ecc-48b3-4e04-b7e7-227f3ac2a7ec,2002-05-02T00:00:00Z
7,HL Touring Frame,,d60ed2a5-c100-4c54-89a1-531404c4a20f,2009-05-16T16:34:28.98Z
8,LL Mountain Frame,,65bf3f6d-bcf2-4db6-8515-fc5c57423037,2006-11-20T09:56:38.273Z
9,LL Road Frame,,ddc67a2f-024a-4446-9b54-3c679baba708,2005-06-01T00:00:00Z
10,LL Touring Frame,,66c63844-2a24-473c-96d5-d3b3fd57d834,2009-05-16T16:34:28.98Z


In [0]:
from pyspark.sql.functions import date_format

df = df.withColumn("ModifiedDate", date_format("ModifiedDate", "MM-dd-yyyy"))
display(df)


ProductModelID,Name,CatalogDescription,rowguid,ModifiedDate
1,Classic Vest,,29321d47-1e4c-4aac-887c-19634328c25e,06-01-2007
2,Cycling Cap,,474fb654-3c96-4cb9-82df-2152eeffbdb0,06-01-2005
3,Full-Finger Gloves,,a75483fe-3c47-4aa4-93cf-664b51192987,06-01-2006
4,Half-Finger Gloves,,14b56f2a-d4aa-40a4-b9a2-984f165ed702,06-01-2006
5,HL Mountain Frame,,fdd5407b-c2db-49d1-a86b-c13a2e3582a2,06-01-2005
6,HL Road Frame,,4d332ecc-48b3-4e04-b7e7-227f3ac2a7ec,05-02-2002
7,HL Touring Frame,,d60ed2a5-c100-4c54-89a1-531404c4a20f,05-16-2009
8,LL Mountain Frame,,65bf3f6d-bcf2-4db6-8515-fc5c57423037,11-20-2006
9,LL Road Frame,,ddc67a2f-024a-4446-9b54-3c679baba708,06-01-2005
10,LL Touring Frame,,66c63844-2a24-473c-96d5-d3b3fd57d834,05-16-2009
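
For readers less familiar with Spark's datetime patterns, the conversion above can be mimicked in plain Python. This is only an illustrative sketch, not part of the pipeline: the helper name `to_mm_dd_yyyy` is hypothetical, and it assumes the input is an ISO-8601 string whose first ten characters are `yyyy-MM-dd`, as in the values displayed earlier. Like `date_format`, it returns a string rather than a date.

```python
from datetime import datetime


def to_mm_dd_yyyy(iso_value: str) -> str:
    """Reformat an ISO-8601 timestamp string as MM-dd-yyyy (hypothetical helper)."""
    # Only the date part (first 10 characters, yyyy-MM-dd) matters here.
    parsed = datetime.strptime(iso_value[:10], "%Y-%m-%d")
    # %m-%d-%Y is the strftime counterpart of Spark's "MM-dd-yyyy" pattern.
    return parsed.strftime("%m-%d-%Y")


print(to_mm_dd_yyyy("2009-05-16T16:34:28.98Z"))
```

For the HL Touring Frame value this prints `05-16-2009`, which matches the row displayed above. Because the result is text, sorting or comparing these values later will be lexical, not chronological.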


Now we apply the same logic across all tables, as shown below:

1. Getting table names from the base location `/mnt/bronze/SalesLT`:

In [0]:
dbutils.fs.ls("/mnt/bronze/SalesLT/")

[FileInfo(path='dbfs:/mnt/bronze/SalesLT/Address/', name='Address/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/bronze/SalesLT/Customer/', name='Customer/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/bronze/SalesLT/CustomerAddress/', name='CustomerAddress/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/bronze/SalesLT/Product/', name='Product/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/bronze/SalesLT/ProductCategory/', name='ProductCategory/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/bronze/SalesLT/ProductDescription/', name='ProductDescription/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/bronze/SalesLT/ProductModel/', name='ProductModel/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/bronze/SalesLT/ProductModelProductDescription/', name='ProductModelProductDescription/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/bronze/SalesLT/SalesOrderDetail/', name='SalesOrderDetail/', size=0, modificat

In [0]:
bronzepath = "/mnt/bronze/SalesLT/"

# Each entry name ends with "/", so strip it to get the bare table name.
alltables = [table.name.rstrip("/") for table in dbutils.fs.ls(bronzepath)]

alltables

['Address',
 'Customer',
 'CustomerAddress',
 'Product',
 'ProductCategory',
 'ProductDescription',
 'ProductModel',
 'ProductModelProductDescription',
 'SalesOrderDetail',
 'SalesOrderHeader']
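
The name extraction can also be written as a small helper that is testable outside Databricks. This is a minimal sketch, assuming each listing entry exposes a `name` attribute and that directory names end with `/`, as in the listing output above. `FakeEntry` and `table_names` are hypothetical names used only for illustration, and the `_SUCCESS` entry is a made-up stray file.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.ls.
FakeEntry = namedtuple("FakeEntry", ["path", "name"])


def table_names(entries):
    """Return bare directory names, skipping anything that is not a directory."""
    return [e.name.rstrip("/") for e in entries if e.name.endswith("/")]


entries = [
    FakeEntry("dbfs:/mnt/bronze/SalesLT/Address/", "Address/"),
    FakeEntry("dbfs:/mnt/bronze/SalesLT/Customer/", "Customer/"),
    FakeEntry("dbfs:/mnt/bronze/SalesLT/_SUCCESS", "_SUCCESS"),  # made-up stray file
]
print(table_names(entries))
```

Filtering on the trailing `/` keeps stray files in the base folder from being treated as tables.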

2. Applying the date logic to every date column in all tables:

In [0]:
for table in alltables:
    # bronzepath already ends with "/", so no extra separator is needed here.
    table_path = bronzepath + table + "/" + table + ".parquet"
    df = spark.read.format("parquet").load(table_path)

    # Reformat every column whose name contains "date" (case-insensitive).
    for column in df.columns:
        if "date" in column.lower():
            print("Transforming column: " + column + " in table: " + table)
            df = df.withColumn(column, date_format(column, "MM-dd-yyyy"))

    silver_path = "/mnt/silver/SalesLT/" + table + "/"
    print("Saving table: " + table + " at path: " + silver_path)
    df.write.mode("overwrite").format("delta").save(silver_path)



Transforming column: ModifiedDate in table: Address
Saving table: Address at path: /mnt/silver/SalesLT/Address/
Transforming column: ModifiedDate in table: Customer
Saving table: Customer at path: /mnt/silver/SalesLT/Customer/
Transforming column: ModifiedDate in table: CustomerAddress
Saving table: CustomerAddress at path: /mnt/silver/SalesLT/CustomerAddress/
Transforming column: SellStartDate in table: Product
Transforming column: SellEndDate in table: Product
Transforming column: DiscontinuedDate in table: Product
Transforming column: ModifiedDate in table: Product
Saving table: Product at path: /mnt/silver/SalesLT/Product/
Transforming column: ModifiedDate in table: ProductCategory
Saving table: ProductCategory at path: /mnt/silver/SalesLT/ProductCategory/
Transforming column: ModifiedDate in table: ProductDescription
Saving table: ProductDescription at path: /mnt/silver/SalesLT/ProductDescription/
Transforming column: ModifiedDate in table: ProductModel
Saving table: ProductModel at path: /mnt/sil
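
The loop above picks columns by name and builds paths by string concatenation. Both steps can be pulled out into small pure functions, sketched below under stated assumptions: `temporal_columns` and `table_parquet_path` are hypothetical helper names, the input to `temporal_columns` has the shape of `DataFrame.dtypes` (a list of `(name, type)` pairs), and `sample_dtypes` is made-up data, not the real schema of any table here. Selecting by type rather than by name avoids false matches such as a text column called `UpdateNotes`, whose name happens to contain "date".

```python
import posixpath

# Type names as they appear in DataFrame.dtypes; "timestamp_ntz" is assumed
# to be relevant only on Spark versions that provide that type.
TEMPORAL_TYPES = {"date", "timestamp", "timestamp_ntz"}


def temporal_columns(dtypes):
    """Pick column names whose type is a date or timestamp.

    `dtypes` is a list of (name, type) pairs, the shape of DataFrame.dtypes.
    """
    return [name for name, dtype in dtypes if dtype in TEMPORAL_TYPES]


def table_parquet_path(base, table):
    """Build <base>/<table>/<table>.parquet without doubled slashes."""
    return posixpath.join(base, table, table + ".parquet")


# Made-up schema for illustration only.
sample_dtypes = [
    ("ProductID", "int"),
    ("SellStartDate", "timestamp"),
    ("UpdateNotes", "string"),
]
print(temporal_columns(sample_dtypes))
print(table_parquet_path("/mnt/bronze/SalesLT/", "Product"))
```

Inside the notebook, the first helper would be called as `temporal_columns(df.dtypes)`. Whether type-based selection fits depends on how the Bronze files were written: if dates arrived as plain strings, the name-based check is still needed.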