### CSV to Delta
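CSV is a row-based, plain-text format that carries no schema of its own, so you typically read it into a DataFrame with explicit reader options and then write it out as a Delta table to get ACID transactions and better query performance.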
**Dataset used:** `dim_customers.csv`

**Schema:**
- customer_id:integer
- name:string
- email:string
- location:string
- signup_date:date

**Approach**
- Read the CSV with PySpark.
- Inspect the schema.
- Write the DataFrame to a bronze table (using either CTAS or a DataFrame write).
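
The cells in this notebook use the DataFrame writer. For comparison, the CTAS route could look roughly like the sketch below: register the DataFrame as a temporary view, then run `CREATE TABLE ... AS SELECT` through `spark.sql`. The helper `write_bronze_ctas` and the view name `stg_dim_customers` are hypothetical names introduced only for illustration.

```python
def write_bronze_ctas(spark, df, table_name, view_name="stg_dim_customers"):
    """Create a Delta table from a DataFrame via CTAS (hypothetical helper)."""
    # Expose the DataFrame to SQL under a session-scoped temporary view.
    df.createOrReplaceTempView(view_name)
    # IF NOT EXISTS makes this a no-op when the table already exists;
    # unlike mode("append"), it never adds rows to an existing table.
    ctas_sql = (
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        f"USING DELTA AS SELECT * FROM {view_name}"
    )
    spark.sql(ctas_sql)
    return ctas_sql

# Intended use in the notebook (not executed here):
#   write_bronze_ctas(spark, df, "sandeshmsdatabricks.bronze.PRACTICE_BRONZE")
```

The design difference matters on re-runs: CTAS with `IF NOT EXISTS` leaves an existing table untouched, whereas an append write adds the same rows again.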

**Modes** (values of the reader's `mode` option):

**PERMISSIVE** mode sets malformed fields to null when it detects a corrupted record. It is the default: if you don't specify the `mode` option, Spark uses PERMISSIVE.

**DROPMALFORMED** mode ignores corrupted records, which means they do not appear in the resulting DataFrame.

**FAILFAST** mode throws an exception when it detects a corrupted record.

In [0]:

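# Import only the functions this notebook uses
from pyspark.sql.functions import col, current_timestamp, split

# Read the source CSV from the Unity Catalog volume
df = (
    spark.read.format("csv")
    .option("header", "true")                   # first line holds the column names
    .option("inferSchema", "true")              # infer column types from the data
    .option("delimiter", ",")
    .option("multiLine", "false")               # each record sits on a single line
    .option("ignoreLeadingWhiteSpace", "true")
    .option("ignoreTrailingWhiteSpace", "true")
    .option("mode", "PERMISSIVE")               # malformed fields become null
    .load("/Volumes/sandeshmsdatabricks/sourcefiles/sourcevolume/DLT_ETL_SOURCE/customers/dim_customers.csv")
)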
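
The approach lists "inspect the schema" as a step, but no cell performs it. One way it could be done is to print the schema and compare `df.dtypes` against the expected types. The helper `schema_mismatches` and the constant `EXPECTED_DTYPES` below are hypothetical names; the expected types are taken from the schema listed at the top of this section.

```python
# Expected column types, written as Spark's short type names
# (the form returned by DataFrame.dtypes).
EXPECTED_DTYPES = {
    "customer_id": "int",
    "name": "string",
    "email": "string",
    "location": "string",
    "signup_date": "date",
}


def schema_mismatches(actual_dtypes, expected=EXPECTED_DTYPES):
    """Return {column: (expected_type, actual_type)} for every column that differs.

    actual_dtypes is a list of (name, type) pairs, the shape of DataFrame.dtypes.
    A column missing from the DataFrame is reported with actual_type None.
    """
    actual = dict(actual_dtypes)
    return {
        name: (dtype, actual.get(name))
        for name, dtype in expected.items()
        if actual.get(name) != dtype
    }


# Intended use in the notebook (not executed here):
#   df.printSchema()
#   print(schema_mismatches(df.dtypes))
```

An empty dictionary means the inferred schema matches the expected one, so the check can also serve as a guard before the write.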



In [0]:
# Add an ingestion timestamp to every row
df = df.withColumn("UPLOAD_DATE", current_timestamp())

In [0]:
# Append to the bronze Delta table (re-running this cell appends the same rows again)
df.write.format("delta").mode("append").saveAsTable("sandeshmsdatabricks.bronze.PRACTICE_BRONZE")

In [0]:
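# Read the bronze table back
df = spark.read.table("sandeshmsdatabricks.bronze.PRACTICE_BRONZE")

# Keep one row per customer_id (which duplicate survives is not deterministic)
df = df.dropDuplicates(["customer_id"])

# Derive the email domain: the part after "@"
df = df.withColumn("domain", split(col("email"), "@").getItem(1))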

In [0]:
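# Overwrite the target table; overwriteSchema allows the table schema to change (here, the added domain column)
df.write.format("delta").mode("overwrite")\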
    .option("overwriteSchema", "true")\
    .saveAsTable("sandeshmsdatabricks.bronze.PRACTICE_Silver")

In [0]:
%sql
DESCRIBE TABLE sandeshmsdatabricks.bronze.PRACTICE_Silver

col_name,data_type,comment
customer_id,int,
name,string,
email,string,
location,string,
signup_date,date,
UPLOAD_DATE,timestamp,
domain,string,


In [0]:
%sql
SELECT * FROM sandeshmsdatabricks.bronze.PRACTICE_Silver

customer_id,name,email,location,signup_date,UPLOAD_DATE,domain
94,Christopher Phelps,phernandez@gmail.com,Port Amanda,2025-02-22,2025-12-13T19:23:15.268Z,gmail.com
29,Juan Schroeder,douglasroger@garcia-yang.com,Crosbyhaven,2024-12-06,2025-12-13T19:23:15.268Z,garcia-yang.com
88,Anthony Cruz,mark06@stanley.biz,Traciview,2023-07-30,2025-12-13T19:23:15.268Z,stanley.biz
108,Antonio Hart,jessegomez@hotmail.com,Port Richard,2023-02-03,2025-12-13T19:23:15.268Z,hotmail.com
131,Kimberly Baker,hortonbethany@jimenez.biz,West Natashastad,2024-04-27,2025-12-13T19:23:15.268Z,jimenez.biz
148,Jeffrey Henson,alexanderwagner@yahoo.com,Lake Tammy,2025-02-28,2025-12-13T19:23:15.268Z,yahoo.com
190,Mark Brown,fflores@gmail.com,East Kendraville,2025-04-11,2025-12-13T19:23:15.268Z,gmail.com
195,Susan Baker,andersonanthony@gmail.com,West Kelly,2023-02-03,2025-12-13T19:23:15.268Z,gmail.com
56,Jeffrey Leonard,khunter@aguirre-avila.com,Halltown,2022-10-22,2025-12-13T19:23:15.268Z,aguirre-avila.com
172,Carolyn Frazier,rhenry@williams-miller.com,East Zachary,2023-04-14,2025-12-13T19:23:15.268Z,williams-miller.com
