# DAY 06 - Read/Write to Lakehouse Tables
- YouTube Link: https://www.youtube.com/watch?v=02lSlhwLU4c

### Writing from DataFrame to Lakehouse Table

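```python
# Get the data (spark.read.json expects one JSON object per line by default;
# if the file holds a single multi-line JSON document, add .option("multiLine", True))
df = spark.read.json('Files/json/property-sales.json')

display(df)
```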

### Beware of Limitations of Lakehouse Column Naming
- ##### No Special Characters
  Allowed: Letters, Numbers, Underscore
- ##### No Spaces
- ##### Case Sensitivity Differences
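  Spark preserves the casing of column names but, by default, resolves them case-insensitively (`spark.sql.caseSensitive` is `false`).
  The SQL Endpoint may treat casing differently, so keep casing consistent and never rely on case alone to tell columns apart.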
- ##### Cannot Use SQL Reserved Keywords
  Ex: SELECT, GROUP, ORDER, JOIN, DATE, TABLE, INDEX, NULL
- ##### Cannot Start with a Number
- ##### Column Names Must be Unique
  Fabric does not allow duplicate names, even with different casing.
- ##### Avoid Leading and Trailing Underscores
  These may cause issues during schema inference.
- ##### Renaming Columns in Delta Tables Is Limited
  Once a table is written, its column names are not easy to change. Renaming may require recreating the table or rewriting the data with a new schema.
- ##### Avoid Very Long Column Names
- ##### No Duplicate Column Names Across Joins Without Aliasing
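  A join can produce two columns with the same name (e.g. an `id` column from each table). Spark then reports an ambiguous reference when that column is selected, and the result cannot be saved as a Delta table, so rename or alias duplicates first.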
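The naming rules above can be applied automatically before writing. Below is a minimal sketch in plain Python, assuming a hypothetical helper `sanitize_column_names`, an illustrative (not exhaustive) reserved-word set taken from the examples above, and our own choices of replacement text (`_`, a `c_` prefix, a `_col` suffix, and `_2`, `_3`, ... for duplicates):

```python
import re

# Illustrative subset of SQL reserved words (assumption: extend for your workload)
RESERVED = {"SELECT", "GROUP", "ORDER", "JOIN", "DATE", "TABLE", "INDEX", "NULL"}


def sanitize_column_names(names):
    """Return Lakehouse-friendly versions of the given column names (hypothetical helper)."""
    result = []
    seen = set()  # lower-cased names already used, so uniqueness ignores casing
    for name in names:
        # Keep only letters, digits and underscores; replace everything else with "_"
        clean = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
        # Collapse runs of underscores and drop leading/trailing ones
        clean = re.sub(r"_+", "_", clean).strip("_") or "col"
        # Names must not start with a number
        if clean[0].isdigit():
            clean = "c_" + clean
        # Avoid SQL reserved keywords
        if clean.upper() in RESERVED:
            clean = clean + "_col"
        # Make the name unique, even against names that differ only by case
        candidate, n = clean, 1
        while candidate.lower() in seen:
            n += 1
            candidate = f"{clean}_{n}"
        seen.add(candidate.lower())
        result.append(candidate)
    return result
```

In a notebook it could be applied with `df = df.toDF(*sanitize_column_names(df.columns))`, since `DataFrame.toDF` accepts the complete list of new column names in order.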

### Inspecting the Schema

```python
df.printSchema()

# In the output, (nullable = true) means that the column can contain null values
```

### Renaming Columns

```python
# Change column names before writing to Lakehouse tables.
# General form: df = df.withColumnRenamed("ColumnName", "NewColumnName")
# (withColumnRenamed does nothing if the old column name does not exist)

# Example: remove special characters and trailing spaces
df = (
    df.withColumnRenamed("SalePrice ($)", "SalePrice_USD")
      .withColumnRenamed("Address ", "Address")
      .withColumnRenamed("City ", "City")
)

# Check if the renaming is successful
df.printSchema()
```
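Because `withColumnRenamed` silently does nothing when the old name is not found (an easy mistake with trailing spaces such as `"Address "`), it can help to validate a rename mapping first. A minimal sketch in plain Python; `plan_renames` is a hypothetical helper, not part of PySpark:

```python
def plan_renames(columns, mapping):
    """Return the column list after applying `mapping` (hypothetical helper).

    Raises ValueError if a source column is missing, or if the renamed
    columns contain names that differ only by case.
    """
    # withColumnRenamed would silently skip these, so fail loudly instead
    missing = [old for old in mapping if old not in columns]
    if missing:
        raise ValueError(f"Columns not found: {missing!r}")

    renamed = [mapping.get(c, c) for c in columns]

    # Duplicates are checked case-insensitively
    lowered = [c.lower() for c in renamed]
    dupes = sorted({c for c in renamed if lowered.count(c.lower()) > 1})
    if dupes:
        raise ValueError(f"Duplicate column names (case-insensitive): {dupes!r}")
    return renamed
```

With the dictionary of renames as `mapping`, `df = df.toDF(*plan_renames(df.columns, mapping))` would then apply all renames in one step.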

### Writing a DataFrame to a Table with Different "Modes"

### Managed Table

```python
delta_table_name = 'PropertySales'

# Use saveAsTable to save as a Managed Table
df.write.mode("overwrite").format("delta").saveAsTable(delta_table_name)
```

#### Four (4) Different Write Modes
- Append
- Overwrite
- Error
- Ignore
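To make the difference between the modes concrete, here is a toy model in plain Python that uses a dict as a stand-in for the Lakehouse catalog. It mimics only the outcome of each mode on a whole table and does not call Spark or Delta; `write_table` is a hypothetical name.

```python
def write_table(catalog, name, rows, mode="error"):
    """Toy model of Spark save modes, with a dict standing in for the catalog."""
    exists = name in catalog
    if mode == "append":
        # Add rows to the existing table (or create it if missing)
        catalog.setdefault(name, []).extend(rows)
    elif mode == "overwrite":
        # Replace the table contents entirely (or create it if missing)
        catalog[name] = list(rows)
    elif mode in ("error", "errorifexists"):
        # Refuse to touch an existing table
        if exists:
            raise RuntimeError(f"Table {name!r} already exists")
        catalog[name] = list(rows)
    elif mode == "ignore":
        # Write only if the table does not exist yet; otherwise do nothing
        if not exists:
            catalog[name] = list(rows)
    else:
        raise ValueError(f"Unknown save mode: {mode!r}")
    return catalog
```

In this model, as in Spark, `append` and `overwrite` also create the table when it does not exist yet.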

```python
# These are the 4 different write 'modes'.
# Note: running this whole cell stops at the "error" line, because the table already exists by then.

# Append the new DataFrame's rows to the existing table
df.write.mode("append").format("delta").saveAsTable(delta_table_name)

# Overwrite the existing table with the new DataFrame
df.write.mode("overwrite").format("delta").saveAsTable(delta_table_name)

# Raise an error if the table already exists (the default mode; "errorifexists" is an alias)
df.write.mode("error").format("delta").saveAsTable(delta_table_name)

# Silently skip the write (no error, no change) if the table already exists
df.write.mode("ignore").format("delta").saveAsTable(delta_table_name)
```

### Write to an Unmanaged Table
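- Writes Delta files to a path you choose (here under `Files/`) without registering a table; useful for exporting to an external file system, Databricks, or Snowflake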

```python
# Unmanaged table: save Delta files to a path instead of registering a table
df.write.mode("overwrite").format("delta").save("Files/delta/unmanaged.delta")
```