#### Writing from DataFrame to Lakehouse Table

In [None]:
# First, load the sample property sales JSON file into a DataFrame
df = spark.read.json('Files/pyspark/json/property-sales.json')

display(df)

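#### Beware of the limitations of Lakehouse column naming
- Column names that contain spaces or special characters (such as the parentheses in this dataset) must be renamed before the DataFrame can be written to a Lakehouse table.
- Read more in the Microsoft Fabric documentation: [Load to tables](https://learn.microsoft.com/en-us/fabric/data-engineering/load-to-tables)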

In [None]:
# Inspect the schema
df.printSchema()

In [None]:
# Rename columns whose names contain spaces or special characters
# so that the DataFrame can be written to a Lakehouse table
df = (
    df.withColumnRenamed("SalePrice ($)", "SalePrice_USD")
      .withColumnRenamed("Address ", "Address")
      .withColumnRenamed("City ", "City")
)
display(df)

In [None]:
df.printSchema()
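
The manual renames above work well for three columns; for wider datasets the same cleanup could be scripted. The following is a minimal sketch that is not part of the original notebook: `sanitize_column_name` is a hypothetical helper, and its rule (keep only ASCII letters, digits and underscores) is an assumption that is deliberately stricter than what Lakehouse tables may actually require.

```python
import re


def sanitize_column_name(name: str) -> str:
    """Hypothetical helper: trim whitespace, replace each run of characters
    other than letters, digits or underscores with one underscore, and drop
    leading/trailing underscores."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip())
    return cleaned.strip("_")


# The three problematic column names from this dataset
raw_columns = ["SalePrice ($)", "Address ", "City "]
clean_columns = [sanitize_column_name(c) for c in raw_columns]
print(clean_columns)

# In the notebook, the same mapping could be applied to every column at once:
# df = df.toDF(*[sanitize_column_name(c) for c in df.columns])
```

Note that this sketch turns `SalePrice ($)` into `SalePrice` and so loses the currency hint, and it does not guard against two columns collapsing to the same name. Where the meaning matters, an explicit rename such as `SalePrice_USD` is still the safer choice.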

#### Writing DF to Table, with different 'modes'
Using `saveAsTable`, we save the DataFrame as a 'Managed Table' (Spark terminology), meaning that both the metadata and the data are managed by Spark.

With a managed table, because Spark manages everything, a SQL command such as `DROP TABLE table_name` deletes both the metadata and the data. With an unmanaged table, the same command deletes only the metadata, not the actual data.
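
To check which kind of table you are dealing with before dropping it, the Spark catalog can be queried. This is a minimal sketch, assuming the table lives in the session's current database; `get_table_type` is a hypothetical helper name, while `spark.catalog.listTables()` and the `tableType` attribute are part of the PySpark catalog API.

```python
def get_table_type(spark, table_name):
    """Return the catalog's table type for table_name in the current database
    (for example 'MANAGED' or 'EXTERNAL'), or None if no such table is listed."""
    for table in spark.catalog.listTables():
        # Compare case-insensitively, since the catalog may store names in lowercase
        if table.name.lower() == table_name.lower():
            return table.tableType
    return None


# In the notebook, once the table has been written:
# print(get_table_type(spark, "PropertySales"))
```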

In [None]:
delta_table_name = 'PropertySales'

# use saveAsTable to save as a Managed Table
df.write.mode("overwrite").format("delta").saveAsTable(delta_table_name)


**Four different write modes**

In [None]:
# The four write modes (append, overwrite, error, ignore) are shown one per cell

# Append the rows of the DataFrame to the existing table
df.write.mode("append").format("delta").saveAsTable(delta_table_name)

df_2 = spark.sql("SELECT * FROM OneLake_Shaun.propertysales LIMIT 1000")
df_2.toPandas().info()

In [None]:
# overwrite existing Table with new DataFrame
df.write.mode("overwrite").format("delta").saveAsTable(delta_table_name)

df_2 = spark.sql("SELECT * FROM OneLake_Shaun.propertysales LIMIT 1000")
df_2.toPandas().info()

In [None]:
# Raise an error if the table already exists ("error" is an alias of "errorifexists")
df.write.mode("error").format("delta").saveAsTable(delta_table_name)

df_2 = spark.sql("SELECT * FROM OneLake_Shaun.propertysales LIMIT 1000")
df_2.toPandas().info()

In [None]:
# Skip the write silently (no error, no change) if the table already exists
df.write.mode("ignore").format("delta").saveAsTable(delta_table_name)

df_2 = spark.sql("SELECT * FROM OneLake_Shaun.propertysales LIMIT 1000")
df_2.toPandas().info()
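
The behaviour of the four modes in the cells above can be summarised with a toy model in plain Python. This is only an illustrative sketch, not Spark code: `simulate_write` is a hypothetical function, rows are plain list items, `None` stands for "the table does not exist yet", and `RuntimeError` stands in for the exception Spark would raise.

```python
def simulate_write(existing_rows, new_rows, mode):
    """Toy model of the save modes; returns the table contents after the write."""
    valid_modes = ("append", "overwrite", "error", "errorifexists", "ignore")
    if mode not in valid_modes:
        raise ValueError(f"unknown mode: {mode}")
    if existing_rows is None:
        # No table yet: every mode simply creates it from the new rows
        return list(new_rows)
    if mode == "append":
        return existing_rows + list(new_rows)
    if mode == "overwrite":
        return list(new_rows)
    if mode == "ignore":
        # Existing data is left untouched and no error is raised
        return existing_rows
    # "error" / "errorifexists": refuse to touch an existing table
    raise RuntimeError("table already exists")


existing = ["row1", "row2"]
new = ["row3"]
for mode in ("append", "overwrite", "ignore"):
    print(mode, simulate_write(existing, new, mode))
```

With an existing table, only `append` and `overwrite` change the stored rows; `ignore` leaves them as they were, and `error` stops the cell before anything is written.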

#### Write to an unmanaged Delta table (perhaps for export to an external file system, Databricks or Snowflake)

In [None]:
# Unmanaged table: save() writes the Delta files to a path instead of registering a table by name
df.write.mode("overwrite").format("delta").save(path="Files/pyspark/delta/unmanaged.delta")
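
Because a path-based write like this one is not saved under a table name, reading it back is usually done by path as well. The sketch below assumes the write above succeeded; `read_delta_from_path` is a hypothetical wrapper, while `spark.read.format("delta").load(path)` is the standard DataFrameReader call.

```python
def read_delta_from_path(spark, path):
    """Load a Delta table directly from its storage path instead of a catalog name."""
    return spark.read.format("delta").load(path)


# In the notebook:
# df_unmanaged = read_delta_from_path(spark, "Files/pyspark/delta/unmanaged.delta")
# display(df_unmanaged)
```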


#### Read from Table into DataFrame

In [None]:
df = spark.sql("SELECT * FROM OneLake_Shaun.propertysales LIMIT 1000")
display(df)