#### Data Storage

This section is a step-by-step guide to storing the transformed data in Delta Lake. We write the transformed data in Delta format, which is optimized for efficient storage and fast queries in Spark.

### Step 1: Import Required Libraries

In [None]:
import os
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
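from delta import configure_spark_with_delta_pip

# Build a Spark session with the Delta Lake SQL extension and catalog enabled
builder = SparkSession.builder \
    .appName("Customer Churn Data Storage") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

# Assumption: delta-spark was installed with pip, so this helper adds the matching Delta JARs.
# On a platform that already bundles Delta Lake, call builder.getOrCreate() directly instead.
spark = configure_spark_with_delta_pip(builder).getOrCreate()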

print("Spark session initialized with Delta Lake.")


### Step 2: Load Transformed Data
Here, you will load the transformed data (as a Spark DataFrame) that we created in earlier steps.

In [None]:
# Assumes the data is already transformed and stored in a DataFrame `df_spark`
df_spark.show(5)  # Display the first 5 rows to confirm data

# If the data is stored as a CSV file or any other format, load it like this:
# df_spark = spark.read.csv("path_to_transformed_data.csv", header=True, inferSchema=True)


### Step 3: Write Data to Delta Lake
Delta Lake is a storage layer that provides ACID transactions, scalable metadata handling, and data versioning. The code below writes our Spark DataFrame to Delta Lake.

In [None]:
# Define the Delta table path
delta_table_path = "C:/path_to_store_delta_table/customer_churn_delta"

# Write data to Delta Lake (overwrite if the table already exists)
df_spark.write.format("delta").mode("overwrite").save(delta_table_path)

print("Data stored in Delta Lake successfully.")


The `mode(...)` call controls what happens when the Delta table already exists. Common choices are:

- append: Adds new data to an existing Delta table.

- overwrite: Replaces the contents of the existing Delta table with the new data.
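
Because `overwrite` replaces the current table contents, it can help to route writes through a small wrapper that defaults to a non-destructive mode. The following is a minimal sketch: `write_delta` and `VALID_SAVE_MODES` are hypothetical names introduced here, not part of PySpark or Delta Lake. The set lists the save modes accepted by Spark's `DataFrameWriter.mode()`, which also include `ignore` and `error`/`errorifexists`.

```python
# Save modes accepted by Spark's DataFrameWriter.mode().
VALID_SAVE_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}


def write_delta(df, path, mode="errorifexists"):
    """Write `df` to `path` in Delta format, rejecting unknown save modes early.

    Hypothetical helper: `df` is expected to be a Spark DataFrame.
    The default mode fails if the table already exists instead of replacing it.
    """
    if mode not in VALID_SAVE_MODES:
        raise ValueError(
            f"Unsupported save mode {mode!r}; expected one of {sorted(VALID_SAVE_MODES)}"
        )
    df.write.format("delta").mode(mode).save(path)
```

With the variables from this notebook, a call would look like `write_delta(df_spark, delta_table_path, mode="append")`.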

### Step 4: Verify Delta Table Creation
To verify that the data has been written correctly to Delta format, we can read it back and inspect the first few rows.

In [None]:
# Read the data back from the Delta table
df_delta = spark.read.format("delta").load(delta_table_path)

# Show the data
df_delta.show(5)
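
Showing a few rows confirms that the table loads, but it does not prove that everything was written. A slightly stronger check, sketched below, compares column names and row counts between the source and the stored DataFrame. `verify_round_trip` is a hypothetical helper, and the comparison only makes sense directly after an `overwrite`, when the table should contain exactly the rows of the source.

```python
def verify_round_trip(source_df, stored_df):
    """Compare column names and row counts of two Spark DataFrames.

    Hypothetical helper: it checks shape only, not cell-level equality.
    Returns the row count when both checks pass.
    """
    if source_df.columns != stored_df.columns:
        raise AssertionError(
            f"Column mismatch: {source_df.columns} != {stored_df.columns}"
        )
    source_rows, stored_rows = source_df.count(), stored_df.count()
    if source_rows != stored_rows:
        raise AssertionError(f"Row count mismatch: {source_rows} != {stored_rows}")
    return stored_rows
```

In this notebook the call would be `verify_round_trip(df_spark, df_delta)`.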


### Step 5: Perform Time Travel

Delta Lake supports time travel, which allows you to query previous versions of the data. This is useful when you want to roll back to an earlier state of your data or perform audits.

In [None]:
# Show the history of the Delta table
from delta.tables import DeltaTable

delta_table = DeltaTable.forPath(spark, delta_table_path)
delta_table.history().show()  # Show the history of Delta table operations

# Query the table as it existed at a specific version
df_version = spark.read.format("delta").option("versionAsOf", 0).load(delta_table_path)
df_version.show(5)  # Show data from version 0 (or any version you choose)
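
To make the version numbers concrete, here is a toy in-memory model of how each write produces a new version that can still be read later. It is a conceptual sketch only: `VersionedTable` is a hypothetical class, the rows are made-up values, and Delta Lake itself records changes in a transaction log rather than keeping a full copy of the data per version.

```python
class VersionedTable:
    """Toy model of a table in which every write creates a new version."""

    def __init__(self):
        self._snapshots = []  # index in this list == version number

    def write(self, rows, mode="overwrite"):
        """Commit `rows` and return the new version number."""
        if mode not in ("overwrite", "append"):
            raise ValueError(f"Unsupported mode: {mode!r}")
        if mode == "append" and self._snapshots:
            snapshot = self._snapshots[-1] + list(rows)
        else:
            snapshot = list(rows)
        self._snapshots.append(snapshot)
        return len(self._snapshots) - 1

    def read(self, version_as_of=None):
        """Return the latest snapshot, or the one at `version_as_of`."""
        if version_as_of is None:
            version_as_of = len(self._snapshots) - 1
        if not 0 <= version_as_of < len(self._snapshots):
            raise ValueError(f"No such version: {version_as_of}")
        return list(self._snapshots[version_as_of])


table = VersionedTable()
table.write([{"CustomerID": 1}])                 # creates version 0
table.write([{"CustomerID": 2}], mode="append")  # creates version 1
print(table.read(version_as_of=0))  # [{'CustomerID': 1}]
print(table.read())                 # [{'CustomerID': 1}, {'CustomerID': 2}]
```

Reading version 0 after the append still returns the original single row, which is the behavior `versionAsOf` provides for the real table.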


### Step 6: Create or Update Delta Table 

If you need to perform further transformations and update the existing Delta table, you can use Delta’s MERGE operation for efficient updates.

In [None]:
from delta.tables import DeltaTable

# Load Delta table
delta_table = DeltaTable.forPath(spark, delta_table_path)

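# Example: Merge new data into the existing Delta table (an upsert keyed on CustomerID)
# Matched rows: only MonthlyCharges is updated.
# Unmatched rows: inserted with CustomerID and MonthlyCharges; columns not listed are left null.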
delta_table.alias("old") \
    .merge(df_spark.alias("new"), "old.CustomerID = new.CustomerID") \
    .whenMatchedUpdate(set={"MonthlyCharges": "new.MonthlyCharges"}) \
    .whenNotMatchedInsert(values={"CustomerID": "new.CustomerID", "MonthlyCharges": "new.MonthlyCharges"}) \
    .execute()

print("Delta table updated successfully.")
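
The effect of the two MERGE clauses can be illustrated in plain Python. This is a sketch of the upsert semantics only, not of how Delta Lake executes a merge: `merge_upsert` is a hypothetical function, and the `Churn` column and all row values are made up for the example.

```python
def merge_upsert(target_rows, source_rows, key="CustomerID",
                 update_cols=("MonthlyCharges",)):
    """Return the rows of the target after upserting the source rows on `key`."""
    # Column layout of the target; columns not set on insert stay None (null).
    columns = list(target_rows[0]) if target_rows else [key, *update_cols]
    merged = {row[key]: dict(row) for row in target_rows}
    for new in source_rows:
        if new[key] in merged:
            # Matched: update only the listed columns, keep the rest unchanged.
            for column in update_cols:
                merged[new[key]][column] = new[column]
        else:
            # Not matched: insert the key and the listed columns only.
            inserted = dict.fromkeys(columns)
            inserted[key] = new[key]
            for column in update_cols:
                inserted[column] = new[column]
            merged[new[key]] = inserted
    return list(merged.values())


target = [{"CustomerID": 1, "MonthlyCharges": 10.0, "Churn": "No"}]
source = [
    {"CustomerID": 1, "MonthlyCharges": 12.5, "Churn": "Yes"},
    {"CustomerID": 2, "MonthlyCharges": 20.0, "Churn": "No"},
]
for row in merge_upsert(target, source):
    print(row)
# {'CustomerID': 1, 'MonthlyCharges': 12.5, 'Churn': 'No'}
# {'CustomerID': 2, 'MonthlyCharges': 20.0, 'Churn': None}
```

Customer 1 keeps its original `Churn` value because only `MonthlyCharges` is updated, and customer 2 is inserted without a `Churn` value because the insert clause lists only two columns. If new rows should carry every column, the insert mapping needs to list them all.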


### Step 7: Conclusion


In [None]:
print("Data storage in Delta Lake completed successfully.")
print("1. Transformed data stored in Delta format.")
print("2. Time travel and versioning enabled for future queries.")
print("3. Delta table ready for further updates and queries.")


### Summary of Key Points
- Delta Lake allows you to store data in a transactional manner and perform time travel.

- Use `write.format("delta")` to store data in Delta format.

- Delta tables support *ACID* transactions, which means you can safely add, delete, and modify data over time.

- You can perform time travel and query historical versions of the data using the `versionAsOf` option.

- The MERGE operation can be used for efficient upserts into Delta tables.