# <span style="color:#1f77b4">**Data Analytics 03 - SQL Analysis**</span>

This notebook loads product data into Delta Lake, performs updates, and registers a managed table.



### <span style="color:#1f77b4">**Loading a CSV file into DBFS (Databricks File System)**</span>

Create a DBFS folder and download the product CSV.

- `%sh` runs shell commands in the notebook context.
- `wget` downloads the CSV into DBFS.



In [None]:
%sh
# Create a DBFS folder and download the sample CSV.
# -f and -p keep the cell re-runnable whether or not the folder already exists.
rm -rf /dbfs/delta_lab
mkdir -p /dbfs/delta_lab
wget -O /dbfs/delta_lab/products.csv https://raw.githubusercontent.com/Ch3rry-Pi3-Azure/DataBricks-Data-Analytics/refs/heads/main/data/products.csv
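
The shell cell writes through the `/dbfs` mount, while Spark addresses the same file without that prefix (the next cell reads `/delta_lab/products.csv`). A minimal sketch of that mapping, assuming the standard `/dbfs` local mount; `to_spark_path` is a hypothetical helper for illustration, not a Databricks API:

```python
def to_spark_path(local_path: str) -> str:
    """Map a driver-local /dbfs path to the dbfs:/ URI that Spark would use.

    Hypothetical helper for illustration; assumes the standard /dbfs mount.
    """
    prefix = "/dbfs/"
    if not local_path.startswith(prefix):
        raise ValueError(f"not under the /dbfs mount: {local_path}")
    return "dbfs:/" + local_path[len(prefix):]


print(to_spark_path("/dbfs/delta_lab/products.csv"))
```

Keeping the two forms straight avoids a common mistake: passing a `/dbfs/...` path to `spark.read`, which looks for that literal folder inside DBFS.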

### <span style="color:#1f77b4">**Loading data into a DataFrame**</span>

Read the CSV into a Spark DataFrame with headers.

- `spark.read.load` reads the CSV file.
- `header=True` uses the first row as column names.



In [None]:
# Load the products CSV into a Spark DataFrame
df = spark.read.load('/delta_lab/products.csv', format='csv', header=True)
display(df.limit(10))
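
With `header=True` and no schema or schema inference, Spark's CSV reader returns every column as a string. The plain-Python sketch below shows the same idea with `csv.DictReader`. It is an analogy rather than Spark code, and the two sample rows are invented for illustration (only the `ProductID` and `ListPrice` column names come from this notebook).

```python
import csv
import io

# Invented sample rows; only the column names appear in this notebook.
sample_csv = "ProductID,ListPrice\n1,10.00\n2,25.50\n"

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# The header row supplies the keys, and every value is still a string.
print(rows[0])

# Numeric work needs an explicit cast, much like casting a Spark column.
prices = [float(row["ListPrice"]) for row in rows]
print(prices)
```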

### <span style="color:#1f77b4">**Loading the data into a Delta table**</span>

Write the DataFrame to Delta format on DBFS.

- `write.format("delta")` selects Delta Lake.
- `save` writes the table to the path.



#### <span style="color:#1f77b4">**Storing in DBFS (Databricks File System)**</span>

Define the Delta table storage path in DBFS.

- `delta_table_path` is the location used by Delta.



In [None]:
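delta_table_path = "/delta/products-delta"
# mode("overwrite") lets the cell be re-run; the default mode fails if the path already exists
df.write.format("delta").mode("overwrite").save(delta_table_path)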

### <span style="color:#1f77b4">**Manipulating the Delta table through a DeltaTable object**</span>

Create a DeltaTable object and apply an update.

- `DeltaTable.forPath` opens the Delta table.
- `update` modifies records in place.



In [None]:
from delta.tables import DeltaTable

# Create a DeltaTable object for the table stored at delta_table_path
deltaTable = DeltaTable.forPath(spark, delta_table_path)

# Update the table (reduce the price of product 771 by 10%)
deltaTable.update(
    condition="ProductID == 771",
    set={"ListPrice": "ListPrice * 0.9"},
)

# View the updated data as a DataFrame
deltaTable.toDF().show(10)
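
Conceptually, `update` evaluates the condition for each row and applies the `set` expressions only where it holds. The sketch below models that on a list of dictionaries. It is a mental model, not how Delta Lake executes the operation, and the sample rows and prices are invented; `apply_update` is a hypothetical helper.

```python
from decimal import Decimal

# Invented sample rows for illustration.
products = [
    {"ProductID": 770, "ListPrice": Decimal("100.00")},
    {"ProductID": 771, "ListPrice": Decimal("200.00")},
]


def apply_update(rows, condition, assignments):
    """Return new rows, applying `assignments` where `condition` is true."""
    updated = []
    for row in rows:
        new_row = dict(row)
        if condition(row):
            for column, expression in assignments.items():
                new_row[column] = expression(row)
        updated.append(new_row)
    return updated


result = apply_update(
    products,
    condition=lambda r: r["ProductID"] == 771,
    assignments={"ListPrice": lambda r: r["ListPrice"] * Decimal("0.9")},
)
print(result)
```

Only the row matching the condition changes; every other row is carried over untouched, which mirrors the effect of the `condition` and `set` arguments in the cell above.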

### <span style="color:#1f77b4">**Creating a DataFrame from the Delta table**</span>

Read the Delta table back into a DataFrame.

- `spark.read.format("delta").load` loads Delta data.



In [None]:
new_df = spark.read.format("delta").load(delta_table_path)
new_df.show(10)
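
The same reader can also load an earlier version of the table through the `versionAsOf` option. A small sketch, assuming the notebook's `spark` session is passed in; `read_delta_version` is a hypothetical wrapper, not part of Delta Lake:

```python
def read_delta_version(spark, path, version):
    """Load a specific version of the Delta table stored at `path`."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)  # Delta time-travel read option
        .load(path)
    )
```

In this notebook, `read_delta_version(spark, delta_table_path, 0).show(10)` would be expected to show the data as first written, before the price update.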

### <span style="color:#1f77b4">**Exploring the Delta table history**</span>

Inspect Delta transaction history for auditing.

- `history(10)` returns the 10 most recent commits, newest first.



In [None]:
# Show up to 10 commits, untruncated, one field per line
deltaTable.history(10).show(10, truncate=False, vertical=True)
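
`history` is backed by the table's transaction log: JSON commit files under the `_delta_log` directory, with one action per line. A rough sketch of pulling the operation name out of one commit, using an invented and heavily trimmed commit body (real commits carry more fields and more actions); `commit_operation` is a hypothetical helper:

```python
import json

# Invented and trimmed commit content; real files contain more actions and fields.
sample_commit = "\n".join([
    json.dumps({"commitInfo": {"operation": "UPDATE", "timestamp": 0}}),
    json.dumps({"add": {"path": "part-00000.parquet"}}),
])


def commit_operation(commit_text: str):
    """Return the operation recorded in a commit's commitInfo action, if any."""
    for line in commit_text.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        if "commitInfo" in action:
            return action["commitInfo"].get("operation")
    return None


print(commit_operation(sample_commit))
```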

### <span style="color:#1f77b4">**Creating a Data Catalog Table**</span>

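Create a managed table in the metastore from the original DataFrame `df`. Because `df` was read from the CSV, this table does not include the price update applied to the path-based Delta table.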

- `saveAsTable` registers the table in the catalog.



In [None]:
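# mode("overwrite") lets the cell be re-run; the default mode fails if the table already exists
df.write.format("delta").mode("overwrite").saveAsTable("default.ProductsManaged")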


### <span style="color:#1f77b4">**Accessing the Data Catalog Table**</span>

Query the managed table with SQL.

- `%sql` runs SQL in a notebook cell.
- `SELECT` reads from the catalog table.



In [None]:
%sql
USE default;
SELECT * FROM ProductsManaged;