# Guided Project : Data Management with Databricks: Big Data with <img src="https://docs.delta.io/latest/_static/delta-lake-logo.png" width=300/>

<img src="https://upload.wikimedia.org/wikipedia/commons/9/97/Coursera-Logo_600x600.svg" width=50 height=50/>

**Project Scenario**: You are a Data Engineer working for an online clothing retailer that sells a wide range of fashion brands. The company's Supply Chain team has been tasked with building a dashboard to **analyze order history**.

Your dashboard will be used to inform purchasing behaviour and ensure that the company has enough inventory to meet demand for the upcoming holiday season.

Throughout this real-world business scenario, you will learn how to create a Delta table and ingest data into it. You will then use Databricks notebooks (with Python and SQL) to process and transform the data and produce the Supply Chain dashboard. Finally, you will use Delta Lake's built-in features, such as merge operations and time travel, to create a scalable data pipeline.

# TASK 2 - Upload project JSON files to Databricks file system

In [0]:
# First, check that the "DBFS File Browser" setting is enabled. Navigate to "Settings > Admin > Workspace settings" to check.

### a. Upload ORDERS JSON files to the Databricks File System

In [0]:
## Load Data Using the UI to this path dbfs:/FileStore/SupplyChain/ORDERS_RAW/

### b. Check loaded files

In [0]:
# Use Databricks Utilities (dbutils). Documentation : https://docs.databricks.com/dev-tools/databricks-utils.html#ls-command-dbutilsfsls 
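
One way to confirm the upload is sketched below. It assumes the notebook runs on Databricks, where `dbutils` is predefined, and that the files sit under the path from step a. The helper name `list_json_files` is illustrative and not part of the project.

```python
def list_json_files(dbutils, folder):
    """Return (name, size in bytes) for every .json file directly under `folder`."""
    # dbutils.fs.ls returns FileInfo objects exposing .path, .name and .size
    return [
        (f.name, f.size)
        for f in dbutils.fs.ls(folder)
        if f.name.lower().endswith(".json")
    ]


# In a Databricks notebook `dbutils` already exists, so the call would be:
# list_json_files(dbutils, "dbfs:/FileStore/SupplyChain/ORDERS_RAW/")
```

An empty list would suggest the upload went to a different folder than the one read in Task 3.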



# TASK 3 - Create Delta Table : ORDERS_RAW

### a. Read multiline json files using spark dataframe:

In [0]:
# Read multiline JSON files using the Spark DataFrame API


orders_raw_df = 

## Show the DataFrame
orders_raw_df.show(n=5, truncate=False)

## Click on orders_raw_df to check the schema
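
A minimal sketch of one possible way to fill in the read step. It assumes the uploaded files are multiline JSON, as the heading says, and that `spark` is the session Databricks predefines. The function name `read_multiline_json` is hypothetical.

```python
def read_multiline_json(spark, path):
    """Load JSON files whose records span several lines into a Spark DataFrame."""
    # By default Spark expects one JSON object per line (JSON Lines);
    # multiLine=True lets a single record or array cover many lines.
    return spark.read.option("multiLine", True).json(path)


# `spark` is predefined in a Databricks notebook:
# orders_raw_df = read_multiline_json(spark, "dbfs:/FileStore/SupplyChain/ORDERS_RAW/")
```

Passing the folder rather than a single file lets Spark read every file under it into one DataFrame.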

In [0]:
# Validate the loaded files: count the number of rows in the DataFrame. The total should be 1510.


Out[3]: 1510

### ![b.](https://pages.databricks.com/rs/094-YMS-629/images/delta-lake-tiny-logo.png) b. Create Delta Table ORDERS_RAW

Delta Lake is 100% compatible with Apache Spark&trade;, which makes it easy to get started with if you already use Spark for your big data workflows.
Delta Lake features APIs for **SQL**, **Python**, and **Scala**, so that you can use it in whatever language you feel most comfortable in.


   <img src="https://databricks.com/wp-content/uploads/2020/12/simplysaydelta.png" width=400/>

In [0]:
# First, Create Database SupplyChainDB if it doesn't exist
db = "SupplyChainDB"

spark.sql(f"CREATE DATABASE IF NOT EXISTS {db}")
spark.sql(f"USE {db}")

Out[4]: DataFrame[]

In [0]:
## Create Delta table ORDERS_RAW in the metastore using the DataFrame's schema and write data to it
## Documentation : https://docs.delta.io/latest/quick-start.html#create-a-table
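
The table creation could look like the sketch below, which relies on the DataFrame writer's `saveAsTable`. The wrapper name `save_as_delta_table` and the default mode are assumptions made for illustration, not the project's reference solution.

```python
def save_as_delta_table(df, table_name, mode="overwrite"):
    """Write `df` to a managed Delta table in the current database."""
    # The table schema is taken from the DataFrame.
    # mode="overwrite" replaces existing data, mode="append" adds to it.
    df.write.format("delta").mode(mode).saveAsTable(table_name)


# After spark.sql("USE SupplyChainDB") the table lands in that database:
# save_as_delta_table(orders_raw_df, "ORDERS_RAW")
```

Using `overwrite` keeps the cell safe to re-run, at the cost of adding a new table version each time.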


### c. Show Created Delta Table:

In [0]:
%sql
-- Switch to a SQL cell using %sql
SHOW TABLES

-- Alternatively, you can use Python: display(spark.sql("SHOW TABLES"))

**d. Validate data loaded successfully to Delta Table ORDERS_RAW**:

In [0]:
%sql



**e. Describe Detail of the Delta Table**:

In [0]:
%sql

-- describe DETAIL ORDERS_RAW

-- Returns the basic metadata information of a delta table.

# Practice Activity 1 : Create INVENTORY Delta table

### a. Upload INVENTORY.Json file in DBFS

In [0]:
## Load the file using the UI to this path dbfs:/FileStore/SupplyChain/INVENTORY/

### b. Read the file using a Spark DataFrame

In [0]:
inventory_df = 

## Show the DataFrame
inventory_df.show(n=5, truncate=False)

### ![c.](https://pages.databricks.com/rs/094-YMS-629/images/delta-lake-tiny-logo.png) c. Create Delta Table INVENTORY

In [0]:
# Use the SupplyChainDB database (created in Task 3)
db = "SupplyChainDB"
spark.sql(f"USE {db}")

Out[41]: DataFrame[]

In [0]:
## Create INVENTORY Delta Table 
inventory_df. 

### d. Show Created Delta Tables:

In [0]:
%sql
-- Switch to SQL Cell using %sql
SHOW TABLES

# TASK 4 - Transform data in delta table

<a href="https://www.databricks.com/glossary/medallion-architecture" target="_blank">Medallion Architecture</a>   
<br/>
<img src="https://databricks.com/wp-content/uploads/2020/09/delta-lake-medallion-model-scaled.jpg" width=900/>

During this task you will:
* 1- Read the Delta table using a Spark DataFrame
* 2- Convert the data type String --> Date
* 3- Drop rows with null values
* 4- Add a computed column "TOTAL_ORDER"
* 5- Create the new Delta table Orders_Gold

### a. Read ORDERS_RAW delta table using spark Dataframe

In [0]:
#read Delta Table using spark dataframe

ORDERS_Gold_df=  

ORDERS_Gold_df.show(n=5,truncate=False)
# Click on ORDERS_Gold_df to see the schema of the table.

### b. Update ORDER_DATE Column's Data Type

In [0]:
#Use withColumn method & to_date()
# withColumn Documentation : https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.DataFrame.withColumn.html
# TO_DATE() Documentation : https://docs.databricks.com/sql/language-manual/functions/to_date.html

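# Import only what is needed; a wildcard import would shadow Python builtins such as sum, min and max
from pyspark.sql.functions import col, to_date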


ORDERS_Gold_df =  
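
A hedged sketch of the conversion using `withColumn` and `to_date`. The default pattern `yyyy-MM-dd` is an assumption, so inspect a few ORDER_DATE values and adjust it to the format actually present. The helper name `string_column_to_date` is illustrative.

```python
def string_column_to_date(df, column, fmt="yyyy-MM-dd"):
    """Return `df` with `column` parsed from string to DateType."""
    # Imported here so the sketch only needs PySpark once it is called.
    from pyspark.sql.functions import col, to_date

    # withColumn with an existing name replaces that column in place.
    # Values that do not match `fmt` generally end up as NULL,
    # so it is worth re-checking null counts afterwards.
    return df.withColumn(column, to_date(col(column), fmt))


# ORDERS_Gold_df = string_column_to_date(ORDERS_Gold_df, "ORDER_DATE")
```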

### c. Drop Rows with Null Values

In [0]:
# Count Nulls for each column
from pyspark.sql.functions import col, count, when

display(ORDERS_Gold_df.select([count(when(col(c).isNull(),c)).alias(c) for c in ORDERS_Gold_df.columns]))

In [0]:
#  Remove Nulls using dropna() method which removes all rows with Null Values 

ORDERS_Gold_df = 

ORDERS_Gold_df.count()

### d. Add new Column TOTAL_ORDER

In [0]:
#Use withColumn function
#Documentation : https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.DataFrame.withColumn.html


ORDERS_Gold_df= 

# Display ORDERS_Gold_df to validate the creation of the New Column TOTAL_ORDER
display(ORDERS_Gold_df)
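
The computed column might be built as in the sketch below. The notebook does not state the formula, so treating TOTAL_ORDER as quantity times unit price is an assumption, and `UNIT_PRICE` is a placeholder column name to replace after checking the schema.

```python
def add_total_order(df, quantity_col, unit_price_col):
    """Return `df` with TOTAL_ORDER = quantity * unit price for each row."""
    # df["name"] yields a Column; multiplying two Columns builds a row-wise expression
    return df.withColumn("TOTAL_ORDER", df[quantity_col] * df[unit_price_col])


# "UNIT_PRICE" is hypothetical; use the price column shown in the schema:
# ORDERS_Gold_df = add_total_order(ORDERS_Gold_df, "QUANTITY", "UNIT_PRICE")
```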

### e. Create Delta Table ORDERS_GOLD

In [0]:
# Make sure you are using SupplyChainDB
spark.sql("USE SupplyChainDB")

## Create DeltaTable Orders_GOLD: 

ORDERS_Gold_df.


## Validate that the table was created successfully
display(spark.sql("SHOW TABLES"))

In [0]:
display(spark.sql("SHOW TABLES"))

database,tableName,isTemporary
default,orders_gold,False


Read more about the different write options and parameters here: https://docs.delta.io/latest/delta-batch.html#write-to-a-table

* **Append**: atomically adds new data to an existing Delta table.
* **Overwrite**: atomically replaces all the data in a table.

# TASK 5 - Query Orders Delta table using SQL

### Get Familiar with Orders_Gold dataset

In [0]:
%sql
-- Get the top 30 rows to get familiar with the data


### KPI-1: Quantity Sold by Country

In [0]:
%sql
-- Division = CATEGORY 
-- Don't forget to filter out cancelled orders


### KPI-2: Sales by Division ($)

In [0]:
%sql
-- Don't forget to filter out cancelled orders


### KPI-3: Top-5 Popular Brands

In [0]:
%sql
-- Order the results by quantity sold and limit the result to 5
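
As an illustration of the ranking pattern, the sketch below builds the query as a Python string that could be run with `spark.sql`. The status literal `'Cancelled'`, the fully qualified table name, and the function name `top_brands_sql` are assumptions; check the distinct ORDER_STATUS values before relying on the filter.

```python
def top_brands_sql(table="SupplyChainDB.ORDERS_GOLD", limit=5, cancelled_status="Cancelled"):
    """Build a Spark SQL query ranking brands by total quantity sold."""
    return f"""
        SELECT BRAND, SUM(QUANTITY) AS QTY_SOLD
        FROM {table}
        WHERE ORDER_STATUS <> '{cancelled_status}'
        GROUP BY BRAND
        ORDER BY QTY_SOLD DESC
        LIMIT {int(limit)}
    """


# display(spark.sql(top_brands_sql()))
```

The same SELECT can be pasted into a `%sql` cell directly; the Python wrapper only makes the limit and status value easy to change.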


# TASK 6 - Create Dashboard

In [0]:
# Use Databricks UI
# 1- Turn results of Previous Queries into visualisations
# 2- Create Dashboard and add Visualisations

# Practice Activity 2 : Add Monthly Sales Trend to your Dashboard

### KPI-4: Monthly Sales Trend (In QTY)

**Instructions:**

* 1- Query the Delta table Orders_Gold to extract monthly sales (in quantity, across all brands and all regions)
* 2- Turn the result (table) into a visualisation (line chart) to show the trend for the last 18 months
* 3- Add your visualisation to the Supply Chain Dashboard

In [0]:
%sql
-- Use DATE_TRUNC()  
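
A sketch of how `DATE_TRUNC` can bucket orders by month, written as a Python string for `spark.sql`. It assumes ORDER_DATE has already been converted to a date in Task 4 and carries over the cancelled-orders filter from the earlier KPIs, which the instructions for this KPI do not explicitly require. The name `monthly_sales_sql` and the `'Cancelled'` literal are hypothetical.

```python
def monthly_sales_sql(table="SupplyChainDB.ORDERS_GOLD", cancelled_status="Cancelled"):
    """Build a Spark SQL query returning total quantity sold per calendar month."""
    return f"""
        SELECT DATE_TRUNC('MONTH', ORDER_DATE) AS ORDER_MONTH,
               SUM(QUANTITY) AS QTY_SOLD
        FROM {table}
        WHERE ORDER_STATUS <> '{cancelled_status}'
        GROUP BY DATE_TRUNC('MONTH', ORDER_DATE)
        ORDER BY ORDER_MONTH
    """


# display(spark.sql(monthly_sales_sql()))
```

`DATE_TRUNC('MONTH', ...)` maps every date to the first day of its month, so each group is one point on the line chart. The 18-month window can then be applied with an extra `WHERE` condition on ORDER_DATE or in the visualisation settings.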


# TASK 7 - Update Data in Orders table using Merge

<img src="https://databricks.com/wp-content/uploads/2020/09/delta-lake-medallion-model-scaled.jpg" width=1012/>

### a. Upload Json files into DBFS

Use UI to upload the file "UPDATE_ORDERS_RAW.json" into DBFS, use the same folder dbfs:/FileStore/SupplyChain/ORDERS_RAW/

### b. Read file using Spark dataframe

In [0]:
# Read the multiline JSON file UPDATE_ORDERS_RAW.json
Update_orders_df = 

## Show the DataFrame
display(Update_orders_df)

--> Check the original data **BEFORE MERGE**

In [0]:
%sql 
select ORDER_ID,ORDER_STATUS,Quantity from Supplychaindb.ORDERS_RAW WHERE ORDER_ID in ("ORD-1281","ORD-829","ORD-193","ORD-826","ORD-842")

### c. Update Orders_RAW deltatable using Merge

In [0]:
%sql
DESCRIBE DETAIL supplychaindb.ORDERS_RAW

In [0]:
from delta.tables import DeltaTable

# Interact with Delta tables programmatically through the class delta.tables.DeltaTable(spark: pyspark.sql.session.SparkSession, jdt: JavaObject)
delta_orders_raw =  

In [0]:
## Merge data into Delta table ORDERS_RAW
# DOCUMENTATION https://docs.delta.io/latest/delta-update.html#language-python 

delta_orders_raw. 

# There must be at least one WHEN clause in a MERGE statement.
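
One possible shape for the merge is an upsert keyed on ORDER_ID, sketched below with the Delta Lake Python merge builder. It assumes the update file has the same columns as ORDERS_RAW and that ORDER_ID uniquely identifies an order. The function name `merge_order_updates` is illustrative.

```python
def merge_order_updates(delta_table, updates_df, key="ORDER_ID"):
    """Upsert `updates_df` into `delta_table`, matching rows on `key`."""
    (
        delta_table.alias("target")
        .merge(updates_df.alias("source"), f"target.{key} = source.{key}")
        .whenMatchedUpdateAll()      # existing order: overwrite every column
        .whenNotMatchedInsertAll()   # new order: insert the row
        .execute()
    )


# from delta.tables import DeltaTable
# delta_orders_raw = DeltaTable.forName(spark, "SupplyChainDB.ORDERS_RAW")
# merge_order_updates(delta_orders_raw, Update_orders_df)
```

Each successful merge is recorded as a new table version, which is what Task 8 later inspects with time travel.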

--> Check the updated rows **AFTER MERGE**

In [0]:
%sql 
select ORDER_ID,ORDER_STATUS,Quantity from SUPPLYCHAINDB.ORDERS_RAW WHERE ORDER_ID in ("ORD-1281","ORD-829","ORD-193","ORD-826","ORD-842")

To learn more about merge operations, check out https://docs.delta.io/latest/delta-update.html#language-python

# TASK 8 - Query previous versions of delta table using **Time Travel**

**This Task shows how to time travel between different versions of a Delta table with Delta Lake. You can time travel by table version or by timestamp. You’ll learn about the benefits of time travel and why it’s an essential feature for production data workloads.**

**Documentation : https://delta.io/blog/2023-02-01-delta-lake-time-travel/** 

<img src="https://delta.io/static/9c42ea9f028932de03257ed75d35a8ba/cf8e5/image1.png" width=1012/>

### a. Describe Delta Table History:

In [0]:
%sql
-- Check Table History 

-- Use the UI to see Delta Table History
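
The history can also be pulled from Python, as in this sketch built on the `DESCRIBE HISTORY` command. The helper name `recent_history` and the choice of columns are assumptions made for illustration.

```python
def recent_history(spark, table, limit=10):
    """Return the most recent commits of a Delta table, newest first."""
    history = spark.sql(f"DESCRIBE HISTORY {table}")
    # version and timestamp are the values time travel accepts;
    # operation shows what each commit did (for example WRITE or MERGE)
    return history.select("version", "timestamp", "operation").limit(limit)


# display(recent_history(spark, "SupplyChainDB.ORDERS_RAW"))
```

The `version` and `timestamp` values listed here are the ones to plug into the time travel queries in steps b and c.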

### b. Using SQL:

In [0]:
%sql 
 select ORDER_ID,ORDER_STATUS,Quantity from SUPPLYCHAINDB.ORDERS_RAW VERSION AS OF 1 WHERE ORDER_ID in ("ORD-1281","ORD-829","ORD-193","ORD-826","ORD-842")

-- Change the version number to see different versions of the Delta table

### c. Using Spark dataframe:

In [0]:
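# Time travel by timestamp: the reader option is "timestampAsOf" (use "versionAsOf" for a version number).
# Replace the date with a timestamp that falls within the table's history (see step a).
version_1 = spark.read.format('delta').option('timestampAsOf', "2023-05-16").table("SUPPLYCHAINDB.ORDERS_RAW")
display(version_1)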

# END OF THE PROJECT

# CUMULATIVE CHALLENGE

**Your Task:<br/> Using the “Inventory” data, enrich the Supply Chain Dashboard with the list of low-stock and out-of-stock items.**

Using a Databricks notebook you will: <br/>
1- Upload the INVENTORY.JSON file to DBFS (1) <br/>
2- Read the file using a Spark DataFrame (1)<br/>
3- Create the Delta table INVENTORY (1)<br/>
4- Write a SQL query that joins the ORDERS_GOLD and INVENTORY Delta tables to find the list of items that are low in stock or out of stock<br/>
5- Turn the result into a visualisation (Table type) and add it to your SupplyChain Dashboard<br/>

<br/>(1) Skip if you have completed Practice Activity 1

### a. Upload INVENTORY.Json file in DBFS

In [0]:
## Load the file using the UI to this path dbfs:/FileStore/SupplyChain/INVENTORY/

### b. Read the File using spark dataframe

In [0]:
inventory_df =

## Show the DataFrame
inventory_df.show(n=5, truncate=False)

### c. Create Delta Table INVENTORY <img src="https://pages.databricks.com/rs/094-YMS-629/images/delta-lake-tiny-logo.png" width=35 height=35/>

In [0]:
# Use SupplyChainDB Database
db = "SupplyChainDB"
spark.sql(f"USE {db}")

Out[19]: DataFrame[]

In [0]:
## Create INVENTORY Delta Table 
inventory_df. 

## Validate that the table was created successfully
display(spark.sql("SHOW TABLES"))

### d. Join ORDERS_GOLD and INVENTORY Delta tables to find the list of Low-Stock or Out-of-Stock Items

**Your Goal** is to find the list of low-stock or out-of-stock items and add the result to your SupplyChain Dashboard<br />
**Hints:**
* Group all orders (from ORDERS_GOLD) by BRAND, COLOR, PRODUCT_NAME and SIZE, and add QTY_SOLD (SUM of QUANTITY)
* Join the result with INVENTORY using an inner join on BRAND, PRODUCT_NAME, COLOR and SIZE (an inner join, not a cross join, which would pair every order group with every inventory row)
* Add a calculated column "QTY_LEFT_STOCK" as (STOCK - QTY_SOLD)
* Filter out cancelled orders (ORDER_STATUS)
* Keep only items with QTY_LEFT_STOCK < 20
* Sort the result by "QTY_LEFT_STOCK" in ascending order

In [0]:
%sql
-- Write Your Query Here : 
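
The hints can be combined roughly as in the sketch below, which assembles the query as a Python string for `spark.sql`. The column names come from the hints. The `'Cancelled'` literal, the fully qualified table names, and the function name `low_stock_sql` are assumptions to verify against the actual data.

```python
def low_stock_sql(orders="SupplyChainDB.ORDERS_GOLD", inventory="SupplyChainDB.INVENTORY",
                  threshold=20, cancelled_status="Cancelled"):
    """Build a Spark SQL query listing items whose remaining stock is below `threshold`."""
    return f"""
        WITH sold AS (
            -- total quantity sold per item, ignoring cancelled orders
            SELECT BRAND, PRODUCT_NAME, COLOR, SIZE, SUM(QUANTITY) AS QTY_SOLD
            FROM {orders}
            WHERE ORDER_STATUS <> '{cancelled_status}'
            GROUP BY BRAND, PRODUCT_NAME, COLOR, SIZE
        )
        SELECT i.BRAND, i.PRODUCT_NAME, i.COLOR, i.SIZE,
               i.STOCK, s.QTY_SOLD,
               i.STOCK - s.QTY_SOLD AS QTY_LEFT_STOCK
        FROM {inventory} AS i
        INNER JOIN sold AS s
          ON  i.BRAND = s.BRAND
          AND i.PRODUCT_NAME = s.PRODUCT_NAME
          AND i.COLOR = s.COLOR
          AND i.SIZE = s.SIZE
        WHERE i.STOCK - s.QTY_SOLD < {int(threshold)}
        ORDER BY QTY_LEFT_STOCK ASC
    """


# display(spark.sql(low_stock_sql()))
```

Because the join is an inner join, inventory items that were never ordered do not appear in the result. A negative QTY_LEFT_STOCK would indicate more units sold than the recorded stock.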


### e. Turn the result into a Visualisation (Table) and Add it to SupplyChain Dashboard

In [0]:
# Use Databricks UI to Turn results into a visualisation and then add it to your SupplyChain Dashboard

   <img src="https://www.freeiconspng.com/uploads/congratulations-png-1.png" width=350/>

THIS IS THE END OF THE GUIDED PROJECT