**Initialize the Spark Session**

In [0]:
from pyspark.sql import SparkSession
spark=SparkSession.builder\
      .appName("Inventory-supply")\
      .getOrCreate()
spark

**Inventory Alerting System**

Tasks:
 1. Load the data using PySpark.
 2. Create a new column NeedsReorder = StockQty < ReorderLevel.
 3. Create a view of all items that need restocking.
 4. Highlight warehouses with more than 2 such items.

In [0]:
from pyspark.sql.functions import *
#1.
df=spark.read.option("header",True).option("inferSchema",True) \
    .csv("file:/Workspace/Shared/inventory_supply.csv")
df.printSchema()
df.show()
#2.
df=df.withColumn("NeedsReorder", col("StockQty") < col("ReorderLevel"))
#3. View containing only the items that need restocking
df.filter(col("NeedsReorder")).createOrReplaceTempView("needs_reorder")
spark.sql("SELECT * FROM needs_reorder").show()
#4. Warehouses with more than 2 such items
spark.sql("""
SELECT Warehouse, COUNT(*) AS ItemsUnderReorder
FROM needs_reorder
WHERE NeedsReorder = true
GROUP BY Warehouse
HAVING COUNT(*) > 2
""").show()

root
 |-- ItemID: string (nullable = true)
 |-- ItemName: string (nullable = true)
 |-- Category: string (nullable = true)
 |-- Warehouse: string (nullable = true)
 |-- StockQty: integer (nullable = true)
 |-- ReorderLevel: integer (nullable = true)
 |-- LastRestocked: date (nullable = true)
 |-- UnitPrice: integer (nullable = true)
 |-- Supplier: string (nullable = true)

+------+------------+-----------+----------+--------+------------+-------------+---------+---------+
|ItemID|    ItemName|   Category| Warehouse|StockQty|ReorderLevel|LastRestocked|UnitPrice| Supplier|
+------+------------+-----------+----------+--------+------------+-------------+---------+---------+
|  I001|      LED TV|Electronics|WarehouseA|      50|          20|   2024-03-15|    30000|   AVTech|
|  I002|      Laptop|Electronics|WarehouseB|      10|          15|   2024-04-01|    70000|TechWorld|
|  I003|Office Chair|  Furniture|WarehouseA|      40|          10|   2024-03-25|     6000|  ChairCo|
|  I004|Refrigerat
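
The reorder rule and the warehouse threshold can be checked by hand. The sketch below is plain Python rather than PySpark, and it uses only the rows whose StockQty and ReorderLevel are fully visible in the outputs of this notebook; Refrigerator is left out because its ReorderLevel is cut off. It is a cross-check of the logic under those assumptions, not a replacement for the Spark job.

```python
from collections import Counter

# (ItemName, Warehouse, StockQty, ReorderLevel), copied from outputs shown in this notebook.
# Refrigerator is omitted because its ReorderLevel is truncated in the displayed output.
rows = [
    ("LED TV", "WarehouseA", 50, 20),
    ("Laptop", "WarehouseB", 10, 15),
    ("Office Chair", "WarehouseA", 40, 10),
    ("Printer", "WarehouseB", 3, 5),
]

# Same rule as the NeedsReorder column: StockQty < ReorderLevel.
needs_reorder = [r for r in rows if r[2] < r[3]]

# Same rule as GROUP BY Warehouse ... HAVING COUNT(*) > 2.
per_warehouse = Counter(r[1] for r in needs_reorder)
flagged = {w: n for w, n in per_warehouse.items() if n > 2}

print([r[0] for r in needs_reorder])  # ['Laptop', 'Printer']
print(flagged)                        # {}
```

On these four rows no warehouse exceeds two items under the reorder level, so an empty result from the HAVING query would be expected rather than a sign of a bug.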

**Supplier Price Optimization**

 Tasks:
 1. Group items by Supplier and compute average price.
 2. Find which suppliers offer items below average price in their category.
 3. Tag suppliers with Good Deal if >50% of their items are below market average.

In [0]:
#1.
supplier_avg = df.groupBy("Supplier").agg(avg("UnitPrice").alias("AvgPriceBySupplier"))
s=df.join(supplier_avg, "Supplier")
market_avg = df.groupBy("Category").agg(avg("UnitPrice").alias("AvgPriceByCategory"))
s= s.join(market_avg, "Category")
print("Avg price by supplier:")
s.show()
#2.
print("Below market average:")
s=s.withColumn("BelowMarket", col("UnitPrice") < col("AvgPriceByCategory"))
s.show()
#3.
score = s.groupBy("Supplier").agg(
    (sum(col("BelowMarket").cast("int")) / count("*")).alias("BelowPct")
)
good_deals = score.withColumn("GoodDeal", col("BelowPct") > 0.5)
print("Good deals:")
good_deals.show()


Avg price by supplier:
+-----------+---------+------+------------+----------+--------+------------+-------------+---------+------------+------------------+------------------+
|   Category| Supplier|ItemID|    ItemName| Warehouse|StockQty|ReorderLevel|LastRestocked|UnitPrice|NeedsReorder|AvgPriceBySupplier|AvgPriceByCategory|
+-----------+---------+------+------------+----------+--------+------------+-------------+---------+------------+------------------+------------------+
|Electronics|   AVTech|  I001|      LED TV|WarehouseA|      50|          20|   2024-03-15|    30000|       false|           30000.0|           36000.0|
|Electronics|TechWorld|  I002|      Laptop|WarehouseB|      10|          15|   2024-04-01|    70000|        true|           70000.0|           36000.0|
|Electronics|PrintFast|  I005|     Printer|WarehouseB|       3|           5|   2024-03-30|     8000|        true|            8000.0|           36000.0|
| Appliances| FreezeIt|  I004|Refrigerator|WarehouseC|       5|  
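
The Good Deal rule can be traced on the Electronics rows shown above. This is a plain-Python sketch (no Spark) that assumes the "market average" means the average UnitPrice within a category, as the cell above computes it; only the three fully visible Electronics rows are used.

```python
from collections import defaultdict

# (Supplier, Category, UnitPrice), copied from the Electronics rows in the output above.
items = [
    ("AVTech", "Electronics", 30000),
    ("TechWorld", "Electronics", 70000),
    ("PrintFast", "Electronics", 8000),
]

# Average price per category (the "market average").
by_category = defaultdict(list)
for _, category, price in items:
    by_category[category].append(price)
category_avg = {c: sum(p) / len(p) for c, p in by_category.items()}

# For each supplier, collect one flag per item: is it below its category average?
below = defaultdict(list)
for supplier, category, price in items:
    below[supplier].append(price < category_avg[category])

# Good Deal when more than 50% of a supplier's items are below the market average.
good_deal = {s: sum(flags) / len(flags) > 0.5 for s, flags in below.items()}

print(category_avg)  # {'Electronics': 36000.0}
print(good_deal)     # {'AVTech': True, 'TechWorld': False, 'PrintFast': True}
```

The category average of 36000.0 matches the AvgPriceByCategory column in the output, which suggests the join and aggregation are doing what the task asks.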

**Cost Forecasting**

 Tasks:
 1. Calculate TotalStockValue = StockQty * UnitPrice.
 2. Identify top 3 highest-value items.
 3. Export the result as a Parquet file partitioned by Warehouse.

In [0]:
#1.
print("Total stock value by warehouse:")
df = df.withColumn("TotalStockValue", col("StockQty") * col("UnitPrice"))
df.groupBy("Warehouse").agg(sum("TotalStockValue").alias("TotalStockValue")).show()
#2.
print("Top 3 items by stock value:")
df.orderBy(col("TotalStockValue").desc()).limit(3).show()
#3.
df.write.mode("overwrite").parquet("file:/Workspace/Shared/stock_by_warehouse", partitionBy="Warehouse")

Total stock value by warehouse:
+----------+---------------+
| Warehouse|TotalStockValue|
+----------+---------------+
|WarehouseA|        1740000|
|WarehouseC|         125000|
|WarehouseB|         724000|
+----------+---------------+

Top 3 items by stock value:
+------+------------+-----------+----------+--------+------------+-------------+---------+---------+------------+---------------+
|ItemID|    ItemName|   Category| Warehouse|StockQty|ReorderLevel|LastRestocked|UnitPrice| Supplier|NeedsReorder|TotalStockValue|
+------+------------+-----------+----------+--------+------------+-------------+---------+---------+------------+---------------+
|  I001|      LED TV|Electronics|WarehouseA|      50|          20|   2024-03-15|    30000|   AVTech|       false|        1500000|
|  I002|      Laptop|Electronics|WarehouseB|      10|          15|   2024-04-01|    70000|TechWorld|        true|         700000|
|  I003|Office Chair|  Furniture|WarehouseA|      40|          10|   2024-03-25| 
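
The stock-value arithmetic can be reproduced without Spark. The sketch below is plain Python and uses only the rows whose StockQty and UnitPrice are fully visible in this notebook's outputs; Refrigerator is omitted because its UnitPrice is cut off, so WarehouseC does not appear in the totals.

```python
# (ItemName, Warehouse, StockQty, UnitPrice), copied from outputs shown in this notebook.
# Refrigerator is omitted because its UnitPrice is truncated in the displayed output.
rows = [
    ("LED TV", "WarehouseA", 50, 30000),
    ("Laptop", "WarehouseB", 10, 70000),
    ("Office Chair", "WarehouseA", 40, 6000),
    ("Printer", "WarehouseB", 3, 8000),
]

# TotalStockValue = StockQty * UnitPrice, per item.
values = [(name, wh, qty * price) for name, wh, qty, price in rows]

# Sum per warehouse, like groupBy("Warehouse").agg(sum(...)).
totals = {}
for _, wh, value in values:
    totals[wh] = totals.get(wh, 0) + value

# Top 3 items by value, like orderBy(desc).limit(3).
top3 = sorted(values, key=lambda v: v[2], reverse=True)[:3]

print(totals)                         # {'WarehouseA': 1740000, 'WarehouseB': 724000}
print([name for name, _, _ in top3])  # ['LED TV', 'Laptop', 'Office Chair']
```

The WarehouseA and WarehouseB totals agree with the Spark output above.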

**Warehouse Utilization**

 Tasks:
 1. Count items stored per warehouse.
 2. Average stock per category in each warehouse.
 3. Determine underutilized warehouses (total stock < 100).

In [0]:
#1.
stock_counts=df.groupBy("Warehouse").count().withColumnRenamed("count", "ItemCount")
print("Stock counts:")
stock_counts.show()
#2.
avg_stock=df.groupBy("Warehouse","Category").agg(avg("StockQty").alias("AvgStock"))
print("Average stock:")
avg_stock.show()
#3.
underutilized=stock_counts.join(df.groupBy("Warehouse").agg(expr("sum(StockQty)").alias("TotalStock")), "Warehouse") \
    .filter(col("TotalStock") < 100)
print("Underutilized warehouses:")
underutilized.show()

Stock counts:
+----------+---------+
| Warehouse|ItemCount|
+----------+---------+
|WarehouseA|        2|
|WarehouseC|        1|
|WarehouseB|        2|
+----------+---------+

Average stock:
+----------+-----------+--------+
| Warehouse|   Category|AvgStock|
+----------+-----------+--------+
|WarehouseB|Electronics|     6.5|
|WarehouseA|  Furniture|    40.0|
|WarehouseC| Appliances|     5.0|
|WarehouseA|Electronics|    50.0|
+----------+-----------+--------+

Underutilized warehouses:
+----------+---------+----------+
| Warehouse|ItemCount|TotalStock|
+----------+---------+----------+
|WarehouseA|        2|        90|
|WarehouseC|        1|         5|
|WarehouseB|        2|        13|
+----------+---------+----------+



**Delta Audit Trail**

 Tasks:
 1. Save as Delta table retail_inventory.
 2. Update stock of 'Laptop' to 20.
 3. Delete any item with StockQty = 0.
 4. Run DESCRIBE HISTORY and query VERSION AS OF previous state.

In [0]:
from delta.tables import DeltaTable
#1.
df.write.format("delta").mode("overwrite").save("file:/Workspace/Shared/retail_inventory")
delta = DeltaTable.forPath(spark, "file:/Workspace/Shared/retail_inventory")
#2.
delta.update(condition="ItemName='Laptop'", set={"StockQty": "20"})
#3.
delta.delete("StockQty = 0")
#4.
spark.sql("DESCRIBE HISTORY delta.`file:/Workspace/Shared/retail_inventory`").show()
spark.read.format("delta").option("versionAsOf", 0).load("file:/Workspace/Shared/retail_inventory").show()

+-------+--------------------+----------------+--------------------+---------+--------------------+----+------------------+--------------------+-----------+-----------------+-------------+--------------------+------------+--------------------+
|version|           timestamp|          userId|            userName|operation| operationParameters| job|          notebook|           clusterId|readVersion|   isolationLevel|isBlindAppend|    operationMetrics|userMetadata|          engineInfo|
+-------+--------------------+----------------+--------------------+---------+--------------------+----+------------------+--------------------+-----------+-----------------+-------------+--------------------+------------+--------------------+
|     19|2025-06-19 06:20:...|4833629471493945|azuser3545_mml.lo...| OPTIMIZE|{predicate -> [],...|NULL|{1093877947262588}|0611-043339-3vb7b9iv|         17|SnapshotIsolation|        false|{numRemovedFiles ...|        NULL|Databricks-Runtim...|
|     18|2025-06-19 06:2

**Alerts from Restock Logs (join tasks)**

 Tasks:
 1. Join with inventory table to update StockQty.
 2. Calculate new stock and flag RestockedRecently = true for updated items.
 3. Use MERGE INTO to update in Delta.

In [0]:
from delta.tables import DeltaTable
#1.
logs = spark.read.option("header", True).csv("file:/Workspace/Shared/restock_logs.csv") \
    .withColumnRenamed("QuantityAdded ", "QuantityAdded") \
    .withColumn("RestockDate", to_date("RestockDate", "yyyy-MM-dd"))
#Load inventory Delta table
df = spark.read.format("delta").load("file:/Workspace/Shared/retail_inventory")
#add the missing column to Delta
if 'RestockedRecently' not in df.columns:
    df = df.withColumn("RestockedRecently", lit(False))
    df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("file:/Workspace/Shared/retail_inventory")
#reload of delta data
delta = DeltaTable.forPath(spark, "file:/Workspace/Shared/retail_inventory")
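#2 Left join keeps items without a restock entry; treat their missing quantity as 0
#  (QuantityAdded is read as a string because the CSV is loaded without inferSchema)
updated = df.alias("i").join(logs.alias("r"), "ItemID", "left") \
    .withColumn("NewStockQty", col("StockQty") + coalesce(col("QuantityAdded").cast("int"), lit(0))) \
    .withColumn("RestockedRecently", col("QuantityAdded").isNotNull())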
#3
delta.alias("t").merge(
    updated.select("ItemID", "NewStockQty", "RestockedRecently").alias("s"),
    "t.ItemID = s.ItemID"
).whenMatchedUpdate(set={
    "StockQty": "s.NewStockQty",
    "RestockedRecently": "s.RestockedRecently"
}).execute()
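
In a left join, items without a restock entry carry a NULL QuantityAdded, and in Spark SQL adding NULL to a number yields NULL, so a missing quantity has to be treated as 0 if those items are to keep their current stock. The plain-Python sketch below mimics that rule. The contents of restock_logs.csv are not shown in this notebook, so the quantities here are hypothetical, and `apply_restock` is an illustrative helper, not part of any library.

```python
# Hypothetical sample values: the real restock_logs.csv is not shown in this notebook.
inventory = {"I002": 20, "I005": 3}  # ItemID -> StockQty
restock_log = {"I005": 10}           # ItemID -> QuantityAdded (assumed)


def apply_restock(inventory, restock_log):
    """Return ItemID -> (NewStockQty, RestockedRecently), mimicking a left join."""
    result = {}
    for item_id, qty in inventory.items():
        added = restock_log.get(item_id)  # None plays the role of SQL NULL
        new_qty = qty + (added if added is not None else 0)
        result[item_id] = (new_qty, added is not None)
    return result


updated_stock = apply_restock(inventory, restock_log)
print(updated_stock)  # {'I002': (20, False), 'I005': (13, True)}
```

The item with no log entry keeps its stock and is flagged False, which is the behaviour the merge is meant to produce for unmatched rows.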

**Report Generation with SQL Views**

 Tasks:
 1. Create SQL view inventory_summary with:
    ItemName, Category, StockQty, NeedsReorder, TotalStockValue
 2. Create view supplier_leaderboard sorted by average price

In [0]:
#1.
spark.sql("""
CREATE OR REPLACE TEMP VIEW inventory_summary AS
SELECT ItemName, Category, StockQty, NeedsReorder, StockQty*UnitPrice AS TotalStockValue
FROM delta.`file:/Workspace/Shared/retail_inventory`
""")
#2.
spark.sql("""
CREATE OR REPLACE TEMP VIEW supplier_leaderboard AS
SELECT Supplier, AVG(UnitPrice) AS AvgPrice
FROM delta.`file:/Workspace/Shared/retail_inventory`
GROUP BY Supplier
ORDER BY AvgPrice
""")


DataFrame[]

**Advanced Filtering**

 Tasks:
 1. Use when / otherwise to categorize items:
    "Overstocked" (>2x ReorderLevel)
    "LowStock"
 2. Use .filter() and .where() for the same and compare.

In [0]:
df=df.select(
    "ItemName", "Category", "StockQty", "ReorderLevel", "TotalStockValue","LastRestocked"
).withColumn(
    "NeedsReorder", col("StockQty") < col("ReorderLevel")
)
df.createOrReplaceTempView("inventory_summary")
#1.
df = spark.table("inventory_summary").withColumn("StockStatus",
    when(col("StockQty") > 2 * col("ReorderLevel"), "Overstocked")
    .when(col("StockQty") < col("ReorderLevel"), "LowStock")
    .otherwise("OK")
)
#2.
df.filter(col("StockQty") < col("ReorderLevel")).show()
df.where("StockQty < ReorderLevel").show()

+--------+--------+--------+------------+---------------+-------------+------------+-----------+
|ItemName|Category|StockQty|ReorderLevel|TotalStockValue|LastRestocked|NeedsReorder|StockStatus|
+--------+--------+--------+------------+---------------+-------------+------------+-----------+
+--------+--------+--------+------------+---------------+-------------+------------+-----------+

+--------+--------+--------+------------+---------------+-------------+------------+-----------+
|ItemName|Category|StockQty|ReorderLevel|TotalStockValue|LastRestocked|NeedsReorder|StockStatus|
+--------+--------+--------+------------+---------------+-------------+------------+-----------+
+--------+--------+--------+------------+---------------+-------------+------------+-----------+



**Feature Engineering**

 Tasks:
 1. Extract RestockMonth from LastRestocked.
 2. Create feature: StockAge = CURRENT_DATE - LastRestocked
 3. Bucket StockAge into: New, Moderate, Stale

In [0]:
df = spark.table("inventory_summary")
df = df.withColumn("RestockMonth", month("LastRestocked")) \
       .withColumn("StockAge", datediff(current_date(), col("LastRestocked"))) \
       .withColumn("StockAgeBucket",
           when(col("StockAge") < 30, "New")
           .when(col("StockAge") < 90, "Moderate")
           .otherwise("Stale"))
df.select("ItemName", "RestockMonth", "StockAge", "StockAgeBucket").show()

+------------+------------+--------+--------------+
|    ItemName|RestockMonth|StockAge|StockAgeBucket|
+------------+------------+--------+--------------+
|      Laptop|           4|     444|         Stale|
|      LED TV|           3|     461|         Stale|
|Office Chair|           3|     451|         Stale|
|Refrigerator|           2|     485|         Stale|
|     Printer|           3|     446|         Stale|
+------------+------------+--------+--------------+
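
The bucketing thresholds can be checked with the standard library alone. The sketch below is plain Python; `stock_age_bucket` is an illustrative helper, and the run date of 2025-06-19 is an assumption inferred from the StockAge values above (it is also the date that appears in the Delta history output).

```python
from datetime import date


def stock_age_bucket(last_restocked, today):
    """Return (StockAge in days, bucket) using the same thresholds as the cell above."""
    age = (today - last_restocked).days
    if age < 30:
        bucket = "New"
    elif age < 90:
        bucket = "Moderate"
    else:
        bucket = "Stale"
    return age, bucket


run_date = date(2025, 6, 19)  # assumed run date, inferred from the output above

print(stock_age_bucket(date(2024, 4, 1), run_date))   # (444, 'Stale')    Laptop
print(stock_age_bucket(date(2024, 3, 15), run_date))  # (461, 'Stale')    LED TV
print(stock_age_bucket(date(2025, 4, 1), run_date))   # (79, 'Moderate')
print(stock_age_bucket(date(2025, 6, 1), run_date))   # (18, 'New')
```

Because StockAge is measured against the current date, every item in this dataset lands in Stale; the last two calls use made-up restock dates only to exercise the other buckets.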



**Export Options**

 Tasks:
 1. Write full DataFrame to:
    CSV for analysts
    JSON for integration
    Delta for pipelines
 2. Save with meaningful file and partition names like /export/inventory/stale_items

In [0]:
df.write.mode("overwrite").option("header",True) \
   .csv("file:/Workspace/Shared/export/inventory/all_items_csv")
df.write.mode("overwrite").json("file:/Workspace/Shared/export/inventory/all_items_json")
df.write.mode("overwrite").format("delta") \
   .save("file:/Workspace/Shared/export/inventory/all_items_delta")