<pre>
Problem Statement

You have a PySpark DataFrame containing daily sales data for multiple products.
Write a PySpark program to find the date with the highest sales for each product.
Sample Input (product_sales)
product 	sale_date 	sales
Laptop 	2025-01-01 	100
Laptop 	2025-01-02 	250
Laptop 	2025-01-03 	200
Phone 	2025-01-01 	300
Phone 	2025-01-02 	150
Phone 	2025-01-03 	400
Expected Output
product 	peak_sale_date 	peak_sales
Laptop 	2025-01-02 	250
Phone 	2025-01-03 	400
</pre>

In [2]:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

In [5]:
spark = SparkSession.builder.appName('Daily-Day13').getOrCreate()
data = [
("Laptop","2025-01-01",100),
("Laptop","2025-01-02",250),
("Laptop","2025-01-03",200),
("Phone","2025-01-01",300),
("Phone","2025-01-02",150),
("Phone","2025-01-03",400)]
schema = ["product","sale_date","sales"]
df = spark.createDataFrame(data,schema)
df = df.withColumn('sale_date',F.to_date(F.col('sale_date'),"yyyy-MM-dd"))

In [6]:
df.show()

+-------+----------+-----+
|product| sale_date|sales|
+-------+----------+-----+
| Laptop|2025-01-01|  100|
| Laptop|2025-01-02|  250|
| Laptop|2025-01-03|  200|
|  Phone|2025-01-01|  300|
|  Phone|2025-01-02|  150|
|  Phone|2025-01-03|  400|
+-------+----------+-----+



In [10]:
# Rank each product's rows by sales, highest first; tied rows share rank 1
wd = Window.partitionBy('product').orderBy(F.col('sales').desc())
peak_sales_df = (
    df.withColumn('rank', F.dense_rank().over(wd))
      .filter(F.col('rank') == 1)
      .select('product', F.col('sale_date').alias('peak_sale_date'), F.col('sales').alias('peak_sales'))
)
peak_sales_df.show()

+-------+--------------+----------+
|product|peak_sale_date|peak_sales|
+-------+--------------+----------+
| Laptop|    2025-01-02|       250|
|  Phone|    2025-01-03|       400|
+-------+--------------+----------+



<pre>
Problem Statement

You have a SQL table daily_product_sales(product_id, sale_date, sales) with
daily sales for each product.
Write a SQL query to find products where sales decreased compared to the previous day.
Sample Input (daily_product_sales)
product_id 	sale_date 	sales
P1 	2025-01-01 	100
P1 	2025-01-02 	120
P1 	2025-01-03 	90
P2 	2025-01-01 	200
P2 	2025-01-02 	180
P2 	2025-01-03 	220
Expected Output
product_id 	sale_date 	sales 	previous_day_sales
P1 	2025-01-03 	90 	120
P2 	2025-01-02 	180 	200
</pre>

In [14]:
data = [
    ("P1", "2025-01-01", 100),
    ("P1", "2025-01-02", 120),
    ("P1", "2025-01-03", 90),
    ("P2", "2025-01-01", 200),
    ("P2", "2025-01-02", 180),
    ("P2", "2025-01-03", 220),
]
columns = ["product_id", "sale_date", "sales"]
df = spark.createDataFrame(data, columns)
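# Pattern letters are case-sensitive: MM = month, dd = day of month (mm = minutes, DD = day of year)
df = df.withColumn("sale_date", F.to_date(F.col('sale_date'), "yyyy-MM-dd"))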
df.show()

+----------+----------+-----+
|product_id| sale_date|sales|
+----------+----------+-----+
|        P1|2025-01-01|  100|
|        P1|2025-01-02|  120|
|        P1|2025-01-03|   90|
|        P2|2025-01-01|  200|
|        P2|2025-01-02|  180|
|        P2|2025-01-03|  220|
+----------+----------+-----+



In [15]:
df.createOrReplaceTempView('products')

In [21]:
spark.sql('''
WITH previous AS (
    SELECT product_id, sale_date, sales,
           LAG(sales, 1) OVER (PARTITION BY product_id ORDER BY sale_date) AS previous_day_sales
    FROM products
)
SELECT * FROM previous WHERE previous_day_sales > sales
''').show()

+----------+----------+-----+------------------+
|product_id| sale_date|sales|previous_day_sales|
+----------+----------+-----+------------------+
|        P1|2025-01-03|   90|               120|
|        P2|2025-01-02|  180|               200|
+----------+----------+-----+------------------+

