<pre>
Problem Statement
You have a PySpark DataFrame with total sales for different products. Write a PySpark program to calculate the percentage contribution of each product towards the total sales across all products.

Sample Input (product_sales)
product	sales
Laptop	1200
Phone	800
Tablet	500
Desktop	500
Expected Output
product	sales	percentage_contribution
Laptop	1200	40.0
Phone	800	26.7
Tablet	500	16.7
Desktop	500	16.7
</pre>

In [16]:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import Window


In [2]:
spark = SparkSession.builder.appName('Daily-Day5').getOrCreate()

In [31]:
data = [
    ('Laptop', 1200),
    ('Phone', 800),
    ('Tablet', 500),
    ('Desktop', 500),
]

In [32]:
schema = ['product', 'sales']

In [33]:
product_sales = spark.createDataFrame(data=data,schema=schema)

In [38]:
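# No partition columns: the window spans the whole DataFrame, so an aggregate
# over it yields the grand total on every row. Spark warns that this moves all
# rows to a single partition, which is fine for a small dataset like this one.
window = Window.partitionBy()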

In [39]:
# Named with_total rather than sum, to avoid shadowing Python's built-in sum().
with_total = product_sales.withColumn('sumOfSales', F.sum('sales').over(window))
with_total.show()

+-------+-----+----------+
|product|sales|sumOfSales|
+-------+-----+----------+
| Laptop| 1200|      3000|
|  Phone|  800|      3000|
| Tablet|  500|      3000|
|Desktop|  500|      3000|
+-------+-----+----------+



In [40]:
# Round to one decimal place so the result matches the expected output.
result = with_total.withColumn(
    'percentage_contribution',
    F.round(F.col('sales') / F.col('sumOfSales') * 100, 1)
).select('product', 'sales', 'percentage_contribution')
result.show()

+-------+-----+-----------------------+
|product|sales|percentage_contribution|
+-------+-----+-----------------------+
| Laptop| 1200|                   40.0|
|  Phone|  800|                   26.7|
| Tablet|  500|                   16.7|
|Desktop|  500|                   16.7|
+-------+-----+-----------------------+
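
As a quick sanity check on the expected figures, the same arithmetic can be reproduced in plain Python without a Spark session. This is only a sketch over the four sample rows; the names `total_sales` and `percentage_contribution` are illustrative and not part of the notebook above.

```python
# Plain-Python cross-check of the sample data; no Spark session required.
product_sales = [("Laptop", 1200), ("Phone", 800), ("Tablet", 500), ("Desktop", 500)]

# Grand total across all products (3000 for the sample input).
total_sales = sum(sales for _, sales in product_sales)

# Each product's share of the total, rounded to one decimal place.
percentage_contribution = {
    product: round(sales / total_sales * 100, 1)
    for product, sales in product_sales
}

for product, pct in percentage_contribution.items():
    print(product, pct)
```

The rounded shares come out as 40.0, 26.7, 16.7 and 16.7, which matches the expected output in the problem statement.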



<pre>Problem Statement
You have a SQL table transactions(txn_id, txn_date, amount). Write a query to find months in 2025 where there were no transactions. Assume txn_date is in YYYY-MM-DD format.

Sample Input (transactions)
txn_id	txn_date	amount
T1	2025-01-10	100
T2	2025-01-15	200
T3	2025-03-05	300
T4	2025-05-20	400
Expected Output
missing_month
2025-02
2025-04
2025-06
2025-07
2025-08
2025-09
2025-10
2025-11
2025-12
</pre>

In [42]:
data = [
    ('T1', '2025-01-10', 100),
    ('T2', '2025-01-15', 200),
    ('T3', '2025-03-05', 300),
    ('T4', '2025-05-20', 400),
]
schema = ['txn_id', 'txn_date', 'amount']

transactions = spark.createDataFrame(data=data,schema=schema)
transactions = transactions.withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
transactions.createOrReplaceTempView("transactions")

In [43]:
result = spark.sql('''
WITH months AS (
  -- One row per month start in 2025
  SELECT EXPLODE(
    SEQUENCE(
      TO_DATE('2025-01-01'),
      TO_DATE('2025-12-01'),
      INTERVAL 1 MONTH
    )
  ) AS month_start
)
SELECT
  DATE_FORMAT(month_start, 'yyyy-MM') AS missing_month
FROM months m
LEFT JOIN (
  -- Distinct months in 2025 that have at least one transaction
  SELECT DISTINCT DATE_FORMAT(txn_date, 'yyyy-MM') AS txn_month
  FROM transactions
  WHERE txn_date BETWEEN '2025-01-01' AND '2025-12-31'
) t
  ON DATE_FORMAT(month_start, 'yyyy-MM') = t.txn_month
-- Keep only the calendar months with no matching transaction month
WHERE t.txn_month IS NULL
ORDER BY month_start
''')
result.show()

+-------------+
|missing_month|
+-------------+
|      2025-02|
|      2025-04|
|      2025-06|
|      2025-07|
|      2025-08|
|      2025-09|
|      2025-10|
|      2025-11|
|      2025-12|
+-------------+
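
The result can be cross-checked in plain Python by listing all twelve months of 2025 and removing those that contain a transaction. This is a minimal sketch over the four sample rows only; `all_months`, `active_months` and `missing_months` are illustrative names, not part of the query above.

```python
from datetime import date

# Transaction dates from the sample input.
txn_dates = [date(2025, 1, 10), date(2025, 1, 15), date(2025, 3, 5), date(2025, 5, 20)]

# Every calendar month of 2025 in yyyy-MM form.
all_months = [f"2025-{month:02d}" for month in range(1, 13)]

# Months in 2025 that contain at least one transaction.
active_months = {d.strftime("%Y-%m") for d in txn_dates if d.year == 2025}

# Calendar months with no transaction, in chronological order.
missing_months = [m for m in all_months if m not in active_months]
print(missing_months)
```

The list holds the same nine months as the query result, from 2025-02 through 2025-12 with 2025-03 and 2025-05 excluded.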

