# Cognizant (Hard Level) - PySpark Interview Question

You have the following dataset containing sales information for different products and regions. Reshape it with PySpark's pivot() method to get the total sales for each product in each region, then extend the pivoted result with the transformations listed in the tasks below.

Task 1: Use pivot() to create a table showing the total sales for each product by region.

Task 2: Add a column calculating the percentage contribution of each region to the total sales for that product.

Task 3: Sort the products in descending order of total sales.
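
To make the three tasks concrete before touching Spark, here is a minimal plain-Python sketch of the same arithmetic. It assumes the same twelve rows as the dataset defined in the notebook; the names `rows`, `pivot`, and `report` are illustrative only and are not part of the PySpark solution.

```python
from collections import defaultdict

# Same rows as the notebook's dataset: (Region, Product, Sales, Quarter).
rows = [
    ("North", "Laptop", 2000, "Q1"), ("South", "Laptop", 3000, "Q1"), ("East", "Laptop", 2500, "Q1"),
    ("North", "Phone", 1500, "Q1"), ("South", "Phone", 1000, "Q1"), ("East", "Phone", 2000, "Q1"),
    ("North", "Laptop", 3000, "Q2"), ("South", "Laptop", 4000, "Q2"), ("East", "Laptop", 3500, "Q2"),
    ("North", "Phone", 2500, "Q2"), ("South", "Phone", 1500, "Q2"), ("East", "Phone", 3000, "Q2"),
]

# Task 1: the "pivot" is a sum of Sales per (Product, Region), added up over quarters.
pivot = defaultdict(lambda: defaultdict(int))
for region, product, sales, _quarter in rows:
    pivot[product][region] += sales

# Task 2: each region's share of the product total, as a percentage.
report = []
for product, by_region in pivot.items():
    total = sum(by_region.values())
    shares = {region: round(100 * amount / total, 2) for region, amount in by_region.items()}
    report.append((product, dict(by_region), total, shares))

# Task 3: highest total first.
report.sort(key=lambda item: item[2], reverse=True)

for product, by_region, total, shares in report:
    print(product, by_region, total, shares)
```

The PySpark version does the same thing, except that the regions become columns of a DataFrame instead of keys of a dictionary.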

In [0]:
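# Import only what the solution uses. Note that this `round` is Spark's column
# function and shadows Python's built-in round() for the rest of the notebook.
from pyspark.sql.functions import col, round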

In [0]:
data = [
    ("North", "Laptop", 2000, "Q1"), ("South", "Laptop", 3000, "Q1"), ("East", "Laptop", 2500, "Q1"),
    ("North", "Phone", 1500, "Q1"), ("South", "Phone", 1000, "Q1"), ("East", "Phone", 2000, "Q1"),
    ("North", "Laptop", 3000, "Q2"), ("South", "Laptop", 4000, "Q2"), ("East", "Laptop", 3500, "Q2"),
    ("North", "Phone", 2500, "Q2"), ("South", "Phone", 1500, "Q2"), ("East", "Phone", 3000, "Q2"),
]

columns = ["Region", "Product", "Sales", "Quarter"]

df = spark.createDataFrame(data, columns)
df.display()

Region,Product,Sales,Quarter
North,Laptop,2000,Q1
South,Laptop,3000,Q1
East,Laptop,2500,Q1
North,Phone,1500,Q1
South,Phone,1000,Q1
East,Phone,2000,Q1
North,Laptop,3000,Q2
South,Laptop,4000,Q2
East,Laptop,3500,Q2
North,Phone,2500,Q2
South,Phone,1500,Q2
East,Phone,3000,Q2

In [0]:
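# Task 1: one row per product, one column per region, summing sales over quarters.
pivoted = df.groupBy('Product').pivot('Region').sum('Sales')

(
    pivoted
      # Add total_sales in its own step so the percentage columns do not depend on a
      # column defined in the same withColumns() call, which some Spark versions reject.
      .withColumn('total_sales', col('East') + col('North') + col('South'))
      # Task 2: each region's share of the product total, as a percentage.
      .withColumns({
          'perc_east': round(col('East') / col('total_sales') * 100, 2),
          'perc_north': round(col('North') / col('total_sales') * 100, 2),
          'perc_south': round(col('South') / col('total_sales') * 100, 2),
      })
      # Task 3: highest total first.
      .orderBy(col('total_sales').desc())
      .display()
)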

Product,East,North,South,total_sales,perc_east,perc_north,perc_south
Laptop,6000,5000,7000,18000,33.33,27.78,38.89
Phone,5000,4000,2500,11500,43.48,34.78,21.74
