**Preparation Instructions**
1. Create a PySpark DataFrame with the following schema:

   1. OrderID (int)
   2. CustomerName (string)
   3. Product (string)
   4. Category (string)
   5. Quantity (int)
   6. UnitPrice (int)
   7. OrderDate (string in YYYY-MM-DD format)

2. Include at least 12 sample rows across multiple categories:

   "Electronics", "Clothing", "Furniture", "Books"

In [77]:
from pyspark.sql import Row
data=[
    Row(OrderID=1,CustomerName='Harish',Product='Laptop',Category='Electronics',Quantity=1,UnitPrice=45000,OrderDate='2023-01-17'),
    Row(OrderID=2,CustomerName='Vishal',Product='T shirt',Category='Clothing',Quantity=2,UnitPrice=400,OrderDate='2023-01-19'),
    Row(OrderID=3,CustomerName='Valarmathi',Product='Mixer',Category='Electronics',Quantity=1,UnitPrice=1999,OrderDate='2023-01-22'),
    Row(OrderID=4,CustomerName='Saravanan',Product='Wooden Table',Category='Furniture',Quantity=1,UnitPrice=15000,OrderDate='2023-01-22'),
    Row(OrderID=5,CustomerName='Vishal',Product='Sports Shoe',Category='Clothing',Quantity=1,UnitPrice=599,OrderDate='2023-01-25'),
    Row(OrderID=6,CustomerName='Harish',Product='C-type Cable',Category='Electronics',Quantity=3,UnitPrice=200,OrderDate='2023-01-29'),
    Row(OrderID=7,CustomerName='Indhu',Product='Power of Mind',Category='Books',Quantity=1,UnitPrice=400,OrderDate='2023-02-01'),
    Row(OrderID=8,CustomerName='Indhu',Product='Real Face',Category='Books',Quantity=2,UnitPrice=350,OrderDate='2023-02-10'),
    Row(OrderID=9,CustomerName='Saravanan',Product='Wooden Chair',Category='Furniture',Quantity=6,UnitPrice=1500,OrderDate='2023-02-11'),
    Row(OrderID=10,CustomerName='Vasanth',Product='T shirt',Category='Clothing',Quantity=2,UnitPrice=699,OrderDate='2023-02-14'),
    Row(OrderID=11,CustomerName='Vishal',Product='Earbuds-samsung',Category='Electronics',Quantity=1,UnitPrice=1499,OrderDate='2023-02-14'),
    Row(OrderID=12,CustomerName='Indhu',Product='The last word',Category='Books',Quantity=1,UnitPrice=300,OrderDate='2023-02-20'),
    Row(OrderID=13,CustomerName='PalniSamy',Product='Cotton Towel',Category='Clothing',Quantity=5,UnitPrice=150,OrderDate='2023-02-25'),
    Row(OrderID=14,CustomerName='Barani',Product='Samasung S20 Ultra',Category='Electronics',Quantity=1,UnitPrice=65000,OrderDate='2023-03-01'),
    Row(OrderID=15,CustomerName='Vishal',Product='Shoe Stand',Category='Furniture',Quantity=1,UnitPrice=699,OrderDate='2023-03-01'),
    Row(OrderID=16,CustomerName='Indhu',Product='HP Printer',Category='Electronics',Quantity=1,UnitPrice=15000,OrderDate='2023-03-07')
]

3. Create:

A local temporary view: "orders_local"

A global temporary view: "orders_global"

In [78]:
from pyspark.sql import SparkSession
spark=SparkSession.builder.appName('E_com_Orders').getOrCreate()
df=spark.createDataFrame(data)
df.createOrReplaceTempView("orders_local")
df.createOrReplaceGlobalTempView("orders_global")
df.show()

+-------+------------+------------------+-----------+--------+---------+----------+
|OrderID|CustomerName|           Product|   Category|Quantity|UnitPrice| OrderDate|
+-------+------------+------------------+-----------+--------+---------+----------+
|      1|      Harish|            Laptop|Electronics|       1|    45000|2023-01-17|
|      2|      Vishal|           T shirt|   Clothing|       2|      400|2023-01-19|
|      3|  Valarmathi|             Mixer|Electronics|       1|     1999|2023-01-22|
|      4|   Saravanan|      Wooden Table|  Furniture|       1|    15000|2023-01-22|
|      5|      Vishal|       Sports Shoe|   Clothing|       1|      599|2023-01-25|
|      6|      Harish|      C-type Cable|Electronics|       3|      200|2023-01-29|
|      7|       Indhu|     Power of Mind|      Books|       1|      400|2023-02-01|
|      8|       Indhu|         Real Face|      Books|       2|      350|2023-02-10|
|      9|   Saravanan|      Wooden Chair|  Furniture|       6|     1500|2023

**Part A: Local View – orders_local**

1. List all orders placed for "Electronics" with a Quantity of 2 or more.

In [79]:
spark.sql("select * from orders_local where Category='Electronics' and Quantity>=2").show()

+-------+------------+------------+-----------+--------+---------+----------+
|OrderID|CustomerName|     Product|   Category|Quantity|UnitPrice| OrderDate|
+-------+------------+------------+-----------+--------+---------+----------+
|      6|      Harish|C-type Cable|Electronics|       3|      200|2023-01-29|
+-------+------------+------------+-----------+--------+---------+----------+



2. Calculate TotalAmount (Quantity × UnitPrice) for each order.

In [80]:
spark.sql("select *,Quantity*UnitPrice as Total_Amount from orders_local").show()

+-------+------------+------------------+-----------+--------+---------+----------+------------+
|OrderID|CustomerName|           Product|   Category|Quantity|UnitPrice| OrderDate|Total_Amount|
+-------+------------+------------------+-----------+--------+---------+----------+------------+
|      1|      Harish|            Laptop|Electronics|       1|    45000|2023-01-17|       45000|
|      2|      Vishal|           T shirt|   Clothing|       2|      400|2023-01-19|         800|
|      3|  Valarmathi|             Mixer|Electronics|       1|     1999|2023-01-22|        1999|
|      4|   Saravanan|      Wooden Table|  Furniture|       1|    15000|2023-01-22|       15000|
|      5|      Vishal|       Sports Shoe|   Clothing|       1|      599|2023-01-25|         599|
|      6|      Harish|      C-type Cable|Electronics|       3|      200|2023-01-29|         600|
|      7|       Indhu|     Power of Mind|      Books|       1|      400|2023-02-01|         400|
|      8|       Indhu|        

3. Show the total number of orders per Category.

In [81]:
spark.sql("select Category,count(*) as Total_Orders from orders_local group by Category").show()

+-----------+------------+
|   Category|Total_Orders|
+-----------+------------+
|Electronics|           6|
|   Clothing|           4|
|      Books|           3|
|  Furniture|           3|
+-----------+------------+



4. List orders placed in "January 2023" only.

In [82]:
spark.sql("select * from orders_local where OrderDate like '2023-01%' ").show()

+-------+------------+------------+-----------+--------+---------+----------+
|OrderID|CustomerName|     Product|   Category|Quantity|UnitPrice| OrderDate|
+-------+------------+------------+-----------+--------+---------+----------+
|      1|      Harish|      Laptop|Electronics|       1|    45000|2023-01-17|
|      2|      Vishal|     T shirt|   Clothing|       2|      400|2023-01-19|
|      3|  Valarmathi|       Mixer|Electronics|       1|     1999|2023-01-22|
|      4|   Saravanan|Wooden Table|  Furniture|       1|    15000|2023-01-22|
|      5|      Vishal| Sports Shoe|   Clothing|       1|      599|2023-01-25|
|      6|      Harish|C-type Cable|Electronics|       3|      200|2023-01-29|
+-------+------------+------------+-----------+--------+---------+----------+



5. Show the average UnitPrice per category.

In [83]:
spark.sql("select Category,avg(UnitPrice) as AVG_UnitPrice from orders_local group by Category").show()

+-----------+------------------+
|   Category|     AVG_UnitPrice|
+-----------+------------------+
|Electronics|21449.666666666668|
|   Clothing|             462.0|
|      Books|             350.0|
|  Furniture|            5733.0|
+-----------+------------------+
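
The averages above can be cross-checked by hand. Note that `avg(UnitPrice)` is a plain mean over orders, not weighted by Quantity. The sketch below is ordinary Python rather than Spark, and the `unit_prices` list and variable names are illustrative, copied by hand from the sample data.

```python
# (Category, UnitPrice) pairs copied by hand from the 16 sample orders
unit_prices = [
    ("Electronics", 45000), ("Clothing", 400), ("Electronics", 1999),
    ("Furniture", 15000), ("Clothing", 599), ("Electronics", 200),
    ("Books", 400), ("Books", 350), ("Furniture", 1500),
    ("Clothing", 699), ("Electronics", 1499), ("Books", 300),
    ("Clothing", 150), ("Electronics", 65000), ("Furniture", 699),
    ("Electronics", 15000),
]

# Collect prices per category, then take the plain (unweighted) mean
by_category = {}
for category, price in unit_prices:
    by_category.setdefault(category, []).append(price)

averages = {c: sum(p) / len(p) for c, p in by_category.items()}
for category, avg in averages.items():
    print(category, round(avg, 2))
```

The printed values should agree with the `AVG_UnitPrice` column, up to rounding of the Electronics figure.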



6. Find the order with the highest total amount.

In [84]:
spark.sql("select *,Quantity*UnitPrice as Total_Price from orders_local order by Total_Price desc limit 1").show()

+-------+------------+------------------+-----------+--------+---------+----------+-----------+
|OrderID|CustomerName|           Product|   Category|Quantity|UnitPrice| OrderDate|Total_Price|
+-------+------------+------------------+-----------+--------+---------+----------+-----------+
|     14|      Barani|Samasung S20 Ultra|Electronics|       1|    65000|2023-03-01|      65000|
+-------+------------+------------------+-----------+--------+---------+----------+-----------+



7. Drop the local view and try querying it again.

In [85]:
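spark.sql("Drop view orders_local")
# Querying the view after the drop raises an AnalysisException (table or view not found), so the line stays commented out
# spark.sql("select * from orders_local")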

DataFrame[]

**Part B: Global View – orders_global**

1. Display all "Furniture" orders with TotalAmount above 10,000.

In [86]:
spark.sql("select * from global_temp.orders_global where Category='Furniture' and (Quantity*UnitPrice)>10000").show()

+-------+------------+------------+---------+--------+---------+----------+
|OrderID|CustomerName|     Product| Category|Quantity|UnitPrice| OrderDate|
+-------+------------+------------+---------+--------+---------+----------+
|      4|   Saravanan|Wooden Table|Furniture|       1|    15000|2023-01-22|
+-------+------------+------------+---------+--------+---------+----------+



2. Create a column called DiscountFlag:

Mark "Yes" if Quantity > 3

Otherwise "No"

In [87]:
spark.sql("""
select * ,
case when Quantity>3 then 'Yes'
else 'No'
end as DiscountFlag
from global_temp.orders_global
""").show()

+-------+------------+------------------+-----------+--------+---------+----------+------------+
|OrderID|CustomerName|           Product|   Category|Quantity|UnitPrice| OrderDate|DiscountFlag|
+-------+------------+------------------+-----------+--------+---------+----------+------------+
|      1|      Harish|            Laptop|Electronics|       1|    45000|2023-01-17|          No|
|      2|      Vishal|           T shirt|   Clothing|       2|      400|2023-01-19|          No|
|      3|  Valarmathi|             Mixer|Electronics|       1|     1999|2023-01-22|          No|
|      4|   Saravanan|      Wooden Table|  Furniture|       1|    15000|2023-01-22|          No|
|      5|      Vishal|       Sports Shoe|   Clothing|       1|      599|2023-01-25|          No|
|      6|      Harish|      C-type Cable|Electronics|       3|      200|2023-01-29|          No|
|      7|       Indhu|     Power of Mind|      Books|       1|      400|2023-02-01|          No|
|      8|       Indhu|        

3. List customers who ordered more than 1 product type (Hint: use GROUP BY and HAVING).

In [88]:
spark.sql("select CustomerName,count(Distinct Product) from global_temp.orders_global group by CustomerName having count(Distinct Product)>1").show()

+------------+-----------------------+
|CustomerName|count(DISTINCT Product)|
+------------+-----------------------+
|      Harish|                      2|
|   Saravanan|                      2|
|      Vishal|                      4|
|       Indhu|                      4|
+------------+-----------------------+



4. Count number of orders per month across the dataset.

In [89]:
spark.sql("select month(OrderDate) as Month,count(*) from global_temp.orders_global group by Month").show()

+-----+--------+
|Month|count(1)|
+-----+--------+
|    1|       6|
|    2|       7|
|    3|       3|
+-----+--------+



5. Rank all products by total quantity sold across all orders using a window function.

In [90]:
from pyspark.sql.window import Window
from pyspark.sql.functions import rank, desc, sum as spark_sum
# Sum Quantity per product first, so a product sold in several orders (e.g. T shirt) is counted once
product_totals=df.groupBy("Product").agg(spark_sum("Quantity").alias("TotalQuantity"))
# No partitionBy: a single global ranking over the per-product totals
windowSpec=Window.orderBy(desc("TotalQuantity"))
df_with_rank=product_totals.withColumn("rank",rank().over(windowSpec))
df_with_rank.orderBy("rank").show(5)

+------------+-------------+----+
|     Product|TotalQuantity|rank|
+------------+-------------+----+
|Wooden Chair|            6|   1|
|Cotton Towel|            5|   2|
|     T shirt|            4|   3|
|C-type Cable|            3|   4|
|   Real Face|            2|   5|
+------------+-------------+----+
only showing top 5 rows

6. Run a query using a new SparkSession and the global view.

In [91]:
# getOrCreate() would return the already running session; newSession() creates a separate session on the same SparkContext
new_spark=spark.newSession()
new_spark.sql("select * from global_temp.orders_global").show()

+-------+------------+------------------+-----------+--------+---------+----------+
|OrderID|CustomerName|           Product|   Category|Quantity|UnitPrice| OrderDate|
+-------+------------+------------------+-----------+--------+---------+----------+
|      1|      Harish|            Laptop|Electronics|       1|    45000|2023-01-17|
|      2|      Vishal|           T shirt|   Clothing|       2|      400|2023-01-19|
|      3|  Valarmathi|             Mixer|Electronics|       1|     1999|2023-01-22|
|      4|   Saravanan|      Wooden Table|  Furniture|       1|    15000|2023-01-22|
|      5|      Vishal|       Sports Shoe|   Clothing|       1|      599|2023-01-25|
|      6|      Harish|      C-type Cable|Electronics|       3|      200|2023-01-29|
|      7|       Indhu|     Power of Mind|      Books|       1|      400|2023-02-01|
|      8|       Indhu|         Real Face|      Books|       2|      350|2023-02-10|
|      9|   Saravanan|      Wooden Chair|  Furniture|       6|     1500|2023

**Bonus Challenges**
1. Save a filtered subset (only "Books" category) as a new global temp view.

In [92]:
df_books=df.filter(df['Category']=='Books')
df_books.createOrReplaceGlobalTempView("books_global")
spark.sql("select * from global_temp.books_global").show()

+-------+------------+-------------+--------+--------+---------+----------+
|OrderID|CustomerName|      Product|Category|Quantity|UnitPrice| OrderDate|
+-------+------------+-------------+--------+--------+---------+----------+
|      7|       Indhu|Power of Mind|   Books|       1|      400|2023-02-01|
|      8|       Indhu|    Real Face|   Books|       2|      350|2023-02-10|
|     12|       Indhu|The last word|   Books|       1|      300|2023-02-20|
+-------+------------+-------------+--------+--------+---------+----------+



2. Find the most purchased product per category.

In [93]:
from pyspark.sql.window import Window
from pyspark.sql.functions import rank
windowSpec=Window.partitionBy('Category').orderBy(df['Quantity'].desc())
df_with_rank_cat=df.withColumn("rank",rank().over(windowSpec))
df_with_rank_cat.filter(df_with_rank_cat['rank']==1).show()

+-------+------------+------------+-----------+--------+---------+----------+----+
|OrderID|CustomerName|     Product|   Category|Quantity|UnitPrice| OrderDate|rank|
+-------+------------+------------+-----------+--------+---------+----------+----+
|      8|       Indhu|   Real Face|      Books|       2|      350|2023-02-10|   1|
|     13|   PalniSamy|Cotton Towel|   Clothing|       5|      150|2023-02-25|   1|
|      6|      Harish|C-type Cable|Electronics|       3|      200|2023-01-29|   1|
|      9|   Saravanan|Wooden Chair|  Furniture|       6|     1500|2023-02-11|   1|
+-------+------------+------------+-----------+--------+---------+----------+----+
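
The cell above ranks individual orders by Quantity within each category. If "most purchased" is read as the largest total quantity per product, the quantities need to be summed per product first, because a product such as T shirt appears in more than one order. A minimal plain-Python sketch of that reading follows; it does not use Spark, and the `orders` list and variable names are illustrative, copied by hand from the sample data. In Spark the same idea would be a `groupBy` on Category and Product before applying the window.

```python
from collections import defaultdict

# (Category, Product, Quantity) copied by hand from the 16 sample orders
orders = [
    ("Electronics", "Laptop", 1), ("Clothing", "T shirt", 2),
    ("Electronics", "Mixer", 1), ("Furniture", "Wooden Table", 1),
    ("Clothing", "Sports Shoe", 1), ("Electronics", "C-type Cable", 3),
    ("Books", "Power of Mind", 1), ("Books", "Real Face", 2),
    ("Furniture", "Wooden Chair", 6), ("Clothing", "T shirt", 2),
    ("Electronics", "Earbuds-samsung", 1), ("Books", "The last word", 1),
    ("Clothing", "Cotton Towel", 5), ("Electronics", "Samasung S20 Ultra", 1),
    ("Furniture", "Shoe Stand", 1), ("Electronics", "HP Printer", 1),
]

# Step 1: total quantity per (category, product)
totals = defaultdict(int)
for category, product, quantity in orders:
    totals[(category, product)] += quantity

# Step 2: keep the product with the largest total in each category
# (on a tie, the product seen first is kept)
best = {}
for (category, product), total in totals.items():
    if category not in best or total > best[category][1]:
        best[category] = (product, total)

for category in sorted(best):
    print(category, best[category])
```

For this sample the winners are the same products as in the order-level ranking, since no product overtakes the leader once its orders are summed (T shirt reaches 4 against Cotton Towel's 5).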



3. Create a view that excludes all "Clothing" orders and call it "filtered_orders".

In [94]:
df_ex_cloth=df.filter(df['Category']!='Clothing')
df_ex_cloth.createOrReplaceTempView("filtered_orders")
spark.sql("select * from filtered_orders").show()

+-------+------------+------------------+-----------+--------+---------+----------+
|OrderID|CustomerName|           Product|   Category|Quantity|UnitPrice| OrderDate|
+-------+------------+------------------+-----------+--------+---------+----------+
|      1|      Harish|            Laptop|Electronics|       1|    45000|2023-01-17|
|      3|  Valarmathi|             Mixer|Electronics|       1|     1999|2023-01-22|
|      4|   Saravanan|      Wooden Table|  Furniture|       1|    15000|2023-01-22|
|      6|      Harish|      C-type Cable|Electronics|       3|      200|2023-01-29|
|      7|       Indhu|     Power of Mind|      Books|       1|      400|2023-02-01|
|      8|       Indhu|         Real Face|      Books|       2|      350|2023-02-10|
|      9|   Saravanan|      Wooden Chair|  Furniture|       6|     1500|2023-02-11|
|     11|      Vishal|   Earbuds-samsung|Electronics|       1|     1499|2023-02-14|
|     12|       Indhu|     The last word|      Books|       1|      300|2023