
# **Sorting & Ordering in PySpark**
Sorting can be done using:

1. `orderBy()` → the most common method
2. `sort()` → interchangeable with `orderBy()` (one is an alias of the other)
3. `asc()` and `desc()` → set ascending or descending order on a column

Syntax:

`df.orderBy(col("column_name").asc())` or `df.orderBy(col("column_name").desc())`

In [1]:
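from pyspark.sql import SparkSession
from pyspark.sql.functions import col, desc, asc

# Reuse the active Spark session if one exists, otherwise start a new one.
spark = SparkSession.builder.appName("SortingExample").getOrCreate()

# Sample sales orders: (OrderID, Product, Quantity, Price, Date).
sales_data = [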
    (101, "Laptop", 5, 50000, "2025-01-10"),
    (102, "Phone", 10, 20000, "2025-02-15"),
    (103, "Tablet", 3, 30000, "2025-02-20"),
    (104, "Laptop", 7, 52000, "2025-03-01"),
    (105, "Camera", 2, 25000, "2025-03-15"),
    (106, "Phone", 15, 18000, "2025-04-05"),
    (107, "Laptop", 2, 48000, "2025-04-10")
]
columns = ["OrderID", "Product", "Quantity", "Price", "Date"]

df = spark.createDataFrame(sales_data, columns)
df.show()

+-------+-------+--------+-----+----------+
|OrderID|Product|Quantity|Price|      Date|
+-------+-------+--------+-----+----------+
|    101| Laptop|       5|50000|2025-01-10|
|    102|  Phone|      10|20000|2025-02-15|
|    103| Tablet|       3|30000|2025-02-20|
|    104| Laptop|       7|52000|2025-03-01|
|    105| Camera|       2|25000|2025-03-15|
|    106|  Phone|      15|18000|2025-04-05|
|    107| Laptop|       2|48000|2025-04-10|
+-------+-------+--------+-----+----------+



The product team wants sales orders sorted by quantity, smallest first.

In [2]:
df.orderBy(col("Quantity").asc()).show()

+-------+-------+--------+-----+----------+
|OrderID|Product|Quantity|Price|      Date|
+-------+-------+--------+-----+----------+
|    105| Camera|       2|25000|2025-03-15|
|    107| Laptop|       2|48000|2025-04-10|
|    103| Tablet|       3|30000|2025-02-20|
|    101| Laptop|       5|50000|2025-01-10|
|    104| Laptop|       7|52000|2025-03-01|
|    102|  Phone|      10|20000|2025-02-15|
|    106|  Phone|      15|18000|2025-04-05|
+-------+-------+--------+-----+----------+



Management wants to see the highest-priced orders first.

In [3]:
df.orderBy(col("Price").desc()).show()

+-------+-------+--------+-----+----------+
|OrderID|Product|Quantity|Price|      Date|
+-------+-------+--------+-----+----------+
|    104| Laptop|       7|52000|2025-03-01|
|    101| Laptop|       5|50000|2025-01-10|
|    107| Laptop|       2|48000|2025-04-10|
|    103| Tablet|       3|30000|2025-02-20|
|    105| Camera|       2|25000|2025-03-15|
|    102|  Phone|      10|20000|2025-02-15|
|    106|  Phone|      15|18000|2025-04-05|
+-------+-------+--------+-----+----------+



Scenario:

- Sort orders first by Product (alphabetically)
- Then by Price (descending) within each product

In [4]:
df.orderBy(col("Product").asc(), col("Price").desc()).show()

+-------+-------+--------+-----+----------+
|OrderID|Product|Quantity|Price|      Date|
+-------+-------+--------+-----+----------+
|    105| Camera|       2|25000|2025-03-15|
|    104| Laptop|       7|52000|2025-03-01|
|    101| Laptop|       5|50000|2025-01-10|
|    107| Laptop|       2|48000|2025-04-10|
|    102|  Phone|      10|20000|2025-02-15|
|    106|  Phone|      15|18000|2025-04-05|
|    103| Tablet|       3|30000|2025-02-20|
+-------+-------+--------+-----+----------+

