Suppose you are a data analyst working for the ride-sharing platform Uber. Uber wants to analyze driver performance based on ratings and categorize drivers into performance tiers.

Write an SQL query to split drivers equally into three performance tiers (Top, Middle, and Bottom) based on their average ratings. Drivers with the highest average ratings go in the Top tier, drivers with the lowest average ratings go in the Bottom tier, and drivers whose ratings fall between those two groups go in the Middle tier. Sort the output in decreasing order of average rating.
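Splitting rows "equally" into three groups is what NTILE-style bucketing does. The sketch below is a plain-Python illustration of how the bucket sizes work out, including the case where the row count is not divisible by three. The helper name `ntile_buckets` is hypothetical and is not part of any library. It assumes the usual NTILE convention that the earliest buckets absorb the leftover rows.

```python
def ntile_buckets(n_rows, n_buckets):
    """Return the 1-based bucket number for each row position, in order.

    Assumes the usual NTILE convention: when rows do not divide evenly,
    the earliest buckets each receive one extra row.
    """
    base, extra = divmod(n_rows, n_buckets)
    buckets = []
    for bucket in range(1, n_buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


# 12 drivers split evenly into three tiers of 4
even_split = ntile_buckets(12, 3)

# 10 drivers cannot split evenly, so the first tier gets the extra row
uneven_split = ntile_buckets(10, 3)

print(even_split)
print(uneven_split)
```

With the 12 drivers in this exercise each tier holds four drivers, so the uneven case does not arise here.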

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, FloatType
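# Import only the names used below; a wildcard import from
# pyspark.sql.functions would shadow Python built-ins such as sum, min, and max
from pyspark.sql.window import Window
from pyspark.sql.functions import col, ntile, when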

# Create a Spark session
spark = SparkSession.builder \
    .appName("Driver Ratings") \
    .getOrCreate()

# Define the schema
schema = StructType([
    StructField("driver_id", IntegerType(), True),
    StructField("avg_rating", FloatType(), True)
])

# Create the data
data = [
    (1, 4.80),
    (2, 4.50),
    (3, 3.90),
    (4, 4.20),
    (5, 4.70),
    (6, 3.60),
    (7, 4.90),
    (8, 3.80),
    (9, 4.40),
    (10, 3.50),
    (11, 4.10),
    (12, 4.60)
]

# Create a DataFrame
df = spark.createDataFrame(data, schema)

# Show the DataFrame
df.show()


+---------+----------+
|driver_id|avg_rating|
+---------+----------+
|        1|       4.8|
|        2|       4.5|
|        3|       3.9|
|        4|       4.2|
|        5|       4.7|
|        6|       3.6|
|        7|       4.9|
|        8|       3.8|
|        9|       4.4|
|       10|       3.5|
|       11|       4.1|
|       12|       4.6|
+---------+----------+



In [0]:
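# Order from highest to lowest rating so that ntile bucket 1 holds the best-rated drivers
window_spec = Window.orderBy(col("avg_rating").desc())
df = (
    df.withColumn("Bucketing", ntile(3).over(window_spec))
    .withColumn(
        "performance_tier",
        when(col("Bucketing") == 1, "Top")
        .when(col("Bucketing") == 2, "Middle")
        .when(col("Bucketing") == 3, "Bottom")
        .otherwise("Unknown"),
    )
    .select("driver_id", "avg_rating", "performance_tier")
    .orderBy(col("avg_rating").desc())  # sort output by decreasing average rating
)
df.show()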
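Since the task asks for an SQL query, here is a minimal sketch of the same tiering in SQL. To keep it runnable without a Spark cluster, it uses Python's built-in `sqlite3` module as a stand-in engine, which assumes the bundled SQLite is version 3.25 or newer (needed for window functions). The table name `driver_ratings` and the alias `bucket` are assumptions made for this illustration.

```python
import sqlite3

# In-memory database used only for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_ratings (driver_id INTEGER, avg_rating REAL)")
conn.executemany(
    "INSERT INTO driver_ratings VALUES (?, ?)",
    [
        (1, 4.80), (2, 4.50), (3, 3.90), (4, 4.20),
        (5, 4.70), (6, 3.60), (7, 4.90), (8, 3.80),
        (9, 4.40), (10, 3.50), (11, 4.10), (12, 4.60),
    ],
)

query = """
WITH ranked AS (
    SELECT driver_id,
           avg_rating,
           -- bucket 1 holds the highest ratings because of ORDER BY ... DESC
           NTILE(3) OVER (ORDER BY avg_rating DESC) AS bucket
    FROM driver_ratings
)
SELECT driver_id,
       avg_rating,
       CASE bucket
           WHEN 1 THEN 'Top'
           WHEN 2 THEN 'Middle'
           ELSE 'Bottom'
       END AS performance_tier
FROM ranked
ORDER BY avg_rating DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In Spark, the same query text should be usable through `spark.sql(query)` once the ratings DataFrame has been registered with `createOrReplaceTempView("driver_ratings")`. This has not been verified here.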