## Importing Libraries

In [0]:
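# Note: this star import shadows Python built-ins such as sum and max with their Spark column versions.
from pyspark.sql.functions import *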
from pyspark.sql.types import *
from pyspark.sql.window import Window

**2991. Top Three Wineries (Hard)**

**Table: Wineries**

| Column Name | Type     |
|-------------|----------|
| id          | int      |
| country     | varchar  |
| points      | int      |
| winery      | varchar  |

id is the column of unique values for this table.
This table contains id, country, points, and winery.

**Write a solution to find the top three wineries in each country based on their total points. If multiple wineries have the same total points, order them by winery name in ascending order. If there's no second winery, output 'No second winery', and if there's no third winery, output 'No third winery'.**

Return the result table ordered by country in ascending order.

The result format is in the following example.

**Example 1:**

**Input:** 
**Wineries table:**

| id  | country   | points | winery          | 
|-----|-----------|--------|-----------------|
| 103 | Australia | 84     | WhisperingPines | 
| 737 | Australia | 85     | GrapesGalore    |    
| 848 | Australia | 100    | HarmonyHill     | 
| 222 | Hungary   | 60     | MoonlitCellars  | 
| 116 | USA       | 47     | RoyalVines      | 
| 124 | USA       | 45     | Eagle'sNest     | 
| 648 | India     | 69     | SunsetVines     | 
| 894 | USA       | 39     | RoyalVines      |  
| 677 | USA       | 9      | PacificCrest    |  

**Output:**

| country   | top_winery          | second_winery     | third_winery         |
|-----------|---------------------|-------------------|----------------------|
| Australia | HarmonyHill (100)   | GrapesGalore (85) | WhisperingPines (84) |
| Hungary   | MoonlitCellars (60) | No second winery  | No third winery      | 
| India     | SunsetVines (69)    | No second winery  | No third winery      |  
| USA       | RoyalVines (86)     | Eagle'sNest (45)  | PacificCrest (9)     | 

**Explanation**

- For Australia
  - HarmonyHill Winery accumulates the highest score of 100 points in Australia.
  - GrapesGalore Winery has a total of 85 points, securing the second-highest position in Australia.
  - WhisperingPines Winery has a total of 84 points, ranking as the third-highest.
- For Hungary
  - MoonlitCellars is the sole winery, accruing 60 points, automatically making it the highest. There is no second or third winery.
- For India
  - SunsetVines is the sole winery, earning 69 points, making it the top winery. There is no second or third winery.
- For the USA
  - RoyalVines Wines accumulates a total of 47 + 39 = 86 points, claiming the highest position in the USA.
  - Eagle'sNest has a total of 45 points, securing the second-highest position in the USA.
  - PacificCrest accumulates 9 points, ranking as the third-highest winery in the USA.

Output table is ordered by country in ascending order.
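
As a cross-check on the expected output, the same logic can be sketched in plain Python without Spark. This is only an illustrative sketch: the function name `top_three_wineries` and the variable names are assumptions for this example and are not part of the problem statement or of the PySpark solution.

```python
from collections import defaultdict

def top_three_wineries(rows):
    """rows: iterable of (id, country, points, winery) tuples."""
    # Step 1: total points per (country, winery).
    totals = defaultdict(int)
    for _id, country, points, winery in rows:
        totals[(country, winery)] += points

    # Step 2: collect each country's wineries with their totals.
    by_country = defaultdict(list)
    for (country, winery), total in totals.items():
        by_country[country].append((winery, total))

    placeholders = ["No second winery", "No third winery"]
    result = []
    for country in sorted(by_country):  # countries in ascending order
        # Step 3: highest total first, ties broken by winery name ascending.
        ranked = sorted(by_country[country], key=lambda wt: (-wt[1], wt[0]))[:3]
        labels = [f"{winery} ({total})" for winery, total in ranked]
        # Step 4: pad a missing second and/or third position.
        labels += placeholders[len(labels) - 1:]
        result.append((country, *labels))
    return result

rows = [
    (103, "Australia", 84, "WhisperingPines"),
    (737, "Australia", 85, "GrapesGalore"),
    (848, "Australia", 100, "HarmonyHill"),
    (222, "Hungary", 60, "MoonlitCellars"),
    (116, "USA", 47, "RoyalVines"),
    (124, "USA", 45, "Eagle'sNest"),
    (648, "India", 69, "SunsetVines"),
    (894, "USA", 39, "RoyalVines"),
    (677, "USA", 9, "PacificCrest"),
]

result = top_three_wineries(rows)
for row in result:
    print(row)
```

The two RoyalVines rows (47 and 39) are summed before ranking, which is why RoyalVines leads the USA with 86 points.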

In [0]:
wineries_data_2991 = [
    (103, "Australia", 84, "WhisperingPines"),
    (737, "Australia", 85, "GrapesGalore"),
    (848, "Australia", 100, "HarmonyHill"),
    (222, "Hungary", 60, "MoonlitCellars"),
    (116, "USA", 47, "RoyalVines"),
    (124, "USA", 45, "Eagle'sNest"),
    (648, "India", 69, "SunsetVines"),
    (894, "USA", 39, "RoyalVines"),
    (677, "USA", 9, "PacificCrest")
]

wineries_columns_2991 = ["id", "country", "points", "winery"]
wineries_df_2991 = spark.createDataFrame(wineries_data_2991, wineries_columns_2991)
wineries_df_2991.show()


+---+---------+------+---------------+
| id|  country|points|         winery|
+---+---------+------+---------------+
|103|Australia|    84|WhisperingPines|
|737|Australia|    85|   GrapesGalore|
|848|Australia|   100|    HarmonyHill|
|222|  Hungary|    60| MoonlitCellars|
|116|      USA|    47|     RoyalVines|
|124|      USA|    45|    Eagle'sNest|
|648|    India|    69|    SunsetVines|
|894|      USA|    39|     RoyalVines|
|677|      USA|     9|   PacificCrest|
+---+---------+------+---------------+



In [0]:
df_total_2991 = wineries_df_2991\
                    .groupBy("country", "winery") \
                        .agg(sum("points").alias("total_points"))

In [0]:
window_spec = Window.partitionBy("country") \
                    .orderBy(desc("total_points"), asc("winery"))

In [0]:
df_ranked_2991 = df_total_2991.withColumn("rank", row_number().over(window_spec))
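
The window orders by total points descending and then by winery name ascending, so `row_number()` gives every winery in a country a distinct position, even when totals tie. The plain-Python sketch below illustrates why that matters. The winery names and totals in it are hypothetical, invented only to produce a tie, and the "rank" variant is computed on points alone for contrast.

```python
# Hypothetical totals for one country; "Alpha" and "Beta" tie on points.
totals = [("Beta", 50), ("Gamma", 70), ("Alpha", 50)]

# Same ordering as the window: points descending, then winery name ascending.
ordered = sorted(totals, key=lambda wt: (-wt[1], wt[0]))

# row_number-style numbering: consecutive positions, ties broken by name.
row_numbers = {winery: i for i, (winery, _) in enumerate(ordered, start=1)}

# rank-style numbering on points alone: tied totals share a position.
ranks = {}
for i, (winery, total) in enumerate(ordered, start=1):
    prev_winery, prev_total = ordered[i - 2] if i > 1 else (None, None)
    ranks[winery] = ranks[prev_winery] if total == prev_total else i

print(row_numbers)
print(ranks)
```

With a shared position, no row would carry position 3 in this example, so a later filter on position 3 would find nothing and the third column would wrongly fall back to the placeholder.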

In [0]:
# Build the "winery (total)" label once, then pivot ranks 1 to 3 into columns.
# Each (country, rank) pair matches at most one row, so max() simply picks that label.
label = concat(col("winery"), lit(" ("), col("total_points").cast("string"), lit(")"))

df_pivot_2991 = df_ranked_2991\
                    .groupBy("country")\
                        .agg(
                            max(when(col("rank") == 1, label)).alias("top_winery"),
                            max(when(col("rank") == 2, label)).alias("second_winery"),
                            max(when(col("rank") == 3, label)).alias("third_winery")
                            )


In [0]:
df_pivot_2991\
    .fillna({"second_winery": "No second winery", "third_winery": "No third winery"}) \
        .orderBy("country").display()

country,top_winery,second_winery,third_winery
Australia,HarmonyHill (100),GrapesGalore (85),WhisperingPines (84)
Hungary,MoonlitCellars (60),No second winery,No third winery
India,SunsetVines (69),No second winery,No third winery
USA,RoyalVines (86),Eagle'sNest (45),PacificCrest (9)
