## Importing Libraries

In [0]:
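# NOTE: the wildcard import shadows Python builtins such as sum() and max();
# the cells below rely on the Spark versions of those functions.
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window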

**2324. Product Sales Analysis IV (Medium)**

**Table: Sales**

| Column Name | Type  |
|-------------|-------|
| sale_id     | int   |
| product_id  | int   |
| user_id     | int   |
| quantity    | int   |

sale_id contains unique values.
product_id is a foreign key (reference column) to Product table.
Each row of this table shows the ID of the product and the quantity purchased by a user.
 

**Table: Product**

| Column Name | Type |
|-------------|------|
| product_id  | int  |
| price       | int  |

product_id contains unique values.
Each row of this table indicates the price of each product.
 
**Write a solution that reports for each user the product id on which the user spent the most money. In case the same user spent the most money on two or more products, report all of them.**

Return the resulting table in any order.

The result format is in the following example.

**Example 1:**

**Input:** 

**Sales table:**

| sale_id | product_id | user_id | quantity |
|---------|------------|---------|----------|
| 1       | 1          | 101     | 10       |
| 2       | 3          | 101     | 7        |
| 3       | 1          | 102     | 9        |
| 4       | 2          | 102     | 6        |
| 5       | 3          | 102     | 10       |
| 6       | 1          | 102     | 6        |

**Product table:**

| product_id | price |
|------------|-------|
| 1          | 10    |
| 2          | 25    |
| 3          | 15    |

**Output:**

| user_id | product_id |
|---------|------------|
| 101     | 3          |
| 102     | 1          |
| 102     | 2          |
| 102     | 3          |

**Explanation:** 
- User 101:
  - Spent 10 * 10 = 100 on product 1.
  - Spent 7 * 15 = 105 on product 3.

User 101 spent the most money on product 3.

- User 102:
  - Spent (9 + 6) * 10 = 150 on product 1.
  - Spent 6 * 25 = 150 on product 2.
  - Spent 10 * 15 = 150 on product 3.

User 102 spent the most money on products 1, 2, and 3.
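
The arithmetic in the explanation can be cross-checked without Spark. The following is a minimal plain-Python sketch, not part of the original solution; the names `sales`, `prices`, and `top_products` are made up for illustration. It deliberately avoids the builtins `sum()` and `max()`, since a wildcard import from `pyspark.sql.functions` replaces them in a notebook session.

```python
from collections import defaultdict

# Example 1 input, copied from the tables above.
sales = [
    (1, 1, 101, 10),
    (2, 3, 101, 7),
    (3, 1, 102, 9),
    (4, 2, 102, 6),
    (5, 3, 102, 10),
    (6, 1, 102, 6),
]
prices = {1: 10, 2: 25, 3: 15}  # product_id -> price


def top_products(sales, prices):
    """Return sorted (user_id, product_id) pairs with the highest spend per user."""
    # Total money spent per (user, product) pair.
    spent = defaultdict(int)
    for _sale_id, product_id, user_id, quantity in sales:
        spent[(user_id, product_id)] += quantity * prices[product_id]

    # Highest per-product total for each user (explicit comparison, no max()).
    best = {}
    for (user_id, _product_id), total in spent.items():
        if user_id not in best or total > best[user_id]:
            best[user_id] = total

    # Keep every product that ties for the user's highest total.
    return sorted(
        (user_id, product_id)
        for (user_id, product_id), total in spent.items()
        if total == best[user_id]
    )


result = top_products(sales, prices)
print(result)  # [(101, 3), (102, 1), (102, 2), (102, 3)]
```

The printed pairs match the expected output table: user 102 keeps all three products because each totals 150.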

In [0]:
sales_data_2324 = [
    (1, 1, 101, 10),
    (2, 3, 101, 7),
    (3, 1, 102, 9),
    (4, 2, 102, 6),
    (5, 3, 102, 10),
    (6, 1, 102, 6),
]

sales_columns_2324 = ["sale_id", "product_id", "user_id", "quantity"]
sales_df_2324 = spark.createDataFrame(sales_data_2324, sales_columns_2324)
sales_df_2324.show()

product_data_2324 = [
    (1, 10),
    (2, 25),
    (3, 15),
]

product_columns_2324 = ["product_id", "price"]
product_df_2324 = spark.createDataFrame(product_data_2324, product_columns_2324)
product_df_2324.show()

+-------+----------+-------+--------+
|sale_id|product_id|user_id|quantity|
+-------+----------+-------+--------+
|      1|         1|    101|      10|
|      2|         3|    101|       7|
|      3|         1|    102|       9|
|      4|         2|    102|       6|
|      5|         3|    102|      10|
|      6|         1|    102|       6|
+-------+----------+-------+--------+

+----------+-----+
|product_id|price|
+----------+-----+
|         1|   10|
|         2|   25|
|         3|   15|
+----------+-----+



In [0]:
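# Total money each user spent on each product: quantity * price, summed per (user, product).
user_product_spent_2324 = (
    sales_df_2324
    .join(product_df_2324, on="product_id", how="inner")
    .withColumn("spent", col("quantity") * col("price"))
    .groupBy("user_id", "product_id")
    .agg(sum("spent").alias("total_spent"))  # Spark sum(), not the Python builtin
)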

In [0]:
# One window per user; no ordering is needed because max() looks at the whole partition.
windowSpec = Window.partitionBy("user_id")

In [0]:
# Keep every product whose total equals the user's maximum, so ties are all reported.
(
    user_product_spent_2324
    .withColumn("max_spent", max("total_spent").over(windowSpec))  # Spark max()
    .filter(col("total_spent") == col("max_spent"))
    .select("user_id", "product_id")
    .orderBy("user_id", "product_id")
    .show()
)

+-------+----------+
|user_id|product_id|
+-------+----------+
|    101|         3|
|    102|         1|
|    102|         2|
|    102|         3|
+-------+----------+

