## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**1549. The Most Recent Orders for Each Product (Medium)**

**Table: Customers**

| Column Name   | Type    |
|---------------|---------|
| customer_id   | int     |
| name          | varchar |

customer_id is the column with unique values for this table.
This table contains information about the customers.
 
**Table: Orders**

| Column Name   | Type    |
|---------------|---------|
| order_id      | int     |
| order_date    | date    |
| customer_id   | int     |
| product_id    | int     |

order_id is the column with unique values for this table.
This table contains information about the orders made by customer_id.
There will be no product ordered by the same user more than once in one day.
 
**Table: Products**

| Column Name   | Type    |
|---------------|---------|
| product_id    | int     |
| product_name  | varchar |
| price         | int     |

product_id is the column with unique values for this table.
This table contains information about the Products.
 
**Write a solution to find the most recent order(s) of each product.**

Return the result table ordered by product_name in ascending order and, in case of a tie, by product_id in ascending order. If there is still a tie, order by order_id in ascending order.

The result format is in the following example.

**Example 1:**

**Input:** 

**Customers table:**

| customer_id | name      |
|-------------|-----------|
| 1           | Winston   |
| 2           | Jonathan  |
| 3           | Annabelle |
| 4           | Marwan    |
| 5           | Khaled    |

**Orders table:**

| order_id | order_date | customer_id | product_id |
|----------|------------|-------------|------------|
| 1        | 2020-07-31 | 1           | 1          |
| 2        | 2020-07-30 | 2           | 2          |
| 3        | 2020-08-29 | 3           | 3          |
| 4        | 2020-07-29 | 4           | 1          |
| 5        | 2020-06-10 | 1           | 2          |
| 6        | 2020-08-01 | 2           | 1          |
| 7        | 2020-08-01 | 3           | 1          |
| 8        | 2020-08-03 | 1           | 2          |
| 9        | 2020-08-07 | 2           | 3          |
| 10       | 2020-07-15 | 1           | 2          |

**Products table:**

| product_id | product_name | price |
|------------|--------------|-------|
| 1          | keyboard     | 120   |
| 2          | mouse        | 80    |
| 3          | screen       | 600   |
| 4          | hard disk    | 450   |

**Output:**

| product_name | product_id | order_id | order_date |
|--------------|------------|----------|------------|
| keyboard     | 1          | 6        | 2020-08-01 |
| keyboard     | 1          | 7        | 2020-08-01 |
| mouse        | 2          | 8        | 2020-08-03 |
| screen       | 3          | 3        | 2020-08-29 |

**Explanation:** 
- keyboard's most recent order was on 2020-08-01; it was ordered twice that day.
- mouse's most recent order was on 2020-08-03; it was ordered only once that day.
- screen's most recent order was on 2020-08-29; it was ordered only once that day.
- The hard disk was never ordered and we do not include it in the result table.
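
The rule described above can be sketched in plain Python before turning to Spark. This is only an illustration on the example data, not the notebook's solution: the function name `most_recent_orders` and the dict-based product lookup are assumptions made for this sketch.

```python
from datetime import date

# Example data from the problem statement: (order_id, order_date, customer_id, product_id)
orders = [
    (1, "2020-07-31", 1, 1), (2, "2020-07-30", 2, 2), (3, "2020-08-29", 3, 3),
    (4, "2020-07-29", 4, 1), (5, "2020-06-10", 1, 2), (6, "2020-08-01", 2, 1),
    (7, "2020-08-01", 3, 1), (8, "2020-08-03", 1, 2), (9, "2020-08-07", 2, 3),
    (10, "2020-07-15", 1, 2),
]
# product_id -> product_name (hypothetical lookup structure for this sketch)
products = {1: "keyboard", 2: "mouse", 3: "screen", 4: "hard disk"}


def most_recent_orders(orders, products):
    # Pass 1: find the latest order date per product.
    latest = {}
    for _, order_date, _, product_id in orders:
        parsed = date.fromisoformat(order_date)
        if product_id not in latest or parsed > latest[product_id]:
            latest[product_id] = parsed

    # Pass 2: keep every order placed on that latest date, so ties survive.
    rows = [
        (products[product_id], product_id, order_id, order_date)
        for order_id, order_date, _, product_id in orders
        if date.fromisoformat(order_date) == latest[product_id]
    ]

    # Sort by product_name, then product_id, then order_id (all ascending).
    return sorted(rows, key=lambda r: (r[0], r[1], r[2]))


result = most_recent_orders(orders, products)
for row in result:
    print(row)
```

Because the loop starts from the orders, a product with no orders (the hard disk here) never enters the result, which matches the last bullet of the explanation.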

In [0]:
customers_data_1549 = [
    (1, "Winston"),
    (2, "Jonathan"),
    (3, "Annabelle"),
    (4, "Marwan"),
    (5, "Khaled"),
]

customers_columns_1549 = ["customer_id", "name"]
customers_df_1549 = spark.createDataFrame(customers_data_1549, customers_columns_1549)
customers_df_1549.show()

orders_data_1549 = [
    (1, "2020-07-31", 1, 1),
    (2, "2020-07-30", 2, 2),
    (3, "2020-08-29", 3, 3),
    (4, "2020-07-29", 4, 1),
    (5, "2020-06-10", 1, 2),
    (6, "2020-08-01", 2, 1),
    (7, "2020-08-01", 3, 1),
    (8, "2020-08-03", 1, 2),
    (9, "2020-08-07", 2, 3),
    (10, "2020-07-15", 1, 2),
]

orders_columns_1549 = ["order_id", "order_date", "customer_id", "product_id"]
orders_df_1549 = spark.createDataFrame(orders_data_1549, orders_columns_1549)
orders_df_1549.show()

products_data_1549 = [
    (1, "keyboard", 120),
    (2, "mouse", 80),
    (3, "screen", 600),
    (4, "hard disk", 450),
]

products_columns_1549 = ["product_id", "product_name", "price"]
products_df_1549 = spark.createDataFrame(products_data_1549, products_columns_1549)
products_df_1549.show()


+-----------+---------+
|customer_id|     name|
+-----------+---------+
|          1|  Winston|
|          2| Jonathan|
|          3|Annabelle|
|          4|   Marwan|
|          5|   Khaled|
+-----------+---------+

+--------+----------+-----------+----------+
|order_id|order_date|customer_id|product_id|
+--------+----------+-----------+----------+
|       1|2020-07-31|          1|         1|
|       2|2020-07-30|          2|         2|
|       3|2020-08-29|          3|         3|
|       4|2020-07-29|          4|         1|
|       5|2020-06-10|          1|         2|
|       6|2020-08-01|          2|         1|
|       7|2020-08-01|          3|         1|
|       8|2020-08-03|          1|         2|
|       9|2020-08-07|          2|         3|
|      10|2020-07-15|          1|         2|
+--------+----------+-----------+----------+

+----------+------------+-----+
|product_id|product_name|price|
+----------+------------+-----+
|         1|    keyboard|  120|
|         2|       mouse|   80|
|         3|      screen|  600|
|         4|   hard disk|  450|
+----------+------------+-----+

In [0]:
# Cast order_date from string to DateType so it is compared as a date.
orders_df_1549 = orders_df_1549\
                    .withColumn("order_date", to_date("order_date"))

In [0]:
# The inner join drops products that were never ordered (e.g. hard disk).
df_1549 = orders_df_1549\
                .join(products_df_1549, on="product_id", how="inner")

In [0]:
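# Orders each product's rows newest-first. Note: the max-date filter in the next cell
# builds its own partition-only window and does not use this spec.
window_spec = Window.partitionBy("product_id").orderBy(col("order_date").desc())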

In [0]:
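# `max` here is pyspark.sql.functions.max (the wildcard import shadows the builtin).
# Keep every order whose date equals its product's latest order date, so ties are retained.
latest_date_df_1549 = df_1549\
    .withColumn("max_date", max("order_date").over(Window.partitionBy("product_id")))\
    .filter(col("order_date") == col("max_date"))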

In [0]:
latest_date_df_1549\
    .select(
        "product_name",
        "product_id",
        "order_id",
        "order_date"
).orderBy(
    col("product_name").asc(),
    col("product_id").asc(),
    col("order_id").asc()
).show()

+------------+----------+--------+----------+
|product_name|product_id|order_id|order_date|
+------------+----------+--------+----------+
|    keyboard|         1|       6|2020-08-01|
|    keyboard|         1|       7|2020-08-01|
|       mouse|         2|       8|2020-08-03|
|      screen|         3|       3|2020-08-29|
+------------+----------+--------+----------+

