## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**1159. Market Analysis II (Hard)**

**Table: Users**

| Column Name    | Type    |
|----------------|---------|
| user_id        | int     |
| join_date      | date    |
| favorite_brand | varchar |

user_id is the primary key (column with unique values) of this table.
This table has the info of the users of an online shopping website where users can sell and buy items.
 
**Table: Orders**

| Column Name   | Type    |
|---------------|---------|
| order_id      | int     |
| order_date    | date    |
| item_id       | int     |
| buyer_id      | int     |
| seller_id     | int     |

order_id is the primary key (column with unique values) of this table.
item_id is a foreign key (reference column) to the Items table.
buyer_id and seller_id are foreign keys to the Users table.
 

**Table: Items**

| Column Name   | Type    |
|---------------|---------|
| item_id       | int     |
| item_brand    | varchar |

item_id is the primary key (column with unique values) of this table.
 
**Write a solution to find, for each user, whether the brand of the second item (by date) they sold is their favorite brand. If a user sold fewer than two items, report the answer for that user as no. It is guaranteed that no seller sells more than one item in a day.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:** 

**Users table:**

| user_id | join_date  | favorite_brand |
|---------|------------|----------------|
| 1       | 2019-01-01 | Lenovo         |
| 2       | 2019-02-09 | Samsung        |
| 3       | 2019-01-19 | LG             |
| 4       | 2019-05-21 | HP             |

**Orders table:**

| order_id | order_date | item_id | buyer_id | seller_id |
|----------|------------|---------|----------|-----------|
| 1        | 2019-08-01 | 4       | 1        | 2         |
| 2        | 2019-08-02 | 2       | 1        | 3         |
| 3        | 2019-08-03 | 3       | 2        | 3         |
| 4        | 2019-08-04 | 1       | 4        | 2         |
| 5        | 2019-08-04 | 1       | 3        | 4         |
| 6        | 2019-08-05 | 2       | 2        | 4         |

**Items table:**

| item_id | item_brand |
|---------|------------|
| 1       | Samsung    |
| 2       | Lenovo     |
| 3       | LG         |
| 4       | HP         |

**Output:**

| seller_id | 2nd_item_fav_brand |
|-----------|--------------------|
| 1         | no                 |
| 2         | yes                |
| 3         | yes                |
| 4         | no                 |

**Explanation:** 
- The answer for the user with id 1 is no because they sold nothing.
- The answer for the users with id 2 and 3 is yes because the brands of their second sold items are their favorite brands.
- The answer for the user with id 4 is no because the brand of their second sold item is not their favorite brand.
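
Before turning to Spark, the rule described above can be sketched in plain Python as a cross-check. This is only an illustrative reference under stated assumptions, not the notebook's solution: the function name `second_item_fav_brand` and the variables `users`, `orders`, and `items` are hypothetical, and the rows are copied from Example 1.

```python
from collections import defaultdict


def second_item_fav_brand(users, orders, items):
    """Map each user_id to "yes"/"no" for the second item (by date) they sold."""
    brand_of = dict(items)  # item_id -> item_brand

    # Collect (order_date, item_id) pairs per seller.
    sales = defaultdict(list)
    for _order_id, order_date, item_id, _buyer_id, seller_id in orders:
        sales[seller_id].append((order_date, item_id))

    result = {}
    for user_id, _join_date, favorite_brand in users:
        # ISO-formatted date strings sort in chronological order.
        by_date = sorted(sales.get(user_id, []))
        # Users with fewer than two sales are reported as "no".
        if len(by_date) >= 2 and brand_of[by_date[1][1]] == favorite_brand:
            result[user_id] = "yes"
        else:
            result[user_id] = "no"
    return result


# Rows from Example 1 (hypothetical variable names).
users = [
    (1, "2019-01-01", "Lenovo"),
    (2, "2019-02-09", "Samsung"),
    (3, "2019-01-19", "LG"),
    (4, "2019-05-21", "HP"),
]
orders = [
    (1, "2019-08-01", 4, 1, 2),
    (2, "2019-08-02", 2, 1, 3),
    (3, "2019-08-03", 3, 2, 3),
    (4, "2019-08-04", 1, 4, 2),
    (5, "2019-08-04", 1, 3, 4),
    (6, "2019-08-05", 2, 2, 4),
]
items = [(1, "Samsung"), (2, "Lenovo"), (3, "LG"), (4, "HP")]

print(second_item_fav_brand(users, orders, items))
```

Sorting each seller's sales and taking index 1 plays the same role as numbering rows by date within each seller and keeping row 2.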

In [0]:
users_data_1159 = [
    (1, "2019-01-01", "Lenovo"),
    (2, "2019-02-09", "Samsung"),
    (3, "2019-01-19", "LG"),
    (4, "2019-05-21", "HP"),
]

users_columns_1159 = ["user_id", "join_date", "favorite_brand"]
users_df_1159 = spark.createDataFrame(users_data_1159, users_columns_1159)
users_df_1159.show()

orders_data_1159 = [
    (1, "2019-08-01", 4, 1, 2),
    (2, "2019-08-02", 2, 1, 3),
    (3, "2019-08-03", 3, 2, 3),
    (4, "2019-08-04", 1, 4, 2),
    (5, "2019-08-04", 1, 3, 4),
    (6, "2019-08-05", 2, 2, 4),
]

orders_columns_1159 = ["order_id", "order_date", "item_id", "buyer_id", "seller_id"]
orders_df_1159 = spark.createDataFrame(orders_data_1159, orders_columns_1159)
orders_df_1159.show()

items_data_1159 = [
    (1, "Samsung"),
    (2, "Lenovo"),
    (3, "LG"),
    (4, "HP"),
]

items_columns_1159 = ["item_id", "item_brand"]
items_df_1159 = spark.createDataFrame(items_data_1159, items_columns_1159)
items_df_1159.show()

+-------+----------+--------------+
|user_id| join_date|favorite_brand|
+-------+----------+--------------+
|      1|2019-01-01|        Lenovo|
|      2|2019-02-09|       Samsung|
|      3|2019-01-19|            LG|
|      4|2019-05-21|            HP|
+-------+----------+--------------+

+--------+----------+-------+--------+---------+
|order_id|order_date|item_id|buyer_id|seller_id|
+--------+----------+-------+--------+---------+
|       1|2019-08-01|      4|       1|        2|
|       2|2019-08-02|      2|       1|        3|
|       3|2019-08-03|      3|       2|        3|
|       4|2019-08-04|      1|       4|        2|
|       5|2019-08-04|      1|       3|        4|
|       6|2019-08-05|      2|       2|        4|
+--------+----------+-------+--------+---------+

+-------+----------+
|item_id|item_brand|
+-------+----------+
|      1|   Samsung|
|      2|    Lenovo|
|      3|        LG|
|      4|        HP|
+-------+----------+



In [0]:
# Attach each order's item brand via an inner join on item_id.
orders_with_brand_df_1159 = orders_df_1159.join(items_df_1159, on="item_id")

In [0]:
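# Order each seller's sales by date; order_date is an ISO-formatted string, so string order matches date order.
window_spec = Window.partitionBy("seller_id").orderBy("order_date")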

In [0]:
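# Number each seller's sales 1, 2, ... by date (the "rank" column is a row number; dates are unique per seller).
ranked_sales_df_1159 = orders_with_brand_df_1159.withColumn("rank", row_number().over(window_spec))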

In [0]:
# Keep only each seller's second sale and its brand.
second_sales_df_1159 = ranked_sales_df_1159.filter(col("rank") == 2) \
    .select("seller_id", col("item_brand").alias("second_item_brand"))

In [0]:
# The left join keeps users without a second sale; their null brand fails the comparison and falls through to "no".
users_df_1159 \
    .join(second_sales_df_1159, users_df_1159.user_id == second_sales_df_1159.seller_id, how="left") \
    .withColumn("2nd_item_fav_brand", when(col("second_item_brand") == col("favorite_brand"), "yes").otherwise("no")) \
    .select(col("user_id").alias("seller_id"), "2nd_item_fav_brand").show()

+---------+------------------+
|seller_id|2nd_item_fav_brand|
+---------+------------------+
|        1|                no|
|        2|               yes|
|        3|               yes|
|        4|                no|
+---------+------------------+

