## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**1158. Market Analysis I (Medium)**

**Table: Users**

| Column Name    | Type    |
|----------------|---------|
| user_id        | int     |
| join_date      | date    |
| favorite_brand | varchar |

user_id is the primary key (column with unique values) of this table.
This table has the info of the users of an online shopping website where users can sell and buy items.
 
**Table: Orders**

| Column Name   | Type    |
|---------------|---------|
| order_id      | int     |
| order_date    | date    |
| item_id       | int     |
| buyer_id      | int     |
| seller_id     | int     |

order_id is the primary key (column with unique values) of this table.
item_id is a foreign key (reference column) to the Items table.
buyer_id and seller_id are foreign keys to the Users table.
 
**Table: Items**

| Column Name   | Type    |
|---------------|---------|
| item_id       | int     |
| item_brand    | varchar |

item_id is the primary key (column with unique values) of this table.
 
**Write a solution to find, for each user, the join date and the number of orders they made as a buyer in 2019.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:**

**Users table:**

| user_id | join_date  | favorite_brand |
|---------|------------|----------------|
| 1       | 2018-01-01 | Lenovo         |
| 2       | 2018-02-09 | Samsung        |
| 3       | 2018-01-19 | LG             |
| 4       | 2018-05-21 | HP             |

**Orders table:**

| order_id | order_date | item_id | buyer_id | seller_id |
|----------|------------|---------|----------|-----------|
| 1        | 2019-08-01 | 4       | 1        | 2         |
| 2        | 2018-08-02 | 2       | 1        | 3         |
| 3        | 2019-08-03 | 3       | 2        | 3         |
| 4        | 2018-08-04 | 1       | 4        | 2         |
| 5        | 2018-08-04 | 1       | 3        | 4         |
| 6        | 2019-08-05 | 2       | 2        | 4         |

**Items table:**

| item_id | item_brand |
|---------|------------|
| 1       | Samsung    |
| 2       | Lenovo     |
| 3       | LG         |
| 4       | HP         |

**Output:**

| buyer_id  | join_date  | orders_in_2019 |
|-----------|------------|----------------|
| 1         | 2018-01-01 | 1              |
| 2         | 2018-02-09 | 2              |
| 3         | 2018-01-19 | 0              |
| 4         | 2018-05-21 | 0              |


In [0]:
users_data_1158 = [
    (1, "2018-01-01", "Lenovo"),
    (2, "2018-02-09", "Samsung"),
    (3, "2018-01-19", "LG"),
    (4, "2018-05-21", "HP"),
]
users_columns_1158 = ["user_id", "join_date", "favorite_brand"]
users_df_1158 = spark.createDataFrame(users_data_1158, users_columns_1158)
users_df_1158.show()

orders_data_1158 = [
    (1, "2019-08-01", 4, 1, 2),
    (2, "2018-08-02", 2, 1, 3),
    (3, "2019-08-03", 3, 2, 3),
    (4, "2018-08-04", 1, 4, 2),
    (5, "2018-08-04", 1, 3, 4),
    (6, "2019-08-05", 2, 2, 4),
]
orders_columns_1158 = ["order_id", "order_date", "item_id", "buyer_id", "seller_id"]
orders_df_1158 = spark.createDataFrame(orders_data_1158, orders_columns_1158)
orders_df_1158.show()

In [0]:
# Count each buyer's orders placed in 2019.
orders_count_df_1158 = (
    orders_df_1158
    .filter(year("order_date") == 2019)
    .groupBy("buyer_id")
    .agg(count("*").alias("orders_in_2019"))
)

# The left join keeps users with no 2019 orders; coalesce turns their null count into 0.
(
    users_df_1158
    .join(orders_count_df_1158, users_df_1158.user_id == orders_count_df_1158.buyer_id, "left")
    .select(
        users_df_1158.user_id.alias("buyer_id"),
        "join_date",
        coalesce(col("orders_in_2019"), lit(0)).alias("orders_in_2019"),
    )
    .show()
)
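
As a cross-check that does not need a Spark session, the same logic (filter orders to 2019, count per buyer, then keep every user with a default of 0) can be sketched in plain Python. This is only an illustrative sketch: the function name `orders_per_buyer` and its `target_year` parameter are hypothetical and not part of the original solution, and the sample rows are copied from the example tables above.

```python
from collections import Counter
from datetime import date

# Sample rows copied from the example tables; dates are ISO strings, as in the notebook.
users = [
    (1, "2018-01-01", "Lenovo"),
    (2, "2018-02-09", "Samsung"),
    (3, "2018-01-19", "LG"),
    (4, "2018-05-21", "HP"),
]
orders = [
    (1, "2019-08-01", 4, 1, 2),
    (2, "2018-08-02", 2, 1, 3),
    (3, "2019-08-03", 3, 2, 3),
    (4, "2018-08-04", 1, 4, 2),
    (5, "2018-08-04", 1, 3, 4),
    (6, "2019-08-05", 2, 2, 4),
]


def orders_per_buyer(users, orders, target_year=2019):
    """Return (buyer_id, join_date, order_count) for every user (hypothetical helper)."""
    # Count orders per buyer, keeping only orders placed in target_year.
    counts = Counter(
        buyer_id
        for _order_id, order_date, _item_id, buyer_id, _seller_id in orders
        if date.fromisoformat(order_date).year == target_year
    )
    # A Counter returns 0 for missing keys, which plays the role of
    # the left join combined with coalesce(..., 0) in the Spark version.
    return [(user_id, join_date, counts[user_id]) for user_id, join_date, _brand in users]


result = orders_per_buyer(users, orders)
for row in result:
    print(row)
```

The rows in `result` should match the expected output table, including the zero counts for users 3 and 4, who placed no orders in 2019.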