## Importing Libraries

In [0]:
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window

**1127. User Purchase Platform (Hard)**

**Table: Spending**

| Column Name | Type    |
|-------------|---------|
| user_id     | int     |
| spend_date  | date    |
| platform    | enum    | 
| amount      | int     |

The table logs the spending history of users who make purchases from an online shopping website that has both a desktop and a mobile application.
(user_id, spend_date, platform) is the primary key (combination of columns with unique values) of this table.
The platform column is an ENUM (category) type of ('desktop', 'mobile').
 
**Write a solution to find, for each date, the total number of users and the total amount spent using mobile only, desktop only, and both mobile and desktop together.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:**

**Spending table:**

| user_id | spend_date | platform | amount |
|---------|------------|----------|--------|
| 1       | 2019-07-01 | mobile   | 100    |
| 1       | 2019-07-01 | desktop  | 100    |
| 2       | 2019-07-01 | mobile   | 100    |
| 2       | 2019-07-02 | mobile   | 100    |
| 3       | 2019-07-01 | desktop  | 100    |
| 3       | 2019-07-02 | desktop  | 100    |

**Output:**

| spend_date | platform | total_amount | total_users |
|------------|----------|--------------|-------------|
| 2019-07-01 | desktop  | 100          | 1           |
| 2019-07-01 | mobile   | 100          | 1           |
| 2019-07-01 | both     | 200          | 1           |
| 2019-07-02 | desktop  | 100          | 1           |
| 2019-07-02 | mobile   | 100          | 1           |
| 2019-07-02 | both     | 0            | 0           |

**Explanation:** 
- On 2019-07-01, user 1 purchased using both desktop and mobile, user 2 purchased using mobile only and user 3 purchased using desktop only.
- On 2019-07-02, user 2 purchased using mobile only, user 3 purchased using desktop only and no one purchased using both platforms.
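
The classification described in the explanation can be sketched in plain Python, without Spark, as a way to check the logic by hand. This is only an illustrative sketch: the `rows` list copies the example input, and the helper name `classify_users` is hypothetical, not part of the notebook's solution.

```python
from collections import defaultdict

# Sample rows copied from the example Spending table:
# (user_id, spend_date, platform, amount)
rows = [
    (1, "2019-07-01", "mobile", 100),
    (1, "2019-07-01", "desktop", 100),
    (2, "2019-07-01", "mobile", 100),
    (2, "2019-07-02", "mobile", 100),
    (3, "2019-07-01", "desktop", 100),
    (3, "2019-07-02", "desktop", 100),
]


def classify_users(rows):
    """Map each (spend_date, user_id) to (category, total_amount).

    Hypothetical helper for illustration only.
    """
    per_user = defaultdict(dict)
    for user_id, spend_date, platform, amount in rows:
        # The primary key guarantees at most one row per platform
        # for a given user and date.
        per_user[(spend_date, user_id)][platform] = amount

    classified = {}
    for key, by_platform in per_user.items():
        # Two distinct platforms on the same date means "both";
        # otherwise the single platform used is the category.
        category = "both" if len(by_platform) == 2 else next(iter(by_platform))
        classified[key] = (category, sum(by_platform.values()))
    return classified


classified = classify_users(rows)
for key in sorted(classified):
    print(key, classified[key])
```

Keying on `(spend_date, user_id)` matters here: a user who buys on mobile one day and desktop the next counts as single-platform on each day, not as "both".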

In [0]:
spending_data_1127 = [
    (1, "2019-07-01", "mobile", 100),
    (1, "2019-07-01", "desktop", 100),
    (2, "2019-07-01", "mobile", 100),
    (2, "2019-07-02", "mobile", 100),
    (3, "2019-07-01", "desktop", 100),
    (3, "2019-07-02", "desktop", 100),
]

spending_columns_1127 = ["user_id", "spend_date", "platform", "amount"]
spending_df_1127 = spark.createDataFrame(spending_data_1127, spending_columns_1127)
spending_df_1127.show()

+-------+----------+--------+------+
|user_id|spend_date|platform|amount|
+-------+----------+--------+------+
|      1|2019-07-01|  mobile|   100|
|      1|2019-07-01| desktop|   100|
|      2|2019-07-01|  mobile|   100|
|      2|2019-07-02|  mobile|   100|
|      3|2019-07-01| desktop|   100|
|      3|2019-07-02| desktop|   100|
+-------+----------+--------+------+



In [0]:
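# Pivot to one row per (user_id, spend_date) with one column per platform;
# a platform the user did not use on that date is null.
# Note: sum here is pyspark.sql.functions.sum (via the wildcard import), not the built-in.
platforms_df_1127 = spending_df_1127 \
    .groupBy("user_id", "spend_date") \
    .pivot("platform", ["desktop", "mobile"]) \
    .agg(sum("amount"))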

In [0]:
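# Label each (user_id, spend_date) row by which pivoted columns are non-null.
# The "both" check comes first because when() branches are evaluated in order.
user_types_df_1127 = platforms_df_1127 \
    .withColumn("platform_type",
                when(col("desktop").isNotNull() & col("mobile").isNotNull(), "both")
                .when(col("desktop").isNotNull(), "desktop")
                .when(col("mobile").isNotNull(), "mobile")
    )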

In [0]:
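# Attach the label to every original spending row, then total the amount and
# count distinct users per (spend_date, platform_type).
result_df_1127 = user_types_df_1127 \
    .join(spending_df_1127, on=["user_id", "spend_date"], how="inner") \
    .groupBy("spend_date", "platform_type") \
    .agg(
        sum("amount").alias("total_amount"),
        countDistinct("user_id").alias("total_users")
    )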

In [0]:
# Every category must appear for every date, even when no user falls into it.
platform_list_1127 = ["desktop", "mobile", "both"]
dates_1127 = spending_df_1127.select("spend_date").distinct()

In [0]:
# Build one row per (spend_date, platform_type) pair to act as the left side of the final join.
all_platforms_df_1127 = dates_1127.withColumn("platform_type", explode(array([lit(p) for p in platform_list_1127])))

In [0]:
# Left-join the aggregates onto the full date/platform grid; pairs with no spending become 0.
all_platforms_df_1127 \
    .join(result_df_1127, on=["spend_date", "platform_type"], how="left") \
    .na.fill(0) \
    .select("spend_date", "platform_type", "total_amount", "total_users") \
    .orderBy("spend_date", "platform_type") \
    .show()

+----------+-------------+------------+-----------+
|spend_date|platform_type|total_amount|total_users|
+----------+-------------+------------+-----------+
|2019-07-01|         both|         200|          1|
|2019-07-01|      desktop|         100|          1|
|2019-07-01|       mobile|         100|          1|
|2019-07-02|         both|           0|          0|
|2019-07-02|      desktop|         100|          1|
|2019-07-02|       mobile|         100|          1|
+----------+-------------+------------+-----------+
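
The zero-valued `both` row for 2019-07-02 exists only because every date is paired with every category before the left join. A minimal plain-Python sketch of that idea follows. It assumes the per-category aggregates are already computed; the `aggregated` dict is a hand-copied stand-in for the grouped result, and the helper name `fill_missing` is hypothetical. Unlike the notebook, it derives the dates from the aggregates themselves, which gives the same set for this sample.

```python
from itertools import product

# Hand-copied stand-in for the grouped aggregates:
# (spend_date, category) -> (total_amount, total_users)
aggregated = {
    ("2019-07-01", "both"): (200, 1),
    ("2019-07-01", "desktop"): (100, 1),
    ("2019-07-01", "mobile"): (100, 1),
    ("2019-07-02", "desktop"): (100, 1),
    ("2019-07-02", "mobile"): (100, 1),
}


def fill_missing(aggregated, categories=("desktop", "mobile", "both")):
    """Return one row per (date, category), using zeros where no aggregate exists.

    Hypothetical helper for illustration only.
    """
    dates = sorted({spend_date for spend_date, _ in aggregated})
    report = []
    # product() plays the role of the explode-based grid; dict.get with a
    # default plays the role of the left join followed by na.fill(0).
    for spend_date, category in product(dates, categories):
        total_amount, total_users = aggregated.get((spend_date, category), (0, 0))
        report.append((spend_date, category, total_amount, total_users))
    return report


report = fill_missing(aggregated)
for row in report:
    print(row)
```

Without the grid, a plain group-by would silently omit any date and category pair that has no matching users.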

