## Importing Libraries

In [0]:
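# Note: this wildcard import replaces Python built-ins such as sum, min and max
# with the Spark column functions of the same name.
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql.window import Window
import datetime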

**2994. Friday Purchases II (Hard)**

**Table: Purchases**

| Column Name   | Type |
|---------------|------|
| user_id       | int  |
| purchase_date | date |
| amount_spend  | int  |

(user_id, purchase_date, amount_spend) is the primary key (combination of columns with unique values) for this table.
purchase_date will range from November 1, 2023, to November 30, 2023, inclusive of both dates.
Each row contains user id, purchase date, and amount spend.

**Write a solution to calculate the total spending by users on each Friday of every week in November 2023. If there are no purchases on a particular Friday of a week, it will be considered as 0.**

Return the result table ordered by week of month in ascending order.

The result format is in the following example.

**Example 1:**

**Input:**
**Purchases table:**

| user_id | purchase_date | amount_spend |
|---------|---------------|--------------|
| 11      | 2023-11-07    | 1126         |
| 15      | 2023-11-30    | 7473         |
| 17      | 2023-11-14    | 2414         |
| 12      | 2023-11-24    | 9692         |
| 8       | 2023-11-03    | 5117         |
| 1       | 2023-11-16    | 5241         |
| 10      | 2023-11-12    | 8266         |
| 13      | 2023-11-24    | 12000        |

**Output:**

| week_of_month | purchase_date | total_amount |
|---------------|---------------|--------------|
| 1             | 2023-11-03    | 5117         |
| 2             | 2023-11-10    | 0            |
| 3             | 2023-11-17    | 0            |
| 4             | 2023-11-24    | 21692        |

**Explanation:**
- During the first week of November 2023, transactions amounting to $5,117 occurred on Friday, 2023-11-03.
- For the second week of November 2023, there were no transactions on Friday, 2023-11-10, resulting in a value of 0 in the output table for that day.
- Similarly, during the third week of November 2023, there were no transactions on Friday, 2023-11-17, reflected as 0 in the output table for that specific day.
- In the fourth week of November 2023, two transactions took place on Friday, 2023-11-24, amounting to $12,000 and $9,692 respectively, summing up to a total of $21,692.

Output table is ordered by week_of_month in ascending order.
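
Before turning to Spark, the expected result can be cross-checked with a small plain-Python sketch. It is not part of the original solution: the helper name `friday_totals` is hypothetical, and it assumes that "week of month" means the n-th Friday of the month, which is consistent with the example output.

```python
import datetime
import math
from collections import defaultdict

# Example rows copied from the Purchases table above.
purchases = [
    (11, "2023-11-07", 1126),
    (15, "2023-11-30", 7473),
    (17, "2023-11-14", 2414),
    (12, "2023-11-24", 9692),
    (8, "2023-11-03", 5117),
    (1, "2023-11-16", 5241),
    (10, "2023-11-12", 8266),
    (13, "2023-11-24", 12000),
]


def friday_totals(rows, year=2023, month=11):
    """Return (week_of_month, purchase_date, total_amount) for every Friday."""
    # Total spending per calendar date, across all users.
    totals = defaultdict(int)
    for _user_id, date_str, amount in rows:
        totals[datetime.date.fromisoformat(date_str)] += amount

    result = []
    day = datetime.date(year, month, 1)
    while day.month == month:
        if day.weekday() == 4:  # Monday is 0, so Friday is 4
            week_of_month = math.ceil(day.day / 7)
            # Fridays without purchases fall back to 0.
            result.append((week_of_month, day.isoformat(), totals.get(day, 0)))
        day += datetime.timedelta(days=1)
    return result


for row in friday_totals(purchases):
    print(row)
```

Walking the calendar rather than the purchase rows is what guarantees that Fridays with no purchases still appear, with a total of 0.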

In [0]:
purchases_data_2994 = [
    (11, "2023-11-07", 1126),
    (15, "2023-11-30", 7473),
    (17, "2023-11-14", 2414),
    (12, "2023-11-24", 9692),
    (8, "2023-11-03", 5117),
    (1, "2023-11-16", 5241),
    (10, "2023-11-12", 8266),
    (13, "2023-11-24", 12000)
]

purchases_columns_2994 = ["user_id","purchase_date","amount_spend"]
purchases_df_2994 = spark.createDataFrame(purchases_data_2994, purchases_columns_2994)
purchases_df_2994.show()

+-------+-------------+------------+
|user_id|purchase_date|amount_spend|
+-------+-------------+------------+
|     11|   2023-11-07|        1126|
|     15|   2023-11-30|        7473|
|     17|   2023-11-14|        2414|
|     12|   2023-11-24|        9692|
|      8|   2023-11-03|        5117|
|      1|   2023-11-16|        5241|
|     10|   2023-11-12|        8266|
|     13|   2023-11-24|       12000|
+-------+-------------+------------+



In [0]:
# The sample rows hold dates as strings, so cast the column to DateType.
purchases_df_2994 = purchases_df_2994 \
    .withColumn("purchase_date", col("purchase_date").cast(DateType()))

In [0]:
# date.weekday() counts Monday as 0, so 4 is Friday; November has 30 days.
nov_fridays = [datetime.date(2023, 11, d) for d in range(1, 31)
               if datetime.date(2023, 11, d).weekday() == 4]

In [0]:
# One row per Friday; ceil(day / 7) numbers the Fridays within the month.
fridays_df = spark.createDataFrame([(d,) for d in nov_fridays], ["purchase_date"]) \
    .withColumn("week_of_month", ceil(dayofmonth(col("purchase_date")) / 7))
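
The `week_of_month` formula can be checked outside Spark. The sketch below is an illustration, not part of the original notebook: it applies the same `ceil(day / 7)` rule with Python's `math.ceil` to the November 2023 Fridays. The rule numbers each Friday by its occurrence within the month, which is an assumption about what "week of month" means here and is not the ISO week number.

```python
import datetime
import math

# Same calendar filter as the notebook: keep the days whose weekday is Friday (4).
fridays = [datetime.date(2023, 11, d) for d in range(1, 31)
           if datetime.date(2023, 11, d).weekday() == 4]

# Days 1-7 map to 1, days 8-14 to 2, and so on.
weeks = {d.isoformat(): math.ceil(d.day / 7) for d in fridays}
print(weeks)
```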

In [0]:
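# Sum spending per purchase date. Non-Friday dates are aggregated too; the
# left join from fridays_df in the next cell discards them.
df_friday_sales_2994 = purchases_df_2994 \
    .withColumn("week_of_month", ceil(dayofmonth(col("purchase_date")) / 7)) \
    .groupBy("purchase_date", "week_of_month") \
    .agg(sum("amount_spend").alias("total_amount"))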

In [0]:
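# Left-join from the Friday calendar so Fridays without purchases survive,
# then fill the missing totals with 0 and match the expected column order.
fridays_df \
    .join(df_friday_sales_2994, ["purchase_date", "week_of_month"], "left") \
    .fillna(0, subset=["total_amount"]) \
    .select("week_of_month", "purchase_date", "total_amount") \
    .orderBy("week_of_month").display()

week_of_month,purchase_date,total_amount
1,2023-11-03,5117
2,2023-11-10,0
3,2023-11-17,0
4,2023-11-24,21692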
