## Importing Libraries

In [0]:
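# Note: this wildcard import shadows Python's built-in sum and round with Spark's column functions.
from pyspark.sql.functions import *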
from pyspark.sql.types import *
from pyspark.sql.window import Window

**1321. Restaurant Growth (Medium)**

**Table: Customer**

| Column Name   | Type    |
|---------------|---------|
| customer_id   | int     |
| name          | varchar |
| visited_on    | date    |
| amount        | int     |

In SQL, (customer_id, visited_on) is the primary key for this table.
This table contains data about customer transactions in a restaurant.
visited_on is the date on which the customer with ID (customer_id) has visited the restaurant.
amount is the total paid by a customer.
 
You are the restaurant owner and you want to analyze a possible expansion (there will be at least one customer every day).

**Compute the moving average of how much customers paid in a seven-day window (i.e., the current day plus the 6 days before). average_amount should be rounded to two decimal places.**

Return the result table ordered by visited_on in ascending order.

The result format is in the following example.

**Example 1:**

**Input:**
**Customer table:**
| customer_id | name         | visited_on   | amount      |
|-------------|--------------|--------------|-------------|
| 1           | Jhon         | 2019-01-01   | 100         |
| 2           | Daniel       | 2019-01-02   | 110         |
| 3           | Jade         | 2019-01-03   | 120         |
| 4           | Khaled       | 2019-01-04   | 130         |
| 5           | Winston      | 2019-01-05   | 110         | 
| 6           | Elvis        | 2019-01-06   | 140         | 
| 7           | Anna         | 2019-01-07   | 150         |
| 8           | Maria        | 2019-01-08   | 80          |
| 9           | Jaze         | 2019-01-09   | 110         | 
| 1           | Jhon         | 2019-01-10   | 130         | 
| 3           | Jade         | 2019-01-10   | 150         | 

**Output:**
| visited_on   | amount       | average_amount |
|--------------|--------------|----------------|
| 2019-01-07   | 860          | 122.86         |
| 2019-01-08   | 840          | 120            |
| 2019-01-09   | 840          | 120            |
| 2019-01-10   | 1000         | 142.86         |

**Explanation:**
- 1st moving average from 2019-01-01 to 2019-01-07 has an average_amount of (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86
- 2nd moving average from 2019-01-02 to 2019-01-08 has an average_amount of (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120
- 3rd moving average from 2019-01-03 to 2019-01-09 has an average_amount of (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120
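- 4th moving average from 2019-01-04 to 2019-01-10 has an average_amount of (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86 (2019-01-10 has two visits, 130 and 150, so eight amounts are summed but the divisor is still the 7 days in the window)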
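
The arithmetic in the explanation can be checked without Spark. The following is a minimal plain-Python sketch, not the notebook's solution: `visits`, `daily_totals`, and `window_row` are illustrative names introduced here, and the rows are copied from the example table.

```python
from collections import defaultdict
from datetime import date, timedelta

# (visited_on, amount) pairs copied from the example Customer table.
visits = [
    ("2019-01-01", 100), ("2019-01-02", 110), ("2019-01-03", 120),
    ("2019-01-04", 130), ("2019-01-05", 110), ("2019-01-06", 140),
    ("2019-01-07", 150), ("2019-01-08", 80), ("2019-01-09", 110),
    ("2019-01-10", 130), ("2019-01-10", 150),
]

# Sum amounts per day first, so that 2019-01-10 counts as one day worth 280.
daily_totals = defaultdict(int)
for visited_on, amount in visits:
    daily_totals[date.fromisoformat(visited_on)] += amount


def window_row(end_day, window_days=7):
    """Total and rounded average for the window ending on end_day (hypothetical helper)."""
    start_day = end_day - timedelta(days=window_days - 1)
    total = sum(amt for day, amt in daily_totals.items() if start_day <= day <= end_day)
    return end_day.isoformat(), total, round(total / window_days, 2)


# The four window end dates listed in the explanation.
rows = [window_row(date(2019, 1, d)) for d in (7, 8, 9, 10)]
for row in rows:
    print(row)
```

Aggregating per day before windowing is what keeps the divisor at 7 even when a day has more than one visit.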

In [0]:
customer_data_1321 = [
    (1, "Jhon", "2019-01-01", 100),
    (2, "Daniel", "2019-01-02", 110),
    (3, "Jade", "2019-01-03", 120),
    (4, "Khaled", "2019-01-04", 130),
    (5, "Winston", "2019-01-05", 110),
    (6, "Elvis", "2019-01-06", 140),
    (7, "Anna", "2019-01-07", 150),
    (8, "Maria", "2019-01-08", 80),
    (9, "Jaze", "2019-01-09", 110),
    (1, "Jhon", "2019-01-10", 130),
    (3, "Jade", "2019-01-10", 150),
]

customer_columns_1321 = ["customer_id", "name", "visited_on", "amount"]
customer_df_1321 = spark.createDataFrame(customer_data_1321, customer_columns_1321)
customer_df_1321.show()

+-----------+-------+----------+------+
|customer_id|   name|visited_on|amount|
+-----------+-------+----------+------+
|          1|   Jhon|2019-01-01|   100|
|          2| Daniel|2019-01-02|   110|
|          3|   Jade|2019-01-03|   120|
|          4| Khaled|2019-01-04|   130|
|          5|Winston|2019-01-05|   110|
|          6|  Elvis|2019-01-06|   140|
|          7|   Anna|2019-01-07|   150|
|          8|  Maria|2019-01-08|    80|
|          9|   Jaze|2019-01-09|   110|
|          1|   Jhon|2019-01-10|   130|
|          3|   Jade|2019-01-10|   150|
+-----------+-------+----------+------+



In [0]:
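# Convert visited_on from string to DateType so the date comparisons and date_sub below work on real dates.
customer_df_1321 = customer_df_1321.withColumn("visited_on", to_date("visited_on"))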

In [0]:
# Collapse multiple visits on the same day into one daily total.
daily_totals_df_1321 = customer_df_1321\
    .groupBy("visited_on").agg(sum("amount").alias("amount"))

In [0]:
# Pair each day "a" with every day "b" in the 7-day window ending on "a", then sum b's amounts.
moving_avg_df_1321 = daily_totals_df_1321.alias("a")\
    .join(daily_totals_df_1321.alias("b"),
          (col("b.visited_on") <= col("a.visited_on"))
          & (col("b.visited_on") >= expr("date_sub(a.visited_on, 6)")))\
    .groupBy("a.visited_on").agg(sum("b.amount").alias("total_amount"))


In [0]:
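# Windows ending before (earliest date + 6 days) would cover fewer than 7 days, so compute that cutoff.
min_date = customer_df_1321.selectExpr("MIN(visited_on) as min_date").collect()[0]["min_date"]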
min_date_plus_6 = spark.sql(f"SELECT date_add('{min_date}', 6) as cutoff").collect()[0]["cutoff"]
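
The cutoff is the first day whose window no longer reaches back before the earliest recorded visit. As a hedged illustration, the same date arithmetic can be written in plain Python; `WINDOW_DAYS` and `first_full_window_day` are hypothetical names, not part of the notebook.

```python
from datetime import date, timedelta

WINDOW_DAYS = 7  # assumption: window length taken from the problem statement


def first_full_window_day(first_visit, window_days=WINDOW_DAYS):
    """First date whose window (that day plus the window_days - 1 before it) is fully covered."""
    return first_visit + timedelta(days=window_days - 1)


# Earliest visited_on in the example data is 2019-01-01.
cutoff = first_full_window_day(date(2019, 1, 1))
print(cutoff)
```

This relies on the problem's guarantee that there is at least one customer every day; with gaps in the dates, a window could contain fewer than 7 days of data even after the cutoff.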


In [0]:
moving_avg_df_1321 \
    .filter(col("visited_on") >= min_date_plus_6) \
    .select(
        col("visited_on"),
        col("total_amount").alias("amount"),
        round((col("total_amount") / 7), 2).alias("average_amount")
    ) \
    .orderBy("visited_on").show()


+----------+------+--------------+
|visited_on|amount|average_amount|
+----------+------+--------------+
|2019-01-07|   860|        122.86|
|2019-01-08|   840|         120.0|
|2019-01-09|   840|         120.0|
|2019-01-10|  1000|        142.86|
+----------+------+--------------+

