## Importing Libraries

In [0]:
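# Wildcard import: later cells call sum() and round() expecting the Spark column
# functions, which shadow the Python built-ins of the same name.
from pyspark.sql.functions import *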
from pyspark.sql.types import *
from pyspark.sql.window import Window

**1251. Average Selling Price (Easy)**

**Table: Prices**

| Column Name   | Type    |
|---------------|---------|
| product_id    | int     |
| start_date    | date    |
| end_date      | date    |
| price         | int     |

(product_id, start_date, end_date) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates the price of the product_id in the period from start_date to end_date.
No two periods overlap for the same product_id, so any given date falls within at most one price period per product.
 
**Table: UnitsSold**

| Column Name   | Type    |
|---------------|---------|
| product_id    | int     |
| purchase_date | date    |
| units         | int     |

This table may contain duplicate rows.
Each row of this table indicates the date, units, and product_id of each product sold. 
 
**Write a solution to find the average selling price for each product. average_price should be rounded to 2 decimal places. If a product does not have any sold units, its average selling price is assumed to be 0.**

Return the result table in any order.

The result format is in the following example.

**Example 1:**

**Input:**

**Prices table:**

| product_id | start_date | end_date   | price  |
|------------|------------|------------|--------|
| 1          | 2019-02-17 | 2019-02-28 | 5      |
| 1          | 2019-03-01 | 2019-03-22 | 20     |
| 2          | 2019-02-01 | 2019-02-20 | 15     |
| 2          | 2019-02-21 | 2019-03-31 | 30     |

**UnitsSold table:**

| product_id | purchase_date | units |
|------------|---------------|-------|
| 1          | 2019-02-25    | 100   |
| 1          | 2019-03-01    | 15    |
| 2          | 2019-02-10    | 200   |
| 2          | 2019-03-22    | 30    |

**Output:**

| product_id | average_price |
|------------|---------------|
| 1          | 6.96          |
| 2          | 16.96         |

**Explanation:**
- Average selling price = total revenue of the product (sum of units * price in effect on the purchase date) / total units of the product sold.
- Average selling price for product 1 = ((100 * 5) + (15 * 20)) / 115 = 6.96
- Average selling price for product 2 = ((200 * 15) + (30 * 30)) / 230 = 16.96
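
The arithmetic above can be checked with a small plain-Python sketch before moving to Spark. It is only an illustration using the Example 1 rows: `average_selling_prices` is a hypothetical helper name, not part of the PySpark solution in this notebook, and it assumes the non-overlapping-period guarantee from the problem statement.

```python
from datetime import date

# Example 1 rows: (product_id, start_date, end_date, price)
prices = [
    (1, date(2019, 2, 17), date(2019, 2, 28), 5),
    (1, date(2019, 3, 1), date(2019, 3, 22), 20),
    (2, date(2019, 2, 1), date(2019, 2, 20), 15),
    (2, date(2019, 2, 21), date(2019, 3, 31), 30),
]

# Example 1 rows: (product_id, purchase_date, units)
units_sold = [
    (1, date(2019, 2, 25), 100),
    (1, date(2019, 3, 1), 15),
    (2, date(2019, 2, 10), 200),
    (2, date(2019, 3, 22), 30),
]


def average_selling_prices(prices, units_sold):
    """Hypothetical helper: weighted average price per product_id."""
    # Start every product at zero so products without sales still appear.
    revenue = {pid: 0 for pid, *_ in prices}
    units = {pid: 0 for pid, *_ in prices}

    for pid, purchase_date, qty in units_sold:
        for price_pid, start, end, price in prices:
            # Both period bounds are inclusive.
            if pid == price_pid and start <= purchase_date <= end:
                revenue[pid] += price * qty
                units[pid] += qty
                break  # assumed: periods never overlap, so one match at most

    # A product with no sold units gets an average price of 0.
    return {
        pid: round(revenue[pid] / units[pid], 2) if units[pid] else 0
        for pid in revenue
    }


result = average_selling_prices(prices, units_sold)
print(result)  # {1: 6.96, 2: 16.96}
```

Duplicate rows in `units_sold` are kept as separate sales here because the input is a list, which mirrors the note that the UnitsSold table may contain duplicates.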

In [0]:
prices_data_1251 = [
    (1, "2019-02-17", "2019-02-28", 5),
    (1, "2019-03-01", "2019-03-22", 20),
    (2, "2019-02-01", "2019-02-20", 15),
    (2, "2019-02-21", "2019-03-31", 30),
]

prices_columns_1251 = ["product_id", "start_date", "end_date", "price"]
prices_df_1251 = spark.createDataFrame(prices_data_1251, prices_columns_1251)
prices_df_1251.show()

units_sold_data_1251 = [
    (1, "2019-02-25", 100),
    (1, "2019-03-01", 15),
    (2, "2019-02-10", 200),
    (2, "2019-03-22", 30),
]

units_sold_columns_1251 = ["product_id", "purchase_date", "units"]
units_sold_df_1251 = spark.createDataFrame(units_sold_data_1251, units_sold_columns_1251)
units_sold_df_1251.show()

In [0]:
# Cast the string columns to DateType so the period check compares dates, not strings.
prices_df_1251 = (
    prices_df_1251
    .withColumn("start_date", to_date("start_date"))
    .withColumn("end_date", to_date("end_date"))
)

units_sold_df_1251 = units_sold_df_1251.withColumn(
    "purchase_date", to_date("purchase_date")
)

In [0]:
# Match each sale to the price period containing its purchase date (bounds inclusive).
joined_df_1251 = (
    units_sold_df_1251
    .join(prices_df_1251, on="product_id")
    .where((col("purchase_date") >= col("start_date")) & (col("purchase_date") <= col("end_date")))
    .withColumn("revenue", col("price") * col("units"))
)

In [0]:
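# Weighted average per product: total revenue / total units, rounded to 2 decimals.
# sum() and round() here are the Spark functions from the wildcard import.
agg_df_1251 = (
    joined_df_1251
    .groupBy("product_id")
    .agg(
        sum("revenue").alias("total_revenue"),
        sum("units").alias("total_units"),
    )
    .withColumn("average_price", round(col("total_revenue") / col("total_units"), 2))
    .select("product_id", "average_price")
)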

In [0]:
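# Products without any matching sale are missing from agg_df_1251, so left-join the
# aggregate onto every product_id and report their average_price as 0.
all_products_df_1251 = prices_df_1251.select("product_id").distinct()

all_products_df_1251.join(agg_df_1251, on="product_id", how="left").fillna(0, subset=["average_price"]).show()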