<pre>
Table: Sales

+-------------+-------+
| Column Name | Type  |
+-------------+-------+
| sale_id     | int   |
| product_id  | int   |
| year        | int   |
| quantity    | int   |
| price       | int   |
+-------------+-------+
(sale_id, year) is the primary key (combination of columns with unique values) of this table.
product_id is a foreign key (reference column) to Product table.
Each row of this table shows a sale on the product product_id in a certain year.
Note that the price is per unit.
 

Table: Product

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| product_id   | int     |
| product_name | varchar |
+--------------+---------+
product_id is the primary key (column with unique values) of this table.
Each row of this table indicates the product name of each product.
 

Write a solution to report the product_name, year, and price for each sale_id in the Sales table.

Return the resulting table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Sales table:
+---------+------------+------+----------+-------+
| sale_id | product_id | year | quantity | price |
+---------+------------+------+----------+-------+ 
| 1       | 100        | 2008 | 10       | 5000  |
| 2       | 100        | 2009 | 12       | 5000  |
| 7       | 200        | 2011 | 15       | 9000  |
+---------+------------+------+----------+-------+
Product table:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 100        | Nokia        |
| 200        | Apple        |
| 300        | Samsung      |
+------------+--------------+
Output: 
+--------------+-------+-------+
| product_name | year  | price |
+--------------+-------+-------+
| Nokia        | 2008  | 5000  |
| Nokia        | 2009  | 5000  |
| Apple        | 2011  | 9000  |
+--------------+-------+-------+
Explanation: 
From sale_id = 1, we can conclude that Nokia was sold for 5000 in the year 2008.
From sale_id = 2, we can conclude that Nokia was sold for 5000 in the year 2009.
From sale_id = 7, we can conclude that Apple was sold for 9000 in the year 2011.
</pre>
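
Before turning to Spark, the task can be sketched in plain Python to make the expected result concrete. This is only an illustration under the assumption that each `product_id` maps to exactly one name (the primary key guarantee stated above); the helper `report_sales` and the variable names are hypothetical and not part of the notebook.

```python
# Plain-Python sketch of the join; no Spark required.
# Rows are copied from Example 1 in the problem statement.
sales = [
    # (sale_id, product_id, year, quantity, price)
    (1, 100, 2008, 10, 5000),
    (2, 100, 2009, 12, 5000),
    (7, 200, 2011, 15, 9000),
]
products = [
    # (product_id, product_name)
    (100, "Nokia"),
    (200, "Apple"),
    (300, "Samsung"),
]


def report_sales(sales_rows, product_rows):
    """Return (product_name, year, price) for every sale with a known product.

    Hypothetical helper: builds a product_id -> product_name lookup,
    then resolves each sale against it (an inner join by hand).
    """
    name_by_id = dict(product_rows)
    return [
        (name_by_id[product_id], year, price)
        for _sale_id, product_id, year, _quantity, price in sales_rows
        if product_id in name_by_id
    ]


result = report_sales(sales, products)
for row in result:
    print(row)
```

Samsung never appears in the result because no sale references product 300, which matches the expected output table.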

In [0]:
spark

In [0]:
# SparkSession is the entry point for DataFrame and SQL operations
from pyspark.sql import SparkSession

# Only the types needed by the two schemas are imported.
# A wildcard import of pyspark.sql.functions is avoided because it shadows
# Python built-ins such as sum, min and max.
from pyspark.sql.types import StructType, StructField, StringType, IntegerType


In [0]:
# creating spark session and providing app name
spark = SparkSession.builder.appName("leetcode-top-50-sql-solution-with-pyspark").getOrCreate()

In [0]:
# Schema for the Sales table (all columns are non-nullable integers)
sales_schema = StructType([
    StructField("sale_id", IntegerType(), nullable=False),
    StructField("product_id", IntegerType(), nullable=False),
    StructField("year", IntegerType(), nullable=False),
    StructField("quantity", IntegerType(), nullable=False),
    StructField("price", IntegerType(), nullable=False)
])

# Define the schema for the Product table
product_schema = StructType([
    StructField("product_id", IntegerType(), nullable=False),
    StructField("product_name", StringType(), nullable=False)
])


In [0]:

sales_df = spark.createDataFrame([
    (1, 100, 2008, 10, 5000),
    (2, 100, 2009, 12, 5000),
    (7, 200, 2011, 15, 9000)
], schema=sales_schema)

product_df = spark.createDataFrame([
    (100, "Nokia"),
    (200, "Apple"),
    (300, "Samsung")
], schema=product_schema)




In [0]:
sales_df.display()

sale_id,product_id,year,quantity,price
1,100,2008,10,5000
2,100,2009,12,5000
7,200,2011,15,9000


In [0]:
product_df.display()

product_id,product_name
100,Nokia
200,Apple
300,Samsung


In [0]:
# LeetCode solution in Spark SQL
# Register both DataFrames as temporary views so they can be queried by name
sales_df.createOrReplaceTempView('Sales')
product_df.createOrReplaceTempView('Product')

# Inner join: each sale is matched to its product through product_id
sql_result = spark.sql(
    '''
    SELECT
        p.product_name,
        s.year,
        s.price
    FROM Sales AS s
    JOIN Product AS p
        ON p.product_id = s.product_id
    '''
)

# Displaying Result
sql_result.display()

product_name,year,price
Nokia,2008,5000
Nokia,2009,5000
Apple,2011,9000


In [0]:
# LeetCode solution using the DataFrame API
# Inner join on product_id (same as the SQL JOIN above), then keep the requested columns
join_result = sales_df.join(product_df, on="product_id", how="inner") \
    .select("product_name", "year", "price")
# Displaying the joined result
join_result.display()



product_name,year,price
Nokia,2008,5000
Nokia,2009,5000
Apple,2011,9000
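
An inner join and a left join return the same rows on this data only because `product_id` is a foreign key, so every sale has a matching product. The sketch below shows where the two would diverge. It uses Python's built-in `sqlite3` as a stand-in for Spark SQL (an assumption made so the example runs without a cluster), and sale 8 with `product_id` 999 is a hypothetical row that violates the foreign key on purpose.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Spark temporary views.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (product_id INTEGER PRIMARY KEY, product_name TEXT)"
)
conn.execute(
    "CREATE TABLE Sales (sale_id INTEGER, product_id INTEGER, "
    "year INTEGER, quantity INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [(100, "Nokia"), (200, "Apple"), (300, "Samsung")],
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 2008, 10, 5000),
        (2, 100, 2009, 12, 5000),
        (7, 200, 2011, 15, 9000),
        (8, 999, 2012, 1, 100),  # hypothetical sale with no matching product
    ],
)

# Same query shape as the notebook; only the join keyword changes.
query = """
    SELECT p.product_name, s.year, s.price
    FROM Sales AS s
    {join} Product AS p ON p.product_id = s.product_id
    ORDER BY s.sale_id
"""
inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
conn.close()

print("inner:", inner_rows)
print("left: ", left_rows)
```

The inner join drops the unmatched sale, while the left join keeps it with a missing `product_name`. Which behaviour is wanted depends on whether sales without a product should be reported; with the foreign key in place the question does not arise.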
