<pre>
Table: Employee

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| empId       | int     |
| name        | varchar |
| supervisor  | int     |
| salary      | int     |
+-------------+---------+
empId is the column with unique values for this table.
Each row of this table indicates the name and the ID of an employee in addition to their salary and the id of their manager.
 

Table: Bonus

+-------------+------+
| Column Name | Type |
+-------------+------+
| empId       | int  |
| bonus       | int  |
+-------------+------+
empId is the column of unique values for this table.
empId is a foreign key (reference column) to empId from the Employee table.
Each row of this table contains the id of an employee and their respective bonus.
 

Write a solution to report the name and bonus amount of each employee with a bonus less than 1000.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Employee table:
+-------+--------+------------+--------+
| empId | name   | supervisor | salary |
+-------+--------+------------+--------+
| 3     | Brad   | null       | 4000   |
| 1     | John   | 3          | 1000   |
| 2     | Dan    | 3          | 2000   |
| 4     | Thomas | 3          | 4000   |
+-------+--------+------------+--------+
Bonus table:
+-------+-------+
| empId | bonus |
+-------+-------+
| 2     | 500   |
| 4     | 2000  |
+-------+-------+
Output: 
+------+-------+
| name | bonus |
+------+-------+
| Brad | null  |
| John | null  |
| Dan  | 500   |
+------+-------+
</pre>

In [0]:
spark

In [0]:
# Import only the PySpark SQL function used below (avoids a wildcard import
# that would shadow Python built-ins such as sum, min, and max)
from pyspark.sql.functions import col

# Import the SQL types needed to define the schemas
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Import SparkSession
from pyspark.sql import SparkSession


In [0]:
# Create the Spark session (or reuse the existing one) and set the app name
spark = SparkSession.builder.appName("leetcode-top-50-sql-solution-with-pyspark").getOrCreate()

In [0]:
# Define the schema for the Employee table (empId is the non-nullable key)
employee_schema = StructType([
    StructField("empId", IntegerType(), False),
    StructField("name", StringType(), True),
    StructField("supervisor", IntegerType(), True),
    StructField("salary", IntegerType(), True)
])


# Define the schema for the Bonus table
bonus_schema = StructType([
    StructField("empId", IntegerType(), False),
    StructField("bonus", IntegerType(), True)
])




In [0]:

employee_df = spark.createDataFrame([
    (3, "Brad", None, 4000),
    (1, "John", 3, 1000),
    (2, "Dan", 3, 2000),
    (4, "Thomas", 3, 4000)
], schema=employee_schema)


bonus_df = spark.createDataFrame([
    (2, 500),
    (4, 2000)
], schema=bonus_schema)








In [0]:
employee_df.display()

empId,name,supervisor,salary
3,Brad,,4000
1,John,3.0,1000
2,Dan,3.0,2000
4,Thomas,3.0,4000


In [0]:
bonus_df.display()

empId,bonus
2,500
4,2000


In [0]:
# LeetCode solution in Spark SQL
# Create temporary views of the employee and bonus DataFrames for SQL queries
employee_df.createOrReplaceTempView('employee')
bonus_df.createOrReplaceTempView('bonus')


sql_result = spark.sql(
    '''
    SELECT
        emp.name AS name,
        bo.bonus AS bonus
    FROM employee AS emp
    LEFT JOIN bonus AS bo
        ON emp.empId = bo.empId
    WHERE bo.bonus IS NULL OR bo.bonus < 1000
    '''
)

# Display the result
sql_result.display()

name,bonus
Brad,
John,
Dan,500.0
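
The IS NULL check in the WHERE clause is what keeps Brad and John in the result: after the LEFT JOIN their bonus is NULL, and in SQL a comparison such as NULL < 1000 evaluates to unknown rather than true, so those rows would otherwise be filtered out. The sketch below illustrates this with the same sample data. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL, an assumption made only so the snippet runs without a Spark cluster; the names base_query, without_null_check, and with_null_check are illustrative and not part of the notebook.

```python
import sqlite3

# In-memory SQLite database used as a lightweight stand-in for Spark SQL
# (assumption: both engines treat NULL comparisons as "unknown").
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (empId INTEGER PRIMARY KEY, name TEXT,
                           supervisor INTEGER, salary INTEGER);
    CREATE TABLE bonus (empId INTEGER PRIMARY KEY, bonus INTEGER);
    INSERT INTO employee VALUES
        (3, 'Brad', NULL, 4000), (1, 'John', 3, 1000),
        (2, 'Dan', 3, 2000), (4, 'Thomas', 3, 4000);
    INSERT INTO bonus VALUES (2, 500), (4, 2000);
    """
)

# Same join as the notebook query; only the WHERE predicate varies.
# ORDER BY is added here solely to make the printed output deterministic.
base_query = """
    SELECT emp.name, bo.bonus
    FROM employee AS emp
    LEFT JOIN bonus AS bo ON emp.empId = bo.empId
    WHERE {predicate}
    ORDER BY emp.empId
"""

# Without the NULL check, employees who have no bonus row are dropped.
without_null_check = conn.execute(
    base_query.format(predicate="bo.bonus < 1000")
).fetchall()

# With the NULL check, employees who have no bonus row are kept.
with_null_check = conn.execute(
    base_query.format(predicate="bo.bonus IS NULL OR bo.bonus < 1000")
).fetchall()

print(without_null_check)  # [('Dan', 500)]
print(with_null_check)     # [('John', None), ('Dan', 500), ('Brad', None)]

conn.close()
```

Only the second predicate reproduces the expected output from the problem statement, which lists Brad and John with a null bonus.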


In [0]:
# LeetCode solution in PySpark

# Left-join employee with bonus so that employees without a bonus row are kept
joined_df = employee_df.join(bonus_df, employee_df.empId == bonus_df.empId, "left")

# Keep employees whose bonus is missing (null) or less than 1000
joined_df.select('name', 'bonus').filter(col("bonus").isNull() | (col("bonus") < 1000)).display()

name,bonus
Brad,
John,
Dan,500.0
