Pivot the Occupation column in OCCUPATIONS so that each Name is sorted alphabetically and displayed underneath its corresponding Occupation. The output should consist of four columns (Doctor, Professor, Singer, and Actor) in that specific order, with their respective names listed alphabetically under each column.

Note: Print NULL when there are no more names corresponding to an occupation.

Input Format

The OCCUPATIONS table is described as follows:

![Output](https://s3.amazonaws.com/hr-challenge-images/12889/1443816414-2a465532e7-1.png)

Occupation will only contain one of the following values: Doctor, Professor, Singer or Actor.

Sample Input

![Output](https://s3.amazonaws.com/hr-challenge-images/12890/1443817648-1b2b8add45-2.png)


Sample Output

Jenny    Ashley     Meera  Jane
Samantha Christeen  Priya  Julia
NULL     Ketty      NULL   Maria

Explanation

The first column is an alphabetically ordered list of Doctor names.
The second column is an alphabetically ordered list of Professor names.
The third column is an alphabetically ordered list of Singer names.
The fourth column is an alphabetically ordered list of Actor names.
Empty cells in columns with fewer names than the longest column (in this case, the Doctor and Singer columns) are filled with NULL values.
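The expected result can be sketched in plain Python before turning to Spark. This is only an illustration of the rules above, not the notebook's solution: the helper name `pivot_occupations` and the `COLUMNS` constant are hypothetical, and the rows are the notebook's sample data in arbitrary order.

```python
from itertools import zip_longest

# Required column order, taken from the problem statement.
COLUMNS = ["Doctor", "Professor", "Singer", "Actor"]

# Sample (Name, Occupation) rows, deliberately unsorted.
rows = [
    ("Samantha", "Doctor"), ("Julia", "Actor"), ("Maria", "Actor"),
    ("Meera", "Singer"), ("Ashley", "Professor"), ("Ketty", "Professor"),
    ("Christeen", "Professor"), ("Jane", "Actor"), ("Jenny", "Doctor"),
    ("Priya", "Singer"),
]


def pivot_occupations(rows):
    """Return one output row per rank, padding short columns with 'NULL'."""
    by_occupation = {occupation: [] for occupation in COLUMNS}
    for name, occupation in rows:
        by_occupation[occupation].append(name)
    # Sort each column alphabetically, then read the columns row by row.
    sorted_columns = [sorted(by_occupation[c]) for c in COLUMNS]
    return [list(r) for r in zip_longest(*sorted_columns, fillvalue="NULL")]


result = pivot_occupations(rows)
for line in result:
    print(" ".join(line))
```

Each column is sorted independently, and `zip_longest` supplies the NULL padding once a shorter column runs out of names.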

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Initialize Spark session
spark = SparkSession.builder.appName("Occupations").getOrCreate()

# Define schema
schema = StructType([
    StructField("Name", StringType(), True),
    StructField("Occupation", StringType(), True)
])

# Sample Data
data = [
    ("Jenny", "Doctor"),
    ("Samantha", "Doctor"),
    ("Ashley", "Professor"),
    ("Christeen", "Professor"),
    ("Ketty", "Professor"),
    ("Meera", "Singer"),
    ("Priya", "Singer"),
    ("Jane", "Actor"),
    ("Julia", "Actor"),
    ("Maria", "Actor")
]

# Create DataFrame
df = spark.createDataFrame(data, schema=schema)

# Save DataFrame as a temporary table
df.createOrReplaceTempView("OCCUPATIONS")


In [0]:
%sql
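WITH CTE
AS (
    -- Number the names within each occupation in alphabetical order
    SELECT Name, Occupation,
           row_number() OVER (PARTITION BY Occupation ORDER BY Name ASC) AS R
    FROM OCCUPATIONS)
SELECT Doctor, Professor, Singer, Actor
FROM CTE
PIVOT (
    MAX(Name)
    FOR Occupation IN ('Doctor', 'Professor', 'Singer', 'Actor')
)
-- R is the implicit grouping column; sort on it so rows come out in rank order
ORDER BY R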

Doctor,Professor,Singer,Actor
Jenny,Ashley,Meera,Jane
Samantha,Christeen,Priya,Julia
,Ketty,,Maria
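To see why the query needs the row number R, it may help to emulate the two steps in plain Python. The sketch below is an illustration only, not Spark code: `number_rows` and `pivot_by_row_number` are hypothetical helpers that mimic the window function and the pivot, and `None` stands in for SQL NULL.

```python
from collections import defaultdict

COLUMNS = ["Doctor", "Professor", "Singer", "Actor"]

sample = [
    ("Jenny", "Doctor"), ("Samantha", "Doctor"),
    ("Ashley", "Professor"), ("Christeen", "Professor"), ("Ketty", "Professor"),
    ("Meera", "Singer"), ("Priya", "Singer"),
    ("Jane", "Actor"), ("Julia", "Actor"), ("Maria", "Actor"),
]


def number_rows(rows):
    """Emulate ROW_NUMBER() OVER (PARTITION BY Occupation ORDER BY Name)."""
    counters = defaultdict(int)
    numbered = []
    for name, occupation in sorted(rows, key=lambda r: (r[1], r[0])):
        counters[occupation] += 1
        numbered.append((name, occupation, counters[occupation]))
    return numbered


def pivot_by_row_number(numbered):
    """Group by the row number and spread names across occupation columns."""
    grouped = defaultdict(dict)
    for name, occupation, r in numbered:
        grouped[r][occupation] = name
    return [tuple(grouped[r].get(c) for c in COLUMNS) for r in sorted(grouped)]


pivoted = pivot_by_row_number(number_rows(sample))

# Without a grouping column, MAX(Name) would collapse each occupation to one name.
collapsed = tuple(max(n for n, o in sample if o == c) for c in COLUMNS)
```

Here `pivoted` holds three rows, one per rank, while `collapsed` holds a single row containing only the alphabetically last name of each occupation. The row number gives the pivot something to group on, so every name keeps its own row.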


In [0]:
from pyspark.sql import Window
from pyspark.sql.functions import first, row_number

# Rank names alphabetically within each occupation
windowSpec = Window.partitionBy("Occupation").orderBy("Name")

df_1 = df.withColumn("row_number", row_number().over(windowSpec))

# Listing the pivot values fixes the column order required by the task
occupations = ["Doctor", "Professor", "Singer", "Actor"]

df_pivot = (
    df_1.groupBy("row_number")
    .pivot("Occupation", occupations)
    .agg(first("Name"))
    .orderBy("row_number")
    .drop("row_number")
)

df_pivot.show()


+--------+---------+------+-----+
|  Doctor|Professor|Singer|Actor|
+--------+---------+------+-----+
|   Jenny|   Ashley| Meera| Jane|
|Samantha|Christeen| Priya|Julia|
|    null|    Ketty|  null|Maria|
+--------+---------+------+-----+

