# HackerRank
## Problem: Occupations
Difficulty: Medium

Source: https://www.hackerrank.com/challenges/occupations/problem?isFullScreen=true&h_r=next-challenge&h_v=zen

## Description
Pivot the Occupation column in OCCUPATIONS so that each Name is sorted alphabetically and displayed underneath its corresponding Occupation. The output column headers should be Doctor, Professor, Singer, and Actor, respectively.

Note: Print NULL when there are no more names corresponding to an occupation.

### Input Format

The OCCUPATIONS table is described as follows:

![image](https://s3.amazonaws.com/hr-challenge-images/12889/1443816414-2a465532e7-1.png)

Occupation will only contain one of the following values: Doctor, Professor, Singer or Actor.

## Example

An OCCUPATIONS table that contains the following records:

![example](https://s3.amazonaws.com/hr-challenge-images/12889/1443816608-0b4d01d157-2.png)

### Sample Output
```
Jenny    Ashley     Meera  Jane
Samantha Christeen  Priya  Julia
NULL     Ketty      NULL   Maria
```
### Explanation
- The first column is an alphabetically ordered list of Doctor names.
- The second column is an alphabetically ordered list of Professor names.
- The third column is an alphabetically ordered list of Singer names.
- The fourth column is an alphabetically ordered list of Actor names.
- Empty cells in columns with fewer than the maximum number of names per occupation (in this case, the Doctor and Singer columns) are filled with NULL values.
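
The rule described above can be sketched in plain Python: group the names by occupation, sort each group, then read the groups row by row while padding the shorter columns with NULL. This is only an illustration; the records are inferred from the sample output (the input table itself is an image), and `pivot_occupations` is a hypothetical helper name, not part of the challenge.

```python
from collections import defaultdict
from itertools import zip_longest

# Column order required by the problem statement.
COLUMNS = ["Doctor", "Professor", "Singer", "Actor"]


def pivot_occupations(records):
    """Return rows of names, one column per occupation, padded with 'NULL'."""
    by_occupation = defaultdict(list)
    for name, occupation in records:
        by_occupation[occupation].append(name)
    # Sort each occupation's names alphabetically.
    sorted_columns = [sorted(by_occupation[col]) for col in COLUMNS]
    # zip_longest pads the shorter columns so every row has four cells.
    return [list(row) for row in zip_longest(*sorted_columns, fillvalue="NULL")]


# Records inferred from the sample output; input order is arbitrary.
sample = [
    ("Samantha", "Doctor"), ("Julia", "Actor"), ("Maria", "Actor"),
    ("Meera", "Singer"), ("Ashley", "Professor"), ("Ketty", "Professor"),
    ("Christeen", "Professor"), ("Jane", "Actor"), ("Jenny", "Doctor"),
    ("Priya", "Singer"),
]

rows = pivot_occupations(sample)
for row in rows:
    print(" ".join(row))
```

The row index produced by `zip_longest` plays the same role as a per-occupation row number in SQL: the n-th row holds the n-th name of every occupation.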

## Importing

```python
import pandas as pd
from pandasql import sqldf
```

## Define Schema

```python
# Define the data as a list of (name, occupation) tuples
data = [
    ('Ashley', 'Professor'),
    ('Samantha', 'Actor'),
    ('Julia', 'Doctor'),
    ('Britney', 'Professor'),
    ('Maria', 'Professor'),
    ('Meera', 'Professor'),
    ('Priya', 'Doctor'),
    ('Priyanka', 'Professor'),
    ('Jennifer', 'Actor'),
    ('Ketty', 'Actor'),
    ('Belvet', 'Professor'),
    ('Naomi', 'Professor'),
    ('Jane', 'Singer'),
    ('Jenny', 'Singer'),
    ('Kristeen', 'Singer'),
    ('Christeen', 'Singer'),
    ('Eve', 'Actor'),
    ('Aamina', 'Doctor')
]

# Create a DataFrame from the list of tuples
df = pd.DataFrame(data, columns=['Name', 'Occupation'])

# Show the DataFrame (rendered as the cell output in a notebook)
df
```

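|   | Name | Occupation |
|---|------|------------|
| 0 | Ashley | Professor |
| 1 | Samantha | Actor |
| 2 | Julia | Doctor |
| 3 | Britney | Professor |
| 4 | Maria | Professor |
| 5 | Meera | Professor |
| 6 | Priya | Doctor |
| 7 | Priyanka | Professor |
| 8 | Jennifer | Actor |
| 9 | Ketty | Actor |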


## Task

```python
# Define the SQL query:
# the inner query numbers the names within each occupation alphabetically (rn),
# the outer query collapses each rn into one row with a column per occupation.
query = """
SELECT
    MAX(CASE WHEN occupation = 'Doctor' THEN name ELSE NULL END) AS Doctor,
    MAX(CASE WHEN occupation = 'Professor' THEN name ELSE NULL END) AS Professor,
    MAX(CASE WHEN occupation = 'Singer' THEN name ELSE NULL END) AS Singer,
    MAX(CASE WHEN occupation = 'Actor' THEN name ELSE NULL END) AS Actor
FROM (
    SELECT
        name,
        occupation,
        ROW_NUMBER() OVER(PARTITION BY occupation ORDER BY name) AS rn
    FROM OCCUPATIONS
) AS tbl
GROUP BY rn
ORDER BY rn;
"""

# Execute the query using pandasql, exposing df as the OCCUPATIONS table
result = sqldf(query, env={'OCCUPATIONS': df})

# Display the result DataFrame
display(result)
```

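|   | Doctor | Professor | Singer | Actor |
|---|--------|-----------|--------|-------|
| 0 | Aamina | Ashley | Christeen | Eve |
| 1 | Julia | Belvet | Jane | Jennifer |
| 2 | Priya | Britney | Jenny | Ketty |
| 3 |  | Maria | Kristeen | Samantha |
| 4 |  | Meera |  |  |
| 5 |  | Naomi |  |  |
| 6 |  | Priyanka |  |  |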
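
As a cross-check that does not depend on pandasql, the same query can be run through Python's built-in `sqlite3` module. This is a sketch under two assumptions: the SQLite library bundled with Python is version 3.25 or newer (needed for `ROW_NUMBER()`), and the table is created by hand from the same 18 records instead of from `df`. The names `records`, `QUERY`, and `pivot_rows` are chosen here for illustration.

```python
import sqlite3

# Same records as the DataFrame in the notebook.
records = [
    ("Ashley", "Professor"), ("Samantha", "Actor"), ("Julia", "Doctor"),
    ("Britney", "Professor"), ("Maria", "Professor"), ("Meera", "Professor"),
    ("Priya", "Doctor"), ("Priyanka", "Professor"), ("Jennifer", "Actor"),
    ("Ketty", "Actor"), ("Belvet", "Professor"), ("Naomi", "Professor"),
    ("Jane", "Singer"), ("Jenny", "Singer"), ("Kristeen", "Singer"),
    ("Christeen", "Singer"), ("Eve", "Actor"), ("Aamina", "Doctor"),
]

QUERY = """
SELECT
    MAX(CASE WHEN occupation = 'Doctor' THEN name ELSE NULL END) AS Doctor,
    MAX(CASE WHEN occupation = 'Professor' THEN name ELSE NULL END) AS Professor,
    MAX(CASE WHEN occupation = 'Singer' THEN name ELSE NULL END) AS Singer,
    MAX(CASE WHEN occupation = 'Actor' THEN name ELSE NULL END) AS Actor
FROM (
    SELECT
        name,
        occupation,
        ROW_NUMBER() OVER(PARTITION BY occupation ORDER BY name) AS rn
    FROM OCCUPATIONS
) AS tbl
GROUP BY rn
ORDER BY rn;
"""

# Build an in-memory table and run the pivot query against it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OCCUPATIONS (name TEXT, occupation TEXT)")
conn.executemany("INSERT INTO OCCUPATIONS VALUES (?, ?)", records)
pivot_rows = conn.execute(QUERY).fetchall()
conn.close()

for row in pivot_rows:
    # SQL NULL arrives as Python None; print it as the literal NULL.
    print(*("NULL" if cell is None else cell for cell in row))
```

Printing `None` as `NULL` matches the output format the challenge asks for; the rows themselves should agree with the pandasql result shown above.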
