# Pivoting Student Data by Continent Using Pandas

Analyzing student distributions across different continents can provide valuable insights for educational institutions. By organizing student names under their respective continents, schools can better understand demographic trends, allocate resources effectively, and tailor programs to meet diverse needs. In this tutorial, we'll explore how to pivot student data by continent using Python's Pandas library.

## Problem Statement

You are provided with a **Student** table that contains information about students and the continents they originate from. The goal is to pivot the `continent` column so that each continent becomes a separate column header, and the student names are listed underneath their corresponding continent. The student names within each continent should be sorted alphabetically.

### Table Schemas

**Student Table**

| Column Name | Type    |
|-------------|---------|
| name        | str     |
| continent   | str     |

- **Primary Key**: None specified. The table may contain duplicate rows.
- **Description**: Each row represents a student, indicating their name and the continent they belong to.

## Objective

A school has students from Asia, Europe, and America.

Write a solution to pivot the continent column in the Student table so that each name is sorted alphabetically and displayed underneath its corresponding continent. The output headers should be America, Asia, and Europe, respectively.

The test cases are generated so that the number of students from America is at least as large as the number from Asia and the number from Europe.

The result format is in the following example.

## Example

### Input

**Student Table:**

| name   | continent |
|--------|-----------|
| Jane   | America   |
| Pascal | Europe    |
| Xi     | Asia      |
| Jack   | America   |

### Output

| America | Asia | Europe |
|---------|------|--------|
| Jack    | Xi   | Pascal |
| Jane    | null | null   |

### Explanation

- **America**:
  - Students: `Jack`, `Jane` (sorted alphabetically).
- **Asia**:
  - Student: `Xi`.
- **Europe**:
  - Student: `Pascal`.

Since there are more students in America than in Asia or Europe, the additional cells under Asia and Europe are filled with `null`.


```python
import pandas as pd

# Sample input matching the example above
data = [['Jane', 'America'],
        ['Pascal', 'Europe'],
        ['Xi', 'Asia'],
        ['Jack', 'America']]
student = pd.DataFrame(data, columns=['name', 'continent']).astype(
    {'name': 'object', 'continent': 'object'})
display(student)
```

|   | name   | continent |
|---|--------|-----------|
| 0 | Jane   | America   |
| 1 | Pascal | Europe    |
| 2 | Xi     | Asia      |
| 3 | Jack   | America   |


**Step 1. Filtering and Organizing Students from America**
- Filtering: `student['continent'] == 'America'` creates a boolean mask that is `True` for rows whose `continent` is `'America'`.
- Selecting: `student[...][['name']]` applies the mask to the `student` DataFrame and keeps only the `name` column. The double brackets return a DataFrame rather than a Series.
- Renaming: `.rename(columns={'name': 'America'})` renames the `name` column to `America` to reflect the continent.
- Sorting: `.sort_values(by='America')` sorts the rows alphabetically by the `America` column.
- Resetting Index: `.reset_index(drop=True)` replaces the index with a continuous one starting from 0, so that rows line up by position when the DataFrames are concatenated. `drop=True` discards the old index instead of keeping it as a column.

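```python
# Keep only the names of American students, then rename, sort, and reindex from 0
df_america = student[student['continent'] == 'America'][['name']]
df_america = (df_america.rename(columns={'name': 'America'})
              .sort_values(by='America')
              .reset_index(drop=True))

display(df_america)
```

|   | America |
|---|---------|
| 0 | Jack    |
| 1 | Jane    |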
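To see why the index reset matters, here is a minimal sketch that repeats Step 1 without `.reset_index(drop=True)`. The variable names `unreset`, `reset`, `asia_unreset`, and `misaligned` are illustrative and not part of the original solution.

```python
import pandas as pd

student = pd.DataFrame(
    [['Jane', 'America'], ['Pascal', 'Europe'], ['Xi', 'Asia'], ['Jack', 'America']],
    columns=['name', 'continent'])

# Same filter, rename, and sort as Step 1, but without reset_index
unreset = (student[student['continent'] == 'America'][['name']]
           .rename(columns={'name': 'America'})
           .sort_values(by='America'))
print(unreset.index.tolist())  # the original row labels survive the sort

# After the reset, the labels are consecutive integers starting at 0
reset = unreset.reset_index(drop=True)
print(reset.index.tolist())

# Concatenating unreset frames aligns on those leftover labels, so names
# from different continents never share a row
asia_unreset = (student[student['continent'] == 'Asia'][['name']]
                .rename(columns={'name': 'Asia'}))
misaligned = pd.concat([unreset, asia_unreset], axis=1)
print(misaligned.shape)
```

With the sample data, the unreset frame keeps the labels of Jack's and Jane's original rows, and the misaligned concatenation ends up with three rows instead of the two the expected output requires.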


**Step 2. Filtering and Organizing Students from Asia**
- Filtering: Selects rows where `continent` is `'Asia'` and extracts the `name` column.
- Renaming: Renames the `name` column to `Asia`.
- Sorting: Sorts the names alphabetically within the `Asia` column.
- Resetting Index: Resets the index for alignment during concatenation.

```python
# Same pattern as Step 1, applied to Asia
df_asia = student[student['continent'] == 'Asia'][['name']]
df_asia = (df_asia.rename(columns={'name': 'Asia'})
           .sort_values(by='Asia')
           .reset_index(drop=True))

display(df_asia)
```

|   | Asia |
|---|------|
| 0 | Xi   |


**Step 3. Filtering and Organizing Students from Europe**
- Filtering: Selects rows where `continent` is `'Europe'` and extracts the `name` column.
- Renaming: Renames the `name` column to `Europe`.
- Sorting: Sorts the names alphabetically within the `Europe` column.
- Resetting Index: Resets the index for alignment during concatenation.

```python
# Same pattern as Step 1, applied to Europe
df_europe = student[student['continent'] == 'Europe'][['name']]
df_europe = (df_europe.rename(columns={'name': 'Europe'})
             .sort_values(by='Europe')
             .reset_index(drop=True))

display(df_europe)
```

|   | Europe |
|---|--------|
| 0 | Pascal |
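Steps 1 to 3 repeat the same filter, rename, sort, and reset pattern, so they could be folded into one function. The sketch below assumes a hypothetical helper named `continent_column`. It is not part of the original solution, only one way to avoid the repetition.

```python
import pandas as pd

def continent_column(student: pd.DataFrame, continent: str) -> pd.DataFrame:
    """Hypothetical helper: sorted names for one continent, indexed from 0."""
    return (student.loc[student['continent'] == continent, ['name']]
            .rename(columns={'name': continent})
            .sort_values(by=continent)
            .reset_index(drop=True))

student = pd.DataFrame(
    [['Jane', 'America'], ['Pascal', 'Europe'], ['Xi', 'Asia'], ['Jack', 'America']],
    columns=['name', 'continent'])

# One call per continent yields the same frames as the three steps above
df_america = continent_column(student, 'America')
df_asia = continent_column(student, 'Asia')
df_europe = continent_column(student, 'Europe')
```

Using `.loc` with a mask and a column list does the row filter and column selection in a single indexing step, which is equivalent here to the chained `student[mask][['name']]` form.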


**Step 4. Concatenating the DataFrames**
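- `[df_america, df_asia, df_europe]`: The list of DataFrames to concatenate.
- `axis=1`: Specifies that the concatenation should occur column-wise (i.e., side by side). Rows are matched by index label, which is why each DataFrame was reset to start from 0.
- Result: Combines the three DataFrames into a single DataFrame `result` with columns `America`, `Asia`, and `Europe`. Shorter columns are padded with missing values (`NaN`), which correspond to the `null` cells in the expected output.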

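```python
# Place the three single-column DataFrames side by side, aligned on the index
result = pd.concat([df_america, df_asia, df_europe], axis=1)
result
```

|   | America | Asia | Europe |
|---|---------|------|--------|
| 0 | Jack    | Xi   | Pascal |
| 1 | Jane    | NaN  | NaN    |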
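For comparison, the same table can be built without creating one DataFrame per continent. The following is an alternative sketch, not the method used in this tutorial: it numbers each student within their continent using `groupby(...).cumcount()` and then calls `DataFrame.pivot`. The names `ranked`, `row`, and `pivoted` are illustrative.

```python
import pandas as pd

student = pd.DataFrame(
    [['Jane', 'America'], ['Pascal', 'Europe'], ['Xi', 'Asia'], ['Jack', 'America']],
    columns=['name', 'continent'])

# Sort by name, then number the students within each continent (0, 1, ...)
ranked = student.sort_values(by='name')
ranked = ranked.assign(row=ranked.groupby('continent').cumcount())

# Spread continents into columns; reindex fixes the column order and
# guarantees all three headers exist even if a continent has no students
pivoted = (ranked.pivot(index='row', columns='continent', values='name')
           .reindex(columns=['America', 'Asia', 'Europe'])
           .rename_axis(index=None, columns=None)
           .reset_index(drop=True))
print(pivoted)
```

Because `cumcount` assigns a distinct position to every row within a continent, the `(row, continent)` pairs are unique even when the table contains duplicate names, which `pivot` requires.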


References: [1] https://leetcode.com/problems/students-report-by-geography/?lang=pythondata