# Calculating the Median from a Frequency Distribution Using Pandas

In data analysis, the median is a key statistical measure: the middle value of a dataset, separating the higher half from the lower half. When the data is given as a frequency distribution, one straightforward way to compute the median is to decompress the data, expanding each value according to its frequency. In this task, we determine the median of all numbers in a database by decompressing a frequency table with Python's Pandas library. The final median is rounded to one decimal place.

## Problem Statement

You are provided with a **Numbers** table that contains numbers and their corresponding frequencies. Your goal is to calculate the median of all numbers after decompressing the table.

### Numbers Table

| Column Name | Type |
|-------------|------|
| num         | int  |
| frequency   | int  |

- **num**: Primary key (unique identifier for each number).
- **frequency**: The number of times `num` appears in the dataset.

Each row in this table indicates how many times a particular number (`num`) occurs in the database.

### Objective

Write a solution to report the **median** of all the numbers in the database after decompressing the Numbers table. The median should be rounded to one decimal place.

### Definition of Median

The median is the value that separates the higher half from the lower half of a data sample. For datasets with an odd number of observations, the median is the middle number. For datasets with an even number of observations, the median is the average of the two middle numbers.
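The definition translates directly into code. The sketch below is a plain-Python illustration; `median_of` is a hypothetical helper name, not a library function. It sorts the values and then takes either the single middle element (odd count) or the mean of the two middle elements (even count):

```python
def median_of(values):
    """Return the median of a non-empty list of numbers.

    median_of is a hypothetical helper used only for illustration.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle element
        return float(ordered[mid])
    # Even count: average of the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([3, 1, 2]))     # 2.0
print(median_of([4, 1, 3, 2]))  # 2.5
```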

### Output Format

Return the result in a table with a single column named `median`.

| Column Name | Type  |
|-------------|-------|
| median      | float |

## Example

### Input

**Numbers Table:**

| num | frequency |
|-----|-----------|
| 0   | 7         |
| 1   | 1         |
| 2   | 3         |
| 3   | 1         |

### Output

| median |
|--------|
| 0.0    |

### Explanation

If we decompress the Numbers table, we get [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3]. This list has 12 elements (an even count), so the median is the average of the 6th and 7th values: (0 + 0) / 2 = 0.
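The explanation can be checked with plain Python before bringing in Pandas. This sketch uses only the standard-library `statistics.median`; the `rows` list simply restates the example table as `(num, frequency)` pairs:

```python
import statistics

# (num, frequency) pairs from the example Numbers table
rows = [(0, 7), (1, 1), (2, 3), (3, 1)]

# Decompress: emit each num `frequency` times
values = [num for num, freq in rows for _ in range(freq)]
print(values)       # [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3]
print(len(values))  # 12, an even count

# With 12 values, the median averages the 6th and 7th (0-based indices 5 and 6)
print(values[5], values[6])                 # 0 0
print(round(statistics.median(values), 1))  # 0.0
```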


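```python
import pandas as pd

# Example Numbers table: each row is (num, frequency)
data = [[0, 7],
        [1, 1],
        [2, 3],
        [3, 1]]
numbers = pd.DataFrame(data, columns=['num', 'frequency']).astype(
    {'num': 'Int64', 'frequency': 'Int64'}
)

display(numbers)  # display() is provided by Jupyter/IPython
```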

|   | num | frequency |
|---|-----|-----------|
| 0 | 0   | 7         |
| 1 | 1   | 1         |
| 2 | 2   | 3         |
| 3 | 3   | 1         |


**Step 1. Decompressing the Numbers Table**

- `numbers['num']`: Selects the `num` column from the `numbers` DataFrame.
- `.repeat(numbers['frequency'])`: Repeats each value in `num` as many times as its corresponding `frequency`; the original index labels are repeated along with the values.
- `.to_frame()`: Converts the resulting Series into a single-column DataFrame.

```python
df = numbers['num'].repeat(numbers['frequency']).to_frame()

display(df)
```

|   | num |
|---|-----|
| 0 | 0   |
| 0 | 0   |
| 0 | 0   |
| 0 | 0   |
| 0 | 0   |
| 0 | 0   |
| 0 | 0   |
| 1 | 1   |
| 2 | 2   |
| 2 | 2   |
| 2 | 2   |
| 3 | 3   |
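Decompressing is simple, but it materializes one row per occurrence, so memory grows with the sum of the frequencies. As an alternative (a different technique from the decompression used in this walkthrough), the median can be located from cumulative frequencies without expanding the table. A minimal sketch, assuming all frequencies are positive integers; `median_from_counts` is a hypothetical helper name:

```python
import pandas as pd


def median_from_counts(numbers: pd.DataFrame) -> float:
    """Median of a (num, frequency) table without decompressing it.

    median_from_counts is a hypothetical helper; assumes positive frequencies.
    """
    # Sort by value and compute the running total of frequencies
    table = numbers.sort_values('num')
    cum = table['frequency'].cumsum()
    total = int(cum.iloc[-1])

    # 1-based positions of the middle element(s); they coincide when total is odd
    lo_pos = (total + 1) // 2
    hi_pos = total // 2 + 1

    # The value at position p is the first num whose cumulative count reaches p
    lo_val = table.loc[cum >= lo_pos, 'num'].iloc[0]
    hi_val = table.loc[cum >= hi_pos, 'num'].iloc[0]
    return round((float(lo_val) + float(hi_val)) / 2, 1)


example = pd.DataFrame({'num': [0, 1, 2, 3], 'frequency': [7, 1, 3, 1]})
print(median_from_counts(example))  # 0.0
```

For the example, the cumulative counts are 7, 8, 11, 12, so positions 6 and 7 both fall inside the first group (`num` = 0). The cost is one sort plus a cumulative sum over the distinct values, independent of how large the frequencies are.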


**Step 2. Calculating the Median**

- `.median()`: Computes the median of each column in the DataFrame (here only `num`) and returns a Series indexed by column name.
- `.to_frame('median')`: Converts that Series into a DataFrame whose single column is named `median`; the row label is `num`.
- `.round(1)`: Rounds the median to one decimal place.

```python
df = df.median().to_frame('median').round(1)

display(df)
```

|     | median |
|-----|--------|
| num | 0.0    |
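The two steps can also be combined into a single function. In this sketch the name `median_frequency` is an assumption (a LeetCode-style signature, not confirmed by the problem text). Instead of calling `.to_frame('median')` on a per-column result, it takes the median of the repeated Series directly and builds the output frame, so the result has a default integer index rather than the `num` row label:

```python
import pandas as pd


def median_frequency(numbers: pd.DataFrame) -> pd.DataFrame:
    # median_frequency is a hypothetical name for the combined solution
    # Step 1: decompress by repeating each num `frequency` times
    expanded = numbers['num'].repeat(numbers['frequency'])
    # Step 2: median of the expanded Series, rounded to one decimal place
    median = round(float(expanded.median()), 1)
    # Wrap the scalar in a one-row, one-column DataFrame named `median`
    return pd.DataFrame({'median': [median]})


numbers = pd.DataFrame({'num': [0, 1, 2, 3], 'frequency': [7, 1, 3, 1]})
print(median_frequency(numbers))
```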


References:
[1] https://leetcode.com/problems/find-median-given-frequency-of-numbers/