In [1]:
import polars as pl
print(pl.__version__)

0.20.31


### Count Salary Categories

#### Question

DataFrame: Accounts

| Column Name | Type |
|:-----------:|:----:|
| account_id  | int  |
| income      | int  |

account_id is the primary key (column with unique values) for this table.<br>
Each row contains information about the monthly income for one bank account.
 

Write a solution to calculate the number of bank accounts for each salary category. The salary categories are:

"Low Salary": All the salaries strictly less than \$20000.<br>

"Average Salary": All the salaries in the inclusive range [\$20000, \$50000].<br>

"High Salary": All the salaries strictly greater than \$50000.<br>

The result table must contain all three categories. If there are no accounts in a category, return 0.

Return the result table in any order.

The result format is in the following example.

Example:

Input:<br>
Accounts dataframe:

| account_id | income |
|:----------:|:------:|
| 3          | 108939 |
| 2          | 12747  |
| 8          | 87709  |
| 6          | 91796  |

Output: 

| category       | accounts_count |
|:--------------:|:--------------:|
| Low Salary     | 1              |
| Average Salary | 0              |
| High Salary    | 3              |

Explanation:<br>
Low Salary: Account 2.<br>
Average Salary: No accounts.<br>
High Salary: Accounts 3, 6, and 8.<br>

#### Testcase

In [2]:
# Test data
data = [[3, 108939], [2, 12747], [8, 87709], [6, 91796]]

# Create the DataFrame
# Each inner list is one row, so state the orientation explicitly
accounts = pl.DataFrame(
    data,
    schema=['account_id', 'income'],
    orient='row'
)

# Display the DataFrame
print(accounts)

shape: (4, 2)
┌────────────┬────────┐
│ account_id ┆ income │
│ ---        ┆ ---    │
│ i64        ┆ i64    │
╞════════════╪════════╡
│ 3          ┆ 108939 │
│ 2          ┆ 12747  │
│ 8          ┆ 87709  │
│ 6          ┆ 91796  │
└────────────┴────────┘


#### Solution

In [5]:
def count_salary_categories(accounts: pl.DataFrame) -> pl.DataFrame:
    
    # Calculate the number of accounts in each salary category
    low = accounts.filter(pl.col('income') < 20000).height
    avg = accounts.filter((pl.col('income') >= 20000) & (pl.col('income') <= 50000)).height
    high = accounts.filter(pl.col('income') > 50000).height
    
    # Create a new DataFrame with the result
    result = pl.DataFrame({
        'category': ['Low Salary', 'Average Salary', 'High Salary'],
        'accounts_count': [low, avg, high],
    })
    
    return result

# Display the result
print(count_salary_categories(accounts=accounts))


shape: (3, 2)
┌────────────────┬────────────────┐
│ category       ┆ accounts_count │
│ ---            ┆ ---            │
│ str            ┆ i64            │
╞════════════════╪════════════════╡
│ Low Salary     ┆ 1              │
│ Average Salary ┆ 0              │
│ High Salary    ┆ 3              │
└────────────────┴────────────────┘


### Immediate Food Delivery I

#### Question

DataFrame: Delivery

| Column Name                 | Type    |
|:---------------------------:|:-------:|
| delivery_id                 | int     |
| customer_id                 | int     |
| order_date                  | date    |
| customer_pref_delivery_date | date    |

delivery_id is the primary key (column with unique values) of this table.<br>
Each row records a food delivery order: the customer places the order on some date and specifies a preferred delivery date (the order date itself or a later date).

If the customer's preferred delivery date is the same as the order date, then the order is called immediate; otherwise, it is called scheduled.

Write a solution to find the percentage of immediate orders in the table, rounded to 2 decimal places.

The result format is in the following example.

Example:

Input:<br>
Delivery dataframe:

| delivery_id | customer_id | order_date | customer_pref_delivery_date |
|:-----------:|:-----------:|:----------:|:---------------------------:|
| 1           | 1           | 2019-08-01 | 2019-08-02                  |
| 2           | 5           | 2019-08-02 | 2019-08-02                  |
| 3           | 1           | 2019-08-11 | 2019-08-11                  |
| 4           | 3           | 2019-08-24 | 2019-08-26                  |
| 5           | 4           | 2019-08-21 | 2019-08-22                  |
| 6           | 2           | 2019-08-11 | 2019-08-13                  |

Output: 

| immediate_percentage |
|:--------------------:|
| 33.33                |

Explanation: The orders with delivery id 2 and 3 are immediate while the others are scheduled.

#### Testcase

In [6]:
# Test data
data = [['2023-01-01', '2023-01-01'], ['2023-01-02', '2023-01-05'], ['2023-01-03', '2023-01-03'], ['2023-01-04', '2023-01-06']]

# Create the DataFrame
# Each inner list is one row, so state the orientation explicitly
delivery = pl.DataFrame(
    data,
    schema=['order_date', 'customer_pref_delivery_date'],
    orient='row'
)

# Display the DataFrame
print(delivery)

shape: (4, 2)
┌────────────┬─────────────────────────────┐
│ order_date ┆ customer_pref_delivery_date │
│ ---        ┆ ---                         │
│ str        ┆ str                         │
╞════════════╪═════════════════════════════╡
│ 2023-01-01 ┆ 2023-01-01                  │
│ 2023-01-02 ┆ 2023-01-05                  │
│ 2023-01-03 ┆ 2023-01-03                  │
│ 2023-01-04 ┆ 2023-01-06                  │
└────────────┴─────────────────────────────┘


#### Solution

In [8]:
def food_delivery(delivery: pl.DataFrame) -> pl.DataFrame:
    
    # Calculate the number of immediate deliveries
    immediate_count = delivery.filter(pl.col('order_date') == pl.col('customer_pref_delivery_date')).height
    
    # Calculate the total number of rows
    total_rows = delivery.height
    
    # Calculate the percentage of immediate deliveries
    immediate_percentage = round(immediate_count / total_rows * 100, 2)
    
    # Create a new DataFrame with the result
    result = pl.DataFrame({'immediate_percentage': [immediate_percentage]})
    
    return result

# Display the result
print(food_delivery(delivery=delivery))

shape: (1, 1)
┌──────────────────────┐
│ immediate_percentage │
│ ---                  │
│ f64                  │
╞══════════════════════╡
│ 50.0                 │
└──────────────────────┘


### The Number of Rich Customers

#### Question

DataFrame: Store

| Column Name | Type |
|:-----------:|:----:|
| bill_id     | int  |
| customer_id | int  |
| amount      | int  |

bill_id is the primary key (column with unique values) for this table.<br>
Each row contains information about the amount of one bill and the customer associated with it.

Write a solution to report the number of customers who had at least one bill with an amount strictly greater than 500.

The result format is in the following example.

Example:

Input:<br>
Store dataframe:

| bill_id | customer_id | amount |
|:-------:|:-----------:|:------:|
| 6       | 1           | 549    |
| 8       | 1           | 834    |
| 4       | 2           | 394    |
| 11      | 3           | 657    |
| 13      | 3           | 257    |

Output: 

| rich_count |
|:----------:|
| 2          |

Explanation: 
Customer 1 has two bills with amounts strictly greater than 500.<br>
Customer 2 does not have any bills with an amount strictly greater than 500.<br>
Customer 3 has one bill with an amount strictly greater than 500.<br>

#### Testcase

In [9]:
# Test data
data = [[1, 600], [2, 300], [3, 450], [4, 700], [5, 800], [6, 200], [7, 550]]

# Create the DataFrame
# Each inner list is one row, so state the orientation explicitly
store = pl.DataFrame(
    data,
    schema=['customer_id', 'amount'],
    orient='row'
)

# Display the DataFrame
print(store)

shape: (7, 2)
┌─────────────┬────────┐
│ customer_id ┆ amount │
│ ---         ┆ ---    │
│ i64         ┆ i64    │
╞═════════════╪════════╡
│ 1           ┆ 600    │
│ 2           ┆ 300    │
│ 3           ┆ 450    │
│ 4           ┆ 700    │
│ 5           ┆ 800    │
│ 6           ┆ 200    │
│ 7           ┆ 550    │
└─────────────┴────────┘


#### Solution



In [10]:
def count_rich_customers(store: pl.DataFrame) -> pl.DataFrame:
    
    # Filter the rows where the 'amount' is greater than 500 and get the unique 'customer_id' count
    rich_count = store.filter(pl.col('amount') > 500)['customer_id'].n_unique()
    
    # Create a new DataFrame with the result
    result = pl.DataFrame({'rich_count': [rich_count]})
    
    return result

# Display the result
print(count_rich_customers(store=store))

shape: (1, 1)
┌────────────┐
│ rich_count │
│ ---        │
│ i64        │
╞════════════╡
│ 4          │
└────────────┘
