In [1]:
import polars as pl
print(pl.__version__)

0.20.31


### Invalid Tweets

#### Question

DataFrame: Tweets

| Column Name    | Type    |
|:--------------:|:-------:|
| tweet_id       | int     |
| content        | varchar |

tweet_id is the primary key (column with unique values) for this table.<br>
This table contains all the tweets in a social media app.
 

Write a solution to find the IDs of the invalid tweets. A tweet is invalid if the number of characters used in its content is strictly greater than 15.

Return the result table in any order.

The result format is in the following example.


Input:<br>
Tweets dataframe:

| tweet_id | content                          |
|:--------:|:--------------------------------:|
| 1        | Vote for Biden                   |
| 2        | Let us make America great again! |

Output: 

| tweet_id |
|:--------:|
| 2        |

Explanation:<br>
Tweet 1 has length = 14. It is a valid tweet.<br>
Tweet 2 has length = 32. It is an invalid tweet.

#### Testcase

In [2]:
# Test data
data = [[1, 'Vote for Biden'], [2, 'Let us make America great again!']]

# Create the DataFrame
tweets = pl.DataFrame(
    data,
    schema=['tweet_id', 'content'],
    orient='row'  # each inner list is one row
)

# Display the DataFrame
print(tweets)

shape: (2, 2)
┌──────────┬─────────────────────────────────┐
│ tweet_id ┆ content                         │
│ ---      ┆ ---                             │
│ i64      ┆ str                             │
╞══════════╪═════════════════════════════════╡
│ 1        ┆ Vote for Biden                  │
│ 2        ┆ Let us make America great agai… │
└──────────┴─────────────────────────────────┘


#### Solution

In [3]:
def invalid_tweets(tweets: pl.DataFrame, n_chars: int = 15) -> pl.DataFrame:
    df = tweets.filter(pl.col('content').str.len_chars() > n_chars)
    return df.select('tweet_id')

# Display the result
print(invalid_tweets(tweets=tweets))

shape: (1, 1)
┌──────────┐
│ tweet_id │
│ ---      │
│ i64      │
╞══════════╡
│ 2        │
└──────────┘


### Calculate Special Bonus

#### Question

DataFrame: Employees

| Column Name | Type    |
|:-----------:|:-------:|
| employee_id | int     |
| name        | varchar |
| salary      | int     |

employee_id is the primary key (column with unique values) for this table.<br>
Each row of this table indicates the employee ID, employee name, and salary.
 

Write a solution to calculate the bonus of each employee. The bonus of an employee is 100% of their salary if the ID of the employee is an odd number and the employee's name does not start with the character 'M'. The bonus of an employee is 0 otherwise.

Return the result table ordered by employee_id.

Input:<br>
Employees dataframe:

| employee_id | name    | salary |
|:-----------:|:-------:|:------:|
| 2           | Meir    | 3000   |
| 3           | Michael | 3800   |
| 7           | Addilyn | 7400   |
| 8           | Juan    | 6100   |
| 9           | Kannon  | 7700   |

Output: 

| employee_id | bonus |
|:-----------:|:-----:|
| 2           | 0     |
| 3           | 0     |
| 7           | 7400  |
| 8           | 0     |
| 9           | 7700  |

Explanation:<br>
The employees with IDs 2 and 8 get a bonus of 0 because their employee_id is even.<br>
The employee with ID 3 gets a bonus of 0 because their name starts with 'M'.<br>
The rest of the employees get a 100% bonus.

#### Testcase

In [4]:
# Test data
data = [[2, 'Meir', 3000], [3, 'Michael', 3800], [7, 'Addilyn', 7400], [8, 'Juan', 6100], [9, 'Kannon', 7700]]

# Create the DataFrame
employees = pl.DataFrame(
    data,
    schema=['employee_id', 'name', 'salary'],
    orient='row'  # each inner list is one row
)

# Display the DataFrame
print(employees)

shape: (5, 3)
┌─────────────┬─────────┬────────┐
│ employee_id ┆ name    ┆ salary │
│ ---         ┆ ---     ┆ ---    │
│ i64         ┆ str     ┆ i64    │
╞═════════════╪═════════╪════════╡
│ 2           ┆ Meir    ┆ 3000   │
│ 3           ┆ Michael ┆ 3800   │
│ 7           ┆ Addilyn ┆ 7400   │
│ 8           ┆ Juan    ┆ 6100   │
│ 9           ┆ Kannon  ┆ 7700   │
└─────────────┴─────────┴────────┘


#### Solution

In [5]:
def calculate_special_bonus(employees: pl.DataFrame) -> pl.DataFrame:
    df = (
        employees
        .with_columns(
            # Full salary for odd IDs whose name does not start with 'M', else 0
            pl.when(
                (pl.col('employee_id') % 2 != 0) & ~pl.col('name').str.starts_with('M')
            )
            .then(pl.col('salary'))
            .otherwise(0)
            .alias('bonus')
        )
        .sort('employee_id')
    )

    return df.select(['employee_id', 'bonus'])

# Display the result
print(calculate_special_bonus(employees=employees))

shape: (5, 2)
┌─────────────┬───────┐
│ employee_id ┆ bonus │
│ ---         ┆ ---   │
│ i64         ┆ i64   │
╞═════════════╪═══════╡
│ 2           ┆ 0     │
│ 3           ┆ 0     │
│ 7           ┆ 7400  │
│ 8           ┆ 0     │
│ 9           ┆ 7700  │
└─────────────┴───────┘


### Fix Names in a Table

#### Question

DataFrame: Users

| Column Name    | Type    |
|:--------------:|:-------:|
| user_id        | int     |
| name           | varchar |

user_id is the primary key (column with unique values) for this table. <br>
This table contains the ID and the name of the user. The name consists of only lowercase and uppercase characters.
 

Write a solution to fix the names so that only the first character is uppercase and the rest are lowercase.

Return the result table ordered by user_id.

Input:<br>
Users dataframe:

| user_id | name  |
|:-------:|:-----:|
| 1       | aLice |
| 2       | bOB   |

Output: 

| user_id | name  |
|:-------:|:-----:|
| 1       | Alice |
| 2       | Bob   |

#### Testcase

In [6]:
# Test data
data = [[1, 'aLice'], [2, 'bOB']]

# Create the DataFrame
users = pl.DataFrame(
    data,
    schema=['user_id', 'name'],
    orient='row'  # each inner list is one row
)

# Display the DataFrame
print(users)

shape: (2, 2)
┌─────────┬───────┐
│ user_id ┆ name  │
│ ---     ┆ ---   │
│ i64     ┆ str   │
╞═════════╪═══════╡
│ 1       ┆ aLice │
│ 2       ┆ bOB   │
└─────────┴───────┘


#### Solution

In [7]:
def fix_names(users: pl.DataFrame) -> pl.DataFrame:
    df = users.with_columns(pl.col('name').str.to_titlecase()).sort('user_id')
    return df

# Display the result
print(fix_names(users=users))

shape: (2, 2)
┌─────────┬───────┐
│ user_id ┆ name  │
│ ---     ┆ ---   │
│ i64     ┆ str   │
╞═════════╪═══════╡
│ 1       ┆ Alice │
│ 2       ┆ Bob   │
└─────────┴───────┘


### Find Users With Valid E-Mails

#### Question

DataFrame: Users

| Column Name   | Type    |
|:-------------:|:-------:|
| user_id       | int     |
| name          | varchar |
| mail          | varchar |

user_id is the primary key (column with unique values) for this table.<br>
This table contains information about the users who signed up on a website. Some e-mails are invalid.
 

Write a solution to find the users who have valid emails.

A valid e-mail has a prefix name and a domain where:

- The prefix name is a string that may contain letters (upper or lower case), digits, underscore '_', period '.', and/or dash '-'. The prefix name must start with a letter.
- The domain is '@leetcode.com'.

Return the result table in any order.

Input:<br>
Users dataframe:

| user_id | name      | mail                    |
|:-------:|:---------:|:-----------------------:|
| 1       | Winston   | winston@leetcode.com    |
| 2       | Jonathan  | jonathanisgreat         |
| 3       | Annabelle | bella-@leetcode.com     |
| 4       | Sally     | sally.come@leetcode.com |
| 5       | Marwan    | quarz#2020@leetcode.com |
| 6       | David     | david69@gmail.com       |
| 7       | Shapiro   | .shapo@leetcode.com     |

Output: 

| user_id | name      | mail                    |
|:-------:|:---------:|:-----------------------:|
| 1       | Winston   | winston@leetcode.com    |
| 3       | Annabelle | bella-@leetcode.com     |
| 4       | Sally     | sally.come@leetcode.com |

Explanation:<br>
The mail of user 2 does not have a domain.<br>
The mail of user 5 has the # sign which is not allowed.<br>
The mail of user 6 does not have the leetcode domain.<br>
The mail of user 7 starts with a period.

#### Testcase

In [11]:
# Test data
data = [[1, 'Winston', 'winston@leetcode.com'], 
        [2, 'Jonathan', 'jonathanisgreat'], 
        [3, 'Annabelle', 'bella-@leetcode.com'], 
        [4, 'Sally', 'sally.come@leetcode.com'], 
        [5, 'Marwan', 'quarz#2020@leetcode.com'], 
        [6, 'David', 'david69@gmail.com'], 
        [7, 'Shapiro', '.shapo@leetcode.com']]

# Create the DataFrame
users = pl.DataFrame(
    data,
    schema=['user_id', 'name', 'mail'],
    orient='row'  # each inner list is one row
)

# Display the DataFrame
print(users)

shape: (7, 3)
┌─────────┬───────────┬─────────────────────────┐
│ user_id ┆ name      ┆ mail                    │
│ ---     ┆ ---       ┆ ---                     │
│ i64     ┆ str       ┆ str                     │
╞═════════╪═══════════╪═════════════════════════╡
│ 1       ┆ Winston   ┆ winston@leetcode.com    │
│ 2       ┆ Jonathan  ┆ jonathanisgreat         │
│ 3       ┆ Annabelle ┆ bella-@leetcode.com     │
│ 4       ┆ Sally     ┆ sally.come@leetcode.com │
│ 5       ┆ Marwan    ┆ quarz#2020@leetcode.com │
│ 6       ┆ David     ┆ david69@gmail.com       │
│ 7       ┆ Shapiro   ┆ .shapo@leetcode.com     │
└─────────┴───────────┴─────────────────────────┘


#### Solution

In [12]:
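def valid_emails(users: pl.DataFrame) -> pl.DataFrame:
    # Prefix: a letter, then letters, digits, '_', '.' or '-';
    # domain: exactly '@leetcode.com' (the '.' is escaped)
    regex = r"^[a-zA-Z][a-zA-Z0-9_.-]*@leetcode\.com$"
    df = users.filter(pl.col('mail').str.contains(regex))
    return df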

# Display the result
print(valid_emails(users=users))

shape: (3, 3)
┌─────────┬───────────┬─────────────────────────┐
│ user_id ┆ name      ┆ mail                    │
│ ---     ┆ ---       ┆ ---                     │
│ i64     ┆ str       ┆ str                     │
╞═════════╪═══════════╪═════════════════════════╡
│ 1       ┆ Winston   ┆ winston@leetcode.com    │
│ 3       ┆ Annabelle ┆ bella-@leetcode.com     │
│ 4       ┆ Sally     ┆ sally.come@leetcode.com │
└─────────┴───────────┴─────────────────────────┘


### Patients With a Condition

#### Question

DataFrame: Patients

| Column Name  | Type    |
|:------------:|:-------:|
| patient_id   | int     |
| patient_name | varchar |
| conditions   | varchar |

patient_id is the primary key (column with unique values) for this table.<br>
'conditions' contains 0 or more codes separated by spaces.<br>
This table contains information of the patients in the hospital.
 

Write a solution to find the patient_id, patient_name, and conditions of the patients who have Type I Diabetes. Type I Diabetes codes always start with the DIAB1 prefix.

Return the result table in any order.

Input:<br>
Patients dataframe:

| patient_id | patient_name | conditions   |
|:----------:|:------------:|:------------:|
| 1          | Daniel       | YFEV COUGH   |
| 2          | Alice        |              |
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 |
| 5          | Alain        | DIAB201      |

Output: 

| patient_id | patient_name | conditions   |
|:----------:|:------------:|:------------:|
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 | 

Explanation: Bob and George both have a condition that starts with DIAB1.

#### Testcase

In [13]:
# Test data
data = [[1, 'Daniel', 'YFEV COUGH'], [2, 'Alice', ''], [3, 'Bob', 'DIAB100 MYOP'], [4, 'George', 'ACNE DIAB100'], [5, 'Alain', 'DIAB201']]

# Create the DataFrame
patients = pl.DataFrame(
    data,
    schema=['patient_id', 'patient_name', 'conditions'],
    orient='row'  # each inner list is one row
)

# Display the DataFrame
print(patients)

shape: (5, 3)
┌────────────┬──────────────┬──────────────┐
│ patient_id ┆ patient_name ┆ conditions   │
│ ---        ┆ ---          ┆ ---          │
│ i64        ┆ str          ┆ str          │
╞════════════╪══════════════╪══════════════╡
│ 1          ┆ Daniel       ┆ YFEV COUGH   │
│ 2          ┆ Alice        ┆              │
│ 3          ┆ Bob          ┆ DIAB100 MYOP │
│ 4          ┆ George       ┆ ACNE DIAB100 │
│ 5          ┆ Alain        ┆ DIAB201      │
└────────────┴──────────────┴──────────────┘


#### Solution

In [14]:
def find_patients(patients: pl.DataFrame) -> pl.DataFrame:
    # \b (word boundary) ensures DIAB1 begins a condition code
    regex = r"\bDIAB1"
    df = patients.filter(pl.col('conditions').str.contains(regex))
    return df

# Display the result
print(find_patients(patients=patients))

shape: (2, 3)
┌────────────┬──────────────┬──────────────┐
│ patient_id ┆ patient_name ┆ conditions   │
│ ---        ┆ ---          ┆ ---          │
│ i64        ┆ str          ┆ str          │
╞════════════╪══════════════╪══════════════╡
│ 3          ┆ Bob          ┆ DIAB100 MYOP │
│ 4          ┆ George       ┆ ACNE DIAB100 │
└────────────┴──────────────┴──────────────┘
