# Pandas LeetCode Study Plan Solutions

This notebook contains my solutions to the LeetCode Pandas study plan problems and serves as a portfolio of the answers I submitted on LeetCode.

[LeetCode Pandas Study Plan](https://leetcode.com/studyplan/30-days-of-pandas/)

---


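# Question 1. 595. Big Countries
A country is big if it has an area of at least 3,000,000 km² or a population of at least 25,000,000. Write a solution to find the name, population, and area of the big countries. Return the result table in any order.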

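#### Combine conditions
- `&` → AND
- `|` → OR
- `~` → NOT

Wrap each comparison in parentheses: `&` and `|` bind more tightly than comparison operators such as `>=`, so `a >= 1 | b >= 2` is not evaluated the way it reads.

#### Filter rows
`df.loc[condition, columns]` keeps the rows where the boolean mask `condition` is `True` and only the listed `columns`.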
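
A small sketch of how the masks combine and why the parentheses matter. The DataFrame `df` and its columns `a` and `b` are made-up illustration data, not part of the problem.

```python
import pandas as pd

# Hypothetical data, only to illustrate combining boolean masks
df = pd.DataFrame({"a": [1, 5, 10, 0], "b": [20, 2, 30, 1]})

# Each comparison is wrapped in parentheses before combining with & / | / ~
both = (df["a"] > 3) & (df["b"] > 10)    # AND
either = (df["a"] > 3) | (df["b"] > 10)  # OR
neither = ~either                        # NOT

print(df.loc[both, ["a"]])     # rows where both conditions hold
print(df.loc[either, ["a"]])   # rows where at least one holds
print(df.loc[neither, ["a"]])  # rows where none holds

# Without parentheses, | binds before >, so Python evaluates the chained
# comparison df["a"] > (3 | df["b"]) > 10, which needs bool() of a Series and raises
try:
    df.loc[df["a"] > 3 | df["b"] > 10]
except ValueError as err:
    print("ValueError:", err)
```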

In [None]:
import pandas as pd

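def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    # A country is big if either threshold is met
    is_big = (world["area"] >= 3000000) | (world["population"] >= 25000000)
    # Keep only the big countries and the three requested columns
    return world.loc[is_big, ["name", "population", "area"]]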


Input:

| name        | continent | area    | population | gdp          |
| ----------- | --------- | ------- | ---------- | ------------ |
| Afghanistan | Asia      | 652230  | 25500100   | 20343000000  |
| Albania     | Europe    | 28748   | 2831741    | 12960000000  |
| Algeria     | Africa    | 2381741 | 37100000   | 188681000000 |
| Andorra     | Europe    | 468     | 78115      | 3712000000   |
| Angola      | Africa    | 1246700 | 20609294   | 100990000000 |

Output:

| name        | population | area    |
| ----------- | ---------- | ------- |
| Afghanistan | 25500100   | 652230  |
| Algeria     | 37100000   | 2381741 |
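
To check the solution against the example, the input table above can be rebuilt as a DataFrame. The data is copied from the table, and the function is repeated so the cell runs on its own. Note that both countries qualify through population, not area.

```python
import pandas as pd

# The example input table above
world = pd.DataFrame({
    "name": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
    "area": [652230, 28748, 2381741, 468, 1246700],
    "population": [25500100, 2831741, 37100000, 78115, 20609294],
    "gdp": [20343000000, 12960000000, 188681000000, 3712000000, 100990000000],
})

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    is_big = (world["area"] >= 3000000) | (world["population"] >= 25000000)
    return world.loc[is_big, ["name", "population", "area"]]

result = big_countries(world)
print(result)
```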

# Question 2. 1757. Recyclable and Low Fat Products

Write a solution to find the ids of products that are both low fat and recyclable. Return the result table in any order.


In [3]:
import pandas as pd

data = {
    "product_id": [101, 102, 103, 104],
    "low_fats": ["Y", "N", "Y", "N"],
    "recyclable": ["N", "Y", "Y", "N"]
}


products = pd.DataFrame(data)


def find_products(products: pd.DataFrame) -> pd.DataFrame:
    # Both flags must be "Y"
    return products.loc[(products["low_fats"] == "Y") & (products["recyclable"] == "Y"), ["product_id"]]

find_products(products)

Output:

|   | product_id |
| - | ---------- |
| 2 | 103        |


# Question 3. 183. Customers Who Never Order
Write a solution to find all customers who never order anything. Return the result table in any order.

----
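A left join keeps every row from the left table (`customers`) and adds the matching rows from the right table (`orders`) where they exist. Customers with no order get `NaN` in `customerId`, so filtering on `isna()` returns the customers who never ordered.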
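
One way to see the left join at work, sketched with the same sample tables as the cell below: `indicator=True` adds a `_merge` column that labels each row `both` or `left_only`, and the `left_only` rows are exactly the customers with no order.

```python
import pandas as pd

customers = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["Joe", "Henry", "Sam", "Max"]})
orders = pd.DataFrame({"id": [1, 2], "customerId": [3, 1]})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join
merged = pd.merge(customers, orders, left_on="id", right_on="customerId",
                  how="left", indicator=True)
print(merged)

# Rows that found no order on the right side
never_ordered = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(never_ordered)
```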

In [None]:
import pandas as pd

# Customers table
customers = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Joe", "Henry", "Sam", "Max"]
})

# Orders table
orders = pd.DataFrame({
    "id": [1, 2],
    "customerId": [3, 1]
})

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Left join: every customer is kept; customerId is NaN when there is no matching order
    merged_id = pd.merge(customers, orders, left_on="id", right_on="customerId", how="left")
    # Customers without a match never ordered
    return merged_id.loc[merged_id["customerId"].isna(), ["name"]].rename(columns={"name": "customers"})

find_customers(customers, orders)


Output:

|   | customers |
| - | --------- |
| 1 | Henry     |
| 3 | Max       |


In [None]:
# The merge-based version had a high runtime on LeetCode, so here is another method
def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # isin() asks "is this customer's id in the orders list?"; ~ flips it to "is it NOT in the list?"
    mask = ~customers["id"].isin(orders["customerId"])
    return customers.loc[mask, ["name"]].rename(columns={"name": "customers"})

find_customers(customers, orders)

# This improved the LeetCode runtime a little.

Output:

|   | customers |
| - | --------- |
| 1 | Henry     |
| 3 | Max       |


# Question 4. 1148. Article Views I
Write a solution to find all the authors that viewed at least one of their own articles. Return the result table sorted by id in ascending order.

In [None]:
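import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:
    # Rows where the author viewed their own article
    viewed_article = views[views["author_id"] == views["viewer_id"]]
    # One row per author, even if they viewed several of their own articles
    viewed_article = viewed_article[["author_id"]].drop_duplicates()
    return viewed_article.rename(columns={"author_id": "id"}).sort_values(by="id")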
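
A quick run on hypothetical sample data (the `views` rows below are made up for illustration). Author 4 viewed their own article twice, which is why `drop_duplicates()` is needed.

```python
import pandas as pd

# Hypothetical sample views
views = pd.DataFrame({
    "article_id": [1, 1, 2, 2, 4, 3, 3],
    "author_id":  [3, 3, 7, 7, 7, 4, 4],
    "viewer_id":  [5, 6, 7, 6, 1, 4, 4],
})

def article_views(views: pd.DataFrame) -> pd.DataFrame:
    own = views[views["author_id"] == views["viewer_id"]]
    own = own[["author_id"]].drop_duplicates()
    return own.rename(columns={"author_id": "id"}).sort_values(by="id")

result = article_views(views)
print(result)
```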

# Question 5. 1683. Invalid Tweets
Write a solution to find the IDs of the invalid tweets. The tweet is invalid if the number of characters used in the content of the tweet is strictly greater than 15. Return the result table in any order.

In [None]:
import pandas as pd

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    invalid=tweets[tweets["content"].str.len()>15]
    return invalid[["tweet_id"]]
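
`str.len()` counts every character, spaces included. A check of the boundary at 15 with hypothetical tweets of 14, 15, and 16 characters:

```python
import pandas as pd

# Hypothetical tweets: 14, 15 and 16 characters long
tweets = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "content": ["a" * 14, "b" * 15, "c" * 16],
})

lengths = tweets["content"].str.len()
print(lengths.tolist())

# Strictly greater than 15: only the 16-character tweet is invalid
invalid = tweets.loc[lengths > 15, ["tweet_id"]]
print(invalid)
```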

# Question 6. 1873. Calculate Special Bonus

Write a solution to calculate the bonus of each employee. The bonus of an employee is 100% of their salary if the ID of the employee is an odd number and the employee's name does not start with the character 'M'. The bonus of an employee is 0 otherwise. Return the result table ordered by employee_id.

In [None]:
import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
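    # Mask for eligible employees: odd id and name not starting with "M".
    # Assigning the filtered salaries aligns on the index, so ineligible rows become NaN
    employees["bonus"] = employees[(employees["employee_id"] % 2 == 1) & (~employees["name"].str.startswith("M"))]["salary"]
    # Ineligible employees get a bonus of 0
    employees["bonus"] = employees["bonus"].fillna(0)

    return employees[["employee_id", "bonus"]].sort_values(by="employee_id")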
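
The NaN-then-`fillna` approach above usually turns `bonus` into a float column and also adds the column to the caller's DataFrame. A sketch of an alternative with the same rule, using `Series.where`: it keeps the salary where the condition is true and puts 0 elsewhere, so integer salaries stay integers. The sample employees are hypothetical.

```python
import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    eligible = (employees["employee_id"] % 2 == 1) & (~employees["name"].str.startswith("M"))
    # Build the result on a copy so the input frame is not modified
    result = employees[["employee_id"]].copy()
    # Salary where eligible, 0 elsewhere
    result["bonus"] = employees["salary"].where(eligible, 0)
    return result.sort_values(by="employee_id")

# Hypothetical sample data
employees = pd.DataFrame({
    "employee_id": [2, 3, 7, 8, 9],
    "name": ["Meir", "Michael", "Addilyn", "Juan", "Kannon"],
    "salary": [3000, 3800, 7400, 6100, 7700],
})
result = calculate_special_bonus(employees)
print(result)
```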

# Question 7. 1667. Fix Names in a Table
Write a solution to fix the names so that only the first character is uppercase and the rest are lowercase. Return the result table ordered by `user_id`.

In [None]:
import pandas as pd
def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    users.name=users.name.str.capitalize()
    return users.sort_values(by="user_id")
    

In [None]:
# Alternative solution (it performed better for me on LeetCode): works on a copy, so the caller's DataFrame is left unchanged
import pandas as pd

def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    users=users.copy()
    users["name"] = users["name"].str.capitalize()
    users.sort_values("user_id", inplace=True)
    users.reset_index(drop=True, inplace=True)
    return users
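
`str.capitalize()` is the right tool here rather than `str.title()`: capitalize upper-cases only the first character of the whole string, while title upper-cases the first letter of every word. A comparison on hypothetical names:

```python
import pandas as pd

names = pd.Series(["aLice", "bOB", "mary ann"])

# capitalize(): first character upper, all the rest lower
capitalized = names.str.capitalize().tolist()
# title(): first letter of every word upper, which this problem does not ask for
titled = names.str.title().tolist()

print(capitalized)
print(titled)
```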


# Question 8. 1517. Find Users With Valid E-Mails
Write a solution to find the users who have valid emails. A valid e-mail has a prefix name and a domain where:
- The prefix name is a string that may contain letters (upper or lower case), digits, underscore `'_'`, period `'.'`, and/or dash `'-'`. The prefix name must start with a letter.
- The domain is `'@leetcode.com'`.

Return the result table in any order.

In [None]:
import pandas as pd

def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
    df=users[users["mail"].str.match(r'^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$')]
    return df


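- `str` → pandas string methods
- `match()` → checks whether the regex matches starting at the beginning of each string
- `r''` → raw string, so backslashes such as `\.` reach the regex engine unchanged
- `^[A-Za-z]` → the prefix starts with a letter; `[A-Za-z0-9_.-]*` → the rest of the prefix; `@leetcode\.com$` → the exact domain, with `$` rejecting anything after it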
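
A check of the pattern on hypothetical addresses, each annotated with the rule it exercises. The second filter drops the `$` to show that `match()` only anchors at the start, so without `$` an address with extra text after `.com` would slip through.

```python
import pandas as pd

pattern = r'^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$'

# Hypothetical addresses
mails = pd.Series([
    "winston@leetcode.com",     # valid
    "bella-@leetcode.com",      # valid: dash allowed in the prefix
    "sally.come@leetcode.com",  # valid: period allowed in the prefix
    ".shapo@leetcode.com",      # invalid: prefix must start with a letter
    "quarz#2020@leetcode.com",  # invalid: '#' not allowed
    "david69@gmail.com",        # invalid: wrong domain
    "anna@leetcode.com.au",     # invalid only because of the trailing $
])

valid = mails[mails.str.match(pattern)]
print(valid.tolist())

# Same pattern without $: the last address is wrongly accepted
loose = mails[mails.str.match(r'^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com')]
print(loose.tolist())
```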

# Question 9. 1527. Patients With a Condition
Write a solution to find the patient_id, patient_name, and conditions of the patients who have Type I Diabetes. Type I Diabetes always starts with DIAB1 prefix. Return the result table in any order.

In [None]:
import pandas as pd

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    df=patients[patients["conditions"].str.contains(r"^DIAB1| DIAB1")]
    return df

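The pattern `^DIAB1| DIAB1` matches `DIAB1` either at the start of the string (`^`) or right after a space, that is, at the start of any condition code, so a code such as `SADIAB100` is not counted.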
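
The pattern applied to hypothetical `conditions` strings, covering the start-of-string case, the after-a-space case, and a code that only contains `DIAB1` in the middle of a word:

```python
import pandas as pd

conditions = pd.Series([
    "DIAB100 MYOP",  # code at the start of the string
    "ACNE DIAB100",  # code after a space
    "SADIAB100",     # DIAB1 inside another code: not Type I
    "YFEV COUGH",
    "",
])

has_type1 = conditions.str.contains(r"^DIAB1| DIAB1")
print(has_type1.tolist())
```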


# Question 10. 177. Nth Highest Salary
Write a solution to find the Nth highest distinct salary from the Employee table. If there are fewer than N distinct salaries, return null.

In [None]:
import pandas as pd

def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
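    # Distinct salaries from highest to lowest
    sorted_salaries = employee["salary"].sort_values(ascending=False).drop_duplicates()
    # Out-of-range N (too large, zero or negative) returns null
    if N > len(sorted_salaries) or N <= 0:
        return pd.DataFrame({f"getNthHighestSalary({N})": [None]})
    # N is 1-based, iloc is 0-based
    nth_highest = sorted_salaries.iloc[N - 1]
    return pd.DataFrame({f"getNthHighestSalary({N})": [nth_highest]})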
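
A run on a hypothetical table with a tie at the top. Because duplicates are dropped before indexing, the two 300s count once, so the 2nd highest is 200 and there is no 4th highest.

```python
import pandas as pd

def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    # Distinct salaries, highest first
    sorted_salaries = employee["salary"].sort_values(ascending=False).drop_duplicates()
    column = f"getNthHighestSalary({N})"
    if N > len(sorted_salaries) or N <= 0:
        return pd.DataFrame({column: [None]})
    return pd.DataFrame({column: [sorted_salaries.iloc[N - 1]]})

# Hypothetical table with a tie at the top
employee = pd.DataFrame({"id": [1, 2, 3, 4], "salary": [100, 200, 300, 300]})

second = nth_highest_salary(employee, 2)  # duplicate 300 counts once
fourth = nth_highest_salary(employee, 4)  # only 3 distinct salaries exist
print(second)
print(fourth)
```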

# Question 11. 176. Second Highest Salary
Write a solution to find the second highest distinct salary from the Employee table. If there is no second highest salary, return null (return None in Pandas).

In [None]:
import pandas as pd

def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
    unique_salaries=employee["salary"].drop_duplicates()

    unique_salaries=unique_salaries.sort_values(ascending=False)

    if len(unique_salaries)<2:
        return pd.DataFrame({"SecondHighestSalary":[None]})

    second=unique_salaries.iloc[1]
    return pd.DataFrame({"SecondHighestSalary":[second]})

### Input → Output Flow

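| Step | Object | Type |
| ---- | ------ | ---- |
| Input | `employee` | DataFrame |
| Select column | `employee["salary"]` | Series |
| Remove duplicates | `drop_duplicates()` | Series |
| Sort descending | `sort_values(ascending=False)` | Series |
| Second value | `iloc[1]` | scalar |
| Output | `pd.DataFrame({"SecondHighestSalary": [second]})` | DataFrame |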
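
The same flow run step by step on a hypothetical three-row table, printing the type of each intermediate object:

```python
import pandas as pd

employee = pd.DataFrame({"id": [1, 2, 3], "salary": [100, 200, 300]})

salary = employee["salary"]                                     # Series
unique_salaries = salary.drop_duplicates()                      # Series
sorted_salaries = unique_salaries.sort_values(ascending=False)  # Series
second = sorted_salaries.iloc[1]                                # scalar (NumPy integer)
result = pd.DataFrame({"SecondHighestSalary": [second]})        # DataFrame

for name, obj in [("salary", salary), ("unique", unique_salaries),
                  ("sorted", sorted_salaries), ("second", second), ("result", result)]:
    print(name, type(obj).__name__)
```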

# Question 12. 184. Department Highest Salary
Write a solution to find employees who have the highest salary in each of the departments. Return the result table in any order.

In [None]:
import pandas as pd

def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
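    # Both tables have "id" and "name" columns, so suffixes tell them apart
    merged = pd.merge(employee, department, left_on="departmentId", right_on="id", suffixes=("_emp", "_dep"))
    merged = merged.rename(columns={"name_emp": "Employee", "name_dep": "Department", "salary": "Salary"})
    # Highest salary per department
    max_salary = merged.groupby("Department")["Salary"].max().reset_index()

    # Inner merge keeps only employees whose salary equals their department's maximum (ties included)
    result = pd.merge(merged, max_salary, on=["Department", "Salary"])
    result = result[["Department", "Employee", "Salary"]]
    return result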
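
A sketch of a variant that skips the second merge: `groupby(...).transform("max")` returns a Series aligned with the original rows, so each row can be compared with its own department's maximum directly. It groups by `departmentId` rather than the department name, which also avoids relying on names being unique. The sample tables are hypothetical and include a tie in IT.

```python
import pandas as pd

def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    merged = pd.merge(employee, department, left_on="departmentId", right_on="id",
                      suffixes=("_emp", "_dep"))
    # Department maximum broadcast back to every row of that department
    dept_max = merged.groupby("departmentId")["salary"].transform("max")
    top = merged[merged["salary"] == dept_max]
    top = top.rename(columns={"name_dep": "Department", "name_emp": "Employee",
                              "salary": "Salary"})
    return top[["Department", "Employee", "Salary"]]

# Hypothetical data with a tie in IT
employee = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Joe", "Jim", "Henry", "Sam", "Max"],
    "salary": [70000, 90000, 80000, 60000, 90000],
    "departmentId": [1, 1, 2, 2, 1],
})
department = pd.DataFrame({"id": [1, 2], "name": ["IT", "Sales"]})

result = department_highest_salary(employee, department)
print(result)
```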

# Question 13. 178. Rank Scores
Write a solution to find the rank of the scores. The ranking should be calculated according to the following rules:
- The scores should be ranked from the highest to the lowest.
- If there is a tie between two scores, both should have the same ranking.
- After a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no holes between ranks.
- Return the result table ordered by score in descending order.

In [None]:
import pandas as pd

def order_scores(scores: pd.DataFrame) -> pd.DataFrame:
    # Dense rank with the highest score ranked 1 (rank() returns floats)
    scores["rank"] = scores["score"].rank(method="dense", ascending=False)
    return scores.drop("id", axis=1).sort_values(by="score", ascending=False)
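
The "no holes between ranks" rule is what `method="dense"` provides. A comparison with `method="min"` on hypothetical scores shows the gaps that `min` leaves after ties:

```python
import pandas as pd

scores = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                       "score": [3.50, 3.65, 4.00, 3.85, 4.00, 3.65]})

# dense: ties share a rank and the next rank follows without gaps
dense = scores["score"].rank(method="dense", ascending=False)
# min: ties share the lowest rank, but the following ranks skip ahead
min_rank = scores["score"].rank(method="min", ascending=False)

print(dense.tolist())
print(min_rank.tolist())

# rank() returns floats; cast if integer ranks are wanted
print(dense.astype(int).tolist())
```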

# Question 14. 196. Delete Duplicate Emails
Write a solution to delete all duplicate emails, keeping only one unique email with the smallest id. For Pandas users, please note that you are supposed to modify Person in place. After running your script, the answer shown is the Person table. The driver will first compile and run your piece of code and then show the Person table. The final order of the Person table does not matter.

In [None]:
import pandas as pd
def delete_duplicate_emails(person: pd.DataFrame) -> None:
    person.sort_values("id",inplace=True)
    person.drop_duplicates(subset="email",keep="first",inplace=True)
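
A check with a hypothetical table where the smallest id of a duplicated email is not the first row. Sorting by `id` first is what makes `keep="first"` keep the smallest id, and the function returns `None` because it only mutates its argument.

```python
import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    person.sort_values("id", inplace=True)
    person.drop_duplicates(subset="email", keep="first", inplace=True)

# Hypothetical table: id 3 comes before id 2, both with the same email
person = pd.DataFrame({
    "id": [3, 1, 2],
    "email": ["john@example.com", "bob@example.com", "john@example.com"],
})

returned = delete_duplicate_emails(person)
print(returned)
print(person)  # id 3 is gone; ids 1 and 2 remain
```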

# Question 15. 1795. Rearrange Products Table
Write a solution to rearrange the Products table so that each row has (product_id, store, price). If a product is not available in a store, do not include a row with that product_id and store combination in the result table. Return the result table in any order.

--- 
## Solution
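`melt` turns a wide table into a long one, which makes it easier to analyze, visualize, or feed into models. Here each `storeN` column becomes a row holding the store name and its price.

`DataFrame.melt(id_vars=None, value_vars=None, var_name=None, value_name="value", col_level=None, ignore_index=True)`

- `id_vars`: columns kept as identifiers (`product_id`)
- `value_vars`: columns unpivoted into rows (`store1`, `store2`, `store3`)
- `var_name`: name of the new column holding the old column names (`store`)
- `value_name`: name of the new column holding the values (`price`)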

In [None]:
import pandas as pd

def rearrange_products_table(products: pd.DataFrame) -> pd.DataFrame:
    melted=products.melt(id_vars="product_id", value_vars=["store1","store2","store3"],var_name="store",value_name="price")
    melted=melted.dropna(subset=["price"])
    return melted
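
A run on a hypothetical two-product table where product 1 is missing from `store2`. `melt` first produces one row per (product, store) pair, including the missing price as `NaN`, and `dropna` then removes that pair.

```python
import numpy as np
import pandas as pd

# Hypothetical products table; product 1 is not sold in store2
products = pd.DataFrame({
    "product_id": [0, 1],
    "store1": [95, 70],
    "store2": [100, np.nan],
    "store3": [105, 80],
})

melted = products.melt(id_vars="product_id", value_vars=["store1", "store2", "store3"],
                       var_name="store", value_name="price")
print(melted)  # one row per (product_id, store), including the NaN price

result = melted.dropna(subset=["price"])
print(result)  # the (1, store2) row is gone
```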