Link: https://leetcode.com/problems/department-highest-salary/submissions/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata


```python
import pandas as pd

def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    joined_data = pd.merge(employee, department, left_on='departmentId', right_on='id', how='inner')

    # Group by department and find employees with the highest salary
    grouped_data = joined_data.groupby('name_y').apply(lambda group: group[group['salary'] == group['salary'].max()])

    # Select relevant columns and reset the index
    result_dataframe = grouped_data[['name_y', 'name_x', 'salary']].reset_index(drop=True)
    result_dataframe.columns = ['Department', 'Employee', 'Salary']

    return result_dataframe
```

**Step 1: Joining the Employee and Department tables**
```python
joined_data = pd.merge(employee, department, left_on='departmentId', right_on='id', how='inner')
```
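Here `pd.merge()` combines the `employee` and `department` DataFrames by matching `employee['departmentId']` against `department['id']`. Because `how='inner'`, only employees whose department exists in `department` are kept. Both tables have `id` and `name` columns, so pandas disambiguates them with its default suffixes: the employee name becomes `name_x` and the department name becomes `name_y`.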
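To make the suffix behavior concrete, here is a minimal sketch of the join. The sample rows (names, salaries, department ids) are made up for illustration and are not taken from the problem statement.

```python
import pandas as pd

# Hypothetical sample data, shaped like the two input tables
employee = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Ann", "Ben", "Cal"],
    "salary": [5000, 6000, 5500],
    "departmentId": [10, 10, 20],
})
department = pd.DataFrame({"id": [10, 20], "name": ["IT", "Sales"]})

joined_data = pd.merge(
    employee, department, left_on="departmentId", right_on="id", how="inner"
)

# The overlapping column names 'id' and 'name' receive the default suffixes
print(joined_data.columns.tolist())
# ['id_x', 'name_x', 'salary', 'departmentId', 'id_y', 'name_y']
```

If the `_x`/`_y` names feel opaque, `pd.merge()` also accepts a `suffixes` argument for choosing more descriptive ones.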

**Step 2: Group by department and find employees with the highest salary**
```python
grouped_data = joined_data.groupby('name_y').apply(lambda group: group[group['salary'] == group['salary'].max()])
```
Here `groupby()` groups `joined_data` by department name (`'name_y'`), and `apply()` runs the lambda on each group. The lambda keeps only the rows whose salary equals the group's maximum salary, so every employee tied for the top salary in a department is retained, not just one.
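The same filter can also be written without `apply()`. The sketch below is an alternative, not the approach used in the solution above: it uses `transform('max')` to broadcast each department's maximum back onto its rows, then compares. The sample rows are made up for illustration. One possible reason to prefer this form is that, as far as I am aware, recent pandas versions warn when `groupby().apply()` operates on the grouping column.

```python
import pandas as pd

# Hypothetical joined data: employee name, salary, department name
joined_data = pd.DataFrame({
    "name_x": ["Ann", "Ben", "Cal", "Dee", "Eve"],
    "salary": [5000, 6000, 5500, 6000, 5000],
    "name_y": ["IT", "IT", "Sales", "IT", "Sales"],
})

# For every row, the highest salary within that row's department
dept_max = joined_data.groupby("name_y")["salary"].transform("max")

# Keep rows whose salary equals their department's maximum (ties included)
top_earners = joined_data[joined_data["salary"] == dept_max]
print(top_earners)
```

Ben and Dee tie for the top salary in IT, so both rows survive the filter, alongside Cal for Sales.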

**Step 3: Select relevant columns and reset the index**
```python
result_dataframe = grouped_data[['name_y', 'name_x', 'salary']].reset_index(drop=True)
result_dataframe.columns = ['Department', 'Employee', 'Salary']
```
Here we select the columns `'name_y'`, `'name_x'`, and `'salary'` from `grouped_data`. The `groupby().apply()` call typically leaves behind an index built from the group keys and the original row labels; `.reset_index(drop=True)` discards it and replaces it with a default integer index. Finally, assigning to `.columns` renames the columns to match the expected output.
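As a small illustration of this formatting step in isolation, the sketch below starts from an already filtered frame with a non-default index. The data is made up, and it uses `rename(columns=...)` instead of assigning to `.columns`, which is a stylistic alternative that maps each old name to its new one explicitly.

```python
import pandas as pd

# Hypothetical filtered rows with leftover, non-contiguous index labels
filtered = pd.DataFrame(
    {"name_y": ["IT", "Sales"], "name_x": ["Ben", "Cal"], "salary": [6000, 5500]},
    index=[1, 2],
)

result = (
    filtered[["name_y", "name_x", "salary"]]
    .rename(columns={"name_y": "Department", "name_x": "Employee", "salary": "Salary"})
    .reset_index(drop=True)  # replace the old labels with 0, 1, ...
)
print(result)
```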

In summary, `department_highest_salary` joins the employee and department data, groups the rows by department, keeps the employees whose salary equals their department's maximum, and formats the result as a DataFrame with `Department`, `Employee`, and `Salary` columns.

Link: https://leetcode.com/problems/rank-scores/submissions/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

```python
import pandas as pd

def order_scores(scores: pd.DataFrame) -> pd.DataFrame:
    scores['rank'] = scores['score'].rank(ascending=False, method='dense')
    scores = scores.sort_values(by='rank')
    return scores[['score', 'rank']]
```

The `rank()` function in Pandas provides several methods for assigning ranks to data. You can specify the desired ranking method using the `method` parameter. Here are the available ranking methods:

1. **average (default)**: Tied values all receive the mean of the ranks they would occupy. For example, two values tied for ranks 2 and 3 both receive 2.5.

2. **min**: Tied values all receive the lowest rank in the tied group. Two values tied for ranks 2 and 3 both receive 2, and the next value receives 4.

3. **max**: Tied values all receive the highest rank in the tied group. Two values tied for ranks 2 and 3 both receive 3, and the next value receives 4.

4. **first**: Ties are broken by order of appearance in the data, so every value gets a distinct rank. Of two tied values, the one that appears first receives the lower rank.

5. **dense**: Like 'min', but no ranks are skipped after a tie. For example, if two values are tied for rank 2, the next value receives rank 3 instead of 4. This is the method `order_scores` uses above.
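The differences between the methods are easiest to see side by side. The following sketch ranks a small made-up list of scores in descending order with each method; the scores are illustrative only.

```python
import pandas as pd

# Hypothetical scores with one tie (the two 90s)
scores = pd.Series([100, 90, 90, 80])

methods = ["average", "min", "max", "first", "dense"]
ranks = pd.DataFrame(
    {m: scores.rank(ascending=False, method=m) for m in methods}
)
ranks.insert(0, "score", scores)
print(ranks)

# Ranks per method for the scores [100, 90, 90, 80]:
# average -> 1.0, 2.5, 2.5, 4.0
# min     -> 1.0, 2.0, 2.0, 4.0
# max     -> 1.0, 3.0, 3.0, 4.0
# first   -> 1.0, 2.0, 3.0, 4.0
# dense   -> 1.0, 2.0, 2.0, 3.0
```

Only `dense` ends at rank 3 for four scores, because it never leaves a gap after the tie.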

Link: https://leetcode.com/problems/rearrange-products-table/submissions/?envType=study-plan-v2&envId=30-days-of-pandas&lang=pythondata

```python
import pandas as pd

def rearrange_products_table(products: pd.DataFrame) -> pd.DataFrame:
    melt_products = pd.melt(products, id_vars=['product_id'], value_vars=['store1', 'store2', 'store3'], var_name='store', value_name='price')

    melt_products.dropna(subset=['price'], inplace=True)
    return melt_products
```

The `pd.melt()` function transforms a DataFrame from a wide format to a long format. In a wide format DataFrame, each row represents a single observation or entity, and each column represents a different attribute or feature. In a long format DataFrame, the data is "melted" down so that each row holds a single value for one observation and one attribute.

Let's break down the parameters of the `pd.melt()` function:

```python
melted_df = pd.melt(
    frame,               # The DataFrame to be melted
    id_vars=None,        # Columns to use as identifiers (they will remain as columns in the melted DataFrame)
    value_vars=None,     # Columns to be "unpivoted"; if omitted, all columns not in id_vars are used
    var_name=None,       # Name to give to the variable column (if omitted, 'variable' is used)
    value_name='value'   # Name to give to the value column (default is 'value')
)
```

Here's a detailed explanation of each parameter:

- `frame`: This is the DataFrame that you want to reshape.

- `id_vars`: These are the columns that you want to keep as identifiers in the resulting melted DataFrame. They will remain as columns, and each unique combination of identifiers and melted columns will form a row in the melted DataFrame.

- `value_vars`: These are the columns that you want to "unpivot" or melt. The values in these columns will be gathered into a single column in the melted DataFrame.

- `var_name`: This parameter allows you to specify the name of the column that will contain the variable names (i.e., the columns that were "unpivoted"). By default, this column is named `'variable'`.

- `value_name`: This parameter allows you to specify the name of the column that will contain the values that were in the columns you unpivoted. By default, this column is named `'value'`.

In `rearrange_products_table` above, the input DataFrame holds each product's price in separate store columns (`store1`, `store2`, `store3`). The goal is to reshape it so that each row represents one product in one store along with its price.

Passing `id_vars=['product_id']` keeps `product_id` as the identifier, and `value_vars=['store1', 'store2', 'store3']` melts those three columns: their names go into a single column called `'store'` and their prices into a single column called `'price'`. The subsequent `dropna(subset=['price'])` removes the rows where the price is missing.

The result is a long format DataFrame where each row contains the product's `product_id`, the store it's associated with, and the price in that store. This format makes it easier to analyze and work with the data, especially when you want to compare prices across different stores.
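A minimal sketch of this reshaping on made-up data follows. The product ids and prices are hypothetical, and `None` marks a store with no price for that product.

```python
import pandas as pd

# Hypothetical wide-format input: one row per product, one column per store
products = pd.DataFrame({
    "product_id": [0, 1],
    "store1": [95, 70],
    "store2": [100, None],   # product 1 has no price in store2
    "store3": [105, 80],
})

long_products = pd.melt(
    products,
    id_vars=["product_id"],
    value_vars=["store1", "store2", "store3"],
    var_name="store",
    value_name="price",
).dropna(subset=["price"])   # drop (product, store) pairs without a price

print(long_products)
```

Melting two products across three stores yields six rows; dropping the one missing price leaves five.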

As a second, standalone example, suppose you have the following DataFrame with student scores in different subjects:

```python
import pandas as pd

data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Math': [90, 85, 70],
    'Science': [95, 92, 88],
    'History': [80, 75, 60]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
```

This will output:

```
   Student  Math  Science  History
0    Alice    90       95       80
1      Bob    85       92       75
2  Charlie    70       88       60
```

Now, let's use the `pd.melt()` function to reshape this DataFrame:

```python
melted_df = pd.melt(df, id_vars=['Student'], value_vars=['Math', 'Science', 'History'], 
                    var_name='Subject', value_name='Score')
print("\nMelted DataFrame:")
print(melted_df)
```

The output will be:

```
   Student  Subject  Score
0    Alice     Math     90
1      Bob     Math     85
2  Charlie     Math     70
3    Alice  Science     95
4      Bob  Science     92
5  Charlie  Science     88
6    Alice  History     80
7      Bob  History     75
8  Charlie  History     60

In this example, the original DataFrame has students' scores in different subjects. By using the `pd.melt()` function, we've reshaped the data so that each row now represents a student's score in a specific subject. The columns `'Student'`, `'Subject'`, and `'Score'` correspond to the identifier, the variable (unpivoted columns), and the value, respectively.

This transformed long format DataFrame can be useful for various analyses, such as comparing scores across subjects or performing aggregation tasks on specific subjects.
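As a rough sketch of such an analysis, the long format lets a single `groupby()` answer per-subject questions. The block rebuilds the student data from the example above so that it runs on its own; the variable names `mean_by_subject` and `max_by_subject` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Student": ["Alice", "Bob", "Charlie"],
    "Math": [90, 85, 70],
    "Science": [95, 92, 88],
    "History": [80, 75, 60],
})
melted_df = pd.melt(
    df,
    id_vars=["Student"],
    value_vars=["Math", "Science", "History"],
    var_name="Subject",
    value_name="Score",
)

# Average and highest score per subject, computed from the long format
mean_by_subject = melted_df.groupby("Subject")["Score"].mean()
max_by_subject = melted_df.groupby("Subject")["Score"].max()

print(mean_by_subject.round(2))
print(max_by_subject)
```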