The Python function `case_when` is modeled on the `case_when` function in R's `dplyr` package. It evaluates a list of boolean conditions against a pandas DataFrame and assigns a value wherever each condition holds, falling back to a default value where none does. This is useful for creating new variables or recoding existing ones from multi-branch logic.

### Function Overview:

- **Parameters:**
  - `dataframe` (pd.DataFrame): The DataFrame to which the conditions will be applied.
  - `conditions` (list of tuples): Each tuple contains a boolean condition (typically a boolean Series aligned with the DataFrame's index) and the value to assign where that condition is true.
  - `default` (optional): The value to be assigned if none of the specified conditions are met. If not provided, the default is `None`.

- **Process:**
  1. Initializes a pandas Series (`result`) filled with the default value for each row in the DataFrame.
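  2. Iterates through the `conditions` list in order, assigning each value to the rows of `result` where its condition is true. Because each assignment overwrites earlier ones, a row that satisfies several conditions keeps the value of the last matching condition (unlike `dplyr::case_when`, where the first match wins).
  3. Checks for missing data in the DataFrame. If a row contains a missing value (`NaN` or `None`) in any column, including columns the conditions do not use, the function assigns `np.nan` to that row in `result`, overriding the default or conditionally assigned value.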

- **Returns:** A pandas Series with the assigned values based on the specified conditions and the handling of missing data.
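
The three steps above can be sketched inline, without calling the function, to make the order of evaluation visible. The `score` column and the labels are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column with a missing value
df = pd.DataFrame({"score": [10, 55, 90, np.nan]})

# Step 1: start from the default value for every row
result = pd.Series(["low"] * len(df), index=df.index)

# Step 2: apply the conditions in order; later assignments overwrite earlier ones
result[df["score"] > 50] = "medium"
result[df["score"] > 80] = "high"

# Step 3: rows with any missing value are set to NaN
result[df.isnull().any(axis=1)] = np.nan

# Same conditions in the opposite order (missing-data step omitted):
# the broader condition now runs last and overwrites "high"
reversed_result = pd.Series(["low"] * len(df), index=df.index)
reversed_result[df["score"] > 80] = "high"
reversed_result[df["score"] > 50] = "medium"

print(result.tolist())
print(reversed_result.tolist())
```

With this design, the most specific condition should be listed last if it is meant to take precedence.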

### Example Usage Explained:

In the provided example:

- A sample DataFrame `df` is created with two columns (`'A'` and `'B'`) and some missing values.
- Conditions are defined such that:
  - "Condition 1 met" is assigned to rows where column `'A'` is greater than 1 and column `'B'` is 'y'.
  - "Condition 2 met" is assigned to rows where column `'A'` is less than or equal to 1.
- A default value, "Default condition met", is specified for rows that meet neither of the two conditions.
- The `case_when` function is applied to the DataFrame, and the results are stored in a new column `'Result'`.
- Row 2 contains missing values in both columns, so its `'Result'` entry is `np.nan`: the missing-data check overrides whatever value had been assigned to that row.
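
One consequence of the row-wise missing-data check is worth seeing in isolation: a missing value in a column that no condition refers to still turns the result into `np.nan`. The sketch below uses a condensed copy of the notebook's `case_when` so that it runs on its own; the `note` column and the labels are hypothetical.

```python
import numpy as np
import pandas as pd

def case_when(dataframe, conditions, default=None):
    # Condensed copy of the function defined in this notebook
    result = pd.Series([default] * len(dataframe), index=dataframe.index)
    for condition, value in conditions:
        result[condition] = value
    result[dataframe.isnull().any(axis=1)] = np.nan
    return result

# Hypothetical data: "note" plays no part in the condition below
demo = pd.DataFrame({"A": [5, 5], "note": ["ok", None]})

out = case_when(demo, [(demo["A"] > 1, "big")], default="small")

# Both rows satisfy A > 1, but the second row has a missing "note"
print(out.tolist())
```

If only some columns should trigger this behaviour, one option is to pass the function a DataFrame restricted to those columns.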

The function is well suited to data manipulation and feature engineering, where it expresses multi-branch logic in a single call rather than a sequence of separate pandas assignments.
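
For comparison, here is a sketch of how the same example might be written with NumPy's `np.select` plus a separate masking step. This is an alternative formulation, not part of the notebook's function, and the variable names are illustrative. Note that `np.select` uses the first matching condition, whereas the loop in `case_when` keeps the last one; the two agree here only because the example's conditions do not overlap.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, None, 3], "B": ["x", "y", None, "z"]})

# Boolean masks as plain NumPy arrays, in the same order as the example
masks = [
    ((df["A"] > 1) & (df["B"] == "y")).to_numpy(),
    (df["A"] <= 1).to_numpy(),
]
labels = ["Condition 1 met", "Condition 2 met"]

# Pick a label per row, falling back to the default where no mask is true
selected = pd.Series(
    np.select(masks, labels, default="Default condition met"),
    index=df.index,
)

# Separate step: blank out rows that contain any missing value
selected = selected.mask(df.isnull().any(axis=1))

print(selected.tolist())
```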

In [1]:
import pandas as pd
import numpy as np

def case_when(dataframe, conditions, default=None):
    """
    Apply multiple conditions to a pandas DataFrame, similar to dplyr's
    case_when in R, with a default value and handling of missing data.

    Args:
        dataframe (pd.DataFrame): The DataFrame on which to apply the conditions.
        conditions (list of tuples): (condition, value) pairs, where each condition
            is a boolean mask over the rows. Later pairs overwrite earlier ones.
        default (optional): The value to assign when none of the conditions are met.

    Returns:
        pd.Series: The assigned values, with np.nan for rows that contain missing data.
    """
    # Initialize a series with the default value
    result = pd.Series([default] * len(dataframe), index=dataframe.index)

    # Apply each condition
    for condition, value in conditions:
        result[condition] = value

    # Override the assigned value with np.nan where any value in the row is missing
    missing_condition = dataframe.isnull().any(axis=1)
    result[missing_condition] = np.nan

    return result

# Example usage

# Example DataFrame with missing data
df = pd.DataFrame({'A': [1, 2, None, 3], 'B': ['x', 'y', None, 'z']})

# Define conditions
conditions = [
    ((df['A'] > 1) & (df['B'] == 'y'), "Condition 1 met"),
    (df['A'] <= 1, "Condition 2 met")
]

# Apply the case_when function with a default value
df['Result'] = case_when(df, conditions, default="Default condition met")

# Display the DataFrame
df

|   | A | B | Result |
|---|---|---|---|
| 0 | 1.0 | x | Condition 2 met |
| 1 | 2.0 | y | Condition 1 met |
| 2 | NaN | None | NaN |
| 3 | 3.0 | z | Default condition met |
