### Challenge 1: Handling Missing Values with SimpleImputer

**Topic:** Handling Missing Values (SimpleImputer)  

**Problem Description:**  
You are given a dataset represented as a Pandas DataFrame with missing values (`NaN`). Your task is to write a function that imputes these missing values using different strategies supported by `SimpleImputer` from `scikit-learn`. The function should return the dataset with imputed values.

**Function Signature:**  
```python
def impute_missing_values(data: pd.DataFrame, strategy: str) -> pd.DataFrame:
    """
    Impute missing values in the dataset using the specified strategy.

    Args:
    data (pd.DataFrame): Input dataset with missing values.
    strategy (str): Strategy to impute missing values. Must be one of ['mean', 'median', 'most_frequent'].

    Returns:
    pd.DataFrame: Dataset with missing values imputed.
    """
```

**Constraints:**
1. You can assume the input `data` is always a valid Pandas DataFrame.
2. The `strategy` parameter must be one of the following:
   - `'mean'`: Replace missing values with the mean of the column.
   - `'median'`: Replace missing values with the median of the column.
   - `'most_frequent'`: Replace missing values with the mode (most frequent value) of the column.
3. If `strategy` is not valid, the function should raise a `ValueError`.
4. The input DataFrame may contain both numeric and non-numeric columns.

**Example Input:**  
```python
import pandas as pd
import numpy as np

data = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 2, 4],
    'C': ['cat', 'dog', np.nan, 'dog']
})

strategy = 'most_frequent'
```

**Example Output:**  
For the above input, the function should return:  
```python
     A    B    C
0  1.0  2.0  cat
1  2.0  2.0  dog
2  1.0  2.0  dog
3  4.0  4.0  dog
```

**Hints:**  
- Use `SimpleImputer` from `sklearn.impute`.
- Handle both numeric and non-numeric columns appropriately.
- Raise a `ValueError` when an invalid strategy is passed.

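For orientation, here is a minimal sketch of the `SimpleImputer` API on a small, purely numeric array. The array values are made up for illustration. `fit_transform` learns one fill value per column and applies it, and the learned values are exposed through the `statistics_` attribute.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two numeric columns, one missing value in each.
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 6.0]])

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)  # NaNs are replaced by the column means

print(imputer.statistics_)  # per-column fill values learned during fit
print(X_filled)
```

The same pattern applies to `'median'` and `'most_frequent'`; only the `strategy` argument changes.
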
# Solution 1:

In [1]:
import pandas as pd
from sklearn.impute import SimpleImputer

def impute_missing_values(data: pd.DataFrame, strategy: str) -> pd.DataFrame:
    """
    Impute missing values in the dataset using the specified strategy.

    Args:
    data (pd.DataFrame): Input dataset with missing values.
    strategy (str): Strategy to impute missing values. Must be one of ['mean', 'median', 'most_frequent'].

    Returns:
    pd.DataFrame: Dataset with missing values imputed.
    """
    if strategy not in ['mean', 'median', 'most_frequent']:
        raise ValueError("Strategy must be one of ['mean', 'median', 'most_frequent']")
    
    # Create a SimpleImputer with the given strategy
    imputer = SimpleImputer(strategy=strategy)
    
    # Apply imputation to the dataset
    imputed_data = imputer.fit_transform(data)
    
    # Return the result as a DataFrame with the same column names and index
    return pd.DataFrame(imputed_data, columns=data.columns, index=data.index)

In [2]:
import numpy as np

In [3]:
# Example Input
data = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 2, 4],
    'C': ['cat', 'dog', np.nan, 'dog']
})

In [4]:
data

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | NaN | cat |
| 1 | 2.0 | 2.0 | dog |
| 2 | NaN | 2.0 | NaN |
| 3 | 4.0 | 4.0 | dog |


In [5]:
# Applying the solution
strategy = 'most_frequent'
result = impute_missing_values(data, strategy)
result

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 2.0 | cat |
| 1 | 2.0 | 2.0 | dog |
| 2 | 1.0 | 2.0 | dog |
| 3 | 4.0 | 4.0 | dog |

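In the output above, the missing value in column `A` is filled with `1.0` even though `1`, `2`, and `4` each occur exactly once. With `strategy='most_frequent'`, `SimpleImputer` resolves ties by returning the smallest of the tied values. The sketch below isolates column `A` to show this; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Column A from the example: every observed value occurs exactly once.
col_a = pd.DataFrame({"A": [1, 2, np.nan, 4]})

imputer = SimpleImputer(strategy="most_frequent")
filled = imputer.fit_transform(col_a)

# The fill value chosen for the column; ties go to the smallest value.
fill_value = imputer.statistics_[0]
print(fill_value)
```
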

**Note: the function above raises a `ValueError` when `strategy` is `'mean'` or `'median'` and the DataFrame contains non-numeric columns, because `SimpleImputer` cannot compute those statistics on non-numeric data.**

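The failure can be reproduced directly, as in the sketch below. It uses a reduced version of the example data and catches the exception so the message can be inspected; the exact wording of the message depends on the installed scikit-learn version.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Reduced version of the example data: one numeric and one string column.
mixed = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "C": ["cat", "dog", np.nan, "dog"],
})

try:
    SimpleImputer(strategy="mean").fit_transform(mixed)
    error_message = None
except ValueError as exc:
    # 'mean' cannot be computed because column C holds strings.
    error_message = str(exc)

print(error_message)
```
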
# Solution 2:

In [6]:
# Since SimpleImputer does not work with non-numeric data for 'mean' or 'median',
# we preprocess it to handle non-numeric data separately.
def impute_with_mixed_data(data: pd.DataFrame, strategy: str) -> pd.DataFrame:
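    """
    Handle imputation for mixed numeric and non-numeric datasets.

    Numeric columns use the requested strategy; non-numeric columns always
    use 'most_frequent', since 'mean' and 'median' are undefined for them.
    """
    # Validate up front so an invalid strategy is rejected even when the
    # DataFrame has no numeric columns.
    if strategy not in ['mean', 'median', 'most_frequent']:
        raise ValueError("Strategy must be one of ['mean', 'median', 'most_frequent']")

    numeric_cols = data.select_dtypes(include=['number']).columns
    non_numeric_cols = data.select_dtypes(exclude=['number']).columns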

    imputed_data = data.copy()
    
    # Impute numeric columns
    if not numeric_cols.empty:
        imputer = SimpleImputer(strategy=strategy)
        imputed_data[numeric_cols] = imputer.fit_transform(data[numeric_cols])

    # Impute non-numeric columns (only works for most_frequent)
    if not non_numeric_cols.empty:
        imputer = SimpleImputer(strategy='most_frequent')
        imputed_data[non_numeric_cols] = imputer.fit_transform(data[non_numeric_cols])

    return imputed_data

In [7]:
# Applying the solution
strategy = 'most_frequent'
result = impute_with_mixed_data(data, strategy)
result

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 2.0 | cat |
| 1 | 2.0 | 2.0 | dog |
| 2 | 1.0 | 2.0 | dog |
| 3 | 4.0 | 4.0 | dog |


In [8]:
# Applying the solution
strategy = 'mean'
result = impute_with_mixed_data(data, strategy)
result

|   | A | B | C |
|---|---|---|---|
| 0 | 1.0 | 2.666667 | cat |
| 1 | 2.0 | 2.0 | dog |
| 2 | 2.333333 | 2.0 | dog |
| 3 | 4.0 | 4.0 | dog |

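The two solutions also differ in the column dtypes they return, which the printed tables do not show. Solution 1 rebuilds the DataFrame from the array returned by `fit_transform`, and for mixed data that array has `object` dtype, so numeric columns come back as `object`. Solution 2 writes the imputed numeric values back into a copy of the input, so they stay numeric. The sketch below compares the two approaches on the example data. It assumes scikit-learn's default NumPy output and a pandas version that leaves `object` arrays uninferred; results may differ under other configurations.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

data = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 2, 2, 4],
    "C": ["cat", "dog", np.nan, "dog"],
})

# Solution 1 style: impute the whole frame, then rebuild it from the array.
whole = pd.DataFrame(
    SimpleImputer(strategy="most_frequent").fit_transform(data),
    columns=data.columns,
)

# Solution 2 style: impute only the numeric columns and write them back.
numeric_cols = data.select_dtypes(include=["number"]).columns
per_type = data.copy()
per_type[numeric_cols] = SimpleImputer(strategy="most_frequent").fit_transform(
    data[numeric_cols]
)

print(whole.dtypes)     # dtypes after rebuilding from the mixed array
print(per_type.dtypes)  # dtypes after per-type imputation
```

If numeric dtypes matter downstream (for example, before calling `.mean()` or fitting a model), the per-type approach avoids an extra conversion step.
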