# ColumnTransformer Example with Custom Dataset
This example demonstrates how to use `ColumnTransformer` to apply different preprocessing steps to different columns of a dataset, specifically for scaling numerical features and encoding categorical features.

In [1]:
import pandas as pd

# Build a small sample dataset
df = pd.DataFrame({
    'Age': [56, 46, 32, 60, 25],
    'Gender': ['Male', 'Male', 'Male', 'Male', 'Male'],
    'EducationLevel': ['High School', 'Bachelors', 'Masters', 'PhD', 'Bachelors'],
    'City': ['Los Angeles', 'Houston', 'New York', 'Los Angeles', 'Chicago'],
    'Income': [102762, 100020, 77310, 38405, 58522],
    'HighIncome': [1, 1, 0, 0, 0]
})

# Display the data
df.head()

   Age Gender EducationLevel         City  Income  HighIncome
0   56   Male    High School  Los Angeles  102762           1
1   46   Male      Bachelors      Houston  100020           1
2   32   Male        Masters     New York   77310           0
3   60   Male            PhD  Los Angeles   38405           0
4   25   Male      Bachelors      Chicago   58522           0


### Apply ColumnTransformer
Now let's apply the `ColumnTransformer` to scale numerical features and one-hot encode categorical features.

In [2]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Define the ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['Age', 'Income']),  # Scale 'Age' and 'Income'
        ('cat', OneHotEncoder(), ['Gender', 'EducationLevel', 'City'])  # One-hot encode categorical features
    ])

# Apply the transformations
transformed_data = preprocessor.fit_transform(df)

# Convert the transformed data back to a DataFrame.
# Column order: the scaled numeric columns first, then the one-hot columns,
# with each feature's categories sorted alphabetically. 'Gender' contains only
# 'Male' in this sample, so the encoder produces a single 'Gender_Male' column.
transformed_df = pd.DataFrame(transformed_data,
                              columns=['Age', 'Income', 'Gender_Male',
                                       'EducationLevel_Bachelors', 'EducationLevel_High School',
                                       'EducationLevel_Masters', 'EducationLevel_PhD',
                                       'City_Chicago', 'City_Houston', 'City_Los Angeles', 'City_New York'])

# Display the transformed data
transformed_df.head()

         Age     Income  Gender_Male  EducationLevel_Bachelors  EducationLevel_High School  EducationLevel_Masters  EducationLevel_PhD  City_Chicago  City_Houston  City_Los Angeles  City_New York
0   0.904921   1.114673          1.0                       0.0                         1.0                     0.0                 0.0           0.0           0.0               1.0            0.0
1   0.163182   1.002954          1.0                       1.0                         0.0                     0.0                 0.0           0.0           1.0               0.0            0.0
2  -0.875251   0.077666          1.0                       0.0                         0.0                     1.0                 0.0           0.0           0.0               0.0            1.0
3   1.201616  -1.507466          1.0                       0.0                         0.0                     0.0                 1.0           0.0           0.0               1.0            0.0
4  -1.394468  -0.687826          1.0                       1.0                         0.0                     0.0                 0.0           1.0           0.0               0.0            0.0

### Explanation of Code
1. **Data Loading**: We build a small sample dataset as a pandas DataFrame.
2. **ColumnTransformer**:
   - The `ColumnTransformer` applies the following transformations:
     - **Scaling**: The `Age` and `Income` columns are standardized to zero mean and unit variance using **StandardScaler**.
     - **Encoding**: The categorical columns `Gender`, `EducationLevel`, and `City` are one-hot encoded using **OneHotEncoder**, which produces one binary column per category seen during fitting.
3. **Transformed Output**: The transformed array is wrapped in a DataFrame containing the scaled and one-hot encoded columns. The `HighIncome` column is not assigned to any transformer, so it is dropped by the default `remainder='drop'` setting.
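
To make the scaling step concrete, the following sketch reproduces what `StandardScaler` does to the `Age` column by hand. It relies on the fact that `StandardScaler` divides by the population standard deviation (`ddof=0`); the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The 'Age' column from the sample data, as a single-feature 2D array
ages = np.array([[56], [46], [32], [60], [25]], dtype=float)

scaled = StandardScaler().fit_transform(ages)

# Manual z-score using the population standard deviation (ddof=0)
manual = (ages - ages.mean()) / ages.std(ddof=0)

print(np.allclose(scaled, manual))  # True
```

After scaling, the column has a mean of approximately 0 and a standard deviation of approximately 1, up to floating-point rounding.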

### Benefits of Using ColumnTransformer
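1. **Efficient Preprocessing**: Each transformer is applied only to the columns assigned to it, and the results are concatenated into a single feature matrix in one `fit_transform` call.
2. **Code Readability**: The mapping from columns to transformers is declared in one place, so you don't need to transform each feature separately and stitch the results together by hand.
3. **Consistency**: The fitted `ColumnTransformer` stores the statistics and categories learned from the training data, so calling `transform` on test or new data applies exactly the same transformation, which makes results easier to reproduce.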
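
The consistency point can be sketched as follows: fit the preprocessor once on the training rows, then reuse it on new rows. The rows in `new_df` are hypothetical values invented for this illustration, and `handle_unknown='ignore'` is an assumption added here so that categories not seen during fitting (such as `'Female'` or `'Boston'`) are encoded as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

train_df = pd.DataFrame({
    'Age': [56, 46, 32, 60, 25],
    'Gender': ['Male', 'Male', 'Male', 'Male', 'Male'],
    'EducationLevel': ['High School', 'Bachelors', 'Masters', 'PhD', 'Bachelors'],
    'City': ['Los Angeles', 'Houston', 'New York', 'Los Angeles', 'Chicago'],
    'Income': [102762, 100020, 77310, 38405, 58522],
})

# Hypothetical unseen rows, invented for illustration
new_df = pd.DataFrame({
    'Age': [40, 29],
    'Gender': ['Female', 'Male'],
    'EducationLevel': ['Masters', 'PhD'],
    'City': ['Boston', 'Chicago'],
    'Income': [65000, 82000],
})

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['Age', 'Income']),
        ('cat', OneHotEncoder(handle_unknown='ignore'),
         ['Gender', 'EducationLevel', 'City']),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Learn means, standard deviations, and categories from the training rows only
preprocessor.fit(train_df)

# Reuse the fitted preprocessor: no refitting on the new rows
train_features = preprocessor.transform(train_df)
new_features = preprocessor.transform(new_df)

print(train_features.shape, new_features.shape)
```

Both outputs have 11 columns, `(5, 11)` and `(2, 11)`, because the column layout is fixed at fit time. The new rows are scaled with the training mean and standard deviation, not their own.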