## Step 1: Solve the Problem Without ColumnTransformer
### 1.1 Import Necessary Libraries 📚

In [1]:
import pandas as pd  # 🐼 Importing pandas for data manipulation
from sklearn.preprocessing import OneHotEncoder, StandardScaler  # 🔄 For encoding and scaling

### 1.2 Create a Toy Dataset 🎲

In [2]:
# 🛠️ Creating a simple toy dataset with categorical and numerical columns
data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Temperature': [85, 90, 78, 92, 105],
    'Humidity': [70, 65, 80, 60, 50]
}
df = pd.DataFrame(data)  # 📋 Converting the dictionary into a DataFrame for easier manipulation
df

|   | City        | Temperature | Humidity |
|---|-------------|-------------|----------|
| 0 | New York    | 85          | 70       |
| 1 | Los Angeles | 90          | 65       |
| 2 | Chicago     | 78          | 80       |
| 3 | Houston     | 92          | 60       |
| 4 | Phoenix     | 105         | 50       |


### 1.3 Separate Numerical and Categorical Columns 🔍

In [3]:
# ✂️ Splitting the data into numerical and categorical subsets
numerical_features = df[['Temperature', 'Humidity']]  # 🌡️ Focusing on numerical columns (Temperature and Humidity)
categorical_features = df[['City']]  # 🏙️ Focusing on the categorical column (City)
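
With only three columns it is easy to list them by hand. As a minimal sketch of an alternative, assuming every non-numeric column should be treated as categorical (which holds for this toy dataset but not for every dataset), pandas can split the columns by dtype:

```python
import pandas as pd

# Same toy dataset as above, rebuilt so this sketch runs on its own
df = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Temperature': [85, 90, 78, 92, 105],
    'Humidity': [70, 65, 80, 60, 50],
})

# Numeric dtypes go to one subset, everything else to the other
numerical_features = df.select_dtypes(include='number')
categorical_features = df.select_dtypes(exclude='number')

print(list(numerical_features.columns))    # ['Temperature', 'Humidity']
print(list(categorical_features.columns))  # ['City']
```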

### 1.4 Apply Transformations Individually ⚙️

In [5]:
# 🌈 One-hot encoding the categorical column (City) to convert it into numerical format
encoder = OneHotEncoder(sparse_output=False)
encoded_cities = encoder.fit_transform(categorical_features)

# 📏 Standardizing the numerical features (zero mean, unit variance per column)
scaler = StandardScaler()
scaled_numerical = scaler.fit_transform(numerical_features)

### 1.5 Combine Transformed Features 🔗

In [6]:
# 🔗 Combining the transformed categorical and numerical features into one array
import numpy as np

transformed_data = np.hstack([encoded_cities, scaled_numerical])
transformed_data

array([[ 0.        ,  0.        ,  0.        ,  1.        ,  0.        ,
        -0.5604198 ,  0.5       ],
       [ 0.        ,  0.        ,  1.        ,  0.        ,  0.        ,
         0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        -1.34500752,  1.5       ],
       [ 0.        ,  1.        ,  0.        ,  0.        ,  0.        ,
         0.22416792, -0.5       ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  1.        ,
         1.6812594 , -1.5       ]])

## Step 2: Solve the Problem with ColumnTransformer
### 2.1 Apply ColumnTransformer 🧩

In [7]:
from sklearn.compose import ColumnTransformer  # 🧩 For applying multiple transformations in a streamlined way

# 🏗️ Constructing the ColumnTransformer to handle both categorical and numerical columns in one go
column_transformer = ColumnTransformer(
    transformers=[
        ('encoder', OneHotEncoder(sparse_output=False), ['City']),  # 🌈 Encoding the 'City' column (dense output, as in Step 1)
        ('scaler', StandardScaler(), ['Temperature', 'Humidity'])  # 📏 Scaling the numerical columns
    ]
)

### 2.2 Fit and Transform the Dataset Using ColumnTransformer 🚀

In [8]:
# 🚀 Applying the transformations to the entire DataFrame at once
transformed_data_with_ct = column_transformer.fit_transform(df)
transformed_data_with_ct

array([[ 0.        ,  0.        ,  0.        ,  1.        ,  0.        ,
        -0.5604198 ,  0.5       ],
       [ 0.        ,  0.        ,  1.        ,  0.        ,  0.        ,
         0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        -1.34500752,  1.5       ],
       [ 0.        ,  1.        ,  0.        ,  0.        ,  0.        ,
         0.22416792, -0.5       ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  1.        ,
         1.6812594 , -1.5       ]])

## Final Remarks 🌟
- **Without ColumnTransformer:** 
    - We manually encoded the categorical column, scaled the numerical columns, and then stacked the two arrays with np.hstack. This approach works, but each step is separate, and we have to keep track of which columns went where and in what order.
- **With ColumnTransformer:** 
    - The ColumnTransformer declares which transformer applies to which columns and produces the combined array in a single fit_transform call. The output here is identical to the manual result, with less code and fewer places to mix up columns, which matters more as the number of columns and transformations grows.