In [1]:
# Feature Scaling & Encoding

# Objective: Learn to scale numerical features and encode categorical features for better model performance.
# Instructions:
# For each example, perform the following steps:
#     1. Load the Dataset: Load the dataset into your environment.
#     2. Feature Scaling: Apply scaling methods (StandardScaler or MinMaxScaler) to specified numerical columns.
#     3. Feature Encoding: Apply encoding methods (One-Hot Encoding or Label Encoding) to specified categorical columns.
#     4. Verify Changes: Check the data to ensure proper scaling and encoding. 


# Task:
#     Dataset: customer_data.csv (obtain or create it yourself; it must include the columns Age, Annual_Income, and Region)
#     Columns to scale: Age, Annual_Income
#     Column to encode: Region
#     Steps:
#         1. Load customer_data.csv.
#         2. Use MinMaxScaler on Age and Annual_Income.
#         3. Perform One-Hot Encoding on Region.
#         4. Verify by assessing the transformed dataset.
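
Step 1 of the task asks for the CSV to be loaded. The following is a minimal sketch of that step, assuming pandas is available. The in-memory `csv_text` is only a stand-in for the real file (its three rows are copied from the simulated DataFrame used in this notebook), and the `required` column check is an illustrative addition, not part of the original task.

```python
import io

import pandas as pd

# Stand-in for customer_data.csv so the sketch runs without a file on disk.
# These rows are illustrative only.
csv_text = """Age,Annual_Income,Region
25,50000,North
45,64000,South
35,58000,East
"""

# With a real file on disk this would be: pd.read_csv("customer_data.csv")
data = pd.read_csv(io.StringIO(csv_text))

# Fail early if a column the task relies on is absent.
required = ["Age", "Annual_Income", "Region"]
missing = [col for col in required if col not in data.columns]
if missing:
    raise ValueError(f"customer_data.csv is missing columns: {missing}")

# Age and Annual_Income should be numeric before scaling.
print(data.dtypes)
```

Checking the columns up front gives a clearer error than the KeyError that scaling or encoding would otherwise raise later.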


In [2]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Step 1: Simulate loading customer_data.csv with an in-memory DataFrame (no file is read here)
data = pd.DataFrame({
    'Age': [25, 45, 35, 50, 23, 40, 60],
    'Annual_Income': [50000, 64000, 58000, 72000, 48000, 62000, 80000],
    'Region': ['North', 'South', 'East', 'West', 'East', 'North', 'South']
})

print("Original Data:")
print(data)

# Step 2: Feature Scaling using MinMaxScaler on Age and Annual_Income
scaler = MinMaxScaler()
data[['Age', 'Annual_Income']] = scaler.fit_transform(data[['Age', 'Annual_Income']])

print("\nData after MinMax Scaling:")
print(data)

# Step 3: One-Hot Encoding on Region column
data_encoded = pd.get_dummies(data, columns=['Region'])

print("\nData after One-Hot Encoding on 'Region':")
print(data_encoded)

# Step 4: Verify the transformed dataset
print("\nSummary statistics of scaled columns:")
print(data_encoded[['Age', 'Annual_Income']].describe())

print("\nColumns after encoding:")
print(data_encoded.columns.tolist())
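
The scaled values can also be checked by hand. With its default `feature_range` of (0, 1), `MinMaxScaler` maps each value x in a column to (x - min) / (max - min). The following is a minimal pure-Python sketch of that formula applied to the Age values from the cell above. The helper name `min_max_scale` is hypothetical, and returning 0.0 for a constant column is an assumption made for this sketch.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: no spread to rescale (assumption for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Age values from the simulated DataFrame above.
ages = [25, 45, 35, 50, 23, 40, 60]
scaled_ages = min_max_scale(ages)

# Rounded for comparison with the Age column printed by the notebook.
print([round(v, 6) for v in scaled_ages])
```

The smallest age (23) maps to 0 and the largest (60) maps to 1, so every other value is its distance from 23 divided by 37.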


Original Data:
   Age  Annual_Income Region
0   25          50000  North
1   45          64000  South
2   35          58000   East
3   50          72000   West
4   23          48000   East
5   40          62000  North
6   60          80000  South

Data after MinMax Scaling:
        Age  Annual_Income Region
0  0.054054         0.0625  North
1  0.594595         0.5000  South
2  0.324324         0.3125   East
3  0.729730         0.7500   West
4  0.000000         0.0000   East
5  0.459459         0.4375  North
6  1.000000         1.0000  South

Data after One-Hot Encoding on 'Region':
        Age  Annual_Income  Region_East  Region_North  Region_South  \
0  0.054054         0.0625        False          True         False   
1  0.594595         0.5000        False         False          True   
2  0.324324         0.3125         True         False         False   
3  0.729730         0.7500        False         False         False   
4  0.000000         0.0000         True         False   