# Feature Engineering Notebook

__Our cleaned dataset is now ready for transformation. We will transform our column values to prepare the data for the model we will be building.__
__We will be performing the following operations on our data:__

+ Encoding Variables: We will encode categorical columns (label encoding, one-hot encoding) from text to numbers so that the model can understand them.
+ Feature Creation: After reviewing our columns, we may create new features from existing ones.
+ Scaling/Normalization: For our numerical columns, we will perform transformations to put them on a similar scale, so that columns with large ranges do not dominate scale-sensitive models.
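As a rough sketch of the scaling step (the scaler and columns this notebook eventually uses are not shown in this section), standardization with scikit-learn's `StandardScaler` rescales each numeric column to zero mean and unit variance. The toy frame below is made up for illustration, and we assume a train/test split with the scaler fit on the training rows only, which keeps test-set statistics out of the transformation.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy numeric columns; values are illustrative, not the full dataset
toy = pd.DataFrame({
    "WarehouseToHome": [6, 8, 30, 15, 12, 9, 20, 11],
    "CashbackAmount": [159.93, 120.9, 120.28, 134.07, 129.6, 140.0, 175.5, 150.2],
})

train, test = train_test_split(toy, test_size=0.25, random_state=0)

scaler = StandardScaler()
# Learn each column's mean and standard deviation from the training rows only
train_scaled = pd.DataFrame(scaler.fit_transform(train), columns=train.columns, index=train.index)
# Reuse the training statistics on the test rows (no refitting)
test_scaled = pd.DataFrame(scaler.transform(test), columns=test.columns, index=test.index)

print(train_scaled.mean().round(6))        # close to 0 for every column
print(train_scaled.std(ddof=0).round(6))   # close to 1 (StandardScaler uses population std)
```

The test rows will generally not have exactly zero mean and unit variance, which is expected: they are scaled with the training statistics, just as new customers would be at prediction time.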

__Even complex models perform poorly when given bad data, so this notebook is a key step toward accurately predicting our customers' behaviour.__

In [79]:
import pandas as pd
cleaned_churn_df = pd.read_excel("../data/cleaned/E Commerce Dataset (cleaned).xlsx")

In [80]:
df = cleaned_churn_df.copy()
df.head()

|  | CustomerID | Churn | Tenure(months) | PreferredLoginDevice | CityTier | WarehouseToHome | PreferredPaymentMode | Gender | HourSpendOnApp | NumberOfDeviceRegistered | PreferredOrderCat | SatisfactionScore | MaritalStatus | NumberOfAddress | Complain | OrderAmountHikeFromlastYear(%) | CouponUsed | OrderCount | DaySinceLastOrder | CashbackAmount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 50001 | 1 | 4 | Phone | 3 | 6 | Debit Card | Female | 3 | 3 | Laptop & Accessory | 2 | Single | 9 | 1 | 11 | 1 | 1 | 5 | 159.93 |
| 1 | 50002 | 1 | 9 | Phone | 1 | 8 | Unified Payments Interface | Male | 3 | 4 | Mobile Phone | 3 | Single | 7 | 1 | 15 | 0 | 1 | 0 | 120.9 |
| 2 | 50003 | 1 | 9 | Phone | 1 | 30 | Debit Card | Male | 2 | 4 | Mobile Phone | 3 | Single | 6 | 1 | 14 | 0 | 1 | 3 | 120.28 |
| 3 | 50004 | 1 | 0 | Phone | 3 | 15 | Debit Card | Male | 2 | 4 | Laptop & Accessory | 5 | Single | 8 | 0 | 23 | 0 | 1 | 3 | 134.07 |
| 4 | 50005 | 1 | 0 | Phone | 1 | 12 | Credit Card | Male | 0 | 3 | Mobile Phone | 5 | Single | 3 | 0 | 11 | 1 | 1 | 3 | 129.6 |


In [81]:
# We don't need the customer ID in training our model
df.drop(columns=["CustomerID"], inplace=True)

## Encoding Categorical Variables

In [82]:
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

<p>We will choose an encoding type based on the nature of each feature:</p>

Label encoding for binary features <br/>
One-hot encoding for nominal features with more than 2 categories and no order or hierarchy <br/>
Ordinal encoding for features with more than 2 ordered categories <br/>

In [83]:
binary_columns = ["PreferredLoginDevice", "Gender"]
nominal_cols = ["PreferredPaymentMode", "PreferredOrderCat", "MaritalStatus"]
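The column lists above are chosen by hand. A minimal sketch of how a first split could be derived from the data instead, assuming "binary" simply means a non-numeric column with exactly two distinct values; the helper `split_categorical_columns` and the toy frame are hypothetical. Cardinality alone cannot tell nominal from ordinal categories, so that decision still needs a human look.

```python
import pandas as pd


def split_categorical_columns(frame: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (binary, multi-category) lists of non-numeric columns."""
    text_cols = [c for c in frame.columns if not pd.api.types.is_numeric_dtype(frame[c])]
    binary = [c for c in text_cols if frame[c].nunique(dropna=True) == 2]
    multi = [c for c in text_cols if frame[c].nunique(dropna=True) > 2]
    return binary, multi


toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Male"],
    "MaritalStatus": ["Single", "Married", "Divorced"],
    "CityTier": [3, 1, 1],  # numeric, so it is skipped
})

binary, multi = split_categorical_columns(toy)
print(binary)  # ['Gender']
print(multi)   # ['MaritalStatus']
```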

### <u>Label Encoding</u>


In [84]:
label_encoder = LabelEncoder()

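for col in binary_columns:
    # fit_transform refits on every column, so after the loop the encoder
    # only remembers the mapping for the last column in binary_columns
    df[col] = label_encoder.fit_transform(df[col])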
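To see what the loop above produces, here is a small illustration on a toy `Gender` series. `LabelEncoder` sorts the distinct labels into `classes_` and encodes each value as its position, so `Female` becomes 0 and `Male` becomes 1; this is consistent with row 0 (a female customer) showing `Gender` = 0 in the encoded `df.head()` output at the end of this section. The scikit-learn documentation describes `LabelEncoder` as intended for target labels; `OrdinalEncoder` is the feature-side counterpart and gives the same alphabetical codes for a two-valued column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

genders = pd.Series(["Female", "Male", "Male", "Female"])

encoder = LabelEncoder()
codes = encoder.fit_transform(genders)

# classes_ is sorted, so each label's code is its position in classes_
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)          # {'Female': 0, 'Male': 1}
print(codes.tolist())   # [0, 1, 1, 0]

# inverse_transform maps the integer codes back to the original text labels
print(encoder.inverse_transform(codes).tolist())
```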


### <u>Ordinal Encoding</u>
- For variables with an actual hierarchy or ranking
- The columns this applies to are `CityTier` and `SatisfactionScore`
- They are already numeric and stored in rank order, so no mapping is needed; we only make sure they are stored as integers

In [85]:
df["CityTier"] = df["CityTier"].astype(int)
df["SatisfactionScore"] = df["SatisfactionScore"].astype(int)
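Our ordinal columns are already integer-coded, so no mapping is needed here. If an ordinal column were stored as text, scikit-learn's `OrdinalEncoder` can be given the rank order explicitly; the `Satisfaction` column and its `Low`/`Medium`/`High` levels below are hypothetical. Without the `categories` argument the encoder sorts labels alphabetically (`High` < `Low` < `Medium`), which would scramble the ranking.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical text-valued ordinal column (not part of this dataset)
toy = pd.DataFrame({"Satisfaction": ["Low", "High", "Medium", "Low"]})

# One list per input column, given in rank order: Low -> 0, Medium -> 1, High -> 2
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
codes = encoder.fit_transform(toy[["Satisfaction"]])  # float array of shape (4, 1)

toy["Satisfaction_encoded"] = codes.ravel().astype(int)
print(toy)
```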


### <u>One-Hot Encoding</u>
- For nominal columns whose values have no meaningful order

In [86]:
one_hot_encoder = OneHotEncoder(drop="first", sparse_output=False)
# drop="first" removes one dummy column per feature; keeping all of them makes the dummies perfectly collinear (the dummy variable trap)
one_hot_encoder

In [87]:
nominal_cols = ["PreferredPaymentMode", "PreferredOrderCat", "MaritalStatus"]
one_hot_encoded_df = pd.DataFrame(
    data=one_hot_encoder.fit_transform(df[nominal_cols]),
    columns=one_hot_encoder.get_feature_names_out(nominal_cols),
    index=df.index
)

df = pd.concat([df.drop(columns=nominal_cols), one_hot_encoded_df], axis=1)

In [88]:
df.head()

|  | Churn | Tenure(months) | PreferredLoginDevice | CityTier | WarehouseToHome | Gender | HourSpendOnApp | NumberOfDeviceRegistered | SatisfactionScore | NumberOfAddress | ... | PreferredPaymentMode_Credit Card | PreferredPaymentMode_Debit Card | PreferredPaymentMode_E wallet | PreferredPaymentMode_Unified Payments Interface | PreferredOrderCat_Grocery | PreferredOrderCat_Laptop & Accessory | PreferredOrderCat_Mobile Phone | PreferredOrderCat_Others | MaritalStatus_Married | MaritalStatus_Single |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 4 | 1 | 3 | 6 | 0 | 3 | 3 | 2 | 9 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 1 | 9 | 1 | 1 | 8 | 1 | 3 | 4 | 3 | 7 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 2 | 1 | 9 | 1 | 1 | 30 | 1 | 2 | 4 | 3 | 6 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
| 3 | 1 | 0 | 1 | 3 | 15 | 1 | 2 | 4 | 5 | 8 | ... | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 1 | 0 | 1 | 1 | 12 | 1 | 0 | 3 | 5 | 3 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
