# Feature Engineering

**Objective:**  
In this notebook, we apply feature engineering techniques to enhance the dataset and prepare it for modeling. The goal is to extract, transform, and encode features in a way that improves the model’s ability to predict `SalePrice`.

---

## Key Steps:

1. **Handle missing values**
   - Domain-specific imputation strategies
   - Use of indicators for missingness (if meaningful)

2. **Transform variables**
   - Log transformation for skewed features
   - Binning or discretization of continuous variables
   - Date or time-based feature extraction

3. **Encode categorical features**
   - Ordinal encoding for ranked categories
   - One-hot encoding for nominal variables
   - Group rare categories

4. **Create new features**
   - Total square footage (`TotalSF = TotalBsmtSF + 1stFlrSF + 2ndFlrSF`)
   - Age-related features (`Age = YrSold - YearBuilt`)
   - Quality-related scores (`OverallQual * OverallCond`)

5. **Feature scaling (if required)**
   - Standardization or normalization of numerical values for certain models
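
As a minimal sketch of step 4, the three derived features can be written as plain arithmetic on one row. The helper `add_derived_features`, the output name `QualCondScore`, and the sample values are hypothetical and only illustrate the formulas listed above.

```python
# Hypothetical helper: adds the derived features from step 4 to a single row.
def add_derived_features(row: dict) -> dict:
    out = dict(row)  # copy so the input row is left untouched
    out["TotalSF"] = row["TotalBsmtSF"] + row["1stFlrSF"] + row["2ndFlrSF"]
    out["Age"] = row["YrSold"] - row["YearBuilt"]
    out["QualCondScore"] = row["OverallQual"] * row["OverallCond"]
    return out


# Illustrative values, not taken from the dataset.
sample = {
    "TotalBsmtSF": 800, "1stFlrSF": 900, "2ndFlrSF": 700,
    "YrSold": 2008, "YearBuilt": 2003,
    "OverallQual": 7, "OverallCond": 5,
}
features = add_derived_features(sample)
print(features["TotalSF"], features["Age"], features["QualCondScore"])  # 2400 5 35
```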

---

**Outcome:**  
We produce a cleaned and transformed dataset with meaningful features that are ready to be fed into machine learning models for accurate price prediction.


In [1]:
# Import Libraries
import polars as pl
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from bokeh.models import NumeralTickFormatter
import holoviews as hv
from sklearn.model_selection import train_test_split
from numpy import log
hv.extension('bokeh')

In [2]:
# Load the dataset, treating the string "NA" as null, and print its shape
data = pl.read_csv('data/train.csv', null_values="NA")
print(f'The dataset contains {data.shape[0]} rows and {data.shape[1]} columns')
data.head()

The dataset contains 1460 rows and 81 columns


Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,…,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
i64,i64,str,i64,i64,str,str,str,str,str,str,str,str,str,str,str,str,i64,i64,i64,i64,str,str,str,str,str,i64,str,str,str,str,str,str,str,i64,str,i64,…,i64,i64,i64,i64,i64,i64,i64,i64,i64,str,i64,str,i64,str,str,i64,str,i64,i64,str,str,str,i64,i64,i64,i64,i64,i64,str,str,str,i64,i64,i64,str,str,i64
1,60,"""RL""",65,8450,"""Pave""",,"""Reg""","""Lvl""","""AllPub""","""Inside""","""Gtl""","""CollgCr""","""Norm""","""Norm""","""1Fam""","""2Story""",7,5,2003,2003,"""Gable""","""CompShg""","""VinylSd""","""VinylSd""","""BrkFace""",196,"""Gd""","""TA""","""PConc""","""Gd""","""TA""","""No""","""GLQ""",706,"""Unf""",0,…,854,0,1710,1,0,2,1,3,1,"""Gd""",8,"""Typ""",0,,"""Attchd""",2003,"""RFn""",2,548,"""TA""","""TA""","""Y""",0,61,0,0,0,0,,,,0,2,2008,"""WD""","""Normal""",208500
2,20,"""RL""",80,9600,"""Pave""",,"""Reg""","""Lvl""","""AllPub""","""FR2""","""Gtl""","""Veenker""","""Feedr""","""Norm""","""1Fam""","""1Story""",6,8,1976,1976,"""Gable""","""CompShg""","""MetalSd""","""MetalSd""","""None""",0,"""TA""","""TA""","""CBlock""","""Gd""","""TA""","""Gd""","""ALQ""",978,"""Unf""",0,…,0,0,1262,0,1,2,0,3,1,"""TA""",6,"""Typ""",1,"""TA""","""Attchd""",1976,"""RFn""",2,460,"""TA""","""TA""","""Y""",298,0,0,0,0,0,,,,0,5,2007,"""WD""","""Normal""",181500
3,60,"""RL""",68,11250,"""Pave""",,"""IR1""","""Lvl""","""AllPub""","""Inside""","""Gtl""","""CollgCr""","""Norm""","""Norm""","""1Fam""","""2Story""",7,5,2001,2002,"""Gable""","""CompShg""","""VinylSd""","""VinylSd""","""BrkFace""",162,"""Gd""","""TA""","""PConc""","""Gd""","""TA""","""Mn""","""GLQ""",486,"""Unf""",0,…,866,0,1786,1,0,2,1,3,1,"""Gd""",6,"""Typ""",1,"""TA""","""Attchd""",2001,"""RFn""",2,608,"""TA""","""TA""","""Y""",0,42,0,0,0,0,,,,0,9,2008,"""WD""","""Normal""",223500
4,70,"""RL""",60,9550,"""Pave""",,"""IR1""","""Lvl""","""AllPub""","""Corner""","""Gtl""","""Crawfor""","""Norm""","""Norm""","""1Fam""","""2Story""",7,5,1915,1970,"""Gable""","""CompShg""","""Wd Sdng""","""Wd Shng""","""None""",0,"""TA""","""TA""","""BrkTil""","""TA""","""Gd""","""No""","""ALQ""",216,"""Unf""",0,…,756,0,1717,1,0,1,0,3,1,"""Gd""",7,"""Typ""",1,"""Gd""","""Detchd""",1998,"""Unf""",3,642,"""TA""","""TA""","""Y""",0,35,272,0,0,0,,,,0,2,2006,"""WD""","""Abnorml""",140000
5,60,"""RL""",84,14260,"""Pave""",,"""IR1""","""Lvl""","""AllPub""","""FR2""","""Gtl""","""NoRidge""","""Norm""","""Norm""","""1Fam""","""2Story""",8,5,2000,2000,"""Gable""","""CompShg""","""VinylSd""","""VinylSd""","""BrkFace""",350,"""Gd""","""TA""","""PConc""","""Gd""","""TA""","""Av""","""GLQ""",655,"""Unf""",0,…,1053,0,2198,1,0,2,1,4,1,"""Gd""",9,"""Typ""",1,"""TA""","""Attchd""",2000,"""RFn""",3,836,"""TA""","""TA""","""Y""",192,84,0,0,0,0,,,,0,12,2008,"""WD""","""Normal""",250000


### Missing Values

In [3]:
# We start the missing-value process with the categorical data
print('Missing Data - Categorical Columns')
categorical_features = data.select(pl.col(pl.String)).columns

# First we find which columns have more than 10% missing data and which have 10% or less.
lot_missing_data = [column for column in categorical_features if data.select(pl.col(column).is_null().sum()).item() / data.height > 0.1]
few_missing_data = [column for column in categorical_features if data.select(pl.col(column).is_null().sum()).item() / data.height <= 0.1]
print(f'Columns with more than 10% missing data: {len(lot_missing_data)}; columns with 10% or less: {len(few_missing_data)}')

# Ensure that all the categorical features are regarded as strings
data = data.with_columns([pl.col(col).cast(pl.Utf8).alias(col) for col in categorical_features])

Missing Data - Categorical Columns
Columns with more than 10% missing data: 5; columns with 10% or less: 38
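
The split above can be sketched without any library to make the threshold logic explicit. The helpers `missing_fraction` and `split_by_missing` and the toy columns are hypothetical; `None` stands in for a null value.

```python
# Hypothetical helpers illustrating the 10% missing-data split.
def missing_fraction(values: list) -> float:
    """Share of entries that are None."""
    return sum(v is None for v in values) / len(values)


def split_by_missing(columns: dict, threshold: float = 0.1):
    """Return (columns above the threshold, columns at or below it)."""
    fractions = {name: missing_fraction(vals) for name, vals in columns.items()}
    many = [name for name, frac in fractions.items() if frac > threshold]
    few = [name for name, frac in fractions.items() if frac <= threshold]
    return many, few


# Toy data, not taken from the dataset.
toy = {
    "Alley": [None, None, "Grvl", None],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
}
many, few = split_by_missing(toy)
print(many, few)  # ['Alley'] ['Street']
```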


In [4]:
# For the columns that have >10% missing values we replace the NA or null with 'Missing'
data = data.with_columns([
    pl.col(column).cast(pl.Utf8).fill_null('Missing').alias(column)
    for column in lot_missing_data
])

# For the columns that have <=10% missing values we replace the nulls with the most common value.
# mode() can return several values when there is a tie, so we take the first one.
data = data.with_columns([
    pl.col(column).cast(pl.Utf8).fill_null(data.select(pl.col(column).drop_nulls().mode().first()).item()).alias(column)
    for column in few_missing_data
])


# Ensure that we have no null values
for column in categorical_features:
  print(f'For the column {column} we have {data.select(pl.col(column).is_null().sum()).item()}')

For the column MSZoning we have 0
For the column Street we have 0
For the column Alley we have 0
For the column LotShape we have 0
For the column LandContour we have 0
For the column Utilities we have 0
For the column LotConfig we have 0
For the column LandSlope we have 0
For the column Neighborhood we have 0
For the column Condition1 we have 0
For the column Condition2 we have 0
For the column BldgType we have 0
For the column HouseStyle we have 0
For the column RoofStyle we have 0
For the column RoofMatl we have 0
For the column Exterior1st we have 0
For the column Exterior2nd we have 0
For the column MasVnrType we have 0
For the column ExterQual we have 0
For the column ExterCond we have 0
For the column Foundation we have 0
For the column BsmtQual we have 0
For the column BsmtCond we have 0
For the column BsmtExposure we have 0
For the column BsmtFinType1 we have 0
For the column BsmtFinType2 we have 0
For the column Heating we have 0
For the column HeatingQC we have 0
For the colu
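
To make the most-common-value imputation concrete, here is a library-free sketch. The helper `fill_with_mode` is hypothetical, and breaking ties alphabetically is an assumption chosen only to keep the result deterministic.

```python
from collections import Counter


# Hypothetical helper: replace None with the most frequent observed value.
def fill_with_mode(values: list) -> list:
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn the mode from
    counts = Counter(observed)
    top = max(counts.values())
    # Assumption: ties are broken alphabetically so the result is deterministic.
    mode = min(v for v, c in counts.items() if c == top)
    return [mode if v is None else v for v in values]


filled = fill_with_mode(["SBrkr", None, "FuseA", "SBrkr", None])
print(filled)  # ['SBrkr', 'SBrkr', 'FuseA', 'SBrkr', 'SBrkr']
```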

In [5]:
# We continue the missing-value process with the numerical data
print('Missing Data - Numerical Columns')
numerical_features = data.select(pl.col(pl.Int64)).columns
numerical_features.remove('SalePrice')

# First we find which columns have more than 10% missing data and which have 10% or less.
lot_missing_data = [column for column in numerical_features if data.select(pl.col(column).is_null().sum()).item() / data.height > 0.1]
few_missing_data = [column for column in numerical_features if data.select(pl.col(column).is_null().sum()).item() / data.height <= 0.1]
print(f'Columns with more than 10% missing data: {len(lot_missing_data)}; columns with 10% or less: {len(few_missing_data)}')

# Cast the numerical features to strings for now; a later cell converts the all-digit string columns back to integers
data = data.with_columns([pl.col(column).cast(pl.Utf8).alias(column) for column in numerical_features])

Missing Data - Numerical Columns
Columns with more than 10% missing data: 1; columns with 10% or less: 36
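
The cell above only counts the missing numerical values. Apart from the `fill_null` calls in the transformation cells, this notebook shows no separate imputation step for them. One possible approach, sketched here under that assumption, is median imputation combined with a missingness indicator. The helper `impute_with_indicator` and the sample values are hypothetical.

```python
from statistics import median


# Hypothetical helper: median imputation plus a 0/1 flag marking imputed entries.
def impute_with_indicator(values: list):
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    was_missing = [int(v is None) for v in values]
    return filled, was_missing


# Illustrative values only.
filled, flag = impute_with_indicator([196, None, 162, 0, 350])
print(filled)  # [196, 179.0, 162, 0, 350]
print(flag)    # [0, 1, 0, 0, 0]
```

Keeping the flag lets a model distinguish imputed entries from observed ones.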


### Numerical variable transformation


In [6]:
# The columns LotFrontage, 1stFlrSF and GrLivArea are right-skewed, so we log-transform them.
# Nulls are filled with 1.0 first, which maps them to log(1) = 0.
log_columns = ['LotFrontage', '1stFlrSF', 'GrLivArea']
data = data.with_columns([pl.col(column).cast(pl.Float64).fill_null(1.0).log().alias(column) for column in log_columns])
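
As a small illustration of why the log transform helps, the sketch below compares the skewness of a right-skewed sample before and after taking logs. The helpers `safe_log` and `skewness` and the sample values are hypothetical. `skewness` is the plain moment-based estimate, used here as an assumption about how skew is measured.

```python
import math


# Hypothetical helper mirroring the cell above: fill None with 1.0, then take the log.
def safe_log(values: list, fill: float = 1.0) -> list:
    return [math.log(fill if v is None else v) for v in values]


# Moment-based sample skewness: third central moment over variance ** 1.5.
def skewness(values: list) -> float:
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Illustrative right-skewed sample, not taken from the dataset.
raw = [20.0, 30.0, 40.0, 50.0, 60.0, 80.0, 120.0, 300.0]
logged = safe_log(raw)
print(skewness(raw) > skewness(logged))  # True: the log pulls in the long right tail
print(safe_log([None]))                  # [0.0]: a filled null becomes log(1) = 0
```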

### Binarize skewed variables


In [7]:
# The columns below have a skewed distribution, so we binarise them:
# 1 if the value is greater than 0, otherwise 0 (nulls are treated as 0)
binarised_columns = ['BsmtFinSF2', 'LowQualFinSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'MiscVal']
data = data.with_columns([(pl.col(column).cast(pl.Float64).fill_null(0) > 0).cast(pl.Int8).alias(column) for column in binarised_columns])
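
The binarisation rule can be sketched in plain Python. The helper `binarise` and the sample values are hypothetical.

```python
# Hypothetical helper: 1 where the value is greater than 0, else 0; None counts as 0.
def binarise(values: list) -> list:
    return [int((0 if v is None else v) > 0) for v in values]


flags = binarise([0, 272, None, 0, 61])
print(flags)  # [0, 1, 0, 0, 1]
```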

### Categorical Variables Mapping

In [8]:
# Map categorical values to numerical values for the quality related columns
quality_mappings = {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5, 'Missing': 0, 'NA': 0}
quality_columns  = ['ExterQual', 'ExterCond', 'BsmtQual', 'BsmtCond', 'HeatingQC', 'KitchenQual', 'FireplaceQu', 'GarageQual', 'GarageCond']
data = data.with_columns([pl.col(col).fill_null('Missing').replace(quality_mappings).cast(pl.Int8).alias(col) for col in quality_columns])
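
A category that is absent from the mapping dictionary would not be encoded as intended, so it can be useful to check for unmapped values before encoding. The sketch below shows that check in plain Python. The helper `encode_ordinal` is hypothetical, and the dictionary is copied from the cell above.

```python
quality_mappings = {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5, 'Missing': 0, 'NA': 0}


# Hypothetical helper: ordinal encoding that fails loudly on unmapped categories.
def encode_ordinal(values: list, mapping: dict) -> list:
    unmapped = sorted({v for v in values if v not in mapping})
    if unmapped:
        raise ValueError(f"Unmapped categories: {unmapped}")
    return [mapping[v] for v in values]


encoded = encode_ordinal(['Gd', 'TA', 'Missing', 'Ex'], quality_mappings)
print(encoded)  # [4, 3, 0, 5]
```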

In [9]:
# Map categorical values to numerical values for the exposure column
exposure_mappings = {'No': 1, 'Mn': 2, 'Av': 3, 'Gd': 4}
exposure_column   = 'BsmtExposure'
data = data.with_columns([pl.col(exposure_column).fill_null('Missing').replace(exposure_mappings).cast(pl.Int8).alias(exposure_column)])

In [10]:
# Map categorical values to numerical values for the finishing columns
finish_mappings = {'Missing': 0, 'NA': 0, 'Unf': 1, 'LwQ': 2, 'Rec': 3, 'BLQ': 4, 'ALQ': 5, 'GLQ': 6}
finish_columns = ['BsmtFinType1', 'BsmtFinType2']
data = data.with_columns(pl.col(column).fill_null('Missing').replace(finish_mappings).cast(pl.Int8).alias(column) for column in finish_columns)

In [11]:
# Map categorical values to numerical values for the garage column
garage_mappings = {'Missing': 0, 'NA': 0, 'Unf': 1, 'RFn': 2, 'Fin': 3}
garage_columns  = 'GarageFinish'
data = data.with_columns(pl.col(garage_columns).fill_null('Missing').replace(garage_mappings).cast(pl.Int8).alias(garage_columns))

In [12]:
# Map categorical values to numerical values for the garage type column
garage_type_mappings = {'2Types': 1, 'Detchd': 2, 'Attchd': 3, 'CarPort': 2, 'BuiltIn': 4, 'Basment': 5}
garage_type_columns  = 'GarageType'
data = data.with_columns(pl.col(garage_type_columns).fill_null('Missing').replace(garage_type_mappings).cast(pl.Int8).alias(garage_type_columns))

In [13]:
# Map categorical values to numerical values for the fence column
fence_mappings = {'Missing': 0, 'NA': 0, 'MnWw': 1, 'GdWo': 2, 'MnPrv': 3, 'GdPrv': 4}
fence_column  = 'Fence'
data = data.with_columns(pl.col(fence_column).fill_null('Missing').replace(fence_mappings).cast(pl.Int8).alias(fence_column))

In [14]:
# Map categorical values to numerical values for the zoning column
zoning_mappings = {'Missing': 0, 'NA': 0, 'RM': 1, 'C (all)': 2, 'RL': 3, 'FV': 4, 'RH': 5}
zoning_column  = 'MSZoning'
data = data.with_columns(pl.col(zoning_column).fill_null('Missing').replace(zoning_mappings).cast(pl.Int8).alias(zoning_column))

In [15]:
# Map categorical values to numerical values for the Street and Alley columns
street_mappings = {'Missing': 0, 'Grvl': 1, 'Pave': 2}
street_columns  = ['Street', 'Alley']
data = data.with_columns(pl.col(column).fill_null('Missing').replace(street_mappings).cast(pl.Int8).alias(column) for column in street_columns)

In [16]:
# Map categorical values to numerical values for the lot shape column
lot_shape_mappings = {'IR1': 1, 'IR2': 2, 'IR3': 3, 'Reg': 4}
lot_shape_column  = 'LotShape'
data = data.with_columns(pl.col(lot_shape_column).fill_null('Missing').replace(lot_shape_mappings).cast(pl.Int8).alias(lot_shape_column))

In [17]:
# Map categorical values to numerical values for the land contour column
lot_contour_mappings = {'Low': 1, 'Bnk': 2, 'Lvl': 3, 'HLS': 4}
lot_contour_column  = 'LandContour'
data = data.with_columns(pl.col(lot_contour_column).fill_null('Missing').replace(lot_contour_mappings).cast(pl.Int8).alias(lot_contour_column))

In [18]:
# Map categorical values to numerical values for the Utilities column
utilities_mappings = {'AllPub': 1, 'NoSeWa': 2}
utilities_column  = 'Utilities'
data = data.with_columns(pl.col(utilities_column).fill_null('Missing').replace(utilities_mappings).cast(pl.Int8).alias(utilities_column))

In [19]:
# Map categorical values to numerical values for the LotConfig column
lot_config_mappings = {'FR2': 1, 'FR3': 2, 'CulDSac': 3 , 'Inside': 4, 'Corner': 5}
lot_config_column  = 'LotConfig'
data = data.with_columns(pl.col(lot_config_column).fill_null('Missing').replace(lot_config_mappings).cast(pl.Int8).alias(lot_config_column))

In [20]:
# Map categorical values to numerical values for the LandSlope column
land_slope_mappings = {'Gtl': 1, 'Mod': 2, 'Sev': 3 }
land_slope_column  = 'LandSlope'
data = data.with_columns(pl.col(land_slope_column).fill_null('Missing').replace(land_slope_mappings).cast(pl.Int8).alias(land_slope_column))

In [21]:
# Map categorical values to numerical values for the condition columns
condition_mappings = {'Missing': 0, 'Artery': 1, 'RRNn': 2, 'RRNe': 3, 'RRAn': 4, 'RRAe':5, 'PosA':6, 'PosN':7, 'Feedr': 8, 'Norm':9}
conditions_columns  = ['Condition1', 'Condition2']
data = data.with_columns(pl.col(column).fill_null('Missing').replace(condition_mappings).cast(pl.Int8).alias(column) for column in conditions_columns)

In [22]:
# Map categorical values to numerical values for the BldgType column
bldg_type_mappings = {'Twnhs': 1, 'TwnhsE': 2, 'Duplex': 3, '2fmCon':4, '1Fam':5}
bldg_type_column  = 'BldgType'
data = data.with_columns(pl.col(bldg_type_column).fill_null('Missing').replace(bldg_type_mappings).cast(pl.Int8).alias(bldg_type_column))

In [23]:
# Map categorical values to numerical values for the HouseStyle column
house_style_mappings = {'1.5Unf': 1, '2Story': 2, '1.5Fin': 3, '2.5Fin': 4, 'SLvl':5, '2.5Unf': 6, 'SFoyer': 7, '1Story': 8}
house_style_column  = 'HouseStyle'
data = data.with_columns(pl.col(house_style_column).fill_null('Missing').replace(house_style_mappings).cast(pl.Int8).alias(house_style_column))

In [24]:
# Map categorical values to numerical values for the RoofStyle column
roof_style_mappings = {'Gambrel': 1, 'Gable': 2, 'Mansard': 3, 'Flat': 4, 'Shed':5, 'Hip': 6}
roof_style_column  = 'RoofStyle'
data = data.with_columns(pl.col(roof_style_column).fill_null('Missing').replace(roof_style_mappings).cast(pl.Int8).alias(roof_style_column))

In [25]:
# Map categorical values to numerical values for the RoofMatl column
roof_material_mappings = {'Tar&Grv': 1, 'ClyTile': 2, 'WdShngl': 3, 'Roll': 4, 'Membran': 5, 'WdShake': 6, 'Metal': 7, 'CompShg': 8}
roof_material_column  = 'RoofMatl'
data = data.with_columns(pl.col(roof_material_column).fill_null('Missing').replace(roof_material_mappings).cast(pl.Int8).alias(roof_material_column))

In [26]:
# Map categorical values to numerical values for the MasVnrType column
mas_vnr_type_mappings = {'Stone': 1, 'BrkCmn': 2, 'None': 3, 'BrkFace': 4}
mas_vnr_type_column  = 'MasVnrType'
data = data.with_columns(pl.col(mas_vnr_type_column).fill_null('Missing').replace(mas_vnr_type_mappings).cast(pl.Int8).alias(mas_vnr_type_column))

In [27]:
# Map categorical values to numerical values for the Foundation column
foundation_mappings = {'BrkTil': 1, 'CBlock': 2, 'PConc': 3, 'Wood': 4, 'Stone': 5, 'Slab':6}
foundation_column  = 'Foundation'
data = data.with_columns(pl.col(foundation_column).fill_null('Missing').replace(foundation_mappings).cast(pl.Int8).alias(foundation_column))

In [28]:
# Map categorical values to numerical values for the Functional column
functional_mappings = {'Min1': 1, 'Min2': 2, 'Maj1': 3, 'Maj2': 4, 'Mod': 5, 'Typ': 6, 'Sev': 7}
functional_column  = 'Functional'
data = data.with_columns(pl.col(functional_column).fill_null('Missing').replace(functional_mappings).cast(pl.Int8).alias(functional_column))

In [29]:
# Map categorical values to numerical values for the SaleCondition column
sale_condition_mappings = {'Normal': 1, 'Abnorml': 2, 'Alloca': 3, 'Family': 4, 'Partial': 5, 'AdjLand': 6}
sale_condition_column  = 'SaleCondition'
data = data.with_columns(pl.col(sale_condition_column).fill_null('Missing').replace(sale_condition_mappings).cast(pl.Int8).alias(sale_condition_column))

In [30]:
# Map categorical values to numerical values for the SaleType column
sale_type_mappings = {'Oth': 1, 'ConLw': 2, 'CWD': 3, 'WD': 4, 'ConLD': 5, 'New': 6, 'COD': 7, 'Con': 8, 'ConLI': 9}
sale_type_column  = 'SaleType'
data = data.with_columns(pl.col(sale_type_column).fill_null('Missing').replace(sale_type_mappings).cast(pl.Int8).alias(sale_type_column))

In [31]:
# Map categorical values to numerical values for the Heating column
heating_mappings = {'GasA': 1, 'Wall': 2, 'OthW': 3, 'Floor': 4, 'GasW': 5, 'Grav': 6}
heating_column  = 'Heating'
data = data.with_columns(pl.col(heating_column).fill_null('Missing').replace(heating_mappings).cast(pl.Int8).alias(heating_column))

In [32]:
# Map categorical values to numerical values for the CentralAir column
central_air_mappings = {'Y': 1, 'N': 2}
central_air_column  = 'CentralAir'
data = data.with_columns(pl.col(central_air_column).fill_null('Missing').replace(central_air_mappings).cast(pl.Int8).alias(central_air_column))

In [33]:
# Map categorical values to numerical values for the Electrical column
electrical_mappings = {'FuseP': 1, 'Mix': 2, 'FuseA': 3, 'SBrkr': 4, 'FuseF': 5}
electrical_column  = 'Electrical'
data = data.with_columns(pl.col(electrical_column).fill_null('Missing').replace(electrical_mappings).cast(pl.Int8).alias(electrical_column))

In [34]:
# Map categorical values to numerical values for the PavedDrive column
paved_drive_mappings = {'P': 1, 'Y': 2, 'N': 3}
paved_drive_column  = 'PavedDrive'
data = data.with_columns(pl.col(paved_drive_column).fill_null('Missing').replace(paved_drive_mappings).cast(pl.Int8).alias(paved_drive_column))

In [35]:
# Drop the Id column and the unmapped categorical columns with rare values
data = data.drop(['Id', 'Neighborhood', 'Exterior1st', 'Exterior2nd', 'PoolQC', 'MiscFeature'])

# Convert stringified numbers (like "2003") to actual Int32
string_columns = [col for col in data.columns if data[col].dtype == pl.Utf8]

# Attempt to convert each string column to numeric if all non-null values look like digits
convertible_cols = []
for col in string_columns:
    try:
        # Check: all non-null values are digits
        if data.select(pl.col(col).drop_nulls().str.contains(r'^\d+$').all()).item():
            convertible_cols.append(col)
    except Exception:
        # Skip any column that cannot be checked
        pass

# Perform cast
data = data.with_columns([pl.col(col).cast(pl.Int32).alias(col) for col in convertible_cols])
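
The cell above drops columns such as `Neighborhood` outright. An alternative, listed under the key steps as grouping rare categories, is to merge infrequent values into one bucket and keep the column. This is a minimal sketch of that idea. The helper `group_rare`, the `min_count` threshold and the `'Other'` label are hypothetical choices.

```python
from collections import Counter


# Hypothetical helper: replace categories seen fewer than `min_count` times with `other`.
def group_rare(values: list, min_count: int = 2, other: str = 'Other') -> list:
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Illustrative values only.
grouped = group_rare(['CollgCr', 'Veenker', 'CollgCr', 'Crawfor', 'NoRidge', 'CollgCr'])
print(grouped)  # ['CollgCr', 'Other', 'CollgCr', 'Other', 'Other', 'CollgCr']
```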

In [36]:
# Categorical and Numerical Features
# Note: only Int8 columns (the encoded and binarised features) are counted as numerical here
categorical_features = data.select(pl.col(pl.String))
numerical_features   = data.select(pl.col(pl.Int8))
print(f'The dataset contains {categorical_features.shape[1]} categorical features and {numerical_features.shape[1]} numerical features')

The dataset contains 0 categorical features and 44 numerical features


### Feature Scaling

In [37]:
# Convert polars DataFrame to pandas DataFrame for scaling
data_pd = data.to_pandas()

# Initialise scaler and fit data
scaler = MinMaxScaler()
scaler.fit(data_pd)

# Create the final dataset that will be used in training
final_data = pd.DataFrame(scaler.transform(data_pd), columns=data.columns)
print('The final dataset is:\n')
final_data.head()

The final dataset is:



Unnamed: 0,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,...,3SsnPorch,ScreenPorch,PoolArea,Fence,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,0.235294,0.5,0.72646,0.03342,1.0,0.0,1.0,0.666667,0.0,0.75,...,0.0,0.0,0.0,0.0,0.0,0.090909,0.5,0.375,0.0,0.241078
1,0.0,0.5,0.762595,0.038795,1.0,0.0,1.0,0.666667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.363636,0.25,0.375,0.0,0.203583
2,0.235294,0.5,0.734312,0.046507,1.0,0.0,0.0,0.666667,0.0,0.75,...,0.0,0.0,0.0,0.0,0.0,0.727273,0.5,0.375,0.0,0.261908
3,0.294118,0.5,0.71253,0.038561,1.0,0.0,0.0,0.666667,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.375,0.2,0.145952
4,0.235294,0.5,0.771086,0.060576,1.0,0.0,0.0,0.666667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.5,0.375,0.0,0.298709
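
The cell above fits the scaler on the full dataset, including `SalePrice`. Min-max scaling maps each value to `(x - min) / (max - min)`. To avoid leaking information from held-out rows, the minimum and maximum could instead be learned from the training rows only and then applied to the rest. The sketch below shows this in plain Python. The helpers `fit_min_max` and `apply_min_max` are hypothetical, and the range 20 to 190 for `MSSubClass` is an assumption that is consistent with the scaled values shown above.

```python
# Hypothetical helpers: learn min and max on training values, then apply them elsewhere.
def fit_min_max(train_values: list):
    return min(train_values), max(train_values)


def apply_min_max(values: list, lo: float, hi: float) -> list:
    if hi == lo:
        return [0.0 for _ in values]  # constant column: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


# Assumed MSSubClass training values spanning 20 to 190.
train = [20, 60, 190]
held_out = [70]

lo, hi = fit_min_max(train)
print([round(v, 6) for v in apply_min_max(train, lo, hi)])     # [0.0, 0.235294, 1.0]
print([round(v, 6) for v in apply_min_max(held_out, lo, hi)])  # [0.294118]
```

Held-out values outside the training range would fall outside [0, 1], which is expected with this approach.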


In [38]:
# Save the dataset to the data folder
final_data.to_csv('data/final_data.csv', index=False)