After referencing data_description, the encoding plan is as follows:

## Ordinal
Fence: Fence quality (GdPrv = Good Privacy, MnPrv = Minimum Privacy, GdWo = Good Wood, MnWw = Minimum Wood/Wire, NA = No Fence)
PavedDrive: Paved driveway (Y = Paved, P = Partial Pavement, N = Dirt/Gravel)
Functional: Home functionality, assume typical unless deductions are warranted (Typ = Typical Functionality, Min1 = Minor Deductions 1, Min2 = Minor Deductions 2, Mod = Moderate Deductions, Maj1 = Major Deductions 1, Maj2 = Major Deductions 2, Sev = Severely Damaged, Sal = Salvage only)
Electrical: Electrical system (2 = SBrkr: Standard Circuit Breakers & Romex, 1 = FuseA: Fuse Box over 60 AMP and all Romex wiring (Average), 0 = Mix: Mixed, -1 = FuseF: 60 AMP Fuse Box and mostly Romex wiring (Fair), -2 = FuseP: 60 AMP Fuse Box and mostly knob & tube wiring (Poor))
BsmtFinType1: Rating of basement finished area (GLQ = Good Living Quarters, ALQ = Average Living Quarters, BLQ = Below Average Living Quarters, Rec = Average Rec Room, LwQ = Low Quality, Unf = Unfinished, NA = No Basement)
BsmtFinType2: Rating of basement finished area, if multiple types (GLQ = Good Living Quarters, ALQ = Average Living Quarters, BLQ = Below Average Living Quarters, Rec = Average Rec Room, LwQ = Low Quality, Unf = Unfinished, NA = No Basement)
BsmtExposure: Refers to walkout or garden level walls (Gd = Good Exposure, Av = Average Exposure (split levels or foyers typically score average or above), Mn = Minimum Exposure, No = No Exposure, NA = No Basement)
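
The columns above each need their own explicit order. A minimal sketch of how such a mapping could be applied with plain pandas follows. The `Electrical` scores (2 down to -2) come from the plan above; the `PavedDrive` ranks and the toy rows are assumptions made only for illustration.

```python
import pandas as pd

# Scores for Electrical are taken from the plan; PavedDrive ranks are assumed.
electrical_map = {"SBrkr": 2, "FuseA": 1, "Mix": 0, "FuseF": -1, "FuseP": -2}
paved_drive_map = {"N": 0, "P": 1, "Y": 2}

# Toy rows standing in for the real data
toy = pd.DataFrame({
    "Electrical": ["SBrkr", "FuseF", "Mix"],
    "PavedDrive": ["Y", "N", "P"],
})

# Replace each category label with its rank
encoded = pd.DataFrame({
    "Electrical": toy["Electrical"].map(electrical_map),
    "PavedDrive": toy["PavedDrive"].map(paved_drive_map),
})
print(encoded)
```

Writing the order out by hand keeps the ranking visible and avoids relying on alphabetical order, which would not match the intended scale.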

__{"Ex": 5, "Gd": 4, "TA": 3, "Fa": 2, "Po": 1, NaN: 0}__

HeatingQC: Heating quality and condition 
KitchenQual: Kitchen quality 
GarageCond: Garage condition 
GarageQual: Garage quality 
FireplaceQu: Fireplace quality 
BsmtQual: Evaluates the height of the basement
BsmtCond: Evaluates the general condition of the basement
ExterCond: Evaluates the present condition of the material on the exterior
ExterQual: Evaluates the quality of the material on the exterior 
PoolQC: Pool quality 
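
As a rough sketch, the shared quality scale could be applied to all of these columns in one pass. The toy frame below is hypothetical; missing values are scored 0, following the `NaN: 0` entry in the mapping above.

```python
import numpy as np
import pandas as pd

# Shared quality scale from the plan; NaN is handled separately below
quality_map = {"Ex": 5, "Gd": 4, "TA": 3, "Fa": 2, "Po": 1}

# Toy rows standing in for the real data
toy = pd.DataFrame({
    "KitchenQual": ["Gd", "TA", "Ex"],
    "PoolQC": [np.nan, "Fa", np.nan],
})

quality_cols = ["KitchenQual", "PoolQC"]

# Map labels to scores, then score missing values (feature absent) as 0
encoded = (
    toy[quality_cols]
    .apply(lambda col: col.map(quality_map))
    .fillna(0)
    .astype(int)
)
print(encoded)
```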

## Nominal
### OHE (option to drop one level if binary)
Home Ownership (trinary)
Purpose
MoSold: Month Sold (MM)
GarageFinish: Interior finish of the garage
CentralAir: Central air conditioning binary
Street: Type of road access to property
Alley: Type of alley access to property
LotShape: General shape of property
LandContour: Flatness of the property
Utilities: Type of utilities available
LotConfig: Lot configuration
LandSlope: Slope of property

### Hash
SaleCondition: Condition of sale
SaleType: Type of sale
GarageType: Garage location
Heating: Type of heating
MSSubClass: Identifies the type of dwelling involved in the sale.
MSZoning: Identifies the general zoning classification of the sale.
Neighborhood: Physical locations within Ames city limits			
Condition1: Proximity to various conditions
Condition2: Proximity to various conditions (if more than one is present)
BldgType: Type of dwelling
HouseStyle: Style of dwelling
RoofStyle: Type of roof	
RoofMatl: Roof material
Exterior1st: Exterior covering on house	
Exterior2nd: Exterior covering on house (if more than one material)	
MasVnrType: Masonry veneer type
Foundation: Type of foundation
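
To show the idea behind hashing these higher-cardinality columns, here is a simplified sketch of the hashing trick using only the standard library. It is not the `category_encoders` implementation; the helper names `hash_bucket` and `hash_encode` and the choice of 8 buckets are assumptions for illustration.

```python
import hashlib


def hash_bucket(value: str, n_components: int = 8) -> int:
    """Map a category label to one of n_components buckets."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_components


def hash_encode(values, n_components: int = 8):
    """Return one fixed-width indicator row per label."""
    rows = []
    for value in values:
        row = [0] * n_components
        row[hash_bucket(value, n_components)] = 1
        rows.append(row)
    return rows


neighborhoods = ["NAmes", "CollgCr", "OldTown", "NAmes"]
encoded = hash_encode(neighborhoods)
for name, row in zip(neighborhoods, encoded):
    print(name, row)
```

The output width stays fixed no matter how many distinct labels appear, at the cost of possible collisions, where two different labels land in the same bucket.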

## Numeric Columns
MiscVal: Dollar value of miscellaneous feature
YrSold: Year Sold (YYYY)
PoolArea: Pool area in square feet
GarageCars: Size of garage in car capacity
GarageArea: Size of garage in square feet
GarageYrBlt: Year garage was built
Fireplaces: Number of fireplaces
TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
1stFlrSF: First Floor square feet
2ndFlrSF: Second floor square feet
LowQualFinSF: Low quality finished square feet (all floors)
GrLivArea: Above grade (ground) living area square feet
BsmtFullBath: Basement full bathrooms
BsmtHalfBath: Basement half bathrooms
FullBath: Full bathrooms above grade
HalfBath: Half baths above grade
BedroomAbvGr: Bedrooms above grade (does NOT include basement bedrooms)
BsmtFinSF1: Type 1 finished square feet
BsmtFinSF2: Type 2 finished square feet
BsmtUnfSF: Unfinished square feet of basement area
TotalBsmtSF: Total square feet of basement area
MasVnrArea: Masonry veneer area in square feet
LotFrontage: Linear feet of street connected to property
LotArea: Lot size in square feet
YearBuilt: Original construction date
YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)
OverallQual: Rates the overall material and finish of the house
OverallCond: Rates the overall condition of the house

__sum all porches into one Porch column__

ScreenPorch: Screen porch area in square feet
3SsnPorch: Three season porch area in square feet
WoodDeckSF: Wood deck area in square feet
OpenPorchSF: Open porch area in square feet
EnclosedPorch: Enclosed porch area in square feet
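
A minimal sketch of the porch aggregation, assuming the five columns listed above are simply added together and then dropped. The helper name `combine_porches` and the toy values are hypothetical.

```python
import pandas as pd

porch_cols = ["ScreenPorch", "3SsnPorch", "WoodDeckSF", "OpenPorchSF", "EnclosedPorch"]

# Toy rows standing in for the real data
toy = pd.DataFrame({
    "ScreenPorch": [0, 120],
    "3SsnPorch": [0, 0],
    "WoodDeckSF": [200, 0],
    "OpenPorchSF": [35, 60],
    "EnclosedPorch": [0, 40],
    "LotArea": [8450, 9600],
})


def combine_porches(df: pd.DataFrame) -> pd.DataFrame:
    """Add a Porch column with the row-wise total and drop the source columns."""
    out = df.copy()
    out["Porch"] = out[porch_cols].sum(axis=1)
    return out.drop(columns=porch_cols)


combined = combine_porches(toy)
print(combined)
```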

## Target
SalePrice

Steps:
1. Split data into train and test

2. Drops from EDA
- drop `Id`
- drop `MiscFeature`: Miscellaneous feature not covered in other categories
- drop duplicates

3. Imputation

4. Preprocessing
- Build Pipelines
- Select columns
- Create tuples
- Column Transformer
- Encoding

5. Model!
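
Because the drops in step 2 happen after the split in step 1, removing duplicate rows from the features has to keep the target aligned. The sketch below shows one way to do that on hypothetical toy data. Note that it treats rows as duplicates based on features alone, which is an assumption.

```python
import pandas as pd

# Toy training split; the index mimics the shuffled index left by a split
X_train = pd.DataFrame(
    {
        "Id": [1, 2, 3, 4],
        "LotArea": [8450, 9600, 8450, 11250],
        "MiscFeature": [None, "Shed", None, None],
    },
    index=[10, 11, 12, 13],
)
y_train = pd.Series([208500, 181500, 208500, 223500],
                    index=X_train.index, name="SalePrice")

# Step 2: drop the unwanted columns, then duplicate rows
X_train = X_train.drop(columns=["Id", "MiscFeature"])
X_train = X_train.drop_duplicates()

# Keep only the targets whose feature rows survived
y_train = y_train.loc[X_train.index]
print(X_train.index.tolist(), y_train.tolist())
```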

## Library imports

In [48]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import math
import scipy

from sklearn import set_config
set_config(transform_output='pandas')

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import (mean_absolute_error, mean_squared_error, r2_score, 
                             mean_absolute_percentage_error, root_mean_squared_error)
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder, RobustScaler, PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor, plot_tree


import category_encoders as ce
from category_encoders.hashing import HashingEncoder
from category_encoders.ordinal import OrdinalEncoder

from xgboost import XGBRegressor

## Load dataset

In [49]:
file_path = "../../house-prices-advanced-regression-techniques/input/train.csv"
houses = pd.read_csv(file_path)

## Train/Test Split

In [50]:
X = houses.drop(columns = "SalePrice")
y = houses["SalePrice"].copy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

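## Drop indicated columns

In [51]:
# Reassign rather than drop in place, so the frames returned by
# train_test_split are not mutated.
drop_cols = ["Id", "MiscFeature"]
X_train = X_train.drop(columns=drop_cols)
X_test = X_test.drop(columns=drop_cols)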

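## Impute missing vals

In [52]:
# transform() returns a new frame, so the original loop discarded its result.
# Fill by dtype: 0 for numeric columns, "NA" (assumed label) for categorical ones.
num_fill = dict.fromkeys(X_train.select_dtypes(include="number").columns, 0)
cat_fill = dict.fromkeys(X_train.select_dtypes(include="object").columns, "NA")
fill_values = {**num_fill, **cat_fill}

X_train = X_train.fillna(fill_values)
X_test = X_test.fillna(fill_values)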

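## Build Pipelines

In [None]:
# transformers shared by the pipelines
median_imputer = SimpleImputer(strategy='median')
freq_imputer = SimpleImputer(strategy='most_frequent')
scaler = StandardScaler()
# dense output is needed because transform_output='pandas' is set globally;
# handle_unknown='ignore' is an assumed choice
ohe = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
hash_encoder = HashingEncoder()  # named hash_encoder to avoid the builtin `hash`

# numerical pipeline
num_pipe = Pipeline([('Median Imputer', median_imputer),
                     ('Standard Scaler', scaler)])

# one hot encoder pipeline
ohe_pipe = Pipeline([('Most Frequent Imputer', freq_imputer),
                     ('OHE', ohe)])

# hashing pipeline
hashing_pipe = Pipeline([('Most Frequent Imputer', freq_imputer),
                         ('Hash', hash_encoder)])

# column selection; hashing_cols follows the "Hash" list in the encoding plan
hashing_cols = ['SaleCondition', 'SaleType', 'GarageType', 'Heating', 'MSSubClass',
                'MSZoning', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
                'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd',
                'MasVnrType', 'Foundation']
# list comprehensions avoid the KeyError that Index.drop raises for labels
# missing from the selection (MSSubClass is stored as an integer)
ohe_cols = [col for col in X_train.select_dtypes(include='object').columns
            if col not in hashing_cols]
num_cols = [col for col in X_train.select_dtypes(include='number').columns
            if col not in hashing_cols]

# Column Transformer tuples
num_tuple = ('Numeric', num_pipe, num_cols)
ohe_tuple = ('OHE', ohe_pipe, ohe_cols)
hashing_tuple = ('Hashing Encoder', hashing_pipe, hashing_cols)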

In [None]:
# Column Transformer (hashing_tuple is defined above but not included yet)
preprocessor = ColumnTransformer([num_tuple, ohe_tuple], remainder='drop',
                                 verbose_feature_names_out=False)

# Fit on the training data only, then transform it
preprocessor.fit(X_train)
X_train_proc = preprocessor.transform(X_train)

X_train_proc.info()