# Kaggle Submission


### Contents:
- [Setup & Train/Test/Split](#Setup-&-Train/Test/Split)
- [Base Model](#Base-Model)
- [Data Transformation](#Data-Transformation)
- [Model Fitting!](#Model-Fitting!)
- [Create Kaggle Submission File](#Create-Kaggle-Submission-File)

### Setup & Train/Test/Split
---

In [1]:
#Library Imports
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler, PolynomialFeatures
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

In [2]:
#Read in relevant csvs
train_clean = pd.read_csv('../datasets/train_clean.csv')
validate_clean = pd.read_csv('../datasets/validate_clean.csv')

In [3]:
#Interaction term: product of the three quality ratings

train_clean['kitchen_qual * overall_qual * exter_qual'] = train_clean['kitchen_qual'] * train_clean['overall_qual'] * train_clean['exter_qual']
validate_clean['kitchen_qual * overall_qual * exter_qual'] = validate_clean['kitchen_qual'] * validate_clean['overall_qual'] * validate_clean['exter_qual']

In [4]:
#Features in use
features = ['neighborhood',
            'overall_cond',
            'bldg_type',
            'kitchen_qual',
            'central_air',
            'gr_liv_area',
            'garage_area',
            'total_bsmt_sf',
            '1st_flr_sf',
            'kitchen_qual * overall_qual * exter_qual',
            'bedroom_abvgr',
            'overall_qual',
            'exter_qual',
            'year_built']

In [5]:
#Test/Train Data
X = train_clean[features]
y = train_clean['saleprice']

#Validate Data
val = validate_clean[features]

#Train/Test/Split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 24)

In [6]:
y.shape, X.shape

((2051,), (2051, 14))

### Base Model
---

In [7]:
#Baseline: predict the training-set mean for every row of the test split
null_values = [y_train.mean()] * len(y_test)

In [8]:
r2_score(y_test,null_values)

-0.0003395859087844677
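
The baseline score sits just below zero because the constant prediction is the training mean, which differs slightly from the test mean. The sketch below illustrates this on made-up numbers; the arrays and the helper `r2` are hypothetical and only mirror the usual definition of R² as 1 minus the ratio of residual to total sum of squares.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical target values, for illustration only (not the housing data).
y_train_demo = np.array([100.0, 150.0, 200.0, 250.0])
y_test_demo = np.array([120.0, 180.0, 210.0])

# Predicting the test mean itself scores exactly 0.
score_test_mean = r2(y_test_demo, np.full(len(y_test_demo), y_test_demo.mean()))

# Predicting the training mean instead can only score at or below 0.
score_train_mean = r2(y_test_demo, np.full(len(y_test_demo), y_train_demo.mean()))
```

Any fitted model should beat this near-zero score on the same test split to be worth submitting.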

### Data Transformation
---

In [9]:
# Impute missing values with the most frequent value, fitted on the training split only
si = SimpleImputer(strategy = 'most_frequent').set_output(transform = 'pandas')
imputefeatures = ['bedroom_abvgr']

# Work on explicit copies so the assignments below do not raise SettingWithCopyWarning
X_train, X_test, val = X_train.copy(), X_test.copy(), val.copy()

X_train[imputefeatures] = si.fit_transform(X_train[imputefeatures])
X_test[imputefeatures] = si.transform(X_test[imputefeatures])
val[imputefeatures] = si.transform(val[imputefeatures])
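
Assigning into a frame that was created by selecting columns from another frame is the usual trigger for pandas' `SettingWithCopyWarning`. One common way around it, sketched here on toy data, is to take an explicit `.copy()` before assigning. The frames `train_demo` and `source_demo` are hypothetical stand-ins, and the imputer is fitted on the training frame only, as in the cell above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-ins for the training and validation data; values are made up.
train_demo = pd.DataFrame({"bedroom_abvgr": [3.0, 3.0, 2.0, 4.0]})
source_demo = pd.DataFrame({
    "bedroom_abvgr": [3.0, np.nan, 3.0, 2.0],
    "gr_liv_area": [1500, 1200, 1800, 900],
})

# .copy() gives an independent frame, so the assignment below is unambiguous
# and leaves `source_demo` untouched.
subset = source_demo[["bedroom_abvgr", "gr_liv_area"]].copy()

# Fit on the training data only, then fill the gaps in the copied subset.
imputer = SimpleImputer(strategy="most_frequent").fit(train_demo[["bedroom_abvgr"]])
# transform returns a 2-D array; ravel() flattens the single column.
subset["bedroom_abvgr"] = imputer.transform(subset[["bedroom_abvgr"]]).ravel()
```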


In [10]:
#Transform the data with ColumnTransformer
ohe = OneHotEncoder(drop = 'first',
                    handle_unknown = 'ignore',
                    sparse_output = False)

ctx = ColumnTransformer(
    transformers =[
        ('one_hot', ohe, ['neighborhood', 'bldg_type']),
        ('ss', StandardScaler(), ['bedroom_abvgr', '1st_flr_sf', 'garage_area', 'total_bsmt_sf'])
    ], remainder = 'passthrough',
    verbose_feature_names_out = False
)

In [11]:
#Fit on the training split, then transform the training and test splits
X_train_ctx = pd.DataFrame(ctx.fit_transform(X_train),
                           columns = ctx.get_feature_names_out())

X_test_ctx = pd.DataFrame(ctx.transform(X_test),
                          columns = ctx.get_feature_names_out())

#Transform the validation data
val_enc = pd.DataFrame(ctx.transform(val),
                       columns = ctx.get_feature_names_out())

### Model Fitting!
---

In [12]:
#Instantiate Linear Regression Model
lr = LinearRegression()

In [13]:
# Fit the Model
lr.fit(X_train_ctx, y_train)
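
The imports at the top include `cross_val_score` and `r2_score`, but the fitted model is never scored in the cells shown here. A minimal sketch of how the held-out split could be used for that, on synthetic data rather than the housing features (the arrays and the `_demo` names are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic, nearly linear data standing in for the transformed features.
rng = np.random.default_rng(24)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=24)

model = LinearRegression()

# Five-fold cross-validation on the training split; the default scorer
# for a regressor is R^2.
cv_scores = cross_val_score(model, X_tr, y_tr, cv=5)

# Fit once on the full training split and compare train vs. test R^2.
model.fit(X_tr, y_tr)
train_r2 = model.score(X_tr, y_tr)
test_r2 = model.score(X_te, y_te)
```

A large gap between `train_r2` and `test_r2` would suggest overfitting; both should sit well above the baseline score computed earlier.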

### Create Kaggle Submission File
---


In [14]:
y_preds = lr.predict(val_enc)

In [15]:
#Attach the predictions (a NumPy array) to validate_clean for the Kaggle submission.
validate_clean['saleprice']= y_preds
validate_clean.shape

(878, 82)

In [16]:
# validate_clean[['id', 'saleprice']].to_csv('../datasets/khalbig_kaggle_submission.csv', index= False)
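
Before uploading, it can help to read the written file back and confirm that it holds only the two intended columns, one row per id, and no missing predictions. The sketch below does this with a made-up frame and a temporary directory; the frame `demo`, its values, and the file name are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical predictions frame; ids and prices are made up.
demo = pd.DataFrame({
    "id": [1, 2, 3],
    "saleprice": [150000.0, 98000.5, 210000.0],
    "extra": ["a", "b", "c"],
})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "submission_demo.csv"
    # Keep only the two submission columns and drop the index.
    demo[["id", "saleprice"]].to_csv(out_path, index=False)
    # Read the file back to inspect what was actually written.
    check = pd.read_csv(out_path)

columns_ok = list(check.columns) == ["id", "saleprice"]
ids_unique = check["id"].is_unique
no_missing = bool(check["saleprice"].notna().all())
```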

In [17]:
validate_clean.head()

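| Unnamed: 0 | id | pid | ms_subclass | ms_zoning | lot_frontage | lot_area | street | alley | lot_shape | land_contour | ... | pool_area | pool_qc | fence | misc_feature | misc_val | mo_sold | yr_sold | sale_type | kitchen_qual * overall_qual * exter_qual | saleprice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2658 | 902301120 | 190 | RM | 69.0 | 9142 | Pave | Grvl | Reg | Lvl | ... | 0 |  |  |  | 0 | 4 | 2006 | WD | 0 | 147159.989769 |
| 1 | 2718 | 905108090 | 90 | RL |  | 9662 | Pave |  | IR1 | Lvl | ... | 0 |  |  |  | 0 | 8 | 2006 | WD | 5 | 168980.288993 |
| 2 | 2414 | 528218130 | 60 | RL | 58.0 | 17104 | Pave |  | IR1 | Lvl | ... | 0 |  |  |  | 0 | 9 | 2006 | New | 28 | 196011.260799 |
| 3 | 1989 | 902207150 | 30 | RM | 60.0 | 8520 | Pave |  | Reg | Lvl | ... | 0 |  |  |  | 0 | 7 | 2007 | WD | 10 | 98874.471907 |
| 4 | 625 | 535105100 | 20 | RL |  | 9500 | Pave |  | IR1 | Lvl | ... | 0 |  |  |  | 0 | 7 | 2009 | WD | 6 | 160746.68687 |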
