# **Housing Dataset**
- **By:** Oscar Castanaza

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("/content/train.csv")
df.head(11)

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000
5,6,50,RL,85.0,14115,Pave,,IR1,Lvl,AllPub,...,0,,MnPrv,Shed,700,10,2009,WD,Normal,143000
6,7,20,RL,75.0,10084,Pave,,Reg,Lvl,AllPub,...,0,,,,0,8,2007,WD,Normal,307000
7,8,60,RL,,10382,Pave,,IR1,Lvl,AllPub,...,0,,,Shed,350,11,2009,WD,Normal,200000
8,9,50,RM,51.0,6120,Pave,,Reg,Lvl,AllPub,...,0,,,,0,4,2008,WD,Abnorml,129900
9,10,190,RL,50.0,7420,Pave,,Reg,Lvl,AllPub,...,0,,,,0,1,2008,WD,Normal,118000


In [None]:
df.isna().sum()

Id                 0
MSSubClass         0
MSZoning           0
LotFrontage      259
LotArea            0
                ... 
MoSold             0
YrSold             0
SaleType           0
SaleCondition      0
SalePrice          0
Length: 81, dtype: int64
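
The summary above is truncated in the middle, so most of the columns that contain missing values are hidden. A minimal sketch of how to list only the affected columns, shown on a small made-up frame (the `toy` data is an assumption for illustration, not taken from `train.csv`):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df; values are made up for illustration.
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],
    "LotArea": [8450, 9600, 11250, 9550],
    "Alley": [np.nan, np.nan, "Grvl", np.nan],
})

# Count missing values per column, keep only columns that have any,
# and sort so the columns with the most gaps come first.
missing = toy.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```

On the real data, the same three lines would be applied to `df` instead of `toy`.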

In [None]:
df.shape

(1460, 81)

# **Split and Impute**

- Impute

In [None]:
from sklearn.impute import SimpleImputer

# Separate the features (X) and target variable (y)
X = df.drop('SalePrice', axis=1)
y = df['SalePrice']

# Identify the numeric and categorical columns
numeric_cols = X.select_dtypes(include='number').columns
categorical_cols = X.select_dtypes(include='object').columns

# Impute missing values in numeric columns with mean imputation
numeric_imputer = SimpleImputer(strategy='mean')
X[numeric_cols] = numeric_imputer.fit_transform(X[numeric_cols])

# Impute missing values in categorical columns with mode imputation
categorical_imputer = SimpleImputer(strategy='most_frequent')
X[categorical_cols] = categorical_imputer.fit_transform(X[categorical_cols])

# Split the imputed data into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
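
The cell above fits both imputers on the full dataset before splitting, so the means and modes used to fill the training rows also depend on the test rows. One common alternative, sketched here on made-up data, is to split first and fit the imputer on the training portion only. The frame `toy`, the series `target` and their values are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Made-up numeric data with gaps; not taken from train.csv.
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 60.0, np.nan, 85.0, 75.0, 51.0],
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115, 10084, 6120],
})
target = pd.Series([208500, 181500, 223500, 140000, 250000, 143000, 307000, 129900])

# Split first, so the test rows play no part in the imputation statistics.
X_tr, X_te, y_tr, y_te = train_test_split(toy, target, test_size=0.25, random_state=42)

# Fit on the training rows only, then reuse the learned means on the test rows.
imputer = SimpleImputer(strategy="mean")
X_tr_filled = pd.DataFrame(imputer.fit_transform(X_tr), columns=toy.columns, index=X_tr.index)
X_te_filled = pd.DataFrame(imputer.transform(X_te), columns=toy.columns, index=X_te.index)

print(imputer.statistics_)  # per-column means computed from the training rows
```

Because the preprocessing pipeline in the next section already contains its own imputers, which are fitted on `X_train` only, this up-front imputation step may not be needed at all.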

## **Scaling, One-Hot Encoding, ColumnTransformer and Pipeline**

In [None]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Preprocessing pipeline for numeric features
numeric_features = ['LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
                    'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF',
                    '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
                    'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt',
                    'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch',
                    'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

# Preprocessing pipeline for categorical features
categorical_features = ['MSSubClass', 'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities',
                        'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
                        'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
                        'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure',
                        'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical',
                        'KitchenQual', 'Functional', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual',
                        'GarageCond', 'PavedDrive', 'PoolQC', 'Fence', 'MiscFeature', 'SaleType',
                        'SaleCondition']

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('encoder', OneHotEncoder(handle_unknown='ignore'))
])

# Create a preprocessor to apply the transformations
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ])

# Apply the preprocessing to the training and test data
X_train_preprocessed = preprocessor.fit_transform(X_train)
X_test_preprocessed = preprocessor.transform(X_test)
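
`ColumnTransformer` drops any column that is not listed in one of its transformers (its default is `remainder='drop'`), so `Id` does not reach the model. It also returns a SciPy sparse matrix only when the combined output is sparse enough, which is why the training cells below call `.toarray()`. A small sketch on made-up data, with a hypothetical `to_dense` helper that works for either return type:

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows; "Id" is deliberately left out of both feature lists.
toy = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "LotArea": [8450.0, 9600.0, np.nan, 9550.0],
    "MSZoning": ["RL", "RL", "RM", np.nan],
})

pre = ColumnTransformer(transformers=[
    ("num", Pipeline([("imputer", SimpleImputer(strategy="median")),
                      ("scaler", StandardScaler())]), ["LotArea"]),
    ("cat", Pipeline([("imputer", SimpleImputer(strategy="most_frequent")),
                      ("encoder", OneHotEncoder(handle_unknown="ignore"))]), ["MSZoning"]),
])

def to_dense(matrix):
    """Hypothetical helper: return a NumPy array for sparse or dense input."""
    return matrix.toarray() if sparse.issparse(matrix) else np.asarray(matrix)

out = to_dense(pre.fit_transform(toy))
print(out.shape)  # one scaled numeric column plus one column per category
```

A helper like this avoids an `AttributeError` in cases where the transformer returns a dense array and `.toarray()` is not available.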

# **Models**

- **Model 1:** Two hidden layers with dropout

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import regularizers

# Define the architecture of the deep learning model
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(X_train_preprocessed.shape[1],)))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
history = model.fit(X_train_preprocessed.toarray(), y_train, epochs=100, batch_size=32, validation_data=(X_test_preprocessed.toarray(), y_test))


# Evaluate the model
loss = model.evaluate(X_test_preprocessed.toarray(), y_test.values)
print('\nMean Squared Error:', loss)

- **Model 2:** L2 regularization in place of dropout

In [None]:
# Define the architecture of the deep learning model
model2 = Sequential()
model2.add(Dense(64, activation='relu', input_shape=(X_train_preprocessed.shape[1],), kernel_regularizer=regularizers.l2(0.001)))
model2.add(Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model2.add(Dense(1))

# Compile the model
model2.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
history2 = model2.fit(X_train_preprocessed.toarray(), y_train.values, epochs=100, batch_size=32, validation_data=(X_test_preprocessed.toarray(), y_test.values))

# Evaluate the model
loss2 = model2.evaluate(X_test_preprocessed.toarray(), y_test.values)
print('Model 2 - Mean Squared Error:', loss2)


- **Model 3:** L2 regularization (a ridge-style weight penalty), with the same architecture and settings as Model 2

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import regularizers


# Define the architecture of the deep learning model
model3 = Sequential()
model3.add(Dense(64, activation='relu', input_shape=(X_train_preprocessed.shape[1],), kernel_regularizer=regularizers.l2(0.001)))
model3.add(Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model3.add(Dense(1))

# Compile the model
model3.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
history3 = model3.fit(X_train_preprocessed.toarray(), y_train.values, epochs=100, batch_size=32, validation_data=(X_test_preprocessed.toarray(), y_test.values))

# Evaluate the model
loss3 = model3.evaluate(X_test_preprocessed.toarray(), y_test.values)
print('Model 3 - Mean Squared Error:', loss3)

# **Select the Best Model:**

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

print('Model 1 - Mean Squared Error:', loss)
print('Model 2 - Mean Squared Error:', loss2)
print('Model 3 - Mean Squared Error:', loss3)

Model 1 - Mean Squared Error: 1201398272.0
Model 2 - Mean Squared Error: 1170308992.0
Model 3 - Mean Squared Error: 1162292864.0
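
MSE is expressed in squared units of `SalePrice`, which makes values around 1.2 billion hard to read. Taking the square root gives the RMSE in the same unit as the target. A short sketch using the three losses printed above:

```python
import math

# Test-set MSE values as printed above.
mse_by_model = {
    "Model 1": 1201398272.0,
    "Model 2": 1170308992.0,
    "Model 3": 1162292864.0,
}

# RMSE is the square root of MSE and has the same unit as SalePrice.
rmse_by_model = {name: math.sqrt(mse) for name, mse in mse_by_model.items()}

for name, rmse in rmse_by_model.items():
    print(f"{name} - RMSE: {rmse:,.0f}")
```

All three values come out between 34,000 and 35,000.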


- Model 1:

In [None]:
# Predict on the test set (dense input, as used during training)
y_pred_1 = model.predict(X_test_preprocessed.toarray())

# Calculate metrics
mse_1 = mean_squared_error(y_test, y_pred_1)
mae_1 = mean_absolute_error(y_test, y_pred_1)
r2_1 = r2_score(y_test, y_pred_1)

# Print metrics
print("Model 1 Metrics:")
print("Mean Squared Error:", mse_1)
print("Mean Absolute Error:", mae_1)
print("R-squared Score:", r2_1)

Model 1 Metrics:
Mean Squared Error: 1201398183.798757
Mean Absolute Error: 19682.55070098459
R-squared Score: 0.8433706531707814


- Model 2:

In [None]:
# Predict on the test set (dense input, as used during training)
y_pred_2 = model2.predict(X_test_preprocessed.toarray())

# Calculate metrics
mse_2 = mean_squared_error(y_test, y_pred_2)
mae_2 = mean_absolute_error(y_test, y_pred_2)
r2_2 = r2_score(y_test, y_pred_2)

# Print metrics
print("Model 2 Metrics:")
print("Mean Squared Error:", mse_2)
print("Mean Absolute Error:", mae_2)
print("R-squared Score:", r2_2)


Model 2 Metrics:
Mean Squared Error: 1170309099.3228042
Mean Absolute Error: 19644.836191673803
R-squared Score: 0.8474238164439186


- Model 3:

In [None]:
# Predict on the test set (dense input, as used during training)
y_pred_3 = model3.predict(X_test_preprocessed.toarray())

# Calculate metrics
mse_3 = mean_squared_error(y_test, y_pred_3)
mae_3 = mean_absolute_error(y_test, y_pred_3)
r2_3 = r2_score(y_test, y_pred_3)

# Print metrics
print("Model 3 Metrics:")
print("Mean Squared Error:", mse_3)
print("Mean Absolute Error:", mae_3)
print("R-squared Score:", r2_3)

Model 3 Metrics:
Mean Squared Error: 1162292830.3642085
Mean Absolute Error: 19532.43358037243
R-squared Score: 0.8484689178831618


# **Selected Model**

Based on the metrics, Model 3 has the lowest Mean Squared Error (MSE) among the three models:

- Model 1: MSE = 1,201,398,183.80
- Model 2: MSE = 1,170,309,099.32
- Model 3: MSE = 1,162,292,830.36

A lower MSE indicates better performance, because the model's predictions are, on average, closer to the actual target values. Model 3 has the lowest MSE on the test set, which suggests better predictive accuracy than the other two models.
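
To put the differences in proportion, the relative gap between the models can be computed directly from the MSE values listed above. A small sketch:

```python
# Test-set MSE values from the list above.
mse = {
    "Model 1": 1201398183.80,
    "Model 2": 1170309099.32,
    "Model 3": 1162292830.36,
}

best = min(mse, key=mse.get)  # model with the lowest MSE

# Percentage by which the best model's MSE is lower than each other model's.
gaps = {
    name: 100.0 * (value - mse[best]) / value
    for name, value in mse.items()
    if name != best
}

for name, gap in gaps.items():
    print(f"{best} is {gap:.2f}% lower than {name}")
```

The gap to Model 2 is under one percent, and the gap to Model 1 is a little over three percent.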



**Additionally**, the R-squared score measures how well the model fits the data. Model 3 has the highest R-squared score, 0.848, meaning that the model's predictions explain approximately 84.8% of the variance in the target variable on the test set. A higher R-squared score generally indicates a better fit.
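
R-squared and MSE are two views of the same residuals: on a given test set, `R2 = 1 - MSE / Var(y)`, where `Var(y)` is the population variance of the true targets. This is why the ranking by R-squared matches the ranking by MSE here. A minimal check with NumPy, on made-up numbers:

```python
import numpy as np

# Made-up targets and predictions for illustration.
y_true = np.array([200000.0, 150000.0, 300000.0, 250000.0])
y_pred = np.array([210000.0, 140000.0, 290000.0, 260000.0])

# Definition: 1 minus residual sum of squares over total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_definition = 1.0 - ss_res / ss_tot

# Equivalent form: 1 minus MSE over the population variance of y_true.
mse = np.mean((y_true - y_pred) ** 2)
r2_from_mse = 1.0 - mse / np.var(y_true)

print(r2_definition, r2_from_mse)
```

Since all three models are scored on the same `y_test`, the variance term is identical for each of them, and a lower MSE always gives a higher R-squared.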

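## **Therefore, based on the lower MSE and higher R-squared score, Model 3 appears to be the best model among the three.**

The margin is narrow, though: Models 2 and 3 are defined with the same layers, regularization strength and training settings, so the gap between them (under 1% in MSE) most likely reflects random weight initialization rather than a real difference between the models.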