# Real Estate Price Prediction using Neural Networks

In this notebook, we build a machine learning model using a neural network to predict real estate prices. The dataset includes various features such as living area, province, salary, and average price per square meter. The goal is to predict property prices based on historical data.


## 1. Import Libraries
This section contains all necessary imports to keep the code organized.

In [56]:
# Data manipulation and processing
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Machine learning and deep learning libraries
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

# Custom functions
from functions import adjust_row, remove_outliers, accuracy, smape

## 2. Data Loading and Preprocessing
This section loads the dataset, removes outliers, and applies necessary transformations.

In [57]:
original_data_path = r"./Data/data_clean.csv"
# Load the cleaned dataset and drop the index column left over from an earlier CSV export
df = pd.read_csv(original_data_path).drop(columns=["Unnamed: 0"])

# Remove outliers
df = remove_outliers(df)

# Create synthetic data
df_synth = df.copy()
df_synth[["Living_Area", "Price"]] = df_synth.apply(adjust_row, axis=1)

## 3. Data Splitting
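The dataset is split by row order: the first 80% of the rows form the training set and the last 20% form the test set. The synthetic copies of the training rows are then appended to the training set, which is shuffled. The test set contains only original rows.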


In [58]:
train = np.array(df)[: int(0.8 * df.shape[0])]
test = np.array(df.copy())[int(0.8 * df.shape[0]) :]

synthetic_data = np.array(df_synth)[: int(0.8 * df.shape[0])]
train = np.concatenate((train, synthetic_data), axis=0)
np.random.shuffle(train)
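
The split logic can be tried on a toy array. This is a minimal sketch, not the notebook's code: `positional_split`, `toy`, and `toy_synth` are hypothetical names, the synthetic rows are a stand-in for what `adjust_row` produces, and the seeded generator is an assumption added so that the shuffle is reproducible.

```python
import numpy as np

def positional_split(data: np.ndarray, train_fraction: float = 0.8):
    """Split rows by position: the first part for training, the rest for testing."""
    cut = int(train_fraction * data.shape[0])
    return data[:cut], data[cut:]

# Toy stand-in for the real table: 10 rows, 3 columns.
toy = np.arange(30, dtype=float).reshape(10, 3)
toy_train, toy_test = positional_split(toy)

# Stand-in for the synthetic rows (the real ones come from adjust_row).
toy_synth = toy_train + 0.5
toy_train = np.concatenate((toy_train, toy_synth), axis=0)

# A seeded generator makes the row shuffle repeatable between runs.
rng = np.random.default_rng(seed=42)
rng.shuffle(toy_train)

print(toy_train.shape, toy_test.shape)  # (16, 3) (2, 3)
```

Only the training rows are duplicated and shuffled; the test rows stay exactly as they were in the original table.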

## 4. Data Normalization
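The data is scaled to the [0, 1] range with MinMaxScaler. The scaler is fitted on the training set only, so the test set is transformed with the per-column minimum and maximum of the training data. After scaling, the first column is used as the target and the remaining columns as features.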

In [59]:
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)

X_train, y_train = train_scaled[:, 1:], train_scaled[:, 0]
X_test, y_test = test_scaled[:, 1:], test_scaled[:, 0]
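
To make the scaling step concrete, here is a small NumPy sketch of min-max scaling fitted on the training rows only. The helper names (`minmax_fit`, `minmax_transform`, `minmax_inverse`) and the toy values are assumptions for illustration; the notebook itself relies on `MinMaxScaler`.

```python
import numpy as np

def minmax_fit(train: np.ndarray):
    """Return the per-column minimum and range, computed on training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # guard against division by zero for constant columns
    return col_min, col_range

def minmax_transform(data, col_min, col_range):
    """Map each column so that the training minimum becomes 0 and the maximum 1."""
    return (data - col_min) / col_range

def minmax_inverse(scaled, col_min, col_range):
    """Undo minmax_transform."""
    return scaled * col_range + col_min

# Toy data: column 0 plays the role of the target, column 1 is a feature.
toy_train = np.array([[100.0, 10.0], [200.0, 20.0], [300.0, 30.0]])
toy_test = np.array([[400.0, 15.0]])

col_min, col_range = minmax_fit(toy_train)
train_scaled = minmax_transform(toy_train, col_min, col_range)
test_scaled = minmax_transform(toy_test, col_min, col_range)

# The test target (400) lies above the training maximum (300), so it scales to 1.5.
print(test_scaled[0, 0])

# The target column alone can be mapped back with its own minimum and range.
restored_target = minmax_inverse(test_scaled[:, 0], col_min[0], col_range[0])
print(restored_target)
```

Because the statistics come from the training set, scaled test values are not guaranteed to stay inside [0, 1].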

## 5. Building the Neural Network Model
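A sequential model is built inside `build_model`, which also defines the search space for Keras Tuner: the width and activation of each layer, two optional layers, the dropout rate, and the learning rate.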


In [60]:
def build_model(hp):
    model = Sequential()
    model.add(Dense(hp.Int("units_layer_1", 800, 1000, 50), activation=hp.Choice("activation_layer_1", ["relu", "tanh", "sigmoid"]), input_dim=X_train.shape[1]))
    
    if hp.Boolean("add_layer_2"):
        model.add(Dense(hp.Int("units_layer_2", 600, 800, 50), activation=hp.Choice("activation_layer_2", ["relu", "tanh", "sigmoid"])))
    
    model.add(Dense(hp.Int("units_layer_3", 400, 600, 50), activation=hp.Choice("activation_layer_3", ["relu", "tanh", "sigmoid"])))
    model.add(Dense(hp.Int("units_layer_4", 200, 400, 50), activation=hp.Choice("activation_layer_4", ["relu", "tanh", "sigmoid"])))
    model.add(Dense(hp.Int("units_layer_5", 50, 100, 50), activation=hp.Choice("activation_layer_5", ["relu", "softmax"])))
    
    if hp.Boolean("add_layer_6"):
        model.add(Dense(hp.Int("units_layer_6", 20, 50, 10), activation="softmax"))
        model.add(Dropout(hp.Float("dropout_2", 0.0, 0.5, 0.1)))
    
    model.add(Dense(1))
    model.compile(optimizer=Adam(learning_rate=hp.Float("learning_rate", 1e-4, 1e-2, sampling="log")), loss="mean_squared_error", metrics=["mean_squared_error"])
    return model

## 6. Hyperparameter Tuning
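Bayesian optimization is used for the hyperparameter search: up to 30 trials of 20 epochs each, ranked by validation mean squared error. The output below shows the tuner reloading the results of an earlier search stored in `keras_tuner_dir`.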

In [61]:


tuner = kt.BayesianOptimization(build_model, objective="val_mean_squared_error", max_trials=30, directory="keras_tuner_dir", project_name="bayesian_tuning_example")
tuner.search(X_train, y_train, epochs=20, validation_data=(X_test, y_test), batch_size=16)

best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"Best hyperparameters: {best_hps.values}")


Reloading Tuner from keras_tuner_dir\bayesian_tuning_example\tuner0.json
Best hyperparameters: {'units_layer_1': 850, 'activation_layer_1': 'relu', 'add_layer_2': True, 'units_layer_3': 450, 'activation_layer_3': 'tanh', 'units_layer_4': 400, 'activation_layer_4': 'tanh', 'units_layer_5': 50, 'activation_layer_5': 'relu', 'add_layer_6': False, 'learning_rate': 0.008117079378823321, 'units_layer_2': 600, 'activation_layer_2': 'relu'}
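
The search above scores every trial on `(X_test, y_test)`, so the test set influences which hyperparameters are selected. A common alternative is to hold out part of the training data for validation. The sketch below is an assumption about how that could look, not part of the original notebook; `holdout_split` and the toy arrays are hypothetical.

```python
import numpy as np

def holdout_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle row indices and reserve a fraction of them for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    n_val = int(val_fraction * X.shape[0])
    val_idx, fit_idx = order[:n_val], order[n_val:]
    return X[fit_idx], y[fit_idx], X[val_idx], y[val_idx]

# Toy stand-ins for X_train and y_train.
X_toy = np.arange(40, dtype=float).reshape(20, 2)
y_toy = np.arange(20, dtype=float)

X_fit, y_fit, X_val, y_val = holdout_split(X_toy, y_toy)
print(X_fit.shape, X_val.shape)  # (16, 2) (4, 2)

# The tuner would then validate on the held-out rows instead of the test set, e.g.
# tuner.search(X_fit, y_fit, epochs=20, validation_data=(X_val, y_val), batch_size=16)
```

Since the training array here also contains synthetic copies of real rows, a stricter variant would carve out the validation rows before the synthetic data is added.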


## 7. Model Training
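The best model found by the search is trained further. Early stopping monitors the validation loss, stops after 5 epochs without improvement, and restores the weights of the best epoch.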

In [62]:
best_model = tuner.get_best_models(num_models=1)[0]
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, mode="min", restore_best_weights=True)
best_model.fit(X_train, y_train, epochs=200, validation_data=(X_test, y_test), batch_size=16, callbacks=[early_stopping])


Epoch 1/2
1580/1580 ━━━━━━━━━━━━━━━━━━━━ 24s 14ms/step - loss: 0.0427 - mean_squared_error: 0.0427 - val_loss: 0.0142 - val_mean_squared_error: 0.0142
Epoch 2/2
1580/1580 ━━━━━━━━━━━━━━━━━━━━ 23s 14ms/step - loss: 0.0130 - mean_squared_error: 0.0130 - val_loss: 0.0157 - val_mean_squared_error: 0.0157

<keras.src.callbacks.history.History at 0x2212d599930>
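
The patience rule behind early stopping can be written out in a few lines of plain Python. This is a simplified sketch of the idea, not Keras's implementation; `early_stopping_epoch` is a hypothetical helper and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-based, for a list of validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch is the one whose weights
    would be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch.
    return len(val_losses), best_epoch

# Made-up validation losses: the best value occurs at epoch 2.
toy_losses = [0.50, 0.40, 0.45, 0.43, 0.44, 0.46, 0.42, 0.41]
stop_epoch, best_epoch = early_stopping_epoch(toy_losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

With `restore_best_weights=True`, the model kept after stopping corresponds to `best_epoch`, not to the last epoch that was run.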

## 8. Save the Model
The trained model is saved to disk in the HDF5 format.

In [63]:
best_model.save("name.h5")



## 9. Model Evaluation
Predictions are made for both the training and the test set. At this point they are still expressed in scaled units.

In [64]:
y_pred_train = best_model.predict(X_train)
y_pred = best_model.predict(X_test)

790/790 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step
99/99 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step


## 10. Inverse Scaling and Performance Evaluation
Predictions and targets are mapped back to the original price scale before the metrics are computed.

In [65]:
def my_inverse_scaler(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    tmp = np.concatenate((y.reshape((-1, 1)), X), axis=1)
    return scaler.inverse_transform(tmp)[:, 0]

# Inverse scaling
y_pred_train = my_inverse_scaler(y_pred_train, X_train)
y_train = my_inverse_scaler(y_train, X_train)
y_test = my_inverse_scaler(y_test, X_test)
y_pred = my_inverse_scaler(y_pred, X_test)

# Accuracy evaluation
print("Train:")
accuracy(y_train, y_pred_train)
print("Test:")
accuracy(y_test, y_pred)

Train:
RMSE: 98190.90279984119
MAE:  73841.54079612285
MAPE: 26.846540227542015 %
SMAPE: 22.693285625104515 %
R2: 0.5034783329649752
Test:
RMSE: 108994.42932119273
MAE:  81782.51711428494
MAPE: 30.153101689544354 %
SMAPE: 25.151863202018692 %
R2: 0.45777846464393546
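
The `accuracy` helper comes from the local `functions` module, which is not shown here. The sketch below gives the textbook definitions of the metrics it prints; it is an assumption about what the helper computes (SMAPE in particular has several variants), and `regression_report` is a hypothetical name.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Compute common regression metrics on the original (unscaled) target."""
    # ravel() guards against (n, 1) predictions broadcasting against (n,) targets.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_pred - y_true
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        # Percentage errors assume that no true value is zero.
        "MAPE": float(100.0 * np.mean(np.abs(err) / np.abs(y_true))),
        "SMAPE": float(100.0 * np.mean(2.0 * np.abs(err) / (np.abs(y_true) + np.abs(y_pred)))),
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)),
    }

# Toy values, unrelated to the notebook's data.
toy_true = np.array([100.0, 200.0, 300.0])
toy_pred = np.array([110.0, 190.0, 330.0])
report = regression_report(toy_true, toy_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

RMSE and MAE are in the units of the target (price), MAPE and SMAPE are percentages, and R2 is the share of the target's variance explained by the predictions.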


## Conclusion

In this notebook, we built a neural network to predict real estate prices from historical data. After removing outliers, augmenting the training set with synthetic rows, scaling the data, and tuning the hyperparameters with Bayesian optimization, the model reaches an R² of about 0.46 and a MAPE of about 30% on the test set, compared with an R² of about 0.50 and a MAPE of about 27% on the training set. The small gap between the two suggests limited overfitting, but the absolute accuracy is moderate: the model explains roughly half of the variance in prices. Because the test set also served as validation data for the hyperparameter search and for early stopping, the test metrics are likely somewhat optimistic. Interestingly, restricting the inputs to the few features that SHAP identified as related to the target did not improve the results; using all features performed better. Overall, the results indicate that neural networks have potential for real estate price prediction, and the model could be improved with more informative features or further optimization.