# Assignment: Real Estate Price Prediction

## Instructions

1. **Download the Dataset**  
   - Use the [Real Estate Price Prediction](https://www.kaggle.com/datasets/quantbruce/real-estate-price-prediction) dataset from Kaggle.

2. **Data Preprocessing**  
   - Load the dataset and perform any necessary preprocessing (e.g., handling missing values and scaling features).

3. **Model 1: Polynomial Regression**  
   - Train a Polynomial Regression model on the dataset.
   - Experiment with different polynomial degrees and find the degree that gives the best performance.

4. **Model 2: Support Vector Regression (SVR)**  
   - Train a Support Vector Regression model using the same dataset.
   - Use an appropriate kernel (e.g., RBF or polynomial) and tune the hyperparameters.

5. **Comparison**  
   - Evaluate both models using appropriate metrics (e.g., Mean Squared Error, R²).
   - Compare the performance of Polynomial Regression and SVR.

6. **Report**  
   - Present your results, including:
     - The best hyperparameters and polynomial degree.
     - Metrics for both models.
     - A brief discussion on which model performed better and why.

7. **Submission**  
   - Submit your code, the trained models, and a short report summarizing your findings.


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures


In [6]:
df = pd.read_csv('./Dataset/Real estate.csv')
df.head()

| | No | X1 transaction date | X2 house age | X3 distance to the nearest MRT station | X4 number of convenience stores | X5 latitude | X6 longitude | Y house price of unit area |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2012.917 | 32.0 | 84.87882 | 10 | 24.98298 | 121.54024 | 37.9 |
| 1 | 2 | 2012.917 | 19.5 | 306.5947 | 9 | 24.98034 | 121.53951 | 42.2 |
| 2 | 3 | 2013.583 | 13.3 | 561.9845 | 5 | 24.98746 | 121.54391 | 47.3 |
| 3 | 4 | 2013.5 | 13.3 | 561.9845 | 5 | 24.98746 | 121.54391 | 54.8 |
| 4 | 5 | 2012.833 | 5.0 | 390.5684 | 5 | 24.97937 | 121.54245 | 43.1 |


In [8]:
df = df.drop(columns=['No'])
print(df.isna().sum(axis = 0))

X1 transaction date                       0
X2 house age                              0
X3 distance to the nearest MRT station    0
X4 number of convenience stores           0
X5 latitude                               0
X6 longitude                              0
Y house price of unit area                0
dtype: int64


In [10]:
X = df.drop(columns=['Y house price of unit area'])
y = df['Y house price of unit area']

In [13]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
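
The cell above fits `StandardScaler` on every row before splitting, so the test rows contribute to the scaling statistics. A minimal sketch of the split-first alternative follows. It uses a synthetic stand-in for the features and target; the names `X_demo`, `y_demo`, and `demo_scaler` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features and target (assumption, not the Kaggle data).
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 6))
y_demo = X_demo @ rng.normal(size=6) + rng.normal(size=200)

# Split first so the test rows never influence the scaling statistics.
X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_scaler = StandardScaler()
X_train_s = demo_scaler.fit_transform(X_train_d)  # learn mean/std on training rows only
X_test_s = demo_scaler.transform(X_test_d)        # reuse the training statistics
```

With this ordering the training columns have zero mean and unit variance, while the test columns are only approximately standardized, which is the expected behaviour for unseen data.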

In [15]:
poly = PolynomialFeatures(degree=2)  
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)

poly_model = LinearRegression()
poly_model.fit(X_poly_train, y_train)

In [18]:
# Name the predictions y_pred_poly so the metrics cell below can find them.
y_pred_poly = poly_model.predict(X_poly_test)
y_pred_poly

array([ 38.5625,  42.0625,  53.625 ,  66.875 ,  44.375 ,  47.25  ,
        32.5   ,  35.875 ,  47.4375,  30.0625,  31.375 , -14.8125,
        21.75  ,  44.6875,  39.6875,  21.375 ,  45.6875,  56.6875,
        32.4375,  46.8125,  47.6875,  13.75  ,  36.9375,  38.4375,
        40.    ,  65.6875,  33.75  ,  57.6875,  33.375 ,  23.75  ,
        12.25  ,  -0.3125,  31.8125,   4.4375,  38.5625,  76.25  ,
        68.4375,  24.375 ,  40.625 ,  39.0625,  76.9375,  40.5   ,
        48.5   ,  35.9375,  49.875 ,  52.625 ,  60.1875, -34.375 ,
        73.125 ,  58.    ,  35.0625,  37.1875,  45.75  ,  41.25  ,
        45.0625,  17.0625,  53.5625,  -2.625 ,  34.8125,  34.375 ,
        13.1875,  38.6875,  17.0625,  40.    , -55.375 ,  17.4375,
         4.    ,  54.375 ,  45.    ,  28.875 ,  42.875 ,  78.375 ,
        49.75  ,  40.125 ,  35.125 ,  55.8125,  59.6875,  49.4375,
        45.3125,  41.9375,  37.75  ,  34.375 ,   0.75  ])

In [22]:
mse_poly = mean_squared_error(y_test, y_pred_poly)
r2_poly = r2_score(y_test, y_pred_poly)
print(f"MSE: {mse_poly}, R2: {r2_poly}")

MSE: 422.2835768072289, R2: -1.517192536355207
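
The assignment asks for a comparison of polynomial degrees, while the cells above fit only degree 2. One way to run that comparison is cross-validation over a small set of degrees, sketched below on synthetic stand-in data. The data, the degree range `(1, 2, 3, 4)`, and the names `X_demo`, `y_demo`, and `cv_r2_by_degree` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data with a quadratic signal (assumption, not the Kaggle data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = 2.0 * X_demo[:, 0] ** 2 - X_demo[:, 1] + rng.normal(scale=0.1, size=300)

cv_r2_by_degree = {}
for degree in (1, 2, 3, 4):
    # The pipeline refits the scaler and the feature expansion inside every fold.
    model = make_pipeline(
        StandardScaler(), PolynomialFeatures(degree=degree), LinearRegression()
    )
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="r2")
    cv_r2_by_degree[degree] = scores.mean()

# Pick the degree with the highest mean cross-validated R².
best_degree = max(cv_r2_by_degree, key=cv_r2_by_degree.get)
print(cv_r2_by_degree, best_degree)
```

On the real dataset, the same loop would take `X` and `y` in place of the synthetic arrays; a degree whose cross-validated R² drops sharply is likely overfitting.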


In [23]:
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X_train, y_train)

In [25]:
y_pred_svr = svr.predict(X_test)
y_pred_svr

array([46.5867111 , 40.03035542, 41.86609103, 37.74007728, 26.90759104,
       40.56289322, 43.7454137 , 45.48467047, 25.50284875, 45.94081654,
       29.49748225, 30.32117205, 35.00326218, 25.97488625, 36.17121461,
       28.78762306, 40.78859915, 42.8884221 , 26.04209035, 43.91793116,
       27.18270858, 29.16436564, 48.07093505, 41.91242585, 23.05478863,
       37.30153874, 21.23027986, 40.54237035, 36.14543757, 37.46201747,
       22.46523026, 34.67351828, 36.24988959, 24.65291654, 46.00896544,
       33.07648754, 48.27487669, 20.54979015, 40.82334816, 39.64452513,
       36.74518276, 39.38123913, 41.29256484, 36.52642034, 39.41566604,
       47.29009907, 43.74004256, 26.23372619, 45.2809141 , 39.40518713,
       46.46039228, 48.35113613, 40.0886908 , 41.67341868, 34.59122602,
       20.37618531, 34.92970706, 34.17377387, 25.62806882, 45.25938987,
       28.14238933, 30.57014572, 20.37618531, 23.63093085, 29.26402113,
       28.4301336 , 25.80459987, 43.11943184, 34.32605781, 26.82

In [27]:
mse_svr = mean_squared_error(y_test, y_pred_svr)
r2_svr = r2_score(y_test, y_pred_svr)
print(f"MSE: {mse_svr}, R2: {r2_svr}")

MSE: 57.73100062492687, R2: 0.6558709552758148
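
The SVR above uses fixed values for `C` and `epsilon`, whereas the assignment asks for tuned hyperparameters. A grid search is one common approach, sketched here on synthetic stand-in data. The grid values and the names `X_demo`, `y_demo`, `pipe`, and `search` are assumptions chosen for illustration, not recommended settings for this dataset.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data with a non-linear signal (assumption, not the Kaggle data).
rng = np.random.default_rng(1)
X_demo = rng.uniform(-3, 3, size=(200, 2))
y_demo = 10.0 * np.sin(X_demo[:, 0]) + X_demo[:, 1] + rng.normal(scale=0.2, size=200)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])
param_grid = {
    "svr__C": [1.0, 10.0, 100.0],
    "svr__epsilon": [0.1, 0.5],
    "svr__gamma": ["scale", 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_demo, y_demo)

print(search.best_params_)   # best combination found on the grid
print(-search.best_score_)   # mean cross-validated MSE of that combination
```

`search.best_estimator_` is refit on all the data passed to `fit` and can then be evaluated once on a held-out test set.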


## Report: Real Estate Price Prediction

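### **Polynomial Regression**
- **Polynomial Degree**: 2 (the only degree tested)
- **Mean Squared Error (MSE)**: 422.28  
- **R² Score**: -1.52  

### **Support Vector Regression (SVR)**
- **Hyperparameters** (fixed values; no search was run):
  - Kernel: RBF  
  - C: 1.0  
  - Epsilon: 0.1  
- **Mean Squared Error (MSE)**: 57.73  
- **R² Score**: 0.66  

---

### **Conclusion**

- **SVR is the preferred model.** On the held-out test set it reaches an MSE of 57.73 and an R² of 0.66, while the degree-2 polynomial model has an MSE of 422.28 and an R² of -1.52. A negative R² means the polynomial model does worse than always predicting the mean price.
- The polynomial model's predictions include negative prices (for example, -55.375), which suggests the degree-2 fit is overfitting and generalizes poorly to some test rows.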