import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error

# 1. Load Dataset
# The dataset contains two features: 'size' and 'bedroom'
# and one target: 'price'

df = pd.read_csv('house_price.csv')
X = df[['size', 'bedroom']]
y = df['price']

# 2. Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# 3. Linear Regression
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
y_pred_lr = lr_model.predict(X_test)

# Coefficients (the intercept is available as lr_model.intercept_)
print("Linear Regression Coefficients:")
print(pd.DataFrame(lr_model.coef_, X.columns, columns=['Coefficient']))

# Evaluation
mae_lr = mean_absolute_error(y_test, y_pred_lr)
mse_lr = mean_squared_error(y_test, y_pred_lr)
rmse_lr = np.sqrt(mse_lr)
mape_lr = mean_absolute_percentage_error(y_test, y_pred_lr)

print("\nLinear Regression Evaluation:")
print(f"MAE: {mae_lr:.2f}")
print(f"MSE: {mse_lr:.2f}")
print(f"RMSE: {rmse_lr:.2f}")
print(f"MAPE: {mape_lr:.4f}")

# 4. SGD Regressor
# Features are standardized first because SGD is sensitive to feature scale.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

sgd_model = SGDRegressor(max_iter=1000)
sgd_model.fit(X_train_scaled, y_train)
y_pred_sgd = sgd_model.predict(X_test_scaled)

# Coefficients (expressed per standard deviation of each feature, because the inputs were standardized)
print("\nSGD Regressor Coefficients:")
print(pd.DataFrame(sgd_model.coef_, X.columns, columns=['Coefficient']))

# Evaluation
mae_sgd = mean_absolute_error(y_test, y_pred_sgd)
mse_sgd = mean_squared_error(y_test, y_pred_sgd)
rmse_sgd = np.sqrt(mse_sgd)
mape_sgd = mean_absolute_percentage_error(y_test, y_pred_sgd)

print("\nSGD Regressor Evaluation:")
print(f"MAE: {mae_sgd:.2f}")
print(f"MSE: {mse_sgd:.2f}")
print(f"RMSE: {rmse_sgd:.2f}")
print(f"MAPE: {mape_sgd:.4f}")

Linear Regression Coefficients:
          Coefficient
size       143.218532
bedroom -13512.564426

Linear Regression Evaluation:
MAE: 72334.75
MSE: 8610424544.78
RMSE: 92792.37
MAPE: 0.1746

SGD Regressor Coefficients:
           Coefficient
size     106862.822926
bedroom  -10601.687945

SGD Regressor Evaluation:
MAE: 72224.07
MSE: 8600943393.05
RMSE: 92741.27
MAPE: 0.1743


### **Model Evaluation Metrics**

We evaluated each model using the following standard metrics:

* **MAE (Mean Absolute Error)**: The average absolute difference between predicted and actual prices, in the same unit as the target.
* **MSE (Mean Squared Error)**: The average of the squared errors. Squaring makes large mistakes count more, and the result is in squared units.
* **RMSE (Root Mean Squared Error)**: The square root of MSE, which brings the value back to the same unit as the actual prices.
* **MAPE (Mean Absolute Percentage Error)**: The average absolute error relative to the actual value. scikit-learn returns it as a fraction, so 0.1746 means 17.46%.
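
As a minimal sketch of what these four definitions compute, the helper below works them out directly with NumPy. The function name `regression_metrics` and the toy prices are hypothetical and only serve to illustrate the formulas; the notebook itself uses the scikit-learn functions.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and MAPE directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mae = np.mean(np.abs(errors))            # average absolute error
    mse = np.mean(errors ** 2)               # average squared error
    rmse = np.sqrt(mse)                      # back in the target's unit
    mape = np.mean(np.abs(errors / y_true))  # fraction; undefined if y_true contains zeros
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Toy prices (hypothetical values, not taken from house_price.csv)
metrics = regression_metrics(
    [200_000, 300_000, 400_000],
    [210_000, 270_000, 400_000],
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With a single 30,000 miss among otherwise small errors, RMSE comes out larger than MAE, which is the effect the squaring step is meant to produce.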

---

### **Results Summary**

| Metric   | Linear Regression | SGD Regressor    |
| -------- | ----------------- | ---------------- |
| **MAE**  | 72,334.75         | 72,224.07        |
| **MSE**  | 8,610,424,544.78  | 8,600,943,393.05 |
| **RMSE** | 92,792.37         | 92,741.27        |
| **MAPE** | 17.46%            | 17.43%           |

---

### **Model Coefficients**

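| Feature   | Linear Regression Coef. (raw features) | SGD Regressor Coef. (standardized features) |
| --------- | -------------------------------------- | ------------------------------------------- |
| `size`    | 143.22                                 | 106,862.82                                  |
| `bedroom` | -13,512.56                             | -10,601.69                                  |

The two columns are not on the same scale. The Linear Regression coefficients are the price change per one unit of the raw feature, while the SGD Regressor was trained on standardized features, so its coefficients are the price change per one standard deviation of each feature.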
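
To compare coefficients learned on standardized features with coefficients learned on raw features, the standardized ones can be divided by the scaler's per-feature standard deviation. The sketch below illustrates this on synthetic data. The helper name `to_original_units`, the synthetic data, and the use of `LinearRegression` for both fits (chosen so the check is deterministic) are assumptions for illustration. The same conversion should apply to `sgd_model.coef_` together with the fitted `scaler`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

def to_original_units(coef_scaled, intercept_scaled, scaler):
    """Convert coefficients fit on standardized features back to raw-feature units."""
    coef = np.asarray(coef_scaled) / scaler.scale_
    intercept = intercept_scaled - np.sum(coef * scaler.mean_)
    return coef, intercept

# Synthetic stand-in for the housing data (hypothetical values)
rng = np.random.default_rng(0)
size = rng.uniform(800, 4000, 200)
bedroom = rng.integers(1, 6, 200).astype(float)
X = np.column_stack([size, bedroom])
y = 140.0 * size - 9000.0 * bedroom + 80_000.0 + rng.normal(0, 5_000, 200)

scaler = StandardScaler().fit(X)
raw_model = LinearRegression().fit(X, y)
scaled_model = LinearRegression().fit(scaler.transform(X), y)

coef, intercept = to_original_units(scaled_model.coef_, scaled_model.intercept_, scaler)
print("raw fit:      ", raw_model.coef_, raw_model.intercept_)
print("converted fit:", coef, intercept)
```

After conversion, both fits describe the same line, so any remaining difference between a converted SGD coefficient and the Linear Regression coefficient reflects how closely SGD converged rather than a difference in units.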


---

### **Interpretation of Metrics**

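* Both models gave nearly identical results: every metric differs by less than 0.2% between them.
* SGD Regressor scored marginally lower errors on all four metrics (for example, MAE of 72,224.07 vs 72,334.75). Because no `random_state` was set for it, this small gap may shift from run to run.
* SGD Regressor updates its weights incrementally, so it scales better to large or live (streaming) data.
* RMSE vs MAE: RMSE is more affected by big errors, while MAE weights every error equally.
* MAPE expresses performance as a percentage: predictions are off by about 17.4% of the actual price on average.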
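
The RMSE vs MAE point can be illustrated with a small sketch. The two error arrays below are hypothetical and have the same MAE, but one concentrates the error in a single large miss. In the results above, RMSE (about 92.8k) is noticeably higher than MAE (about 72.3k), which suggests that some test predictions miss by much more than the typical error.

```python
import numpy as np

def mae(errors):
    """Mean absolute error of an array of residuals."""
    return float(np.mean(np.abs(errors)))

def rmse(errors):
    """Root mean squared error of an array of residuals."""
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical residuals: the same total error, distributed differently
even_errors = np.array([50_000.0, 50_000.0, 50_000.0, 50_000.0])
spiky_errors = np.array([0.0, 0.0, 0.0, 200_000.0])

for name, errs in [("even", even_errors), ("spiky", spiky_errors)]:
    print(f"{name:>5}: MAE={mae(errs):,.0f}  RMSE={rmse(errs):,.0f}")
```

MAE cannot tell the two cases apart, while RMSE doubles when the error is concentrated in one prediction.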

---

### **Conclusion**

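* **SGD Regressor** performed marginally better on this test set, with slightly lower MAE, MSE, RMSE, and MAPE, although the difference (under 0.2%) is negligible in practice.
* **Linear Regression** is essentially as accurate, needs no feature scaling, and is the simpler choice when the data fits in memory.
* **SGD Regressor** becomes more useful at larger scale, where data arrives in batches or is too large to fit at once.
* **RMSE vs MAE**: RMSE weights large errors more heavily, while MAE is the better choice when occasional large misses should not dominate the score.
* **MAPE** is helpful because it shows the error as a percentage of the actual price, making it easy to interpret.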
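
As a rough sketch of the larger-scale use case, `SGDRegressor` and `StandardScaler` both provide a `partial_fit` method that updates the model one batch at a time. The batch generator below uses synthetic data as a stand-in for a real stream. The batch size, number of batches, and data-generating formula are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
scaler = StandardScaler()
model = SGDRegressor(random_state=0)  # fixed seed so the run is repeatable

for _ in range(50):
    # Hypothetical incoming batch: columns are size and bedroom count
    X_batch = np.column_stack([
        rng.uniform(800, 4000, 32),
        rng.integers(1, 6, 32),
    ])
    y_batch = 140.0 * X_batch[:, 0] - 9000.0 * X_batch[:, 1] + 80_000.0

    # Update the running mean/std, then take SGD steps on the scaled batch
    scaler.partial_fit(X_batch)
    model.partial_fit(scaler.transform(X_batch), y_batch)

print("Coefficients (standardized units):", model.coef_)
print("Intercept:", model.intercept_)
```

Because the scaler's statistics keep changing during the early batches, the first few updates are made on slightly inconsistent scales. With a real stream, fitting the scaler on an initial sample first may be a reasonable alternative.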