<a href="https://colab.research.google.com/github/cheng1610/houseprice-pradict/blob/main/chicago-houseprice-pradict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Chicago House Price Prediction
Training data source: https://www.kaggle.com/datasets/tawfikelmetwally/chicago-house-price/data
- Price: price of the house
- Bedroom: number of bedrooms
- Room: number of rooms
- Space: size of the house (in square feet)
- Lot: width of the lot
- Tax: amount of annual tax
- Bathroom: number of bathrooms
- Garage: number of garages
- Condition: condition of the house (1 if good, 0 otherwise)
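The column list above can double as a quick sanity check on the loaded data. The sketch below is an illustration, not part of the original notebook: `check_schema` is a hypothetical helper that reports missing columns and flags any `Condition` value other than 0 or 1 (missing values are ignored). The sample row is copied from the first row of the data shown further down.

```python
import pandas as pd

# Column names taken from the data dictionary above
EXPECTED_COLUMNS = ['Price', 'Bedroom', 'Space', 'Room', 'Lot',
                    'Tax', 'Bathroom', 'Garage', 'Condition']


def check_schema(df: pd.DataFrame) -> list[str]:
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if 'Condition' in df.columns:
        # Condition is documented as binary: 1 if good, 0 otherwise
        values = set(df['Condition'].dropna().unique())
        if not values <= {0, 1}:
            problems.append(f"unexpected Condition values: {sorted(values)}")
    return problems


# Usage with one row copied from the dataset preview
sample = pd.DataFrame({
    'Price': [53.0], 'Bedroom': [2.0], 'Space': [967.0], 'Room': [5.0],
    'Lot': [39.0], 'Tax': [652.0], 'Bathroom': [1.5], 'Garage': [0.0],
    'Condition': [0.0],
})
print(check_schema(sample))  # → []
```

Calling `check_schema(data)` right after `pd.read_csv` would catch a renamed column or a mis-coded `Condition` value before any model is fitted.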

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

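# Load the Kaggle dataset linked above, saved locally as realest.csv
data = pd.read_csv("realest.csv")

# Preview the first rows and count missing values per column
print(data.head())
print(data.isnull().sum())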

   Price  Bedroom   Space  Room   Lot     Tax  Bathroom  Garage  Condition
0   53.0      2.0   967.0   5.0  39.0   652.0       1.5     0.0        0.0
1   55.0      2.0   815.0   5.0  33.0  1000.0       1.0     2.0        1.0
2   56.0      3.0   900.0   5.0  35.0   897.0       1.5     1.0        0.0
3   58.0      3.0  1007.0   6.0  24.0   964.0       1.5     2.0        0.0
4   64.0      3.0  1100.0   7.0  50.0  1099.0       1.5     1.5        0.0
Price         1
Bedroom       1
Space        11
Room          1
Lot          11
Tax          10
Bathroom      1
Garage        1
Condition     1
dtype: int64
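The per-column counts above do not say how many rows `dropna()` will remove, because several missing values can sit in the same row. A minimal sketch on a small made-up frame (not the real data) contrasts the two counts:

```python
import numpy as np
import pandas as pd

# Made-up frame: 4 NaNs in total, spread over 3 different rows
df = pd.DataFrame({
    'Space': [967.0, np.nan, 900.0, np.nan],
    'Lot':   [39.0, np.nan, 35.0, 24.0],
    'Tax':   [652.0, 1000.0, np.nan, 964.0],
})

per_column = df.isnull().sum()                      # NaN count per column
rows_with_nan = int(df.isnull().any(axis=1).sum())  # rows with at least one NaN

print(per_column)
print(f"rows with any NaN: {rows_with_nan}")         # → rows with any NaN: 3
print(f"rows kept by dropna(): {len(df.dropna())}")  # → rows kept by dropna(): 1
```

Running `data.isnull().any(axis=1).sum()` on the real frame would give the exact number of rows the next cell discards.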


In [3]:
# Drop every row that contains at least one missing value
data = data.dropna()

print(data.isnull().sum())
print(data.head())


Price        0
Bedroom      0
Space        0
Room         0
Lot          0
Tax          0
Bathroom     0
Garage       0
Condition    0
dtype: int64
   Price  Bedroom   Space  Room   Lot     Tax  Bathroom  Garage  Condition
0   53.0      2.0   967.0   5.0  39.0   652.0       1.5     0.0        0.0
1   55.0      2.0   815.0   5.0  33.0  1000.0       1.0     2.0        1.0
2   56.0      3.0   900.0   5.0  35.0   897.0       1.5     1.0        0.0
3   58.0      3.0  1007.0   6.0  24.0   964.0       1.5     2.0        0.0
4   64.0      3.0  1100.0   7.0  50.0  1099.0       1.5     1.5        0.0
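`dropna()` discards every incomplete row. A common alternative, which this notebook does not use, is to fill missing feature values with the column median and keep the rows. The sketch below assumes scikit-learn's `SimpleImputer` chained with `LinearRegression` via `make_pipeline`; the frame is made up (loosely based on the first rows above, with gaps added). Rows with a missing `Price` would still have to be dropped first, e.g. with `data.dropna(subset=['Price'])`, since the target cannot be imputed this way.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Made-up example: two features with gaps, target fully observed
X = pd.DataFrame({
    'Space': [967.0, 815.0, np.nan, 1007.0, 1100.0],
    'Lot':   [39.0, np.nan, 35.0, 24.0, 50.0],
})
y = pd.Series([53.0, 55.0, 56.0, 58.0, 64.0])

# The imputer learns column medians from the data passed to fit()
model = make_pipeline(SimpleImputer(strategy='median'), LinearRegression())
model.fit(X, y)

imputer = model.named_steps['simpleimputer']
print(imputer.statistics_)     # medians used to fill Space and Lot
print(model.predict(X).shape)  # → (5,)
```

Fitting the pipeline on `x_train` only means the test rows are filled with training-set medians, which avoids leaking test information into the model.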


In [4]:
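# Features (Condition is not used) and target
x = data[['Bedroom', 'Space', 'Room', 'Lot', 'Tax', 'Bathroom', 'Garage']]
y = data['Price']

# Hold out 20% of the rows as a test set; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

print(f"Training set size: {x_train.shape}")
print(f"Test set size: {x_test.shape}")

Training set size: (102, 7)
Test set size: (26, 7)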
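The sizes printed above follow from how `train_test_split` handles a fractional `test_size`: the test set gets ceil(0.2 × 128) = ceil(25.6) = 26 rows and the training set the remaining 102, where 128 = 102 + 26 is the number of rows left after `dropna()`. The sketch below checks this on a dummy array of the same shape:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 128                     # rows remaining after dropna() (102 + 26)
X_dummy = np.zeros((n_samples, 7))  # 7 features; the values do not matter here
y_dummy = np.zeros(n_samples)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_dummy, y_dummy, test_size=0.2, random_state=42
)

expected_test = math.ceil(0.2 * n_samples)  # → 26
expected_train = n_samples - expected_test  # → 102
print(X_tr.shape, X_te.shape)               # → (102, 7) (26, 7)
```

With only 26 test rows, the metrics computed later depend noticeably on which rows land in the test set; changing `random_state` is a quick way to see that variation.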


In [5]:
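features = ['Bedroom', 'Space', 'Room', 'Lot', 'Tax', 'Bathroom', 'Garage']

model = LinearRegression()

# Sanity check: dropna() above should have removed every missing value
for feature in features:
    n_missing = data[feature].isnull().sum()
    assert n_missing == 0, f"Column {feature} has missing values (NaN)"

# Fit ordinary least squares on the training split
model.fit(x_train, y_train)

# One coefficient per feature, in the same order as the columns of x
for feature, coef in zip(features, model.coef_):
    print(f"{feature} coefficient: {coef:.1f}")

print(f"Intercept (b): {model.intercept_:.1f}")

Bedroom coefficient: -2.6
Space coefficient: 0.0
Room coefficient: 0.6
Lot coefficient: 0.3
Tax coefficient: 0.0
Bathroom coefficient: 6.8
Garage coefficient: 5.4
Intercept (b): 22.9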
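The fitted model predicts ŷ = b + w₁x₁ + … + w₇x₇, with `model.intercept_` as b and `model.coef_` as the weights. Each weight is the change in price per unit of its feature, so features on a large scale (Space in square feet, Tax as an annual amount) get small weights that print as 0.0 at one decimal place without actually being zero. The sketch below uses synthetic data, not the real dataset: it recomputes predictions by hand and then refits on standardized features (an assumption here, not something the notebook does) so each weight becomes the effect of a one-standard-deviation change.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: one small-scale feature and one large-scale feature
n = 200
small = rng.uniform(1, 4, n)       # e.g. a count such as bathrooms
large = rng.uniform(500, 3000, n)  # e.g. an area in square feet
X = np.column_stack([small, large])
y = 20 + 5.0 * small + 0.02 * large + rng.normal(0, 2, n)

model = LinearRegression().fit(X, y)

# Manual prediction: y_hat = b + X @ w
y_hat_manual = model.intercept_ + X @ model.coef_
assert np.allclose(y_hat_manual, model.predict(X))

# The large-scale weight (about 0.02) rounds to 0.0 at one decimal
print(f"raw coefficients: {np.round(model.coef_, 1)}")

# Standardize features, then refit: weights are now per standard deviation
X_std = StandardScaler().fit_transform(X)
model_std = LinearRegression().fit(X_std, y)
print(f"standardized coefficients: {np.round(model_std.coef_, 1)}")
```

On the standardized fit the large-scale feature gets the larger weight, which the raw coefficients hide. Printing the notebook's coefficients with more decimals (for example `:.4f`) is a lighter-weight way to see the Space and Tax values.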


In [6]:
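# Evaluate on the held-out test set
y_pred = model.predict(x_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"MSE (mean squared error): {mse:.2f}")
print(f"RMSE (root mean squared error): {mse**0.5:.2f}")
print(f"R-squared (coefficient of determination): {r2:.3f}")

MSE (mean squared error): 58.51
RMSE (root mean squared error): 7.65
R-squared (coefficient of determination): 0.695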
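The three metrics follow the standard definitions: MSE = (1/n) Σ (yᵢ - ŷᵢ)², RMSE = √MSE (in the same units as Price), and R² = 1 - Σ (yᵢ - ŷᵢ)² / Σ (yᵢ - ȳ)², where ȳ is the mean of the actual values. The sketch below computes them by hand with NumPy and checks them against scikit-learn; the four value pairs are the first four rows of the prediction table that follows.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# First four test rows from the prediction table (actual vs. rounded prediction)
y_true = np.array([45.0, 35.0, 46.0, 88.0])
y_pred = np.array([40.2, 45.4, 56.3, 82.8])

residuals = y_true - y_pred
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
# R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
print(f"MSE={mse:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
```

An R-squared of 0.695 therefore means the model's squared errors on the test set are about 30% of those from always predicting the mean test price.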


In [7]:
# Put actual and predicted prices side by side for the test rows
df = pd.DataFrame(
    {'Actual': y_test, 'Predicted': y_pred}
)
# Format predictions to one decimal place for display
df['Predicted'] = df['Predicted'].map(lambda v: f"{v:.1f}")

print(df)

     Actual  Predicted
63     45.0       40.2
45     35.0       45.4
23     46.0       56.3
36     88.0       82.8
113    81.0       75.6
64     47.0       41.7
81     55.0       60.1
119    63.0       47.0
94     62.0       55.6
30     62.0       66.7
108    63.0       67.2
31     46.0       57.0
74     43.0       43.6
4      64.0       66.3
110    49.0       55.5
133    55.0       60.0
41     61.0       47.6
93     59.0       45.7
153    43.0       55.2
98     36.0       42.2
22     46.0       45.7
12     47.0       43.5
140    88.0       85.5
13     49.0       48.5
120    65.0       57.8
51     66.0       60.4
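The table can also be summarized with per-row residuals. Because the prediction column was converted to strings with `map` for display, any arithmetic has to happen before that formatting step. The sketch below uses the 26 value pairs copied from the table (predictions already rounded to one decimal), so its MSE lands close to, but not exactly on, the 58.51 printed earlier. The column names `Residual` and `AbsError` are illustrative choices.

```python
import pandas as pd

# Values copied from the table above; predictions are already rounded to 0.1
index = [63, 45, 23, 36, 113, 64, 81, 119, 94, 30, 108, 31, 74,
         4, 110, 133, 41, 93, 153, 98, 22, 12, 140, 13, 120, 51]
actual = [45.0, 35.0, 46.0, 88.0, 81.0, 47.0, 55.0, 63.0, 62.0, 62.0, 63.0, 46.0, 43.0,
          64.0, 49.0, 55.0, 61.0, 59.0, 43.0, 36.0, 46.0, 47.0, 88.0, 49.0, 65.0, 66.0]
predicted = [40.2, 45.4, 56.3, 82.8, 75.6, 41.7, 60.1, 47.0, 55.6, 66.7, 67.2, 57.0, 43.6,
             66.3, 55.5, 60.0, 47.6, 45.7, 55.2, 42.2, 45.7, 43.5, 85.5, 48.5, 57.8, 60.4]

# Keep predictions numeric so arithmetic works; format only when printing
results = pd.DataFrame({'Actual': actual, 'Predicted': predicted}, index=index)
results['Residual'] = results['Actual'] - results['Predicted']
results['AbsError'] = results['Residual'].abs()

mae = results['AbsError'].mean()
mse = (results['Residual'] ** 2).mean()
worst = results['AbsError'].idxmax()

print(f"MAE: {mae:.2f}")                  # → MAE: 6.46
print(f"MSE: {mse:.2f}")                  # → MSE: 58.61 (rounded predictions)
print(f"Largest error at index {worst}")  # → Largest error at index 119
print(results.sort_values('AbsError', ascending=False).head(3).round(1))
```

Sorting by absolute error points at the rows the model handles worst (index 119, 41 and 93 here), which is a natural place to start inspecting features such as `Condition` that the model does not use.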
