**Context**

This historical market data set for real estate valuation was collected from Sindian District, New Taipei City, Taiwan. Predicting the house price from these features is a regression problem.

**Data info**

The inputs are as follows:

house_age = the house age (unit: year)

dist_to_nearest_MRT = the distance to the nearest MRT station (unit: meter)

no_stores = the number of convenience stores within walking distance of the house (integer)

latitude = the geographic coordinate, latitude. (unit: degree)

longitude = the geographic coordinate, longitude. (unit: degree)

The output is as follows:

house_price = house price per unit area (unit: 10,000 New Taiwan Dollars/Ping, where Ping is a local unit of area; 1 Ping = 3.3 square meters)
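
To make the unit of house_price more concrete, the following minimal sketch converts a value from 10,000 NTD/Ping to NTD per square meter, using only the conversion factor stated above. The function name `price_per_square_meter` is a hypothetical helper and is not part of the exercise.

```python
# Conversion factor taken from the data description: 1 Ping = 3.3 square meters.
PING_IN_SQUARE_METERS = 3.3


def price_per_square_meter(house_price: float) -> float:
    """Convert a house_price value (10,000 NTD per Ping) to NTD per square meter.

    Hypothetical helper for illustration only.
    """
    ntd_per_ping = house_price * 10_000
    return ntd_per_ping / PING_IN_SQUARE_METERS


# The first row of the data set has house_price = 37.9.
print(round(price_per_square_meter(37.9)))
```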

**Exercise**:  

Examine the relationship between house price per unit area and the number of nearby convenience stores using statsmodels (OLS).

a. Find the intercept of the model. What does it mean?

b. Find the no_stores coefficient of the model. What does it mean?

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("taiwan_real_estate - taiwan_real_estate.csv")

In [3]:
df

,No,house_age,dist_to_nearest_MRT,no_stores,latitude,longitude,house_price
0,1,32.0,84.87882,10,24.98298,121.54024,37.9
1,2,19.5,306.59470,9,24.98034,121.53951,42.2
2,3,13.3,561.98450,5,24.98746,121.54391,47.3
3,4,13.3,561.98450,5,24.98746,121.54391,54.8
4,5,5.0,390.56840,5,24.97937,121.54245,43.1
...,...,...,...,...,...,...,...
409,410,13.7,4082.01500,0,24.94155,121.50381,15.4
410,411,5.6,90.45606,9,24.97433,121.54310,50.0
411,412,18.8,390.96960,7,24.97923,121.53986,40.6
412,413,8.1,104.81010,5,24.96674,121.54067,52.5


In [4]:
df.isnull().sum()

No                     0
house_age              0
dist_to_nearest_MRT    0
no_stores              0
latitude               0
longitude              0
house_price            0
dtype: int64

In [17]:
import statsmodels.api as sm

In [24]:
X = sm.add_constant(df["no_stores"])
y = df["house_price"]

In [25]:
results = sm.OLS(y, X).fit()

In [26]:
results.summary()

Dep. Variable:,house_price,R-squared:,0.326
Model:,OLS,Adj. R-squared:,0.324
Method:,Least Squares,F-statistic:,199.3
Date:,"Wed, 24 Apr 2024",Prob (F-statistic):,3.41e-37
Time:,15:06:42,Log-Likelihood:,-1586.0
No. Observations:,414,AIC:,3176.0
Df Residuals:,412,BIC:,3184.0
Df Model:,1,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,27.1811,0.942,28.857,0.000,25.330,29.033
no_stores,2.6377,0.187,14.118,0.000,2.270,3.005

Omnibus:,171.927,Durbin-Watson:,1.993
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1417.242
Skew:,1.553,Prob(JB):,1.78e-308
Kurtosis:,11.516,Cond. No.,8.87
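
With a single predictor, the const and no_stores coefficients in the table above come from the closed-form least-squares solution: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below illustrates that formula in plain Python. It is not how statsmodels computes the fit internally, and the function `simple_ols` and the toy data are assumptions made for illustration (they are not taken from the data set).

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on the line y = 1 + 2 * x.
toy_x = [0, 1, 2, 3, 4]
toy_y = [1, 3, 5, 7, 9]

toy_intercept, toy_slope = simple_ols(toy_x, toy_y)
print(toy_intercept, toy_slope)
```

Applied to the no_stores and house_price columns, the same formula should agree with the coefficients reported by statsmodels up to floating-point error.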


In [31]:
intercept = results.params["const"]  # look up the intercept by label rather than position
intercept

27.18110478147243

In [32]:
n_stores_coefficient = results.params["no_stores"]  # look up the slope by label rather than position
n_stores_coefficient

2.637653463404371

In [None]:
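# Conclusion

# intercept = 27.18110478147243 means: the predicted house price per unit area
# (in 10,000 NTD/Ping) for a house with no convenience stores nearby (no_stores = 0).

# n_stores_coefficient = 2.637653463404371 means: each additional convenience store
# is associated with an average increase of about 2.64 (10,000 NTD/Ping) in house price.
# no_stores is the only predictor in this model, so no other variables are held constant,
# and the coefficient describes an association, not a causal effect.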
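
As a worked illustration of the interpretation above, the fitted line can be evaluated by hand. The sketch below hard-codes the two coefficients, rounded as they appear in the summary table, and computes the predicted price for a few store counts. The function `predict_price` is a hypothetical helper; with the fitted model, `results.predict` would serve the same purpose.

```python
# Coefficients copied from the summary table (rounded to four decimals).
INTERCEPT = 27.1811  # const
SLOPE = 2.6377       # no_stores


def predict_price(no_stores: int) -> float:
    """Predicted house price (10,000 NTD/Ping) for a given number of stores.

    Hypothetical helper for illustration only.
    """
    return INTERCEPT + SLOPE * no_stores


for n in (0, 5, 10):
    print(n, round(predict_price(n), 2))
```

Going from 0 to 1 store raises the prediction by exactly the slope, and a house with 0 stores is predicted at the intercept.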