In [1]:
import pandas as pd
import numpy as np
from scipy import stats

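# Load the real estate valuation data set.
df = pd.read_csv('Real estate valuation data set.csv')

# Shorten the column names; this assumes the file has exactly these eight columns in this order.
df.columns = ['No', 'Date', 'Age', 'DistMRT', 'Stores', 'Lat', 'Lon', 'Price']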

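This test investigates whether houses with a "High" number of nearby convenience stores (7-10) have significantly higher prices than houses with a "Medium" number of stores (3-6). The two groups are compared with Welch's two-sample $t$-test, which does not assume equal variances.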

In [2]:
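# Prices for houses with a medium (3-6) and a high (7-10) number of nearby stores.
group_med = df[(df['Stores'] >= 3) & (df['Stores'] <= 6)]['Price']
group_high = df[(df['Stores'] >= 7) & (df['Stores'] <= 10)]['Price']

# Welch's t-test (equal_var=False); two-sided by default.
t, p = stats.ttest_ind(group_high, group_med, equal_var=False)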

print(t)
print(p)

4.967395256723803
1.329755471827907e-06



The resulting $p$-value ($\approx 1.33 \times 10^{-6}$) is far below the significance level $\alpha = 0.05$, so the null hypothesis of equal mean prices is rejected. Because `group_high` is passed first, the positive $t$-statistic ($t \approx 4.97$) indicates that the "High Convenience" (7-10 stores) group has the higher mean price than the "Medium Convenience" (3-6 stores) group.
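
The question above is directional ("higher prices"), while `stats.ttest_ind` runs a two-sided test by default. A minimal sketch of the one-sided variant follows. It uses synthetic stand-in arrays (`high_demo`, `med_demo`, both hypothetical and not drawn from the data set) and assumes SciPy 1.6 or later, where the `alternative` keyword is available.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for group_high and group_med (NOT the real data).
rng = np.random.default_rng(0)
high_demo = rng.normal(loc=45.0, scale=10.0, size=60)
med_demo = rng.normal(loc=38.0, scale=12.0, size=120)

# Two-sided Welch test, matching the call used in the notebook.
t_two, p_two = stats.ttest_ind(high_demo, med_demo, equal_var=False)

# One-sided Welch test: the alternative hypothesis is mean(high) > mean(med).
t_one, p_one = stats.ttest_ind(
    high_demo, med_demo, equal_var=False, alternative='greater'
)

print(t_two, p_two)
print(t_one, p_one)
```

The $t$-statistic is the same in both calls. When it is positive, the one-sided $p$-value is half of the two-sided one, so the conclusion reported above would not change under a one-sided test.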
<br>
<br>
____

This test compares the prices of "Old but Convenient" houses (Age > 20 years, DistMRT < 500 m) against "New but Inconvenient" houses (Age < 10 years, DistMRT > 1500 m). The comparison suggests which factor weighs more heavily on valuation in this data: location or building age.

In [3]:
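# Old houses close to an MRT station vs. new houses far from one.
old_near = df[(df['Age'] > 20) & (df['DistMRT'] < 500)]['Price']
new_far = df[(df['Age'] < 10) & (df['DistMRT'] > 1500)]['Price']

# Welch's t-test (equal_var=False); two-sided by default.
t, p = stats.ttest_ind(old_near, new_far, equal_var=False)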

print(old_near.mean())
print(new_far.mean())
print(t)
print(p)

42.345454545454544
28.31904761904762
9.402888277034172
2.6257515494450012e-14


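The extremely low $p$-value ($\approx 2.63 \times 10^{-14}$) is far below $0.05$, confirming a highly significant difference in mean price between the two groups. The "Old but Convenient" houses have the higher mean price ($\approx 42.35$ vs. $\approx 28.32$), which suggests that proximity to an MRT station outweighs building age in this sample.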
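
A very small $p$-value shows that the difference is unlikely under the null hypothesis, but not how large it is relative to the spread of prices. One common way to express that is Cohen's $d$. The sketch below is an illustration only: `cohens_d` is a hypothetical helper that is not part of the notebook, and it uses the pooled standard deviation, which is a simplification given that the Welch test above does not assume equal variances.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Toy inputs (not the real data): both samples have variance 1 and the means differ by 1.
print(cohens_d([2, 3, 4], [1, 2, 3]))  # 1.0
```

In the notebook, the helper would be called as `cohens_d(old_near, new_far)`.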
<br>
<br>
____


This test analyzes whether there is a significant price drop-off within the "walking distance" zone itself. We compare "Immediate Proximity" (DistMRT ≤ 200 m) with "Short Walk" (200 m < DistMRT ≤ 500 m) to check whether the premium decays rapidly even within the first few hundred meters.

In [4]:
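# Houses within 200 m of an MRT station vs. houses between 200 m and 500 m away.
zone_immediate = df[df['DistMRT'] <= 200]['Price']
zone_short = df[(df['DistMRT'] > 200) & (df['DistMRT'] <= 500)]['Price']

# Welch's t-test (equal_var=False); two-sided by default.
t, p = stats.ttest_ind(zone_immediate, zone_short, equal_var=False)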

print(t)
print(p)

2.4314071759323945
0.01602553990303957


The $p$-value ($\approx 0.016$) is less than $0.05$, so the null hypothesis of equal mean prices is rejected. A statistically significant price difference exists between properties in "Immediate Proximity" (≤ 200 m) and those in the "Short Walk" zone (200 m to 500 m). The positive $t$-statistic ($t \approx 2.43$) indicates that the "Immediate Proximity" group has the higher mean price.
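
Because this notebook runs three separate tests at $\alpha = 0.05$, it is worth checking whether the conclusions survive a multiple-comparison adjustment. The sketch below applies a Bonferroni correction to the three $p$-values printed above. The dictionary keys are hypothetical labels chosen for illustration, and Bonferroni is one conservative choice among several correction methods.

```python
# p-values printed by the three tests in this notebook.
p_values = {
    "stores_high_vs_medium": 1.329755471827907e-06,
    "old_near_vs_new_far": 2.6257515494450012e-14,
    "immediate_vs_short_walk": 0.01602553990303957,
}

alpha = 0.05
m = len(p_values)  # number of tests performed

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
adjusted = {name: min(p * m, 1.0) for name, p in p_values.items()}

for name, p_adj in adjusted.items():
    print(name, p_adj, p_adj < alpha)
```

All three adjusted $p$-values remain below $0.05$, although the last one ($\approx 0.048$) only narrowly so.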