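Home-buying decision
- Listing filter: total price (总价), district (所在区域), layout (房型)
- Side-by-side comparison: AHP, the analytic hierarchy process (a composite scoring model)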

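AHP (analytic hierarchy process)
- Build the hierarchy: define the decision goal, break it down into decision factors, and list the candidate options
- Construct the judgment matrix for the decision factors: score each pair of factors by relative importance on a 1~9 scale
- Construct the judgment matrix for the options: record how each option performs on each decision factor
- Compute the decision-factor weights, then combine them with each option's performance to get a total score per option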
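The pairwise scoring step can be sketched as follows. This is a minimal illustration, not the notebook's own code: the helper `build_judgment_matrix` and its `upper` argument are hypothetical names. It fills in only the upper triangle of scores and derives the lower triangle as reciprocals, which is the property a pairwise judgment matrix is expected to have.

```python
import numpy as np

def build_judgment_matrix(upper):
    """Build a reciprocal pairwise-comparison matrix.

    `upper` maps (i, j) with i < j to the importance of factor i relative
    to factor j on the 1~9 scale (use a fraction such as 1/4 when factor j
    is the more important one).
    """
    n = max(j for _, j in upper) + 1
    matrix = np.ones((n, n))           # each factor compared with itself is 1
    for (i, j), score in upper.items():
        matrix[i, j] = score
        matrix[j, i] = 1.0 / score     # reciprocal entry below the diagonal
    return matrix

# Three factors, scored pairwise (illustrative values)
example = build_judgment_matrix({(0, 1): 1/4, (0, 2): 1/6, (1, 2): 1/5})
print(example)
```

Entering only the upper triangle avoids typing a pair of entries that are not reciprocal by mistake.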

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set()
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']

In [2]:
data = pd.read_csv('Shanghai_cleaned.csv')
data.head()

Unnamed: 0,链家编号,总价,均价,房型,面积,小区名称,所在区域,纬度,经度
0,107104000000.0,360.0,39259,3室1厅,91.7,上浦小区,浦东 川沙 外环外,31.197312,121.695243
1,107104000000.0,725.0,31859,4室2厅,227.57,绿地东岸涟城,浦东 临港新城 外环外,30.889207,121.923456
2,107104000000.0,155.0,15611,2室2厅,99.29,绿地金卫新家园（东区）,金山 金山 外环外,30.750241,121.3203
3,107104000000.0,206.0,25333,2室2厅,81.32,丹桂佳苑,浦东 惠南 外环外,31.032386,121.768496
4,107104000000.0,310.0,40302,2室1厅,76.92,虹浦新城南区,闵行 闵浦 外环外,31.051948,121.542567


In [3]:
# 2.5.1 Build the hierarchy

In [4]:
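# Candidate options
# 所在区域 is split on non-breaking spaces ('\xa0'); the first token is the district
data['地段'] = data['所在区域'].str.split('\xa0', expand=True)[0]
# Keep 2室1厅 listings with 总价 below 350 in four central districts
choices = data[(data['总价'] < 350) & (data['房型'] == '2室1厅')
               & (data['地段'].isin(['静安', '徐汇', '长宁', '黄浦']))]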

In [5]:
choices

Unnamed: 0,链家编号,总价,均价,房型,面积,小区名称,所在区域,纬度,经度,地段
735,107104000000.0,333.0,71644,2室1厅,46.48,仙乐小区,长宁 仙霞 内环至中环,31.211407,121.400562,长宁
864,107104000000.0,275.0,56330,2室1厅,48.82,驰骋新苑,静安 彭浦,31.325544,121.46902,静安
1310,107104000000.0,325.0,64395,2室1厅,50.47,馨虹小区,长宁 北新泾 中环至外环,31.212895,121.369958,长宁
1405,107104000000.0,327.0,60951,2室1厅,53.65,保德路425弄,静安 彭浦 中环至外环,31.321523,121.465496,静安
1409,107104000000.0,327.0,56929,2室1厅,57.44,阳曲路650弄,静安 彭浦 中环至外环,31.322481,121.466688,静安
1624,107104000000.0,297.0,54099,2室1厅,54.9,保德公寓,静安 彭浦 中环至外环,31.322477,121.44999,静安
2248,107104000000.0,333.0,64837,2室1厅,51.36,长桥二村,徐汇 长桥 中环至外环,31.142221,121.447552,徐汇
2597,107104000000.0,327.0,59358,2室1厅,55.09,保德公寓,静安 彭浦 中环至外环,31.322477,121.44999,静安
2602,107104000000.0,316.0,55468,2室1厅,56.97,共康四村,静安 彭浦 中环至外环,31.324544,121.441493,静安
2615,107104000000.0,327.0,59358,2室1厅,55.09,保德公寓,静安 彭浦 中环至外环,31.322477,121.44999,静安


In [6]:
choices.shape

(14, 10)

In [7]:
# Decision factors (链家编号 is kept only as an identifier)
cols = ['链家编号','总价','均价','面积']
choices[cols]

Unnamed: 0,链家编号,总价,均价,面积
735,107104000000.0,333.0,71644,46.48
864,107104000000.0,275.0,56330,48.82
1310,107104000000.0,325.0,64395,50.47
1405,107104000000.0,327.0,60951,53.65
1409,107104000000.0,327.0,56929,57.44
1624,107104000000.0,297.0,54099,54.9
2248,107104000000.0,333.0,64837,51.36
2597,107104000000.0,327.0,59358,55.09
2602,107104000000.0,316.0,55468,56.97
2615,107104000000.0,327.0,59358,55.09


In [8]:
# 2.5.2 Decision-factor weights: pairwise scores on a 1~9 scale
weights = pd.DataFrame(columns=cols[1:],index=cols[1:])
weights

Unnamed: 0,总价,均价,面积
总价,,,
均价,,,
面积,,,


In [9]:
# Scratch step: fill each row with its own label to show the row/column layout
for i in cols[1:]:
    weights.loc[i] = i

In [10]:
weights

Unnamed: 0,总价,均价,面积
总价,总价,总价,总价
均价,均价,均价,均价
面积,面积,面积,面积


In [11]:
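# Pairwise judgment matrix: each entry is the importance of the row factor
# relative to the column factor, so mirrored entries are reciprocals
weights = pd.DataFrame([[1, 1/4, 1/6], [4, 1, 1/5], [6, 5, 1]],
                       columns=cols[1:], index=cols[1:])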

In [12]:
weights = weights.astype('float').round(2)

In [13]:
weights

Unnamed: 0,总价,均价,面积
总价,1.0,0.25,0.17
均价,4.0,1.0,0.2
面积,6.0,5.0,1.0


In [14]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit_transform(weights)

array([[0.        , 0.        , 0.        ],
       [0.6       , 0.15789474, 0.03614458],
       [1.        , 1.        , 1.        ]])

In [15]:
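# Compute the weight of each decision factor.
# Note: averaging min-max-scaled rows is a heuristic, not the standard AHP
# weighting (principal eigenvector of the judgment matrix); it gives 总价 a weight of 0.
weights = scaler.fit_transform(weights)
weights = weights.mean(axis=1)
weights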

array([0.        , 0.26467977, 1.        ])
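The weights above come from averaging min-max-scaled rows, which leaves 总价 with a weight of 0. For comparison, here is a minimal sketch of the conventional AHP weighting, which takes the normalized principal eigenvector of the judgment matrix and then checks a consistency ratio. The names `A`, `ahp_weights`, `CI`, `RI`, and `CR` are illustrative and not part of the original notebook. `RI = 0.58` is the commonly tabulated random index for a 3×3 matrix.

```python
import numpy as np

# Judgment matrix from the notebook (rows/columns: 总价, 均价, 面积)
A = np.array([[1, 1/4, 1/6],
              [4, 1,   1/5],
              [6, 5,   1]])

# Principal eigenvector: the eigenvector of the largest real eigenvalue
eigenvalues, eigenvectors = np.linalg.eig(A)
k = np.argmax(eigenvalues.real)
lambda_max = eigenvalues[k].real
w = np.abs(eigenvectors[:, k].real)
ahp_weights = w / w.sum()              # normalize so the weights sum to 1

# Consistency check of the pairwise scores
n = A.shape[0]
CI = (lambda_max - n) / (n - 1)        # consistency index
RI = 0.58                              # assumed random index for n = 3
CR = CI / RI                           # consistency ratio

print(ahp_weights, CR)
```

A consistency ratio above roughly 0.1 is usually taken as a sign that the pairwise scores should be revisited. Weights obtained this way are all positive and sum to 1, so no factor drops out of the final score.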

In [16]:
# 2.5.3 Candidate options: observed data

In [17]:
choices[cols[1:]]

Unnamed: 0,总价,均价,面积
735,333.0,71644,46.48
864,275.0,56330,48.82
1310,325.0,64395,50.47
1405,327.0,60951,53.65
1409,327.0,56929,57.44
1624,297.0,54099,54.9
2248,333.0,64837,51.36
2597,327.0,59358,55.09
2602,316.0,55468,56.97
2615,327.0,59358,55.09


In [18]:
# Min-max scale each factor column to [0, 1] across the candidate options
features = scaler.fit_transform(choices[cols[1:]])
features

array([[0.86567164, 1.        , 0.        ],
       [0.        , 0.12715873, 0.1825273 ],
       [0.74626866, 0.58683386, 0.31123245],
       [0.7761194 , 0.39053861, 0.55928237],
       [0.7761194 , 0.16129952, 0.8549142 ],
       [0.32835821, 0.        , 0.65678627],
       [0.86567164, 0.61202622, 0.38065523],
       [0.7761194 , 0.29974352, 0.67160686],
       [0.6119403 , 0.07802793, 0.81825273],
       [0.7761194 , 0.29974352, 0.67160686],
       [0.82089552, 0.49307495, 0.47659906],
       [0.44776119, 0.12750071, 0.5975039 ],
       [1.        , 0.20370476, 1.        ],
       [0.89552239, 0.24998575, 0.8424337 ]])
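Min-max scaling maps the largest value in each column to 1, so a higher 总价 or 均价 pushes a listing's scaled value up. The notebook does not state the intended direction. If lower prices are meant to be preferable (an assumption), cost-type columns can be flipped after scaling. The sketch below uses a hypothetical helper `scale_features` and the first three candidate rows from the table above.

```python
import pandas as pd

def scale_features(df, cost_cols=()):
    """Min-max scale every column; flip the columns where lower is better.

    Assumes no column is constant (a constant column would divide by zero).
    """
    scaled = (df - df.min()) / (df.max() - df.min())
    for col in cost_cols:
        scaled[col] = 1.0 - scaled[col]   # cheapest option now scores 1
    return scaled

toy = pd.DataFrame({'总价': [333.0, 275.0, 325.0],
                    '均价': [71644, 56330, 64395],
                    '面积': [46.48, 48.82, 50.47]})

scaled = scale_features(toy, cost_cols=['总价', '均价'])
print(scaled)
```

Whether 均价 should count as a cost is itself a judgment call, since a higher unit price can also reflect a better location.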

In [19]:
# 2.5.4 Candidate options: final scores

In [20]:
# Weighted sum of the scaled factors for each option
scores = (features * weights).sum(axis=1)
scores

array([0.26467977, 0.21618365, 0.4665555 , 0.66265004, 0.89760692,
       0.65678627, 0.54264619, 0.75094291, 0.83890514, 0.75094291,
       0.60710603, 0.63125076, 1.05391653, 0.90859987])

In [21]:
scores.max()

1.0539165291605448
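The score is a weighted sum, so the same result can be written as a matrix-vector product. Because the weights here do not sum to 1, the maximum score can exceed 1. Dividing the weights by their sum keeps scores within [0, 1] without changing the ranking. The sketch below reuses two feature rows and the weight vector printed earlier; `scores_matmul` and `normalized` are illustrative names.

```python
import numpy as np

weights = np.array([0.0, 0.26467977, 1.0])
features = np.array([[0.86567164, 1.0,        0.0],
                     [0.7761194,  0.16129952, 0.8549142]])

# Broadcasting multiplies each column by its weight, then rows are summed
scores = (features * weights).sum(axis=1)

# Equivalent matrix-vector product
scores_matmul = features @ weights

# Same ranking, but bounded by 1 because the weights now sum to 1
normalized = features @ (weights / weights.sum())

print(scores, normalized)
```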

In [23]:
scores.tolist()

[0.2646797717184528,
 0.2161836459597481,
 0.4665555002876096,
 0.6626500427576525,
 0.8976069155170348,
 0.6567862714508581,
 0.5426461859532914,
 0.7509429098412479,
 0.8389051443287866,
 0.7509429098412479,
 0.6071060292025899,
 0.6312507596221896,
 1.0539165291605448,
 0.9085998688359043]

In [25]:
choices.shape

(14, 11)

In [30]:
# Take an explicit copy first so the assignment does not raise SettingWithCopyWarning
choices = choices.copy()
choices['score'] = scores


In [28]:
# Ascending order: the highest-scoring options are listed last
choices.sort_values('score')

Unnamed: 0,链家编号,总价,均价,房型,面积,小区名称,所在区域,纬度,经度,地段,score
864,107104000000.0,275.0,56330,2室1厅,48.82,驰骋新苑,静安 彭浦,31.325544,121.46902,静安,0.216184
735,107104000000.0,333.0,71644,2室1厅,46.48,仙乐小区,长宁 仙霞 内环至中环,31.211407,121.400562,长宁,0.26468
1310,107104000000.0,325.0,64395,2室1厅,50.47,馨虹小区,长宁 北新泾 中环至外环,31.212895,121.369958,长宁,0.466556
2248,107104000000.0,333.0,64837,2室1厅,51.36,长桥二村,徐汇 长桥 中环至外环,31.142221,121.447552,徐汇,0.542646
2697,107104000000.0,330.0,62750,2室1厅,52.59,长桥四村,徐汇 长桥 中环至外环,31.141225,121.444537,徐汇,0.607106
2724,107104000000.0,305.0,56336,2室1厅,54.14,虹桥机场新村,长宁 西郊 外环外,31.191168,121.361397,长宁,0.631251
1624,107104000000.0,297.0,54099,2室1厅,54.9,保德公寓,静安 彭浦 中环至外环,31.322477,121.44999,静安,0.656786
1405,107104000000.0,327.0,60951,2室1厅,53.65,保德路425弄,静安 彭浦 中环至外环,31.321523,121.465496,静安,0.66265
2597,107104000000.0,327.0,59358,2室1厅,55.09,保德公寓,静安 彭浦 中环至外环,31.322477,121.44999,静安,0.750943
2615,107104000000.0,327.0,59358,2室1厅,55.09,保德公寓,静安 彭浦 中环至外环,31.322477,121.44999,静安,0.750943
