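# Drawing Conclusions Using Query

In the notebook below, you will use the Pandas query function to investigate two questions about this data. Here are hints for answering each question:
- Question 1: Do wines with higher alcohol content receive better ratings?

To answer this question, use query to create two groups of wine samples:

    Low alcohol (samples with alcohol content below the median)
    High alcohol (samples with alcohol content greater than or equal to the median)

Then find the mean quality rating of each group.
- Question 2: Do sweeter wines (more residual sugar) receive better ratings?

Similarly, use the median to split the samples into two groups by residual sugar, and find the mean quality rating of each group.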
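
As a minimal sketch of the approach described above, the cell below splits a small made-up DataFrame at its median with query and compares the mean rating of each half. The `toy` frame and its values are assumptions for illustration only and are not the wine data; the `@` prefix is the pandas query syntax for referring to a Python variable inside the query string.

```python
import pandas as pd

# Toy data standing in for the wine samples (values are made up for illustration)
toy = pd.DataFrame({
    'alcohol': [9.0, 9.5, 10.0, 11.0, 12.0],
    'quality': [5, 5, 6, 6, 7],
})

# compute the median once instead of typing it in by hand
median_alcohol = toy['alcohol'].median()

# '@' lets the query string refer to a Python variable
low = toy.query('alcohol < @median_alcohol')
high = toy.query('alcohol >= @median_alcohol')

# every sample should land in exactly one of the two groups
assert len(low) + len(high) == len(toy)

# mean quality rating of each group
low_mean = low['quality'].mean()
high_mean = high['quality'].mean()
print(low_mean, high_mean)
```

Referring to the median through a variable avoids copying a rounded value from a previous cell's output into the filter.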

# Drawing Conclusions Using Query

In [1]:
# Load 'winequality_edited.csv', a file you created in a previous section
import pandas as pd

df = pd.read_csv('winequality_edited.csv')
df.head()

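| Unnamed: 0 | alcohol | chlorides | citric_acid | color | density | fixed_acidity | free_sulfur_dioxide | pH | quality | residual_sugar | sulphates | total_sulfur-dioxide | total_sulfur_dioxide | volatile_acidity | acidity_levels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.4 | 0.076 | 0.0 | red | 0.9978 | 7.4 | 11.0 | 3.51 | 5 | 1.9 | 0.56 | 34.0 | NaN | 0.7 | low |
| 1 | 9.8 | 0.098 | 0.0 | red | 0.9968 | 7.8 | 25.0 | 3.2 | 5 | 2.6 | 0.68 | 67.0 | NaN | 0.88 | mod_high |
| 2 | 9.8 | 0.092 | 0.04 | red | 0.997 | 7.8 | 15.0 | 3.26 | 5 | 2.3 | 0.65 | 54.0 | NaN | 0.76 | medium |
| 3 | 9.8 | 0.075 | 0.56 | red | 0.998 | 11.2 | 17.0 | 3.16 | 6 | 1.9 | 0.58 | 60.0 | NaN | 0.28 | mod_high |
| 4 | 9.4 | 0.076 | 0.0 | red | 0.9978 | 7.4 | 11.0 | 3.51 | 5 | 1.9 | 0.56 | 34.0 | NaN | 0.7 | low |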


### Do wines with higher alcoholic content receive better ratings?

In [2]:
# get the median amount of alcohol content
df.alcohol.median()

10.3

In [3]:
# select samples with alcohol content less than the median
low_alcohol = df.query('alcohol < 10.3')

# select samples with alcohol content greater than or equal to the median
high_alcohol = df.query('alcohol >= 10.3')

# ensure these queries included each sample exactly once
num_samples = df.shape[0]
num_samples == low_alcohol['quality'].count() + high_alcohol['quality'].count() # should be True

True

In [4]:
# get mean quality rating for the low alcohol and high alcohol groups
low_alcohol.quality.mean(), high_alcohol.quality.mean()

(5.475920679886686, 6.1460843373493974)

### Do sweeter wines receive better ratings?

In [5]:
# get the median amount of residual sugar
df.residual_sugar.median()

3.0

In [6]:
# select samples with residual sugar less than the median
low_sugar = df.query('residual_sugar < 3')

# select samples with residual sugar greater than or equal to the median
high_sugar = df.query('residual_sugar >= 3')

# ensure these queries included each sample exactly once
num_samples == low_sugar['quality'].count() + high_sugar['quality'].count() # should be True

True

In [7]:
# get mean quality rating for the low sugar and high sugar groups
low_sugar.quality.mean(), high_sugar.quality.mean()

(5.8088007437248219, 5.8278287461773699)
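
Both questions follow the same pattern (take the median, split with query, check that no sample is lost, compare means), so the steps could be wrapped in one function. The sketch below is only an illustration: `mean_quality_by_median_split` is a hypothetical helper name, the sample frame is made up, and the helper assumes the column name is a valid Python identifier so it can be placed directly in the query string.

```python
import pandas as pd

def mean_quality_by_median_split(df, column, target='quality'):
    """Split df at the median of `column`; return (low_mean, high_mean) of `target`.

    Hypothetical helper for illustration; assumes `column` is a valid
    Python identifier (no spaces or hyphens).
    """
    median = df[column].median()
    low = df.query(f'{column} < @median')
    high = df.query(f'{column} >= @median')
    # rows with missing values in `column` match neither filter, so check the counts
    if len(low) + len(high) != len(df):
        raise ValueError(f'some rows were not assigned to a group for {column!r}')
    return low[target].mean(), high[target].mean()

# Made-up sample data, not the wine data set
sample = pd.DataFrame({
    'residual_sugar': [1.0, 2.0, 3.0, 4.0],
    'quality': [5, 6, 6, 7],
})

low_mean, high_mean = mean_quality_by_median_split(sample, 'residual_sugar')
print(low_mean, high_mean)
```

With the real data, the same call could be made once for alcohol and once for residual_sugar, assuming the column names shown in the notebook output.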