# Drawing Conclusions Using Query

Q1: Do wines with higher alcoholic content receive better ratings?
To answer this question, use the pandas query method to create two groups of wine samples:

1. Low alcohol (samples with an alcohol content less than the median)
2. High alcohol (samples with an alcohol content greater than or equal to the median)

Then, find the mean quality rating of each group.

Q2: Do sweeter wines (more residual sugar) receive better ratings?
Similarly, use the median to split the samples into two groups by residual sugar and find the mean quality rating of each group.
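
Both questions follow the same pattern, so the steps can be sketched as one helper. This is a minimal illustration rather than part of the original notebook: the function name `mean_quality_by_median_split` and the toy DataFrame are hypothetical, and the sketch assumes the column name is a valid Python identifier so it can be used inside a query string.

```python
import pandas as pd


def mean_quality_by_median_split(df, column, target='quality'):
    """Split df at the median of `column`; return the mean `target` of each half.

    Hypothetical helper for illustration only.
    """
    median = df[column].median()
    # '@median' lets query() read the local Python variable directly
    low = df.query('{} < @median'.format(column))
    high = df.query('{} >= @median'.format(column))
    return low[target].mean(), high[target].mean()


# Toy data (made up) just to show the call
toy = pd.DataFrame({
    'alcohol': [9.0, 9.5, 10.0, 11.0, 12.0, 13.0],
    'quality': [5, 5, 6, 6, 7, 7],
})
low_mean, high_mean = mean_quality_by_median_split(toy, 'alcohol')
print('low:', low_mean)
print('high:', high_mean)
```

On the real data, the same call would take the wine DataFrame and either 'alcohol' or 'residual_sugar' as the column.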

In [1]:
import pandas as pd

# Load 'winequality_edited.csv', a file you previously created

df = pd.read_csv('winequality_edited.csv')
df.head()

| Unnamed: 0 | fixed_acidity | volatile_acidity | citric_acid | residual_sugar | chlorides | free_sulfur_dioxide | total_sulfur_dioxide | density | pH | sulphates | alcohol | quality | color | acidity_levels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 | white | 25% |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 | white | 75% |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 | white | 75% |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 | white | 50% |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 | white | 50% |


### Do wines with higher alcoholic content receive better ratings?

In [2]:
# get the median amount of alcohol content
med_alcohol = df['alcohol'].median()
med_alcohol

10.300000000000001

In [3]:
# select samples with alcohol content less than the median
# ('@' references the Python variable directly, avoiding float-to-string formatting)
low_alcohol = df.query('alcohol < @med_alcohol')

# select samples with alcohol content greater than or equal to the median
high_alcohol = df.query('alcohol >= @med_alcohol')

# ensure these queries included each sample exactly once
num_samples = df.shape[0]
# should be True
num_samples == low_alcohol['quality'].count() + high_alcohol['quality'].count()

True
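
The count comparison above confirms that the group sizes add up. A slightly stricter check, sketched here on made-up data, compares row labels directly to confirm that the two groups share no rows and together cover every row. The toy DataFrame and the variable names are assumptions for illustration only.

```python
import pandas as pd

# Toy data (made up) standing in for the wine samples
toy = pd.DataFrame({'alcohol': [9.0, 9.5, 10.0, 11.0, 12.0],
                    'quality': [5, 5, 6, 6, 7]})
med = toy['alcohol'].median()

low = toy.query('alcohol < @med')
high = toy.query('alcohol >= @med')

# No row label appears in both groups
no_overlap = set(low.index).isdisjoint(high.index)
# Every row label appears in one of the groups
full_cover = set(low.index) | set(high.index) == set(toy.index)
print(no_overlap, full_cover)
```

Rows with a missing value in the split column would fail both comparisons and so would show up as a coverage failure here.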

In [4]:
# get mean quality rating for the low alcohol and high alcohol groups
print('low:', low_alcohol['quality'].mean())
print('high:', high_alcohol['quality'].mean())

low: 5.47592067989
high: 6.14608433735


### Do sweeter wines receive better ratings?

In [5]:
# get the median amount of residual sugar
med_residual_sugar = df['residual_sugar'].median()

In [6]:
# select samples with residual sugar less than the median
# ('@' references the Python variable directly, avoiding float-to-string formatting)
low_sugar = df.query('residual_sugar < @med_residual_sugar')

# select samples with residual sugar greater than or equal to the median
high_sugar = df.query('residual_sugar >= @med_residual_sugar')

# ensure these queries included each sample exactly once
num_samples == low_sugar['quality'].count() + high_sugar['quality'].count() # should be True

True

In [7]:
# get mean quality rating for the low sugar and high sugar groups
print('low:', low_sugar['quality'].mean())
print('high:', high_sugar['quality'].mean())

low: 5.80880074372
high: 5.82782874618
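
To compare the two results side by side, the gap between the group means can be computed from the values printed in this notebook. The numbers below are copied from those outputs; the variable names are illustrative, and this sketch does not assess whether either gap is statistically meaningful.

```python
# Mean quality ratings copied from the notebook outputs
alcohol_low, alcohol_high = 5.47592067989, 6.14608433735
sugar_low, sugar_high = 5.80880074372, 5.82782874618

# Difference between the high group and the low group for each feature
alcohol_gap = alcohol_high - alcohol_low
sugar_gap = sugar_high - sugar_low

print('alcohol gap: {:.3f}'.format(alcohol_gap))  # alcohol gap: 0.670
print('sugar gap: {:.3f}'.format(sugar_gap))      # sugar gap: 0.019
```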
