# Drawing Conclusions Using Query
In the notebook below, you're going to investigate two questions about this data using the pandas DataFrame query method. Here are tips for answering each question:

## Q1: Do wines with higher alcoholic content receive better ratings?
To answer this question, use query to create two groups of wine samples:

* Low alcohol (samples with an alcohol content less than the median)
* High alcohol (samples with an alcohol content greater than or equal to the median)

Then, find the mean quality rating of each group.

## Q2: Do sweeter wines (more residual sugar) receive better ratings?
Similarly, use the median to split the samples into two groups by residual sugar and find the mean quality rating of each group.
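
The median split described above can be sketched on a small made-up DataFrame. The `toy` frame and its values are illustrative assumptions, not rows from the wine data. The `@` prefix lets query read a local Python variable, so the median does not have to be typed in by hand.

```python
import pandas as pd

# Tiny made-up sample; the values are illustrative, not taken from the wine data.
toy = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 11.0, 12.0],
    "quality": [5, 5, 6, 6, 7],
})

# Compute the median once, then reference it inside query() with the @ prefix.
alcohol_median = toy["alcohol"].median()

low = toy.query("alcohol < @alcohol_median")
high = toy.query("alcohol >= @alcohol_median")

# Mean quality rating of each group.
print(low["quality"].mean(), high["quality"].mean())
```

Using a strict `<` for one group and `>=` for the other means a sample exactly at the median lands in the high group only.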

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load 'winequality_edited.csv,' a file you previously created
# in this workspace and worked with in the concepts
# "Appending Data(cont.)" and "Exploring with Visuals"

df = pd.read_csv("winequality_edited.csv")
df

Unnamed: 0,fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality,color,total_sulfur_dioxide.1,acidity_levels
0,7.4,0.70,0.00,1.9,0.076,11.0,34.0,0.99780,3.51,0.56,9.4,5,red,,75
1,7.8,0.88,0.00,2.6,0.098,25.0,67.0,0.99680,3.20,0.68,9.8,5,red,,25%
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.99700,3.26,0.65,9.8,5,red,,50%
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.99800,3.16,0.58,9.8,6,red,,25%
4,7.4,0.70,0.00,1.9,0.076,11.0,34.0,0.99780,3.51,0.56,9.4,5,red,,75
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6492,6.2,0.21,0.29,1.6,0.039,24.0,,0.99114,3.27,0.50,11.2,6,white,92.0,50%
6493,6.6,0.32,0.36,8.0,0.047,57.0,,0.99490,3.15,0.46,9.6,5,white,168.0,25%
6494,6.5,0.24,0.19,1.2,0.041,30.0,,0.99254,2.99,0.46,9.4,6,white,111.0,min
6495,5.5,0.29,0.30,1.1,0.022,20.0,,0.98869,3.34,0.38,12.8,7,white,110.0,75


### Do wines with higher alcoholic content receive better ratings?

In [5]:
# get the median of each numeric column (the alcohol row is the one needed here)
df.median(numeric_only=True)

fixed_acidity               7.00000
volatile_acidity            0.29000
citric_acid                 0.31000
residual_sugar              3.00000
chlorides                   0.04700
free_sulfur_dioxide        29.00000
total_sulfur_dioxide       38.00000
density                     0.99489
pH                          3.21000
sulphates                   0.51000
alcohol                    10.30000
quality                     6.00000
total_sulfur_dioxide.1    134.00000
dtype: float64

In [15]:
# select samples with alcohol content less than the median
low_alcohol = df.query('alcohol < 10.3')

# select samples with alcohol content greater than or equal to the median
high_alcohol = df.query('alcohol >= 10.3')

# ensure these queries included each sample exactly once
num_samples = df.shape[0]
num_samples == low_alcohol['quality'].count() + high_alcohol['quality'].count() # should be True

True
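
The check above passes because every sample has an alcohol value. A minimal sketch of the case it guards against, using a made-up frame with one missing value: a NaN compares as False against both conditions, so that row would fall into neither group.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing alcohol value (illustrative only).
toy = pd.DataFrame({
    "alcohol": [9.0, np.nan, 11.0],
    "quality": [5, 6, 7],
})

low = toy.query("alcohol < 10")
high = toy.query("alcohol >= 10")

# The NaN row satisfies neither condition, so it is in neither group.
covered = len(low) + len(high)
print(covered, len(toy))
```

When the two counts differ, the gap is the number of rows dropped by the split.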

In [16]:
# get the mean of each numeric column for the low alcohol group (see the quality row)
low_alcohol.mean(numeric_only=True)


fixed_acidity               7.299496
volatile_acidity            0.355777
citric_acid                 0.315244
residual_sugar              6.955461
chlorides                   0.064714
free_sulfur_dioxide        33.355052
total_sulfur_dioxide       53.912114
density                     0.996455
pH                          3.201350
sulphates                   0.530850
alcohol                     9.485463
quality                     5.475921
total_sulfur_dioxide.1    157.078801
dtype: float64

In [17]:
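# get the mean of each numeric column for the high alcohol group (see the quality row)
high_alcohol.mean(numeric_only=True)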

fixed_acidity               7.134744
volatile_acidity            0.324248
citric_acid                 0.321877
residual_sugar              3.996145
chlorides                   0.047727
free_sulfur_dioxide        27.817470
total_sulfur_dioxide       38.187583
density                     0.993014
pH                          3.234913
sulphates                   0.531669
alcohol                    11.454793
quality                     6.146084
total_sulfur_dioxide.1    121.307647
dtype: float64

In [20]:
low_alcohol.quality.mean(), high_alcohol.quality.mean()

(5.475920679886686, 6.146084337349397)
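
As an alternative to two separate queries, the same comparison can be sketched with a label column and groupby. This is a different technique from the query approach used in this notebook. The `toy` frame and the "low"/"high" labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Made-up sample (illustrative values, not the wine data).
toy = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 11.0, 12.0],
    "quality": [5, 5, 6, 6, 7],
})

cutoff = toy["alcohol"].median()

# Label each row by which side of the median it falls on.
labels = np.where(toy["alcohol"] < cutoff, "low", "high")

# Mean quality per label, computed in one pass.
means = toy.groupby(labels)["quality"].mean()
print(means)
```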

### Do sweeter wines receive better ratings?

In [21]:
# get the median amount of residual sugar
df.residual_sugar.median()

3.0

In [24]:
# select samples with residual sugar less than the median
low_sugar = df.query('residual_sugar < 3.0')

# select samples with residual sugar greater than or equal to the median
high_sugar = df.query('residual_sugar >= 3.0')

# ensure these queries included each sample exactly once
num_samples == low_sugar['quality'].count() + high_sugar['quality'].count() # should be True

True

In [25]:
# get mean quality rating for the low sugar and high sugar groups
low_sugar.quality.mean(), high_sugar.quality.mean()

(5.808800743724822, 5.82782874617737)
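
Both questions follow the same steps, so they could be wrapped in one function. The sketch below is a hypothetical helper, not part of the notebook. The name `mean_quality_by_median_split` and the toy data are assumptions, and it assumes the column name is a valid Python identifier so it can be used directly in a query string.

```python
import pandas as pd

def mean_quality_by_median_split(df, column, target="quality"):
    """Hypothetical helper: split df at the median of `column` and
    return (mean of target in the low group, mean in the high group)."""
    cutoff = df[column].median()
    low = df.query(f"{column} < @cutoff")    # strictly below the median
    high = df.query(f"{column} >= @cutoff")  # at or above the median
    return low[target].mean(), high[target].mean()

# Made-up sample (illustrative values, not the wine data).
toy = pd.DataFrame({
    "residual_sugar": [1.0, 2.0, 3.0, 4.0],
    "quality": [5, 6, 6, 7],
})

result = mean_quality_by_median_split(toy, "residual_sugar")
print(result)
```

On the wine data, the same call could be made once with 'alcohol' and once with 'residual_sugar'.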