# **Wine Dataset Analysis**

## **1. Importing Libraries and Reading the CSV Files**

In [1]:
# Import necessary libraries
import pandas as pd

# Read the red and white wine datasets
red_wine = pd.read_csv("winequality-red.csv", sep=';')
white_wine = pd.read_csv("winequality-white.csv", sep=';')

# Display the first few rows to confirm
print("Red Wine Dataset:")
display(red_wine.head())
print("White Wine Dataset:")
display(white_wine.head())

Red Wine Dataset:


| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


White Wine Dataset:


| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |


## **2. Average Ratings Comparison between Red and White Wines**

In [2]:
# Calculate average ratings for red and white wines
red_avg_quality = red_wine['quality'].mean()
white_avg_quality = white_wine['quality'].mean()

# Display the results
print(f"Average Quality of Red Wines: {red_avg_quality}")
print(f"Average Quality of White Wines: {white_avg_quality}")

Average Quality of Red Wines: 5.6360225140712945
Average Quality of White Wines: 5.87790935075541


**Observation:** 
- The average rating for red wines is 5.64.
- The average rating for white wines is 5.88.
- On average, white wines in this dataset are rated about 0.24 points higher than red wines. The means alone do not explain the gap; they only show that white wines received somewhat more favorable evaluations in this sample.
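
A mean on its own hides how the ratings are spread. As a minimal sketch, the comparison above could be extended with the median, standard deviation, and row count for each type. The toy ratings and the `rating_summary` helper below are hypothetical stand-ins for the notebook's real `quality` columns.

```python
import pandas as pd

# Hypothetical toy ratings standing in for red_wine['quality'] and white_wine['quality'].
red_quality = pd.Series([5, 5, 6, 5, 7, 6], name="quality")
white_quality = pd.Series([6, 6, 5, 7, 6, 8], name="quality")


def rating_summary(quality: pd.Series) -> pd.Series:
    """Return mean, median, standard deviation and count for a ratings column."""
    return quality.agg(["mean", "median", "std", "count"])


# One column per wine type, one row per statistic.
summary = pd.DataFrame({
    "red": rating_summary(red_quality),
    "white": rating_summary(white_quality),
})
print(summary)
```

With the real data, the same helper could be called as `rating_summary(red_wine['quality'])`.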

## **3. Mean and Standard Deviation for Each Feature**

In [3]:
# Calculate mean and standard deviation for each feature
red_stats = red_wine.describe().T[['mean', 'std']]
white_stats = white_wine.describe().T[['mean', 'std']]

# Display the statistics
print("Red Wine Statistics (Mean & Std):")
display(red_stats)

print("White Wine Statistics (Mean & Std):")
display(white_stats)

Red Wine Statistics (Mean & Std):


| | mean | std |
|---|---|---|
| fixed acidity | 8.319637 | 1.741096 |
| volatile acidity | 0.527821 | 0.17906 |
| citric acid | 0.270976 | 0.194801 |
| residual sugar | 2.538806 | 1.409928 |
| chlorides | 0.087467 | 0.047065 |
| free sulfur dioxide | 15.874922 | 10.460157 |
| total sulfur dioxide | 46.467792 | 32.895324 |
| density | 0.996747 | 0.001887 |
| pH | 3.311113 | 0.154386 |
| sulphates | 0.658149 | 0.169507 |


White Wine Statistics (Mean & Std):


| | mean | std |
|---|---|---|
| fixed acidity | 6.854788 | 0.843868 |
| volatile acidity | 0.278241 | 0.100795 |
| citric acid | 0.334192 | 0.12102 |
| residual sugar | 6.391415 | 5.072058 |
| chlorides | 0.045772 | 0.021848 |
| free sulfur dioxide | 35.308085 | 17.007137 |
| total sulfur dioxide | 138.360657 | 42.498065 |
| density | 0.994027 | 0.002991 |
| pH | 3.188267 | 0.151001 |
| sulphates | 0.489847 | 0.114126 |


**Observation:** 
- Acidity: Red wine has higher fixed acidity (mean = 8.32 vs 6.85) and volatile acidity (mean = 0.53 vs 0.28) than white wine. Its mean pH is nevertheless slightly higher (3.31 vs 3.19), so the two types differ more in which acids dominate than in overall acidity.
- Residual Sugar: White wine contains considerably more residual sugar (mean = 6.39 vs 2.54) and varies far more (std = 5.07 vs 1.41), so white wines in this sample are sweeter on average.
- Sulfur Dioxide: White wine has higher free (mean = 35.31 vs 15.87) and total sulfur dioxide (mean = 138.36 vs 46.47) levels, which may reflect different preservation practices for the two wine types.
- Density: The density of red wine (mean = 0.997) is slightly higher than that of white wine (mean = 0.994); the difference is about 0.003.
- Alcohol Content: White wine has a slightly higher alcohol content (mean = 10.51) compared to red wine (mean = 10.42).
- Quality Ratings: White wine also receives higher average quality ratings (mean = 5.88) compared to red wine (mean = 5.64).
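
Because the features are measured in different units, raw gaps between means are hard to rank against each other. One option, sketched here, is to divide each gap by a pooled standard deviation. The means and standard deviations are copied from the two tables above; averaging the two variances without weighting by sample size is a simplifying assumption, and the names `stats`, `pooled_std` and `standardized_diff` are illustrative.

```python
import pandas as pd

# Means and standard deviations copied from the describe() output shown above.
stats = pd.DataFrame(
    {
        "red_mean": [8.319637, 2.538806, 46.467792],
        "red_std": [1.741096, 1.409928, 32.895324],
        "white_mean": [6.854788, 6.391415, 138.360657],
        "white_std": [0.843868, 5.072058, 42.498065],
    },
    index=["fixed acidity", "residual sugar", "total sulfur dioxide"],
)

# Simplifying assumption: average the two variances without weighting by sample size.
pooled_std = ((stats["red_std"] ** 2 + stats["white_std"] ** 2) / 2) ** 0.5

# Positive values mean red is higher; negative values mean white is higher.
standardized_diff = (stats["red_mean"] - stats["white_mean"]) / pooled_std
print(standardized_diff.round(2))
```

On this scale, a value near 1 in magnitude means the two types differ by roughly one standard deviation for that feature.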

## **4. Correlation Analysis and Features with Highest Correlation with Quality**

In [4]:
# Calculate correlations
red_corr = red_wine.corr()['quality'].sort_values(ascending=False)
white_corr = white_wine.corr()['quality'].sort_values(ascending=False)

# Display the top correlated features
print("Top Correlated Features with Quality (Red Wine):")
print(red_corr)

print("\nTop Correlated Features with Quality (White Wine):")
print(white_corr)

Top Correlated Features with Quality (Red Wine):
quality                 1.000000
alcohol                 0.476166
sulphates               0.251397
citric acid             0.226373
fixed acidity           0.124052
residual sugar          0.013732
free sulfur dioxide    -0.050656
pH                     -0.057731
chlorides              -0.128907
density                -0.174919
total sulfur dioxide   -0.185100
volatile acidity       -0.390558
Name: quality, dtype: float64

Top Correlated Features with Quality (White Wine):
quality                 1.000000
alcohol                 0.435575
pH                      0.099427
sulphates               0.053678
free sulfur dioxide     0.008158
citric acid            -0.009209
residual sugar         -0.097577
fixed acidity          -0.113663
total sulfur dioxide   -0.174737
volatile acidity       -0.194723
chlorides              -0.209934
density                -0.307123
Name: quality, dtype: float64


**Observation:**
- Red Wine:
- 1. Alcohol Content: Strongest positive correlation with quality (0.476), so higher-alcohol red wines tend to receive higher ratings.
- 2. Sulphates and Citric Acid: Weak positive correlations (0.251 and 0.226, respectively).
- 3. Volatile Acidity: Strongest negative correlation (-0.391), so red wines with more volatile acidity tend to be rated lower.
- White Wine:
- 1. Alcohol Content: Strongest positive correlation (0.436), as with red wine.
- 2. Weak Correlations: No other feature has a positive correlation above 0.1; pH is the highest at 0.099.
- 3. Density and Chlorides: Strongest negative correlations (-0.307 and -0.210), so denser and saltier white wines tend to be rated lower.
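
For reference, `DataFrame.corr()` computes the Pearson coefficient by default. The sketch below reproduces it by hand on a hypothetical toy sample and also computes a rank-based (Spearman) coefficient as Pearson on ranks, which may suit an ordinal score such as `quality`. The toy values are invented for illustration only.

```python
import pandas as pd

# Hypothetical toy sample; the notebook's real columns would be used in practice.
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
    "quality": [5, 5, 6, 6, 7, 8],
})

# Pearson correlation as computed by pandas (the default method).
pearson = toy["alcohol"].corr(toy["quality"])

# The same coefficient by hand: covariance divided by the product of spreads.
x = toy["alcohol"] - toy["alcohol"].mean()
y = toy["quality"] - toy["quality"].mean()
manual_pearson = (x * y).sum() / ((x ** 2).sum() * (y ** 2).sum()) ** 0.5

# Spearman correlation is Pearson applied to ranks (ties get average ranks).
spearman = toy["alcohol"].rank().corr(toy["quality"].rank())

print(f"Pearson: {pearson:.3f}, manual: {manual_pearson:.3f}, Spearman: {spearman:.3f}")
```

Neither coefficient says anything about cause; both only describe how the two columns move together.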

## **5. Count of High-Quality Wines (Quality > 7)**

In [5]:
# Count high-quality wines
high_quality_red = red_wine[red_wine['quality'] > 7].shape[0]
high_quality_white = white_wine[white_wine['quality'] > 7].shape[0]

print(f"Number of High-Quality Red Wines: {high_quality_red}")
print(f"Number of High-Quality White Wines: {high_quality_white}")

Number of High-Quality Red Wines: 18
Number of High-Quality White Wines: 180


**Observation:** 
- Ten times as many white wines (180) as red wines (18) are rated above 7. Raw counts are not directly comparable, however, because the two datasets differ in size; the share of each dataset is a fairer measure.
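
The count above can also be taken directly from a boolean mask, which makes the threshold explicit. This is a minimal sketch on a hypothetical toy ratings column; note that `> 7` excludes wines rated exactly 7, whereas `>= 7` would include them.

```python
import pandas as pd

# Hypothetical toy ratings; the real notebook would use red_wine['quality'].
quality = pd.Series([5, 6, 7, 8, 6, 9, 7, 5])

# Boolean mask: True where the rating is strictly above 7 (that is, 8 or more).
is_high = quality > 7

# True counts as 1, so the sum is the number of high-quality wines.
high_count = int(is_high.sum())

# The mean of a boolean mask is the fraction of True values.
high_share = is_high.mean() * 100

# An inclusive threshold also counts wines rated exactly 7.
high_count_inclusive = int((quality >= 7).sum())

print(high_count, high_share, high_count_inclusive)
```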

## **6. Percentage of High-Quality Wines**

In [6]:
# Calculate percentages of high-quality wines
total_red_wines = red_wine.shape[0]
total_white_wines = white_wine.shape[0]

high_quality_red_pct = (high_quality_red / total_red_wines) * 100
high_quality_white_pct = (high_quality_white / total_white_wines) * 100

print(f"High-Quality Red Wine Percentage: {high_quality_red_pct:.2f}%")
print(f"High-Quality White Wine Percentage: {high_quality_white_pct:.2f}%")

High-Quality Red Wine Percentage: 1.13%
High-Quality White Wine Percentage: 3.67%


**Observation:** 
- Wines rated above 7 make up 1.13% of the red dataset and 3.67% of the white dataset, so the share is roughly three times higher for white wines. This describes the ratings in this sample only; it does not by itself show that white wine is produced under better conditions or practices.
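
The arithmetic behind these percentages can be checked by hand. The sketch below assumes the standard row counts of the two UCI files (1,599 red and 4,898 white), which the notebook does not print; the high-quality counts come from the previous section.

```python
# Counts of wines rated above 7, taken from the previous section's output.
high_red, high_white = 18, 180

# Assumed row counts of the standard UCI files (not printed in this notebook).
total_red, total_white = 1599, 4898

red_pct = high_red / total_red * 100
white_pct = high_white / total_white * 100

# How many times larger the white share is than the red share.
share_ratio = white_pct / red_pct

print(f"red: {red_pct:.2f}%, white: {white_pct:.2f}%, ratio: {share_ratio:.2f}")
```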

## **7. Comparative Feature Analysis of High-Quality vs Low-Quality Wines**

In [7]:
# Separate high and low-quality wines.
# Note: wines rated exactly 7 fall into neither group.
# The *_df names avoid overwriting the integer counts computed in cell 5.
high_quality_red_df = red_wine[red_wine['quality'] > 7]
low_quality_red = red_wine[red_wine['quality'] < 7]

high_quality_white_df = white_wine[white_wine['quality'] > 7]
low_quality_white = white_wine[white_wine['quality'] < 7]

# Difference in feature means (high-quality minus low-quality)
red_feature_comparison = high_quality_red_df.mean() - low_quality_red.mean()
white_feature_comparison = high_quality_white_df.mean() - low_quality_white.mean()

print("Feature Comparison (Red Wines):")
print(red_feature_comparison)

print("\nFeature Comparison (White Wines):")
print(white_feature_comparison)

Feature Comparison (Red Wines):
fixed acidity            0.329836
volatile acidity        -0.123689
citric acid              0.136704
residual sugar           0.065658
chlorides               -0.020836
free sulfur dioxide     -2.894436
total sulfur dioxide   -14.841373
density                 -0.001647
pH                      -0.047394
sulphates                0.123024
alcohol                  1.843407
quality                  2.591172
dtype: float64

Feature Comparison (White Wines):
fixed acidity           -0.212261
volatile acidity        -0.003829
citric acid             -0.008272
residual sugar          -1.075145
chlorides               -0.009864
free sulfur dioxide      1.110451
total sulfur dioxide   -16.099600
density                 -0.002259
pH                       0.040320
sulphates               -0.001337
alcohol                  1.385896
quality                  2.507976
dtype: float64


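**Observation:** The values above are differences in feature means (high-quality minus low-quality), in each feature's own units. They are not correlation coefficients.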

__Red Wines__
- 1. Alcohol: Mean alcohol is 1.84 units higher in high-quality red wines, the largest positive mean difference among the chemical features.
- 2. Quality: The 2.59 difference in mean quality follows directly from how the groups were defined (ratings above 7 versus below 7) and carries no additional information.
- 3. Fixed Acidity: Slightly higher on average in high-quality red wines (+0.33).
- 4. Volatile Acidity: Lower on average in high-quality red wines (-0.12), consistent with its negative correlation with quality.
- 5. Sulfur Dioxide: Both free and total sulfur dioxide are lower on average in high-quality red wines (-2.89 and -14.84). These differences are in raw units, so their size cannot be compared directly with the other features.

__White Wines__
- 1. Quality: The 2.51 difference in mean quality again reflects the group definition rather than a finding.
- 2. Free Sulfur Dioxide: Slightly higher on average in high-quality white wines (+1.11), the opposite direction to red wines (-2.89).
- 3. Alcohol: Mean alcohol is 1.39 units higher in high-quality white wines, a smaller gap than in red wines (+1.84).
- 4. Residual Sugar: Lower on average in high-quality white wines (-1.08), whereas in red wines it is nearly unchanged (+0.07).

__Summary__
- In both wine types, high-quality wines have higher mean alcohol and lower mean total sulfur dioxide. Free sulfur dioxide and residual sugar differ in opposite directions between red and white wines, which supports analysing each type separately.
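
Raw mean differences depend on each feature's units, so a large number does not necessarily mean a large effect. As a sketch, dividing each difference by that feature's overall standard deviation puts the features on a common scale. The toy data below is hypothetical and chosen only to show how the ranking can change.

```python
import pandas as pd

# Hypothetical toy data; the real notebook would use red_wine or white_wine.
wines = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 10.5, 12.0, 12.5],
    "total sulfur dioxide": [60.0, 45.0, 70.0, 50.0, 30.0, 40.0],
    "quality": [5, 5, 6, 6, 8, 8],
})

# Leave out 'quality' itself, since it defines the two groups.
features = wines.drop(columns="quality")
high = features[wines["quality"] > 7]
low = features[wines["quality"] < 7]

# Difference in means, in each feature's own units.
raw_diff = high.mean() - low.mean()

# The same difference expressed in standard deviations of each feature.
scaled_diff = raw_diff / features.std()

print(pd.DataFrame({"raw": raw_diff, "scaled": scaled_diff}))
```

In this toy sample, total sulfur dioxide has the larger raw difference, while alcohol has the larger difference once both are scaled.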

## **8. Features with the Highest Percentage Change**

In [8]:
# Calculate percentage change between high and low-quality wines
red_pct_change = (red_feature_comparison / low_quality_red.mean()) * 100
white_pct_change = (white_feature_comparison / low_quality_white.mean()) * 100

# Identify features with the highest percentage change
top_red_change = red_pct_change.sort_values(ascending=False)
top_white_change = white_pct_change.sort_values(ascending=False)

print("Top Percentage Change Features (Red Wines):")
print(top_red_change)

print("\nTop Percentage Change Features (White Wines):")
print(top_white_change)

Top Percentage Change Features (Red Wines):
citric acid             53.734621
quality                 47.906355
sulphates               19.080735
alcohol                 17.982642
fixed acidity            4.004404
residual sugar           2.613635
density                 -0.165213
pH                      -1.429857
free sulfur dioxide    -17.897589
volatile acidity       -22.611339
chlorides              -23.337962
total sulfur dioxide   -30.736506
dtype: float64

Top Percentage Change Features (White Wines):
quality                 45.435974
alcohol                 13.500893
free sulfur dioxide      3.126505
pH                       1.267583
density                 -0.227169
sulphates               -0.274532
volatile acidity        -1.358933
citric acid             -2.458574
fixed acidity           -3.080442
total sulfur dioxide   -11.339110
residual sugar         -16.038614
chlorides              -20.602883
dtype: float64


**Observation:**  

__Red Wines__
- 1. Quality: The 47.91% change in "quality" is a consequence of splitting the wines by their rating, not a finding about the wines.
- 2. Citric Acid: With a change of 53.73%, "citric acid" shows the largest relative difference: its mean is about 54% higher in high-quality red wines than in low-quality ones.
- 3. Sulphates and Alcohol: "Sulphates" and "alcohol" are also higher in high-quality red wines (19.08% and 17.98%, respectively).
- 4. Lower Levels in High-Quality Wines: "Total sulfur dioxide", "chlorides" and "volatile acidity" are lower by 30.74%, 23.34% and 22.61%, respectively. This is a comparison between two groups within one sample, not a trend over time.

__White Wines__
- 1. Quality: As with red wines, the 45.44% change in "quality" only reflects how the groups were defined.
- 2. Alcohol: "Alcohol" is 13.50% higher in high-quality white wines, the largest positive change among the chemical features.
- 3. Free Sulfur Dioxide, pH and Density: "Free sulfur dioxide" (+3.13%), "pH" (+1.27%) and "density" (-0.23%) are nearly the same in both groups.
- 4. Lower Levels in High-Quality Wines: "Chlorides" (-20.60%), "residual sugar" (-16.04%) and "total sulfur dioxide" (-11.34%) show the largest decreases, while "volatile acidity" barely changes (-1.36%).

__Summary__
- Setting aside "quality" itself, citric acid shows the largest relative difference for red wines and chlorides for white wines. In both types, high-quality wines have more alcohol and less chlorides and total sulfur dioxide on average. These are descriptive differences within this sample; they do not establish causes or industry-wide trends.
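
Each percentage above is `(high_mean - low_mean) / low_mean * 100`. As a sketch of one way to read the table, the snippet below drops `quality` (which defines the groups) and ranks the remaining features by the size of the change regardless of sign. The values are copied from the red-wine output above; the names `red_pct_from_output` and `ranked` are illustrative.

```python
import pandas as pd

# Percentage changes copied from the red-wine output above.
red_pct_from_output = pd.Series({
    "citric acid": 53.734621,
    "quality": 47.906355,
    "sulphates": 19.080735,
    "alcohol": 17.982642,
    "fixed acidity": 4.004404,
    "residual sugar": 2.613635,
    "density": -0.165213,
    "pH": -1.429857,
    "free sulfur dioxide": -17.897589,
    "volatile acidity": -22.611339,
    "chlorides": -23.337962,
    "total sulfur dioxide": -30.736506,
})

# 'quality' defines the groups, so its change is not informative.
chemical_only = red_pct_from_output.drop("quality")

# Rank by magnitude so large decreases are not buried at the bottom.
ranked = chemical_only.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked.head(4))
```

Sorting by absolute value places the large decreases next to the large increases instead of at opposite ends of the list.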