## Red and White Wine Classification ##
https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009 <br>
https://www.kaggle.com/piyushagni5/white-wine-quality?select=winequality.names

Parameters:
 - Fixed Acidity. Most acids involved with wine are fixed or nonvolatile (do not evaporate readily).
 - Volatile Acidity. The amount of acetic acid in wine, which at high levels can lead to an unpleasant, vinegar taste.
 - Citric Acid. Found in small quantities, citric acid can add 'freshness' and flavor to wines.
 - Residual Sugar. The amount of sugar remaining after fermentation stops. It's rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet.
 - Chlorides. The amount of salt in the wine.
 - Free Sulfur Dioxide. The free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine.
 - Total Sulfur Dioxide. Amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine.
 - Density. The density of wine is close to that of water, depending on the percent alcohol and sugar content.
 - pH. Describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale.
 - Sulphates. A wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant.
 - Alcohol. The percent alcohol content of the wine.
 - Quality. Output variable (based on sensory data, score between 0 and 10).
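
The ranges described in the list above can double as a quick sanity check once the data is loaded. The following is a minimal sketch, not part of the original notebook: `check_wine_ranges` is a hypothetical helper, the column names are assumed to match the CSV headers, and the two-row frame is stand-in data whose second row is deliberately inconsistent.

```python
import pandas as pd


def check_wine_ranges(data: pd.DataFrame) -> dict:
    """Return, per rule, the number of rows that violate it."""
    return {
        # pH is defined on a scale from 0 to 14.
        "pH outside 0-14": int((~data["pH"].between(0, 14)).sum()),
        # Quality is a sensory score between 0 and 10.
        "quality outside 0-10": int((~data["quality"].between(0, 10)).sum()),
        # Total SO2 is free plus bound SO2, so free cannot exceed total.
        "free SO2 above total SO2": int(
            (data["free sulfur dioxide"] > data["total sulfur dioxide"]).sum()
        ),
    }


# Stand-in data: the second row is made up to trigger two of the rules.
sample = pd.DataFrame({
    "pH": [3.16, 15.0],
    "quality": [4, 6],
    "free sulfur dioxide": [20.0, 300.0],
    "total sulfur dioxide": [139.0, 212.0],
})
violations = check_wine_ranges(sample)
print(violations)
```

On the real data, the same call could be applied to each loaded DataFrame; any non-zero count would point at rows worth inspecting before modelling.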

Importing 'pandas', 'pandas_profiling' and 'numpy' to process the dataset.

In [1]:
import pandas as pd
import pandas_profiling
import numpy as np

Importing 'matplotlib', 'scipy.stats' and 'seaborn' to visualize the data.

In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm

In [3]:
red_wine_data = pd.read_csv("winequality-red.csv")
white_wine_data = pd.read_csv("winequality-white.csv")

### Correlation ###
Let us compute the pairwise correlations between the parameters to get a better idea of how related parameters (such as 'citric acid' and 'fixed acidity') move together, and to confirm our suspicions.

In [4]:
def CorrelationTable(data, title):
    # Compute the correlation matrix:
    pandas_correlation = data.corr()

    # Generating a mask for the upper triangle for a cleaner table:
    mask = np.triu(np.ones_like(pandas_correlation, dtype=bool))

    # Set up the matplotlib figure:
    f, ax = plt.subplots(figsize=(11, 9))

    # Generate a custom diverging colormap:
    cmap = sns.diverging_palette(20, 230, as_cmap=True)

    # Draw the heatmap with the mask and correct aspect ratio:
    sns.heatmap(pandas_correlation, mask=mask, cmap=cmap, vmax=.7, center=0, square=True, linewidths=.5, cbar_kws={"shrink": .5}, annot=True)

    # Title:
    plt.title(title)

    plt.show()

In [5]:
CorrelationTable(white_wine_data, 'White Wine Parameters Correlation')



In [6]:
CorrelationTable(red_wine_data, 'Red Wine Parameters Correlation')
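
A heatmap is easy to scan, but a specific pair such as 'citric acid' and 'fixed acidity' can also be checked numerically. The following is a minimal sketch under stated assumptions: `strongest_pairs` is a hypothetical helper that is not in the original notebook, and the four-row frame is made-up stand-in data rather than values from the wine files.

```python
import numpy as np
import pandas as pd


def strongest_pairs(data: pd.DataFrame, top: int = 3) -> pd.Series:
    """Return the `top` column pairs with the largest absolute Pearson correlation."""
    corr = data.corr()
    # Keep only the strict lower triangle so each pair appears once
    # and the diagonal (always 1.0) is dropped.
    keep = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
    pairs = corr.where(keep).stack().dropna()
    # Sort by magnitude, but report the signed value.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top)


# Made-up stand-in data, chosen to be perfectly linear for illustration.
demo = pd.DataFrame({
    "fixed acidity": [7.0, 7.5, 8.0, 8.5],
    "citric acid": [0.20, 0.25, 0.30, 0.35],  # rises with fixed acidity
    "pH": [3.4, 3.3, 3.2, 3.1],               # falls as acidity rises
})

# Correlation for one specific pair of columns.
pair_corr = demo["citric acid"].corr(demo["fixed acidity"])
print(pair_corr)

# The two strongest pairs overall.
print(strongest_pairs(demo, top=2))
```

With the real data, passing `red_wine_data` or `white_wine_data` in place of `demo` would list the pairs that stand out most in the corresponding heatmap.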



Let's use the pandas profiling report for a quick analysis:

In [8]:
red_profile = red_wine_data.profile_report(title="Red Wine Report")
red_profile.to_file(output_file="Red Wine Report.html")



In [9]:
white_profile = white_wine_data.profile_report(title="White Wine Report")
white_profile.to_file(output_file="White Wine Report.html")



### Let's Merge the Two ###

TODO: Use "data = data.drop(data.index[range(5)])" to make the white wine's sample size similar to the red's (randomize it first).
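
One way to carry out the TODO above is to draw a random sample of the white wines that is the same size as the red set, instead of dropping rows by position. The following is a minimal sketch under stated assumptions: `balance_by_downsampling` is a hypothetical helper, the seed `42` is arbitrary, and the small frames are made-up stand-ins for the real data.

```python
import pandas as pd


def balance_by_downsampling(larger: pd.DataFrame, smaller: pd.DataFrame,
                            seed: int = 42) -> pd.DataFrame:
    """Randomly keep as many rows of `larger` as there are rows in `smaller`."""
    # sample() without replacement shuffles and trims in a single step.
    return larger.sample(n=len(smaller), random_state=seed)


# Made-up stand-in frames: three "red" rows and six "white" rows.
red_demo = pd.DataFrame({"alcohol": [9.4, 9.8, 10.0], "type": "red"})
white_demo = pd.DataFrame(
    {"alcohol": [8.8, 9.5, 10.1, 9.9, 11.3, 12.5], "type": "white"}
)

white_balanced = balance_by_downsampling(white_demo, red_demo)
balanced = pd.concat([red_demo, white_balanced], ignore_index=True)
print(balanced["type"].value_counts())
```

Fixing the seed keeps the selection reproducible between runs; dropping the `random_state` argument would give a different subset each time.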

In [7]:
red_wine_data['type'] = 'red'
white_wine_data['type'] = 'white'

wine_data = pd.concat([red_wine_data, white_wine_data], ignore_index=True).sample(frac=1)
wine_data.head()

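|  | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4092 | 7.6 | 0.31 | 0.23 | 12.7 | 0.054 | 20.0 | 139.0 | 0.99836 | 3.16 | 0.5 | 9.7 | 4 | white |
| 3503 | 7.9 | 0.26 | 0.33 | 10.3 | 0.039 | 73.0 | 212.0 | 0.9969 | 2.93 | 0.49 | 9.5 | 6 | white |
| 4695 | 6.8 | 0.13 | 0.39 | 1.4 | 0.034 | 19.0 | 102.0 | 0.99121 | 3.23 | 0.6 | 11.3 | 7 | white |
| 5504 | 6.5 | 0.33 | 0.3 | 3.8 | 0.036 | 34.0 | 88.0 | 0.99028 | 3.25 | 0.63 | 12.5 | 7 | white |
| 3339 | 6.6 | 0.37 | 0.47 | 6.5 | 0.061 | 23.0 | 150.0 | 0.9954 | 3.14 | 0.45 | 9.6 | 6 | white |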
