# Analyzing Wine Quality Scores

### By: Diane Witt, Jeni Lamoureux, and Allison Palka

![logo.png](attachment:logo.png)

# Our Dataset

This is our analysis and exploration of Vinho Verde wine quality.

"Vinho Verde" is Portuguese for "green wine" and refers to a region in the lush, green, rolling hills of Northern Portugal. This region starts just below the Portuguese-Spanish border, and extends all the way to the Atlantic Ocean. 

Our analysis consisted of testing multiple machine learning classification and regression models on both the red and white wine quality datasets to determine the most accurate model for predicting the quality of Vinho Verde wines for use in a consumer buying guide.

Relevant Information:

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine.
For more details, consult http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (input) and sensory (output) variables
are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).


We found this dataset to be extremely organized and clean, which allowed us to easily analyze the data and explore machine learning preprocessing.

The following data and exploration are specific to the white wine dataset.

### Citations

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016
[Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf
[bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

https://data.world/food/wine-quality/workspace/project-summary?agentid=food&datasetid=wine-quality

https://waterhouse.ucdavis.edu/whats-in-wine

https://vinepair.com/wine-blog/7-things-you-need-to-know-about-vinho-verde/

## Attribute Info

**Input variables (based on physicochemical tests):**

1. fixed acidity
2. volatile acidity
3. citric acid
4. residual sugar
5. chlorides
6. free sulfur dioxide
7. total sulfur dioxide
8. density
9. pH
10. sulphates
11. alcohol

**Output variable (based on sensory data):**

12. quality (score between 0 and 10)

## Attribute Description

Variable | Description
------------ | -------------
Fixed Acidity | Fundamental property of wine, limiting sourness
Volatile Acidity | Used as an indicator of wine spoilage
Citric Acid | Primary acid in fruit, used to add acidity to wine
Residual Sugar | Natural sugars left over after fermentation; source of sweetness
Chlorides | Amount of salt in wine
Free Sulfur Dioxide | Preservation ability; prevents microbial growth
Total Sulfur Dioxide | Amount of free and bound forms of SO2; mostly undetectable at low concentrations
Density | Close to the density of water; lower for dry wines, higher for sweet wines
pH | How acidic a wine is, on a scale from 0 (very acidic) to 14 (very basic)
Sulphates | Wine additive that can contribute to SO2 levels
Alcohol | Percent alcohol content
Quality | Sensory quality score between 0 and 10 (output variable)

# Exploratory Data Analysis and Preprocessing #

### Loading the Data ###

![LR1.png](attachment:LR1.png)

![LR2.png](attachment:LR2.png)

![LR3.png](attachment:LR3.png)

Looking at the correlation matrix below, we saw general correlations between the following pairs of features:

- Alcohol vs. Density
- Fixed Acidity vs. Density
- Residual Sugar vs. Total Sulfur Dioxide
- Residual Sugar vs. Density
- Residual Sugar vs. Alcohol
- Chlorides vs. Density
- Chlorides vs. Sulphates
- Quality vs. Alcohol
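
Feature pairs like the ones listed above can also be ranked programmatically instead of read off a plot. The following is a minimal sketch, not the code used in this notebook: it assumes the data sits in a pandas DataFrame, the helper name `top_correlations` is hypothetical, and the demo frame is synthetic stand-in data rather than the wine measurements.

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 8) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation.

    `top_correlations` is a hypothetical helper written for this example.
    """
    # Pearson correlation between every pair of numeric columns.
    corr = df.select_dtypes(include="number").corr()

    # Keep only the upper triangle (k=1 skips the diagonal) so each
    # pair appears once and no column is paired with itself.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()

    # Sort by strength of the relationship, ignoring its sign.
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(n)


# Synthetic stand-in data (NOT the wine dataset), only to show the call.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "feature_a": base,
    "feature_b": -base + rng.normal(scale=0.5, size=200),  # tied to feature_a
    "feature_c": rng.normal(size=200),                      # independent noise
})
print(top_correlations(demo, n=3))
```

With the real data, the same call would take the loaded white wine DataFrame in place of `demo`. A high absolute value only flags a linear relationship, so the pair plot below remains useful for spotting non-linear patterns.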

![Pairplot.png](attachment:Pairplot.png)

![Volatile%20Acidity.png](attachment:Volatile%20Acidity.png)

![Citric%20Acid.png](attachment:Citric%20Acid.png)

![LR4.png](attachment:LR4.png)

![LR5.png](attachment:LR5.png)

# Attribute Observations #

# Logistic Regression #

![LR6.png](attachment:LR6.png)

![LR7.png](attachment:LR7.png)

![LR8.png](attachment:LR8.png)

![LR9.png](attachment:LR9.png)

# Decision Tree Models #

![DC1.png](attachment:DC1.png)

![DC2.png](attachment:DC2.png)

![DC3.png](attachment:DC3.png)

![DC4.png](attachment:DC4.png)

# Random Forest #

![RF1.png](attachment:RF1.png)

![RF2.png](attachment:RF2.png)

![RF3.png](attachment:RF3.png)

![RF4.png](attachment:RF4.png)

![RF5.png](attachment:RF5.png)

# KNN #

![KNN1.png](attachment:KNN1.png)

![KNN2.png](attachment:KNN2.png)

![KNN3.png](attachment:KNN3.png)

![KNN4.png](attachment:KNN4.png)

![KNN5.png](attachment:KNN5.png)

![KNN6.png](attachment:KNN6.png)

![KNN7.png](attachment:KNN7.png)

# SVM #

![SVM1.png](attachment:SVM1.png)

![SVM2.png](attachment:SVM2.png)

![SVM3.png](attachment:SVM3.png)

![SVM4.png](attachment:SVM4.png)

![SVM5.png](attachment:SVM5.png)

![SVM6.png](attachment:SVM6.png)

![SVM7.png](attachment:SVM7.png)

# Gaussian Naive Bayes #

![GNB1.png](attachment:GNB1.png)

![GNB2.png](attachment:GNB2.png)

![GNB3.png](attachment:GNB3.png)

![GNB4.png](attachment:GNB4.png)

# Neural Networks #

![NN1.png](attachment:NN1.png)

![NN2.png](attachment:NN2.png)

![NN3.png](attachment:NN3.png)

![NN4.png](attachment:NN4.png)

# Multi-Layer Perceptron (Scaler) #

![MLP1.png](attachment:MLP1.png)

![MLP2.png](attachment:MLP2.png)

![MLP3.png](attachment:MLP3.png)

![MLP4.png](attachment:MLP4.png)

![MLP5.png](attachment:MLP5.png)

![MLP6.png](attachment:MLP6.png)

# Conclusion #