<a href="https://colab.research.google.com/github/cagBRT/Machine-Learning/blob/master/WineQuality.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# Clone the entire repo.
!git clone -l -s https://github.com/cagBRT/Machine-Learning.git cloned-repo
%cd cloned-repo
!ls

# **Can the quality of a wine be predicted from its measurable characteristics?**

**Fixed acidity**: acids are a major property of wine and contribute greatly to its taste. Total acidity is usually divided into two groups: the volatile acids and the nonvolatile, or fixed, acids. The fixed acids found in wines include tartaric, malic, citric, and succinic acid. This variable is expressed in g(tartaric acid)/dm3 in the data sets.<br>
**Volatile acidity**: volatile acidity measures the acids, mainly acetic acid, that are associated with wine turning into vinegar. In the U.S., the legal limits of volatile acidity are 1.2 g/L for red table wine and 1.1 g/L for white table wine. In these data sets, the volatile acidity is expressed in g(acetic acid)/dm3.<br>
**Citric acid** is one of the fixed acids that you’ll find in wines. It’s expressed in g/dm3 in the two data sets.<br>
**Residual sugar** typically refers to the sugar remaining after fermentation stops, or is stopped. It’s expressed in g/dm3 in the red and white data.<br>
**Chlorides** can be a significant contributor to saltiness in wine. Here, they are expressed in g(sodium chloride)/dm3.<br>
**Free sulfur dioxide**: the part of the sulfur dioxide added to a wine that is lost into it is said to be bound, while the active part is said to be free. Winemakers try to keep the proportion of free sulfur dioxide as high as possible. This variable is expressed in mg/dm3 in the data.<br>
**Total sulfur dioxide** is the sum of the bound and the free sulfur dioxide (SO2). Here, it’s expressed in mg/dm3. There are legal limits for sulfur levels in wines: in the EU, red wines may contain at most 160 mg/L, while white and rosé wines may contain about 210 mg/L. Sweet wines are allowed to have 400 mg/L. In the US the legal limit is 350 mg/L, and in Australia it is 250 mg/L.<br>
**Density** is generally used as a measure of the conversion of sugar to alcohol. Here, it’s expressed in g/cm3.<br>
**pH**, or the potential of hydrogen, is a numeric scale that specifies the acidity or basicity of the wine. Solutions with a pH less than 7 are acidic, while solutions with a pH greater than 7 are basic. With a pH of 7, pure water is neutral. Most wines have a pH between 2.9 and 3.9 and are therefore acidic.<br>
**Sulfates** are to wine as gluten is to food. You might already know sulfites from the headaches that they can cause. They are a regular part of winemaking around the world and are considered necessary. In this case, they are expressed in g(potassium sulphate)/dm3.<br>
**Alcohol**: wine is an alcoholic beverage, and the percentage of alcohol varies from wine to wine. It shouldn’t be surprising that this variable is included in the data sets, where it’s expressed in % vol.<br>
**Quality**: wine experts graded the wine quality between 0 (very bad) and 10 (excellent). The final score is the median of at least three evaluations made by those same wine experts.<br>
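
To make the pH scale above concrete, here is a small sketch of the arithmetic. It assumes the usual textbook definition, pH = -log10 of the hydrogen ion concentration in mol/L (treating activity as concentration); the helper names are made up for this illustration and are not part of the notebook.

```python
import math


def hydrogen_ion_concentration(ph):
    """Approximate hydrogen ion concentration in mol/L for a given pH."""
    return 10 ** (-ph)


def ph_from_concentration(concentration):
    """Inverse of the function above: pH from a concentration in mol/L."""
    return -math.log10(concentration)


# The two ends of the typical wine range quoted above, plus neutral water.
for ph in (2.9, 3.9, 7.0):
    print(f"pH {ph}: [H+] = {hydrogen_ion_concentration(ph):.2e} mol/L")

# One pH unit corresponds to a tenfold change in concentration.
ratio = hydrogen_ion_concentration(2.9) / hydrogen_ion_concentration(3.9)
print(f"pH 2.9 vs pH 3.9: about {ratio:.1f}x more hydrogen ions")
```

Because the scale is logarithmic, the seemingly narrow 2.9 to 3.9 range still spans a factor of about ten in acidity.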

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

# Install TensorFlow
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
import pathlib

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np

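# Use the Keras bundled with TensorFlow so every Keras import comes from the same package
from tensorflow.keras.utils import to_categorical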
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

In [0]:
# Read in white wine data 
#white = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", sep=';')
white = pd.read_csv("winequality-white.csv", sep=';')

# Read in red wine data 
red = pd.read_csv("winequality-red.csv", sep=';')

In [0]:
# Print info on white wine
white.tail()

In [0]:
red.tail()

In [0]:
# Combine reds and whites into one dataset
# Add `type` column to `red` with value 1
red['type'] = 1

# Add `type` column to `white` with value 0
white['type'] = 0

# Stack `white` below `red` (DataFrame.append was removed in pandas 2.0)
wines = pd.concat([red, white], ignore_index=True)
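
As a minimal sketch of this step, the example below stacks two tiny made-up tables with `pd.concat`, which works on both older and newer pandas versions. The frames `red_demo` and `white_demo` and their values are hypothetical stand-ins, not rows from the real files.

```python
import pandas as pd

# Hypothetical two-row stand-ins for the real red and white tables.
red_demo = pd.DataFrame({"alcohol": [9.4, 9.8], "quality": [5, 5]})
white_demo = pd.DataFrame({"alcohol": [8.8, 9.5], "quality": [6, 6]})

# Same encoding as the notebook: 1 = red, 0 = white.
red_demo["type"] = 1
white_demo["type"] = 0

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 0, 1.
wines_demo = pd.concat([red_demo, white_demo], ignore_index=True)
print(wines_demo)
```

Renumbering the index matters later on: the train/test split drops rows by index label, so duplicate labels would remove more rows than intended.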

In [0]:
wines

In [0]:
wines['quality'].value_counts()

In [0]:
wines.isna().sum()

In [0]:
# Preview the combined dataset
wines.head()

In [0]:
corr = wines.corr()
sns.heatmap(corr, 
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values)
plt.show()

In [0]:
wines_train = wines.sample(frac=0.8,random_state=0)
wines_test = wines.drop(wines_train.index)
print("done")
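
The split above relies on `sample` picking 80% of the rows at random and `drop` removing exactly those rows by index label. Here is a small sketch on a made-up ten-row frame (the name `demo` and its contents are illustrative only):

```python
import pandas as pd

# Ten hypothetical rows with a unique index 0..9.
demo = pd.DataFrame({"x": range(10)})

# Randomly pick 80% of the rows; random_state makes the draw repeatable.
train_demo = demo.sample(frac=0.8, random_state=0)

# Everything that was not sampled becomes the test set.
test_demo = demo.drop(train_demo.index)

print("train rows:", sorted(train_demo.index))
print("test rows: ", sorted(test_demo.index))
```

This pattern only gives disjoint sets when the index labels are unique, which is why the combined frame needs a fresh 0..n-1 index.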

In [0]:
train_labels = wines_train.pop('quality')
test_labels = wines_test.pop('quality')

In [0]:
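# Show the transposed view for inspection only; do not reassign it,
# because later cells expect one row per wine and one column per feature.
wines_train.transpose()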

In [0]:
# The quality scores are one-hot encoded into 10 categories (classes 0 to 9)
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)
print("done")
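
To show what the one-hot encoding produces, here is a NumPy-only sketch that mimics the behaviour for integer labels. The helper `one_hot` and the example scores are assumptions made for illustration; the notebook itself uses `to_categorical`.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Encode integer labels as rows with a single 1.0 at the label's position."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


# Three hypothetical quality scores.
scores = [3, 6, 9]
encoded = one_hot(scores, 10)
print(encoded)

# argmax recovers the original score from each encoded row.
print(encoded.argmax(axis=1))
```

With 10 categories only the scores 0 through 9 can be represented, so a score of 10 would need an eleventh class.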

In [0]:
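# describe() returns the statistics as rows; transpose so that
# train_stats['mean'] and train_stats['std'] are indexed by feature name.
train_stats = wines_train.describe().transpose()
train_stats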

In [0]:
def norm(x):
  return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norm(wines_train)
normed_test_data = norm(wines_test)
print("done")
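
The normalization above is a z-score: subtract each feature's training mean and divide by its training standard deviation. Below is a minimal sketch on made-up numbers; `train_demo`, `test_demo`, and `norm_demo` are hypothetical names. Note that `describe()` lists the statistics as rows, so the sketch transposes the result before looking up `stats["mean"]`.

```python
import pandas as pd

# Hypothetical training and test rows with two features.
train_demo = pd.DataFrame({"alcohol": [9.0, 10.0, 11.0], "pH": [3.0, 3.2, 3.4]})
test_demo = pd.DataFrame({"alcohol": [12.0], "pH": [3.2]})

# One row per feature, with 'mean' and 'std' as columns.
stats = train_demo.describe().transpose()


def norm_demo(frame):
    """Scale every column with the statistics of the training data."""
    return (frame - stats["mean"]) / stats["std"]


normed_train = norm_demo(train_demo)
normed_test = norm_demo(test_demo)
print(normed_train)
print(normed_test)
```

The test rows are scaled with the training statistics on purpose, so that no information from the test set leaks into preprocessing.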

In [0]:
inputs = len(wines_train.keys())
print("number of inputs to the model = " + str(inputs))

def build_model():
  # Three hidden layers of 12 sigmoid units, followed by a 10-way softmax
  # that outputs one probability per quality class.
  model = keras.Sequential([
    layers.Dense(12, activation=tf.nn.sigmoid, input_shape=[inputs]),
    layers.Dense(12, activation=tf.nn.sigmoid),
    layers.Dense(12, activation=tf.nn.sigmoid),
    layers.Dense(10, activation='softmax')
  ])

  # Categorical cross-entropy matches the one-hot encoded quality labels.
  model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
  )
  return model

In [0]:
model = build_model()
print("done")
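
As a rough sanity check on the size of this network, the sketch below counts the trainable parameters by hand. It assumes 12 inputs (the 11 measurements described at the top plus the `type` column); `dense_params` is a helper invented for this example.

```python
def dense_params(n_inputs, n_units):
    """A dense layer has one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 12                  # assumption: 11 measurements + `type`
layer_units = [12, 12, 12, 10]   # three hidden layers and the softmax output

total = 0
previous = n_features
for units in layer_units:
    params = dense_params(previous, units)
    print(f"Dense({units}) on {previous} inputs: {params} parameters")
    total += params
    previous = units

print("total trainable parameters:", total)
```

Under these assumptions the total comes to 598, which should agree with what `model.summary()` reports if the input really has 12 columns.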

In [0]:
EPOCHS = 20
history = model.fit(
  normed_train_data, train_labels,
  epochs=EPOCHS, validation_split = 0.2, verbose=0)

In [0]:
hist = pd.DataFrame(history.history)
hist['epochs'] = history.epoch
hist.tail()

In [0]:
import matplotlib.pyplot as plt
axes = plt.gca()
axes.set_ylim([0,1])
plt.plot(hist['accuracy'], label='training accuracy')
# val_accuracy comes from the validation split, not from the held-out test set
plt.plot(hist['val_accuracy'], label='validation accuracy')
plt.title('Accuracy')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.legend()

In [0]:
axes = plt.gca()
axes.set_ylim([0.5,1.5])
plt.plot(hist['loss'], label='training loss')
# val_loss comes from the validation split, not from the held-out test set
plt.plot(hist['val_loss'], label='validation loss')
plt.title('Loss')
plt.xlabel('epochs')
plt.ylabel('loss')
plt.legend()

In [0]:
predict_case = pd.read_csv("prediction_cases.csv", sep=';')
# Wine type for each prediction case, using the same encoding as above (0 = white, 1 = red)
typew = [0, 0, 1, 1]
predict_case['type'] = typew

In [0]:
prediction_labels = predict_case.pop('quality')

In [0]:
normed_prediction_cases = norm(predict_case)
normed_prediction_cases 

In [0]:
#Choose number 0,1,2,3
number = 3
test1 = normed_prediction_cases.loc[[number]]
test1.transpose()

In [0]:
prediction = model.predict(test1)

In [0]:
print(prediction)

In [0]:
print("prediction is ", prediction.argmax())
print("actual value is ", prediction_labels.iloc[number] )
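
The model's last layer is a softmax, so each prediction is a row of 10 probabilities and `argmax` picks the most likely quality class. Here is a NumPy sketch of that step; the raw scores in `logits_demo` are invented for illustration and are not output from the trained model.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# One hypothetical row of raw scores, one per quality class 0..9.
logits_demo = np.array([[0.1, 0.2, 0.0, 0.3, 0.5, 2.0, 1.5, 0.4, 0.1, 0.0]])

probs = softmax(logits_demo)
predicted_quality = int(probs.argmax(axis=-1)[0])

print(np.round(probs, 3))
print("predicted quality class:", predicted_quality)
```

The index of the largest probability is the predicted quality score, since class `k` corresponds to a score of `k` in the one-hot encoding.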

In [0]:
# The model was compiled with metrics=['accuracy'], so evaluate() returns loss and accuracy
loss, accuracy = model.evaluate(normed_test_data, test_labels, verbose=0)
print("Testing set accuracy: {:5.2f}".format(accuracy))
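
For one-hot labels, the accuracy metric is the fraction of rows where the most probable predicted class equals the true class. The sketch below reproduces that calculation with NumPy on made-up labels and probabilities; `categorical_accuracy` is a hypothetical helper, not a call into Keras.

```python
import numpy as np


def categorical_accuracy(one_hot_labels, probabilities):
    """Fraction of rows whose most probable class matches the one-hot label."""
    true_classes = np.argmax(one_hot_labels, axis=1)
    predicted_classes = np.argmax(probabilities, axis=1)
    return float(np.mean(true_classes == predicted_classes))


# Four hypothetical wines with true quality classes 5, 6, 5, 7.
true_classes = [5, 6, 5, 7]
labels_demo = np.eye(10)[true_classes]

# Hypothetical predictions: the third wine is misclassified as a 6.
probs_demo = np.full((4, 10), 0.05)
probs_demo[np.arange(4), [5, 6, 6, 7]] = 0.55

accuracy_demo = categorical_accuracy(labels_demo, probs_demo)
print("accuracy:", accuracy_demo)
```

Three of the four made-up wines are classified correctly, so the sketch reports an accuracy of 0.75.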