# Improving the quality of white wine with neural networks
**Team**: Aaron Szabo

**Course**: CMSC 389A Practical Deep Learning  

**Description**: This program trains a model to predict the quality of white wine from 11 physicochemical characteristics. It then uses the model to predict which changes to a given wine would most improve its quality.

**Imports**  
This project requires keras, numpy, pandas, and sklearn.

In [1]:
import keras
import numpy as np
import pandas as pd

from keras.layers import Dense
from keras.models import Sequential
from sklearn.model_selection import train_test_split

Using TensorFlow backend.


## Load Data
Data is loaded from the white wine quality csv titled "winequality-white.csv" from _[UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/wine+quality)_<br>
Courtesy of:<br>
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.<br>
Modeling wine preferences by data mining from physicochemical properties.<br>
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

In [2]:
file_name_white = 'winequality-white.csv'
columns = [
    'Fixed acidity',
    'Volatile acidity',
    'Citric acid',
    'Residual sugar',
    'Chlorides',
    'Free sulfur dioxide',
    'Total sulfur dioxide',
    'Density',
    'pH',
    'Sulphates',
    'Alcohol',
    'Quality'
]

df_white = pd.read_csv(file_name_white, names=columns, delimiter=';', header=0)
df_white[columns[:-1]] = df_white[columns[:-1]].astype(float)
df_white[columns[-1]] = df_white[columns[-1]].astype(float)

Normalizing: each feature is mean-centered and divided by its range (maximum minus minimum).

In [3]:
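# Mean-normalize each feature: the result spans an interval of width 1 that contains 0
# (it is only exactly [-0.5, 0.5] when the mean lies midway between the min and max)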
for feature in df_white.columns[:-1]:
    max_value = df_white[feature].max()
    min_value = df_white[feature].min()
    mean_value = df_white[feature].mean()
    df_white[feature] = (df_white[feature] - mean_value) / (max_value - min_value)
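As a quick check on the normalization above, the following sketch applies the same formula to a small made-up column (the values are illustrative and not taken from the dataset). It suggests that the result always spans an interval of width 1 around 0, but that the interval is not necessarily symmetric when the feature is skewed.

```python
import numpy as np

# Hypothetical feature column, skewed so the mean is not midway between min and max
values = np.array([1.0, 2.0, 3.0, 10.0])

# Same formula as the notebook: subtract the mean, divide by the range
normalized = (values - values.mean()) / (values.max() - values.min())

print(normalized)                           # smallest entry is negative, largest is positive
print(normalized.max() - normalized.min())  # width of the normalized interval
```

Here the largest normalized value is about 0.67, which is why clamping or bounds checks later on should not assume a fixed [-0.5, 0.5] range.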

Let's now examine the data after normalizing it.

In [4]:
df_white.head()

| | Fixed acidity | Volatile acidity | Citric acid | Residual sugar | Chlorides | Free sulfur dioxide | Total sulfur dioxide | Density | pH | Sulphates | Alcohol | Quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.013963 | -0.00808 | 0.015547 | 0.219457 | -0.002292 | 0.03377 | 0.073409 | 0.134425 | -0.171151 | -0.046334 | -0.276495 | 6.0 |
| 1 | -0.053345 | 0.021332 | 0.003499 | -0.073488 | 0.009578 | -0.074244 | -0.014758 | -0.000528 | 0.101576 | 0.000178 | -0.163591 | 6.0 |
| 2 | 0.119732 | 0.001724 | 0.039644 | 0.0078 | 0.012545 | -0.018495 | -0.095964 | 0.020679 | 0.065212 | -0.057961 | -0.066817 | 6.0 |
| 3 | 0.033193 | -0.047295 | -0.008549 | 0.03234 | 0.036284 | 0.040738 | 0.110532 | 0.030319 | 0.001576 | -0.104473 | -0.099075 | 6.0 |
| 4 | 0.033193 | -0.047295 | -0.008549 | 0.03234 | 0.036284 | 0.040738 | 0.110532 | 0.030319 | 0.001576 | -0.104473 | -0.099075 | 6.0 |


## Load the examples into a data variable
Each row is stored as an `Example` object (features plus label) for ease of use later in the program.

In [5]:
class Example:
    def __init__(self, features, label):
        self.features = features
        self.label = label

In [6]:
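data_white = []
for row in df_white.itertuples():
    # row[0] is the DataFrame index, row[1:-1] holds the 11 features, row[-1] is the quality
    label = row[-1]
    # Rescale the label so that a quality of 3 maps to 0 and a quality of 9 maps to 1
    label = (label-3)/6
    features = row[1:-1]
    example = Example(features, label)
    data_white.append(example)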

In [7]:
#data = shuffle(data, random_state=kSEED)

X_W = [example.features for example in data_white]
y_W = [example.label for example in data_white]
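The loop in cell 6 rescales each label with `(label-3)/6`, which keeps the targets inside the output range of the final sigmoid layer. The sketch below makes that mapping and its inverse explicit. The helper names and the constants are hypothetical, and the bounds assume that quality scores run from 3 to 9, which is what the formula implies.

```python
# Assumed bounds: the lowest score and the span implied by (label - 3) / 6
MIN_QUALITY = 3
QUALITY_SPAN = 6

def scale_quality(quality):
    """Map a raw quality score to the [0, 1] range used as the training label."""
    return (quality - MIN_QUALITY) / QUALITY_SPAN

def unscale_quality(scaled):
    """Map a scaled label or model prediction back to the raw quality scale."""
    return scaled * QUALITY_SPAN + MIN_QUALITY

print(scale_quality(6.0))    # 0.5
print(unscale_quality(0.5))  # 6.0
```

The inverse is handy when reading model predictions, since a raw output of 0.5 corresponds to a quality score of 6.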

## Train-test Split

The data is split 80-20 train-test. 

In [8]:
X_train_W, X_test_W, y_train_W, y_test_W = train_test_split(X_W,y_W,test_size=0.2, random_state=10)

X_train_W = np.array(X_train_W)
y_train_W = np.array(y_train_W)
X_test_W = np.array(X_test_W)
y_test_W = np.array(y_test_W)

print('Total White Examples: {:}\nTrain Examples: {:}\nTest Examples: {:4d}'.format(len(data_white), len(X_train_W), len(X_test_W)))

Total White Examples: 4898
Train Examples: 3918
Test Examples:  980
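The counts printed above can be checked by hand. The sketch below assumes that a fractional `test_size` is rounded up to a whole number of test examples and that the remainder goes to training, which is consistent with the 3918 / 980 split reported here.

```python
import math

n_samples = 4898   # total white wine examples reported above
test_size = 0.2

# Assumption: the test count is rounded up and the remainder is used for training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 3918 980
```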


## Implementing the Model



In [9]:
model = Sequential()
model.add(Dense(11,input_shape=(11,),activation='relu'))
model.add(Dense(20,activation='relu'))
model.add(Dense(30,activation='relu'))
model.add(Dense(40,activation='relu'))
model.add(Dense(30,activation='relu'))
model.add(Dense(20,activation='relu'))
model.add(Dense(11,activation='sigmoid'))
model.add(Dense(1,activation='sigmoid'))

### Compiling the Model

In [10]:
model.compile(loss='mean_squared_error', optimizer='adam')

### Model Summary

The model consists of 8 `Dense` layers with between 1 and 40 units each; see the summary below for the size of each layer. The first 6 layers use <code>relu</code> activation and the last two use <code>sigmoid</code> activation. This architecture was chosen after extensive testing of various alternatives.

In [23]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 11)                132       
_________________________________________________________________
dense_2 (Dense)              (None, 20)                240       
_________________________________________________________________
dense_3 (Dense)              (None, 30)                630       
_________________________________________________________________
dense_4 (Dense)              (None, 40)                1240      
_________________________________________________________________
dense_5 (Dense)              (None, 30)                1230      
_________________________________________________________________
dense_6 (Dense)              (None, 20)                620       
_________________________________________________________________
dense_7 (Dense)              (None, 11)                231       
__________
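The `Param #` column can be reproduced by hand: a `Dense` layer with `n` inputs and `m` units has `n * m` weights plus `m` biases. The sketch below recomputes the counts from the layer widths in the model definition. It does not call Keras, and the variable names are illustrative.

```python
# Layer widths from the model definition: 11 inputs, then eight Dense layers
input_size = 11
layer_sizes = [11, 20, 30, 40, 30, 20, 11, 1]

param_counts = []
previous = input_size
for size in layer_sizes:
    # weights (previous * size) plus one bias per unit
    param_counts.append(previous * size + size)
    previous = size

print(param_counts)       # [132, 240, 630, 1240, 1230, 620, 231, 12]
print(sum(param_counts))  # 4335
```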

### Training the Model

The model is trained on the training split only (3918 examples, about 80% of the data) for 500 epochs with a batch size of 10.

In [24]:
model.fit(X_train_W, y_train_W, epochs=500, batch_size=10)

Epoch 1/500
...
Epoch 500/500


<keras.callbacks.History at 0x27b39bce748>

In [25]:
print("Done")

Done


### Evaluating the Model

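<code>rng</code> sets the width of the window around the true label within which a prediction counts as correct. The check passes <code>bound = 1/rng</code> and accepts predictions within <code>bound/2</code> of the label, so a data point is considered correct if $| pred - label | \le 1/(2 \cdot rng)$. With <code>rng = 6</code> this corresponds to half a quality point, because the labels were divided by 6.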

In [14]:
def accu_check(model, test_data, test_pred, bound):
    y_pred = model.predict(test_data)

    total = 0
    right = 0.0
    for (pred,test) in zip(y_pred,test_pred):
        total += 1
        if pred >= test - bound/2 and pred <= test + bound/2:
            right += 1
    accuracy = right/total
    print("Total elements checked: {}".format(total))
    print("Correct predictions made: {}".format(int(right)))
    print("Accuracy: {:.4}".format(accuracy))

In [22]:
rng = 6
accu_check(model, X_test_W, y_test_W, 1.0/rng)

Total elements checked: 980
Correct predictions made: 600
Accuracy: 0.6122


Typically the model reaches an accuracy above $0.7$ with <code>rng = 4</code> and above $0.6$ with <code>rng = 6</code>.
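For reference, the same tolerance check can be written in vectorized form. This is a minimal sketch rather than the notebook's own function: `tolerance_accuracy` is a hypothetical helper, and the predictions and labels are made up for illustration.

```python
import numpy as np

def tolerance_accuracy(y_pred, y_true, bound):
    """Fraction of predictions within bound / 2 of the label (hypothetical helper)."""
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # flatten (n, 1) model output
    y_true = np.asarray(y_true, dtype=float).ravel()
    return float(np.mean(np.abs(y_pred - y_true) <= bound / 2))

# Made-up scaled predictions and labels, for illustration only
preds = [0.50, 0.62, 0.30, 0.95]
labels = [0.50, 0.50, 0.50, 1.00]

rng = 6
print(tolerance_accuracy(preds, labels, 1.0 / rng))  # 0.5

# Half-width of the window converted back to raw quality points (labels were divided by 6)
half_width_points = (1.0 / rng) / 2 * 6
print(half_width_points)
```

Under this reading, <code>rng = 6</code> accepts predictions within half a quality point and <code>rng = 4</code> accepts predictions within three quarters of a point, which is why the looser setting yields higher accuracy.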

## Application

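The program takes a wine from the test set, selected below, and searches for the combination of changes that most improves its predicted quality. Each of the 11 features is either decreased by <code>vary_range</code>, left unchanged, or increased by <code>vary_range</code>, and the model scores all $3^{11} = 177147$ combinations.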

In [17]:
print("Enter a value between 1 and {:4d}".format(len(X_test_W)))
index = input(":")
try:
    index = int(index)
    # The index selects a wine from the test set (X_test_W), so validate against its length
    if index < 1 or index > len(X_test_W):
        print("Error in input")
    else:
        index -= 1
except ValueError:
    print("Error in input")

Enter a value between 1 and  980
:6
6


In [18]:
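#Don't run this cell if the above cell prints an error
vary_range = 0.2
def adjust(val, delta):
    # delta: 0 = decrease, 1 = leave unchanged, 2 = increase
    # Assumption: mean-normalized features lie within [-1, 1], so results are clamped to that range
    if delta == 0:
        val = val - vary_range
        if val < -1: val = -1
    elif delta == 2:
        val = val + vary_range
        if val > 1: val = 1
    return val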

In [19]:
from itertools import product

vary_range = 0.2
# Score every combination of decrease (0), keep (1), or increase (2) for the 11 features
opt_shape = (3,) * 11
start = X_test_W[index]
# product() varies the last feature fastest, matching the row-major layout of opt
candidates = np.array(
    [[adjust(start[n], delta) for n, delta in enumerate(combo)]
     for combo in product(range(3), repeat=11)],
    np.float32)
# One batched predict call replaces 3**11 single-sample calls
opt = model.predict(candidates, batch_size=4096).reshape(opt_shape)
print("Done")

Done


In [20]:
def translate(val):
    if val == 0:
        return "Decrease"
    elif val == 1:
        return "Don't change"
    else:
        return "Increase"

In [21]:
ind = np.unravel_index(np.argmax(opt, axis=None), opt.shape)
print("{} fixed acidity\n{} volatile acidity\n{} citric acid\n{} residual sugar\n{} chlorides\n{} free sulfur dioxide\n{} total sulfur dioxide\n{} density\n{} pH\n{} sulphates\n{} alcohol".format(translate(ind[0]),translate(ind[1]),translate(ind[2]),translate(ind[3]),translate(ind[4]),translate(ind[5]),translate(ind[6]),translate(ind[7]),translate(ind[8]),translate(ind[9]),translate(ind[10])))

Increase fixed acidity
Decrease volatile acidity
Increase citric acid
Increase residual sugar
Decrease chlorides
Increase free sulfur dioxide
Decrease total sulfur dioxide
Decrease density
Decrease pH
Increase sulphates
Increase alcohol
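The recommendation above comes from locating the largest entry in the 11-dimensional score array and converting its flat position back into one index per feature. The toy sketch below shows the same `np.argmax` and `np.unravel_index` pattern on a made-up grid with only 2 features; the scores and the `actions` list are illustrative and not produced by the model.

```python
import numpy as np

# Toy score grid for 2 features with 3 choices each (values are made up)
scores = np.array([
    [0.40, 0.45, 0.50],
    [0.42, 0.48, 0.61],
    [0.39, 0.47, 0.58],
], dtype=np.float32)

flat_best = np.argmax(scores, axis=None)          # position in the flattened array
best = np.unravel_index(flat_best, scores.shape)  # back to one index per feature

actions = ["Decrease", "Don't change", "Increase"]
print(flat_best)                         # 5
print([actions[int(i)] for i in best])   # ["Don't change", 'Increase']
```

Index 0, 1, or 2 along each axis plays the same role as the `delta` argument of `adjust`, so the tuple returned by `np.unravel_index` reads directly as one action per feature.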
