Since we're trying to predict the quality score of wine samples, which is a numeric output, I've decided to treat this as a regression problem. I'll be relying on a Linear Regression model for this case.

In [172]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

On TensorFlow's official dataset web page, white wine is set as the default dataset and also has the larger set of examples, so I've decided to use the white wine dataset to both train and test my model. I've displayed data once again to make sure my code has imported the dataset correctly.

In [173]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
data = pd.read_csv(url, sep=';')
data

,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.00100,3.00,0.45,8.8,6
1,6.3,0.30,0.34,1.6,0.049,14.0,132.0,0.99400,3.30,0.49,9.5,6
2,8.1,0.28,0.40,6.9,0.050,30.0,97.0,0.99510,3.26,0.44,10.1,6
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.99560,3.19,0.40,9.9,6
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.99560,3.19,0.40,9.9,6
...,...,...,...,...,...,...,...,...,...,...,...,...
4893,6.2,0.21,0.29,1.6,0.039,24.0,92.0,0.99114,3.27,0.50,11.2,6
4894,6.6,0.32,0.36,8.0,0.047,57.0,168.0,0.99490,3.15,0.46,9.6,5
4895,6.5,0.24,0.19,1.2,0.041,30.0,111.0,0.99254,2.99,0.46,9.4,6
4896,5.5,0.29,0.30,1.1,0.022,20.0,110.0,0.98869,3.34,0.38,12.8,7


Now, in order to separate the input variables from the outputs, I've created two arrays: x (the variables) and y (the outputs). I wasn't able to index data directly with all the feature names written out as strings, so I put the names in a list called feature_columns and indexed with that instead, which worked.

In [174]:
feature_columns = ["fixed acidity", "volatile acidity", "citric acid",
                   "residual sugar", "chlorides", "free sulfur dioxide",
                   "total sulfur dioxide", "density", "pH", "sulphates",
                   "alcohol"]
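# Feature matrix: one row per sample, one column per feature (11 columns).
x = data[feature_columns].to_numpy()
# Target as a column vector of shape (n_samples, 1).
y = data["quality"].to_numpy().reshape(-1, 1)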
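A small sketch of the column selection used above, on a toy DataFrame. The two rows and three columns are copied from the table shown earlier purely to keep the example short; the names x_alt and the reduced column set are only for illustration. It shows why a list of names works where bare comma-separated names do not, and one possible alternative that drops the target column instead of listing every feature.

```python
import pandas as pd

# Two rows from the table above, restricted to three columns for brevity.
data = pd.DataFrame({
    "fixed acidity": [7.0, 6.3],
    "alcohol": [8.8, 9.5],
    "quality": [6, 6],
})

# A list of names inside the brackets selects several columns at once.
# Writing data["fixed acidity", "alcohol"] (no inner list) is treated as
# a single tuple key and raises KeyError.
x = data[["fixed acidity", "alcohol"]].to_numpy()

# Alternative: keep every column except the target.
x_alt = data.drop(columns="quality").to_numpy()

# reshape(-1, 1) turns the target into a column vector, as in the cell above.
y = data["quality"].to_numpy().reshape(-1, 1)

print(x.shape, x_alt.shape, y.shape)
```

Dropping the target is shorter, but it silently picks up any extra column that appears in the file, so the explicit feature_columns list is arguably the safer choice here.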

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=22)
print("Arrays X and Y are divided into both test and training datasets")
print("X Test Dataset Array Size", x_test.shape)
print("X Train Dataset Array Size", x_train.shape)
print("Y Test Dataset Array Size", y_test.shape)
print("Y Train Dataset Array Size", y_train.shape)

Arrays X and Y are divided into both test and training datasets
X Test Dataset Array Size (980, 11)
X Train Dataset Array Size (3918, 11)
Y Test Dataset Array Size (980, 1)
Y Train Dataset Array Size (3918, 1)
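The sizes printed above follow from test_size=0.2. As far as I can tell, train_test_split rounds the test share up to a whole number of rows and gives the rest to training; the sketch below just redoes that arithmetic by hand under that assumption, with the row count taken from the shapes above (980 + 3918).

```python
import math

n_samples = 980 + 3918   # total rows, from the shapes printed above
test_size = 0.2

# Assumption: the test share is rounded up, the remainder goes to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print("test rows:", n_test)
print("train rows:", n_train)
```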


In [176]:
model = LinearRegression()
model.fit(x_train, y_train)  # fit() returns the fitted model itself, not a training history
y_pred = model.predict(x_test)

print("Intercept (w0):", model.intercept_)
print("Coefficients (w1..w11):", model.coef_[0])

Intercept (w0): [209.0554147]
Coefficients (w1..w11): [ 1.05148846e-01 -1.85241735e+00 -9.09280528e-03  1.01438976e-01
  5.67789801e-01  3.30686264e-03  1.80310965e-05 -2.09787683e+02
  8.45971006e-01  7.11808243e-01  1.24444841e-01]
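To make the printed numbers more concrete, here is a rough sketch of what predict does for a single sample: the intercept plus the sum of each weight multiplied by its feature value. The weights are the rounded values printed above, and the feature values are those of the first test sample listed in the inspection output further down, so the result is only accurate to a few decimal places. The variable names are mine.

```python
# Intercept and weights as printed above (rounded as shown there).
intercept = 209.0554147
coefficients = [
    1.05148846e-01, -1.85241735e+00, -9.09280528e-03, 1.01438976e-01,
    5.67789801e-01, 3.30686264e-03, 1.80310965e-05, -2.09787683e+02,
    8.45971006e-01, 7.11808243e-01, 1.24444841e-01,
]

# First test sample, in the same order as feature_columns.
sample = [6.1, 0.38, 0.2, 6.6, 0.033, 25.0, 137.0, 0.9938, 3.3, 0.69, 10.4]

# Linear regression prediction: w0 + w1*x1 + ... + w11*x11
prediction = intercept + sum(w * v for w, v in zip(coefficients, sample))
print(round(prediction, 4))
```

The result should agree with the predicted quality reported for sample 1 (5.85454021) up to the rounding of the printed weights. It also shows why the large intercept is not alarming: it is almost entirely cancelled by the density term, since density is close to 1 for every wine.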


I've decided to inspect only the first 8 test samples to keep the output readable. If needed, the for loop can be changed to for i in range(y_pred.shape[0]): to go through every prediction.

In [177]:
for i in range(8):
    print(i + 1, ":")
    # enumerate gives the column index directly, so no list lookup is needed
    for j, name in enumerate(feature_columns):
        print("|", name, ":", x_test[i, j])
    print("Predicted Quality:", y_pred[i])
    print("Ground Truth:", y_test[i])

1 :
| fixed acidity : 6.1
| volatile acidity : 0.38
| citric acid : 0.2
| residual sugar : 6.6
| chlorides : 0.033
| free sulfur dioxide : 25.0
| total sulfur dioxide : 137.0
| density : 0.9938
| pH : 3.3
| sulphates : 0.69
| alcohol : 10.4
Predicted Quality: [5.85454021]
Ground Truth: [6]
2 :
| fixed acidity : 7.3
| volatile acidity : 0.33
| citric acid : 0.4
| residual sugar : 6.85
| chlorides : 0.038
| free sulfur dioxide : 32.0
| total sulfur dioxide : 138.0
| density : 0.992
| pH : 3.03
| sulphates : 0.3
| alcohol : 11.9
Predicted Quality: [6.1811536]
Ground Truth: [7]
3 :
| fixed acidity : 7.1
| volatile acidity : 0.24
| citric acid : 0.41
| residual sugar : 17.8
| chlorides : 0.046
| free sulfur dioxide : 39.0
| total sulfur dioxide : 145.0
| density : 0.9998
| pH : 3.32
| sulphates : 0.39
| alcohol : 8.7
Predicted Quality: [5.74015073]
Ground Truth: [5]
4 :
| fixed acidity : 6.4
| volatile acidity : 0.42
| citric acid : 0.46
| residual sugar : 8.4
| chlorides : 0.05
| free sulf