# Overview

The task at hand is the classification of wine quality.

We will use three different approaches:

- A standard neural network (feed-forward NN)
- A Bayesian neural network that takes into account epistemic (model) uncertainty on the predicted labels
- A probabilistic neural network that takes into account both aleatoric (data) and epistemic (model) uncertainty on the predicted labels
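
To make the two kinds of uncertainty concrete, here is a minimal numeric sketch. It assumes a model that can be sampled several times for the same input, each pass returning a predicted mean and a predicted noise variance; the numbers and the names `pred_means` and `pred_vars` are hypothetical and are not produced by the notebook. The split follows the law of total variance.

```python
import numpy as np

# Hypothetical outputs of 4 stochastic forward passes for one wine sample:
# each pass yields a predicted mean quality and a predicted noise variance.
pred_means = np.array([5.8, 6.1, 6.0, 5.7])
pred_vars = np.array([0.30, 0.25, 0.35, 0.30])

# Aleatoric (data) uncertainty: average of the predicted noise variances.
aleatoric = pred_vars.mean()

# Epistemic (model) uncertainty: spread of the predicted means across passes.
epistemic = pred_means.var()

# Law of total variance: the two parts add up to the total predictive variance.
total = aleatoric + epistemic

print(f"aleatoric={aleatoric:.3f} epistemic={epistemic:.3f} total={total:.3f}")
```

A model that only captures epistemic uncertainty would report the second term alone; a model with a probabilistic output can report both.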

## Workflow

1. [Data Inspection](#inspection) 
    - Loading
    - Inspection
    - Preprocessing
2. [Modeling](#model-definition)
    - Standard Neural Network
    - Bayesian Neural Network
    - Probabilistic Neural Network
3. [Prediction](#prediction)

In [5]:
# Software install (as required)
!pip install -r ../requirements.txt

Collecting tensorflow_probability (from -r ../requirements.txt (line 3))
  Downloading tensorflow_probability-0.20.1-py2.py3-none-any.whl (6.9 MB)
Collecting cloudpickle>=1.3 (from tensorflow_probability->-r ../requirements.txt (line 3))
  Downloading cloudpickle-2.2.1-py3-none-any.whl (25 kB)
Installing collected packages: cloudpickle, tensorflow_probability
Successfully installed cloudpickle-2.2.1 tensorflow_probability-0.20.1


In [23]:
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_datasets as tfds
import tensorflow_probability as tfp
import matplotlib.pyplot as plt
import seaborn as sns

## Data Inspection <a name="inspection"></a>

In [70]:
# Data loading: load the wine quality dataset
# split the single "train" split into 70% train and 30% test
(ds_train, ds_test), ds_info = tfds.load(
    "wine_quality",
    split=["train[:70%]", "train[70%:]"],
    as_supervised=True,
    with_info=True)

In [71]:
ds_info.features

FeaturesDict({
    'features': FeaturesDict({
        'alcohol': float32,
        'chlorides': float32,
        'citric acid': float32,
        'density': float32,
        'fixed acidity': float32,
        'free sulfur dioxide': float32,
        'pH': float32,
        'residual sugar': float32,
        'sulphates': float64,
        'total sulfur dioxide': float32,
        'volatile acidity': float32,
    }),
    'quality': int32,
})

In [72]:
# Basic Info
feature_names=list(ds_info.features['features'].keys())
print("Total examples: %d" %(len(ds_train)+len(ds_test)))
print("Train set size: %d" %len(ds_train)) 
print("Test set size : %d" %len(ds_test))   
print("Feature names : %s" %feature_names)
print("")

Total examples: 4898
Train set size: 3429
Test set size : 1469
Feature names : ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol']
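
The reported split sizes can be checked with a little arithmetic. The sketch below assumes that the 70% boundary is rounded to the nearest whole example; that rounding rule is an assumption here, chosen because it reproduces the counts printed above.

```python
total_examples = 4898  # total number of examples reported above

# Assumption: the 70% boundary is rounded to the nearest whole example.
boundary = round(total_examples * 0.70)

train_size = boundary                    # examples in "train[:70%]"
test_size = total_examples - boundary    # examples in "train[70%:]"

print(train_size, test_size)
```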



In [73]:
# show a few examples from the train dataset
tfds.as_dataframe(ds_train.take(10), ds_info)


| | alcohol | chlorides | citric acid | density | fixed acidity | free sulfur dioxide | pH | residual sugar | sulphates | total sulfur dioxide | volatile acidity | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.0 | 0.054 | 0.34 | 1.00080 | 7.6 | 44.0 | 3.22 | 18.35 | 0.55 | 197.0 | 0.32 | 5 |
| 1 | 12.2 | 0.063 | 0.49 | 0.99110 | 6.3 | 35.0 | 3.38 | 1.20 | 0.42 | 92.0 | 0.27 | 6 |
| 2 | 11.2 | 0.029 | 0.11 | 0.99076 | 5.3 | 6.0 | 3.51 | 1.10 | 0.48 | 51.0 | 0.43 | 4 |
| 3 | 9.0 | 0.110 | 0.27 | 0.99672 | 6.6 | 20.0 | 3.08 | 10.70 | 0.41 | 103.0 | 0.41 | 6 |
| 4 | 12.0 | 0.035 | 0.30 | 0.99016 | 5.9 | 57.0 | 3.09 | 3.80 | 0.34 | 135.0 | 0.34 | 6 |
| 5 | 10.3 | 0.055 | 0.39 | 0.99652 | 7.0 | 42.0 | 3.37 | 7.50 | 0.54 | 218.0 | 0.31 | 5 |
| 6 | 10.7 | 0.054 | 0.35 | 0.99178 | 7.3 | 31.0 | 3.18 | 1.60 | 0.47 | 148.0 | 0.28 | 5 |
| 7 | 10.4 | 0.053 | 0.31 | 0.99587 | 7.1 | 32.0 | 3.31 | 7.40 | 0.59 | 211.0 | 0.20 | 6 |
| 8 | 8.6 | 0.041 | 0.62 | 0.99760 | 7.2 | 70.0 | 3.08 | 10.80 | 0.49 | 189.0 | 0.40 | 4 |
| 9 | 11.9 | 0.034 | 0.36 | 0.99085 | 7.3 | 30.0 | 3.25 | 2.10 | 0.40 | 177.0 | 0.25 | 8 |


In [74]:
# Class balance check: is the dataset imbalanced?
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
# collect the integer quality labels of the training split
labels, counts = np.unique(
    np.fromiter(ds_train.map(lambda x, y: y).as_numpy_iterator(), np.int32),
    return_counts=True)
# one horizontal bar per quality score
sns.barplot(x=counts, y=[str(l) for l in labels], ax=ax)
ax.set_xlabel("Counts")
ax.set_title("Counts by quality score")
ax.grid(True, ls="--")
sns.despine(left=True, bottom=True)

In [75]:
def prepare_for_training(ds, cache=True, batch_size=1, shuffle_buffer_size=1000):
  # cast the integer quality label to float32 for the mean squared error loss
  ds = ds.map(lambda x, y: (x, tf.cast(y, tf.float32)))
  # optionally cache the mapped examples so later epochs reuse them
  if cache:
    ds = ds.cache()
  # shuffle the dataset
  ds = ds.shuffle(buffer_size=shuffle_buffer_size)
  # split to batches
  ds = ds.batch(batch_size)
  # `prefetch` lets the dataset fetch batches in the background while the model is training.
  ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)
  return ds

In [76]:
batch_size = 1
# preprocess the training set; ds_test is left as loaded (unbatched)
ds_train = prepare_for_training(ds_train, batch_size=batch_size, shuffle_buffer_size=len(ds_train))

In [77]:
# Function to create model inputs
def create_model_inputs():
    inputs = {}
    for name in feature_names:
        inputs[name] = layers.Input(
            name=name, shape=(1,), dtype=tf.float32
        )
    return inputs

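# Create Standard Neural Network
def base_neural_network(hidden_units=None):
    # Treat a missing `hidden_units` as "no hidden layers" so the loop below does not fail on None.
    hidden_units = hidden_units or []
    inputs = create_model_inputs()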
    input_values = [value for _, value in sorted(inputs.items())]
    features = keras.layers.concatenate(input_values)
    features = layers.BatchNormalization()(features)

    # Create hidden layers with deterministic weights using the Dense layer.
    for units in hidden_units:
        features = layers.Dense(units, activation="sigmoid")(features)
    # The output is deterministic: a single point estimate.
    outputs = layers.Dense(units=1)(features)

    model = keras.Model(inputs=inputs, outputs=outputs)
    return model


# Function to train and evaluate a model (experiment run)
def run_experiment(model, loss, train_dataset, test_dataset, num_epochs, learning_rate):

    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=learning_rate),
        loss=loss,
        metrics=[keras.metrics.RootMeanSquaredError()],
    )

    print("Model training started ...")
    model.fit(
        train_dataset, 
        epochs=num_epochs, 
        validation_data=test_dataset)
    
    print("Model training finished.")
    _, rmse = model.evaluate(train_dataset, verbose=0)
    print(f"Train RMSE: {round(rmse, 3)}")

    print("Evaluating model performance...")
    _, rmse = model.evaluate(test_dataset, verbose=0)
    print(f"Test RMSE: {round(rmse, 3)}")
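
As a reminder of what `run_experiment` reports, the sketch below computes mean squared error and its square root by hand with NumPy. The `y_true` and `y_pred` values are hypothetical and only illustrate the relationship between the MSE loss and the RMSE metric; they are not results from this notebook.

```python
import numpy as np

# Hypothetical quality scores and model predictions for four wines.
y_true = np.array([5.0, 6.0, 4.0, 6.0])
y_pred = np.array([5.5, 5.5, 5.0, 6.0])

# Mean squared error: the quantity minimized during training.
mse = np.mean((y_true - y_pred) ** 2)

# Root mean squared error: same information, in units of the quality score.
rmse = np.sqrt(mse)

print(f"MSE={mse:.3f} RMSE={rmse:.3f}")
```

Because RMSE is expressed in the units of the label, a value of about 0.6 here would mean predictions are off by roughly half a quality point on average.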


In [78]:
arch_type = 'nn'
model_name = "wine_quality_classification_" + arch_type
model_dir = "../models"
model_path = os.path.join(model_dir, model_name + ".h5")
# create the models directory itself, not a directory named after the .h5 file
os.makedirs(model_dir, exist_ok=True)

### Neural Network Training <a name="model training"></a>

In [79]:
hidden_units = [8, 8]
learning_rate = 0.001
num_epochs = 100
nn_model = base_neural_network(hidden_units=hidden_units)
run_experiment(
    model=nn_model, 
    loss=keras.losses.MeanSquaredError(), 
    train_dataset=ds_train, 
    test_dataset=ds_test,
    num_epochs=num_epochs,
    learning_rate=learning_rate)


Start training the model...
Epoch 1/100




ValueError: in user code:

    File "/home/codespace/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1852, in test_function  *
        return step_function(self, iterator)
    File "/home/codespace/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1836, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/codespace/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1824, in run_step  **
        outputs = model.test_step(data)
    File "/home/codespace/.local/lib/python3.10/site-packages/keras/engine/training.py", line 1788, in test_step
        y_pred = self(x, training=False)
    File "/home/codespace/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/home/codespace/.local/lib/python3.10/site-packages/keras/backend.py", line 3581, in concatenate
        return tf.concat([to_dense(x) for x in tensors], axis)

    ValueError: Exception encountered when calling layer 'concatenate_7' (type Concatenate).
    
    Can't concatenate scalars (use tf.stack instead) for '{{node model_7/concatenate_7/concat}} = ConcatV2[N=11, T=DT_FLOAT, Tidx=DT_INT32](IteratorGetNext, IteratorGetNext:1, IteratorGetNext:2, IteratorGetNext:3, IteratorGetNext:4, IteratorGetNext:5, IteratorGetNext:6, IteratorGetNext:7, model_7/Cast, IteratorGetNext:9, IteratorGetNext:10, model_7/concatenate_7/concat/axis)' with input shapes: [], [], [], [], [], [], [], [], [], [], [], [].
    
    Call arguments received by layer 'concatenate_7' (type Concatenate):
      • inputs=['tf.Tensor(shape=(), dtype=float32)', 'tf.Tensor(shape=(), dtype=float32)', 'tf.Tensor(shape=(), dtype=float32)', 'tf.Tensor(shape=(), dtype=float32)', 'tf.Tensor(shape=(), dtype=float32)', 'tf.Tensor(shape=(), dtype=float32)', 'tf.Tensor(shape=(), dtype=float32)', 'tf.Tensor(shape=(), dtype=float32)', 'tf.Tensor(shape=(), dtype=float32)', 'tf.Tensor(shape=(), dtype=float32)', 'tf.Tensor(shape=(), dtype=float32)']
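
The traceback above is raised during validation (`test_function`), and every reported input shape is `[]`. A likely reading, given that only `ds_train` was passed through `prepare_for_training`, is that `ds_test` still yields unbatched scalar features, which `Concatenate` cannot join. The sketch below uses a small hypothetical stand-in dataset (the two-feature dict and its values are illustrative, not the notebook's data) to show how `batch` changes the element shape.

```python
import tensorflow as tf

# Hypothetical stand-in for ds_test: (features dict, label) pairs of scalars.
features = {
    "alcohol": tf.constant([9.0, 12.2, 11.2]),
    "pH": tf.constant([3.22, 3.38, 3.51]),
}
labels = tf.constant([5, 6, 4])
ds_unbatched = tf.data.Dataset.from_tensor_slices((features, labels))

# Without batching, each feature is a scalar with shape ().
unbatched_shape = ds_unbatched.element_spec[0]["alcohol"].shape

# Batching adds a leading batch dimension, giving shape (None,).
ds_batched = ds_unbatched.batch(1)
batched_shape = ds_batched.element_spec[0]["alcohol"].shape

print("unbatched:", unbatched_shape, "batched:", batched_shape)
```

Under that assumption, running `ds_test` through the same helper before calling `run_experiment` (for example `ds_test = prepare_for_training(ds_test, batch_size=batch_size)`) should give the validation inputs a batch dimension like the training inputs.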
