# CONDOR ordinal classification/regression in Tensorflow Keras 

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GarrettJenkinson/condor_tensorflow/blob/main/docs/CONDOR_TensorFlow_demo.ipynb)


This notebook uses MNIST hand-written digits and Amazon reviews as examples of ordinal classification, using the condor_tensorflow package for Tensorflow Keras.


**Acknowledgments**: This notebook is based in part on PyTorch source code written by Sebastian Raschka [in this notebook](https://github.com/Raschka-research-group/coral-cnn/blob/master/coral-implementation-recipe.ipynb) and the CORAL ordinal notebook written by [Chris Kennedy and Stephen Matthews](https://github.com/ck37/coral-ordinal).

## Installation for Google Colab

With pip you can install either the latest source code from GitHub or the stable version of the module from pypi.org.

In [None]:
# Upgrade scikit-learn; only needed for advanced OrdinalEncoder behaviours
if 'google.colab' in str(get_ipython()):
    !pip install scikit-learn==0.24.2


In [None]:
if 'google.colab' in str(get_ipython()):
    GITHUB_AUTH = "GarrettJenkinson:<APIaccessTOKEN>"
    !git clone https://$GITHUB_AUTH@github.com/GarrettJenkinson/condor_tensorflow.git

In [None]:
# Install source package from GitHub
if 'google.colab' in str(get_ipython()):
    !pip install --force-reinstall --no-deps --use-feature=in-tree-build condor_tensorflow/

## Import statements

In [26]:
import numpy as np
import sklearn
from sklearn import model_selection
from sklearn.model_selection import train_test_split
import pandas as pd
from scipy import special
import tensorflow_hub as hub
import os
import json
import gzip
from urllib.request import urlopen

import tensorflow as tf
print("Tensorflow version", tf.__version__)

import condor_tensorflow as condor
print("CONDOR version:", condor.__version__)

Tensorflow version 2.6.0
CONDOR version: 0.1.0-dev


## MNIST toy example

MNIST is a database of handwritten digits extracted from handwriting sample forms and widely utilized in image classification tasks.

The originally intended use of the dataset is categorical prediction (recognition of digits), without any ordinal component.  However, since the data are numerical, one could imagine a scenario where ordinal proximity of incorrect predictions to the correct prediction might be beneficial e.g. handwritten map coordinates.  Hence we utilize the MNIST dataset and enforce ordinal predictions to demonstrate the improved performance of CONDOR on the ordinal problem, while acknowledging that MNIST is usually more suited to categorical prediction.

We begin by setting some core variables required for model building.

In [None]:
##########################
### SETTINGS
##########################

# Hyperparameters
random_seed = 1 # Not yet used
learning_rate = 0.05
batch_size = 128
num_epochs = 2

# Architecture
NUM_CLASSES = 10

Next we load the MNIST data and create training, test and validation datasets in a suitable format.  Finally we check the shapes of the data structures containing our MNIST data.

In [None]:
# Fetch and format the mnist data
(mnist_images, mnist_labels), (mnist_images_test, mnist_labels_test) = tf.keras.datasets.mnist.load_data()

# Split off a validation dataset for early stopping
mnist_images, mnist_images_val, mnist_labels, mnist_labels_val = \
  model_selection.train_test_split(mnist_images, mnist_labels, test_size = 5000, random_state = 1)

print("Shape of training images:", mnist_images.shape)
print("Shape of training labels:", mnist_labels.shape)

print("Shape of test images:", mnist_images_test.shape)
print("Shape of test labels:", mnist_labels_test.shape)

print("Shape of validation images:", mnist_images_val.shape)
print("Shape of validation labels:", mnist_labels_val.shape)

# Also rescales to 0-1 range.
dataset = tf.data.Dataset.from_tensor_slices(
  (tf.cast(mnist_images[..., tf.newaxis] / 255, tf.float32),
   tf.cast(mnist_labels, tf.int64)))
dataset = dataset.shuffle(1000).batch(batch_size)

test_dataset = tf.data.Dataset.from_tensor_slices(
  (tf.cast(mnist_images_test[..., tf.newaxis] / 255, tf.float32),
   tf.cast(mnist_labels_test, tf.int64)))
#test_dataset = test_dataset.shuffle(1000).batch(batch_size)
# Here we do not shuffle the test dataset.
test_dataset = test_dataset.batch(batch_size)

val_dataset = tf.data.Dataset.from_tensor_slices(
  (tf.cast(mnist_images_val[..., tf.newaxis] / 255, tf.float32),
   tf.cast(mnist_labels_val, tf.int64)))
val_dataset = val_dataset.shuffle(1000).batch(batch_size)

### Simple MLP model



Now we create a simple multi-layer perceptron model and apply the ordinal output layer required by CONDOR (i.e. a dense layer with one unit fewer than the number of output classes).  Note that while we use an MLP as the example, any categorical neural network architecture could be used.  The version below uses the Sequential API to create the model.

In [None]:
def create_model(num_classes):
  model = tf.keras.Sequential()
  # The datasets above add a trailing channel axis, so each image has shape (28, 28, 1).
  model.add(tf.keras.layers.Flatten(input_shape = (28, 28, 1)))
  model.add(tf.keras.layers.Dense(128, activation = "relu"))
  model.add(tf.keras.layers.Dropout(0.2))
  model.add(tf.keras.layers.Dense(32, activation = "relu"))
  model.add(tf.keras.layers.Dropout(0.1))
  # No activation function specified so this will output cumulative logits.
  model.add(tf.keras.layers.Dense(num_classes-1))
  return model

model = create_model(NUM_CLASSES)

# Note that the model generates 1 fewer outputs than the number of classes. 
model.summary()

Alternatively we could build the model using the Functional API as demonstrated below.

In [None]:
# Or a functional API version
def create_model2(num_classes):
  # The datasets above add a trailing channel axis, so each image has shape (28, 28, 1).
  inputs = tf.keras.Input(shape = (28, 28, 1))

  x = tf.keras.layers.Flatten()(inputs)
  x = tf.keras.layers.Dense(128, activation = "relu")(x)
  x = tf.keras.layers.Dropout(0.2)(x)
  x = tf.keras.layers.Dense(32, activation = "relu")(x)
  x = tf.keras.layers.Dropout(0.1)(x)
  # No activation function specified so this will output cumulative logits.
  outputs = tf.keras.layers.Dense(num_classes-1)(x)

  model = tf.keras.Model(inputs = inputs, outputs = outputs)

  return model

model = create_model2(NUM_CLASSES)

# Note that the model generates 1 fewer outputs than the number of classes. 
model.summary()

We compile the model using CONDOR's SparseCondorOrdinalCrossEntropy as the loss function.  This is the key component of the CONDOR method, which enables ordinal prediction with rank consistency.  The other metrics provided by CONDOR enable assessment of CONDOR's performance on the ordinal prediction problem.

In [None]:
model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = learning_rate),
              loss = condor.SparseCondorOrdinalCrossEntropy(),
              metrics = [condor.SparseOrdinalEarthMoversDistance(),
                         condor.SparseOrdinalMeanAbsoluteError()])

Now we train the model.

In [None]:
%%time

# This takes about 5 minutes on CPU, 2.5 minutes on GPU.
history = model.fit(dataset, epochs = 5, validation_data = val_dataset,
                    callbacks = [tf.keras.callbacks.EarlyStopping(patience = 3, restore_best_weights = True)])

### Test set evaluation
Now we can evaluate performance on the MNIST test dataset we created previously.

In [None]:
# Evaluate on test dataset.
model.evaluate(test_dataset)

### Cumulative logits to probabilities

Note that the output layer natively outputs cumulative logit values.  These can subsequently be converted to probability estimates for each ordinal label using the condor.ordinal_softmax() function.

In [None]:
print("Predict on test dataset")

# Note that these are ordinal (cumulative) logits, not probabilities or regular logits.
ordinal_logits = model.predict(test_dataset)

# Convert from logits to label probabilities. This is initially a tensorflow tensor.
tensor_probs = condor.ordinal_softmax(ordinal_logits)

# Convert the tensor into a pandas dataframe.
probs_df = pd.DataFrame(tensor_probs.numpy())

probs_df.head(10)

Now we can confirm that our probabilities sum to 1 as expected:

In [None]:
# Check that probabilities all sum to 1 - looks good!
probs_df.sum(axis = 1)

At this point you have successfully generated CONDOR's predictions.  Depending on your use case, these may be sufficient for your purposes and if so you can stop here.  However, in the following sections we explore techniques for producing labels from the predicted probabilities.  These techniques will be required if your application requires a single class prediction.

### Label prediction

Using the probabilities generated, we can produce point estimates of the labels for the MNIST images.  There are many valid techniques to produce point estimates from the probabilities.  Here we demonstrate two common techniques of calculating predicted labels.

First we can simply select the label with the highest probability (i.e. we use the mode):

In [None]:
# Probs to labels
labels = probs_df.idxmax(axis = 1)
labels.values

We can now use these labels to calculate the accuracy of our predictions:

In [None]:
np.mean(labels == mnist_labels_test)

In [None]:
# Compare to logit-based cumulative probs
cum_probs = pd.DataFrame(ordinal_logits).apply(special.expit).cumprod(axis=1)
cum_probs.head(10)

Secondly we utilize the method of label prediction given by Equation 1 of the CONDOR paper (i.e. we use the median):

In [None]:
labels2 = cum_probs.apply(lambda x: x > 0.5).sum(axis = 1)
labels2.head()

Next we calculate the accuracy of the labels using our test data:

In [None]:
np.mean(labels2 == mnist_labels_test)

Often the two methods of label prediction agree, but not always:

In [None]:
np.mean(labels == labels2)

In [None]:
print("Mean absolute label error version 1:", np.mean(np.abs(labels - mnist_labels_test)))
print("Mean absolute label error version 2:", np.mean(np.abs(labels2 - mnist_labels_test)))
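
To see how the two labelling rules can disagree, here is a small self-contained sketch. The probability vector is made up for illustration and is not produced by the model. The mode picks the single most likely label, whereas the Equation 1 rule counts how many tail probabilities P(y > k) exceed 0.5.

```python
import numpy as np

# Hypothetical label probabilities for one example with 5 ordinal classes.
probs = np.array([0.40, 0.05, 0.10, 0.20, 0.25])

# Rule 1: the mode, i.e. the label with the highest probability.
label_mode = int(np.argmax(probs))

# Rule 2: count the thresholds k for which P(y > k) exceeds 0.5.
# tail[k] = P(y > k) = sum of probs[k+1:], for k = 0..3.
tail = probs[::-1].cumsum()[::-1][1:]
label_median = int((tail > 0.5).sum())

print("P(y > k):", tail)
print("mode label:", label_mode, "| median-style label:", label_median)
```

In this example the mode is label 0, while the median-style rule gives label 2, because most of the probability mass lies above label 1 even though no single higher label beats label 0.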

At this point you have fully implemented the CONDOR ordinal workflow, generated predicted probabilities and utilized two methods to produce point estimates of the labels. 

### Importance weights customization

Here is a quick example showing how the importance weights can be customized.

In [None]:
model = create_model(num_classes = NUM_CLASSES)
model.summary()

# We have num_classes - 1 outputs (cumulative logits), so there are 9 elements
# in the importance vector to customize.
importance_weights = [1., 1., 0.5, 0.5, 0.5, 1., 1., 0.1, 0.1]
loss_fn = condor.SparseCondorOrdinalCrossEntropy(importance_weights = importance_weights)

model.compile(tf.keras.optimizers.Adam(learning_rate = learning_rate), loss = loss_fn)

In [None]:
%%time

history = model.fit(dataset, epochs = num_epochs)
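
To give a rough intuition for what the importance weights do, the following NumPy sketch scales a per-task binary cross-entropy by one weight per output. This is a simplified illustration under stated assumptions, not the loss implemented in condor_tensorflow, which is more involved. The function name `weighted_task_loss` and the toy inputs are hypothetical.

```python
import numpy as np

def weighted_task_loss(logits, levels, importance_weights):
    """Illustrative only: per-task binary cross-entropy, scaled per task."""
    logits = np.asarray(logits, dtype=float)
    levels = np.asarray(levels, dtype=float)
    w = np.asarray(importance_weights, dtype=float)
    # Numerically stable binary cross-entropy with logits, one value per
    # (example, task): max(z, 0) - z * y + log(1 + exp(-|z|)).
    bce = np.maximum(logits, 0) - logits * levels + np.log1p(np.exp(-np.abs(logits)))
    # Each of the K-1 columns is multiplied by its importance weight,
    # summed per example, then averaged over the batch.
    return (bce * w).sum(axis=1).mean()

# Toy batch: 2 examples, 3 binary tasks (i.e. 4 ordinal classes).
logits = np.zeros((2, 3))
levels = np.array([[1, 1, 0], [1, 0, 0]])

loss_equal = weighted_task_loss(logits, levels, [1.0, 1.0, 1.0])
loss_custom = weighted_task_loss(logits, levels, [1.0, 0.5, 0.0])
print(loss_equal, loss_custom)
```

A weight of 0 removes a task's contribution entirely and a weight of 0.5 halves it, which is the sense in which the vector of nine weights above lets you emphasize some rank thresholds over others.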

## Amazon reviews and 5-star ratings

Now we consider a wholly different problem - text-based Amazon product reviews with corresponding star ratings (via https://nijianmo.github.io/amazon/index.html#subsets).

As well as introducing another dataset to which CONDOR can be successfully applied, this part of the tutorial will expand on some relevant topics that were not considered in the MNIST example.

We start by downloading the necessary data:


In [19]:
!curl -o Prime_Pantry_5.json.gz http://deepyeti.ucsd.edu/jianmo/amazon/categoryFilesSmall/Prime_Pantry_5.json.gz 


Next we read the data from the downloaded file into a Pandas data frame and do some basic cleanup and preprocessing, extracting only the data that we need:

In [20]:
data = []
with gzip.open('Prime_Pantry_5.json.gz') as f:
    for l in f:
        data.append(json.loads(l.strip()))

df = pd.DataFrame.from_dict(data)
df = df[['overall', 'reviewText']]

# There is a large amount of duplicate text in here, possibly due to paid/fraudulent reviews.
df.drop_duplicates("reviewText", inplace = True)

# Some of the text is blank, which causes an obscure error about floating point conversion.
df.dropna(inplace = True)

print(len(df))
print(df.head())

outcome_col = "overall"
text_col = "reviewText"

# We subtract the minimum value from the outcomes so that they start at 0.
df[outcome_col] = df[outcome_col].values - df[outcome_col].min()

print("\n", df.overall.value_counts())

# TODO: define automatically based on the number of unique values in the outcome variable.
num_classes = 5

99025
   overall                                         reviewText
0      4.0  I purchased this Saran premium plastic wrap af...
1      5.0  I am an avid cook and baker.  Saran Premium Pl...
2      5.0  Good wrap, keeping it in the fridge makes it e...
3      4.0  I prefer Saran wrap over other brands. It does...
4      5.0                                             Thanks

 4.0    69812
3.0    15294
2.0     7664
1.0     3342
0.0     2913
Name: overall, dtype: int64


You can see (above) we have a data frame with product star ratings and corresponding text reviews.  You can also see the counts (number of entries) corresponding to each category of star rating.
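
The TODO in the cell above (deriving `num_classes` from the data) could be resolved along the following lines. This is a minimal sketch using a hypothetical array of zero-based ratings in place of `df[outcome_col].values`; with the notebook's data frame, `df[outcome_col].nunique()` would play the role of the first option.

```python
import numpy as np

# Hypothetical zero-based ratings standing in for df[outcome_col].values.
ratings = np.array([4.0, 3.0, 4.0, 0.0, 2.0, 1.0, 4.0])

# Option 1: count the distinct values that actually occur.
num_classes_observed = np.unique(ratings).size

# Option 2: assume labels run 0..max, which still gives the right count
# if an intermediate level happens to be missing from the sample.
num_classes_range = int(ratings.max()) + 1

print(num_classes_observed, num_classes_range)
```

The two options agree whenever every rating level is present. The second is the safer choice if a level might be absent from a small or filtered dataset.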

Now let's split the data into training and test sets:

In [21]:
# Train/Test split
text_train, text_test, labels_train, labels_test = \
  train_test_split(df[text_col].values, df[outcome_col].values, test_size = 10000, random_state = 1)

print("Training text shape:", text_train.shape)
print("Training labels shape:", labels_train.shape)
print("Testing text shape:", text_test.shape)
print("Testing labels shape:", labels_test.shape)

Training text shape: (89025,)
Training labels shape: (89025,)
Testing text shape: (10000,)
Testing labels shape: (10000,)


### Universal Sentence Encoder model (CONDOR applied with minimal code changes)

The Universal Sentence Encoder encodes text into high dimensional vectors that can be used for text classification, semantic similarity, clustering and other natural language tasks.

The model is trained and optimized for greater-than-word length text, such as sentences, phrases or short paragraphs. It is trained on a variety of data sources and a variety of tasks with the aim of dynamically accommodating a wide variety of natural language understanding tasks. The input is variable length English text and the output is a 512 dimensional vector. 

The following code shows how CONDOR can be applied to the Amazon review data for ordinal prediction, utilizing the existing Universal Sentence Encoder model, with minimal code changes.  CONDOR is designed to be easily added to existing models.

In [22]:
%%time
# This takes 20 - 30 seconds.

# Clear our GPU memory to stay efficient.
tf.keras.backend.clear_session()

input_text = tf.keras.layers.Input(shape = [], dtype = tf.string, name = 'input_text')

model_url = hub.load("https://tfhub.dev/google/universal-sentence-encoder-large/5")

base_model = hub.KerasLayer(model_url, input_shape = [],
                            dtype = tf.string,
                            trainable = False)
                            
embedded = base_model(input_text)

x = tf.keras.layers.Dense(64, activation = 'relu')(embedded)
x = tf.keras.layers.Dropout(0.1)(x)
output = tf.keras.layers.Dense(num_classes-1)(x)

model = tf.keras.Model(inputs = input_text, outputs = output)

model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_text (InputLayer)      [(None,)]                 0         
_________________________________________________________________
keras_layer (KerasLayer)     (None, 512)               147354880 
_________________________________________________________________
dense (Dense)                (None, 64)                32832     
_________________________________________________________________
dropout (Dropout)            (None, 64)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 260       
Total params: 147,387,972
Trainable params: 33,092
Non-trainable params: 147,354,880
_________________________________________________________________
Wall time: 2min 37s


As shown above, we can load the model from its URL and make some minimal edits, including the addition of our CONDOR-required output layer.

Next we compile the model, using CONDOR's SparseCondorOrdinalCrossEntropy loss and the same metrics as used previously with the MNIST data:

In [23]:
model.compile(loss = condor.SparseCondorOrdinalCrossEntropy(),
              metrics = [condor.SparseOrdinalEarthMoversDistance(),
                         condor.SparseOrdinalMeanAbsoluteError()],
              optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001))

Now we can encode a test string and take a look at the first ten dimensions:

In [24]:
base_model(np.array(["test_string"])).numpy()[0, :10]

array([-0.01035028,  0.00395393, -0.04288317,  0.00483279, -0.07732295,
       -0.0669976 ,  0.01624427, -0.01737383, -0.00085805,  0.01084491],
      dtype=float32)

Now we will train the model using the text reviews as our training data and the corresponding star ratings as our labels.  As with the MNIST data example, CONDOR will perform rank-consistent ordinal prediction.  The advantage of ordinal prediction in a scenario such as predicting star ratings from text is clear: misclassifications close to the star value of the actual review are preferable to misclassifications far from the true value.

Note that the following code may take some time to run (up to several hours), depending on the specifics of your system:

In [25]:
%%time

history = model.fit(x = text_train,
                    y = labels_train,
                    epochs = 5,
                    batch_size = 32, 
                    validation_split = 0.2,
                    callbacks = [tf.keras.callbacks.EarlyStopping(patience = 2,
                                                                  min_delta = 0.001,
                                                                  restore_best_weights = True)])

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Wall time: 4h 29min 25s


#### Evaluate the model

Now we can evaluate model performance.  For comparison, CORAL achieves loss 0.7962, MAE 0.3195.

In [28]:
model.evaluate(text_test, labels_test) 



[0.7599717378616333, 0.4597877860069275, 0.311599999666214]

Next we can generate predictions on our test data using the model.  As previously with the MNIST data, the native outputs from the output layer are cumulative logits, which we convert to probabilities for each class/label using the condor.ordinal_softmax() function:

In [30]:
preds = model.predict(text_test)
print(preds)

probs = pd.DataFrame(condor.ordinal_softmax(preds).numpy())

[[10.18386     9.835827    7.025723    2.8384259 ]
 [ 2.8308606   2.6194139   1.7318249   1.0468074 ]
 [ 6.201608    5.4099293   4.4318047   2.5570471 ]
 ...
 [ 7.370868    4.77282     1.4445612   0.25541803]
 [ 4.92669     6.689136    5.6169343   3.5220451 ]
 [ 9.117594    8.443037    5.4765677   2.1511822 ]]


Let's have a look at some predicted probabilities versus the known labels:

In [31]:
print(probs.head(10))
print(labels_test[:10])

          0         1         2         3         4
0  0.000038  0.000054  0.000888  0.055229  0.943792
1  0.055679  0.064119  0.132342  0.194323  0.553537
2  0.002022  0.004443  0.011677  0.070649  0.911209
3  0.001537  0.008904  0.080160  0.261910  0.647488
4  0.000558  0.002269  0.032800  0.228711  0.735662
5  0.000031  0.000101  0.007847  0.120239  0.871782
6  0.000476  0.000425  0.004169  0.085749  0.909182
7  0.189523  0.441137  0.319926  0.040930  0.008484
8  0.006887  0.006142  0.011822  0.111691  0.863459
9  0.000146  0.000335  0.016527  0.208991  0.774001
[4. 1. 4. 2. 4. 4. 4. 1. 4. 4.]
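
As a sanity check, the conversion from logits to label probabilities can be reproduced by hand. The NumPy sketch below is a reconstruction that is consistent with the numbers printed above and with the `expit(...).cumprod(...)` computation used elsewhere in this notebook. It is not claimed to be the package's implementation, and the helper name `ordinal_logits_to_probs` is hypothetical.

```python
import numpy as np

def ordinal_logits_to_probs(logits):
    """Convert (N, K-1) CONDOR-style logits to (N, K) label probabilities."""
    logits = np.atleast_2d(np.asarray(logits, dtype=float))
    # Conditional probabilities P(y > k | y > k-1) via the logistic sigmoid.
    cond = 1.0 / (1.0 + np.exp(-logits))
    # Unconditional tail probabilities P(y > k) via the running product.
    tail = np.cumprod(cond, axis=1)
    # Pad with P(y > -1) = 1 on the left and P(y > K-1) = 0 on the right,
    # then difference adjacent columns to get P(y = k).
    ones = np.ones((tail.shape[0], 1))
    zeros = np.zeros((tail.shape[0], 1))
    padded = np.hstack([ones, tail, zeros])
    return padded[:, :-1] - padded[:, 1:]

# Second row of logits printed by the prediction cell above.
row = [2.8308606, 2.6194139, 1.7318249, 1.0468074]
probs_row = ordinal_logits_to_probs(row)[0]
print(np.round(probs_row, 6))
```

The result should agree, up to rounding, with row 1 of the probability table above (0.055679, 0.064119, 0.132342, 0.194323, 0.553537), and the entries sum to 1 by construction.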


#### Evaluate accuracy

Let's evaluate the accuracy and mean absolute error of the model.  First we'll generate predictions using the label with the highest probability (i.e. we use the mode, as we did with the MNIST data):


In [32]:
labels_v1 = probs.idxmax(axis = 1)
print("Accuracy of label version 1:", np.mean(labels_v1 == labels_test))

Accuracy of label version 1: 0.7601


And as with the MNIST data, we will again generate predictions using the method given by Equation 1 in the CONDOR paper (i.e. we use the median):

In [35]:
cum_probs = pd.DataFrame(preds).apply(special.expit).cumprod(axis=1)
labels_v2 = cum_probs.apply(lambda x: x > 0.5).sum(axis = 1)
print("Accuracy of label version 2:", np.mean(labels_v2 == labels_test))

Accuracy of label version 2: 0.7547


#### Evaluate mean absolute label error

This is effectively an ordinal version of 1 - accuracy.

In [36]:
# Version 2 (Equation 1 labels) matches the MAE reported by model.evaluate() above (0.3116).
# Version 1 uses the mode instead, so its error differs.
print("Mean absolute label error version 1:", np.mean(np.abs(labels_v1 - labels_test)))
print("Mean absolute label error version 2:", np.mean(np.abs(labels_v2 - labels_test)))

print("Root mean squared label error version 1:", np.sqrt(np.mean(np.square(labels_v1 - labels_test))))
print("Root mean squared label error version 2:", np.sqrt(np.mean(np.square(labels_v2 - labels_test))))

Mean absolute label error version 1: 0.3283
Mean absolute label error version 2: 0.3116
Root mean squared label error version 1: 0.7588807547961669
Root mean squared label error version 2: 0.6951258878793107
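
The difference between accuracy and mean absolute label error is easiest to see on a toy case. In the sketch below (all values are made up for illustration), two sets of predictions have the same accuracy, but one misses by a single level and the other by two or three levels.

```python
import numpy as np

true = np.array([4, 1, 4, 2, 4])
pred_near = np.array([4, 2, 4, 3, 4])  # wrong predictions miss by one level
pred_far = np.array([4, 4, 4, 0, 4])   # wrong predictions miss by two or three levels

for name, pred in [("near", pred_near), ("far", pred_far)]:
    accuracy = np.mean(pred == true)
    mae = np.mean(np.abs(pred - true))
    print(name, "accuracy:", accuracy, "MAE:", mae)
```

Both prediction sets are correct on three of five examples, yet the mean absolute error is 0.4 for the first and 1.0 for the second. This is the information that accuracy alone discards on an ordinal problem.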


In [37]:
# Review how absolute error is calculated for ordinal labels:
pd.DataFrame({"true": labels_test, "pred_v2": labels_v2, "abs": np.abs(labels_v2 - labels_test)}).head()

   true  pred_v2  abs
0   4.0        4  0.0
1   1.0        4  3.0
2   4.0        4  0.0
3   2.0        4  2.0
4   4.0        4  0.0


### Universal Sentence Encoder model (using pre-encoded labels for faster processing)

The "Sparse" versions of the CONDOR API are convenient, and implementing them requires minimal changes to existing code. However, there is a performance overhead compared with pre-encoding the labels using CONDOR's ordinal encoder. This is because the sparse API is essentially encoding on the fly inside the training loop rather than doing so up front.

Furthermore, as we will see later, the labels do not always come encoded as 0,1,...,K-1. In these cases, using the CondorOrdinalEncoder will help transform labels into ordinal-ready values.

In the code that follows we will implement up-front ordinal encoding of the labels using CONDOR's built-in functionality:

In [None]:
%%time
enc = condor.CondorOrdinalEncoder(nclasses=num_classes)
enc_labs_train = enc.fit_transform(labels_train)
enc_labs_test = enc.transform(labels_test)
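
To make the idea of pre-encoding concrete, the sketch below shows the kind of cumulative binary encoding that ordinal methods such as CORAL and CONDOR train against: a label k in 0..K-1 becomes K-1 indicators answering "is the label greater than j?". This is an assumption about the general shape of the encoder's output, written in plain NumPy. The helper name `to_ordinal_levels` is hypothetical, and the package's actual implementation and dtype may differ.

```python
import numpy as np

def to_ordinal_levels(labels, num_classes):
    """Encode integer labels 0..K-1 as K-1 cumulative binary indicators."""
    labels = np.asarray(labels, dtype=int).reshape(-1, 1)
    thresholds = np.arange(num_classes - 1).reshape(1, -1)
    # Column j answers the question "is the label greater than j?"
    return (labels > thresholds).astype(np.float32)

levels = to_ordinal_levels([0, 2, 4], num_classes=5)
print(levels)
```

With five classes, label 0 maps to four zeros, label 2 to `[1, 1, 0, 0]`, and label 4 to four ones, so the encoded array has one column per output unit of the model.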

Now we can compile the model.  Note that since we have pre-encoded the labels, we no longer use the 'Sparse' loss functions and metrics.  Rather we use corresponding versions that are designed for use with encoded labels:

In [None]:
model.compile(loss = condor.CondorOrdinalCrossEntropy(),
              metrics = [condor.OrdinalEarthMoversDistance(),
                         condor.OrdinalMeanAbsoluteError()],
              optimizer = tf.keras.optimizers.Adam(learning_rate = 0.001))

Next we will train the model.  Note that we pass it the encoded labels this time around:

In [None]:
%%time
history = model.fit(x = text_train,
                    y = enc_labs_train,
                    epochs = 5,
                    batch_size = 32, 
                    validation_split = 0.2,
                    callbacks = [tf.keras.callbacks.EarlyStopping(patience = 2,
                                                                  min_delta = 0.001,
                                                                  restore_best_weights = True)])

And finally evaluate:

In [None]:
model.evaluate(text_test, enc_labs_test) 

Pre-encoding requires a little extra code, but it runs quickly, so the savings later will often be worth it.  Now you can calculate accuracies and other metrics as we did previously.

You have now successfully implemented CONDOR on the Amazon review data using the Universal Sentence Encoder model.  Congratulations! You could stop here, or alternatively keep reading to learn more about the capabilities of CONDOR's ordinal encoder.

## More examples of label encoding capabilities

Here we further demonstrate some features of the ordinal encoder.

First we pass a numpy array of classes to the ordinal encoder.  The encoder automatically determines how many classes there are and then orders them in the default sklearn OrdinalEncoder fashion (alphabetically in this case):

In [None]:
labels = np.array(['a','b','c','d','e'])
enc_labs = condor.CondorOrdinalEncoder().fit_transform(labels)
print(enc_labs)

This time we do the same, but using a basic list of labels in place of the numpy array from the previous example:

In [None]:
labels = ['a','b','c','d','e']
enc_labs = condor.CondorOrdinalEncoder().fit_transform(labels)

print(enc_labs)

In this case we wish to specify that the order should be different from alphabetical. We do so by explicitly passing the category labels to the ordinal encoder, in order.  Note this would also allow "missing" categories to be included in proper order.

In [None]:
labels = ['low','med','high']
enc = condor.CondorOrdinalEncoder(categories=[['low', 'med', 'high']])
enc_labs = enc.fit_transform(labels)

print(enc_labs)
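
The reason an explicit category order matters can be shown without the encoder at all. The pure-Python sketch below mimics alphabetical ordering with `sorted` and compares it with a user-supplied order. It only illustrates the resulting integer ranks under that assumption and does not call or reproduce CondorOrdinalEncoder.

```python
labels = ['low', 'med', 'high', 'low']

# Default behaviour sketched here as sorted unique values (alphabetical for strings).
alphabetical = sorted(set(labels))
alpha_rank = {cat: i for i, cat in enumerate(alphabetical)}

# Explicit ordering, analogous to passing categories=[['low', 'med', 'high']].
explicit = ['low', 'med', 'high']
explicit_rank = {cat: i for i, cat in enumerate(explicit)}

alpha_codes = [alpha_rank[label] for label in labels]
explicit_codes = [explicit_rank[label] for label in labels]

print("alphabetical:", alpha_codes)
print("explicit:    ", explicit_codes)
```

Alphabetical sorting places 'high' before 'low' and 'med', so 'high' would receive the lowest rank. Supplying the categories in their intended order gives 'low' < 'med' < 'high', as required for ordinal training.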

This handful of examples demonstrates the key behavior of the CONDOR ordinal encoder.  Together with the MNIST and Amazon examples above, they should provide all you need to get started implementing CONDOR in your models!

### Good luck!