# Oldies but Goldies Project: Decision Trees and Neural Networks

<p style="text-align: center;">Leander Girrbach <br> <a href="mailto:girrbach@cl.uni-heidelberg.de">girrbach@cl.uni-heidelberg.de</a></p>

**Important note**: This project consists of 2 parts:
 1. Differentiable decision trees
 2. Distilling neural networks into (conventional) decision trees

Both parts explore different ideas for combining neural networks (mostly MLPs) and decision trees (or random forests).

## Part 2: Distilling Neural Networks into Decision Trees

### Idea

When training a neural network on a classification task, we obtain a classifier that is hard to interpret. Knowledge distillation, a technique from deep learning, seeks to make a smaller (student) neural network behave like a larger (teacher) neural network. One of the main motivations for distillation is to reduce the computational cost of solving a given problem.

The same idea can be used to learn a completely different type of classifier that imitates the computations of the neural network. To this end, we view the neural network as a multivariate function mapping input vectors to probability distributions over the labels, $\mathbb{R}^{d_\text{in}} \rightarrow \mathbb{R}^{\#\text{labels}}$.

We can use a trained teacher model and some data to train a student classifier that behaves like the teacher neural network. We can either require the student to output only the same labels as the teacher, or additionally require it to output the same probability distribution over labels as the teacher.
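Formally, the two options differ only in the training targets. Writing $f$ for the teacher, $K = \#\text{labels}$, and $x_1, \dots, x_n$ for the inputs used for distillation (this notation is ours, introduced for illustration), the teacher's outputs lie in the probability simplex

$$
\Delta^{K-1} = \Big\{ p \in \mathbb{R}^{K} : p_k \geq 0,\ \textstyle\sum_{k=1}^{K} p_k = 1 \Big\} \subset \mathbb{R}^{K}.
$$

A student classifier $h$ is trained on the hard labels

$$
\hat{y}_i = \arg\max_{k \in \{1, \dots, K\}} f(x_i)_k ,
$$

while a student regressor $r$ is fitted to the full distributions by reducing the squared error

$$
\sum_{i=1}^{n} \big\lVert r(x_i) - f(x_i) \big\rVert_2^2 ,
$$

which is the quantity that the default squared-error split criterion of scikit-learn's regression trees greedily reduces.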

### Method

Given a dataset of paired inputs and labels $(\mathcal{X}_{\text{train}}, \mathcal{Y}_{\text{train}})$, I train a neural network on $\mathcal{X}_{\text{train}}$ to predict the corresponding labels. This model serves as the teacher. After training, I use another set of inputs $\mathcal{X}_{\text{distill}}$ to compute the label distributions predicted by the teacher model. Taking the $\arg\max$ of each distribution yields the predicted labels.

For $\mathcal{X}_{\text{distill}}$, I evaluate $2$ options:

 * The train dataset $\mathcal{X}_{\text{train}}$
 * A larger dataset: $\mathcal{X}_{\text{train}}$ plus documents from a related domain (news articles, see Data)

Using this information, I train Decision Tree classifiers / Random Forest classifiers to predict the same labels on $\mathcal{X}_{\text{distill}}$ as the teacher model. Furthermore, I train Decision Tree regressors / Random Forest regressors to predict the same probability distributions on $\mathcal{X}_{\text{distill}}$ as the teacher model.
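A minimal sketch of this procedure with scikit-learn, assuming a teacher that exposes `predict_proba` and `classes_` (as `MLPClassifier` does). The helper name `distill` and the toy data from `make_classification` are hypothetical stand-ins, not part of the original notebook:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor


def distill(teacher, X_distill, random_state=0):
    """Hypothetical helper: fit a hard-label and a soft-label tree student."""
    # Teacher's distribution over labels for every distillation input
    teacher_probabilities = teacher.predict_proba(X_distill)
    # Hard labels: most probable column, mapped back to the class values
    teacher_labels = teacher.classes_[teacher_probabilities.argmax(axis=1)]

    # Student 1: classifier imitating the teacher's predicted labels
    tree_classifier = DecisionTreeClassifier(random_state=random_state)
    tree_classifier.fit(X_distill, teacher_labels)

    # Student 2: multi-output regressor imitating the full distributions
    tree_regressor = DecisionTreeRegressor(random_state=random_state)
    tree_regressor.fit(X_distill, teacher_probabilities)
    return tree_classifier, tree_regressor, teacher_labels, teacher_probabilities


# Toy stand-in for the real data: 3 classes, 5 continuous features
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)
teacher = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
teacher.fit(X, y)

clf, reg, labels, probs = distill(teacher, X)
# Fidelity of the students on the distillation inputs themselves
label_fidelity = np.mean(clf.predict(X) == labels)
soft_predictions = reg.predict(X)
print(label_fidelity, soft_predictions.shape)
```

On the distillation inputs themselves, a fully grown tree reproduces the teacher's labels exactly, which is consistent with the distilled trees' train accuracy matching the teacher's in the result tables. Held-out data is therefore needed to measure how well a student generalises the teacher's behaviour.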

### Data

For $\mathcal{X}_{\text{train}}$, I reuse the 20 newsgroups dataset as in Part 1. For $\mathcal{X}_{\text{distill}}$, I add the documents of the AG News dataset (train and test split, labels discarded) as provided by `torchtext`. Preprocessing is the same as in Part 1, namely lowercasing, tokenising, lemmatising, and filtering stopwords.

### Models

I evaluate distillation from $2$ types of neural networks (trained on the same data):

 1. An MLP (feedforward neural network). Here, I represent documents by tf-idf weighted bag-of-words features, reduced to $512$ dimensions with truncated SVD.
 2. A bidirectional LSTM classifier. Here, tokens are represented by pretrained $300$-dimensional word2vec embeddings (provided by `gensim`), which are kept frozen during training.

The MLP has $2$ hidden layers with $128$ units each. The LSTM also has $2$ layers, with $128$ units per direction. Both models are trained with the Adam optimiser using default parameters (as specified by `sklearn`/`keras`), minimising the cross-entropy between the predicted label probabilities and the one-hot-encoded gold labels. The batch size is $32$ in both cases. The LSTM is trained for $20$ epochs, while the MLP is trained until `sklearn`'s default convergence criterion is met (at most $200$ epochs).
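As a sanity check on these sizes, the trainable parameter counts can be worked out by hand. This sketch assumes the standard Keras LSTM parameterisation (four gates, one bias vector per gate) and the $512$ SVD components used as MLP input; the helper names are hypothetical. The BiLSTM total matches the 838,676 trainable parameters reported by `reference_lstm.summary()` in the notebook output (the frozen embedding adds another 35,796,900):

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus bias vector
    return input_dim * units + units


# BiLSTM teacher: 300-dim word2vec inputs, 128 units per direction
bilstm_1 = 2 * lstm_params(300, 128)       # first bidirectional layer
bilstm_2 = 2 * lstm_params(2 * 128, 128)   # second layer sees both directions
lstm_head = dense_params(2 * 128, 20)      # softmax over 20 labels
trainable_lstm = bilstm_1 + bilstm_2 + lstm_head

# MLP teacher: 512 SVD components -> 128 -> 128 -> 20 labels
trainable_mlp = (dense_params(512, 128) + dense_params(128, 128)
                 + dense_params(128, 20))

print(trainable_lstm, trainable_mlp)
```

The LSTM teacher thus has roughly ten times as many trainable parameters as the MLP teacher.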

### Evaluation

I report the following metrics:

 * Test set accuracy (on the 20 newsgroups test set)
 * Reference accuracy (on the 20 newsgroups test set): Here, the predictions of the teacher model are treated as true labels
 * Train set accuracy (on the 20 newsgroups train set)
 * R2 coefficient of determination between probabilities predicted by student and teacher models
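 * KL divergence $D_\text{KL}(p_\text{teacher} \,\|\, p_\text{student})$ between the probabilities predicted by teacher and student, averaged over the test documents. For decision trees, this is only meaningful for the regressors: fully grown classification trees put all probability mass on a single label (a one-hot distribution), and random forest classifiers assign zero probability to every label that none of their trees predicts. Since the teacher assigns positive probability to every label, the divergence is infinite in both cases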
 
Together, these metrics measure both the overall performance of the models (with respect to the data) and how well they approximate the teacher model.
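The three teacher-based metrics can be sketched in plain NumPy. This is an illustrative re-implementation, not the notebook's code (which uses `accuracy_score`, `r2_score` and `scipy.stats.entropy`); unlike `entropy`, it assumes both inputs are already normalised distributions, and it ignores `r2_score`'s special handling of constant columns. The two hypothetical students have the same reference accuracy but very different probability agreement:

```python
import numpy as np


def reference_accuracy(teacher_probs, student_labels):
    # The teacher's argmax predictions play the role of the true labels
    return float(np.mean(teacher_probs.argmax(axis=1) == student_labels))


def r2_uniform(teacher_probs, student_probs):
    # R2 per label column, then the unweighted mean over columns
    # (like r2_score's default multioutput="uniform_average")
    ss_res = ((teacher_probs - student_probs) ** 2).sum(axis=0)
    ss_tot = ((teacher_probs - teacher_probs.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


def mean_kl(teacher_probs, student_probs):
    # KL(teacher || student) per document (natural log), averaged.
    # A zero student probability where the teacher is positive gives inf.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(teacher_probs > 0,
                         teacher_probs * np.log(teacher_probs / student_probs), 0.0)
    return float(np.mean(terms.sum(axis=1)))


teacher = np.array([[0.6, 0.3, 0.1],
                    [0.2, 0.5, 0.3]])
# Student A copies the teacher's decisions as one-hot distributions
student_a = np.eye(3)[teacher.argmax(axis=1)]
# Student B is a soft approximation with the same argmax
student_b = np.array([[0.5, 0.35, 0.15],
                      [0.25, 0.45, 0.3]])

results = {
    name: (reference_accuracy(teacher, student.argmax(axis=1)),
           r2_uniform(teacher, student),
           mean_kl(teacher, student))
    for name, student in [("A", student_a), ("B", student_b)]
}
for name, (acc, r2, kl) in results.items():
    print(f"student {name}: reference accuracy={acc:.2f}, R2={r2:.3f}, KL={kl:.4f}")
```

The `inf` case in `mean_kl` is what shows up as infinite KL divergence for the tree and forest classifiers in the result tables: a softmax teacher assigns positive probability to every label, so a single zero in a student's distribution makes the sum infinite.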

In [1]:
import spacy
import warnings
import numpy as np
import pandas as pd
import tensorflow as tf
import gensim.downloader as gensim

from tensorflow import keras
from tensorflow.keras import layers

from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer

from tqdm.notebook import tqdm
from torchtext.datasets import AG_NEWS

from scipy.stats import entropy
from sklearn.metrics import r2_score
from sklearn.metrics import accuracy_score
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import fetch_20newsgroups
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

Using TensorFlow backend.


In [2]:
# Seeds for determinism
np.random.seed(123)
tf.random.set_seed(123)

# Who cares for warnings?
warnings.filterwarnings('ignore')

In [3]:
# We use the 20 newsgroups dataset as our main dataset
train_data = fetch_20newsgroups(subset='train')
test_data = fetch_20newsgroups(subset='test')
X_train_raw, y_train = train_data.data, train_data.target
X_test_raw, y_test = test_data.data, test_data.target

# Then we use the AG News dataset for synthetic data
# Here, we only need the documents, not the labels
agnews_train, agnews_test = AG_NEWS()
_, agnews_X_train = zip(*agnews_train)
_, agnews_X_test = zip(*agnews_test)

# For synthetic data, we use the train portion of the 20 newsgroups dataset
# and the AG News data
X_synthetic_raw = np.concatenate([X_train_raw, agnews_X_train, agnews_X_test])
# For synthetic data, we do not have labels yet. First, we need to train a classifier, which we will do later
# ---

In [4]:
spacy_preprocessor = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def preprocess(documents):
    processed_documents = []
    documents = tqdm(documents)
    documents.set_description("Processing documents")
    for document in documents:
        document = document.lower()
        document = spacy_preprocessor(document)
        document = [token.lemma_ for token in document if not token.is_stop]
        processed_documents.append(" ".join(document))
    return processed_documents

X_train_text = preprocess(X_train_raw)
X_test_text = preprocess(X_test_raw)
X_synthetic_text = preprocess(X_synthetic_raw)


In [5]:
print("Encoding labels by integers")
label_encoder = LabelEncoder()
label_encoder.fit(y_train)
y_train = label_encoder.transform(y_train)
y_test = label_encoder.transform(y_test)

print("Constructing tf-idf weighted document-term matrix")
vectoriser = TfidfVectorizer(tokenizer=str.split, min_df=10)
vectoriser.fit(X_train_text)
X_train_vector = vectoriser.transform(X_train_text)
X_test_vector = vectoriser.transform(X_test_text)
X_synthetic_vector = vectoriser.transform(X_synthetic_text)

print("Performing Matrix factorisation using SVD")
svd = TruncatedSVD(n_components=512)
svd.fit(X_train_vector)
X_train_vector = svd.transform(X_train_vector)
X_test_vector = svd.transform(X_test_vector)
X_synthetic_vector = svd.transform(X_synthetic_vector)

print("Making sequence data")
# From https://stackabuse.com/python-for-nlp-multi-label-text-classification-with-keras/
tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(X_train_text)

X_train_sequence = tokenizer.texts_to_sequences(X_train_text)
X_test_sequence = tokenizer.texts_to_sequences(X_test_text)
X_synthetic_sequence = tokenizer.texts_to_sequences(X_synthetic_text)

vocab_size = len(tokenizer.word_index) + 1
maxlen = 400

X_train_sequence = pad_sequences(X_train_sequence, padding='post', maxlen=maxlen)
X_test_sequence = pad_sequences(X_test_sequence, padding='post', maxlen=maxlen)
X_synthetic_sequence = pad_sequences(X_synthetic_sequence, padding='post', maxlen=maxlen)
y_train_sequence = np_utils.to_categorical(y_train)
y_test_sequence = np_utils.to_categorical(y_test)

Encoding labels by integers
Constructing tf-idf weighted document-term matrix
Performing Matrix factorisation using SVD
Making sequence data


In [6]:
print("Building the LSTM model")
print("Building the embedding matrix")
embeddings = gensim.load('word2vec-google-news-300')
embedding_matrix = np.zeros((vocab_size, 300))
for word, index in tokenizer.word_index.items():
    try:
        embedding_vector = embeddings[word]
        embedding_matrix[index] = embedding_vector
    except KeyError:
        continue

# From https://keras.io/examples/nlp/bidirectional_lstm_imdb/
print("Building Keras model")
# Input for variable-length sequences of integers
inputs = keras.Input(shape=(None,), dtype="int32")
# Embed each integer in a 300-dimensional vector using the frozen pretrained embeddings
x = layers.Embedding(vocab_size, 300, weights=[embedding_matrix], trainable=False)(inputs)
# Add 2 bidirectional LSTMs
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(128))(x)
# Add a classifier
outputs = layers.Dense(20, activation="softmax")(x)
reference_lstm = keras.Model(inputs, outputs)
reference_lstm.summary()

reference_lstm.compile("adam", "categorical_crossentropy", metrics=["accuracy"])

Building the LSTM model
Building the embedding matrix
Building Keras model
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, None)]            0         
_________________________________________________________________
embedding (Embedding)        (None, None, 300)         35796900  
_________________________________________________________________
bidirectional (Bidirectional (None, None, 256)         439296    
_________________________________________________________________
bidirectional_1 (Bidirection (None, 256)               394240    
_________________________________________________________________
dense (Dense)                (None, 20)                5140      
Total params: 36,635,576
Trainable params: 838,676
Non-trainable params: 35,796,900
_________________________________________________________________


In [7]:
# Train Reference MLP: 2-layer MLP on train portion
reference_mlp = MLPClassifier((128, 128,), batch_size=32)
reference_mlp.fit(X_train_vector, y_train)

# Train Reference Decision Tree
reference_decision_tree = DecisionTreeClassifier()
reference_decision_tree.fit(X_train_vector, y_train)

# Train reference Random Forest
reference_random_forest = RandomForestClassifier(n_estimators=512, n_jobs=20)
reference_random_forest.fit(X_train_vector, y_train)

# Train reference BiLSTM model
reference_lstm.fit(X_train_sequence, y_train_sequence, batch_size=32, epochs=20)
print("Done.")

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Done.


In [11]:
# Create synthetic labels from MLP classifier
# First, get labels for the whole synthetic data
y_synthetic_labels = reference_mlp.predict(X_synthetic_vector)
y_synthetic_probabilities = reference_mlp.predict_proba(X_synthetic_vector)
# Also get probabilities for only the 20 newsgroups train set
y_train_probabilities = reference_mlp.predict_proba(X_train_vector)

In [12]:
# Now we distill the MLP into different Decision Tree/Random Forest models
# using the predictions
synthetic_decision_tree_classifier = DecisionTreeClassifier()
synthetic_decision_tree_regressor = DecisionTreeRegressor()
synthetic_random_forest_classifier = RandomForestClassifier(n_estimators=512, n_jobs=-1)
synthetic_random_forest_regressor = RandomForestRegressor(n_estimators=512, n_jobs=-1)

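# Since the MLP fits the train data almost perfectly, its predicted train labels
# almost coincide with the gold labels, so a tree classifier trained on them would
# essentially be the reference decision tree; we only train regressors here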
train_decision_tree_regressor = DecisionTreeRegressor()
train_random_forest_regressor = RandomForestRegressor(n_estimators=512, n_jobs=-1)

print("Training synthetic tree classifier")
synthetic_decision_tree_classifier.fit(X_synthetic_vector, y_synthetic_labels)
print("Training synthetic tree regressor")
synthetic_decision_tree_regressor.fit(X_synthetic_vector, y_synthetic_probabilities)
print("Training synthetic forest classifier")
synthetic_random_forest_classifier.fit(X_synthetic_vector, y_synthetic_labels)
print("Training synthetic forest regressor")
synthetic_random_forest_regressor.fit(X_synthetic_vector, y_synthetic_probabilities)
print("Training train tree regressor")
train_decision_tree_regressor.fit(X_train_vector, y_train_probabilities)
print("Training train forest regressor")
train_random_forest_regressor.fit(X_train_vector, y_train_probabilities)
print("Done.")

Training synthetic tree classifier
Training synthetic tree regressor
Training synthetic forest classifier
Training synthetic forest regressor
Training train tree regressor
Training train forest regressor
Done.


In [18]:
y_ref = reference_mlp.predict(X_test_vector)
y_ref_probabilities = reference_mlp.predict_proba(X_test_vector)

def get_metrics(y_pred_test, y_pred_train, y_pred_test_probabilities):
    return [
        accuracy_score(y_test, y_pred_test),
        accuracy_score(y_ref, y_pred_test),
        accuracy_score(y_train, y_pred_train),
        r2_score(y_ref_probabilities, y_pred_test_probabilities),
        np.mean(entropy(y_ref_probabilities, qk=y_pred_test_probabilities, axis=1))
    ]

def evaluate_classifier(classifier):
    y_pred_test = classifier.predict(X_test_vector)
    y_pred_train = classifier.predict(X_train_vector)
    y_pred_test_probabilities = classifier.predict_proba(X_test_vector)
    
    return get_metrics(y_pred_test, y_pred_train, y_pred_test_probabilities)

    
def evaluate_regressor(regressor):
    y_pred_test_probabilities = regressor.predict(X_test_vector)
    y_pred_test = np.argmax(y_pred_test_probabilities, axis=1)
    y_pred_train = np.argmax(regressor.predict(X_train_vector), axis=1)
    
    return get_metrics(y_pred_test, y_pred_train, y_pred_test_probabilities)

def evaluate_lstm(lstm):
    y_pred_test_probabilities = lstm.predict(X_test_sequence)
    y_pred_test = np.argmax(y_pred_test_probabilities, axis=1)
    y_pred_train = np.argmax(lstm.predict(X_train_sequence), axis=1)
    
    return get_metrics(y_pred_test, y_pred_train, y_pred_test_probabilities)

    
results = [
    ["Reference MLP"] + evaluate_classifier(reference_mlp),
    ["Reference LSTM"] + evaluate_lstm(reference_lstm),
    ["Reference Decision Tree"] + evaluate_classifier(reference_decision_tree),
    ["Reference Random Forest"] + evaluate_classifier(reference_random_forest),
    ["(Train only) Distilled Decision Tree Regressor"] + evaluate_regressor(train_decision_tree_regressor),
    ["(Train only) Distilled Random Forest Regressor"] + evaluate_regressor(train_random_forest_regressor),
    ["(Synthetic) Distilled Decision Tree Classifier"] + evaluate_classifier(synthetic_decision_tree_classifier),
    ["(Synthetic) Distilled Random Forest Classifier"] + evaluate_classifier(synthetic_random_forest_classifier),
    ["(Synthetic) Distilled Decision Tree Regressor"] + evaluate_regressor(synthetic_decision_tree_regressor),
    ["(Synthetic) Distilled Random Forest Regressor"] + evaluate_regressor(synthetic_random_forest_regressor),
]

In [19]:
headers = ['Model', "Test Accuracy", "Reference Accuracy", "Train Accuracy",
           "R2", "Test KL-Divergence"]
result_dataframe_mlp = pd.DataFrame(results, columns=headers)

In [21]:
# Create synthetic labels from LSTM classifier
# First, get labels for the whole synthetic data
y_synthetic_probabilities = reference_lstm.predict(X_synthetic_sequence)
y_synthetic_labels = y_synthetic_probabilities.argmax(axis=1)
# Also get probabilities for only the 20 newsgroups train set
y_train_probabilities = reference_lstm.predict(X_train_sequence)

In [22]:
# Now we distill the LSTM into different Decision Tree/Random Forest models
# using the predictions
synthetic_decision_tree_classifier = DecisionTreeClassifier()
synthetic_decision_tree_regressor = DecisionTreeRegressor()
synthetic_random_forest_classifier = RandomForestClassifier(n_estimators=512, n_jobs=-1)
synthetic_random_forest_regressor = RandomForestRegressor(n_estimators=512, n_jobs=-1)

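# Since the LSTM fits the train data almost perfectly, its predicted train labels
# almost coincide with the gold labels, so a tree classifier trained on them would
# essentially be the reference decision tree; we only train regressors here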
train_decision_tree_regressor = DecisionTreeRegressor()
train_random_forest_regressor = RandomForestRegressor(n_estimators=512, n_jobs=-1)

print("Training synthetic tree classifier")
synthetic_decision_tree_classifier.fit(X_synthetic_vector, y_synthetic_labels)
print("Training synthetic tree regressor")
synthetic_decision_tree_regressor.fit(X_synthetic_vector, y_synthetic_probabilities)
print("Training synthetic forest classifier")
synthetic_random_forest_classifier.fit(X_synthetic_vector, y_synthetic_labels)
print("Training synthetic forest regressor")
synthetic_random_forest_regressor.fit(X_synthetic_vector, y_synthetic_probabilities)
print("Training train tree regressor")
train_decision_tree_regressor.fit(X_train_vector, y_train_probabilities)
print("Training train forest regressor")
train_random_forest_regressor.fit(X_train_vector, y_train_probabilities)
print("Done.")

Training synthetic tree classifier
Training synthetic tree regressor
Training synthetic forest classifier
Training synthetic forest regressor
Training train tree regressor
Training train forest regressor
Done.


In [27]:
y_ref_probabilities = reference_lstm.predict(X_test_sequence)
y_ref = y_ref_probabilities.argmax(axis=1)

def get_metrics(y_pred_test, y_pred_train, y_pred_test_probabilities):
    return [
        accuracy_score(y_test, y_pred_test),
        accuracy_score(y_ref, y_pred_test),
        accuracy_score(y_train, y_pred_train),
        r2_score(y_ref_probabilities, y_pred_test_probabilities),
        np.mean(entropy(y_ref_probabilities, qk=y_pred_test_probabilities, axis=1))
    ]

def evaluate_classifier(classifier):
    y_pred_test = classifier.predict(X_test_vector)
    y_pred_train = classifier.predict(X_train_vector)
    y_pred_test_probabilities = classifier.predict_proba(X_test_vector)
    
    return get_metrics(y_pred_test, y_pred_train, y_pred_test_probabilities)
    
def evaluate_regressor(regressor):
    y_pred_test_probabilities = regressor.predict(X_test_vector)
    y_pred_test = np.argmax(y_pred_test_probabilities, axis=1)
    y_pred_train = np.argmax(regressor.predict(X_train_vector), axis=1)
    
    return get_metrics(y_pred_test, y_pred_train, y_pred_test_probabilities)

def evaluate_lstm(lstm):
    y_pred_test_probabilities = lstm.predict(X_test_sequence)
    y_pred_test = np.argmax(y_pred_test_probabilities, axis=1)
    y_pred_train = np.argmax(lstm.predict(X_train_sequence), axis=1)
    
    return get_metrics(y_pred_test, y_pred_train, y_pred_test_probabilities)

results = [
    ["Reference LSTM"] + evaluate_lstm(reference_lstm),
    ["Reference MLP"] + evaluate_classifier(reference_mlp),
    ["Reference Decision Tree"] + evaluate_classifier(reference_decision_tree),
    ["Reference Random Forest"] + evaluate_classifier(reference_random_forest),
    ["(Train only) Distilled Decision Tree Regressor"] + evaluate_regressor(train_decision_tree_regressor),
    ["(Train only) Distilled Random Forest Regressor"] + evaluate_regressor(train_random_forest_regressor),
    ["(Synthetic) Distilled Decision Tree Classifier"] + evaluate_classifier(synthetic_decision_tree_classifier),
    ["(Synthetic) Distilled Random Forest Classifier"] + evaluate_classifier(synthetic_random_forest_classifier),
    ["(Synthetic) Distilled Decision Tree Regressor"] + evaluate_regressor(synthetic_decision_tree_regressor),
    ["(Synthetic) Distilled Random Forest Regressor"] + evaluate_regressor(synthetic_random_forest_regressor),
]

In [28]:
headers = ['Model', "Test Accuracy", "Reference Accuracy", "Train Accuracy",
           "R2", "Test KL-Divergence"]
result_dataframe_lstm = pd.DataFrame(results, columns=headers)

In [31]:
result_dataframe_mlp

| Model | Test Accuracy | Reference Accuracy | Train Accuracy | R2 | Test KL-Divergence |
|---|---|---|---|---|---|
| Reference MLP | 0.752257 | 1.0 | 0.997348 | 1.0 | 0.0 |
| Reference LSTM | 0.773898 | 0.70778 | 0.987538 | 0.500799 | 1.661332 |
| Reference Decision Tree | 0.423526 | 0.436139 | 0.999912 | -0.227965 | inf |
| Reference Random Forest | 0.714153 | 0.725438 | 0.999912 | 0.379902 | inf |
| (Train only) Distilled Decision Tree Regressor | 0.430563 | 0.439989 | 0.997348 | -0.198031 | 15.219741 |
| (Train only) Distilled Random Forest Regressor | 0.658391 | 0.667552 | 0.996995 | 0.487675 | 1.223216 |
| (Synthetic) Distilled Decision Tree Classifier | 0.343601 | 0.350903 | 0.997348 | -0.417319 | inf |
| (Synthetic) Distilled Random Forest Classifier | 0.619756 | 0.63821 | 0.997348 | 0.274613 | inf |
| (Synthetic) Distilled Decision Tree Regressor | 0.346256 | 0.356745 | 0.997348 | -0.273317 | 11.981914 |
| (Synthetic) Distilled Random Forest Regressor | 0.579395 | 0.593733 | 0.99726 | 0.385302 | 1.498645 |


In [30]:
result_dataframe_lstm

| Model | Test Accuracy | Reference Accuracy | Train Accuracy | R2 | Test KL-Divergence |
|---|---|---|---|---|---|
| Reference LSTM | 0.773898 | 1.0 | 0.987538 | 1.0 | 0.0 |
| Reference MLP | 0.752257 | 0.70778 | 0.997348 | 0.459579 | 3.902797 |
| Reference Decision Tree | 0.423526 | 0.41171 | 0.999912 | -0.333433 | inf |
| Reference Random Forest | 0.714153 | 0.672464 | 0.999912 | 0.365827 | inf |
| (Train only) Distilled Decision Tree Regressor | 0.43043 | 0.411976 | 0.987538 | -0.27011 | 6.045733 |
| (Train only) Distilled Random Forest Regressor | 0.661445 | 0.629846 | 0.98577 | 0.456036 | 1.243802 |
| (Synthetic) Distilled Decision Tree Classifier | 0.326341 | 0.320499 | 0.987538 | -0.53735 | inf |
| (Synthetic) Distilled Random Forest Classifier | 0.546734 | 0.537175 | 0.987538 | 0.240298 | inf |
| (Synthetic) Distilled Decision Tree Regressor | 0.347716 | 0.341211 | 0.987538 | -0.252018 | 4.957595 |
| (Synthetic) Distilled Random Forest Regressor | 0.559612 | 0.541689 | 0.986389 | 0.357237 | 1.498531 |


### Analysis

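This experiment, too, shows that random forests are superior to single decision trees, which is not surprising. Regressors also approximate the teacher better than classifiers: they reach a higher R2 and a finite KL divergence. In terms of accuracy, however, the picture is mixed: for the MLP teacher, the random forest classifier is more accurate than the random forest regressor (test accuracy 0.620 vs. 0.579).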
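One plausible reason for the regressors' advantage in R2 and KL divergence lies in what a leaf stores. A scikit-learn regression tree with the squared-error criterion predicts the mean target vector of the training samples in a leaf, whereas a classification tree's `predict_proba` returns the label frequencies in that leaf. The toy leaves below use hypothetical numbers; in a fully grown tree, where leaves are typically pure, the classifier's distribution collapses to one-hot while the regressor keeps the teacher's soft distribution:

```python
import numpy as np


def regressor_leaf_value(leaf_probs):
    # Squared-error regression tree: mean of the target vectors in the leaf
    return leaf_probs.mean(axis=0)


def classifier_leaf_value(leaf_probs):
    # Classification tree trained on hard labels: label frequencies in the leaf
    labels = leaf_probs.argmax(axis=1)
    return np.bincount(labels, minlength=leaf_probs.shape[1]) / len(labels)


# Hypothetical leaf with three documents and the teacher's distributions
mixed_leaf = np.array([[0.7, 0.3],
                       [0.6, 0.4],
                       [0.4, 0.6]])
# Hypothetical pure leaf holding a single document, as in a fully grown tree
pure_leaf = np.array([[0.55, 0.45]])

print(regressor_leaf_value(mixed_leaf))   # mean of the teacher distributions
print(classifier_leaf_value(mixed_leaf))  # two of the three hard labels are 0
print(regressor_leaf_value(pure_leaf))    # soft distribution is kept
print(classifier_leaf_value(pure_leaf))   # collapses to one-hot
```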

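Comparing the different metrics suggests that accuracy is not a suitable metric for measuring how closely a student reproduces the computations of its teacher. The non-distilled reference random forest achieves a higher reference accuracy than every distilled model (0.725 for the MLP teacher, 0.672 for the LSTM teacher), yet the random forest regressor distilled on $\mathcal{X}_{\text{train}}$ is clearly closer to the teacher in terms of the coefficient of determination (R2: 0.488 vs. 0.380 for the MLP, 0.456 vs. 0.366 for the LSTM) and KL divergence (finite vs. infinite).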

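This comparison also shows that at least random forest regressors are, to some extent, able to imitate the computations of neural networks. This is most evident from the KL divergence: the forest regressors (1.22 and 1.50 for the MLP teacher, 1.24 and 1.50 for the LSTM teacher) are closer to the teacher than the respective other neural network (1.66 and 3.90). However, this ability remains limited: R2 stays below 0.5 for all distilled models, and the test accuracy of the forest regressors drops by roughly 9 to 21 percentage points compared to the teacher.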

Two findings are surprising. First, the additional data does not improve the distilled models: the regressors distilled on the larger dataset are consistently less accurate than those distilled on $\mathcal{X}_{\text{train}}$ only. Second, the results for distilling the LSTM are very similar to those for distilling the MLP. Possible explanations for the first finding are that decision trees/random forests do not benefit much from additional data, or that there is a domain mismatch between $\mathcal{X}_{\text{test}}$ and $\mathcal{X}_{\text{distill}}$. For the second finding, one possible explanation is that the performance of distilled trees is largely independent of the teacher model's complexity; note also that the students always operate on the same SVD-reduced tf-idf features, regardless of which teacher they imitate.

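Summing up, these experiments show that distilling neural networks into decision trees/random forests yields only very limited success. The drop in performance is large, and a random forest trained directly on the original data achieves a higher test accuracy than any distilled student (0.714 vs. at most 0.661); only for single decision trees does the distilled regressor slightly outperform the reference tree (0.431 vs. 0.424). Also bear in mind that trees cannot natively process sequence data and instead rely on fixed-size document representations such as the bag-of-words features used here, which makes them an impractical tool for NLP in general.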