# Seeing Images Through the Eyes of Decision Trees
> Copyright Antonio Piemontese 2025
![](images_by_CART.png)

In this article, you’ll learn to:

- Turn unstructured, raw image data into structured, informative features.
- Train a decision tree classifier for image classification based on extracted image features.
- Apply the above concepts to the CIFAR-10 dataset for image classification.

## Introduction

It’s no secret that decision tree-based models excel in a wide range of classification and regression tasks, often based on structured, tabular data. However, when used in combination with the right tools, decision trees can also be a powerful predictive tool for unstructured data such as text or images, and even for time series data. 

This article demonstrates how decision trees can make sense of image data that has been converted into structured, meaningful features. More specifically, we will show how to turn raw, pixel-level image data into higher-level features that describe image properties like color histograms and edge counts. We’ll then leverage this information to perform predictive tasks, like classification, by training decision trees — all with the aid of Python’s scikit-learn library.

Think of it as making a decision tree behave a little more like the way our own eyes work.

**The CIFAR-10 dataset** we will use for the tutorial is a collection of low-resolution, 32×32 pixel color images, with each pixel being described by three RGB values that define its color.

![](CIFAR_10_dataset.png)

This dataset is available [here](https://www.cs.toronto.edu/~kriz/cifar.html).

Although other commonly used models for image classification, like neural networks, can process images as grids of pixels, decision trees are designed to work with structured data; hence, our primary goal is to convert our raw image data into this structured format.
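To make "structured format" concrete, here is a minimal sketch that uses a random stand-in batch rather than real CIFAR-10 images. The array `fake_images` and the choice of per-channel means as features are illustrative assumptions, not part of this tutorial's pipeline. The point is only that each image becomes one row of a table with a fixed number of columns, which is the shape a decision tree expects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a batch of 4 CIFAR-10-sized RGB images
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Raw form: every image is a 32 x 32 x 3 grid, i.e. 3072 pixel values
raw_table = fake_images.reshape(len(fake_images), -1)

# Toy structured form: one mean intensity per color channel
mean_table = fake_images.mean(axis=(1, 2))

print(raw_table.shape)   # (4, 3072)
print(mean_table.shape)  # (4, 3)
```

Three columns per image are far too few to be useful; they just show the row-per-image layout that the richer features in the rest of the article follow.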

We start by loading the dataset, which ships with the Keras datasets module in TensorFlow (the tensorflow package must be installed for the import to succeed):

In [4]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
import numpy as np
import matplotlib.pyplot as plt
 
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
y_train = y_train.flatten()
y_test = y_test.flatten()
 
class_names = ['airplane','automobile','bird','cat','deer',
               'dog','frog','horse','ship','truck']
 
print("Training set:", X_train.shape, y_train.shape)
print("Test set:", X_test.shape, y_test.shape)
 
# Optional: show a few samples (see article image above)
fig, axes = plt.subplots(1, 5, figsize=(10, 3))
for i, ax in enumerate(axes):
    ax.imshow(X_train[i])
    ax.set_title(class_names[y_train[i]])
    ax.axis('off')
plt.show()


Notice that the loaded dataset is already partitioned into training and test sets, with the output labels (10 different classes) separated from the input image data. We simply unpack these four arrays from the two returned tuples, as shown above, and flatten the label arrays into one-dimensional vectors. For clarity, we also store the class names in a Python list.
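The label handling can be illustrated without downloading anything. The sketch below uses a small made-up label array (`y_column` is hypothetical) in the column shape that `load_data()` returns, and shows what `flatten()` and the class-name lookup do.

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Hypothetical labels in the (n_samples, 1) column shape returned by load_data()
y_column = np.array([[6], [9], [9], [4], [1]])

# flatten() turns the column into a plain 1-D vector of class indices
y_flat = y_column.flatten()
print(y_column.shape, "->", y_flat.shape)

# Each integer label indexes into class_names
labels = [class_names[i] for i in y_flat]
print(labels)

# Count how many samples fall into each of the 10 classes
counts = np.bincount(y_flat, minlength=len(class_names))
print(counts)
```

Running `np.bincount` on the real `y_train` is a quick way to check how evenly the classes are represented before training.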

Next, we define the core function in our code. This function, called extract_features(), takes a batch of images as input and extracts the desired features from each one. In our example, we will extract features associated with two main image properties: color histograms for each of the three RGB channels (red, green, and blue), and an edge count, that is, the number of pixels whose edge response exceeds a threshold.

In [None]:
from skimage.color import rgb2gray
from skimage.filters import sobel
 
def extract_features(images, bins_per_channel=8):
    features = []
    for img in images:
        # Color histogram for each of the 3 RGB channels
        hist_features = []
        for c in range(3):
            hist, _ = np.histogram(img[:,:,c], bins=bins_per_channel, range=(0, 255))
            hist_features.extend(hist)
        
        # Edge detection on grayscale image
        gray_img = rgb2gray(img)
        edges = sobel(gray_img)
        edge_strength = np.sum(edges > 0.1)
        
        # Merging features
        features.append(hist_features + [edge_strength])
    
    return np.array(features, dtype=np.float32)

The number of bins for each computed color histogram is set to 8, which keeps the feature vector compact while still summarizing how dark or bright each channel is. For edge detection, we use two functions from skimage: rgb2gray, which converts the image to grayscale, and sobel, which computes an edge-magnitude map. We then count the pixels whose magnitude exceeds 0.1.

Both subsets of features are concatenated into a single vector, and the process repeats for every image in the dataset.
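To see what these two kinds of features measure, the following sketch works on tiny hand-made arrays instead of real images. The pixel values are hypothetical. To stay NumPy-only it uses `np.gradient` as a rough stand-in for the Sobel filter, so its magnitudes will not match what skimage's sobel returns.

```python
import numpy as np

# Toy 4x4 "channel" with 8-bit intensities (hypothetical values)
channel = np.array([[  0,  10,  40,  70],
                    [100, 130, 160, 190],
                    [220, 250, 255,  31],
                    [ 32,  63,  64,  95]], dtype=np.uint8)

# Same call as in extract_features(): 8 equal-width bins over 0..255
hist, bin_edges = np.histogram(channel, bins=8, range=(0, 255))
print(hist)  # pixel counts per intensity bin, summing to 16

# Toy 6x6 grayscale image: dark left half, bright right half
gray = np.zeros((6, 6))
gray[:, 3:] = 1.0

# Finite-difference gradient as a rough stand-in for a Sobel filter
gy, gx = np.gradient(gray)
magnitude = np.hypot(gx, gy)

# Count pixels whose gradient magnitude exceeds the threshold
edge_count = int(np.sum(magnitude > 0.1))
print(edge_count)  # 12: two columns along the vertical boundary, six rows each
```

The histogram discards where pixels are located and keeps only how many fall into each brightness range. The edge count likewise collapses the whole edge map into a single number.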

We now call the function twice: once for the training set, and once for the test set. 

In [None]:
X_train_feats = extract_features(X_train)
X_test_feats = extract_features(X_test)
 
print("Feature vector size:", X_train_feats.shape[1])

The resulting feature vector has 25 entries per image: 8 histogram bins for each of the 3 RGB channels (24 values) plus 1 edge count.
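When inspecting a trained tree it helps to know which column is which. The sketch below builds one possible set of column names in the same order that extract_features() emits its values. The names themselves (`red_bin0`, `edge_count`, and so on) are hypothetical labels chosen for illustration.

```python
bins_per_channel = 8  # same default as extract_features()

# Hypothetical names: one per histogram bin per channel, in R, G, B order
feature_names = [
    f"{channel}_bin{b}"
    for channel in ("red", "green", "blue")
    for b in range(bins_per_channel)
]
feature_names.append("edge_count")

print(len(feature_names))  # 25
print(feature_names[:3], feature_names[-1])
```

A list like this can be passed as the `feature_names` argument of scikit-learn's `plot_tree` or `export_text` to make the learned splits readable.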

That was the hard part! Now we are largely ready to train a decision tree-based classifier that takes extracted features instead of raw image data as inputs. If you are already familiar with training scikit-learn models, the whole process is self-explanatory: we just need to make sure we pass the extracted features, rather than the raw images, as the training and evaluation inputs.

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score

dt_model = DecisionTreeClassifier(random_state=42, max_depth=20)
dt_model.fit(X_train_feats, y_train)

y_pred_dt = dt_model.predict(X_test_feats)

print("MODEL 1. Decision Tree (Color histograms + Edge count):")
print("Accuracy:", accuracy_score(y_test, y_pred_dt))
print(classification_report(y_test, y_pred_dt, target_names=class_names))

Unfortunately, the decision tree performs rather poorly on the extracted image features. And guess what: this is entirely normal and expected.

Reducing a 32×32 color image to just 25 explanatory features is an oversimplification: it discards the fine-grained cues that help discriminate, for instance, a bird from an airplane, or a dog from a cat. Keep in mind that images belonging to the same class (e.g. ‘airplane’) also vary widely in properties like color distribution. The important take-home message is how image feature extraction for decision tree classifiers works and where its limits lie; achieving high accuracy is not our main goal in this tutorial!
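One way to check which classes get mixed up is a confusion matrix. The sketch below uses a handful of made-up labels for three classes (`y_true` and `y_pred` are hypothetical, not results from this tutorial) to show how to read one.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for three classes (not real results)
toy_names = ['airplane', 'bird', 'ship']
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 0, 0, 1, 0, 2, 2, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)

# Per-class recall: correct predictions divided by the row total
recall = cm.diagonal() / cm.sum(axis=1)
for name, r in zip(toy_names, recall):
    print(f"{name}: {r:.2f}")
```

In this toy data, two of the three birds are predicted as airplanes, which shows up as a large off-diagonal entry in the bird row. With the real model, the same call would take y_test and y_pred_dt.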

Nonetheless, would things be any better if we trained a more advanced tree-based model, like a random forest classifier? Let’s find out:

In [None]:
from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
rf_model.fit(X_train_feats, y_train)

y_pred_rf = rf_model.predict(X_test_feats)

print("MODEL 2. Random Forest (Color histograms + Edge count)")
print("Accuracy:", accuracy_score(y_test, y_pred_rf))
print(classification_report(y_test, y_pred_rf, target_names=class_names))

A slight improvement, but still far from perfect: the random forest only got a pass mark for classifying airplanes, still failing for the other nine classes! Eager for some homework? Try applying what we learned in this article to an even simpler dataset, like MNIST or Fashion-MNIST, and see how it performs.
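As one possible starting point for the homework, here is a minimal sketch under several assumptions. It uses scikit-learn's bundled 8×8 digits dataset as a small offline stand-in for MNIST, a single grayscale histogram instead of three color histograms, and `np.gradient` in place of the Sobel filter. The helper `digit_features` and its threshold of 4.0 are hypothetical choices, not tuned values.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def digit_features(images, bins=8, threshold=4.0):
    """Grayscale intensity histogram plus a count of high-gradient pixels."""
    feats = []
    for img in images:
        # Pixel intensities in this dataset range from 0 to 16
        hist, _ = np.histogram(img, bins=bins, range=(0, 16))
        # Finite-difference gradient as a rough stand-in for Sobel
        gy, gx = np.gradient(img.astype(float))
        edge_count = np.sum(np.hypot(gx, gy) > threshold)
        feats.append(list(hist) + [edge_count])
    return np.array(feats, dtype=np.float32)

digits = load_digits()  # digits.images has shape (1797, 8, 8)
X = digit_features(digits.images)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, digits.target, test_size=0.25, random_state=42, stratify=digits.target
)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, rf.predict(X_te))

print("Feature vector size:", X.shape[1])
print("Accuracy:", acc)
```

Swapping in the real MNIST or Fashion-MNIST arrays only requires changing the loading step and the histogram range to match 8-bit pixel values.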