# Multi-label Image Classification using Custom CNN (TensorFlow/Keras)

* Custom CNN using a small subset of real data
* Simulates **multi-labels** by combining original labels with a synthetic third label
* Dataset: `cats_and_dogs_filtered` (a small Cats vs Dogs subset)
* Activation: `sigmoid` on the output layer (one independent probability per label)
* Loss: `binary_crossentropy`
* Output: 3-label prediction → `[dog, cat, indoor]`

In [1]:
!pip install tensorflow keras --quiet

### Import Required Libraries

We start by importing TensorFlow, NumPy, and supporting libraries for data preprocessing, model building, and evaluation.

In [2]:
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, multilabel_confusion_matrix
import os
import zipfile
import random

In [3]:
image_size = (128, 128)
batch_size = 32

### Download and Load Small Real Dataset

We use the **`cats_and_dogs_filtered`** dataset provided by TensorFlow.

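* It contains 2,000 training images (1,000 cats and 1,000 dogs), small enough for lightweight experimentation
* We load it with `image_dataset_from_directory`, which infers integer labels from the subdirectory names in alphabetical order (`cats` = 0, `dogs` = 1)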

In [4]:
url = "https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip"
path_to_zip = tf.keras.utils.get_file("cats_and_dogs_filtered.zip", origin=url, extract=True)

Downloading data from https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip
68606236/68606236 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [9]:
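import shutil

# Despite its name, path_to_zip holds the extracted folder here: with extract=True,
# recent Keras versions return the extraction directory (see the output path below).
# Move it into /content so the hard-coded paths in the next cell resolve (Colab layout).
shutil.move(path_to_zip, "/content")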

'/content/cats_and_dogs_filtered_extracted'

In [10]:
train_ds = tf.keras.utils.image_dataset_from_directory(
    "/content/cats_and_dogs_filtered_extracted/cats_and_dogs_filtered/train",
    image_size=image_size,
    batch_size=batch_size,
    shuffle=True
)

Found 2000 files belonging to 2 classes.


In [29]:
for image_batch, label_batch in train_ds.take(1):
    img = image_batch
    label = label_batch

print(f"Shape of image batch: {img.shape}")
print(f"Shape of label batch: {label.shape}")

print(f"Label batch: {label}")

Shape of image batch: (32, 128, 128, 3)
Shape of label batch: (32,)
Label batch: [0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 0 0 1 0 1 1 1 0 0 0 1 0 0 0 1 1 1]


### Simulate Multi-label Targets

Since the dataset is **not multi-label by default**, we simulate it:

* Use `dog` and `cat` as the first two labels
* Add a third synthetic label: `"indoor"`

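  * Randomly assigned (0 or 1) per image
  * Mimics a real-world co-occurrence (e.g., dog AND indoor)
  * Because it is random, it carries no visual signal, so the model cannot do better than chance on it; this makes it a useful sanity check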

This results in labels like:

```text
[1, 0, 1] → dog, not cat, indoor  
[0, 1, 0] → not dog, cat, not indoor
```


In [11]:
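# image_dataset_from_directory sorts class folders alphabetically: cats = 0, dogs = 1
label_map = {'dogs': [1, 0, 0], 'cats': [0, 1, 0]}  # columns: [dog, cat, indoor]

def add_random_label(base_label):
    new_label = base_label.copy()  # copy so the label_map entries are never mutated
    # Simulate an environmental tag: 50% chance to assign the 'indoor' class
    new_label[2] = random.choice([0, 1])
    return new_label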
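The same simulation can be written as a pure function over a list of integer labels. This is a minimal sketch, assuming the integer labels follow the alphabetical class order (`cats` = 0, `dogs` = 1); the helper name `to_multilabel` and the seeded `random.Random` generator are illustrative choices, not part of the original notebook. A local seeded generator makes the random `indoor` tags reproducible across runs without touching global random state.

```python
import random

def to_multilabel(int_labels, seed=42):
    """Map 0/1 class indices to [dog, cat, indoor] multi-hot vectors (hypothetical helper).

    Assumes index 0 = cats and 1 = dogs (alphabetical directory order).
    The 'indoor' bit is pure noise drawn from a seeded generator.
    """
    rng = random.Random(seed)  # local generator: reproducible, no global state
    out = []
    for idx in int_labels:
        dog = 1 if idx == 1 else 0
        cat = 1 - dog              # exactly one of dog/cat is set
        indoor = rng.randint(0, 1) # 50/50 synthetic environmental tag
        out.append([dog, cat, indoor])
    return out

print(to_multilabel([0, 0, 1, 1, 1]))
```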

### Preprocess Image Data

* Normalize pixel values by dividing by 255
* Prepare `X` and `y` arrays
* Use `train_test_split` from scikit-learn for train-test division

In [12]:
X = []
y = []

for batch in train_ds.take(10):  # 10 batches x 32 = up to 320 images
    images, labels = batch
    for i in range(len(images)):
        img = images[i].numpy()
        label_index = int(labels[i].numpy())
        label_name = 'dogs' if label_index == 1 else 'cats'
        multilabel = add_random_label(label_map[label_name])
        X.append(img / 255.0)
        y.append(multilabel)

In [31]:
y[0]

array([1., 0., 1.], dtype=float32)

In [13]:
X = np.array(X)
y = np.array(y).astype('float32')

In [14]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Define the Custom CNN Model

Build a **simple CNN** architecture:

* Two convolutional + pooling layers
* Dense layer + output layer with **sigmoid** activation
* Output shape: `(batch_size, 3)` → for 3 labels
* Sigmoid gives each output its own probability in [0, 1], independent of the others (unlike softmax, the three outputs need not sum to 1)
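The layer shapes and parameter counts of this architecture can be checked with plain arithmetic. The sketch below assumes Keras defaults (stride 1 and `'valid'` padding for `Conv2D`, `'valid'` padding for `MaxPooling2D`); under those assumptions its totals should match what `model.summary()` reports for the model defined below.

```python
def conv_valid(size, kernel=3):
    """Spatial size after a stride-1 Conv2D with 'valid' padding (Keras default)."""
    return size - kernel + 1

def pool_valid(size, pool=2):
    """Spatial size after MaxPooling2D(pool, pool) with 'valid' padding."""
    return (size - pool) // pool + 1

def conv_params(kernel, in_ch, out_ch):
    return kernel * kernel * in_ch * out_ch + out_ch  # weights + biases

side = pool_valid(conv_valid(128))   # 128 -> 126 -> 63
side = pool_valid(conv_valid(side))  # 63 -> 61 -> 30
flat = side * side * 64              # length of the Flatten output

params = {
    "conv1": conv_params(3, 3, 32),
    "conv2": conv_params(3, 32, 64),
    "dense": flat * 128 + 128,       # Dense(128) on the flattened features
    "output": 128 * 3 + 3,           # Dense(3) sigmoid head
}
total = sum(params.values())
print(side, flat, params, total)
print(f"dense share: {params['dense'] / total:.1%}")
```

Almost all of the roughly 7.4M parameters sit in the `Dense(128)` layer after `Flatten`, which makes it easy to overfit a few hundred images. Replacing `Flatten` with `GlobalAveragePooling2D` is a common way to shrink that layer.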

In [15]:
def create_model(input_shape=(128, 128, 3), num_classes=3):
    model = models.Sequential([
        layers.Input(shape=input_shape),  # explicit Input layer; passing input_shape to Conv2D triggers a warning in Keras 3
        layers.Conv2D(32, (3,3), activation='relu'),
        layers.MaxPooling2D(2,2),
        layers.Conv2D(64, (3,3), activation='relu'),
        layers.MaxPooling2D(2,2),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(num_classes, activation='sigmoid')  # Multi-label → sigmoid
    ])
    return model

model = create_model()



### Compile the Model

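* **Loss**: `binary_crossentropy` (an independent binary log loss per label, averaged over the three labels)
* **Optimizer**: `adam`
* **Metric**: `'accuracy'`, to be read with caution. For multi-hot targets, depending on the Keras version, the string may resolve to categorical (argmax) accuracy instead of per-label binary accuracy; that would explain the erratic values in the training log (e.g. `val_accuracy: 0.0000e+00` in epoch 2). Passing `tf.keras.metrics.BinaryAccuracy(threshold=0.5)` makes the intent explicit.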
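To see why sigmoid plus binary cross-entropy suits multi-hot targets, here is a pure-Python sketch of the per-label computation. It follows the math of binary cross-entropy (mean of per-label log losses) but omits Keras details such as epsilon clipping; the logits are made-up values for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]  # subtract the max for numerical stability
    s = sum(exps)
    return [e / s for e in exps]

def binary_crossentropy(y_true, probs):
    # Mean over labels of -[y*log(p) + (1-y)*log(1-p)]
    terms = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
             for y, p in zip(y_true, probs)]
    return sum(terms) / len(terms)

logits = [2.0, -1.0, 1.5]  # hypothetical raw scores for [dog, cat, indoor]
y_true = [1, 0, 1]         # dog, not cat, indoor

p_sig = [sigmoid(z) for z in logits]
p_soft = softmax(logits)
print([round(p, 3) for p in p_sig], round(sum(p_sig), 3))    # independent: need not sum to 1
print([round(p, 3) for p in p_soft], round(sum(p_soft), 3))  # softmax: always sums to 1
print(round(binary_crossentropy(y_true, p_sig), 4))
```

With softmax, raising the `indoor` score necessarily lowers `dog`; with sigmoid, both can be close to 1 at the same time, which is exactly what a target like `[1, 0, 1]` requires.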

In [16]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

### Train the Model

* Train for a small number of epochs (e.g., 5)
* Use a validation split (e.g., 10%)
* Keep the batch size moderate (e.g., 32)
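The step counts in the logs follow from the data sizes. This sketch assumes scikit-learn's `train_test_split` rounds the test size up and Keras's `validation_split` keeps the first `floor(n * (1 - split))` samples for training, holding out the last ones as validation data before any shuffling.

```python
import math

batches_taken, batch_size = 10, 32
n_total = batches_taken * batch_size           # images collected from train_ds.take(10)

n_test = math.ceil(0.2 * n_total)              # train_test_split(test_size=0.2)
n_train_full = n_total - n_test

n_train = math.floor(n_train_full * (1 - 0.1)) # validation_split=0.1: first 90% trains
n_val = n_train_full - n_train                 # last 10% validates

train_steps = math.ceil(n_train / batch_size)  # the "8/8" in the fit log
eval_steps = math.ceil(n_test / batch_size)    # the "2/2" in the evaluate log
print(n_total, n_train_full, n_test, n_train, n_val, train_steps, eval_steps)
```

With only 26 validation images, each one is worth about 3.8 percentage points, which is why `val_accuracy` jumps in coarse steps (e.g. 0.4231 = 11/26) and should not be over-interpreted.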

In [17]:
history = model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.1)

Epoch 1/5
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 498ms/step - accuracy: 0.3325 - loss: 1.1451 - val_accuracy: 0.4231 - val_loss: 0.7313
Epoch 2/5
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 27ms/step - accuracy: 0.5010 - loss: 0.6924 - val_accuracy: 0.0000e+00 - val_loss: 0.7037
Epoch 3/5
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step - accuracy: 0.3253 - loss: 0.6704 - val_accuracy: 0.5385 - val_loss: 0.6814
Epoch 4/5
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step - accuracy: 0.4798 - loss: 0.6311 - val_accuracy: 0.4615 - val_loss: 0.6652
Epoch 5/5
[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step - accuracy: 0.3765 - loss: 0.5700 - val_accuracy: 0.6154 - val_loss: 0.7087


### Evaluate the Model

Evaluate performance on test data:

* Use `model.evaluate()` for quick metrics
* Predict probabilities using `model.predict()`
* Threshold predictions using `y_pred > 0.5` to convert to binary labels
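Thresholding and the choice of accuracy metric interact. The sketch below uses hypothetical targets and probabilities to compare per-label (binary) accuracy with argmax-based (categorical) accuracy; `categorical_accuracy` here is a simplified stand-in written for illustration, not the Keras implementation.

```python
def threshold(probs, t=0.5):
    # Turn each per-label probability into an independent 0/1 decision
    return [[1 if p > t else 0 for p in row] for row in probs]

def binary_accuracy(y_true, y_pred_bin):
    # Fraction of individual label decisions that are correct
    pairs = [(a, b) for ra, rb in zip(y_true, y_pred_bin) for a, b in zip(ra, rb)]
    return sum(a == b for a, b in pairs) / len(pairs)

def argmax(row):
    # max() returns the first maximal index, so [0, 1, 1] -> 1
    return max(range(len(row)), key=row.__getitem__)

def categorical_accuracy(y_true, probs):
    # Only compares the top-scoring label with the first "hot" target label,
    # ignoring all other labels: ill-suited to multi-hot targets
    return sum(argmax(t) == argmax(p) for t, p in zip(y_true, probs)) / len(y_true)

y_true = [[1, 0, 1], [0, 1, 0], [0, 1, 1]]               # hypothetical targets
probs = [[0.8, 0.1, 0.3], [0.4, 0.7, 0.2], [0.2, 0.3, 0.9]]  # hypothetical outputs

y_bin = threshold(probs)
print(y_bin)
print(binary_accuracy(y_true, y_bin), categorical_accuracy(y_true, probs))
```

The first row misses `indoor` yet counts as correct under argmax accuracy, while the third row predicts `indoor` correctly but counts as wrong; per-label metrics avoid both distortions.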

In [18]:
model.evaluate(X_test, y_test)

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 24ms/step - accuracy: 0.7083 - loss: 0.6517 


[0.6634343862533569, 0.65625]

In [19]:
y_pred = model.predict(X_test)
y_pred_binary = (y_pred > 0.5).astype(int)

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 20ms/step 


In [22]:
y_pred[0:3], y_pred_binary[0:3]

(array([[0.35356063, 0.62620795, 0.27926955],
        [0.5137683 , 0.39016345, 0.20910904],
        [0.41531923, 0.55359495, 0.42816448]], dtype=float32),
 array([[0, 1, 0],
        [1, 0, 0],
        [0, 1, 0]]))

### Analyze Results

Use:

* `classification_report()` to print precision, recall, and F1-score
* `multilabel_confusion_matrix()` to inspect per-label confusion

In [23]:
# zero_division=0 reports ill-defined scores (e.g. samples with no predicted labels)
# as 0 without emitting UndefinedMetricWarning; the numbers are unchanged
print(classification_report(y_test, y_pred_binary,
                            target_names=['dog', 'cat', 'indoor'],
                            zero_division=0))

              precision    recall  f1-score   support

         dog       0.69      0.55      0.61        33
         cat       0.69      0.65      0.67        31
      indoor       0.33      0.04      0.08        23

   micro avg       0.67      0.45      0.54        87
   macro avg       0.57      0.41      0.45        87
weighted avg       0.60      0.45      0.49        87
 samples avg       0.60      0.49      0.53        87


In [24]:
conf_matrices = multilabel_confusion_matrix(y_test, y_pred_binary)
# One 2x2 matrix per label, laid out as [[TN, FP], [FN, TP]]; class 0 = dog, 1 = cat, 2 = indoor
for i, cm in enumerate(conf_matrices):
    print(f"\nConfusion matrix for class {i}:\n{cm}")


Confusion matrix for class 0:
[[23  8]
 [15 18]]

Confusion matrix for class 1:
[[24  9]
 [11 20]]

Confusion matrix for class 2:
[[39  2]
 [22  1]]
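Since scikit-learn's `multilabel_confusion_matrix` lays out each matrix as `[[TN, FP], [FN, TP]]`, the per-label and micro-averaged scores can be recomputed by hand from the three matrices printed above. This sketch follows the `zero_division=0` convention; its results should agree with the `classification_report` table to two decimals.

```python
# Per-label confusion matrices copied from the output above: [[TN, FP], [FN, TP]]
matrices = {
    "dog":    [[23, 8], [15, 18]],
    "cat":    [[24, 9], [11, 20]],
    "indoor": [[39, 2], [22, 1]],
}

def prf(tp, fp, fn):
    # Precision, recall, F1; an undefined ratio is reported as 0 (zero_division=0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

scores = {}
for name, ((tn, fp), (fn, tp)) in matrices.items():
    scores[name] = prf(tp, fp, fn)
    p, r, f = scores[name]
    print(f"{name:>6}: P={p:.2f} R={r:.2f} F1={f:.2f}")

# Micro average: pool TP/FP/FN over all labels before dividing
tp_sum = sum(m[1][1] for m in matrices.values())
fp_sum = sum(m[0][1] for m in matrices.values())
fn_sum = sum(m[1][0] for m in matrices.values())
micro = prf(tp_sum, fp_sum, fn_sum)
print(f" micro: P={micro[0]:.2f} R={micro[1]:.2f} F1={micro[2]:.2f}")
```

The `indoor` recall is close to zero: the model predicts 0 for that tag almost every time, which is the expected outcome for a label assigned at random with no visual signal behind it.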


### If You Have GPU + Storage

If you have a **powerful GPU and large storage**, you can use the **Open Images Dataset**:

* It's a large-scale dataset for multi-label classification
* Available via TensorFlow Datasets (TFDS) or direct download
* Comes with complex, noisy, real-world multi-label annotations
* You'll need preprocessing scripts and label parsing (TFRecord parsing or Pandas from CSVs)
