# Image Classification
The purpose of this notebook is to demonstrate the use of a pre-trained model to build a custom image classifier. It will also show-case the use of data augmentation.

In [45]:
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

First, let us define the path. We will be using the following folder hierarchy.

```
data
├── train
│   ├── indoor
│   └── outdoor
└── test
    ├── indoor
    └── outdoor
```

I do not have a specific validation folder. Instead, I am using 20% of training images as the validation set.

In [46]:
# -------------------------
# 1. Define Paths
# -------------------------
base_dir = 'data'  # This folder should contain subfolders 'indoor/' and 'outdoor/'
train_dir = os.path.join(base_dir, 'train')
#val_dir   = os.path.join(base_dir, 'val')
test_dir  = os.path.join(base_dir, 'test')

I am using `ImageDataGenerator` function of `tensorflow` to create variations from the original image. Image will be rotated, stretched, etc. randomly. Note that no actual image is created or stored. This operation happens in-memory while the training process begins.

In [47]:
# If you only have a single dataset folder (with subfolders for each class),
# you can split them manually or use ImageDataGenerator's split parameter.

# -------------------------
# 2. Data Augmentation
# -------------------------
train_datagen = ImageDataGenerator(
    rescale=1.0/255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=False,
    validation_split=0.2  # if using a single folder, 20% for validation    
)

val_datagen = ImageDataGenerator(
    rescale=1.0/255,
    validation_split=0.2    
)

test_datagen = ImageDataGenerator(
    rescale=1.0/255    
)

In [48]:
# -------------------------
# 3. Data Generators
# -------------------------
batch_size = 8
img_size = (224, 224)  # typical for many models like ResNet

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='binary',
    subset='training',  # set to 'training'
    seed=42
)

val_generator = val_datagen.flow_from_directory(
    train_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='binary',
    subset='validation',  # set to 'validation'
    seed=42
)

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=img_size,
    batch_size=1,
    class_mode='binary',    
    seed=42
    
)

Found 65 images belonging to 2 classes.
Found 16 images belonging to 2 classes.
Found 19 images belonging to 2 classes.


In this extremely tiny training set, I have:
* Training set = 65 images (around 32 images for each label)
* Validation set = 16 images (8 images for each label)
* Testing set = 19 images (9 images for each label)
This is definitely sub-optimal. However, I want to use this to test my hypothesis. So this is good enough as a first iteration. 

## Pre-Trained Model
For this use-case, I am using the pre-trained model [MobileNetV2](https://huggingface.co/docs/transformers/model_doc/mobilenet_v2). In my use case, I have just 50 images each for the 2 labels that I will be training. Due to the small data set, a smaller model like MobileNet would be better suited.

In [49]:
# -------------------------
# 4. Load a Pretrained Model
# -------------------------
# We'll use a pretrained MobileNetV2 for speed. You could use ResNet50, VGG16, etc.

base_model = tf.keras.applications.MobileNetV2(
    input_shape=img_size + (3,),
    include_top=False,  # exclude final fully-connected layer
    weights='imagenet'
)

In [50]:
# Freeze the base model
base_model.trainable = False


In [51]:
# -------------------------
# 5. Add Custom Layers on Top
# -------------------------
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),  # Pools across entire feature map
    layers.Dropout(0.2),             # A bit of dropout for regularization
    layers.Dense(1, activation='sigmoid')  # Binary classification
])

In [52]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='binary_crossentropy',
    metrics=['accuracy'],    
)

I added early stopping when I noticed that the model was overfiting after the 7th iteration. However, the early stopping params are not good. The training proceeded as usual. 

If you notice below, the training should have stopped at iteration 18. Since I have the patience as 3, it hit the 20 epochs limit and training stopped. So I am missing the most optimal point. Maybe I should bump up # of epochs and force the model to stop at iteration # 18.

TODO: Fix the early stopping criteria or increase # of epochs.

In [53]:
# add an early stopping parameter
from tensorflow.keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(
    monitor='val_loss',         # metric to monitor (could also be 'val_accuracy')
    patience=3,                 # how many epochs to wait before stopping
    restore_best_weights=True   # restore the best model weights at the end
)

In [54]:
# -------------------------
# 6. Train the Model
# -------------------------
epochs = 20  # Increase if you have more data or can handle more training
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=epochs,
    callbacks=[early_stopping]
)

Epoch 1/20


  self._warn_if_super_not_called()


[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 110ms/step - accuracy: 0.4869 - loss: 0.7842 - val_accuracy: 0.5625 - val_loss: 0.6586
Epoch 2/20
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 100ms/step - accuracy: 0.6289 - loss: 0.6191 - val_accuracy: 0.6875 - val_loss: 0.5835
Epoch 3/20
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 53ms/step - accuracy: 0.6966 - loss: 0.5420 - val_accuracy: 0.9375 - val_loss: 0.5361
Epoch 4/20
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 81ms/step - accuracy: 0.8393 - loss: 0.4380 - val_accuracy: 0.8125 - val_loss: 0.5036
Epoch 5/20
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 109ms/step - accuracy: 0.8597 - loss: 0.3835 - val_accuracy: 0.9375 - val_loss: 0.4611
Epoch 6/20
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 91ms/step - accuracy: 0.9261 - loss: 0.2980 - val_accuracy: 0.8750 - val_loss: 0.4485
Epoch 7/20
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[

In [55]:
# -------------------------
# 7. Fine-Tuning (Optional)
# -------------------------
# Unfreeze part (or all) of the base model’s layers to fine-tune.

# Let's say we unfreeze the last few layers of MobileNetV2:
unfreeze_at = 100  # layer index to start unfreezing from
for layer in base_model.layers[unfreeze_at:]:
    layer.trainable = True

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),  # lower learning rate
    loss='binary_crossentropy',
    metrics=['accuracy']
)

fine_tune_epochs = 5
history_fine = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=fine_tune_epochs
)

Epoch 1/5
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 155ms/step - accuracy: 0.6907 - loss: 0.5052 - val_accuracy: 0.9375 - val_loss: 0.2072
Epoch 2/5
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 131ms/step - accuracy: 0.9553 - loss: 0.2132 - val_accuracy: 0.9375 - val_loss: 0.1573
Epoch 3/5
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 158ms/step - accuracy: 0.8170 - loss: 0.2373 - val_accuracy: 1.0000 - val_loss: 0.1229
Epoch 4/5
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 118ms/step - accuracy: 1.0000 - loss: 0.1455 - val_accuracy: 0.9375 - val_loss: 0.1119
Epoch 5/5
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 90ms/step - accuracy: 0.9877 - loss: 0.0665 - val_accuracy: 1.0000 - val_loss: 0.0895


In [56]:
test_loss, test_acc = model.evaluate(test_generator)
print(f"Test accuracy: {test_acc:.2f}")

[1m19/19[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 37ms/step - accuracy: 0.8971 - loss: 0.1664
Test accuracy: 0.84


This model shows 84% accuracy for the test data set. It is not the greatest but, it is a decent start. So let's save and use it for inference. 

My main objective - Inference time. If you notice, it is just 37ms. If I had tried to use a visual question answer (VQA) model like LlaVA, this query would have taken a few seconds. Instead, the custom model is orders of magnitude faster. So my hypothesis holds value. It makes sense to invest in this setup when needed.

In [60]:
# saving the model
model.save("indoor_outdoor_classifier_savedmodel.keras")

## Inference Pipeline

The code below is to build a workflow to run a prediction. This is one-off. For a better inference pipeline, please look at `app.py`. In that file, I am using `streamlit` app to make interactive predictions.

In [61]:
# Pipeline to predict
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image

IMG_SIZE = (224, 224)  # same as in training

def predict_single_image(model, img_path, class_names=('indoor', 'outdoor')):
    """
    Loads an image, preprocesses it, and returns the predicted class.
    """
    # 1. Load the image from disk
    img = image.load_img(img_path, target_size=IMG_SIZE)

    # 2. Convert to array & scale
    img_array = image.img_to_array(img)
    img_array = img_array / 255.0  # because we used rescale=1/255 in training
    img_array = np.expand_dims(img_array, axis=0)  # model expects batch dimension

    # 3. Make prediction
    pred = model.predict(img_array)[0][0]
    
    # 4. Interpret the prediction
    #   - If using a single sigmoid output: 
    #       - p < 0.5 => "indoor", p >= 0.5 => "outdoor"
    #   - Adjust logic if you used a different output layer or threshold
    if pred < 0.5:
        return class_names[0]  # indoor
    else:
        return class_names[1]  # outdoor


In [62]:
## Loading the model and predicting a single image
from tensorflow.keras.models import load_model
loaded_model = load_model("indoor_outdoor_classifier_savedmodel.keras")

# Predict a single image
img_path = "data/test/pool.jpg"
prediction = predict_single_image(loaded_model, img_path)
print("Prediction:", prediction)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 300ms/step
Prediction: indoor
