# Geospatial Image Classification with Deep Learning  
## End‑to‑End Satellite Image Analysis using CNNs and Vision Transformers

## Table of Contents

1. [The Dataset](#The-Dataset)

2. [Importing The Required Libraries and Data](#Importing-The-Required-Libraries-and-Data)
    - [TensorFlow Environment Settings](#TensorFlow-Environment-Settings)
    - [The Required Libraries](#The-Required-Libraries)
    - [Set Random Seed for Reproducibility](#Set-Random-Seed-for-Reproducibility)
    - [Check GPU Availability](#Check-GPU-Availability)
    - [Define Data Folder Path](#Define-Data-Folder-Path)

3. [Model Hyperparameters](#Model-Hyperparameters)

4. [Configure Image Data Generator for Data Augmentation](#Configure-Image-Data-Generator-for-Data-Augmentation)

5. [Create Training and Validation Generators](#Create-Training-and-Validation-Generators)

## The Dataset

## Importing The Required Libraries and Data

### TensorFlow Environment Settings

> Environment Variables:

- `TF_ENABLE_ONEDNN_OPTS` \
Controls Intel oneDNN CPU optimizations in TensorFlow.
  - `1` → enable optimized CPU kernels (default, faster)
  - `0` → disable them (useful for reproducibility or avoiding numerical differences)

- `TF_CPP_MIN_LOG_LEVEL` \
Controls how much TensorFlow logs to the console.
  - `0` → show all logs  
  - `1` → hide INFO  
  - `2` → hide INFO + WARNING  
  - `3` → hide INFO + WARNING + ERROR (only fatal messages are shown)  


Both variables must be set before TensorFlow is imported. TensorFlow reads them once, during initialization, so changing them afterwards has no effect in the current session.

In [7]:
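import os  # required here: this cell runs before the main import cell

os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'  # disable oneDNN CPU optimizations
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'   # hide INFO, WARNING and ERROR logs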
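
One way to guard against setting these variables too late is to check `sys.modules` before writing them. The helper below is a minimal sketch that uses only the standard library; the name `set_tf_env` is hypothetical and not part of TensorFlow. It writes the values and reports whether TensorFlow had already been imported at that point.

```python
import os
import sys


def set_tf_env(settings):
    """Write TensorFlow-related environment variables.

    Returns True if TensorFlow had not been imported yet (the settings
    will be read at import time), False if it was already loaded.
    """
    too_late = "tensorflow" in sys.modules
    for name, value in settings.items():
        os.environ[name] = str(value)  # environment values must be strings
    return not too_late


# Hypothetical usage with the two variables from this section
applied_in_time = set_tf_env({
    "TF_ENABLE_ONEDNN_OPTS": 0,
    "TF_CPP_MIN_LOG_LEVEL": 3,
})
if not applied_in_time:
    print("TensorFlow was already imported; restart the kernel to apply these settings.")
```

In a notebook, a `False` result usually means an earlier cell imported TensorFlow, and a kernel restart is the simplest fix.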

### The Required Libraries

In [8]:
import warnings
warnings.filterwarnings('ignore')

import os
import sys
import time
import shutil
import random
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

In [9]:
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (
    Conv2D, MaxPooling2D, Dense, Flatten, Dropout,
    BatchNormalization, GlobalAveragePooling2D
)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.initializers import HeUniform
from tensorflow.keras.callbacks import ModelCheckpoint

from sklearn.metrics import accuracy_score

### Set Random Seed for Reproducibility  

In [10]:
SEED = 62
random.seed(SEED) 
np.random.seed(SEED) 
tf.random.set_seed(SEED)
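
What seeding buys can be illustrated with Python's built-in `random` module alone. The sketch below is only a demonstration (the helper `draw` is hypothetical): re-seeding with the same value replays the same pseudo-random sequence. NumPy and TensorFlow keep their own generator state, which is why the cell above seeds all three.

```python
import random

SEED = 62  # same value as in the cell above


def draw(seed, n=5):
    """Seed Python's global RNG, then draw n pseudo-random integers."""
    random.seed(seed)
    return [random.randint(0, 99) for _ in range(n)]


first = draw(SEED)
second = draw(SEED)      # re-seeding replays the identical sequence
other = draw(SEED + 1)   # a different seed starts a different sequence

print("same seed, same draws:", first == second)
print("seed 62:", first)
print("seed 63:", other)
```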

### Check GPU Availability

In [11]:
device = "gpu" if tf.config.list_physical_devices('GPU') else "cpu"
print("Device available for training:", device)

Device available for training: cpu


### Define Data Folder Path

In [12]:
data_path = os.path.join(".", "data") 
print("Data folder path:", data_path)

Data folder path: .\data


## Model Hyperparameters

In [13]:
# Model hyperparameters
img_w, img_h = 64, 64
n_channels = 3

batch_size = 128
lr = 0.001          # Learning rate
n_epochs = 3        # Adjust as needed

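steps_per_epoch = None    # None: let Keras infer the step count from the generator length
validation_steps = None   # None: run through the whole validation generator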

model_name = "tf_model"

## Configure Image Data Generator for Data Augmentation

In [14]:
datagen = ImageDataGenerator(
    # Convert pixel values from the range [0, 255] to [0, 1]
    rescale=1./255,
    # Randomly rotate images by up to ±25 degrees
    rotation_range=25,
    # Randomly shift the image horizontally by up to 15% of the width
    width_shift_range=0.15,
    # Randomly shift the image vertically by up to 15% of the height
    height_shift_range=0.15,
    # Apply a random shear (slant); the value is an angle in degrees,
    # so 0.2 gives only a very slight slant
    shear_range=0.2,
    # Randomly zoom in or out by up to 10%
    zoom_range=0.1,
    # Randomly flip images left-right
    horizontal_flip=True,
    # Determine how to fill in new pixels created by rotations, shifts, or zooms
    # "nearest" copies the value of the nearest pixel
    fill_mode="nearest",
    # Reserve 20% of the images for the validation subset
    validation_split=0.2,
)
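
To make the augmentation settings concrete, the sketch below converts them into actual ranges for the 64×64 images used here. It assumes the behaviour described in the Keras documentation: shift values below 1 are fractions of the image width or height, and a scalar `zoom_range` of `z` means a zoom factor sampled from `[1 - z, 1 + z]`. It is plain arithmetic and does not call Keras.

```python
img_w, img_h = 64, 64  # image size from the hyperparameter cell

rotation_range = 25
width_shift_range = 0.15
height_shift_range = 0.15
zoom_range = 0.1

# Fractions of the image size become pixel offsets
max_shift_x = width_shift_range * img_w
max_shift_y = height_shift_range * img_h

# A scalar zoom_range z is read as the interval [1 - z, 1 + z]
zoom_low, zoom_high = 1 - zoom_range, 1 + zoom_range

print(f"rotation: -{rotation_range} to +{rotation_range} degrees")
print(f"horizontal shift: up to {max_shift_x:.1f} px")
print(f"vertical shift:   up to {max_shift_y:.1f} px")
print(f"zoom factor: {zoom_low:.1f} to {zoom_high:.1f}")
```

On images this small, a shift of almost 10 pixels moves a sizeable share of the content out of frame, so these values are worth revisiting if the input resolution changes.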

## Create Training and Validation Generators

In [None]:
train_generator = datagen.flow_from_directory(
    data_path,
    # Keras expects target_size as (height, width)
    target_size=(img_h, img_w),
    batch_size=batch_size,
    class_mode="binary",   # two classes, labels returned as 0/1
    subset="training",     # the 80% share set by validation_split
    shuffle=True,
)

val_generator = datagen.flow_from_directory(
    data_path,
    target_size=(img_h, img_w),
    batch_size=batch_size,
    class_mode="binary",
    subset="validation",   # the remaining 20%
    shuffle=False,
)


Found 4800 images belonging to 2 classes.
Found 1200 images belonging to 2 classes.
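
With `steps_per_epoch` and `validation_steps` left as `None`, the number of batches per epoch follows from the generator length, which for these iterators is the sample count divided by the batch size, rounded up. The sketch below works that out for the counts reported above; the variable names are illustrative and it does not call Keras.

```python
import math

batch_size = 128
n_train, n_val = 4800, 1200  # counts reported by flow_from_directory above

# Number of batches needed to cover every image once
inferred_train_steps = math.ceil(n_train / batch_size)
inferred_val_steps = math.ceil(n_val / batch_size)

# The final batch of each pass is smaller than batch_size
last_train_batch = n_train - (inferred_train_steps - 1) * batch_size
last_val_batch = n_val - (inferred_val_steps - 1) * batch_size

print(inferred_train_steps, last_train_batch)  # 38 64
print(inferred_val_steps, last_val_batch)      # 10 48
```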


> Why `shuffle=True` for training but `shuffle=False` for validation

Shuffling is enabled for the training generator so that the model sees the samples in a different order each epoch, which helps generalization and reduces overfitting. The validation generator keeps `shuffle=False` so that batches arrive in a fixed order: evaluation is repeatable, and predictions line up with `val_generator.classes`. A fixed seed does not change this advice. It only guarantees that the sequence of shuffles is the same from run to run; within a run, the order would still change every epoch.

Because both generators are created from the same `datagen`, the random augmentations are also applied to validation images, so validation metrics can still vary slightly between passes even with a fixed order.

For `flow_from_directory`, the default value of `shuffle` is `True`.
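
The difference between "same every run" and "same every epoch" can be simulated with the standard library. This is only an illustration of the idea, not the shuffling code Keras uses internally, and the helper `epoch_orders` is hypothetical.

```python
import random

SEED = 62


def epoch_orders(seed, n_samples=8, n_epochs=3):
    """Return the sample order used in each epoch of one simulated run."""
    rng = random.Random(seed)       # one generator per run, seeded once
    orders = []
    for _ in range(n_epochs):
        indices = list(range(n_samples))
        rng.shuffle(indices)        # a fresh shuffle at every epoch
        orders.append(indices)
    return orders


run_a = epoch_orders(SEED)
run_b = epoch_orders(SEED)

# Same seed: both runs replay exactly the same sequence of orders
print("runs identical:", run_a == run_b)

# Within a run, each epoch draws a new permutation
for epoch, order in enumerate(run_a, start=1):
    print(f"epoch {epoch}: {order}")
```

The two runs match, while the per-epoch orders within a run will in general differ, which is the behaviour the paragraph above describes.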