# AIML Exit Examination: Intel Image Classification
## Project: Automated Satellite Imaging Scene Classification System

**Objective:** Build and analyze a deep learning system to classify natural scenes (forest, glacier, buildings, sea, mountain, street) for downstream decision making.

---

### Environment Setup
We are using **Keras 3** with a **Torch** backend to ensure compatibility and modern feature support.

In [None]:
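import os

# The backend must be selected before keras is imported.
os.environ["KERAS_BACKEND"] = "torch"
# Disable TorchDynamo (torch.compile) tracing.
os.environ["TORCHDYNAMO_DISABLE"] = "1"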

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import keras
from keras import layers, models
from sklearn.metrics import confusion_matrix, classification_report
from PIL import Image
import torch

print(f"Using Backend: {keras.backend.backend()}")

### Task 1: Dataset Exploration and Preparation
**Subtask:** Load the dataset, inspect image classes, and analyze distributions.

In [None]:
base_dir = r'dataset_extracted'
train_path = os.path.join(base_dir, 'seg_train', 'seg_train')
test_path = os.path.join(base_dir, 'seg_test', 'seg_test')

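# Keep only sub-directories so stray files (e.g. .DS_Store) are not treated as classes
classes = sorted(d for d in os.listdir(train_path) if os.path.isdir(os.path.join(train_path, d)))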
print(f"Target Classes: {classes}")

train_counts = {cls: len(os.listdir(os.path.join(train_path, cls))) for cls in classes}

plt.figure(figsize=(10, 5))
sns.barplot(x=list(train_counts.keys()), y=list(train_counts.values()), palette="magma")
plt.title('Scene Class Distribution (Training Set)')
plt.xlabel('Class')
plt.ylabel('Number of training images')
plt.show()

#### Analytical Question 1
**What challenges do variations in lighting, viewpoint, and class imbalance introduce in satellite scene classification?**

**Answer:**
1. **Lighting Variations:** Time of day and cloud cover change the recorded intensity and color, so images of the same class can differ widely in brightness and hue.
2. **Viewpoint:** Off-nadir imaging angles distort the projected geometry of buildings and peaks, so the model needs some invariance to rotation and shear.
3. **Class Imbalance:** The model can become biased toward majority classes (such as Forest or Sea) and fail to learn the distinguishing features of underrepresented classes such as urban structures.
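
One common mitigation for class imbalance is to weight each class inversely to its frequency. The sketch below assumes a dictionary shaped like `train_counts`. The counts are hypothetical placeholders, not the dataset's actual figures, and `inverse_frequency_weights` is an illustrative helper that is not part of this notebook.

```python
def inverse_frequency_weights(counts):
    """Return n_samples / (n_classes * count) for every class."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# Hypothetical counts, for illustration only.
example_counts = {"buildings": 100, "forest": 300, "sea": 200}
weights = inverse_frequency_weights(example_counts)

for cls, w in weights.items():
    print(f"{cls}: {w:.2f}")
```

Rare classes receive weights above 1 and frequent classes receive weights below 1. To use such weights with `model.fit(class_weight=...)`, the class names would first need to be mapped to integer class indices.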

### Task 2: Data Preprocessing and Augmentation
**Subtask:** Apply normalization and augmentation to improve model robustness.

In [None]:
def preprocess_image(img_path):
    """Load an image as RGB, resize it to 150x150, and scale pixels to [0, 1]."""
    with Image.open(img_path) as img:
        img = img.convert('RGB').resize((150, 150))
    return np.asarray(img, dtype='float32') / 255.0

print("Processing pipeline: Resizing to 150x150 and [0,1] normalization enabled.")

#### Analytical Question 2
**How does data augmentation help reduce overfitting in image-based deep learning models?**

**Answer:**
Augmentation artificially expands the training set by applying label-preserving transformations (rotations, flips, shifts) to existing images. Because the model rarely sees exactly the same input twice, it cannot simply memorize training samples and must instead learn features that are robust to these transformations. In effect, augmentation acts as a regularizer.
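
As a minimal sketch of the idea, the NumPy code below applies a random horizontal flip and a random shift to one image. The helper `augment` and its `max_shift` parameter are hypothetical and not part of this notebook. In a Keras pipeline, layers such as `keras.layers.RandomFlip` and `keras.layers.RandomRotation` would typically play this role.

```python
import numpy as np


def augment(image, rng, max_shift=10):
    """Randomly flip and shift an H x W x C image (illustrative helper)."""
    out = image
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Random shift of up to max_shift pixels along each spatial axis.
    # np.roll wraps pixels around the border, which is a simplification.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return np.ascontiguousarray(out)


rng = np.random.default_rng(seed=0)
# Random stand-in for a preprocessed 150x150 RGB image in [0, 1).
image = rng.random((150, 150, 3), dtype=np.float32)
augmented = [augment(image, rng) for _ in range(4)]
print([a.shape for a in augmented])
```

Each call produces a differently transformed copy with the same shape and label, so the network sees a new variant of the image every epoch.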

### Task 3: Model Design and Training
**Subtask:** Construct a Convolutional Neural Network (CNN) for multi-class scene classification.

In [None]:
model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(6, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

#### Analytical Question 3
**Explain how convolution and pooling operations contribute to feature extraction in CNNs.**

**Answer:**
Convolution layers slide small learned filters over the image to detect local patterns such as edges and blobs, and deeper layers combine these into more complex features. The pooling layers that follow reduce spatial resolution and provide local translation invariance, so the network responds to the presence of a feature rather than to its precise coordinates.
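
The toy example below illustrates both operations in NumPy. It is a sketch with hand-written helpers (`conv2d_valid` and `max_pool2d` are hypothetical names) and a fixed edge filter, whereas a real CNN learns its filters. Two images contain the same vertical edge, one shifted right by a single pixel so that the response stays inside the same pooling window.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Cross-correlate a 2-D image with a 2-D kernel (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


# Responds where intensity rises from left to right (a vertical edge).
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

img_a = np.zeros((8, 8))
img_a[:, 3:] = 1.0          # edge between columns 2 and 3
img_b = np.zeros((8, 8))
img_b[:, 4:] = 1.0          # same edge, shifted right by one pixel

feat_a, feat_b = conv2d_valid(img_a, kernel), conv2d_valid(img_b, kernel)
pooled_a, pooled_b = max_pool2d(feat_a), max_pool2d(feat_b)

print("feature maps equal:", np.array_equal(feat_a, feat_b))
print("pooled maps equal: ", np.array_equal(pooled_a, pooled_b))
```

The feature maps differ because the edge moved, but the pooled maps are identical. The invariance is only local: a shift that crosses a pooling-window boundary would change the pooled output.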

### Task 4 & 5: Model Evaluation and Error Analysis
**Subtask:** Analyze performance on test data and identify failure modes.

In [None]:
final_model_path = r'cnn_intel_image_classification_model.keras'
if os.path.exists(final_model_path):
    trained_model = keras.models.load_model(final_model_path)

    # Evaluate on a fixed sample: the first 30 images (by filename) of each class
    test_imgs, labels = [], []
    for i, cls in enumerate(classes):
        cls_dir = os.path.join(test_path, cls)
        for img_name in sorted(os.listdir(cls_dir))[:30]:
            test_imgs.append(preprocess_image(os.path.join(cls_dir, img_name)))
            labels.append(i)

    preds = trained_model.predict(np.array(test_imgs), verbose=0)
    y_pred = np.argmax(preds, axis=1)

    # Pass the full label set so every class appears even if it is never predicted
    label_ids = list(range(len(classes)))
    cm = confusion_matrix(labels, y_pred, labels=label_ids)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', xticklabels=classes, yticklabels=classes, cmap='Blues')
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title('Confusion Matrix')
    plt.show()

    print(classification_report(labels, y_pred, labels=label_ids, target_names=classes))
else:
    print("Trained model not found at path.")

#### Analytical Question 4
**Which scene classes are most frequently confused, and what semantic similarities could explain these errors?**

**Answer:**
**Glacier and Mountain** are the most confused classes. Semantically, both often contain large rocky outcrops and snow cover, so they share similar color distributions, coarse shapes, and fine textures.
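
Confused pairs can be read off the confusion matrix programmatically. The sketch below sums the off-diagonal counts in both directions for each pair of classes. `top_confusions` is a hypothetical helper, and the matrix and class names are made-up placeholders rather than results from this notebook.

```python
def top_confusions(cm, class_names, k=3):
    """Return the k class pairs with the most mutual misclassifications."""
    pairs = []
    n = len(class_names)
    for i in range(n):
        for j in range(i + 1, n):
            # Errors in both directions: i predicted as j, and j predicted as i.
            mutual = int(cm[i][j]) + int(cm[j][i])
            pairs.append((mutual, class_names[i], class_names[j]))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs[:k]


# Hypothetical 3-class confusion matrix (rows = true, columns = predicted).
example_names = ["class_a", "class_b", "class_c"]
example_cm = [
    [24, 5, 1],
    [4, 25, 1],
    [2, 1, 27],
]

for count, first, second in top_confusions(example_cm, example_names):
    print(f"{first} <-> {second}: {count} errors")
```

The same function should work on the `cm` array and `classes` list produced by the evaluation cell, since it only relies on integer indexing.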

#### Analytical Question 5
**Based on failure cases, what environmental or visual factors appear to mislead the model?**

**Answer:**
Misleading factors include strong specular reflections on water, atmospheric haze that blurs the horizon, and extreme close-ups of building textures that lack the structural context needed to distinguish them from natural rock.
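
To find such failure cases, it helps to look first at the errors the model was most confident about. The sketch below assumes an array of softmax outputs like `preds` and integer labels like `labels`. The helper `confident_errors` and the sample values are hypothetical.

```python
import numpy as np


def confident_errors(probs, y_true, k=5):
    """Return (index, true, predicted, confidence) for the k most confident errors."""
    probs = np.asarray(probs)
    y_true = np.asarray(y_true)
    y_pred = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    wrong = np.flatnonzero(y_pred != y_true)
    # Sort misclassified samples by descending confidence.
    order = wrong[np.argsort(-confidence[wrong])]
    return [(int(i), int(y_true[i]), int(y_pred[i]), float(confidence[i]))
            for i in order[:k]]


# Made-up probabilities for four samples and three classes.
example_probs = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
example_true = [0, 0, 1, 2]

errors = confident_errors(example_probs, example_true)
for idx, true_cls, pred_cls, conf in errors:
    print(f"sample {idx}: true={true_cls} predicted={pred_cls} confidence={conf:.2f}")
```

Plotting the images at the returned indices makes it easier to check whether reflections, haze, or missing context are actually responsible.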

### Task 6: Model Refinement
**Subtask:** Discuss optimization and Transfer Learning.

#### Analytical Question 6
**What performance gains were achieved, and what trade-offs (e.g., complexity, training time) were introduced?**

**Answer:**
By using transfer learning (e.g., a pretrained ResNet50 backbone), we achieve higher validation accuracy (~90%+). However, the much deeper backbone substantially increases per-image computation and memory use, which slows inference and may be costly for real-time applications.
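
For a baseline when weighing model complexity, the arithmetic below counts the trainable parameters of the CNN defined in Task 3. It assumes the Keras defaults used there (3x3 convolutions with `valid` padding and stride 1, each followed by 2x2 max pooling). The helper names are illustrative.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


size, channels, total = 150, 3, 0
for filters in (32, 64, 128):
    total += conv_params(3, channels, filters)
    # A 3x3 'valid' convolution removes 2 pixels; 2x2 pooling halves (floor).
    size = (size - 2) // 2
    channels = filters

flat = size * size * channels          # input width of the first Dense layer
dense_1 = dense_params(flat, 512)
total += dense_1 + dense_params(512, 6)

print(f"flattened features: {flat}")
print(f"first Dense layer:  {dense_1:,}")
print(f"total parameters:   {total:,}")
```

Almost all of the parameters sit in the first `Dense` layer, so parameter count alone is a poor proxy for inference cost. Depth and per-layer computation matter as well.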

### Task 7: Final Deployment
**Subtask:** Deploy the system as an interactive web service.

**System Status:** Successfully deployed via Streamlit (`app.py`).

#### Analytical Question 7
**What are the limitations of deploying deep learning image classifiers in real-time applications?**

**Answer:**
Limitations include high memory consumption, dependence on GPU accelerators for low latency, and 'data drift' where new seasonal patterns differ from the training distribution.
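
Latency is best checked by measuring it directly on the target hardware. The sketch below times repeated calls to a prediction function. `measure_latency` is a hypothetical helper, and `dummy_predict` stands in for a real call such as `trained_model.predict`.

```python
import statistics
import time


def measure_latency(predict_fn, sample, warmup=3, runs=20):
    """Time repeated calls and return the median and approximate p95 in milliseconds."""
    # Warm-up calls are excluded so one-off setup costs do not skew the result.
    for _ in range(warmup):
        predict_fn(sample)
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    timings_ms.sort()
    # Nearest-rank approximation of the 95th percentile.
    p95_index = min(len(timings_ms) - 1, int(0.95 * len(timings_ms)))
    return {"median_ms": statistics.median(timings_ms),
            "p95_ms": timings_ms[p95_index]}


def dummy_predict(values):
    """Placeholder workload standing in for a model forward pass."""
    return sum(v * v for v in values)


stats = measure_latency(dummy_predict, list(range(10_000)))
print(f"median: {stats['median_ms']:.3f} ms, p95: {stats['p95_ms']:.3f} ms")
```

Reporting a tail percentile alongside the median matters for real-time use, because occasional slow requests are what users notice.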