# Deep Learning Landmark Classification Project Report

## Overview

The "deep-learning-landmark-classification" project classifies famous landmarks using deep learning models. It employs both custom Convolutional Neural Networks (CNNs) and transfer learning. This report describes how the models work, compares the architectures, and presents the results. The project has two branches: the main (international) model classifies 50 landmarks from around the world, while the local model classifies 20 Greek landmarks. The end goal was to deploy the developed models in a real-world setting, which led to the “spot the spot” application.

## Data Preparation

### Data Collection

The international model covers 50 landmarks:

- Haleakala National Park
- Mount Rainier National Park
- Ljubljana Castle
- Dead Sea
- Wroclaws Dwarves
- London Olympic Stadium
- Niagara Falls
- Stonehenge
- Grand Canyon
- Golden Gate Bridge
- Edinburgh Castle
- Mount Rushmore National Memorial
- Kantanagar Temple
- Yellowstone National Park
- Terminal Tower
- Central Park
- Eiffel Tower
- Changdeokgung
- Delicate Arch
- Vienna City Hall
- Matterhorn
- Taj Mahal
- Moscow Raceway
- Externsteine
- Soreq Cave
- Banff National Park
- Pont du Gard
- Seattle Japanese Garden
- Sydney Harbour Bridge
- Petronas Towers
- Brooklyn Bridge
- Washington Monument
- Hanging Temple
- Sydney Opera House
- Great Barrier Reef
- Monumento a la Revolucion
- Badlands National Park
- Atomium
- Forth Bridge
- Gateway of India
- Stockholm City Hall
- Machu Picchu
- Death Valley National Park
- Gullfoss Falls
- Trevi Fountain
- Temple of Heaven
- Great Wall of China
- Prague Astronomical Clock
- Whitby Abbey
- Temple of Olympian Zeus

All images used for the international model were sourced from the [Google Landmarks Dataset v2](https://github.com/cvdfoundation/google-landmark).

The local model dataset consists of images of 20 famous Greek landmarks:

- Arch of Hadrian (Athens)
- Bridge of Arta
- Erechtheum
- Fetiye Mosque
- Lion Gate (Mycenae)
- Meteora
- Palace of the Grand Master of the Knights of Rhodes
- Panathenaic Stadium
- Parthenon
- Portara
- Sanctuary of Asclepius
- Stoa of Attalus
- Temple of Apollo in Delphi
- Temple of Hephaestus in Athens
- Temple of Poseidon Cape Sounion
- Temple of Zeus
- Theater of Epidaurus
- Theatre of Herodes Atticus
- Tower of the Winds
- White Tower (Thessaloniki)

All images used for the local model were sourced from Wikimedia Commons, an online repository of free-use images, sound, and other media files. The photographs were downloaded using the scraper defined in `wikimedia_scrapper.ipyb`.

### Preprocessing

- **Resizing:** All images are resized to a uniform size.
- **Normalization:** Pixel values are normalized.
- **Augmentation:** Techniques like rotation, flipping, and zooming are applied to increase dataset diversity.

## Custom CNN Architecture

The custom CNN architecture is defined in `LandMarkModel.py` and includes:

- **Layers:**
  - Multiple convolutional layers with ReLU activations and batch normalization.
  - Pooling layers to reduce spatial dimensions.
  - Fully connected (dense) layers for classification.
  - Dropout layers for regularization to prevent overfitting.
- **Implementation:** The `LandmarkCnnModel` class defines the model architecture.
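
To make the layer types listed above concrete, here is a small PyTorch sketch that combines them. It is an illustration only: the class name `SketchLandmarkCnn`, the number of blocks, the channel widths, and the dropout rate are assumptions and do not reproduce the actual `LandmarkCnnModel`.

```python
import torch
from torch import nn


class SketchLandmarkCnn(nn.Module):
    """Hypothetical stand-in for the custom CNN; all layer sizes are assumed."""

    def __init__(self, num_classes: int = 50, dropout: float = 0.5):
        super().__init__()

        def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
            # Convolution -> batch normalization -> ReLU -> pooling.
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # halves the spatial dimensions
            )

        self.features = nn.Sequential(
            conv_block(3, 16),
            conv_block(16, 32),
            conv_block(32, 64),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # collapse each feature map to 1x1
            nn.Flatten(),
            nn.Dropout(dropout),       # regularization
            nn.Linear(64, 128),        # fully connected layers
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

Setting `num_classes=50` or `num_classes=20` would match the international and local datasets respectively.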

## Transfer Learning Architecture

The transfer learning approach uses pre-trained models as a starting point, fine-tuning them for the specific task of landmark classification. The models used are:

- **ResNet18, ResNet50, ResNet152:** These models are defined in `TransferModel.py` using the PyTorch `torchvision.models` library.
- **Fine-Tuning:** The backbone layers are initialized with weights pre-trained on the ImageNet dataset. A new final layer is added and trained on the landmark datasets, with 50 output classes for the international model and 20 for the local model.

## Training and Optimization

- **Training Script:** `Training.py`
  - Training Loop: Includes functions for training the model for one epoch and for the entire training process.
  - Plotting: Uses `livelossplot` for real-time loss plotting.
- **Optimization Script:** `Optimization.py`
  - Hyperparameter Tuning: Functions for optimizing learning rate, batch size, and other hyperparameters.
- **Data Handling:** `Data.py`
  - Data Loaders: Functions to create training, validation, and test data loaders with specified batch sizes and validation splits.
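
The data-loader and one-epoch pieces described above might look like the following sketch. The function names `make_loaders` and `train_one_epoch`, their signatures, and the default values are assumptions for illustration, not the contents of `Data.py` or `Training.py`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split


def make_loaders(dataset, batch_size: int = 32, valid_size: float = 0.2,
                 seed: int = 0) -> dict:
    """Split a dataset into train/validation parts and wrap them in loaders."""
    n_valid = int(len(dataset) * valid_size)
    n_train = len(dataset) - n_valid
    generator = torch.Generator().manual_seed(seed)  # reproducible split
    train_set, valid_set = random_split(dataset, [n_train, n_valid],
                                        generator=generator)
    return {
        "train": DataLoader(train_set, batch_size=batch_size, shuffle=True),
        "valid": DataLoader(valid_set, batch_size=batch_size, shuffle=False),
    }


def train_one_epoch(loader, model, optimizer, loss_fn, device="cpu") -> float:
    """Run one pass over the loader and return the mean training loss."""
    model.train()
    total_loss, n_seen = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()                  # clear gradients from the last step
        loss = loss_fn(model(images), labels)
        loss.backward()                        # backpropagate
        optimizer.step()                       # update the weights
        total_loss += loss.item() * images.size(0)
        n_seen += images.size(0)
    return total_loss / n_seen
```

A full training run would call `train_one_epoch` once per epoch, evaluate on the `"valid"` loader after each pass, and keep the weights with the lowest validation loss.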

## Model Export and Prediction

- **Model Export:** `ModelExporter.py`
  - Export Function: Function to save the trained model for inference.
- **Prediction Script:** `PredictorWrapper.py`
  - Predictor Class: A class that wraps the model and handles preprocessing and prediction.
  - Confusion Matrix: A function to plot the confusion matrix to evaluate model performance.
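
As a rough illustration of these three pieces, the sketch below wraps a model in a predictor, exports it, and counts a confusion matrix. It assumes TorchScript (`torch.jit.script` and `torch.jit.save`) as the export mechanism, which the report does not confirm. The names `SketchPredictor`, `export_model`, and `confusion_matrix` are hypothetical, and the sketch only counts the matrix rather than plotting it.

```python
import io

import torch
import torch.nn.functional as F
from torch import nn


class SketchPredictor(nn.Module):
    """Hypothetical wrapper: normalizes inputs and returns class probabilities."""

    def __init__(self, model: nn.Module, mean, std):
        super().__init__()
        self.model = model.eval()
        # Buffers are saved with the module, so preprocessing ships with it.
        self.register_buffer("mean", torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch of images with values in [0, 1], shape (N, 3, H, W).
        x = (x - self.mean) / self.std
        return F.softmax(self.model(x), dim=1)


def export_model(predictor: nn.Module, target) -> None:
    """Compile to TorchScript and save to a path or file-like object."""
    scripted = torch.jit.script(predictor)
    torch.jit.save(scripted, target)


def confusion_matrix(y_true: torch.Tensor, y_pred: torch.Tensor,
                     num_classes: int) -> torch.Tensor:
    """Rows are true classes, columns are predicted classes."""
    cm = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for t, p in zip(y_true.tolist(), y_pred.tolist()):
        cm[t, p] += 1
    return cm
```

A file saved this way can be reloaded with `torch.jit.load` without the original Python class definitions, which is convenient for deployment in an application.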

## Results Comparison

The models are compared on accuracy, training time, and computational efficiency. The table below reports accuracy on the local (20-class) and international (50-class) datasets.

| Model        | Accuracy@Local | Accuracy@International |
|--------------|----------------|------------------------|
| Custom CNN   | 81%            | 57%                    |
| ResNet18     | 87%            | 76%                    |
| ResNet50     | 90%            | 80%                    |
| ResNet152    | 89%            | 80%                    |
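
To read the table as gains over the custom CNN baseline, the short script below computes the difference in percentage points for each transfer learning model. The accuracy values are copied from the table; the helper name `gain_over_baseline` is hypothetical.

```python
# Accuracy values (percent) copied from the results table above.
accuracy = {
    "Custom CNN": {"local": 81, "international": 57},
    "ResNet18": {"local": 87, "international": 76},
    "ResNet50": {"local": 90, "international": 80},
    "ResNet152": {"local": 89, "international": 80},
}


def gain_over_baseline(model: str, baseline: str = "Custom CNN") -> dict:
    """Accuracy difference in percentage points, per dataset."""
    return {
        dataset: accuracy[model][dataset] - accuracy[baseline][dataset]
        for dataset in accuracy[model]
    }


for name in ("ResNet18", "ResNet50", "ResNet152"):
    print(name, gain_over_baseline(name))
# ResNet18 {'local': 6, 'international': 19}
# ResNet50 {'local': 9, 'international': 23}
# ResNet152 {'local': 8, 'international': 23}
```

The gains are considerably larger on the 50-class international dataset (19 to 23 points) than on the 20-class local dataset (6 to 9 points).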

## Detailed Model Analysis

**Custom CNNs:**
- **Pros:** Flexibility in designing the architecture, control over every layer.
- **Cons:** Requires more training time and computational resources, lower accuracy.

**Transfer Learning:**
- **Pros:** Leverages powerful features from pre-trained models, faster training, higher accuracy.
- **Cons:** Limited flexibility in architecture design, dependency on pre-trained weights.

## Conclusion

All transfer learning models outperformed our custom CNN in accuracy, most clearly on the international dataset (76% to 80% versus 57%), and were also more efficient to train. Among the transfer learning models, ResNet50 achieved the highest accuracy, although the differences from ResNet18 and ResNet152 were small. This highlights the effectiveness of pre-trained models for complex image classification tasks.