# Fine-Grained Image Classification

## Definition of fine-grained image classification
Fine-grained image classification aims to classify images at the sub-category level, in contrast to general image classification. For example, general image classification distinguishes birds from dogs, while fine-grained image classification distinguishes different species of birds. The task is challenging in computer vision because the visual differences between classes are subtle and harder to distinguish.

### Part 1: Dataset Download

- **Task**: Download a fine-grained image dataset for a classification task.
- **Requirements**:
  - Download the FGVC-Aircraft dataset, a fine-grained aircraft classification dataset. The download page is "https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/".
  - Implement code to read images from the downloaded dataset.
  - Visualize sample images from the training set and the testing set, respectively.
  - Score breakdown:
    - 2 points: The code can read images from the FGVC-Aircraft dataset.
    - 2 points: Visualize 10 images from the training set and 5 images from the testing set.
    - 1 point: Discuss your observations of the training set.
- **What to submit:**
  - Submit the a1_part1.py Python file to the iLearn Assignment 1 submission. The code should read images from the FGVC-Aircraft dataset and visualize 10 images from the training set and 5 images from the testing set.
  - Your discussion of your observations of the training set, in the answer box below.

### Part 2: Algorithm Selection

- **Task**: Select and test 2 different deep learning methods.
- **Requirements**:
  - Include one self-designed method and one ResNet-50 model.
  - Test ResNet-50 with both fine-tuning and transfer learning.
  - Score breakdown:
    - 3 points: Include 2 methods with the required models.
    - 2 points: Test ResNet-50 with both fine-tuning and transfer learning.
- **What to submit:**
  - Submit the a1_part2_m1.py Python file to the iLearn Assignment 1 submission. The code should apply ResNet-50 to FGVC-Aircraft classification under both the fine-tuning setting and the transfer learning setting. Testing code is also required.
  - Submit the a1_part2_m2.py Python file to the iLearn Assignment 1 submission. The code should apply your self-designed CNN model to FGVC-Aircraft classification. Testing code is also required.

### Part 3: Performance Improvement

- **Task**: Implement strong data augmentation and a learning rate scheduler.
- **Requirements**:
  - Data augmentation must improve performance.
  - Provide a reasonable analysis on the effectiveness of data augmentation.
  - Score breakdown:
    - 2 points: Data augmentation improves performance.
    - 2 points: Implement a learning rate scheduler that improves performance.
    - 1 point: Provide a proper analysis on data augmentation.
- **What to submit:**
  - Submit the a1_part3_aug.py Python file to the iLearn Assignment 1 submission. The code should add strong data augmentation to your designed model. Testing code is also required. An improved classification accuracy is expected.
  - Submit the a1_part3_lr.py Python file to the iLearn Assignment 1 submission. The code should add a learning rate scheduler to your designed model. Testing code is also required. An improved classification accuracy is expected.
  - Your analysis of the effectiveness of data augmentation, in the answer box below.
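
The brief does not name a particular scheduler, so the following is only one possible choice: a minimal, framework-free sketch of cosine annealing. The function name `cosine_lr` and the values `base_lr=0.01`, `min_lr=1e-5`, and 30 epochs are assumptions made for illustration, not settings taken from the assignment.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate for a 0-indexed epoch.

    Starts at base_lr, decays along half a cosine wave, and reaches
    min_lr when epoch == total_epochs.
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    epoch = min(max(epoch, 0), total_epochs)  # clamp to the schedule range
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cosine


# Hypothetical settings, for illustration only.
schedule = [cosine_lr(e, total_epochs=30, base_lr=0.01, min_lr=1e-5)
            for e in range(31)]
```

In a PyTorch training loop the same shape of schedule is available through `torch.optim.lr_scheduler.CosineAnnealingLR`; whether it actually improves accuracy here has to be checked against a constant-rate baseline.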

### Part 4: Deeper Analysis

- **Task**: Analyze limitations of the dataset and algorithms, propose improvements.
- **Requirements**:
  - Identify dataset limitations.
  - Discuss limitations of the 2 algorithms.
  - Apply GAN and DeepDream for augmentation and analysis.
  - Score breakdown:
    - 1 point: Identify dataset limitations.
    - 1 point: Discuss limitations of the 2 algorithms.
    - 1 point: Apply a GAN to generate 10 sample images of any one class in the FGVC-Aircraft dataset.
    - 2 points: Apply the DeepDream algorithm and show 3 resulting images.
- **What to submit:**
  - Submit the a1_part4_GAN.py Python file to the iLearn Assignment 1 submission. The code should apply a GAN to generate 10 samples of any one category from the FGVC-Aircraft dataset. The code must visualize the 10 sample images.
  - Submit the a1_part4_Deep.py Python file to the iLearn Assignment 1 submission. The code should apply DeepDream to generate 3 resulting images. The code must visualize the resulting images.
  - Your response on dataset limitations, in the answer box below.
  - Your response on the limitations of the two methods, in the answer box below.
  
## Evaluation Criteria

Your assignment will be evaluated based on the following criteria:

- Fulfillment of assignment requirements.
- Design and testing of deep learning methods.
- Implementation of performance improvement techniques.
- Depth of analysis and proposed improvements.


### Part 1: Dataset Downloading

Discuss your observations from the training set

- Most aircraft classes are visually very similar, often differing only in manufacturer, length, engine configuration, or tail design. This makes the problem harder than generic object classification.
- The set covers a wide range of aircraft, from large twin-engine airliners such as the 737-900 and A330-200 to regional jets (ERJ-145, BAE-125) and business jets and turboprops (Gulfstream IV, EMB-120), so the model has to learn very fine shape and panel-line distinctions.
- Samples vary in lighting, viewing angle, livery, and background (airfield, urban, grass, hangar, sky). This helps the model generalize, provided each class has enough examples.
- Images are clear and high resolution (approx. 1–2 MP as per FGVC docs), which makes them well suited to CNNs.
- The images appear to have been preprocessed correctly: no bottom copyright banners are visible in the samples, indicating that `remove_banner=True` was active. Banner removal is a necessary preprocessing step mentioned in the FGVC documentation.
- Aircraft are consistently centered in the frame and mostly horizontally aligned, which is again favorable for CNNs.
- The copyright banner-removal logic and 224×224 resizing give consistency, yet some images still look soft or overcast; adding brightness/contrast jitter or a light sharpening step can help cover varied lighting conditions.
- Simple augmentations (random flips, small rotations, color jitter, slight random crops) will further encourage the model to generalize across different poses and atmospheric conditions.
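
The reading and banner-removal steps mentioned above could be sketched as follows. This is a minimal, stdlib-only illustration, not the submitted `a1_part1.py`. It assumes that each annotation line has the form `<image_id> <label>` (as in the dataset's `images_variant_train.txt`), that images are stored as `data/images/<image_id>.jpg`, and that the copyright banner occupies the bottom 20 pixels, as the FGVC-Aircraft notes describe. The helper names `parse_annotations`, `image_path`, and `banner_crop_box` are hypothetical, and the example ids are made up.

```python
from collections import Counter
from pathlib import Path

BANNER_HEIGHT = 20  # assumed banner height in pixels


def parse_annotations(lines):
    """Turn '<image_id> <label>' lines into (image_id, label) pairs.

    Labels may contain spaces (e.g. 'Gulfstream IV'), so only the first
    space separates the image id from the label.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id, label = line.split(" ", 1)
        samples.append((image_id, label))
    return samples


def image_path(root, image_id):
    """Build the assumed path of an image inside the extracted dataset."""
    return Path(root) / "data" / "images" / f"{image_id}.jpg"


def banner_crop_box(width, height, banner_height=BANNER_HEIGHT):
    """Return a (left, upper, right, lower) box that drops the bottom banner."""
    if height <= banner_height:
        raise ValueError("image is not taller than the banner")
    return (0, 0, width, height - banner_height)


# Made-up annotation lines, for illustration only.
example_lines = ["0000001 737-900", "0000002 Gulfstream IV", "0000003 737-900"]
samples = parse_annotations(example_lines)
per_class = Counter(label for _, label in samples)  # quick class-balance check
```

The box returned by `banner_crop_box` has the `(left, upper, right, lower)` layout that Pillow's `Image.crop` expects, so it can be applied before resizing to 224×224.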

### Part 3: Performance Improvement


Answer to data augmentation analysis and accuracy:

### **Observed Change**:
- **Before** (Baseline CNN or shallow model): ~14% accuracy
- **After** (Same model + strong augmentation): ~24% accuracy  
- **~71% relative improvement in accuracy**  
This improvement demonstrates that **my model started generalizing better**, not just memorizing training samples.
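
The relative figure follows directly from the two approximate accuracies reported above:

$$
\frac{24\% - 14\%}{14\%} = \frac{10}{14} \approx 0.714 \approx 71\%
$$

In absolute terms the gain is 10 percentage points.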

The original 14% accuracy likely came from **overfitting to small, local patterns** (e.g., logo shape, sky background). With augmentations, my model sees **a richer version of the dataset** during every epoch. It is now **forced to generalize** to shape, silhouette, tail fin, window pattern, and engine placement rather than color schemes or logo alignment.

### **Breakdown of Each Augmentation's Contribution**

**1. `RandomResizedCrop(224, scale=(0.8, 1.0), ratio=(0.9, 1.1))`**
- **Effect**: Forces the model to learn from different spatial parts of the aircraft.
- **Why it helps**: Aircraft photos might contain logos, engines, wings, or windows at varying scales; this helps the model learn features that are **not dependent on position** or size.

**2. `RandomHorizontalFlip()`**
- **Effect**: Randomly mirrors the image left to right.
- **Why it helps**: Aircraft may be photographed from either side. Horizontal flipping encourages invariance to left/right orientation.

**3. `ColorJitter()`**
- **Effect**: Randomly changes brightness, contrast, saturation, and hue.
- **Why it helps**: Aircraft are photographed under different lighting conditions (sunny, overcast, dusk). This avoids overfitting to color tone and enhances **illumination invariance**.

**4. `RandomRotation(15)`**
- **Effect**: Slight rotation in ±15° range.
- **Why it helps**: Viewpoint may vary (especially during takeoff/landing). Rotation makes the model robust to mild angular displacements.

**5. `RandomPerspective(distortion_scale=0.3)`**
- **Effect**: Simulates 3D viewpoint distortion by shifting corner points.
- **Why it helps**: Introduces **perspective variation** resembling oblique viewpoints, which are common in aircraft photography.

**6. `RandomErasing(p=0.5, scale=(0.02, 0.2))`**
- **Effect**: Randomly masks a rectangular region in the image.
- **Why it helps**: Prevents reliance on single parts (e.g., airline logo) and encourages a **holistic understanding** of the aircraft structure.

Such **strong data augmentations** introduce controlled chaos, helping my model:
- Pay attention to shape, proportion, component placement
- Avoid overfitting to easily memorized visual cues
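
To make the last augmentation concrete, here is a small pure-Python sketch of the random-erasing idea on a list-of-rows image. It is an illustration of the technique, not the torchvision implementation: the function name `random_erase` and the 32×32 toy image are made up, and the `ratio=(0.3, 3.3)` default is assumed to match torchvision's default aspect-ratio range.

```python
import math
import random


def random_erase(image, p=0.5, scale=(0.02, 0.2), ratio=(0.3, 3.3),
                 fill=0, rng=random):
    """Return a copy of `image` (list of rows) with one random rectangle filled.

    With probability 1 - p the copy is returned unchanged.
    """
    out = [row[:] for row in image]  # never modify the caller's image
    if rng.random() >= p:
        return out
    height, width = len(out), len(out[0])
    area = height * width
    for _ in range(10):  # retry if the sampled rectangle does not fit
        erase_area = area * rng.uniform(scale[0], scale[1])
        # Log-uniform aspect ratio: tall and wide boxes are equally likely.
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        h = int(round(math.sqrt(erase_area * aspect)))
        w = int(round(math.sqrt(erase_area / aspect)))
        if 0 < h < height and 0 < w < width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            for y in range(top, top + h):
                for x in range(left, left + w):
                    out[y][x] = fill
            return out
    return out


# Toy example: a 32x32 image of ones, always erased (p=1.0), seeded for repeatability.
rng = random.Random(0)
image = [[1] * 32 for _ in range(32)]
erased = random_erase(image, p=1.0, rng=rng)
erased_pixels = sum(row.count(0) for row in erased)
```

One practical point when using the real transform: torchvision's `RandomErasing` operates on tensors rather than PIL images, so in a `Compose` pipeline it belongs after `ToTensor()`.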

### Part 4: Deeper Analysis



  - Your response on dataset limitations, in the answer box below.


- The FGVC-Aircraft benchmark dataset contains only about 10,200 images (roughly 100 images for each of 102 aircraft variants). This relatively small size is a fundamental limitation – deep models risk overfitting when training data per class is so limited.
- Additionally, the images exhibit high intra-class variability (e.g. different angles, lighting, and backgrounds for the same aircraft type) and low inter-class variability (many aircraft variants look extremely similar). Such subtle visual differences between classes make it challenging for models to learn discriminative features.
- Moreover, the dataset images are contributed by aircraft spotters and often include cluttered backgrounds (airport tarmacs, skies, other aircraft) that introduce noise.
- While labels are organized in a hierarchy (model variant, family, manufacturer), certain variants are nearly indistinguishable, and the labeling had to collapse those cases.
- Annotation consistency is generally high for class labels, but the fine-grained nature means even minor labeling errors or ambiguities (e.g. misidentifying a sub-variant) can hurt performance.
- In summary, the dataset’s limited size, biased image conditions, and fine granularity of classes all impose challenges. The result is that models can easily overfit to spurious details or background cues rather than true aircraft-specific features, underscoring the need for data augmentation and careful model design for this domain.


  - Your response on the limitations of the two methods, in the answer box below.

**Limitation of ResNet-50 Pre-Trained Model**

- Using a generic **ResNet-50** pre-trained on **ImageNet** as a starting point provides robust general features, but it has limitations for fine-grained aircraft recognition. There is a domain gap – ImageNet pre-training teaches the network to recognize broad object classes, not the subtle shape differences between, say, a Boeing 737-700 and 737-800. 
- Without significant **fine-tuning**, the ResNet’s filters may not optimally discriminate fine-grained details (e.g. engine placement or tail shape). 
- Another issue is **overfitting**: with only a few thousand training images, a high-capacity model like **ResNet-50** can memorize training quirks if fine-tuned too aggressively. The network might latch onto background-color correlations or airline liveries unique to the training set rather than true variant features. 
- Indeed, FGVC tasks often report overfitting when large networks are trained on small datasets. **Regularization** and **early stopping** are needed to avoid overfitting. 
- A related limitation is **limited interpretability** – ResNet-50 is a complex 50-layer architecture, so understanding which features it uses for decisions is difficult. This complicates debugging when it confuses look-alike aircraft. 
- Lastly, ResNet-50 is conventionally fed inputs at a fixed resolution (often 224×224), which forces heavy image resizing; fine details such as cockpit window shape or a logo might be lost. 
- In summary, while **ResNet-50** provides a strong baseline, it requires careful adaptation to avoid missing fine details or focusing on the wrong cues for fine-grained aircraft classification.
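
Early stopping, mentioned above as a guard against overfitting, can be sketched without any framework. The class name `EarlyStopping`, the `patience=3` setting, and the validation losses below are all made up for illustration; they are not results from this assignment.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the patience counter.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improvement stalls after epoch 2.
val_losses = [2.9, 2.4, 2.1, 2.2, 2.3, 2.15, 2.4]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

In a real training loop one would also save the model weights whenever `best_epoch` is updated and restore them after stopping.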

**Limitation of Self-Designed CNN Model**

- A **custom-designed CNN** (built from scratch for this task) gives full control over architecture, but it typically has far fewer layers/parameters than sophisticated pre-trained models like ResNet-50 and no pre-trained knowledge. This leads to several limitations. First, a smaller CNN may have **limited capacity to capture** the subtle differences between many aircraft types. 
- If the network is too shallow, it might not develop the high-level discriminative features needed. Conversely, if it is made deep without pre-training, the lack of data can cause severe **overfitting**. 
- Training from scratch on ~6,600 training images (for 100 classes) is data-starved – the model may memorize training examples rather than generalize, since it doesn’t benefit from pre-training on millions of images. This means the custom CNN might converge to a lower accuracy ceiling than a fine-tuned ResNet. 
- Additionally, without advanced components (like residual connections or attention modules), a plain CNN might struggle with the saliency of fine parts – e.g. focusing on an aircraft’s distinctive engines or tailfin. It may end up using background or overall color, which are not reliable identifiers (many airliners are white!). 
- The **training efficiency is also lower**: it takes longer for a scratch model to learn basic visual patterns (edges, textures) that a pre-trained model already “knows.” 
- In practice, one might **need extensive data augmentation** and possibly **synthetic data** such as using **GAN** or **DeepDream** to make a self-designed CNN perform well. 
- In summary, a self-designed CNN faces capacity vs. overfitting trade-offs and typically underperforms a well-tuned transfer learning approach on this fine-grained image classification task.