# Land Use and Land Cover Classification

The availability of free satellite data has enabled applications in agriculture, disaster recovery, climate change, urban development, and environmental monitoring. However, to fully exploit the data in these domains, satellite images must first be processed and transformed into structured semantics. One fundamental type of such semantics is land use and land cover classification, which aims to automatically assign labels describing the physical land type represented or how a land area is used (e.g., residential, industrial).

A satellite image dataset for the task of land use and land cover classification was presented in [[1]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8519248). The EuroSAT dataset is based on Sentinel-2 satellite images and consists of 27,000 labeled patches of 64x64 pixels each, divided into the 10 classes listed below.

![alt text](./Images/dataset.png "The EuroSAT Dataset")

In this assignment you will use RGB images obtained by combining the red (B04), green (B03) and blue (B02) bands of the Sentinel-2 product. More information about the Sentinel-2 bands can be found [here](https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/resolutions/spatial). You will then train different Convolutional Neural Network (CNN) models to classify each 64x64 patch into one of the following classes:

1. AnnualCrop
2. Forest
3. Herbaceous Vegetation
4. Highway
5. Industrial
6. Pasture
7. Permanent Crop
8. Residential
9. River
10. SeaLake
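
Because a CNN is trained against integer targets, the class names above are usually mapped to indices. The following is a minimal sketch of such a mapping. It assumes that the class folders use these names with the spaces removed; the names `CLASS_NAMES`, `class_to_idx` and `idx_to_class` are illustrative and not part of the assignment template.

```python
# Class names as listed above; folder names are ASSUMED to drop the spaces.
CLASS_NAMES = [
    "AnnualCrop", "Forest", "Herbaceous Vegetation", "Highway", "Industrial",
    "Pasture", "Permanent Crop", "Residential", "River", "SeaLake",
]

# Map folder-style names to integer indices 0..9, in list order.
class_to_idx = {name.replace(" ", ""): idx for idx, name in enumerate(CLASS_NAMES)}
# Inverse mapping, useful for turning predictions back into readable names.
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

print(class_to_idx["Forest"])
print(idx_to_class[9])
```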



## Instructions
The dataset is structured as follows:
1. `train.txt`: this file contains a list of images that will be used to train the Convolutional Neural Network (CNN) models.
2. `test.txt`: this file contains a list of images that will be used to test the Convolutional Neural Network (CNN) models.
3. Ten folders, one per class, each containing the images of that class.

All code must be developed in Python 3 and run on Ubuntu 20.04 or later. Students are requested to submit the Jupyter Notebook using the template provided. Any text, visual information, or equations that you need to convey should be written in Markdown within the same Jupyter Notebook. The Jupyter Notebook should be named as follows:

`name-surname.ipynb`

The packages allowed for this assignment are: `matplotlib`, `os`, `numpy`, `torch`, `opencv-python` (imported as `cv2`), `torchvision` and any other packages agreed with the lecturer.


## Assignment

**Q1:** A lot of effort in solving any machine learning and computer vision problem goes into preparing the data. PyTorch provides a simple mechanism to define a custom dataset using `torch.utils.data.Dataset`, which is an abstract class representing a dataset. Your custom dataset should inherit `Dataset` and override the following methods:

- `__init__` so that it initializes the dataset
- `__len__` so that `len(dataset)` returns the size of the dataset.
- `__getitem__` to support indexing, such that `dataset[i]` can be used to get the i-th sample.


Write a class `DataLoaderClassification` that can be used to:
- load the list of image filenames and the corresponding labels into two lists in `__init__`
- load a batch of images and corresponding labels when one calls `__getitem__`
- return the length of the dataset using `__len__`

Write the code in one or more cells.

In [81]:
import os
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset
from torchvision import transforms
import random


In [82]:
class DataLoaderClassification(Dataset):
    def __init__(self, data_dir, file_list_path):
        # Args:
        #    data_dir :: Path to the directory containing image subfolders.
        #    file_list_path :: Path to the text file containing image paths (e.g., train.txt or test.txt).

        self.data_dir = data_dir        
        self.image_paths = []
        self.labels = []
        
        with open(file_list_path, 'r') as f:
            for line in f:
                line = line.strip()
                if line:
                    updated_path = line.replace('EuroSAT/', '')  
                    self.image_paths.append(updated_path)
                    
                    label = os.path.split(updated_path)[0].replace('./Data/', '')  
                    self.labels.append(label)
                    # print(f'Path: {updated_path}, label: {label}')
        
        # Create label to index mapping
        #self.label_mapping = {label: idx for idx, label in enumerate(sorted(set(self.labels)))}
        #print(self.label_mapping)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_full_path = self.image_paths[idx]
        label = self.labels[idx]
        #print(f"Loading image from: {img_full_path}")
        #print(f"Label: {label}")
        
        image = cv2.imread(img_full_path)
        if image is None:
            raise FileNotFoundError(f"Image at path {img_full_path} could not be loaded.")
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert from BGR to RGB
        
        # Convert label to numeric value
        #label = self.label_mapping[label]
        #print(f"Label mapped: {label}")
        
        return image, label

In [87]:
data_dir = 'Data'
train_file_list = 'Data/train.txt'
test_file_list = 'Data/test.txt'

train_dataset = DataLoaderClassification(data_dir=data_dir, file_list_path=train_file_list)
test_dataset  = DataLoaderClassification(data_dir=data_dir, file_list_path=test_file_list)

print(f"Number of items in train_dataset: {len(train_dataset)}")
print(f"Number of items in test_dataset: {len(test_dataset)}")


# test
test_index = random.randint(0, len(train_dataset) - 1)  # avoid hard-coding the dataset size
print(f"Check item {test_index} from train_dataset")
image, label = train_dataset[test_index]
print(f"Image {test_index} -> Shape: {image.shape}, Label: {label}")

Number of items in train_dataset: 21600
Number of items in test_dataset: 5400
Check item 9661 from train_dataset
Image 9661 -> Shape: (64, 64, 3), Label: AnnualCrop
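
The sample above is a NumPy array of shape `(64, 64, 3)` with a string label, whereas `torch.nn.Conv2d` expects float tensors in channels-first layout and `nn.CrossEntropyLoss` expects integer class indices. The following is a minimal sketch of that conversion, assuming 8-bit images scaled to the range [0, 1]. The helper `to_model_input` and the mapping passed to it are hypothetical and not part of the template.

```python
import numpy as np
import torch


def to_model_input(image, label, class_to_idx):
    """Convert an HWC uint8 image and a string label into tensors (illustrative helper)."""
    # HWC uint8 -> CHW float32 scaled to [0, 1]
    tensor = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
    # String label -> integer class index
    target = torch.tensor(class_to_idx[label], dtype=torch.long)
    return tensor, target


# Stand-in for one sample returned by the dataset: a random 64x64 RGB patch.
dummy_image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
x, y = to_model_input(dummy_image, "Forest", {"AnnualCrop": 0, "Forest": 1})
print(x.shape, x.dtype, y.item())
```

The same conversion could equally be done inside `__getitem__`; keeping it separate here leaves the class above unchanged.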


**Q2:** Write the code in one cell that uses the lists of files in `train.txt` and `test.txt` to create a PyTorch dataloader for the training and testing data, respectively.

In [None]:
# Code goes here
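
One possible shape for this step is sketched below. A synthetic `TensorDataset` stands in for the training and testing datasets so that the cell runs on its own. The batch size of 64 and all variable names are assumptions, not requirements of the assignment.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for the real datasets: 128 train and 32 test patches.
train_data = TensorDataset(torch.rand(128, 3, 64, 64), torch.randint(0, 10, (128,)))
test_data = TensorDataset(torch.rand(32, 3, 64, 64), torch.randint(0, 10, (32,)))

# Shuffle only the training data; keep the test order fixed for repeatable evaluation.
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)

# Pull one batch to check the shapes the model will receive.
images, targets = next(iter(train_loader))
print(images.shape, targets.shape)
```

With the real data, the same `DataLoader` calls would wrap the dataset objects built from `train.txt` and `test.txt`, provided each sample is returned in a form the model accepts.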

**Q3:** PyTorch provides elegantly designed modules and classes, including `torch.nn`, to help you create and train neural networks. An `nn.Module` contains layers and a method `forward(input)` that returns the output. Write the `CNN` class to define a Convolutional Neural Network (CNN) where the first convolutional layer (`conv1`) takes 3 input channels, produces 16 output channels and has a kernel size of 5. The output of `conv1` is fed into a ReLU followed by a max-pooling operator. The second convolutional layer in this network (`conv2`) should have 32 filters with a kernel size of 5, followed by a ReLU and a max-pooling operator. The last layer is a fully-connected layer (`fc1`) with 10 output neurons. In this code you should define the `__init__` and `forward` member functions.

More information about `torch.nn` can be found [here](https://pytorch.org/tutorials/recipes/recipes/defining_a_neural_network.html).

In [None]:
# Code goes here
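
The layer description in **Q3** leaves stride, padding and pooling size open. The sketch below assumes stride 1, no padding and 2x2 max-pooling. Under those assumptions a 64x64 input shrinks to 60x60 after `conv1`, 30x30 after pooling, 26x26 after `conv2` and 13x13 after the second pooling, so `fc1` receives 32 * 13 * 13 = 5408 features. Other choices would change that number.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # ASSUMED: stride 1, no padding, 2x2 max-pooling (not stated in the question).
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5)    # 64x64 -> 60x60
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5)   # 30x30 -> 26x26
        self.pool = nn.MaxPool2d(2)                     # halves height and width
        self.fc1 = nn.Linear(32 * 13 * 13, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # (N, 16, 30, 30)
        x = self.pool(F.relu(self.conv2(x)))   # (N, 32, 13, 13)
        x = torch.flatten(x, start_dim=1)      # (N, 5408)
        return self.fc1(x)                     # raw class scores (logits)


model = CNN()
logits = model(torch.rand(4, 3, 64, 64))
print(logits.shape)
```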

**Q4:** Write the code in one or more cells to train the CNN specified in **Q3**. Plot the accuracy against the number of epochs. Save the best-performing model to the file `./Model/Simple-CNN/model.pth` and report the highest accuracy achieved within 100 epochs in a Markdown cell.

In [None]:
# Code goes here
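
A training loop of the kind **Q4** asks for could be sketched as follows. The optimizer (Adam), the learning rate and the use of test accuracy to select the best model are assumptions, since the question does not fix them. The functions `accuracy` and `train` are illustrative names. A tiny stand-in model and synthetic data are used so the sketch runs in seconds, and the weights are written to a temporary directory rather than `./Model/Simple-CNN/model.pth`.

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy(model, loader):
    """Fraction of correctly classified samples in `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, targets in loader:
            correct += (model(images).argmax(dim=1) == targets).sum().item()
            total += targets.numel()
    return correct / total


def train(model, train_loader, test_loader, epochs, save_path, lr=1e-3):
    """Train `model`, save the best weights to `save_path`, return per-epoch accuracy."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # ASSUMED optimizer
    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    history, best = [], -1.0
    for _ in range(epochs):
        model.train()
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        acc = accuracy(model, test_loader)
        history.append(acc)
        if acc > best:  # keep only the best-performing weights
            best = acc
            torch.save(model.state_dict(), save_path)
    return history, best


# Tiny stand-in model and synthetic data, NOT the assignment's CNN or dataset.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
data = TensorDataset(torch.rand(64, 3, 8, 8), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16)
path = os.path.join(tempfile.mkdtemp(), "Simple-CNN", "model.pth")
history, best = train(toy_model, loader, loader, epochs=3, save_path=path)
print(len(history), os.path.exists(path))
```

The returned `history` list holds one accuracy value per epoch, so it can be passed directly to `matplotlib.pyplot.plot` to draw accuracy against the number of epochs.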

**Q5:** Your role as a researcher is to improve the performance of the current neural network. Explain the architecture that provided the best performance and describe the modifications that you think provided the gain.
    

In [None]:
# Code goes here
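
As a starting point for experiments, the sketch below shows one hypothetical variant that adds batch normalisation, a third convolutional block and dropout. It is not a confirmed improvement: whether any of these changes raises accuracy on EuroSAT has to be measured with the training procedure from **Q4**. The class name `ImprovedCNN`, the 3x3 kernels and the dropout rate of 0.5 are all assumptions.

```python
import torch
import torch.nn as nn


class ImprovedCNN(nn.Module):
    """Hypothetical variant: batch normalisation, a third conv block and dropout."""

    def __init__(self, num_classes=10):
        super().__init__()

        def block(c_in, c_out):
            # conv (3x3, padding keeps size) -> batch norm -> ReLU -> 2x2 max-pool
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(block(3, 16), block(16, 32), block(32, 64))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),
            nn.Linear(64 * 8 * 8, num_classes),  # 64x64 input halved three times -> 8x8
        )

    def forward(self, x):
        return self.classifier(self.features(x))


out = ImprovedCNN()(torch.rand(2, 3, 64, 64))
print(out.shape)
```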