<a href="https://colab.research.google.com/github/Brandon-7-Sharp/Spectral-Imagery-Field-Binary-Classification/blob/main/Raster_Field_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Spectral Imagery Field Binary Classification

  This program takes in spectral imagery of corn fields and soybean fields and performs a binary classification on the data.

## Specifications:
  * Conducts a binary classification
  * Linearly transforms the data
  * Uses SGD for optimization

## Input:
  * Images are of corn and soybean fields
  * 1140 .tif images of SENTINEL2 spectral imagery (Bands: Green, Red, and Near Infrared)
  * Band Order: NIR, Red, Green
  * 610 are images of corn fields
  * 530 are images of soybean fields


## Analysis:
  * Splits the data 80/20 for training and testing
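
As a rough illustration of the 80/20 split, the sketch below holds out the first 20% of each class for testing. It assumes the images are ordered corn first (610) and soybean second (530), as listed under Input. The helper name `split_indices` is hypothetical and not part of the notebook.

```python
def split_indices(class_sizes, test_fraction=0.2):
    """Return (train_idx, test_idx), holding out the first part of each class."""
    train_idx, test_idx = [], []
    start = 0
    for size in class_sizes:
        n_test = round(size * test_fraction)  # number of held-out images in this class
        test_idx.extend(range(start, start + n_test))
        train_idx.extend(range(start + n_test, start + size))
        start += size
    return train_idx, test_idx


# Class sizes taken from the Input section: 610 corn, 530 soybean
train_idx, test_idx = split_indices([610, 530])
print(len(train_idx), len(test_idx))
```

Splitting each class separately keeps the corn/soybean proportions the same in both subsets.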

## Results:
  * Obtains roughly 92% accuracy in determining the type of field

# Data Used:

  * The data used was SENTINEL2 imagery (Bands: Green, Red, and Near Infrared) taken on July 13th, 2023
  * I created the individual images by ..................


## Other Strategies:

  1. Used only the Red and NIR bands to create a dataset that has the simple vegetation index, which resulted in roughly 91% accuracy

  2. Used the Adam optimizer, which resulted in roughly 88% accuracy
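
A minimal sketch of the band-ratio idea in item 1. It assumes a tensor in the band order listed under Input (NIR, Red, Green) and the conventional definition of the simple ratio as NIR / Red. The function name `simple_ratio` and the sample pixel values are hypothetical.

```python
import torch


def simple_ratio(image: torch.Tensor) -> torch.Tensor:
    """Compute NIR / Red for a (3, H, W) tensor ordered NIR, Red, Green."""
    nir, red = image[0], image[1]
    ratio = nir / red
    # 0/0 gives NaN and x/0 gives inf; map both to 0 so empty pixels stay empty
    return torch.nan_to_num(ratio, nan=0.0, posinf=0.0, neginf=0.0)


# Hypothetical 2 x 2 image in which one pixel has no data (all zeros)
image = torch.tensor([
    [[0.40, 0.50], [0.00, 0.30]],  # NIR
    [[0.10, 0.10], [0.00, 0.06]],  # Red
    [[0.08, 0.09], [0.00, 0.05]],  # Green
])
print(simple_ratio(image))
```

The result is a single-band (H, W) tensor, so a 32 x 32 image flattens to 1024 values.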

In [None]:
# The data was created by taking the spectral imagery (SENTINEL2 satellite imagery) of corn and soybean fields in Sangamon County, Illinois, and creating 1140 independent images of those fields based on the Illinois field boundary polygon feature layer (located here:    )
#   Images 1 - 610:       Exclusively corn field imagery
#   Images 611 - 1140:    Exclusively soy-bean field imagery

# Installs and Imports

We first need to install torchgeo, which provides several tools for manipulating raster images.

After torchgeo is installed, you will need to restart the session to use it.

More info about torchgeo can be found on their website at https://www.osgeo.org/projects/torchgeo/


In [None]:
%pip install torchgeo

Collecting torchgeo
  Downloading torchgeo-0.5.2-py3-none-any.whl.metadata (20 kB)
Collecting kornia>=0.6.9 (from torchgeo)
  Downloading kornia-0.7.3-py2.py3-none-any.whl.metadata (7.7 kB)
Collecting lightly!=1.4.26,>=1.4.4 (from torchgeo)
  Downloading lightly-1.5.12-py3-none-any.whl.metadata (37 kB)
Collecting lightning>=2 (from lightning[pytorch-extra]>=2->torchgeo)
  Downloading lightning-2.4.0-py3-none-any.whl.metadata (38 kB)
Collecting rasterio>=1.2 (from torchgeo)
  Downloading rasterio-1.3.10-cp310-cp310-manylinux2014_x86_64.whl.metadata (14 kB)
Collecting rtree>=1 (from torchgeo)
  Downloading Rtree-1.3.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.1 kB)
Collecting segmentation-models-pytorch>=0.2 (from torchgeo)
  Downloading segmentation_models_pytorch-0.3.4-py3-none-any.whl.metadata (30 kB)
Collecting timm>=0.4.12 (from torchgeo)
  Downloading timm-1.0.9-py3-none-any.whl.metadata (42 kB)

In [None]:
import os
import rasterio
import torch
import torchvision
from PIL import Image
from torchvision import transforms

In [None]:
# from google.colab import drive
# drive.mount('/drive')

In [None]:
# Import torch and set the manual seed to remove randomness
import torch

torch.manual_seed(42)
device = 'cpu'

# Resizing Images

* The data used here is spectral imagery (3 Bands: Green, Red, and Near Infrared) stored in .tif files of varying sizes.
* To create a dataloader and feed this spectral imagery into a model for training and testing, the images need to have a uniform size.
* First, I convert the imagery into a tensor so that it can be manipulated.
* I resized the images to 32 x 32 resolution since the majority of images were slightly larger than that (I also tried 48 x 48, but the results were slightly less accurate).
* Lastly, I store all of these tensors in a list.

In [None]:
import os
import rasterio
import torch
import torchvision
from PIL import Image
from torchvision import transforms

# Function to resize images and collect them in a list
def resize_images(source_folder, size=(32, 32)):
    if not os.path.exists(source_folder):
        raise FileNotFoundError(f"Source folder not found: {source_folder}")

    arrays = []

    # Note: os.listdir returns files in arbitrary order, while the labels
    # assigned later assume that the corn images come first
    for filename in os.listdir(source_folder):
        if filename.endswith(".tif"):  # You can add more extensions if needed
            img_path = os.path.join(source_folder, filename)
            with rasterio.open(img_path) as src:
              data = src.read()  # NumPy array of shape (bands, height, width)

              print(type(data))
              print(data.shape)
              data = torch.Tensor(data)
              # Resize to a uniform resolution with bilinear interpolation
              data = torchvision.transforms.functional.resize(
                  data, size, interpolation=transforms.InterpolationMode.BILINEAR)
              print(data.shape)
              arrays.append(data)

    return arrays


# Define your source folder where the raw spectral imagery is (Folder provided on GitHub: https://github.com/Brandon-7-Sharp/Spectral-Imagery-Field-Binary-Classification)
source_folder = "/content/drive/MyDrive/Colab Notebooks/Raster_Field_Data"
# target_folder = "/content/drive/MyDrive/Colab Notebooks/Raster_Field_Data"

# Call the function to resize the images and collect them in a list
data = resize_images(source_folder)
print(len(data))

<class 'numpy.ndarray'>
(3, 61, 82)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 30, 42)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 34, 41)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 26, 59)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 30, 42)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 26, 42)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 28, 81)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 28, 41)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 26, 42)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 31, 46)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 31, 45)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 31, 47)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 28, 41)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 59, 121)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 35, 63)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 30, 40)
torch.Size([3, 32, 32])
<class 'numpy.ndarray'>
(3, 31, 59)
tor

#### This was the image-resizing function that used the Red and NIR bands to create arrays of the Simple Vegetation Index

In [None]:
# import os
# import rasterio
# # import torchgeo
# import torch
# import torchvision
# from PIL import Image
# from torchvision import transforms

# Function to resize images and convert them to a band-ratio index
def resize_images(source_folder, size=(32, 32)):
    if not os.path.exists(source_folder):
        raise FileNotFoundError(f"Source folder not found: {source_folder}")

    arrays_1 = []

    for filename in os.listdir(source_folder):
        if filename.endswith(".tif"):  # You can add more extensions if needed
            img_path = os.path.join(source_folder, filename)
            with rasterio.open(img_path) as src:
              data = src.read()  # NumPy array of shape (bands, height, width)

              print(type(data))
              print(data.shape)
              data = torch.Tensor(data)
              # Resize to a uniform resolution with bilinear interpolation
              data = torchvision.transforms.functional.resize(
                  data, size, interpolation=transforms.InterpolationMode.BILINEAR)
              data = data[:2] # Removes the Green Band (band order: NIR, Red, Green)

              # Band ratio: data[0] is NIR and data[1] is Red, so this computes Red / NIR,
              # the reciprocal of the conventional simple ratio (NIR / Red)
              # This results in a (32, 32) sized Tensor
              data = data[1] / data[0]

              # Replace NaN values (from 0 / 0 in empty pixels) with 0
              data = torch.nan_to_num(data, nan=0.0)

              print(data.shape)
              print(data)
              arrays_1.append(data)

    return arrays_1


# Define your source and target folders
source_folder = f"/content/drive/MyDrive/Colab Notebooks/Raster_Field_Data"
target_folder = f"/content/drive/MyDrive/Colab Notebooks/Raster_Field_Data"

# Call the function to resize the images and collect the index tensors in a list
data = resize_images(source_folder)
print(len(data))

Streaming output truncated to the last 5000 lines.
(3, 31, 43)
torch.Size([32, 32])
tensor([[0.2484, 0.1904, 0.1749,  ..., 0.2678, 0.3390, 0.0000],
        [0.1753, 0.1343, 0.1241,  ..., 0.2056, 0.2851, 0.0000],
        [0.1707, 0.1435, 0.1388,  ..., 0.2153, 0.3075, 0.3650],
        ...,
        [0.0000, 0.1293, 0.1274,  ..., 0.2454, 0.3302, 0.0000],
        [0.0000, 0.1329, 0.1389,  ..., 0.2355, 0.3158, 0.0000],
        [0.0000, 0.0000, 0.2760,  ..., 0.0000, 0.0000, 0.0000]])
<class 'numpy.ndarray'>
(3, 31, 41)
torch.Size([32, 32])
tensor([[0.3401, 0.2625, 0.1670,  ..., 0.1455, 0.1857, 0.2196],
        [0.1682, 0.1587, 0.1354,  ..., 0.1592, 0.1825, 0.2022],
        [0.1289, 0.1286, 0.1232,  ..., 0.1778, 0.1946, 0.2090],
        ...,
        [0.1193, 0.1259, 0.1360,  ..., 0.1479, 0.1479, 0.0000],
        [0.1473, 0.1370, 0.1295,  ..., 0.1895, 0.1895, 0.0000],
        [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]])
<class 'numpy.ndarray'>
(3, 31, 34)
torch.Size([3

## Dataset Class: RasterImageryDataset

* Defines a method that returns the length of the dataset
* Defines a method that returns the item at a specific index of the dataset along with its label

In [None]:
from torch.utils.data import Dataset


# Custom Dataset class that takes in an array of tensors (of spectral imagery) and their labels (corn field or soy-bean field)

class RasterImageryDataset(Dataset):
  def __init__(self, data, labels):
    self.labels = labels
    self.data = data

  def __len__(self):
    return len(self.data)

  def __getitem__(self, idx):
    return(self.data[idx], self.labels[idx])

# Labels

* Creates a labels array for the 1140 images
  * Images 1 - 610:       Exclusively corn field imagery
  * Images 611 - 1140:    Exclusively soy-bean field imagery

In [None]:
import numpy as np
import random


# Creates a labels array
#   Of the 1140 images
#     Images 1 - 610:       Exclusively corn field imagery (label 0)
#     Images 611 - 1140:    Exclusively soy-bean field imagery (label 1)
corn_labels = np.zeros(610)
soybean_labels = np.ones(530)
labels = np.append(corn_labels, soybean_labels)

test_data = []
train_data = []
test_labels = []
train_labels = []

for i, value in enumerate(data):
  if i < 610:
    if i < 122:
      test_data.append(data[i])
      test_labels.append(0)
    else:
      train_data.append(data[i])
      train_labels.append(0)
  else:
    if i < 716:
      test_data.append(data[i])
      test_labels.append(1)
    else:
      train_data.append(data[i])
      train_labels.append(1)

# print(len(test_data))
# print(len(train_data))
# print(len(test_labels))
# print(len(train_labels))

# Creates a dataset variable with the images and their labels
dataset = RasterImageryDataset(data=data, labels=labels)

# Datasets and Dataloaders

In [None]:
# Creates two separate datasets with no overlapping data (one for training and another for testing, 80/20 split)
train_dataset = RasterImageryDataset(train_data, train_labels)
test_dataset = RasterImageryDataset(test_data, test_labels)

In [None]:
from torch.utils.data import DataLoader

# Setup the batch size hyperparameter
BATCH_SIZE = 32

# Turn datasets into iterables (batches)
train_dataloader = DataLoader(dataset=train_dataset,
                              batch_size=BATCH_SIZE,
                              shuffle=True)

test_dataloader = DataLoader(dataset=test_dataset,
                             batch_size=BATCH_SIZE,
                             shuffle=False)

train_dataloader, test_dataloader

(<torch.utils.data.dataloader.DataLoader at 0x7969d1bd2b00>,
 <torch.utils.data.dataloader.DataLoader at 0x7969d1bd32e0>)

# Model Class: Sentinel2ModelV0

In [None]:
from torch import nn

# Create a model class that takes the number of input features, the number of hidden units, and the number of outputs
class Sentinel2ModelV0(nn.Module):
  def __init__(self,
               input_shape: int,
               hidden_units: int,
               output_shape: int):
    super().__init__()
    self.layer_stack = nn.Sequential(
        nn.Flatten(),
        nn.Linear(in_features=input_shape,
                  out_features=hidden_units),
        nn.ReLU(),  # Non-Linear Function
        nn.Linear(in_features=hidden_units,
                  out_features=output_shape),
        nn.ReLU()  # Non-Linear Function (note: this clamps the output logits to be non-negative)
    )

  def forward(self, x):
    return self.layer_stack(x)

In [None]:
torch.manual_seed(42)

# Setup model with input parameters
model_0 = Sentinel2ModelV0(
    input_shape=1024, # This is 32 x 32 (pixel width and height of a single-band resized raster image)
    hidden_units=2, # How many units in the hidden layer
    output_shape=81 # Number of output logits (only two classes, corn and soybean, are needed)

).to(device)

model_0

Sentinel2ModelV0(
  (layer_stack): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=1024, out_features=2, bias=True)
    (2): ReLU()
    (3): Linear(in_features=2, out_features=81, bias=True)
    (4): ReLU()
  )
)
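
The `input_shape` above has to equal the number of values per image after `nn.Flatten()`. The sketch below uses hypothetical random tensors in place of the real imagery to check this for a single-band 32 x 32 input and for a three-band input.

```python
import torch
from torch import nn

flatten = nn.Flatten()

single_band = torch.rand(4, 32, 32)     # batch of 4 single-band images, shape (N, H, W)
three_band = torch.rand(4, 3, 32, 32)   # batch of 4 three-band images, shape (N, C, H, W)

print(flatten(single_band).shape)  # 32 * 32 = 1024 features per image
print(flatten(three_band).shape)   # 3 * 32 * 32 = 3072 features per image

# A Linear layer sized for 1024 features accepts the single-band batch
layer = nn.Linear(in_features=1024, out_features=2)
print(layer(flatten(single_band)).shape)
```

Under these assumptions, feeding the three-band tensors to this model would require `input_shape=3072`.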

In [None]:
import requests
from pathlib import Path

# Download helper functions from Learn PyTorch repo
if Path("helper_functions.py").is_file():
  print("helper_functions.py already exists, skipping download...")
else:
  print("Downloading helper_functions.py")
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py", "wb") as f:
    f.write(request.content)

Downloading helper_functions.py


In [None]:
# Import accuracy metric
from helper_functions import accuracy_fn

# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_0.parameters(),
                            lr=0.01)
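
For intuition about the loss values that `nn.CrossEntropyLoss` reports, the sketch below computes the cross-entropy of a single sample from raw logits in plain Python. When all logits are equal, the loss equals the natural log of the number of outputs, so a model with 81 outputs starts near ln(81) ≈ 4.39, while a model with 2 outputs would start near ln(2) ≈ 0.69. The function name `cross_entropy` is hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


print(cross_entropy([0.0] * 81, target=0))  # equals math.log(81)
print(cross_entropy([0.0] * 2, target=0))   # equals math.log(2)
```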

# Training and Testing

In [None]:
# Import tqdm for progress bar
from timeit import default_timer as timer
from tqdm.auto import tqdm

# Set the start timer
train_time_start_on_cpu = timer()

# Set the number of epochs (kept small for faster training time)
epochs = 100

# Create training and test loop
for epoch in tqdm(range(epochs)):
  print(f"Epoch: {epoch}\n------")
  ### Training
  train_loss = 0
  # Loop through the training batches
  for batch, (X, y) in enumerate(train_dataloader): # -> (image, label)
    model_0.train()
    # 1. Forward Pass
    y_pred = model_0(X)
    # Labels are already a tensor after batching; cast to long without re-wrapping
    y = y.to(device=device, dtype=torch.long)

    # 2. Calculate Loss (Per Batch)
    loss = loss_fn(y_pred, y)
    train_loss += loss.item() # Accumulate train loss as a plain float

    # 3. Optimizer Zero Grad
    optimizer.zero_grad()

    # 4. Loss Backward
    loss.backward()

    # 5. Optimizer Step
    optimizer.step()

  # Divide total train loss by length of train dataloader
  train_loss /= len(train_dataloader)

  ### Testing
  test_loss, test_acc = 0, 0
  model_0.eval()
  with torch.inference_mode():
    for X_test, y_test in test_dataloader:
      # 1. Forward Pass
      test_pred = model_0(X_test)
      y_test = y_test.to(device=device, dtype=torch.long)

      # 2. Calculate Loss (accumulatively)
      test_loss += loss_fn(test_pred, y_test)

      # 3. Calculate Accuracy
      test_acc += accuracy_fn(y_true=y_test, y_pred=test_pred.argmax(dim=1))

    # Calculate the test loss average per batch
    test_loss /= len(test_dataloader)

    # Calculate the test accuracy average per batch
    test_acc /= len(test_dataloader)

  # Print out what is happening
  print(f"\nTrain loss: {train_loss:.4f} | Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.4f}")

  0%|          | 0/100 [00:00<?, ?it/s]

Epoch: 0
------


Train loss: 4.1829 | Test Loss: 4.0174, Test Accuracy: 52.3438
Epoch: 1
------

Train loss: 3.9775 | Test Loss: 3.8333, Test Accuracy: 52.3438
Epoch: 2
------

Train loss: 3.8344 | Test Loss: 3.6925, Test Accuracy: 52.3438
Epoch: 3
------

Train loss: 3.7228 | Test Loss: 3.5611, Test Accuracy: 52.3438
Epoch: 4
------

Train loss: 3.6179 | Test Loss: 3.4418, Test Accuracy: 52.3438
Epoch: 5
------

Train loss: 3.5145 | Test Loss: 3.3300, Test Accuracy: 52.3438
Epoch: 6
------

Train loss: 3.4275 | Test Loss: 3.2307, Test Accuracy: 52.3438
Epoch: 7
------

Train loss: 3.3269 | Test Loss: 3.1412, Test Accuracy: 52.3438
Epoch: 8
------

Train loss: 3.2701 | Test Loss: 3.0630, Test Accuracy: 52.3438
Epoch: 9
------

Train loss: 3.2087 | Test Loss: 2.9960, Test Accuracy: 52.3438
Epoch: 10
------

Train loss: 3.1458 | Test Loss: 2.9451, Test Accuracy: 52.3438
Epoch: 11
------

Train loss: 3.1196 | Test Loss: 2.9023, Test Accuracy: 52.3438
Epoch: 12
------

Train loss: 3.0995 | Test Loss: 2.87

In [None]:
torch.manual_seed(42)
def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn):
  """Returns a dictionary containing the results of model predicting on data_loader."""
  loss, acc = 0, 0
  model.eval()
  with torch.inference_mode():
    for X, y in data_loader:
      # Make Predictions
      y_pred = model(X)

      # Accumulate the loss and acc values per batch
      loss += loss_fn(y_pred, y)
      acc += accuracy_fn(y_true=y,
                         y_pred=y_pred.argmax(dim=1))

    # Scale loss and acc to find the average loss/acc per batch
    loss /= len(data_loader)
    acc /= len(data_loader)

  return {"model_name": model.__class__.__name__, # Only works when model was created with a unique name
          "model_loss": loss.item(),
          "model_acc": acc}

# Calculate model results on test dataset
model_0_results = eval_model(model=model_0,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn)
model_0_results

{'model_name': 'Sentinel2ModelV0',
 'model_loss': 2.7878549098968506,
 'model_acc': 52.34375}
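
Note that `eval_model` averages the per-batch accuracies, which weights a small final batch the same as a full one. The sketch below illustrates the difference with a hypothetical case: a test set of 122 corn images followed by 106 soybean images, served unshuffled in batches of 32, and a classifier that predicts soybean for every image.

```python
BATCH_SIZE = 32
labels = [0] * 122 + [1] * 106   # 0 = corn, 1 = soybean, unshuffled
preds = [1] * len(labels)        # hypothetical: always predict soybean

batch_accs = []
correct_total = 0
for start in range(0, len(labels), BATCH_SIZE):
    y = labels[start:start + BATCH_SIZE]
    p = preds[start:start + BATCH_SIZE]
    correct = sum(int(a == b) for a, b in zip(y, p))
    batch_accs.append(100 * correct / len(y))  # accuracy of this batch in percent
    correct_total += correct

mean_of_batches = sum(batch_accs) / len(batch_accs)
sample_weighted = 100 * correct_total / len(labels)
print(f"Mean of per-batch accuracies: {mean_of_batches:.2f}%")
print(f"Sample-weighted accuracy:     {sample_weighted:.2f}%")
```

Dividing the total number of correct predictions by the total number of samples avoids this discrepancy.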