# Image classification

This notebook classifies regions within images of northern Scotland, based on our training data.

+ We will start by loading the classification data: [lat,lon] coordinates, each with a classification.
+ We will also load the GEE image, extract the appropriate bands, and create new ones (NDWI, etc.) where appropriate.

+ We will split the classification data into a training set and a validation set.
+ We will then train a GEE classifier on the training data and verify its performance on the validation set.
+ Finally, we will classify the rest of the image tiles to determine the burnt regions of the image, etc.
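
As an illustration of the derived bands mentioned above, NDWI is a normalised difference of two reflectance bands. The sketch below is plain Python, not the notebook's GEE code, and it assumes the green/near-infrared (McFeeters) definition; the function name `ndwi` and the sample reflectances are hypothetical. On an `ee.Image` the same ratio would normally be computed with `normalizedDifference`, using whichever band names the chosen sensor provides.

```python
import math


def ndwi(green, nir):
    """Return (green - nir) / (green + nir), or NaN when both bands are zero."""
    total = green + nir
    if total == 0:
        return math.nan  # the index is undefined when there is no signal
    return (green - nir) / total


# Hypothetical surface reflectances for a single pixel.
print(ndwi(0.75, 0.25))  # water-like pixel: positive NDWI
print(ndwi(0.25, 0.75))  # vegetation-like pixel: negative NDWI
```

Values lie between -1 and 1, with open water tending towards the positive end under this definition.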

In [1]:
# ee initialisation
import ee
ee.Initialize()

In [11]:
# initial imports and parameter initialisation
import pandas as pd
import numpy as np
import xarray as xr
import re

import folium
import geehydro


# initial parameters for the program to run
rng = np.random.default_rng(87347234) # seed the random number generator for reproducibility.
classification_file = '../data/classification/master_classification.csv' # file from which the classification data is to be loaded.
training_validation_ratio = 0.2 # fraction of the classification data used for training (the rest is used for validation). Keep low to avoid overfitting.

### Load in the classification data

The file was created using the produce_classification.py script.

The land classifications are:

| Classification    | Value |
|-------------------|-------|
| Peatland          | 1     |
| Burnt peatland    | 2     |
| Cleared land      | 3     |
| Agricultural land | 4     |
| Plantation        | 5     |

In [16]:
########## STEP 1: Load in the classification data
classification_data = pd.read_csv(classification_file)

coords = classification_data['coords']
classes = classification_data['classification']

# The classification data needs to be converted into an ee.FeatureCollection object.
# I will start by creating a list of ee.Feature objects.
# These will then be shuffled and split to create the training and validation datasets.
# The two lists can then be converted into ee.FeatureCollection objects.

ee_Features = []

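# generate the list of ee.Feature objects
for point in zip(coords,classes):
    # convert the string coordinate of the csv to floats
    coord = point[0].split(',')
    lon = float(coord[0][1:])
    lat = float(coord[1][:-1])
    # use a plain Python list: passing a NumPy array to ee.Geometry.Point appears to be
    # what raises "ValueError: The truth value of an array ... is ambiguous"
    coord = [lon, lat]

    # cast the class to a built-in int (pandas yields NumPy integers, which may not serialise)
    feat = ee.Feature(ee.Geometry.Point(coord), {'landcover': int(point[1])})
    ee_Features.append(feat)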

# randomly shuffle and then split the ee.Features list
rng.shuffle(ee_Features)
split_index = np.round( len(ee_Features) * training_validation_ratio ).astype(int)

ee_Features_training = ee_Features[:split_index]
ee_Features_validation = ee_Features[split_index:]

# create the feature collections for the training and validation data
ee_training = ee.FeatureCollection(ee_Features_training)
ee_validation = ee.FeatureCollection(ee_Features_validation)

print(ee_training.getInfo())

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
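
The parsing and splitting logic of the cell above can be checked without Earth Engine. The sketch below assumes each `coords` entry is a string such as `'[-4.21, 57.93]'`, and it swaps the NumPy generator for the standard-library `random` module so that it has no dependencies. The sample rows and the helper names `parse_coords` and `shuffle_and_split` are hypothetical.

```python
import random


def parse_coords(text):
    """Parse a string such as '[-4.21, 57.93]' into a tuple of two floats."""
    first, second = text.strip().strip('[]()').split(',')
    return float(first), float(second)


def shuffle_and_split(items, training_fraction, seed):
    """Shuffle a copy of items with a fixed seed and return (training, validation)."""
    shuffled = list(items)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    split_index = round(len(shuffled) * training_fraction)
    return shuffled[:split_index], shuffled[split_index:]


# Hypothetical rows standing in for the 'coords' and 'classification' columns.
rows = [('[-4.21, 57.93]', 1), ('[-3.98, 58.10]', 2), ('[-4.50, 58.25]', 3),
        ('[-3.75, 57.80]', 4), ('[-4.05, 58.40]', 5)]

# Plain Python tuples and ints are the kind of values ee.Geometry.Point and ee.Feature accept.
points = [(parse_coords(c), int(k)) for c, k in rows]
training, validation = shuffle_and_split(points, 0.2, seed=87347234)
print(len(training), len(validation))  # 1 4
```

With a training fraction of 0.2, one of the five sample points goes to training and four to validation, which mirrors how `training_validation_ratio` is used in the notebook.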