# EXERCISE 6.3: Maize and non-maize crop binary classification

---

**Use of Google Earth Engine with US geodata to train a random forest classifier to predict occurrence of a specific crop (maize)**

Land-cover classification in complex landscapes has been constrained by inherent short-distance transitions in crop and vegetation types, especially in smallholder farming systems. The increasing availability and accessibility of Earth observation imagery provides significant opportunities to assess the status of land cover and monitor changes in it, yet unlocking this capability is contingent on the availability of relevant ground truth data to calibrate and validate classification algorithms. The critically needed, spatially explicit ground truth data are often unavailable in sub-Saharan African farming systems, which constrains the development of analytical tools to monitor cropland dynamics or generate (near) real-time insights on farming systems.

This land cover classification was implemented with data collected under a multi-year project (https://tamasa.cimmyt.org/) focused on advancing digital agronomic innovation for decision support in maize-based farming systems. The ground truth data in this analytical workflow are therefore rich in maize farm locations and contain far fewer data points for other crop types within the focal geography. Given this limitation, the scope of this classification tool and tutorial is limited to binary classification of maizelands (i.e. maize vs. non-maize cultivated land) within the period of data collection (i.e. 2017).

## Importing and visualizing data
Import the Nigerian boundary as the focal geography and the maize target region boundary as the area of interest (AOI). Using the code below, you will import a FeatureCollection object and filter it on the `country_na` property to select "Nigeria". FeatureCollections are groups of features (spatial data and attributes), and filter is the method that extracts a specific set of features from a feature collection. Assign the filtered collection to a variable called nigeria, then paint its outline into an image called nigeriaBorder for display. The analyses will be limited to the maize target region in Nigeria, i.e. the region that accounts for ~70% of Nigeria's maize production. Therefore, you will import a predefined shapefile layer (already converted to a GEE asset) and assign it to the variable aoi. Display both layers on the map using Map.addLayer() with customized display parameters.

In [None]:
!pip install geemap -qqq

In [None]:
import ee
import geemap

In [None]:
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)

Mounted at /content/drive/


## Reading imagery from Google Earth Engine (GEE)

For this example we will use Google Earth Engine to select and filter the satellite imagery that we will classify with the training data.

### Setup Your Google Earth Engine Credentials
Upload the `.private-key.json` you created while setting up GEE to the current runtime. Click Files > Upload to Session storage on the left pane in this notebook to upload. <br/>
Replace the service account in the code below with your Google Cloud project service account email. It should be of the format <br/>`<id>@ml4eo-<some_number>.iam.gserviceaccount.com`

In [None]:
service_account = 'ml4eo-24@crete-1556636591709.iam.gserviceaccount.com'
credentials = ee.ServiceAccountCredentials(service_account, '.private-key.json')
ee.Initialize(credentials)

In [None]:
# Import country boundaries feature collection.
dataset = ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017')

# Apply filter where country name equals Nigeria.
nigeria = dataset.filter(ee.Filter.eq('country_na', 'Nigeria'))

# Print the "nigeria" object. In the Python API this prints the serialized
# server-side request; call nigeria.getInfo() to fetch the features and
# properties. There should only be one feature representing Nigeria.
print('Nigeria feature collection:', nigeria)

# Convert the Nigeria boundary feature collection to a line for map display.
nigeriaBorder = (
    ee.Image()
    .byte()
    .paint(featureCollection=nigeria, color=1, width=3)
)

# Set map options and add the Nigeria boundary as a layer to the map.
Map = geemap.Map()
Map.setOptions('SATELLITE')
Map.centerObject(nigeria, 6)
Map.addLayer(nigeriaBorder, None, 'Nigeria border')

# Import the maize target region asset.
aoi = ee.FeatureCollection('projects/earthengine-community/tutorials/classify-maizeland-ng/aoi')

# Display the maize target area boundary to the map.
Map.addLayer(aoi, {'color': 'white', 'strokeWidth': 5}, 'AOI', True, 0.6)


Nigeria feature collection: ee.FeatureCollection({
  "functionInvocationValue": {
    "functionName": "Collection.filter",
    "arguments": {
      "collection": {
        "functionInvocationValue": {
          "functionName": "Collection.loadTable",
          "arguments": {
            "tableId": {
              "constantValue": "USDOS/LSIB_SIMPLE/2017"
            }
          }
        }
      },
      "filter": {
        "functionInvocationValue": {
          "functionName": "Filter.equals",
          "arguments": {
            "leftField": {
              "constantValue": "country_na"
            },
            "rightValue": {
              "constantValue": "Nigeria"
            }
          }
        }
      }
    }
  }
})


In [None]:
Map


We now import ground truth data for georeferenced locations where maize (and other crops) were cultivated during the growing season of 2017 (June - Oct). The data have been pre-processed and randomly split (70:30) into training and validation datasets. Import the training and validation datasets, assigning variable names as "trainingPts" and "validationPts", respectively. Add the points as layers to the map.

In [None]:
# Import ground truth data that are divided into training and validation sets.
trainingPts = ee.FeatureCollection('projects/earthengine-community/tutorials/classify-maizeland-ng/training-pts')
validationPts = ee.FeatureCollection('projects/earthengine-community/tutorials/classify-maizeland-ng/validation-pts')


## Question 6.3.1
Display the two sets of points on the map `Map` as layers named 'Training points' and 'Validation points' with colors green and yellow respectively.

Next, you will import Copernicus Sentinel-2 TOA imagery. The imagery is organized as an ImageCollection object, which is a container for a collection of individual images. With the code snippet below, you will import the Sentinel-2 ImageCollection (the same method can be used to import an ImageCollection for other types of multi-temporal or multi-spectral data, including Landsat, vegetation index, rainfall, temperature, etc.). You will then apply filters to restrict the selected image tiles to the AOI and to the date range of the 2017 growing season (to coincide with the period of data collection). Clouds are masked from each image using the corresponding cloud probability layer. Two functions are provided to achieve cloud masking: one joins the cloud probability layer to the relevant image, and one applies the mask where cloud probability is greater than 50 percent. Finally, a medoid composite is generated from the set of overlapping pixels by selecting the pixel nearest to the multi-dimensional median of overlapping pixels ([Flood, 2013](https://doi.org/10.3390/rs5126481)). The result minimizes contamination from residual clouds and cloud shadows.

In [None]:
# Import S2 TOA reflectance and corresponding cloud probability collections.
s2 = ee.ImageCollection('COPERNICUS/S2')
s2c = ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')

# Define dates over which to create a composite.
start = ee.Date('2017-06-15')
end = ee.Date('2017-10-15')

# Define a collection filtering function.
def filterBoundsDate(imgCol, aoi, start, end):
    return imgCol.filterBounds(aoi).filterDate(start, end)

# Filter the collection by AOI and date.
s2 = filterBoundsDate(s2, aoi, start, end)
s2c = filterBoundsDate(s2c, aoi, start, end)

In [None]:
# Define a function to join the two collections on their 'system:index'
# property. The 'propName' parameter is the name of the property that
# references the joined image.
def indexJoin(colA, colB, propName):
    joined = ee.ImageCollection(
        ee.Join.saveFirst(propName).apply(
            primary=colA,
            secondary=colB,
            condition=ee.Filter.equals(
                leftField='system:index',
                rightField='system:index'
            )
        )
    )
    
    # Merge the bands of the joined image.
    def mergeBands(image):
        return image.addBands(ee.Image(image.get(propName)))
    
    return joined.map(mergeBands)

In [None]:
# Define a function to create a cloud masking function.
def buildMaskFunction(cloudProb):
    def maskImage(img):
        # Define clouds as pixels having greater than the given cloud probability.
        cloud = img.select('probability').gt(ee.Image(cloudProb))

        # Apply the cloud mask to the image and return it.
        return img.updateMask(cloud.Not())
    
    return maskImage

In [None]:
# Join the cloud probability collection to the TOA reflectance collection.
withCloudProbability = indexJoin(s2, s2c, 'cloud_probability')

# Map the cloud masking function over the joined collection, select only the
# reflectance bands.
maskClouds = buildMaskFunction(50)
s2Masked = (
    ee.ImageCollection(withCloudProbability.map(maskClouds))
    .select(ee.List.sequence(0, 12))
)

# Calculate the median of overlapping pixels per band.
median = s2Masked.median()

## Question 6.3.2
Complete the following function so that it can compute the difference between each image and the median. Ensure that the `system:time_start` property is copied from the original image to the difference image before returning the difference image.


Use the function you created to compute the difference.

In [None]:
difFromMedian = s2Masked.map(calculateDifference)

Generate a composite image by selecting the pixel that is closest to the median.

In [None]:
# Get the band names from the first image in difFromMedian
bandNames = difFromMedian.first().bandNames()

# Create a list of band positions
bandPositions = ee.List.sequence(1, bandNames.length().subtract(1))

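# Reduce the difFromMedian collection using the min reducer. With more than
# one input, the reducer keeps all band values from the image whose first band
# is smallest, so band 0 is expected to hold the distance from the median.
mosaic = (
    difFromMedian.reduce(ee.Reducer.min(bandNames.length()))
    .select(bandPositions, bandNames.slice(1))
    .clipToCollection(aoi)
)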

# Display the mosaic.
Map.addLayer(
    mosaic, {'bands': ['B11', 'B8', 'B3'], 'min': 225, 'max': 4000}, 'S2 mosaic')
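
The medoid selection performed by the min reducer can be illustrated outside Earth Engine. The following is a minimal pure-Python sketch for a single pixel location, not the tutorial's implementation: the function name `medoid`, the sample reflectance values, and the use of the summed squared difference from the per-band median as the distance are all assumptions made for illustration.

```python
from statistics import median


def medoid(observations):
    """Return the unmasked observation closest to the per-band median.

    Each observation is a list of band values from one image date;
    None marks a date that was masked (e.g. as cloud) at this pixel.
    """
    valid = [obs for obs in observations if obs is not None]
    if not valid:
        return None  # every date was masked at this pixel

    # Per-band median across all valid dates.
    band_medians = [median(band) for band in zip(*valid)]

    # Assumed distance: sum of squared differences from the band medians.
    def distance(obs):
        return sum((v - m) ** 2 for v, m in zip(obs, band_medians))

    # The result is a real observation, so its band values come from
    # the same date rather than being mixed across dates.
    return min(valid, key=distance)


# Hypothetical values: three bands observed on four dates.
pixel_series = [
    [1200, 900, 2500],
    None,                # masked: cloud probability above the threshold
    [1250, 950, 2600],
    [4000, 3900, 4100],  # bright residual cloud that escaped the mask
]
print(medoid(pixel_series))
```

Unlike a plain per-band median, the selected values all come from one acquisition date, which is why the bright residual-cloud observation in the example cannot leak into the composite.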

In [None]:
Map


## Analytics
Now that you have prepared the mosaic, proceed to select the spectral bands that are relevant for the classification. By selecting more bands, the analysis will become more computationally intensive. The bands have differing spatial resolution (https://en.wikipedia.org/wiki/Sentinel-2), but ultimately, the scale of analysis is determined by the argument provided to the scale parameter in sampling and reduction steps. In the code below, all reflectance bands of the S2 data are selected, but you can adjust this by selecting fewer bands. Note that our goal is to utilize as much spectral information as possible to train the classifier algorithm to differentiate between maize and non-maize. The training points (trainingPts) will be used to extract the reflectance values of the pixels from all spectral bands and this will be passed to the classifier algorithms.

In [None]:
# Specify and select bands that will be used in the classification.
bands = [
    'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B8A', 'B9', 'B10', 'B11',
    'B12'
]
imageCl = mosaic.select(bands)

# Overlay the training points on the imagery to get a training sample; include
# the crop classification property ('class') in the sample feature collection.
training = imageCl.sampleRegions(
    collection=trainingPts,
    properties=['class'],
    scale=30,
    tileScale=8
).filter(ee.Filter.neq('B1', None))  # Remove null pixels.

For the binary classification you will apply two classifiers: classification and regression trees (CART) and random forest (RF). Both are suitable for categorical classification and have been used in various contexts. Comparing the outputs from CART and RF lets users judge objectively which classifier is more accurate. Default parameters are accepted, except that the RF classifier is given 10 trees because the number of trees must be specified (tuning such as optimizing the number of trees in RF is outside the scope of this tutorial). The output images contain the values 0 (maize, shown as orange on the map) and 1 (non-maize, shown as grey on the map). Model accuracy metrics are printed to the console.

In [None]:
# Train a CART classifier with default parameters.
trainedCart = ee.Classifier.smileCart().train(
    features=training,
    classProperty='class',
    inputProperties=bands
)

# Train a random forest classifier with 10 trees (numberOfTrees is a required
# argument); all other parameters keep their defaults.
trainedRf = ee.Classifier.smileRandomForest(numberOfTrees=10).train(
    features=training,
    classProperty='class',
    inputProperties=bands
)

# Classify the image with the same bands used for training.
classifiedCart = imageCl.select(bands).classify(trainedCart)
classifiedRf = imageCl.select(bands).classify(trainedRf)

# Define visualization parameters for classification display.
classVis = {'min': 0, 'max': 1, 'palette': ['f2c649', '484848']}

# Add the output of the training classification to the map.
Map.addLayer(classifiedCart.clipToCollection(aoi), classVis, 'Classes (CART)')
Map.addLayer(classifiedRf.clipToCollection(aoi), classVis, 'Classes (RF)')

# Calculate the training error matrix and accuracy for both classifiers by
# using the "confusionMatrix" function to generate metrics on the
# resubstitution accuracy.
trainAccuracyCart = trainedCart.confusionMatrix()
trainAccuracyRf = trainedRf.confusionMatrix()

# Print model accuracy results.
print('##### TRAINING ACCURACY #####')
print('CART: overall accuracy:', trainAccuracyCart.accuracy().getInfo())
print('RF: overall accuracy:', trainAccuracyRf.accuracy().getInfo())
print('CART: error matrix:', trainAccuracyCart.getInfo())
print('RF: error matrix:', trainAccuracyRf.getInfo())


##### TRAINING ACCURACY #####
CART: overall accuracy: 0.9933530280649926
RF: overall accuracy: 0.9896602658788775
CART: error matrix: [[936, 0], [9, 409]]
RF: error matrix: [[935, 1], [13, 405]]


To assess the reliability of the classification outputs, use the validationPts dataset (imported previously) to extract spectral information from the mosaic image bands. You will further apply ee.Filter.neq on the "B1" band to remove pixels with null value, and predict the classified values for the validationPts pixels based on the trained models. Note that accuracy assessment is conducted for each classifier.

## Question 6.3.3
Write code that will sample points from `validationPts`, run the RF and CART models on the sampled points, and report the accuracy of each classifier.

##### VALIDATION ACCURACY #####
CART: overall accuracy: 0.6298245614035087
RF: overall accuracy: 0.756140350877193
CART: error matrix: [[295, 118], [93, 64]]
RF: error matrix: [[365, 48], [91, 66]]
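
The overall accuracies reported above can be reproduced by hand from the error matrices, which also expose per-class behaviour that a single number hides. The following is a minimal pure-Python sketch, not part of the Earth Engine workflow. It assumes that rows are actual classes and columns are predicted classes, with class 0 = maize and class 1 = non-maize; the helper name `accuracy_metrics` is hypothetical.

```python
def accuracy_metrics(matrix):
    """Summarise a square error matrix.

    Assumes rows are actual classes and columns are predicted classes.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n_classes))

    # Producer's accuracy (recall): correct count / row total.
    producers = [matrix[i][i] / sum(matrix[i]) for i in range(n_classes)]

    # Consumer's (user's) accuracy (precision): correct count / column total.
    consumers = [
        matrix[i][i] / sum(row[i] for row in matrix) for i in range(n_classes)
    ]

    return {
        'overall': correct / total,
        'producers': producers,
        'consumers': consumers,
    }


# Validation error matrices copied from the output above.
cart_validation = [[295, 118], [93, 64]]
rf_validation = [[365, 48], [91, 66]]

for name, matrix in [('CART', cart_validation), ('RF', rf_validation)]:
    metrics = accuracy_metrics(matrix)
    print(name, 'overall accuracy:', round(metrics['overall'], 4))
    print(name, "producer's accuracy per class:",
          [round(p, 4) for p in metrics['producers']])
```

Under these assumptions, both classifiers recover the maize points far better than the non-maize points (producer's accuracy of roughly 0.71 and 0.88 for maize versus 0.41 and 0.42 for non-maize, for CART and RF respectively), so the overall accuracy is driven mostly by the majority maize class.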


# Bonus Task
1. Download the GEE data to your local storage (e.g. Colab Session storage), then train a random forest and a classification tree on the dataset using `sklearn`. Run a hyperparameter search on the dataset and report the best results.