<table class="ee-notebook-buttons" align="left">
    <td><a target="_blank"  href="https://github.com/davidelomeo/mangroves_deep_learning/blob/main/Notebook_1-Generate_Patches.ipynb"><img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" /> View source on GitHub</a></td>
    <td><a target="_blank"  href="https://colab.research.google.com/github/davidelomeo/mangroves_deep_learning/blob/main/Notebook_1-Generate_Patches.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" /> Run in Google Colab</a></td>
</table>

# **Requirements**
The requirements to run this notebook are:
1. Have a Gmail account **->** here is how to get one: https://support.google.com/accounts/answer/27441?hl=en
2. Have a Google Earth Engine account **->** here is how to get one: https://signup.earthengine.google.com/
3. (Optional) Have a Google Cloud Storage setup **->** here is how to get one: https://cloud.google.com/storage *(please read Note 2 below)*

---
**Note**: Google Earth Engine is a free-to-use online tool, but it requires authorisation from Google first. After signing up, it may take a few days before access to the platform is granted.

**Note 2**: Google Cloud Storage IS NOT a free tool. Like many other cloud services, it has different costs for different services. <br/>
At the time of writing, Google offers new Cloud Storage users a 90-day free trial with some funds attached to it.<br/>
Please find more info at: https://cloud.google.com/free/docs/gcp-free-tier <br/>

---
**Disclaimer**: Using Google Cloud Storage is optional to run this notebook, but be warned that, depending on the size of the geographical area that you export as patches (and hence the number of patches exported), the free 15GB offered by Google in Google Drive may not be enough.

# **Objective**

This notebook generates image composites of a target geographical area and time period, classifies the composite with a traditional machine learning classifier (Random Forest), and exports the classified image as TFRecord patches to the target Google storage.

Exporting multi-band, classified images within a target area of interest as patches of user-defined size is a key step for later feeding satellite imagery to deep convolutional neural networks for training.

# 1. Preparing the workspace

## Cloning the Github Repository
The GitHub repository that stores the project is cloned to the workspace to give access to the needed packages.

In [None]:
github_repo = "https://github.com/davidelomeo/mangroves_deep_learning.git"
print("Github Repository: ", github_repo)

!git clone "{github_repo}" # clone the github repository

## Installing the required packages
Although Google Colab has a pre-installed environment that contains many packages, a `requirements.txt` is provided in the GitHub repository for consistency (please see disclaimer below).

The following code also installs custom packages created specifically to facilitate the reproducibility of some key parts of the workflow, and hence allows the user to re-use these packages in other projects.

---
**Disclaimer**: The notebook was specifically designed to work on Google Colab. The user may use the notebook on a local machine (e.g. using Jupyter Notebook), but mounting Google Drive will not be possible with the method shown below. In that scenario, the user may need to use Google Cloud Storage only.

In [2]:
# Installing requirements.txt
# '&> /dev/null' hides the terminal output when running the command
!pip install -r mangroves_deep_learning/requirements.txt &> /dev/null

## Importing the required packages
Here the code imports all the needed packages for this notebook.

**Note**: it is necessary to authenticate Google Drive and Earth Engine to use the notebook. Make sure to have previously created the necessary accounts.

In [None]:
import ee
import geemap
import time
import json
from pprint import pprint

# This is a custom package. Please see the README in the repo for details.
import eeCustomTools as ct

# Authorising the Google Colab notebook to access the target
# Google Drive and mounting it
from google.colab import drive
drive.mount('/content/drive')

# Initialising Earth Engine, and starting the authentication
# process first if the notebook is not yet authenticated
try:
    ee.Initialize()
except Exception:
    ee.Authenticate()
    ee.Initialize()

----> Only run the next cell if you want to export the generated patches to Google Cloud Storage.

---
Please authenticate Google Cloud Storage as done above with Google Drive and Earth Engine.

In [None]:
# Authorising the Google Colab notebook to access the target Google Cloud account
from google.colab import auth
auth.authenticate_user()

# 2. Generating image composites
In this section, the Google Earth Engine Python API is used to generate image composites for a target year and geographical area, using pre-created regions of interest.




## Loading necessary FeatureCollections

This project uses the mangroves classification baselines generated by Worthington *et al.* (2020). <br/>
These can be found here: https://data.unep-wcmc.org/datasets/48

---
The four large shapefiles were imported into Google Earth Engine as assets here: 
* 1996 - https://code.earthengine.google.com/?asset=users/davidelomeo/lomeo_and_singh_2022/mangrove_typology_1996
* 2007 - https://code.earthengine.google.com/?asset=users/davidelomeo/lomeo_and_singh_2022/mangrove_typology_2007
* 2010 - https://code.earthengine.google.com/?asset=users/davidelomeo/lomeo_and_singh_2022/mangrove_typology_2010
* 2016 - https://code.earthengine.google.com/?asset=users/davidelomeo/lomeo_and_singh_2022/mangrove_typology_2016

---
The target geographical area (i.e., Southeast Asia) was defined with the Earth Engine drawing tool and exported as an asset here:
* https://code.earthengine.google.com/?asset=users/davidelomeo/lomeo_and_singh_2022/SEA

---
For the purposes of classifying land cover types in the target geographical area, 120 markers for each of the 9 identified classes were manually generated using the Earth Engine online drawing tool and exported as an asset here:
* https://code.earthengine.google.com/?asset=users/davidelomeo/lomeo_and_singh_2022/mangrove_custom_classes_2016

The markers were generated and exported using the JavaScript code here: 
* https://code.earthengine.google.com/6ef129f21d8920c45f22d14a8ddc4eb3.

The identified classes are:
* Mangroves -> Delta, Estuary, Lagoon, OpenCoast
* Water
* NonMangroves
* Clouds
* Ground
* Urban

---
Because the geographical area was too large to export patches for its entirety, 3 small sample areas were chosen for the export of the patches. The three areas were converted to an asset that can be found here:
* https://code.earthengine.google.com/?asset=users/davidelomeo/lomeo_and_singh_2022/export_patches_regions_2016_3_squares

The target areas generate 48786 patches when the patch size is set to 256x256 pixels.

In [4]:
# Manually classified markers
classes_2016 = ee.FeatureCollection(
    'users/davidelomeo/lomeo_and_singh_2022/mangrove_custom_classes_2016')

# Loading a pre-defined Southeast Asia area of interest. This variable is not
# necessary but useful for delimiting the geographical area to Southeast Asia.
ROI = ee.FeatureCollection('users/davidelomeo/lomeo_and_singh_2022/SEA')

# Loading pre-drawn small sample areas for patch export.
patches_regions = ee.FeatureCollection(
    'users/davidelomeo/lomeo_and_singh_2022/export_patches_regions_2016_3_squares') #Maybe need changing to 2_squares

## Setup custom variables
Please modify these variables according to your own needs and research.

In [5]:
# Setting the start and end dates
start_date = ee.Date.fromYMD(2016, 1, 1)
end_date = ee.Date.fromYMD(2016, 12, 31)

class_columns_name = 'Class'

# Selecting the bands needed for the classification
bands = [
    'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B8A', 'B9', 'B11', 'B12', 
    'NDVI', 'NDWI', 'MNDWI', 'NDSI', 'NDMI', 'EVI', 'EVI2', 'GOSAVI', 'SAVI'
]
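
Several of the index bands listed above (e.g. NDVI, NDWI, MNDWI) are normalised differences between two Sentinel-2 bands. The sketch below shows the commonly used definitions on a single pixel in plain Python. It is an illustration only: the pixel values are hypothetical, and the formulas used inside the custom `eeCustomTools` package may differ from these.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denominator = band_a + band_b
    if denominator == 0:
        return 0.0
    return (band_a - band_b) / denominator


# Hypothetical reflectance values for a single vegetated pixel
pixel = {'B3': 0.06, 'B4': 0.04, 'B8': 0.36, 'B11': 0.18}

# Commonly used Sentinel-2 band pairings (assumed, not taken from the package)
ndvi = normalized_difference(pixel['B8'], pixel['B4'])    # vegetation
ndwi = normalized_difference(pixel['B3'], pixel['B8'])    # open water
mndwi = normalized_difference(pixel['B3'], pixel['B11'])  # modified water index

print(f'NDVI={ndvi:.2f} NDWI={ndwi:.2f} MNDWI={mndwi:.2f}')
```

Dense vegetation such as mangroves tends to give a high NDVI and negative water indices, which is why these bands can help a classifier separate vegetation from water.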

## Creation of buffers
This cell generates buffers around the points in the FeatureCollection `classes_2016`, so that more pixels are sampled for the classification.

**Update**: although 9 classes were originally generated, later classification using deep neural networks suggested that the model was struggling to classify certain classes. It was decided to reduce the classes to 7 by merging Clouds, Ground and Urban classes.

In [None]:
# Separating the Clouds class and creating small 5-metre buffers. Clouds can
# appear very small in the imagery, so a small buffer avoids capturing pixels
# that do not belong to clouds
clouds = classes_2016.filter(
    ee.Filter.eq(class_columns_name, 6)).map(ct.buffer_size(5))

# Creating 50m buffers around no-cloud classes points
no_clouds = classes_2016.filter(
    ee.Filter.neq(class_columns_name, 6)).map(ct.buffer_size(50))

# Merging clouds with no_clouds classes and remapping classes Clouds, Urban
# and Ground so that they are classified as being the same (i.e., other)
region_of_interest = no_clouds.merge(clouds).remap([0, 1, 2, 3, 4, 5, 6, 7, 8],
                                                   [0, 1, 2, 3, 4, 5, 6, 6, 6],
                                                   'Class')

# Checking classes points count
print('Clouds points count:', clouds.size().getInfo())
print('noClouds points count:', no_clouds.size().getInfo())
print('Total points count:', region_of_interest.size().getInfo())

# Defining the name of the classes
classes_names = ['Delta', 'Estuary', 'Lagoon', 'OpenCoast', 'Water', 'NonMangroves', 'Other']
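
The effect of the `remap` call can be checked offline with a plain Python lookup. This is a minimal sketch: the code above confirms that Clouds is class 6, but the ids 7 and 8 for Ground and Urban are assumed from the order of the class list given earlier in this notebook.

```python
# Class order assumed from the class list earlier in this notebook
original_names = ['Delta', 'Estuary', 'Lagoon', 'OpenCoast', 'Water',
                  'NonMangroves', 'Clouds', 'Ground', 'Urban']
merged_names = ['Delta', 'Estuary', 'Lagoon', 'OpenCoast', 'Water',
                'NonMangroves', 'Other']

# Same two lists that are passed to remap() in the cell above
from_values = [0, 1, 2, 3, 4, 5, 6, 7, 8]
to_values = [0, 1, 2, 3, 4, 5, 6, 6, 6]
lookup = dict(zip(from_values, to_values))

# Printing the old id and name next to the new id and name
for old_id, new_id in lookup.items():
    print(f'{old_id} {original_names[old_id]:<12} -> '
          f'{new_id} {merged_names[new_id]}')
```

The last three rows all point to class 6, which is why `classes_names` has 7 entries and the classified image later uses values from 0 to 6.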

## Loading images of the target year

In [7]:
# Loading the images for the selected period and bounding them to the target ROI.
# The images are pre-filtered to only get those with maximum 30% of cloud cover
# to reduce the cloud masking effect on the images.
image_collection = ee.ImageCollection('COPERNICUS/S2') \
                     .filterDate(start_date, end_date) \
                     .filterBounds(ROI) \
                     .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 30)) \
                     .map(ct.mask_sentinel_clouds)

# Computing spectral indices for the median image and clipping to ROI. The
# median pixel value was preferred over the mean to limit the effect of
# skewness in the distribution of pixel values
median_image = ct.sentinel2_spectral_indices(
    image_collection.median()).clip(ROI)

# Creating a separate image for RGB visualisation
rgb_image = median_image.select(['B2', 'B3', 'B4'])

# Keeping only the bands needed for the classification
median_image = median_image.select(bands)
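
The preference for the median over the mean can be illustrated with a single pixel observed several times over the year. The values below are hypothetical; the high value stands for a bright cloud that escaped the cloud mask.

```python
from statistics import mean, median

# Hypothetical red-band reflectance of one pixel across six acquisitions;
# 0.85 represents an unmasked cloud
observations = [0.04, 0.05, 0.04, 0.85, 0.05, 0.06]

pixel_mean = mean(observations)
pixel_median = median(observations)

print('mean  :', round(pixel_mean, 3))
print('median:', round(pixel_median, 3))
```

The single outlier pulls the mean well above the typical surface value, while the median stays close to it. The same reasoning applies per pixel and per band when `image_collection.median()` builds the composite.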

# 3. Classification of the image composite
In this section the median image was classified to obtain labelled images and later export them as patches ready for neural networks.

## Image Segmentation
The image was segmented using Simple Non-Iterative Clustering (SNIC) to help the Random Forest classifier better distinguish between land cover types.

In [8]:
segmented_image = ct.segment_image(median_image, bands)

## Splitting the data and generating the classifier
The data was split into training and test sets using a 70-30 split. The classifier was then generated using the training set.

In [9]:
# Sampling the segmented image at the labelled regions
training = segmented_image.sampleRegions(
  collection=region_of_interest, 
  properties=[class_columns_name], 
  scale=10)

# Initialising the random split by adding a column of pseudo-random
# numbers between 0 and 1 to the collection
random_column = training.randomColumn('random')

# Splitting the dataset into training and test datasets using the custom
# percentage.
train_dataset = random_column.filter(ee.Filter.lt('random', 0.7))
test_dataset = random_column.filter(ee.Filter.gte('random', 0.7))

# Checking the sizes of the datasets
# print('Train Dataset size:', train_dataset.size().getInfo()) 
# print('Test Dataset size:', test_dataset.size().getInfo())

# Generating the classifier using random forest
classifier = ee.Classifier.smileRandomForest(
  numberOfTrees=200,
).train(
  features=train_dataset,
  classProperty=class_columns_name,
  inputProperties=median_image.bandNames())
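
The random-column split used above can be reproduced offline to see how it behaves. The following is a minimal plain-Python sketch with hypothetical samples, not Earth Engine code: each sample receives a uniform pseudo-random number, and the 0.7 threshold decides which set it joins.

```python
import random

random.seed(42)  # fixed seed so that the split is reproducible

# Hypothetical samples: (sample_id, class_label)
samples = [(i, i % 7) for i in range(1000)]

# Attaching a pseudo-random number in [0, 1) to each sample
with_random = [(sample, random.random()) for sample in samples]

# Samples below the threshold go to training, the rest to testing
split = 0.7
train = [sample for sample, r in with_random if r < split]
test = [sample for sample, r in with_random if r >= split]

print('train:', len(train), 'test:', len(test))
```

Because the assignment is random, the resulting proportions are only approximately 70-30, and the class balance within each set is not guaranteed. Checking the dataset sizes, as in the commented lines above, is a quick way to confirm the split.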

### Checking the accuracy of the classification (optional)
In the custom library created for this project, the function `get_metrics` allows the user to explore the results and accuracy of the classification.

One of the downsides of using the Python API (especially on Google Colab) is that numerical values must be requested from the Google server with the method `.getInfo()`, which may be very slow (depending on the machine's internet speed and how busy the server is at that time).

For this reason, it is advised to use `get_metrics` with caution. To prevent unwanted long runtimes, it only returns the error matrix by default. This action alone may take up to 5 minutes to run.

If all the available metrics are requested at the same time (please see the package documentation), the following line of code may take up to 15 minutes to run.

In [None]:
metrics = ct.get_metrics(classifier, test_dataset, class_columns_name)
pprint(metrics)
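
Once an error matrix is available as a nested list, the usual accuracy figures can be derived locally without further server requests. The sketch below uses a hypothetical 3-class matrix and assumes that rows hold the actual classes and columns the predicted classes (the Earth Engine convention); the exact structure returned by `get_metrics` should be checked in the package documentation.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def producers_accuracy(matrix):
    """Per-class diagonal divided by row total (rows = actual classes)."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


def consumers_accuracy(matrix):
    """Per-class diagonal divided by column total (columns = predicted)."""
    n = len(matrix)
    totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [matrix[c][c] / totals[c] if totals[c] else 0.0 for c in range(n)]


# Hypothetical error matrix for three classes
error_matrix = [
    [50, 3, 2],
    [4, 45, 1],
    [1, 2, 42],
]

overall = overall_accuracy(error_matrix)
producers = producers_accuracy(error_matrix)
consumers = consumers_accuracy(error_matrix)

print('Overall accuracy:', round(overall, 3))
print('Producers accuracy:', [round(p, 3) for p in producers])
print('Consumers accuracy:', [round(c, 3) for c in consumers])
```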

## Classify the image using the custom classifier

In [10]:
# Assigning the name of the labels
classes_label = 'classes'

# Classifying the target median image
classified_image = median_image.classify(classifier).rename(classes_label)

# 4. Outputting the image and the classification

In [None]:
# Creating colour palette for the target classes
legend_dict = {
    'Delta': '80D604', 
    'Estuary': '01BD7C', 
    'Lagoon': '36DFFF', 
    'OpenCoast': 'DEFF00', 
    'Water': '0050D5', 
    'NonMangroves': '106703', 
    'Other': 'B06F03'
}

# Creating a True Colour Composite image
RGB = {
  'min': 0.0,
  'max': 0.3,
  'bands': ['B4', 'B3', 'B2']}

# Generating the map and adding layers for every feature that needs output
Map = geemap.Map(center=(7.8, -261), zoom=5, lite_mode=False)

# True colour composite image
Map.addLayer(rgb_image, RGB, 'RGB Image')

# Classified Image
Map.addLayer(classified_image,
             {'min': 0, 'max': 6,
              'palette': list(legend_dict.values())},
             'Classification 2016')

# Legend
Map.add_legend(title='Legend', legend_dict=legend_dict)

# Export regions
Map.addLayer(patches_regions, {}, 'Patches export regions', opacity=0.5)

Map.centerObject(patches_regions, 8)
Map

# 5. Prepare images for export in patches
This section allows the user to generate patches of custom size within the pre-defined `patches_regions` areas.



## Setting export parameters
The notebook is set up to save to Google Drive, but the user can alternatively save to Google Cloud Storage.

In [None]:
# Parameters needed for saving the patches in Google Drive.
pixels = 256
year = 2016
scale = 10
region = patches_regions
folder = f'Labelled_dataset_{year}_{pixels}x{pixels}_classes_7_48786'
prefix = f'record_{pixels}x{pixels}-'
updated_bands = bands + [classes_label]

# Image needed for the export
export_image = median_image.addBands(classified_image).select(updated_bands)
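
Before starting a large export, it can help to estimate how much storage the patches may need. The sketch below is a rough upper bound only: it assumes that every band is stored as a 32-bit float and ignores both TFRecord overhead and the compression requested in the export options, so the real size on disk is likely to be smaller.

```python
pixels = 256         # patch side in pixels, as set above
n_bands = 22         # 21 predictor bands plus 1 label band
n_patches = 48786    # patch count reported earlier in this notebook
bytes_per_value = 4  # assumption: each value is stored as a 32-bit float

# Uncompressed size of one patch and of the whole export
bytes_per_patch = pixels * pixels * n_bands * bytes_per_value
total_bytes = bytes_per_patch * n_patches

print(f'Per patch: {bytes_per_patch / 1e6:.1f} MB')
print(f'Total    : {total_bytes / 1e9:.1f} GB (uncompressed estimate)')
```

Comparing this estimate with the free Google Drive quota gives an early indication of whether Google Cloud Storage, a smaller export region, or a smaller number of bands may be needed.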

## Saving parameters into a .json file
The general information about the exported patches is collected in a .json file for later access.

In [None]:
patches_info = {
    'pixels': pixels,
    'year': year,
    'folder': folder,
    'prefix': prefix,
    'bands': bands,
    'classes_label': classes_label,
    'classes': classes_names
     }

# Exporting the .json file to Google Drive
path_to_json = '/content/drive/MyDrive/' + folder + '_export_pacthes_info.js'
with open(path_to_json, 'w') as f:
    json.dump(patches_info, f)
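
The saved file can later be read back to rebuild the list of features stored in each patch. The following is a minimal round-trip sketch that writes to a temporary directory instead of Google Drive; the file name and the shortened `patches_info` dictionary are hypothetical stand-ins for the values defined above.

```python
import json
import os
import tempfile

# Shortened, hypothetical version of the dictionary defined above
patches_info = {
    'pixels': 256,
    'year': 2016,
    'bands': ['B2', 'B3', 'B4', 'NDVI'],
    'classes_label': 'classes',
    'classes': ['Delta', 'Estuary', 'Other'],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example_export_info.json')

    # Writing the information to disk
    with open(path, 'w') as f:
        json.dump(patches_info, f, indent=2)

    # Reading it back, as a later notebook might do
    with open(path) as f:
        loaded = json.load(f)

# Every patch stores the predictor bands followed by the label band
feature_names = loaded['bands'] + [loaded['classes_label']]
print(feature_names)
```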

## Export patches to the target cloud storage
The choice for this project was to use Google Drive as much as possible to reduce running costs, but the user may instead choose to save the patches to Google Cloud Storage using the commented snippet of code below.

In [None]:
# Specify patch and file dimensions.
export_options = {
  'patchDimensions': [pixels, pixels],
  'compressed': True
}

# Exporting patches to Google Drive
image_task = ee.batch.Export.image.toDrive(
  image = export_image,
  description = 'Patches_Export',
  folder = folder,
  fileNamePrefix = prefix,
  scale = scale,
  maxPixels = 3784216672400,
  fileFormat = 'TFRecord',
  region = region.geometry(),
  formatOptions = export_options,
)

# # Exporting patches to Google Cloud Storage
# image_task = ee.batch.Export.image.toCloudStorage(
#   image = export_image,
#   description = 'Patches_Export',
#   fileNamePrefix = prefix,
#   bucket = 'define-the-google-storage-bucket-here',
#   scale = scale,
#   maxPixels = 3784216672400,
#   fileFormat = 'TFRecord',
#   region = region.geometry(),
#   formatOptions = export_options,
# )

# Starting the export
image_task.start()

### Monitoring export status (optional)
If this cell is run, the code keeps running for as long as the export task is active.

In [None]:
# Checking the status of the export
while image_task.active():
  print('Running id: {}.'.format(image_task.id))
  time.sleep(120)
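
The polling loop above runs until the task stops, however long that takes. A possible variant with an optional timeout and a returned final state is sketched below. `wait_for_task`, `FakeTask` and the `'TIMED_OUT'` label are hypothetical names introduced for illustration; the sketch only relies on the task object exposing `active()` and `status()`, as Earth Engine batch tasks do.

```python
import time


def wait_for_task(task, poll_seconds=120, timeout_seconds=None):
    """Poll a task until it is no longer active and return its final state."""
    start = time.monotonic()
    while task.active():
        elapsed = time.monotonic() - start
        if timeout_seconds is not None and elapsed > timeout_seconds:
            return 'TIMED_OUT'  # hypothetical label, not an Earth Engine state
        time.sleep(poll_seconds)
    return task.status().get('state', 'UNKNOWN')


class FakeTask:
    """Stand-in for an export task so that the sketch runs offline."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    def active(self):
        # Reports the task as active for a fixed number of polls
        self._remaining -= 1
        return self._remaining >= 0

    def status(self):
        return {'state': 'COMPLETED'}


final_state = wait_for_task(FakeTask(3), poll_seconds=0)
print('Final state:', final_state)
```

In the notebook, the same function could be called as `wait_for_task(image_task)`, which also makes a failed export visible through the returned state.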

# 6. Access Notebook 2 to generate the Neural Network model

<table class="ee-notebook-buttons" align="left">
    <td><a target="_blank"  href="https://github.com/davidelomeo/mangroves_deep_learning/blob/main/Notebook_2-Generate_Model.ipynb"><img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" /> Access Notebook_2 on Github</a></td>
    <td><a target="_blank"  href="https://colab.research.google.com/github/davidelomeo/mangroves_deep_learning/blob/main/Notebook_2-Generate_Model.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" /> Run Notebook_2 in Google Colab</a></td>
</table>
