This notebook houses all the tutorials that familiarise a new user with `pyeo`.

- Session 0 documents the prerequisites for creating a Python environment with the libraries that `pyeo` requires.
- Session 1 documents how to train a machine learning model capable of detecting deforestation in Sentinel-2 imagery.
- Session 2 covers how to build a baseline composite against which forest changes are compared.
- Session 3 covers how to run the automatic change detection functions to identify deforestation.
- Session 4 covers how to analyse and report the deforestation alerts generated by `pyeo`.

**Table of contents**<a id='toc0_'></a>    
- [Session 0: Pre-requisites](#toc1_)    
  - [Installing Miniconda](#toc1_1_)    
  - [Creating a Conda Environment for pyeo](#toc1_2_)    
  - [Configure GDAL and PROJ_LIB Paths](#toc1_3_)    
  - [Installing Sen2Cor](#toc1_4_)    
- [Session 1: Model Training](#toc2_)    
- [Session 2: Composite Baseline Building](#toc3_)    
  - [Directory and Variable Setup](#toc3_1_)    
    - [Import Libraries](#toc3_1_1_)    
    - [Parse Parameters/Arguments](#toc3_1_2_)    
    - [Declare our Processing Parameters with In-Notebook Variables](#toc3_1_3_)    
    - [Create the Folder Structure pyeo Expects](#toc3_1_4_)    
    - [Create the Log File](#toc3_1_5_)    
    - [Create Zip and Unzip Functions](#toc3_1_6_)    
  - [Query Sentinel-2 Composite Imagery](#toc3_2_)    
    - [Search for L2A Images Corresponding to L1C](#toc3_2_1_)    
  - [Download Sentinel-2 Composite Imagery](#toc3_3_)    
    - [Download and Process L1Cs](#toc3_3_1_)    
    - [Download L2As](#toc3_3_2_)    
    - [Housekeeping](#toc3_3_3_)    
  - [Process the Downloaded Imagery](#toc3_4_)    
    - [Timestamp stuff (decide better name)](#toc3_4_1_)    
    - [Apply SCL Cloud Mask](#toc3_4_2_)    
    - [Apply Processing Baseline Offset](#toc3_4_3_)    
    - [Create Quicklooks of Cloud-Masked Images](#toc3_4_4_)    
  - [Create Composite from our Imagery](#toc3_5_)    
    - [Create Quicklook of the Composite](#toc3_5_1_)    
    - [Final Housekeeping](#toc3_5_2_)    
- [Session 3: Automatic Change Detection](#toc4_)    
  - [Import Libraries](#toc4_1_)    
  - [Re-declarations](#toc4_2_)    
    - [Re-Declare Arguments for Change Detection](#toc4_2_1_)    
    - [Compression Functions](#toc4_2_2_)    
  - [Query Sentinel-2 Change Imagery](#toc4_3_)    
  - [Download and Pre-Process L1C Change Imagery](#toc4_4_)    
  - [Download L2A Change Imagery](#toc4_5_)    
    - [Housekeeping - Compress L1Cs](#toc4_5_1_)    
  - [Cloud Masking, Offsetting and Quicklooks](#toc4_6_)    
  - [Classification of the Baseline Composite and Change Images](#toc4_7_)    
  - [Change Detection](#toc4_8_)    
    - [Could remove this not do_dev alternative. Since we want users to use the dev version.](#toc4_8_1_)    
    - [Final Housekeeping](#toc4_8_2_)    
- [Section 4: Alert Generation](#toc5_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Session 0: Pre-requisites](#toc0_)

## <a id='toc1_1_'></a>[Installing Miniconda](#toc0_)

1. **Ensure that you are running a `t2` instance on SEPAL before proceeding with these steps.**

1. Within the SEPAL Terminal, type the following to download the latest version of Miniconda, and then install it:
    - `wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh`
    - `bash ~/miniconda.sh -p $HOME/miniconda3`
    - Follow the installation prompts
    - The installer should ask you to confirm the Miniconda installation directory, which should be: `/home/sepal-user/miniconda3`
    - After installation, the installer will ask whether you want to initialise conda; enter: `yes`
1. Restart the SEPAL Terminal for the Conda installation to take effect.
    - You should now see the `(base)` prefix to your SEPAL path in the Terminal.

## <a id='toc1_2_'></a>[Creating a Conda Environment for pyeo](#toc0_)

This tutorial assumes you have a folder structure like the schematic diagram in the cell below:

1. Ensure you are within */home/sepal-user/pyeo/pyeo* :
    - `cd /home/sepal-user/pyeo/pyeo`
1. Run the following command to create the Conda environment with the required libraries and their correct versions.
    - `conda env create --file environment.yml --name pyeo_env`  
        - This process will take 15 - 20 minutes and the terminal will appear unresponsive, but it is working away in the background!
        - If the conda environment creation fails because of *Errno16: device or resource busy: "./nfs"*, run the environment creation again. This is a Linux OS error and not a pyeo installation error.  
    - `conda activate pyeo_env`  

1. Install ipykernel to enable support for JupyterLab:
    - `conda install ipykernel`

1. Ensure you are still within `/home/sepal-user/pyeo/pyeo` by typing:
    - `pwd`
    - The output path should be the same as above, i.e. `/home/sepal-user/pyeo/pyeo`

3. The command below installs pyeo within the conda environment, so that it can be imported.  
    - `python -m pip install -e .`   

4. The command below registers the Python interpreter in the conda environment as an ipykernel. If the Jupyter notebook cannot see your environment after running this command, restart the Jupyter notebook and kernel.   
    - `python -m ipykernel install --user --name pyeo_env`

5. Close JupyterLab and reopen, so JupyterLab can see the new conda environment.

5. Make sure to select the `pyeo_env` kernel in JupyterLab when running the pyeo tutorial notebooks.

## <a id='toc1_3_'></a>[Configure GDAL and PROJ_LIB Paths](#toc0_)

Because a GDAL installation already exists on SEPAL (but that installation is too new for pyeo), we need to point Python to use the correct paths for GDAL and PROJ_LIB.  
If you've followed the instructions above, your respective libraries will be located at:

In [1]:
import os
os.environ["GDAL_DATA"] = "/home/sepal-user/miniconda3/envs/pyeo_env/share/gdal"
os.environ["PROJ_LIB"] = "/home/sepal-user/miniconda3/envs/pyeo_env/share/proj"
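If the environment was created somewhere else, the two paths above will not exist and GDAL may fail later with less obvious errors. The sketch below is one possible way to check the directories before setting the variables. The helper `set_env_if_dir_exists` is a hypothetical name introduced here for illustration and is not part of `pyeo`.

```python
import os


def set_env_if_dir_exists(var_name, path):
    """Set an environment variable only if `path` is an existing directory.

    Hypothetical helper for this tutorial, not a pyeo function.
    """
    if os.path.isdir(path):
        os.environ[var_name] = path
        return True
    return False


# environment location taken from the installation steps above
env_root = "/home/sepal-user/miniconda3/envs/pyeo_env"

for var_name, sub_dir in [("GDAL_DATA", "share/gdal"), ("PROJ_LIB", "share/proj")]:
    was_set = set_env_if_dir_exists(var_name, os.path.join(env_root, sub_dir))
    print(var_name, "set" if was_set else "NOT set: directory is missing")
```

If either variable is reported as not set, check the name and location of your conda environment before continuing.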

## <a id='toc1_4_'></a>[Installing Sen2Cor](#toc0_)

These instructions apply to Linux, which includes SEPAL.  
1. First, check whether Conda is activated. If so, deactivate it by typing:
    - `conda deactivate`  
    
To install Sen2Cor:  
1. Within the Terminal, navigate to the folder *pyeo_tutorials* by typing:
    - `cd pyeo_tutorials`
    
1. Within the Terminal:
    - `wget https://step.esa.int/thirdparties/sen2cor/2.11.0/Sen2Cor-02.11.00-Linux64.run`
    - `chmod +x Sen2Cor-02.11.00-Linux64.run`
    - `./Sen2Cor-02.11.00-Linux64.run`

# <a id='toc2_'></a>[Session 1: Model Training](#toc0_)

See the model training notebook. 

TODO Insert Path: 

# <a id='toc3_'></a>[Session 2: Composite Baseline Building](#toc0_)

This session will take us through the imagery query, download and composite creation aspects of the `rolling_composite_s2_change_detection.py` script, which is intended to run in a terminal via python.  

Jupyter notebooks provide a useful and engaging interface for understanding the components of this script, so we will follow an extracted version of it throughout this notebook.

This session has five components:   
[Component 1](#toc3_1_): Directory and Variable Setup.

[Component 2](#toc3_2_): Query for Sentinel-2 imagery that meets our search criteria.

[Component 3](#toc3_3_): Download the Sentinel-2 imagery returned by the query.

[Component 4](#toc3_4_): If necessary, preprocess any L1C imagery to L2A by applying atmospheric correction. Then, cloud-mask the L2A imagery.

[Component 5](#toc3_5_): Finally, create a composite from the time series we downloaded and processed.


## <a id='toc3_1_'></a>[Directory and Variable Setup](#toc0_)

### <a id='toc3_1_1_'></a>[Import Libraries](#toc0_)

In [1]:
import shutil
import sys

import pyeo.classification
import pyeo.queries_and_downloads
import pyeo.raster_manipulation
import pyeo.filesystem_utilities

import configparser
import argparse
import json
import numpy as np
import os
from osgeo import gdal
import pandas as pd
import datetime as dt
import zipfile

gdal.UseExceptions()

### <a id='toc3_1_2_'></a>[Parse Parameters/Arguments](#toc0_)

Because `pyeo` is designed to be run in a terminal with flags passed to `tile_based_change_detection_from_cover_maps.py`, we will instead emulate these flags by declaring them as variables, below:

In [2]:
# the variables below are the ones we use in this notebook
# config_path must be an absolute path, not a relative one
config_path = "/data/clcr/shared/IMPRESS/matt/pyeo/kenya.ini"
build_composite = True
chunks = 10
download_source = "scihub"
tile_id = "36MYE"
do_dev = True
do_quicklooks = True

# unsure if these are needed
arg_start_date = None
arg_end_date = None

# these variables below, we do not need for this notebook (but still need to declare as None or False)
do_download = False
build_prob_image = False
do_classify = False
do_change = False
do_update = False
do_delete = False
do_zip = False
do_all = False
skip_existing = False
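For comparison, the sketch below shows how the same settings might arrive as command-line flags. It is a hypothetical stand-in built with `argparse`: the flag names are inferred from the log messages later in this notebook and may differ from those of the real script.

```python
import argparse

# Hypothetical parser: flag names are assumptions, not the script's confirmed interface
parser = argparse.ArgumentParser(description="Stand-in for the terminal options")
parser.add_argument("--config_path")
parser.add_argument("--tile_id")
parser.add_argument("--download_source", default="scihub")
parser.add_argument("--chunks", type=int, default=10)
parser.add_argument("--build_composite", action="store_true")
parser.add_argument("--dev", dest="do_dev", action="store_true")
parser.add_argument("--quicklooks", dest="do_quicklooks", action="store_true")

# passing a list to parse_args emulates what would be typed in the terminal
args = parser.parse_args(["--tile_id", "36MYE", "--build_composite", "--dev", "--quicklooks"])

print(args.tile_id, args.build_composite, args.do_dev, args.do_quicklooks)
```

Declaring plain variables, as in the cell above, gives the same result with less machinery, which is why this notebook takes that route.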

### <a id='toc3_1_3_'></a>[Declare our Processing Parameters with In-Notebook Variables](#toc0_)

When running change detection in the terminal mode, we use an Initialisation file (.ini) to provide `pyeo` with parameters.

However, it is far easier for the purpose of this tutorial to declare our parameters within the notebook (except any sensitive credentials).

In [3]:
# the path root of where the user wants to store the imagery, does not need to exist
# this path needs to be absolute, not relative
root_dir = "/data/clcr/shared/IMPRESS/matt/pyeo"

# creates a path which is a combination of root_dir and the tile chosen in the cell above
tile_root_dir = os.path.join(root_dir, tile_id)

# start and end date for the change imagery, required despite not running change detection?
start_date = "20230301"
end_date = "20230331"

# start and end date for the composite
# composite_start_date = "20230101"
# composite_end_date = "20230228"
composite_start_date = "20220101"
composite_end_date = "20221231"

# maximum cloud cover (%)
cloud_cover = 25
cloud_certainty_threshold = 0

# the projection the user wants to work with
epsg = 21097

# the Sentinel-2 bands to use, currently restricted to B02, B03, B04 and B08
bands = ["B02", "B03", "B04", "B08"]

# file name pattern to search for when identifying band file locations in "" string notation
resolution = "10m"

# spatial resolution of the output raster files in metres. Can be any resolution, not just 10, 20 or 60 as in the default band resolutions of Sentinel-2
out_resolution = 10

# set buffer in number of pixels for dilating the SCL cloud mask (recommend 30 pixels of 10 m) for the change detection
buffer_size = 20

# set buffer in number of pixels for dilating the SCL cloud mask (recommend 10 pixels of 10 m) for the composite building
buffer_size_composite = 10

# maximum number of images to be downloaded for compositing, in order of least cloud cover
max_image_number = 12

# granules below this size in MB will not be downloaded, this prevents slivers of imagery being downloaded
faulty_granule_threshold = 200

# we will not use these variables in this tutorial, but they do need to be defined for pyeo to run
model_path = "../models/model_36MYE_37MER_37NCC_Unoptimised_20230313.pkl"


# path to sen2cor, for converting L1C to L2A
sen2cor_path = "/home/m/mp730/Downloads/Sen2Cor-02.11.00-Linux64/bin/L2A_Process"

class_labels = ["primary forest", "plantation forest", "bare soil", "crops", "grassland", "open water", "burn scar", "cloud", "cloud shadow", "haze", "sparse woodland", "dense woodland", "artificial"]
from_classes = [1,11,12]
to_classes = [3,4,5,13]

# whether to filter out (sieve) pixel noise from the classification
sieve = 0
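The date parameters above are plain strings in `YYYYMMDD` form, so a typo would only surface once the query runs. As an optional check, the sketch below parses the dates and confirms that each window is ordered correctly. The helpers `parse_yyyymmdd` and `check_window` are hypothetical names introduced for illustration, not `pyeo` functions.

```python
import datetime as dt


def parse_yyyymmdd(text):
    """Convert a 'YYYYMMDD' string to a date; raises ValueError for invalid dates."""
    return dt.datetime.strptime(text, "%Y%m%d").date()


def check_window(start, end):
    """Return (start, end) as dates, raising ValueError if start is after end."""
    start_date, end_date = parse_yyyymmdd(start), parse_yyyymmdd(end)
    if start_date > end_date:
        raise ValueError("start date {} is after end date {}".format(start, end))
    return start_date, end_date


# values copied from the parameter cell above
composite_window = check_window("20220101", "20221231")
change_window = check_window("20230301", "20230331")

# the baseline composite should end before the change period begins
print(composite_window[1] < change_window[0])
```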

However, it is prudent to keep any sensitive information (such as passwords) in a separate file, so that these credentials are not displayed on screen.  
An empty credentials.ini file is provided for users to enter their Copernicus hub username and password.

In [4]:
credentials_path = "../credentials/credentials.ini"
conf = configparser.ConfigParser(allow_no_value=True)
conf.read(credentials_path)

sen_user = conf['sent_2']['user']
sen_pass = conf['sent_2']['pass']
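Note that `ConfigParser.read` silently skips files it cannot find, so a wrong `credentials_path` only shows up as a `KeyError` on the next line. The sketch below shows the layout the cell above expects (a `[sent_2]` section with `user` and `pass` keys) and one possible way to fail with a clearer message. The username and password are placeholders, and `get_credentials` is a hypothetical helper, not part of `pyeo`.

```python
import configparser

# Placeholder contents illustrating the expected layout of credentials.ini
example_ini = """
[sent_2]
user = my_username
pass = my_password
"""

conf_example = configparser.ConfigParser(allow_no_value=True)
conf_example.read_string(example_ini)


def get_credentials(parser, section="sent_2"):
    """Return (user, password) or raise a descriptive error if either is missing."""
    if not parser.has_section(section):
        raise KeyError("Section [{}] not found; is the credentials path correct?".format(section))
    user = parser.get(section, "user", fallback=None)
    password = parser.get(section, "pass", fallback=None)
    if not user or not password:
        raise ValueError("Both 'user' and 'pass' must be filled in under [{}]".format(section))
    return user, password


example_user, example_pass = get_credentials(conf_example)
print(example_user)
```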

### <a id='toc3_1_4_'></a>[Create the Folder Structure pyeo Expects](#toc0_)

In [5]:
pyeo.filesystem_utilities.create_folder_structure_for_tiles(tile_root_dir)

### <a id='toc3_1_5_'></a>[Create the Log File](#toc0_)

With `pyeo`, we use a log file as a convenient place to store the processing information. If we printed this information through IPython instead, the output could be truncated, so users would not see all of it. The log file also acts as a record of which parameters were used.

In [6]:
log = pyeo.filesystem_utilities.init_log(os.path.join(tile_root_dir, "log", tile_id+"_log.txt"))
log.info("---------------------------------------------------------------")
log.info("---   PROCESSING START: {}   ---".format(tile_root_dir))
log.info("---------------------------------------------------------------")
log.info("Options:")
if do_dev:
    log.info("  --dev Running in development mode, choosing development versions of functions where available")
else:
    log.info("  Running in production mode, avoiding any development versions of functions.")
if do_all:
    log.info("  --do_all")
if build_composite:
    log.info("  --build_composite for baseline composite")
    log.info("  --download_source = {}".format(download_source))
if do_download:
    log.info("  --download for change detection images")
    if not build_composite:
        log.info("  --download_source = {}".format(download_source))
if do_classify:
    log.info("  --classify to apply the random forest model and create classification layers")
if build_prob_image:
    log.info("  --build_prob_image to save classification probability layers")
if do_change:
    log.info("  --change to produce change detection layers and report images")
if do_update:
    log.info("  --update to update the baseline composite with new observations")
if do_quicklooks:
    log.info("  --quicklooks to create image quicklooks")
if do_delete:
    log.info("  --remove downloaded L1C images and intermediate image products")
    log.info("           (cloud-masked band-stacked rasters, class images, change layers) after use.")
    log.info("           Deletes remaining temporary directories starting with \'tmp\' from interrupted processing runs.")
    log.info("           Keeps only L2A images, composites and report files.")
    log.info("           Overrides --zip for the above files. WARNING! FILE LOSS!")
if do_zip:
    log.info("  --zip archives L2A images, and if --remove is not selected also L1C,")
    log.info("           cloud-masked band-stacked rasters, class images and change layers after use.")

log.info("List of image bands: {}".format(bands))
log.info("Model used: {}".format(model_path))
log.info("List of class labels:")
for c, this_class in enumerate(class_labels):
    log.info("  {} : {}".format(c+1, this_class))
log.info("Detecting changes from any of the classes: {}".format(from_classes))
log.info("                    to any of the classes: {}".format(to_classes))


2023-03-24 11:29:42,983: INFO: ****PROCESSING START****
2023-03-24 11:29:42,984: INFO: ---------------------------------------------------------------
2023-03-24 11:29:42,985: INFO: ---   PROCESSING START: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE   ---
2023-03-24 11:29:42,985: INFO: ---------------------------------------------------------------
2023-03-24 11:29:42,986: INFO: Options:
2023-03-24 11:29:42,987: INFO:   --dev Running in development mode, choosing development versions of functions where available
2023-03-24 11:29:42,988: INFO:   --build_composite for baseline composite
2023-03-24 11:29:42,988: INFO:   --download_source = scihub
2023-03-24 11:29:42,989: INFO:   --quicklooks to create image quicklooks
2023-03-24 11:29:42,990: INFO: List of image bands: ['B02', 'B03', 'B04', 'B08']
2023-03-24 11:29:42,991: INFO: Model used: ../models/model_36MYE_37MER_37NCC_Unoptimised_20230313.pkl
2023-03-24 11:29:42,991: INFO: List of class labels:
2023-03-24 11:29:42,992: INFO:   1 : prim
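The internals of `init_log` are not shown in this notebook. For readers unfamiliar with log files, the sketch below uses only Python's standard `logging` module to write timestamped messages to a file in a similar layout. It is an illustration under that assumption, not the `pyeo` implementation, and `make_file_logger` is a hypothetical name.

```python
import logging
import os
import tempfile


def make_file_logger(name, log_path):
    """Create a logger that appends timestamped INFO messages to `log_path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s: %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger


# write to a temporary folder so the example does not touch the tutorial directories
demo_log_path = os.path.join(tempfile.mkdtemp(), "demo_log.txt")
demo_log = make_file_logger("pyeo_demo", demo_log_path)
demo_log.info("Model used: %s", "example.pkl")

# the file keeps the full record, even if notebook output were truncated
with open(demo_log_path) as f:
    demo_log_text = f.read()
print(demo_log_text)
```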

The cell below defines the paths of the directories that `pyeo` uses, so that they are available when needed.

In [7]:
log.info("\nCreating the directory paths")

change_image_dir = os.path.join(tile_root_dir, r"images")
l1_image_dir = os.path.join(tile_root_dir, r"images/L1C")
l2_image_dir = os.path.join(tile_root_dir, r"images/L2A")
l2_masked_image_dir = os.path.join(tile_root_dir, r"images/cloud_masked")
categorised_image_dir = os.path.join(tile_root_dir, r"output/classified")
probability_image_dir = os.path.join(tile_root_dir, r"output/probabilities")
sieved_image_dir = os.path.join(tile_root_dir, r"output/sieved")
composite_dir = os.path.join(tile_root_dir, r"composite")
composite_l1_image_dir = os.path.join(tile_root_dir, r"composite/L1C")
composite_l2_image_dir = os.path.join(tile_root_dir, r"composite/L2A")
composite_l2_masked_image_dir = os.path.join(tile_root_dir, r"composite/cloud_masked")
quicklook_dir = os.path.join(tile_root_dir, r"output/quicklooks")


2023-03-24 11:29:45,735: INFO: 
Creating the directory paths


### <a id='toc3_1_6_'></a>[Create Zip and Unzip Functions](#toc0_)

These functions are useful for reducing the storage size of the imagery that we have downloaded.

In [8]:
def zip_contents(directory, notstartswith=None):
    """Zip every file or folder in `directory`, then delete the unzipped original.

    Entries whose names start with any prefix in `notstartswith` are skipped.
    """
    paths = [f for f in os.listdir(directory) if not f.endswith(".zip")]
    for f in paths:
        do_it = True
        if notstartswith is not None:
            for i in notstartswith:
                if f.startswith(i):
                    do_it = False
                    log.info("Skipping file that starts with '{}':   {}".format(i, f))
        if do_it:
            file_to_zip = os.path.join(directory, f)
            # the archive name is everything before the first "." in the path
            zipped_file = file_to_zip.split(".")[0]
            log.info("Zipping   {}".format(file_to_zip))
            if os.path.isdir(file_to_zip):
                shutil.make_archive(zipped_file, "zip", file_to_zip)
            else:
                with zipfile.ZipFile(zipped_file + ".zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
                    zf.write(file_to_zip, os.path.basename(file_to_zip))
            # only delete the original once the archive exists
            if os.path.exists(zipped_file + ".zip"):
                if os.path.isdir(file_to_zip):
                    shutil.rmtree(file_to_zip)
                else:
                    os.remove(file_to_zip)
            else:
                log.error("Zipping failed: {}".format(zipped_file + ".zip"))
    return

def unzip_contents(zippath, ifstartswith=None, ending=None):
    dirpath = zippath[:-4] # cut away the  .zip ending
    if ifstartswith is not None and ending is not None:
        if dirpath.startswith(ifstartswith):
            dirpath = dirpath + ending
    log.info("Unzipping {}".format(zippath))
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)
    if os.path.exists(dirpath):
        if os.path.exists(zippath):
            shutil.unpack_archive(
                filename=zippath,
                extract_dir=dirpath,
                format='zip'
                )
            os.remove(zippath)
    else:
        log.error("Unzipping failed")
    return
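Both functions rely on `shutil.make_archive` and `shutil.unpack_archive`. The sketch below runs the same zip, delete, unzip, delete cycle on a throwaway folder, so the behaviour can be tried without touching real imagery. The folder and file names are made up for the example.

```python
import os
import shutil
import tempfile

# build a throwaway "scene" folder containing one small file
work_dir = tempfile.mkdtemp()
scene_dir = os.path.join(work_dir, "scene_A")
os.makedirs(scene_dir)
with open(os.path.join(scene_dir, "band.txt"), "w") as f:
    f.write("pixel values")

# zip the folder, then delete the original (as zip_contents does)
archive_path = shutil.make_archive(scene_dir, "zip", scene_dir)
shutil.rmtree(scene_dir)

# unzip into a folder named after the archive, then delete the archive (as unzip_contents does)
restored_dir = archive_path[:-4]  # cut away the .zip ending
shutil.unpack_archive(filename=archive_path, extract_dir=restored_dir, format="zip")
os.remove(archive_path)

with open(os.path.join(restored_dir, "band.txt")) as f:
    restored_text = f.read()
print(restored_text)
```

Because the original is deleted as soon as the archive exists, an interrupted run can leave a mix of zipped and unzipped products in the same directory.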

## <a id='toc3_2_'></a>[Query Sentinel-2 Composite Imagery](#toc0_)

Now that we have the query, file handling and log parameters set up, we can start querying the Copernicus Hub for the Sentinel-2 imagery that we want.  

The cell below starts the `build_composite` process. First, we query for the `L1C` and `L2A` products that match our criteria (date range, tile of interest, cloud cover), and discard any granules smaller than `faulty_granule_threshold`.

Since we have declared a download limit of 12 images (`max_image_number`), the software caps the number of images in our query, keeping the least cloudy images from each relative orbit. This is useful if we have limited disk space.

In [10]:
# ------------------------------------------------------------------------
# Step 1: Create an initial cloud-free median composite from Sentinel-2 as a baseline map
# ------------------------------------------------------------------------

if build_composite or do_all:
    log.info("---------------------------------------------------------------")
    log.info("Creating an initial cloud-free median composite from Sentinel-2 as a baseline map")
    log.info("---------------------------------------------------------------")
    log.info("Searching for images for initial composite.")

    composite_products_all = pyeo.queries_and_downloads.check_for_s2_data_by_date(root_dir,
                                                                                    composite_start_date,
                                                                                    composite_end_date,
                                                                                    conf,
                                                                                    cloud_cover=cloud_cover,
                                                                                    tile_id=tile_id,
                                                                                    producttype=None #"S2MSI2A" or "S2MSI1C"
                                                                                    )
    
    log.info("--> Found {} L1C and L2A products for the composite:".format(len(composite_products_all)))
    df_all = pd.DataFrame.from_dict(composite_products_all, orient='index')

    # check granule sizes on the server
    df_all['size'] = df_all['size'].str.split(' ').apply(lambda x: float(x[0]) * {'GB': 1e3, 'MB': 1, 'KB': 1e-3}[x[1]])
    df = df_all.query('size >= '+str(faulty_granule_threshold))
    log.info("Removed {} faulty scenes <{}MB in size from the list:".format(len(df_all)-len(df), faulty_granule_threshold))
    df_faulty = df_all.query('size < '+str(faulty_granule_threshold))
    for r in range(len(df_faulty)):
        log.info("   {} MB: {}".format(df_faulty.iloc[r,:]['size'], df_faulty.iloc[r,:]['title']))

    l1c_products = df[df.processinglevel == 'Level-1C']
    l2a_products = df[df.processinglevel == 'Level-2A']
    log.info("    {} L1C products".format(l1c_products.shape[0]))
    log.info("    {} L2A products".format(l2a_products.shape[0]))

    rel_orbits = np.unique(l1c_products['relativeorbitnumber'])
    if len(rel_orbits) > 0:
        if l1c_products.shape[0] > max_image_number / len(rel_orbits):
            log.info("Capping the number of L1C products to {}".format(max_image_number))
            log.info("Relative orbits found covering tile: {}".format(rel_orbits))
            uuids = []
            for orb in rel_orbits:
                uuids = uuids + list(l1c_products.loc[l1c_products['relativeorbitnumber'] == orb].sort_values(by=['cloudcoverpercentage'], ascending=True)['uuid'][:int(max_image_number/len(rel_orbits))])
            l1c_products = l1c_products[l1c_products['uuid'].isin(uuids)]
            log.info("    {} L1C products remain:".format(l1c_products.shape[0]))
            for product in l1c_products['title']:
                log.info("       {}".format(product))

    rel_orbits = np.unique(l2a_products['relativeorbitnumber'])
    if len(rel_orbits) > 0:
        if l2a_products.shape[0] > max_image_number/len(rel_orbits):
            log.info("Capping the number of L2A products to {}".format(max_image_number))
            log.info("Relative orbits found covering tile: {}".format(rel_orbits))
            uuids = []
            for orb in rel_orbits:
                uuids = uuids + list(l2a_products.loc[l2a_products['relativeorbitnumber'] == orb].sort_values(by=['cloudcoverpercentage'], ascending=True)['uuid'][:int(max_image_number/len(rel_orbits))])
            l2a_products = l2a_products[l2a_products['uuid'].isin(uuids)]
            log.info("    {} L2A products remain:".format(l2a_products.shape[0]))
            for product in l2a_products['title']:
                log.info("       {}".format(product))

    if l1c_products.shape[0]>0 and l2a_products.shape[0]>0:
        log.info("Filtering out L1C products that have the same 'beginposition' time stamp as an existing L2A product.")
        l1c_products, l2a_products = pyeo.queries_and_downloads.filter_unique_l1c_and_l2a_data(df)
        log.info("--> {} L1C and L2A products with unique 'beginposition' time stamp for the composite:".format(l1c_products.shape[0]+l2a_products.shape[0]))
        log.info("    {} L1C products".format(l1c_products.shape[0]))
        log.info("    {} L2A products".format(l2a_products.shape[0]))
    df = None

2023-03-21 17:31:21,340: INFO: ---------------------------------------------------------------
2023-03-21 17:31:21,341: INFO: Creating an initial cloud-free median composite from Sentinel-2 as a baseline map
2023-03-21 17:31:21,342: INFO: ---------------------------------------------------------------
2023-03-21 17:31:21,342: INFO: Searching for images for initial composite.
2023-03-21 17:31:21,343: INFO: Sending Sentinel-2 query for Tile ID:
2023-03-21 17:31:21,344: INFO:    tile_id: 36MYE
2023-03-21 17:31:21,344: INFO:    start_date: 20220101
2023-03-21 17:31:21,345: INFO:    end_date: 20221231
2023-03-21 17:31:21,345: INFO:    cloud_cover: 25
2023-03-21 17:31:21,346: INFO:    product_type: None
2023-03-21 17:31:21,346: INFO:    file_name: None
2023-03-21 17:31:23,596: INFO: --> Found 52 L1C and L2A products for the composite:
2023-03-21 17:31:23,606: INFO: Removed 0 faulty scenes <200MB in size from the list:
2023-03-21 17:31:23,611: INFO:     52 L1C products
2023-03-21 17:31:23,612
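The capping step in the query cell splits `max_image_number` evenly across the relative orbits that cover the tile and keeps the least cloudy products from each orbit. The sketch below reproduces that logic in plain Python on a made-up product list, without pandas, purely to show the selection rule. `cap_per_orbit` is a hypothetical helper and the product titles are invented.

```python
from collections import defaultdict


def cap_per_orbit(products, max_image_number):
    """Keep the least cloudy products from each relative orbit."""
    by_orbit = defaultdict(list)
    for product in products:
        by_orbit[product["relativeorbitnumber"]].append(product)
    if not by_orbit:
        return []
    # same integer share per orbit as in the cell above
    per_orbit = int(max_image_number / len(by_orbit))
    kept = []
    for orbit in sorted(by_orbit):
        ranked = sorted(by_orbit[orbit], key=lambda p: p["cloudcoverpercentage"])
        kept.extend(ranked[:per_orbit])
    return kept


# invented example: three products on orbit 135 and two on orbit 92
toy_products = [
    {"title": "a", "relativeorbitnumber": 135, "cloudcoverpercentage": 20.0},
    {"title": "b", "relativeorbitnumber": 135, "cloudcoverpercentage": 5.0},
    {"title": "c", "relativeorbitnumber": 135, "cloudcoverpercentage": 10.0},
    {"title": "d", "relativeorbitnumber": 92, "cloudcoverpercentage": 1.0},
    {"title": "e", "relativeorbitnumber": 92, "cloudcoverpercentage": 15.0},
]

capped = cap_per_orbit(toy_products, max_image_number=4)
print([p["title"] for p in capped])
```

Because the share per orbit is rounded down, the total kept can be lower than `max_image_number`: with a limit of 12 and five orbits, each orbit contributes at most two products, giving ten in total.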

### <a id='toc3_2_1_'></a>[Search for L2A Images Corresponding to L1C](#toc0_)

The cell below does two things.  

Firstly, it searches our download directory for any existing imagery. If we have downloaded any imagery already, `pyeo` will remove the matching image from our search query.  

Secondly, `pyeo` searches the Copernicus archive for any corresponding `L2A` products. If we find a matching L2A product, then we remove the `L1C` counterpart from the query. 

In [11]:
# Check whether the scenes have already been downloaded and/or processed by searching the composite L1C, L2A and cloud_masked directories
if l1c_products.shape[0] > 0:
    log.info("Checking for already downloaded and zipped L1C or L2A products and")
    log.info("  availability of matching L2A products for download.")
    n = len(l1c_products)
    drop=[]
    add=[]
    for r in range(n):
        id = l1c_products.iloc[r,:]['title']
        search_term = id.split("_")[2]+"_"+id.split("_")[3]+"_"+id.split("_")[4]+"_"+id.split("_")[5]
        log.info("Searching locally for file names containing: {}.".format(search_term))
        file_list = [os.path.join(composite_l1_image_dir, f) for f in os.listdir(composite_l1_image_dir)] + \
            [os.path.join(composite_l2_image_dir, f) for f in os.listdir(composite_l2_image_dir)] + \
            [os.path.join(composite_l2_masked_image_dir, f) for f in os.listdir(composite_l2_masked_image_dir)]
        for f in file_list:
            if search_term in f:
                log.info("  Product already downloaded: {}".format(f))
                drop.append(l1c_products.index[r])
        search_term = "*"+id.split("_")[2]+"_"+id.split("_")[3]+"_"+id.split("_")[4]+"_"+id.split("_")[5]+"*"
        log.info("Searching on the data hub for files containing: {}.".format(search_term))
        matching_l2a_products = pyeo.queries_and_downloads._file_api_query(user=sen_user,
                                                                            passwd=sen_pass,
                                                                            start_date=composite_start_date,
                                                                            end_date=composite_end_date,
                                                                            filename=search_term,
                                                                            cloud=cloud_cover,
                                                                            producttype="S2MSI2A"
                                                                            )

        matching_l2a_products_df = pd.DataFrame.from_dict(matching_l2a_products, orient='index')
        # 07/03/2023: Applied Ali's fix for converting product size to MB to compare against faulty_granule_threshold
        if len(matching_l2a_products_df) == 1 and [float(x[0]) * {'GB': 1e3, 'MB': 1, 'KB': 1e-3}[x[1]] for x in [matching_l2a_products_df['size'][0].split(' ')]][0] > faulty_granule_threshold:
            log.info("Replacing L1C {} with L2A product:".format(id))
            log.info("              {}".format(matching_l2a_products_df.iloc[0,:]['title']))
            drop.append(l1c_products.index[r])
            add.append(matching_l2a_products_df.iloc[0,:])
        if len(matching_l2a_products_df) == 0:
            pass
        if len(matching_l2a_products_df) > 1:
            # check granule sizes on the server
            matching_l2a_products_df['size'] = matching_l2a_products_df['size'].str.split(' ').apply(lambda x: float(x[0]) * {'GB': 1e3, 'MB': 1, 'KB': 1e-3}[x[1]])
            matching_l2a_products_df = matching_l2a_products_df.query('size >= '+str(faulty_granule_threshold))
            # 'size' was converted to MB (float) above, so compare it directly; also guard against an empty table
            if len(matching_l2a_products_df) > 0 and matching_l2a_products_df.iloc[0,:]['size'] > faulty_granule_threshold:
                log.info("Replacing L1C {} with L2A product:".format(id))
                log.info("              {}".format(matching_l2a_products_df.iloc[0,:]['title']))
                drop.append(l1c_products.index[r])
                add.append(matching_l2a_products_df.iloc[0,:])
    if len(drop) > 0:
        l1c_products = l1c_products.drop(index=drop)
    if len(add) > 0:
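        # DataFrame.append was removed in pandas 2.0; pd.concat works in older and newer versions
        l2a_products = pd.concat([l2a_products, pd.DataFrame(add)])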
    l2a_products = l2a_products.drop_duplicates(subset='title')
    log.info("    {} L1C products remaining for download".format(l1c_products.shape[0]))
    log.info("    {} L2A products remaining for download".format(l2a_products.shape[0]))    

2023-03-21 17:31:30,854: INFO: Checking for already downloaded and zipped L1C or L2A products and
2023-03-21 17:31:30,855: INFO:   availability of matching L2A products for download.
2023-03-21 17:31:30,856: INFO: Searching locally for file names containing: 20220928T074709_N0400_R135_T36MYE.
2023-03-21 17:31:30,860: INFO: Searching on the data hub for files containing: *20220928T074709_N0400_R135_T36MYE*.
2023-03-21 17:31:33,814: INFO: Replacing L1C S2B_MSIL1C_20220928T074709_N0400_R135_T36MYE_20220928T094508 with L2A product:
2023-03-21 17:31:33,815: INFO:               S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456
2023-03-21 17:31:33,817: INFO: Searching locally for file names containing: 20220720T074619_N0400_R135_T36MYE.
2023-03-21 17:31:33,819: INFO: Searching on the data hub for files containing: *20220720T074619_N0400_R135_T36MYE*.
2023-03-21 17:31:36,680: INFO: Replacing L1C S2B_MSIL1C_20220720T074619_N0400_R135_T36MYE_20220720T095512 with L2A product:
2023-03-2
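Two expressions in the cell above are dense: the search term built from the product title, and the conversion of the hub's size strings to megabytes. The sketch below isolates each in a small function so they are easier to read. `search_term_from_title` and `size_to_mb` are hypothetical helper names, and the size strings are made-up examples in the "value unit" form that the cell assumes.

```python
# conversion factors to megabytes, as used in the cell above
SIZE_FACTORS_MB = {"GB": 1e3, "MB": 1, "KB": 1e-3}


def size_to_mb(size_text):
    """Convert a string such as '1.5 GB' to a size in MB."""
    value, unit = size_text.split(" ")
    return float(value) * SIZE_FACTORS_MB[unit]


def search_term_from_title(title):
    """Join fields 2 to 5 of the title: sensing time, baseline, orbit and tile."""
    return "_".join(title.split("_")[2:6])


# title taken from the log output above
title = "S2B_MSIL1C_20220928T074709_N0400_R135_T36MYE_20220928T094508"
print(search_term_from_title(title))

# a granule passes the check when it is larger than faulty_granule_threshold (200 MB here)
print(size_to_mb("1.5 GB") > 200)
```

The search term omits the satellite, processing level and product discriminator fields, which is what allows an L1C title to match its L2A counterpart.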

## <a id='toc3_3_'></a>[Download Sentinel-2 Composite Imagery](#toc0_)

### <a id='toc3_3_1_'></a>[Download and Process L1Cs](#toc0_)

The `log` output in the previous section shows that `pyeo` found a matching `L2A` image for each of the `L1Cs` in our search query, so the query now contains only L2As.  

If we did have `L1Cs` in our search query, then the cell below would download these L1Cs and apply `atmospheric_correction` using `Sen2Cor`.

In [12]:
if l1c_products.shape[0] > 0:
    log.info("Downloading Sentinel-2 L1C products.")
            
    pyeo.queries_and_downloads.download_s2_data_from_df(l1c_products,
                                                        composite_l1_image_dir,
                                                        composite_l2_image_dir,
                                                        download_source,
                                                        user=sen_user,
                                                        passwd=sen_pass,
                                                        try_scihub_on_fail=True)
    log.info("Atmospheric correction with sen2cor.")
    pyeo.raster_manipulation.atmospheric_correction(composite_l1_image_dir,
                                                    composite_l2_image_dir,
                                                    sen2cor_path,
                                                    delete_unprocessed_image=False)

### <a id='toc3_3_2_'></a>[Download L2As](#toc0_)

In this subsection, we will download the L2As from our search query.  

But first, let's take a look at our search query result, `l2a_products`, by printing the first 3 rows with `.head(3)`:



In [13]:
l2a_products.head(3)

| | title | link | link_alternative | link_icon | summary | ondemand | datatakesensingstart | generationdate | beginposition | endposition | ... | uuid | illuminationazimuthangle | illuminationzenithangle | vegetationpercentage | notvegetatedpercentage | waterpercentage | unclassifiedpercentage | mediumprobacloudspercentage | highprobacloudspercentage | snowicepercentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| aaeae097-1d47-4cae-abdb-f3959ae8ee35 | S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_2... | https://apihub.copernicus.eu/apihub/odata/v1/P... | https://apihub.copernicus.eu/apihub/odata/v1/P... | https://apihub.copernicus.eu/apihub/odata/v1/P... | Date: 2022-09-28T07:47:09.024Z, Instrument: MS... | False | NaT | 2022-09-28 10:14:56 | 2022-09-28 07:47:09.024 | 2022-09-28 07:47:09.024 | ... | aaeae097-1d47-4cae-abdb-f3959ae8ee35 | 94.69738 | 19.89525 | 89.273399 | 5.84651 | 0.371303 | 0.808315 | 0.482288 | 0.026178 | 0.0 |
| 9bf4d659-ce3e-4b95-b0f9-c216080c0349 | S2B_MSIL2A_20220720T074619_N0400_R135_T36MYE_2... | https://apihub.copernicus.eu/apihub/odata/v1/P... | https://apihub.copernicus.eu/apihub/odata/v1/P... | https://apihub.copernicus.eu/apihub/odata/v1/P... | Date: 2022-07-20T07:46:19.024Z, Instrument: MS... | False | NaT | 2022-07-20 10:31:03 | 2022-07-20 07:46:19.024 | 2022-07-20 07:46:19.024 | ... | 9bf4d659-ce3e-4b95-b0f9-c216080c0349 | 46.31214 | 31.399527 | 82.696658 | 7.556468 | 0.389574 | 1.579262 | 4.07288 | 3.299518 | 0.0 |
| 1ddabc9d-d2ad-4425-aaeb-ad2663074574 | S2A_MSIL2A_20220615T074621_N0400_R135_T36MYE_2... | https://apihub.copernicus.eu/apihub/odata/v1/P... | https://apihub.copernicus.eu/apihub/odata/v1/P... | https://apihub.copernicus.eu/apihub/odata/v1/P... | Date: 2022-06-15T07:46:21.024Z, Instrument: MS... | False | NaT | 2022-06-15 11:30:17 | 2022-06-15 07:46:21.024 | 2022-06-15 07:46:21.024 | ... | 1ddabc9d-d2ad-4425-aaeb-ad2663074574 | 40.766014 | 32.158949 | 94.336027 | 1.929038 | 0.407898 | 0.100099 | 1.063407 | 0.681584 | 0.0 |


Let's highlight a few columns of interest:  

In the cell output above, the product `uuid` is the dataframe index (*the first column, which has no column name*). These unique identifiers distinguish the scenes from each other.  

The `title` column holds the title of each product. The title itself encodes important information, for example: the satellite (*S2A or S2B*), the sensor (*MSI*), the product type (*L2A*), the date the image was captured (*YYYYMMDD*) and the corresponding tile for the image (*TXXXXX*).

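The `ondemand` column might look like an indicator of whether a product is online or in the Long-Term Archive (`LTA`), but it should not be relied on for that: all three rows above show `False`, yet the download log below reports the first two products as online and the third as needing retrieval from the LTA. The messages `pyeo` logs during download are the reliable indicator of a product's status.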
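
To make the structure of the `title` field concrete, the sketch below splits a full product title (taken from the log output earlier in this session) into named fields with a regular expression. The function name `parse_s2_title` and the field names are illustrative choices, not part of `pyeo`.

```python
import re
from datetime import datetime

# Field layout of a Sentinel-2 product title, e.g.
# S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456
TITLE_PATTERN = re.compile(
    r"^(?P<satellite>S2[A-Z])_"
    r"(?P<sensor>MSI)(?P<level>L1C|L2A)_"
    r"(?P<sensing_start>\d{8}T\d{6})_"
    r"N(?P<baseline>\d{4})_"
    r"R(?P<relative_orbit>\d{3})_"
    r"T(?P<tile>\d{2}[A-Z]{3})_"
    r"(?P<discriminator>\d{8}T\d{6})$"
)


def parse_s2_title(title):
    """Split a Sentinel-2 product title into its named fields (hypothetical helper)."""
    match = TITLE_PATTERN.match(title)
    if match is None:
        raise ValueError(f"Not a recognised Sentinel-2 title: {title}")
    fields = match.groupdict()
    # Convert the sensing start time from text to a datetime object
    fields["sensing_start"] = datetime.strptime(fields["sensing_start"], "%Y%m%dT%H%M%S")
    return fields


fields = parse_s2_title("S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456")
print(fields["satellite"], fields["level"], fields["tile"], fields["sensing_start"].date())
```

Note that the dataframe preview truncates titles with `...`, so a full title is needed for the pattern to match.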

---

Now, let's download the `L2As` in our search query, `l2a_products`, from the Copernicus archive. If any incomplete downloads remain from a previous run (*remember, `pyeo` is an iterative download, classification and change detection process*), `pyeo` flags these files to the user through the log file.

If an image is in the Long-Term Archive (`LTA`), `pyeo` triggers its retrieval and waits for it to become available, then downloads it before moving on to the next L2A in the search query. Products are handled one at a time, so LTA retrievals can make this cell run for several hours.
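
The request-then-wait pattern described above can be sketched generically as follows. This is not `pyeo`'s implementation: `is_online`, `trigger_retrieval` and `download` are hypothetical callables standing in for whatever the download client provides, and the polling interval and retry count are arbitrary assumptions.

```python
import time


def wait_until_online(product_id, is_online, trigger_retrieval,
                      poll_seconds=1800, max_polls=48, sleep=time.sleep):
    """Request a product from the archive, then poll until it is online.

    Returns True once the product is online, False if polling gives up.
    """
    if is_online(product_id):
        return True
    trigger_retrieval(product_id)
    for _ in range(max_polls):
        sleep(poll_seconds)
        if is_online(product_id):
            return True
    return False


def fetch_sequentially(product_ids, is_online, trigger_retrieval, download, **wait_kwargs):
    """Handle one product at a time: wait for it, download it, then move on."""
    downloaded = []
    for product_id in product_ids:
        if wait_until_online(product_id, is_online, trigger_retrieval, **wait_kwargs):
            download(product_id)
            downloaded.append(product_id)
    return downloaded


# Tiny in-memory stand-in for an archive: "b" comes online after two failed checks
checks_left = {"a": 0, "b": 2}


def fake_is_online(product_id):
    if checks_left[product_id] > 0:
        checks_left[product_id] -= 1
        return False
    return True


requested = []
result = fetch_sequentially(["a", "b"], fake_is_online, requested.append,
                            download=lambda product_id: None,
                            poll_seconds=0, sleep=lambda seconds: None)
print(result, requested)
```

Because each product blocks the loop until it is available, a single archived product delays every product queued behind it, which is the trade-off of this simple sequential design.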

In [14]:
if l2a_products.shape[0] > 0:
    log.info("Downloading Sentinel-2 L2A products.")
    pyeo.queries_and_downloads.download_s2_data(l2a_products.to_dict('index'),
                                                composite_l1_image_dir,
                                                composite_l2_image_dir,
                                                download_source,
                                                user=sen_user,
                                                passwd=sen_pass,
                                                try_scihub_on_fail=True)

# check for incomplete L2A downloads
incomplete_downloads, sizes = pyeo.raster_manipulation.find_small_safe_dirs(composite_l2_image_dir, threshold=faulty_granule_threshold*1024*1024)
if len(incomplete_downloads) > 0:
    for index, safe_dir in enumerate(incomplete_downloads):
        if sizes[index]/1024/1024 < faulty_granule_threshold and os.path.exists(safe_dir):
            log.warning("Found likely incomplete download of size {} MB: {}".format(str(round(sizes[index]/1024/1024)), safe_dir))
            #shutil.rmtree(safe_dir)

2023-03-21 17:32:28,777: INFO: Downloading Sentinel-2 L2A products.
2023-03-21 17:32:28,782: INFO: ../36MYE/composite/L2A/S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456.SAFE does not exist.
2023-03-21 17:32:28,783: INFO: Downloading S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456 from scihub to ../36MYE/composite/L2A
2023-03-21 17:32:29,127: INFO: Product aaeae097-1d47-4cae-abdb-f3959ae8ee35 is online. Starting download.
2023-03-21 17:32:29,128: INFO: I.R. Tenacity will retry download up to 5 times.


Downloading S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456.zip:   0%|          | 0.00/1.19G [00:…

MD5 checksumming:   0%|          | 0.00/1.19G [00:00<?, ?B/s]

2023-03-21 17:32:40,852: INFO: Unzipping ../36MYE/composite/L2A/S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456.zip to ../36MYE/composite/L2A
2023-03-21 17:32:47,098: INFO: Removing ../36MYE/composite/L2A/S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456.zip
2023-03-21 17:32:47,356: INFO: ../36MYE/composite/L2A/S2B_MSIL2A_20220720T074619_N0400_R135_T36MYE_20220720T103103.SAFE does not exist.
2023-03-21 17:32:47,357: INFO: Downloading S2B_MSIL2A_20220720T074619_N0400_R135_T36MYE_20220720T103103 from scihub to ../36MYE/composite/L2A
2023-03-21 17:32:47,692: INFO: Product 9bf4d659-ce3e-4b95-b0f9-c216080c0349 is online. Starting download.
2023-03-21 17:32:47,693: INFO: I.R. Tenacity will retry download up to 5 times.


Downloading S2B_MSIL2A_20220720T074619_N0400_R135_T36MYE_20220720T103103.zip:   0%|          | 0.00/1.19G [00:…

MD5 checksumming:   0%|          | 0.00/1.19G [00:00<?, ?B/s]

2023-03-21 17:32:57,438: INFO: Unzipping ../36MYE/composite/L2A/S2B_MSIL2A_20220720T074619_N0400_R135_T36MYE_20220720T103103.zip to ../36MYE/composite/L2A
2023-03-21 17:33:04,065: INFO: Removing ../36MYE/composite/L2A/S2B_MSIL2A_20220720T074619_N0400_R135_T36MYE_20220720T103103.zip
2023-03-21 17:33:04,320: INFO: ../36MYE/composite/L2A/S2A_MSIL2A_20220615T074621_N0400_R135_T36MYE_20220615T113017.SAFE does not exist.
2023-03-21 17:33:04,321: INFO: Downloading S2A_MSIL2A_20220615T074621_N0400_R135_T36MYE_20220615T113017 from scihub to ../36MYE/composite/L2A
2023-03-21 17:33:04,571: INFO: Product 1ddabc9d-d2ad-4425-aaeb-ad2663074574 is not online. Triggering retrieval from long-term archive.
2023-03-21 17:33:04,572: INFO: Remember: 'Patience is bitter, but its fruit is sweet.' (Jean-Jacques Rousseau)


Downloading products:   0%|          | 0/1 [00:00<?, ?product/s]

LTA retrieval:   0%|          | 0/1 [00:00<?, ?product/s]

Downloading S2A_MSIL2A_20220615T074621_N0400_R135_T36MYE_20220615T113017.zip:   0%|          | 0.00/1.19G [00:…

MD5 checksumming:   0%|          | 0.00/1.19G [00:00<?, ?B/s]

2023-03-21 18:03:44,231: INFO: Downloaded: {'id': '1ddabc9d-d2ad-4425-aaeb-ad2663074574', 'title': 'S2A_MSIL2A_20220615T074621_N0400_R135_T36MYE_20220615T113017', 'size': 1185519360, 'md5': '1462ad6096a9407baf5de55b60632ff3', 'date': datetime.datetime(2022, 6, 15, 7, 46, 21, 24000), 'footprint': 'POLYGON((34.796693520402904 0,35.78263300407275 0,35.783047841608315 -0.992214719802887,34.79696161901289 -0.992902323890143,34.796693520402904 0))', 'url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('1ddabc9d-d2ad-4425-aaeb-ad2663074574')/$value", 'Online': True, 'Creation Date': datetime.datetime(2022, 6, 15, 13, 33, 7, 12000), 'Ingestion Date': datetime.datetime(2022, 6, 15, 13, 32, 38, 515000), 'quicklook_url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('1ddabc9d-d2ad-4425-aaeb-ad2663074574')/Products('Quicklook')/$value", 'path': '../36MYE/composite/L2A/S2A_MSIL2A_20220615T074621_N0400_R135_T36MYE_20220615T113017.zip', 'downloaded_bytes': 1185519360}
2023-03-21 18:03:4

Downloading products:   0%|          | 0/1 [00:00<?, ?product/s]

LTA retrieval:   0%|          | 0/1 [00:00<?, ?product/s]

Downloading S2B_MSIL2A_20220521T074609_N0400_R135_T36MYE_20220521T102923.zip:   0%|          | 0.00/1.18G [00:…

MD5 checksumming:   0%|          | 0.00/1.18G [00:00<?, ?B/s]

2023-03-21 18:54:10,468: INFO: Downloaded: {'id': '229852de-20cc-4e22-98e5-b06f48cc12e7', 'title': 'S2B_MSIL2A_20220521T074609_N0400_R135_T36MYE_20220521T102923', 'size': 1184150275, 'md5': 'efadb13d307a3fdaaf031a31b8411fba', 'date': datetime.datetime(2022, 5, 21, 7, 46, 9, 24000), 'footprint': 'POLYGON((34.796693520402904 0,35.78263300407275 0,35.783047841608315 -0.992214719802887,34.79696161901289 -0.992902323890143,34.796693520402904 0))', 'url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('229852de-20cc-4e22-98e5-b06f48cc12e7')/$value", 'Online': True, 'Creation Date': datetime.datetime(2022, 5, 21, 15, 4, 2, 564000), 'Ingestion Date': datetime.datetime(2022, 5, 21, 15, 3, 43, 916000), 'quicklook_url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('229852de-20cc-4e22-98e5-b06f48cc12e7')/Products('Quicklook')/$value", 'path': '../36MYE/composite/L2A/S2B_MSIL2A_20220521T074609_N0400_R135_T36MYE_20220521T102923.zip', 'downloaded_bytes': 1184150275}
2023-03-21 18:54:10,

Downloading products:   0%|          | 0/1 [00:00<?, ?product/s]

LTA retrieval:   0%|          | 0/1 [00:00<?, ?product/s]

Downloading S2A_MSIL2A_20220516T074621_N0400_R135_T36MYE_20220516T124012.zip:   0%|          | 0.00/1.18G [00:…

MD5 checksumming:   0%|          | 0.00/1.18G [00:00<?, ?B/s]

2023-03-21 20:24:38,061: INFO: Downloaded: {'id': 'a9fc6b06-729a-48a5-957c-fe45d5f39781', 'title': 'S2A_MSIL2A_20220516T074621_N0400_R135_T36MYE_20220516T124012', 'size': 1184909009, 'md5': '6cba128b344f870c0261d7bb0e2d0bbf', 'date': datetime.datetime(2022, 5, 16, 7, 46, 21, 24000), 'footprint': 'POLYGON((34.796693520402904 0,35.78263300407275 0,35.783047841608315 -0.992214719802887,34.79696161901289 -0.992902323890143,34.796693520402904 0))', 'url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('a9fc6b06-729a-48a5-957c-fe45d5f39781')/$value", 'Online': True, 'Creation Date': datetime.datetime(2022, 5, 16, 14, 10, 3, 850000), 'Ingestion Date': datetime.datetime(2022, 5, 16, 14, 9, 50, 72000), 'quicklook_url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('a9fc6b06-729a-48a5-957c-fe45d5f39781')/Products('Quicklook')/$value", 'path': '../36MYE/composite/L2A/S2A_MSIL2A_20220516T074621_N0400_R135_T36MYE_20220516T124012.zip', 'downloaded_bytes': 1184909009}
2023-03-21 20:24:38

Downloading products:   0%|          | 0/1 [00:00<?, ?product/s]

LTA retrieval:   0%|          | 0/1 [00:00<?, ?product/s]

Downloading S2B_MSIL2A_20220511T074609_N0400_R135_T36MYE_20220511T103313.zip:   0%|          | 0.00/1.19G [00:…

MD5 checksumming:   0%|          | 0.00/1.19G [00:00<?, ?B/s]

2023-03-21 20:54:59,866: INFO: Downloaded: {'id': '3531a110-d6b5-406b-862a-ad2f80a893d8', 'title': 'S2B_MSIL2A_20220511T074609_N0400_R135_T36MYE_20220511T103313', 'size': 1185108640, 'md5': 'e28e14320bf3c0c78087ecc2d14967f4', 'date': datetime.datetime(2022, 5, 11, 7, 46, 9, 24000), 'footprint': 'POLYGON((34.796693520402904 0,35.78263300407275 0,35.783047841608315 -0.992214719802887,34.79696161901289 -0.992902323890143,34.796693520402904 0))', 'url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('3531a110-d6b5-406b-862a-ad2f80a893d8')/$value", 'Online': True, 'Creation Date': datetime.datetime(2022, 5, 11, 21, 2, 2, 876000), 'Ingestion Date': datetime.datetime(2022, 5, 11, 21, 1, 54, 301000), 'quicklook_url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('3531a110-d6b5-406b-862a-ad2f80a893d8')/Products('Quicklook')/$value", 'path': '../36MYE/composite/L2A/S2B_MSIL2A_20220511T074609_N0400_R135_T36MYE_20220511T103313.zip', 'downloaded_bytes': 1185108640}
2023-03-21 20:54:59,

Downloading products:   0%|          | 0/1 [00:00<?, ?product/s]

LTA retrieval:   0%|          | 0/1 [00:00<?, ?product/s]

Downloading S2A_MSIL2A_20220406T074611_N0400_R135_T36MYE_20220406T102936.zip:   0%|          | 0.00/1.21G [00:…

MD5 checksumming:   0%|          | 0.00/1.21G [00:00<?, ?B/s]

2023-03-21 21:25:45,705: INFO: Downloaded: {'id': 'e1dae544-5710-44fd-a63b-137e7b11e2f1', 'title': 'S2A_MSIL2A_20220406T074611_N0400_R135_T36MYE_20220406T102936', 'size': 1205894894, 'md5': 'caeb625f88abed6ff193c8e104841f00', 'date': datetime.datetime(2022, 4, 6, 7, 46, 11, 24000), 'footprint': 'POLYGON((34.796693520402904 0,35.78263300407275 0,35.783047841608315 -0.992214719802887,34.79696161901289 -0.992902323890143,34.796693520402904 0))', 'url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('e1dae544-5710-44fd-a63b-137e7b11e2f1')/$value", 'Online': True, 'Creation Date': datetime.datetime(2022, 4, 6, 14, 10, 4, 507000), 'Ingestion Date': datetime.datetime(2022, 4, 6, 14, 9, 37, 658000), 'quicklook_url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('e1dae544-5710-44fd-a63b-137e7b11e2f1')/Products('Quicklook')/$value", 'path': '../36MYE/composite/L2A/S2A_MSIL2A_20220406T074611_N0400_R135_T36MYE_20220406T102936.zip', 'downloaded_bytes': 1205894894}
2023-03-21 21:25:45,7

Downloading products:   0%|          | 0/1 [00:00<?, ?product/s]

LTA retrieval:   0%|          | 0/1 [00:00<?, ?product/s]

Downloading S2B_MSIL2A_20220322T074609_N0400_R135_T36MYE_20220322T103858.zip:   0%|          | 0.00/1.19G [00:…

MD5 checksumming:   0%|          | 0.00/1.19G [00:00<?, ?B/s]

2023-03-21 22:46:08,695: INFO: Downloaded: {'id': '0ca89014-7725-4ff9-b982-644f4d36a3b5', 'title': 'S2B_MSIL2A_20220322T074609_N0400_R135_T36MYE_20220322T103858', 'size': 1190278805, 'md5': 'cbc55006b5b88e6ba4e7d720ccd2df0d', 'date': datetime.datetime(2022, 3, 22, 7, 46, 9, 24000), 'footprint': 'POLYGON((34.796693520402904 0,35.78263300407275 0,35.783047841608315 -0.992214719802887,34.79696161901289 -0.992902323890143,34.796693520402904 0))', 'url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('0ca89014-7725-4ff9-b982-644f4d36a3b5')/$value", 'Online': True, 'Creation Date': datetime.datetime(2022, 3, 22, 13, 29, 4, 598000), 'Ingestion Date': datetime.datetime(2022, 3, 22, 13, 28, 47, 785000), 'quicklook_url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('0ca89014-7725-4ff9-b982-644f4d36a3b5')/Products('Quicklook')/$value", 'path': '../36MYE/composite/L2A/S2B_MSIL2A_20220322T074609_N0400_R135_T36MYE_20220322T103858.zip', 'downloaded_bytes': 1190278805}
2023-03-21 22:46:0

Downloading products:   0%|          | 0/1 [00:00<?, ?product/s]

LTA retrieval:   0%|          | 0/1 [00:00<?, ?product/s]

Downloading S2B_MSIL2A_20220312T074719_N0400_R135_T36MYE_20220312T103427.zip:   0%|          | 0.00/1.18G [00:…

MD5 checksumming:   0%|          | 0.00/1.18G [00:00<?, ?B/s]

2023-03-21 23:36:30,890: INFO: Downloaded: {'id': 'e0848885-2145-4c6c-bca9-4616caf3faa5', 'title': 'S2B_MSIL2A_20220312T074719_N0400_R135_T36MYE_20220312T103427', 'size': 1179804668, 'md5': '08cb4b1df5a8dc73706591fae54f8938', 'date': datetime.datetime(2022, 3, 12, 7, 47, 19, 24000), 'footprint': 'POLYGON((34.796693520402904 0,35.78263300407275 0,35.783047841608315 -0.992214719802887,34.79696161901289 -0.992902323890143,34.796693520402904 0))', 'url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('e0848885-2145-4c6c-bca9-4616caf3faa5')/$value", 'Online': True, 'Creation Date': datetime.datetime(2022, 3, 12, 14, 41, 8, 735000), 'Ingestion Date': datetime.datetime(2022, 3, 12, 14, 40, 40, 178000), 'quicklook_url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('e0848885-2145-4c6c-bca9-4616caf3faa5')/Products('Quicklook')/$value", 'path': '../36MYE/composite/L2A/S2B_MSIL2A_20220312T074719_N0400_R135_T36MYE_20220312T103427.zip', 'downloaded_bytes': 1179804668}
2023-03-21 23:36:

Downloading products:   0%|          | 0/1 [00:00<?, ?product/s]

LTA retrieval:   0%|          | 0/1 [00:00<?, ?product/s]

Downloading S2A_MSIL2A_20220307T074801_N0400_R135_T36MYE_20220307T113320.zip:   0%|          | 0.00/1.21G [00:…

MD5 checksumming:   0%|          | 0.00/1.21G [00:00<?, ?B/s]

2023-03-22 00:06:54,382: INFO: Downloaded: {'id': 'c03afcb1-f64b-4a73-995e-104043223e26', 'title': 'S2A_MSIL2A_20220307T074801_N0400_R135_T36MYE_20220307T113320', 'size': 1209253889, 'md5': 'e01dda802ed43a8914ce103e03c056ce', 'date': datetime.datetime(2022, 3, 7, 7, 48, 1, 24000), 'footprint': 'POLYGON((34.796693520402904 0,35.78263300407275 0,35.783047841608315 -0.992214719802887,34.79696161901289 -0.992902323890143,34.796693520402904 0))', 'url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('c03afcb1-f64b-4a73-995e-104043223e26')/$value", 'Online': True, 'Creation Date': datetime.datetime(2022, 3, 7, 18, 39, 0, 325000), 'Ingestion Date': datetime.datetime(2022, 3, 7, 18, 38, 8, 231000), 'quicklook_url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('c03afcb1-f64b-4a73-995e-104043223e26')/Products('Quicklook')/$value", 'path': '../36MYE/composite/L2A/S2A_MSIL2A_20220307T074801_N0400_R135_T36MYE_20220307T113320.zip', 'downloaded_bytes': 1209253889}
2023-03-22 00:06:54,38

Downloading products:   0%|          | 0/1 [00:00<?, ?product/s]

LTA retrieval:   0%|          | 0/1 [00:00<?, ?product/s]

Downloading S2A_MSIL2A_20220126T075211_N0400_R135_T36MYE_20220126T111035.zip:   0%|          | 0.00/1.21G [00:…

MD5 checksumming:   0%|          | 0.00/1.21G [00:00<?, ?B/s]

2023-03-22 01:17:21,059: INFO: Downloaded: {'id': '6b9534df-22e3-46de-838d-35a57e10414b', 'title': 'S2A_MSIL2A_20220126T075211_N0400_R135_T36MYE_20220126T111035', 'size': 1205325483, 'md5': '5c35452279ff2d60b66a977d9560c59d', 'date': datetime.datetime(2022, 1, 26, 7, 52, 11, 25000), 'footprint': 'POLYGON((34.796693520402904 0,35.78263300407275 0,35.783047841608315 -0.992214719802887,34.79696161901289 -0.992902323890143,34.796693520402904 0))', 'url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('6b9534df-22e3-46de-838d-35a57e10414b')/$value", 'Online': True, 'Creation Date': datetime.datetime(2022, 1, 26, 15, 39, 0, 901000), 'Ingestion Date': datetime.datetime(2022, 1, 26, 15, 38, 30, 881000), 'quicklook_url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('6b9534df-22e3-46de-838d-35a57e10414b')/Products('Quicklook')/$value", 'path': '../36MYE/composite/L2A/S2A_MSIL2A_20220126T075211_N0400_R135_T36MYE_20220126T111035.zip', 'downloaded_bytes': 1205325483}
2023-03-22 01:17:

Downloading products:   0%|          | 0/1 [00:00<?, ?product/s]

LTA retrieval:   0%|          | 0/1 [00:00<?, ?product/s]

Downloading S2B_MSIL2A_20220111T075209_N0301_R135_T36MYE_20220111T102011.zip:   0%|          | 0.00/1.16G [00:…

MD5 checksumming:   0%|          | 0.00/1.16G [00:00<?, ?B/s]

2023-03-22 02:08:32,378: INFO: Downloaded: {'id': '0a6d5568-3a8c-4ad6-950a-37850bb90c67', 'title': 'S2B_MSIL2A_20220111T075209_N0301_R135_T36MYE_20220111T102011', 'size': 1162309499, 'md5': 'b68659f74ef500c3f85f658f5b54bf36', 'date': datetime.datetime(2022, 1, 11, 7, 52, 9, 24000), 'footprint': 'POLYGON((34.796693520402904 0,35.78263300407275 0,35.783047841608315 -0.992214719802887,34.79696161901289 -0.992902323890143,34.796693520402904 0))', 'url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('0a6d5568-3a8c-4ad6-950a-37850bb90c67')/$value", 'Online': True, 'Creation Date': datetime.datetime(2022, 1, 11, 13, 1, 16, 733000), 'Ingestion Date': datetime.datetime(2022, 1, 11, 13, 0, 57, 855000), 'quicklook_url': "https://scihub.copernicus.eu/dhus/odata/v1/Products('0a6d5568-3a8c-4ad6-950a-37850bb90c67')/Products('Quicklook')/$value", 'path': '../36MYE/composite/L2A/S2B_MSIL2A_20220111T075209_N0301_R135_T36MYE_20220111T102011.zip', 'downloaded_bytes': 1162309499}
2023-03-22 02:08:32

### <a id='toc3_3_3_'></a>[Housekeeping](#toc0_)

The cell below performs some housekeeping if we have told `pyeo` to delete or zip imagery. If `do_delete` is set, the downloaded L1C directory is deleted; otherwise, if `do_zip` is set, the L1C images are zipped. This keeps disk usage to a minimum.

In [None]:
if do_delete:
    log.info("---------------------------------------------------------------")
    log.info("Deleting downloaded L1C images for composite, keeping only derived L2A products")
    log.info("---------------------------------------------------------------")
    directory = composite_l1_image_dir
    log.info('Deleting {}'.format(directory))
    shutil.rmtree(directory)
    log.info("---------------------------------------------------------------")
    log.info("Deletion of L1C images complete. Keeping only L2A images.")
    log.info("---------------------------------------------------------------")
elif do_zip:
    log.info("---------------------------------------------------------------")
    log.info("Zipping downloaded L1C images for composite after atmospheric correction")
    log.info("---------------------------------------------------------------")
    zip_contents(composite_l1_image_dir)
    log.info("---------------------------------------------------------------")
    log.info("Zipping complete")
    log.info("---------------------------------------------------------------")
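
The housekeeping cell above relies on `zip_contents`, which is defined earlier in this notebook. As a standalone illustration of the same delete-or-zip decision, here is a minimal sketch using only the standard library. `tidy_directory` is a hypothetical helper, not the notebook's function, and it runs on a throwaway temporary folder so no real imagery is touched.

```python
import shutil
import tempfile
from pathlib import Path


def tidy_directory(directory, do_delete=False, do_zip=False):
    """Delete a directory, or replace each subfolder in it with a .zip archive.

    Deletion takes priority over zipping, mirroring the order of the checks above.
    Returns the list of archives created (empty when deleting or doing nothing).
    """
    directory = Path(directory)
    if do_delete:
        shutil.rmtree(directory)
        return []
    archives = []
    if do_zip:
        # Materialise the listing first, because archives are written into the same folder
        for sub in sorted(p for p in directory.iterdir() if p.is_dir()):
            # make_archive appends ".zip" to the base name it is given
            archive = shutil.make_archive(str(sub), "zip",
                                          root_dir=str(sub.parent), base_dir=sub.name)
            shutil.rmtree(sub)  # remove the unzipped copy once archived
            archives.append(Path(archive))
    return archives


with tempfile.TemporaryDirectory() as tmp:
    safe_dir = Path(tmp) / "L1C" / "example_product.SAFE"
    safe_dir.mkdir(parents=True)
    (safe_dir / "band.txt").write_text("placeholder")
    created = tidy_directory(Path(tmp) / "L1C", do_zip=True)
    zipped_exists = created[0].exists()
    safe_gone = not safe_dir.exists()
    print([p.name for p in created], zipped_exists, safe_gone)
```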

## <a id='toc3_4_'></a>[Process the Downloaded Imagery](#toc0_)

Now that we have downloaded the L2A imagery, we will process it. Processing consists of:  

1. Applying the `SCL Cloud Mask` to remove cloud, haze and cloud shadow pixels from the imagery.
2. Applying a `Processing Baseline Correction Offset` to the imagery, if applicable.
3. Creating `Quicklooks` (*.png*) of the processed imagery.

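### <a id='toc3_4_1_'></a>[Check Timestamps and Unzip L2A Products](#toc0_)

The cell below lists the images that have already been cloud masked, unzips any zipped L2A product whose acquisition timestamp is not among them, and then lists the L2A `.SAFE` directories available for processing.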

In [9]:
directory = composite_l2_masked_image_dir
masked_file_paths = [f for f in os.listdir(directory) if f.endswith(".tif") \
                        and os.path.isfile(os.path.join(directory, f))]

directory = composite_l2_image_dir
l2a_zip_file_paths = [f for f in os.listdir(directory) if f.endswith(".zip")]

for zip_file in l2a_zip_file_paths:
    # check whether the zipped file has already been cloud masked
    zip_timestamp = pyeo.filesystem_utilities.get_image_acquisition_time(os.path.basename(zip_file)).strftime("%Y%m%dT%H%M%S")
    if any(zip_timestamp in masked for masked in masked_file_paths):
        continue
    # extract it if not
    unzip_contents(os.path.join(composite_l2_image_dir, zip_file),
                   ifstartswith="S2", ending=".SAFE")

directory = composite_l2_image_dir
l2a_safe_file_paths = [f for f in os.listdir(directory) if f.endswith(".SAFE") \
                        and os.path.isdir(os.path.join(directory, f))]
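
The timestamp check in the cell above can be illustrated without `pyeo`. The sketch below assumes that the first `YYYYMMDDTHHMMSS` stamp in a product name is its acquisition time; `acquisition_timestamp` and `not_yet_masked` are hypothetical helpers, and `pyeo.filesystem_utilities.get_image_acquisition_time` may work differently internally.

```python
import re
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"


def acquisition_timestamp(name):
    """Return the first YYYYMMDDTHHMMSS stamp in a product name as a string."""
    match = re.search(r"\d{8}T\d{6}", name)
    if match is None:
        raise ValueError(f"No timestamp found in {name}")
    # Round-trip through datetime so that impossible dates are rejected
    return datetime.strptime(match.group(), TIMESTAMP_FORMAT).strftime(TIMESTAMP_FORMAT)


def not_yet_masked(product_names, masked_names):
    """Keep the products whose acquisition stamp appears in no masked file name."""
    return [
        name for name in product_names
        if not any(acquisition_timestamp(name) in masked for masked in masked_names)
    ]


products = [
    "S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456.SAFE",
    "S2B_MSIL2A_20220720T074619_N0400_R135_T36MYE_20220720T103103.SAFE",
]
masked = ["S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456.tif"]
todo = not_yet_masked(products, masked)
print(todo)
```

Matching on the timestamp rather than the full file name means a product is recognised as processed even when its extension or suffix has changed.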



### <a id='toc3_4_2_'></a>[Apply SCL Cloud Mask](#toc0_)

Optical imagery is affected by clouds obscuring the land cover of interest. We therefore use `apply_scl_cloud_mask`, which relies on the Scene Classification Layer (SCL) delivered with each L2A product, to remove cloud, cloud shadow and haze pixels from the imagery.
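
To show the idea behind SCL-based masking, here is a minimal sketch on a tiny hand-made raster. It uses plain Python lists to stay dependency-free (real rasters would be NumPy arrays), assumes a fill value of `0` for masked pixels, and ignores the buffering and resampling options that `apply_scl_cloud_mask` accepts. The masked class codes match the `scl_classes` list passed in the cell further down; `apply_scl_mask` is a hypothetical helper.

```python
# Sen2Cor scene classification (SCL) codes and their usual labels
SCL_LABELS = {
    0: "no data", 1: "saturated or defective", 2: "dark area / cast shadow",
    3: "cloud shadow", 4: "vegetation", 5: "not vegetated", 6: "water",
    7: "unclassified", 8: "cloud, medium probability",
    9: "cloud, high probability", 10: "thin cirrus", 11: "snow or ice",
}

MASKED_CLASSES = {0, 1, 2, 3, 8, 9, 10, 11}  # classes treated as unusable
NODATA = 0  # assumed fill value for masked pixels


def apply_scl_mask(band, scl, masked_classes=MASKED_CLASSES, nodata=NODATA):
    """Return a copy of `band` with pixels in unwanted SCL classes set to nodata.

    `band` and `scl` are equally sized 2D lists (rows of pixel values).
    """
    return [
        [nodata if scl_value in masked_classes else value
         for value, scl_value in zip(band_row, scl_row)]
        for band_row, scl_row in zip(band, scl)
    ]


band = [[1200, 1350, 900],
        [1100, 4000, 950]]
scl = [[4, 4, 3],
       [5, 9, 6]]
masked_band = apply_scl_mask(band, scl)
print(masked_band)  # [[1200, 1350, 0], [1100, 0, 950]]
```

Only the vegetation, not-vegetated, water and unclassified pixels (classes 4 to 7) survive the mask.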

The cell below does two things:

1. Checks whether any L2A SAFE files have been cloud masked in a previous run.

2. If any L2A SAFE files have not been cloud masked, applies `apply_scl_cloud_mask`.

In [10]:
log.info("---------------------------------------------------------------")
log.info("Applying simple cloud, cloud shadow and haze mask based on SCL files and stacking the masked band raster files.")
log.info("---------------------------------------------------------------")

files_for_cloud_masking = []
for safe_dir in l2a_safe_file_paths:
    # check whether the L2A SAFE file has already been cloud masked
    safe_timestamp = pyeo.filesystem_utilities.get_image_acquisition_time(os.path.basename(safe_dir)).strftime("%Y%m%dT%H%M%S")
    if any(safe_timestamp in masked for masked in masked_file_paths):
        continue
    # add it to the list of files to do if it has not been cloud masked yet
    files_for_cloud_masking.append(safe_dir)

if len(files_for_cloud_masking) == 0:
    log.info("No L2A images found for cloud masking. They may already have been done.")
else:
    pyeo.raster_manipulation.apply_scl_cloud_mask(composite_l2_image_dir,
                                                    composite_l2_masked_image_dir,
                                                    scl_classes=[0,1,2,3,8,9,10,11],
                                                    buffer_size=buffer_size_composite,
                                                    bands=bands,
                                                    out_resolution=out_resolution,
                                                    haze=None,
                                                    epsg=epsg,
                                                    skip_existing=skip_existing)

2023-03-24 11:30:47,977: INFO: ---------------------------------------------------------------
2023-03-24 11:30:47,978: INFO: Applying simple cloud, cloud shadow and haze mask based on SCL files and stacking the masked band raster files.
2023-03-24 11:30:47,979: INFO: ---------------------------------------------------------------
2023-03-24 11:30:47,981: INFO:   L2A raster file: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/L2A/S2A_MSIL2A_20230121T075231_N0509_R135_T36MYE_20230121T110651.SAFE
2023-03-24 11:30:47,982: INFO: File pattern S2A_MSIL2A_20230121T075231_N0509_R135_T36MYE and dir pattern  not found in /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked
2023-03-24 11:30:48,388: INFO: Merging band rasters into a single file:
2023-03-24 11:30:57,723: INFO:   /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/L2A/S2A_MSIL2A_20230121T075231_N0509_R135_T36MYE_20230121T110651.SAFE/GRANULE/L2A_T36MYE_A039600_20230121T080456/IMG_DATA/R10m/T36MYE_20230121T075231_B02_

### <a id='toc3_4_3_'></a>[Apply Processing Baseline Offset](#toc0_)

Before Sentinel-2 imagery is provided to the user as L1C or L2A products, the raw imagery (L0) is processed by the ESA Copernicus Ground Segment ([see here](https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/processing-levels)). The version of the processing algorithms, known as the processing baseline, is indicated by the field `N0XXX` in the product title, and the changes introduced by each processing baseline are listed [here](https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-2-msi/processing-baseline).

Processing baseline `N0400` introduced a radiometric offset, recorded in the product metadata as `BOA_ADD_OFFSET = -1000`: the stored pixel values are 1000 higher than in earlier baselines, and this offset has to be added to them to recover comparable values. The reasoning and suggested reading can be viewed [here](https://forum.step.esa.int/t/info-introduction-of-additional-radiometric-offset-in-pb04-00-products/35431). Therefore, to ensure that the spectral reflectance of imagery from before and after `N0400` can be compared, we apply this offset correction to the affected images.
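
As a worked illustration of the offset, the sketch below reads the processing baseline from a product name and shifts the pixel values of `N0400` and later products by `BOA_ADD_OFFSET = -1000`, the value ESA records in the product metadata. This is a simplified assumption-laden example, not `pyeo`'s implementation: `processing_baseline` and `harmonise` are hypothetical helpers, `0` is assumed to mean "no data", and results are clamped at `0`.

```python
import re

BOA_ADD_OFFSET = -1000        # offset recorded in products from baseline 04.00 onwards
FIRST_OFFSET_BASELINE = 400   # N0400


def processing_baseline(name):
    """Extract the processing baseline (the digits after _N) as an integer."""
    match = re.search(r"_N(\d{4})_", name)
    if match is None:
        raise ValueError(f"No processing baseline found in {name}")
    return int(match.group(1))


def harmonise(values, name, nodata=0):
    """Shift pixel values of N0400+ products onto the pre-N0400 scale.

    Pixels equal to `nodata` are left untouched; shifted values are clamped at 0.
    """
    if processing_baseline(name) < FIRST_OFFSET_BASELINE:
        return list(values)
    return [v if v == nodata else max(v + BOA_ADD_OFFSET, 0) for v in values]


new = harmonise([0, 1500, 2300], "S2B_MSIL2A_20220928T074709_N0400_R135_T36MYE_20220928T101456")
old = harmonise([0, 500, 1300], "S2B_MSIL2A_20220111T075209_N0301_R135_T36MYE_20220111T102011")
print(new, old)  # [0, 500, 1300] [0, 500, 1300]
```

After the shift, the same surface yields the same values in both products, which is what makes images from different baselines comparable.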

The cell below applies this offset correction.

In [11]:
# Apply offset to any images of processing baseline 0400 or later in the composite cloud_masked folder
log.info("---------------------------------------------------------------")
log.info("Offsetting cloud masked L2A images for composite.")
log.info("---------------------------------------------------------------")

pyeo.raster_manipulation.apply_processing_baseline_offset_correction_to_tiff_file_directory(composite_l2_masked_image_dir, composite_l2_masked_image_dir)

log.info("---------------------------------------------------------------")
log.info("Offsetting of cloud masked L2A images for composite complete.")
log.info("---------------------------------------------------------------")

2023-03-24 11:49:23,617: INFO: ---------------------------------------------------------------
2023-03-24 11:49:23,619: INFO: Offsetting cloud masked L2A images for composite.
2023-03-24 11:49:23,619: INFO: ---------------------------------------------------------------
2023-03-24 11:49:23,620: INFO: apply_processing_baseline_offset_correction_to_tiff_file_directory() running on: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked
2023-03-24 11:49:23,622: INFO: File: S2B_MSIL2A_20230106T075219_N0509_R135_T36MYE_20230106T101809.tif, Baseline: 0509
2023-03-24 11:49:23,622: INFO: Offsetting file: S2B_MSIL2A_20230106T075219_N0509_R135_T36MYE_20230106T101809.tif
2023-03-24 11:49:23,623: INFO: out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2B_MSIL2A_20230106T075219_N0509_R135_T36MYE_20230106T101809_offset_temp.tif
2023-03-24 11:49:23,672: INFO: in_raster_array dtype: float32
2023-03-24 11:49:23,673: INFO: in_raster_array dtype_max us

apply_processing_baseline_offset_correction_to_tiff_file_directory() running on: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked
File: S2B_MSIL2A_20230106T075219_N0509_R135_T36MYE_20230106T101809.tif, Baseline: 0509
Offsetting file: S2B_MSIL2A_20230106T075219_N0509_R135_T36MYE_20230106T101809.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2B_MSIL2A_20230106T075219_N0509_R135_T36MYE_20230106T101809.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2B_MSIL2A_20230106T075219_N0509_R135_T36MYE_20230106T101809_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:49:40,274: INFO: File: S2A_MSIL2A_20230210T075051_N0509_R135_T36MYE_20230210T110753.tif, Baseline: 0509
2023-03-24 11:49:40,275: INFO: Offsetting file: S2A_MSIL2A_20230210T075051_N0509_R135_T36MYE_20230210T110753.tif
2023-03-24 11:49:40,275: INFO: out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230210T075051_N0509_R135_T36MYE_20230210T110753_offset_temp.tif
2023-03-24 11:49:40,310: INFO: in_raster_array dtype: float32
2023-03-24 11:49:40,311: INFO: in_raster_array dtype_max used: 10000


File: S2A_MSIL2A_20230210T075051_N0509_R135_T36MYE_20230210T110753.tif, Baseline: 0509
Offsetting file: S2A_MSIL2A_20230210T075051_N0509_R135_T36MYE_20230210T110753.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230210T075051_N0509_R135_T36MYE_20230210T110753.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230210T075051_N0509_R135_T36MYE_20230210T110753_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:49:58,250: INFO: File: S2A_MSIL2A_20230101T075331_N0509_R135_T36MYE_20230101T110554.tif, Baseline: 0509
2023-03-24 11:49:58,252: INFO: Offsetting file: S2A_MSIL2A_20230101T075331_N0509_R135_T36MYE_20230101T110554.tif
2023-03-24 11:49:58,253: INFO: out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230101T075331_N0509_R135_T36MYE_20230101T110554_offset_temp.tif
2023-03-24 11:49:58,278: INFO: in_raster_array dtype: float32
2023-03-24 11:49:58,279: INFO: in_raster_array dtype_max used: 10000


File: S2A_MSIL2A_20230101T075331_N0509_R135_T36MYE_20230101T110554.tif, Baseline: 0509
Offsetting file: S2A_MSIL2A_20230101T075331_N0509_R135_T36MYE_20230101T110554.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230101T075331_N0509_R135_T36MYE_20230101T110554.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230101T075331_N0509_R135_T36MYE_20230101T110554_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:50:12,240: INFO: File: S2A_MSIL2A_20230111T075301_N0509_R135_T36MYE_20230111T111505.tif, Baseline: 0509
2023-03-24 11:50:12,241: INFO: Offsetting file: S2A_MSIL2A_20230111T075301_N0509_R135_T36MYE_20230111T111505.tif
2023-03-24 11:50:12,242: INFO: out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230111T075301_N0509_R135_T36MYE_20230111T111505_offset_temp.tif
2023-03-24 11:50:12,260: INFO: in_raster_array dtype: float32
2023-03-24 11:50:12,261: INFO: in_raster_array dtype_max used: 10000


File: S2A_MSIL2A_20230111T075301_N0509_R135_T36MYE_20230111T111505.tif, Baseline: 0509
Offsetting file: S2A_MSIL2A_20230111T075301_N0509_R135_T36MYE_20230111T111505.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230111T075301_N0509_R135_T36MYE_20230111T111505.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230111T075301_N0509_R135_T36MYE_20230111T111505_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:50:33,314: INFO: File: S2A_MSIL2A_20230121T075231_N0509_R135_T36MYE_20230121T110651.tif, Baseline: 0509
2023-03-24 11:50:33,315: INFO: Offsetting file: S2A_MSIL2A_20230121T075231_N0509_R135_T36MYE_20230121T110651.tif
2023-03-24 11:50:33,315: INFO: out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230121T075231_N0509_R135_T36MYE_20230121T110651_offset_temp.tif
2023-03-24 11:50:33,342: INFO: in_raster_array dtype: float32
2023-03-24 11:50:33,343: INFO: in_raster_array dtype_max used: 10000


File: S2A_MSIL2A_20230121T075231_N0509_R135_T36MYE_20230121T110651.tif, Baseline: 0509
Offsetting file: S2A_MSIL2A_20230121T075231_N0509_R135_T36MYE_20230121T110651.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230121T075231_N0509_R135_T36MYE_20230121T110651.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230121T075231_N0509_R135_T36MYE_20230121T110651_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:50:55,110: INFO: File: S2A_MSIL2A_20230220T074941_N0509_R135_T36MYE_20230220T111156.tif, Baseline: 0509
2023-03-24 11:50:55,111: INFO: Offsetting file: S2A_MSIL2A_20230220T074941_N0509_R135_T36MYE_20230220T111156.tif
2023-03-24 11:50:55,112: INFO: out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230220T074941_N0509_R135_T36MYE_20230220T111156_offset_temp.tif
2023-03-24 11:50:55,141: INFO: in_raster_array dtype: float32
2023-03-24 11:50:55,142: INFO: in_raster_array dtype_max used: 10000


File: S2A_MSIL2A_20230220T074941_N0509_R135_T36MYE_20230220T111156.tif, Baseline: 0509
Offsetting file: S2A_MSIL2A_20230220T074941_N0509_R135_T36MYE_20230220T111156.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230220T074941_N0509_R135_T36MYE_20230220T111156.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2A_MSIL2A_20230220T074941_N0509_R135_T36MYE_20230220T111156_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:51:12,133: INFO: File: S2B_MSIL2A_20230126T075109_N0509_R135_T36MYE_20230126T101405.tif, Baseline: 0509
2023-03-24 11:51:12,134: INFO: Offsetting file: S2B_MSIL2A_20230126T075109_N0509_R135_T36MYE_20230126T101405.tif
2023-03-24 11:51:12,135: INFO: out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2B_MSIL2A_20230126T075109_N0509_R135_T36MYE_20230126T101405_offset_temp.tif
2023-03-24 11:51:12,172: INFO: in_raster_array dtype: float32
2023-03-24 11:51:12,172: INFO: in_raster_array dtype_max used: 10000


File: S2B_MSIL2A_20230126T075109_N0509_R135_T36MYE_20230126T101405.tif, Baseline: 0509
Offsetting file: S2B_MSIL2A_20230126T075109_N0509_R135_T36MYE_20230126T101405.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2B_MSIL2A_20230126T075109_N0509_R135_T36MYE_20230126T101405.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2B_MSIL2A_20230126T075109_N0509_R135_T36MYE_20230126T101405_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:51:26,290: INFO: File: S2B_MSIL2A_20230116T075149_N0509_R135_T36MYE_20230116T101517.tif, Baseline: 0509
2023-03-24 11:51:26,291: INFO: Offsetting file: S2B_MSIL2A_20230116T075149_N0509_R135_T36MYE_20230116T101517.tif
2023-03-24 11:51:26,292: INFO: out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2B_MSIL2A_20230116T075149_N0509_R135_T36MYE_20230116T101517_offset_temp.tif
2023-03-24 11:51:26,382: INFO: in_raster_array dtype: float32
2023-03-24 11:51:26,383: INFO: in_raster_array dtype_max used: 10000


File: S2B_MSIL2A_20230116T075149_N0509_R135_T36MYE_20230116T101517.tif, Baseline: 0509
Offsetting file: S2B_MSIL2A_20230116T075149_N0509_R135_T36MYE_20230116T101517.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2B_MSIL2A_20230116T075149_N0509_R135_T36MYE_20230116T101517.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked/S2B_MSIL2A_20230116T075149_N0509_R135_T36MYE_20230116T101517_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:51:47,484: INFO: Offsetting Finished
2023-03-24 11:51:47,486: INFO: ---------------------------------------------------------------
2023-03-24 11:51:47,486: INFO: Offsetting of cloud masked L2A images for composite complete.
2023-03-24 11:51:47,487: INFO: ---------------------------------------------------------------


Offsetting Finished


### <a id='toc3_4_4_'></a>[Create Quicklooks of Cloud-Masked Images](#toc0_)

We can also create quicklooks of the Cloud-Masked images. These are especially useful for viewing the images quickly using a standard photo viewer, and for use in presentations.
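
To make the `scale_factors=[[0,2000,0,255]]` argument in the next cell more concrete, here is a minimal NumPy sketch of a linear stretch that maps values from the range 0 to 2000 onto the 8-bit range 0 to 255. It assumes the four numbers mean source minimum, source maximum, destination minimum and destination maximum. The helper `linear_stretch` is hypothetical and only illustrates the idea; it is not how `pyeo` produces the quicklook.

```python
import numpy as np

def linear_stretch(values, src_min=0, src_max=2000, dst_min=0, dst_max=255):
    """Linearly rescale values from [src_min, src_max] to [dst_min, dst_max].

    Values outside the source range are clipped to the destination range.
    """
    values = np.asarray(values, dtype=np.float64)
    scaled = (values - src_min) / (src_max - src_min) * (dst_max - dst_min) + dst_min
    # astype(np.uint8) truncates the fractional part, e.g. 63.75 becomes 63
    return np.clip(scaled, dst_min, dst_max).astype(np.uint8)

# Made-up pixel values: anything at or above 2000 saturates at 255
pixel_values = [0, 500, 1000, 2000, 4000]
print(linear_stretch(pixel_values).tolist())  # [0, 63, 127, 255, 255]
```

With this stretch, bright surfaces such as clouds saturate to white, while darker land surfaces keep most of the available contrast.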

In [12]:
if do_quicklooks or do_all:
    log.info("---------------------------------------------------------------")
    log.info("Producing quicklooks.")
    log.info("---------------------------------------------------------------")
    dirs_for_quicklooks = [composite_l2_masked_image_dir]
    for main_dir in dirs_for_quicklooks:
        files = [ f.path for f in os.scandir(main_dir) if f.is_file() and os.path.basename(f).endswith(".tif") ]
        #files = [ f.path for f in os.scandir(main_dir) if f.is_file() and os.path.basename(f).endswith(".tif") and "class" in os.path.basename(f) ] # do classification images only
        if len(files) == 0:
            log.warning("No images found in {}.".format(main_dir))
        else:
            for f in files:
                quicklook_path = os.path.join(quicklook_dir, os.path.basename(f).split(".")[0]+".png")
                log.info("Creating quicklook: {}".format(quicklook_path))
                pyeo.raster_manipulation.create_quicklook(f,
                                                            quicklook_path,
                                                            width=512,
                                                            height=512,
                                                            format="PNG",
                                                            bands=[3,2,1],
                                                            scale_factors=[[0,2000,0,255]]
                                                            )
    log.info("Quicklooks complete.")

if do_zip:
    log.info("---------------------------------------------------------------")
    log.info("Zipping downloaded L2A images for composite after cloud masking and band stacking")
    log.info("---------------------------------------------------------------")
    zip_contents(composite_l2_image_dir)
    log.info("---------------------------------------------------------------")
    log.info("Zipping complete")
    log.info("---------------------------------------------------------------")


2023-03-24 11:51:47,552: INFO: ---------------------------------------------------------------
2023-03-24 11:51:47,553: INFO: Producing quicklooks.
2023-03-24 11:51:47,553: INFO: ---------------------------------------------------------------
2023-03-24 11:51:47,555: INFO: Creating quicklook: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/quicklooks/S2A_MSIL2A_20230210T075051_NA509_R135_T36MYE_20230210T110753.png
2023-03-24 11:51:54,952: INFO: Creating quicklook: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/quicklooks/S2B_MSIL2A_20230106T075219_NA509_R135_T36MYE_20230106T101809.png
2023-03-24 11:52:01,540: INFO: Creating quicklook: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/quicklooks/S2A_MSIL2A_20230111T075301_NA509_R135_T36MYE_20230111T111505.png
2023-03-24 11:52:07,259: INFO: Creating quicklook: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/quicklooks/S2A_MSIL2A_20230121T075231_NA509_R135_T36MYE_20230121T110651.png
2023-03-24 11:52:13,031: INFO: Creating quicklook: /d

## <a id='toc3_5_'></a>[Create Composite from our Imagery](#toc0_)

Now we come to the final stage of this session. So far, we have queried the Copernicus archive for Sentinel-2 images that matched our search criteria and checked which L2A products were already present in the archive, to avoid unnecessary L1C to L2A conversion by `pyeo`. We then downloaded the resulting imagery and applied a cloud mask and, where necessary, a processing baseline offset correction. We can now build a median composite from the cloud-masked images.
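
Before running the compositing cell, it may help to see what a per-pixel median composite does. The sketch below works on a tiny stack of single-band arrays with NumPy and treats 0 as the missing-data value, mirroring `missing_data_value=0` in the call that follows. The helper `median_composite` and the pixel values are hypothetical; this illustrates the principle and is not the implementation of `clever_composite_directory`.

```python
import warnings

import numpy as np

def median_composite(stack, missing_data_value=0):
    """Per-pixel median along axis 0, ignoring pixels equal to missing_data_value."""
    data = np.asarray(stack, dtype=np.float64)
    # Replace missing data (e.g. cloud-masked pixels) with NaN so nanmedian skips them
    data = np.where(data == missing_data_value, np.nan, data)
    with warnings.catch_warnings():
        # nanmedian warns when a pixel is missing in every image; we handle that below
        warnings.simplefilter("ignore", category=RuntimeWarning)
        composite = np.nanmedian(data, axis=0)
    # Pixels with no valid observation at all fall back to the missing-data value
    return np.nan_to_num(composite, nan=missing_data_value)

# Three made-up 2 x 2 "images"; 0 marks cloud-masked pixels
stack = [
    [[100, 0], [300, 0]],
    [[200, 400], [0, 0]],
    [[600, 500], [500, 0]],
]
print(median_composite(stack).tolist())  # [[200.0, 450.0], [400.0, 0.0]]
```

A pixel that is cloud-free in at least one image receives a value, and only pixels masked in every image stay empty. This is why more input images tend to give a more complete composite.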

In [13]:
log.info("---------------------------------------------------------------")
log.info("Building initial cloud-free median composite from directory {}".format(composite_l2_masked_image_dir))
log.info("---------------------------------------------------------------")
directory = composite_l2_masked_image_dir
masked_file_paths = [f for f in os.listdir(directory) if f.endswith(".tif") \
                        and os.path.isfile(os.path.join(directory, f))]

if len(masked_file_paths) > 0:
    pyeo.raster_manipulation.clever_composite_directory(composite_l2_masked_image_dir,
                                                        composite_dir,
                                                        chunks=chunks,
                                                        generate_date_images=True,
                                                        missing_data_value=0)
    log.info("---------------------------------------------------------------")
    log.info("Baseline composite complete.")
    log.info("---------------------------------------------------------------")
else:
    log.error("No cloud-masked L2A image products found in {}.".format(composite_l2_masked_image_dir))
    log.error("Cannot produce a median composite. Download and cloud-mask some images first.")

2023-03-24 11:52:37,952: INFO: ---------------------------------------------------------------
2023-03-24 11:52:37,953: INFO: Building initial cloud-free median composite from directory /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked
2023-03-24 11:52:37,954: INFO: ---------------------------------------------------------------
2023-03-24 11:52:37,956: INFO: Cleverly compositing all images in directory into a median composite: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/cloud_masked
2023-03-24 11:52:37,958: INFO: Image number 1 has time stamp 20230101T075331
2023-03-24 11:52:37,958: INFO: Image number 2 has time stamp 20230106T075219
2023-03-24 11:52:37,959: INFO: Image number 3 has time stamp 20230111T075301
2023-03-24 11:52:37,959: INFO: Image number 4 has time stamp 20230116T075149
2023-03-24 11:52:37,960: INFO: Image number 5 has time stamp 20230121T075231
2023-03-24 11:52:37,960: INFO: Image number 6 has time stamp 20230126T075109
2023-03-24 11:52:37,961: I

### <a id='toc3_5_1_'></a>[Create Quicklook of the Composite](#toc0_)

In [14]:
if do_quicklooks or do_all:
    log.info("---------------------------------------------------------------")
    log.info("Producing quicklooks.")
    log.info("---------------------------------------------------------------")
    dirs_for_quicklooks = [composite_dir]
    for main_dir in dirs_for_quicklooks:
        files = [ f.path for f in os.scandir(main_dir) if f.is_file() and os.path.basename(f).endswith(".tif") ]
        #files = [ f.path for f in os.scandir(main_dir) if f.is_file() and os.path.basename(f).endswith(".tif") and "class" in os.path.basename(f) ] # do classification images only
        if len(files) == 0:
            log.warning("No images found in {}.".format(main_dir))
        else:
            for f in files:
                quicklook_path = os.path.join(quicklook_dir, os.path.basename(f).split(".")[0]+".png")
                log.info("Creating quicklook: {}".format(quicklook_path))
                pyeo.raster_manipulation.create_quicklook(f,
                                                            quicklook_path,
                                                            width=512,
                                                            height=512,
                                                            format="PNG",
                                                            bands=[3,2,1],
                                                            scale_factors=[[0,2000,0,255]]
                                                            )
    log.info("Quicklooks complete.")

2023-03-24 12:23:06,634: INFO: ---------------------------------------------------------------
2023-03-24 12:23:06,636: INFO: Producing quicklooks.
2023-03-24 12:23:06,637: INFO: ---------------------------------------------------------------
2023-03-24 12:23:06,647: INFO: Creating quicklook: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/quicklooks/composite_T36MYE_20230220T074941.png
2023-03-24 12:23:15,476: INFO: Quicklooks complete.


### <a id='toc3_5_2_'></a>[Final Housekeeping](#toc0_)

Now that we have created our composite and produced any quicklooks, we tell `pyeo` to delete or compress the cloud-masked L2A images that the composite was derived from.
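
Because `do_delete` removes the cloud-masked directory with `shutil.rmtree`, which cannot be undone, it can be worth checking how much data is about to be removed first. The following is a minimal standard-library sketch of such a check. The helper `directory_size_mb` is hypothetical and not part of `pyeo`; the demonstration uses a temporary directory rather than real imagery.

```python
import os
import tempfile

def directory_size_mb(directory):
    """Return the total size of all files under directory, in megabytes (1 MB = 1e6 bytes)."""
    total_bytes = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes / 1e6

# Demonstration on a temporary directory holding one 2,000,000-byte file
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.tif"), "wb") as fh:
        fh.write(b"\0" * 2_000_000)
    print("{:.1f} MB".format(directory_size_mb(tmp)))  # 2.0 MB
```

In the notebook, one could call `directory_size_mb(composite_l2_masked_image_dir)` before the cell below and log the result.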

In [None]:
if do_delete:
    log.info("---------------------------------------------------------------")
    log.info("Deleting intermediate cloud-masked L2A images used for the baseline composite")
    log.info("---------------------------------------------------------------")
    f = composite_l2_masked_image_dir
    log.info('Deleting {}'.format(f))
    shutil.rmtree(f)
    log.info("---------------------------------------------------------------")
    log.info("Intermediate file products have been deleted.")
    log.info("They can be reprocessed from the downloaded L2A images.")
    log.info("---------------------------------------------------------------")
else:
    if do_zip:
        log.info("---------------------------------------------------------------")
        log.info("Zipping cloud-masked L2A images used for the baseline composite")
        log.info("---------------------------------------------------------------")
        zip_contents(composite_l2_masked_image_dir)
        log.info("---------------------------------------------------------------")
        log.info("Zipping complete")
        log.info("---------------------------------------------------------------")

log.info("---------------------------------------------------------------")
log.info("Compressing tiff files in directory {} and all subdirectories".format(composite_dir))
log.info("---------------------------------------------------------------")
for root, dirs, files in os.walk(composite_dir):
    all_tiffs = [image_name for image_name in files if image_name.endswith(".tif")]
    for this_tiff in all_tiffs:
        pyeo.raster_manipulation.compress_tiff(os.path.join(root, this_tiff), os.path.join(root, this_tiff))

log.info("---------------------------------------------------------------")
log.info("Baseline image composite, file compression, zipping and deletion of")
log.info("intermediate file products (if selected) are complete.")
log.info("---------------------------------------------------------------")

# <a id='toc4_'></a>[Session 3: Automatic Change Detection](#toc0_)

## <a id='toc4_1_'></a>[Import Libraries](#toc0_)

In [5]:
import shutil
import sys

import pyeo.classification
import pyeo.queries_and_downloads
import pyeo.raster_manipulation
import pyeo.filesystem_utilities

# check if pyeo_production is installed, as these two are production specific functions
from pyeo.filesystem_utilities import serial_date_to_string
from pyeo.raster_manipulation import apply_processing_baseline_offset_correction_to_tiff_file_directory

import configparser
import argparse
import json
import numpy as np
import os
from osgeo import gdal
import pandas as pd
import datetime as dt
import zipfile

gdal.UseExceptions()

## <a id='toc4_2_'></a>[Re-declarations](#toc0_)

### <a id='toc4_2_1_'></a>[Re-Declare Arguments for Change Detection](#toc0_)

In [2]:
# config_path needs to be an absolute path, i.e. not relative
config_path = "/data/clcr/shared/IMPRESS/matt/pyeo/kenya.ini"
chunks = 10
download_source = "scihub"
tile_id = "36MYE"
do_dev = True
do_quicklooks = True

# change these parameters to enable Automatic Change Detection
do_download = True
do_classify = True
do_change = True
arg_start_date = "20230301"
arg_end_date = "TODAY"


# the variables below are not needed for this session, but must still be declared (as False)
build_composite = False
build_prob_image = False
do_update = False
do_delete = False
do_all = False
do_zip = False
skip_existing = False

In [6]:
# the path root of where the user wants to store the imagery, does not need to exist
# root_dir needs to be an absolute path, i.e. not relative
root_dir = "/data/clcr/shared/IMPRESS/matt/pyeo"

# creates a path which is a combination of root_dir and the tile chosen in the cell above
tile_root_dir = os.path.join(root_dir, tile_id)

# start and end date for the change imagery
# note: both are overridden by arg_start_date and arg_end_date (declared above) if those are supplied
start_date = "20230301"
end_date = "20230331"

# start and end date for the composite
composite_start_date = "20230101"
composite_end_date = "20230228"

# maximum cloud cover (%)
cloud_cover = 25
cloud_certainty_threshold = 0

# the projection the user wants to work with
epsg = 21097

# the Sentinel-2 bands to use, currently restricted to B02, B03, B04 and B08
bands = ["B02", "B03", "B04", "B08"]

# file name pattern to search for when identifying band file locations in "" string notation
resolution = "10m"

# spatial resolution of the output raster files in metres. Can be any resolution, not just 10, 20 or 60 as in the default band resolutions of Sentinel-2
out_resolution = 10

# set buffer in number of pixels for dilating the SCL cloud mask (recommend 30 pixels of 10 m) for the change detection
buffer_size = 20

# set buffer in number of pixels for dilating the SCL cloud mask (recommend 10 pixels of 10 m) for the composite building
buffer_size_composite = 10

# maximum number of images to be downloaded for compositing, in order of least cloud cover
max_image_number = 12

# granules below this size in MB will not be downloaded, this prevents slivers of imagery being downloaded
faulty_granule_threshold = 200

# the path to a machine learning model
model_path = "../models/model_36MYE_37MER_37NCC_Unoptimised_20230313.pkl"

# path to sen2cor, for converting L1C to L2A
sen2cor_path = "../Sen2Cor-02.11.00-Linux64/bin/L2A_Process"

class_labels = ["primary forest", "plantation forest", "bare soil", "crops", "grassland", "open water", "burn scar", "cloud", "cloud shadow", "haze", "sparse woodland", "dense woodland", "artificial"]
from_classes = [1,11,12]
to_classes = [3,4,5,13]

# whether to filter out (sieve) pixel noise from the classification
sieve = 20

In [7]:
credentials_path = "../credentials/credentials.ini"
conf = configparser.ConfigParser(allow_no_value=True)
conf.read(credentials_path)

sen_user = conf['sent_2']['user']
sen_pass = conf['sent_2']['pass']

In [8]:
if arg_start_date == "LATEST":
    # note: probability_image_dir is declared in a later cell, so run that cell first when using "LATEST"
    report_file_name = [f for f in os.listdir(probability_image_dir) if os.path.isfile(os.path.join(probability_image_dir, f)) and f.startswith("report_") and f.endswith(".tif")][0]
    report_file_path = os.path.join(probability_image_dir, report_file_name)
    after_timestamp  = pyeo.filesystem_utilities.get_change_detection_dates(os.path.basename(report_file_path))[-1]
    start_date = after_timestamp.strftime("%Y%m%d") # the yyyymmdd string of the acquisition date from which the latest classified image was derived
elif arg_start_date:
    start_date = arg_start_date

if arg_end_date == "TODAY":
    end_date = dt.date.today().strftime("%Y%m%d")
elif arg_end_date:
    end_date = arg_end_date
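
The date handling in the cell above can be summarised as: use today's date for `"TODAY"`, otherwise use the supplied argument, otherwise keep the in-notebook default. Below is a minimal sketch of that rule as a standalone function, with an added check that the result is a valid `yyyymmdd` string. The helper `resolve_date` and its `today` parameter are hypothetical and are not part of `pyeo`.

```python
import datetime as dt

def resolve_date(arg_value, fallback, today=None):
    """Return a yyyymmdd string.

    "TODAY" resolves to today's date; any other non-empty arg_value is used
    as given; an empty or None arg_value falls back to `fallback`.
    """
    if arg_value == "TODAY":
        today = today or dt.date.today()
        return today.strftime("%Y%m%d")
    chosen = arg_value if arg_value else fallback
    # strptime raises ValueError if the string is not a valid yyyymmdd date
    dt.datetime.strptime(chosen, "%Y%m%d")
    return chosen

# The `today` argument is fixed here only to make the example reproducible
print(resolve_date("TODAY", "20230331", today=dt.date(2023, 3, 24)))  # 20230324
print(resolve_date("20230301", "20230101"))                           # 20230301
print(resolve_date(None, "20230331"))                                 # 20230331
```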

In [9]:
log = pyeo.filesystem_utilities.init_log(os.path.join(tile_root_dir, "log", tile_id+"_log.txt"))
log.info("---------------------------------------------------------------")
log.info("---   PROCESSING START: {}   ---".format(tile_root_dir))
log.info("---------------------------------------------------------------")

2023-03-24 13:30:51,772: INFO: ****PROCESSING START****
2023-03-24 13:30:51,775: INFO: ---------------------------------------------------------------
2023-03-24 13:30:51,775: INFO: ---   PROCESSING START: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE   ---
2023-03-24 13:30:51,776: INFO: ---------------------------------------------------------------


Re-declare the directory variables. If you are running the Jupyter notebook on a new kernel (for example, because you have shut down your computer or put it to sleep since starting the notebook), these variables need to be declared again.

In [10]:
log.info("\nCreating the directory paths variables")

change_image_dir = os.path.join(tile_root_dir, r"images")
l1_image_dir = os.path.join(tile_root_dir, r"images/L1C")
l2_image_dir = os.path.join(tile_root_dir, r"images/L2A")
l2_masked_image_dir = os.path.join(tile_root_dir, r"images/cloud_masked")
categorised_image_dir = os.path.join(tile_root_dir, r"output/classified")
probability_image_dir = os.path.join(tile_root_dir, r"output/probabilities")
sieved_image_dir = os.path.join(tile_root_dir, r"output/sieved")
composite_dir = os.path.join(tile_root_dir, r"composite")
composite_l1_image_dir = os.path.join(tile_root_dir, r"composite/L1C")
composite_l2_image_dir = os.path.join(tile_root_dir, r"composite/L2A")
composite_l2_masked_image_dir = os.path.join(tile_root_dir, r"composite/cloud_masked")
quicklook_dir = os.path.join(tile_root_dir, r"output/quicklooks")


2023-03-24 13:30:53,285: INFO: 
Creating the directory paths variables


In [11]:
pyeo.filesystem_utilities.create_folder_structure_for_tiles(tile_root_dir)

### <a id='toc4_2_2_'></a>[Compression Functions](#toc0_)

In [22]:
def zip_contents(directory, notstartswith=None):
        paths = [f for f in os.listdir(directory) if not f.endswith(".zip")]
        for f in paths:
            do_it = True
            if notstartswith is not None:
                for i in notstartswith:
                    if f.startswith(i):
                        do_it = False
                        log.info('Skipping file that starts with \'{}\':   {}'.format(i,f))
            if do_it:
                file_to_zip = os.path.join(directory, f)
                zipped_file = file_to_zip.split(".")[0]
                log.info('Zipping   {}'.format(file_to_zip))
                if os.path.isdir(file_to_zip):
                    shutil.make_archive(zipped_file, 'zip', file_to_zip)
                else:
                    with zipfile.ZipFile(zipped_file+".zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
                        zf.write(file_to_zip, os.path.basename(file_to_zip))
                if (os.path.exists(zipped_file+".zip")):
                    if os.path.isdir(file_to_zip):
                        shutil.rmtree(file_to_zip)
                    else:
                        os.remove(file_to_zip)
                else:
                    log.error("Zipping failed: {}".format(zipped_file+".zip"))
        return

def unzip_contents(zippath, ifstartswith=None, ending=None):
    dirpath = zippath[:-4] # cut away the  .zip ending
    if ifstartswith is not None and ending is not None:
        if dirpath.startswith(ifstartswith):
            dirpath = dirpath + ending
    log.info("Unzipping {}".format(zippath))
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)
    if os.path.exists(dirpath):
        if os.path.exists(zippath):
            shutil.unpack_archive(
                filename=zippath,
                extract_dir=dirpath,
                format='zip'
                )
            os.remove(zippath)
    else:
        log.error("Unzipping failed")
    return

## <a id='toc4_3_'></a>[Query Sentinel-2 Change Imagery](#toc0_)

In [13]:
# ------------------------------------------------------------------------
# Step 2: Download change detection images for the specific time window (L2A where available plus additional L1C)
# ------------------------------------------------------------------------
if do_all or do_download:
    log.info("---------------------------------------------------------------")
    log.info("Downloading change detection images between {} and {} with cloud cover <= {}".format(start_date, end_date, cloud_cover))
    log.info("---------------------------------------------------------------")

    products_all = pyeo.queries_and_downloads.check_for_s2_data_by_date(root_dir,
                                                                        start_date,
                                                                        end_date,
                                                                        conf,
                                                                        cloud_cover=cloud_cover,
                                                                        tile_id=tile_id,
                                                                        producttype=None #"S2MSI2A" or "S2MSI1C"
                                                                        )
    log.info("--> Found {} L1C and L2A products for change detection:".format(len(products_all)))
    df_all = pd.DataFrame.from_dict(products_all, orient='index')

    # check granule sizes on the server
    df_all['size'] = df_all['size'].str.split(' ').apply(lambda x: float(x[0]) * {'GB': 1e3, 'MB': 1, 'KB': 1e-3}[x[1]])
    df = df_all.query('size >= '+str(faulty_granule_threshold))
    log.info("Removed {} faulty scenes <{}MB in size from the list:".format(len(df_all)-len(df), faulty_granule_threshold))
    df_faulty = df_all.query('size < '+str(faulty_granule_threshold))
    for r in range(len(df_faulty)):
        log.info("   {} MB: {}".format(df_faulty.iloc[r,:]['size'], df_faulty.iloc[r,:]['title']))

    l1c_products = df[df.processinglevel == 'Level-1C']
    l2a_products = df[df.processinglevel == 'Level-2A']
    log.info("    {} L1C products".format(l1c_products.shape[0]))
    log.info("    {} L2A products".format(l2a_products.shape[0]))

    if l1c_products.shape[0]>0 and l2a_products.shape[0]>0:
        log.info("Filtering out L1C products that have the same 'beginposition' time stamp as an existing L2A product.")
        l1c_products, l2a_products = pyeo.queries_and_downloads.filter_unique_l1c_and_l2a_data(df)
        log.info("--> {} L1C and L2A products with unique 'beginposition' time stamp for change detection:".format(l1c_products.shape[0]+l2a_products.shape[0]))
        log.info("    {} L1C products".format(l1c_products.shape[0]))
        log.info("    {} L2A products".format(l2a_products.shape[0]))
    df = None

    #TODO: Before the next step, search the composite/L2A and L1C directories whether the scenes have already been downloaded and/or processed and check their dir sizes
    # Remove those already obtained from the list

    if l1c_products.shape[0] > 0:
        log.info("Checking for availability of L2A products to minimise download and atmospheric correction of L1C products.")
        n = len(l1c_products)
        drop=[]
        add=[]
        for r in range(n):
            id = l1c_products.iloc[r,:]['title']
            search_term = "*"+id.split("_")[2]+"_"+id.split("_")[3]+"_"+id.split("_")[4]+"_"+id.split("_")[5]+"*"
            log.info("Search term: {}.".format(search_term))
            matching_l2a_products = pyeo.queries_and_downloads._file_api_query(user=sen_user,
                                                                                passwd=sen_pass,
                                                                                start_date=start_date,
                                                                                end_date=end_date,
                                                                                filename=search_term,
                                                                                cloud=cloud_cover,
                                                                                producttype="S2MSI2A"
                                                                                )

            matching_l2a_products_df = pd.DataFrame.from_dict(matching_l2a_products, orient='index')
            if len(matching_l2a_products_df) == 1:
                log.info(matching_l2a_products_df.iloc[0,:]['size'])
                matching_l2a_products_df['size'] = matching_l2a_products_df['size'].str.split(' ').apply(lambda x: float(x[0]) * {'GB': 1e3, 'MB': 1, 'KB': 1e-3}[x[1]])
                if matching_l2a_products_df.iloc[0,:]['size'] > faulty_granule_threshold:
                    log.info("Replacing L1C {} with L2A product:".format(id))
                    log.info("              {}".format(matching_l2a_products_df.iloc[0,:]['title']))
                    drop.append(l1c_products.index[r])
                    add.append(matching_l2a_products_df.iloc[0,:])
            if len(matching_l2a_products_df) == 0:
                log.info("Found no match for L1C: {}.".format(id))
            if len(matching_l2a_products_df) > 1:
                # check granule sizes on the server
                matching_l2a_products_df['size'] = matching_l2a_products_df['size'].str.split(' ').apply(lambda x: float(x[0]) * {'GB': 1e3, 'MB': 1, 'KB': 1e-3}[x[1]])
                if matching_l2a_products_df.iloc[0,:]['size'] > faulty_granule_threshold:
                    log.info("Replacing L1C {} with L2A product:".format(id))
                    log.info("              {}".format(matching_l2a_products_df.iloc[0,:]['title']))
                    drop.append(l1c_products.index[r])
                    add.append(matching_l2a_products_df.iloc[0,:])

        if len(drop) > 0:
            l1c_products = l1c_products.drop(index=drop)
        if len(add) > 0:
            # DataFrame.append is deprecated (and was removed in pandas 2.0), so always use pandas.concat
            add = pd.DataFrame(add)
            log.info("Types for concatenation: {}, {}".format(type(l2a_products), type(add)))
            l2a_products = pd.concat([l2a_products, add])

        log.info("    {} L1C products remaining for download".format(l1c_products.shape[0]))
        l2a_products = l2a_products.drop_duplicates(subset='title')
        #I.R.
        log.info("    {} L2A products remaining for download".format(l2a_products.shape[0]))

2023-03-24 11:02:54,398: INFO: ---------------------------------------------------------------
2023-03-24 11:02:54,399: INFO: Downloading change detection images between 20230301 and 20230324 with cloud cover <= 25
2023-03-24 11:02:54,400: INFO: ---------------------------------------------------------------
2023-03-24 11:02:54,402: INFO: Sending Sentinel-2 query for Tile ID:
2023-03-24 11:02:54,402: INFO:    tile_id: 36MYE
2023-03-24 11:02:54,403: INFO:    start_date: 20230301
2023-03-24 11:02:54,403: INFO:    end_date: 20230324
2023-03-24 11:02:54,404: INFO:    cloud_cover: 25
2023-03-24 11:02:54,405: INFO:    product_type: None
2023-03-24 11:02:54,405: INFO:    file_name: None
2023-03-24 11:02:54,909: INFO: --> Found 3 L1C and L2A products for change detection:
2023-03-24 11:02:54,924: INFO: Removed 0 faulty scenes <200MB in size from the list:
2023-03-24 11:02:54,929: INFO:     3 L1C products
2023-03-24 11:02:54,930: INFO:     0 L2A products
2023-03-24 11:02:54,930: INFO: Checking 
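
The query cell above converts the catalogue's `size` strings (for example `"1.05 GB"`) to megabytes before comparing them with `faulty_granule_threshold`. A minimal sketch of that conversion is shown below. The helper names `size_string_to_mb` and `is_probably_complete` are hypothetical and not part of `pyeo`; the 200 MB default is taken from the log message above.

```python
# Hypothetical helpers (not part of pyeo): convert a size string to megabytes
# and compare it against a faulty-granule threshold.
UNIT_TO_MB = {"GB": 1e3, "MB": 1.0, "KB": 1e-3}


def size_string_to_mb(size_string):
    """Return the size in MB for strings such as '812.4 MB' or '1.05 GB'."""
    value, unit = size_string.split()
    if unit not in UNIT_TO_MB:
        raise ValueError("Unknown size unit: {}".format(unit))
    return float(value) * UNIT_TO_MB[unit]


def is_probably_complete(size_string, threshold_mb=200):
    """True if the reported granule size is above the threshold (in MB)."""
    return size_string_to_mb(size_string) > threshold_mb


print(size_string_to_mb("1.05 GB"))
print(is_probably_complete("35.2 MB"))
```

Raising an error for an unknown unit makes an unexpected catalogue format visible, rather than failing with a bare `KeyError` inside a `lambda`.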

## <a id='toc4_4_'></a>[Download and Pre-Process L1C Change Imagery](#toc0_)

If there are any L1C products in the change-image search results that are not matched with L2A products, then download these L1Cs and apply `atmospheric_correction` to them.

In [14]:
if l1c_products.shape[0] > 0:
    log.info("Downloading Sentinel-2 L1C products.")
    pyeo.queries_and_downloads.download_s2_data_from_df(l1c_products,
                                                l1_image_dir,
                                                l2_image_dir,
                                                download_source,
                                                user=sen_user,
                                                passwd=sen_pass,
                                                try_scihub_on_fail=True)
    log.info("Atmospheric correction with sen2cor.")
    pyeo.raster_manipulation.atmospheric_correction(l1_image_dir,
                                                    l2_image_dir,
                                                    sen2cor_path,
                                                    delete_unprocessed_image=False)

2023-03-24 11:02:59,122: INFO: Downloading Sentinel-2 L1C products.
2023-03-24 11:02:59,124: INFO:   af89bf68-1c5c-4a6b-b57c-2564372ea673   S2A_MSIL1C_20230302T074831_N0509_R135_T36MYE_20230302T093441
2023-03-24 11:02:59,126: INFO: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L1C/S2A_MSIL1C_20230302T074831_N0509_R135_T36MYE_20230302T093441.SAFE does not exist.
2023-03-24 11:02:59,126: INFO: Downloading S2A_MSIL1C_20230302T074831_N0509_R135_T36MYE_20230302T093441 from scihub to /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L1C
2023-03-24 11:02:59,297: INFO: Product af89bf68-1c5c-4a6b-b57c-2564372ea673 is online. Starting download.
2023-03-24 11:02:59,298: INFO: I.R. Tenacity will retry download up to 5 times.
2023-03-24 11:02:59,988: INFO: Unzipping /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L1C/S2A_MSIL1C_20230302T074831_N0509_R135_T36MYE_20230302T093441.zip to /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L1C
2023-03-24 11:03:14,938: INFO: Removing /data/clcr/shared/IM

## <a id='toc4_5_'></a>[Download L2A Change Imagery](#toc0_)

In [15]:
if l2a_products.shape[0] > 0:
    log.info("Downloading Sentinel-2 L2A products.")
    pyeo.queries_and_downloads.download_s2_data(l2a_products.to_dict('index'),
                                                l1_image_dir,
                                                l2_image_dir,
                                                download_source,
                                                user=sen_user,
                                                passwd=sen_pass,
                                                try_scihub_on_fail=True)

# check for incomplete L2A downloads and warn about them (removal is currently commented out)
incomplete_downloads, sizes = pyeo.raster_manipulation.find_small_safe_dirs(l2_image_dir, threshold=faulty_granule_threshold*1024*1024)
if len(incomplete_downloads) > 0:
    for index, safe_dir in enumerate(incomplete_downloads):
        if sizes[index]/1024/1024 < faulty_granule_threshold and os.path.exists(safe_dir):
            log.warning("Found likely incomplete download of size {} MB: {}".format(str(round(sizes[index]/1024/1024)), safe_dir))
            #shutil.rmtree(safe_dir)

log.info("---------------------------------------------------------------")
log.info("Image download and atmospheric correction for change detection images is complete.")
log.info("---------------------------------------------------------------")

2023-03-24 11:05:33,930: INFO: Downloading Sentinel-2 L2A products.
2023-03-24 11:05:33,939: INFO: Checking /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L2A/S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.SAFE for incomplete 10m imagery
2023-03-24 11:05:33,949: INFO:    /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L2A/S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.SAFE/GRANULE/L2A_T36MYE_A031478_20230317T080456/IMG_DATA/R10m/T36MYE_20230317T074649_B08_10m.jp2
2023-03-24 11:05:33,953: INFO:    /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L2A/S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.SAFE/GRANULE/L2A_T36MYE_A031478_20230317T080456/IMG_DATA/R10m/T36MYE_20230317T074649_B04_10m.jp2
2023-03-24 11:05:33,958: INFO:    /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L2A/S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.SAFE/GRANULE/L2A_T36MYE_A031478_20230317T080456/IMG_DATA/R10m/T36MYE_20230317T074649_B03_10m.jp2
2023-03-24 
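
The check for incomplete downloads relies on the total size of each `.SAFE` directory. The sketch below shows one way such a check could work, using only the standard library. It is not the implementation of `pyeo.raster_manipulation.find_small_safe_dirs`; the function names `directory_size_bytes` and `find_small_safe_dirs_sketch` are hypothetical.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all files below `path`, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def find_small_safe_dirs_sketch(parent_dir, threshold_bytes):
    """Return (paths, sizes) of .SAFE directories smaller than the threshold.

    Hypothetical stand-in for illustration only, not pyeo's own function.
    """
    small_dirs, sizes = [], []
    for entry in sorted(os.scandir(parent_dir), key=lambda e: e.name):
        if entry.is_dir() and entry.name.endswith(".SAFE"):
            size = directory_size_bytes(entry.path)
            if size < threshold_bytes:
                small_dirs.append(entry.path)
                sizes.append(size)
    return small_dirs, sizes
```

Note that the cell above passes the threshold in bytes (`faulty_granule_threshold*1024*1024`) and converts the returned sizes back to MB for the warning message.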

### <a id='toc4_5_1_'></a>[Housekeeping - Compress L1Cs](#toc0_)

If you have set your `do_delete` argument to `True`, then this cell will delete the L1Cs now that they have been atmospherically corrected and relabelled as L2As. Otherwise, if you have set your `do_zip` argument to `True`, it will compress them instead.

In [15]:

# Delete the L1C images if do_delete is True; otherwise zip them if do_zip is True
if do_delete:
    log.info("---------------------------------------------------------------")
    log.info("Deleting L1C images downloaded for change detection.")
    log.info("Keeping only the derived L2A images after atmospheric correction.")
    log.info("---------------------------------------------------------------")
    directory = l1_image_dir
    log.info('Deleting {}'.format(directory))
    shutil.rmtree(directory)
    log.info("---------------------------------------------------------------")
    log.info("Deletion complete")
    log.info("---------------------------------------------------------------")
else:
    if do_zip:
        log.info("---------------------------------------------------------------")
        log.info("Zipping L1C images downloaded for change detection")
        log.info("---------------------------------------------------------------")
        zip_contents(l1_image_dir)
        log.info("---------------------------------------------------------------")
        log.info("Zipping complete")
        log.info("---------------------------------------------------------------")

2023-03-22 17:44:47,141: INFO: ---------------------------------------------------------------
2023-03-22 17:44:47,143: INFO: Zipping L1C images downloaded for change detection
2023-03-22 17:44:47,143: INFO: ---------------------------------------------------------------
2023-03-22 17:44:47,146: INFO: Zipping   /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L1C/S2A_MSIL1C_20230302T074831_N0509_R135_T36MYE_20230302T093441.SAFE
2023-03-22 17:45:23,075: INFO: ---------------------------------------------------------------
2023-03-22 17:45:23,077: INFO: Zipping complete
2023-03-22 17:45:23,078: INFO: ---------------------------------------------------------------


## <a id='toc4_6_'></a>[Cloud Masking, Offsetting and Quicklooks](#toc0_)

As in the previous session, we apply the cloud mask, apply the processing baseline offset correction and produce quicklooks (if selected).

Additionally, if you have set the `do_zip` flag to `True`, then `pyeo` will zip the downloaded L2A images, as the following steps work from the cloud-masked copies. The cloud-masked GeoTIFFs themselves are compressed in place regardless of `do_zip`.
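
To illustrate what masking by SCL class means, here is a minimal NumPy sketch. It is not the implementation of `apply_scl_cloud_mask` (it omits the buffering, resampling and band stacking that function performs); the function name `mask_with_scl` and the `nodata` value of 0 are assumptions made for this example.

```python
import numpy as np

# SCL classes masked in this session: 0 no data, 1 saturated or defective,
# 2 dark area pixels, 3 cloud shadows, 8 and 9 cloud (medium and high
# probability), 10 thin cirrus, 11 snow or ice.
SCL_CLASSES_TO_MASK = [0, 1, 2, 3, 8, 9, 10, 11]


def mask_with_scl(bands, scl, classes_to_mask=SCL_CLASSES_TO_MASK, nodata=0):
    """Set every pixel whose SCL class is in `classes_to_mask` to `nodata`.

    bands: array of shape (n_bands, rows, cols)
    scl:   array of shape (rows, cols) on the same grid as `bands`
    Returns the masked copy and the boolean mask of removed pixels.
    """
    bad = np.isin(scl, classes_to_mask)
    masked = bands.copy()
    masked[:, bad] = nodata  # the 2D mask is applied to every band
    return masked, bad


scl = np.array([[4, 9], [3, 5]])
bands = np.full((2, 2, 2), 500, dtype=np.float32)
masked, bad = mask_with_scl(bands, scl)
print(bad)
```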

In [16]:
log.info("---------------------------------------------------------------")
log.info("Applying simple cloud, cloud shadow and haze mask based on SCL files and stacking the masked band raster files.")
log.info("---------------------------------------------------------------")

pyeo.raster_manipulation.apply_scl_cloud_mask(l2_image_dir,
                                                l2_masked_image_dir,
                                                scl_classes=[0,1,2,3,8,9,10,11],
                                                buffer_size=buffer_size,
                                                bands=bands,
                                                out_resolution=out_resolution,
                                                haze=None,
                                                epsg=epsg,
                                                skip_existing=skip_existing)

log.info("---------------------------------------------------------------")
log.info("Cloud masking and band stacking of new L2A images are complete.")
log.info("---------------------------------------------------------------")


log.info("---------------------------------------------------------------")
log.info("Offsetting cloud masked L2A images.")
log.info("---------------------------------------------------------------")

pyeo.raster_manipulation.apply_processing_baseline_offset_correction_to_tiff_file_directory(l2_masked_image_dir, l2_masked_image_dir)

log.info("---------------------------------------------------------------")
log.info("Offsetting of cloud masked L2A images complete.")
log.info("---------------------------------------------------------------")

if do_quicklooks or do_all:
    log.info("---------------------------------------------------------------")
    log.info("Producing quicklooks.")
    log.info("---------------------------------------------------------------")
    dirs_for_quicklooks = [l2_masked_image_dir]
    for main_dir in dirs_for_quicklooks:
        files = [ f.path for f in os.scandir(main_dir) if f.is_file() and os.path.basename(f).endswith(".tif") ]
        if len(files) == 0:
            log.warning("No images found in {}.".format(main_dir))
        else:
            for f in files:
                quicklook_path = os.path.join(quicklook_dir, os.path.basename(f).split(".")[0]+".png")
                log.info("Creating quicklook: {}".format(quicklook_path))
                pyeo.raster_manipulation.create_quicklook(f,
                                                            quicklook_path,
                                                            width=512,
                                                            height=512,
                                                            format="PNG",
                                                            bands=[3,2,1],
                                                            scale_factors=[[0,2000,0,255]]
                                                            )
log.info("Quicklooks complete.")

if do_zip:
    log.info("---------------------------------------------------------------")
    log.info("Zipping L2A images downloaded for change detection")
    log.info("---------------------------------------------------------------")
    zip_contents(l2_image_dir)
    log.info("---------------------------------------------------------------")
    log.info("Zipping complete")
    log.info("---------------------------------------------------------------")

log.info("---------------------------------------------------------------")
log.info("Compressing tiff files in directory {} and all subdirectories".format(l2_masked_image_dir))
log.info("---------------------------------------------------------------")
for root, dirs, files in os.walk(l2_masked_image_dir):
    all_tiffs = [image_name for image_name in files if image_name.endswith(".tif")]
    for this_tiff in all_tiffs:
        pyeo.raster_manipulation.compress_tiff(os.path.join(root, this_tiff), os.path.join(root, this_tiff))

log.info("---------------------------------------------------------------")
log.info("Pre-processing of change detection images, file compression, zipping")
log.info("and deletion of intermediate file products (if selected) are complete.")
log.info("---------------------------------------------------------------")

2023-03-24 11:06:20,375: INFO: ---------------------------------------------------------------
2023-03-24 11:06:20,376: INFO: Applying simple cloud, cloud shadow and haze mask based on SCL files and stacking the masked band raster files.
2023-03-24 11:06:20,377: INFO: ---------------------------------------------------------------
2023-03-24 11:06:20,378: INFO:   L2A raster file: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L2A/S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.SAFE
2023-03-24 11:06:20,697: INFO: Merging band rasters into a single file:
2023-03-24 11:06:29,657: INFO:   /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L2A/S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.SAFE/GRANULE/L2A_T36MYE_A031478_20230317T080456/IMG_DATA/R10m/T36MYE_20230317T074649_B02_10m.jp2
2023-03-24 11:06:47,693: INFO:   /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/L2A/S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.SAFE/GRANULE/L2A_T36MYE_A031478_202

apply_processing_baseline_offset_correction_to_tiff_file_directory() running on: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/cloud_masked
File: S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.tif, Baseline: 0509
Offsetting file: S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/cloud_masked/S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/cloud_masked/S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:12:23,838: INFO: File: S2B_MSIL2A_20230307T074759_N0509_R135_T36MYE_20230307T102115.tif, Baseline: 0509
2023-03-24 11:12:23,839: INFO: Offsetting file: S2B_MSIL2A_20230307T074759_N0509_R135_T36MYE_20230307T102115.tif
2023-03-24 11:12:23,839: INFO: out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/cloud_masked/S2B_MSIL2A_20230307T074759_N0509_R135_T36MYE_20230307T102115_offset_temp.tif
2023-03-24 11:12:23,856: INFO: in_raster_array dtype: float32
2023-03-24 11:12:23,856: INFO: in_raster_array dtype_max used: 10000


File: S2B_MSIL2A_20230307T074759_N0509_R135_T36MYE_20230307T102115.tif, Baseline: 0509
Offsetting file: S2B_MSIL2A_20230307T074759_N0509_R135_T36MYE_20230307T102115.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/cloud_masked/S2B_MSIL2A_20230307T074759_N0509_R135_T36MYE_20230307T102115.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/cloud_masked/S2B_MSIL2A_20230307T074759_N0509_R135_T36MYE_20230307T102115_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:12:39,871: INFO: File: S2A_MSIL2A_20230302T074831_N0509_R135_T36MYE_20230302T093441.tif, Baseline: 0509
2023-03-24 11:12:39,872: INFO: Offsetting file: S2A_MSIL2A_20230302T074831_N0509_R135_T36MYE_20230302T093441.tif
2023-03-24 11:12:39,872: INFO: out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/cloud_masked/S2A_MSIL2A_20230302T074831_N0509_R135_T36MYE_20230302T093441_offset_temp.tif
2023-03-24 11:12:39,889: INFO: in_raster_array dtype: float32
2023-03-24 11:12:39,890: INFO: in_raster_array dtype_max used: 10000


File: S2A_MSIL2A_20230302T074831_N0509_R135_T36MYE_20230302T093441.tif, Baseline: 0509
Offsetting file: S2A_MSIL2A_20230302T074831_N0509_R135_T36MYE_20230302T093441.tif
in_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/cloud_masked/S2A_MSIL2A_20230302T074831_N0509_R135_T36MYE_20230302T093441.tif
out_temporary_raster_path: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/cloud_masked/S2A_MSIL2A_20230302T074831_N0509_R135_T36MYE_20230302T093441_offset_temp.tif
in_raster_array dtype: float32
in_raster_array dtype_max used: 10000


2023-03-24 11:12:55,409: INFO: Offsetting Finished
2023-03-24 11:12:55,410: INFO: ---------------------------------------------------------------
2023-03-24 11:12:55,411: INFO: Offsetting of cloud masked L2A images complete.
2023-03-24 11:12:55,411: INFO: ---------------------------------------------------------------
2023-03-24 11:12:55,412: INFO: ---------------------------------------------------------------
2023-03-24 11:12:55,412: INFO: Producing quicklooks.
2023-03-24 11:12:55,413: INFO: ---------------------------------------------------------------
2023-03-24 11:12:55,415: INFO: Creating quicklook: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/quicklooks/S2B_MSIL2A_20230317T074649_NA509_R135_T36MYE_20230317T101339.png


Offsetting Finished


2023-03-24 11:13:02,373: INFO: Creating quicklook: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/quicklooks/S2A_MSIL2A_20230302T074831_NA509_R135_T36MYE_20230302T093441.png
2023-03-24 11:13:09,029: INFO: Creating quicklook: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/quicklooks/S2B_MSIL2A_20230307T074759_NA509_R135_T36MYE_20230307T102115.png
2023-03-24 11:13:16,134: INFO: Quicklooks complete.
2023-03-24 11:13:16,135: INFO: ---------------------------------------------------------------
2023-03-24 11:13:16,135: INFO: Compressing tiff files in directory /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/images/cloud_masked and all subdirectories
2023-03-24 11:13:16,136: INFO: ---------------------------------------------------------------
2023-03-24 11:14:34,728: INFO: ---------------------------------------------------------------
2023-03-24 11:14:34,729: INFO: Pre-processing of change detection images, file compression, zipping
2023-03-24 11:14:34,730: INFO: and deletion of intermediate
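
The log above reports a processing baseline (`Baseline: 0509`) for each file, which is encoded in the Sentinel-2 file name as `_N0509_`. The sketch below shows how that value could be read from a file name. The helper names are hypothetical, and the rule in `needs_offset` (baselines from `0400` onwards carry the radiometric offset) is an assumption about how such a check might be written, not a description of `pyeo`'s internal logic.

```python
import re


def get_processing_baseline(file_name):
    """Return the four-digit processing baseline (e.g. '0509'), or None."""
    match = re.search(r"_N(\d{4})_", file_name)
    if match is None:
        return None
    return match.group(1)


def needs_offset(file_name, first_offset_baseline="0400"):
    """Assumed rule: products from baseline 04.00 onwards carry the offset."""
    baseline = get_processing_baseline(file_name)
    # Four-digit strings compare correctly as text, so no int conversion is needed.
    return baseline is not None and baseline >= first_offset_baseline


name = "S2B_MSIL2A_20230317T074649_N0509_R135_T36MYE_20230317T101339.tif"
print(get_processing_baseline(name), needs_offset(name))
```

File names that carry `NA509` in place of `N0509`, as in the quicklook paths logged above, do not match this pattern, so the sketch returns `None` for them.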

## <a id='toc4_7_'></a>[Classification of the Baseline Composite and Change Images](#toc0_)

Here, we classify the Baseline Composite and the Change images using the model we created in the model training session.

In [13]:
skip_existing = True
# ------------------------------------------------------------------------
# Step 3: Classify each L2A image and the baseline composite
# ------------------------------------------------------------------------
if do_all or do_classify:
    log.info("---------------------------------------------------------------")
    log.info("Classify a land cover map for each L2A image and composite image using a saved model")
    log.info("---------------------------------------------------------------")
    log.info("Model used: {}".format(model_path))
    if skip_existing:
        log.info("Skipping existing classification images if found.")
    pyeo.classification.classify_directory(composite_dir,
                                            model_path,
                                            categorised_image_dir,
                                            prob_out_dir = None,
                                            apply_mask=False,
                                            out_type="GTiff",
                                            chunks=chunks,
                                            skip_existing=skip_existing)
    pyeo.classification.classify_directory(l2_masked_image_dir,
                                            model_path,
                                            categorised_image_dir,
                                            prob_out_dir = None,
                                            apply_mask=False,
                                            out_type="GTiff",
                                            chunks=chunks,
                                            skip_existing=skip_existing)

    log.info("---------------------------------------------------------------")
    log.info("Compressing tiff files in directory {} and all subdirectories".format(categorised_image_dir))
    log.info("---------------------------------------------------------------")
    for root, dirs, files in os.walk(categorised_image_dir):
        all_tiffs = [image_name for image_name in files if image_name.endswith(".tif")]
        for this_tiff in all_tiffs:
            pyeo.raster_manipulation.compress_tiff(os.path.join(root, this_tiff), os.path.join(root, this_tiff))

    log.info("---------------------------------------------------------------")
    log.info("Classification of all images is complete.")
    log.info("---------------------------------------------------------------")

    if do_quicklooks or do_all:
        log.info("---------------------------------------------------------------")
        log.info("Producing quicklooks.")
        log.info("---------------------------------------------------------------")
        dirs_for_quicklooks = [categorised_image_dir]
        for main_dir in dirs_for_quicklooks:
            files = [ f.path for f in os.scandir(main_dir) if f.is_file() and os.path.basename(f).endswith(".tif") and "class" in os.path.basename(f) ] # do classification images only
            if len(files) == 0:
                log.warning("No images found in {}.".format(main_dir))
            else:
                for f in files:
                    quicklook_path = os.path.join(quicklook_dir, os.path.basename(f).split(".")[0]+".png")
                    log.info("Creating quicklook: {}".format(quicklook_path))
                    pyeo.raster_manipulation.create_quicklook(f,
                                                                quicklook_path,
                                                                width=512,
                                                                height=512,
                                                                format="PNG"
                                                                )
    log.info("Quicklooks complete.")

2023-03-24 13:32:28,656: INFO: ---------------------------------------------------------------
2023-03-24 13:32:28,658: INFO: Classify a land cover map for each L2A image and composite image using a saved model
2023-03-24 13:32:28,659: INFO: ---------------------------------------------------------------
2023-03-24 13:32:28,659: INFO: Model used: ../models/model_36MYE_37MER_37NCC_Unoptimised_20230313.pkl
2023-03-24 13:32:28,660: INFO: Skipping existing classification images if found.
2023-03-24 13:32:28,660: INFO: Classifying files in /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite
2023-03-24 13:32:28,661: INFO: Class files saved in /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/classified
2023-03-24 13:32:28,662: INFO: Skipping existing files.
2023-03-24 13:32:28,662: INFO: Checking for existing classification /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/classified/composite_T36MYE_20230220T074941_class.tif
2023-03-24 13:32:28,677: INFO: Class image exists and is readable - 
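
The `skip_existing` behaviour seen in the log above can be sketched as follows. This is an illustration only: the function name `plan_classification` is hypothetical, the `_class.tif` suffix is inferred from the log output, and the sketch only checks that an output file exists, whereas the log suggests `pyeo` also checks that it is readable.

```python
from pathlib import Path


def plan_classification(image_dir, class_out_dir, skip_existing=True):
    """Decide which images still need classifying.

    Returns (to_classify, skipped), where to_classify is a list of
    (input_path, output_path) pairs and skipped lists existing outputs.
    """
    to_classify, skipped = [], []
    for image_path in sorted(Path(image_dir).glob("*.tif")):
        out_path = Path(class_out_dir) / (image_path.stem + "_class.tif")
        if skip_existing and out_path.exists():
            skipped.append(out_path)
        else:
            to_classify.append((image_path, out_path))
    return to_classify, skipped
```

Because the composite and the change images are written to the same `categorised_image_dir`, rerunning the cell with `skip_existing = True` only classifies images that have been added since the last run.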

## <a id='toc4_8_'></a>[Change Detection](#toc0_)

To perform Change Detection, we take the Classified Change Imagery and compare it with the Classified Baseline Composite.

Because our Change Detection is concerned with monitoring deforestation, `pyeo` examines whether any forest classes (*classes 1, 11 and 12*) have changed to non-forest classes (*classes 3, 4, 5 and 13*).  

Deforestation monitoring is an iterative process through time: as new change imagery becomes available, these change images are classified and compared to the baseline in the same way.
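
The core of this comparison can be sketched with NumPy as below. It assumes two classified rasters on the same grid and uses the class lists given above; the function name `detect_change` is hypothetical, and the sketch leaves out the sieving and NDVI confirmation steps that `pyeo` applies.

```python
import numpy as np

from_classes = [1, 11, 12]   # forest classes
to_classes = [3, 4, 5, 13]   # non-forest classes


def detect_change(baseline_classes, change_classes, from_classes, to_classes):
    """Flag pixels that were a `from` class in the baseline and are a `to` class now."""
    was_forest = np.isin(baseline_classes, from_classes)
    is_non_forest = np.isin(change_classes, to_classes)
    return was_forest & is_non_forest


baseline = np.array([[1, 11], [3, 12]])
latest = np.array([[1, 4], [5, 13]])
print(detect_change(baseline, latest, from_classes, to_classes))
```

In this toy example only the pixels that go from class 11 to 4 and from class 12 to 13 are flagged; the pixel that goes from 3 to 5 is ignored because it was not forest in the baseline.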

---

Let's start the Change Detection process.  

The cell below writes to the log file the `from_classes` and `to_classes` that we are interested in comparing. Additionally, if the `sieve` argument was set to a number greater than 0 in the argument re-declaration section at the top of this session, `pyeo` applies a sieve filter of that size to the classified images and uses the sieved images for change detection. For example, in the cell below we apply a sieve filter with a size of 20 pixels, because we supplied that value in the argument section previously.
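
To show what a sieve filter does, the sketch below removes small patches from a tiny class grid using 8-connectivity, matching `neighbours = 8` in the cell below. It is a deliberately simplified, pure-Python illustration: patches smaller than the threshold are overwritten with a fixed `fill_value`, whereas GDAL's sieve filter replaces them with the value of the largest neighbouring patch. The function name `sieve_small_patches` is hypothetical.

```python
from collections import deque


def sieve_small_patches(grid, threshold, fill_value=0):
    """Overwrite 8-connected patches with fewer than `threshold` pixels.

    grid: list of lists of class values. Returns a new grid.
    """
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    neighbours = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                  if (dr, dc) != (0, 0)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # Flood-fill the patch of equal class values that contains (r, c).
            value = grid[r][c]
            patch = [(r, c)]
            seen[r][c] = True
            queue = deque(patch)
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in neighbours:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and grid[nr][nc] == value):
                        seen[nr][nc] = True
                        patch.append((nr, nc))
                        queue.append((nr, nc))
            if len(patch) < threshold:
                for pr, pc in patch:
                    result[pr][pc] = fill_value
    return result


classes = [[1, 1, 1, 1],
           [1, 3, 1, 1],
           [1, 1, 1, 1]]
print(sieve_small_patches(classes, threshold=2))
```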

In [18]:
# ------------------------------------------------------------------------
# Step 4: Pair up the class images with the composite baseline map
# and identify all pixels with the change between groups of classes of interest.
# Optionally applies a sieve filter to the class images if specified in the ini file.
# Confirms detected changes by NDVI differencing.
# ------------------------------------------------------------------------

if do_all or do_change:
    log.info("---------------------------------------------------------------")
    log.info("Creating change layers from stacked class images.")
    log.info("---------------------------------------------------------------")
    log.info("Changes of interest:")
    log.info("  from any of the classes {}".format(from_classes))
    log.info("  to   any of the classes {}".format(to_classes))

    # optionally sieve the class images
    if sieve > 0:
        log.info("Applying sieve to classification outputs.")
        sieved_paths = pyeo.raster_manipulation.sieve_directory(in_dir = categorised_image_dir,
                                                                out_dir = sieved_image_dir,
                                                                neighbours = 8,
                                                                sieve = sieve,
                                                                out_type="GTiff",
                                                                skip_existing=skip_existing)
        # if sieve was chosen, work with the sieved class images
        class_image_dir = sieved_image_dir
    else:
        # if sieve was not chosen, work with the original class images
        class_image_dir = categorised_image_dir

2023-03-24 14:28:22,350: INFO: ---------------------------------------------------------------
2023-03-24 14:28:22,352: INFO: Creating change layers from stacked class images.
2023-03-24 14:28:22,352: INFO: ---------------------------------------------------------------
2023-03-24 14:28:22,353: INFO: Changes of interest:
2023-03-24 14:28:22,353: INFO:   from any of the classes [1, 11, 12]
2023-03-24 14:28:22,354: INFO:   to   any of the classes [3, 4, 5, 13]
2023-03-24 14:28:22,355: INFO: Applying sieve to classification outputs.
2023-03-24 14:28:22,355: INFO: Sieving class files in      /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/classified
2023-03-24 14:28:22,356: INFO: Sieved class files saved in /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/sieved
2023-03-24 14:28:22,356: INFO: Neighbours = 8   Sieve = 20
2023-03-24 14:28:22,356: INFO: Skip existing files? True


0...10...20...30...40...50...60...70...80...90...100 - done.
0...10...20...30...40...50...60...70...80...90...100 - done.
0...10...20...30...40...50...60...70...80...90...100 - done.
0...10...20...30...40...50...60...70...80...90...100 - done.


The cell below collects the classified change images, sorts them by acquisition date (oldest first), and then finds the most recent composite and its classified counterpart.  

In [19]:
if do_all or do_change:
    # get all image paths in the classification maps directory except the class composites
    class_image_paths = [ f.path for f in os.scandir(class_image_dir) if f.is_file() and f.name.endswith(".tif") \
                            and "composite_" not in f.name ]
    if len(class_image_paths) == 0:
        raise FileNotFoundError("No class images found in {}.".format(class_image_dir))

    # sort class images by image acquisition date
    class_image_paths = list(filter(pyeo.filesystem_utilities.get_image_acquisition_time, class_image_paths))
    class_image_paths.sort(key=lambda x: pyeo.filesystem_utilities.get_image_acquisition_time(x))
    for index, image in enumerate(class_image_paths):
        log.info("{}: {}".format(index, image))

    # find the latest available composite
    try:
        latest_composite_name = \
            pyeo.filesystem_utilities.sort_by_timestamp(
                [image_name for image_name in os.listdir(composite_dir) if image_name.endswith(".tif")],
                recent_first=True
            )[0]
        latest_composite_path = os.path.join(composite_dir, latest_composite_name)
        log.info("Most recent composite at {}".format(latest_composite_path))
    except IndexError:
        log.critical("Latest composite not found. The first time you run this script, you need to include the "
                        "--build-composite flag to create a base composite to work off. If you have already done this, "
                        "check that the earliest dated image in your images/merged folder is later than the earliest"
                        " dated image in your composite/ folder.")
        sys.exit(1)

    latest_class_composite_path = os.path.join(
                                                class_image_dir, \
                                                [ f.path for f in os.scandir(class_image_dir) if f.is_file() \
                                                    and os.path.basename(latest_composite_path)[:-4] in f.name \
                                                    and f.name.endswith(".tif")][0]
                                    )

    log.info("Most recent class composite at {}".format(latest_class_composite_path))
    if not os.path.exists(latest_class_composite_path):
        log.critical("Latest class composite not found. The first time you run this script, you need to include the "
                        "--build-composite flag to create a base composite to work off. If you have already done this, "
                        "check that the earliest dated image in your images/merged folder is later than the earliest"
                        " dated image in your composite/ folder. Then, you need to run the --classify option.")
        sys.exit(1)

2023-03-24 14:28:45,580: INFO: 0: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/sieved/S2A_MSIL2A_20230302T074831_NA509_R135_T36MYE_20230302T093441_class_sieved.tif
2023-03-24 14:28:45,581: INFO: 1: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/sieved/S2B_MSIL2A_20230307T074759_NA509_R135_T36MYE_20230307T102115_class_sieved.tif
2023-03-24 14:28:45,582: INFO: 2: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/sieved/S2B_MSIL2A_20230317T074649_NA509_R135_T36MYE_20230317T101339_class_sieved.tif
2023-03-24 14:28:45,583: INFO: Most recent composite at /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/composite/composite_T36MYE_20230220T074941.tif
2023-03-24 14:28:45,583: INFO: Most recent class composite at /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/sieved/composite_T36MYE_20230220T074941_class_sieved.tif
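
The ordering in the log above follows the first timestamp in each Sentinel-2 file name. The sketch below shows how such sorting could be done with the standard library. The helper names are hypothetical and are not the `pyeo.filesystem_utilities` functions used in the cell, which may parse file names differently.

```python
import re
from datetime import datetime


def acquisition_time_from_name(file_name):
    """Parse the first YYYYMMDDTHHMMSS timestamp in a file name, or return None."""
    match = re.search(r"\d{8}T\d{6}", file_name)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y%m%dT%H%M%S")


def sort_by_acquisition_time(file_names, recent_first=False):
    """Drop names without a timestamp, then sort the rest by acquisition time."""
    dated = [name for name in file_names
             if acquisition_time_from_name(name) is not None]
    return sorted(dated, key=acquisition_time_from_name, reverse=recent_first)


names = [
    "S2B_MSIL2A_20230317T074649_NA509_R135_T36MYE_20230317T101339_class_sieved.tif",
    "S2A_MSIL2A_20230302T074831_NA509_R135_T36MYE_20230302T093441_class_sieved.tif",
    "S2B_MSIL2A_20230307T074759_NA509_R135_T36MYE_20230307T102115_class_sieved.tif",
]
for name in sort_by_acquisition_time(names):
    print(name)
```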


Then, if `do_dev` is set, we build the name of the new report file and search for a report file created by a previous `pyeo` run. If one is found, it is archived by renaming it with an `archived_` prefix in the same folder.  

In [20]:
if do_all or do_change:

    if do_dev:
        before_timestamp = pyeo.filesystem_utilities.get_change_detection_dates(os.path.basename(latest_class_composite_path))[0]

        after_timestamp  = pyeo.filesystem_utilities.get_image_acquisition_time(os.path.basename(class_image_paths[-1]))

        output_product = os.path.join(probability_image_dir,
                                        "report_{}_{}_{}.tif".format(
                                        before_timestamp.strftime("%Y%m%dT%H%M%S"),
                                        tile_id,
                                        after_timestamp.strftime("%Y%m%dT%H%M%S"))
                                        )
        log.info("Report file name will be {}".format(output_product))

        # if a report file exists, archive it
        n_report_files = len([ f for f in os.scandir(probability_image_dir) if f.is_file() \
                                and f.name.startswith("report_") \
                                and f.name.endswith(".tif")])

        if n_report_files > 0:
            output_product_existing = [ f.path for f in os.scandir(probability_image_dir) if f.is_file() \
                                        and f.name.startswith("report_") \
                                        and f.name.endswith(".tif")][0]
            log.info("Found existing report image product: {}".format(output_product_existing))

            output_product_existing_archived = os.path.join(os.path.dirname(output_product_existing), 'archived_' + os.path.basename(output_product_existing))
            log.info("Renaming existing report image product to: {}".format(output_product_existing_archived))
            os.rename(output_product_existing, output_product_existing_archived)

2023-03-24 14:28:45,693: INFO: Report file name will be /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/probabilities/report_20230220T074941_36MYE_20230317T074649.tif
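
The archiving step can be tried in isolation on a throwaway folder. The sketch below is a minimal illustration and not part of `pyeo`: the helper name `archive_existing_reports` and the demo file names are hypothetical. Unlike the cell above, which renames only the first match, it renames every `report_*.tif` file it finds.

```python
import os
import tempfile


def archive_existing_reports(directory, prefix="report_", suffix=".tif"):
    """Rename every matching report file in `directory` with an 'archived_' prefix.

    Hypothetical helper for illustration; not a pyeo function.
    """
    archived = []
    # sorted() materialises the listing before any file is renamed
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.is_file() and entry.name.startswith(prefix) and entry.name.endswith(suffix):
            target = os.path.join(directory, "archived_" + entry.name)
            os.rename(entry.path, target)
            archived.append(target)
    return archived


# demo on a temporary folder with made-up file names
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["report_a.tif", "report_b.tif", "change_a.tif"]:
        open(os.path.join(demo_dir, name), "w").close()
    archived_paths = archive_existing_reports(demo_dir)
    remaining = sorted(os.listdir(demo_dir))
```

Because archived files no longer start with `report_`, a later run does not pick them up again.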


Now, we create the change rasters, one per classified image (the `change_raster` variable in the code below), by comparing each classified change image in turn against the classified baseline composite. When `do_dev` is set, each comparison also updates the report image at `output_product`.

Once finished, `pyeo` does some housekeeping, compressing the GeoTIFFs in the probability and sieved image directories.

In [21]:
if do_all or do_change:

    # find change patterns in the stack of classification images
    for index, image in enumerate(class_image_paths):
        before_timestamp = pyeo.filesystem_utilities.get_change_detection_dates(os.path.basename(latest_class_composite_path))[0]
        after_timestamp  = pyeo.filesystem_utilities.get_image_acquisition_time(os.path.basename(image))
        log.info("  class image index: {} of {}".format(index, len(class_image_paths)))
        log.info("  early time stamp: {}".format(before_timestamp))
        log.info("  late  time stamp: {}".format(after_timestamp))
        change_raster = os.path.join(probability_image_dir,
                                        "change_{}_{}_{}.tif".format(
                                        before_timestamp.strftime("%Y%m%dT%H%M%S"),
                                        tile_id,
                                        after_timestamp.strftime("%Y%m%dT%H%M%S"))
                                        )
        log.info("  Change raster file to be created: {}".format(change_raster))
        if do_dev:
            log.info("Update of the report image product based on change detection image.")
            pyeo.raster_manipulation.__change_from_class_maps(latest_class_composite_path,
                                                        image,
                                                        change_raster,
                                                        change_from = from_classes,
                                                        change_to = to_classes,
                                                        report_path = output_product,
                                                        skip_existing = skip_existing,
                                                        old_image_dir = composite_dir,
                                                        new_image_dir = l2_masked_image_dir,
                                                        viband1 = 4,
                                                        viband2 = 3,
                                                        threshold = -0.2
                                                        )
        else:
            pyeo.raster_manipulation.change_from_class_maps(latest_class_composite_path,
                                                        image,
                                                        change_raster,
                                                        change_from = from_classes,
                                                        change_to = to_classes,
                                                        skip_existing = skip_existing
                                                        )

    log.info("---------------------------------------------------------------")
    log.info("Post-classification change detection complete.")
    log.info("---------------------------------------------------------------")

    # compress the GeoTIFFs in both output directories, including subdirectories
    for directory in [probability_image_dir, sieved_image_dir]:
        log.info("---------------------------------------------------------------")
        log.info("Compressing tiff files in directory {} and all subdirectories".format(directory))
        log.info("---------------------------------------------------------------")
        for root, dirs, files in os.walk(directory):
            all_tiffs = [image_name for image_name in files if image_name.endswith(".tif")]
            for this_tiff in all_tiffs:
                tiff_path = os.path.join(root, this_tiff)
                pyeo.raster_manipulation.compress_tiff(tiff_path, tiff_path)


2023-03-24 14:28:45,834: INFO:   class image index: 0 of 3
2023-03-24 14:28:45,835: INFO:   early time stamp: 2023-02-20 07:49:41
2023-03-24 14:28:45,836: INFO:   late  time stamp: 2023-03-02 07:48:31
2023-03-24 14:28:45,836: INFO:   Change raster file to be created: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/probabilities/change_20230220T074941_36MYE_20230302T074831.tif
2023-03-24 14:28:45,837: INFO: Update of the report image product based on change detection image.
2023-03-24 14:28:45,838: INFO: Report file being created: /data/clcr/shared/IMPRESS/matt/pyeo/36MYE/output/probabilities/report_20230220T074941_36MYE_20230317T074649.tif
2023-03-24 14:28:48,762: INFO: Adding masks:
2023-03-24 14:28:48,763: INFO:    /lustre/alice3/data/clcr/shared/IMPRESS/matt/pyeo/36MYE/tmpaclh4qjk/composite_T36MYE_20230220T074941_class_sieved_temp.msk
2023-03-24 14:28:48,764: INFO:    /lustre/alice3/data/clcr/shared/IMPRESS/matt/pyeo/36MYE/tmpaclh4qjk/S2A_MSIL2A_20230302T074831_NA509_R135_T36MYE_20
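
The idea behind post-classification change detection can be illustrated on toy data. The sketch below is not `pyeo`'s implementation: the function `change_from_classes`, the class legend and the no-data value of 0 are all assumptions made for illustration, and plain nested lists stand in for raster bands to keep the example dependency-free. A pixel is flagged when its baseline class is in `from_classes` and its new class is in `to_classes`.

```python
def change_from_classes(before, after, from_classes, to_classes, nodata=0):
    """Return a grid with 1 where a pixel moves from a 'from' class to a 'to' class.

    Hypothetical helper for illustration; pixels that are no-data in either
    input are never flagged.
    """
    from_set, to_set = set(from_classes), set(to_classes)
    return [
        [
            1 if (b != nodata and a != nodata and b in from_set and a in to_set) else 0
            for b, a in zip(before_row, after_row)
        ]
        for before_row, after_row in zip(before, after)
    ]


# toy 3x3 class maps; assumed legend: 1 = forest, 3 = bare soil, 0 = no data
baseline = [[1, 1, 3],
            [1, 0, 1],
            [3, 1, 1]]
new_image = [[1, 3, 3],
             [3, 3, 0],
             [1, 1, 3]]

change = change_from_classes(baseline, new_image, from_classes=[1], to_classes=[3])
n_changed = sum(sum(row) for row in change)
```

In the notebook, the same roles are played by `latest_class_composite_path` (baseline), each entry of `class_image_paths` (new image), and the `from_classes` and `to_classes` lists.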

### <a id='toc4_8_1_'></a>[Create the Aggregated Report (Non-Development Alternative)](#toc0_)

In [18]:
if not do_dev:
    
    log.info("---------------------------------------------------------------")
    log.info("Creating aggregated report file. Deprecated in the development version.")
    log.info("---------------------------------------------------------------")
    # combine all change layers into one output raster with two layers:
    #   (1) pixels show the earliest change detection date (expressed as the number of days since 1/1/2000)
    #   (2) pixels show the number of change detection dates (summed up over all change images in the folder)
    date_image_paths = [ f.path for f in os.scandir(probability_image_dir) if f.is_file() and f.name.endswith(".tif") \
                            and "change_" in f.name ]
    if len(date_image_paths) == 0:
        raise FileNotFoundError("No change images found in {}.".format(probability_image_dir))

    before_timestamp = pyeo.filesystem_utilities.get_change_detection_dates(os.path.basename(latest_class_composite_path))[0]
    after_timestamp  = pyeo.filesystem_utilities.get_image_acquisition_time(os.path.basename(class_image_paths[-1]))
    output_product = os.path.join(probability_image_dir,
                                    "report_{}_{}_{}.tif".format(
                                    before_timestamp.strftime("%Y%m%dT%H%M%S"),
                                    tile_id,
                                    after_timestamp.strftime("%Y%m%dT%H%M%S"))
                                    )
    log.info("Combining date maps: {}".format(date_image_paths))
    pyeo.raster_manipulation.combine_date_maps(date_image_paths, output_product)

    log.info("---------------------------------------------------------------")
    log.info("Report image product completed / updated: {}".format(output_product))
    log.info("Compressing the report image.")
    log.info("---------------------------------------------------------------")
    pyeo.raster_manipulation.compress_tiff(output_product, output_product)
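
The comments in the cell above describe dates stored as the number of days since 1/1/2000. The sketch below shows one way to convert between that encoding and calendar dates. It assumes that 1 January 2000 is day 0, which has not been checked against `pyeo`'s own convention, and the names `EPOCH`, `date_to_days` and `days_to_date` are hypothetical.

```python
from datetime import date, timedelta

# assumed reference date: 1 January 2000 counts as day 0
EPOCH = date(2000, 1, 1)


def date_to_days(d):
    """Encode a calendar date as whole days elapsed since EPOCH."""
    return (d - EPOCH).days


def days_to_date(n):
    """Decode a day count back into a calendar date."""
    return EPOCH + timedelta(days=n)


# example: the acquisition date of the first change image in the log output
first_change = date(2023, 3, 2)
encoded = date_to_days(first_change)
decoded = days_to_date(encoded)
```

Applying `days_to_date` to a pixel value from the first report layer would give the date of the earliest detected change, under the day-0 assumption stated above.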

### <a id='toc4_8_2_'></a>[Final Housekeeping](#toc0_)

Finally, we run some more housekeeping, deleting or zipping intermediate files, depending on the `do_delete` and `do_zip` arguments supplied at the beginning of this session.

In [19]:
if do_delete:
    log.info("---------------------------------------------------------------")
    log.info("Deleting intermediate class images used in change detection.")
    log.info("They can be recreated from the cloud-masked, band-stacked L2A images and the saved model.")
    log.info("---------------------------------------------------------------")
    directories = [ categorised_image_dir, sieved_image_dir, probability_image_dir ]
    for directory in directories:
        paths = [f for f in os.listdir(directory)]
        for f in paths:
            # keep the classified composite layers and the report image product for the next change detection
            if not f.startswith("composite_") and not f.startswith("report_"):
                log.info('Deleting {}'.format(os.path.join(directory, f)))
                if os.path.isdir(os.path.join(directory, f)):
                    shutil.rmtree(os.path.join(directory, f))
                else:
                    os.remove(os.path.join(directory, f))
    log.info("---------------------------------------------------------------")
    log.info("Deletion of intermediate file products complete.")
    log.info("---------------------------------------------------------------")
else:
    if do_zip:
        log.info("---------------------------------------------------------------")
        log.info("Zipping intermediate class images used in change detection")
        log.info("---------------------------------------------------------------")
        directories = [ categorised_image_dir, sieved_image_dir ]
        for directory in directories:
            zip_contents(directory, notstartswith = ["composite_", "report_"])
        log.info("---------------------------------------------------------------")
        log.info("Zipping complete")
        log.info("---------------------------------------------------------------")

log.info("---------------------------------------------------------------")
log.info("Change detection and report image product updating, file compression, zipping")
log.info("and deletion of intermediate file products (if selected) are complete.")
log.info("---------------------------------------------------------------")

if do_delete:
    log.info("---------------------------------------------------------------")
    log.info("Deleting temporary directories starting with \'tmp*\'")
    log.info("These can be left over from interrupted processing runs.")
    log.info("---------------------------------------------------------------")
    directory = tile_root_dir
    for root, dirs, files in os.walk(directory):
        temp_dirs = [d for d in dirs if d.startswith("tmp")]
        for temp_dir in temp_dirs:
            log.info('Deleting {}'.format(os.path.join(root, temp_dir)))
            if os.path.isdir(os.path.join(root, temp_dir)):
                shutil.rmtree(os.path.join(root, temp_dir))
            else:
                log.warning("This should not have happened. {} is not a directory. Skipping deletion.".format(os.path.join(root, temp_dir)))
    log.info("---------------------------------------------------------------")
    log.info("Deletion of temporary directories complete.")
    log.info("---------------------------------------------------------------")

log.info("---------------------------------------------------------------")
log.info("---                  PROCESSING END                         ---")
log.info("---------------------------------------------------------------")

2023-03-23 12:07:59,986: INFO: ---------------------------------------------------------------
2023-03-23 12:07:59,988: INFO: Change detection and report image product updating, file compression, zipping
2023-03-23 12:07:59,988: INFO: and deletion of intermediate file products (if selected) are complete.
2023-03-23 12:07:59,989: INFO: ---------------------------------------------------------------
2023-03-23 12:07:59,990: INFO: ---------------------------------------------------------------
2023-03-23 12:07:59,991: INFO: ---                  PROCESSING END                         ---
2023-03-23 12:07:59,992: INFO: ---------------------------------------------------------------
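
To see what the deletion branch keeps and removes without touching real outputs, the logic can be run on a temporary folder. This is a minimal sketch under stated assumptions: the helper name `delete_intermediates` and the demo file names are hypothetical, and only the two keep-prefixes (`composite_` and `report_`) come from the cell above.

```python
import os
import shutil
import tempfile


def delete_intermediates(directory, keep_prefixes=("composite_", "report_")):
    """Delete everything in `directory` except entries starting with a keep prefix.

    Hypothetical helper for illustration; returns the names that were deleted.
    """
    deleted = []
    for name in sorted(os.listdir(directory)):
        if name.startswith(tuple(keep_prefixes)):
            continue  # needed for the next change detection run
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
        deleted.append(name)
    return deleted


# demo on a temporary folder with made-up contents
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["composite_x.tif", "report_y.tif", "S2A_class.tif"]:
        open(os.path.join(demo_dir, name), "w").close()
    os.mkdir(os.path.join(demo_dir, "old_run"))
    deleted_names = delete_intermediates(demo_dir)
    kept_names = sorted(os.listdir(demo_dir))
```

Running a dry version first (logging instead of deleting) is a cheap safeguard before enabling `do_delete` on a real tile directory.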


# <a id='toc5_'></a>[Session 4: Tile-based Alert Generation](#toc0_)

Here, we run through how to vectorise the change detection results, and how to create and filter the alerts dataframe.

In [None]:
# call the tile-based vectorise function