# PyEO Processing: How to build a change detection pipeline using your classification model

- Here we cover how to query and download images in order to build a cloud-free composite baseline.
- We then use this as a reference against which to compare later 'change' images, so that forest changes can be identified.
- Identified forest changes can then be processed and filtered based on their pattern over time.
- Manual assessment can then be used to judge which alert areas to flag for further investigation.

# Composite Baseline Building

- This section will take us stepwise through the imagery query, download and composite creation aspects of the `run_acd_national.py` script, which runs the full PyEO pipeline from the command line in a terminal.  
- Jupyter notebooks provide a useful and engaging interface to understand the components of this script, so we will follow an extracted version throughout this notebook.

This section comprises several stages:   
1. Directory and variable setup.
1. Querying for Sentinel-2 imagery that meets our search criteria.
1. Downloading the Sentinel-2 imagery identified from the Query.
1. Preprocessing any L1C imagery to L2A, if necessary, by applying atmospheric correction.
1. Cloud-masking the L2A imagery.
1. Creation of a composite baseline reference from the time series that has been downloaded and processed. 
1. Query and Download of a set of Change Detection Images
1. Classification of Baseline and Change Detection Images
1. Creation of Forest Alerts

# Setup: Requirements to use this Notebook

## Select the virtual environment
- Use the drop-down list at the top right of the Jupyter notebook window
- Select (venv) Python for Earth Observation (PyEO)

## Check the working directory is set to `pyeo` within your PyEO installation

In [1]:
pwd

'/home/sepal-user/20230626_pyeo_installation/pyeo/notebooks'

In [2]:
cd /home/sepal-user/20230626_pyeo_installation/pyeo

/home/sepal-user/20230626_pyeo_installation/pyeo


In [3]:
pwd

'/home/sepal-user/20230626_pyeo_installation/pyeo'

## <a id='toc3_1_'></a>[Directory and Variable Setup](#toc0_)

### <a id='toc3_1_1_'></a>[Import Libraries](#toc0_)

In [4]:
import shutil
import sys

from pyeo import (classification, filesystem_utilities,
                    queries_and_downloads, raster_manipulation)

from pyeo.acd_national import (acd_initialisation,
                                 acd_config_to_log,
                                 acd_roi_tile_intersection)
import configparser
import argparse
import json
import numpy as np
import os
from osgeo import gdal
import geopandas as gpd
import pandas as pd
from datetime import datetime
import warnings
import zipfile

gdal.UseExceptions()

print("Libraries successfully imported")

Libraries successfully imported


### <a id='toc3_1_3_'></a>[Declare Processing Parameters with In-Notebook Variables](#toc0_)

- When running the `pyeo` pipeline we use an initialisation file (.ini) to provide the required parameters.  
- For SEPAL, we will use `pyeo_sepal.ini`  
- Below the parameters are explained, but we will read these in via the config parser.

- Now, let's read in the `.ini` file

### Declare the path to the initialisation file
- You can leave this path alone if you are using the `pyeo_sepal.ini` file that comes packaged with PyEO

In [5]:
pwd

'/home/sepal-user/20230626_pyeo_installation/pyeo'

In [6]:
config_path = "/home/sepal-user/20230626_pyeo_installation/pyeo/pyeo_sepal.ini"

# Edit the `pyeo_sepal.ini` file
You can either:  
- Check that `pyeo_dir` and `tile_dir` in `pyeo_sepal.ini` match those below. These paths should already match if you have followed the instructions in the `pyeo_sepal_orientation` notebook:
    - ```pyeo_dir = /home/sepal-user/20230626_pyeo_installation/pyeo```
    - ```tile_dir = /home/sepal-user/20230626_pyeo_installation```  
    
    <br>
- Or, amend the `pyeo_sepal.ini` file to match your file paths if you cloned `pyeo` into a different directory:
    - Right-click `pyeo_sepal.ini` in the file browser on the left and select 'Open' to edit it.
    - Change `pyeo_dir` to point to the `pyeo` code in your installation directory:
    ```pyeo_dir = /home/sepal-user/20230626_pyeo_installation/pyeo```
    - Change `tile_dir` to point to your installation directory, which is where data files will be stored:
    ```tile_dir = /home/sepal-user/20230626_pyeo_installation```
    - Save the edited initialisation file by pressing ```Ctrl+S```.
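
Before running the pipeline, it can help to confirm that the two paths in the `.ini` file point at real directories. The sketch below is one way to do that with the standard-library `configparser`. It assumes the file uses standard `[section]` headers and searches every section, because the section names are not shown here. `find_option` and `check_ini_paths` are hypothetical helper names, not part of PyEO.

```python
import configparser
import os


def find_option(parser, option):
    """Return the first value of `option` found in any section, else None."""
    for section in parser.sections():
        if parser.has_option(section, option):
            return parser.get(section, option)
    return None


def check_ini_paths(ini_path, options=("pyeo_dir", "tile_dir")):
    """Map each option name to (value, exists_on_disk) for a quick sanity check."""
    # interpolation=None keeps any '%' characters in values literal
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read(ini_path)
    report = {}
    for option in options:
        value = find_option(parser, option)
        report[option] = (value, value is not None and os.path.isdir(value))
    return report
```

Calling `check_ini_paths(config_path)` returns a dictionary in which any entry ending in `False` indicates a path that needs correcting before continuing.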

# Edit the `credentials_dummy.ini` file:
- Ensure the credentials path in the `pyeo_sepal.ini` corresponds to your credentials file.
- The default path is to `credentials/credentials.ini`
- To use this default option open the file `credentials_dummy.ini` in the editor (Right-Click then 'Open')
- Edit the file to add your personal credentials for the dataspace API - following the convention of this file.
- Save the file as `credentials.ini` into the credentials folder (using File -> Save File As)
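
Later in this notebook the credentials are read from a `[dataspace]` section with `user` and `pass` keys. The sketch below shows one way to validate a credentials file before any query is attempted. `read_dataspace_credentials` is a hypothetical helper, not a PyEO function; only the section and key names are taken from this notebook.

```python
import configparser


def read_dataspace_credentials(credentials_path):
    """Return (user, password) from the [dataspace] section, or raise a clear error."""
    # interpolation=None means a '%' in a password is read literally
    conf = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    # ConfigParser.read silently skips missing files, so check what was actually read
    if not conf.read(credentials_path):
        raise FileNotFoundError(f"Could not read credentials file: {credentials_path}")
    if not conf.has_section("dataspace"):
        raise KeyError("No [dataspace] section found in the credentials file")
    user = conf["dataspace"].get("user")
    password = conf["dataspace"].get("pass")
    if not user or not password:
        raise ValueError("Both 'user' and 'pass' must be set in [dataspace]")
    return user, password
```

Failing early with a specific message is usually easier to debug than an authentication error raised partway through a download.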

In [42]:
config_dict, acd_log = acd_initialisation(config_path)

2023-06-29 08:21:19,959: INFO: ---------------------------------------------------------------
2023-06-29 08:21:19,960: INFO:                     ****PROCESSING START****
2023-06-29 08:21:19,961: INFO: ---------------------------------------------------------------
2023-06-29 08:21:19,962: INFO: ---------------------------------------------------------------
2023-06-29 08:21:19,963: INFO: ---                  INTEGRATED PROCESSING START            ---
2023-06-29 08:21:19,964: INFO: ---------------------------------------------------------------
2023-06-29 08:21:19,964: INFO: Reading in parameters defined in: /home/sepal-user/20230626_pyeo_installation/pyeo/pyeo_sepal.ini
2023-06-29 08:21:19,965: INFO: ---------------------------------------------------------------


### Read the `.ini` file into Python
- The `acd_initialisation` call above created a variable called `config_dict`, which is a Python dictionary containing the configuration parameters read from the `.ini` file.

In [43]:
pwd

'/home/sepal-user/20230626_pyeo_installation/pyeo'

### Print the configuration parameters
- We print the configuration parameters to create a record of what `pyeo` has been configured to do.

In [44]:
acd_config_to_log(config_dict, acd_log)

2023-06-29 08:21:20,093: INFO: Options:
2023-06-29 08:21:20,093: INFO: The Environment Manager configured to use is : venv
2023-06-29 08:21:20,094: INFO:   --dev Running in development mode, choosing development versions of functions where available
2023-06-29 08:21:20,095: INFO:   -do_tile_intersection
2023-06-29 08:21:20,095: INFO:       Sentinel-2 tile intersection with ROI enabled
2023-06-29 08:21:20,096: INFO:   --do_raster
2023-06-29 08:21:20,098: INFO:       raster pipeline enabled
2023-06-29 08:21:20,099: INFO:   --quicklooks to create image quicklooks
2023-06-29 08:21:20,099: INFO:   --do_delete_existing_vector , when vectorising the change report rasters, 
2023-06-29 08:21:20,100: INFO:             existing vectors files will be deleted and new vector files created.
2023-06-29 08:21:20,100: INFO:   --do_vectorise
2023-06-29 08:21:20,101: INFO:       raster change reports will be vectorised
2023-06-29 08:21:20,102: INFO: EPSG used is: 21097
2023-06-29 08:21:20,102: INFO: List 

## Identify the required Sentinel-2 tiles

- PyEO operates by looking at a shapefile to determine the Region of Interest (ROI)
- The directory path and filename of this shapefile need to be specified in these two lines in `pyeo_sepal.ini`:
    - `roi_dir = ./roi`
    - `roi_filename = kfs_roi_subset_c.shp`
- Then in the cell below, PyEO identifies what Sentinel-2 tiles intersect with the Region Of Interest (ROI).

In [27]:
os.chdir(config_dict["pyeo_dir"]) # ensures pyeo is looking in the correct directory
tilelist_filepath = acd_roi_tile_intersection(config_dict, acd_log)

2023-06-29 08:21:21,147: INFO: The provided ROI intersects with 2 Sentinel-2 tiles
2023-06-29 08:21:21,148: INFO: These tiles are  :
2023-06-29 08:21:21,149: INFO:   1 : 36NXG
2023-06-29 08:21:21,150: INFO:   2 : 36NYG
2023-06-29 08:21:21,150: INFO: Writing Sentinel-2 tiles that intersect with the provided ROI to  : ./roi/tilelist.csv
2023-06-29 08:21:21,169: INFO: Finished ROI tile intersection


In [28]:
print(tilelist_filepath)

./roi/tilelist.csv


- **Right-click on `tilelist.csv` in the JupyterLab file explorer to the left and select 'Open' to view it in a tab within JupyterLab.**

## Running PyEO Per-Tile

- PyEO is designed to run per-tile.
- It takes `tilelist.csv` created in the above cell and runs the pipeline for each tile in this `.csv` file.
- This tutorial will run through the pipeline for the first tile in `tilelist.csv` : `36NXG`.
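
This tutorial processes only the first tile, but the same file can drive a loop over every tile. As a minimal sketch, the tile identifiers can be read with the standard-library `csv` module. This assumes the file has a `tile` column, which is the column the next cell reads; `read_tile_ids` is a hypothetical helper name.

```python
import csv


def read_tile_ids(tilelist_path, column="tile"):
    """Return the tile identifiers listed in the CSV, in file order."""
    with open(tilelist_path, newline="") as handle:
        # skip rows where the column is missing or empty
        return [row[column].strip() for row in csv.DictReader(handle) if row.get(column)]
```

Looping over `read_tile_ids(tilelist_filepath)` and repeating the steps in this notebook for each identifier is, in outline, what running the pipeline per tile means.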

## <a id='toc3_1_4_'></a>[Create the Folder Structure PyEO Expects](#toc0_)

In [29]:
os.chdir(config_dict["pyeo_dir"]) # ensures pyeo is looking in the correct directory

tile_to_process = pd.read_csv(tilelist_filepath)["tile"][0]
individual_tile_directory_path = os.path.join(config_dict["tile_dir"], tile_to_process)
filesystem_utilities.create_folder_structure_for_tiles(individual_tile_directory_path)
print("Folder structure build successfully finished")

Folder structure build successfully finished


- You can now use the JupyterLab file explorer to view the new folder structure which should be in your installation directory and called `36NXG`
- These folders provide the skeleton for the pipeline to store and process the tile's imagery

In [41]:
individual_tile_directory_path

'/home/sepal-user/20230626_pyeo_installation/36NXG'

## <a id='toc3_1_5_'></a>[Create the Log File](#toc0_)

- `PyEO` uses a Log file as a convenient location to monitor pipeline progress
- Additionally, the log file acts as a record of which parameters were used.

In [16]:
tile_log = filesystem_utilities.init_log_acd(
    log_path=os.path.join(individual_tile_directory_path, "log", tile_to_process + ".log"),
    logger_name=f"pyeo_{tile_to_process}"
)

2023-06-29 08:21:21,424: INFO: ---------------------------------------------------------------
2023-06-29 08:21:21,425: INFO:                     ****PROCESSING START****
2023-06-29 08:21:21,426: INFO: ---------------------------------------------------------------


- You can now use the JupyterLab file explorer to find the log file which will be in a log folder beneath the main tile directory
    - For example, the path to the Log file for `36NXG`, is : `20230626_pyeo_installation/36NXG/log/36NXG.log`
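
The log lines shown in this notebook follow a `timestamp: LEVEL: message` pattern. The sketch below produces lines of that shape with the standard `logging` module, writing to both a file and the console. It is not PyEO's `init_log_acd` implementation, and `make_tile_logger` is a hypothetical helper used only for illustration.

```python
import logging
import os


def make_tile_logger(log_path, logger_name):
    """Create a logger that writes the same lines to a file and to the console."""
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s: %(levelname)s: %(message)s")
    if not logger.handlers:  # avoid duplicate lines when the cell is re-run
        for handler in (logging.FileHandler(log_path), logging.StreamHandler()):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

The guard on `logger.handlers` matters in notebooks, where re-running a cell would otherwise attach a second set of handlers and print every message twice.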

## Create the Required Directory Variables

- This cell ensures the paths for the directory structure are available to the pipeline commands.

In [30]:
tile_log.info("Creating the directory paths")

change_image_dir = os.path.join(individual_tile_directory_path, r"images")
l1_image_dir = os.path.join(individual_tile_directory_path, r"images/L1C")
l2_image_dir = os.path.join(individual_tile_directory_path, r"images/L2A")
l2_masked_image_dir = os.path.join(individual_tile_directory_path, r"images/cloud_masked")
categorised_image_dir = os.path.join(individual_tile_directory_path, r"output/classified")
probability_image_dir = os.path.join(individual_tile_directory_path, r"output/probabilities")
sieved_image_dir = os.path.join(individual_tile_directory_path, r"output/sieved")
composite_dir = os.path.join(individual_tile_directory_path, r"composite")
composite_l1_image_dir = os.path.join(individual_tile_directory_path, r"composite/L1C")
composite_l2_image_dir = os.path.join(individual_tile_directory_path, r"composite/L2A")
composite_l2_masked_image_dir = os.path.join(individual_tile_directory_path, r"composite/cloud_masked")
quicklook_dir = os.path.join(individual_tile_directory_path, r"output/quicklooks")

tile_log.info("Successfully created the directory paths")

2023-06-29 08:21:21,493: INFO: Creating the directory paths
2023-06-29 08:21:21,495: INFO: Successfully created the directory paths


- We can now type any of the above variable names into a Python cell and execute it to see the file path for that directory.

In [18]:
l1_image_dir

'/home/sepal-user/20230626_pyeo_installation/36NXG/images/L1C'
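
Printing one path confirms the variable is set, but not that the folder exists. The sketch below checks the whole skeleton at once. The sub-directory names are copied from the path variables defined above; `missing_subdirs` is a hypothetical helper, not part of PyEO.

```python
import os

# Sub-directories taken from the path variables defined above
EXPECTED_SUBDIRS = (
    "images/L1C", "images/L2A", "images/cloud_masked",
    "composite/L1C", "composite/L2A", "composite/cloud_masked",
    "output/classified", "output/probabilities", "output/sieved",
    "output/quicklooks",
)


def missing_subdirs(tile_dir, expected=EXPECTED_SUBDIRS):
    """Return the expected sub-directories that do not exist under `tile_dir`."""
    return [
        sub for sub in expected
        if not os.path.isdir(os.path.join(tile_dir, *sub.split("/")))
    ]
```

An empty list from `missing_subdirs(individual_tile_directory_path)` suggests the folder structure was built as expected.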

## Create the Processing Argument Variables

- `PyEO` uses these parameters to make decisions throughout the pipeline.

In [31]:
os.chdir(config_dict["pyeo_dir"]) # ensures pyeo is looking in the correct directory

In [32]:
start_date = config_dict["start_date"]
end_date = config_dict["end_date"]
composite_start_date = config_dict["composite_start"]
composite_end_date = config_dict["composite_end"]
cloud_cover = config_dict["cloud_cover"]
cloud_certainty_threshold = config_dict["cloud_certainty_threshold"]
model_path = config_dict["model_path"]
sen2cor_path = config_dict["sen2cor_path"]
epsg = config_dict["epsg"]
bands = config_dict["bands"]
resolution = config_dict["resolution_string"]
out_resolution = config_dict["output_resolution"]
buffer_size = config_dict["buffer_size_cloud_masking"]
buffer_size_composite = config_dict["buffer_size_cloud_masking_composite"]
max_image_number = config_dict["download_limit"]
faulty_granule_threshold = config_dict["faulty_granule_threshold"]
download_limit = config_dict["download_limit"]

skip_existing = config_dict["do_skip_existing"]
sieve = config_dict["sieve"]
from_classes = config_dict["from_classes"]
to_classes = config_dict["to_classes"]

download_source = config_dict["download_source"]
if download_source == "scihub":
    tile_log.info("scihub API is the download source")
if download_source == "dataspace":
    tile_log.info("dataspace API is the download source")

tile_log.info(f"Faulty Granule Threshold is set to   : {config_dict['faulty_granule_threshold']}")
tile_log.info("    Files below this threshold will not be downloaded")

credentials_path = config_dict["credentials_path"]
if not os.path.exists(credentials_path):
    tile_log.error(f"The credentials path does not exist  :{credentials_path}")
    tile_log.error(f"Current working directory :{os.getcwd()}")
    tile_log.error("Exiting raster pipeline")
    sys.exit(1)

conf = configparser.ConfigParser(allow_no_value=True, interpolation=None)
conf.read(credentials_path)
credentials_dict = {}

tile_log.info("Successfully read the processing arguments and credentials")

2023-06-29 08:21:21,751: INFO: dataspace API is the download source
2023-06-29 08:21:21,752: INFO: Faulty Granule Threshold is set to   : 350
2023-06-29 08:21:21,753: INFO:     Files below this threshold will not be downloaded
2023-06-29 08:21:21,762: INFO: Successfully read the processing arguments and credentials
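
The faulty granule threshold logged above is compared against product sizes in megabytes. The sketch below illustrates that comparison for sizes given as strings such as `2 GB`, using the same unit multipliers that appear later in this notebook. `size_in_mb` and `split_by_size` are hypothetical helper names, and the sketch works on plain `(title, size)` pairs rather than the DataFrame the pipeline uses.

```python
# Multipliers that convert each unit to megabytes (same mapping used later in this notebook)
UNIT_TO_MB = {"GB": 1e3, "MB": 1, "KB": 1e-3}


def size_in_mb(size_text):
    """Convert a size string such as '2 GB' to a float number of megabytes."""
    value, unit = size_text.split(" ")
    return float(value) * UNIT_TO_MB[unit]


def split_by_size(products, threshold_mb):
    """Split (title, size_text) pairs into (kept, faulty) title lists using the threshold."""
    kept, faulty = [], []
    for title, size_text in products:
        (kept if size_in_mb(size_text) >= threshold_mb else faulty).append(title)
    return kept, faulty
```

Note that the query cell later sets the effective threshold to 0 for the `dataspace` source, because that API often reports sizes as zero.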


## Read the Specified Credentials

In [33]:
if download_source == "dataspace":

    tile_log.info(f'Running download handler for {download_source}')

    credentials_dict["sent_2"] = {}
    credentials_dict["sent_2"]["user"] = conf["dataspace"]["user"]
    credentials_dict["sent_2"]["pass"] = conf["dataspace"]["pass"]
    sen_user = credentials_dict["sent_2"]["user"]
    sen_pass = credentials_dict["sent_2"]["pass"]

if download_source == "scihub":

    tile_log.info(f'Running download handler for {download_source}')

    credentials_dict["sent_2"] = {}
    credentials_dict["sent_2"]["user"] = conf["sent_2"]["user"]
    credentials_dict["sent_2"]["pass"] = conf["sent_2"]["pass"]
    sen_user = credentials_dict["sent_2"]["user"]
    sen_pass = credentials_dict["sent_2"]["pass"]
    
tile_log.info(f"Successfully configured the credentials for {download_source}")

2023-06-29 08:21:21,830: INFO: Running download handler for dataspace
2023-06-29 08:21:21,831: INFO: Successfully configured the credentials for dataspace


## <a id='toc3_2_'></a>[Query Sentinel-2 Composite Imagery](#toc0_)

First, a brief primer on the two Sentinel-2 data products we are concerned with:
- L1C
- L2A

**L1C** is the first processing level we encounter: orthorectified top-of-atmosphere reflectance that has not yet been atmospherically corrected. <br>

**L2A** is the second processing level, and this is the imagery we want to work with because it has been **atmospherically corrected** to surface reflectance.
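
The processing level is also encoded in each product's title, alongside the sensing time, processing baseline, relative orbit and tile. The sketch below splits a title into those fields, assuming the standard seven-field Sentinel-2 naming convention. `parse_s2_title` is a hypothetical helper, and any title passed to it here is purely illustrative.

```python
from datetime import datetime


def parse_s2_title(title):
    """Split a Sentinel-2 product title into its underscore-separated fields."""
    parts = title.replace(".SAFE", "").split("_")
    if len(parts) != 7:
        raise ValueError(f"Unexpected Sentinel-2 title format: {title}")
    return {
        "mission": parts[0],                 # e.g. S2A
        "level": parts[1][-3:],              # L1C or L2A
        "sensing_start": datetime.strptime(parts[2], "%Y%m%dT%H%M%S"),
        "baseline": parts[3],                # processing baseline, e.g. N0204
        "relative_orbit": parts[4],          # e.g. R031
        "tile": parts[5][1:],                # drop the leading 'T'
        "discriminator": parts[6],
    }
```

Fields 2 to 5 of the title (sensing time, baseline, relative orbit and tile) are the ones this notebook later joins into a search term when matching L1C products to their L2A counterparts.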

-------------------------------

Now that we have the query, file handling and log parameters set up, we can start querying the Copernicus Hub for the Sentinel-2 imagery that we want.  

The cell below starts the `build_composite` process. First, we query for the `L1C` products that match our criteria (date range, tile of interest, cloud cover).

Since we have declared a download limit of 12 images, the software caps the number of images in our query. This is a useful tool if we have limited disk space.

In [20]:
if config_dict["build_composite"] or config_dict["do_all"]:
    tile_log.info("---------------------------------------------------------------")
    tile_log.info(
        "Creating an initial cloud-free median composite from Sentinel-2 as a baseline map"
    )
    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Searching for images for initial composite.")

    if download_source == "dataspace":

        try:
            tiles_geom_path = os.path.join(config_dict["pyeo_dir"], os.path.join(config_dict["geometry_dir"], config_dict["s2_tiles_filename"]))
            tile_log.info(f"Path to the S2 tile geometry information absolute path: {os.path.abspath(tiles_geom_path)}")
            tiles_geom = gpd.read_file(os.path.abspath(tiles_geom_path))
        except FileNotFoundError:
            tile_log.error(f"Path to the S2 tile geometry does not exist, absolute path given: {os.path.abspath(tiles_geom_path)}")

        tile_geom = tiles_geom[tiles_geom["Name"] == tile_to_process]
        tile_geom = tile_geom.to_crs(epsg=4326)
        geometry = tile_geom["geometry"].iloc[0]
        geometry = geometry.representative_point()

        # convert date string to YYYY-MM-DD
        date_object = datetime.strptime(composite_start_date, "%Y%m%d")
        dataspace_composite_start = date_object.strftime("%Y-%m-%d")
        date_object = datetime.strptime(composite_end_date, "%Y%m%d")
        dataspace_composite_end = date_object.strftime("%Y-%m-%d")

        try:
            dataspace_composite_products_all = queries_and_downloads.query_dataspace_by_polygon(
                max_cloud_cover=cloud_cover,
                start_date=dataspace_composite_start,
                end_date=dataspace_composite_end,
                area_of_interest=geometry,
                max_records=100,
                log=tile_log
            )
        except Exception as error:
            tile_log.error(f"query_dataspace_by_polygon received this error: {error}")

        titles = dataspace_composite_products_all["title"].tolist()
        sizes = list()
        uuids = list()
        for elem in dataspace_composite_products_all.itertuples(index=False):
            sizes.append(elem[-2]["download"]["size"])
            uuids.append(elem[-2]["download"]["url"].split("/")[-1])

        relative_orbit_numbers = dataspace_composite_products_all["relativeOrbitNumber"].tolist()
        processing_levels = dataspace_composite_products_all["processingLevel"].tolist()
        transformed_levels = ['Level-1C' if level == 'S2MSI1C' else 'Level-2A' for level in processing_levels]
        cloud_covers = dataspace_composite_products_all["cloudCover"].tolist()
        begin_positions = dataspace_composite_products_all["startDate"].tolist()
        statuses = dataspace_composite_products_all["status"].tolist()

        scihub_compatible_df = pd.DataFrame({"title": titles,
                                            "size": sizes,
                                            "beginposition": begin_positions,
                                            "relativeorbitnumber": relative_orbit_numbers,
                                            "cloudcoverpercentage": cloud_covers,
                                            "processinglevel": transformed_levels,
                                            "uuid": uuids,
                                            "status": statuses})

        # check granule sizes on the server
        scihub_compatible_df["size"] = scihub_compatible_df["size"].apply(lambda x: round(float(x) * 1e-6, 2))
        # reassign to match the scihub variable
        df_all = scihub_compatible_df


    if download_source == "scihub":

        try:
            composite_products_all = queries_and_downloads.check_for_s2_data_by_date(
                config_dict["tile_dir"],
                composite_start_date,
                composite_end_date,
                conf=credentials_dict,
                cloud_cover=cloud_cover,
                tile_id=tile_to_process,
                producttype=None,
            )

        except Exception as error:
            tile_log.error(
                f"check_for_s2_data_by_date failed, got this error :  {error}"
            )

        tile_log.info(
            "--> Found {} L1C and L2A products for the composite:".format(
                len(composite_products_all)
            )
        )

        df_all = pd.DataFrame.from_dict(composite_products_all, orient="index")

        # check granule sizes on the server
        df_all["size"] = (
            df_all["size"]
            .str.split(" ")
            .apply(lambda x: float(x[0]) * {"GB": 1e3, "MB": 1, "KB": 1e-3}[x[1]])
        )

    if download_source == "scihub":
        min_granule_size = faulty_granule_threshold
    else:
        min_granule_size = 0  # Required for dataspace API which doesn't report size correctly (often reported as zero)

    df = df_all.query("size >= " + str(min_granule_size))

    tile_log.info(
        "Removed {} faulty scenes <{}MB in size from the list".format(
            len(df_all) - len(df), min_granule_size
        )
    )
    # find < threshold sizes, report to log
    df_faulty = df_all.query("size < " + str(min_granule_size))
    for r in range(len(df_faulty)):
        tile_log.info(
            "   {} MB: {}".format(
                df_faulty.iloc[r, :]["size"], df_faulty.iloc[r, :]["title"]
            )
        )

    l1c_products = df[df.processinglevel == "Level-1C"]
    l2a_products = df[df.processinglevel == "Level-2A"]
    tile_log.info("    {} L1C products".format(l1c_products.shape[0]))
    tile_log.info("    {} L2A products".format(l2a_products.shape[0]))


    rel_orbits = np.unique(l1c_products["relativeorbitnumber"])
    if len(rel_orbits) > 0:
        if l1c_products.shape[0] > max_image_number / len(rel_orbits):
            tile_log.info(
                "Capping the number of L1C products to {}".format(max_image_number)
            )
            tile_log.info(
                "Relative orbits found covering tile: {}".format(rel_orbits)
            )
            tile_log.info("dataspace branch reaches here")
            uuids = []
            for orb in rel_orbits:
                uuids = uuids + list(
                    l1c_products.loc[
                        l1c_products["relativeorbitnumber"] == orb
                    ].sort_values(by=["cloudcoverpercentage"], ascending=True)[
                        "uuid"
                    ][
                        : int(max_image_number / len(rel_orbits))
                    ]
                )
            # keeps least cloudy n (max image number)
            l1c_products = l1c_products[l1c_products["uuid"].isin(uuids)]
            tile_log.info(
                "    {} L1C products remain:".format(l1c_products.shape[0])
            )
            for product in l1c_products["title"]:
                tile_log.info("       {}".format(product))
            tile_log.info(f"len of L1C products for dataspace is {len(l1c_products['title'])}")

    rel_orbits = np.unique(l2a_products["relativeorbitnumber"])
    if len(rel_orbits) > 0:
        if l2a_products.shape[0] > max_image_number / len(rel_orbits):
            tile_log.info(
                "Capping the number of L2A products to {}".format(max_image_number)
            )
            tile_log.info(
                "Relative orbits found covering tile: {}".format(rel_orbits)
            )
            uuids = []
            for orb in rel_orbits:
                uuids = uuids + list(
                    l2a_products.loc[
                        l2a_products["relativeorbitnumber"] == orb
                    ].sort_values(by=["cloudcoverpercentage"], ascending=True)[
                        "uuid"
                    ][
                        : int(max_image_number / len(rel_orbits))
                    ]
                )
            l2a_products = l2a_products[l2a_products["uuid"].isin(uuids)]
            tile_log.info(
                "    {} L2A products remain:".format(l2a_products.shape[0])
            )
            for product in l2a_products["title"]:
                tile_log.info("       {}".format(product))
            tile_log.info(f"len of L2A products for dataspace is {len(l2a_products['title'])}")

    if l1c_products.shape[0] > 0 and l2a_products.shape[0] > 0:
        tile_log.info(
            "Filtering out L1C products that have the same 'beginposition' time stamp as an existing L2A product."
        )
        if download_source == "scihub":
            (l1c_products,l2a_products,) = queries_and_downloads.filter_unique_l1c_and_l2a_data(df,log=tile_log)

        if download_source == "dataspace":
            l1c_products = queries_and_downloads.filter_unique_dataspace_products(l1c_products=l1c_products, l2a_products=l2a_products, log=tile_log)

    df = None
    tile_log.info(f" {len(l1c_products['title'])} L1C products for the Composite")
    tile_log.info(f" {len(l2a_products['title'])} L2A products for the Composite")
    
    tile_log.info("Successfully queried the L1C and L2A products for the Composite")
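
The capping step in the cell above shares the download limit equally between relative orbits and keeps the least cloudy products within each orbit. With a limit of 12 and two orbits, for example, at most 6 products per orbit are kept. The sketch below restates that logic on plain dictionaries so it can be read in isolation. `cap_products_per_orbit` is a hypothetical helper, and unlike the DataFrame version it returns products grouped by orbit rather than in their original order.

```python
from collections import defaultdict


def cap_products_per_orbit(products, max_image_number):
    """Keep the least cloudy products, sharing the cap equally between relative orbits.

    `products` is a list of dicts with the keys 'uuid', 'relativeorbitnumber'
    and 'cloudcoverpercentage', mirroring the DataFrame columns used above.
    """
    by_orbit = defaultdict(list)
    for product in products:
        by_orbit[product["relativeorbitnumber"]].append(product)
    if not by_orbit:
        return []
    per_orbit = int(max_image_number / len(by_orbit))
    # same condition as the cell above: only cap when the total exceeds the per-orbit share
    if len(products) <= max_image_number / len(by_orbit):
        return list(products)
    kept = []
    for orbit in sorted(by_orbit):
        ranked = sorted(by_orbit[orbit], key=lambda p: p["cloudcoverpercentage"])
        kept.extend(ranked[:per_orbit])
    return kept
```

Splitting the cap by orbit, rather than simply taking the least cloudy products overall, helps keep coverage from every orbit that crosses the tile.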

### <a id='toc3_2_1_'></a>[Search for L2A Images Corresponding to L1C](#toc0_)

- The cell below searches our download directory for any existing imagery. If we have downloaded any imagery already, `pyeo` will remove the matching image from our search query.  

- In addition, if we have opted to use `scihub` as our `download_source`, `pyeo` searches the Copernicus archive for any corresponding `L2A` products. If it finds a matching `L2A` product, it removes the `L1C` counterpart from the query. With the `dataspace` option, this is handled on the server.

In [21]:
if config_dict["build_composite"] or config_dict["do_all"]:
    # Search the local directories, composite/L2A and L1C, checking if scenes have already been downloaded and/or processed whilst checking their dir sizes
    if download_source == "scihub":
        if l1c_products.shape[0] > 0:
            tile_log.info(
                "Checking for already downloaded and zipped L1C or L2A products and"
            )
            tile_log.info("  availability of matching L2A products for download.")
            n = len(l1c_products)
            drop = []
            add = []
            for r in range(n):
                id = l1c_products.iloc[r, :]["title"]
                search_term = (
                    id.split("_")[2]
                    + "_"
                    + id.split("_")[3]
                    + "_"
                    + id.split("_")[4]
                    + "_"
                    + id.split("_")[5]
                )
                tile_log.info(
                    "Searching locally for file names containing: {}.".format(
                        search_term
                    )
                )
                file_list = (
                    [
                        os.path.join(composite_l1_image_dir, f)
                        for f in os.listdir(composite_l1_image_dir)
                    ]
                    + [
                        os.path.join(composite_l2_image_dir, f)
                        for f in os.listdir(composite_l2_image_dir)
                    ]
                    + [
                        os.path.join(composite_l2_masked_image_dir, f)
                        for f in os.listdir(composite_l2_masked_image_dir)
                    ]
                )
                for f in file_list:
                    if search_term in f:
                        tile_log.info("  Product already downloaded: {}".format(f))
                        drop.append(l1c_products.index[r])
                search_term = (
                    "*"
                    + id.split("_")[2]
                    + "_"
                    + id.split("_")[3]
                    + "_"
                    + id.split("_")[4]
                    + "_"
                    + id.split("_")[5]
                    + "*"
                )


                tile_log.info(
                    "Searching on the data hub for files containing: {}.".format(
                        search_term
                    )
                )
                matching_l2a_products = queries_and_downloads._file_api_query(
                    user=sen_user,
                    passwd=sen_pass,
                    start_date=composite_start_date,
                    end_date=composite_end_date,
                    filename=search_term,
                    cloud=cloud_cover,
                    producttype="S2MSI2A",
                )

                matching_l2a_products_df = pd.DataFrame.from_dict(
                    matching_l2a_products, orient="index"
                )
                # 07/03/2023: Matt - Applied Ali's fix for converting product size to MB to compare against faulty_granule_threshold
                if (
                    len(matching_l2a_products_df) == 1
                    and [
                        float(x[0]) * {"GB": 1e3, "MB": 1, "KB": 1e-3}[x[1]]
                        for x in [matching_l2a_products_df["size"][0].split(" ")]
                    ][0]
                    > faulty_granule_threshold
                ):
                    tile_log.info("Replacing L1C {} with L2A product:".format(id))
                    tile_log.info(
                        "              {}".format(
                            matching_l2a_products_df.iloc[0, :]["title"]
                        )
                    )

                    drop.append(l1c_products.index[r])
                    add.append(matching_l2a_products_df.iloc[0, :])
                if len(matching_l2a_products_df) == 0:
                    pass
                if len(matching_l2a_products_df) > 1:
                    # check granule sizes on the server
                    matching_l2a_products_df["size"] = (
                        matching_l2a_products_df["size"]
                        .str.split(" ")
                        .apply(
                            lambda x: float(x[0])
                            * {"GB": 1e3, "MB": 1, "KB": 1e-3}[x[1]]
                        )
                    )
                    matching_l2a_products_df = matching_l2a_products_df.query(
                        "size >= " + str(faulty_granule_threshold)
                    )
                    # sizes were converted to MB and filtered against the threshold above,
                    # so it is enough to check that at least one valid product remains
                    if len(matching_l2a_products_df) > 0:
                        tile_log.info("Replacing L1C {} with L2A product:".format(id))
                        tile_log.info(
                            "              {}".format(
                                matching_l2a_products_df.iloc[0, :]["title"]
                            )
                        )
                        drop.append(l1c_products.index[r])
                        add.append(matching_l2a_products_df.iloc[0, :])
            if len(drop) > 0:
                l1c_products = l1c_products.drop(index=drop)
            if len(add) > 0:
                # l2a_products = l2a_products.append(add)
                add = pd.DataFrame(add)
                l2a_products = pd.concat([l2a_products, add])

            tile_log.info("\n Successfully searched for the L2A counterparts for the L1C products for the Composite")
        
    # here, dataspace and scihub derived l1c_products and l2a_products lists are the "same"
    l2a_products = l2a_products.drop_duplicates(subset="title")
    tile_log.info(
        "    {} L1C products remaining for download".format(
            l1c_products.shape[0]
        )
    )
    tile_log.info(
        "    {} L2A products remaining for download".format(
            l2a_products.shape[0]
        )
    )

    tile_log.info("Cell successfully finished")

## <a id='toc3_3_'></a>[Download Sentinel-2 Composite Imagery](#toc0_)

### <a id='toc3_3_1_'></a>[Download and Process L1Cs](#toc0_)

- The `log` output from the previous section shows that `pyeo` found a matching `L2A` image for each of the `L1Cs` in our search query, so only L2As remain in the query.  

- If we did have `L1Cs` in our search query, then the cell below would download these L1Cs and apply `atmospheric_correction` using `Sen2Cor`.
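
The L1C to L2A matching relies on a wildcard search term built from the L1C product title. The sketch below mirrors the string handling used in the search cells of this notebook; the helper name `l2a_search_term` and the example title are hypothetical and only serve as an illustration.

```python
def l2a_search_term(l1c_title: str) -> str:
    """Build a wildcard search term from an L1C product title.

    Keeps fields 2 to 5 of the underscore-separated title (sensing time,
    processing baseline, relative orbit and tile) and drops the satellite,
    the product level and the trailing timestamp.
    """
    fields = l1c_title.split("_")
    if len(fields) < 6:
        raise ValueError(f"Unexpected product title: {l1c_title!r}")
    return "*" + "_".join(fields[2:6]) + "*"


# Hypothetical title, used for illustration only
example_title = "S2A_MSIL1C_20230115T075211_N0509_R092_T36MYE_20230115T094512"
search_term = l2a_search_term(example_title)
print(search_term)
```

Because the product level (`MSIL1C`) is not part of the search term, the same term can match the L2A counterpart of the scene.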

In [22]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if l1c_products.shape[0] > 0:
        tile_log.info(f"Downloading Sentinel-2 L1C products from {download_source}:")

        if download_source == "scihub":

            queries_and_downloads.download_s2_data_from_df(
                l1c_products,
                composite_l1_image_dir,
                composite_l2_image_dir,
                source="scihub",
                user=sen_user,
                passwd=sen_pass,
                try_scihub_on_fail=True,
            )

        if download_source == "dataspace":

            queries_and_downloads.download_s2_data_from_dataspace(
                product_df=l1c_products,
                l1c_directory=composite_l1_image_dir,
                l2a_directory=composite_l2_image_dir,
                dataspace_username=sen_user,
                dataspace_password=sen_pass,
                log=tile_log
            )
        '''
        tile_log.info("Atmospheric correction with sen2cor.")
        raster_manipulation.atmospheric_correction(
            composite_l1_image_dir,
            composite_l2_image_dir,
            sen2cor_path,
            delete_unprocessed_image=False,
            log=tile_log,
        )
        '''
    tile_log.info("Successfully downloaded the Sentinel-2 L1C products")

### <a id='toc3_3_2_'></a>[Download L2As](#toc0_)

In this subsection, we will download the L2As from our search query.  

But first, let's take a look at what our search query result, `l2a_products` looks like by printing the first 3 rows with `.head(3)`:

In [23]:
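from IPython.display import display

if config_dict["build_composite"] or config_dict["do_all"]:
    # a bare expression inside an if block is not rendered, so call display() explicitly
    display(l2a_products.head(3))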

Let's highlight a few columns of interest:  

In the cell output above, we can see the product `uuid` as the dataframe index (*the first column, it has no column name*). These are the unique identifiers used to distinguish the scenes from each other.  

From the `title` column, we can see the title of each product. The title itself encodes important information, for example: the satellite (*S2A or S2B*), the sensor (*MSI*), the product type (*L2A*), the date the image was captured (*YYYYMMDD*) and the corresponding tile for the image (*TXXXXX*).

We can also see if the product is online or in the Long-Term Archive (`LTA`), by looking at the column `ondemand`, where `false` indicates the product is in the LTA or `true` indicates the product is online and ready for download.
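
To make the structure of the `title` column concrete, here is a minimal sketch that splits a product title into the parts described above. The function name `parse_s2_title` and the example title are hypothetical; the sketch assumes the standard seven-field Sentinel-2 naming pattern without a `.SAFE` suffix.

```python
from datetime import datetime


def parse_s2_title(title: str) -> dict:
    """Split a Sentinel-2 product title into its named components."""
    fields = title.split("_")
    if len(fields) != 7:
        raise ValueError(f"Unexpected product title: {title!r}")
    mission, product, sensing, baseline, orbit, tile, _discriminator = fields
    return {
        "satellite": mission,             # S2A or S2B
        "sensor": product[:3],            # MSI
        "product_type": product[3:],      # L1C or L2A
        "acquired": datetime.strptime(sensing, "%Y%m%dT%H%M%S"),
        "processing_baseline": baseline,  # N0XXX
        "relative_orbit": orbit,          # RXXX
        "tile": tile[1:],                 # tile ID without the leading "T"
    }


# Hypothetical title, used for illustration only
info = parse_s2_title("S2B_MSIL2A_20230120T075159_N0509_R092_T36MYE_20230120T103045")
print(info["satellite"], info["product_type"], info["acquired"].date(), info["tile"])
```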

---------

Now, let's download the `L2As` in our search query `l2a_products`, by asking `pyeo` to download these images from the Copernicus archive. If any incomplete downloads are present from a previous run (*remember, pyeo is an iterative download, classification and change detection process*), then `pyeo` will flag these files to the user through the log file.

If the images are in the Long-Term Archive (`LTA`), then `pyeo` will activate them one at a time, waiting for each LTA image to become available and downloading it before moving on to the next L2A in the search query.
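
Incomplete downloads are recognised by their size on disk. The sketch below shows one way to measure a `.SAFE` directory and compare it with a threshold in MB, using only the standard library. It is a stand-in for illustration, not the `pyeo` implementation; the names `directory_size_mb` and `likely_incomplete` are hypothetical.

```python
import os
import tempfile


def directory_size_mb(path: str) -> float:
    """Return the total size of all files below `path`, in MB."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total_bytes += os.path.getsize(file_path)
    return total_bytes / 1024 / 1024


def likely_incomplete(path: str, threshold_mb: float) -> bool:
    """Flag a directory whose total size is below the threshold."""
    return directory_size_mb(path) < threshold_mb


# Demonstration on a temporary directory holding a single 2 KB file
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "band.jp2"), "wb") as handle:
        handle.write(b"\0" * 2048)
    demo_size_mb = directory_size_mb(demo_dir)
    demo_flag = likely_incomplete(demo_dir, threshold_mb=1)

print(f"{demo_size_mb:.4f} MB, likely incomplete: {demo_flag}")
```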

In [24]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if l2a_products.shape[0] > 0:
        tile_log.info("Downloading Sentinel-2 L2A products.")

        if download_source == "scihub":

            queries_and_downloads.download_s2_data(
                l2a_products.to_dict("index"),
                composite_l1_image_dir,
                composite_l2_image_dir,
                source="scihub",
                user=sen_user,
                passwd=sen_pass,
                try_scihub_on_fail=True,
            )
        if download_source == "dataspace":

            queries_and_downloads.download_s2_data_from_dataspace(
                product_df=l2a_products,
                l1c_directory=composite_l1_image_dir,
                l2a_directory=composite_l2_image_dir,
                dataspace_username=sen_user,
                dataspace_password=sen_pass,
                log=tile_log
            )

    # check for incomplete L2A downloads
    incomplete_downloads, sizes = raster_manipulation.find_small_safe_dirs(
        composite_l2_image_dir, threshold=faulty_granule_threshold * 1024 * 1024
    )
    if len(incomplete_downloads) > 0:
        for index, safe_dir in enumerate(incomplete_downloads):
            size_mb = sizes[index] / 1024 / 1024
            if size_mb < faulty_granule_threshold and os.path.exists(safe_dir):
                tile_log.warning(
                    "Found likely incomplete download of size {} MB: {}".format(
                        round(size_mb), safe_dir
                    )
                )

    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Image download and atmospheric correction for composite is complete.")
    tile_log.info("---------------------------------------------------------------")

### <a id='toc3_3_3_'></a>[Housekeeping](#toc0_)

The cell below performs some housekeeping if we have told `pyeo` to delete or zip imagery. This functionality is useful for keeping disk usage to a minimum.
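
The precedence between the two options can be sketched with the standard library: deletion wins, and zipping is only considered when deletion is switched off. This is an illustrative stand-in that archives the whole directory with `shutil.make_archive`; the helper `tidy_directory` is hypothetical and `pyeo`'s own `zip_contents` may organise its archives differently.

```python
import os
import shutil
import tempfile


def tidy_directory(directory: str, do_delete: bool, do_zip: bool) -> str:
    """Delete the directory, or zip it, or leave it untouched."""
    if do_delete:
        shutil.rmtree(directory)
        return "deleted"
    if do_zip:
        # creates <directory>.zip next to the directory and returns its path
        return shutil.make_archive(directory, "zip", root_dir=directory)
    return "kept"


# Demonstration in a temporary workspace
workspace = tempfile.mkdtemp()
l1c_dir = os.path.join(workspace, "L1C")
os.makedirs(l1c_dir)
with open(os.path.join(l1c_dir, "granule.txt"), "w") as handle:
    handle.write("placeholder")

zip_result = tidy_directory(l1c_dir, do_delete=False, do_zip=True)
zip_exists = os.path.exists(zip_result)
delete_result = tidy_directory(l1c_dir, do_delete=True, do_zip=True)
dir_exists_after_delete = os.path.exists(l1c_dir)
shutil.rmtree(workspace)  # clean up the demonstration files

print(os.path.basename(zip_result), delete_result)
```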

In [25]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if config_dict["do_delete"]:
        tile_log.info("---------------------------------------------------------------")
        tile_log.info("Deleting downloaded L1C images for composite, keeping only derived L2A products")
        tile_log.info(
            "---------------------------------------------------------------"
        )
        directory = composite_l1_image_dir
        tile_log.info("Deleting {}".format(directory))
        shutil.rmtree(directory)
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Deletion of L1C images complete. Keeping only L2A images.")
        tile_log.info(
            "---------------------------------------------------------------"
        )
    else:
        if config_dict["do_zip"]:
            tile_log.info("---------------------------------------------------------------")
            tile_log.info("Zipping downloaded L1C images for composite after atmospheric correction")
            tile_log.info("---------------------------------------------------------------")
            filesystem_utilities.zip_contents(composite_l1_image_dir)
            tile_log.info("---------------------------------------------------------------")
            tile_log.info("Zipping complete")
            tile_log.info("---------------------------------------------------------------")

    tile_log.info("Cell successfully finished")

## <a id='toc3_4_'></a>[Process the Downloaded Imagery](#toc0_)

Now that we have downloaded the L2A Imagery, we will process the imagery. Processing refers to:  

1. Applying the `SCL Cloud Mask` to remove cloud, haze or cloud shadow pixels from the imagery.
2. Applying a `Processing Baseline Correction Offset` to the imagery, if applicable.
3. Creating `Quicklooks` (*.png*) of the processed imagery.

### <a id='toc3_4_1_'></a>[Check for pre-downloaded Imagery](#toc0_)

In [26]:
if config_dict["build_composite"] or config_dict["do_all"]:
    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Applying simple cloud, cloud shadow and haze mask based on SCL files and stacking the masked band raster files.")
    tile_log.info("---------------------------------------------------------------")

    directory = composite_l2_masked_image_dir
    masked_file_paths = [
        f
        for f in os.listdir(directory)
        if f.endswith(".tif") and os.path.isfile(os.path.join(directory, f))
    ]

    directory = composite_l2_image_dir
    l2a_zip_file_paths = [f for f in os.listdir(directory) if f.endswith(".zip")]

    if len(l2a_zip_file_paths) > 0:
        for f in l2a_zip_file_paths:
            # check whether the zipped file has already been cloud masked
            zip_timestamp = filesystem_utilities.get_image_acquisition_time(
                os.path.basename(f)
            ).strftime("%Y%m%dT%H%M%S")
            if any(zip_timestamp in masked_name for masked_name in masked_file_paths):
                continue
            else:
                # extract it if not
                filesystem_utilities.unzip_contents(
                    os.path.join(composite_l2_image_dir, f),
                    ifstartswith="S2",
                    ending=".SAFE",
                )

    directory = composite_l2_image_dir
    l2a_safe_file_paths = [
        f
        for f in os.listdir(directory)
        if f.endswith(".SAFE") and os.path.isdir(os.path.join(directory, f))
    ]

    files_for_cloud_masking = []
    if len(l2a_safe_file_paths) > 0:
        for f in l2a_safe_file_paths:
            # check whether the L2A SAFE file has already been cloud masked
            safe_timestamp = filesystem_utilities.get_image_acquisition_time(
                os.path.basename(f)
            ).strftime("%Y%m%dT%H%M%S")
            if any(safe_timestamp in masked_name for masked_name in masked_file_paths):
                continue
            else:
                # add it to the list of files to do if it has not been cloud masked yet
                files_for_cloud_masking.append(f)

    tile_log.info("Cell successfully finished")

### <a id='toc3_4_2_'></a>[Apply SCL Cloud Mask](#toc0_)

Clouds obscure the land cover of interest in optical imagery, so we use `apply_scl_cloud_mask` to remove cloudy pixels from the imagery.

The cell below performs two things:

- Checks whether any L2A SAFE files have been cloud masked from a previous run.

- If any L2A SAFE files have not been cloud masked, then `apply_scl_cloud_mask` is applied.
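
The core idea of SCL-based masking can be sketched with NumPy: every pixel whose Scene Classification (SCL) value is in the list of unwanted classes is replaced by a no-data value. The class list is the one passed to `apply_scl_cloud_mask` in the cell below. The function `mask_with_scl` is a hypothetical illustration and omits the buffering, resampling and band stacking that `pyeo` also performs.

```python
import numpy as np

# SCL classes masked in this notebook: no data, saturated/defective,
# dark area, cloud shadow, cloud (medium and high probability),
# thin cirrus, and snow/ice
SCL_CLASSES_TO_MASK = [0, 1, 2, 3, 8, 9, 10, 11]


def mask_with_scl(band, scl, classes=SCL_CLASSES_TO_MASK, nodata=0):
    """Set pixels of `band` to `nodata` where `scl` is one of `classes`."""
    band = np.asarray(band)
    scl = np.asarray(scl)
    if band.shape != scl.shape:
        raise ValueError("band and scl must have the same shape")
    unwanted = np.isin(scl, classes)
    return np.where(unwanted, nodata, band)


# Toy 2 x 2 example: vegetation (4), high-probability cloud (9),
# cloud shadow (3) and bare soil (5)
band = np.array([[1200, 900], [1500, 700]])
scl = np.array([[4, 9], [3, 5]])
masked_band = mask_with_scl(band, scl)
print(masked_band)
```

Only the vegetation and bare-soil pixels keep their reflectance values; the cloud and cloud-shadow pixels become `0`.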

In [27]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if len(files_for_cloud_masking) == 0:
        tile_log.info("No L2A images found for cloud masking. They may already have been done.")
    else:
        raster_manipulation.apply_scl_cloud_mask(
            composite_l2_image_dir,
            composite_l2_masked_image_dir,
            scl_classes=[0, 1, 2, 3, 8, 9, 10, 11],
            buffer_size=buffer_size_composite,
            bands=bands,
            out_resolution=out_resolution,
            haze=None,
            epsg=epsg,
            skip_existing=skip_existing,)

    tile_log.info("Successfully applied the Cloud Mask")

### <a id='toc3_4_3_'></a>[Apply Processing Baseline Offset](#toc0_)

Before Sentinel-2 imagery is provided to the user in L1C or L2A format, the raw imagery (L0) is processed by the ESA Copernicus Ground Segment ([see here](https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/processing-levels)). The algorithms used in the processing baseline are indicated by the field `N0XXX` in the product title, and the changes introduced by each processing baseline iteration are listed [here](https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-2-msi/processing-baseline).

Processing baseline `N0400` introduced an offset of `-1000` in the spectral reflectance values; the reasoning and suggested reading can be viewed [here](https://forum.step.esa.int/t/info-introduction-of-additional-radiometric-offset-in-pb04-00-products/35431). Therefore, to ensure that the spectral reflectance of imagery before and after `N0400` can be compared, we apply the offset correction of `+1000`.

The cell below applies this offset correction.
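
Whether a product falls before or after `N0400` can be read from its title. The sketch below extracts the `N0XXX` field and compares it with 400. The helper names and the example titles are hypothetical, and the sketch only identifies the baseline; it does not apply the correction itself.

```python
import re


def processing_baseline(title: str) -> int:
    """Return the processing baseline of a product title as an integer (N0509 -> 509)."""
    match = re.search(r"_N(\d{4})_", title)
    if match is None:
        raise ValueError(f"No processing baseline found in {title!r}")
    return int(match.group(1))


def is_pb0400_or_later(title: str) -> bool:
    """True if the product was generated with processing baseline N0400 or later."""
    return processing_baseline(title) >= 400


# Hypothetical titles, used for illustration only
older = "S2A_MSIL2A_20210115T075211_N0214_R092_T36MYE_20210115T101226"
newer = "S2A_MSIL2A_20230115T075211_N0509_R092_T36MYE_20230115T101226"
print(processing_baseline(older), is_pb0400_or_later(older))
print(processing_baseline(newer), is_pb0400_or_later(newer))
```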

In [28]:
if config_dict["build_composite"] or config_dict["do_all"]:
    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Offsetting cloud masked L2A images for composite.")
    tile_log.info("---------------------------------------------------------------")

    raster_manipulation.apply_processing_baseline_offset_correction_to_tiff_file_directory(
        composite_l2_masked_image_dir, composite_l2_masked_image_dir)

    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Offsetting of cloud masked L2A images for composite complete.")
    tile_log.info("---------------------------------------------------------------")

### <a id='toc3_4_4_'></a>[Create Quicklooks of Cloud-Masked Images](#toc0_)

- We can also create quicklooks of the Cloud-Masked images. These are especially useful for viewing the images quickly using a standard photo viewer, and for use in presentations.
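
The quicklook cell below passes `scale_factors=[[0, 2000, 0, 255]]`. Assuming these four numbers mean source minimum, source maximum, destination minimum and destination maximum (the ordering used by GDAL-style scaling), the stretch can be sketched with NumPy as follows. The function `linear_stretch` is a hypothetical illustration, not the `pyeo` implementation.

```python
import numpy as np


def linear_stretch(band, src_min=0, src_max=2000, dst_min=0, dst_max=255):
    """Clip `band` to [src_min, src_max] and rescale it linearly to 8-bit values."""
    band = np.asarray(band, dtype="float64")
    clipped = np.clip(band, src_min, src_max)
    scaled = (clipped - src_min) / (src_max - src_min) * (dst_max - dst_min) + dst_min
    return np.round(scaled).astype("uint8")


# Reflectance values above 2000 saturate at 255
stretched = linear_stretch([0, 500, 2000, 3500])
print(stretched)
```

With this stretch, values between 0 and 2000 use the full 8-bit range and anything brighter is shown as white.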

In [29]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if config_dict["do_quicklooks"] or config_dict["do_all"]:
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Producing quicklooks.")
        tile_log.info(
            "---------------------------------------------------------------"
        )
        dirs_for_quicklooks = [composite_l2_masked_image_dir]
        for main_dir in dirs_for_quicklooks:
            files = [
                f.path
                for f in os.scandir(main_dir)
                if f.is_file() and os.path.basename(f).endswith(".tif")
            ]
            # files = [ f.path for f in os.scandir(main_dir) if f.is_file() and os.path.basename(f).endswith(".tif") and "class" in os.path.basename(f) ] # do classification images only
            if len(files) == 0:
                tile_log.warning("No images found in {}.".format(main_dir))
            else:
                for f in files:
                    quicklook_path = os.path.join(
                        quicklook_dir,
                        os.path.basename(f).split(".")[0] + ".png",
                    )
                    tile_log.info("Creating quicklook: {}".format(quicklook_path))
                    raster_manipulation.create_quicklook(
                        f,
                        quicklook_path,
                        width=512,
                        height=512,
                        format="PNG",
                        bands=[3, 2, 1],
                        scale_factors=[[0, 2000, 0, 255]],
                    )
        tile_log.info("Quicklooks complete.")
    else:
        tile_log.info("Quicklook option disabled in ini file.")


    if config_dict["do_zip"]:
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info(
            "Zipping downloaded L2A images for composite after cloud masking and band stacking"
        )
        tile_log.info(
            "---------------------------------------------------------------"
        )
        filesystem_utilities.zip_contents(composite_l2_image_dir)
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Zipping complete")
        tile_log.info(
            "---------------------------------------------------------------"
        )


## <a id='toc3_5_'></a>[Create Composite from the Baseline Imagery](#toc0_)

Now we come to the last section of Tutorial Section 2. So far, we have queried the Copernicus archive for Sentinel-2 images that matched our search criteria and checked which L2A products were already present in the archive, to avoid unnecessary conversion from L1C to L2A by `pyeo`. We then downloaded the resulting imagery and applied a cloud mask and, where necessary, a baseline offset correction. The cell below combines the cloud-masked images into a cloud-free median composite.
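
The idea behind a median composite can be sketched with NumPy: for every pixel, take the median over time of the observations that are not missing, where `0` marks masked pixels as in `missing_data_value=0`. The function `median_composite` is a hypothetical single-band illustration; `clever_composite_directory` works on multi-band files in chunks and may differ in its details.

```python
import warnings

import numpy as np


def median_composite(stack, missing_data_value=0):
    """Per-pixel median over the first axis, ignoring missing values."""
    stack = np.asarray(stack, dtype="float64")
    valid = np.where(stack == missing_data_value, np.nan, stack)
    with warnings.catch_warnings():
        # pixels that are missing in every image trigger an "All-NaN slice" warning
        warnings.simplefilter("ignore", category=RuntimeWarning)
        composite = np.nanmedian(valid, axis=0)
    # pixels without any valid observation stay at the missing data value
    return np.where(np.isnan(composite), missing_data_value, composite)


# Three cloud-masked 2 x 2 images of the same area (0 = masked)
images = [
    [[100, 0], [300, 0]],
    [[120, 200], [0, 0]],
    [[140, 0], [500, 0]],
]
composite = median_composite(images)
print(composite)
```

The top-left pixel uses all three observations, the top-right pixel falls back to its single clear observation, and the bottom-right pixel stays missing because it was masked in every image.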

In [30]:
import warnings
if config_dict["build_composite"] or config_dict["do_all"]:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)

        tile_log.info("---------------------------------------------------------------")
        tile_log.info(
            "Building initial cloud-free median composite from directory {}".format(
                composite_l2_masked_image_dir
            )
        )
        tile_log.info("---------------------------------------------------------------")
        directory = composite_l2_masked_image_dir
        masked_file_paths = [
            f
            for f in os.listdir(directory)
            if f.endswith(".tif") and os.path.isfile(os.path.join(directory, f))
        ]

        if len(masked_file_paths) > 0:
            raster_manipulation.clever_composite_directory(
                composite_l2_masked_image_dir,
                composite_dir,
                chunks=config_dict["chunks"],
                generate_date_images=True,
                missing_data_value=0,
            )
            tile_log.info("---------------------------------------------------------------")
            tile_log.info("Baseline composite complete.")
            tile_log.info("---------------------------------------------------------------")


### <a id='toc3_5_1_'></a>[Create Quicklook of the Composite](#toc0_)

In [31]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if config_dict["do_quicklooks"] or config_dict["do_all"]:
        tile_log.info("---------------------------------------------------------------")
        tile_log.info("Producing quicklooks.")
        tile_log.info("---------------------------------------------------------------")
        dirs_for_quicklooks = [composite_dir]
        for main_dir in dirs_for_quicklooks:
            files = [
                f.path
                for f in os.scandir(main_dir)
                if f.is_file() and os.path.basename(f).endswith(".tif")
            ]
            if len(files) == 0:
                tile_log.warning("No images found in {}.".format(main_dir))
            else:
                for f in files:
                    quicklook_path = os.path.join(
                        quicklook_dir,
                        os.path.basename(f).split(".")[0] + ".png",
                    )
                    tile_log.info(
                        "Creating quicklook: {}".format(quicklook_path)
                    )
                    raster_manipulation.create_quicklook(
                        f,
                        quicklook_path,
                        width=512,
                        height=512,
                        format="PNG",
                        bands=[3, 2, 1],
                        scale_factors=[[0, 2000, 0, 255]],
                    )
        tile_log.info("Quicklooks complete.")


### <a id='toc3_5_2_'></a>[Final Housekeeping](#toc0_)

Now that we have created our composite and produced any quicklooks, we tell `pyeo` to delete or compress the cloud-masked L2A images that the composite was derived from.

In [32]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if config_dict["do_quicklooks"] or config_dict["do_all"]:
        if config_dict["do_delete"]:
            tile_log.info(
                "---------------------------------------------------------------"
            )
            tile_log.info(
                "Deleting intermediate cloud-masked L2A images used for the baseline composite"
            )
            tile_log.info(
                "---------------------------------------------------------------"
            )
            f = composite_l2_masked_image_dir
            tile_log.info("Deleting {}".format(f))
            shutil.rmtree(f)
            tile_log.info(
                "---------------------------------------------------------------"
            )
            tile_log.info("Intermediate file products have been deleted.")
            tile_log.info("They can be reprocessed from the downloaded L2A images.")
            tile_log.info(
                "---------------------------------------------------------------"
            )
        else:
            if config_dict["do_zip"]:
                tile_log.info(
                    "---------------------------------------------------------------"
                )
                tile_log.info(
                    "Zipping cloud-masked L2A images used for the baseline composite"
                )
                tile_log.info(
                    "---------------------------------------------------------------"
                )
                filesystem_utilities.zip_contents(composite_l2_masked_image_dir)
                tile_log.info(
                    "---------------------------------------------------------------"
                )
                tile_log.info("Zipping complete")
                tile_log.info(
                    "---------------------------------------------------------------"
                )

        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info(
            "Compressing tiff files in directory {} and all subdirectories".format(
                composite_dir
            )
        )
        tile_log.info(
            "---------------------------------------------------------------"
        )
        for root, dirs, files in os.walk(composite_dir):
            all_tiffs = [
                image_name for image_name in files if image_name.endswith(".tif")
            ]
            for this_tiff in all_tiffs:
                raster_manipulation.compress_tiff(
                    os.path.join(root, this_tiff), os.path.join(root, this_tiff)
                )

        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info(
            "Baseline image composite, file compression, zipping and deletion of"
        )
        tile_log.info("intermediate file products (if selected) are complete.")
        tile_log.info(
            "---------------------------------------------------------------"
        )


# <a id='toc4_'></a>[Session 3: Change Detection](#toc0_)

- If we are returning to this notebook after a break, we need to re-run the cells in the subsection [Directory and Variable Setup](#toc3_1_).


## <a id='toc4_3_'></a>[Query Sentinel-2 Change Imagery](#toc0_)

In [33]:
if config_dict["do_all"] or config_dict["do_download"]:
    tile_log.info("---------------------------------------------------------------")
    tile_log.info(
        "Querying change detection images between {} and {} with cloud cover <= {}".format(
            start_date, end_date, cloud_cover
        )
    )
    tile_log.info("---------------------------------------------------------------")
    if download_source == "dataspace":

        try:
            tiles_geom_path = os.path.join(config_dict["pyeo_dir"], os.path.join(config_dict["geometry_dir"], config_dict["s2_tiles_filename"]))
            tiles_geom = gpd.read_file(os.path.abspath(tiles_geom_path))

        except FileNotFoundError:
            tile_log.error(f"tiles_geom does not exist, the path is: {tiles_geom_path}")
            raise  # the steps below cannot run without the tile geometry

        tile_geom = tiles_geom[tiles_geom["Name"] == tile_to_process]
        tile_geom = tile_geom.to_crs(epsg=4326)
        geometry = tile_geom["geometry"].iloc[0]
        geometry = geometry.representative_point()

        # convert date string to YYYY-MM-DD
        date_object = datetime.strptime(start_date, "%Y%m%d")
        dataspace_change_start = date_object.strftime("%Y-%m-%d")
        date_object = datetime.strptime(end_date, "%Y%m%d")
        dataspace_change_end = date_object.strftime("%Y-%m-%d")

        try:
            dataspace_change_products_all = queries_and_downloads.query_dataspace_by_polygon(
                max_cloud_cover=cloud_cover,
                start_date=dataspace_change_start,
                end_date=dataspace_change_end,
                area_of_interest=geometry,
                max_records=100,
                log=tile_log
            )
        except Exception as error:
            tile_log.error(f"query_dataspace_by_polygon received this error: {error}")
            raise  # the product table is required by the code below

        titles = dataspace_change_products_all["title"].tolist()
        sizes = list()
        uuids = list()
        for elem in dataspace_change_products_all.itertuples(index=False):
            sizes.append(elem[-2]["download"]["size"])
            uuids.append(elem[-2]["download"]["url"].split("/")[-1])


        relative_orbit_numbers = dataspace_change_products_all["relativeOrbitNumber"].tolist()
        processing_levels = dataspace_change_products_all["processingLevel"].tolist()
        transformed_levels = ['Level-1C' if level == 'S2MSI1C' else 'Level-2A' for level in processing_levels]
        cloud_covers = dataspace_change_products_all["cloudCover"].tolist()
        begin_positions = dataspace_change_products_all["startDate"].tolist()
        statuses = dataspace_change_products_all["status"].tolist()

        scihub_compatible_df = pd.DataFrame({"title": titles,
                                            "size": sizes,
                                            "beginposition": begin_positions,
                                            "relativeorbitnumber": relative_orbit_numbers,
                                            "cloudcoverpercentage": cloud_covers,
                                            "processinglevel": transformed_levels,
                                            "uuid": uuids,
                                            "status": statuses})

        # check granule sizes on the server
        scihub_compatible_df["size"] = scihub_compatible_df["size"].apply(lambda x: round(float(x) * 1e-6, 2))
        # reassign to match the scihub variable
        df_all = scihub_compatible_df

    if download_source == "scihub":
        products_all = queries_and_downloads.check_for_s2_data_by_date(
            config_dict["tile_dir"],
            start_date,
            end_date,
            credentials_dict,
            cloud_cover=cloud_cover,
            tile_id=tile_to_process,
            producttype=None,  # "S2MSI2A" or "S2MSI1C"
        )
        tile_log.info(
            "--> Found {} L1C and L2A products for change detection:".format(
                len(products_all)
            )
        )
        df_all = pd.DataFrame.from_dict(products_all, orient="index")

        # check granule sizes on the server
        df_all["size"] = (
            df_all["size"]
            .str.split(" ")
            .apply(lambda x: float(x[0]) * {"GB": 1e3, "MB": 1, "KB": 1e-3}[x[1]])
        )

    # here the main call (from if download_source == "scihub" branch) is resumed
    df = df_all.query("size >= " + str(faulty_granule_threshold))
    tile_log.info(
        "Removed {} faulty scenes <{}MB in size from the list:".format(
            len(df_all) - len(df), faulty_granule_threshold
        )
    )
    df_faulty = df_all.query("size < " + str(faulty_granule_threshold))
    for r in range(len(df_faulty)):
        tile_log.info(
            "   {} MB: {}".format(
                df_faulty.iloc[r, :]["size"], df_faulty.iloc[r, :]["title"]
            )
        )

    l1c_products = df[df.processinglevel == "Level-1C"]
    l2a_products = df[df.processinglevel == "Level-2A"]
    tile_log.info("    {} L1C products".format(l1c_products.shape[0]))
    tile_log.info("    {} L2A products".format(l2a_products.shape[0]))

    if l1c_products.shape[0] > 0 and l2a_products.shape[0] > 0:
        tile_log.info(
            "Filtering out L1C products that have the same 'beginposition' time stamp as an existing L2A product."
        )
        if download_source == "scihub":
            (l1c_products,l2a_products,) = queries_and_downloads.filter_unique_l1c_and_l2a_data(df,log=tile_log)

        if download_source == "dataspace":
            l1c_products = queries_and_downloads.filter_unique_dataspace_products(l1c_products=l1c_products, l2a_products=l2a_products, log=tile_log)

        tile_log.info(
            "--> {} L1C and L2A products with unique 'beginposition' time stamp for change detection:".format(
                l1c_products.shape[0] + l2a_products.shape[0]
            )
        )

    df = None
    tile_log.info(f" {len(l1c_products['title'])} L1C Change Images")
    tile_log.info(f" {len(l2a_products['title'])} L2A Change Images")
    
    tile_log.info("Cell successfully finished")

## Search for L2A Images Corresponding to L1C

In [34]:
if config_dict["do_all"] or config_dict["do_download"]:
    if download_source == "scihub":    
        if l1c_products.shape[0] > 0:
            tile_log.info("Checking for availability of L2A products to minimise download and atmospheric correction of L1C products.")
            n = len(l1c_products)
            drop = []
            add = []
            for r in range(n):
                id = l1c_products.iloc[r, :]["title"]
                # keep the sensing time, processing baseline, relative orbit and tile fields
                search_term = "*" + "_".join(id.split("_")[2:6]) + "*"
                tile_log.info("Search term: {}.".format(search_term))
                matching_l2a_products = queries_and_downloads._file_api_query(
                    user=sen_user,
                    passwd=sen_pass,
                    start_date=start_date,
                    end_date=end_date,
                    filename=search_term,
                    cloud=cloud_cover,
                    producttype="S2MSI2A",
                )

                matching_l2a_products_df = pd.DataFrame.from_dict(
                    matching_l2a_products, orient="index"
                )
                if len(matching_l2a_products_df) == 1:
                    tile_log.info(matching_l2a_products_df.iloc[0, :]["size"])
                    matching_l2a_products_df["size"] = (
                        matching_l2a_products_df["size"]
                        .str.split(" ")
                        .apply(
                            lambda x: float(x[0])
                            * {"GB": 1e3, "MB": 1, "KB": 1e-3}[x[1]]
                        )
                    )
                    if (
                        matching_l2a_products_df.iloc[0, :]["size"]
                        > faulty_granule_threshold
                    ):
                        tile_log.info("Replacing L1C {} with L2A product:".format(id))
                        tile_log.info(
                            "              {}".format(
                                matching_l2a_products_df.iloc[0, :]["title"]
                            )
                        )
                        drop.append(l1c_products.index[r])
                        add.append(matching_l2a_products_df.iloc[0, :])
                if len(matching_l2a_products_df) == 0:
                    tile_log.info("Found no match for L1C: {}.".format(id))
                if len(matching_l2a_products_df) > 1:
                    # check granule sizes on the server
                    matching_l2a_products_df["size"] = (
                        matching_l2a_products_df["size"]
                        .str.split(" ")
                        .apply(
                            lambda x: float(x[0])
                            * {"GB": 1e3, "MB": 1, "KB": 1e-3}[x[1]]
                        )
                    )
                    if (
                        matching_l2a_products_df.iloc[0, :]["size"]
                        > faulty_granule_threshold
                    ):
                        tile_log.info("Replacing L1C {} with L2A product:".format(id))
                        tile_log.info(
                            "              {}".format(
                                matching_l2a_products_df.iloc[0, :]["title"]
                            )
                        )
                        drop.append(l1c_products.index[r])
                        add.append(matching_l2a_products_df.iloc[0, :])

            if len(drop) > 0:
                l1c_products = l1c_products.drop(index=drop)
            if len(add) > 0:
                # both the development and the production version append the L2A matches
                add = pd.DataFrame(add)
                l2a_products = pd.concat([l2a_products, add])

            tile_log.info(
                "    {} L1C products remaining for download".format(
                    l1c_products.shape[0]
                )
            )
    l2a_products = l2a_products.drop_duplicates(subset="title")
    tile_log.info("    {} L2A products remaining for download".format(l2a_products.shape[0]))
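
The size conversion applied to the `size` column in the cell above can be isolated as a small helper. The following is a minimal sketch and not part of `pyeo`: the function name `size_to_mb` and the example threshold value are assumptions made purely for illustration.

```python
# Hypothetical helper (not part of pyeo): convert a size string such as
# "1.08 GB" into megabytes, mirroring the lambda used in the cell above.
UNIT_TO_MB = {"GB": 1e3, "MB": 1, "KB": 1e-3}


def size_to_mb(size_string):
    """Return the size in MB for strings of the form '<number> <unit>'."""
    value, unit = size_string.split(" ")
    return float(value) * UNIT_TO_MB[unit]


# Assumed example threshold in MB; the notebook defines the real value
# as `faulty_granule_threshold` earlier on.
example_threshold_mb = 200

sizes = ["1.08 GB", "750.5 MB", "512 KB"]
sizes_mb = [size_to_mb(s) for s in sizes]
large_enough = [mb > example_threshold_mb for mb in sizes_mb]
print(sizes_mb, large_enough)
```

Only products whose reported size exceeds the threshold are treated as complete enough to replace an L1C product.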

## <a id='toc4_4_'></a>[Download and Pre-Process L1C Change Imagery](#toc0_)

- If there are any L1C products in the change image search query that are not matched with L2A products, download these L1Cs and apply `atmospheric_correction`. In this notebook the `atmospheric_correction` call is commented out, so only the download runs.

In [35]:
if config_dict["do_all"] or config_dict["do_download"]:
    if l1c_products.shape[0] > 0:

        tile_log.info(f"Downloading Sentinel-2 L1C products from {download_source}")

        if download_source == "scihub":
            queries_and_downloads.download_s2_data_from_df(
                l1c_products,
                l1_image_dir,
                l2_image_dir,
                download_source,
                user=sen_user,
                passwd=sen_pass,
                try_scihub_on_fail=True,
            )
        elif download_source == "dataspace":
            queries_and_downloads.download_s2_data_from_dataspace(
                product_df=l1c_products,
                l1c_directory=l1_image_dir,
                l2a_directory=l2_image_dir,
                dataspace_username=sen_user,
                dataspace_password=sen_pass,
                log=tile_log
            )
        else:
            tile_log.error("download source specified did not match 'scihub' or 'dataspace'")
            tile_log.error(f"download source supplied was  :  {download_source}")
            tile_log.error("exiting pipeline...")
            sys.exit(1)

    #     tile_log.info("Atmospheric correction with sen2cor.")
    #     raster_manipulation.atmospheric_correction(
    #         l1_image_dir,
    #         l2_image_dir,
    #         sen2cor_path,
    #         delete_unprocessed_image=False,
    #         log=tile_log,
    #     )

    tile_log.info("Successfully downloaded the Sentinel-2 L1C products")

## <a id='toc4_5_'></a>[Download L2A Change Imagery](#toc0_)

In [36]:
if config_dict["do_all"] or config_dict["do_download"]:
    if l2a_products.shape[0] > 0:
        tile_log.info(f"Downloading Sentinel-2 L2A products from {download_source}")

        if download_source == "scihub":
            queries_and_downloads.download_s2_data(
                l2a_products.to_dict("index"),
                l1_image_dir,
                l2_image_dir,
                download_source,
                user=sen_user,
                passwd=sen_pass,
                try_scihub_on_fail=True,
            )
        if download_source == "dataspace":
            queries_and_downloads.download_s2_data_from_dataspace(
                product_df=l2a_products,
                l1c_directory=l1_image_dir,
                l2a_directory=l2_image_dir,
                dataspace_username=sen_user,
                dataspace_password=sen_pass,
                log=tile_log
            )

    # check for incomplete L2A downloads and log a warning for each one
    incomplete_downloads, sizes = raster_manipulation.find_small_safe_dirs(
        l2_image_dir, threshold=faulty_granule_threshold * 1024 * 1024
    )
    if len(incomplete_downloads) > 0:
        for index, safe_dir in enumerate(incomplete_downloads):
            if sizes[
                index
            ] / 1024 / 1024 < faulty_granule_threshold and os.path.exists(safe_dir):
                tile_log.warning(
                    "Found likely incomplete download of size {} MB: {}".format(
                        str(round(sizes[index] / 1024 / 1024)), safe_dir
                    )
                )

    tile_log.info("Successfully downloaded the Sentinel-2 L2A products")

    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Image download and atmospheric correction for change detection images is complete.")
    tile_log.info("---------------------------------------------------------------")
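
The incomplete-download check above compares the on-disk size of each `.SAFE` directory against a threshold given in MB. The following is a minimal sketch of that idea using only the standard library. The helpers `dir_size_bytes` and `find_small_dirs` are hypothetical stand-ins, and the internals of `raster_manipulation.find_small_safe_dirs` may differ.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Sum the sizes of all files below `path` (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def find_small_dirs(parent, threshold_mb):
    """Return (path, size_in_mb) for each .SAFE directory below the threshold."""
    small = []
    for entry in sorted(os.scandir(parent), key=lambda e: e.name):
        if entry.is_dir() and entry.name.endswith(".SAFE"):
            size_mb = dir_size_bytes(entry.path) / 1024 / 1024
            if size_mb < threshold_mb:
                small.append((entry.path, size_mb))
    return small


# Build two toy .SAFE directories: 2 MB and roughly 0.1 MB.
with tempfile.TemporaryDirectory() as tmp:
    for name, n_bytes in [("complete.SAFE", 2 * 1024 * 1024), ("partial.SAFE", 100 * 1024)]:
        os.makedirs(os.path.join(tmp, name))
        with open(os.path.join(tmp, name, "data.bin"), "wb") as f:
            f.write(b"\0" * n_bytes)
    flagged = [os.path.basename(p) for p, _ in find_small_dirs(tmp, threshold_mb=1)]

print(flagged)
```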


### <a id='toc4_5_1_'></a>[Housekeeping - Compress L1Cs](#toc0_)

This cell tidies up the L1C images. If `do_delete` is set, the L1C directory is deleted; otherwise, if your `do_zip` argument is `True`, the L1Cs are compressed now that they have been atmospherically corrected and relabelled as L2As.
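
The precedence between deleting and zipping can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions: `tidy_directory` is a hypothetical helper, and it archives the whole directory into a single zip file, which is not necessarily how `filesystem_utilities.zip_contents` organises its archives.

```python
import os
import shutil
import tempfile


def tidy_directory(directory, do_delete=False, do_zip=False):
    """Delete `directory`, or zip it; deletion takes precedence over zipping.

    Hypothetical helper that mirrors the if/else structure of the
    housekeeping cell. Returns the archive path if one was created.
    """
    if do_delete:
        shutil.rmtree(directory)
        return None
    if do_zip:
        # make_archive appends ".zip" to the base name it is given.
        return shutil.make_archive(directory, "zip", root_dir=directory)
    return None


with tempfile.TemporaryDirectory() as tmp:
    l1_dir = os.path.join(tmp, "L1C")
    os.makedirs(l1_dir)
    with open(os.path.join(l1_dir, "granule.txt"), "w") as f:
        f.write("placeholder")

    archive = tidy_directory(l1_dir, do_zip=True)
    zip_created = os.path.exists(archive)

    tidy_directory(l1_dir, do_delete=True)
    dir_removed = not os.path.exists(l1_dir)

print(zip_created, dir_removed)
```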

In [37]:
if config_dict["do_all"] or config_dict["do_download"]:
    if config_dict["do_delete"]:
        tile_log.info("---------------------------------------------------------------")
        tile_log.info("Deleting L1C images downloaded for change detection.")
        tile_log.info("Keeping only the derived L2A images after atmospheric correction.")
        tile_log.info("---------------------------------------------------------------")
        directory = l1_image_dir
        tile_log.info("Deleting {}".format(directory))
        shutil.rmtree(directory)
        tile_log.info("---------------------------------------------------------------")
        tile_log.info("Deletion complete")
        tile_log.info("---------------------------------------------------------------")
    else:
        if config_dict["do_zip"]:
            tile_log.info("---------------------------------------------------------------")
            tile_log.info("Zipping L1C images downloaded for change detection")
            tile_log.info("---------------------------------------------------------------")
            filesystem_utilities.zip_contents(l1_image_dir)
            tile_log.info("---------------------------------------------------------------")
            tile_log.info("Zipping complete")
            tile_log.info("---------------------------------------------------------------")

    tile_log.info("Cell successfully finished")

## <a id='toc4_6_'></a>[Cloud Masking, Offsetting and Quicklooks](#toc0_)

As in the previous session, we cloud-mask the imagery, apply the processing baseline offset correction and produce quicklooks (if selected).

Additionally, if you have set the `do_zip` flag to `True`, `pyeo` will zip the downloaded L2A images, as these are no longer needed once they have been cloud-masked. The cloud-masked GeoTIFFs are then compressed in place.
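
To make the masking step concrete, the following minimal sketch applies the same list of SCL classes to a toy array with NumPy. It is an illustration only: the array values are invented, using 0 as the masked value is an assumption, and the buffering controlled by `buffer_size` is not reproduced.

```python
import numpy as np

# SCL classes treated as invalid in the masking cell: no data (0),
# saturated or defective (1), dark area (2), cloud shadow (3),
# cloud medium/high probability (8, 9), thin cirrus (10), snow/ice (11).
SCL_CLASSES_TO_MASK = [0, 1, 2, 3, 8, 9, 10, 11]

# Toy 3x3 scene classification layer and a two-band image stack
# with shape (bands, rows, cols).
scl = np.array([[4, 4, 9],
                [3, 5, 6],
                [8, 4, 0]])
stack = np.full((2, 3, 3), 1200, dtype=np.uint16)

# True where the pixel should be kept.
valid = ~np.isin(scl, SCL_CLASSES_TO_MASK)

# Set masked pixels to 0 in every band (0 is assumed to mean "no data" here).
masked_stack = np.where(valid, stack, 0)

print(valid.astype(int))
print(masked_stack[0])
```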

### Cloud Masking

In [38]:
if config_dict["do_all"] or config_dict["do_download"]:
    tile_log.info("---------------------------------------------------------------")
    tile_log.info(
        "Applying simple cloud, cloud shadow and haze mask based on SCL files and stacking the masked band raster files."
    )
    tile_log.info("---------------------------------------------------------------")

    raster_manipulation.apply_scl_cloud_mask(
        l2_image_dir,
        l2_masked_image_dir,
        scl_classes=[0, 1, 2, 3, 8, 9, 10, 11],
        buffer_size=buffer_size,
        bands=bands,
        out_resolution=out_resolution,
        haze=None,
        epsg=epsg,
        skip_existing=skip_existing,
    )

    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Cloud masking and band stacking of new L2A images are complete.")
    tile_log.info("---------------------------------------------------------------")


### Offsetting

In [39]:
if config_dict["do_all"] or config_dict["do_download"]:
    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Offsetting cloud masked L2A images.")
    tile_log.info("---------------------------------------------------------------")

    raster_manipulation.apply_processing_baseline_offset_correction_to_tiff_file_directory(
        l2_masked_image_dir, l2_masked_image_dir)

    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Offsetting of cloud masked L2A images complete.")
    tile_log.info("---------------------------------------------------------------")

### Quicklooks

In [40]:
if config_dict["do_all"] or config_dict["do_download"]:
    if config_dict["do_quicklooks"] or config_dict["do_all"]:
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Producing quicklooks.")
        tile_log.info(
            "---------------------------------------------------------------"
        )
        dirs_for_quicklooks = [l2_masked_image_dir]
        for main_dir in dirs_for_quicklooks:
            files = [
                f.path
                for f in os.scandir(main_dir)
                if f.is_file() and os.path.basename(f).endswith(".tif")
            ]
            if len(files) == 0:
                tile_log.warning("No images found in {}.".format(main_dir))
            else:
                for f in files:
                    quicklook_path = os.path.join(
                        quicklook_dir,
                        os.path.basename(f).split(".")[0] + ".png",
                    )
                    tile_log.info("Creating quicklook: {}".format(quicklook_path))
                    raster_manipulation.create_quicklook(
                        f,
                        quicklook_path,
                        width=512,
                        height=512,
                        format="PNG",
                        bands=[3, 2, 1],
                        scale_factors=[[0, 2000, 0, 255]],
                    )
        tile_log.info("Quicklooks complete.")
    else:
        tile_log.info("Quicklooks disabled in ini file.")
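
The `scale_factors=[[0, 2000, 0, 255]]` argument above describes a linear stretch. The sketch below assumes the four values follow the GDAL convention of `[src_min, src_max, dst_min, dst_max]`; `linear_stretch` is a hypothetical helper that shows the arithmetic, not the code `create_quicklook` runs.

```python
import numpy as np


def linear_stretch(band, src_min, src_max, dst_min, dst_max):
    """Linearly rescale `band` from [src_min, src_max] to [dst_min, dst_max]."""
    # Values outside the source range are clipped so that they saturate.
    clipped = np.clip(band.astype(np.float64), src_min, src_max)
    scaled = (clipped - src_min) / (src_max - src_min) * (dst_max - dst_min) + dst_min
    return np.round(scaled).astype(np.uint8)


# Toy band values; 3000 lies above the source range and saturates.
band = np.array([0, 500, 1200, 2000, 3000])
quicklook_values = linear_stretch(band, 0, 2000, 0, 255)
print(quicklook_values)
```

With this stretch, values of 2000 and above all map to the brightest output value, which keeps bright surfaces from dominating the quicklook.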


### Housekeeping

In [41]:
if config_dict["do_all"] or config_dict["do_download"]:
    if config_dict["do_zip"]:
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Zipping L2A images downloaded for change detection")
        tile_log.info(
            "---------------------------------------------------------------"
        )
        filesystem_utilities.zip_contents(l2_image_dir)
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Zipping complete")
        tile_log.info(
            "---------------------------------------------------------------"
        )

    tile_log.info("---------------------------------------------------------------")
    tile_log.info(
        "Compressing tiff files in directory {} and all subdirectories".format(
            l2_masked_image_dir
        )
    )
    tile_log.info("---------------------------------------------------------------")
    for root, dirs, files in os.walk(l2_masked_image_dir):
        all_tiffs = [
            image_name for image_name in files if image_name.endswith(".tif")
        ]
        for this_tiff in all_tiffs:
            raster_manipulation.compress_tiff(
                os.path.join(root, this_tiff), os.path.join(root, this_tiff)
            )

    tile_log.info("---------------------------------------------------------------")
    tile_log.info(
        "Pre-processing of change detection images, file compression, zipping"
    )
    tile_log.info(
        "and deletion of intermediate file products (if selected) are complete."
    )
    tile_log.info("---------------------------------------------------------------")



## <a id='toc4_7_'></a>[Classification of the Baseline Composite and Change Images](#toc0_)

Here, we classify the Baseline Composite and the Change images using the model we created in the model training session.
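
Conceptually, classifying an image means turning every pixel into one sample (one value per band), predicting a class for each sample, and reshaping the predictions back into a map. The following is a minimal sketch of that idea, not the `classify_directory` implementation: `classify_in_chunks` and `toy_predict` are hypothetical, the threshold rule stands in for a trained model, and `chunks` is assumed to mean the number of pieces the pixels are split into.

```python
import numpy as np


def classify_in_chunks(stack, predict, chunks=4):
    """Classify a (bands, rows, cols) stack pixel by pixel in `chunks` pieces.

    `predict` is any callable mapping an (n_pixels, n_bands) array to
    n_pixels labels, e.g. the `predict` method of a scikit-learn model.
    """
    n_bands, rows, cols = stack.shape
    # One row per pixel, one column per band.
    samples = stack.reshape(n_bands, rows * cols).T
    labels = [predict(chunk) for chunk in np.array_split(samples, chunks)]
    return np.concatenate(labels).reshape(rows, cols)


def toy_predict(samples):
    """Stand-in for a trained model: class 1 if band 0 > band 1, else class 3."""
    return np.where(samples[:, 0] > samples[:, 1], 1, 3)


stack = np.array([
    [[10, 2], [3, 9]],   # band 0
    [[5, 5], [5, 5]],    # band 1
])
class_map = classify_in_chunks(stack, toy_predict, chunks=3)
print(class_map)
```

Processing in chunks keeps only part of the image in memory at any one time, which matters for full Sentinel-2 tiles.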

In [42]:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    if config_dict["do_all"] or config_dict["do_classify"]:
        tile_log.info("---------------------------------------------------------------")
        tile_log.info(
            "Classify a land cover map for each L2A image and composite image using a saved model"
        )
        tile_log.info("---------------------------------------------------------------")
        tile_log.info("Model used: {}".format(model_path))
        if skip_existing:
            tile_log.info("Skipping existing classification images if found.")
        classification.classify_directory(
            composite_dir,
            model_path,
            categorised_image_dir,
            prob_out_dir=None,
            apply_mask=False,
            out_type="GTiff",
            chunks=config_dict["chunks"],
            skip_existing=skip_existing,
        )
        classification.classify_directory(
            l2_masked_image_dir,
            model_path,
            categorised_image_dir,
            prob_out_dir=None,
            apply_mask=False,
            out_type="GTiff",
            chunks=config_dict["chunks"],
            skip_existing=skip_existing,
        )

        tile_log.info("---------------------------------------------------------------")
        tile_log.info(
            "Compressing tiff files in directory {} and all subdirectories".format(
                categorised_image_dir
            )
        )
        tile_log.info("---------------------------------------------------------------")
        for root, dirs, files in os.walk(categorised_image_dir):
            all_tiffs = [
                image_name for image_name in files if image_name.endswith(".tif")
            ]
            for this_tiff in all_tiffs:
                raster_manipulation.compress_tiff(
                    os.path.join(root, this_tiff), os.path.join(root, this_tiff)
                )

        tile_log.info("---------------------------------------------------------------")
        tile_log.info("Classification of all images is complete.")
        tile_log.info("---------------------------------------------------------------")


### Quicklooks of the Classified Images

In [43]:
if config_dict["do_all"] or config_dict["do_classify"]:
    if config_dict["do_quicklooks"] or config_dict["do_all"]:
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Producing quicklooks.")
        tile_log.info(
            "---------------------------------------------------------------"
        )
        dirs_for_quicklooks = [categorised_image_dir]
        for main_dir in dirs_for_quicklooks:
            # files = [ f.path for f in os.scandir(main_dir) if f.is_file() and os.path.basename(f).endswith(".tif") ]
            files = [
                f.path
                for f in os.scandir(main_dir)
                if f.is_file()
                and os.path.basename(f).endswith(".tif")
                and "class" in os.path.basename(f)
            ]  # do classification images only
            if len(files) == 0:
                tile_log.warning("No images found in {}.".format(main_dir))
            else:
                for f in files:
                    quicklook_path = os.path.join(
                        quicklook_dir,
                        os.path.basename(f).split(".")[0] + ".png",
                    )
                    tile_log.info("Creating quicklook: {}".format(quicklook_path))
                    raster_manipulation.create_quicklook(
                        f, quicklook_path, width=512, height=512, format="PNG"
                    )
        tile_log.info("Quicklooks complete.")
    else:
        tile_log.info("Quicklooks disabled in ini file.")


## <a id='toc4_8_'></a>[Change Detection](#toc0_)

To perform Change Detection, we take the Classified Change Imagery and compare it with the Classified Baseline Composite.

Because we are concerned with monitoring deforestation for our Change Detection, `pyeo` examines whether any forest classes (*classes 1, 11 and 12*) change to non-forest classes (*classes 3, 4, 5 and 13*).  
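
The forest to non-forest rule can be sketched on two small class maps. This is an illustration with invented pixel values and is not the `pyeo` change detection function; in the notebook the class lists come from `config_dict["from_classes"]` and `config_dict["to_classes"]`.

```python
import numpy as np

# Class lists as described above.
FROM_CLASSES = [1, 11, 12]     # forest classes
TO_CLASSES = [3, 4, 5, 13]     # non-forest classes

baseline_classes = np.array([[1, 11, 3],
                             [12, 1, 5]])
change_classes = np.array([[3, 11, 3],
                           [13, 1, 1]])

# A pixel is a change of interest only if it was forest in the baseline
# AND is non-forest in the later image.
is_change = np.isin(baseline_classes, FROM_CLASSES) & np.isin(change_classes, TO_CLASSES)
print(is_change.astype(int))
```

Note that a pixel going from non-forest to forest, or staying in the same class, is not flagged.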

As new change imagery becomes available (*deforestation monitoring is an iterative process through time*), these change images are classified and compared to the baseline in turn.

---

The overall Change Detection can be summarised as this:
- PyEO first looks for the composite and change imagery classifications and sorts the change classifications by acquisition date.
- Then, it searches for an existing report file created by a previous PyEO run and archives it by renaming it with an `archived_` prefix.
- Then, it creates the change report by sequentially comparing the classified change imagery against the classified baseline composite.
- Once finished, PyEO does some housekeeping, compressing the output GeoTIFFs.
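
The aggregation of the individual comparisons into one report can be pictured as a per-pixel summary over time. The sketch below assumes a two-layer report, as described in the code comments of the aggregation step in this section: the earliest detection date, expressed as days since 1 January 2000, and the number of detections. The change masks and dates are invented for illustration, and this is not the `combine_date_maps` implementation.

```python
from datetime import date

import numpy as np

EPOCH = date(2000, 1, 1)


def days_since_epoch(d):
    """Number of days between 1 January 2000 and `d`."""
    return (d - EPOCH).days


# Toy change masks (True = change of interest detected) for three dates.
detections = {
    date(2023, 1, 10): np.array([[True, False], [False, False]]),
    date(2023, 2, 14): np.array([[True, True], [False, False]]),
    date(2023, 3, 17): np.array([[False, True], [False, False]]),
}

first_date = np.zeros((2, 2), dtype=np.int32)    # 0 = no change detected
n_detections = np.zeros((2, 2), dtype=np.int32)

for acquisition_date in sorted(detections):
    mask = detections[acquisition_date]
    # Record the date only for pixels that have not been flagged before.
    newly_flagged = mask & (first_date == 0)
    first_date[newly_flagged] = days_since_epoch(acquisition_date)
    n_detections += mask

print(first_date)
print(n_detections)
```

Pixels flagged on several dates are more likely to be genuine changes than pixels flagged only once, which is why the detection count is kept alongside the date.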

In [44]:
if config_dict["do_all"] or config_dict["do_change"]:
    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Creating change layers from stacked class images.")
    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Changes of interest:")
    tile_log.info(
        "  from any of the classes {}".format(config_dict["from_classes"])
    )
    tile_log.info("  to   any of the classes {}".format(config_dict["to_classes"]))

    # optionally sieve the class images
    if sieve > 0:
        tile_log.info("Applying sieve to classification outputs.")
        sieved_paths = raster_manipulation.sieve_directory(
            in_dir=categorised_image_dir,
            out_dir=sieved_image_dir,
            neighbours=8,
            sieve=sieve,
            out_type="GTiff",
            skip_existing=skip_existing,
        )
        # if sieve was chosen, work with the sieved class images
        class_image_dir = sieved_image_dir
    else:
        # if sieve was not chosen, work with the original class images
        class_image_dir = categorised_image_dir

    # get all image paths in the classification maps directory except the class composites
    class_image_paths = [
        f.path
        for f in os.scandir(class_image_dir)
        if f.is_file() and f.name.endswith(".tif") and "composite_" not in f.name
    ]
    if len(class_image_paths) == 0:
        raise FileNotFoundError(
            "No class images found in {}.".format(class_image_dir)
        )

    # sort class images by image acquisition date
    class_image_paths = list(
        filter(filesystem_utilities.get_image_acquisition_time, class_image_paths)
    )
    class_image_paths.sort(key=filesystem_utilities.get_image_acquisition_time)
    for index, image in enumerate(class_image_paths):
        tile_log.info("{}: {}".format(index, image))

    # find the latest available composite
    try:
        latest_composite_name = filesystem_utilities.sort_by_timestamp(
            [
                image_name
                for image_name in os.listdir(composite_dir)
                if image_name.endswith(".tif")
            ],
            recent_first=True,
        )[0]
        latest_composite_path = os.path.join(composite_dir, latest_composite_name)
        tile_log.info("Most recent composite at {}".format(latest_composite_path))
    except IndexError:
        tile_log.critical(
            "Latest composite not found. The first time you run this script, you need to include the "
            "--build-composite flag to create a base composite to work off. If you have already done this, "
            "check that the earliest dated image in your images/merged folder is later than the earliest"
            " dated image in your composite/ folder."
        )
        sys.exit(1)

    latest_class_composite_path = os.path.join(
        class_image_dir,
        [
            f.name  # file name only: it is joined to class_image_dir above
            for f in os.scandir(class_image_dir)
            if f.is_file()
            and os.path.basename(latest_composite_path)[:-4] in f.name
            and f.name.endswith(".tif")
        ][0],
    )

    tile_log.info(
        "Most recent class composite at {}".format(latest_class_composite_path)
    )
    if not os.path.exists(latest_class_composite_path):
        tile_log.critical(
            "Latest class composite not found. The first time you run this script, you need to include the "
            "--build-composite flag to create a base composite to work off. If you have already done this, "
            "check that the earliest dated image in your images/merged folder is later than the earliest"
            " dated image in your composite/ folder. Then, you need to run the --classify option."
        )
        sys.exit(1)

    if config_dict[
        "do_dev"
    ]:  # set the name of the report file in the development version run
        before_timestamp = filesystem_utilities.get_change_detection_dates(
            os.path.basename(latest_class_composite_path)
        )[0]
        # I.R. 20220611 START
        ## Timestamp report with the date of most recent classified image that contributes to it
        after_timestamp = filesystem_utilities.get_image_acquisition_time(
            os.path.basename(class_image_paths[-1])
        )
        ## ORIGINAL
        # gets timestamp of the earliest change image of those available in class_image_path
        # after_timestamp  = pyeo.filesystem_utilities.get_image_acquisition_time(os.path.basename(class_image_paths[0]))
        # I.R. 20220611 END
        output_product = os.path.join(
            probability_image_dir,
            "report_{}_{}_{}.tif".format(
                before_timestamp.strftime("%Y%m%dT%H%M%S"),
                tile_to_process,
                after_timestamp.strftime("%Y%m%dT%H%M%S"),
            ),
        )
        tile_log.info("I.R. Report file name will be {}".format(output_product))

        # if a report file exists, archive it  ( I.R. Changed from 'rename it to show it has been updated')
        n_report_files = len(
            [
                f
                for f in os.scandir(probability_image_dir)
                if f.is_file()
                and f.name.startswith("report_")
                and f.name.endswith(".tif")
            ]
        )

        if n_report_files > 0:
            # I.R. ToDo: Should iterate over output_product_existing in case more than one report file is present (though unlikely)
            output_product_existing = [
                f.path
                for f in os.scandir(probability_image_dir)
                if f.is_file()
                and f.name.startswith("report_")
                and f.name.endswith(".tif")
            ][0]
            tile_log.info(
                "Found existing report image product: {}".format(
                    output_product_existing
                )
            )

            output_product_existing_archived = os.path.join(
                os.path.dirname(output_product_existing),
                "archived_" + os.path.basename(output_product_existing),
            )
            tile_log.info(
                "Renaming existing report image product to: {}".format(
                    output_product_existing_archived
                )
            )
            os.rename(output_product_existing, output_product_existing_archived)

    # find change patterns in the stack of classification images
    for index, image in enumerate(class_image_paths):
        tile_log.info("")
        tile_log.info("")
        tile_log.info(f"  printing index, image   : {index}, {image}")
        tile_log.info("")
        tile_log.info("")
        before_timestamp = filesystem_utilities.get_change_detection_dates(
            os.path.basename(latest_class_composite_path)
        )[0]
        after_timestamp = filesystem_utilities.get_image_acquisition_time(
            os.path.basename(image)
        )
        tile_log.info(
            "*** PROCESSING CLASSIFIED IMAGE: {} of {} filename: {} ***".format(
                index, len(class_image_paths), image
            )
        )
        tile_log.info("  early time stamp: {}".format(before_timestamp))
        tile_log.info("  late  time stamp: {}".format(after_timestamp))
        change_raster = os.path.join(
            probability_image_dir,
            "change_{}_{}_{}.tif".format(
                before_timestamp.strftime("%Y%m%dT%H%M%S"),
                tile_to_process,
                after_timestamp.strftime("%Y%m%dT%H%M%S"),
            ),
        )
        tile_log.info(
            "  Change raster file to be created: {}".format(change_raster)
        )

        dNDVI_raster = os.path.join(
            probability_image_dir,
            "dNDVI_{}_{}_{}.tif".format(
                before_timestamp.strftime("%Y%m%dT%H%M%S"),
                tile_to_process,
                after_timestamp.strftime("%Y%m%dT%H%M%S"),
            ),
        )
        tile_log.info(
            "  I.R. dNDVI raster file to be created: {}".format(dNDVI_raster)
        )

        NDVI_raster = os.path.join(
            probability_image_dir,
            "NDVI_{}_{}_{}.tif".format(
                before_timestamp.strftime("%Y%m%dT%H%M%S"),
                tile_to_process,
                after_timestamp.strftime("%Y%m%dT%H%M%S"),
            ),
        )
        tile_log.info(
            "  I.R. NDVI raster file of change image to be created: {}".format(
                NDVI_raster
            )
        )

        if config_dict["do_dev"]:
            # This function looks for changes from class 'change_from' in the composite to any of the 'change_to_classes'
            # in the change images. Pixel values are the acquisition date of the detected change of interest or zero.
            # TODO: In change_from_class_maps(), add a flag (e.g. -1) whether a pixel was a cloud in the later image.
            # Applying check whether dNDVI < -0.2, i.e. greenness has decreased over changed areas

            tile_log.info("Update of the report image product based on change detection image.")
            raster_manipulation.__change_from_class_maps(
                old_class_path=latest_class_composite_path,
                new_class_path=image,
                change_raster=change_raster,
                dNDVI_raster=dNDVI_raster,
                NDVI_raster=NDVI_raster,
                change_from=from_classes,
                change_to=to_classes,
                report_path=output_product,
                skip_existing=skip_existing,
                old_image_dir=composite_dir,
                new_image_dir=l2_masked_image_dir,
                viband1=4,
                viband2=3,
                dNDVI_threshold=-0.2,
                log=tile_log,
            )
        else:
            raster_manipulation.change_from_class_maps(
                latest_class_composite_path,
                image,
                change_raster,
                change_from=from_classes,
                change_to=to_classes,
                skip_existing=skip_existing,
            )

    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Post-classification change detection complete.")
    tile_log.info("---------------------------------------------------------------")

    tile_log.info("---------------------------------------------------------------")
    tile_log.info(
        "Compressing tiff files in directory {} and all subdirectories".format(
            probability_image_dir
        )
    )
    tile_log.info("---------------------------------------------------------------")
    for root, dirs, files in os.walk(probability_image_dir):
        all_tiffs = [
            image_name for image_name in files if image_name.endswith(".tif")
        ]
        for this_tiff in all_tiffs:
            raster_manipulation.compress_tiff(
                os.path.join(root, this_tiff), os.path.join(root, this_tiff)
            )

    tile_log.info("---------------------------------------------------------------")
    tile_log.info(
        "Compressing tiff files in directory {} and all subdirectories".format(
            sieved_image_dir
        )
    )
    tile_log.info("---------------------------------------------------------------")
    for root, dirs, files in os.walk(sieved_image_dir):
        all_tiffs = [
            image_name for image_name in files if image_name.endswith(".tif")
        ]
        for this_tiff in all_tiffs:
            raster_manipulation.compress_tiff(
                os.path.join(root, this_tiff), os.path.join(root, this_tiff)
            )

    if not config_dict["do_dev"]:
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info(
            "Creating aggregated report file. Deprecated in the development version."
        )
        tile_log.info(
            "---------------------------------------------------------------"
        )
        # combine all change layers into one output raster with two layers:
        #   (1) pixels show the earliest change detection date (expressed as the number of days since 1/1/2000)
        #   (2) pixels show the number of change detection dates (summed up over all change images in the folder)
        date_image_paths = [
            f.path
            for f in os.scandir(probability_image_dir)
            if f.is_file() and f.name.endswith(".tif") and "change_" in f.name
        ]
        if len(date_image_paths) == 0:
            raise FileNotFoundError(
                "No change images found in {}.".format(probability_image_dir)
            )

        before_timestamp = filesystem_utilities.get_change_detection_dates(
            os.path.basename(latest_class_composite_path)
        )[0]
        after_timestamp = filesystem_utilities.get_image_acquisition_time(
            os.path.basename(class_image_paths[-1])
        )
        output_product = os.path.join(
            probability_image_dir,
            "report_{}_{}_{}.tif".format(
                before_timestamp.strftime("%Y%m%dT%H%M%S"),
                tile_to_process,
                # tile_id,
                after_timestamp.strftime("%Y%m%dT%H%M%S"),
            ),
        )
        tile_log.info("Combining date maps: {}".format(date_image_paths))
        raster_manipulation.combine_date_maps(date_image_paths, output_product)

    tile_log.info("---------------------------------------------------------------")
    tile_log.info(
        "Report image product completed / updated: {}".format(output_product)
    )
        
if config_dict["do_all"] or config_dict["do_vectorise"]:
    from pyeo.apps.acd_national.acd_by_tile_vectorisation import vector_report_generation
    output_vector_products = vector_report_generation(config_path, tile_to_process)

    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Report image vectorised. Output file(s) created:")
    for output_vector_product in output_vector_products:
        tile_log.info("  {}".format(output_vector_product))


2023-06-29 08:21:23,996: INFO: ---------------------------------------------------------------
2023-06-29 08:21:23,997: INFO:                     ****PROCESSING START****
2023-06-29 08:21:23,998: INFO: ---------------------------------------------------------------
2023-06-29 08:21:23,999: INFO: ----------------------------------------
2023-06-29 08:21:23,999: INFO: Starting Vectorisation of the Change Report Raster of Tile: 36NXG
2023-06-29 08:21:24,000: INFO: ----------------------------------------
2023-06-29 08:21:24,002: INFO: what is change_report_path  :  /home/sepal-user/20230626_pyeo_installation/36NXG/output/probabilities/report_20221202T075301_36NXG_20230317T074649.tif
2023-06-29 08:21:24,123: INFO: Opening /home/sepal-user/20230626_pyeo_installation/36NXG/output/probabilities/report_20221202T075301_36NXG_20230317T074649.tif
2023-06-29 08:21:24,124: INFO: Successfully opened /home/sepal-user/20230626_pyeo_installation/36NXG/output/probabilities/report_20221202T075301_36NXG_2

TypeError: merge_and_calculate_spatial() got an unexpected keyword argument 'write_kmlfile'
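
The `TypeError` above indicates that the installed `merge_and_calculate_spatial` does not accept the `write_kmlfile` keyword that the calling code passes, which usually points to a version mismatch between modules. One way to check which keywords a function accepts before calling it is `inspect.signature`. The following is a minimal sketch only: the stub function and its parameter names are hypothetical stand-ins, not the real PyEO signature.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-in with an illustrative signature (NOT the real PyEO function).
def merge_and_calculate_spatial_stub(data, write_csv=False):
    return None


has_kml = accepts_keyword(merge_and_calculate_spatial_stub, "write_kmlfile")
has_csv = accepts_keyword(merge_and_calculate_spatial_stub, "write_csv")
print("write_kmlfile accepted:", has_kml)
print("write_csv accepted:", has_csv)
```

In the notebook, the same check could be applied to the imported function itself to confirm whether the installed version matches the code that calls it.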

### <a id='toc4_8_2_'></a>[Final Housekeeping](#toc0_)

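Finally, we run some housekeeping: the report image is compressed, and intermediate files are then either deleted or zipped, depending on the `do_delete` and `do_zip` settings in `config_dict`. Files whose names start with `composite_` or `report_` are always kept, because the next change detection run needs them.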
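
The deletion step keeps every file whose name starts with `composite_` or `report_` and removes everything else. A minimal standalone sketch of that filter, run against a throwaway temporary directory with made-up file names (the helper `delete_intermediate_files` is hypothetical and not part of PyEO):

```python
import os
import shutil
import tempfile

KEEP_PREFIXES = ("composite_", "report_")


def delete_intermediate_files(directory, keep_prefixes=KEEP_PREFIXES):
    """Delete every entry in `directory` whose name does not start with a kept prefix."""
    deleted = []
    for name in sorted(os.listdir(directory)):
        # str.startswith accepts a tuple of prefixes.
        if name.startswith(keep_prefixes):
            continue
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
        deleted.append(name)
    return deleted


# Demonstration in a temporary directory; the file names are illustrative only.
demo_dir = tempfile.mkdtemp()
for name in ("composite_example.tif", "report_example.tif", "class_example.tif"):
    open(os.path.join(demo_dir, name), "w").close()
os.mkdir(os.path.join(demo_dir, "scratch"))

deleted = delete_intermediate_files(demo_dir)
remaining = sorted(os.listdir(demo_dir))
shutil.rmtree(demo_dir)  # clean up the demonstration directory

print("Deleted:", deleted)
print("Kept:", remaining)
```

Trying the filter on a scratch directory like this is a low-risk way to confirm which files would survive before enabling `do_delete` on real outputs.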

In [None]:
if config_dict["do_all"] or config_dict["do_change"]:
    tile_log.info("Compressing the report image.")
    tile_log.info("---------------------------------------------------------------")
    raster_manipulation.compress_tiff(output_product, output_product)

    if config_dict["do_delete"]:
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info(
            "Deleting intermediate class images used in change detection."
        )
        tile_log.info(
            "They can be recreated from the cloud-masked, band-stacked L2A images and the saved model."
        )
        tile_log.info(
            "---------------------------------------------------------------"
        )
        directories = [
            categorised_image_dir,
            sieved_image_dir,
            probability_image_dir,
        ]
        for directory in directories:
            for f in os.listdir(directory):
                # keep the classified composite layers and the report image product for the next change detection
                if not f.startswith(("composite_", "report_")):
                    path = os.path.join(directory, f)
                    tile_log.info("Deleting {}".format(path))
                    if os.path.isdir(path):
                        shutil.rmtree(path)
                    else:
                        os.remove(path)
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Deletion of intermediate file products complete.")
        tile_log.info(
            "---------------------------------------------------------------"
        )
    else:
        if config_dict["do_zip"]:
            tile_log.info(
                "---------------------------------------------------------------"
            )
            tile_log.info(
                "Zipping intermediate class images used in change detection"
            )
            tile_log.info(
                "---------------------------------------------------------------"
            )
            directories = [categorised_image_dir, sieved_image_dir]
            for directory in directories:
                filesystem_utilities.zip_contents(
                    directory, notstartswith=["composite_", "report_"]
                )
            tile_log.info(
                "---------------------------------------------------------------"
            )
            tile_log.info("Zipping complete")
            tile_log.info(
                "---------------------------------------------------------------"
            )

    tile_log.info("---------------------------------------------------------------")
    tile_log.info(
        "Change detection and report image product updating, file compression, zipping"
    )
    tile_log.info(
        "and deletion of intermediate file products (if selected) are complete."
    )
    tile_log.info("---------------------------------------------------------------")

    if config_dict["do_delete"]:
        tile_log.info("---------------------------------------------------------------")
        tile_log.info("Deleting temporary directories starting with 'tmp*'")
        tile_log.info("These can be left over from interrupted processing runs.")
        tile_log.info("---------------------------------------------------------------")
        directory = tile_root_dir
        for root, dirs, files in os.walk(directory):
            temp_dirs = [d for d in dirs if d.startswith("tmp")]
            for temp_dir in temp_dirs:
                # build the path from the directory currently being walked, not from a stale loop variable
                temp_path = os.path.join(root, temp_dir)
                tile_log.info("Deleting {}".format(temp_path))
                if os.path.isdir(temp_path):
                    shutil.rmtree(temp_path)
                else:
                    tile_log.warning(
                        "This should not have happened. {} is not a directory. Skipping deletion.".format(
                            temp_path
                        )
                    )
        tile_log.info("---------------------------------------------------------------")
        tile_log.info("Deletion of temporary directories complete.")
        tile_log.info("---------------------------------------------------------------")


tile_log.info("---------------------------------------------------------------")
tile_log.info("---                  PROCESSING END                         ---")
tile_log.info("---------------------------------------------------------------")