# PyEO Forest Alerts: How to create a median image composite from a time-series of Sentinel-2 images.

This notebook was developed for pyeo on a Linux VM for Azure Lab.

- This notebook will cover how to query the Sentinel-2 image archive on the Copernicus Data Space Ecosystem (CDSE), how to download selected images based on the query results, and how to create a (nearly) cloud-free image composite from several images.
- The image composite will be used as a baseline against which the forest alerts will be assessed.

# Baseline Image Composite Creation

- This section will take us stepwise through the imagery query, download and composite creation aspects of the `run_acd_national.py` script, which runs the full PyEO pipeline from the command line in a terminal.  
- Jupyter notebooks provide a useful and engaging interface to understand the components of this script, so we will follow an extracted version throughout this notebook.

This section comprises several stages:   
1. Directory and V=variable setup.
1. Querying for Sentinel-2 imagery that meets our search criteria.
1. Downloading the Sentinel-2 imagery identified from the Query.
1. If necessary, preprocess any L1C to L2A by applying atmospheric corrections. 
1. Cloud-masking the L2A imagery.
1. Creation of a composite baseline reference from the time series that has been downloaded and processed. 
1. Query and Download of a set of Change Detection Images
1. Classification of Baseline and Change Detection Images
1. Creation of Forest Alerts

# Setup: Requirements to use this Notebook

Navigate to the pyeo installation directory

In [18]:
cd ~/pyeo

/home/cmsstudent/pyeo


## Import Libraries

In [19]:
import shutil
import sys

from pyeo import (classification, filesystem_utilities,
                    queries_and_downloads, raster_manipulation)

from pyeo.acd_national import (acd_initialisation,
                                 acd_config_to_log,
                                 acd_roi_tile_intersection)
import configparser
import argparse
import json
import numpy as np
import os
from osgeo import gdal
import geopandas as gpd
import pandas as pd
from datetime import datetime
import warnings
import zipfile

gdal.UseExceptions()

print("Libraries successfully imported")

Libraries successfully imported


## Declare Processing Parameters with In-Notebook Variables

- When running the `pyeo` pipeline we use an initialisation file (.ini) to provide the required parameters.  
- For Azure Labs, we will use `pyeo_linux_azure.ini`  
- Below the parameters are explained, but we will read these in via the config parser.

- Now, let's read in the `.ini` file

## Declare the path to the initialisation file
- The ini file contains a whole range of parameters that control how pyeo is run.
- It is worth opening it in a text editor to see which parameters can be changed.


In [20]:
pwd

'/home/cmsstudent/pyeo'

In [21]:
config_path = "pyeo_linux_azure.ini"

## Edit the `pyeo_linux_azure.ini` file
You can either:  
- Check that `pyeo_dir` and `tile_dir` in `pyeo_linux_azure.ini` are correct.
    <br>
- Or, amend the `.ini` file to match your file paths if you cloned `pyeo` into a different directory:
    - Right-Click and 'Open' `pyeo_linux_azure.ini` in the file browser on the left to be able to edit it
    - Change pyeo_dir to point to the pyeo code in your installation directory
    - Change tile_dir to point to your directory, where you want to save the created and downloaded data files
    - Save the edited initialisation file - by pressing ```Ctrl+S```

## Edit the `credentials.ini` file:
- Ensure the credentials path in the ``pyeo_linux_azure.ini` corresponds to your credentials file. The file contains your login details to the Copernicus Data Space Ecosystem.
- The default path from within pyeo is to `.\credentials\credentials.ini`
- To use this default option open the file `credentials_dummy.ini` in the editor (Right-Click then 'Open')
- Edit the file to add your personal credentials for the dataspace API - following the convention of this file.
- Save the file as `credentials.ini` into the credentials folder (using File -> Save File As)

In [22]:
config_dict, acd_log = acd_initialisation(config_path)

2024-09-25 14:15:30,425: INFO: ---------------------------------------------------------------
2024-09-25 14:15:30,426: INFO: ---                 PROCESSING START                        ---
2024-09-25 14:15:30,427: INFO: ---------------------------------------------------------------
2024-09-25 14:15:30,428: INFO: conda environment path found: /home/cmsstudent/miniconda3//envs/pyeo_env
2024-09-25 14:15:30,428: INFO: True
2024-09-25 14:15:30,429: INFO: ---------------------------------------------------------------
2024-09-25 14:15:30,430: INFO: ---                  INTEGRATED PROCESSING START            ---
2024-09-25 14:15:30,430: INFO: ---------------------------------------------------------------
2024-09-25 14:15:30,431: INFO: Reading in parameters defined in: pyeo_linux_azure.ini
2024-09-25 14:15:30,432: INFO: ---------------------------------------------------------------


## Print the configuration parameters
- We print the configuration parameter to create a record of what pyeo has been configured to do.

In [23]:
acd_config_to_log(config_dict, acd_log)

2024-09-25 14:15:31,150: INFO:   run_mode :  watch_period_seconds
2024-09-25 14:15:31,152: INFO:   forest_sentinel :  model
2024-09-25 14:15:31,153: INFO:   environment :  sen2cor_path
2024-09-25 14:15:31,153: INFO:   raster_processing_parameters :  change_to_classes
2024-09-25 14:15:31,154: INFO:   vector_processing_parameters :  minimum_area_to_report_m2
2024-09-25 14:15:31,155: INFO:   alerts_sending_options :  whatsapp_list_file
2024-09-25 14:15:31,155: INFO:   qsub_processor_options :  nodes=1:ppn=16,vmem=64Gb
2024-09-25 14:15:31,157: INFO:   wall_time_hours :  3
2024-09-25 14:15:31,157: INFO:   watch_time_hours :  3
2024-09-25 14:15:31,158: INFO:   watch_period_seconds :  60
2024-09-25 14:15:31,159: INFO:   --do_tile_intersection enables Sentinel-2 tile intersection with region of interest (ROI).
2024-09-25 14:15:31,161: INFO:   do_all :  False
2024-09-25 14:15:31,161: INFO:   --do_classify applies the random forest model and creates classification layers
2024-09-25 14:15:31,162:

## Identify the required Sentinel-2 tiles

- PyEO operates by looking at a shapefile to determine the Region of Interest (ROI)
- This directory path and filename of this shapefile needs to be specified in these two lines in the .ini file:
    - `roi_dir = roi`
    - `roi_filename = kfs_roi_subset_c.shp`
- Then in the cell below, PyEO identifies what Sentinel-2 tiles intersect with the Region Of Interest (ROI).

In [24]:
os.chdir(config_dict["pyeo_dir"]) # ensures pyeo is looking in the correct directory
tilelist_filepath = acd_roi_tile_intersection(config_dict, acd_log)

2024-09-25 14:15:33,260: INFO: The provided ROI intersects with 2 Sentinel-2 tiles:
2024-09-25 14:15:33,261: INFO:   1 : 36NXG
2024-09-25 14:15:33,261: INFO:   2 : 36NYG


In [25]:
print(tilelist_filepath)

./roi/tilelist.csv


- **Right-Click on tile_list.csv in the JupyterLab explorer to the left and select 'open' to view it in a tab within JupyterLab.** 

## Running PyEO Per Tile

- PyEO is designed to run per-tile.
- It takes `tilelist.csv` created in the above cell and runs the pipeline for each tile in this `.csv` file.
- This tutorial will run through the pipeline for the first tile in `tilelist.csv` : `36NXG`.

## Create the Folder Structure PyEO Expects

In [26]:
os.chdir(config_dict["pyeo_dir"]) # ensures pyeo is looking in the correct directory

tile_to_process = pd.read_csv(tilelist_filepath)["tile"][0]
individual_tile_directory_path = os.path.join(config_dict["tile_dir"], tile_to_process)
filesystem_utilities.create_folder_structure_for_tiles(individual_tile_directory_path)
print("Folder structure build successfully finished")

Folder structure build successfully finished


- You can now use the JupyterLab file explorer to view the new folder structure which should be in your installation directory and called `36NXG`
- These folders provide the skeleton for the pipeline to store and process the tile's imagery

In [27]:
individual_tile_directory_path

'/home/cmsstudent/Desktop/pyeo_data/36NXG'

## Create the Tile Log File

- `PyEO` uses a Log file as a convenient location to monitor pipeline progress
- Additionally, the log file acts as a record of which parameters were used.

In [28]:
tile_log = filesystem_utilities.init_log_acd(
    log_path=os.path.join(individual_tile_directory_path, "log", tile_to_process + ".log"),
    logger_name=f"pyeo_{tile_to_process}"
)

2024-09-25 14:15:42,880: INFO: ---------------------------------------------------------------
2024-09-25 14:15:42,882: INFO: ---                 PROCESSING START                        ---
2024-09-25 14:15:42,882: INFO: ---------------------------------------------------------------


- You can now use the JupyterLab file explorer to find the log file which will be in a log folder beneath the main tile directory
    - For example, the path to the Log file for `36NXG`, is : `20230626_pyeo_installation/36NXG/log/36NXG.log`

## Create the Processing Argument Variables

- `PyEO` uses these parameters to make decisions throughout the pipeline.

In [29]:
os.chdir(config_dict["pyeo_dir"]) # ensures pyeo is looking in the correct directory

In [30]:
start_date = config_dict["start_date"]
end_date = config_dict["end_date"]
composite_start_date = config_dict["composite_start"]
composite_end_date = config_dict["composite_end"]
cloud_cover = config_dict["cloud_cover"]
cloud_certainty_threshold = config_dict["cloud_certainty_threshold"]
model_path = config_dict["model_path"]
sen2cor_path = config_dict["sen2cor_path"]
epsg = config_dict["epsg"]
bands = config_dict["bands"]
resolution = config_dict["resolution_string"]
out_resolution = config_dict["output_resolution"]
buffer_size = config_dict["buffer_size_cloud_masking"]
buffer_size_composite = config_dict["buffer_size_cloud_masking_composite"]
max_image_number = config_dict["download_limit"]
faulty_granule_threshold = config_dict["faulty_granule_threshold"]
download_limit = config_dict["download_limit"]

skip_existing = config_dict["do_skip_existing"]
sieve = config_dict["sieve"]
from_classes = config_dict["from_classes"]
to_classes = config_dict["to_classes"]

download_source = config_dict["download_source"]
if download_source == "scihub":
    tile_log.info("scihub API is the download source")
if download_source == "dataspace":
    tile_log.info("dataspace API is the download source")

tile_log.info(f"Faulty Granule Threshold is set to   : {config_dict['faulty_granule_threshold']}")
tile_log.info("    Files below this threshold will not be downloaded")

credentials_path = config_dict["credentials_path"]
if not os.path.exists(credentials_path):
    tile_log.error(f"The credentials path does not exist  :{credentials_path}")
    tile_log.error(f"Current working directory :{os.getcwd()}")
    tile_log.error("Exiting raster pipeline")
    sys.exit(1)

conf = configparser.ConfigParser(allow_no_value=True, interpolation=None)
conf.read(credentials_path)
credentials_dict = {}

tile_log.info("Successfully read the processing arguments and credentials")

2024-09-25 14:15:46,592: INFO: dataspace API is the download source
2024-09-25 14:15:46,593: INFO: Faulty Granule Threshold is set to   : 200
2024-09-25 14:15:46,594: INFO:     Files below this threshold will not be downloaded
2024-09-25 14:15:46,596: INFO: Successfully read the processing arguments and credentials


## Create the Necessary Variables to the Directory Paths

In [31]:
tile_log.info("Creating the directory paths")

change_image_dir = os.path.join(individual_tile_directory_path, r"images")
l1_image_dir = os.path.join(individual_tile_directory_path, r"images", r"L1C")
l2_image_dir = os.path.join(individual_tile_directory_path, r"images", r"L2A")
l2_masked_image_dir = os.path.join(individual_tile_directory_path, r"images", r"cloud_masked")
categorised_image_dir = os.path.join(individual_tile_directory_path, r"output", r"classifications")
probability_image_dir = os.path.join(individual_tile_directory_path, r"output", r"probabilities")
sieved_image_dir = os.path.join(individual_tile_directory_path, r"output", r"sieved")
composite_dir = os.path.join(individual_tile_directory_path, r"composite")
composite_l1_image_dir = os.path.join(individual_tile_directory_path, r"composite", r"L1C")
composite_l2_image_dir = os.path.join(individual_tile_directory_path, r"composite", r"L2A")
composite_l2_masked_image_dir = os.path.join(individual_tile_directory_path, r"composite", r"cloud_masked")
quicklook_dir = os.path.join(individual_tile_directory_path, r"output", r"quicklooks")

tile_log.info("Successfully created the directory paths")

2024-09-25 14:15:49,712: INFO: Creating the directory paths
2024-09-25 14:15:49,714: INFO: Successfully created the directory paths


## Read the Specified Credentials

In [32]:
if download_source == "dataspace":

    tile_log.info(f'Running download handler for {download_source}')

    credentials_dict["sent_2"] = {}
    credentials_dict["sent_2"]["user"] = conf["dataspace"]["user"]
    credentials_dict["sent_2"]["pass"] = conf["dataspace"]["pass"]
    sen_user = credentials_dict["sent_2"]["user"]
    sen_pass = credentials_dict["sent_2"]["pass"]

if download_source == "scihub":

    tile_log.info(f'Running download handler for {download_source}')

    credentials_dict["sent_2"] = {}
    credentials_dict["sent_2"]["user"] = conf["sent_2"]["user"]
    credentials_dict["sent_2"]["pass"] = conf["sent_2"]["pass"]
    sen_user = credentials_dict["sent_2"]["user"]
    sen_pass = credentials_dict["sent_2"]["pass"]
    
tile_log.info(f"Successfully configured the credentials for {download_source}")

2024-09-25 14:15:51,409: INFO: Running download handler for dataspace
2024-09-25 14:15:51,411: INFO: Successfully configured the credentials for dataspace


# Query Sentinel-2 Composite Imagery

First, a brief primer on the two Sentinel-2 data products we are concerned with:
- L1C
- L2A

**L1C** corresponds to the 1st processing level for the imagery. <br>

**L2A** corresponds to the 2nd processing level and this is the imagery we want to work with as these have been **atmospherically corrected**.

-------------------------------

Now that we have the query, file handling and log parameters set up, we can start querying the Copernicus Hub for the Sentinel-2 imagery that we want.  

The cell below starts the `build_composite` process. First, we query for the `L1C` products that match our criteria (date range, tile of interest, cloud cover).

Since we have declared a download limit of 12 images, the software caps the number of images in our query. This is a useful tool if we have limited disk space.

## Submit the Query

In [33]:
if config_dict["build_composite"] or config_dict["do_all"]:
    tile_log.info("---------------------------------------------------------------")
    tile_log.info(
        "Creating an initial cloud-free median composite from Sentinel-2 as a baseline map"
    )
    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Searching for images for initial composite.")

    if download_source == "dataspace":

        try:
            tiles_geom_path = os.path.join(config_dict["pyeo_dir"], os.path.join(config_dict["geometry_dir"], config_dict["s2_tiles_filename"]))
            tile_log.info(f"Path to the S2 tile geometry information absolute path: {os.path.abspath(tiles_geom_path)}")
            tiles_geom = gpd.read_file(os.path.abspath(tiles_geom_path))
        except FileNotFoundError:
            tile_log.error(f"Path to the S2 tile geometry does not exist, absolute path given: {os.path.abspath(tiles_geom_path)}")

        tile_geom = tiles_geom[tiles_geom["Name"] == tile_to_process]
        tile_geom = tile_geom.to_crs(epsg=4326)
        geometry = tile_geom["geometry"].iloc[0]
        geometry = geometry.representative_point().wkt

        # convert date string to YYYY-MM-DD
        date_object = datetime.strptime(composite_start_date, "%Y%m%d")
        dataspace_composite_start = date_object.strftime("%Y-%m-%d")
        date_object = datetime.strptime(composite_end_date, "%Y%m%d")
        dataspace_composite_end = date_object.strftime("%Y-%m-%d")

        try:
            dataspace_composite_products_all = queries_and_downloads.query_dataspace_by_polygon(
                max_cloud_cover=cloud_cover,
                start_date=dataspace_composite_start,
                end_date=dataspace_composite_end,
                area_of_interest=geometry,
                max_records=100,
                log=tile_log
            )
        except Exception as error:
            tile_log.error(f"query_dataspace_by_polygon received this error: {error}")

        titles = dataspace_composite_products_all["title"].tolist()
        sizes = list()
        uuids = list()
        for elem in dataspace_composite_products_all.itertuples(index=False):
            sizes.append(elem[-2]["download"]["size"])
            uuids.append(elem[-2]["download"]["url"].split("/")[-1])

        relative_orbit_numbers = dataspace_composite_products_all["relativeOrbitNumber"].tolist()
        processing_levels = dataspace_composite_products_all["processingLevel"].tolist()
        transformed_levels = ['Level-1C' if level == 'S2MSI1C' else 'Level-2A' for level in processing_levels]
        cloud_covers = dataspace_composite_products_all["cloudCover"].tolist()
        begin_positions = dataspace_composite_products_all["startDate"].tolist()
        statuses = dataspace_composite_products_all["status"].tolist()

        scihub_compatible_df = pd.DataFrame({"title": titles,
                                            "size": sizes,
                                            "beginposition": begin_positions,
                                            "relativeorbitnumber": relative_orbit_numbers,
                                            "cloudcoverpercentage": cloud_covers,
                                            "processinglevel": transformed_levels,
                                            "uuid": uuids,
                                            "status": statuses})

        # check granule sizes on the server
        scihub_compatible_df["size"] = scihub_compatible_df["size"].apply(lambda x: round(float(x) * 1e-6, 2))
        # reassign to match the scihub variable
        df_all = scihub_compatible_df


    if download_source == "scihub":

        try:
            composite_products_all = queries_and_downloads.check_for_s2_data_by_date(
                config_dict["tile_dir"],
                composite_start_date,
                composite_end_date,
                conf=credentials_dict,
                cloud_cover=cloud_cover,
                tile_id=tile_to_process,
                producttype=None,
            )

        except Exception as error:
            tile_log.error(
                f"check_for_s2_data_by_date failed, got this error :  {error}"
            )

        tile_log.info(
            "--> Found {} L1C and L2A products for the composite:".format(
                len(composite_products_all)
            )
        )

        df_all = pd.DataFrame.from_dict(composite_products_all, orient="index")

        # check granule sizes on the server
        df_all["size"] = (
            df_all["size"]
            .str.split(" ")
            .apply(lambda x: float(x[0]) * {"GB": 1e3, "MB": 1, "KB": 1e-3}[x[1]])
        )

    if download_source == "scihub":
        min_granule_size = faulty_granule_threshold
    else:
        min_granule_size = 0  # Required for dataspace API which doesn't report size correctly (often reported as zero)

    df = df_all.query("size >= " + str(min_granule_size))

    tile_log.info(
        "Removed {} faulty scenes <{}MB in size from the list".format(
            len(df_all) - len(df), min_granule_size
        )
    )
    # find < threshold sizes, report to log
    df_faulty = df_all.query("size < " + str(min_granule_size))
    for r in range(len(df_faulty)):
        tile_log.info(
            "   {} MB: {}".format(
                df_faulty.iloc[r, :]["size"], df_faulty.iloc[r, :]["title"]
            )
        )

    l1c_products = df[df.processinglevel == "Level-1C"]
    l2a_products = df[df.processinglevel == "Level-2A"]
    tile_log.info("    {} L1C products".format(l1c_products.shape[0]))
    tile_log.info("    {} L2A products".format(l2a_products.shape[0]))


    rel_orbits = np.unique(l1c_products["relativeorbitnumber"])
    if len(rel_orbits) > 0:
        if l1c_products.shape[0] > max_image_number / len(rel_orbits):
            tile_log.info(
                "Capping the number of L1C products to {}".format(max_image_number)
            )
            tile_log.info(
                "Relative orbits found covering tile: {}".format(rel_orbits)
            )
            tile_log.info("dataspace branch reaches here")
            uuids = []
            for orb in rel_orbits:
                uuids = uuids + list(
                    l1c_products.loc[
                        l1c_products["relativeorbitnumber"] == orb
                    ].sort_values(by=["cloudcoverpercentage"], ascending=True)[
                        "uuid"
                    ][
                        : int(max_image_number / len(rel_orbits))
                    ]
                )
            # keeps least cloudy n (max image number)
            l1c_products = l1c_products[l1c_products["uuid"].isin(uuids)]
            tile_log.info(
                "    {} L1C products remain:".format(l1c_products.shape[0])
            )
            for product in l1c_products["title"]:
                tile_log.info("       {}".format(product))
            tile_log.info(f"len of L1C products for dataspace is {len(l1c_products['title'])}")

    rel_orbits = np.unique(l2a_products["relativeorbitnumber"])
    if len(rel_orbits) > 0:
        if l2a_products.shape[0] > max_image_number / len(rel_orbits):
            tile_log.info(
                "Capping the number of L2A products to {}".format(max_image_number)
            )
            tile_log.info(
                "Relative orbits found covering tile: {}".format(rel_orbits)
            )
            uuids = []
            for orb in rel_orbits:
                uuids = uuids + list(
                    l2a_products.loc[
                        l2a_products["relativeorbitnumber"] == orb
                    ].sort_values(by=["cloudcoverpercentage"], ascending=True)[
                        "uuid"
                    ][
                        : int(max_image_number / len(rel_orbits))
                    ]
                )
            l2a_products = l2a_products[l2a_products["uuid"].isin(uuids)]
            tile_log.info(
                "    {} L2A products remain:".format(l2a_products.shape[0])
            )
            for product in l2a_products["title"]:
                tile_log.info("       {}".format(product))
            tile_log.info(f"len of L2A products for dataspace is {len(l2a_products['title'])}")

    if l1c_products.shape[0] > 0 and l2a_products.shape[0] > 0:
        tile_log.info(
            "Filtering out L1C products that have the same 'beginposition' time stamp as an existing L2A product."
        )
        if download_source == "scihub":
            (l1c_products,l2a_products,) = queries_and_downloads.filter_unique_l1c_and_l2a_data(df,log=tile_log)

        if download_source == "dataspace":
            l1c_products = queries_and_downloads.filter_unique_dataspace_products(l1c_products=l1c_products, l2a_products=l2a_products, log=tile_log)

    df = None
    tile_log.info(f" {len(l1c_products['title'])} L1C products for the Composite")
    tile_log.info(f" {len(l2a_products['title'])} L2A products for the Composite")
    
    tile_log.info("Successfully queried the L1C and L2A products for the Composite")

2024-09-25 14:15:58,025: INFO: ---------------------------------------------------------------
2024-09-25 14:15:58,027: INFO: Creating an initial cloud-free median composite from Sentinel-2 as a baseline map
2024-09-25 14:15:58,027: INFO: ---------------------------------------------------------------
2024-09-25 14:15:58,028: INFO: Searching for images for initial composite.
2024-09-25 14:15:58,029: INFO: Path to the S2 tile geometry information absolute path: /home/cmsstudent/pyeo/geometry/kenya_s2_tiles.shp
2024-09-25 14:16:03,496: INFO: Removed 0 faulty scenes <0MB in size from the list
2024-09-25 14:16:03,500: INFO:     36 L1C products
2024-09-25 14:16:03,501: INFO:     27 L2A products
2024-09-25 14:16:03,501: INFO: Capping the number of L1C products to 3
2024-09-25 14:16:03,502: INFO: Relative orbits found covering tile: [135]
2024-09-25 14:16:03,503: INFO: dataspace branch reaches here
2024-09-25 14:16:03,505: INFO:     3 L1C products remain:
2024-09-25 14:16:03,506: INFO:       

## Search for L2A Images Corresponding to L1C

- The cell below searches our download directory for any existing imagery. If we have downloaded any imagery already, `pyeo` will remove the matching image from our search query.  

- Secondly, if we have opted to use `scihub` as our `download_source`, then `pyeo` searches the Copernicus archive for any corresponding `L2A` products. If it finds a matching L2A product, then it removes the `L1C` counterpart from the query. The `dataspace` option handles this on the server.

In [34]:
if config_dict["build_composite"] or config_dict["do_all"]:
    # Search the local directories, composite/L2A and L1C, checking if scenes have already been downloaded and/or processed whilst checking their dir sizes
    if download_source == "scihub":
        if l1c_products.shape[0] > 0:
            tile_log.info(
                "Checking for already downloaded and zipped L1C or L2A products and"
            )
            tile_log.info("  availability of matching L2A products for download.")
            n = len(l1c_products)
            drop = []
            add = []
            for r in range(n):
                id = l1c_products.iloc[r, :]["title"]
                search_term = (
                    id.split("_")[2]
                    + "_"
                    + id.split("_")[3]
                    + "_"
                    + id.split("_")[4]
                    + "_"
                    + id.split("_")[5]
                )
                tile_log.info(
                    "Searching locally for file names containing: {}.".format(
                        search_term
                    )
                )
                file_list = (
                    [
                        os.path.join(composite_l1_image_dir, f)
                        for f in os.listdir(composite_l1_image_dir)
                    ]
                    + [
                        os.path.join(composite_l2_image_dir, f)
                        for f in os.listdir(composite_l2_image_dir)
                    ]
                    + [
                        os.path.join(composite_l2_masked_image_dir, f)
                        for f in os.listdir(composite_l2_masked_image_dir)
                    ]
                )
                for f in file_list:
                    if search_term in f:
                        tile_log.info("  Product already downloaded: {}".format(f))
                        drop.append(l1c_products.index[r])
                search_term = (
                    "*"
                    + id.split("_")[2]
                    + "_"
                    + id.split("_")[3]
                    + "_"
                    + id.split("_")[4]
                    + "_"
                    + id.split("_")[5]
                    + "*"
                )


                tile_log.info(
                    "Searching on the data hub for files containing: {}.".format(
                        search_term
                    )
                )
                matching_l2a_products = queries_and_downloads._file_api_query(
                    user=sen_user,
                    passwd=sen_pass,
                    start_date=composite_start_date,
                    end_date=composite_end_date,
                    filename=search_term,
                    cloud=cloud_cover,
                    producttype="S2MSI2A",
                )

                matching_l2a_products_df = pd.DataFrame.from_dict(
                    matching_l2a_products, orient="index"
                )
                # 07/03/2023: Matt - Applied Ali's fix for converting product size to MB to compare against faulty_grandule_threshold
                if (
                    len(matching_l2a_products_df) == 1
                    and [
                        float(x[0]) * {"GB": 1e3, "MB": 1, "KB": 1e-3}[x[1]]
                        for x in [matching_l2a_products_df["size"][0].split(" ")]
                    ][0]
                    > faulty_granule_threshold
                ):
                    tile_log.info("Replacing L1C {} with L2A product:".format(id))
                    tile_log.info(
                        "              {}".format(
                            matching_l2a_products_df.iloc[0, :]["title"]
                        )
                    )

                    drop.append(l1c_products.index[r])
                    add.append(matching_l2a_products_df.iloc[0, :])
                if len(matching_l2a_products_df) == 0:
                    pass
                if len(matching_l2a_products_df) > 1:
                    # check granule sizes on the server
                    matching_l2a_products_df["size"] = (
                        matching_l2a_products_df["size"]
                        .str.split(" ")
                        .apply(
                            lambda x: float(x[0])
                            * {"GB": 1e3, "MB": 1, "KB": 1e-3}[x[1]]
                        )
                    )
                    matching_l2a_products_df = matching_l2a_products_df.query(
                        "size >= " + str(faulty_granule_threshold)
                    )
                    if (
                        matching_l2a_products_df.iloc[0, :]["size"]
                        .str.split(" ")
                        .apply(
                            lambda x: float(x[0])
                            * {"GB": 1e3, "MB": 1, "KB": 1e-3}[x[1]]
                        )
                        > faulty_granule_threshold
                    ):
                        tile_log.info("Replacing L1C {} with L2A product:".format(id))
                        tile_log.info(
                            "              {}".format(
                                matching_l2a_products_df.iloc[0, :]["title"]
                            )
                        )
                        drop.append(l1c_products.index[r])
                        add.append(matching_l2a_products_df.iloc[0, :])
            if len(drop) > 0:
                l1c_products = l1c_products.drop(index=drop)
            if len(add) > 0:
                # l2a_products = l2a_products.append(add)
                add = pd.DataFrame(add)
                l2a_products = pd.concat([l2a_products, add])

            tile_log.info("\n Successfully searched for the L2A counterparts for the L1C products for the Composite")
        
    # here, dataspace and scihub derived l1c_products and l2a_products lists are the "same"
    l2a_products = l2a_products.drop_duplicates(subset="title")
    tile_log.info(
        "    {} L1C products remaining for download".format(
            l1c_products.shape[0]
        )
    )
    tile_log.info(
        "    {} L2A products remaining for download".format(
            l2a_products.shape[0]
        )
    )

    tile_log.info("Cell successfully finished")

2024-09-25 14:16:14,853: INFO:     1 L1C products remaining for download
2024-09-25 14:16:14,854: INFO:     3 L2A products remaining for download
2024-09-25 14:16:14,855: INFO: Cell successfully finished


# Download Sentinel-2 Composite Imagery

## Download and Process L1Cs

- From the `log` output above in the previous section, we can see that `pyeo` has found a matching `L2A` image for each of the `L1Cs` in our search query. So now we have only L2As in our search query.  

- If we did have `L1Cs` in our search query, then the cell below would download these L1Cs and apply `atmospheric_correction` using `Sen2Cor`.

In [37]:
if config_dict["build_composite"] or config_dict["do_all"]:
    tile_log.info(f"Path to Sen2Cor is   : {config_dict['sen2cor_path']}")
    # check whether Sen2cor is installed
    if os.path.isdir(config_dict['sen2cor_path']):
        log.info("  Sen2Cor path found.")
        if l1c_products.shape[0] > 0:
            tile_log.info(f"Downloading Sentinel-2 L1C products from {download_source}:")
    
            if download_source == "scihub":
    
                queries_and_downloads.download_s2_data_from_df(
                    l1c_products,
                    composite_l1_image_dir,
                    composite_l2_image_dir,
                    source="scihub",
                    user=sen_user,
                    passwd=sen_pass,
                    try_scihub_on_fail=True
                )
    
            if download_source == "dataspace":
    
                queries_and_downloads.download_s2_data_from_dataspace(
                    product_df=l1c_products,
                    l1c_directory=composite_l1_image_dir,
                    l2a_directory=composite_l2_image_dir,
                    dataspace_username=sen_user,
                    dataspace_password=sen_pass,
                    log=tile_log
                )
            tile_log.info("Atmospheric correction with sen2cor.")
            raster_manipulation.atmospheric_correction(
                composite_l1_image_dir,
                composite_l2_image_dir,
                sen2cor_path,
                delete_unprocessed_image=False,
                log=tile_log,
            )
        tile_log.info("Successfully downloaded the Sentinel-2 L1C products")
    else:
        tile_log.warning("  Sen2Cor path does not exist. Cannot convert L1C to L2A. Skipping download.")

2024-09-25 14:31:10,383: INFO: Path to Sen2Cor is   : /home/cmsstudent/Sen2Cor-02.10.01-Linux64/bin/L2A_Process


## Download L2As

In this subsection, we will download the L2As from our search query.  

But first, let's take a look at what our search query result, `l2a_products` looks like by printing the first 3 rows with `.head(3)`:

In [38]:
if config_dict["build_composite"] or config_dict["do_all"]:
    l2a_products.head(3)

Let's highlight a few columns of interest:  

In the cell output above, we can see the product `uuid` as the dataframe index (*the first column, it has no column name*). These are the unique identifiers used to distinguish the scenes from each other.  

From the `title` column, we can see the titles of each product, the titles themselves show us important information, for example: the Satellite (*S2A or S2B*), the Sensor (*MSI*), the product type (*L2A*), the date the image was captured (*YYYYMMDD*) or the corresponding tile for the image (*TXXXXX*).

We can also see if the product is online or in the Long-Term Archive (`LTA`), by looking at the column `ondemand`, where `false` indicates the product is in the LTA or `true` indicates the product is online and ready for download.

Now, let's download the `L2As` in our search query `l2a_products`, by asking `pyeo` to download these images from the Copernicus archive. If any incomplete downloads are present from a previous run (*remember, pyeo is an iterative download, classification and change detection process*), then `pyeo` will flag these files to the user through the log file.

If the images are in the Long Term Archive (`LTA`), then `pyeo` will linearly activate and wait for the LTA image to become available, before downloading and moving onto the next L2A in the search query.

In [39]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if l2a_products.shape[0] > 0:
        tile_log.info("Downloading Sentinel-2 L2A products.")

        if download_source == "scihub":

            queries_and_downloads.download_s2_data(
                l2a_products.to_dict("index"),
                composite_l1_image_dir,
                composite_l2_image_dir,
                source="scihub",
                user=sen_user,
                passwd=sen_pass,
                try_scihub_on_fail=True,
            )
        if download_source == "dataspace":

            queries_and_downloads.download_s2_data_from_dataspace(
                product_df=l2a_products,
                l1c_directory=composite_l1_image_dir,
                l2a_directory=composite_l2_image_dir,
                dataspace_username=sen_user,
                dataspace_password=sen_pass,
                log=tile_log
            )

    # check for incomplete L2A downloads
    incomplete_downloads, sizes = raster_manipulation.find_small_safe_dirs(
        composite_l2_image_dir, threshold=faulty_granule_threshold * 1024 * 1024
    )
    if len(incomplete_downloads) > 0:
        for index, safe_dir in enumerate(incomplete_downloads):
            if sizes[
                index
            ] / 1024 / 1024 < faulty_granule_threshold and os.path.exists(safe_dir):
                tile_log.warning("Found likely incomplete download of size {} MB: {}".format(
                        str(round(sizes[index] / 1024 / 1024)), safe_dir))

    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Image download for composite is complete.")
    tile_log.info("---------------------------------------------------------------")

2024-09-25 14:31:21,149: INFO: Downloading Sentinel-2 L2A products.
2024-09-25 14:31:21,151: INFO: --------------------------------------------------------------------------------
2024-09-25 14:31:21,152: INFO: Checking 1 of 3 : S2A_MSIL2A_20220126T075211_N0400_R135_T36NXG_20220126T111035.SAFE
2024-09-25 14:31:21,153: INFO: /home/cmsstudent/Desktop/pyeo_data/36NXG/composite/L2A/S2A_MSIL2A_20220126T075211_N0400_R135_T36NXG_20220126T111035.SAFE does not exist.
2024-09-25 14:31:21,154: INFO:     Downloading  : S2A_MSIL2A_20220126T075211_N0400_R135_T36NXG_20220126T111035.SAFE
2024-09-25 14:31:21,774: INFO: response.status_code: 301
2024-09-25 14:31:21,775: INFO: download url = response.headers['Location']: https://catalogue.dataspace.copernicus.eu/odata/v1/Products(b64fb22c-b8c5-5f2a-a23b-5e4ba581512b)/$value
2024-09-25 14:35:22,286: INFO: --------------------------------------------------------------------------------
2024-09-25 14:35:22,287: INFO: Checking 2 of 3 : S2B_MSIL2A_20220511T07

## Housekeeping

The cell below performs some housekeeping if we have told `pyeo` to delete or zip imagery. This functionality is useful for ensuring disk space is kept to a minimum.

In [40]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if config_dict["do_delete"]:
        tile_log.info("---------------------------------------------------------------")
        tile_log.info("Deleting downloaded L1C images for composite, keeping only derived L2A products")
        tile_log.info(
            "---------------------------------------------------------------"
        )
        directory = composite_l1_image_dir
        tile_log.info("Deleting {}".format(directory))
        shutil.rmtree(directory)
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Deletion of L1C images complete. Keeping only L2A images.")
        tile_log.info(
            "---------------------------------------------------------------"
        )
    else:
        if config_dict["do_zip"]:
            tile_log.info("---------------------------------------------------------------")
            tile_log.info("Zipping downloaded L1C images for composite after atmospheric correction")
            tile_log.info("---------------------------------------------------------------")
            filesystem_utilities.zip_contents(composite_l1_image_dir)
            tile_log.info("---------------------------------------------------------------")
            tile_log.info("Zipping complete")
            tile_log.info("---------------------------------------------------------------")

    tile_log.info("Cell successfully finished")

2024-09-25 14:50:18,216: INFO: ---------------------------------------------------------------
2024-09-25 14:50:18,218: INFO: Zipping downloaded L1C images for composite after atmospheric correction
2024-09-25 14:50:18,219: INFO: ---------------------------------------------------------------
2024-09-25 14:50:18,220: INFO: ---------------------------------------------------------------
2024-09-25 14:50:18,220: INFO: Zipping complete
2024-09-25 14:50:18,221: INFO: ---------------------------------------------------------------
2024-09-25 14:50:18,222: INFO: Cell successfully finished


# Process the Downloaded Imagery

Now that we have downloaded the L2A Imagery, we will process the imagery. Processing refers to:  

1. Applying the `SCL Cloud Mask` to remove cloud, haze or cloud shadow pixels from the imagery.
2. Applying a `Processing Baseline Correction Offset` to the imagery, if applicable.
3. Create `Quicklooks` (*.png*) of the processed imagery.

## Apply SCL Cloud Mask

Optical data is affected by the presence of clouds over the land cover of interest. So, we use `apply_scl_cloud_mask` to remove cloudy pixels from the imagery, as we are not interested in clouds.

The cell below peforms two things:

- Checks whether any L2A SAFE files have been cloud masked from a previous run.

- If any L2A SAFE files have not been cloud masked, then `apply_scl_cloud_mask` is applied.

In [None]:
# Check for pre-downloaded Imagery
if config_dict["build_composite"] or config_dict["do_all"]:
    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Applying simple cloud, cloud shadow and haze mask based on SCL files and stacking the masked band raster files.")
    tile_log.info("---------------------------------------------------------------")

    directory = composite_l2_masked_image_dir
    masked_file_paths = [
        f
        for f in os.listdir(directory)
        if f.endswith(".tif") and os.path.isfile(os.path.join(directory, f))
    ]

    directory = composite_l2_image_dir
    l2a_zip_file_paths = [f for f in os.listdir(directory) if f.endswith(".zip")]

    if len(l2a_zip_file_paths) > 0:
        for f in l2a_zip_file_paths:
            # check whether the zipped file has already been cloud masked
            zip_timestamp = filesystem_utilities.get_image_acquisition_time(
                os.path.basename(f)
            ).strftime("%Y%m%dT%H%M%S")
            if any(zip_timestamp in f for f in masked_file_paths):
                continue
            else:
                # extract it if not
                filesystem_utilities.unzip_contents(
                    os.path.join(composite_l2_image_dir, f),
                    ifstartswith="S2",
                    ending=".SAFE",
                )

    directory = composite_l2_image_dir
    l2a_safe_file_paths = [
        f
        for f in os.listdir(directory)
        if f.endswith(".SAFE") and os.path.isdir(os.path.join(directory, f))
    ]

    files_for_cloud_masking = []
    if len(l2a_safe_file_paths) > 0:
        for f in l2a_safe_file_paths:
            # check whether the L2A SAFE file has already been cloud masked
            safe_timestamp = filesystem_utilities.get_image_acquisition_time(
                os.path.basename(f)
            ).strftime("%Y%m%dT%H%M%S")
            if any(safe_timestamp in f for f in masked_file_paths):
                continue
            else:
                # add it to the list of files to do if it has not been cloud masked yet
                files_for_cloud_masking = files_for_cloud_masking + [f]

    # Apply the cloud masks to images
    if len(files_for_cloud_masking) == 0:
        tile_log.info("No L2A images found for cloud masking. They may already have been done.")
    else:
        raster_manipulation.apply_scl_cloud_mask(
            composite_l2_image_dir,
            composite_l2_masked_image_dir,
            scl_classes=[0, 1, 2, 3, 8, 9, 10, 11],
            buffer_size=buffer_size_composite,
            bands=bands,
            out_resolution=out_resolution,
            haze=None,
            epsg=epsg,
            skip_existing=skip_existing,
            log=tile_log
        )

    tile_log.info("Successfully applied the Cloud Masks")

2024-09-25 14:50:26,746: INFO: ---------------------------------------------------------------
2024-09-25 14:50:26,747: INFO: Applying simple cloud, cloud shadow and haze mask based on SCL files and stacking the masked band raster files.
2024-09-25 14:50:26,748: INFO: ---------------------------------------------------------------
2024-09-25 14:50:26,751: INFO: 3 L2A raster files marked for SCL cloud masking.
2024-09-25 14:50:26,751: INFO:   Applying SCL cloud mask to L2A raster file: /home/cmsstudent/Desktop/pyeo_data/36NXG/composite/L2A/S2B_MSIL2A_20220511T074609_N0400_R135_T36NXG_20220511T103313.SAFE
2024-09-25 14:50:26,752: INFO: TMP:  Granule ID  : S2B_MSIL2A_20220511T074609_N0400_R135_T36NXG_20220511T103313
2024-09-25 14:50:26,752: INFO: TMP:  File pattern: S2B_MSIL2A_20220511T074609_N0400_R135_T36NXG
2024-09-25 14:50:26,762: INFO: Merging band rasters into a single file:
2024-09-25 14:50:26,763: INFO:   /home/cmsstudent/Desktop/pyeo_data/36NXG/composite/L2A/S2B_MSIL2A_20220511T0

## Apply Processing Baseline Offset

Before Sentinel-2 imagery is provided to the user as L1C or L2A formats, the raw imagery (L0) are processed by the ESA Copernicus Ground Segment ([see here](https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/processing-levels)). The algorithms used in the processing baseline, are indicated by the field `N0XXX` in the product title and the changes introduced by each processing baseline iteration are listed [here](https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-2-msi/processing-baseline).

The advent of processing baseline `N0400` introduced an offset of `-1000` in the spectral reflectance values, the reasoning and suggested reading can be viewed [here](https://forum.step.esa.int/t/info-introduction-of-additional-radiometric-offset-in-pb04-00-products/35431). Therefore, to ensure that the spectral reflectance of imagery before and after `N0400` can be compared, we apply the offset correction of `+1000`.

The cell below, applies such an offset correction.

In [None]:
if config_dict["build_composite"] or config_dict["do_all"]:
    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Offsetting cloud masked L2A images for composite.")
    tile_log.info("---------------------------------------------------------------")

    raster_manipulation.apply_processing_baseline_offset_correction_to_tiff_file_directory(
        in_tif_directory = composite_l2_masked_image_dir,
        out_tif_directory = composite_l2_masked_image_dir,
        bands_to_offset_labels = ("B02", "B03", "B04", "B08"),
        bands_to_offset_index = [0, 1, 2, 3],
        BOA_ADD_OFFSET = -1000,
        backup_flag = False,
        log=tile_log
    )

    tile_log.info("---------------------------------------------------------------")
    tile_log.info("Offsetting of cloud masked L2A images for composite complete.")
    tile_log.info("---------------------------------------------------------------")

## Create Quicklooks of Cloud-Masked Images

- We can also create quicklooks of the Cloud-Masked images. These are especially useful for viewing the images quickly using a standard photo viewer, and for use in presentations.

In [28]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if config_dict["do_quicklooks"] or config_dict["do_all"]:
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Producing quicklooks.")
        tile_log.info(
            "---------------------------------------------------------------"
        )
        dirs_for_quicklooks = [composite_l2_masked_image_dir]
        for main_dir in dirs_for_quicklooks:
            files = [
                f.path
                for f in os.scandir(main_dir)
                if f.is_file() and os.path.basename(f).endswith(".tif")
            ]
            # files = [ f.path for f in os.scandir(main_dir) if f.is_file() and os.path.basename(f).endswith(".tif") and "class" in os.path.basename(f) ] # do classification images only
            if len(files) == 0:
                tile_log.warning("No images found in {}.".format(main_dir))
            else:
                for f in files:
                    quicklook_path = os.path.join(
                        quicklook_dir,
                        os.path.basename(f).split(".")[0] + ".png",
                    )
                    tile_log.info("Creating quicklook: {}".format(quicklook_path))
                    raster_manipulation.create_quicklook(
                        in_raster_path = f,
                        out_raster_path = quicklook_path,
                        width = 512,
                        height = 512,
                        format = "PNG",
                        bands = [3, 2, 1],
                        nodata = 0,
                        scale_factors=[[0, 2000, 0, 255]],
                        log=tile_log
                    )
        tile_log.info("Quicklooks complete.")
    else:
        tile_log.info("Quicklook option disabled in ini file.")


    if config_dict["do_zip"]:
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info(
            "Zipping downloaded L2A images for composite after cloud masking and band stacking"
        )
        tile_log.info(
            "---------------------------------------------------------------"
        )
        filesystem_utilities.zip_contents(composite_l2_image_dir)
        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info("Zipping complete")
        tile_log.info(
            "---------------------------------------------------------------"
        )


2024-08-05 16:56:04,422: INFO: ---------------------------------------------------------------
2024-08-05 16:56:04,424: INFO: Producing quicklooks.
2024-08-05 16:56:04,426: INFO: ---------------------------------------------------------------
2024-08-05 16:56:04,429: INFO: Creating quicklook: Z:\gy7709\36NXG\output\quicklooks\S2A_MSIL2A_20220106T075321_N0301_R135_T36NXG_20220106T112039.png
2024-08-05 16:56:14,801: INFO: Creating quicklook: Z:\gy7709\36NXG\output\quicklooks\S2A_MSIL2A_20220126T075211_NA400_R135_T36NXG_20220126T111035.png
2024-08-05 16:56:28,559: INFO: Creating quicklook: Z:\gy7709\36NXG\output\quicklooks\S2A_MSIL2A_20220307T074801_NA400_R135_T36NXG_20220307T113320.png
2024-08-05 16:56:42,566: INFO: Creating quicklook: Z:\gy7709\36NXG\output\quicklooks\S2A_MSIL2A_20220406T074611_NA400_R135_T36NXG_20220406T102936.png
2024-08-05 16:56:55,636: INFO: Creating quicklook: Z:\gy7709\36NXG\output\quicklooks\S2A_MSIL2A_20220516T074621_NA400_R135_T36NXG_20220516T124012.png
2024-08

# Create Composite from the Baseline Imagery

Now we come to the last section of Tutorial Section 2. Previously, we have queried the Copernicus archive for Sentinel-2 images that matched our search criteria, we evaluated which L2A products were present in the archive to avoid unecessary processing from pyeo for conversion from L1C to L2A. We then downloaded the resulting imagery, applied a cloud mask and a baseline offset correction, if necessary. 

## Create the Image Composite

In [16]:
if config_dict["build_composite"] or config_dict["do_all"]:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)

        tile_log.info("---------------------------------------------------------------")
        tile_log.info(
            "Building initial cloud-free median composite from directory {}".format(
                composite_l2_masked_image_dir
            )
        )
        tile_log.info("---------------------------------------------------------------")
        directory = composite_l2_masked_image_dir
        masked_file_paths = [
            f
            for f in os.listdir(directory)
            if f.endswith(".tif") and os.path.isfile(os.path.join(directory, f))
        ]

        if len(masked_file_paths) > 0:
            raster_manipulation.clever_composite_directory(
                composite_l2_masked_image_dir,
                composite_dir,
                chunks=config_dict["chunks"],
                generate_date_images=True,
                missing_data_value=0,
                log=tile_log
            )
            tile_log.info("---------------------------------------------------------------")
            tile_log.info("Baseline composite complete.")
            tile_log.info("---------------------------------------------------------------")

2024-08-05 17:32:14,907: INFO: ---------------------------------------------------------------
2024-08-05 17:32:14,909: INFO: Building initial cloud-free median composite from directory Z:\gy7709\36NXG\composite\cloud_masked
2024-08-05 17:32:14,911: INFO: ---------------------------------------------------------------
2024-08-05 17:32:14,918: INFO: Cleverly compositing all images in directory into a median composite: Z:\gy7709\36NXG\composite\cloud_masked
2024-08-05 17:32:14,922: INFO: Image number 1 has time stamp 20220106T075321
2024-08-05 17:32:14,923: INFO:   File: Z:\gy7709\36NXG\composite\cloud_masked\S2A_MSIL2A_20220106T075321_N0301_R135_T36NXG_20220106T112039.tif
2024-08-05 17:32:14,924: INFO: Image number 2 has time stamp 20220111T075209
2024-08-05 17:32:14,925: INFO:   File: Z:\gy7709\36NXG\composite\cloud_masked\S2B_MSIL2A_20220111T075209_N0301_R135_T36NXG_20220111T102011.tif
2024-08-05 17:32:14,925: INFO: Image number 3 has time stamp 20220126T075211
2024-08-05 17:32:14,927

## Create Quicklook of the Composite

In [17]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if config_dict["do_quicklooks"] or config_dict["do_all"]:
        tile_log.info("---------------------------------------------------------------")
        tile_log.info("Producing quicklooks.")
        tile_log.info("---------------------------------------------------------------")
        dirs_for_quicklooks = [composite_dir]
        for main_dir in dirs_for_quicklooks:
            files = [
                f.path
                for f in os.scandir(main_dir)
                if f.is_file() and os.path.basename(f).endswith(".tif")
            ]
            if len(files) == 0:
                tile_log.warning("No images found in {}.".format(main_dir))
            else:
                for f in files:
                    quicklook_path = os.path.join(
                        quicklook_dir,
                        os.path.basename(f).split(".")[0] + ".png",
                    )
                    tile_log.info(
                        "Creating quicklook: {}".format(quicklook_path)
                    )
                    raster_manipulation.create_quicklook(
                        in_raster_path = f,
                        out_raster_path = quicklook_path,
                        width = 512,
                        height = 512,
                        format = "PNG",
                        bands = [3, 2, 1],
                        nodata = 0,
                        scale_factors=[[0, 2000, 0, 255]],
                        log=tile_log
                    )
        tile_log.info("Quicklooks complete.")


2024-08-05 18:27:04,903: INFO: ---------------------------------------------------------------
2024-08-05 18:27:04,905: INFO: Producing quicklooks.
2024-08-05 18:27:04,906: INFO: ---------------------------------------------------------------
2024-08-05 18:27:04,908: INFO: Creating quicklook: Z:\gy7709\36NXG\output\quicklooks\composite_T36NXG_20221202T075301.png
2024-08-05 18:27:18,571: INFO: Quicklooks complete.


## Final Housekeeping

Now that we have created our composite and produced any quicklooks, we tell `pyeo` to delete or compress the cloud-masked L2A images that the composite was derived from.

In [None]:
if config_dict["build_composite"] or config_dict["do_all"]:
    if config_dict["do_quicklooks"] or config_dict["do_all"]:
        if config_dict["do_delete"]:
            tile_log.info(
                "---------------------------------------------------------------"
            )
            tile_log.info(
                "Deleting intermediate cloud-masked L2A images used for the baseline composite"
            )
            tile_log.info(
                "---------------------------------------------------------------"
            )
            f = composite_l2_masked_image_dir
            tile_log.info("Deleting {}".format(f))
            shutil.rmtree(f)
            tile_log.info(
                "---------------------------------------------------------------"
            )
            tile_log.info("Intermediate file products have been deleted.")
            tile_log.info("They can be reprocessed from the downloaded L2A images.")
            tile_log.info(
                "---------------------------------------------------------------"
            )
        else:
            if config_dict["do_zip"]:
                tile_log.info(
                    "---------------------------------------------------------------"
                )
                tile_log.info(
                    "Zipping cloud-masked L2A images used for the baseline composite"
                )
                tile_log.info(
                    "---------------------------------------------------------------"
                )
                filesystem_utilities.zip_contents(composite_l2_masked_image_dir)
                tile_log.info(
                    "---------------------------------------------------------------"
                )
                tile_log.info("Zipping complete")
                tile_log.info(
                    "---------------------------------------------------------------"
                )

        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info(
            "Compressing tiff files in directory {} and all subdirectories".format(
                composite_dir
            )
        )
        tile_log.info(
            "---------------------------------------------------------------"
        )
        for root, dirs, files in os.walk(composite_dir):
            all_tiffs = [
                image_name for image_name in files if image_name.endswith(".tif")
            ]
            for this_tiff in all_tiffs:
                raster_manipulation.compress_tiff(
                    os.path.join(root, this_tiff), 
                    os.path.join(root, this_tiff),
                    tile_log
                )

        tile_log.info(
            "---------------------------------------------------------------"
        )
        tile_log.info(
            "Baseline image composite, file compression, zipping and deletion of"
        )
        tile_log.info("intermediate file products (if selected) are complete.")
        tile_log.info(
            "---------------------------------------------------------------"
        )

2024-08-05 18:28:24,179: INFO: ---------------------------------------------------------------
2024-08-05 18:28:24,179: INFO: Compressing tiff files in directory Z:\gy7709\36NXG\composite and all subdirectories
2024-08-05 18:28:24,179: INFO: ---------------------------------------------------------------
2024-08-05 18:28:24,222: INFO: GeoTiff file is already LZW compressed: Z:\gy7709\36NXG\composite\composite_T36NXG_20221202T075301.tif
2024-08-05 18:28:24,244: INFO: GeoTiff file: Z:\gy7709\36NXG\composite\cloud_masked\S2A_MSIL2A_20220106T075321_N0301_R135_T36NXG_20220106T112039.tif
2024-08-05 18:28:24,245: INFO: Current compression: None
2024-08-05 18:28:24,246: INFO: Compressing
2024-08-05 18:28:55,752: INFO: GeoTiff file is already LZW compressed: Z:\gy7709\36NXG\composite\cloud_masked\S2A_MSIL2A_20220126T075211_NA400_R135_T36NXG_20220126T111035.tif
2024-08-05 18:28:55,783: INFO: GeoTiff file is already LZW compressed: Z:\gy7709\36NXG\composite\cloud_masked\S2A_MSIL2A_20220307T074801