# Introduction

Concatenates preprocessed VNP46A1 files that are spatially adjacent in the longitudinal (East-West) direction and exports each concatenated result to a single GeoTiff. Use this when your study area bounding box intersects two horizontally-adjacent VNP46A1 grid cells (e.g., `VNP46A1.A2020001.h30v05.001.2020004003738.h5` and `VNP46A1.A2020001.h31v05.001.2020004003738.h5`).

This Notebook uses the following folder structure:

* `nighttime-radiance/`
    * `01-code-scripts/`
    * `02-raw-data/`
    * `03-processed-data/`
    * `04-graphics-outputs/`
    * `05-papers-writings/`

Running the Notebook from the `01-code-scripts/` folder works by default. If the Notebook runs from a different folder, the paths in the environment setup section may have to be changed.

This notebook uses files that have already been preprocessed and saved to GeoTiff files.

# Environment Setup

In [None]:
# Load Notebook formatter
%load_ext nb_black
# %reload_ext nb_black

In [None]:
# Import packages
import os
import warnings
import glob
import datetime as dt
import numpy as np
import pandas as pd
import rasterio as rio
from rasterio.transform import from_origin
import viirs

In [None]:
# Set options
warnings.simplefilter("ignore")

In [None]:
# Set working directory
os.chdir("..")

# User-Defined Variables

In [None]:
# Define path to folder containing preprocessed VNP46A1 GeoTiff files
geotiff_input_folder = os.path.join(
    "03-processed-data", "raster", "south-korea", "vnp46a1-grid"
)

# Define path to output folder to store concatenated, exported GeoTiff files
geotiff_output_folder = os.path.join(
    "03-processed-data", "raster", "south-korea", "vnp46a1-grid-concatenated"
)

# Data Preprocessing

Workflow:

* Get list of GeoTiff files
* Get list of dates to process
* Get bounding box of left-most file
* Get bounding box of right-most file
* Specify the h values (30 and 31?)
* For each GeoTiff file (loop through dates and glob based on date; don't loop through files, which is less efficient)
    * Extract transform (or get bounding box)
    * For each date
        * If date matches date in file name
            * Read GeoTiff into NumPy array
            * Add array to list (should result in a list of 2 files, same vertical, different horizontal)
        * Sort list of 2 (so that the left-most horizontal will be indexed at 0)
        * Add list of two to dictionary, indexed by [Year][Month][Day] or [YearMonthDay]


* For each dictionary key/value
    * NumPy concatenate across the 1-axis (horizontal)
    * Store concatenated array in new dictionary, indexed by [Year][Month][Day] or [YearMonthDay]
    
    
* Create transform from bounding box
* Set export file name 
    * `vnp46a1-a2020001-h3031v05-001-2020004.tif`
* Create metadata
* Export
    
What about cases where there is only 1 image?
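The concatenation and transform steps in the workflow can be sketched on small synthetic arrays. This is a minimal illustration, not the notebook's implementation: `concatenate_west_east` and the `Bounds` namedtuple (a stand-in for `rasterio.coords.BoundingBox`) are hypothetical, and the 10-degree tile extents are made-up values. It returns the four arguments the notebook passes to `rasterio.transform.from_origin` instead of calling rasterio, so it runs with NumPy alone.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for rasterio.coords.BoundingBox (same field order)
Bounds = namedtuple("Bounds", ["left", "bottom", "right", "top"])


def concatenate_west_east(west_array, east_array, west_bounds, east_bounds):
    """Concatenate two same-height tiles; return the array and from_origin args."""
    if west_array.shape[0] != east_array.shape[0]:
        raise ValueError("Tiles must have the same number of rows")

    # West tile first, so the column index increases eastward
    concatenated = np.concatenate((west_array, east_array), axis=1)

    # Left, top, and bottom from the West tile; right from the East tile
    num_rows, num_columns = concatenated.shape
    cell_width = (east_bounds.right - west_bounds.left) / num_columns
    cell_height = (west_bounds.top - west_bounds.bottom) / num_rows

    # (west bound, north bound, x cell size, y cell size), as in the notebook
    return concatenated, (west_bounds.left, west_bounds.top, cell_width, cell_height)


# Hypothetical 4 x 4 tiles, each spanning 10 degrees of longitude
array, transform_args = concatenate_west_east(
    np.zeros((4, 4)),
    np.ones((4, 4)),
    Bounds(left=120.0, bottom=30.0, right=130.0, top=40.0),
    Bounds(left=130.0, bottom=30.0, right=140.0, top=40.0),
)
print(array.shape)  # (4, 8)
print(transform_args)  # (120.0, 40.0, 2.5, 2.5)
```

The row-count check assumes both tiles share the same vertical grid cell (same `v` value), which is what lets the West tile's top and bottom bounds stand in for the whole mosaic.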


In [None]:
# Get list of dates (YYYYMMDD) to process from a date range
dates = [
    dt.datetime.strftime(date, "%Y%m%d")
    for date in pd.date_range(start="2020-01-01", end="2020-01-05")
]

dates
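The workflow notes suggest globbing by date instead of looping through every file. Because the preprocessed file names store the acquisition date as YYYYJJJ rather than YYYYMMDD, each date needs converting first. A minimal sketch, assuming preprocessed names follow the `vnp46a1-aYYYYJJJ-...tif` pattern implied by the `[9:16]` slice in `extract_date_vnp46a1`; `date_glob_pattern` is a hypothetical helper:

```python
import datetime as dt


def date_glob_pattern(date_yyyymmdd):
    """Build a glob pattern matching preprocessed VNP46A1 files for one date."""
    # File names store the acquisition date as YYYYJJJ (year + day of year)
    julian = dt.datetime.strptime(date_yyyymmdd, "%Y%m%d").strftime("%Y%j")
    return f"vnp46a1-a{julian}-*.tif"


# 2020 is a leap year, so December 31 is day 366
dates = ["20200101", "20200105", "20201231"]
patterns = [date_glob_pattern(date) for date in dates]
print(patterns)
```

Each pattern could then be passed to `glob.glob(os.path.join(geotiff_input_folder, pattern))` to return only that date's tiles.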

In [None]:
def read_geotiff_into_array(geotiff_path, dimensions=1):
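    """Reads a GeoTiff file into a NumPy array.

    Parameters
    ----------
    geotiff_path : str
        Path to the GeoTiff file.

    dimensions : int, optional
        Index (1-based) of the band to read in. Default value is 1
        (the first band), which returns a 2D array.

    Returns
    -------
    array : numpy array
        Array containing the data.

    Example
    -------
        >>> # Read the first band of a preprocessed VNP46A1 GeoTiff
        >>> radiance = read_geotiff_into_array(
        ...     geotiff_path=os.path.join(
        ...         geotiff_input_folder,
        ...         "vnp46a1-a2020001-h30v05-001-2020004003738.tif",
        ...     )
        ... )
    """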
    # Read-in array
    with rio.open(geotiff_path) as src:
        array = src.read(dimensions)

    return array

In [None]:
def extract_geotiff_bounding_box(geotiff_path):
    """Extracts the bounding box from a GeoTiff file.

    Parameters
    ----------
    geotiff_path : str
        Path to the GeoTiff file.

    Returns
    -------
    bounding_box : rasterio.coords.BoundingBox
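        Bounding box for the GeoTiff.

    Example
    -------
        >>> # Get the West edge of a preprocessed VNP46A1 GeoTiff
        >>> extract_geotiff_bounding_box(
        ...     geotiff_path=os.path.join(
        ...         geotiff_input_folder,
        ...         "vnp46a1-a2020001-h30v05-001-2020004003738.tif",
        ...     )
        ... ).left
    """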
    # Extract bounding box
    with rio.open(geotiff_path) as src:
        bounding_box = src.bounds

    return bounding_box

In [None]:
def extract_date_vnp46a1(geotiff_path):
    """Extracts the file date from a preprocessed VNP46A1 GeoTiff.

    Parameters
    ----------
    geotiff_path : str
        Path to the GeoTiff file.

    Returns
    -------
    date : str
        Acquisition date of the preprocessed VNP46A1 GeoTiff (YYYYMMDD).

    Example
    -------
        >>> extract_date_vnp46a1(
        ...     "vnp46a1-a2020001-h30v05-001-2020004003738.tif"
        ... )
        '20200101'
    """
    # Get date (convert YYYYJJJ, characters 9-15 of the file name, to YYYYMMDD)
    date = dt.datetime.strptime(
        os.path.basename(geotiff_path)[9:16], "%Y%j"
    ).strftime("%Y%m%d")

    return date

In [None]:
def create_concatenated_export_name(west_image_path, east_image_path):
    """Creates a file name indicating the concatenation of two adjacent files.

    Parameters
    ----------
    west_image_path : str
        Path to the West-most image.

    east_image_path : str
        Path to the East-most image.

    Returns
    -------
    export_name : str
        New file name for export, indicating concatenation.

    Example
    -------
        >>> create_concatenated_export_name(
        ...     west_image_path="vnp46a1-a2020001-h30v05-001-2020004003738.tif",
        ...     east_image_path="vnp46a1-a2020001-h31v05-001-2020004003738.tif",
        ... )
        'vnp46a1-a2020001-h3031v05-001-2020004.tif'
    """
    # Extract the West file name and the horizontal grid numbers
    #  (characters 18-19, e.g., "30" and "31") from the West and East images
    west_image_name = os.path.basename(west_image_path)
    west_image_horizontal_grid_number = west_image_name[18:20]
    east_image_horizontal_grid_number = os.path.basename(east_image_path)[18:20]

    # Drop the production time (HHMMSS, characters 35-40) so only the
    #  production date (YYYYJJJ) remains, then replace the single horizontal
    #  grid number with both the West and East numbers (e.g., h30 -> h3031)
    export_name = (west_image_name[:35] + west_image_name[41:]).replace(
        f"h{west_image_horizontal_grid_number}",
        f"h{west_image_horizontal_grid_number}"
        f"{east_image_horizontal_grid_number}",
        1,
    )

    return export_name

In [None]:
def concatenate_preprocessed_vnp46a1(
    west_geotiff_path, east_geotiff_path, output_folder
):
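    """Concatenates horizontally-adjacent preprocessed VNP46A1 GeoTiff
    files and exports the concatenated array to a single GeoTiff.

    Parameters
    ----------
    west_geotiff_path : str
        Path to the West-most GeoTiff.

    east_geotiff_path : str
        Path to the East-most GeoTiff.

    output_folder : str
        Path to the folder where the concatenated file will be
        exported to.

    Returns
    -------
    message : str
        Indication of concatenation completion status (success
        or failure).

    Example
    -------
        >>> concatenate_preprocessed_vnp46a1(
        ...     west_geotiff_path=os.path.join(
        ...         geotiff_input_folder,
        ...         "vnp46a1-a2020001-h30v05-001-2020004003738.tif",
        ...     ),
        ...     east_geotiff_path=os.path.join(
        ...         geotiff_input_folder,
        ...         "vnp46a1-a2020001-h31v05-001-2020004003738.tif",
        ...     ),
        ...     output_folder=geotiff_output_folder,
        ... )
    """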
    # Concatenate adjacent VNP46A1 GeoTiff files
    print(
        (
            f"Started concatenating:\n    "
            f"{os.path.basename(west_geotiff_path)}\n    "
            f"{os.path.basename(east_geotiff_path)}"
        )
    )

    try:
        print("Concatenating West and East arrays...")
        # Concatenate West and East images along the 1-axis
        concatenated = np.concatenate(
            (
                read_geotiff_into_array(geotiff_path=west_geotiff_path),
                read_geotiff_into_array(geotiff_path=east_geotiff_path),
            ),
            axis=1,
        )

        print("Getting bounding box information...")
        # Get bounding box (left, top, bottom) from West image and
        #  (right) from East image, opening each file only once
        west_bounds = extract_geotiff_bounding_box(
            geotiff_path=west_geotiff_path
        )
        east_bounds = extract_geotiff_bounding_box(
            geotiff_path=east_geotiff_path
        )
        longitude_min, longitude_max = west_bounds.left, east_bounds.right
        latitude_min, latitude_max = west_bounds.bottom, west_bounds.top

        print("Creating transform...")
        # Set transform (west bound, north bound, x cell size, y cell size)
        concatenated_transform = from_origin(
            longitude_min,
            latitude_max,
            (longitude_max - longitude_min) / concatenated.shape[1],
            (latitude_max - latitude_min) / concatenated.shape[0],
        )

        print("Creating metadata...")
        # Create metadata for GeoTiff export
        metadata = viirs.create_metadata(
            array=concatenated,
            transform=concatenated_transform,
            driver="GTiff",
            nodata=np.nan,
            count=1,
            crs="epsg:4326",
        )

        print("Setting file export name...")
        # Get name for the exported file
        export_name = create_concatenated_export_name(
            west_image_path=west_geotiff_path,
            east_image_path=east_geotiff_path,
        )

        print("Exporting to GeoTiff...")
        # Export concatenated array
        viirs.export_array(
            array=concatenated,
            output_path=os.path.join(output_folder, export_name),
            metadata=metadata,
        )
    except Exception as error:
        message = f"Concatenating failed: {error}"
    else:
        message = (
            f"Completed concatenating:\n    "
            f"{os.path.basename(west_geotiff_path)}\n    "
            f"{os.path.basename(east_geotiff_path)}\n\n"
        )

    # Print the status and also return it (print() itself returns None)
    print(message)

    return message

In [None]:
# Loop through all specified dates and concatenate same-day GeoTiff files
geotiff_files = glob.glob(os.path.join(geotiff_input_folder, "*.tif"))
for date in dates:
    # Add images to list where the looped date matches the file date
    adjacent_images = [
        file
        for file in geotiff_files
        if extract_date_vnp46a1(geotiff_path=file) == date
    ]
    # Ensure index 0 is West-most image and index 1 is East-most image
    adjacent_images_sorted = sorted(adjacent_images)
    # Concatenate and export to GeoTiff if there are two images (same date)
    if len(adjacent_images_sorted) == 2:
        concatenate_preprocessed_vnp46a1(
            west_geotiff_path=adjacent_images_sorted[0],
            east_geotiff_path=adjacent_images_sorted[1],
            output_folder=geotiff_output_folder,
        )
    else:
        # Skip dates without exactly two images (e.g., a single tile)
        print(f"Skipped {date}: found {len(adjacent_images_sorted)} image(s)")

In [None]:
# Concatenate same-day images
for date in dates:
    adjacent_images = []
    for file in glob.glob(os.path.join(geotiff_input_folder, "*.tif")):
        # Extract acquisition date
        file_date = extract_date_vnp46a1(geotiff_path=file)
        # Add adjacent images to the list
        if date == file_date:
            adjacent_images.append(file)
    # Ensure index 0 is left-most image and index 1 is right-most image
    adjacent_images_sorted = sorted(adjacent_images)
    # Concatenate if there are two images
    if len(adjacent_images_sorted) == 2:
        # Concatenate West and East images along the 1-axis
        concatenated = np.concatenate(
            (
                read_geotiff_into_array(
                    geotiff_path=adjacent_images_sorted[0]
                ),
                read_geotiff_into_array(
                    geotiff_path=adjacent_images_sorted[1]
                ),
            ),
            axis=1,
        )

        # Get bounding box (left, top, bottom) from west image and
        #  (right) from east image
        longitude_min = extract_geotiff_bounding_box(
            geotiff_path=adjacent_images_sorted[0]
        ).left
        longitude_max = extract_geotiff_bounding_box(
            geotiff_path=adjacent_images_sorted[1]
        ).right
        latitude_min = extract_geotiff_bounding_box(
            geotiff_path=adjacent_images_sorted[0]
        ).bottom
        latitude_max = extract_geotiff_bounding_box(
            geotiff_path=adjacent_images_sorted[0]
        ).top

        # Set transform (west bound, north bound, x cell size, y cell size)
        concatenated_transform = from_origin(
            longitude_min,
            latitude_max,
            (longitude_max - longitude_min) / concatenated.shape[1],
            (latitude_max - latitude_min) / concatenated.shape[0],
        )

        # Create metadata for GeoTiff export
        metadata = viirs.create_metadata(
            array=concatenated,
            transform=concatenated_transform,
            driver="GTiff",
            nodata=np.nan,
            count=1,
            crs="epsg:4326",
        )

        # Get name for the exported file
        export_name = create_concatenated_export_name(
            west_image_path=adjacent_images_sorted[0],
            east_image_path=adjacent_images_sorted[1],
        )

        # Export concatenated array
        viirs.export_array(
            array=concatenated,
            output_path=os.path.join(geotiff_output_folder, export_name),
            metadata=metadata,
        )

In [None]:
# Identify same-day images
for date in dates:
    adjacent_images = []
    for file in glob.glob(os.path.join(geotiff_input_folder, "*.tif")):
        # Extract file date (Julian day to YYYYMMDD)
        file_date = dt.datetime.strptime(
            os.path.basename(file)[9:16], "%Y%j"
        ).strftime("%Y%m%d")
        # Add adjacent images to the list
        if date == file_date:
            adjacent_images.append(file)
    # Ensure index 0 is left-most image and index 1 is right-most image
    adjacent_images_sorted = sorted(adjacent_images)
    # Concatenate if there are two images
    if len(adjacent_images_sorted) == 2:
        # Read left and right images into arrays - MAKE THIS A FUNCTION
        with rio.open(adjacent_images_sorted[0]) as src:
            left_image_arr = src.read(1)
            left_image_bounds = src.bounds
            left_image_metadata = src.meta
        with rio.open(adjacent_images_sorted[1]) as src:
            right_image_arr = src.read(1)
            right_image_bounds = src.bounds
            right_image_metadata = src.meta
            
        # Concatenate image along the 1-axis
        concatenated = np.concatenate(
            (left_image_arr, right_image_arr), axis=1
        )

        # Get bounding box (left, top, bottom) from left image and (right) from right image
        longitude_min, longitude_max, latitude_min, latitude_max = (
            left_image_bounds.left,
            right_image_bounds.right,
            left_image_bounds.bottom,
            left_image_bounds.top,
        )

        # Get number of rows and columns
        num_rows, num_columns = concatenated.shape[0], concatenated.shape[1]

        # Set transform (top-left corner, cell size)
        concatenated_transform = from_origin(
            longitude_min,
            latitude_max,
            (longitude_max - longitude_min) / num_columns,
            (latitude_max - latitude_min) / num_rows,
        )

        # Create metadata for GeoTiff export
        metadata = viirs.create_metadata(
            array=concatenated,
            transform=concatenated_transform,
            driver="GTiff",
            nodata=np.nan,
            count=1,
            crs="epsg:4326",
        )

        # Export array - MAKE THIS A FUNCTION
        # Extract the horizontal grid numbers from the left and right images
        #  (use the sorted list so index 0 is always the left image)
        left_image_horizontal_grid, right_image_horizontal_grid = (
            os.path.basename(adjacent_images_sorted[0])[18:20],
            os.path.basename(adjacent_images_sorted[1])[18:20],
        )
        # Replace the single horizontal grid number with both left and right
        export_name = (
            os.path.basename(adjacent_images_sorted[0])
            .replace(os.path.basename(adjacent_images_sorted[0])[35:41], "")
            .replace(
                left_image_horizontal_grid,
                left_image_horizontal_grid + right_image_horizontal_grid,
            )
        )
        
        # Export concatenated array
        viirs.export_array(
            array=concatenated,
            output_path=os.path.join(geotiff_output_folder, export_name),
            metadata=metadata,
        )

Use the structure below for these files: a main preprocessing function with helper functions that act on each file (in this case, on each pair of files acquired on the same date).

In [None]:
# Template structure: preprocess each HDF5 file (extract bands, mask for fill
#  values, clouds, and sensor problems, fill masked values with NaN, export to
#  GeoTiff). Note that hdf5_input_folder is not defined in this notebook.
hdf5_files = glob.glob(os.path.join(hdf5_input_folder, "*.h5"))
processed_files = 0
total_files = len(hdf5_files)
for hdf5 in hdf5_files:
    viirs.preprocess_vnp46a1(
        hdf5_path=hdf5, output_folder=geotiff_output_folder
    )
    processed_files += 1
    print(f"Preprocessed file: {processed_files} of {total_files}\n\n")
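Following that structure, the per-date grouping could be separated from the per-pair processing. A minimal sketch using only the standard library; `group_by_acquisition_date` and `split_pairs_and_singles` are hypothetical helpers, and the file names are examples that assume the preprocessed naming pattern used elsewhere in this notebook. Dates with a tile count other than two are kept apart rather than silently dropped, which gives one place to handle the single-image case.

```python
import datetime as dt
import os
from collections import defaultdict


def group_by_acquisition_date(geotiff_paths):
    """Group preprocessed VNP46A1 file paths by acquisition date (YYYYMMDD)."""
    groups = defaultdict(list)
    for path in geotiff_paths:
        # Characters 9-15 of the base name hold the YYYYJJJ acquisition date
        julian = os.path.basename(path)[9:16]
        date = dt.datetime.strptime(julian, "%Y%j").strftime("%Y%m%d")
        groups[date].append(path)
    # Sorting by name puts the lower horizontal tile number (West) first
    return {date: sorted(paths) for date, paths in groups.items()}


def split_pairs_and_singles(groups):
    """Separate dates with exactly two tiles from dates with any other count."""
    pairs = {date: paths for date, paths in groups.items() if len(paths) == 2}
    others = {date: paths for date, paths in groups.items() if len(paths) != 2}
    return pairs, others


# Hypothetical preprocessed file names (unsorted on purpose)
files = [
    "vnp46a1-a2020001-h31v05-001-2020004003738.tif",
    "vnp46a1-a2020001-h30v05-001-2020004003738.tif",
    "vnp46a1-a2020002-h30v05-001-2020005001234.tif",
]
pairs, others = split_pairs_and_singles(group_by_acquisition_date(files))
print(pairs)
print(others)
```

Each entry in `pairs` could then be unpacked as `west, east` and passed to `concatenate_preprocessed_vnp46a1`, while `others` can be logged or handled separately.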

# Notes and References

**File download:**

VNP46A1 HDF5 files were first downloaded using the `01-code-scripts/download_laads_order.py` script. This script requires a user to have a valid [NASA Earthdata](https://urs.earthdata.nasa.gov/) account and have placed an order for files.

<br>

**Useful links:**

* [VNP46A1 Product Information](https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/VNP46A1/)
* [VIIRS Black Marble User Guide](https://viirsland.gsfc.nasa.gov/PDF/VIIRS_BlackMarble_UserGuide.pdf)
* [NASA Earthdata Scripts](https://git.earthdata.nasa.gov/projects/LPDUR/repos/nasa-viirs/browse/scripts)

<br>

**File naming convention:**

VNP46A1.AYYYYDDD.hXXvYY.CCC.YYYYDDDHHMMSS.h5

* VNP46A1 = Short-name
* AYYYYDDD = Acquisition Year and Day of Year
* hXXvYY = Tile Identifier (horizontalXXverticalYY)
* CCC = Collection Version
* YYYYDDDHHMMSS = Production Date – Year, Day, Hour, Minute, Second
* h5 = Data Format (HDF5)
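The naming convention above can be parsed with a regular expression. This sketch assumes the exact layout listed (two-digit tile numbers, three-digit collection, 13-digit production timestamp); `parse_vnp46a1_name` is a hypothetical helper, not part of the `viirs` module.

```python
import datetime as dt
import re

# Pattern for VNP46A1.AYYYYDDD.hXXvYY.CCC.YYYYDDDHHMMSS.h5
VNP46A1_NAME = re.compile(
    r"^(?P<short_name>VNP46A1)"
    r"\.A(?P<acq_year>\d{4})(?P<acq_doy>\d{3})"
    r"\.h(?P<h>\d{2})v(?P<v>\d{2})"
    r"\.(?P<collection>\d{3})"
    r"\.(?P<production>\d{13})"
    r"\.h5$"
)


def parse_vnp46a1_name(file_name):
    """Split a raw VNP46A1 HDF5 file name into its documented parts."""
    match = VNP46A1_NAME.match(file_name)
    if match is None:
        raise ValueError(f"Not a VNP46A1 file name: {file_name}")
    parts = match.groupdict()
    # Acquisition year + day of year -> calendar date
    parts["acquisition_date"] = dt.datetime.strptime(
        parts["acq_year"] + parts["acq_doy"], "%Y%j"
    ).date()
    # Production timestamp: year, day of year, hour, minute, second
    parts["production_datetime"] = dt.datetime.strptime(
        parts["production"], "%Y%j%H%M%S"
    )
    return parts


info = parse_vnp46a1_name("VNP46A1.A2020001.h30v05.001.2020004003738.h5")
print(info["h"], info["v"], info["acquisition_date"], info["production_datetime"])
```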

<br>

**Bands of interest (User Guide pp. 12-13):**

| Scientific Dataset          | Units             | Description            | Bit Types               | Fill Value | Valid Range | Scale Factor | Offset |
|:-----------------------------|:-------------------|:------------------------|:-------------------------|:------------|:-------------|:--------------|:--------|
| DNB_At_Sensor_Radiance_500m | nW_per_cm2_per_sr | At-sensor DNB radiance | 16-bit unsigned integer | 65535      | 0 - 65534   | 0.1          | 0.0    |
| QF_Cloud_Mask               | Unitless          | Cloud mask status      | 16-bit unsigned integer | 65535      | 0 - 65534   | N/A          | N/A    |
| QF_DNB                      | Unitless          | DNB_quality flag       | 16-bit unsigned integer | 65535      | 0 - 65534   | N/A          | N/A    |
| UTC_Time                    | Decimal hours     | UTC Time               | 32-bit floating point   | -999.9     | 0 - 24      | 1.0          | 0.0    |
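Converting raw `DNB_At_Sensor_Radiance_500m` counts to radiance uses the table's fill value, scale factor, and offset. A minimal NumPy sketch, assuming the common HDF convention `radiance = raw * scale_factor + offset`; `scale_radiance` is a hypothetical helper.

```python
import numpy as np

FILL_VALUE = 65535  # DNB_At_Sensor_Radiance_500m fill value (from the table)
SCALE_FACTOR = 0.1
OFFSET = 0.0


def scale_radiance(raw):
    """Convert raw 16-bit counts to radiance (nW/cm^2/sr); fill -> NaN."""
    raw = np.asarray(raw, dtype=np.uint16)
    # Cast to float before scaling so NaN can be stored
    radiance = raw.astype(np.float64) * SCALE_FACTOR + OFFSET
    radiance[raw == FILL_VALUE] = np.nan
    return radiance


raw_counts = np.array([[0, 125], [65534, 65535]], dtype=np.uint16)
print(scale_radiance(raw_counts))
```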

<br>

**Masking Criteria/Workflow:**

* mask where DNB_At_Sensor_Radiance_500m == 65535
* mask where QF_Cloud_Mask bits 6-7 == 2 (Probably Cloudy)
* mask where QF_Cloud_Mask bits 6-7 == 3 (Confident Cloudy)
* mask where QF_DNB != 0 (0 = no problems, any other number means some kind of issue)
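The masking criteria can be combined into one boolean mask. A minimal NumPy sketch with made-up pixel values; it assumes the cloud criteria refer to the Cloud Detection Results field (bits 6-7 in the QF_Cloud_Mask tables below), extracted with a right shift and a two-bit mask. `build_mask` is a hypothetical helper.

```python
import numpy as np


def build_mask(radiance_raw, qf_cloud_mask, qf_dnb):
    """Return True where a pixel should be masked (later filled with NaN)."""
    # Fill value in the raw radiance band
    fill = radiance_raw == 65535
    # Cloud detection result lives in bits 6-7 of QF_Cloud_Mask
    cloud_detection = (qf_cloud_mask >> 6) & 0b11
    cloudy = (cloud_detection == 2) | (cloud_detection == 3)
    # Any non-zero QF_DNB value flags a sensor problem
    sensor_problem = qf_dnb != 0
    return fill | cloudy | sensor_problem


radiance_raw = np.array([100, 65535, 200, 300, 400, 500], dtype=np.uint16)
# Bits 6-7: clear, clear, probably cloudy, confident cloudy, probably clear, clear
qf_cloud_mask = np.array([0, 0, 2 << 6, 3 << 6, 1 << 6, 0], dtype=np.uint16)
qf_dnb = np.array([0, 0, 0, 0, 0, 4], dtype=np.uint16)

mask = build_mask(radiance_raw, qf_cloud_mask, qf_dnb)
radiance = np.where(mask, np.nan, radiance_raw * 0.1)
print(mask)
```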

<br>

**Preprocessing Workflow:**

* Extract bands
* Mask for fill values
* Mask for clouds
* Mask for sensor problems
* Fill masked values
* Create transform
* Create metadata
* Export array to GeoTiff

<br>

**QF_Cloud_Mask (base-2) (User Guide p. 14):**

| Bit | Flag Description Key                          | Interpretation                                                                            |
|:-----|:-----------------------------------------------|:-------------------------------------------------------------------------------------------|
| 0   | Day/Night                                     | 0 = Night <br> 1 = Day                                                                         |
| 1-3 | Land/Water Background                         | 000 = Land & Desert <br> 001 = Land no Desert <br> 010 = Inland Water <br> 011 = Sea Water <br> 101 = Coastal |
| 4-5 | Cloud Mask Quality                            | 00 = Poor <br> 01 = Low <br> 10 = Medium <br> 11 = High                                                  |
| 6-7 | Cloud Detection Results & Confidence Indicator | 00 = Confident Clear <br> 01 = Probably Clear <br> 10 = Probably Cloudy <br> 11 = Confident Cloudy     |
| 8   | Shadow Detected                               | 1 = Yes <br> 0 = No                                                                            |
| 9   | Cirrus Detection (IR) (BTM15 – BTM16)         | 1 = Cloud <br> 0 = No Cloud                                                                   |
| 10  | Snow/Ice Surface                              | 1 = Snow/Ice <br> 0 = No Snow/Ice                                                            |

<br>

**QF_Cloud_Mask (base-10) (Adapted from User Guide p. 14):**

| Bit | Flag Description Key                          | Interpretation                                                                            |
|:-----|:-----------------------------------------------|:-------------------------------------------------------------------------------------------|
| 0   | Day/Night                                     | 0 = Night <br> 1 = Day                                                                         |
| 1-3 | Land/Water Background                         | 0 = Land & Desert <br> 1 = Land no Desert <br> 2 = Inland Water <br> 3 = Sea Water <br> 5 = Coastal |
| 4-5 | Cloud Mask Quality                            | 0 = Poor <br> 1 = Low <br> 2 = Medium <br> 3 = High                                                  |
| 6-7 | Cloud Detection Results & Confidence Indicator | 0 = Confident Clear <br> 1 = Probably Clear <br> 2 = Probably Cloudy <br> 3 = Confident Cloudy     |
| 8   | Shadow Detected                               | 1 = Yes <br> 0 = No                                                                            |
| 9   | Cirrus Detection (IR) (BTM15 – BTM16)         | 1 = Cloud <br> 0 = No Cloud                                                                   |
| 10  | Snow/Ice Surface                              | 1 = Snow/Ice <br> 0 = No Snow/Ice                                                            |
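Each field in the base-10 table is obtained by shifting the QF_Cloud_Mask value right by the field's starting bit and keeping as many low bits as the field is wide. A minimal pure-Python sketch; `decode_qf_cloud_mask` and its key names are hypothetical. The example value 182 is built as `(3 << 1) | (3 << 4) | (2 << 6)`: Sea Water, High quality, Probably Cloudy, at night.

```python
def decode_qf_cloud_mask(value):
    """Decode one QF_Cloud_Mask integer into its bit fields (base-10 values)."""

    def bits(start, width):
        # Shift the field down to bit 0, then keep `width` low bits
        return (value >> start) & ((1 << width) - 1)

    return {
        "day_night": bits(0, 1),
        "land_water_background": bits(1, 3),
        "cloud_mask_quality": bits(4, 2),
        "cloud_detection": bits(6, 2),
        "shadow_detected": bits(8, 1),
        "cirrus_detection": bits(9, 1),
        "snow_ice_surface": bits(10, 1),
    }


fields = decode_qf_cloud_mask(182)
print(fields)
```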

<br>

**QF_DNB (base-10) (User Guide pp. 14-15):**

| Science Data Set | Flag Mask Values and Descriptions|
|:-----------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| QF_DNB    | 1 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = Substitute_Cal<br>2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = Out_of_Range<br>4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = Saturation<br>8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = Temp_not_Nominal<br>16&nbsp;&nbsp;&nbsp;&nbsp; = Stray_Light<br>256&nbsp;&nbsp; = Bowtie_Deleted/Range_Bit<br>512&nbsp;&nbsp; = Missing_EV<br>1024 = Cal_Fail<br>2048 = Dead_Detector |
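Because every QF_DNB value listed is a power of two, the flags can be read as individual bits, so one pixel may carry several at once (e.g., 20 = 4 + 16). This sketch assumes that bit-flag interpretation; `decode_qf_dnb` is a hypothetical helper. It is consistent with the masking rule above, since any set flag makes the value non-zero.

```python
# Flag values and names from the QF_DNB table
QF_DNB_FLAGS = {
    1: "Substitute_Cal",
    2: "Out_of_Range",
    4: "Saturation",
    8: "Temp_not_Nominal",
    16: "Stray_Light",
    256: "Bowtie_Deleted/Range_Bit",
    512: "Missing_EV",
    1024: "Cal_Fail",
    2048: "Dead_Detector",
}


def decode_qf_dnb(value):
    """List the QF_DNB flag names whose bits are set in one pixel value."""
    return [name for bit, name in QF_DNB_FLAGS.items() if value & bit]


print(decode_qf_dnb(0))  # []
print(decode_qf_dnb(20))  # ['Saturation', 'Stray_Light']
```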