The purpose of this notebook is to define and test the functions that the MODIS class needs for pre-processing the data. Having explored torchgeo, I have found that the preprocessing must be separated from the torch work because of limitations of the RasterDataset class. Much of the preprocessing functionality can be taken from the Jupyter notebooks already written.

Base Class: MODIS dataset <br>
Subclasses: Julian Day and Confidence Level 

Attributes required:<br>
    - 'root': where we can find all of the MODIS files (in practice a CEDA location accessed via JASMIN) - DONE<br>
    <br>
Methods required: <br>
    - extract data: look in root dir and form dataset object from files - DONE <br>
    - crop data: using a shape file, crop the dataset spatially - TODO <br>
    - plot data: create plots for the current dataset - TODO <br>
    - save data: save the altered files back in their tif format so torchgeo is ready to use - TODO<br>
    

In [3]:
import os
import re
import glob

import rioxarray as rxr
import xarray as xr
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import fiona

## Base Class Specification 

In [162]:
class PP_ModisFireCCI():
    """Abstract base class for the preprocessing of all MODIS Fire CCI Burned Area datasets. This class should
       not be called directly - instead use the subclass objects.

    `MODIS Fire_cci Burned Area Dataset: <https://geogra.uah.es/fire_cci/firecci51.php>`_
    This dataset was developed by ESA, utilising the MODIS satellite. The dataset contains
    information at both PIXEL (~250m) and GRID (0.25 degrees) resolutions. A variety of 
    useful information is contained within the datasets including Julian Day of burn, 
    confidence level of burn etc.
    For more information, see:
    * `User Guide
      <https://climate.esa.int/media/documents/Fire_cci_D4.2_PUG-MODIS_v1.0.pdf>`_
    """
    
    #: Root directory (in CEDA) where the MODIS files can be found.
    root = None
    
    #: Glob expression used to search for files.
    filename_glob = None
    
    #: Regular expression used to extract the date and tile number from the filename.
    filename_regex = r"(?P<date>\d{6})\S{33}(?P<tile_number>\d).*"

    #: Date format string used to parse date from filename.
    date_format = "%Y%m"
    
    #: DataArray used to store the relevant MODIS data.
    data = None
    
    
    def __init__(
        self,
        root: str,
    ) -> None:
        """Initialize a new Preprocessing instance.
        Args:
            root: root directory where the dataset can be found
        """
        
        # Set the root attribute to that passed into the constructor
        self.root = root
      
    
    def extract_data(self):
        """Extract the relevant data from the root directory and store it in the object as a DataArray.

        Raises:
            FileNotFoundError: if no matching files are found in ``root``
        """
        
        # Populate a list of DataArrays using the separate files
        data_arrays = []
        pathname = os.path.join(self.root, "**", self.filename_glob)
        filename_regex = re.compile(self.filename_regex, re.VERBOSE)
        
        # For each file found (in sorted order), index the data using the date it corresponds to
        for filepath in sorted(glob.iglob(pathname, recursive=True)):
            match = filename_regex.match(os.path.basename(filepath))
            if match is None:
                # Skip files whose names do not follow the expected pattern
                continue
            data = rxr.open_rasterio(filepath)
            data = data.assign_coords(date=match.group("date"))
            data_arrays.append(data)
        
        # Fail early with a clear message rather than concatenating an empty list
        if not data_arrays:
            raise FileNotFoundError(f"No files matching '{self.filename_glob}' found in '{self.root}'")
            
        # Finally concatenate the DataArrays and store the result in the object
        self.data = xr.concat(data_arrays, dim='date')
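
As a quick check of how `filename_regex` and `date_format` work together, here is a minimal standalone sketch. The example filename is an assumption based on the Fire_cci PIXEL product naming scheme, and `parse_filename` is a hypothetical helper, not part of the class above.

```python
import re
from datetime import datetime

# Same pattern and date format as the class attributes above.
FILENAME_REGEX = r"(?P<date>\d{6})\S{33}(?P<tile_number>\d).*"
DATE_FORMAT = "%Y%m"

# Assumed example filename (not taken from the notebook's data directory).
EXAMPLE_FILENAME = "20010101-ESACCI-L3S_FIRE-BA-MODIS-AREA_1-fv5.1-JD.tif"


def parse_filename(filename):
    """Return (month_start, tile_number) for a filename, or None if it does not match."""
    match = re.match(FILENAME_REGEX, filename)
    if match is None:
        return None
    # The first six digits are the year and month of the product
    month_start = datetime.strptime(match.group("date"), DATE_FORMAT)
    return month_start, int(match.group("tile_number"))


print(parse_filename(EXAMPLE_FILENAME))
```

Parsing the date into a `datetime` (rather than keeping the raw string, as `extract_data` does) would give a coordinate that sorts and slices chronologically, if that turns out to be useful later.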
    
    

## Julian Day Implementation 

In [163]:
class PP_Modis_JD(PP_ModisFireCCI):
    """
    Preprocessing class for the burn day: the Julian Day on which a burned area is first detected.
    
    Possible values (mask not image): 
        -2 = pixel not of burnable type, e.g. water, urban areas or permanent snow/ice
        -1 = pixel not observed in the month (possible cloud cover etc.)
         0 = pixel is not burned
        [1,366] = Julian Day of first detection when the pixel is burned
    """

    filename_glob = "*JD.tif" 
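
To illustrate the value encoding described in the docstring, here is a minimal sketch that decodes a single JD pixel value. `decode_jd` is a hypothetical helper, and it assumes the year is known separately (for example from the filename), since the pixel value only carries the day of the year.

```python
from datetime import date, timedelta


def decode_jd(value, year):
    """Translate one JD pixel value into a (label, burn_date) pair."""
    if value == -2:
        return "not burnable", None
    if value == -1:
        return "not observed", None
    if value == 0:
        return "not burned", None
    if 1 <= value <= 366:
        # Day 1 is 1 January, so add (value - 1) days to the start of the year
        return "burned", date(year, 1, 1) + timedelta(days=value - 1)
    raise ValueError(f"unexpected JD value: {value}")


print(decode_jd(32, 2001))
```

For whole arrays the same rules could be applied as boolean masks on the DataArray, but that is left out here to keep the sketch free of third-party dependencies.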
    

In [164]:
test_obj = PP_Modis_JD("Modis Data")

In [165]:
test_obj.extract_data()

In [166]:
print(test_obj.data)

<xarray.DataArray (date: 3, band: 1, y: 25827, x: 35178)>
array([[[[-1, -1, -1, ..., -2, -2, -2],
         [-1, -1, -1, ..., -2, -2, -2],
         [-1, -1, -1, ..., -2, -2, -2],
         ...,
         [-2, -2, -2, ..., -2, -2, -2],
         [-2, -2, -2, ..., -2, -2, -2],
         [-2, -2, -2, ..., -2, -2, -2]]],


       [[[-1, -1, -1, ..., -2, -2, -2],
         [-1, -1, -1, ..., -2, -2, -2],
         [-1, -1, -1, ..., -2, -2, -2],
         ...,
         [-2, -2, -2, ..., -2, -2, -2],
         [-2, -2, -2, ..., -2, -2, -2],
         [-2, -2, -2, ..., -2, -2, -2]]],


       [[[-1, -1, -1, ..., -2, -2, -2],
         [-1, -1, -1, ..., -2, -2, -2],
         [-1, -1, -1, ..., -2, -2, -2],
         ...,
         [-2, -2, -2, ..., -2, -2, -2],
         [-2, -2, -2, ..., -2, -2, -2],
         [-2, -2, -2, ..., -2, -2, -2]]]], dtype=int16)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -26.0 -26.0 -26.0 -25.99 ... 52.99 52.99 53.0 53.0
  * y            (y) float64 83

## Confidence Level Implementation 