# Introduction
The **MDR_layer_extractorV2** program extracts subsets from multiple multidimensional raster datasets and saves them to a designated directory. Keeping only the layer of interest reduces storage and simplifies the analysis and management of specific raster layers. Based on user inputs, the program generates a new file for each subset containing the bands that make up the raster, which keeps the files organized for later analysis.


---
The program relies on the following libraries:
> * **arcpy** - Esri's Python package for ArcGIS; provides the geoprocessing tool used to subset the multidimensional rasters.
> * **os** - Provides operating system services such as listing directories, checking whether a file exists, and building file paths.
> * **pandas** - Used to model, analyze, and manipulate data sets; here it parses and shifts dates.
> * **re** - Regular expressions (**re**), used to search strings for patterns and replace them.

In [None]:
# Libraries used
import arcpy
import os
import pandas as pd
import re

- **arcpy.env.overwriteOutput** controls whether ArcGIS tools may overwrite existing outputs.
Setting it to **True** lets the program replace an existing output that has the same name.

In [None]:
# Allows the file to be written over
arcpy.env.overwriteOutput = True

The user is asked to input the name of the data product, the input directory, the output directory, and the layer name to be extracted.

- An example of the data name would be the following:
> *Enter the name of the data product:* **SMERGE**

- An example of the input directory is the following:
> *Enter the input directory:* **E:\\share\\BIgRun\\Smerge**

- An example of the output directory is the following:
> *Enter the output directory:* **E:\\share\\BIgRun\\Smerge_layer**

- An example of the layer name is the following:
> *Enter the layer name to be extracted:* **RZSM**

In [None]:
# Set data product name and input directory
product = input("Enter the name of the data product: ")
input_dir = input("Enter the input directory: ")

# Set the output directory
output_dir = input("Enter the output directory: ")

# Set the layer name
layer_name = input("Enter the layer name to be extracted: ")

> **Note:**
> -  The user must create a text file named "*dates.txt*" in the same directory as the script. This file stores the dates to process as a comma-separated list.

- The program looks for the "*dates.txt*" file. If the file is not found, the user is prompted to press Enter to try again.

- Once the file is open, the dates are split on commas into a list.

- The user is then prompted to input the data offset, the number of days searched on either side of each date. The offset is expected to be an integer. If a non-integer value is entered, the user is prompted again until a valid integer is entered.

  An example of the data offset is the following:
  > *Enter the data offset. If there is none, enter 0:* **0**

In [None]:
# Read list of dates; wait until the file exists before opening it
while not os.path.exists("dates.txt"):
  print('The date list text file was not found. It should be in the same directory as the .py script.')
  co = input("Press enter to try again.")
with open("dates.txt", "r") as text_file:
  # Drop surrounding whitespace (such as a trailing newline) and empty entries
  date = [d.strip() for d in text_file.read().split(',') if d.strip()]

# Keep asking until the offset parses as an integer
offset = input('Enter the data offset. If there is none, enter 0: ')
while True:
  try:
    offset = int(offset)
    break
  except ValueError:
    offset = input('Offset given was not an integer value, please try again: ')
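
The two steps above, splitting the date list and re-prompting for an integer, can be sketched in isolation. This is a minimal sketch, not the notebook's own code: the helper names `parse_dates` and `read_int`, the sample file contents, and the scripted replies are all hypothetical and exist only so the example runs without user interaction.

```python
def parse_dates(text):
    """Split a comma-separated string into trimmed, non-empty date strings."""
    return [item.strip() for item in text.split(",") if item.strip()]


def read_int(prompt, ask=input):
    """Keep asking until the reply parses as an integer, then return it."""
    reply = ask(prompt)
    while True:
        try:
            return int(reply)
        except ValueError:
            reply = ask("Offset given was not an integer value, please try again: ")


# Hypothetical file contents and scripted replies, for illustration only
dates = parse_dates("01012000, 02012000,03012000\n")
replies = iter(["two", "2"])
offset = read_int("Enter the data offset. If there is none, enter 0: ",
                  ask=lambda _prompt: next(replies))
print(dates, offset)
```

Passing the prompt function as a parameter is a choice made for this sketch so the loop can be exercised with canned replies; the notebook calls `input` directly.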

- The user is asked to input the following: the monthly lag, the file extension, the filename separator, the date position, the year to process, and the date format. The file extension determines which files in the input directory are processed. The filename separator is the character that divides each filename into parts, and the date position identifies which of those parts holds the date.

- An example of the number of monthly lags would be the following:
> *Enter the number of monthly lag:* **0**

- An example of the file extension would be the following:
> *Enter the file extension:* **.nc4**

- An example of the filename separator would be the following:
> *Enter the filename separator:* **_**

- An example of the date position would be the following:
> *Enter the position of the date within the filenames, starting at index 0 (example ALB_2000123 position is 1):* **7**

- An example of the year batch would be the following:
> *Enter the year you wish to process (to process all, leave blank and press enter):*

- An example of the date formatting would be the following:
> *If the file uses %Y/%j, type A. If the file uses %Y/%m/%d, type B:* **B**

In [None]:
# Monthly Lag
lag = input("Enter the number of monthly lag: ")
lag = int(lag)

# File type
ftype = input("Enter the file extension: ")

# Filename separator
sep = input("Enter the filename separator: ")

# Date position
d_pos = input('Enter the position of the date within the filenames, starting at index 0 (example ALB_2000123 position is 1): ')
d_pos = int(d_pos)

# Year batch
yer = input("Enter the year you wish to process (to process all, leave blank and press enter): ")

# Date Formatting
d_f = input('If the file uses %Y/%j, type A. If the file uses %Y/%m/%d, type B: ')
while d_f != 'A' and d_f != 'B':
  d_f = input('Input ERROR. If the file uses %Y/%j, type A. If the file uses %Y/%m/%d, type B: ')
# The format strings carry no separators because the filename date is later
# compared as an integer (for example 2000123 or 20000502)
if d_f == 'A':
  d_format = "%Y%j"
elif d_f == 'B':
  d_format = "%Y%m%d"
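
To illustrate what the two choices mean, the sketch below parses the same day written both ways. It assumes the date inside the filename has no separators (year plus day of year for A, year plus month plus day for B); the `DATE_FORMATS` mapping and the `parse_file_date` helper are hypothetical names, and the standard-library `datetime` module stands in for pandas.

```python
from datetime import datetime

# Choice letter -> strptime format (assumed separator-free, as in the filenames)
DATE_FORMATS = {"A": "%Y%j", "B": "%Y%m%d"}


def parse_file_date(token, choice):
    """Parse a filename date token using the format selected by 'A' or 'B'."""
    return datetime.strptime(token, DATE_FORMATS[choice])


a = parse_file_date("2000123", "A")   # year 2000, day of year 123
b = parse_file_date("20000502", "B")  # year 2000, month 05, day 02
print(a.date(), b.date())
```

Both calls describe the same calendar day, which is why either style can be compared against the search window once it is converted to the matching integer form.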

- The program creates an empty list, **md_files**, to hold the paths of the matching MD files.

- The code enters a loop that iterates through each date in the **date** list.
  - The program first converts a date formatted as '*%m%d%Y*' into a datetime object, then adds the specified number of months (**lag**) to the date.

  - The year is extracted from **date_n** and stored in **year**. **u_date_n** is created by adding the specified number of days (**offset**) to **date_n**. **l_date_n** is created by subtracting the same number of days from **date_n**.

  - Based on the input provided for formatting the date:
    - If '*A*', then:
      - The day of the year is extracted from **date_n** and stored in the variable **doy**.
      - The year and day of the year of **u_date_n** are concatenated and stored in **u_date_c**.
      - The year and day of the year of **l_date_n** are concatenated and stored in **l_date_c**.
    - If '*B*', then:
      - The year, month, and day of **u_date_n** are concatenated and stored in **u_date_c**.
      - The year, month, and day of **l_date_n** are concatenated and stored in **l_date_c**.

  - The loop then iterates through all the files in the directory specified by the **input_dir** variable.
    - When the current file ends with the specified file type **ftype**, the code continues to process the file.
      - The file name is split into a list of strings based on the separator **sep** and stored in **file_p**.
      - The **d_pos** variable gives the position of the date element within that list, and the element at that position replaces **file_p**.
      - A regular expression (**re**) substitution removes a leading non-date character from the date string **file_p**.
      - The specified date format **d_format** is used to convert the extracted date string **file_p** into a datetime object, which is stored in the variable **date_p**.
    - The code then checks two conditions. If both are true, the file is accepted.
      - The first condition checks whether **date_p.year** matches the **year** variable.
      - The second condition checks whether the integer value of the file date **file_p** falls between the lower and upper date limits, **l_date_c** and **u_date_c**.
    - When both conditions are satisfied, the code prints the file name and adds its complete path, consisting of the input directory and the file name, to the **md_files** list.
    - The flag **g** is then set to 1 to indicate that a matching file has been found, and the **break** statement exits the file loop, so at most one file is selected per date.
  - In the conditional statement that follows, the program checks whether **g** is equal to 1. If so, it resets **g** to 0 so that the following iterations proceed as usual, and then executes **break** to terminate the loop.
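
The search window described above can be sketched without ArcGIS or pandas. This is a sketch under stated assumptions: `add_months` and `window_bounds` are hypothetical helpers, the standard library replaces `pd.DateOffset`, and clamping the day to the end of a shorter month is a choice made for this sketch rather than something stated in the notebook.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def window_bounds(center, offset_days, style):
    """Return (lower, upper) as integers in YYYYDDD (style A) or YYYYMMDD (style B)."""
    fmt = "%Y%j" if style == "A" else "%Y%m%d"
    lower = center - timedelta(days=offset_days)
    upper = center + timedelta(days=offset_days)
    return int(lower.strftime(fmt)), int(upper.strftime(fmt))


center = add_months(date(1999, 12, 1), 1)      # one month of lag -> 2000-01-01
bounds_a = window_bounds(center, 1, "A")
bounds_b = window_bounds(center, 1, "B")
print(center, bounds_a, bounds_b)
```

In this example the window straddles a year boundary, so the lower limit falls in 1999 while **year** is 2000. Because the matching step also requires the file's year to equal **year**, a file dated in 1999 would be rejected even though it lies inside the integer range.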

In [None]:
# Create a list of MD files in the input directory
md_files = []

# Sets date formatting
for d in date:
  g=0
  date_n = pd.to_datetime(d, format = "%m%d%Y")
  date_n = date_n + pd.DateOffset(months=lag)
  u_date_n = date_n + pd.DateOffset(days=offset)
  l_date_n = date_n - pd.DateOffset(days=offset)
  year = date_n.year
  if d_f == 'A':
    doy = date_n.timetuple().tm_yday
    u_date_c = str(u_date_n.year)+str(u_date_n.timetuple().tm_yday).zfill(3)
    l_date_c = str(l_date_n.year)+str(l_date_n.timetuple().tm_yday).zfill(3)
  if d_f =='B':
    u_date_c = str(u_date_n.year)+str(u_date_n.month).zfill(2)+str(u_date_n.day).zfill(2)
    l_date_c = str(l_date_n.year)+str(l_date_n.month).zfill(2)+str(l_date_n.day).zfill(2)
  for file in os.listdir(input_dir):
    if file.endswith(ftype):
      file_p = file.split(sep)
      file_p = file_p[d_pos]
      # Strip leading non-digit characters (e.g. the "A" in "A2000123"); assumed intent
      file_p = re.sub(r"^\D+", "", file_p)
      date_p = pd.to_datetime(file_p, format=d_format)
      if date_p.year == year and (int(l_date_c) <= int(file_p) and int(file_p) <= int(u_date_c)):
        print(file)
        md_files.append(os.path.join(input_dir,file))
        g=1
        break
      if g==1:
        g=0
        break
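
The file-matching step can be tried on a plain list of names, with no directory or rasters involved. The sketch below is illustrative only: `file_date_token` and `first_match` are hypothetical helpers, the filenames are made up, and stripping leading non-digit characters is an assumption about what the regular expression is meant to do.

```python
import re


def file_date_token(filename, sep, d_pos):
    """Pick the date part of a filename and strip any leading non-digit characters."""
    token = filename.split(sep)[d_pos]
    return re.sub(r"^\D+", "", token)


def first_match(filenames, ftype, sep, d_pos, lower, upper):
    """Return the first file of the given type whose date lies in [lower, upper]."""
    for name in filenames:
        if not name.endswith(ftype):
            continue
        token = file_date_token(name, sep, d_pos)
        if token.isdigit() and lower <= int(token) <= upper:
            return name
    return None


# Hypothetical filenames, for illustration only
names = ["MOD_A2000122_x.nc4", "MOD_A2000123_x.nc4", "MOD_A2000124_x.nc4", "notes.txt"]
match = first_match(names, ".nc4", "_", 1, 2000123, 2000125)
print(file_date_token(names[0], "_", 1), match)
```

Returning on the first hit mirrors the effect of the `break` in the notebook: even when several files fall inside the window, only one is kept per date.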

- The **date** entry at index **i** is converted into a datetime object and stored in **date_in**. The format "*%m%d%Y*" specifies the layout of the date string.
- Next, the day of the year is extracted from **date_in** and assigned to **doy_in**.
- The output file name is then constructed by joining the output directory with the product name, the year, and the day of the year.
  - *str(date_in.year)* converts the year to a string.
  - *str(doy_in).zfill(3)* converts the day of the year to a string and pads it with leading zeros to a consistent length of three characters.
- If the year entered by the user appears in the file path (a blank entry matches every file), the *arcpy.SubsetMultidimensionalRaster_md* function creates a new subset multidimensional raster dataset from the current **md_file**, writing the layer **layer_name** to **output_file**.
- Finally, **i** is incremented by **1** so that the next file is paired with the next date.

- The output and input paths of each subset are printed, and the final statement is "Process complete".

In [None]:
i=0
# Iterate through the list of MD files
for md_file in md_files:
  # Parses the date string, which is in mmddYYYY format
  date_in = pd.to_datetime(date[i], format="%m%d%Y")
  doy_in = date_in.timetuple().tm_yday
  output_file = os.path.join(output_dir, product + '_'+ str(date_in.year) + str(doy_in).zfill(3) + '.tif')
  if str(yer) in md_file:
    print(output_file)
    print(md_file)
    # Creates a new subset multidimensional raster dataset
    arcpy.SubsetMultidimensionalRaster_md(md_file, output_file, layer_name)
  i=i+1

print("Process complete")
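
The naming rule for the output files can be checked on its own. The following is a minimal sketch: `build_output_path` is a hypothetical helper, the directory and product values are placeholders, and the standard-library `datetime` module is used instead of pandas so the example runs anywhere.

```python
import os
from datetime import datetime


def build_output_path(output_dir, product, date_text):
    """Build '<output_dir>/<product>_<YYYY><DDD>.tif' from an mmddYYYY date string."""
    date_in = datetime.strptime(date_text, "%m%d%Y")
    doy_in = date_in.timetuple().tm_yday
    name = product + "_" + str(date_in.year) + str(doy_in).zfill(3) + ".tif"
    return os.path.join(output_dir, name)


# Placeholder directory and product name, for illustration only
path = build_output_path("out", "SMERGE", "01052000")
print(os.path.basename(path))  # SMERGE_2000005.tif
```

The zero padding keeps every name the same length, so the files sort chronologically within a year when listed alphabetically.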

- The **date** entry at index **i** is converted into a datetime object and stored in **date_in**. The format "*%m%d%Y*" specifies the layout of the date string.
- Using a DateOffset object, **date_in** is replaced by a new datetime shifted forward by **lag** months.
- Next, the day of the year is extracted from the lagged **date_in** and assigned to **doy_in**.
- The output file name is then constructed by joining the output directory with the product name, the year, and the day of the year.
  - *str(date_in.year)* converts the year to a string.
  - *str(doy_in).zfill(3)* converts the day of the year to a string and pads it with leading zeros to a consistent length of three characters.
- If the year entered by the user appears in the file path (a blank entry matches every file), the *arcpy.SubsetMultidimensionalRaster_md* function creates a new subset multidimensional raster dataset from the current **md_file**, writing the layer **layer_name** to **output_file**.
- Finally, **i** is incremented by **1** so that the next file is paired with the next date.

- The output and input paths of each subset are printed, and the final statement is "Process complete".

In [None]:
i=0
# Iterate through the list of MD files
for md_file in md_files:
  # Converts the date format to mmddYYYY
  date_in = pd.to_datetime(date[i], format="%m%d%Y")
  # Adds Monthly lag
  date_in = date_in + pd.DateOffset(months=lag)
  doy_in = date_in.timetuple().tm_yday
  output_file = os.path.join(output_dir, product + '_' + str(date_in.year) + str(doy_in).zfill(3) + '.tif')
  if str(yer) in md_file:
    print(output_file)
    print(md_file)
    # Creates a new subset multidimensional raster dataset
    arcpy.SubsetMultidimensionalRaster_md(md_file, output_file, layer_name)
  i=i+1

print("Process complete")

- The **date** entry at index **i** is converted into a datetime object and stored in **date_in**. The format "*%m%d%Y*" specifies the layout of the date string.
- Next, the day of the year is extracted from **date_in** and assigned to **doy_in**.
- The output file name is then constructed by joining the output directory with the product name, the year, and the day of the year.
  - *str(date_in.year)* converts the year to a string.
  - *str(doy_in)* converts the day of the year to a string.
- This cell does not call *arcpy.SubsetMultidimensionalRaster_md*. It only reports which **md_file** would be paired with which date and **output_file**, and the year filter is applied to the date string rather than to the file path.
- Finally, **i** is incremented by **1** so that the next file is paired with the next date.

- For each selected entry, the date, the output file, and the input file are printed.


In [None]:
i=0
for md_file in md_files:
  # Converts the date format to mmddYYYY
  date_in = pd.to_datetime(date[i], format="%m%d%Y")
  doy_in = date_in.timetuple().tm_yday
  output_file = os.path.join(output_dir, product + '_' + str(date_in.year) + str(doy_in)+'.tif')
  if str(yer) in str(date[i]):
    print(date[i])
    print(output_file)
    print(md_file)
  i=i+1
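
The year filter used in the cells above is a plain substring test, which explains why a blank entry processes everything: in Python the empty string is contained in every string. The sketch below demonstrates this with made-up paths; `year_selected` is a hypothetical helper that wraps the same `in` test.

```python
def year_selected(yer, text):
    """Return True when the entered year appears anywhere in the text."""
    return str(yer) in text


# Hypothetical file paths, for illustration only
paths = ["data/SMERGE_RZSM_19990101.nc4", "data/SMERGE_RZSM_20000101.nc4"]

all_years = [p for p in paths if year_selected("", p)]
only_2000 = [p for p in paths if year_selected("2000", p)]
# A year that appears in a folder name also matches, even for a 1999 file
folder_hit = year_selected("2000", "data2000/SMERGE_RZSM_19990101.nc4")
print(len(all_years), only_2000, folder_hit)
```

Because the test looks at the whole string, a year that happens to occur in a directory name or elsewhere in the path also counts as a match, which is worth keeping in mind when choosing input folder names.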