# Introduction
The **auto_zonal_stats** program calculates statistics on raster values within the zones of another dataset. Where cell sizes differ, the rasters are resampled to a user-specified cell size during processing. The tool works through a list of rasters and writes one output table per raster.

---
The following libraries are imported to allow the code to be run:
- **os** - Provides operating-system functions such as creating and removing directories, listing their contents, and getting or changing the current working directory.

- **re** - A regular expression (or **re**) specifies a set of strings; the module checks whether a particular string matches a given regular expression.
> For more information visit: https://docs.python.org/3/library/re.html

- **Pandas** - A tool for modeling, analyzing, and manipulating data sets.
> For more information visit: https://www.geeksforgeeks.org/introduction-to-pandas-in-python/#




*arcpy*

---
- **arcpy** - ArcPy enables efficient and effective geographic data analysis, data management, map automation, and data conversion. With ArcPy, users can leverage the capabilities of Python to effortlessly handle complex geospatial tasks.

- **env** - ArcPy environmental settings provide access to both general geoprocessing settings and the settings of a specific tool.

- **arcpy.sa** - The arcpy.sa module in Python facilitates the analysis of raster and vector data. It leverages the functionalities offered by the ArcGIS Spatial Analyst extension.

---

 **Note:**
>  - To tailor the code to the user's requirements, uncomment the commented-out lines that are needed so that they are executed when the script runs.

In [None]:
import os
import re
import pandas as pd
import arcpy
from arcpy import env
from arcpy.sa import *

*raster_points*


---

- The **raster_points** function takes three arguments: **input_raster**, **output_table**, and **shp_file**. This function generates a table that summarizes the values of a raster within the zones of another dataset.
  - The **input_raster** is the full path to the original raster file.
  - The **output_table** is the full path of the output table. **Output_table** will contain a summary of the values within each zone.
  - The **shp_file** argument refers to the file containing the geometric data for all features. The *shapefile* format is a digital vector storage format used to store geographic location and related attribute information.

- The function's first step is to read the input raster. When the zones are defined by features, as with **shp_file** here, they are internally converted to a raster using the cell size and alignment of the input value raster.
> **Note:**
>  - If a raster is used as the zone input, it must be an integer raster.

- Subsequently, the user is required to input the index variable, which will be assigned to the **zoneField** variable.
  - An example of the index variable is the following:
    > *Enter the index variable:* **PageName**

  When defining zones using features, an internal conversion from features to raster will take place. If the input zone is a raster with different raster cell sizes or alignments, resampling will also occur. To maintain consistency and control over the vector-to-raster conversion, it is advisable to use rasters as the zone input. This approach helps ensure predictable and desired results.

- The **output_table** is exported to a CSV file; a '*.dbf*' extension in its filename is replaced with '*.csv*'. The temporary table named "output_tableG" is then removed with the *arcpy.management.Delete* function. Finally, the *arcpy.management.ClearWorkspaceCache* function clears the ArcGIS workspace cache, which stores information about the current geodatabase and other configurations.

In [None]:
def raster_points(input_raster, output_table, shp_file):
  # Read the input and check rasters
  in_raster = Raster(input_raster)

  zoneField = input("Enter the index variable: ")
  outZSaT1 = ZonalStatisticsAsTable(shp_file, zoneField, input_raster, "output_tableG", "DATA", "ALL", "CURRENT_SLICE", 90, "AUTO_DETECT", "ARITHMETIC", 360)

  arcpy.conversion.ExportTable("output_tableG", output_table.replace('.dbf','.csv'))
  arcpy.management.Delete("output_tableG")
  arcpy.management.ClearWorkspaceCache()
  #arcpy.sa.ExtractValuesToPoints(thirty_points, in_raster, thirty_points, 'NONE')
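
The extension swap in the cell above relies on `str.replace`, which would also change a `.dbf` that happens to appear elsewhere in the path. A minimal sketch of one alternative is shown here; `csv_path_for` is a hypothetical helper and is not part of the original script.

```python
import os


def csv_path_for(output_table):
    """Return output_table with its extension swapped for '.csv'.

    Hypothetical helper (an assumption, not in the original script).
    Only the final extension is changed; the rest of the path is kept.
    """
    root, _ext = os.path.splitext(output_table)
    return root + ".csv"


# Example with an illustrative filename
print(csv_path_for("PPT_2012002_1000.dbf"))
```

If adopted, the export call could pass `csv_path_for(output_table)` instead of `output_table.replace('.dbf','.csv')`.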

*find_uncommon_elements*

---

- The function takes two arguments, *list1* and *list2*, and returns a list of elements that are uncommon between the two input lists.
- First, the function converts the input lists into sets.
- Then, the difference operator (-) is used to find elements that are in one set but not the other.
  - For example, for *unique_to_set1*, the function finds elements that are in *set1* but not in *set2*.
- The function then combines the unique elements from both sets into a new set using the union method.
- Finally, the function converts the new set back to a list and returns it as the *result* list, which contains all the elements that are unique to either of the input lists (uncommon elements).

In [None]:
def find_uncommon_elements(list1, list2):
  # Convert lists to sets for efficient operations
  set1 = set(list1)
  set2 = set(list2)

  #Find elements that are unique to each set
  unique_to_set1 = set1 - set2
  unique_to_set2 = set2 - set1

  # Combine the unique elements and convert back to list
  result = list(unique_to_set1.union(unique_to_set2))
  return result
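
The two differences followed by a union are what Python calls the symmetric difference, available as the `^` operator on sets. Because sets are unordered, the list returned by the function above comes back in arbitrary order, which matters if the result is later sliced by index. The sketch below is one possible order-preserving variant; the name `find_uncommon_ordered` is an assumption made for illustration.

```python
def find_uncommon_ordered(list1, list2):
    """Return elements found in exactly one of the two lists.

    Hypothetical variant (not in the original script): the result keeps
    the order of first appearance, list1 first, then list2.
    """
    # Symmetric difference: elements in one set but not in both
    uncommon = set(list1) ^ set(list2)

    seen = set()
    result = []
    for item in list(list1) + list(list2):
        # Keep each uncommon element once, in order of first appearance
        if item in uncommon and item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(find_uncommon_ordered(["a", "b", "c"], ["b", "d"]))
```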

- This cell obtains the current working directory and saves it in the variable named **parent_folder**. The variable **geodatabase_name** is assigned the name of the file geodatabase as a string.

In [None]:
# Define the parent folder and the geodatabase name
parent_folder = os.getcwd()
geodatabase_name = "example2.gdb"

- The path **geodatabase_path** is created by joining *parent_folder* and *geodatabase_name*. Then, the program checks whether that path exists. If it does, the path is printed for the user. If it does not, a new file geodatabase is created at that path.

In [None]:
#Create the file geodatabase
geodatabase_path = os.path.join(parent_folder, geodatabase_name)
if os.path.exists(geodatabase_path):
  print(geodatabase_path)
else:
  arcpy.CreateFileGDB_management(parent_folder, geodatabase_name)

- To facilitate subsequent ArcGIS operations, the *workspace* property is modified, setting the default workspace for those operations to the specified geodatabase path.

In [None]:
# Set the geodatabase as the workspace
arcpy.env.workspace = geodatabase_path

- *SetLogHistory* is set to 'True', enabling geoprocessing history logging.
- The parallel processing factor is set so that all available CPU cores can be used.
- The Image Analyst extension is checked out, making it ready to use.
- By setting *matchMultidimensionalVariable* to 'False', matching of multidimensional variables in raster datasets is disabled.
- Overwriting of existing output datasets is allowed when *overwriteOutput* is set to 'True'.
- Lastly, the program sets the default cell size for raster operations by using the cell size of the raster dataset at the specified path.

> **Note:**
>  - To ensure efficient processing, the **cell size of the raster must be equal to or smaller than the target resolution**.
>    - A cell size that is much smaller than the target resolution can prolong the runtime. Setting the cell size here forces the zonal statistics to use it.

In [None]:
# Set up the environment
arcpy.SetLogHistory(True)
# "100%" lets the tools use all available CPU cores
arcpy.env.parallelProcessingFactor = "100%"
arcpy.CheckOutExtension("ImageAnalyst")
arcpy.env.matchMultidimensionalVariable = False
arcpy.env.overwriteOutput = True
# Ask for a raster at or below the target resolution (not too far below, or the runtime becomes too long); this forces zonal stats to use that cellSize
arcpy.env.cellSize = input("Enter the full path of the raster that is at or below (but not too small, it can make runtime too long) the target resolution: ")

- To initiate the program, the user must provide three inputs: the name of the data product, the directory containing the input data, and the directory where the output will be saved.
  - An example of the data product name is the following:
  > *Enter the name of the data product:* **PPT**

  - An example of the input directory is the following:
  > *Enter the input directory:* **E:\\\share\\\BIgRun\\\Prism_PPT**

  - An example of the output directory is the following:
  > *Enter the output directory:* **E:\\\share\\\BIgRun\\\All_tables\\\B1000**

In [None]:
# Set the data product name
product = input("Enter the name of the data product: ")

# Set the input directory
input_dir = input("Enter the input directory: ")

# Set the output directory
output_dir = input("Enter the output directory: ")

- Next, the user is prompted to enter the full file path and name of the grid shapefile, as well as the grid file's resolution.
  - An example of the full file path and the name of the grid shape file is the following:
    > *Enter the full file path and name of the grid shape file:* '**E:\\\share\\\BIgRun\\\Grids2\\\Grid_1000.shp**'
  - An example of the grid file's resolution is the following:
    > *Enter the resolution on the grid file:* '**1000**'

- If the grid shapefile does not exist at the given path, the program notifies the user and asks for a valid file name.

In [None]:
# Set the file path and name of grid shape file
grid_shape_file = input("Enter the full file path and name of the grid shape file: ")

# Set the resolution of the grid file
grid_re = input("Enter the resolution on the grid file: ")

# Keep asking until the path points to an existing file
while not os.path.exists(grid_shape_file):
  print("The shapefile was not found.")
  grid_shape_file = input("Please try again: ")
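
The same ask-until-valid pattern recurs for several file prompts in this notebook, so it could be gathered into one helper. The sketch below is only an illustration: `prompt_for_existing_path` and its `read` and `exists` parameters are assumptions introduced here so that the loop can be exercised without a real keyboard or file system.

```python
import os


def prompt_for_existing_path(message, read=input, exists=os.path.exists):
    """Ask for a path until it points to something that exists.

    Hypothetical helper (not in the original script). `read` and `exists`
    default to the built-ins and can be swapped out for testing.
    """
    path = read(message)
    while not exists(path):
        print("The file was not found.")
        # Re-assign the same variable that the loop condition checks
        path = read("Please try again: ")
    return path
```

In the script, a call such as `grid_shape_file = prompt_for_existing_path("Enter the full file path and name of the grid shape file: ")` would take the place of the prompt and the loop.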

- The user is asked for a text file that contains the list of dates. Its name is stored in the variable **d_text**, and the file is opened and read through the variable **text_file**.
  - An example would be:
  > *Enter the date list text file:* **mondays_2012_2015.txt**
- During this process, the program checks for the existence of the file. If the file is not found, the program prompts the user to try again.

-  Once the file is found, the program reads the entire contents of the file and splits it into a list using a comma as a delimiter. This list is then stored in the variable **date**.
- The user is requested to enter the text file containing the adjusted date list.
  - An example would be:
  > *Enter the adjusted date list text file:* **mondays_adjusted_2012_2015.txt**
- The program stores this filename in the variable **d_text1**, then opens and reads the file through the variable **text_file1**.
- The contents of **text_file1** are split into a list using a comma as the delimiter and stored in the variable **date1**.
- Finally, the program keeps only the elements that appear in one of **date** and **date1** but not in both, and assigns them back to the variable **date**.

In [None]:
# Asks for the date list text file
d_text = input("Enter the date list text file: ")
# Checks the file's existence before opening it
while not os.path.exists(d_text):
  print("The date list text file was not found. It should be in the same directory as the '.py' script.")
  d_text = input("Please enter the date list text file again: ")
# Opens and reads the list of dates
with open(d_text, "r") as text_file:
  date = text_file.read().split(',')

# Asks for the adjusted date list text file
d_text1 = input("Enter the adjusted date list text file: ")
# Opens and reads the adjusted list of dates
with open(d_text1, "r") as text_file1:
  date1 = text_file1.read().split(',')

# Finds the elements that appear in only one of date and date1
date = find_uncommon_elements(date, date1)
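
Splitting the raw file contents on commas keeps any spaces or line breaks around each entry, so `"01/16/2012\n"` and `"01/16/2012"` would be treated as different elements when the two lists are compared. A minimal sketch of a more tolerant parser follows; `parse_date_list` is a hypothetical name and the sample dates are made up for illustration.

```python
def parse_date_list(text):
    """Split comma-separated text into clean entries.

    Hypothetical helper (not in the original script): surrounding
    whitespace is stripped and empty entries are dropped.
    """
    return [item.strip() for item in text.split(",") if item.strip()]


# Illustrative contents of a date list file
sample = "01/02/2012, 01/09/2012,\n01/16/2012\n"
print(parse_date_list(sample))
```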

- The user must provide two inputs: the number of monthly lags and the day offset. Both values are then converted to integers.
  - An example of the number of monthly lags is the following:
    > *Enter the number of monthly lag:* **0**
  - An example of the day offset is the following:
    > *Enter the day offset:* **3**




In [None]:
# State the monthly lag
lag = input('Enter the number of monthly lag: ')
# Convert to integer
lag = int(lag)

# State the day offset
offset = input('Enter the day offset: ')
# Convert to integer
offset = int(offset)

- To specify a start date other than the default, provide the index of the desired start date. If no changes are needed, input: **1**.
  - An example would be the following:
   > *If you need to start on a different date, enter its index. If not, enter 1:* **1**

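- The value is converted to an integer, and 1 is subtracted from it to accommodate 0-based indexing.

In [None]:
# State whether or not the starting date is different
start = input('If you need to start on a different date, enter its index. If not, enter 1: ')
# Convert to integer and shift to 0-based indexing, so that entering 1 keeps the first date
start = int(start) - 1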

- The list is sliced so that it begins at the specified start index and includes every element from there to the end of the list.

In [None]:
date = date[start:]

- The user is asked for the text file containing the adjusted date list. Its name is stored in the variable **d_text**, and the file is opened and read through the variable **text_file**.
  - An example would be:
  > *Enter the adjusted date list text file:* **mondays_adjusted_2012_2015.txt**

- During this process, the program checks for the existence of the file. If the file is not found, the program prompts the user to try again.

-  Once the file is found, the program reads the entire contents of the file and splits it into a list using a comma as a delimiter. This list is then stored in the variable **date**.

In [None]:
# Asks for the adjusted date list text file
d_text = input("Enter the adjusted date list text file: ")

# Checks the file's existence before opening it
while not os.path.exists(d_text):
  print("The date list text file was not found. It should be in the same directory as the '.py' script.")
  d_text = input("Please enter the adjusted date list text file again: ")

# Opens and reads the list of dates
with open(d_text, "r") as text_file:
  date = text_file.read().split(',')

- The user must provide two inputs: the number of monthly lags and the day offset. Both values are then converted to integers.
  - An example of the number of monthly lags is the following:
    > *Enter the number of monthly lag:* **0**
  - An example of the day offset is the following:
    > *Enter the day offset:* **3**




In [None]:
# State the monthly lag
lag = input("Enter the number of monthly lag: ")
# Convert to integer
lag = int(lag)

# State the day offset
offset = input("Enter the day offset: ")
# Convert to integer
offset = int(offset)

- To specify a start date other than the default, provide the index of the desired start date. If no changes are needed, input: **1**.
  - An example would be the following:
    > *If you need to start on a different date, enter its index. If not, enter 1:* **1**

- To accommodate 0-based indexing, the value is converted to an integer and 1 is subtracted from the starting index.

In [None]:
# State whether or not the starting date is different
start = input("If you need to start on a different date, enter its index. If not, enter 1: ")
# Convert to integer
start = int(start) - 1

- The list is sliced so that it begins at the specified start index and includes every element from there to the end of the list.

In [None]:
date = date[start:]
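
The effect of the index conversion and the slice can be seen on a small list. This is only an illustration; `slice_from` and the sample dates are assumptions introduced here.

```python
def slice_from(dates, start_1_based):
    """Return the dates from a 1-based start position to the end.

    Hypothetical helper (not in the original script).
    """
    start = start_1_based - 1  # convert to Python's 0-based indexing
    return dates[start:]


dates = ["01/02/2012", "01/09/2012", "01/16/2012", "01/23/2012"]

# Entering 1 keeps the whole list; entering 3 skips the first two dates
print(slice_from(dates, 1))
print(slice_from(dates, 3))
```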

- The user then enters four values: the file extension of the rasters to process, the separator used within the filenames, the position of the date within the filename once it is split on that separator, and the year to process.
  - An example of the file type is the following:
  > *Enter the file extension:* **.crf** (Cloud Raster Format)

  - An example of a filename separator is the following:
  > *Enter the filename separator:* **_**

  - An example of the position of date is the following:
  > *Enter the position of the date within the filenames, starting at index 0:* **4**

  - An example of the year batch is the following:
  > *Enter the year you want to process. If you wish to process all, leave blank and press enter:*


In [None]:
# State the file type
ftype = input("Enter the file extension: ")

# State the filename separator
sep = input("Enter the filename separator: ")

# State the position of date
d_pos = input("Enter the position of the date within the filenames, starting at index 0 (for example: for 'ALB_2000123', the position is 1): ")
# Convert to integer
d_pos = int(d_pos)

# State the year batch
yer = input("Enter the year you want to process. If you wish to process all, leave blank and press enter: ")
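
How the separator and the date position work together can be shown with the filename from the prompt above. The sketch assumes that the date is the only run of digits in the selected part of the filename; `extract_date_token` is a hypothetical helper.

```python
import re


def extract_date_token(filename, sep, d_pos):
    """Return the digits of the date found at position d_pos.

    Hypothetical helper (not in the original script). The filename is
    split on `sep`, and every non-digit character (including the file
    extension) is removed from the selected part.
    """
    parts = filename.split(sep)
    return re.sub(r"\D", "", parts[d_pos])


# 'ALB_2000123.crf' split on '_' gives ['ALB', '2000123.crf'];
# position 1 holds the date
print(extract_date_token("ALB_2000123.crf", "_", 1))
```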

*Date Formatting*

---

- In order to accommodate different date formats in a file, the user is prompted to input either 'A' or 'B'. The input is then evaluated for validity. If the input is not valid, the user will be asked to enter it again until a valid input is received.

  Depending on the input, the date format is assigned to the variable **d_format**.
  - If the input is '*A* ', the date format is '*%Y/%j* '
  - If the input is '*B* ', the date format is '*%Y/%m/%d* '

In [None]:
# Date Formatting
d_f = input("If the files use '%Y/%j' type: A, if '%Y/%m/%d' type: B: ")
# Keep asking until a valid choice is stored in d_f
while d_f != 'A' and d_f != 'B':
  d_f = input("Input ERROR. If the files use '%Y/%j' type: A, if '%Y/%m/%d' type: B: ")

# Choice for formatting date
if d_f == 'A':
  d_format = '%Y/%j'

if d_f == 'B':
  d_format = '%Y/%m/%d'
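
The validation and the two `if` statements can also be expressed with a dictionary lookup. This is a sketch under stated assumptions rather than the script's own approach: `DATE_FORMATS`, `choose_date_format`, and the `read` parameter are names introduced here, and the format strings are copied from the cell above.

```python
# Maps each accepted answer to its date format (strings taken from the notebook)
DATE_FORMATS = {"A": "%Y/%j", "B": "%Y/%m/%d"}


def choose_date_format(read=input):
    """Ask for 'A' or 'B' until a valid choice is given.

    Hypothetical helper (not in the original script). Returns the choice
    and the matching format string.
    """
    choice = read("If the files use '%Y/%j' type: A, if '%Y/%m/%d' type: B: ")
    while choice not in DATE_FORMATS:
        # The same variable is re-assigned, so the loop can terminate
        choice = read("Input ERROR. Type A or B: ")
    return choice, DATE_FORMATS[choice]
```

A call such as `d_f, d_format = choose_date_format()` would then set both variables at once.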

- The **fdir** variable holds the list of entries returned by *os.listdir* for the input directory.

  Subsequently, an empty list is assigned to the **ref_dir** variable.
   
  The program then checks whether each entry, denoted as **fd**, ends with the specified file extension, **ftype**.
  
  If the condition is met, the entry is appended to the **ref_dir** list.

- The code enters a loop that iterates through each date in the **date** list. For each date, the **j** variable is initialized to **False**.
  - The program first converts the date, formatted as '*%m/%d/%Y*', into a datetime object, then adds the specified number of months (**lag**) and stores the result in **date_n**.

  - The year is extracted from **date_n** and stored in **year**. **u_date_n** is created by adding the specified number of days (**offset**) to **date_n**. **l_date_n** is created by subtracting the same number of days from **date_n**.

- An inner loop runs through every file in the **ref_dir** list. At the start of each iteration, the program checks whether **j** is set to **True**. If it is, the loop is exited immediately with the *break* statement; otherwise the following steps are executed.
  - The filename **f** is split using the separator specified in the **sep** variable, and the resulting parts are stored in **file_p**. Subsequently, the regular expression "\D" is used to strip the non-digit characters from the **d_pos** element of **file_p**, leaving only the digits of the date. Finally, the resulting string is converted to a datetime object using the format specified in **d_format**, and the result is stored in **date_p**.
  - Based on the input provided for formatting the date:
    - If '*A* ', then:
      - The day of the year is extracted from **date_n** and stored in the variable **doy**.
      - The year and day of the year of **u_date_n** are concatenated and saved under **u_date_c**.
      - The year and day of the year of **l_date_n** are concatenated and set to the variable **l_date_c**.
    - If '*B* ', then:
      - The year, month, and day of **u_date_n** are concatenated and stored in **u_date_c**.
      - The year, month, and day of **l_date_n** are concatenated and saved under **l_date_c**.
    - If the year of **date_p** matches **year** and the date in the filename lies between **l_date_c** and **u_date_c**, then:
      - The **input_raster** variable is assigned the full path built from **input_dir** and the filename **f**.
      - The **d_n** variable is assigned the year of **date_n** concatenated with its day of the year.
      - The variables **product**, **d_n**, and **grid_re** are joined with underscores, given a '.csv' extension, and assigned to the variable **data_name**.
      - The **output_dir** and **data_name** are joined to create a full path and assigned to the variable **output_raster**.
      - The output path, the date taken from the filename, and the **data_name** together with **date_n** are printed.
      - Finally, the **raster_points** function is called, taking as arguments the input raster file path, the output file path, and the grid shape file.
  - Once a matching file has been processed, **j** is set to **True**, so the loop over the files ends at the start of its next iteration.
- Finally, the counter **i** is incremented by **1**, and the outer loop continues with the next date.
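
The naming step can be sketched on its own. Note one deliberate difference: the notebook concatenates the day of the year without padding, whereas this sketch pads it to three digits with `strftime("%j")` so that names have a fixed width. That padding, and the helper name `build_output_name`, are assumptions made for illustration.

```python
from datetime import date


def build_output_name(product, target_date, grid_re):
    """Build '<product>_<year><day-of-year>_<grid resolution>.csv'.

    Hypothetical helper (not in the original script). The day of the
    year is zero-padded to three digits, unlike the notebook's version.
    """
    d_n = target_date.strftime("%Y%j")
    return f"{product}_{d_n}_{grid_re}.csv"


# Example values taken from the prompts shown earlier in the notebook
print(build_output_name("PPT", date(2012, 1, 2), "1000"))
```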

In [None]:
# Creates list
fdir = os.listdir(input_dir)
ref_dir = []
for fd in fdir:
  if fd.endswith(ftype):
    ref_dir.append(fd)
i = start

# Sets format for date
for d in date:
  j = False
  # Converts current date to datetime and adds specified number of months
  date_n = pd.to_datetime(d, format='%m/%d/%Y') + pd.DateOffset(months=lag)
  # Extracts the year from date_n
  year = date_n.year
  # Adds specified days to date_n
  u_date_n = date_n + pd.DateOffset(days=offset)
  # Subtracts specified days from date_n
  l_date_n = date_n - pd.DateOffset(days=offset)
  for f in ref_dir:
    if j:
      break
    # Splits f using the separator (sep)
    file_p = f.split(sep)
    # Removes non-digit characters so that only the digits of the date remain
    file_p = re.sub(r"\D", "", file_p[d_pos])
    # Converts to datetime; slashes are dropped from d_format because file_p holds digits only
    date_p = pd.to_datetime(file_p, format=d_format.replace("/", ""))

    if d_f =='A':
      # Extracts day of year from date_n and stores in doy
      doy = date_n.timetuple().tm_yday
      # Concatenates the year with day of year of u_date_n
      u_date_c = str(u_date_n.year) + str(u_date_n.timetuple().tm_yday).zfill(3)
      # Concatenates the year with day of year of l_date_n
      l_date_c = str(l_date_n.year) + str(l_date_n.timetuple().tm_yday).zfill(3)

    if d_f == 'B':
      # Concatenates the year, month, and day of u_date_n
      u_date_c = str(u_date_n.year) + str(u_date_n.month).zfill(2) + str(u_date_n.day).zfill(2)
      # Concatenates the year, month, and day of l_date_n
      l_date_c = str(l_date_n.year) + str(l_date_n.month).zfill(2) + str(l_date_n.day).zfill(2)

    if date_p.year == year and (int(l_date_c) <= int(file_p) and int(file_p) <= int(u_date_c)):
      # Constructs full path
      input_raster = os.path.join(input_dir, f)
      # Concatenates the year and day of year of date_n
      # Has to be by date index number, because the arcpy multipoint function has a name length limit
      d_n = str(date_n.year) + str(date_n.timetuple().tm_yday)
      # Concatenates the following variables
      data_name = product + "_" + d_n + "_" + grid_re + ".csv"
      # Constructs full path
      output_raster = os.path.join(output_dir, data_name)
      print(output_raster)
      print(file_p)
      print(data_name + " " + str(date_n))
      raster_points(input_raster, output_raster, grid_shape_file)

      # removes previously used files from list to prevent repeat data
      # ref_dir.remove(f)
      j = True
  i = i + 1
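
The date-window test at the heart of the loop can be tried in isolation. The sketch below uses only the standard library instead of pandas, and it leaves out the monthly lag and the year check; `date_window` and `in_window` are hypothetical names, and the default `"%Y%j"` token format is an assumption matching option A.

```python
from datetime import datetime, timedelta


def date_window(date_str, offset_days, token_format="%Y%j"):
    """Return the (lower, upper) bounds of the window as integers.

    Hypothetical helper (not in the original script). `date_str` uses
    the '%m/%d/%Y' layout of the date list; the bounds are written in
    `token_format` so they can be compared with the digits taken from
    a filename.
    """
    center = datetime.strptime(date_str, "%m/%d/%Y")
    lower = center - timedelta(days=offset_days)
    upper = center + timedelta(days=offset_days)
    return int(lower.strftime(token_format)), int(upper.strftime(token_format))


def in_window(token, date_str, offset_days, token_format="%Y%j"):
    """Check whether a filename date token falls inside the window."""
    lower, upper = date_window(date_str, offset_days, token_format)
    return lower <= int(token) <= upper


# Window of three days either side of 9 January 2012
print(date_window("01/09/2012", 3))
print(in_window("2012010", "01/09/2012", 3))
```

Comparing the tokens as integers works because both formats place the year first, followed by zero-padded fields.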