# Introduction

The **raster_merge** program merges multiple raster files spatially into a single, larger raster. This is particularly useful when imagery is stored as separate files that should be treated as parts of one image. The merged output can then be used as a single raster object.

---

The following libraries are required to run the code:
- **os** - Provides operating-system services such as creating and removing directories, listing their contents, and querying or changing the current working directory.

- **re** - A regular expression (or **re**) specifies a set of strings; the module checks whether a particular string matches a given pattern and supports searching and substitution.
> For more information visit: https://docs.python.org/3/library/re.html

- **Pandas** - A tool for modeling, analyzing, and manipulating data sets.
> For more information visit: https://www.geeksforgeeks.org/introduction-to-pandas-in-python/#

*arcpy*

---

- **arcpy** - ArcPy enables efficient and effective geographic data analysis, data management, map automation, and data conversion. With ArcPy, users can leverage the capabilities of Python to effortlessly handle complex geospatial tasks.

- **env** - ArcPy environment settings provide access to both general geoprocessing settings and the settings of a specific tool.

- **arcpy.sa** - The arcpy.sa module in Python facilitates the analysis of raster and vector data. It leverages the functionalities offered by the ArcGIS Spatial Analyst extension.

*datetime*

---

- **datetime** - The Python Datetime module provides classes for working with dates and times.

---

> **Note:**
>  - To tailor the code to your requirements, uncomment the commented-out lines you need so that they are executed when the script runs.

In [None]:
import os
import re
import pandas  as pd
import arcpy
from arcpy import env
from arcpy.sa import *
from datetime import datetime

- Provide the path to the parent folder and specify the name of the geodatabase. These two values are joined to form the full path to the geodatabase.

In [None]:
# Define the parent folder and the geodatabase name
parent_folder = input('Enter the path to the parent folder: ')
geodatabase_name = input('Enter the name of the geodatabase: ')

# Build the full path to the file geodatabase
geodatabase_path = os.path.join(parent_folder, geodatabase_name)
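
The path construction above can be sketched as a small helper. This is only an illustration, not part of the original script: `build_gdb_path` is a hypothetical name, and the sketch assumes the geodatabase name should carry the `.gdb` extension used by file geodatabases, adding it when the user leaves it out.

```python
import os

def build_gdb_path(parent_folder, geodatabase_name):
    """Return (name, full_path), adding '.gdb' when it is missing (hypothetical helper)."""
    name = geodatabase_name.strip()
    # Assumption: the workspace is a file geodatabase, whose folder name ends in .gdb
    if not name.lower().endswith(".gdb"):
        name += ".gdb"
    return name, os.path.join(parent_folder.strip(), name)

# Example values are placeholders, not paths from the original workflow
name, path = build_gdb_path("data", "merge_work")
print(name)
print(path)
```

Normalizing the name in one place keeps the value passed to the geodatabase tool and the value assigned to the workspace consistent with each other.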

> **Note**:
>  - The following line creates the geodatabase that serves as the working directory for the application. It is commented out by default, but it should be kept in the script whether or not it is currently in use.
>>    *arcpy.CreateFileGDB_management(parent_folder, geodatabase_name)*

In [None]:
## Generates the geodatabase (geodatabase is used as a working directory)
## Crucial to keep code as it is needed to run

#arcpy.CreateFileGDB_management(parent_folder, geodatabase_name)

- **arcpy.env.workspace** establishes the geodatabase as the base workspace for all subsequent operations.

- Then, the program checks out the Spatial Analyst extension, which is a prerequisite for the **arcpy.sa** geoprocessing operations.
- Next, **arcpy.env.matchMultidimensionalVariable** is set to False, which disables the matching of multidimensional variables and can occasionally improve performance.
- Finally, **arcpy.env.overwriteOutput** is set to True so that existing output files can be overwritten. This is particularly useful when running a process repeatedly or when uploading data.

In [None]:
# Set the geodatabase as the workspace
arcpy.env.workspace = geodatabase_path

# Set up the environment
arcpy.CheckOutExtension("Spatial")

arcpy.env.matchMultidimensionalVariable = False
#arcpy.env.cellSize = 200
arcpy.env.overwriteOutput = True

- The user is then prompted for the input directory and the output directory.

  - An example of the input directory would be the following:
  > *Enter the input directory:* **E:\share\BIgRun\Albedo\early\A**

  - An example of the output directory would be the following:
  > *Enter the output directory:* **E:\share\BIgRun\Albedo\ewfolder**

In [None]:
# Set the input directory
input_dir = input("Enter the input directory: ")

# Set the output directory
output_dir = input("Enter the output directory: ")
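
Because both directories come from free-form user input, it can help to check them before any raster work starts. The following is a minimal sketch under that assumption; `prepare_dirs` is a hypothetical helper that is not part of the original script, and the demonstration uses a temporary folder rather than a real data path.

```python
import os
import tempfile

def prepare_dirs(input_dir, output_dir):
    """Check that input_dir exists and create output_dir if needed (hypothetical helper)."""
    if not os.path.isdir(input_dir):
        raise NotADirectoryError(f"Input directory not found: {input_dir}")
    # exist_ok=True makes the call safe when the folder is already there
    os.makedirs(output_dir, exist_ok=True)
    return os.path.abspath(input_dir), os.path.abspath(output_dir)

# Demonstration with a temporary folder standing in for the real directories
with tempfile.TemporaryDirectory() as tmp:
    in_dir = os.path.join(tmp, "A")
    os.mkdir(in_dir)
    out_dir = os.path.join(tmp, "merged")
    prepare_dirs(in_dir, out_dir)
    created = os.path.isdir(out_dir)
print(created)
```

Failing early on a mistyped input path avoids a less obvious error later, when the directory is listed.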

- The user is requested to specify how many files are to be merged at the same time.
  - An example would be the following:
  > *Enter the number of files that need to be merged at a time:* **2**

- The variable **dir** stores the list returned by **os.listdir**, which contains the names of all files and subdirectories in **input_dir**.
- The program then subtracts 1 from the length of **dir** and stores the result in **total_files**. Note that **os.listdir** does not include the current directory ('.') or the parent directory ('..') in its result, so **total_files** is one less than the number of entries. In the loop further down, this keeps a group from starting at the last entry of the list.
- Finally, the program prints the value stored in **total_files**.

In [None]:
# State the number of files that will be merged at a time
file_n = input("Enter the number of files that need to be merged at a time: ")

# Get a list of all files and directories in input_dir
dir = os.listdir(input_dir)

# Subtracts 1 from the length of the list
total_files = len(dir) - 1

# Prints out the value of total_files
print(total_files)
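
`os.listdir` returns every entry in the folder in arbitrary order, so auxiliary files or subfolders can end up in the list. The sketch below shows one way to keep only raster files and sort them. It is not the original script's method: `list_rasters` is a hypothetical helper, the `.tif` extension is an assumption about the inputs, and the file names are made up for the demonstration.

```python
import os
import tempfile

def list_rasters(input_dir, extensions=(".tif",)):
    """Return the sorted names of files in input_dir that end with one of extensions."""
    names = []
    for name in os.listdir(input_dir):
        full_path = os.path.join(input_dir, name)
        # Skip subfolders and files with other extensions
        if os.path.isfile(full_path) and name.lower().endswith(extensions):
            names.append(name)
    return sorted(names)

# Demonstration with made-up file names in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.A2001.tif", "a.A2001.tif", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "subfolder"))
    rasters = list_rasters(tmp)
print(rasters, len(rasters))
```

With a filtered list, the count is simply `len(rasters)`, and the sorted order makes it predictable which files end up in the same group.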

- The provided code employs the arcpy library, which specializes in geospatial data processing. The code is divided into six distinct sections, each of which plays a crucial role in ensuring the smooth execution of the program. The following is a breakdown of the code:

  - Initialization:
    - *i* is initially set to 0 and serves as the index of the first file in the group currently being processed.
    - *k* is assigned the value 4; it is not referenced anywhere in the loop, so the number of files merged per iteration is controlled by **file_n** instead.
  - Loop Condition:
    - The *while*-loop will keep executing as long as the variable *i* is less than **total_files**, indicating that there are still files that need to be processed.
  - File Processing:
    - **dir[i].split('.')** splits the current file name at every dot character (.) and stores the parts in **file_p1**, so **file_p1[0]** is the text before the first dot and **file_p1[1]** is the text between the first and second dots.
    - The regular expression call **re.sub("\D", "", file_p1[1])** removes all non-digit characters from that second part, leaving only its numeric portion in **file_d1**.
  - Input File List:
    - The variable **f_inputs** is initialized as an empty list.
    - The inner *for*-loop iterates **int(file_n)** times and appends one input file path to **f_inputs** on each pass. Each complete path is built by joining **input_dir** with a file name from **dir**.
  - Output File Name:
    - The output file name, **f_out**, is created by joining the first and second elements of **file_p1** with an underscore and appending the '.tif' extension.
  - Mosaic Operation:
    - The *arcpy.management.MosaicToNewRaster* tool performs a mosaic operation on input files. It requires several parameters, including:
      - **f_inputs** - The list of input file paths.
      - **output_dir** - The directory where the output mosaic raster will be saved.
      - **f_out** - The name of the output raster file.
      - **number_of_bands** - The number of bands in the output raster.
      - **mosaic_method** - The method for combining the input rasters. In this case, the mean method, denoted as 'MEAN', was used.
      - **pixel_type** - The data type of the output raster. In this instance, the setting is '*32_BIT_FLOAT*'.
  
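  - Finally, *i* is increased by 2 at the end of each iteration, and the loop repeats until *i* reaches **total_files**. Because the step is fixed at 2, it matches the group size only when **file_n** is 2.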
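
The file-name handling described above can be tried on its own, without arcpy. The sketch below mirrors the split, digit extraction, and output-name steps; `output_name` is a hypothetical helper and the sample file name is made up for illustration.

```python
import re

def output_name(file_name):
    """Return ('<part0>_<part1>.tif', digits of part1) for a dot-separated file name."""
    parts = file_name.split(".")
    if len(parts) < 2:
        raise ValueError(f"Expected at least one dot in {file_name!r}")
    # A raw string avoids Python treating \D as a string escape sequence
    digits = re.sub(r"\D", "", parts[1])
    return parts[0] + "_" + parts[1] + ".tif", digits

# Made-up example name with three dot-separated parts before the extension
out, digits = output_name("albedo.A2001001.h09.tif")
print(out, digits)
```

Note that a name with a single dot, such as `scene.tif`, would produce `scene_tif.tif`, so this naming scheme assumes the inputs have at least two dot-separated parts before the extension.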


In [None]:
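i = 0  # index of the first file in the current group
k = 4  # not referenced in the loop below
while i < total_files:
    # Split the file name at every dot
    file_p1 = dir[i].split('.')
    # Keep only the digits of the second part of the file name
    file_d1 = re.sub(r"\D", "", file_p1[1])
    print(i)

    # Collect the full paths of the next file_n files
    f_inputs = []
    for j in range(int(file_n)):
        f_inputs.append(os.path.join(input_dir, dir[i + j]))
    # f1 = os.path.join(input_dir,dir[i])
    # f2 = os.path.join(input_dir,dir[i+1])
    print(f_inputs)

    # Output name: first and second parts joined by an underscore, plus .tif
    f_out = file_p1[0] + '_' + file_p1[1] + '.tif'

    # Mosaic the group into a single-band, 32-bit float raster, averaging overlapping cells
    arcpy.management.MosaicToNewRaster(f_inputs, output_dir, f_out, number_of_bands=1, mosaic_method='MEAN', pixel_type='32_BIT_FLOAT')
    # Advance by 2 files (fixed step; matches file_n only when file_n is 2)
    i = i + 2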
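
The loop above advances by a fixed step of 2. As an alternative, the grouping can be written so that the step follows the group size. The sketch below is one possible approach rather than the original script's method; `batches` is a hypothetical helper, the file names are placeholders, and any leftover files that do not fill a complete group are skipped.

```python
def batches(names, size):
    """Yield consecutive, non-overlapping groups of exactly `size` names."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Stop early enough that every group is complete
    for start in range(0, len(names) - size + 1, size):
        yield names[start:start + size]

# Placeholder file names; five files in groups of two leaves the last one out
files = ["a.tif", "b.tif", "c.tif", "d.tif", "e.tif"]
groups = list(batches(files, 2))
print(groups)
```

Each group could then be turned into full paths with `os.path.join` and passed to the mosaic tool, which removes the risk of indexing past the end of the list when the file count is not a multiple of the group size.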