# Data Preprocessing for HEC-RAS Input

This Jupyter notebook preprocesses a large dataset of TIFF files for use in HEC-RAS (the Hydrologic Engineering Center's River Analysis System).

The script performs the following tasks:

1. **Resampling TIFF files**: The script iterates through all TIFF files in the `input_folder` directory and resamples each one to the desired pixel size (0.0001 here, expressed in the units of the raster's coordinate reference system). The resampled files are saved in the `output_folder` directory with a "_resampled" suffix.

2. **Extracting data using a shapefile**: The script then iterates through the resampled TIFF files and clips each one to the shapefile given by `shapefile_path`. The clipped files are saved in the `extracted` folder with an "_extracted" suffix appended to the resampled name.

3. **Converting TIFF to ASCII**: Finally, the script converts the extracted TIFF files in the `extracted` folder to the Esri ASCII grid format (GDAL's `AAIGrid` driver) and saves them in the `ascii` folder with a ".asc" extension.
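
Because each step appends its own suffix, the file names accumulate across the pipeline. The sketch below traces one input through the three stages. `stage_paths` is a hypothetical helper (it is not part of the notebook), `dem.tif` is an illustrative file name, and the folder names are assumed to match the ones used in the cells that follow.

```python
import os

def stage_paths(tiff_name, resampled_dir="output", extracted_dir="extracted", ascii_dir="ascii"):
    """Return the files written for one input TIFF (hypothetical helper).

    The tuple is (resampled TIFF, extracted TIFF, ASCII grid).
    """
    stem = os.path.splitext(os.path.basename(tiff_name))[0]
    resampled = os.path.join(resampled_dir, f"{stem}_resampled.tif")
    # The extraction step keeps the "_resampled" suffix and appends its own
    extracted = os.path.join(extracted_dir, f"{stem}_resampled_extracted.tif")
    # The ASCII step only swaps the extension
    ascii_grid = os.path.join(ascii_dir, f"{stem}_resampled_extracted.asc")
    return resampled, extracted, ascii_grid

for path in stage_paths("dem.tif"):
    print(path)
```

Tracing the names this way makes it easier to match a final `.asc` file back to its source raster.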


## Import necessary libraries

In [None]:
import os
from osgeo import gdal

# Raise Python exceptions on GDAL errors instead of silently returning None
gdal.UseExceptions()

## Section 1: Resampling TIFF files

- The script iterates over all TIFF files in the `input_folder` directory.
- Each file is resampled to the desired pixel size (0.0001 in this case) by the `resample_tiff` function, which wraps `gdal.Warp`.
- The resampled file is saved in the `output_folder` directory with a "_resampled" suffix.
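
Before resampling a large dataset, it can help to estimate how big each output grid will be, since a small pixel size multiplies the pixel count quickly. The following is a rough sketch, not part of the notebook: `estimate_resampled_shape` is a hypothetical helper, the extent values are made up, and 4 bytes per pixel assumes a 32-bit data type. GDAL applies its own rounding, so the real dimensions may differ by a pixel.

```python
def estimate_resampled_shape(xmin, ymin, xmax, ymax, pixel_size, bytes_per_pixel=4):
    """Estimate rows, columns, and uncompressed bytes per band (hypothetical helper).

    The extent and pixel_size must be in the same units (those of the raster's CRS).
    """
    cols = max(1, round((xmax - xmin) / pixel_size))
    rows = max(1, round((ymax - ymin) / pixel_size))
    return rows, cols, rows * cols * bytes_per_pixel

# Illustrative extent: one unit wide and one unit tall
rows, cols, size_bytes = estimate_resampled_shape(0.0, 0.0, 1.0, 1.0, 0.0001)
print(f"{rows} x {cols} pixels, about {size_bytes / 1e6:.0f} MB per band uncompressed")
```

An extent of one unit by one unit at a pixel size of 0.0001 already yields a 10000 by 10000 grid, so checking this for the real extents is worthwhile before processing many files.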

In [None]:
# Define the input and output folders for the resampling step
input_folder = 'reshapefile'
output_folder = 'output'

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Function to resample the input TIFF file
def resample_tiff(input_file, output_file, pixel_size):
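    # Define resampling options: square pixels of the given size, with -999 as the
    # output nodata value. No resampleAlg is set, so GDAL uses its default
    # (nearest neighbour).
    resampling_options = gdal.WarpOptions(xRes=pixel_size, yRes=pixel_size, dstNodata=-999)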
    
    # Perform resampling
    gdal.Warp(srcDSOrSrcDSTab=input_file, destNameOrDestDS=output_file, options=resampling_options)

# Define the desired pixel size for resampling (in the units of the raster's CRS)
desired_pixel_size = 0.0001

# Iterate over all TIFF files in the input folder
for filename in os.listdir(input_folder):
    if filename.endswith('.tif'):
        input_file_path = os.path.join(input_folder, filename)

        # Define the output file path for the resampled data
        output_file_path = os.path.join(output_folder, f"{os.path.splitext(filename)[0]}_resampled.tif")

        # Resample the TIFF file
        resample_tiff(input_file_path, output_file_path, desired_pixel_size)

print("Resampling completed.")


## Section 2: Extracting data using a shapefile

- The script iterates over all resampled TIFF files. For this step, `input_folder` is reassigned to `output`, the folder written by the previous step.
- Each resampled file is clipped to the shapefile given by the `shapefile_path` variable, and the output extent is cropped to the shapefile's outline.
- The clipped data is saved in the new `output_folder` (`extracted`) with an "_extracted" suffix.
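
A shapefile is a set of sibling files, not a single file: the `.shp` geometry needs its `.shx` index and `.dbf` attribute table next to it. A quick check before the extraction loop can catch an incomplete copy early. This is a minimal sketch, not part of the notebook: `missing_shapefile_parts` is a hypothetical helper, and it assumes lowercase extensions on disk.

```python
import os

# Components every shapefile must have; the optional .prj carries the CRS
REQUIRED_SIDECARS = (".shp", ".shx", ".dbf")

def missing_shapefile_parts(shapefile_path):
    """Return the required component files that do not exist (hypothetical helper)."""
    base = os.path.splitext(shapefile_path)[0]
    return [base + ext for ext in REQUIRED_SIDECARS if not os.path.exists(base + ext)]

missing = missing_shapefile_parts("reshapefile/usb_cat.shp")
if missing:
    print("Missing shapefile components:", missing)
```

If the list is non-empty, the clipping step is likely to fail for every raster, so it is cheaper to stop here.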

In [None]:
# Define input and output folders, and the shapefile path
input_folder = 'output'
output_folder = 'extracted'
shapefile_path = 'reshapefile/usb_cat.shp'

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Function to extract by mask using a shapefile
def extract_by_mask(input_file, output_file, shapefile):
    # Define extraction options with shapefile as mask
    extraction_options = gdal.WarpOptions(cutlineDSName=shapefile, cropToCutline=True, dstNodata=-999)
    
    # Perform the extraction by mask
    gdal.Warp(srcDSOrSrcDSTab=input_file, destNameOrDestDS=output_file, options=extraction_options)

# Iterate over all resampled files in the output folder
for filename in os.listdir(input_folder):
    if filename.endswith('_resampled.tif'):
        input_file_path = os.path.join(input_folder, filename)
        
        # Define the output file path for the extracted data
        output_file_path = os.path.join(output_folder, f"{os.path.splitext(filename)[0]}_extracted.tif")
        
        # Extract data using the shapefile as a mask
        extract_by_mask(input_file_path, output_file_path, shapefile_path)

print("Extraction completed.")



## Section 3: Converting TIFF to ASCII

- The script iterates over all extracted TIFF files. For this step, `input_folder` is reassigned to `extracted`, the folder written by the previous step.
- Each TIFF file is converted to the Esri ASCII grid format with GDAL's `AAIGrid` driver and saved in the new `output_folder` (`ascii`) with an ".asc" extension.
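
An Esri ASCII grid starts with a short text header (`ncols`, `nrows`, the lower-left corner, the cell size, and the nodata value) followed by the rows of data, so a converted file can be sanity-checked without GDAL. The sketch below is an illustration under stated assumptions: `read_asc_header` is a hypothetical helper, and the sample grid is hand-written with made-up values so that the example runs on its own.

```python
import os
import tempfile

# Header keywords that can appear in an Esri ASCII grid (compared in lowercase)
HEADER_KEYS = {"ncols", "nrows", "xllcorner", "yllcorner", "xllcenter",
               "yllcenter", "cellsize", "dx", "dy", "nodata_value"}

def read_asc_header(path):
    """Read the header of an ASCII grid into a dict of floats (hypothetical helper)."""
    header = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            # Stop at the first line that is not a "keyword value" pair
            if len(parts) != 2 or parts[0].lower() not in HEADER_KEYS:
                break
            header[parts[0].lower()] = float(parts[1])
    return header

# Tiny hand-written grid; the values are invented for illustration only
sample = ("ncols 2\nnrows 2\nxllcorner 36.0\nyllcorner -1.0\n"
          "cellsize 0.0001\nNODATA_value -999\n1 2\n3 -999\n")
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.asc")
    with open(sample_path, "w") as f:
        f.write(sample)
    header = read_asc_header(sample_path)

print(header)
```

On real output, the cell size should match the pixel size chosen in the resampling step, and the nodata value should match the -999 passed as `dstNodata`.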

In [None]:
# Define input and output folders
input_folder = 'extracted'
output_folder = 'ascii'

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Function to convert TIFF to ASCII
def tiff_to_ascii(input_file, output_file):
    # Define translation options
    translate_options = gdal.TranslateOptions(format='AAIGrid')
    
    # Perform translation
    gdal.Translate(destName=output_file, srcDS=input_file, options=translate_options)

# Iterate over all files in the extracted folder
for filename in os.listdir(input_folder):
    if filename.endswith('.tif'):
        input_file_path = os.path.join(input_folder, filename)
        
        # Define the output file path for the ASCII file
        output_file_path = os.path.join(output_folder, f"{os.path.splitext(filename)[0]}.asc")
        
        # Convert TIFF to ASCII
        tiff_to_ascii(input_file_path, output_file_path)

print("Conversion to ASCII completed.")