# Climate Data Processing for 3PG Dataset

This script processes climate data for the 3PG dataset. It involves extracting, converting, and reformatting raster data using arcpy and regular expressions. The processed data is then saved in specified output folders and text files. The script can be divided into the following sections:

1. **Importing Packages and Setting Up Environment**:
   - Import required packages including arcpy, os, re, and time.
   - Check out the "Spatial" extension for arcpy.
   - Define the workspace for input and output files.
   - Enable overwrite output to allow file overwriting during processing.

2. **Folder and Time Period Configuration**:
   - Define the list of folders containing input data (e.g., "Switzerland_pr", "Switzerland_tasmax").
   - Specify the start and last years for data processing.

3. **Data Elaboration - Extraction, Conversion, and Projection**:
   - Iterate through each folder and process data for each file.
   - Extract the year and month from filenames using regular expressions.
   - Convert raster files to floating-point format using `RasterToFloat`.
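   - Assign the CH1903+ LV95 coordinate system to the converted data using `DefineProjection`. This tool only records the coordinate system on the dataset; it does not reproject the cell values, so the input rasters must already be in CH1903+ LV95.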
   - Save processed data in the corresponding output folder.

4. **Output and Progress**:
   - Print progress and information for each processed file.

5. **File Organization and Text File Creation**:
   - Group processed files by year.
   - Write the file paths and years to a text file for each folder.

6. **Removing NODATA Header Lines**:
   - Iterate through each folder and remove NODATA header lines from .hdr files.
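
The header cleanup in step 6 can be tried on plain text before touching real `.hdr` files. The following is a minimal sketch, not the notebook's exact code: the helper name `strip_nodata_lines` and the header content are hypothetical, and the pattern is a simplified form of the one used in the notebook. It behaves the same for lines that end in a newline and, unlike the original, also matches a final line that lacks one.

```python
import re

# Matches a header line whose first token is exactly "NODATA".
# "NODATA_value ..." is not matched, because "_" counts as a word character
# and therefore no word boundary follows "NODATA" in that key.
NODATA_LINE = re.compile(r"NODATA\b")


def strip_nodata_lines(lines):
    """Return the header lines with bare NODATA entries removed."""
    return [line for line in lines if not NODATA_LINE.match(line)]


# Hypothetical header content, for illustration only
header = [
    "ncols 4\n",
    "nrows 3\n",
    "NODATA -9999\n",
    "NODATA_value -9999\n",
    "byteorder LSBFIRST\n",
]
cleaned = strip_nodata_lines(header)
print("".join(cleaned), end="")
```

Which of the two key names appears in your headers depends on the tool output, so it is worth opening one `.hdr` file and checking before running the cleanup on a whole folder.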

## Prerequisites

1. **ArcGIS Software**: This script requires arcpy, which is part of the ArcGIS software suite. Ensure you have a compatible version of ArcGIS installed.

2. **Input Data**: Prepare the input raster data and ensure it's located in the specified folders.

3. **Configuration**: Modify the script's parameters, such as workspace paths, time periods, and file extensions to match your dataset and requirements.

## Usage

1. **Open the Jupyter Notebook**: Launch your Jupyter Notebook environment. Modify the script's parameters to match your setup.

2. **Run the Notebook Cells**: Execute the notebook cells sequentially by clicking on each cell and pressing Shift + Enter. Make sure to run the cells in the correct order.

3. **Output**: Processed climate data files will be saved in the specified output folders. Text files containing lists of processed files grouped by year will also be created.
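
The layout of the per-folder text files can be illustrated without ArcGIS. This is a sketch under stated assumptions: the helper names `group_paths_by_year` and `format_index_lines`, the folder name, and the file names are hypothetical. The line format (year, one space, then paths separated by five spaces) follows the notebook; sorting the names first is an addition here so that the output order is deterministic.

```python
import os
import re
from collections import defaultdict


def group_paths_by_year(folder_path, file_names):
    """Map each four-digit year found in a .flt file name to its full paths."""
    paths_by_year = defaultdict(list)
    for name in sorted(file_names):
        year_match = re.search(r"\d{4}", name)
        if name.endswith(".flt") and year_match:
            paths_by_year[year_match.group()].append(os.path.join(folder_path, name))
    return dict(paths_by_year)


def format_index_lines(paths_by_year):
    """Build one line per year: the year, a space, then paths separated by five spaces."""
    return [f"{year} " + "     ".join(paths) + "\n" for year, paths in paths_by_year.items()]


# Hypothetical folder and file names, for illustration only
names = ["Y1981_pr_01.flt", "Y1980_pr_02.flt", "Y1980_pr_01.flt", "Y1980_pr_01.hdr"]
grouped = group_paths_by_year("3PG_pr", names)
lines = format_index_lines(grouped)
print("".join(lines), end="")
```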

## Notes

- Verify the correctness and accuracy of the processed data after running the script.
- Review the regular expressions and processing steps to ensure they match the filename patterns and data structure.
- Check the projection details and coordinate system parameters in the `DefineProjection` function for proper assignment.
- Ensure compliance with data usage agreements and copyrights when processing external datasets.
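
One way to review the regular expressions is to run them on a few sample names before processing any rasters. The sketch below reuses the three patterns from the notebook inside a hypothetical helper, `build_output_name`; the sample file names are assumptions and should be replaced with names from your own input folders.

```python
import re


def build_output_name(file_name, folder):
    """Return the Y<year><_variable>_<MM>.flt name, or None if parsing fails."""
    year_match = re.search(r"\d{4}", file_name)       # first four-digit run
    month_match = re.search(r"_(\d{2})_", file_name)  # two digits between underscores
    suffix_match = re.search(r"(_\w+)", folder)       # e.g. "_pr" from "Switzerland_pr"
    if not (year_match and month_match and suffix_match):
        return None
    year = int(year_match.group(0))
    month = int(month_match.group(1))
    return f"Y{year}{suffix_match.group(1)}_{month:02d}.flt"


# Hypothetical input names; replace them with names from your own folders
samples = [
    ("CHELSA_pr_01_1980_V.2.1.tif", "Switzerland_pr"),
    ("CHELSA_tasmax_12_2018_V.2.1.tif", "Switzerland_tasmax"),
    ("readme.tif", "Switzerland_pr"),
]
for file_name, folder in samples:
    print(file_name, "->", build_output_name(file_name, folder))
```

Because the year pattern takes the first run of four digits, a name in which another four-digit number precedes the year would be parsed incorrectly; a check like this makes such cases visible early.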

## Author

Script written by Luca Ferrari

Contact: luca.ferrari@usys.ethz.ch

For inquiries or assistance, please contact the author.

This README content was generated with the assistance of an AI language model from OpenAI. The provided content is based on user input and has been tailored to the specific requirements of the project.

In [None]:
# %% Import packages
import arcpy
from arcpy import env
import os
import re
import time
arcpy.CheckOutExtension("Spatial")

In [None]:
# %% Define workspace
env.workspace = r"N:\Luca_data"
arcpy.env.overwriteOutput = True

folders = ["Switzerland_pr", "Switzerland_tasmax", "Switzerland_tasmin", "Switzerland_fst", "Switzerland_rsds", "Switzerland_vpd"]

start_year = 1980
last_year = 2018


In [None]:
# Iterate over each folder
for folder in folders:
    # Pass path components separately so no backslash is read as an escape sequence
    folder_path = os.path.join(env.workspace, "Chelsa_V2_Monthly", "Resampled data 250m", folder)

    output_path = os.path.join(env.workspace, "3PG", folder)

    # Create output folder if it does not exist
    os.makedirs(output_path, exist_ok=True)

    # Get the list of files in the directory using os.scandir()
    with os.scandir(folder_path) as entries:
        # Filter out directories and get only file names
        file_names = [entry.name for entry in entries if entry.is_file()]
        # Filter the list to only include .tif files
        file_names = [f for f in file_names if f.endswith('.tif')]

    # Regular expression matching a run of four digits, used to find the year in the filename
    pattern = r'\d{4}'

    # Iterate over the files in the directory
    for file_name in file_names:
        # Search the filename for the first four-digit number, taken as the year
        match = re.search(pattern, file_name)
        if match:
            year = int(match.group(0))

            # Process only files whose year lies within [start_year, last_year]
            if start_year <= year <= last_year:
                # Create the full path to the file
                file_path = os.path.join(folder_path, file_name)

                # Extract the two-digit month, expected between underscores (e.g. "_01_")
                month_match = re.search(r"_(\d{2})_", file_name)
                if month_match is None:
                    print(f"Skipped (no month found in name): {file_name}")
                    continue
                month = int(month_match.group(1))

                # Variable suffix taken from the folder name, e.g. "_pr" from "Switzerland_pr"
                extracted_word2 = re.search(r"(_\w+)", folder).group(1)
                # Output name pattern: Y<year><_variable>_<MM>.flt
                output_file_path = os.path.join(output_path, f"Y{year}{extracted_word2}_{month:02d}.flt")
                    
                arcpy.conversion.RasterToFloat(
                    in_raster = file_path,
                    out_float_file = output_file_path
                )

                # Wait 2 seconds before defining the projection
                time.sleep(2)

                arcpy.management.DefineProjection(
                    in_dataset=output_file_path,
                    coor_system='PROJCS["CH1903+_LV95",GEOGCS["GCS_CH1903+",DATUM["D_CH1903+",SPHEROID["Bessel_1841",6377397.155,299.1528128]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Hotine_Oblique_Mercator_Azimuth_Center"],PARAMETER["False_Easting",2600000.0],PARAMETER["False_Northing",1200000.0],PARAMETER["Scale_Factor",1.0],PARAMETER["Azimuth",90.0],PARAMETER["Longitude_Of_Center",7.439583333333333],PARAMETER["Latitude_Of_Center",46.95240555555556],UNIT["Meter",1.0]]'
                )

                # Print progress or other relevant information
                print(f"Processed: {file_path} -> {output_file_path}\n")


In [None]:
folders = ["Switzerland_pr", "Switzerland_tasmax", "Switzerland_tasmin", "Switzerland_fst", "Switzerland_rsds", "Switzerland_vpd"]

for folder in folders:
    folder_path = os.path.join(env.workspace, "3PG", folder)

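    # Collect the .flt files in the folder with the year found in each file name.
    # Names without a four-digit year are skipped; sorting keeps the output order stable.
    file_list = []
    for file in sorted(os.listdir(folder_path)):
        year_match = re.search(r"\d{4}", file)
        if file.endswith('.flt') and year_match:
            file_list.append((file, year_match.group()))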

    # Group the file paths by year
    file_paths_by_year = {}
    for file, year in file_list:
        if year in file_paths_by_year:
            file_paths_by_year[year].append(os.path.join(folder_path, file))
        else:
            file_paths_by_year[year] = [os.path.join(folder_path, file)]

    # Write the file paths to a text file with the folder name
    output_file = os.path.join(env.workspace, "3PG", f'{folder}.txt')
    with open(output_file, 'w') as f:
        for year, file_paths in file_paths_by_year.items():
            # One line per year: the year, a space, then the file paths separated by five spaces
            f.write(f'{year} ')
            f.write('     '.join(file_paths))
            f.write('\n')


In [None]:
# Iterate over each folder
for folder in folders:
    folder_path = os.path.join(env.workspace, "3PG", folder)

    # Get the list of files in the directory using os.scandir()
    with os.scandir(folder_path) as entries:
        # Filter out directories and get only file names
        file_names = [entry.name for entry in entries if entry.is_file()]
        # Filter the list to only include .hdr files
        file_names = [f for f in file_names if f.endswith('.hdr')]

    # Regular expression matching header lines that start with the bare key NODATA (NODATA_value lines are kept)
    pattern = r'^NODATA\b(?!_value).*\n'

    # Iterate over the files in the directory
    for file_name in file_names:
        file_path = os.path.join(folder_path, file_name)

        with open(file_path, 'r') as file:
            lines = file.readlines()

        # Remove lines matching the pattern
        modified_lines = [line for line in lines if not re.match(pattern, line)]

        # Write the modified lines back to the file
        with open(file_path, 'w') as file:
            file.writelines(modified_lines)

        # Print progress or other relevant information
        print(f"Processed {file_path}\n")
