# Advanced Usage: The `MorphingWorkflow` Class

This notebook demonstrates the advanced, step-by-step approach to morphing EPW files using the `MorphingWorkflow` class.

While the `morph_epw` function is great for direct, one-shot tasks, the `MorphingWorkflow` class is designed for complex projects where you need full control over **filename parsing, custom renaming, and process validation**. It enforces a safe, four-step process that allows you to review and confirm each stage before executing the time-consuming morphing computation.

## General Setup

First, we'll import the necessary class and set up the paths for our files. 

**Important:** You must change the `jar_path` variable to the correct location of the `FutureWeatherGenerator_v3.0.0.jar` file on your system. You must also ensure that the EPW files exist at the specified paths.

In [42]:
import os
from pyfwg import MorphingWorkflow

# --- Configuration ---
# !!! IMPORTANT: You MUST change this path to the correct location on your PC !!!
jar_path = r"D:\OneDrive - Universidad de Cádiz (uca.es)\Programas\FutureWeatherGenerator_v3.0.0.jar"

# --- Define file paths for the examples ---
pattern_epw_dir = 'epws/w_pattern'
keyword_epw_dir = 'epws/wo_pattern' # Assuming a similar folder for the second example
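
Since the paths above are machine-specific, it may help to confirm they exist before going further. The following is a minimal sketch using only the standard library; `missing_paths` is a hypothetical helper written for this notebook, not part of pyfwg.

```python
import os

def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

# Example usage with the variables defined above (uncomment in the notebook):
# print(missing_paths([jar_path, pattern_epw_dir, keyword_epw_dir]))
```

An empty list means every path was found; anything else lists the locations that need fixing.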

## Step 1: Map Categories from Filenames

This is the first and most critical step for organizing your workflow. The `map_categories` method analyzes your source filenames and extracts meaningful data that will be used later for renaming the output files.

### `map_categories` Parameters

*   `epw_files` (`List[str]`): **Required.** A list of paths to the EPW files you want to process.
*   `input_filename_pattern` (`Optional[str]`, default: `None`): A Python regex string with **named capture groups** (e.g., `(?P<city>...)`) to extract structured data from filenames.
*   `keyword_mapping` (`Optional[Dict]`, default: `None`): A dictionary of rules. Searches the entire filename for keywords to assign categories. This is ideal for irregularly named files.
    *The innermost value can be a single string or a list of strings (e.g., `'seville': ['sevilla', 'svq']`).*
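
To illustrate what named capture groups return, here is a small sketch with Python's standard `re` module. It assumes the pattern is applied to the filename without its extension; `extract_groups` is a hypothetical helper and does not represent pyfwg's internal code.

```python
import re

# Same style of pattern used later in this notebook: two named groups split by '_'
pattern = re.compile(r'(?P<city>.*?)_(?P<uhi>.*)')

def extract_groups(stem):
    """Return the named groups found in a filename stem, or {} if no match."""
    match = pattern.fullmatch(stem)
    return match.groupdict() if match else {}

print(extract_groups('MAD_uhi-type-2'))
# {'city': 'MAD', 'uhi': 'uhi-type-2'}
```

Each group name becomes a dictionary key, which is why the group names double as category names.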

In [43]:
print("--- Running Step 1: map_categories ---")

# Instantiate a workflow object for this example
workflow = MorphingWorkflow()

--- Running Step 1: map_categories ---


Let's look at the filenames of the EPW files we are going to work with. First, the set whose names follow a pattern that can be described with a regex:

In [44]:
# Define the list of files for this specific case
epw_files_with_pattern = [os.path.join(pattern_epw_dir, f) for f in os.listdir(pattern_epw_dir)]
epw_files_with_pattern

['epws/w_pattern\\MAD_uhi-type-2.epw',
 'epws/w_pattern\\sevilla_uhi-type-1.epw']

And second, the set whose names do not follow a pattern:

In [45]:
# Define the list of files for this specific case
epw_files_without_pattern = [os.path.join(keyword_epw_dir, f) for f in os.listdir(keyword_epw_dir)]
epw_files_without_pattern

['epws/wo_pattern\\MAD_ICU-type-2.epw',
 'epws/wo_pattern\\sevilla_in_this_one_the_uhi_is_type-1.epw']

In [46]:
# Define the mapping rules to normalize the extracted values
mapping_rules = {
    'city': {
        'seville': ['sevilla', 'SVQ'],
        'madrid': ['madrid', 'MAD']
    },
    'uhi': {
        'type_1': 'type-1',
        'type_2': 'type-2'
    }
}

### Example 1.1: Using a Regex Pattern with Normalization
First, if the filenames follow a regular pattern, you can pass a regex describing it through the `input_filename_pattern` argument.

In [47]:
workflow_w_pattern = MorphingWorkflow()
workflow_w_pattern.map_categories(
    epw_files=epw_files_with_pattern,
    # This pattern extracts raw values like 'MAD' and 'uhi-type-2'
    input_filename_pattern=r'(?P<city>.*?)_(?P<uhi>.*)',
    # This dictionary then normalizes matching values (e.g. 'MAD' becomes 'madrid');
    # in the output below, 'uhi-type-2' is left as extracted
    keyword_mapping=mapping_rules
)

2025-08-16 09:00:06 - INFO - --- Step 1: Mapping categories from filenames ---
2025-08-16 09:00:06 - INFO - Mapped 'epws/w_pattern\MAD_uhi-type-2.epw': {'city': 'madrid', 'uhi': 'uhi-type-2'}
2025-08-16 09:00:06 - INFO - Mapped 'epws/w_pattern\sevilla_uhi-type-1.epw': {'city': 'seville', 'uhi': 'uhi-type-1'}
2025-08-16 09:00:06 - INFO - Category mapping complete.


The mapped categories are saved to the attribute `.epw_categories` as a dictionary.

In [48]:
workflow_w_pattern.epw_categories

{'epws/w_pattern\\MAD_uhi-type-2.epw': {'city': 'madrid', 'uhi': 'uhi-type-2'},
 'epws/w_pattern\\sevilla_uhi-type-1.epw': {'city': 'seville',
  'uhi': 'uhi-type-1'}}

Let's display it as a DataFrame for easier reading:

In [49]:
import pandas as pd
df = pd.DataFrame(workflow_w_pattern.epw_categories)
df

| | epws/w_pattern\MAD_uhi-type-2.epw | epws/w_pattern\sevilla_uhi-type-1.epw |
|---|---|---|
| city | madrid | seville |
| uhi | uhi-type-2 | uhi-type-1 |


### Example 1.2: Using Keyword-Only Search

However, if the files are irregularly named and no regex pattern fits them, set `input_filename_pattern` to `None`. pyfwg will then search each filename for keywords and assign values according to the dictionary passed as `keyword_mapping`. Here that dictionary is `mapping_rules`, so let's take a quick look at it:

In [50]:
mapping_rules

{'city': {'seville': ['sevilla', 'SVQ'], 'madrid': ['madrid', 'MAD']},
 'uhi': {'type_1': 'type-1', 'type_2': 'type-2'}}

In this example, pyfwg creates a category named 'city'. For that category, it searches the EPW filenames for the strings 'sevilla' and 'SVQ'; if either one matches, the file is assigned the category value 'seville'. Let's continue and run `map_categories` on the files without a regex pattern:

In [51]:
workflow_wo_pattern = MorphingWorkflow()
workflow_wo_pattern.map_categories(
    epw_files=epw_files_without_pattern,
    input_filename_pattern=None,
    keyword_mapping=mapping_rules
)

2025-08-16 09:00:24 - INFO - --- Step 1: Mapping categories from filenames ---
2025-08-16 09:00:24 - INFO - Mapped 'epws/wo_pattern\MAD_ICU-type-2.epw': {'city': 'madrid', 'uhi': 'type_2'}
2025-08-16 09:00:24 - INFO - Mapped 'epws/wo_pattern\sevilla_in_this_one_the_uhi_is_type-1.epw': {'city': 'seville', 'uhi': 'type_1'}
2025-08-16 09:00:24 - INFO - Category mapping complete.
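
The keyword search described above can be approximated in plain Python. The sketch below reproduces the result shown in the log for these two filenames, but it is only an illustration: `assign_categories` is a hypothetical function, and the case-insensitive matching and "first matching value wins" behavior are assumptions rather than documented pyfwg behavior.

```python
def assign_categories(filename, rules):
    """Assign one value per category by searching the filename for keywords."""
    found = {}
    lowered = filename.lower()
    for category, values in rules.items():
        for value, keywords in values.items():
            # A rule may hold a single keyword or a list of keywords
            if isinstance(keywords, str):
                keywords = [keywords]
            if any(k.lower() in lowered for k in keywords):
                found[category] = value
                break  # assumption: the first matching value wins
    return found

rules = {
    'city': {'seville': ['sevilla', 'SVQ'], 'madrid': ['madrid', 'MAD']},
    'uhi': {'type_1': 'type-1', 'type_2': 'type-2'},
}
print(assign_categories('MAD_ICU-type-2', rules))
# {'city': 'madrid', 'uhi': 'type_2'}
```

Because the search looks anywhere in the name, short keywords such as 'MAD' can match unintended substrings, so it is worth checking the mapped result for every file.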


Again, the mapped categories are saved to the attribute `.epw_categories` as a dictionary.

In [52]:
workflow_wo_pattern.epw_categories

{'epws/wo_pattern\\MAD_ICU-type-2.epw': {'city': 'madrid', 'uhi': 'type_2'},
 'epws/wo_pattern\\sevilla_in_this_one_the_uhi_is_type-1.epw': {'city': 'seville',
  'uhi': 'type_1'}}

Let's display it as a DataFrame for easier reading:

In [53]:
import pandas as pd
df = pd.DataFrame(workflow_wo_pattern.epw_categories)
df

| | epws/wo_pattern\MAD_ICU-type-2.epw | epws/wo_pattern\sevilla_in_this_one_the_uhi_is_type-1.epw |
|---|---|---|
| city | madrid | seville |
| uhi | type_2 | type_1 |


In [None]:
print("\n--- Running Step 1: map_categories (keyword-only) ---")

# Instantiate a separate workflow object
workflow_keywords = MorphingWorkflow()

# For demonstration, reuse the files from Example 1.1
# but map them with keywords only (no regex pattern)
workflow_keywords.map_categories(
    epw_files=epw_files_with_pattern,
    input_filename_pattern=None, # Set to None to activate keyword-only mode
    keyword_mapping=mapping_rules
)

## Step 2: Preview the Rename Plan

Now that the categories are mapped, we can generate a "dry run" plan to see exactly how our files will be renamed and organized. This step is crucial for catching errors before any computation happens.

### `preview_rename_plan` Parameters

*   `final_output_dir` (`str`): **Required.** The path to the directory where your final, renamed files will be saved.
*   `output_filename_pattern` (`str`): **Required.** A template string for the final filenames. It uses placeholders in braces `{}` that **must match the category names** you defined in `map_categories`.
    *   **CRITICAL:** This pattern **MUST** contain the placeholders `{ssp}` and `{year}` to prevent generated files from being overwritten.
    *   *Example:* `'{city}_{uhi}_{ssp}_{year}'`
*   `scenario_mapping` (`Optional[Dict]`, default: `None`): A dictionary to translate raw scenario names (e.g., `'ssp126'`) into a descriptive format (e.g., `'SSP1-2.6'`) for the `{ssp}` placeholder.
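
Before calling the method, the placeholder rules above can be checked with the standard library. This is a minimal sketch, assuming the template follows Python's `str.format` syntax; `placeholders` is a hypothetical helper and the `categories` set simply mirrors the names used in Step 1.

```python
from string import Formatter

def placeholders(pattern):
    """Return the set of placeholder names used in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(pattern) if name}

pattern = '{city}_{uhi}_{ssp}_{year}'
categories = {'city', 'uhi'}   # names produced in Step 1
required = {'ssp', 'year'}     # needed to keep output names unique

used = placeholders(pattern)
print(required <= used)              # True
print(used - categories - required)  # set()
```

The first check confirms that `{ssp}` and `{year}` are present; the second lists any placeholder that has no matching category.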

In [None]:
print("\n--- Running Step 2: preview_rename_plan ---")

# We will continue with the workflow from Example 1.1, which already has its categories mapped.
workflow = workflow_w_pattern
workflow.preview_rename_plan(
    final_output_dir='./final_results_workflow',
    # The placeholders {city} and {uhi} match the pattern's capture group names
    output_filename_pattern='{city}_{uhi}_{ssp}_{year}',
    # The {ssp} placeholder will be populated from this mapping
    scenario_mapping={'ssp245': 'SSP2-4.5', 'ssp585': 'SSP5-8.5'}
)
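
To get a feel for what such a plan contains, the naming logic can be sketched with `str.format`. This is an illustration only: `build_plan` is a hypothetical function, the `years` values are assumed for the example, and the real preview may be formatted differently.

```python
from itertools import product

def build_plan(categories, pattern, scenario_mapping, years):
    """Return the output filename stems for one source file (nothing is written)."""
    return [
        pattern.format(ssp=label, year=year, **categories)
        for (_, label), year in product(scenario_mapping.items(), years)
    ]

plan = build_plan(
    categories={'city': 'madrid', 'uhi': 'uhi-type-2'},
    pattern='{city}_{uhi}_{ssp}_{year}',
    scenario_mapping={'ssp245': 'SSP2-4.5', 'ssp585': 'SSP5-8.5'},
    years=[2050, 2080],  # assumed years, for illustration only
)
for name in plan:
    print(name)
```

Each combination of scenario and year yields a distinct name, which is why both placeholders are mandatory in the template.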

## Step 3: Set the Morphing Configuration

This is the final configuration step. Here, you define all the parameters for the FutureWeatherGenerator tool itself and control the workflow's behavior (e.g., whether to delete temporary files).

### `set_morphing_config` Parameters

*   `fwg_jar_path` (`str`): **Required.** The path to the `FutureWeatherGenerator_v3.0.0.jar` file.
*   `run_incomplete_files` (`bool`, default: `False`): If `True`, the workflow will also process files that were only partially categorized.
*   `delete_temp_files` (`bool`, default: `True`): If `True`, temporary folders are deleted after processing.

> **Note:** The `fwg_show_tool_output` argument and all other `fwg_` arguments (e.g., `fwg_gcms`, `fwg_interpolation_method_id`, etc.) are identical to the ones in the `morph_epw` function. You can refer to the other notebook for a full list, or inspect the function's docstring.

You can also import the list of all valid GCMs to help you choose:
```python
from pyfwg import DEFAULT_GCMS
print(DEFAULT_GCMS)
```

In [None]:
print("\n--- Running Step 3: set_morphing_config ---")

from pyfwg import DEFAULT_GCMS
# print("Available GCMs:", DEFAULT_GCMS)

workflow.set_morphing_config(
    fwg_jar_path=jar_path,
    run_incomplete_files=False,
    delete_temp_files=False, # Set to False for debugging
    fwg_show_tool_output=True,
    fwg_gcms=['BCC_CSM2_MR'], # Use just one GCM for a quick test
    temp_base_dir=r'D:\temp_pyfwg_workflow' # Use a full path for clarity
)
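
Because the tool is shipped as a `.jar`, running it presumably requires a Java runtime. The sketch below is a hedged pre-flight check: `preflight` is a hypothetical helper, and looking for `java` on the PATH is an assumption about how the runtime is located, not something this notebook confirms.

```python
import os
import shutil

def preflight(jar_path):
    """Return a list of human-readable problems found before running the tool."""
    problems = []
    if not os.path.isfile(jar_path):
        problems.append(f"JAR not found: {jar_path}")
    if shutil.which("java") is None:
        # Assumption: the Java runtime is expected to be reachable via PATH
        problems.append("No 'java' executable found on PATH")
    return problems

# Example usage (uncomment in the notebook):
# print(preflight(jar_path))
```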

## Step 4: Execute the Morphing Process

This is the final step. The `execute_morphing` method takes **no arguments**. It acts as the "Go" button, running the entire process based on the configuration from the previous three steps.

As a safeguard, it will raise an error if the configuration is invalid, so it's good practice to check the `is_config_valid` flag first.

In [None]:
print("\n--- Running Step 4: execute_morphing ---")

# This is only called after you are satisfied with the preview and config.
# We check the is_config_valid flag first as a safeguard.

# Uncomment the following lines to run the actual morphing:
# if workflow.is_config_valid:
#     workflow.execute_morphing()
# else:
#     print("\nExecution skipped because the configuration is invalid. Please check the warnings above.")

print("Script finished. Uncomment the final lines to execute the morphing.")
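
The guard in the commented lines can also be wrapped in a small function. The sketch below relies only on the `is_config_valid` flag and the `execute_morphing` method mentioned above; `run_if_valid` and `DummyWorkflow` are hypothetical names, and the dummy class exists only so the example runs without pyfwg.

```python
def run_if_valid(workflow):
    """Call execute_morphing() only when the workflow reports a valid config."""
    if not workflow.is_config_valid:
        print("Execution skipped: configuration is invalid.")
        return False
    workflow.execute_morphing()
    return True

class DummyWorkflow:
    """Hypothetical stand-in used only to exercise run_if_valid."""
    def __init__(self, valid):
        self.is_config_valid = valid
        self.executed = False

    def execute_morphing(self):
        self.executed = True

ok = DummyWorkflow(valid=True)
bad = DummyWorkflow(valid=False)
print(run_if_valid(ok), ok.executed)    # True True
print(run_if_valid(bad), bad.executed)  # False False
```

With a real `MorphingWorkflow` object, the same function would be called as `run_if_valid(workflow)`.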