(year-specific-workflows)=
# Run year-specific workflows
This directory contains year-specific workflows that can be manually edited as needed. The current files all assume that the biomass estimates are extrapolated over the entire kriging mesh grid. Years 2003-2023 are configured to read in the unconsolidated Echoview exports, but each of those years also includes an example of how to read in already consolidated NASC exports (similar to what is used for years 1995-2001).

## Workflow parameters

At the top of each script file are a variety of arguments, which vary from year to year depending on different ingestion needs. 

- `VERBOSE`: A boolean parameter that, when `True`, prints logging messages that indicate script progress.
- `DATA_ROOT`: A convenience parameter that defines the root directory of the dataset files. Importantly, all filepaths have to be `pathlib.Path` objects.
- `REPORTS_DIR`: The directory where the `*.xlsx` report files are generated.
- `NASC_PREPROCESSED` *2003-2023 only*: A boolean parameter that when `True` will read in a pre-formatted NASC export file. Otherwise, the raw Echoview exports will be read in and processed for use. 
- `NASC_EXPORTS_FILES`/`NASC_EXPORTS_SHEET`: The filepath and sheetname of the associated export file(s), with the assumption that it is an `*.xlsx` file.
- `BIODATA_FILE`/`BIODATA_SHEETS`: The filepath and sheet-key mapping for the biological master spreadsheet. `BIODATA_SHEETS` is a dictionary that maps the name of the sheets associated with the `catch`, `length`, and `specimen` datasets, e.g.
```python
BIODATA_SHEETS = {
    "catch": "biodata_catch",
    "length": "biodata_length",
    "specimen": "biodata_specimen",
}
```
- `HAUL_STRATA_FILE`/`HAUL_STRATA_SHEETS`: The filepath and sheet-key mapping for the haul-based stratification definitions. `HAUL_STRATA_SHEETS` is a dictionary that maps the name of the sheets associated with the `INPFC` and `KS` strata, e.g.
```python
HAUL_STRATA_SHEETS = {
    "inpfc": "INPFC",
    "ks": "Base KS",
}
```
- `GEOSTRATA_FILE`/`GEOSTRATA_SHEETS`: The filepath and sheet-key mapping for the geographic-based stratification definitions. `GEOSTRATA_SHEETS` is a dictionary that maps the name of the sheets associated with the `INPFC` and `KS` strata, e.g.
```python
GEOSTRATA_SHEETS = {
    "inpfc": "INPFC",
    "ks": "stratification1",
}
```
- `KRIGING_MESH_FILE`/`KRIGING_MESH_SHEET`: The filepath and sheetname for the kriging mesh file.
- `KRIGING_VARIOGRAM_PARAMETERS_FILE`/`KRIGING_VARIOGRAM_PARAMETERS_SHEET`: The filepath and sheetname for the kriging and variogram parameters file.
- `ISOBATH_FILE`/`ISOBATH_SHEET`: The filepath and sheetname for the 200m isobath file used for transforming spatial coordinates in the variogram and kriging analyses. 
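
As a rough illustration of the parameters listed above, the top of a workflow file might look something like the sketch below. The root directory, sub-directory names, and file names are hypothetical placeholders, and `find_non_path_parameters` is a helper invented here only to show one way of checking the `pathlib.Path` requirement; it is not part of the workflow files.

```python
from pathlib import Path

# Hypothetical root directory: replace with the location of your own dataset
DATA_ROOT = Path("C:/Data/EchopopData/echopop_2023")

VERBOSE = True
NASC_PREPROCESSED = False  # False: read and process the raw Echoview exports

# Hypothetical sub-directory and file names, built from DATA_ROOT
REPORTS_DIR = DATA_ROOT / "Reports"
BIODATA_FILE = DATA_ROOT / "Biological" / "biodata.xlsx"
BIODATA_SHEETS = {
    "catch": "biodata_catch",
    "length": "biodata_length",
    "specimen": "biodata_specimen",
}


def find_non_path_parameters(parameters):
    """Return the names of parameters that are not pathlib.Path objects."""
    return [
        name for name, value in parameters.items() if not isinstance(value, Path)
    ]


# Gather the filepath parameters so they can be checked in one place
path_parameters = {
    "DATA_ROOT": DATA_ROOT,
    "REPORTS_DIR": REPORTS_DIR,
    "BIODATA_FILE": BIODATA_FILE,
}
invalid = find_non_path_parameters(path_parameters)
if invalid:
    raise TypeError(f"These parameters must be pathlib.Path objects: {invalid}")
```

Building every path from `DATA_ROOT` with the `/` operator keeps the values as `Path` objects, so moving the dataset only requires changing one line.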

## Quick access and data processing
Once a workflow file has been fully parameterized and configured, it can be run from the command line or from the code cells of a `Jupyter` notebook via the `year_specific_workflow.py` script. This script accepts two arguments:
- `--year`: The specific workflow year/file, e.g. `hake_2023`.
- `--verbose`: When this flag is included, logging messages are recorded and printed out at the conclusion of the script's execution.

This utilizes CLI hooks contained within the year-specific workflow sub-package and workflow files. The following example commands illustrate both the year-specification and optionally toggling `--verbose`.

::::{tab-set}
:::{tab-item} Windows
:sync: tab1

```powershell
# PowerShell example
python year_specific_workflow.py --year hake_2023
python year_specific_workflow.py --year hake_2021 --verbose
```
:::
:::{tab-item} Linux/macOS
:sync: tab2

```bash
# Bash/Linux/macOS example
python3 year_specific_workflow.py --year hake_2023
python3 year_specific_workflow.py --year hake_2021 --verbose
```
:::
::::
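
For readers unfamiliar with how such flags are typically handled, the sketch below shows one common way to parse a `--year` value and an optional `--verbose` switch with Python's standard `argparse` module. It is only an illustration: `build_parser` is a name made up for this example, and the actual `year_specific_workflow.py` may be implemented differently.

```python
import argparse


def build_parser():
    """Create a parser with a required --year value and an optional --verbose switch."""
    parser = argparse.ArgumentParser(
        description="Run a year-specific workflow (illustrative sketch)."
    )
    parser.add_argument("--year", required=True, help="Workflow name, e.g. hake_2023")
    parser.add_argument(
        "--verbose", action="store_true", help="Print logging messages"
    )
    return parser


# Parse an explicit argument list, mirroring the second command shown above
args = build_parser().parse_args(["--year", "hake_2021", "--verbose"])
print(args.year, args.verbose)
```

Because `--verbose` uses `action="store_true"`, it defaults to `False` when the flag is omitted and takes no value when it is present.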

:::{admonition} Running scripts via Jupyter

Prefixing a command with an exclamation mark (`!`) tells Jupyter to pass it to the system shell, which enables the scripts (or custom-built batch/shell scripts) to be run from a notebook code cell. For example:

```powershell
!python year_specific_workflow.py --year hake_2001
```

runs the same command that would otherwise be entered in the terminal.
:::

Adding `--verbose` can provide a variety of logging information that can be manually inserted into the script files. For instance:

```powershell
!python year_specific_workflow.py --year hake_1995 --verbose
```

The same command can also be launched from Python with `subprocess`, which streams the output as it is produced:

```python
import subprocess

year = 1995

process = subprocess.Popen(
    ["python", "year_specific_workflow.py", "--year", f"hake_{year}", "--verbose"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True
)

# Stream the output live
for line in process.stdout:
    print(line, end="")
```

The output begins as follows (truncated here):

```text
Reading pre-generated NASC export file: 'C:/Data/EchopopData/echopop_1995/Exports/US&CAN_detailsa_1995_table1y+_ALL_final.xlsx'.
---- Converting AFSC NASC export format to FEAT
     Default interval distance: 0.5 nmi
     Default transect spacing: 10.0 nmi
     Including transects: 1 to 400
---- Filtering out off-effort transect intervals based on: C:\Data\EchopopData\echopop_1995\Kriging_files\Kriging_grid_files\Transect Bounds to 2011.xlsx
     Survey filter: 'survey == 199510'
3925 rows had missing vessel log distances.
These are incompatible with the workflow and will therefore be removed accordingly.
Longitude values in 'df_nasc' are in deg.W. However, they must be in deg.E. These have been converted accordingly.
Columns 'region_id', 'layer_height', and 'layer_mean_depth' missing. Stand-in values have been added.
Beginning biodata ingestion for: 'C:/Data/EchopopData/echopop_1995/Biological/1995-2023_biodata_redo.xlsx'.
Loading stratification files...
Load in haul-based stratificat
```

## Example batch scripting

These commands can also be run over multiple scripts by either directly inputting multiline commands into PowerShell/bash, or creating batch/shell files:

::::{tab-set}
:::{tab-item} Windows
:sync: tab1

```powershell
# PowerShell example
$years = 1995, 1998, 2001, 2003
$failedYears = @()
foreach ($year in $years) {
    Write-Host "Running workflow for year $year..."
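    try {
        python year_specific_workflow.py --year hake_$year --verbose
        # Native commands do not throw on a non-zero exit code, so check it explicitly
        if ($LASTEXITCODE -ne 0) { throw "Workflow exited with code $LASTEXITCODE" }
        Write-Host "✅ Completed year $year"
    }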
    catch {
        Write-Host "❌ Error encountered for year $year"
        $failedYears += $year
    }
}
```
:::
:::{tab-item} Linux/macOS
:sync: tab2

```bash
# Bash/Linux/macOS example
years=(1995 1998 2001 2003)
failed_years=()
for year in "${years[@]}"; do
    echo "Running workflow for year $year..."
    if python3 year_specific_workflow.py --year "hake_$year" --verbose; then
        echo "✅ Completed year $year"
    else
        echo "❌ Error encountered for year $year"
        failed_years+=("$year")
    fi
done
```
:::
::::
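
The same loop can be written in Python, which may be convenient when working from a Jupyter notebook. The following is a minimal sketch rather than part of the workflow package: `run_workflows` is a hypothetical helper, and it assumes that `year_specific_workflow.py` signals failure through a non-zero exit code.

```python
import subprocess
import sys


def run_workflows(years, script="year_specific_workflow.py"):
    """Run the workflow script once per year and return the years that failed."""
    failed_years = []
    for year in years:
        print(f"Running workflow for year {year}...")
        # sys.executable reuses the interpreter that is running this code
        result = subprocess.run(
            [sys.executable, script, "--year", f"hake_{year}", "--verbose"]
        )
        if result.returncode == 0:
            print(f"Completed year {year}")
        else:
            print(f"Error encountered for year {year}")
            failed_years.append(year)
    return failed_years


# Example call (commented out so that nothing runs on import):
# failed = run_workflows([1995, 1998, 2001, 2003])
# print("Failed years:", failed)
```

Returning the list of failed years makes it straightforward to re-run only those years after fixing their configuration.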