# Running year-specific workflows
This directory contains year-specific workflows that can be manually edited as needed. The current files all assume that the biomass estimates are extrapolated over the entire kriging mesh grid. Years 2003-2023 are configured to read in the unconsolidated Echoview exports, but an example of how to read in the pre-formatted NASC exports (similar to what is used for years 1995-2001) is included for each year.

## Workflow parameters

Each script file begins with a set of parameters, which vary from year to year depending on ingestion needs.

- `VERBOSE`: A boolean parameter that, when `True`, prints logging messages that indicate script progress.
- `DATA_ROOT`: A convenience parameter that defines the root directory of the dataset files. Importantly, all filepaths must be `pathlib.Path` objects.
- `REPORTS_DIR`: The directory where the `*.xlsx` report files are generated.
- `NASC_PREPROCESSED` *2003-2023 only*: A boolean parameter that, when `True`, reads in a pre-formatted NASC export file. Otherwise, the raw Echoview exports are read in and processed for use.
- `NASC_EXPORTS_FILES`/`NASC_EXPORTS_SHEET`: The filepath and sheetname of the associated export file(s), with the assumption that it is an `*.xlsx` file.
- `BIODATA_FILE`/`BIODATA_SHEETS_MAP`: The filepath and sheet-key mapping for the biological master spreadsheet. `BIODATA_SHEETS_MAP` is a dictionary that maps the `catch`, `length`, and `specimen` datasets to the names of their associated sheets, e.g.
```python
BIODATA_SHEETS_MAP = {
    "catch": "biodata_catch",
    "length": "biodata_length",
    "specimen": "biodata_specimen",
}
```
- `HAUL_STRATA_FILE`/`HAUL_STRATA_SHEETS_MAP`: The filepath and sheet-key mapping for the haul-based stratification definitions. `HAUL_STRATA_SHEETS_MAP` is a dictionary that maps the `INPFC` and `KS` strata to the names of their associated sheets, e.g.
```python
HAUL_STRATA_SHEETS_MAP = {
    "inpfc": "INPFC",
    "ks": "Base KS",
}
```
- `GEOSTRATA_FILE`/`GEOSTRATA_SHEETS_MAP`: The filepath and sheet-key mapping for the geographic-based stratification definitions. `GEOSTRATA_SHEETS_MAP` is a dictionary that maps the `INPFC` and `KS` strata to the names of their associated sheets, e.g.
```python
GEOSTRATA_SHEETS_MAP = {
    "inpfc": "INPFC",
    "ks": "stratification1",
}
```
- `KRIGING_MESH_FILE`/`KRIGING_MESH_SHEET`: The filepath and sheetname for the kriging mesh file.
- `KRIGING_VARIOGRAM_PARAMETERS_FILE`/`KRIGING_VARIOGRAM_PARAMETERS_SHEET`: The filepath and sheetname for the kriging and variogram parameters file.
- `ISOBATH_FILE`/`ISOBATH_SHEET`: The filepath and sheetname for the 200 m isobath file used for transforming spatial coordinates in the variogram and kriging analyses.
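
As a rough illustration of how a parameter block of this kind might look, the sketch below builds a few of the parameters listed above from `DATA_ROOT`. The directory layout, file names, and the `check_paths` helper are hypothetical and only serve to show the `pathlib.Path` requirement; the actual values differ from year to year.

```python
from pathlib import Path

# Hypothetical dataset location; replace with the actual root directory.
DATA_ROOT = Path("C:/Data/echopop_2023")
REPORTS_DIR = DATA_ROOT / "reports"

VERBOSE = True
NASC_PREPROCESSED = False  # False: read and process the raw Echoview exports

# Hypothetical file names, joined onto DATA_ROOT so each stays a Path object.
BIODATA_FILE = DATA_ROOT / "Biological" / "biodata_master.xlsx"
BIODATA_SHEETS_MAP = {
    "catch": "biodata_catch",
    "length": "biodata_length",
    "specimen": "biodata_specimen",
}
KRIGING_MESH_FILE = DATA_ROOT / "Kriging_files" / "kriging_mesh.xlsx"
KRIGING_MESH_SHEET = "Sheet1"  # hypothetical sheet name


def check_paths(**paths):
    """Return the names of parameters that are not pathlib.Path objects."""
    return [name for name, value in paths.items() if not isinstance(value, Path)]


# Hypothetical sanity check: an empty list means every filepath is a Path.
bad = check_paths(
    DATA_ROOT=DATA_ROOT,
    REPORTS_DIR=REPORTS_DIR,
    BIODATA_FILE=BIODATA_FILE,
    KRIGING_MESH_FILE=KRIGING_MESH_FILE,
)
```

Joining with the `/` operator keeps every derived filepath a `Path` object, so changing `DATA_ROOT` is enough to point the whole script at a different dataset copy.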

## Quick access and data processing
Once a file has been fully parameterized and configured, it can be run from the command line or from the code cells of a `Jupyter` notebook. This uses the `year_specific_workflow.py` file and, optionally, `run_year_specific.bat`. Both accept two arguments:
- `year`: The specific workflow year/file, e.g. `hake_2023`.
- `verbose`: When this is included, logging messages are recorded and printed out at the conclusion of the script's execution.
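
For readers unfamiliar with this style of command-line interface, the two arguments could be parsed along the following lines. This is a minimal sketch using the standard-library `argparse` module and is an assumption about the interface, not the actual contents of `year_specific_workflow.py`; the `build_parser` name is hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the two documented arguments."""
    parser = argparse.ArgumentParser(description="Run a year-specific workflow.")
    # The workflow year/file to run, e.g. hake_2023
    parser.add_argument("--year", required=True, help="Workflow year/file")
    # A flag: present means True, absent means False
    parser.add_argument("--verbose", action="store_true", help="Print logging messages")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--year", "hake_2023", "--verbose"])
print(args.year, args.verbose)  # prints "hake_2023 True"
```

Because `--verbose` is a `store_true` flag, it takes no value: including it switches logging on, and omitting it leaves logging off.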

### Python example (from terminal)

In [None]:
python year_specific_workflow.py --year hake_2023
python year_specific_workflow.py --year hake_2021 --verbose

### Python example (in a notebook)

In [None]:
!python year_specific_workflow.py --year hake_2023
!python year_specific_workflow.py --year hake_2021 --verbose
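
When several years need to be run from one notebook cell, the same commands can also be issued through the standard-library `subprocess` module instead of the `!` shell escape. The sketch below is one possible approach; `build_command` and `run_years` are hypothetical helpers, and it assumes `year_specific_workflow.py` is in the current working directory.

```python
import subprocess
import sys


def build_command(year, verbose=False):
    """Assemble the command line for one workflow year."""
    cmd = [sys.executable, "year_specific_workflow.py", "--year", year]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_years(years, verbose=False):
    """Run each workflow in turn and return a dict of year -> exit code."""
    results = {}
    for year in years:
        # check=False: a failing year does not stop the remaining ones
        completed = subprocess.run(build_command(year, verbose), check=False)
        results[year] = completed.returncode
    return results
```

A call such as `run_years(["hake_2021", "hake_2023"], verbose=True)` would then run both workflows sequentially, with a non-zero exit code marking any year that failed.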

### Batch example (from terminal)

In [None]:
run_year_specific --year hake_2019
run_year_specific --year hake_2017 --verbose

### Batch example (in a notebook)

In [None]:
!run_year_specific.bat --year hake_2019
!run_year_specific.bat --year hake_2017 --verbose

### Example verbose output

In [10]:
!python year_specific_workflow.py --year hake_1995 --verbose

Aged-length haul counts report saved to 'C:/Users/Brandyn/Documents/GitHub/EchoPro_data/echopop_1995/reports/aged_length_haul_counts.xlsx'.
Total haul length counts report saved to 'C:/Users/Brandyn/Documents/GitHub/EchoPro_data/echopop_1995/reports/total_length_haul_counts.xlsx'.
Kriged aged biomass mesh report saved to 'C:/Users/Brandyn/Documents/GitHub/EchoPro_data/echopop_1995/reports/kriged_aged_biomass_mesh_full.xlsx'.
Kriged aged biomass mesh report saved to 'C:/Users/Brandyn/Documents/GitHub/EchoPro_data/echopop_1995/reports/kriged_aged_biomass_mesh_nonzero.xlsx'.
Kriged mesh report saved to 'C:/Users/Brandyn/Documents/GitHub/EchoPro_data/echopop_1995/reports/kriged_biomass_mesh_full.xlsx'.
Kriged mesh report saved to 'C:/Users/Brandyn/Documents/GitHub/EchoPro_data/echopop_1995/reports/kriged_biomass_mesh_nonzero.xlsx'.
Kriged age-length abundance report saved to 'C:/Users/Brandyn/Documents/GitHub/EchoPro_data/echopop_1995/reports/kriged_length_age_abundance_report.xlsx'.
Krige

top-level pandera module will be **removed in a future version of pandera**.
If you're using pandera to validate pandas objects, we highly recommend updating
your import:

```
# old import
import pandera as pa

# new import
import pandera.pandas as pa
```

If you're using pandera to validate objects from other compatible libraries
like pyspark or polars, see the supported libraries section of the documentation
for more information on how to import pandera:

https://pandera.readthedocs.io/en/stable/supported_libraries.html



Reading pre-generated NASC export file: 'C:/Users/Brandyn/Documents/GitHub/EchoPro_data/echopop_1995/Exports/US&CAN_detailsa_1995_table1y+_ALL_final.xlsx'.
---- Converting AFSC NASC export format to FEAT
     Default interval distance: 0.5 nmi
     Default transect spacing: 10.0 nmi
     Including transects: 1 to 400
---- Filtering out off-effort transect intervals based on: C:\Users\Brandyn\Documents\GitHub\EchoPro_data\echopop_1995\Kriging_files\Kriging_g