
[1 pt] PR: Add CI - GitHub Actions Workflow file #1030

Merged (13 commits) on Nov 17, 2023
22 changes: 22 additions & 0 deletions .github/workflows/lint_and_format.yaml
@@ -0,0 +1,22 @@
name: Lint and Format Using Pre-Commit

on:
pull_request:
branches:
- dev
- main
workflow_dispatch:

permissions:
contents: read

jobs:
lint-and-format:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version-file: pyproject.toml
- uses: pre-commit/action@v3.0.0
5 changes: 2 additions & 3 deletions .pre-commit-config.yaml
@@ -14,11 +14,10 @@ repos:
- id: check-json

- repo: https://github.com/PyCQA/flake8
rev: 6.0.0
rev: 6.1.0
hooks:
- id: flake8
entry: pflake8
additional_dependencies: [pyproject-flake8]
additional_dependencies: [flake8-pyproject]

- repo: https://github.com/psf/black
rev: 23.7.0
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -65,7 +65,7 @@ If you would like to contribute, please follow these steps:
# Check all files in the repo
pre-commit run -a

# Run only the black formatting tool
# Run only the flake8 formatting tool
pre-commit run -a flake8
```

2 changes: 1 addition & 1 deletion Pipfile
@@ -39,7 +39,7 @@ scipy = "==1.10.1"
gval = "==0.2.3"
flake8 = "==6.0.0"
black = "==23.7.0"
pyproject-flake8 = "==6.0.0.post1"
flake8-pyproject = "==1.2.3"
pre-commit = "==3.3.3"
isort = "==5.12.0"
urllib3 = "==1.26.18"
1,000 changes: 475 additions & 525 deletions Pipfile.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion data/wbd/generate_pre_clip_fim_huc8.py
@@ -131,7 +131,7 @@ def pre_clip_hucs_from_wbd(wbd_file, outputs_dir, huc_list, number_of_jobs, over
if number_of_jobs > total_cpus_available:
print(
f'Provided: -j {number_of_jobs}, which is greater than than amount of available cpus -2: '
f'{total_cpus_available -2} will be used instead.'
f'{total_cpus_available - 2} will be used instead.'
)
number_of_jobs = total_cpus_available - 2

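The pattern being tidied in this hunk — warning when `-j` exceeds the available CPUs and falling back to `total_cpus_available - 2` — can be sketched as a standalone helper. This is an illustrative rewrite, not code from the repo; the name `clamp_jobs` and the `reserve` parameter are invented here:

```python
import os

def clamp_jobs(number_of_jobs: int, reserve: int = 2) -> int:
    """Clamp a requested job count to the available CPUs minus a reserve."""
    total_cpus_available = os.cpu_count() or 1
    if number_of_jobs > total_cpus_available:
        # Mirrors the script's message, with the spacing fix applied.
        print(
            f'Provided: -j {number_of_jobs}, which is greater than the available cpus - {reserve}: '
            f'{total_cpus_available - reserve} will be used instead.'
        )
        number_of_jobs = total_cpus_available - reserve
    return number_of_jobs
```

Keeping the arithmetic inside one f-string expression (`{total_cpus_available - reserve}`) avoids the cramped `{total_cpus_available -2}` form the diff corrects.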
2 changes: 1 addition & 1 deletion data/write_parquet_from_calib_pts.py
@@ -231,7 +231,7 @@ def create_parquet_files(
if number_of_jobs > total_cpus_available:
logging.info(
f'Provided: -j {number_of_jobs}, which is greater than than amount of available cpus -1: '
f'{total_cpus_available -1} will be used instead.'
f'{total_cpus_available - 1} will be used instead.'
)
number_of_jobs = total_cpus_available - 1

39 changes: 39 additions & 0 deletions docs/CHANGELOG.md
@@ -1,6 +1,45 @@
All notable changes to this project will be documented in this file.
We follow the [Semantic Versioning 2.0.0](http://semver.org/) format.

## v4.4.6.0 - 2023-11-09 - [PR#1030](https://github.com/NOAA-OWP/inundation-mapping/pull/1030)

This PR introduces the `.github/workflows/lint_and_format.yaml` file which serves as the first step in developing a Continuous Integration pipeline for this repository.
The `flake8-pyproject` dependency is now used, as it works out of the box with the `pre-commit` GitHub Action in the GitHub Hosted Runner environment.
In switching to this package, a couple of `E721` errors appeared. Modifications were made to the appropriate files to resolve the `flake8` `E721` errors.
Also, updates to the `unit_tests` were necessary since Branch IDs have changed with the latest code.

A small fix was also included in `src_adjust_ras2fim_rating.py`, which sometimes fails with an encoding error when the ras2fim CSV is created or adjusted in Windows.

### Changes
- `.pre-commit-config.yaml`: use `flake8-pyproject` package instead of `pyproject-flake8`.
- `Pipfile` and `Pipfile.lock`: updated to use `flake8-pyproject` package instead of `pyproject-flake8`.
- `data`
  - `/wbd/generate_pre_clip_fim_huc8.py`: Add a space around the `-` operator, line 134.
  - `write_parquet_from_calib_pts.py`: Add a space around the `-` operator, line 234.
- `src`
  - `check_huc_inputs.py`: Change `== string` to `is str`; remove `import string`.
  - `src_adjust_ras2fim_rating.py`: Fixed encoding error.
- `tools`
  - `eval_plots.py`: Add a space after the comma, lines 207 & 208.
  - `generate_categorical_fim_mapping.py`: Use `is` instead of `==`, line 315.
  - `hash_compare.py`: Add a space after the comma, line 153.
  - `inundate_mosaic_wrapper.py`: Use `is` instead of `==`, line 73.
  - `inundation_wrapper_nwm_flows.py`: Use `is not` instead of `!=`, line 76.
  - `mosaic_inundation.py`: Use `is` instead of `==`, line 181.
- `unit_tests`
  - `clip_vectors_to_wbd_test.py`: File moved to the data/wbd directory; update import statement.
  - `filter_catchments_and_add_attributes_params.json`: Update Branch ID.
  - `outputs_cleanup_params.json`: Update Branch ID.
  - `split_flows_params.json`: Update Branch ID.
  - `usgs_gage_crosswalk_params.json`: Update Branch ID and update argument to gage_crosswalk.run_crosswalk.
  - `unit_tests/usgs_gage_crosswalk_test.py`: Update params to gage_crosswalk.run_crosswalk.

### Additions
- `.github/workflows/`
- `lint_and_format.yaml`: Add GitHub Actions Workflow file for Continuous Integration environment (lint and format test).

<br/><br/>

## v4.4.5.0 - 2023-10-26 - [PR#1018](https://github.com/NOAA-OWP/inundation-mapping/pull/1018)

During a recent BED attempt, which added the new pre-clip system, runs were erroring out on a number of HUCs with an error in the add_crosswalk.py script. While a minor bug does exist there, after a wide number of tests the true culprit turned out to be the memory profiling system embedded throughout FIM. This system has been around for at least a few years but is not in use. It is not 100% clear why it became a problem with the addition of pre-clip, but pre-clip changes how records are loaded, which likely affected memory at random times.
5 changes: 2 additions & 3 deletions src/check_huc_inputs.py
@@ -4,7 +4,6 @@
import argparse
import os
import pathlib
import string
from glob import glob
from logging import exception

@@ -59,8 +58,8 @@ def __clean_huc_value(huc):

def __check_for_membership(hucs, accepted_hucs_set):
for huc in hucs:
if (type(huc) == string) and (not huc.isnumeric()):
msg = f"Huc value of {huc} does not appear to be a number."
if (type(huc) is str) and (not huc.isnumeric()):
msg = f"Huc value of {huc} does not appear to be a number. "
msg += "It could be an incorrect value but also could be that the huc list "
msg += "(if you used one), is not unix encoded."
raise KeyError(msg)
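The fix in this hunk is the `E721`-style change the CHANGELOG describes: the old code compared against the `string` *module* (never the type of any value, so the branch could not fire), and flake8's E721 additionally discourages `type(x) == str` in favor of an identity check. A minimal standalone illustration — the function below is a simplified sketch, not the repo's exact implementation:

```python
def check_for_membership(hucs):
    """Raise if any string HUC value is not numeric."""
    for huc in hucs:
        # E721: compare types with `is`, not `==`; and compare against the
        # built-in `str` type, not the `string` module.
        if (type(huc) is str) and (not huc.isnumeric()):
            raise KeyError(f"Huc value of {huc} does not appear to be a number.")

check_for_membership(["02020005", 12345])  # numeric strings and ints pass
```

Note `isinstance(huc, str)` would be the more idiomatic check when subclasses are acceptable; `type(huc) is str` matches the style this PR adopts.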
4 changes: 3 additions & 1 deletion src/src_adjust_ras2fim_rating.py
@@ -52,7 +52,9 @@ def create_ras2fim_rating_database(ras_rc_filepath, ras_elev_df, nwm_recurr_file
print('Reading RAS2FIM rating curves from csv...')
log_text = 'Processing database for RAS2FIM flow/WSE at NWM flow recur intervals...\n'
col_filter = ["fid_xs", "flow", "wse"]
ras_rc_df = pd.read_csv(ras_rc_filepath, dtype={'fid_xs': object}, usecols=col_filter) # , nrows=30000)
ras_rc_df = pd.read_csv(
ras_rc_filepath, dtype={'fid_xs': object}, usecols=col_filter, encoding="unicode_escape"
) # , nrows=30000)
ras_rc_df.rename(columns={'fid_xs': 'location_id'}, inplace=True)
# ras_rc_df['location_id'] = ras_rc_df['feature_id'].astype(object)
print('Duration (read ras_rc_csv): {}'.format(dt.datetime.now() - start_time))
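The `encoding="unicode_escape"` argument added here sidesteps `UnicodeDecodeError`s when the CSV was authored on Windows with bytes that are invalid UTF-8. A stdlib-only sketch of the failure mode and the workaround — the byte string and file contents below are fabricated for illustration:

```python
import csv
import os
import tempfile

# Bytes as a Windows tool might write them: cp1252 en dash (0x96) is not valid UTF-8.
raw = b"fid_xs,flow,wse\r\nA\x9601,10.5,2.3\r\n"

path = os.path.join(tempfile.mkdtemp(), "ras_rc.csv")
with open(path, "wb") as f:
    f.write(raw)

try:
    with open(path, encoding="utf-8") as f:
        list(csv.reader(f))
except UnicodeDecodeError:
    print("utf-8 decode failed")

# "unicode_escape" accepts any byte sequence (latin-1 semantics plus \x escapes),
# so the read succeeds even for Windows-authored files. Caveat: literal
# backslashes in the data would be reinterpreted as escape sequences.
with open(path, encoding="unicode_escape") as f:
    rows = list(csv.reader(f))
```

`pd.read_csv(..., encoding="unicode_escape")` in the diff behaves the same way for pandas.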
4 changes: 2 additions & 2 deletions tools/eval_plots.py
@@ -204,8 +204,8 @@ def scatterplot(dataframe, x_field, y_field, title_text, stats_text=False, annot
axes.tick_params(labelsize='xx-large')

# Define y axis label and x axis label.
axes.set_ylabel(f'{y_field.replace("_"," ")}', fontsize='xx-large', weight='bold')
axes.set_xlabel(f'{x_field.replace("_"," ")}', fontsize='xx-large', weight='bold')
axes.set_ylabel(f'{y_field.replace("_", " ")}', fontsize='xx-large', weight='bold')
axes.set_xlabel(f'{x_field.replace("_", " ")}', fontsize='xx-large', weight='bold')

# Plot diagonal line
diag_range = [0, 1]
2 changes: 1 addition & 1 deletion tools/generate_categorical_fim_mapping.py
@@ -312,7 +312,7 @@ def reformat_inundation_maps(
handle = os.path.split(extent_grid)[1].replace('.tif', '')
diss_extent_filename = os.path.join(gpkg_dir, f"{handle}_{huc}_dissolved.gpkg")
extent_poly_diss["geometry"] = [
MultiPolygon([feature]) if type(feature) == Polygon else feature
MultiPolygon([feature]) if type(feature) is Polygon else feature
for feature in extent_poly_diss["geometry"]
]

2 changes: 1 addition & 1 deletion tools/hash_compare.py
@@ -150,7 +150,7 @@ def compare_gpkg(file1, file2, list_of_failed_files=[], verbose=False):
except AssertionError as e:
print(f"\n {str(e)} \n")
print(" The following files failed assert_geodataframe_equal: ")
print(f" -{file1.rsplit('/',1)[-1]} ")
print(f" {file1.rsplit('/', 1)[-1]} ")
list_of_failed_files.append(f1_gdf)


3 changes: 2 additions & 1 deletion tools/inundate_mosaic_wrapper.py
@@ -70,8 +70,9 @@ def produce_mosaicked_inundation(
raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), hydrofabric_dir)

# If the "hucs" argument is really one huc, convert it to a list
if type(hucs) == str:
if type(hucs) is str:
hucs = [hucs]

# Check that huc folder exists in the hydrofabric_dir.
for huc in hucs:
if not os.path.exists(os.path.join(hydrofabric_dir, huc)):
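The single-HUC normalization in this hunk ("if the argument is really one huc, convert it to a list") is a common accept-scalar-or-list pattern. A standalone version, with the function name invented for the example:

```python
def ensure_list(hucs):
    """Accept either one HUC string or a list of HUC strings; always return a list."""
    # `type(hucs) is str` matches the flake8-clean style this PR adopts;
    # isinstance(hucs, str) would additionally accept str subclasses.
    if type(hucs) is str:
        hucs = [hucs]
    return hucs

print(ensure_list("02020005"))    # ['02020005']
print(ensure_list(["01", "02"]))  # ['01', '02']
```

Downstream code can then iterate unconditionally, as `produce_mosaicked_inundation` does with its per-HUC existence checks.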
2 changes: 1 addition & 1 deletion tools/inundation_wrapper_nwm_flows.py
@@ -73,7 +73,7 @@ def run_recurr_test(fim_run_dir, branch_name, huc_id, magnitude, mask_type='huc'

# Check if magnitude is list of magnitudes or single value.
magnitude_list = magnitude
if type(magnitude_list) != list:
if type(magnitude_list) is not list:
magnitude_list = [magnitude_list]

for magnitude in magnitude_list:
2 changes: 1 addition & 1 deletion tools/mosaic_inundation.py
@@ -178,7 +178,7 @@ def mosaic_final_inundation_extent_to_poly(inundation_raster, inundation_polygon
extent_poly = gpd.GeoDataFrame.from_features(list(results), crs=src.crs)
extent_poly_diss = extent_poly.dissolve(by="extent")
extent_poly_diss["geometry"] = [
MultiPolygon([feature]) if type(feature) == Polygon else feature
MultiPolygon([feature]) if type(feature) is Polygon else feature
for feature in extent_poly_diss["geometry"]
]

3 changes: 2 additions & 1 deletion unit_tests/clip_vectors_to_wbd_test.py
@@ -4,10 +4,11 @@
import os
import unittest

import clip_vectors_to_wbd as src
import pytest
from unit_tests_utils import FIM_unit_test_helpers as ut_helpers

import data.wbd.clip_vectors_to_wbd as src


class test_clip_vectors_to_wbd(unittest.TestCase):

8 changes: 4 additions & 4 deletions unit_tests/filter_catchments_and_add_attributes_params.json
@@ -1,10 +1,10 @@
{
"valid_data": {
"outputDestDir": "/outputs/unit_test_data",
"input_catchments_filename": "/data/outputs/unit_test_data/02020005/branches/3246000006/gw_catchments_reaches_3246000006.gpkg",
"input_flows_filename": "/data/outputs/unit_test_data/02020005/branches/3246000006/demDerived_reaches_split_3246000006.gpkg",
"output_catchments_filename": "/data/outputs/unit_test_data/02020005/branches/3246000006/gw_catchments_reaches_filtered_addedAttributes_3246000006.gpkg",
"output_flows_filename": "/data/outputs/unit_test_data/02020005/branches/3246000006/demDerived_reaches_split_filtered_3246000006.gpkg",
"input_catchments_filename": "/data/outputs/unit_test_data/02020005/branches/2274000033/gw_catchments_reaches_2274000033.gpkg",
"input_flows_filename": "/data/outputs/unit_test_data/02020005/branches/2274000033/demDerived_reaches_split_2274000033.gpkg",
"output_catchments_filename": "/data/outputs/unit_test_data/02020005/branches/2274000033/gw_catchments_reaches_filtered_addedAttributes_2274000033.gpkg",
"output_flows_filename": "/data/outputs/unit_test_data/02020005/branches/2274000033/demDerived_reaches_split_filtered_2274000033.gpkg",
"wbd_filename": "/data/outputs/unit_test_data/02020005/wbd8_clp.gpkg",
"huc_code": "02020005"
}
4 changes: 2 additions & 2 deletions unit_tests/outputs_cleanup_params.json
@@ -1,8 +1,8 @@
{
"valid_specific_branch_data": {
"src_dir": "/data/outputs/unit_test_data/02020005/branches/3246000009",
"src_dir": "/data/outputs/unit_test_data/02020005/branches/2274000018",
"deny_list": "/foss_fim/config/deny_branches.lst",
"branch_id": "3246000009",
"branch_id": "2274000018",
"verbose": true
},
"valid_directory_data": {
8 changes: 4 additions & 4 deletions unit_tests/split_flows_params.json
@@ -4,10 +4,10 @@
"max_length": 1500,
"slope_min": 0.001,
"lakes_buffer_input": 20,
"flows_filename": "/data/outputs/unit_test_data/02020005/branches/3246000005/demDerived_reaches_3246000005.shp",
"dem_filename": "/data/outputs/unit_test_data/02020005/branches/3246000005/dem_thalwegCond_3246000005.tif",
"split_flows_filename": "/data/outputs/unit_test_data/02020005/branches/3246000005/demDerived_reaches_split_3246000005.gpkg",
"split_points_filename": "/data/outputs/unit_test_data/02020005/branches/3246000005/demDerived_reaches_split_points_3246000005.gpkg",
"flows_filename": "/data/outputs/unit_test_data/02020005/branches/2274000031/demDerived_reaches_2274000031.shp",
"dem_filename": "/data/outputs/unit_test_data/02020005/branches/2274000031/dem_thalwegCond_2274000031.tif",
"split_flows_filename": "/data/outputs/unit_test_data/02020005/branches/2274000031/demDerived_reaches_split_2274000031.gpkg",
"split_points_filename": "/data/outputs/unit_test_data/02020005/branches/2274000031/demDerived_reaches_split_points_2274000031.gpkg",
"wbd8_clp_filename": "/data/outputs/unit_test_data/02020005/wbd8_clp.gpkg",
"lakes_filename": "/data/outputs/unit_test_data/02020005/nwm_lakes_proj_subset.gpkg",
"nwm_streams_filename": "/data/outputs/unit_test_data/02020005/nwm_subset_streams_levelPaths.gpkg"
13 changes: 7 additions & 6 deletions unit_tests/usgs_gage_crosswalk_params.json
@@ -1,11 +1,12 @@
{
"valid_data": {
"usgs_gages_filename": "/data/outputs/unit_test_data/02020005/usgs_subset_gages.gpkg",
"input_flows_filename": "/data/outputs/unit_test_data/02020005/branches/3246000005/demDerived_reaches_split_filtered_3246000005.gpkg",
"input_catchment_filename": "/data/outputs/unit_test_data/02020005/branches/3246000005/gw_catchments_reaches_filtered_addedAttributes_3246000005.gpkg",
"dem_filename": "/data/outputs/unit_test_data/02020005/branches/3246000005/dem_meters_3246000005.tif",
"dem_adj_filename": "/data/outputs/unit_test_data/02020005/branches/3246000005/dem_thalwegCond_3246000005.tif",
"output_table_filename": "/data/outputs/unit_test_data/02020005/branches/3246000005/usgs_elev_table.csv",
"branch_id": "3246000005"
"input_flows_filename": "/data/outputs/unit_test_data/02020005/branches/2274000028/demDerived_reaches_split_filtered_2274000028.gpkg",
"input_catchment_filename": "/data/outputs/unit_test_data/02020005/branches/2274000028/gw_catchments_reaches_filtered_addedAttributes_2274000028.gpkg",
"dem_filename": "/data/outputs/unit_test_data/02020005/branches/2274000028/dem_meters_2274000028.tif",
"dem_adj_filename": "/data/outputs/unit_test_data/02020005/branches/2274000028/dem_thalwegCond_2274000028.tif",
"output_table_filename": "/data/outputs/unit_test_data/02020005/branches/2274000028/usgs_elev_table.csv",
"output_directory": "/data/outputs/unit_test_data/02020005/branches/2274000028",
"branch_id": "2274000028"
}
}
2 changes: 1 addition & 1 deletion unit_tests/usgs_gage_crosswalk_test.py
@@ -52,7 +52,7 @@ def test_GageCrosswalk_success(self):
params["input_flows_filename"],
params["dem_filename"],
params["dem_adj_filename"],
params["output_table_filename"],
params["output_directory"],
)

# Make sure that the usgs_elev_table.csv was written