Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[21pt] PR: Adding Open Street Map bridge data pulling and healing scripts #1122

Merged
merged 34 commits into from May 6, 2024

Conversation

LauraKeys-NOAA
Copy link
Contributor

@LauraKeys-NOAA LauraKeys-NOAA commented Apr 16, 2024

This PR includes 2 scripts to add Open Street Map bridge data into the HAND process: a script that pulls data from OSM and a script that heals those bridges in the HAND grids. Both scripts should be run as part of a pre-processing step for FIM runs. They only need to be run if we think OSM data has changed a lot or for any new FIM versions.

A new docker image is also required for pull_osm_bridges.py (acquire and preprocess) script.

Additions

  • data/bridges/pull_osm_bridges.py: First pre-processing script that pulls OSM data and saves bridge lines out as separate shapefiles by HUC8 to a specified location
  • src/heal_bridges_osm.py: Second pre-processing script that uses the pre-saved OSM bridge lines and heals max HAND values across those bridge lines. Healed HAND grids are saved to a specified location.

Changes

  • Pipfile, Pipfile.lock: Adjusted files to add new python package to docker image.
  • data
    • clip_vectors_to_wdbd.py: Updated to pre-clip new bridge data. Logging upgraded.
    • generate_pre_clip_fim_huc8.py: Updated to pre-clip new bridge data. Logging added and a system for multi-process logging.
  • src
    • delineate_hydros_and_produce_HAND.sh: add python call to run heal_bridges_osm.py after hydraulic properties are calculated.
    • bash_variables.env: Added new variable for OSM bridges and adjusted pre-clip output date
    • utils
      • shared_functions.py: removed function no longer in use.
      • shared_variables.py: removed variables no longer in use.

Deployment Plan

  • Create new docker container. Note: A new docker image will be held back for a week or so as the new container is only required for bridge data acquisition. Another docker image update is coming soon and this will be built as that time.
  • Run pull_osm_bridges.py to get all new raw OSM bridge data. Now completed. Saved the file as inputs/osm/bridges/240426/osm_all_bridges.gpkg.
  • Copy new raw bridge data, osm_all_bridges.gpkg to all 6 enviros.
  • Run generate_pre_clip_fim_huc8.py to get a full new CONUS and Alaska set of pre-clip files. This is now done using folder name of 240502, however another PR is coming soon requiring all Alaska HUCs to be re-run. After the new Alaska pre-clips are run, it can be merged into 240502.
  • Copy all new pre-clip data to all enviros.
  • Ready for fim pipeline run.

Testing

  • pull_osm_bridges.py:
    • Completed running the tool to load all full-CONUS and Alaska data. This creates a new input file
  • heal_bridges_osm.py:
    • tested on a full-CONUS run
    • tested on TX shapefiles with ~200 HUCs and 5 HUCs
    • tested with specified HUCs of interest list

Issuer Checklist (For developer use)

You may update this checklist before and/or after creating the PR. If you're unsure about any of them, please ask, we're here to help! These items are what we are going to look for before merging your code.

  • Informative and human-readable title, using the format: [_pt] PR: <description>
  • Links are provided if this PR resolves an issue, or depends on another other PR
  • If submitting a PR to the dev branch (the default branch), you have a descriptive Feature Branch name using the format: dev-<description-of-change> (e.g. dev-revise-levee-masking)
  • Changes are limited to a single goal (no scope creep)
  • The feature branch you're submitting as a PR is up to date (merged) with the latest dev branch
  • pre-commit hooks were run locally
  • Any change in functionality is tested
  • Passes all unit tests locally (inside interactive Docker container, at /foss_fim/, run: pytest unit_tests/)
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • CHANGELOG updated with template version number, e.g. 4.x.x.x
  • Reviewers requested
  • Add yourself as an assignee in the PR as well as the FIM Technical Lead

Merge Checklist (For Technical Lead use only)

  • Update CHANGELOG with latest version number and merge date
  • Update the Citation.cff file to reflect the latest version number in the CHANGELOG
  • If applicable, update README with major alterations

@LauraKeys-NOAA LauraKeys-NOAA marked this pull request as ready for review April 17, 2024 21:27
@EmilyDeardorff EmilyDeardorff added the enhancement New feature or request label Apr 18, 2024
@EmilyDeardorff
Copy link
Contributor

This is potentially user error but I've run into the following error. I'm wondering if I used the wrong input for the -w command? Or if there's a different issue.

I ran this command:

python3 /foss_fim/data/bridges/pull_osm_bridges.py -w /data/inputs/wbd/WBD_National_GDB.gdb -p /osm_bridges/test_outputs/ -l /osm_bridges/test_outputs/osm_bridges.shp -m /osm_bridges/test_outputs/osm_bridges_midpts.shp

and received this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 3652, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'HUC8'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/foss_fim/data/bridges/pull_osm_bridges.py", line 145, in <module>
    process_osm_bridges(**args)
  File "/foss_fim/data/bridges/pull_osm_bridges.py", line 75, in process_osm_bridges
    huc_bridge_file = os.path.join(location, f"huc{huc['HUC8']}_osm_bridges.shp")
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/series.py", line 1007, in __getitem__
    return self._get_value(key)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/series.py", line 1116, in _get_value
    loc = self.index.get_loc(label)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 3654, in get_loc
    raise KeyError(key) from err
KeyError: 'HUC8'

@LauraKeys-NOAA
Copy link
Contributor Author

Per @EmilyDeardorff 's finding, I'll modify the requirements to specify that the WBD / HUC8 shapefiles must contain the field "HUC8"

@EmilyDeardorff
Copy link
Contributor

I ran into the following error when testing the optional -w buffer width argument for heal_bridges_osm.py.

root@6bfc3e9cec1d:/# python3 foss_fim/src/heal_bridges_osm.py -g /data/outputs/dev-branch-outlet-backpools/ -s /osm_bridges/tx_hucs/tx_hucs.shp -b /osm_bridges/test_outputs/bridge_lines_folder/ -o /osm_bridges/test_outputs/ -u /osm_bridges/test_outputs/final_osm_hand/ -w 10
Opening CONUS HUC8 shapefile
** Processing 11080006
Traceback (most recent call last):
  File "//foss_fim/src/heal_bridges_osm.py", line 217, in <module>
    burn_bridges(**args)
  File "//foss_fim/src/heal_bridges_osm.py", line 122, in burn_bridges
    process_bridges_in_huc(
  File "//foss_fim/src/heal_bridges_osm.py", line 32, in process_bridges_in_huc
    osm_gdf.geometry = osm_gdf.buffer(buffer_width)
  File "/usr/local/lib/python3.10/dist-packages/geopandas/base.py", line 3634, in buffer
    return _delegate_geo_method(
  File "/usr/local/lib/python3.10/dist-packages/geopandas/base.py", line 85, in _delegate_geo_method
    data = getattr(a_this, op)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/geopandas/array.py", line 707, in buffer
    vectorized.buffer(self._data, distance, resolution=resolution, **kwargs),
  File "/usr/local/lib/python3.10/dist-packages/geopandas/_vectorized.py", line 1020, in buffer
    return shapely.buffer(data, distance, quad_segs=resolution, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/shapely/decorators.py", line 77, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/shapely/constructive.py", line 181, in buffer
    return lib.buffer(
TypeError: ufunc 'buffer' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

@EmilyDeardorff
Copy link
Contributor

I also ran into the following error with the -r resolution optional flag. This is possibly just a case of the wrong input format being used. I tried 10, '10', and 10m as inputs.

root@6bfc3e9cec1d:/# python3 foss_fim/src/heal_bridges_osm.py -g /data/outputs/dev-branch-outlet-backpools/ -s /osm_bridges/tx_hucs/tx_hucs.shp -b /osm_bridges/test_outputs/bridge_lines_folder/ -o /osm_bridges/test_outputs/ -u /osm_bridges/test_outputs/final_osm_hand/ -r 10
Opening CONUS HUC8 shapefile
** Processing 11080006
/usr/local/lib/python3.10/dist-packages/rasterstats/io.py:335: NodataWarning: Setting nodata to -999; specify nodata explicitly
  warnings.warn(
Traceback (most recent call last):
  File "//foss_fim/src/heal_bridges_osm.py", line 217, in <module>
    burn_bridges(**args)
  File "//foss_fim/src/heal_bridges_osm.py", line 122, in burn_bridges
    process_bridges_in_huc(
  File "//foss_fim/src/heal_bridges_osm.py", line 49, in process_bridges_in_huc
    w = (xmax - xmin) // resolution
TypeError: unsupported operand type(s) for //: 'float' and 'str'

@LauraKeys-NOAA
Copy link
Contributor Author

@EmilyDeardorff it looks like I have some old versions of packages like shapely that use those data, and when I updated them, I get those errors now. They're being treated as strings instead of numeric values, so they'll just need a typecast. I'll push that update in just a second.

@RobHanna-NOAA
Copy link
Contributor

Can I get some information to what is needed to plug this in. It sounds like I need to full CONUS scale pull for the new input files. Then how does the next part work? Do we need to run it against all HUCs as part of pre-clip? or is it part of the run_unit_wb.sh or part of the post processing?

@EmilyDeardorff
Copy link
Contributor

Can I get some information to what is needed to plug this in. It sounds like I need to full CONUS scale pull for the new input files. Then how does the next part work? Do we need to run it against all HUCs as part of pre-clip? or is it part of the run_unit_wb.sh or part of the post processing?

This PR introduces two new scripts to be run individually, in order. The first script to be run is data/bridges/pull_osm_bridges.py and the second script to be run is src/heal_bridges_osm.py. They are not currently called in the pre-clip script. From what I understand, they just need to be run every time we thing the OSM bridges might have changed? Although @LauraKeys-NOAA might be able to clarify that.

EmilyDeardorff
EmilyDeardorff previously approved these changes Apr 23, 2024
Copy link
Contributor

@EmilyDeardorff EmilyDeardorff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions/Notes:

  • I would suggest updating the sample usage text to have the updated path (python3 foss_fim/src/heal_bridges_osm.py) rather than the current text (python3 /data/bridges/heal_bridges_osm.py)
  • The most recent docker image (dev_20240416_adb6e36) does not have the necessary osmnx package, but that issue will be resolved after the merge. I got around this during testing by using pip install osmnx within the docker container.

Testing Results:

Tested data/bridges/pull_osm_bridges.py and src/heal_bridges_osm.py. on the Texas HUCs wbd (on Dev1 at /home/laura.keys/wbd/tx_hucs.shp).

I tested the following commands, both of which ran as expected:

python3 /foss_fim/data/bridges/pull_osm_bridges.py -w /osm_bridges/tx_hucs/tx_hucs.shp -p /osm_bridges/test_outputs/ -l /osm_bridges/test_outputs/osm_bridges.shp -m /osm_bridges/test_outputs/osm_bridges_midpts.shp → ✅worked as expected

python3 foss_fim/src/heal_bridges_osm.py -g /data/outputs/dev-branch-outlet-backpools/ -s /osm_bridges/tx_hucs/tx_hucs.shp -b /osm_bridges/test_outputs/bridge_lines_folder/ -o /osm_bridges/test_outputs/ -u /osm_bridges/test_outputs/final_osm_hand/ → ✅worked as expected

I also tested the following three optional commands for the heal_bridges_osm.py script:

  • -r --resolution → ✅worked as expected after adjustment
  • -w --buffer_width → ✅worked as expected after adjustment
  • -i --hucs_of_interest → ✅worked as expected

I looked over the output results and from these scripts and they looked like I was expecting. I did not run FIM pipeline on the results.

@LauraKeys-NOAA
Copy link
Contributor Author

I updated the sample usage text per @EmilyDeardorff 's suggestion (since the example path I had listed wasn't accurate in the least) and also added in a note where we would incorporate lidar into HAND grids in the future

@RobHanna-NOAA
Copy link
Contributor

We have a new python package which needs to be added and a new Pipfile and Pipfile.lock checked in with this branch. For details on how to do this see FIM4 - Updating Python Packages for Docker. I can help guide if you like.

@CarsonPruitt-NOAA
Copy link
Collaborator

CarsonPruitt-NOAA commented Apr 25, 2024

Just realized we have a new input that needs to be feed into pull_owm_bridges. It is a gkpg with just one layer of HUC8s. Test copies at smaller scale exist but no full scale versions. I looked around and no existed. Generally, we need to identify that they exist, then make/find a one (or get help making one if need be) and document it well. These are the types of things that are critical to know for testing and deployment. I looked and couldn't find one, so am loading up the full WBD_National.gkpg and will save out the HUC8 layer, then progagate it to all enviros. :)

Thinking about this a little more... It may be more efficient to change the code up a bit to optionally accept a layer name as an argument. If a layer name is specified, then the code will know to just load that specific layer from the geopackage. This way, we don't have to keep around 2 copies of the exact same dataset. The WBD gpkg is quite a large vector dataset.

@RobHanna-NOAA
Copy link
Contributor

RobHanna-NOAA commented Apr 25, 2024

Just realized we have a new input that needs to be feed into pull_owm_bridges. It is a gkpg with just one layer of HUC8s. Test copies at smaller scale exist but no full scale versions. I looked around and no existed. Generally, we need to identify that they exist, then make/find a one (or get help making one if need be) and document it well. These are the types of things that are critical to know for testing and deployment. I looked and couldn't find one, so am loading up the full WBD_National.gkpg and will save out the HUC8 layer, then progagate it to all enviros. :)

Thinking about this a little more... It may be more efficient to change the code up a bit to optionally accept a layer name as an argument. If a layer name is specified, then the code will know to just load that specific layer from the geopackage. This way, we don't have to keep around 2 copies of the exact same dataset. The WBD gpkg is quite a large vector dataset.

I was playing with the script. It is all built based on shape files and was finicky, so I changed it over to using only gkpg, which is also easier to deploy. So, I made a copy of the WBD_National.HUC8 layer, stripped out all of the HUCs in the HUC2 area of 22. I set it's default projection to 4269 as the OSM api seems to like that better. I am still testing it and making some tweaks but it is going well. When the script creates the final gkpg output, I make sure it is in 5070.

@CarsonPruitt-NOAA
Copy link
Collaborator

I really want to avoid having to keep the same dataset around in a separate gpkg just to have it in a different projection. Can we use our default input WBD gpkg and reproject to 4269 after we load it in?

@RobHanna-NOAA
Copy link
Contributor

I really want to avoid having to keep the same dataset around in a separate gpkg just to have it in a different projection. Can we use our default input WBD gpkg and reproject to 4269 after we load it in?

Yes. I had trouble getting it load correctly against OSM, but have that worked out now. We can now use the full WBD_National.gpkg.

@RobHanna-NOAA
Copy link
Contributor

Continued to have trouble with WBD_National.gpkg which I think is because it's ESRI:102039, but I when I switch to another WBD that we have in use, WBD_National_EPSG_5070.gpkg. The problem is that the OSM api seems to only like EPSG:4326 going in. When I tried to reproject from ESRI:102039, memory couldn't seem to handle it. In all other places in our tools, all are masked first. This was the first time in FIM that anyone loaded the full WBD_National asking for the HUC8 layer, then re-projecting it.

Strange. All is well now using WBD_National_EPSG_5070.gpkg. It still has all HUC layers (2, 4, 6, 8) but not sure why we don't have just a WBD_National with HUC8 only. Even WBD_National_EPSG_5070.gpkg has all 4 layers.

Oh well. I also added multi-proc, a ton of more outputs, error trapping, fixed some errors, etc. I am going to retry a full scale now and see how we do.

RobHanna-NOAA
RobHanna-NOAA previously approved these changes May 6, 2024
@CarsonPruitt-NOAA CarsonPruitt-NOAA merged commit 4466db0 into dev May 6, 2024
1 check passed
@CarsonPruitt-NOAA CarsonPruitt-NOAA deleted the dev-osm-bridges branch May 6, 2024 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[13pt] Create module for downloading OSM bridge data [21pt] OSM Bridge elevation interpolation technique
4 participants