This notebook describes how to access and assemble the pre-cached and prepped data files from ScienceBase to build out an aggregate national table in PostGIS and then build materialized views for use in the Biogeographic Information System. It builds on processes run from the cacheNHDRepoCatalogs.py and cacheFlowlineData.py scripts in this repo.

In [1]:
import math

import requests
from IPython.display import display


def convert_size(size_bytes):
    """Return a human-readable size string (e.g. '165.44 MB') for a byte count."""
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))  # index of the largest unit that fits
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])

In [2]:
parentId = '5644f3c1e4b0aafbcd0188f1' # ScienceBase ID for the Data Reference Library; our overall virtual repository for this kind of stuff
tag = 'NHDPlusV1' # ScienceBase Tag applied to these particular items; essentially the mini-repositories for NHDPlusV1 data corresponding to FTP file directories
scienceBaseQueryURL = f'https://www.sciencebase.gov/catalog/items?parentId={parentId}&filter=tags%3D{tag}&fields=files&format=json'
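
The query URL above is assembled by hand, with the `=` inside the tag filter already percent-encoded as `%3D`. As a minimal sketch, the same URL can be built with the standard library's `urlencode`, which handles the escaping so that a tag containing spaces or other special characters would still produce a valid query. The function name `build_query_url` and the constant `SCIENCEBASE_ITEMS_URL` are illustrative names, not part of the original notebook.

```python
from urllib.parse import urlencode

# Base endpoint taken from the query URL used in this notebook
SCIENCEBASE_ITEMS_URL = "https://www.sciencebase.gov/catalog/items"


def build_query_url(parent_id, tag, fields="files"):
    """Build the ScienceBase item query URL (hypothetical helper)."""
    params = {
        "parentId": parent_id,
        "filter": f"tags={tag}",  # urlencode escapes the '=' as %3D
        "fields": fields,
        "format": "json",
    }
    return f"{SCIENCEBASE_ITEMS_URL}?{urlencode(params)}"


url = build_query_url("5644f3c1e4b0aafbcd0188f1", "NHDPlusV1")
print(url)
```

For the parent ID and tag used here, this yields the same string as the f-string version.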

In [3]:
processItems = requests.get(scienceBaseQueryURL).json()

Cutting right to the chase, there's not much to worry about at this point. All of the flowline files were created with a consistent file name pattern ending in FlowlineExtract.zip. They are zipped shapefiles containing an extraction of just the flowline data from each individual processing unit. Some of the river basins/geographic regions contain more than one processing unit, so we filter each item's file list on that pattern to pull the files we care about into a list.

The idea from here will be to run a utility to grab these files and load them into PostGIS. I output the file name and a human-readable size just for reference. The only thing needed is the file URL, which is used to download each file to wherever it needs to go for loading into PostGIS.

In [4]:
filePattern = 'FlowlineExtract.zip'

# Print the URL, name, and size of every file whose name contains the pattern
for item in processItems['items']:
    for f in item['files']:
        if filePattern in f['name']:
            print(f['url'], f['name'], convert_size(f['size']))

https://www.sciencebase.gov/catalog/file/get/5a3d5d1ee4b0d05ee8b8e6cc?f=__disk__18%2F11%2Fd5%2F1811d54f43ff8f06cdcb27e1ec47d5ed8390c868 NHDPlus05V01_03_NHD_FlowlineExtract.zip 165.44 MB
https://www.sciencebase.gov/catalog/file/get/5a3d5d1ee4b0d05ee8b8e6cc?f=__disk__72%2Ff3%2Fcd%2F72f3cda315895d6f2ab90e94f8117463e3396af4 NHDPlus06V01_03_NHD_FlowlineExtract.zip 51.48 MB
https://www.sciencebase.gov/catalog/file/get/5a3d5d1ee4b0d05ee8b8e6cc?f=__disk__9b%2Fda%2F33%2F9bda332a1534f5f67a3d966ca79d9ccf9f2d74f4 NHDPlus07V01_03_NHD_FlowlineExtract.zip 221.21 MB
https://www.sciencebase.gov/catalog/file/get/5a3d5d1ee4b0d05ee8b8e6cc?f=__disk__73%2F47%2F0f%2F73470fd86aa137afff0a40464d57d348ea46cfa2 NHDPlus08V01_03_NHD_FlowlineExtract.zip 156.24 MB
https://www.sciencebase.gov/catalog/file/get/5a3d5d1ee4b0d05ee8b8e6cc?f=__disk__06%2F73%2F9f%2F06739ff7687441135e9b328404336d794add0c33 NHDPlus10LV01_03_NHD_FlowlineExtract.zip 298.53 MB
https://www.sciencebase.gov/catalog/file/get/5a3d5d1ee4b0d05ee8b8e6cc?
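
Since the file URL is all that is needed to fetch each extract, the download step might look something like the sketch below. It uses only the standard library (`urllib.request` and `shutil`) rather than `requests`, and the helper names `select_flowline_files` and `download_file`, as well as the `nhd_downloads` directory, are hypothetical. It also assumes that an item may lack a `files` key, which has not been verified against the ScienceBase response.

```python
import shutil
import urllib.request
from pathlib import Path


def select_flowline_files(items, pattern="FlowlineExtract.zip"):
    """Return (url, name) pairs for every file whose name contains pattern."""
    selected = []
    for item in items:
        # Assumption: an item without attachments may omit 'files' entirely
        for f in item.get("files", []):
            if pattern in f["name"]:
                selected.append((f["url"], f["name"]))
    return selected


def download_file(url, dest_dir, name):
    """Stream one file to dest_dir/name and return the local path."""
    dest = Path(dest_dir) / name
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # copyfileobj copies in chunks, so large zips are not held in memory
        shutil.copyfileobj(response, out)
    return dest


# Example usage against the query result from the cell above:
# for url, name in select_flowline_files(processItems["items"]):
#     download_file(url, "nhd_downloads", name)
```

Separating the selection from the download keeps the filtering logic easy to check without touching the network.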

One of the main things we often need to do in our work is find the stream segments from flowline data that fall within some area of interest. There are a couple of ways to do this, but the first one we need to stand up uses the expedient of establishing a point at the midpoint of each multiline geometry. The following SQL is the query that defines a materialized view over a notional table for the aggregate data, instantiated in PostGIS as nhd.nhdplusv1_flowline. It reprojects each flowline to EPSG:5070 and interpolates a point halfway along the merged line. To create the view, wrap the query in a CREATE MATERIALIZED VIEW ... AS statement. The table name can be modified as needed for whatever we end up creating as schema/table.

```
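 -- The inner query reprojects each flowline to EPSG:5070; the outer query takes the point halfway along the merged line.
 -- ST_LineInterpolatePoint is the current name of st_line_interpolate_point, which was deprecated in PostGIS 2.1.
 -- It requires a single LineString, so a flowline whose parts ST_LineMerge cannot join will raise an error.
 SELECT transformed.gid,
    transformed.ftype,
    transformed.comid,
    transformed.lengthkm,
    (ST_LineInterpolatePoint(ST_LineMerge(transformed.the_geom), (0.5)::double precision))::geometry(Point,5070) AS the_geom
   FROM ( SELECT nhdplusv1_flowline.gid,
            nhdplusv1_flowline.ftype,
            nhdplusv1_flowline.comid,
            nhdplusv1_flowline.lengthkm,
            (ST_Transform(nhdplusv1_flowline.the_geom, 5070))::geometry(MultiLineString,5070) AS the_geom
           FROM nhd.nhdplusv1_flowline) transformed;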
```
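
As a rough sketch of how the query above could be turned into an actual materialized view and then used for an area-of-interest lookup, the Python below only assembles the SQL statements as strings; it does not connect to a database. The view name `nhd.nhdplusv1_flowline_midpoint`, the index name, and the helper `build_view_statements` are hypothetical. The SELECT is a condensed form of the query above and uses `ST_LineInterpolatePoint`, the current PostGIS name for `st_line_interpolate_point`.

```python
# Condensed form of the midpoint query; {source} is filled in by the helper below
MIDPOINT_SELECT = """
SELECT t.gid, t.ftype, t.comid, t.lengthkm,
       ST_LineInterpolatePoint(ST_LineMerge(t.the_geom), 0.5)::geometry(Point, 5070) AS the_geom
FROM (
    SELECT gid, ftype, comid, lengthkm,
           ST_Transform(the_geom, 5070)::geometry(MultiLineString, 5070) AS the_geom
    FROM {source}
) AS t
"""

# Area-of-interest lookup by bounding box; the four %s placeholders are
# xmin, ymin, xmax, ymax in EPSG:5070 units, supplied by a driver such as psycopg2
AOI_QUERY = (
    "SELECT comid FROM {view} "
    "WHERE ST_Intersects(the_geom, ST_MakeEnvelope(%s, %s, %s, %s, 5070));"
)


def build_view_statements(view="nhd.nhdplusv1_flowline_midpoint",
                          source="nhd.nhdplusv1_flowline"):
    """Return the SQL to create the view and a spatial index on it (hypothetical names)."""
    select_sql = MIDPOINT_SELECT.format(source=source).strip()
    index_name = view.replace(".", "_") + "_geom_idx"  # index names cannot be schema-qualified
    return [
        f"CREATE MATERIALIZED VIEW {view} AS {select_sql};",
        f"CREATE INDEX {index_name} ON {view} USING GIST (the_geom);",
    ]


for statement in build_view_statements():
    print(statement)
```

The GiST index is what makes the `ST_Intersects` filter fast on a national-scale table. Because the data sits in a materialized view, it would need a `REFRESH MATERIALIZED VIEW` whenever the underlying flowline table changes.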