# Get Global Hydrography from NGA TDX-Hydro 

This notebook demonstrates how to use functions in the [WikiWatershed/global-hydrography](https://github.com/WikiWatershed/global-hydrography) package to fetch data files from the TDX-Hydro datasets released by the [US National Geospatial-Intelligence Agency (NGA)](https://www.nga.mil).

It uses processes that were explored in these notebooks:
- `sandbox/explore_data_sources.ipynb`
- `sandbox/reading_files.ipynb`

## Python Imports
Using common conventions and following the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html): 
- https://google.github.io/styleguide/pyguide.html#s2.2-imports

In [1]:
import os
from pathlib import Path
from importlib import reload

import fsspec
# import pandas as pd
import geopandas as gpd
# import pyogrio
# import pyarrow as pa

## Set Paths for Data Inputs/Outputs
Use the [`pathlib`](https://docs.python.org/3/library/pathlib.html) library. [This blog post](https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f) describes its many benefits for managing paths over the `os` library or string-based approaches.
- [pathlib](https://docs.python.org/3/library/pathlib.html) user guide: https://realpython.com/python-pathlib/
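A minimal illustration of the `pathlib` operations this notebook relies on: the `/` join operator, `.parent`, `.stem`, and `.suffix`. The paths are hypothetical. `PurePosixPath` is used instead of `Path` so that the sketch does no filesystem I/O and prints the same result on any OS.

```python
from pathlib import PurePosixPath

# Hypothetical paths for illustration only; PurePosixPath never touches the disk
working_dir = PurePosixPath('/home/user/global-hydrography/notebooks')
project_dir = working_dir.parent              # one level up
data_dir = project_dir / 'data_temp'          # '/' joins path segments
geojson_path = data_dir / 'nga' / 'hydrobasins_level2.geojson'

print(geojson_path)         # /home/user/global-hydrography/data_temp/nga/hydrobasins_level2.geojson
print(geojson_path.stem)    # hydrobasins_level2
print(geojson_path.suffix)  # .geojson
```

Because `.stem` drops the extension, the notebook can use it later to turn a local filename into the `file=` query value of a download URL.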

In [2]:
# Confirm your current working directory (cwd) and repo/project directory
working_dir = Path.cwd()
project_dir = working_dir.parent
data_dir = project_dir / 'data_temp' # a temporary data directory that we .gitignore
data_dir

PosixPath('/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp')

## Create local file system using `fsspec` library

We'll use the Filesystem Spec ([`fsspec`](https://filesystem-spec.readthedocs.io)) library and its extensions throughout this project to provide a unified pythonic interface to local, remote and embedded file systems and bytes storage.
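The following small sketch illustrates the unified interface using fsspec's built-in in-memory filesystem (the `'memory'` protocol) instead of the local disk. The path `/demo/hello.txt` is arbitrary. The same `exists`, `cat`, and `info` methods are available on `local_fs` and on the HTTP filesystem used later in this notebook.

```python
import fsspec

# In-memory filesystem: same method names as the 'local' and 'http' filesystems
fs = fsspec.filesystem('memory')

fs.pipe('/demo/hello.txt', b'hello')       # write bytes to a path

print(fs.exists('/demo/hello.txt'))        # True
print(fs.cat('/demo/hello.txt'))           # b'hello'
print(fs.info('/demo/hello.txt')['size'])  # 5
```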

In [3]:
# Create local file system using fsspec library
# local_fs = fsspec.implementations.local.LocalFileSystem()
local_fs = fsspec.filesystem('local') 

In [4]:
# List files in our temporary data directory
local_fs.ls(data_dir)

['/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/global_hydrography.qgz',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/geoglows-v2',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/.DS_Store',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/test_downcast_gdf.parquet',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/test_gdf.parquet',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/test_pa_gdf.parquet',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/test_ga_pa_df.parquet',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/test_gpd_gdf.parquet',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/test_pa_geo_df.parquet',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/io_10m_annual_lulc',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/nhdplus2',
 '/Users/aaufdenkampe/Docume

In [5]:
# List file details (equivalent to file info)
local_data_list = local_fs.ls(data_dir, detail=True)
# Show first item's details
local_data_list[0]

{'name': '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/global_hydrography.qgz',
 'size': 136499,
 'type': 'file',
 'created': 1721235118.8378024,
 'islink': False,
 'mode': 33188,
 'uid': 502,
 'gid': 20,
 'mtime': 1721235118.8372164,
 'ino': 301838904,
 'nlink': 1}
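The `detail=True` listing is a list of plain dicts, which makes it easy to summarize. The helper below is a minimal sketch with a hypothetical name (`summarize_listing`). It keeps only entries whose `type` is `'file'`, sorts them by size, and converts the epoch `mtime` to a UTC datetime. The example entries copy the values shown above, with shortened paths.

```python
from datetime import datetime, timezone

def summarize_listing(entries):
    """Summarize the list of dicts returned by fsspec's ls(path, detail=True).

    Returns (base_name, size_MB, modified_UTC) tuples for files only,
    largest first. Directory entries (type == 'directory') are skipped.
    """
    files = [e for e in entries if e.get('type') == 'file']
    files.sort(key=lambda e: e['size'], reverse=True)
    return [
        (
            e['name'].rsplit('/', 1)[-1],                         # base name only
            round(e['size'] / 1e6, 3),                            # bytes -> MB
            datetime.fromtimestamp(e['mtime'], tz=timezone.utc),  # epoch -> datetime
        )
        for e in files
    ]

# Example input shaped like the output above (paths shortened)
entries = [
    {'name': '/data_temp/global_hydrography.qgz', 'size': 136499,
     'type': 'file', 'mtime': 1721235118.8372164},
    {'name': '/data_temp/nga', 'size': 0, 'type': 'directory',
     'mtime': 1721235118.0},
]
for row in summarize_listing(entries):
    print(row)
```

In the notebook, the call would be `summarize_listing(local_data_list)`.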

# NGA TDX-Hydro

Data are downloadable from the National Geospatial-Intelligence Agency (NGA) Office for Geomatics website, https://earth-info.nga.mil/, under the "Geosciences" tab.

The [TDX-Hydro Technical Document](https://earth-info.nga.mil/php/download.php?file=tdx-hydro-technical-doc) provides detailed information on how the datasets were developed and validated.

In [6]:
# Define the local data directory for NGA files (this only builds the path)
tdx_dir = data_dir / 'nga'
tdx_dir

PosixPath('/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/nga')

In [7]:
local_fs.exists(tdx_dir)

True
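Here `local_fs.exists(tdx_dir)` returned `True` because the directory was already present. On a fresh clone it might not be. A minimal sketch of creating it idempotently with `pathlib` follows. It works under a throwaway temporary directory (`demo_root`, a hypothetical stand-in for `data_dir`), so it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# Throwaway stand-in for data_dir, so running this never touches real data
demo_root = Path(tempfile.mkdtemp())
demo_tdx_dir = demo_root / 'nga'

# Create the directory plus any missing parents; no error if it already exists
demo_tdx_dir.mkdir(parents=True, exist_ok=True)
demo_tdx_dir.mkdir(parents=True, exist_ok=True)  # second call is a no-op
print(demo_tdx_dir.is_dir())  # True
```

In the notebook, the equivalent call is `tdx_dir.mkdir(parents=True, exist_ok=True)`. fsspec's `local_fs.makedirs(str(tdx_dir), exist_ok=True)` does the same through the filesystem-agnostic interface.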

## Get TDX-Hydro Metadata

TDX-Hydro datasets are organized into 55 continental sub-units using the same 10-digit Level 2 codes (HYBAS_ID) developed by [HydroSHEDS v1 HydroBASINS](https://www.hydrosheds.org/products/hydrobasins). More information on the semantics of these codes is provided in the [HydroBASINS Technical Documentation](https://data.hydrosheds.org/file/technical-documentation/HydroBASINS_TechDoc_v1c.pdf).

The exact boundaries of the 55 TDX-Hydro continental sub-units differ from those of HydroBASINS and are provided by the "Basin GeoJSON File with ID Numbers" (95 MB).
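Below is a small sketch of decoding a 10-digit HYBAS_ID. It assumes the HydroBASINS convention: the first digit is the continental region and the next two digits are the Pfafstetter level, so Level 2 IDs carry `02`. The region names are our transcription of the HydroBASINS table and should be checked against the Technical Documentation before use. The ID `7020000010` is hypothetical, and `parse_hybas_id` is an illustrative helper, not part of the package.

```python
# Region codes transcribed from the HydroBASINS Technical Documentation
# (assumption: verify against the PDF before relying on them)
HYBAS_REGIONS = {
    1: 'Africa', 2: 'Europe', 3: 'Siberia', 4: 'Asia', 5: 'Australia',
    6: 'South America', 7: 'North America', 8: 'Arctic (North America)',
    9: 'Greenland',
}

def parse_hybas_id(hybas_id):
    """Split a 10-digit HYBAS_ID into (region_code, region_name, level)."""
    s = str(hybas_id)
    if len(s) != 10 or not s.isdigit():
        raise ValueError(f'Expected a 10-digit HYBAS_ID, got {hybas_id!r}')
    region = int(s[0])   # 1st digit: continental region
    level = int(s[1:3])  # 2nd-3rd digits: Pfafstetter level
    return region, HYBAS_REGIONS.get(region, 'unknown'), level

print(parse_hybas_id(7020000010))  # (7, 'North America', 2)
```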

In [8]:
tdx_root_url = 'https://earth-info.nga.mil/php/download.php'

tdx_hydrobasins_filename = Path('hydrobasins_level2.geojson')
tdx_hydrobasins_filepath = tdx_dir / tdx_hydrobasins_filename
# Download URL for Basin GeoJSON File with ID Numbers
tdx_hydrobasins_url = f'{tdx_root_url}?file={tdx_hydrobasins_filename.stem}'
tdx_hydrobasins_url

'https://earth-info.nga.mil/php/download.php?file=hydrobasins_level2'
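The f-string above works because `hydrobasins_level2` contains only URL-safe characters. A slightly more defensive sketch builds the query string with the standard library's `urllib.parse.urlencode`, which percent-escapes anything unsafe. `tdx_download_url` is a hypothetical helper. Besides `hydrobasins_level2`, the only `file=` key confirmed here is `tdx-hydro-technical-doc`, taken from the Technical Document link above.

```python
from urllib.parse import urlencode

tdx_root_url = 'https://earth-info.nga.mil/php/download.php'

def tdx_download_url(file_key):
    """Build an NGA download URL from the value of its 'file' query parameter."""
    return f"{tdx_root_url}?{urlencode({'file': file_key})}"

print(tdx_download_url('hydrobasins_level2'))
# https://earth-info.nga.mil/php/download.php?file=hydrobasins_level2
print(tdx_download_url('tdx-hydro-technical-doc'))
# https://earth-info.nga.mil/php/download.php?file=tdx-hydro-technical-doc
```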

In [9]:
# Set up an HTTP file system for TDX-Hydro. Unfortunately, the NGA server doesn't
# expose browsable directories, so files must be accessed one at a time by URL.
tdx_fs = fsspec.filesystem(protocol='http')

In [13]:
# Get info on the file, which should only take a few seconds
tdx_fs.info(tdx_hydrobasins_url)

{'name': 'https://earth-info.nga.mil/php/download.php?file=hydrobasins_level2',
 'size': 95389402,
 'mimetype': 'application/octet-stream',
 'url': 'https://earth-info.nga.mil/php/download.php?file=hydrobasins_level2',
 'type': 'file'}

In [15]:
%%time
# Get the remote file and save it to the local directory only if it isn't
# already there (tdx_fs.get() returns None)
if not tdx_hydrobasins_filepath.exists():
    tdx_fs.get(tdx_hydrobasins_url, str(tdx_hydrobasins_filepath))
else:
    print('We have it!')

We have it!
CPU times: user 147 µs, sys: 94 µs, total: 241 µs
Wall time: 217 µs
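If a large download is interrupted, a plain write can leave a truncated file that a later `exists()` check would wrongly accept. The helper below (`download_if_missing`, a hypothetical name) is a minimal sketch that guards against this. It writes to a `.part` file and renames it into place with `os.replace`, which is atomic on the same filesystem. `fetch_bytes` is an assumption: any zero-argument callable that returns the file's bytes.

```python
import os
from pathlib import Path

def download_if_missing(path, fetch_bytes):
    """Save fetch_bytes() to path unless path already exists.

    Writes to a temporary '.part' file first and renames it into place, so an
    interrupted download never leaves a truncated file under the final name.
    Returns True if a download happened, False if the file was already there.
    """
    path = Path(path)
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    part = path.with_name(path.name + '.part')
    part.write_bytes(fetch_bytes())
    os.replace(part, path)  # atomic rename on the same filesystem
    return True
```

In the notebook, this could be called as `download_if_missing(tdx_hydrobasins_filepath, lambda: tdx_fs.cat(tdx_hydrobasins_url))`. Note that `cat` holds the whole ~95 MB file in memory before writing.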


In [16]:
# Confirm info of local file matches remote file
local_fs.info(tdx_hydrobasins_filepath)

{'name': '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/nga/hydrobasins_level2.geojson',
 'size': 95389402,
 'type': 'file',
 'created': 1721407912.568478,
 'islink': False,
 'mode': 33188,
 'uid': 502,
 'gid': 20,
 'mtime': 1715117902.7792468,
 'ino': 267369098,
 'nlink': 1}
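The remote and local `info()` dicts share few useful fields. The names differ, and the HTTP result above carries no timestamp, so the byte size is the practical thing to compare. fsspec's HTTP filesystem may report `size` as `None` when the server omits a length, so the sketch below treats that case as "not confirmed". `sizes_match` is a hypothetical helper. Matching sizes are a cheap sanity check, not proof of identical content.

```python
def sizes_match(remote_info, local_info):
    """True if both fsspec info() dicts report the same, known byte size."""
    remote_size = remote_info.get('size')
    return remote_size is not None and remote_size == local_info.get('size')

# Values copied from the two info() outputs above
remote = {'name': 'https://earth-info.nga.mil/php/download.php?file=hydrobasins_level2',
          'size': 95389402, 'type': 'file'}
local = {'name': '.../data_temp/nga/hydrobasins_level2.geojson',
         'size': 95389402, 'type': 'file'}
print(sizes_match(remote, local))  # True
```

With live objects, the call would be `sizes_match(tdx_fs.info(tdx_hydrobasins_url), local_fs.info(tdx_hydrobasins_filepath))`.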

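### Alternate approach using `requests`

Because NGA serves each file through a PHP download script rather than from browsable directories, `fsspec` may offer little benefit here, so we also try the `requests` library.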

In [None]:
import requests
import io

In [None]:
%%time
response = requests.get(tdx_hydrobasins_url)
response.ok

CPU times: user 1.44 s, sys: 1.11 s, total: 2.56 s
Wall time: 1min 36s


True

In [None]:
hydro_content = response.content
hydro_content?

[0;31mType:[0m        bytes
[0;31mString form:[0m b'{\n"type": "FeatureCollection",\n"name": "hydrobasins_level2",\n"crs": { "type": "name", "prope <...> 900621202275, 46.956377156575549 ], [ 97.958333333333371, 46.954166666666694 ] ] ] ] } }\n]\n}\n'
[0;31mLength:[0m      95389402
[0;31mDocstring:[0m  
bytes(iterable_of_ints) -> bytes
bytes(string, encoding[, errors]) -> bytes
bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer
bytes(int) -> bytes object of size given by the parameter initialized with null bytes
bytes() -> empty bytes object

Construct an immutable array of bytes from:
  - an iterable yielding integers in range(256)
  - a text string encoded using the specified encoding
  - any object implementing the buffer API.
  - an integer

In [None]:
with open(tdx_dir / 'test.json', mode='wb') as localfile:
    localfile.write(response.content)
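`response.content` holds the entire ~95 MB body in memory before anything is written. A streaming alternative writes the body chunk by chunk, so memory use stays at roughly one chunk. The sketch below keeps the writing logic independent of `requests` so it can run offline. `write_chunks` is a hypothetical helper that accepts any iterable of byte chunks.

```python
def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return total bytes written."""
    total = 0
    with open(path, mode='wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                total += len(chunk)
    return total
```

With `requests`, the chunks would come from a streamed request. Open it with `requests.get(tdx_hydrobasins_url, stream=True, timeout=60)` as a context manager, call `response.raise_for_status()`, then pass `response.iter_content(chunk_size=1 << 20)` to `write_chunks`. The 60-second timeout and 1 MiB chunk size are arbitrary choices.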

In [None]:
import json

In [None]:
hydro_dict = json.loads(hydro_content.decode('utf-8'))
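Once the bytes are parsed, a quick structural check confirms that the download is the expected FeatureCollection. `json.loads` accepts UTF-8 bytes directly (Python 3.6+), so the `.decode('utf-8')` step is optional. The helper `summarize_feature_collection` is hypothetical. The sample below is a tiny, made-up FeatureCollection that mirrors the top-level layout visible in the string form above (`type`, `name`, `crs`, `features`). Its CRS value and the property key `example_id` are placeholders, not values from the NGA file.

```python
import json

def summarize_feature_collection(geojson_bytes):
    """Return (crs_name, n_features, property_keys) for a GeoJSON FeatureCollection."""
    fc = json.loads(geojson_bytes)  # accepts bytes or str
    if fc.get('type') != 'FeatureCollection':
        raise ValueError('Not a GeoJSON FeatureCollection')
    crs_name = fc.get('crs', {}).get('properties', {}).get('name')
    features = fc.get('features', [])
    keys = sorted(features[0].get('properties', {})) if features else []
    return crs_name, len(features), keys

# Made-up sample with the same top-level layout as the NGA file
sample = b'''{"type": "FeatureCollection", "name": "demo",
 "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"}},
 "features": [{"type": "Feature", "properties": {"example_id": 1},
               "geometry": null}]}'''
print(summarize_feature_collection(sample))
# ('urn:ogc:def:crs:OGC:1.3:CRS84', 1, ['example_id'])
```

On the real download, `summarize_feature_collection(hydro_content)` reports the file's actual CRS, feature count, and attribute names. To work with geometries, `gpd.read_file(tdx_hydrobasins_filepath)` reads the saved GeoJSON into a GeoDataFrame.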