# Get Global Hydrography from NGA TDX-Hydro 

This notebook demonstrates how to use functions in the [WikiWatershed/global-hydrography](https://github.com/WikiWatershed/global-hydrography) package to fetch data files from the TDX-Hydro datasets released by the [US National Geospatial-Intelligence Agency (NGA)](https://www.nga.mil).

It uses processes that were explored in these notebooks:
- `sandbox/explore_data_sources.ipynb`
- `sandbox/reading_files.ipynb`

# Installation and Setup

Carefully follow our **[Installation Instructions](README.md#get-started)**, paying particular attention to:
- Creating a virtual environment for this repository (step 3)

## Python Imports
Imports follow common conventions and the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html), in particular its section on imports:
- https://google.github.io/styleguide/pyguide.html#s2.2-imports

In [1]:
import os
from pathlib import Path
from importlib import reload

import fsspec
import s3fs
import numpy as np
import pandas as pd
import geopandas as gpd
import pyogrio
import pyarrow as pa

In [2]:
# Confirm conda environment
os.environ['CONDA_DEFAULT_ENV']

'hydrography'
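
Indexing `os.environ` directly raises a `KeyError` when no conda environment is active. A minimal sketch of a more forgiving check follows; the helper name `active_conda_env` and the constant `EXPECTED_ENV` are hypothetical and not part of this repository.

```python
import os

EXPECTED_ENV = 'hydrography'  # environment name reported in the output above


def active_conda_env(environ=os.environ):
    """Return the active conda environment name, or None outside conda."""
    return environ.get('CONDA_DEFAULT_ENV')


env_name = active_conda_env()
if env_name != EXPECTED_ENV:
    print(f'Expected conda env {EXPECTED_ENV!r}, found {env_name!r}')
```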

In [3]:
# Custom functions for Global Hydrography
import global_hydrography as gh

### If you get `ModuleNotFoundError`:

Follow step **4. Add your `global_hydrography` Path to Miniconda/Anaconda sites-packages** of the installation instructions in the main ReadMe by running the following in your console, replacing `/your/path/to/global_hydrography/src` with your specific path.

```console
conda develop '/your/path/to/global_hydrography/src'
```

Then restart the kernel and rerun the imports above.
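
To confirm whether Python can locate the package at all, the standard library's `importlib.util.find_spec` can be used without triggering the import itself. This is a minimal sketch; the helper name `is_importable` is hypothetical and not part of `global_hydrography`.

```python
from importlib.util import find_spec


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module without importing it."""
    return find_spec(module_name) is not None


# Hypothetical usage: check the package before rerunning the imports.
if not is_importable('global_hydrography'):
    print('global_hydrography not found; run `conda develop` as described above.')
```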


In [4]:
! conda develop '/Users/aaufdenkampe/Documents/Python/global-hydrography/src'

path exists, skipping /Users/aaufdenkampe/Documents/Python/global-hydrography/src
completed operation for: /Users/aaufdenkampe/Documents/Python/global-hydrography/src


In [5]:
# Explore the namespace for global-hydrography modules, functions, etc.
dir(gh)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'io']

In [6]:
# Check imports
gh.io.hybas_region_dict

{'af': 'Africa',
 'ar': 'North American Arctic',
 'as': 'Central and South-East Asia',
 'au': 'Australia and Oceania',
 'eu': 'Europe and Middle East',
 'gr': 'Greenland',
 'na': 'North America and Caribbean',
 'sa': 'South America',
 'si': 'Siberia'}

In [7]:
# Reload the module as necessary during development.
reload(gh.io)

<module 'global_hydrography.io' from '/Users/aaufdenkampe/Documents/Python/global-hydrography/src/global_hydrography/io.py'>

## Set Paths for Data Inputs/Outputs
Use the [`pathlib`](https://docs.python.org/3/library/pathlib.html) library, whose many benefits for managing paths over the `os` library or string-based approaches are described in [this blog post](https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f).
- [pathlib](https://docs.python.org/3/library/pathlib.html) user guide: https://realpython.com/python-pathlib/
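
As a small illustration of that difference, the sketch below builds the same relative path both ways. The directory names `project` and `notebooks` are placeholders chosen for the example, not paths from this repository.

```python
import os
from pathlib import Path

# String-based approach: nested function calls on plain strings.
notebook_dir_str = os.path.join('project', 'notebooks')
data_dir_str = os.path.join(os.path.dirname(notebook_dir_str), 'data_temp')

# pathlib approach: attributes and the '/' operator read left to right.
notebook_dir = Path('project') / 'notebooks'
data_dir_example = notebook_dir.parent / 'data_temp'

# Both approaches describe the same location.
print(Path(data_dir_str) == data_dir_example)
```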

In [8]:
# Confirm your current working directory (cwd) and repo/project directory
working_dir = Path.cwd()
project_dir = working_dir.parent
data_dir = project_dir / 'data_temp' # a temporary data directory that we .gitignore
data_dir

PosixPath('/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp')

## Create local file system using `fsspec` library

We'll use the Filesystem Spec ([`fsspec`](https://filesystem-spec.readthedocs.io)) library and its extensions throughout this project to provide a unified pythonic interface to local, remote and embedded file systems and bytes storage.

In [9]:
# Create local file system using fsspec library
# local_fs = fsspec.implementations.local.LocalFileSystem()
local_fs = fsspec.filesystem('local') 

In [10]:
# List files in our temporary data directory
local_fs.ls(data_dir)

['/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/global_hydrography.qgz',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/geoglows-v2',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/.DS_Store',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/io_10m_annual_lulc',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/nhdplus2',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/hydrobasins',
 '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/nga']

In [12]:
# List file details (equivalent to file info)
local_data_list = local_fs.ls(data_dir, detail=True)
# Show first item's details
local_data_list[0]

{'name': '/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/global_hydrography.qgz',
 'size': 136080,
 'type': 'file',
 'created': 1715865642.721543,
 'islink': False,
 'mode': 33188,
 'uid': 502,
 'gid': 20,
 'mtime': 1715865642.721089,
 'ino': 270190000,
 'nlink': 1}
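
The `mode` and `mtime` fields are raw integers and epoch seconds. A short sketch of how they might be made readable with the standard library follows, using values copied from the listing above; the variable names are illustrative only.

```python
import stat
from datetime import datetime, timezone

# Values copied from the file details shown above.
file_info = {'size': 136080, 'mode': 33188, 'mtime': 1715865642.721089}

# Convert the numeric mode to an `ls -l` style permission string.
permissions = stat.filemode(file_info['mode'])

# Convert epoch seconds to a timezone-aware UTC datetime.
modified = datetime.fromtimestamp(file_info['mtime'], tz=timezone.utc)

# Express the size in decimal kilobytes.
size_kb = file_info['size'] / 1000

print(permissions)
print(modified.isoformat())
print(f'{size_kb:.1f} kB')
```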

# NGA TDX-Hydro

The data are downloadable from the National Geospatial-Intelligence Agency (NGA) Office for Geomatics website, https://earth-info.nga.mil/, under the "Geosciences" tab.

The [TDX-Hydro Technical Document](https://earth-info.nga.mil/php/download.php?file=tdx-hydro-technical-doc) provides detailed information on how the datasets were developed and validated.

In [17]:
# Create local data directory
tdx_dir = data_dir / 'nga'
tdx_dir

PosixPath('/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/nga')

In [16]:
local_fs.exists(tdx_dir)

True

## Get TDX-Hydro Metadata

TDX-Hydro datasets are organized into 55 continental sub-units using the same 10-digit Level 2 codes (HYBAS_ID) developed by [HydroSHEDS v1 HydroBASINS](https://www.hydrosheds.org/products/hydrobasins). More information on the semantics of these codes is provided in the [HydroBASINS Technical Documentation](https://data.hydrosheds.org/file/technical-documentation/HydroBASINS_TechDoc_v1c.pdf).

The exact boundaries of the TDX-Hydro 55 continental sub-units differ from HydroBASINS, and are provided by the "Basin GeoJSON File with ID Numbers" (95 MB).
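
To make the code semantics concrete, here is a minimal sketch that splits a 10-digit HYBAS_ID into its parts. It assumes the layout given in the HydroBASINS Technical Documentation (1 region digit, 2 level digits, 6 identifier digits, 1 side digit); please verify against that document. The names `parse_hybas_id` and `HybasId` are hypothetical, and the example ID is illustrative only.

```python
from typing import NamedTuple

# Region digits, assumed from the HydroBASINS Technical Documentation.
HYBAS_REGION_DIGITS = {
    1: 'Africa',
    2: 'Europe',
    3: 'Siberia',
    4: 'Asia',
    5: 'Australia',
    6: 'South America',
    7: 'North America',
    8: 'Arctic (North America)',
    9: 'Greenland',
}


class HybasId(NamedTuple):
    region: str      # from the first digit
    level: int       # Pfafstetter level, digits 2-3
    unique_id: int   # identifier within the level, digits 4-9
    side: int        # final digit


def parse_hybas_id(hybas_id: int) -> HybasId:
    """Split a 10-digit HYBAS_ID into its documented components."""
    text = str(hybas_id)
    if len(text) != 10 or not text.isdigit():
        raise ValueError(f'Expected a 10-digit HYBAS_ID, got {hybas_id!r}')
    return HybasId(
        region=HYBAS_REGION_DIGITS[int(text[0])],
        level=int(text[1:3]),
        unique_id=int(text[3:9]),
        side=int(text[9]),
    )


# Illustrative Level 2 code.
print(parse_hybas_id(1020000010))
```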

In [24]:
tdx_root_url = 'https://earth-info.nga.mil/php/download.php'

tdx_hydrobasins_filename = 'hydrobasins_level2'
# Download URL for Basin GeoJSON File with ID Numbers
tdx_hydrobasins_url = f'{tdx_root_url}?file={tdx_hydrobasins_filename}'
tdx_hydrobasins_url

'https://earth-info.nga.mil/php/download.php?file=hydrobasins_level2'
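
Since every download shares the same root URL and differs only in the `file` query parameter, the URL construction could be wrapped in a small helper. This is a sketch; `build_tdx_url` is a hypothetical name, not a function in `global_hydrography`, and `urlencode` is used so that unusual characters in a file name would be escaped.

```python
from urllib.parse import urlencode

TDX_ROOT_URL = 'https://earth-info.nga.mil/php/download.php'


def build_tdx_url(filename: str, root_url: str = TDX_ROOT_URL) -> str:
    """Build a download URL of the form '<root_url>?file=<filename>'."""
    query = urlencode({'file': filename})
    return f'{root_url}?{query}'


print(build_tdx_url('hydrobasins_level2'))
```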

In [25]:
# Set up an HTTP file system for TDX-Hydro. The server unfortunately does not
# expose browsable directories, so files need to be accessed one at a time.
tdx_fs = fsspec.filesystem(protocol='http')

In [26]:
# Get info on the file, which should only take a few seconds
tdx_fs.info(tdx_hydrobasins_url)

{'name': 'https://earth-info.nga.mil/php/download.php?file=hydrobasins_level2',
 'size': 95389402,
 'mimetype': 'application/octet-stream',
 'url': 'https://earth-info.nga.mil/php/download.php?file=hydrobasins_level2',
 'type': 'file'}

In [27]:
%%time
# Get the remote file and save to local directory, returns None
tdx_fs.get(tdx_hydrobasins_url, str(tdx_dir))


CPU times: user 2.85 s, sys: 1.5 s, total: 4.36 s
Wall time: 59 s


[None]

In [34]:
# NOTE: the file is named 'download.php?file=hydrobasins_level2'
local_fs.ls(tdx_dir)[5]

'/Users/aaufdenkampe/Documents/Python/global-hydrography/data_temp/nga/download.php?file=hydrobasins_level2'
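
The saved name is consistent with taking everything after the final `/` of the URL, query string included. The sketch below shows that, and how the `file` query parameter could be extracted as a cleaner base name; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlparse

url = 'https://earth-info.nga.mil/php/download.php?file=hydrobasins_level2'

# Everything after the final '/' matches the name of the saved file above.
default_name = url.rsplit('/', 1)[-1]

# The 'file' query parameter alone gives a cleaner base name.
query = parse_qs(urlparse(url).query)
clean_name = query['file'][0]

print(default_name)
print(clean_name)
```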

TODO: Wrap the download command (`tdx_fs.get`) into a function that:
- includes a 'local_filename' argument
- checks if the 'local_filename' exists
- If not, downloads the file and renames it to 'local_filename'

The check might look something like this:

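```py
# Hypothetical target name; choose whatever 'local_filename' suits your project
local_filepath = tdx_dir / 'hydrobasins_level2.geojson'
if local_filepath.exists():
    print('We have it!')
else:
    tdx_fs.get(tdx_hydrobasins_url, str(tdx_dir))
    # The download lands as 'download.php?file=hydrobasins_level2'; rename it
    (tdx_dir / f'download.php?file={tdx_hydrobasins_filename}').rename(local_filepath)
    print('Downloaded!')
```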
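
One way the TODO above might be sketched as a function is shown here. The name `get_if_missing` is hypothetical. The function accepts any object with a `get(rpath, lpath)` method, such as the `tdx_fs` HTTP filesystem, and it assumes that `get` with a directory target names the file after the last URL segment, as observed in this notebook.

```python
from pathlib import Path


def get_if_missing(fs, url: str, local_dir, local_filename: str) -> Path:
    """Download `url` into `local_dir` as `local_filename` unless it exists.

    `fs` is any object with a `get(rpath, lpath)` method.
    Returns the path to the local file.
    """
    local_dir = Path(local_dir)
    local_filepath = local_dir / local_filename
    if local_filepath.exists():
        # Already downloaded; skip the network request.
        return local_filepath

    local_dir.mkdir(parents=True, exist_ok=True)
    fs.get(url, str(local_dir))

    # Assumption: the file is saved under the last URL segment,
    # e.g. 'download.php?file=hydrobasins_level2'.
    downloaded = local_dir / url.rsplit('/', 1)[-1]
    downloaded.rename(local_filepath)
    return local_filepath
```

With the variables defined earlier, a call might look like `get_if_missing(tdx_fs, tdx_hydrobasins_url, tdx_dir, 'hydrobasins_level2.geojson')`, where the `.geojson` file name is an assumed choice.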


## Get TDX-Hydro GPKG files by HYBAS_ID

# End