Some rgispy features require [RGIS](https://github.com/bmfekete/RGIS) to be installed in your local environment.

```
mamba create -n rgis python=3.9 gdal ipykernel geopandas xarray rasterio rioxarray sqlalchemy geoalchemy2 psycopg2 climata

~/my-conda-envs/rgis/bin/pip install git+https://github.com/dvignoles/rgispy@main
```

Swap to your 'rgis' kernel in this notebook.

In [1]:
import pandas as pd
import xarray as xr
from pathlib import Path

In [2]:
from rgispy.network import gdbn_to_netcdf_base
from rgispy.mask import get_mask_ds, get_point_mask_from_df
from rgispy.sample import sample_ds
from rgispy.postprocess import join_sampled_files, georeference_sampled, normalize_sampled_files, get_sampled_df_byattr

In [3]:
# Change to wherever you want the outputs of this notebook to end up
OUTPUT_DIR = Path.cwd().joinpath('demo_outputs')
OUTPUT_DIR.mkdir(exist_ok=True)

In [4]:
# The datastream we want to sample
ds = Path('/asrc/ecr/balazs/WBMdsFiles/CONUS/Network_03min/TCfull+WBM20WTempPrist/CONUS_Output_Discharge_TCfull+WBM20WTempPrist_03min_dTS2020.gds.gz')

# the WBM network we are working on
net_gdbn = Path('/asrc/ecr/balazs/GHAAS2/RGISarchive/CONUS/Network/HydroSTN30/03min/Static/CONUS_Network_HydroSTN30_03min_Static.gdbn.gz')

# The network converted to netcdf (we'll create this)
net_nc = OUTPUT_DIR.joinpath('CONUS_Network_HydroSTN30_Static.nc')

# The mask we will use to sample the WBM output grids (we'll create this)
mask_nc = OUTPUT_DIR.joinpath('CONUS_Mask_HydroSTN30_Static.nc')

### Setup

First, we need a Python representation of the network we are working with.

In [5]:
help(gdbn_to_netcdf_base)

Help on function gdbn_to_netcdf_base in module rgispy.network:

gdbn_to_netcdf_base(in_gdbn: pathlib.Path, out_netcdf: pathlib.Path, project: str = '') -> pathlib.Path
    Convert .gdbn rgis network to netcdf network compatible with rgispy
    Raises:
        Exception: unable to encode maximum value
        Exception: unable to encode maximum value
    Returns:
        Path: Path to created netcdf network



In [6]:
if not net_nc.exists():
    gdbn_to_netcdf_base(net_gdbn, net_nc, project="Demo")

network = xr.open_dataset(net_nc)
network

In [7]:
network['Order']

To sample WBM outputs, you need to know which WBM network CellIDs you are interested in.

The gauges in the csv below have been "snapped" to the network and associated with a CellID. 

If you have a `gdbc` file of snapped features, you can use `rgis2table <gdbc file> > myfeatures.csv` to export them. 

In [None]:
gauges_subset = pd.read_csv('input_data/CONUS_Gauges_HydroSTN30_03min_Static_Subset.csv', dtype={'station_id':str})
gauges_subset

Using the CellIDs of the gauges, we create a mask of the network. We will then iterate over the records in our WBM output keeping only our desired cells. 
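
As a rough illustration of the masking idea, independent of rgispy, the sketch below builds a boolean mask from a list of cell IDs with plain NumPy and uses it to keep only the matching cells of a grid. The grid, the cell IDs, and the discharge values are all made up for the example; this is not how `get_point_mask_from_df` is implemented.

```python
import numpy as np

# Hypothetical 3x4 network grid of cell IDs; 0 marks cells outside the network.
cell_ids = np.array(
    [
        [0, 1, 2, 0],
        [3, 4, 5, 6],
        [0, 7, 8, 0],
    ]
)

# Hypothetical cell IDs that our gauges were snapped to.
gauge_cells = [4, 7]

# Boolean mask: True wherever the grid cell is one of the gauge cells.
mask = np.isin(cell_ids, gauge_cells)

# One made-up time step of model output on the same grid.
discharge = np.arange(12, dtype=float).reshape(3, 4)

# Keep only the masked cells, together with their IDs.
sampled_ids = cell_ids[mask]
sampled_values = discharge[mask]
print(dict(zip(sampled_ids.tolist(), sampled_values.tolist())))
```

Sampling a full datastream amounts to repeating the last step for every time step in the output.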

In [None]:
mask = get_mask_ds(network)
gauges_mask = get_point_mask_from_df(gauges_subset, network, wbm_fieldname='Cellid')
mask = mask.assign(Gauges=gauges_mask)
mask.to_netcdf(mask_nc)
mask

`sample_ds` iterates over the datastream and samples it with each mask in a list. Here we pass the function our mask netCDF file and a single mask to sample, `'Gauges'`.

If you would like to sample gdbc files, you can first convert them to datastreams with `rgis2ds --template <network gdbn file>` or use `rgispy.sample.sample_gdbc` which does the conversion at runtime. 

### Sampling Outputs

In [None]:
help(sample_ds)

In [None]:
from rgispy.sample import sample_gdbc
help(sample_gdbc)

In [None]:
sample_ds(
    mask_nc,
    ds, 
    ['Gauges',],
    OUTPUT_DIR,
    2020,
    'Discharge',
    'Daily',
)

### Sampled Results

The outputs of the sampling process are in wide format, with the first column being the CellID identifier.

Each year of data (each from a different datastream) is written to its own csv.
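
To make the wide layout concrete, here is a minimal sketch that reshapes a wide table into long format with plain pandas. It assumes the columns after the identifier are one per date; the column names and values below are invented for illustration and may not match the real csv headers.

```python
import pandas as pd

# Hypothetical wide-format sample: one row per cell, one column per day.
wide = pd.DataFrame(
    {
        "cellid": [101, 202],
        "2020-01-01": [1.5, 10.0],
        "2020-01-02": [1.7, 9.5],
    }
)

# Melt the date columns into rows: one (cellid, date, value) record each.
long = wide.melt(id_vars="cellid", var_name="date", value_name="Discharge")
long["date"] = pd.to_datetime(long["date"])
long = long.sort_values(["cellid", "date"]).reset_index(drop=True)
print(long)
```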

The next section demonstrates some convenience pandas wrappers for reading in these csvs. 

In [None]:
discharge_csvs = sorted(OUTPUT_DIR.joinpath('Gauges', 'Daily').glob('Discharge*.csv'))
gauges_sample = join_sampled_files(discharge_csvs)
gauges_sample.head()

In [None]:
# extract lat lons from the network
georeference_sampled(gauges_sample, network)

In [None]:
normalize_sampled_files(discharge_csvs, 'Discharge', gauges_subset)

In [None]:
# You can select by attribute (in this case station_id)
get_sampled_df_byattr(discharge_csvs, gauges_subset, 'station_id', '03036500', normalize=False, stacked=True, variable='Discharge',)

## USGS Data

You can use the climata package to download USGS data. USGS discharge is reported in cubic feet per second and needs to be converted to cubic meters per second.
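
The conversion factor follows from the definition of the foot (exactly 0.3048 m), so one cubic foot is 0.3048³ m³. A small sketch of the conversion, with a hypothetical helper name `cfs_to_cms`:

```python
# One foot is exactly 0.3048 m, so one cubic foot is 0.3048**3 cubic meters.
FT3_TO_M3_EXACT = 0.3048 ** 3


def cfs_to_cms(cfs: float) -> float:
    """Convert discharge from ft^3/s to m^3/s (hypothetical helper)."""
    return cfs * FT3_TO_M3_EXACT


print(round(cfs_to_cms(1000.0), 3))  # 28.317
```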

In [None]:
from climata.usgs import DailyValueIO

In [None]:
stations = gauges_subset.station_id
dates = pd.date_range(start="1/1/2020", end="12/31/2020", freq="D").tolist()

In [None]:
DISCHARGE = "00060" # ft^3/s
RIVTEMP = '00010' # Celsius
FT3_TO_M3 = 0.0283168

In [None]:
def download_usgs_df(station_id: str, param_id: str, date_list=None) -> tuple:
    """Return (station_id, param_id, df); df is None when USGS has no data."""
    if date_list is None:
        date_list = pd.date_range(start="1/1/1990", end="12/31/2020", freq="D").tolist()
    
    data = DailyValueIO(
        start_date=date_list[0],
        end_date=date_list[-1],
        station=station_id,
        parameter=param_id,
    )

    if len(data.keys()) == 0:
        return station_id, param_id, None

    # If several series are returned, only the last one is kept.
    for series in data:
        value = [r[1] for r in series.data]
        dates = [r[0] for r in series.data]

    df = pd.DataFrame(value, index=dates)
    df['station_id'] = station_id
    return station_id, param_id, df

In [None]:
station, param, df = download_usgs_df(stations[0], DISCHARGE, dates)

In [None]:
df.head()

In [None]:
usgs_discharge = OUTPUT_DIR.joinpath('usgs_discharge.csv')

if not usgs_discharge.exists():
    results = []
    nones = []
    
    for i, gauge in enumerate(stations):
        
        result = download_usgs_df(gauge, DISCHARGE, dates)
        if result[2] is None:
            nones.append(result)
        
        results.append(result)
        
        # progress as a percentage of all stations processed so far
        print(f"{(i + 1) / len(stations):.0%}")
        
    gauge_dfs = [x[2] for x in results if x[2] is not None]
    usgs_discharge_df = pd.concat(gauge_dfs)
    usgs_discharge_df = usgs_discharge_df.rename(columns={0: "discharge"}).set_index('station_id', append=True)
    usgs_discharge_df.index = usgs_discharge_df.index.rename(['date', 'station_id'])
    usgs_discharge_df = usgs_discharge_df.sort_values(['station_id', 'date'])

    # convert to m^3/s
    usgs_discharge_df['discharge'] = usgs_discharge_df['discharge'] * FT3_TO_M3
    usgs_discharge_df = usgs_discharge_df.rename(columns={'discharge':'usgs_discharge'})
    usgs_discharge_df.to_csv(usgs_discharge)

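    print(f"{len(nones)} gauges returned no usgs results")
else:
    # reuse the cached download from an earlier run
    usgs_discharge_df = pd.read_csv(
        usgs_discharge, dtype={'station_id': str}, parse_dates=['date']
    ).set_index(['date', 'station_id'])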

In [None]:
usgs_discharge_df

From here you can compare the USGS discharge and WBM results directly. 
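
One possible way to set up such a comparison is sketched below: join the two tables on date and station, then compute a mean bias and a Nash-Sutcliffe efficiency. The values are synthetic, and the `wbm_discharge` column name and the `nash_sutcliffe` helper are assumptions made for this example, not part of rgispy.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=3, freq="D")

# Synthetic observed (USGS) and modeled (WBM) discharge in m^3/s.
usgs = pd.DataFrame(
    {"date": dates, "station_id": "03036500", "usgs_discharge": [10.0, 12.0, 14.0]}
)
wbm = pd.DataFrame(
    {"date": dates, "station_id": "03036500", "wbm_discharge": [11.0, 12.0, 13.0]}
)

# Keep only the days and stations present in both tables.
merged = usgs.merge(wbm, on=["date", "station_id"], how="inner")


def nash_sutcliffe(obs: pd.Series, sim: pd.Series) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    return 1.0 - ((obs - sim) ** 2).sum() / ((obs - obs.mean()) ** 2).sum()


bias = (merged["wbm_discharge"] - merged["usgs_discharge"]).mean()
nse = nash_sutcliffe(merged["usgs_discharge"], merged["wbm_discharge"])
print(f"bias={bias:.2f} m^3/s, NSE={nse:.2f}")
```

With real data you would typically group by `station_id` before computing the metrics, so that each gauge gets its own score.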

In [None]:
# cleanup (delete everything)
import shutil
shutil.rmtree(OUTPUT_DIR)