In [1]:
from pathlib import Path 
import pandas as pd
from rgispy.routines.sample import sample_wbm_dsdir, sample_wbm_gdbcdir
from rgispy.core import RgisDataStream, RgisPoint, Rgis
import os

In [2]:
RGISARCHIVE = Path(os.environ['RGISARCHIVE3'])
NETWORK = RGISARCHIVE.joinpath('CONUS/River-Network/HydroSTN30/15min/Static/CONUS_River-Network_HydroSTN30_15min_Static.gdbn.gz')

In [3]:
#rgispy needs a location to store temporary files
os.environ['SCRATCH'] = '/scratch/danielv'

In [4]:
tmp = Path.cwd().joinpath('tmp_output')
# create the output directory if it does not already exist
tmp.mkdir(exist_ok=True)

## Conceptual Overview

To sample datastream files, you can use the `dsSampling` utility from RGIS. 

```
dsSampling -D <domain>.ds -M <mapper>.mapper -o <output>.gdbt <input>.gds
```

`<domain>`: The domain is another datastream file that describes the spatial domain of the input. You can create it by calling `rgis2domain` on the appropriate river network `gdbn` file.

`<mapper>`: The mapper describes the sampling features, such as dam locations or basins. You can create a mapper file by calling `rgis2mapper` on either a `.gdbp` point file or a `.gdbd` network-aligned polygon file.

`<output>`: The output is an RGIS `.gdbt` table containing the sampled time series.

`<input>`: The input is any uncompressed rgis datastream. 

All of these CLI commands are also available in rgispy, which makes them easier to script.
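To show how the four arguments fit together, the hypothetical helper below assembles the `dsSampling` invocation from the usage line above as an argument list for `subprocess.run`. It uses only the `-D`, `-M`, and `-o` flags shown there and assumes `dsSampling` is on your `PATH` and that the input datastream is already uncompressed. It is a sketch, not part of rgispy; the rgispy wrappers used below are the more convenient route.

```python
import subprocess
from pathlib import Path


def build_dssampling_cmd(domain, mapper, output, input_ds):
    """Hypothetical helper: assemble a dsSampling call using the flags from the usage line."""
    return [
        "dsSampling",
        "-D", str(domain),   # domain datastream created with rgis2domain
        "-M", str(mapper),   # mapper created with rgis2mapper
        "-o", str(output),   # output .gdbt table
        str(input_ds),       # uncompressed input datastream
    ]


def run_dssampling(domain, mapper, output, input_ds):
    """Hypothetical helper: run dsSampling and return the output path."""
    # check=True raises CalledProcessError if dsSampling exits with a non-zero status
    subprocess.run(build_dssampling_cmd(domain, mapper, output, input_ds), check=True)
    return Path(output)
```

For example, `build_dssampling_cmd(Path('CONUS_15min.gds'), Path('dams_15min.mapper'), 'out.gdbt', 'in.gds')` yields the list that `run_dssampling` passes to the shell-free `subprocess.run`; the output and input names here are placeholders.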

In [5]:
gdbp_path = tmp.joinpath('dams_15min.gdbp.gz')
domain_path = tmp.joinpath('CONUS_15min.gds')
mapper_path = tmp.joinpath('dams_15min.mapper')

Our sampling locations

In [6]:
dams = pd.read_csv('dams_15min.csv')
dams

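| Unnamed: 0.1 | Unnamed: 0 | ID | grand_id | XCoord | YCoord |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 40 | -122.375 | 48.875 |
| 1 | 1 | 2 | 41 | -121.125 | 48.625 |
| 2 | 2 | 3 | 42 | -120.375 | 48.125 |
| 3 | 3 | 4 | 43 | -121.375 | 47.875 |
| 4 | 4 | 5 | 44 | -121.375 | 47.875 |
| 5 | 5 | 6 | 47 | -121.625 | 47.875 |
| 6 | 6 | 7 | 48 | -121.875 | 47.875 |
| 7 | 7 | 8 | 49 | -120.125 | 48.625 |
| 8 | 8 | 9 | 50 | -121.625 | 47.625 |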


Convert csv -> gdbp -> mapper

In [7]:
gdbp = RgisPoint.from_df(dams, xcol='XCoord', ycol='YCoord')
gdbp = gdbp.to_file(gdbp_path, gzipped=True, replace_path=True)

In [8]:
rgis = Rgis()
rgis.rgis2mapper(NETWORK, gdbp, mapper_path)

PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/dams_15min.mapper')

Create domain file

In [9]:
rgis.rgis2domain(NETWORK, domain_path)

PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/CONUS_15min.gds')

Run the sampling.

Sample 4 returns NaN because it is a duplicate sample location: samples 4 and 5 fall in the same cell (-121.375, 47.875), and dsSampling returns a value for only one of the duplicate locations.
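Because `dsSampling` returns only one value per cell, it can help to flag duplicate coordinates before sampling. A minimal pandas sketch, assuming the `XCoord`/`YCoord` columns of `dams_15min.csv` hold the cell coordinates as in the table above; `find_shared_cells` is a hypothetical helper, not part of rgispy, and the four rows are copied from that table.

```python
import pandas as pd


def find_shared_cells(df, xcol="XCoord", ycol="YCoord"):
    """Hypothetical helper: return rows whose (x, y) location is shared with another row."""
    # keep=False marks every member of a duplicate group, not only the later repeats
    mask = df.duplicated(subset=[xcol, ycol], keep=False)
    return df.loc[mask]


dams_subset = pd.DataFrame({
    "ID": [3, 4, 5, 6],
    "XCoord": [-120.375, -121.375, -121.375, -121.625],
    "YCoord": [48.125, 47.875, 47.875, 47.875],
})
print(find_shared_cells(dams_subset)["ID"].tolist())  # [4, 5]
```

Running `find_shared_cells(dams)` on the full table would list the sampling points expected to come back with a NaN for one member of each pair.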

In [10]:
gds1980 = RgisDataStream(Path('gds/CONUS_Output_RiverDischarge_DummyExp_15min_dTS1980.gds.gz'))

In [11]:
sample1980 = gds1980.dsSampling(domain_path, mapper_path)
sample1980.to_file('tmp_output/dams_15min_RiverDischarge_1980dTS.gdbt', replace_path=True)
sample1980.df()

| Unnamed: 0 | ID | Name | SampleID | Date | Value |
|---|---|---|---|---|---|
| 0 | 1 | Record:1 | 1 | 1980-01-01 | 55.056126 |
| 1 | 2 | Record:2 | 2 | 1980-01-01 | 1.476103 |
| 2 | 3 | Record:3 | 3 | 1980-01-01 | 0.976209 |
| 3 | 4 | Record:4 | 4 | 1980-01-01 | NaN |
| 4 | 5 | Record:5 | 5 | 1980-01-01 | 11.689842 |
| ... | ... | ... | ... | ... | ... |
| 3289 | 3290 | Record:3290 | 5 | 1980-12-31 | 10.778742 |
| 3290 | 3291 | Record:3291 | 6 | 1980-12-31 | 56.845242 |
| 3291 | 3292 | Record:3292 | 7 | 1980-12-31 | 141.801865 |
| 3292 | 3293 | Record:3293 | 8 | 1980-12-31 | 6.614797 |
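`df()` returns the samples in long format, one row per sample and date. If you prefer one column per sample, a pandas pivot is one option. This sketch assumes the `SampleID`, `Date`, and `Value` columns shown above; the three rows are copied from that output.

```python
import pandas as pd

long_df = pd.DataFrame({
    "SampleID": [1, 2, 4],
    "Date": ["1980-01-01", "1980-01-01", "1980-01-01"],
    "Value": [55.056126, 1.476103, float("nan")],
})
long_df["Date"] = pd.to_datetime(long_df["Date"])

# One row per date, one column per SampleID; the NaN for sample 4 is preserved
wide_df = long_df.pivot(index="Date", columns="SampleID", values="Value")
print(wide_df)
```

On the real result the same call would be `sample1980.df().pivot(index='Date', columns='SampleID', values='Value')`; converting `Date` with `pd.to_datetime` first is optional but makes date-based slicing easier.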


## Convenience Functions

Understanding the steps above gives you flexibility when scripting. In most cases, however, you can use these all-in-one functions instead.

At the time of writing, these functions only work with directories containing daily datastreams. By default, the routine creates daily, monthly, and annual samples. Variables passed in as `accum_vars` (Runoff and Precipitation by default) are summed over time; all other variables are averaged over time.
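The sum-versus-mean rule can be illustrated with pandas. This is a sketch of the idea, not rgispy's implementation: it assumes a long-format table with `SampleID`, `Date`, and `Value` columns like the `df()` output above, `aggregate_samples` is a hypothetical helper, and the four daily values are toy data.

```python
import pandas as pd

DEFAULT_ACCUM_VARS = ["Runoff", "Precipitation"]  # default accum_vars from the signature


def aggregate_samples(df, variable, freq="MS", accum_vars=DEFAULT_ACCUM_VARS):
    """Hypothetical helper: aggregate daily samples to monthly ("MS") or annual ("YS") values."""
    grouped = df.groupby(["SampleID", pd.Grouper(key="Date", freq=freq)])["Value"]
    if variable in accum_vars:
        # Accumulated variables are summed; min_count=1 keeps all-NaN groups as NaN
        out = grouped.sum(min_count=1)
    else:
        # All other variables are averaged; NaN values are skipped
        out = grouped.mean()
    return out.reset_index()


daily = pd.DataFrame({
    "SampleID": [1, 1, 1, 1],
    "Date": pd.to_datetime(["1980-01-01", "1980-01-02", "1980-02-01", "1980-02-02"]),
    "Value": [1.0, 3.0, 5.0, 7.0],
})
print(aggregate_samples(daily, "Runoff"))          # January 4.0, February 12.0
print(aggregate_samples(daily, "RiverDischarge"))  # January 2.0, February 6.0
```

The `min_count=1` choice matters for duplicate locations such as sample 4: a plain sum would turn an all-NaN month into 0, which looks like a real value.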

You can also pass in simple filters such as a year or variable if you wish to subset the datastreams in the directory. 
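`rgispySample --help` describes a filter as a string the file name must contain, ignoring case. The hypothetical helper below shows that matching rule under one stated assumption: when several filters are given, every one must match (rgispy may combine multiple filters differently). The file names are modeled on the 1980 datastream used above.

```python
from pathlib import Path


def filter_datastreams(paths, filters):
    """Hypothetical helper: keep paths whose file name contains every filter, ignoring case."""
    lowered = [f.lower() for f in filters]
    # all() over an empty filter list is True, so no filters keeps every path
    return [p for p in paths if all(f in Path(p).name.lower() for f in lowered)]


files = [
    Path("gds/CONUS_Output_RiverDischarge_DummyExp_15min_dTS1980.gds.gz"),
    Path("gds/CONUS_Output_RiverDischarge_DummyExp_15min_dTS1981.gds.gz"),
]
print(filter_datastreams(files, ["1980"]))            # only the 1980 file
print(filter_datastreams(files, ["riverdischarge"]))  # both files
```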

In [12]:
help(sample_wbm_dsdir)

Help on function sample_wbm_dsdir in module rgispy.routines.sample:

sample_wbm_dsdir(dsdir, network, samplers, output_dir, workers=1, ts_aggregate=True, outputs_only=True, ghaas_bin=None, scratch_dir=None, variables=[], accum_vars=['Runoff', 'Precipitation'], compress=False, filters=[])



In [13]:
sample_wbm_dsdir(
    Path('./gds'),
    NETWORK,
    [gdbp_path,],
    tmp.joinpath('sample_gds_output'),
    workers=2
)


        Workers: 2
        Aggregate Monthly/Annual: True

        Domain: CONUS
        Resolution: 15min
        Experiment: DummyExp
        Network: CONUS_River-Network_HydroSTN30_15min_Static.gdbn.gz

        Point Variables: RiverDischarge
        Zone Variables: 
        Sum Aggregation Variables: 
        Samplers: dams_15min.gdbp.gz
    


{'dams_15min': {'RiverDischarge': {'dTS': [PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/sample_gds_output/CONUS_DummyExp_15min/15min_CONUS_Output_RiverDischarge_DummyExp_15min_dTS1980.csv'),
    PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/sample_gds_output/CONUS_DummyExp_15min/15min_CONUS_Output_RiverDischarge_DummyExp_15min_dTS1981.csv')],
   'mTS': [PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/sample_gds_output/CONUS_DummyExp_15min/15min_CONUS_Output_RiverDischarge_DummyExp_15min_mTS1980.csv'),
    PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/sample_gds_output/CONUS_DummyExp_15min/15min_CONUS_Output_RiverDischarge_DummyExp_15min_mTS1981.csv')],
   'aTS': [PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/sample_gds_output/CONUS_DummyExp_15min/15min_CONUS_Output_RiverDischarge_DummyExp_15min_aTS1980.csv'),
    PosixPath('/asrc/ecr/danielv/projects/rgispy/e
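The routine returns a nested dictionary keyed by sampler, then variable, then time step (`dTS`, `mTS`, `aTS`), with a list of CSV paths at the bottom (the output above is truncated). A sketch that flattens that structure and stacks the yearly CSVs for one time step with pandas, assuming the layout seen above and that the yearly CSVs share the same columns; `flatten_results` and `load_timestep` are hypothetical helpers, and the example paths are placeholders.

```python
from pathlib import Path

import pandas as pd


def flatten_results(results):
    """Hypothetical helper: yield (sampler, variable, timestep, path) from the nested result dict."""
    for sampler, variables in results.items():
        for variable, timesteps in variables.items():
            for timestep, paths in timesteps.items():
                for path in paths:
                    yield sampler, variable, timestep, Path(path)


def load_timestep(results, sampler, variable, timestep="dTS"):
    """Hypothetical helper: concatenate all yearly CSVs for one sampler, variable, and time step."""
    paths = results[sampler][variable][timestep]
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)


example = {
    "dams_15min": {
        "RiverDischarge": {
            "dTS": ["a_dTS1980.csv", "a_dTS1981.csv"],  # placeholder paths
            "mTS": ["a_mTS1980.csv"],
        }
    }
}
rows = list(flatten_results(example))
print(len(rows))  # 3
```

In the notebook you would first keep the return value, e.g. `results = sample_wbm_dsdir(...)`, and then call `load_timestep(results, 'dams_15min', 'RiverDischarge', 'mTS')`.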

The same functionality is available for gdbc files. They are first converted to datastreams using the provided river network and then passed to the same sampling routine.

In [14]:
sample_wbm_gdbcdir(
    Path('./gdbc'),
    NETWORK,
    [gdbp_path,],
    tmp.joinpath('sample_gdbc_output'),
    workers=2
)


        Workers: 2
        Aggregate Monthly/Annual: True

        Domain: CONUS
        Resolution: DummyExp
        Experiment: RiverDischarge
        Network: CONUS_River-Network_HydroSTN30_15min_Static.gdbn.gz

        Point Variables: Output
        Zone Variables: Output
        Sum Aggregation Variables: 
        Samplers: dams_15min.gdbp.gz
    


{'dams_15min': {'Output': {'dTS': [PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/sample_gdbc_output/CONUS_RiverDischarge_DummyExp/15min_CONUS_Output_RiverDischarge_DummyExp_15min_dTS1980.csv'),
    PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/sample_gdbc_output/CONUS_RiverDischarge_DummyExp/15min_CONUS_Output_RiverDischarge_DummyExp_15min_dTS1981.csv')],
   'mTS': [PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/sample_gdbc_output/CONUS_RiverDischarge_DummyExp/15min_CONUS_Output_RiverDischarge_DummyExp_15min_mTS1980.csv'),
    PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/sample_gdbc_output/CONUS_RiverDischarge_DummyExp/15min_CONUS_Output_RiverDischarge_DummyExp_15min_mTS1981.csv')],
   'aTS': [PosixPath('/asrc/ecr/danielv/projects/rgispy/examples/sample/tmp_output/sample_gdbc_output/CONUS_RiverDischarge_DummyExp/15min_CONUS_Output_RiverDischarge_DummyExp_15min_aTS1980.csv'),
    Posi

The same functionality is also exposed through the CLI as `rgispySample`.

In [15]:
%%bash 
rgispySample --help

Usage: rgispySample [OPTIONS] DIRECTORY

Options:
  -d, --outputdirectory DIRECTORY
  -v, --variable TEXT             If specified, filter to these variables
  -s, --sampler FILE
  -f, --filter TEXT               File name must contain filter str (case
                                  insenstive)
  -n, --network FILE
  -w, --workers INTEGER
  -a, --accum-var TEXT
  -t, --aggregatetime             Create monthly and annual results from daily
  -z, --gzipped                   compress csvs with gzip
  -g, --gdbc                      Directory contains gdbc, not datastreams
  --help                          Show this message and exit.
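If you would rather call the CLI from a Python script than from a `%%bash` cell, the options listed above can be assembled into an argument list for `subprocess.run`. A sketch, assuming `rgispySample` is on your `PATH`; `build_rgispysample_cmd` is a hypothetical helper, and repeating `-s` once per sampler is an assumption the help text does not confirm. It passes `-t` only when requested; the CLI run in the next cell omits it, which is why it reports `Aggregate Monthly/Annual: False`.

```python
import subprocess


def build_rgispysample_cmd(directory, output_dir, network, samplers,
                           workers=1, aggregate_time=False, gdbc=False):
    """Hypothetical helper: assemble an rgispySample call from the options in --help."""
    cmd = ["rgispySample", "-d", str(output_dir), "-n", str(network), "-w", str(workers)]
    for sampler in samplers:
        # Assumption: -s may be repeated once per sampler file
        cmd += ["-s", str(sampler)]
    if aggregate_time:
        cmd.append("-t")  # also create monthly and annual results from daily
    if gdbc:
        cmd.append("-g")  # the input directory holds gdbc files, not datastreams
    cmd.append(str(directory))
    return cmd


def run_rgispysample(**kwargs):
    """Hypothetical helper: run rgispySample with the assembled arguments."""
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(build_rgispysample_cmd(**kwargs), check=True)
```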


In [16]:
%%bash -s "$NETWORK" "$gdbp_path"
mkdir -p tmp_output/sample_gds_cli_output
rgispySample -d ./tmp_output/sample_gds_cli_output \
    -n $1 \
    -w 2 \
    -s $2 \
    ./gds


        Workers: 2
        Aggregate Monthly/Annual: False

        Domain: CONUS
        Resolution: 15min
        Experiment: DummyExp
        Network: CONUS_River-Network_HydroSTN30_15min_Static.gdbn.gz

        Point Variables: RiverDischarge
        Zone Variables: 
        Sum Aggregation Variables: 
        Samplers: dams_15min.gdbp.gz
    


In [17]:
%%bash
# cleanup
rm -r tmp_output