# **Subset CONUS and run ParFlow-CLM**
Subsetting a small box domain for 1 month to test data assembly workflows for a CONUS2 domain. 

This is a first cut eventually we need to cover everything in [this outline](https://docs.google.com/document/d/1TNZCPCYj1qsA4OlMlN3NB6XSOuV5fUmY4w4f6O7dm_8/edit?pli=1&tab=t.0) from Reed


#### Inputs needed for training: 
**Transient:** 
- evap-trans file (trasient) 
- pressure (starting and labeled)

**Static inputs:** 
- slopes: x & y (2*2D)
- Perm: Kx, ky & Kz (3*3D) (we have just need to be added)
- Porosity (1*3D) (we have just need to be added)
- Van Genuchten (2*3D) (sres, ssat)
- Specific Storage (1*3D)
- Mannings (1*2D)
- Flow barrier? 


### Questions: 
- Do we want to generate Perm, Porosity VG and SS once for all of CONUS2 or do them on the fly now for subsets -- I'm thinking generate them once for all. 

### Import the required libraries

In [8]:
import matplotlib.pyplot as plt
import numpy as np
import os
from parflow import Run
from parflow.tools.io import read_pfb, read_clm, write_pfb
from parflow.tools.fs import mkdir
from parflow.tools.settings import set_working_directory
import subsettools as st
import hf_hydrodata as hf

In [9]:
# You need to register on https://hydrogen.princeton.edu/pin before you can use the hydrodata utilities
#email = input('Enter your hydrgen email address')
#pin = input('Enter your hydrogen PIN')
email='lecondon@email.arizona.edu'
pin=1234
#print('Registering ' + email + ' (PIN=' + pin + ') for HydroData download' ) #use lecondon@email.arizona.edu and 1234
hf.register_api_pin(email, pin)

### 1. Define variables to access datasets in Hydrodata to subset and define write paths

#### Set your variables to specify which static and climate forcing data you would like to subset in Hydrodata

In [4]:
runname = "test_box1"
base_dir = os.path.join("/Users/laura/Documents/Research/NAIRR")
variable_list=['slope_x', 'slope_y', 'pme', 'ss_pressure_head', 'pf_indicator', 'pf_flowbarrier', 'mannings', 'specific_storage', 'sres' , 'ssat' , 'top_patch', 'porosity', 'permeability_x', 'permeability_y' , 'permeability_z']

# provide information about the datasets you want to access for run inputs using the data catalog
start = "2005-10-01"
end = "2005-10-03"
grid = "conus2"
var_ds = "conus2_domain"

# set the directory paths where you want to write your subset files and make directories for static and transient inputs
input_dir = os.path.join(base_dir, f"{runname}_{grid}_{end[:4]}WY")
static_write_dir = os.path.join(input_dir, "static")
mkdir(static_write_dir)
transient_write_dir = os.path.join(input_dir, "transient")
mkdir(transient_write_dir)

### 2. Get the desired ParFlow i/j bbox from user provided geospatial information 

In [10]:
#Define a box domain using the i,j indices
box_size = 64 #assuming a square box 
lower_left= [1000,1000] # lowerleft corner of the box using i,j indices
ij_bounds = tuple([lower_left[0], lower_left[1], lower_left[0]+box_size, lower_left[1]+ box_size])

nj = ij_bounds[3] - ij_bounds[1]
ni = ij_bounds[2] - ij_bounds[0]
print(f"bounding box: {ij_bounds}")
print(f"nj: {nj}")
print(f"ni: {ni}")

# Read the mask file and check what portion of the domain is in the active CONUS2 domain 
options = {
      "dataset":"conus2_domain", "variable": "mask",  "grid_bounds": ij_bounds
}
mask = hf.get_gridded_data(options)
outside_frac = (np.count_nonzero(np.isnan(mask)))/(box_size*box_size)*100
print(str(outside_frac) + '% of the domain is outside the mask')


bounding box: (1000, 1000, 1064, 1064)
nj: 64
ni: 64
0.0% of the domain is outside the mask


### 4. Subset ParFlow Files

**Note: We either need to get the parameter table and use this to translate the indicator file to the other variables we need or we need to save them out for CONUS2 somewhere. 

Additional needed variables: 
- permeability
- porosity
- VGN Alpha 
- VGN n

In [11]:
static_paths = st.subset_static(ij_bounds, dataset=var_ds, write_dir=static_write_dir, var_list=variable_list)
print(ij_bounds)

  static_paths = st.subset_static(ij_bounds, dataset=var_ds, write_dir=static_write_dir, var_list=variable_list)


Wrote slope_x.pfb in specified directory.
Wrote slope_y.pfb in specified directory.
Wrote pme.pfb in specified directory.
Wrote ss_pressure_head.pfb in specified directory.
Wrote pf_indicator.pfb in specified directory.
Wrote pf_flowbarrier.pfb in specified directory.
Wrote mannings.pfb in specified directory.
Wrote specific_storage.pfb in specified directory.
Wrote sres.pfb in specified directory.
Wrote ssat.pfb in specified directory.
Wrote top_patch.pfb in specified directory.
Wrote porosity.pfb in specified directory.
Wrote permeability_x.pfb in specified directory.
Wrote permeability_y.pfb in specified directory.
Wrote permeability_z.pfb in specified directory.
(1000, 1000, 1064, 1064)


### 4. Subset transient PF pressure files

In [None]:
#Get the pressure files for a single month
dataset = "conus2_baseline"
start_date = '2002-10-01'
end_date= '2002-10-02'

#Get the pressure files from hydrodata
options_p = {
      "dataset": dataset, "variable": "pressure_head", "temporal_resolution": "hourly",
      "start_time": start_date, "end_time": end_date, "grid_bounds": ij_bounds, 
}
data_p = hf.get_gridded_data(options)
#hf.get_gridded_files(options)
print(data_p.shape)
print('Pressure files downloaded from Hydrodata')

#Get the evaptrans files from hydrodata
options_et = {
      "dataset": dataset, "variable": "parflow_evaptrans", "temporal_resolution": "hourly",
      "start_time": start_date, "end_time": end_date, "grid_bounds": ij_bounds, 
}
data_et = hf.get_gridded_data(options_et)
#hf.get_gridded_files(options)
print(data_et.shape)
print('Evaptrans files downloaded from Hydrodata')

#Write out the pressure and evaptrans as pfbs
for hour in range(data_p.shape[0]):
    file_name=f'{transient_write_dir}/pressure.{str(hour).zfill(5)}.pfb'
    write_pfb(file=file_name, array=data_p[hour,:,:,:], dist=False)

    file_name=f'{transient_write_dir}/evaptrans.{str(hour).zfill(5)}.pfb'
    write_pfb(file=file_name, array=data_et[hour,:,:,:], dist=False)


print('Pressure and ET files written to transient directory')


(64, 64)
Pressure files downloaded from Hydrodata




ValueError: Timeout error from server. Try again later or try to reduce the size of data in the API request using time or space filters.

### 5. Subset transient evaptrans files

In [None]:
#Get the pressure files for a single month
dataset = "conus2_baseline"
variable = "pressure_head"
start_date = '2003-06-01'
end_date= '2003-06-02'

options = {
      "dataset": dataset, "variable": "pressure_head", "temporal_resolution": "hourly",
      "start_time": start_date, "end_time": end_date, "grid_bounds": ij_bounds, 
}
data = hf.get_gridded_data(options)
#hf.get_gridded_files(options)
print(data.shape)
print('Pressure files downloaded from Hydrodata')

for hour in range(data.shape[0]):
    file_name=f'{transient_write_dir}/pressure.{str(hour).zfill(5)}.pfb'
    write_pfb(file=file_name, array=data[hour,:,:,:], dist=False)
print('Pressure files written to transient directory')

#### Not used just data catalog searching examples

In [36]:
#Doing some data catalog searching to pick the pressure files to get
datasets = hf.get_datasets(variable = "pressure_head")
print(datasets)

options = {"dataset": "conus2_baseline", "grid": "conus2"}
variables = hf.get_variables(options)
print(variables)

options = {
   "dataset": "conus2_baseline", "variable": "pressure_head",
}
metadata = hf.get_catalog_entry(options)
print(metadata)

datasets = hf.get_datasets()
print(datasets)

options = {"dataset": "conus2_domain"}
variables = hf.get_variables(options)
print(variables)

['conus1_baseline_85', 'conus1_baseline_mod', 'conus1_current_conditions', 'conus2_baseline']
['evapotranspiration', 'ground_evap', 'ground_evap_heat', 'ground_heat', 'ground_temp', 'infiltration', 'irrigation', 'latent_heat', 'outward_longwave_radiation', 'pressure_head', 'saturation', 'sensible_heat', 'soil_moisture', 'soil_temp', 'streamflow', 'subsurface_storage', 'surface_water_storage', 'swe', 'transpiration', 'transpiration_leaves', 'water_table_depth']
{'id': '558', 'dataset': 'conus2_baseline', 'dataset_version': '', 'file_type': 'pfb', 'variable': 'pressure_head', 'dataset_var': 'press', 'temporal_resolution': 'hourly', 'units': 'm', 'aggregation': '-', 'grid': 'conus2', 'path': 'spinup.wy{wy}.out.press.{wy_hour:05d}.pfb', 'file_grouping': 'wy_hour', 'entry_start_date': '2002-10-01', 'entry_end_date': '2003-09-30', 'documentation_notes': '', 'site_type': '', 'variable_type': 'subsurface', 'has_z': 'TRUE', 'dataset_type': 'parflow', 'datasource': 'hydroframe', 'paper_dois': No