# DO Ecology Extraction

This notebook takes Ecology's bounding scenario model output files in NetCDF format (available at https://fortress.wa.gov/ecy/ezshare/EAP/SalishSea/SalishSeaModelBoundingScenarios.html) and extracts the bottom dissolved oxygen (DO) values at only the nodes of interest, as defined by the domain nodes shapefile created using [ProcessGrid](ProcessGrid.ipynb). The extracted values are written to a new NetCDF file.

The Ecology NetCDF files are over 370 GB each. The CDF file output by this notebook is much smaller, about 450 MB depending on the size of the domain, which makes later processing steps easier to repeat.

In [1]:
exist_cdf = "model_results/2008_SSM4_WQ_Exist1_nodes.nc"
reference_cdf = "model_results/2008_SSM4_WQ_Ref1_nodes.nc"
domain_nodes_shp = "gis/ssm domain nodes.shp"

do_output_cdf = "model_results/bottom do 2008.nc"

from netCDF4 import Dataset
import geopandas as gpd
import pandas as pd
import numpy as np
%matplotlib widget

Load the shapefile containing the domain nodes as a GeoDataFrame. This gives us the IDs of the nodes to extract data for.

In [2]:
domain_nodes = gpd.read_file(domain_nodes_shp)
domain_nodes.set_index('node_id', inplace=True)
domain_nodes.head()

node_id,depth,geometry
4369,45.183998,POINT (515771.670 5333564.600)
4370,51.813999,POINT (515865.500 5334886.900)
4371,51.813999,POINT (516496.440 5336059.500)
4372,55.544998,POINT (517099.320 5337226.800)
4373,60.431,POINT (518039.180 5338339.500)


In [3]:
exist = Dataset(exist_cdf)
exist

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): IJK(160120), Time(8760)
    variables(dimensions): float32 Var_1(Time, IJK), float32 Var_2(Time, IJK), float32 Var_3(Time, IJK), float32 Var_4(Time, IJK), float32 Var_5(Time, IJK), float32 Var_6(Time, IJK), float32 Var_7(Time, IJK), float32 Var_8(Time, IJK), float32 Var_9(Time, IJK), float32 Var_10(Time, IJK), float32 Var_11(Time, IJK), float32 Var_12(Time, IJK), float32 Var_13(Time, IJK), float32 Var_14(Time, IJK), float32 Var_15(Time, IJK), float32 Var_16(Time, IJK), float32 Var_17(Time, IJK), float32 Var_18(Time, IJK), float32 Var_19(Time, IJK), float32 Var_20(Time, IJK), float32 Var_21(Time, IJK), float32 Var_22(Time, IJK), float32 Var_23(Time, IJK), float32 Var_24(Time, IJK), float32 Var_25(Time, IJK), float32 Var_26(Time, IJK), float32 Var_27(Time, IJK), float32 Var_28(Time, IJK), float32 Var_29(Time, IJK), float32 Var_30(Time, IJK), float32 Var_31(Time, IJK), float32 Var_

Define the values of the dimension IJK we want.

The IJK dimension flattens the 10 depth points of every node into a single zero-based index, with each node's bottom point stored last. Node numbers start at 1, so the bottom point of node n sits at IJK index (n - 1) * 10 + 9. For instance, the bottom point of node 1 is IJK index 9, and the bottom point of node 2 is IJK index 19. This simplifies to n * 10 - 1, the expression used in the cell below.

In [4]:
node_ids = domain_nodes.sort_index().index.to_numpy()
ijk_index = node_ids * 10 - 1
display(ijk_index)
print(len(ijk_index))

array([ 43689,  43699,  43709, ..., 160099, 160109, 160119])

6120
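As a quick check of the index arithmetic described above, the following sketch compares the long form, (n - 1) * 10 + 9, with the simplified form for a few node IDs. The helper name `bottom_ijk` and the constant `LAYERS_PER_NODE` are hypothetical and only used for illustration; the value of 10 depth points per node is taken from the text.

```python
import numpy as np

LAYERS_PER_NODE = 10  # assumption from the text: 10 depth points per node


def bottom_ijk(node_ids, layers=LAYERS_PER_NODE):
    """Return the zero-based IJK index of the bottom point for one-based node IDs."""
    node_ids = np.asarray(node_ids)
    # Skip the (n - 1) nodes that come before, then step to the last point of this node.
    return (node_ids - 1) * layers + (layers - 1)


sample_nodes = np.array([1, 2, 4369, 16012])
sample_index = bottom_ijk(sample_nodes)

# The long form agrees with the simplified expression n * 10 - 1.
assert np.array_equal(sample_index, sample_nodes * 10 - 1)
```

Node 4369 maps to IJK index 43689 and node 16012 maps to 160119, which match the first and last values in the array displayed above.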


Extract the existing-condition bottom DO, which this notebook reads from `Var_10`, for every time step at the selected IJK indices.

In [5]:
exist_bottom_do = exist['Var_10'][:,ijk_index]
exist_bottom_do

masked_array(
  data=[[ 5.88764,  6.36058,  6.74573, ..., 10.7888 , 10.9379 , 10.6644 ],
        [ 5.92326,  6.37053,  6.84681, ..., 11.2257 , 11.5005 , 11.6176 ],
        [ 5.22892,  5.70077,  6.14503, ..., 11.2945 , 11.7367 , 11.7856 ],
        ...,
        [ 6.49599,  6.451  ,  6.48955, ..., 11.3442 , 11.4697 , 11.1686 ],
        [ 6.59279,  6.74018,  7.01533, ..., 11.2688 , 11.4488 , 11.1914 ],
        [ 6.63089,  6.77024,  7.02655, ..., 11.2138 , 11.4021 , 11.1586 ]],
  mask=False,
  fill_value=1e+20,
  dtype=float32)

In [6]:
exist_bottom_do.shape

(8760, 6120)
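The shape above also accounts for the output file size quoted in the introduction. The following is a rough estimate only, assuming two float32 arrays of this shape (existing and reference) are stored uncompressed, and ignoring the small node and time variables and any file overhead.

```python
n_times, n_nodes = 8760, 6120   # shape of the extracted array shown above
bytes_per_value = 4             # float32
n_arrays = 2                    # assumption: existing + reference conditions

size_bytes = n_times * n_nodes * bytes_per_value * n_arrays
size_mb = size_bytes / 1e6

print(f"{size_mb:.0f} MB")  # 429 MB
```

A different domain changes `n_nodes` and scales the estimate proportionally.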

Now repeat this for the reference condition.

In [7]:
ref = Dataset(reference_cdf)
ref_bottom_do = ref['Var_10'][:,ijk_index]
ref_bottom_do

masked_array(
  data=[[ 5.89682,  6.3745 ,  6.76485, ..., 10.8232 , 10.9671 , 10.702  ],
        [ 5.93117,  6.3823 ,  6.86365, ..., 11.2624 , 11.53   , 11.6579 ],
        [ 5.23647,  5.712  ,  6.15984, ..., 11.3305 , 11.7651 , 11.8256 ],
        ...,
        [ 6.50992,  6.46475,  6.50368, ..., 11.3948 , 11.4882 , 11.2272 ],
        [ 6.60842,  6.75745,  7.03503, ..., 11.3251 , 11.4731 , 11.2507 ],
        [ 6.64774,  6.78907,  7.04787, ..., 11.2739 , 11.4339 , 11.2185 ]],
  mask=False,
  fill_value=1e+20,
  dtype=float32)

In [8]:
do_output = Dataset(do_output_cdf, "w")
n_times, n_nodes = exist_bottom_do.shape
timeDim = do_output.createDimension('time', n_times)
nodeDim = do_output.createDimension('node', n_nodes)
nodeVar = do_output.createVariable('node', "i4", ('node',))
do_output['node'][:] = node_ids
# Time values are not given in the Ecology output files, so recreate them in days
# based on a 1-hour (1/24-day) interval. Dividing integer hour counts by 24 always
# yields exactly n_times values, which a fractional arange step does not guarantee.
timeVar = do_output.createVariable('time', "f4", ('time',))
do_output['time'][:] = np.arange(n_times) / 24
existVar = do_output.createVariable('existing', "f4", ('time', 'node'))
do_output['existing'][:] = exist_bottom_do
refVar = do_output.createVariable('reference', "f4", ('time', 'node'))
do_output['reference'][:] = ref_bottom_do
do_output

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): time(8760), node(6120)
    variables(dimensions): int32 node(node), float32 time(time), float32 existing(time, node), float32 reference(time, node)
    groups: 
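The time variable written above holds fractional days at hourly steps. A minimal sketch of one way to build such an axis is shown below, assuming 8760 hourly steps as in the Time dimension. It divides integer hour counts by 24 rather than stepping by 1/24, because NumPy's documentation cautions that `arange` with a non-integer step can return an inconsistent number of elements.

```python
import numpy as np

n_times = 8760  # hourly steps, matching the Time dimension above

# Integer hour counts divided by 24 give fractional days with an exact length.
time_days = np.arange(n_times) / 24

print(len(time_days))  # 8760
print(time_days[24])   # 1.0
```

The same values could also be produced with `np.linspace` and `endpoint=False`; the integer form is used here only because it makes the length obvious.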

In [9]:
do_output.close()
# Also release the large input files now that extraction is finished
exist.close()
ref.close()