# Field and index subsetting using Pydap

*Subsetting* is the act of choosing parts of a dataset based on the type of one or more of its variables. This section covers two kinds of subsetting operation: field subsetting and index subsetting.

## Field subsetting
Field subsetting means choosing specific variables (fields) from the dataset. A DAP dataset is made up of a number of variables, some of which may be Structures or Sequences that contain fields of their own.

In [1]:
import sys
from pydap.client import open_url

# Open remote file
dataset_url = 'https://iaos.opendap.terradue.com/thredds/dodsC/SMOS_SMAP/netCDF/north/2018/20181007_north_mix_sit_v100.nc'
dataset = open_url(dataset_url)

In [2]:
list(dataset.keys())

['smos_thickness',
 'smos_thickness_unc',
 'smap_thickness',
 'smap_thickness_unc',
 'combined_thickness',
 'combined_thickness_unc',
 'flags']

In [3]:
dataset.smos_thickness

<BaseType with data BaseProxy('https://iaos.opendap.terradue.com/thredds/dodsC/SMOS_SMAP/netCDF/north/2018/20181007_north_mix_sit_v100.nc', 'smos_thickness', dtype('>f4'), (896, 608), (slice(None, None, None), slice(None, None, None)))>

In [4]:
# An alternative syntax
dataset['smos_thickness']

<BaseType with data BaseProxy('https://iaos.opendap.terradue.com/thredds/dodsC/SMOS_SMAP/netCDF/north/2018/20181007_north_mix_sit_v100.nc', 'smos_thickness', dtype('>f4'), (896, 608), (slice(None, None, None), slice(None, None, None)))>

It is also possible to obtain the raw metadata for the dataset with a plain HTTP request, by appending the `.dds` suffix (DAP2) to the dataset URL:

In [5]:
import requests
import pprint

r = requests.get(dataset_url + '.dds')
pprint.pprint(r.text)

('Dataset {\n'
 '    Float32 smos_thickness[X = 896][Y = 608];\n'
 '    Float32 smos_thickness_unc[X = 896][Y = 608];\n'
 '    Float32 smap_thickness[X = 896][Y = 608];\n'
 '    Float32 smap_thickness_unc[X = 896][Y = 608];\n'
 '    Float32 combined_thickness[X = 896][Y = 608];\n'
 '    Float32 combined_thickness_unc[X = 896][Y = 608];\n'
 '    Byte flags[X = 896][Y = 608];\n'
 '} SMOS_SMAP/netCDF/north/2018/20181007_north_mix_sit_v100.nc;\n')


In [6]:
# Restrict the metadata to a single variable by appending its name after '?'
r = requests.get(dataset_url + '.dds' + '?smos_thickness')
pprint.pprint(r.text)

('Dataset {\n'
 '    Float32 smos_thickness[X = 896][Y = 608];\n'
 '} SMOS_SMAP/netCDF/north/2018/20181007_north_mix_sit_v100.nc;\n')


### Notes

In the examples above, we asked only for metadata; no data values were downloaded from the server. Downloading data values is done through index subsetting.
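The metadata requests above follow a simple pattern: the dataset URL, a response suffix such as `.dds`, and an optional `?` followed by a comma-separated list of variable names. The sketch below makes that pattern explicit. The helper name `dap2_url` and the `example.org` URL are hypothetical and used only for illustration; this is not part of Pydap.

```python
def dap2_url(base_url, suffix, variables=()):
    """Build a DAP2 request URL: base URL + suffix + optional field projection.

    Hypothetical helper for illustration only; it is not a Pydap function.
    """
    url = base_url + suffix
    if variables:
        # A DAP2 projection is a comma-separated list of variable names
        url += '?' + ','.join(variables)
    return url


# Placeholder URL (assumption): substitute a real OPeNDAP endpoint
base = 'https://example.org/opendap/sample.nc'

print(dap2_url(base, '.dds'))
print(dap2_url(base, '.dds', ['smos_thickness', 'flags']))
```

The resulting string can be passed to `requests.get` in the same way as in the cells above.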

## Index subsetting
Index subsetting means choosing parts of an array based on the indexes of that array's dimensions. This operation always returns an array of the same rank as the original, although the returned array will usually be smaller. Index subsetting uses the bracket syntax described below.

Subsetting fixed-size arrays in their index space is accomplished using square brackets. For an array with N dimensions, N sets of brackets are used, even if the array is only subset on some of the dimensions. The names of array variables are fully qualified names (FQNs), so it is possible to name arrays inside Structures and/or Groups. Array index values are zero-based, as in programming languages such as C and Java, so every array starts at index zero. Within the square brackets, several subexpressions are allowed:

* **[ ]**
return all of the elements for a particular dimension.
* **[ n ]**
return only the value at a single index, where 0 <= n < N for a dimension of size N. This slicing operator does not reduce the dimensionality of an array, but does return a dimension size of one for the dimension to which it is applied.
* **[ start : step : last ]**
return every value whose index is in the range start <= index <= last and where (index - start) % step == 0. This is the complete version of the syntax.
* **[ start : last ]**
return the values whose index is in the range start <= index <= last.
* **[ start : ]**
return the values whose index is in the range start <= index <= dimension size - 1.
* **[ start : step : ]**
return every value whose index is in the range start <= index <= dimension size - 1 and where (index - start) % step == 0.

Subsetting can be applied to any array. It can also be applied to a scalar, but in this case the only legal forms are [0] or [].
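The bracket rules above use an inclusive `last` index, whereas the Pydap cells in this section use Python slices, whose stop index is exclusive. The sketch below spells out both conventions on plain integers. The function names `dap_indexes` and `slice_to_dap` are hypothetical; this is an illustration of the rules as stated, not Pydap's own implementation, and it assumes non-negative indexes and a positive step.

```python
def dap_indexes(size, start=0, step=1, last=None):
    """Indexes selected by [start:step:last] on a dimension of length `size`.

    `last` is inclusive; if omitted it defaults to size - 1.
    """
    if last is None:
        last = size - 1
    if step < 1 or not (0 <= start <= last < size):
        raise ValueError('invalid index subexpression')
    # Every index with start <= index <= last and (index - start) % step == 0
    return list(range(start, last + 1, step))


def slice_to_dap(s, size):
    """Write a Python slice (exclusive stop) as a DAP bracket (inclusive last)."""
    start, stop, step = s.indices(size)
    if step < 1 or stop <= start:
        raise ValueError('only non-empty forward slices are supported here')
    return '[%d:%d:%d]' % (start, step, stop - 1)


print(dap_indexes(10, start=2, step=3, last=9))  # [2, 5, 8]
print(slice_to_dap(slice(140, 150), 608))        # [140:1:149]
```

Under these assumptions, the Python slice `140:150` corresponds to the ten indexes 140 through 149, which is the range `[140:1:149]` in the bracket notation.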

In [7]:
import sys
from pydap.client import open_url

# Open remote file
dataset_url = 'https://iaos.opendap.terradue.com/thredds/dodsC/SMOS_SMAP/netCDF/north/2018/20181007_north_mix_sit_v100.nc'
dataset = open_url(dataset_url)

# Select just a variable
smos_thickness = dataset['smos_thickness']

In [8]:
smos_thickness.shape

(896, 608)

In [9]:
# This downloads all the variable's values
smos_thickness[:]

<BaseType with data array([[-9.9900000e+02, -9.9900000e+02, -9.9900000e+02, ...,
         3.0115128e-01,  3.0980819e-01,  3.0980819e-01],
       [-9.9900000e+02, -9.9900000e+02, -9.9900000e+02, ...,
         1.0556738e-01,  1.4467032e-01,  1.4467032e-01],
       [-9.9900000e+02, -9.9900000e+02, -9.9900000e+02, ...,
         1.0556738e-01,  3.7567174e-01,  3.0476522e-01],
       ...,
       [-9.9900000e+02, -9.9900000e+02, -9.9900000e+02, ...,
        -9.9900000e+02, -9.9900000e+02, -9.9900000e+02],
       [-9.9900000e+02, -9.9900000e+02, -9.9900000e+02, ...,
        -9.9900000e+02, -9.9900000e+02, -9.9900000e+02],
       [-9.9900000e+02, -9.9900000e+02, -9.9900000e+02, ...,
        -9.9900000e+02, -9.9900000e+02, -9.9900000e+02]], dtype=float32)>

In [10]:
# Download a single value: index 0 of the 1st dimension, index 148 (the 149th element) of the 2nd dimension
smos_thickness[0,148]

<BaseType with data array([[0.18291631]], dtype=float32)>

In [11]:
# Download a subset of the 2nd dimension (indexes 140 to 149) at index 0 of the 1st dimension
smos_thickness[0,140:150]

<BaseType with data array([[-9.9900000e+02, -9.9900000e+02, -9.9900000e+02, -9.9900000e+02,
        -9.9900000e+02, -9.9900000e+02, -9.9900000e+02, -9.9900000e+02,
         1.8291631e-01,  1.6078155e-01]], dtype=float32)>

### Notes

In [12]:
type(smos_thickness[:])

pydap.model.BaseType

You can think of a *BaseType* object as a thin layer around a NumPy array, but its data attribute can be any object that implements the array interface. This is how the DAP client works: instead of assigning an array with data directly to the attribute, it assigns a special object that behaves like an array and acts as a proxy to the remote dataset. In the example above, the data attribute is a NumPy array:

In [13]:
array = smos_thickness[:].data
type(array)

numpy.ndarray

In [14]:
array.shape

(896, 608)
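The proxy idea described in these notes can be illustrated without a server. The sketch below is a deliberately simplified, hypothetical `LazyArrayProxy` class: it exposes `shape` and `dtype` up front and only calls a fetch function when it is indexed. It is not Pydap's `BaseProxy`; the "remote" data here is a small local NumPy array used as a stand-in.

```python
import numpy as np


class LazyArrayProxy:
    """Hypothetical array-like proxy: data is fetched only when indexed."""

    def __init__(self, fetch, shape, dtype):
        self._fetch = fetch           # callable that returns data for an index
        self.shape = shape            # known from metadata, no data needed
        self.dtype = np.dtype(dtype)
        self.requests = 0             # counts simulated round trips

    def __getitem__(self, index):
        self.requests += 1
        return self._fetch(index)


# Stand-in for data held on a remote server (assumption for this sketch)
_remote = np.arange(12, dtype='float32').reshape(3, 4)
proxy = LazyArrayProxy(lambda idx: _remote[idx], _remote.shape, _remote.dtype)

print(proxy.shape)         # metadata only, nothing fetched yet
subset = proxy[0:1, 1:3]   # a single simulated request for a small subset
print(subset.shape, proxy.requests)
```

The design point is that inspecting `shape` or `dtype` costs nothing, while each indexing operation triggers exactly one request for only the selected values.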

## References

* https://docs.opendap.org/index.php/DAP2:_Constraint_Expressions
* https://docs.opendap.org/index.php/DAP4:_Specification_Volume_1#Constraints
* https://pydap.readthedocs.io/en/latest/index.html