# Filtering a Sequence Data Type using Pydap

While *subsetting* lets you choose data based on the structure of the dataset and the types of its variables, *filters* let you choose data based on their values. The values to be returned are described by one or more simple predicates. The general syntax for a filter expression is a subset (projection) expression followed by a pipe (`|`) and one or more filter predicates. Multiple predicates are separated by commas, and the value of the complete predicate is the logical AND of the comma-separated subexpressions.
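The syntax just described can be sketched as a small string-building helper. `build_filter_expression` is a hypothetical name used only for illustration. It appends the comma-joined predicates to the projection after a pipe, following the form given in the paragraph above. The exact syntax a particular server accepts (for example, DAP2 versus DAP4 constraint expressions) may differ.

```python
def build_filter_expression(projection, predicates):
    """Hypothetical helper: combine a projection with filter predicates.

    Produces "<projection>|<pred1>,<pred2>,...", where the comma-separated
    predicates are meant to be ANDed together.
    """
    if not predicates:
        # No predicates means no filter: return the projection unchanged.
        return projection
    return projection + "|" + ",".join(predicates)


expr = build_filter_expression("URI_GSO-Dock", ["Depth>2", "Time>35234.5"])
print(expr)  # URI_GSO-Dock|Depth>2,Time>35234.5
```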

**Filter expressions can only be applied to Sequence variables (or arrays of them).**

A **Sequence can be thought of as a relational data table**, with each column representing a different data variable and each row representing one measurement of a set of values (also called an "instance"). For example, an ocean temperature profile can be stored as a Sequence with two columns, pressure and temperature. Each measurement, a pressure paired with a temperature, occupies one row. A weather station's data can be stored as a Sequence with time in one column and each weather variable in its own column.
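To make the table analogy concrete, here is a minimal plain-Python sketch (no pydap involved). It assumes a Sequence can be represented as a list of dictionaries, one per row. `project` and `select` are hypothetical helper names. `project` plays the role of subsetting (choosing columns), and `select` plays the role of a filter (choosing rows by value, with all predicates ANDed). The pressure and temperature values are invented.

```python
# A Sequence modeled as a list of rows; each row maps column names to values.
# The numbers below are made up for illustration.
profile = [
    {"pressure": 0.0, "temperature": 18.2},
    {"pressure": 10.0, "temperature": 17.9},
    {"pressure": 50.0, "temperature": 12.4},
    {"pressure": 100.0, "temperature": 9.1},
]


def project(rows, columns):
    """Subsetting: keep only the named columns of every row."""
    return [{name: row[name] for name in columns} for row in rows]


def select(rows, *predicates):
    """Filtering: keep the rows for which every predicate is true (logical AND)."""
    return [row for row in rows if all(pred(row) for pred in predicates)]


deep_and_cold = select(
    profile,
    lambda row: row["pressure"] > 5,
    lambda row: row["temperature"] < 15,
)
print(project(deep_and_cold, ["temperature"]))
# [{'temperature': 12.4}, {'temperature': 9.1}]
```

With no predicates, `select` returns every row, because `all()` of an empty sequence is `True`. This matches a request that has no filter at all.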

You can find a good example of a Sequence at:

* http://test.opendap.org/dap/data/ff/gsodock.dat
* http://test.opendap.org/dap/data/ff/gsodock.dat.ascii (ASCII format) 

This is a 24-hour record of measurements from a weather station on a dock in Rhode Island. Each record consists of a dozen variables, including air temperature, wind speed and direction, and the depth, temperature, and salinity of the water. The data consist of 144 measurements of each of the twelve variables.

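```python
from pydap.client import open_url

# Open the remote dataset. pydap reads the dataset structure here;
# data values are only downloaded when they are accessed.
dataset = open_url('http://test.opendap.org/dap/data/ff/gsodock.dat')
```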

```python
print(type(dataset))
```

```
<class 'pydap.model.DatasetType'>
```


```python
for key in dataset.keys():
    print(key)
```

```
URI_GSO-Dock
```


```python
uri_gso_dock = dataset['URI_GSO-Dock']
```

The type of `URI_GSO-Dock` is `SequenceType`, pydap's representation of a Sequence:

```python
print(type(uri_gso_dock))
```

```
<class 'pydap.model.SequenceType'>
```


It has 12 variables, one per column:

```python
for key in uri_gso_dock.keys():
    print(key)
```

```
Time
Depth
Sea_Temp
Salinity
DO_percent
pH
Turbidity
Air_Temp
Wind_Speed
Wind_Direction
Barometric_Pres
Solar_Radiation
```


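We now filter the Sequence by applying a constraint expression. In pydap, the comparisons are written as ordinary Python expressions and combined with `&`, which corresponds to the logical AND of the comma-separated predicates described above. The expression below selects `Salinity` only for rows where `Depth` is greater than 2 and `Time` is greater than 35234.5: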

```python
# Salinity values from rows where Depth > 2 AND Time > 35234.5
filtered_data = uri_gso_dock.Salinity[(uri_gso_dock.Depth > 2) & (uri_gso_dock.Time > 35234.5)]

for salinity in filtered_data.iterdata():
    print(salinity)
```

```
29.12
29.12
29.12
29.12
29.12
29.12
29.13
29.13
29.16
29.17
29.16
29.2
29.27
29.34
29.47
29.77
29.72
29.79
29.82
29.86
29.6
29.21
```
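The `&` in the pydap expression combines the two comparisons with a logical AND, and the result keeps only the `Salinity` column. The plain-Python sketch below reproduces that row-by-row logic on a few rows that reuse the dataset's column names. The values are invented, not taken from the real dataset. The sketch does not contact the server and is not how pydap implements filtering; with pydap, the constraint is expected to be evaluated by the OPeNDAP server.

```python
# Hypothetical rows using the same column names as URI_GSO-Dock;
# the numbers are invented and are NOT taken from the real dataset.
rows = [
    {"Time": 35234.2, "Depth": 2.5, "Salinity": 29.05},
    {"Time": 35234.6, "Depth": 1.8, "Salinity": 29.10},
    {"Time": 35234.7, "Depth": 2.4, "Salinity": 29.12},
    {"Time": 35234.9, "Depth": 2.6, "Salinity": 29.16},
]

# Local equivalent of Salinity[(Depth > 2) & (Time > 35234.5)]:
# keep a row only if BOTH comparisons are true, then return its Salinity.
salinity = [
    row["Salinity"]
    for row in rows
    if row["Depth"] > 2 and row["Time"] > 35234.5
]
print(salinity)  # [29.12, 29.16]
```

The first row fails the `Time` test and the second fails the `Depth` test, so only the last two rows contribute a salinity value.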


## References
* https://docs.opendap.org/index.php/QuickStart#Sequence_Data
* https://docs.opendap.org/index.php/DAP4:_Specification_Volume_1#Filters