## How to Query Data from the MAAP via Python Client

Supported collections can be subsetted through the MAAP Query Service. Whether a collection is compatible with the Query Service depends on how its data is stored. At the time of writing (6/9/2020), the GEDI Calibration/Validation Field Survey Dataset is the only dataset the service supports. More data will become available for querying as the MAAP team continues to expand the platform's services. Users interact with the service through the MAAP Python client: the `maap-py` Python library sends requests to the MAAP API's query URL, which in turn calls the MAAP Query Service.

First, we import the `json` module, import the `MAAP` class, and create an instance of it.

In [1]:
# import the json module
import json

# import the MAAP class from the maap-py library
from maap.maap import MAAP

# create an instance of the MAAP class
maap = MAAP()

## How to use `maap.executeQuery()`

We use the `executeQuery()` function to return a response object containing the server's response to our HTTP request. This object can be used to view the response headers, access the raw data of the response, or parse the response as JavaScript Object Notation (JSON), a data-interchange format designed to be easy for humans to read and write.

### `executeQuery` parameters

* `src` - dictionary-like object specifying the dataset to query. At the moment the only valid option is:
    ```json
    { 
      "Collection": { 
          "ShortName": "GEDI Cal/Val Field Data_1", 
          "VersionId": "1" 
      } 
    }
    ```

* `query` - dictionary-like object specifying the parameters for the query. The dictionary can include `bbox`, `where`, and `fields`:
    * `bbox` - list of floats identifying a bounding box of geographic coordinates
    * `where` - dictionary of fields and corresponding values, used to select the rows of the dataset whose values match for those fields
    * `fields` - list of fields to return, a subset of all fields available for the corresponding dataset
* `poll_results` - must be `True` for the `timeout` parameter to take effect
* `timeout` - maximum number of seconds to wait for a response. The default value is `180`, and the parameter requires `poll_results` to be `True`. Depending on the request, it may be necessary to increase the timeout so that the server has enough time to process the request.
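As a minimal sketch of how these pieces fit together, the helper below assembles a `query` dictionary from optional `bbox`, `where`, and `fields` values and leaves out any that are not supplied. `build_query` is a hypothetical name used only for illustration; it is not part of `maap-py`.

```python
def build_query(bbox=None, where=None, fields=None):
    """Assemble a query dictionary, skipping parameters that were not given.

    Hypothetical helper for illustration; it is not part of maap-py.
    """
    query = {}
    if bbox is not None:
        # store the bounding box as a list of floats
        query["bbox"] = [float(value) for value in bbox]
    if where is not None:
        # copy the field/value pairs used for matching rows
        query["where"] = dict(where)
    if fields is not None:
        # copy the list of fields to return
        query["fields"] = list(fields)
    return query


example_query = build_query(
    bbox=[9.315, 0.535, 9.32, 0.54],
    where={"project": "gabon_mondah"},
    fields=["project", "species"],
)
print(example_query)
```

The resulting dictionary has the shape expected by the `query` parameter described above.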

### Query Searching for a Project Name

In this example, we assign our `collection` dictionary to `src`, a dictionary-like object specifying the dataset to query. We also set the `bbox` and `fields` parameters of `query`, a dictionary-like object which specifies the parameters for the query. The `bbox` parameter is a GeoJSON-compliant bounding box ([minX, minY, maxX, maxY]) which is used to filter data spatially. GeoJSON is a format for encoding geographic data structures. More information about the bounding box can be found in the GeoJSON specification at https://tools.ietf.org/html/rfc7946#section-5. The `fields` parameter is a list of fields to return in the query response. In this case, we assign 'project' to `fields`.

In [10]:
collection = {
    "Collection": {
        "ShortName": "GEDI Cal/Val Field Data_1",
        "VersionId": "1"
    }
}

# use the executeQuery() function to get a response object
response = maap.executeQuery( 

  # dictionary-like object specifying the dataset to query  
  src = collection, 

  # dictionary-like object specifying parameters for query
  query = { 
      
    # bounding box to spatially filter data
    "bbox": [9.31, 0.53, 9.32, 0.54], 
      
    # list of fields to return in query response
    "fields": ["project"],
  }
)
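Because the order of the bounding box values matters, it can help to sanity-check a box before sending it. The following sketch is one possible check, assuming a 2D box in longitude/latitude degrees that does not cross the antimeridian (RFC 7946 also permits other cases). `check_bbox` is a hypothetical helper, not part of `maap-py`.

```python
def check_bbox(bbox):
    """Return True if bbox looks like a 2D [minX, minY, maxX, maxY] box.

    Hypothetical helper; assumes longitude/latitude in degrees and a box
    that does not cross the antimeridian.
    """
    if len(bbox) != 4:
        return False
    min_x, min_y, max_x, max_y = bbox
    # longitudes must be ordered and within [-180, 180]
    lon_ok = -180 <= min_x <= max_x <= 180
    # latitudes must be ordered and within [-90, 90]
    lat_ok = -90 <= min_y <= max_y <= 90
    return lon_ok and lat_ok


print(check_bbox([9.31, 0.53, 9.32, 0.54]))  # True
print(check_bbox([9.32, 0.53, 9.31, 0.54]))  # False: minX is greater than maxX
```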

We can check the 'Content-Type' header of our response to see what kind of content it carries. The following code checks whether the header indicates JSON, parses the response accordingly, and prints the first result to display the name of the project.

In [4]:
# if the 'Content-Type' is JSON, parse the response body with the built-in decoder
if "application/json" in response.headers.get("Content-Type", ""):
    data = response.json()
# otherwise, parse the text content of the response as JSON
else:
    data = json.loads(response.text)

# print the first result, which contains the project name
print(json.dumps(data[0], indent=2))

{
  "project": "gabon_mondah"
}
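The header check and parsing step can also be wrapped in a small function so that it is written only once. The sketch below is one way to do this; `parse_response` and `FakeResponse` are hypothetical names, and `FakeResponse` is only a stand-in that mimics the `headers`, `text`, and `json()` members of a response object so the example runs without contacting the service. A substring test is used because a 'Content-Type' header may carry extra parameters such as a character set.

```python
import json


def parse_response(response):
    """Return the parsed JSON body of a response (hypothetical helper)."""
    content_type = response.headers.get("Content-Type", "")
    if "application/json" in content_type:
        # the body is declared as JSON, so use the response's own decoder
        return response.json()
    # otherwise, try to parse the text content as JSON
    return json.loads(response.text)


class FakeResponse:
    """Stand-in for a response object, used only for this illustration."""

    def __init__(self, text, content_type):
        self.text = text
        self.headers = {"Content-Type": content_type}

    def json(self):
        return json.loads(self.text)


fake = FakeResponse('[{"project": "gabon_mondah"}]', "application/json; charset=utf-8")
rows = parse_response(fake)
print(json.dumps(rows[0], indent=2))
```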


### Query Inspecting a Single Observation

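Like the previous example, we use the `executeQuery()` function to return a response object and assign our `collection` dictionary to `src`. This time we set only the `bbox` parameter of `query`, using a smaller bounding box. Because no `fields` list is given, the response includes every field of each observation.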

In [12]:
# get a response object
response = maap.executeQuery( 
  
  # dictionary-like object specifying the dataset to query  
  src = collection, 

  # dictionary-like object specifying parameters for query
  query = {"bbox": [9.315, 0.535, 9.32, 0.54]}
)

As in the last example, we can check the 'Content-Type' header to determine if the content type is JSON or not and use the appropriate print statement. The output displays the variables for a single observation. A list of the variables and their units and descriptions can be found [here](https://maap-project.readthedocs.io/en/latest/query/gedi_calval_data_doc.html).

In [13]:
# if the 'Content-Type' is JSON, parse the response body with the built-in decoder
if "application/json" in response.headers.get("Content-Type", ""):
    data = response.json()
# otherwise, parse the text content of the response as JSON
else:
    data = json.loads(response.text)

# print the first observation
print(json.dumps(data[0], indent=2))

{
  "project": "gabon_mondah",
  "plot": "NASA11",
  "subplot": "8",
  "survey": "AfriSAR_ESA_2016",
  "tree.date": "2016-02-10",
  "family": "Euphorbiaceae",
  "species": "Maprounea membranacea",
  "pft": null,
  "wsg": 0.588,
  "wsg.sd": 0.0941339098036177,
  "tree": "100774",
  "stem": "1",
  "x": 535563.55327949,
  "y": 59509.0587297838,
  "z": null,
  "status": 1,
  "allom.key": 2,
  "a.stem": 0.00502654824574367,
  "h.t": null,
  "h.t.mod": 10.0039117856082,
  "d.stem": 0.08,
  "d.stem.valid": 1,
  "d.ht": 1.3,
  "c.w": null,
  "m.agb": 21.9747916494591,
  "id": 83078,
  "private": 0,
  "date": "2016-02-01",
  "region": "Af",
  "vegetation": "TropRF",
  "map": 3083.93471636915,
  "mat": 25.6671529098763,
  "pft.modis": "Evergreen Broadleaf trees",
  "pft.name": null,
  "latitude": 0.538705025207016,
  "longitude": 9.31982893597376,
  "p.sample": 0,
  "p.stemmap": 0,
  "p.origin": "C",
  "p.orientation": -2.18195751718555,
  "p.shape": "R",
  "p.majoraxis": 100,
  "p.minoraxis": 1
}

### Query Using Multiple Parameters with `where`

In the output of the previous example, we can see that the field `"species"` has the value `"Maprounea membranacea"`. Let's say we are interested in finding observations for the `"gabon_mondah"` project within the same bounding box as the previous example which have the species `"Aucoumea klaineana"` or `"Coelocaryon sp."`. We can do this using `where`, a dictionary-like object which maps fields to required values within a query. To help demonstrate how to use `where`, we can create a function (in this example named `species_query`) which utilizes the `executeQuery()` function and prints the number of results as well as the first result.

In [14]:
def species_query(query=None, timeout=180):
    """
    Run `executeQuery()` with the given query, then print the number of results and the first result.
    """
    # avoid a mutable default argument; fall back to an empty query
    if query is None:
        query = {}

    # use the executeQuery() function to get a response object
    response = maap.executeQuery(

      # dictionary-like object specifying the dataset to query
      src = collection,

      # dictionary-like object specifying parameters for query
      query = query,
      
      # parameter which must be True to use the timeout parameter
      poll_results = True,
      
      # waiting period for a response
      timeout = timeout
    )

    # if the 'Content-Type' is JSON, parse the response body with the built-in decoder
    if "application/json" in response.headers.get("Content-Type", ""):
        data = response.json()
    # otherwise, parse the text content of the response as JSON
    else:
        data = json.loads(response.text)

    # get the number of results within `data` (0 if `data` is null or empty)
    num_results = len(data) if data else 0
    # if `data` contains at least one result, the number of results and the first result are printed
    if num_results > 0:
        first_result = data[0]
        print(f"Number of results: {num_results}")
        print(f"First result: {json.dumps(first_result, indent=2)}")
    # else prints "No result"
    else:
        print(num_results)
        print("No result")
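To picture what a `where` clause selects, the sketch below applies the same idea locally to a small hand-made list of rows. It assumes that `where` requires an exact match on every listed field, as the parameter description suggests; it is an illustration of that behaviour, not the Query Service's implementation. `matches_where` is a hypothetical helper, and the rows are a made-up sample that reuses species names from this page.

```python
def matches_where(row, where):
    """Return True if every field in `where` equals the row's value (assumed exact match)."""
    return all(row.get(field) == value for field, value in where.items())


# hand-made sample rows for illustration only
rows = [
    {"project": "gabon_mondah", "species": "Aucoumea klaineana"},
    {"project": "gabon_mondah", "species": "Coelocaryon sp."},
    {"project": "gabon_mondah", "species": "Maprounea membranacea"},
]

where = {"project": "gabon_mondah", "species": "Aucoumea klaineana"}

# keep only the rows that satisfy every condition in `where`
selected = [row for row in rows if matches_where(row, where)]
print(len(selected))  # 1
```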
    

Let's call the `species_query` function. We enter the same bounding box values as in the previous example. This time, we include `where` in our query and set the `project` to `gabon_mondah` and the `species` to `Aucoumea klaineana`. We can limit the fields returned in the query response using `fields`. For this example, we return only the project, family, species, latitude, and longitude values. The second argument to `species_query` sets the timeout value manually (200 seconds in this example).

In [15]:
# call `species_query` function with bounding box values, where the project is "gabon_mondah", 
# the species is "Aucoumea klaineana", fields include "project", "family", "species", "latitude", and "longitude",
# and the timeout value is 200
species_query({
    "bbox": [9.315, 0.535, 9.32, 0.54],
    "where": {
        "project": "gabon_mondah",
        "species": "Aucoumea klaineana"
    },
    "fields": ["project", "family", "species", "latitude", "longitude"]
}, 200)

Number of results: 648
First result: {
  "project": "gabon_mondah",
  "family": "Burseraceae",
  "species": "Aucoumea klaineana",
  "latitude": 0.538705025207016,
  "longitude": 9.31982893597376
}


The output shows 648 results for the species `Aucoumea klaineana`, along with the latitude and longitude of the first result. To see the same information for `Coelocaryon sp.`, we can reuse the call from the cell above, changing the `species` to `Coelocaryon sp.` within the function argument.

In [16]:
# call `species_query` function with bounding box values, where the project is "gabon_mondah", 
# the species is "Coelocaryon sp.", fields include "project", "family", "species", "latitude", and "longitude",
# and the timeout value is 200
species_query({
    "bbox": [9.315, 0.535, 9.32, 0.54],
    "where": {
        "project": "gabon_mondah",
        "species": "Coelocaryon sp."
    },
    "fields": ["project", "family", "species", "latitude", "longitude"]
}, 200)

Number of results: 204
First result: {
  "project": "gabon_mondah",
  "family": "Myristicaceae",
  "species": "Coelocaryon sp.",
  "latitude": 0.538705025207016,
  "longitude": 9.31982893597376
}
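When more than a couple of species are of interest, the two calls above can be generated from a list rather than copied by hand. The sketch below builds one query dictionary per species; `make_species_query` is a hypothetical helper, and the call to `species_query` is left commented out because it needs a live MAAP session.

```python
species_of_interest = ["Aucoumea klaineana", "Coelocaryon sp."]


def make_species_query(species):
    """Build the query used above for a given species (hypothetical helper)."""
    return {
        "bbox": [9.315, 0.535, 9.32, 0.54],
        "where": {"project": "gabon_mondah", "species": species},
        "fields": ["project", "family", "species", "latitude", "longitude"],
    }


queries = [make_species_query(name) for name in species_of_interest]

for query in queries:
    print(query["where"]["species"])
    # in a session where `maap` and `species_query` are defined:
    # species_query(query, 200)
```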
