# PDS Search API demo, OSIRIS-REx OVIRS Data Visualization
## PART 1: Explore a Collection

The purpose of this notebook is to demonstrate how the PDS web API can be used to access PDS data for a scientific use case.

The documentation of the PDS web API is available at https://nasa-pds.github.io/pds-api/

This notebook is available on https://github.com/NASA-PDS/pds-api-notebook

<b>2 Use cases:</b>
 - <u>Part1</u>: explore the collection
 - <u>Part2</u>: find/visualize the specific data you are interested in
 
 <br/>
 
 WARNING: This notebook is a demo and not a real scientific use case. It might contain mistakes in the way the data is used or displayed.

In [1]:
from __future__ import print_function
from pprint import pprint
import time
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt

The PDS API is accessed using a Python client library documented at https://nasa-pds.github.io/pds-api-client/

In [2]:
import pds.api_client as pds_api

## Use the PDS demo web API server 

Connect to the demo server. See the user interface of the web API: https://pds-gamma.jpl.nasa.gov/api/swagger-ui.html

<b>Note: </b> this piece of code will be wrapped into a helper function so that one line will be enough to connect to the API using a default host

In [3]:
configuration = pds_api.Configuration()

# demo server
configuration.host = 'https://pds.nasa.gov/api/search/1/'

api_client = pds_api.ApiClient(configuration)
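
The note above mentions wrapping the connection in a helper. A minimal sketch of what such a helper might look like, using only the `Configuration` and `ApiClient` calls already shown in this notebook. The names `connect_to_pds` and `DEFAULT_HOST` are hypothetical and not part of the client library.

```python
# Host used in this notebook; kept as a module-level default (hypothetical name).
DEFAULT_HOST = "https://pds.nasa.gov/api/search/1/"


def connect_to_pds(host=DEFAULT_HOST):
    """Return an ApiClient configured for `host` (hypothetical helper)."""
    # Imported inside the function so the helper can be defined
    # even in an environment where the client is not installed yet.
    import pds.api_client as pds_api

    configuration = pds_api.Configuration()
    configuration.host = host
    return pds_api.ApiClient(configuration)
```

With this in place, `api_client = connect_to_pds()` would replace the three configuration lines above.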


## Explore a collection

### Get the identifier of your collection of interest

Search for a collection of interest on https://pds.nasa.gov/datasearch/keyword-search/ (or https://pds.nasa.gov > Data Search > Keyword Search):
- Search for "osiris rex calibrated ovirs collection" (https://pds.nasa.gov/datasearch/keyword-search/search.jsp?q=osiris+rex+ovirs+calibrated+collection)
- Click on the first collection in the results
- Copy the IDENTIFIER displayed at the top of the Collection Information


### Get the properties available for the observational products belonging to the selected collection

Get the properties available for the products belonging to the collection of interest using `CollectionsProductsApi.products_of_a_collection`
(see https://nasa-pds.github.io/pds-api-client/api/api_client.api.html#api_client.api.collections_products_api.CollectionsProductsApi)

API responses have the structure `{"summary": {}, "data": []}`
(see https://nasa-pds.github.io/pds-api-client/api/api_client.models.html#api_client.models.products.Products)
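
To make the `summary`/`data` split concrete, here is a small sketch that walks a hand-written payload of that shape. Plain dicts stand in for the client's response objects, and the `hits` and `properties` keys inside `summary` are assumptions made for illustration, as are the example identifiers.

```python
# Hand-written stand-in for an API response (not real PDS data).
# The keys inside "summary" are assumed for the sake of the example.
response = {
    "summary": {
        "hits": 2,
        "properties": ["orex:spatial.orex:longitude", "orex:spatial.orex:latitude"],
    },
    "data": [
        {"id": "urn:example:product_a::1.0",
         "properties": {"orex:spatial.orex:latitude": "12.5"}},
        {"id": "urn:example:product_b::1.0",
         "properties": {"orex:spatial.orex:latitude": "-9999"}},
    ],
}


def available_properties(resp):
    """Property names listed in the summary, sorted for stable display."""
    return sorted(resp.get("summary", {}).get("properties", []))


def product_ids(resp):
    """Identifiers of the products carried in the data section."""
    return [product["id"] for product in resp.get("data", [])]


print(available_properties(response))
print(product_ids(response))
```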

We do not get the product descriptions yet, only the available properties (the `only_summary=True` option, expressed as `limit=0` in the cell below).

In [None]:
lidvid = 'urn:nasa:pds:orex.ovirs:data_calibrated::10.0'
from pds.api_client.apis.paths.products_identifier_members import ProductsIdentifierMembers
products = ProductsIdentifierMembers(api_client)

###### The following gets an error due to a bug in the registry api!
api_response = products.get(path_params={"identifier": lidvid},
                            query_params={"limit": 0}, # limit=0 gets a summary, not the products
                            accept_content_types=("application/json",))
print(api_response)

### Request specific properties of all the observational products of the collection

In [5]:
# NEW VERSION
from pds.api_client.apis.paths.products import Products
products = Products(api_client)
criteria = '( ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu" )'
properties_of_interest = ['orex:spatial.orex:latitude', 'orex:spatial.orex:longitude', 'ref_lid_instrument', 'orex:spatial.orex:target_range', 'ops:Data_File_Info.ops:file_ref']
start = 0
limit = 500
prods = []
pbar = tqdm()

start_time = time.time()
while True:
    print("start:", start)
    api_response = products.get(query_params={"q": criteria,
                                              "start": start, "limit": limit,
                                              "fields": properties_of_interest},
                                accept_content_types=("application/json",)).body

    if api_response.data:
        prods.extend(api_response.data)
        start += limit
        pbar.update(1)  # tqdm.update() adds to the counter: one step per page retrieved
    else:
        break
    
#    try:
#        api_response = products.get(query_params={"q": criteria,
#                                                  "start": start, "limit": limit,
#                                                  "fields": properties_of_interest},
#                                    accept_content_types=("application/json",)).body
#    except:
#        break
#    else:
#        if api_response.data:
#            prods.extend(api_response.data)
#            start += limit
#        else:
#            break
    
elapsed = time.time() - start_time
print(f'retrieved {len(prods)} products in {elapsed/60.0:.1f} minutes')

pbar.close()

0it [00:00, ?it/s]

start: 0
start: 500
start: 1000
start: 1500
start: 2000
start: 2500
start: 3000
start: 3500
start: 4000
start: 4500
start: 5000
start: 5500
start: 6000
start: 6500
start: 7000
start: 7500
start: 8000
start: 8500
start: 9000
start: 9500
start: 10000


ApiTypeError: ErrorMessage is missing 2 required arguments: ['message', 'request']

Property names follow a syntax similar to the PDS4 model, `class.attribute` (for example `orex:spatial.orex:latitude`).
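
As an illustration of that naming scheme, the sketch below splits a property name into its class and attribute parts. The helper `split_property` is hypothetical, and it assumes the last `.` in the name separates the class from the attribute; names without a dot (such as `ref_lid_instrument`) are returned with no class part.

```python
def split_property(name):
    """Split 'ns:class.ns:attribute' into (class_part, attribute_part).

    Hypothetical helper: assumes the last '.' separates class from attribute.
    """
    class_part, separator, attribute_part = name.rpartition(".")
    if not separator:
        # No dot in the name: there is no class part to report.
        return None, name
    return class_part, attribute_part


for name in ["orex:spatial.orex:latitude",
             "ops:Data_File_Info.ops:file_ref",
             "ref_lid_instrument"]:
    print(name, "->", split_property(name))
```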

Get the latitude, longitude and target_range of the observational products belonging to the collection with the `fields` option.

The API results are <b>paginated</b>; to get all the results, we need to loop through the pages.

In [None]:
properties_of_interest = ['orex:spatial.orex:latitude', 'orex:spatial.orex:longitude', 'orex:spatial.orex:target_range']

# `products` was rebound to the Products endpoint in an earlier cell,
# so create the collection-members endpoint again under its own name.
collection_members = ProductsIdentifierMembers(api_client)

start = 0
limit = 500
prods = []
pbar = tqdm()

start_time = time.time()
while True:
    api_response = collection_members.get(path_params={"identifier": lidvid},
                                          query_params={"start": start, "limit": limit,
                                                        "fields": properties_of_interest},
                                          accept_content_types=("application/json",)).body

    # Call used with the previous client version:
    #    api_response = collection_products_api.products_of_a_collection(
    #        lidvid,
    #        start=start,
    #        limit=limit,
    #        fields=properties_of_interest)

    if api_response.data:
        prods.extend(api_response.data)
        start += limit
        pbar.update(1)  # tqdm.update() adds to the counter: one step per page retrieved
    else:
        break

elapsed = time.time() - start_time
print(f'retrieved {len(prods)} products in {elapsed/60.0:.1f} minutes')

pbar.close()
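
The start/limit loop used above can be factored into a small generator. This is a sketch under stated assumptions, not part of the PDS client: `iter_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` serves slices of a local list so the example runs without network access.

```python
def iter_pages(fetch_page, limit=500):
    """Yield items page by page until an empty page is returned.

    `fetch_page(start, limit)` is any callable returning a list of items.
    """
    start = 0
    while True:
        page = fetch_page(start, limit)
        if not page:
            break
        yield from page
        start += limit


# Stand-in for the API: 1234 fake records served in slices.
records = list(range(1234))


def fake_fetch(start, limit):
    return records[start:start + limit]


all_records = list(iter_pages(fake_fetch, limit=500))
print(len(all_records))
```

In the notebook, `fetch_page` would wrap the `get(...)` call and return `api_response.data`.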

### Product description comes with default properties + requested properties

See for example

In [None]:
prods[140000]

### Filter out records with no valid values

Some records have fill values for the fields we are interested in (e.g. `latitude == -9999`); we want to remove them from our results.


In [None]:
def at_least_one_valid_value(p):
    # Only the latitude is checked: it must be present and not the fill value
    latitude = p['orex:spatial.orex:latitude']
    return latitude is not None and latitude != '-9999'

def filter_out_fillvalues(products):
    properties = []
    for product in products:
        if at_least_one_valid_value(product.properties):
            p = product.properties
            p['id'] = product.id
            properties.append(p)
    return properties

properties = filter_out_fillvalues(prods)  # `prods` holds the retrieved products

print("The values of the selected properties are")
pprint(properties[:3])
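
The check above only looks at the latitude. A stricter variant, sketched here with hand-written sample records, requires every field of interest to be present and different from the fill value. It assumes the properties behave like a dict of strings and that `'-9999'` is the only fill marker; `is_valid` and `all_fields_valid` are hypothetical helper names.

```python
FILL_VALUE = "-9999"  # fill marker as a string, as in the check above

FIELDS = [
    "orex:spatial.orex:latitude",
    "orex:spatial.orex:longitude",
    "orex:spatial.orex:target_range",
]


def is_valid(value):
    """A value is usable if it is present and is not the fill marker."""
    return value is not None and value != FILL_VALUE


def all_fields_valid(props, fields):
    """True only if every requested field holds a usable value."""
    return all(is_valid(props.get(field)) for field in fields)


# Hand-written sample records (not real PDS data).
sample = [
    {FIELDS[0]: "12.1", FIELDS[1]: "24.3", FIELDS[2]: "3.2"},
    {FIELDS[0]: "-9999", FIELDS[1]: "24.3", FIELDS[2]: "3.2"},
    {FIELDS[0]: "10.0", FIELDS[1]: None, FIELDS[2]: "5.0"},
]

valid = [props for props in sample if all_fields_valid(props, FIELDS)]
print(len(valid))
```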

### Transpose to extract lat,lon and target range as columns, ready to plot 

In [None]:
def transpose(properties):
    lat = [float(p['orex:spatial.orex:latitude']) for p in properties]
    lon = [float(p['orex:spatial.orex:longitude']) for p in properties]
    target_range = [float(p['orex:spatial.orex:target_range']) for p in properties]
    return lat, lon, target_range

lat, lon, target_range = transpose(properties)
print(f'The target_range values for the selected products are {target_range[:3]}')

### Plot the lat,lon of the observations, colored by target_range

In [None]:
def observation_map(lat, lon, target_range, vmax=25):
    fig, ax = plt.subplots()

    ax.set_xlabel('longitude')
    ax.set_ylabel('latitude')
    ax.set_title('orex.ovirs products lat,lon')

    im = ax.scatter(lon, lat, c=target_range, vmin=0, vmax=vmax)

    cbar = fig.colorbar(im, ax=ax)
    cbar.set_label('target range (km)')
    
observation_map(lat,lon, target_range, vmax=25)


### Overview of the observation target_range with a histogram

In [None]:
plt.hist(target_range, range=(0, 15))

### Get observations around a specific spot (lat=12, lon=24) with target range closer than 4 km

In [None]:
lidvids = [p['id'] for p in properties if float(p['orex:spatial.orex:target_range']) < 4.0 
          and abs(float(p['orex:spatial.orex:latitude']) - 12.0) < 3.0
          and abs(float(p['orex:spatial.orex:longitude']) - 24.0) < 3.0]
print(f'The lidvids of the selected products are {lidvids}')
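
The selection above can also be written as a reusable predicate. The sketch below adds one thing the original does not do: it compares longitudes modulo 360 degrees, so a spot near the 0/360 boundary would still match. That wraparound, the helper names `lon_difference` and `near_spot`, and the sample records are assumptions for illustration; for the spot at lon=24 the result matches the plain `abs()` comparison.

```python
def lon_difference(lon_a, lon_b):
    """Smallest absolute difference between two longitudes, in degrees."""
    return abs((lon_a - lon_b + 180.0) % 360.0 - 180.0)


def near_spot(props, lat0, lon0, half_width=3.0, max_range=4.0):
    """True if the observation lies in the box around (lat0, lon0) and is close enough."""
    lat = float(props["orex:spatial.orex:latitude"])
    lon = float(props["orex:spatial.orex:longitude"])
    target_range = float(props["orex:spatial.orex:target_range"])
    return (target_range < max_range
            and abs(lat - lat0) < half_width
            and lon_difference(lon, lon0) < half_width)


# Hand-written sample records (not real PDS data).
sample = [
    {"id": "a", "orex:spatial.orex:latitude": "12.5",
     "orex:spatial.orex:longitude": "25.0", "orex:spatial.orex:target_range": "3.1"},
    {"id": "b", "orex:spatial.orex:latitude": "12.5",
     "orex:spatial.orex:longitude": "25.0", "orex:spatial.orex:target_range": "9.0"},
    {"id": "c", "orex:spatial.orex:latitude": "40.0",
     "orex:spatial.orex:longitude": "24.0", "orex:spatial.orex:target_range": "2.0"},
]

selected = [p["id"] for p in sample if near_spot(p, lat0=12.0, lon0=24.0)]
print(selected)
```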

### Get the full product description

In [None]:
products_api = pds_api.ProductsApi(api_client)
product = products_api.products_by_lidvid(lidvids[0])
print(product)

### Get the file path of the data

In [None]:
product.properties['ops:Data_File_Info.ops:file_ref']