# Acquiring CMOR Meteoroid Stream Survey Data from the PDS Registry via Peppi

This notebook demonstrates how to download data from the SBN Canadian Meteor Orbit Radar (CMOR) PDS Bundle and convert the results to both a `pandas` DataFrame (for working with the data in memory) and a `.csv` file (for serialization to local disk).

The `peppi` library is used as the primary interface for querying and retrieving data from the PDS Registry API.

## Imports

In [None]:
# If the pds.peppi module has not been installed yet, you can run "pip install pds.peppi" within
# your development environment to get the latest version.
import pds.peppi as peppi

import pdr
import os

from pprint import pprint
from os.path import abspath, join
from urllib.request import urlretrieve

## Peppi initialization

To use the `peppi` library to pull data from the PDS Registry API, you must first create the `peppi.PDSRegistryClient` object that defines the endpoint of the PDS Registry API. This client object can then be used to create an instance of the `peppi.Products` class, where queries can be defined and results retrieved iteratively.

Typically, at minimum you will need to know the LIDVID for the Bundle or Collection containing the desired data. For this demo, we will start with the CMOR Bundle LIDVID (`urn:nasa:pds:gbo.meteoroid.cmor.radar-survey::1.0`), and use it to drill into an associated Collection, then into the data itself.

`peppi.Products` objects provide a `get` function, which configures a query for all products that match the provided LIDVID.

In [None]:
cmor_bundle_lidvid = "urn:nasa:pds:gbo.meteoroid.cmor.radar-survey::1.0"

client = peppi.PDSRegistryClient()
products = peppi.Products(client).get(cmor_bundle_lidvid)
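
A LIDVID such as the one above joins a logical identifier (LID) and a version identifier (VID) with a double colon. As a minimal sketch, the helper below splits one into its two parts; `split_lidvid` is a hypothetical name introduced here for illustration and is not part of `peppi`.

```python
def split_lidvid(lidvid: str) -> tuple[str, str]:
    """Split a PDS4 LIDVID into (LID, VID). Hypothetical helper, not a peppi API."""
    lid, sep, vid = lidvid.partition("::")
    if not sep or not vid:
        raise ValueError(f"not a LIDVID (missing '::<version>'): {lidvid!r}")
    return lid, vid


lid, vid = split_lidvid("urn:nasa:pds:gbo.meteoroid.cmor.radar-survey::1.0")
print(lid)  # urn:nasa:pds:gbo.meteoroid.cmor.radar-survey
print(vid)  # 1.0
```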

## Retrieving Query Results

`peppi` uses lazy evaluation for queries defined on `Products` instances: no query is performed until you iterate over the `Products` instance to retrieve results. Below, we gather the results of our `get` query into a separate list of products matching the desired LIDVID.

In [None]:
matching_products = [product for product in products]
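
The lazy-evaluation pattern can be illustrated without touching the network. The sketch below is a toy stand-in and not `peppi`'s implementation: the hypothetical `LazyQuery` class only records a filter when `get` is called, and does the actual work when something iterates over it.

```python
class LazyQuery:
    """Toy lazily evaluated query (hypothetical; not peppi's implementation)."""

    def __init__(self, records):
        self._records = records
        self._lidvid = None
        self.executions = 0  # counts how many times the "query" actually ran

    def get(self, lidvid):
        # Only record the filter; nothing is evaluated yet.
        self._lidvid = lidvid
        return self

    def __iter__(self):
        # The work happens here, once a caller starts iterating.
        self.executions += 1
        for record in self._records:
            if self._lidvid is None or record["lidvid"] == self._lidvid:
                yield record


# Made-up records standing in for registry results.
records = [{"lidvid": "urn:example:a::1.0"}, {"lidvid": "urn:example:b::1.0"}]
query = LazyQuery(records).get("urn:example:b::1.0")
print(query.executions)  # 0: defining the query did not run it
matches = list(query)
print(query.executions)  # 1: iteration triggered the work
print(matches)
```

Since `list()` accepts any iterable, it is an equivalent and slightly shorter way to collect results than a list comprehension.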

We can now examine specific properties of the matching Bundles to determine the LIDVID(s) of the Collections associated with this Bundle.

In [None]:
for matching_product in matching_products:
    print(matching_product.properties['lidvid'], 
          matching_product.properties['ops:Label_File_Info.ops:file_ref'],
          matching_product.properties['pds:Bundle_Member_Entry.pds:lidvid_reference'])
    print()

There should be a matching product for the Bundle hosted at `sbnarchive.psi.edu`.

In [None]:
cmor_bundle = matching_products[0]

Next, we query for one of the Collections in the Bundle, using the associated Bundle Member LIDVID Reference.

In [None]:
cmor_collection_lidvid = cmor_bundle.properties['pds:Bundle_Member_Entry.pds:lidvid_reference'][0]
products = peppi.Products(client).get(cmor_collection_lidvid)
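
Indexing with `[0]` relies on the order in which the Bundle lists its member Collections. One possible alternative is to select a Collection by the last segment of its LID. In the sketch below, `pick_collection` is a hypothetical helper and the reference values are illustrative placeholders, not output captured from the Registry.

```python
def pick_collection(lidvid_refs, collection_name):
    """Return the first LIDVID whose LID ends with ':<collection_name>' (hypothetical helper)."""
    for ref in lidvid_refs:
        lid = ref.split("::", 1)[0]  # drop the version part
        if lid.endswith(f":{collection_name}"):
            return ref
    raise LookupError(f"no collection named {collection_name!r} in {lidvid_refs!r}")


# Placeholder values only; in the notebook the real list comes from
# cmor_bundle.properties['pds:Bundle_Member_Entry.pds:lidvid_reference'].
example_refs = [
    "urn:nasa:pds:example.bundle:document::1.0",
    "urn:nasa:pds:example.bundle:data::1.0",
]
print(pick_collection(example_refs, "data"))
```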

Again, to actually perform the query, we iterate over the `peppi.Products` instance and gather the results into a separate list.

In [None]:
matching_products = [product for product in products]

for matching_product in matching_products:
    print(matching_product.properties['lidvid'], 
          matching_product.properties['ops:Label_File_Info.ops:file_ref'], 
          matching_product.properties['ops:Data_File_Info.ops:file_ref'])
    print()

We should see results for the query hosted at `sbnarchive.psi.edu`.

In [None]:
cmor_collection = matching_products[0]

## Downloading Data From a Collection

The `collection_gbo.meteoroid.cmor.radar-survey_data_inventory.csv` file referenced by the `ops:Data_File_Info.ops:file_ref` property lists the LIDVIDs of the individual data products that make up this Collection. We now download both the PDS Label of the Collection and the `.csv` data product, then use the `pdr` library to read the contents of the `.csv` file into a `pandas.DataFrame`.

In [None]:
# Download the PDS4 label to current user's home directory
remote_label_path = cmor_collection.properties['ops:Label_File_Info.ops:file_ref'][0]
local_label_path  = join(os.path.expanduser('~'), remote_label_path.split('/')[-1])

print(f"Downloading {remote_label_path} to {abspath(local_label_path)}")
result = urlretrieve(remote_label_path, local_label_path)

# Download the .csv product to current user's home directory
remote_data_path = cmor_collection.properties['ops:Data_File_Info.ops:file_ref'][0]
local_data_path  = join(os.path.expanduser('~'), remote_data_path.split('/')[-1])

print(f"Downloading {remote_data_path} to {abspath(local_data_path)}")
result = urlretrieve(remote_data_path, local_data_path)
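
The label and data downloads above repeat the same steps: take the last segment of the URL as the file name and join it to a target directory. The sketch below factors out that path logic. `local_path_for` is a hypothetical helper name, `example.org` is a placeholder host, and the default target directory is assumed to be the user's home directory via `pathlib.Path.home()`.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def local_path_for(url, directory=None):
    """Map a remote file URL to a local path inside `directory` (hypothetical helper)."""
    directory = Path(directory) if directory is not None else Path.home()
    # Use only the URL's path component so a query string or fragment
    # never ends up in the file name.
    filename = PurePosixPath(unquote(urlsplit(url).path)).name
    if not filename:
        raise ValueError(f"URL has no file name component: {url!r}")
    return directory / filename


# Placeholder URL, not the real archive location.
print(local_path_for("https://example.org/archive/inventory.csv?download=1", "downloads"))
```

The returned path can then be passed (as `str(path)` if preferred) as the second argument to `urlretrieve`.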

In [None]:
# Use Planetary Data Reader to read the data into a pandas DataFrame
data = pdr.read(abspath(local_label_path))
print(data.keys())
print(data['TABLE_0'].keys())

In [None]:
# Extract LIDVIDs of Data products from DataFrame
cmor_data_lidvids = data['TABLE_0']['LIDVID_LID']
print(list(cmor_data_lidvids))

Finally, we use one of these Data LIDVIDs (`urn:nasa:pds:gbo.meteoroid.cmor.radar-survey:data:complexes_tab::1.0`) to repeat the process of query and download for both the PDS4 Label and associated Data product.

In [None]:
cmor_data_lidvid = cmor_data_lidvids[0]
products = peppi.Products(client).get(cmor_data_lidvid)

matching_products = [p for p in products]

for matching_product in matching_products:
    print(matching_product.properties['lidvid'], 
          matching_product.properties['ops:Label_File_Info.ops:file_ref'], 
          matching_product.properties['ops:Data_File_Info.ops:file_ref'])
    print()

In [None]:
cmor_complexes = matching_products[0]

# Download the PDS4 label to current user's home directory
remote_label_path = cmor_complexes.properties['ops:Label_File_Info.ops:file_ref'][0]
local_label_path  = join(os.path.expanduser('~'), remote_label_path.split('/')[-1])

print(f"Downloading {remote_label_path} to {abspath(local_label_path)}")
result = urlretrieve(remote_label_path, local_label_path)

# Download the complexes.tab product to current user's home directory
remote_data_path = cmor_complexes.properties['ops:Data_File_Info.ops:file_ref'][0]
local_data_path  = join(os.path.expanduser('~'), remote_data_path.split('/')[-1])

print(f"Downloading {remote_data_path} to {abspath(local_data_path)}")
result = urlretrieve(remote_data_path, local_data_path)

The Planetary Data Reader library can again be used to pull the label and data information into a `DataFrame`.

In [None]:
# use Planetary Data Reader to read the data into a pandas DataFrame
data = pdr.read(abspath(local_label_path))
print(data.keys())
data['table']

From here, we can use `pandas` to write the data to disk in `.csv` format.

In [None]:
print(f'Outputting CSV {local_data_path}.csv')
data['table'].to_csv(f'{local_data_path}.csv')
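
By default, `DataFrame.to_csv` also writes the row index as an unnamed first column. The small sketch below uses a made-up table (the column names and values are placeholders, not CMOR data) to show how `index=False` omits that column, and reads the text back with `pandas.read_csv` as a sanity check.

```python
import io

import pandas as pd

# Placeholder table; column names and values are invented for illustration.
table = pd.DataFrame({"name": ["alpha", "beta"], "count": [3, 5]})

buffer = io.StringIO()
table.to_csv(buffer, index=False)  # omit the row index column
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back to check the round trip.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.equals(table))
```

Whether to keep the index depends on the table: if the index carries no information beyond row position, dropping it keeps the output file closer to the original product.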