## Welcome to the Planetary Data Reader Example Jupyter Notebook!

The Planetary Data Reader (`pdr`) is a Python package that provides a single, straightforward interface for accessing observational planetary science data. It is currently under active development and will eventually support nearly all data hosted by the Planetary Data System (PDS). The basic command is just `pdr.read(fn)`, where `fn` is either the observational data file or the detached label file (if applicable).

This notebook demonstrates the basic operation of `pdr` as well as some key features.

_**Note:** This notebook must download data from the PDS and therefore requires an internet connection. `pdr` does not require an internet connection to be used locally when the data are already present._

In [None]:
import pdr  # Import the `pdr` library
import requests  # We need this package to retrieve the remote data. It is not generally required.
import os  # Used to check whether a file has already been downloaded

First, we'll look at an image from LROC (which uses the older PDS3 standards and has an attached label).

In [None]:
# This is the URL to the data file on the PDS servers
img_url = 'https://pds.lroc.asu.edu/data/LRO-L-LROC-5-RDR-V1.0/LROLRC_2001/DATA/BDR/NAC_ROI/NECTARISLOA/NAC_ROI_NECTARISLOA_E176S0413_20M.IMG'
filename = img_url.split('/')[-1]
# This retrieves the data and saves it to the current working directory
# The file is 8227536 bytes, so the download may take a moment
if not os.path.exists(filename):
    req = requests.get(img_url)
    # The `with` block closes the file once the data has been written
    with open(filename, 'wb') as f:
        f.write(req.content)
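
The download-if-missing pattern in the cell above recurs several times in this notebook. As a minimal sketch, it could be wrapped in a small helper. The names `filename_from_url` and `fetch_if_missing` are hypothetical (they are not part of `pdr`), and the sketch swaps `requests` for the standard library's `urllib.request` so that it has no third-party dependencies.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path component of a URL, e.g. '.../DATA/X.IMG' -> 'X.IMG'."""
    return os.path.basename(urlparse(url).path)


def fetch_if_missing(url, dest_dir="."):
    """Download `url` into `dest_dir` unless the file is already there.

    Returns the local path in either case. (Hypothetical helper, not a pdr API.)
    """
    local_path = os.path.join(dest_dir, filename_from_url(url))
    if not os.path.exists(local_path):
        # Stream the response to disk instead of holding it all in memory.
        with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
            shutil.copyfileobj(response, out)
    return local_path
```

With a helper like this, each download cell would reduce to a single call such as `filename = fetch_if_missing(img_url)`.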

### Reading image data:
Now that we have our file downloaded (run the above cell), we can easily read the data in with `pdr.read()`. This returns an object with attributes that correspond to each of the data objects defined in the label. The attributes can be queried and accessed with a `keys()` method, similar to a Python `dict`.

Let's read the file and see what kinds of data it contains.

In [None]:
lroc_data = pdr.read(filename) # That's the magic function call!
print(f'The keys are {lroc_data.keys()}')

If you are not familiar with the PDS data formats, you might be surprised to learn that this "image" file contains three different data objects. Most PDS data have a `LABEL` object, which contains the metadata associated with the observation (such as observation time, calibration constants, and provenance information); it is returned as a Parameter Value Language (`PVL`) data object. The `IMAGE` object contains an array of observational data values and is a `numpy.ndarray`. The `DATA_SET_MAP_PROJECTION` object, which is particular to this data set, contains map projection data.

Let's print the contents of each of these objects, just to see what they contain.

In [None]:
for key in lroc_data.keys():
    print(f'{key}:')
    print(lroc_data[key])

**Oopsie!** An `AttributeError` was thrown while attempting to access `DATA_SET_MAP_PROJECTION`. This is intentional. Many files in the PDS have small format (.FMT), catalog (.CAT), or other supplementary files referenced in their labels. These files are usually stored in a different location on the PDS from the data and label files. Often these supplementary files are not needed to read the data. When they are needed, however, `pdr` currently does not load them and throws an error when you attempt to access the corresponding data object. Note that the `LABEL` and `IMAGE` objects were accessed without issue.
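
To keep one unavailable object from aborting a loop over all keys, the access can be wrapped in a `try`/`except`. The sketch below is illustrative only: `summarize_objects` is a hypothetical helper, and `FakeProduct` is a toy stand-in that imitates the behavior described above (it is not `pdr`'s actual class), so that the example runs without any downloaded data.

```python
def summarize_objects(data):
    """Map each key of `data` to a type name, or to the error raised on access."""
    summary = {}
    for key in data.keys():
        try:
            summary[key] = type(data[key]).__name__
        except (AttributeError, KeyError, FileNotFoundError) as err:
            summary[key] = f"unavailable ({type(err).__name__}: {err})"
    return summary


class FakeProduct:
    """Hypothetical stand-in for a `pdr` data object, used only for this demo."""

    def __init__(self, objects, broken):
        self._objects = objects
        self._broken = set(broken)

    def keys(self):
        return list(self._objects) + sorted(self._broken)

    def __getitem__(self, key):
        if key in self._broken:
            raise AttributeError(f"{key} could not be loaded")
        return self._objects[key]


# Toy contents; the values are placeholders, not real PDS data.
demo = FakeProduct(
    {"LABEL": {"PRODUCT_ID": "demo"}, "IMAGE": [[0, 1], [2, 3]]},
    broken=["DATA_SET_MAP_PROJECTION"],
)
for key, status in summarize_objects(demo).items():
    print(f"{key}: {status}")
```

Because the helper relies only on `keys()` and item access, the same call should in principle work as `summarize_objects(lroc_data)`, assuming the errors raised by `pdr` are among those caught here.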

#### `.show()` convenience method (for visualizing image data):
`pdr` has a convenience method called `.show()` which helps to quickly visualize the image data. Null values (typically defined in the label or drawn from a list of universal null values) are masked in cyan, but that doesn't change their value in the data object. This method is solely for visualizing the data for browsing or triage purposes.

In [None]:
lroc_data.show()
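
The idea of masking null values for display while leaving the underlying data untouched can be imitated with a `numpy` masked array. This is a sketch of the general technique, not of how `.show()` is implemented; the array and the null value `-32768` are made up for the demonstration.

```python
import numpy as np

# Synthetic stand-in for an image array; NULL_VALUE is assumed for this demo only.
NULL_VALUE = -32768
image = np.array([[10, 20, NULL_VALUE],
                  [30, NULL_VALUE, 40]], dtype=np.int16)

# masked_equal returns a masked copy; `image` itself is not modified.
masked = np.ma.masked_equal(image, NULL_VALUE)

print(masked.mean())           # statistics skip the masked pixels -> 25.0
print(int(masked.mask.sum()))  # number of masked pixels -> 2
print(image[0, 2])             # the raw array still holds the null value -> -32768
```

A masked array like this can be passed to plotting routines that honor masks, which is one common way to render null pixels in a distinct color.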

### Reading table data:
Now let's look at some table-like data. The same `pdr.read()` command works: `pdr` figures out the format of the data and interprets it accordingly. Table-like data are represented as a `pandas.DataFrame`.

First we'll read a table from the Apollo 15 Heat Flow Experiment that is in the PDS4 standard format. Then we'll read some MRO RSS data that is in the PDS3 standard format. **We also demonstrate that data can be opened with either a detached label file or a data file, though label files are preferred.**

_**Note:** Currently, pdr does not use the same internal functions to open PDS4 and PDS3 data. PDS4 data is less optimized (it reads data accurately, but more slowly), so we recommend using PDS3 labels whenever both are available._

In [None]:
# These are the URLs for the Apollo 15 HFE data and label files
apollo_url = 'https://pds-geosciences.wustl.edu/lunar/urn-nasa-pds-a15_17_hfe_concatenated/data/split/a15p1f4_split.tab'
apollo_lbl_url = 'https://pds-geosciences.wustl.edu/lunar/urn-nasa-pds-a15_17_hfe_concatenated/data/split/a15p1f4_split.xml'
# This downloads the data, same way as above
apollo_fn = apollo_url.split('/')[-1]
if not os.path.exists(apollo_fn):
    req = requests.get(apollo_url)
    with open(apollo_fn, 'wb') as f:
        f.write(req.content)
# Keep the label URL and the local label filename in separate variables
apollo_lbl = apollo_lbl_url.split('/')[-1]
if not os.path.exists(apollo_lbl):
    reqlbl = requests.get(apollo_lbl_url)
    with open(apollo_lbl, 'wb') as f:
        f.write(reqlbl.content)
apollo_from_data_file = pdr.read(apollo_fn)
apollo_from_lbl_file = pdr.read(apollo_lbl)
# This checks that the outputs of `pdr` are identical whether you pass it the data file or the label file.
print('Do the data file and label file produce identical outputs?',
      all(apollo_from_data_file['a15p1f4_split']==apollo_from_lbl_file['a15p1f4_split']))

Tables are output as `pandas.DataFrame` objects when read from a PDS3 label and as `PDS_ndarray` objects (a reclassed `numpy` array) when read from a PDS4 label, as shown here. PDS4 labels are output as dictionaries.

In [None]:
print(f'The keys are {apollo_from_data_file.keys()}')
for key in apollo_from_data_file.keys():
    print(f'{key}:')
    print(type(apollo_from_data_file[key]))
    print(apollo_from_data_file[key])


An example with an MRO RSS file, a PDS3-formatted table, can be seen below. Note again the differing output types from using a PDS3 (pandas dataframe) vs. PDS4 (PDS_ndarray) label file for a table.

Here we also demonstrate the use of the `lazy=` keyword. It causes only the label to be read in and allows each product referenced in a label file to be loaded separately.

In [None]:
mrorss_url = 'https://pds-geosciences.wustl.edu/mro/mro-m-rss-5-sdp-v1/mrors_1xxx/data/shadr/jgmro_110b2_sha.tab'
mrorss_lbl_url = 'https://pds-geosciences.wustl.edu/mro/mro-m-rss-5-sdp-v1/mrors_1xxx/data/shadr/jgmro_110b2_sha.lbl'
mrorss_fn = mrorss_url.split('/')[-1]
if not os.path.exists(mrorss_fn):
    req = requests.get(mrorss_url)
    with open(mrorss_fn, 'wb') as f:
        f.write(req.content)
# The label filename must come from the label URL, not the data URL
mrorss_lbl = mrorss_lbl_url.split('/')[-1]
if not os.path.exists(mrorss_lbl):
    req_lbl = requests.get(mrorss_lbl_url)
    with open(mrorss_lbl, 'wb') as f:
        f.write(req_lbl.content)

mrorss_data=pdr.read(mrorss_lbl, lazy=True)

print(f'The keys are {mrorss_data.keys()}')
for key in mrorss_data.keys():
    print(f'{key}:')
    print(type(mrorss_data[key]))
    print(mrorss_data[key])

#### `.dump_browse()` convenience method (for outputting browse products of any data type)

Similar to the `.show()` method, the `dump_browse()` method can output a masked browse image. However, there are some key differences.

(1) `.dump_browse()` will create a browse file on your computer drive, not a visual output on your display.

(2) While `.show()` only works on array data (meant for images), the `.dump_browse()` feature will output a file for each key.

Let's give it a go:

In [None]:
mrorss_data.dump_browse()

There should now be two new files in the folder containing this Jupyter notebook.

    -jgmro_110b2_sha_LABEL.lbl
    -jgmro_110b2_sha_SHADR_COEFFICIENTS_TABLE.csv
    
Each file is named by the name of the original file with the addition of \_key where key is the name of the corresponding dictionary key.
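
The naming rule just described can be written out as a short function. `browse_filename` is a hypothetical helper for illustration, not a `pdr` function, and the extension is passed in explicitly because it depends on the type of object being dumped.

```python
from pathlib import Path


def browse_filename(source_file, key, extension):
    """Build '<original stem>_<key>.<extension>' for a dumped object."""
    stem = Path(source_file).stem
    return f"{stem}_{key}.{extension.lstrip('.')}"


print(browse_filename("jgmro_110b2_sha.lbl", "LABEL", "lbl"))
print(browse_filename("jgmro_110b2_sha.lbl", "SHADR_COEFFICIENTS_TABLE", "csv"))
```

The two printed names match the files listed above. To see what was actually written to disk, something like `sorted(Path('.').glob('jgmro_110b2_sha_*'))` lists every file that shares the original stem.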

Notice that only the keys that were loaded in have corresponding files when running `dump_browse()`.

The label is output as a PVL parsed .lbl file. When PDS labels don't conform to the strict standards of PVL (which they often don't) the \_LABEL.lbl will be accompanied by a \_LABEL.badpvl.txt which is the backup output. It is simply a text file of the LABEL key. In this case the \_LABEL.lbl will be an empty text file.
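
Since the parsed `_LABEL.lbl` may be empty when the fallback file exists, a script that consumes these dumps might want to pick whichever one actually has content. The following is a minimal sketch under that assumption; `usable_label_dump` is a hypothetical helper, and it relies only on the file-naming behavior described in the paragraph above.

```python
from pathlib import Path


def usable_label_dump(directory, stem):
    """Return the label dump worth opening for `stem`, or None if there is none."""
    directory = Path(directory)
    fallback = directory / f"{stem}_LABEL.badpvl.txt"
    parsed = directory / f"{stem}_LABEL.lbl"
    if fallback.exists():
        # The parsed file is expected to be empty when the fallback is present.
        return fallback
    if parsed.exists() and parsed.stat().st_size > 0:
        return parsed
    return None


# Looks in the current directory; the result depends on which files exist there.
print(usable_label_dump(".", "jgmro_110b2_sha"))
```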

We can use this same convenience method with the image data from above (if you jumped straight here, first run the import cell and the two cells that follow it). This will output the image we displayed above as a .jpg along with a parsed .lbl, both saved in the folder the Jupyter notebook is executing from.

In [None]:
lroc_data.dump_browse()