## Welcome to the Planetary Data Reader Example Jupyter Notebook!

The Planetary Data Reader (pdr) is a simple, easy-to-use Python package created by Million Concepts to help planetary scientists open and access their data. This notebook demonstrates the basics of working with pdr and some of its features.

_**Note:** This notebook downloads data from the PDS and therefore requires an internet connection. pdr itself does not need an internet connection when reading files that are already on your machine._

We start by importing pdr into the workspace so we have access to its functions and capabilities.

In [None]:
import pdr
import requests

### Downloading Data:
In this notebook, we use the requests package to download data from the PDS. In the future, pdr will have a built-in downloader utility, and this notebook will be updated to show its usage at that time.

Our first example will be an image from LROC (which uses the PDS3 standards and has an attached label).

In [None]:
img_url = 'https://pds.lroc.asu.edu/data/LRO-L-LROC-5-RDR-V1.0/LROLRC_2001/DATA/BDR/NAC_ROI/NECTARISLOA/NAC_ROI_NECTARISLOA_E176S0413_20M.IMG'
filename = img_url.split('/')[-1]
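req = requests.get(img_url)
req.raise_for_status()  # stop here if the download failed
# the with-block closes the file once the bytes are written
with open(filename, 'wb') as f:
    f.write(req.content)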
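
The download step above can be wrapped in a small reusable helper. This is only a sketch: `local_name` and `download` are hypothetical names introduced here, not part of pdr, and the 60-second timeout is an arbitrary assumption. The helper skips the request when the file is already on disk, which is convenient when re-running the notebook.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Return the last path component of a URL, ignoring any query string."""
    return PurePosixPath(urlparse(url).path).name


def download(url: str, dest_dir: str = ".") -> Path:
    """Save url into dest_dir unless it is already there (hypothetical helper)."""
    import requests  # imported here so local_name works without requests

    target = Path(dest_dir) / local_name(url)
    if target.exists():
        return target  # reuse the earlier download
    response = requests.get(url, timeout=60)  # timeout value is an assumption
    response.raise_for_status()  # fail loudly on 404s and server errors
    target.write_bytes(response.content)
    return target
```

With this in place, `download(img_url)` would return the local path of the LROC image, which can then be passed to `pdr.read()` as a string.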

### Reading image data:
Now that we have our file downloaded (run the above cell), we can easily read the data in with `pdr.read()`. This will output a dictionary of the different products referenced in the label. Here we list the keys to see what these products are, then print each one to view its format.

In [None]:
lroc_data = pdr.read(filename)
print(f'The keys are {lroc_data.keys()}')
for key in lroc_data.keys():
    print(f'{key}:')
    print(lroc_data[key])

You'll notice the above cell produces an error when trying to output `DATA_SET_MAP_PROJECTION`. This is intentional. Many files in the PDS reference small format (.FMT), catalog (.CAT), or other supplementary files in their labels. These files usually live in a separate folder from the data and label files, and they are often not needed to read the data. Because of this, pdr currently does not load supplementary files that are not needed to read the data file.
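
If you would rather see every key even when one of them cannot be loaded, the loop can catch the error per key. The sketch below is an illustration, not pdr functionality: `summarize_products` is a hypothetical helper, `_Demo` is a stand-in object so the example runs without pdr, and the exception is caught broadly because the exact error type pdr raises is not assumed here.

```python
def summarize_products(data):
    """Map each key to its value's type name, or to the error raised on access."""
    summary = {}
    for key in data.keys():
        try:
            summary[key] = type(data[key]).__name__
        except Exception as error:  # broad on purpose: error type is not assumed
            summary[key] = f"unavailable ({type(error).__name__})"
    return summary


class _Demo(dict):
    """Stand-in for a pdr data object; one key fails, like a missing file."""

    def __getitem__(self, key):
        if key == "SUPPLEMENT":
            raise FileNotFoundError(key)
        return super().__getitem__(key)


demo = _Demo(LABEL="text", IMAGE=[[0, 1]], SUPPLEMENT=None)
print(summarize_products(demo))
```

For the object read above, the equivalent call would be `summarize_products(lroc_data)`.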

#### `.show()` convenience method (for visualizing image data):
The label is currently output as a PVL (Parameter Value Language) object, and the image data is a `numpy.ndarray`. pdr has a built-in convenience method for visualizing image data called `.show()`. It masks null values (those defined in the label file, plus a list of universal nulls) in cyan. This does not change their values in the numpy array above; the method is solely for visualizing the data for browsing or triage purposes.

In [None]:
lroc_data.show()
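
To see why masking for display leaves the data untouched, here is a minimal numpy sketch of the same idea. It does not reproduce how `.show()` works internally; the small array and the null value of -32768 are made up for illustration.

```python
import numpy as np

NULL_VALUE = -32768  # assumed null value, for illustration only

# Hypothetical 3x3 image with two null pixels.
image = np.array(
    [[10, 12, NULL_VALUE],
     [11, NULL_VALUE, 13],
     [14, 15, 16]],
    dtype=np.int16,
)

# Build a masked array that hides the nulls from statistics and plots.
masked = np.ma.masked_equal(image, NULL_VALUE)

print(masked.count())  # number of valid (unmasked) pixels
print(masked.mean())   # mean over valid pixels only
print(image[0, 2])     # the original array still holds the null value
```

The masked array is a separate object, so anything computed from `image` directly will still include the null values.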

### Reading table data:
Now let's look at a table, which is read with the same `pdr.read()` command. As before, this outputs a dictionary of the different products referenced in the label. We list the keys to see what these products are, then print each one to view its format.

We'll use a simple table from Apollo 15 data that is in the PDS4 standard format. We then use a table from MRO RSS that is in the PDS3 standard format. **We also demonstrate that data can be opened from either a detached label file or a data file, though label files are preferred.**

_**Note:** Currently, pdr does not use the same internal functions to open PDS4 and PDS3 data. PDS4 reading is less optimized, so we recommend using PDS3 labels whenever both are available._

In [None]:
apollo_url = 'https://pds-geosciences.wustl.edu/lunar/urn-nasa-pds-a15_17_hfe_concatenated/data/split/a15p1f4_split.tab'
apollo_lbl = 'https://pds-geosciences.wustl.edu/lunar/urn-nasa-pds-a15_17_hfe_concatenated/data/split/a15p1f4_split.xml'
req = requests.get(apollo_url)
apollo_fn = apollo_url.split('/')[-1]
reqlbl = requests.get(apollo_lbl)
apollo_lbl = apollo_lbl.split('/')[-1]
# write each file inside a with-block so it is closed afterwards
with open(apollo_fn, 'wb') as f:
    f.write(req.content)
with open(apollo_lbl, 'wb') as f:
    f.write(reqlbl.content)
apollo_from_data_file = pdr.read(apollo_fn)
apollo_from_lbl_file = pdr.read(apollo_lbl)
print('The outputs from both data and label are the same: True/False?')
print(all(apollo_from_data_file['a15p1f4_split']==apollo_from_lbl_file['a15p1f4_split']))

Tables are output as pandas DataFrames when read from a PDS3 label and as PDS_ndarray (a reclassed numpy array) when read from a PDS4 label, as shown here. PDS4 labels are output as dictionaries.

In [None]:
print(f'The keys are {apollo_from_data_file.keys()}')
for key in apollo_from_data_file.keys():
    print(f'{key}:')
    print(type(apollo_from_data_file[key]))
    print(apollo_from_data_file[key])


An example with an MRO-RSS file, a PDS3-formatted table, can be seen below. Note again the differing output types when using a PDS3 label file (pandas DataFrame) vs. a PDS4 label file (PDS_ndarray) for a table.

Here we also demonstrate the use of the `lazy=` keyword. With `lazy=True`, only the label is read in at first, and each product referenced in the label file can then be loaded separately.

In [None]:
mrorss_url = 'https://pds-geosciences.wustl.edu/mro/mro-m-rss-5-sdp-v1/mrors_1xxx/data/shadr/jgmro_110b2_sha.tab'
req = requests.get(mrorss_url)
mrorss_lbl_url = 'https://pds-geosciences.wustl.edu/mro/mro-m-rss-5-sdp-v1/mrors_1xxx/data/shadr/jgmro_110b2_sha.lbl'
req_lbl = requests.get(mrorss_lbl_url)
mrorss_fn = mrorss_url.split('/')[-1]
mrorss_lbl = mrorss_lbl_url.split('/')[-1]
# write each file inside a with-block so it is closed afterwards
with open(mrorss_fn, 'wb') as f:
    f.write(req.content)
with open(mrorss_lbl, 'wb') as f:
    f.write(req_lbl.content)

mrorss_data=pdr.read(mrorss_lbl, lazy=True)

print(f'The keys are {mrorss_data.keys()}')
for key in mrorss_data.keys():
    print(f'{key}:')
    print(type(mrorss_data[key]))
    print(mrorss_data[key])

The SHADR_HEADER_TABLE and SHADR_COEFFICIENTS_TABLE are not loaded yet because we used `lazy=True`. Let's load one now:

In [None]:
mrorss_data.load('SHADR_COEFFICIENTS_TABLE')
print(mrorss_data['SHADR_COEFFICIENTS_TABLE'])
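
The general idea behind lazy loading can be shown with a toy container. This is a conceptual sketch only and makes no claim about how pdr implements `lazy=True`: `LazyProducts`, `is_loaded`, and `read_big_table` are hypothetical names invented for the example.

```python
class LazyProducts:
    """Toy container: knows its keys up front, reads values only on request."""

    def __init__(self, loaders):
        # loaders maps each key to a zero-argument function that reads the data
        self._loaders = dict(loaders)
        self._loaded = {}

    def keys(self):
        return list(self._loaders)

    def load(self, key):
        # Run the (possibly slow) loader once and cache the result.
        if key not in self._loaded:
            self._loaded[key] = self._loaders[key]()
        return self._loaded[key]

    def is_loaded(self, key):
        return key in self._loaded


calls = []


def read_big_table():
    calls.append("read")         # record that the expensive read happened
    return [(1, 2.0), (2, 3.5)]  # stand-in for a large table


products = LazyProducts({"BIG_TABLE": read_big_table})
print(products.keys())                  # keys are known immediately
print(products.is_loaded("BIG_TABLE"))  # nothing has been read yet
products.load("BIG_TABLE")
print(products.is_loaded("BIG_TABLE"))  # now the table is in memory
```

The benefit is the same as in the cell above: listing what a label contains is cheap, and the cost of reading a large product is paid only for the products you ask for.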

#### `.dump_browse()` convenience method (for outputting browse products of any data type):

Similar to the `.show()` method, the `dump_browse()` method can output a masked browse image. However, there are some key differences.

(1) `.dump_browse()` will create a browse file on your computer drive, not a visual output on your display.

(2) While `.show()` only works on array data (meant for images), the `.dump_browse()` feature will output a file for each key.

Let's give it a go:

In [None]:
mrorss_data.dump_browse()

There should now be two new files in the folder that contains this Jupyter notebook:

    -jgmro_110b2_sha_LABEL.lbl
    -jgmro_110b2_sha_SHADR_COEFFICIENTS_TABLE.csv
    
Each file is named by the name of the original file with the addition of \_key where key is the name of the corresponding dictionary key.

Notice that only the keys that were loaded in have corresponding files when running `dump_browse()`.
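
The naming pattern described above can be written out as a short helper, which is handy for checking that the expected files exist. This mirrors only the pattern stated in this notebook; `browse_name` is a hypothetical function, not part of pdr, and the extension must be supplied by the caller because it depends on the type of each product.

```python
from pathlib import PurePosixPath


def browse_name(source_file: str, key: str, extension: str) -> str:
    """Build '<original stem>_<key><extension>' as described in the text."""
    stem = PurePosixPath(source_file).stem  # file name without its suffix
    return f"{stem}_{key}{extension}"


print(browse_name("jgmro_110b2_sha.lbl", "LABEL", ".lbl"))
print(browse_name("jgmro_110b2_sha.lbl", "SHADR_COEFFICIENTS_TABLE", ".csv"))
```

Passing each result to `pathlib.Path(...).exists()` is one way to confirm which browse products were written.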

The label is output as a PVL-parsed .lbl file. When a PDS label doesn't conform to the strict PVL standard (which is often the case), the \_LABEL.lbl will be accompanied by a \_LABEL.badpvl.txt as a backup output, which is simply a text file of the LABEL key. In that case, the \_LABEL.lbl will be an empty text file.

We can use the same convenience method on the image data from above (if you jumped straight here, first run the import cell and the two cells that follow it). This writes the image we displayed above as a .jpg, along with a parsed .lbl, to the folder the Jupyter notebook is executing from.

In [None]:
lroc_data.dump_browse()